Результаты поиска по 'clustering':
Найдено статей: 45
  1. Plokhotnikov K.E.
    The problem of choosing solutions in the classical format of the description of a molecular system
    Computer Research and Modeling, 2023, v. 15, no. 6, pp. 1573-1600

    The numerical methods developed by the author recently for calculating the molecular system based on the direct solution of the Schrodinger equation by the Monte Carlo method have shown a huge uncertainty in the choice of solutions. On the one hand, it turned out to be possible to build many new solutions; on the other hand, the problem of their connection with reality has become sharply aggravated. In ab initio quantum mechanical calculations, the problem of choosing solutions is not so acute after the transition to the classical format of describing a molecular system in terms of potential energy, the method of molecular dynamics, etc. In this paper, we investigate the problem of choosing solutions in the classical format of describing a molecular system without taking into account quantum mechanical prerequisites. As it turned out, the problem of choosing solutions in the classical format of describing a molecular system is reduced to a specific marking of the configuration space in the form of a set of stationary points and reconstruction of the corresponding potential energy function. In this formulation, the solution of the choice problem is reduced to two possible physical and mathematical problems: to find all its stationary points for a given potential energy function (the direct problem of the choice problem), to reconstruct the potential energy function for a given set of stationary points (the inverse problem of the choice problem). In this paper, using a computational experiment, the direct problem of the choice problem is discussed using the example of a description of a monoatomic cluster. The number and shape of the locally equilibrium (saddle) configurations of the binary potential are numerically estimated. An appropriate measure is introduced to distinguish configurations in space. The format of constructing the entire chain of multiparticle contributions to the potential energy function is proposed: binary, threeparticle, etc., multiparticle potential of maximum partiality. An infinite number of locally equilibrium (saddle) configurations for the maximum multiparticle potential is discussed and illustrated. A method of variation of the number of stationary points by combining multiparticle contributions to the potential energy function is proposed. The results of the work listed above are aimed at reducing the huge arbitrariness of the choice of the form of potential that is currently taking place. Reducing the arbitrariness of choice is expressed in the fact that the available knowledge about the set of a very specific set of stationary points is consistent with the corresponding form of the potential energy function.

  2. Okhapkina E.P., Okhapkin V.P.
    Approaches to a social network groups clustering
    Computer Research and Modeling, 2015, v. 7, no. 5, pp. 1127-1139

    The research is devoted to the problem of the use of social networks as a tool of the illegal activity and as a source of information that could be dangerous to society. The article presents the structure of the multiagent system with which a social network groups could be clustered according to the criteria uniquely defines a group as a destructive. The agents’ of the system clustering algorithm is described.

    Views (last year): 8. Citations: 2 (RSCI).
  3. Musaev A.A., Grigoriev D.A.
    Extracting knowledge from text messages: overview and state-of-the-art
    Computer Research and Modeling, 2021, v. 13, no. 6, pp. 1291-1315

    In general, solving the information explosion problem can be delegated to systems for automatic processing of digital data. These systems are intended for recognizing, sorting, meaningfully processing and presenting data in formats readable and interpretable by humans. The creation of intelligent knowledge extraction systems that handle unstructured data would be a natural solution in this area. At the same time, the evident progress in these tasks for structured data contrasts with the limited success of unstructured data processing, and, in particular, document processing. Currently, this research area is undergoing active development and investigation. The present paper is a systematic survey on both Russian and international publications that are dedicated to the leading trend in automatic text data processing: Text Mining (TM). We cover the main tasks and notions of TM, as well as its place in the current AI landscape. Furthermore, we analyze the complications that arise during the processing of texts written in natural language (NLP) which are weakly structured and often provide ambiguous linguistic information. We describe the stages of text data preparation, cleaning, and selecting features which, alongside the data obtained via morphological, syntactic, and semantic analysis, constitute the input for the TM process. This process can be represented as mapping a set of text documents to «knowledge». Using the case of stock trading, we demonstrate the formalization of the problem of making a trade decision based on a set of analytical recommendations. Examples of such mappings are methods of Information Retrieval (IR), text summarization, sentiment analysis, document classification and clustering, etc. The common point of all tasks and techniques of TM is the selection of word forms and their derivatives used to recognize content in NL symbol sequences. Considering IR as an example, we examine classic types of search, such as searching for word forms, phrases, patterns and concepts. Additionally, we consider the augmentation of patterns with syntactic and semantic information. Next, we provide a general description of all NLP instruments: morphological, syntactic, semantic and pragmatic analysis. Finally, we end the paper with a comparative analysis of modern TM tools which can be helpful for selecting a suitable TM platform based on the user’s needs and skills.

  4. Ignatev N.A., Tuliev U.Y.
    Semantic structuring of text documents based on patterns of natural language entities
    Computer Research and Modeling, 2022, v. 14, no. 5, pp. 1185-1197

    The technology of creating patterns from natural language words (concepts) based on text data in the bag of words model is considered. Patterns are used to reduce the dimension of the original space in the description of documents and search for semantically related words by topic. The process of dimensionality reduction is implemented through the formation of patterns of latent features. The variety of structures of document relations is investigated in order to divide them into themes in the latent space.

    It is considered that a given set of documents (objects) is divided into two non-overlapping classes, for the analysis of which it is necessary to use a common dictionary. The belonging of words to a common vocabulary is initially unknown. Class objects are considered as opposition to each other. Quantitative parameters of oppositionality are determined through the values of the stability of each feature and generalized assessments of objects according to non-overlapping sets of features.

    To calculate the stability, the feature values are divided into non-intersecting intervals, the optimal boundaries of which are determined by a special criterion. The maximum stability is achieved under the condition that the boundaries of each interval contain values of one of the two classes.

    The composition of features in sets (patterns of words) is formed from a sequence ordered by stability values. The process of formation of patterns and latent features based on them is implemented according to the rules of hierarchical agglomerative grouping.

    A set of latent features is used for cluster analysis of documents using metric grouping algorithms. The analysis applies the coefficient of content authenticity based on the data on the belonging of documents to classes. The coefficient is a numerical characteristic of the dominance of class representatives in groups.

    To divide documents into topics, it is proposed to use the union of groups in relation to their centers. As patterns for each topic, a sequence of words ordered by frequency of occurrence from a common dictionary is considered.

    The results of a computational experiment on collections of abstracts of scientific dissertations are presented. Sequences of words from the general dictionary on 4 topics are formed.

  5. Malkov S.Yu., Davydova O.I.
    Modernization as a global process: the experience of mathematical modeling
    Computer Research and Modeling, 2021, v. 13, no. 4, pp. 859-873

    The article analyzes empirical data on the long-term demographic and economic dynamics of the countries of the world for the period from the beginning of the 19th century to the present. Population and GDP of a number of countries of the world for the period 1500–2016 were selected as indicators characterizing the long-term demographic and economic dynamics of the countries of the world. Countries were chosen in such a way that they included representatives with different levels of development (developed and developing countries), as well as countries from different regions of the world (North America, South America, Europe, Asia, Africa). A specially developed mathematical model was used for modeling and data processing. The presented model is an autonomous system of differential equations that describes the processes of socio-economic modernization, including the process of transition from an agrarian society to an industrial and post-industrial one. The model contains the idea that the process of modernization begins with the emergence of an innovative sector in a traditional society, developing on the basis of new technologies. The population is gradually moving from the traditional sector to the innovation sector. Modernization is completed when most of the population moves to the innovation sector.

    Statistical methods of data processing and Big Data methods, including hierarchical clustering were used. Using the developed algorithm based on the random descent method, the parameters of the model were identified and verified on the basis of empirical series, and the model was tested using statistical data reflecting the changes observed in developed and developing countries during the period of modernization taking place over the past centuries. Testing the model has demonstrated its high quality — the deviations of the calculated curves from statistical data are usually small and occur during periods of wars and economic crises. Thus, the analysis of statistical data on the long-term demographic and economic dynamics of the countries of the world made it possible to determine general patterns and formalize them in the form of a mathematical model. The model will be used to forecast demographic and economic dynamics in different countries of the world.

  6. Fedorov V.A., Khruschev S.S., Kovalenko I.B.
    Analysis of Brownian and molecular dynamics trajectories of to reveal the mechanisms of protein-protein interactions
    Computer Research and Modeling, 2023, v. 15, no. 3, pp. 723-738

    The paper proposes a set of fairly simple analysis algorithms that can be used to analyze a wide range of protein-protein interactions. In this work, we jointly use the methods of Brownian and molecular dynamics to describe the process of formation of a complex of plastocyanin and cytochrome f proteins in higher plants. In the diffusion-collision complex, two clusters of structures were revealed, the transition between which is possible with the preservation of the position of the center of mass of the molecules and is accompanied only by a rotation of plastocyanin by 134 degrees. The first and second clusters of structures of collisional complexes differ in that in the first cluster with a positively charged region near the small domain of cytochrome f, only the “lower” plastocyanin region contacts, while in the second cluster, both negatively charged regions. The “upper” negatively charged region of plastocyanin in the first cluster is in contact with the amino acid residue of lysine K122. When the final complex is formed, the plastocyanin molecule rotates by 69 degrees around an axis passing through both areas of electrostatic contact. With this rotation, water is displaced from the regions located near the cofactors of the molecules and formed by hydrophobic amino acid residues. This leads to the appearance of hydrophobic contacts, a decrease in the distance between the cofactors to a distance of less than 1.5 nm, and further stabilization of the complex in a position suitable for electron transfer. Characteristics such as contact matrices, rotation axes during the transition between states, and graphs of changes in the number of contacts during the modeling process make it possible to determine the key amino acid residues involved in the formation of the complex and to reveal the physicochemical mechanisms underlying this process.

  7. Aksenov A.A., Kalugina M.D., Lobanov A.I., Kashirin V.S.
    Numerical simulation of fluid flow in a blood pump in the FlowVision software package
    Computer Research and Modeling, 2023, v. 15, no. 4, pp. 1025-1038

    A numerical simulation of fluid flow in a blood pump was performed using the FlowVision software package. This test problem, provided by the Center for Devices and Radiological Health of the US. Food and Drug Administration, involved considering fluid flow according to several design modes. At the same time for each case of calculation a certain value of liquid flow rate and rotor speed was set. Necessary data for calculations in the form of exact geometry, flow conditions and fluid characteristics were provided to all research participants, who used different software packages for modeling. Numerical simulations were performed in FlowVision for six calculation modes with the Newtonian fluid and standard $k-\varepsilon$ turbulence model, in addition, the fifth mode with the $k-\omega$ SST turbulence model and with the Caro rheological fluid model were performed. In the first stage of the numerical simulation, the convergence over the mesh was investigated, on the basis of which a final mesh with a number of cells of the order of 6 million was chosen. Due to the large number of cells, in order to accelerate the study, part of the calculations was performed on the Lomonosov-2 cluster. As a result of numerical simulation, we obtained and analyzed values of pressure difference between inlet and outlet of the pump, velocity between rotor blades and in the area of diffuser, and also, we carried out visualization of velocity distribution in certain cross-sections. For all design modes there was compared the pressure difference received numerically with the experimental data, and for the fifth calculation mode there was also compared with the experiment by speed distribution between rotor blades and in the area of diffuser. Data analysis has shown good correlation of calculation results in FlowVision with experimental results and numerical simulation in other software packages. The results obtained in FlowVision for solving the US FDA test suggest that FlowVision software package can be used for solving a wide range of hemodynamic problems.

  8. Kirilyuk I.L., Sen'ko O.V.
    Assessing the validity of clustering of panel data by Monte Carlo methods (using as example the data of the Russian regional economy)
    Computer Research and Modeling, 2020, v. 12, no. 6, pp. 1501-1513

    The paper considers a method for studying panel data based on the use of agglomerative hierarchical clustering — grouping objects based on the similarities and differences in their features into a hierarchy of clusters nested into each other. We used 2 alternative methods for calculating Euclidean distances between objects — the distance between the values averaged over observation interval, and the distance using data for all considered years. Three alternative methods for calculating the distances between clusters were compared. In the first case, the distance between the nearest elements from two clusters is considered to be distance between these clusters, in the second — the average over pairs of elements, in the third — the distance between the most distant elements. The efficiency of using two clustering quality indices, the Dunn and Silhouette index, was studied to select the optimal number of clusters and evaluate the statistical significance of the obtained solutions. The method of assessing statistical reliability of cluster structure consisted in comparing the quality of clustering on a real sample with the quality of clustering on artificially generated samples of panel data with the same number of objects, features and lengths of time series. Generation was made from a fixed probability distribution. At the same time, simulation methods imitating Gaussian white noise and random walk were used. Calculations with the Silhouette index showed that a random walk is characterized not only by spurious regression, but also by “spurious clustering”. Clustering was considered reliable for a given number of selected clusters if the index value on the real sample turned out to be greater than the value of the 95% quantile for artificial data. A set of time series of indicators characterizing production in the regions of the Russian Federation was used as a sample of real data. For these data only Silhouette shows reliable clustering at the level p < 0.05. Calculations also showed that index values for real data are generally closer to values for random walks than for white noise, but it have significant differences from both. Since three-dimensional feature space is used, the quality of clustering was also evaluated visually. Visually, one can distinguish clusters of points located close to each other, also distinguished as clusters by the applied hierarchical clustering algorithm.

  9. Yashina M.V., Tatashev A.G.
    Double-circuit system with clusters of different lengths and unequal arrangement of two nodes on the circuits
    Computer Research and Modeling, 2024, v. 16, no. 1, pp. 217-240

    We study a system that fulfills the class of driving systems developed by A. P. Buslaev (Buslaev networks). In this system, in each of two closed loops there is a segment called a cluster, and it moves at a constant speed if there are no delays. The lengths of the clusters are $l_1^{}$ and $l_2^{}$. There are two common points of the contours, called nodes. Delays in the movement of clusters are due to the fact that two clusters cannot pass through a node at the same time. The contours have the same height, the glazing is accepted. The nodes divide each contour into parts, the length of one of which is equal to $d_i^{}$, and the other $1-d_i^{}$, $i=1,\,2$, — contour number. Studies of the spectrum of average speeds of systems, i.\,e. set of pairs of results $(v_1^{},\,v_2^{})$, where $v_i^{}$ — cluster of average movement speed $i$ taking into account delays, for different initial states and fixed values $l_1^{}$, $l_2^{}$, $d_1^{}$, $d_2^{}$. 12 scenarios of system behavior have been identified, and for each of these manifestations sufficient conditions for its implementation have been found, and each of these observed spectra contains one or two pairs of average velocities.

  10. Gankevich I.G., Balyan S.G., Abrahamyan S.A., Korkhov V.V.
    Applications of on-demand virtual clusters to high performance computing
    Computer Research and Modeling, 2015, v. 7, no. 3, pp. 511-516

    Virtual machines are usually associated with an ability to create them on demand by calling web services, then these machines are used to deliver resident services to their clients; however, providing clients with an ability to run an arbitrary programme on the newly created machines is beyond their power. Such kind of usage is useful in a high performance computing environment where most of the resources are consumed by batch programmes and not by daemons or services. In this case a cluster of virtual machines is created on demand to run a distributed or parallel programme and to save its output to a network attached storage. Upon completion this cluster is destroyed and resources are released. With certain modifications this approach can be extended to interactively deliver computational resources to the user thus providing virtual desktop as a service. Experiments show that the process of creating virtual clusters on demand can be made efficient in both cases.

    Views (last year): 1.
Pages: « first previous next

Indexed in Scopus

Full-text version of the journal is also available on the web site of the scientific electronic library eLIBRARY.RU

The journal is included in the Russian Science Citation Index

The journal is included in the RSCI

International Interdisciplinary Conference "Mathematics. Computing. Education"