Latest issue Issue 5, 2024 Vol. 16

All issues

2024 Vol. 16
- Issue 5
- Issue 4
- Issue 3
- Issue 2
- Issue 1 (special issue)
2023 Vol. 15
- Issue 6
- Issue 5
- Issue 4 (special issue)
- Issue 3
- Issue 2 (special issue)
- Issue 1
2022 Vol. 14
- Issue 6
- Issue 5
- Issue 4 (special issue)
- Issue 3
- Issue 2 (special issue)
- Issue 1
2021 Vol. 13
- Issue 6
- Issue 5
- Issue 4
- Issue 3
- Issue 2 (special issue)
- Issue 1
2020 Vol. 12
- Issue 6
- Issue 5
- Issue 4
- Issue 3
- Issue 2
- Issue 1
2019 Vol. 11
- Issue 6
- Issue 5
- Issue 4
- Issue 3
- Issue 2
- Issue 1
2018 Vol. 10
- Issue 6
- Issue 5 (special issue)
- Issue 4
- Issue 3 (special issue)
- Issue 2
- Issue 1
2017 Vol. 9
- Issue 6
- Issue 5
- Issue 4
- Issue 3
- Issue 2
- Issue 1
2016 Vol. 8
- Issue 6
- Issue 5
- Issue 4
- Issue 3
- Issue 2
- Issue 1
2015 Vol. 7
- Issue 6
- Issue 5
- Issue 4
- Issue 3 (special issue)
- Issue 2
- Issue 1
2014 Vol. 6
- Issue 6 (special issue)
- Issue 5
- Issue 4
- Issue 3
- Issue 2
- Issue 1
2013 Vol. 5
- Issue 6 (special issue)
- Issue 5
- Issue 4
- Issue 3
- Issue 2
- Issue 1
2012 Vol. 4
- Issue 4
- Issue 3
- Issue 2
- Issue 1
2011 Vol. 3
- Issue 4
- Issue 3
- Issue 2
- Issue 1
2010 Vol. 2
- Issue 4
- Issue 3
- Issue 2
- Issue 1
2009 Vol. 1
- Issue 4
- Issue 3
- Issue 2
- Issue 1

Результаты поиска по 'optimal clustering':

Найдено статей: 8

Vlasov A.A., Pilgeikina I.A., Skorikova I.A.
Method of forming multiprogram control of an isolated intersection
Computer Research and Modeling, 2021, v. 13, no. 2, pp. 295-303

The simplest and most desirable method of traffic signal control is precalculated regulation, when the parameters of the traffic light object operation are calculated in advance and activated in accordance to a schedule. This work proposes a method of forming a signal plan that allows one to calculate the control programs and set the period of their activity. Preparation of initial data for the calculation includes the formation of a time series of daily traffic intensity with an interval of 15 minutes. When carrying out field studies, it is possible that part of the traffic intensity measurements is missing. To fill up the missing traffic intensity measurements, the spline interpolation method is used. The next step of the method is to calculate the daily set of signal plans. The work presents the interdependencies, which allow one to calculate the optimal durations of the control cycle and the permitting phase movement and to set the period of their activity. The present movement control systems have a limit on the number of control programs. To reduce the signal plans' number and to determine their activity period, the clusterization using the $k$-means method in the transport phase space is introduced In the new daily signal plan, the duration of the phases is determined by the coordinates of the received cluster centers, and the activity periods are set by the elements included in the cluster. Testing on a numerical illustration showed that, when the number of clusters is 10, the deviation of the optimal phase duration from the cluster centers does not exceed 2 seconds. To evaluate the effectiveness of the developed methodology, a real intersection with traffic light regulation was considered as an example. Based on field studies of traffic patterns and traffic demand, a microscopic model for the SUMO (Simulation of Urban Mobility) program was developed. The efficiency assessment is based on the transport losses estimated by the time spent on movement. Simulation modeling of the multiprogram control of traffic lights showed a 20% reduction in the delay time at the traffic light object in comparison with the single-program control. The proposed method allows automation of the process of calculating daily signal plans and setting the time of their activity.

Keywords: traffic light regulation, multiprogram control, time series, clustering, $k$-means.
Mezentsev Y.A., Razumnikova O.M., Estraykh I.V., Tarasova I.V., Trubnikova O.A.
Tasks and algorithms for optimal clustering of multidimensional objects by a variety of heterogeneous indicators and their applications in medicine
Computer Research and Modeling, 2024, v. 16, no. 3, pp. 673-693

The work is devoted to the description of the author’s formal statements of the clustering problem for a given number of clusters, algorithms for their solution, as well as the results of using this toolkit in medicine.

The solution of the formulated problems by exact algorithms of implementations of even relatively low dimensions before proving optimality is impossible in a finite time due to their belonging to the NP class.

In this regard, we have proposed a hybrid algorithm that combines the advantages of precise methods based on clustering in paired distances at the initial stage with the speed of methods for solving simplified problems of splitting by cluster centers at the final stage. In the development of this direction, a sequential hybrid clustering algorithm using random search in the paradigm of swarm intelligence has been developed. The article describes it and presents the results of calculations of applied clustering problems.

To determine the effectiveness of the developed tools for optimal clustering of multidimensional objects according to a variety of heterogeneous indicators, a number of computational experiments were performed using data sets including socio-demographic, clinical anamnestic, electroencephalographic and psychometric data on the cognitive status of patients of the cardiology clinic. An experimental proof of the effectiveness of using local search algorithms in the paradigm of swarm intelligence within the framework of a hybrid algorithm for solving optimal clustering problems has been obtained.

The results of the calculations indicate the actual resolution of the main problem of using the discrete optimization apparatus — limiting the available dimensions of task implementations. We have shown that this problem is eliminated while maintaining an acceptable proximity of the clustering results to the optimal ones. The applied significance of the obtained clustering results is also due to the fact that the developed optimal clustering toolkit is supplemented by an assessment of the stability of the formed clusters, which allows for known factors (the presence of stenosis or older age) to additionally identify those patients whose cognitive resources are insufficient to overcome the influence of surgical anesthesia, as a result of which there is a unidirectional effect of postoperative deterioration of complex visual-motor reaction, attention and memory. This effect indicates the possibility of differentiating the classification of patients using the proposed tools.

Keywords: optimal clustering, paired distances, cluster centers, hybrid algorithm, local search, swarm intelligence.
Oleynik E.B., Ivashina N.V., Shmidt Y.D.
Migration processes modelling: methods and tools (overview)
Computer Research and Modeling, 2021, v. 13, no. 6, pp. 1205-1232

Migration has a significant impact on the shaping of the demographic structure of the territories population, the state of regional and local labour markets. As a rule, rapid change in the working-age population of any territory due to migration processes results in an imbalance in supply and demand on labour markets and a change in the demographic structure of the population. Migration is also to a large extent a reflection of socio-economic processes taking place in the society. Hence, the issues related to the study of migration factors, the direction, intensity and structure of migration flows, and the prediction of their magnitude are becoming topical issues these days.

Mathematical tools are often used to analyze, predict migration processes and assess their consequences, allowing for essentially accurate modelling of migration processes for different territories on the basis of the available statistical data. In recent years, quite a number of scientific papers on modelling internal and external migration flows using mathematical methods have appeared both in Russia and in foreign countries in recent years. Consequently, there has been a need to systematize the currently most commonly used methods and tools applied in migration modelling to form a coherent picture of the main trends and research directions in this field.

The presented review considers the main approaches to migration modelling and the main components of migration modelling methodology, i. e. stages, methods, models and model classification. Their comparative analysis was also conducted and general recommendations on the choice of mathematical tools for modelling were developed. The review contains two sections: migration modelling methods and migration models. The first section describes the main methods used in the model development process — econometric, cellular automata, system-dynamic, probabilistic, balance, optimization and cluster analysis. Based on the analysis of modern domestic and foreign publications on migration, the most common classes of models — regression, agent-based, simulation, optimization, probabilistic, balance, dynamic and combined — were identified and described. The features, advantages and disadvantages of different types of migration process models were considered.

Keywords: migration, migration processes, migration models, methods, regression models, cellular automata, agent-based models, balance models, dynamic models.
Lyubushin A.A., Rodionov E.A.
Analysis of predictive properties of ground tremor using Huang decomposition
Computer Research and Modeling, 2024, v. 16, no. 4, pp. 939-958

A method is proposed for analyzing the tremor of the earth’s surface, measured by means of space geodesy, in order to highlight the prognostic effects of seismicity activation. The method is illustrated by the example of a joint analysis of a set of synchronous time series of daily vertical displacements of the earth’s surface on the Japanese Islands for the time interval 2009–2023. The analysis is based on dividing the source data (1047 time series) into blocks (clusters of stations) and sequentially applying the principal component method. The station network is divided into clusters using the K-means method from the maximum pseudo-F-statistics criterion, and for Japan the optimal number of clusters was chosen to be 15. The Huang decomposition method into a sequence of independent empirical oscillation modes (EMD — Empirical Mode Decomposition) is applied to the time series of principal components from station blocks. To provide the stability of estimates of the waveforms of the EMD decomposition, averaging of 1000 independent additive realizations of white noise of limited amplitude was performed. Using the Cholesky decomposition of the covariance matrix of the waveforms of the first three EMD components in a sliding time window, indicators of abnormal tremor behavior were determined. By calculating the correlation function between the average indicators of anomalous behavior and the released seismic energy in the vicinity of the Japanese Islands, it was established that bursts in the measure of anomalous tremor behavior precede emissions of seismic energy. The purpose of the article is to clarify common hypotheses that movements of the earth’s crust recorded by space geodesy may contain predictive information. That displacements recorded by geodetic methods respond to the effects of earthquakes is widely known and has been demonstrated many times. But isolating geodetic effects that predict seismic events is much more challenging. In our paper, we propose one method for detecting predictive effects in space geodesy data.

Keywords: tremor of the earth’s surface, cluster analysis, principal component method, Huang decomposition, measure of anomalous behavior of time series, correlation function.
Ignatev N.A., Tuliev U.Y.
Semantic structuring of text documents based on patterns of natural language entities
Computer Research and Modeling, 2022, v. 14, no. 5, pp. 1185-1197

The technology of creating patterns from natural language words (concepts) based on text data in the bag of words model is considered. Patterns are used to reduce the dimension of the original space in the description of documents and search for semantically related words by topic. The process of dimensionality reduction is implemented through the formation of patterns of latent features. The variety of structures of document relations is investigated in order to divide them into themes in the latent space.

It is considered that a given set of documents (objects) is divided into two non-overlapping classes, for the analysis of which it is necessary to use a common dictionary. The belonging of words to a common vocabulary is initially unknown. Class objects are considered as opposition to each other. Quantitative parameters of oppositionality are determined through the values of the stability of each feature and generalized assessments of objects according to non-overlapping sets of features.

To calculate the stability, the feature values are divided into non-intersecting intervals, the optimal boundaries of which are determined by a special criterion. The maximum stability is achieved under the condition that the boundaries of each interval contain values of one of the two classes.

The composition of features in sets (patterns of words) is formed from a sequence ordered by stability values. The process of formation of patterns and latent features based on them is implemented according to the rules of hierarchical agglomerative grouping.

A set of latent features is used for cluster analysis of documents using metric grouping algorithms. The analysis applies the coefficient of content authenticity based on the data on the belonging of documents to classes. The coefficient is a numerical characteristic of the dominance of class representatives in groups.

To divide documents into topics, it is proposed to use the union of groups in relation to their centers. As patterns for each topic, a sequence of words ordered by frequency of occurrence from a common dictionary is considered.

The results of a computational experiment on collections of abstracts of scientific dissertations are presented. Sequences of words from the general dictionary on 4 topics are formed.

Keywords: topic modeling, hierarchical agglomerative grouping, ontology, general vocabulary, content authenticity.
Kirilyuk I.L., Sen'ko O.V.
Assessing the validity of clustering of panel data by Monte Carlo methods (using as example the data of the Russian regional economy)
Computer Research and Modeling, 2020, v. 12, no. 6, pp. 1501-1513

The paper considers a method for studying panel data based on the use of agglomerative hierarchical clustering — grouping objects based on the similarities and differences in their features into a hierarchy of clusters nested into each other. We used 2 alternative methods for calculating Euclidean distances between objects — the distance between the values averaged over observation interval, and the distance using data for all considered years. Three alternative methods for calculating the distances between clusters were compared. In the first case, the distance between the nearest elements from two clusters is considered to be distance between these clusters, in the second — the average over pairs of elements, in the third — the distance between the most distant elements. The efficiency of using two clustering quality indices, the Dunn and Silhouette index, was studied to select the optimal number of clusters and evaluate the statistical significance of the obtained solutions. The method of assessing statistical reliability of cluster structure consisted in comparing the quality of clustering on a real sample with the quality of clustering on artificially generated samples of panel data with the same number of objects, features and lengths of time series. Generation was made from a fixed probability distribution. At the same time, simulation methods imitating Gaussian white noise and random walk were used. Calculations with the Silhouette index showed that a random walk is characterized not only by spurious regression, but also by “spurious clustering”. Clustering was considered reliable for a given number of selected clusters if the index value on the real sample turned out to be greater than the value of the 95% quantile for artificial data. A set of time series of indicators characterizing production in the regions of the Russian Federation was used as a sample of real data. For these data only Silhouette shows reliable clustering at the level p < 0.05. Calculations also showed that index values for real data are generally closer to values for random walks than for white noise, but it have significant differences from both. Since three-dimensional feature space is used, the quality of clustering was also evaluated visually. Visually, one can distinguish clusters of points located close to each other, also distinguished as clusters by the applied hierarchical clustering algorithm.

Keywords: clustering validity, panel data, mesoeconomics, regional economics.
Irkhin I.A., Bulatov V.G., Vorontsov K.V.
Additive regularizarion of topic models with fast text vectorizartion
Computer Research and Modeling, 2020, v. 12, no. 6, pp. 1515-1528

The probabilistic topic model of a text document collection finds two matrices: a matrix of conditional probabilities of topics in documents and a matrix of conditional probabilities of words in topics. Each document is represented by a multiset of words also called the “bag of words”, thus assuming that the order of words is not important for revealing the latent topics of the document. Under this assumption, the problem is reduced to a low-rank non-negative matrix factorization governed by likelihood maximization. In general, this problem is ill-posed having an infinite set of solutions. In order to regularize the solution, a weighted sum of optimization criteria is added to the log-likelihood. When modeling large text collections, storing the first matrix seems to be impractical, since its size is proportional to the number of documents in the collection. At the same time, the topical vector representation (embedding) of documents is necessary for solving many text analysis tasks, such as information retrieval, clustering, classification, and summarization of texts. In practice, the topical embedding is calculated for a document “on-the-fly”, which may require dozens of iterations over all the words of the document. In this paper, we propose a way to calculate a topical embedding quickly, by one pass over document words. For this, an additional constraint is introduced into the model in the form of an equation, which calculates the first matrix from the second one in linear time. Although formally this constraint is not an optimization criterion, in fact it plays the role of a regularizer and can be used in combination with other regularizers within the additive regularization framework ARTM. Experiments on three text collections have shown that the proposed method improves the model in terms of sparseness, difference, logLift and coherence measures of topic quality. The open source libraries BigARTM and TopicNet were used for the experiments.

Keywords: natural language processing, unsupervised learning, topic modeling, additive regularization of topic model, EM-algorithm, PLSA, LDA, ARTM, BigARTM, TopicNet.
Ketova K.V., Kasatkina E.V.
The solution of the logistics task of fuel supply for the regional distributed heat supply system
Computer Research and Modeling, 2012, v. 4, no. 2, pp. 451-470

The technique for solving the logistic task of fuel supply in the region, including the interconnected tasks of routing, clustering, optimal distribution of resources and stock control is proposed. The calculations have been carried out on the example of fuel supply system of the Udmurt Republic.

Keywords: logistics, fuel supply, routing, clustering, stock control, genetic algorithm.
Views (last year): 1. Citations: 6 (RSCI).

Indexed in Scopus

Full-text version of the journal is also available on the web site of the scientific electronic library eLIBRARY.RU

The journal is included in the Russian Science Citation Index

The journal is included in the RSCI

International Interdisciplinary Conference "Mathematics. Computing. Education"