All issues
- 2024 Vol. 16
- 2023 Vol. 15
- 2022 Vol. 14
- 2021 Vol. 13
- 2020 Vol. 12
- 2019 Vol. 11
- 2018 Vol. 10
- 2017 Vol. 9
- 2016 Vol. 8
- 2015 Vol. 7
- 2014 Vol. 6
- 2013 Vol. 5
- 2012 Vol. 4
- 2011 Vol. 3
- 2010 Vol. 2
- 2009 Vol. 1
-
Regularization, robustness and sparsity of probabilistic topic models
Computer Research and Modeling, 2012, v. 4, no. 4, pp. 693-706Views (last year): 25. Citations: 12 (RSCI).We propose a generalized probabilistic topic model of text corpora which can incorporate heuristics of Bayesian regularization, sampling, frequent parameters update, and robustness in any combinations. Wellknown models PLSA, LDA, CVB0, SWB, and many others can be considered as special cases of the proposed broad family of models. We propose the robust PLSA model and show that it is more sparse and performs better that regularized models like LDA.
-
On the investigation of plasma turbulence by the analysis of the spectra
Computer Research and Modeling, 2012, v. 4, no. 4, pp. 793-802Views (last year): 2. Citations: 4 (RSCI).The article describes the examples of the analysis of the experimental data spectra for identifying typical structures of processes forming plasma turbulence. The method is based on the original algorithm which is close to the one-sample bootstrap. The base model for description of the fine structure of stochastic processes is finite local-scale normal mixtures. For finding the statistical estimates (maximum likelihood estimates) well known EM algorithm is used. The efficiency of the proposed research technique is demonstrated for a number of spectra’s set obtained in different modes of low-frequency plasma turbulence.
-
Estimation of models parameters for time series with Markov switching regimes
Computer Research and Modeling, 2018, v. 10, no. 6, pp. 903-918Views (last year): 36.The paper considers the problem of estimating the parameters of time series described by regression models with Markov switching of two regimes at random instants of time with independent Gaussian noise. For the solution, we propose a variant of the EM algorithm based on the iterative procedure, during which an estimation of the regression parameters is performed for a given sequence of regime switching and an evaluation of the switching sequence for the given parameters of the regression models. In contrast to the well-known methods of estimating regression parameters in the models with Markov switching, which are based on the calculation of a posteriori probabilities of discrete states of the switching sequence, in the paper the estimates are calculated of the switching sequence, which are optimal by the criterion of the maximum of a posteriori probability. As a result, the proposed algorithm turns out to be simpler and requires less calculations. Computer modeling allows to reveal the factors influencing accuracy of estimation. Such factors include the number of observations, the number of unknown regression parameters, the degree of their difference in different modes of operation, and the signal-to-noise ratio which is associated with the coefficient of determination in regression models. The proposed algorithm is applied to the problem of estimating parameters in regression models for the rate of daily return of the RTS index, depending on the returns of the S&P 500 index and Gazprom shares for the period from 2013 to 2018. Comparison of the estimates of the parameters found using the proposed algorithm is carried out with the estimates that are formed using the EViews econometric package and with estimates of the ordinary least squares method without taking into account regimes switching. The account of regimes switching allows to receive more exact representation about structure of a statistical dependence of investigated variables. In switching models, the increase in the signal-to-noise ratio leads to the fact that the differences in the estimates produced by the proposed algorithm and using the EViews program are reduced.
-
Additive regularizarion of topic models with fast text vectorizartion
Computer Research and Modeling, 2020, v. 12, no. 6, pp. 1515-1528The probabilistic topic model of a text document collection finds two matrices: a matrix of conditional probabilities of topics in documents and a matrix of conditional probabilities of words in topics. Each document is represented by a multiset of words also called the “bag of words”, thus assuming that the order of words is not important for revealing the latent topics of the document. Under this assumption, the problem is reduced to a low-rank non-negative matrix factorization governed by likelihood maximization. In general, this problem is ill-posed having an infinite set of solutions. In order to regularize the solution, a weighted sum of optimization criteria is added to the log-likelihood. When modeling large text collections, storing the first matrix seems to be impractical, since its size is proportional to the number of documents in the collection. At the same time, the topical vector representation (embedding) of documents is necessary for solving many text analysis tasks, such as information retrieval, clustering, classification, and summarization of texts. In practice, the topical embedding is calculated for a document “on-the-fly”, which may require dozens of iterations over all the words of the document. In this paper, we propose a way to calculate a topical embedding quickly, by one pass over document words. For this, an additional constraint is introduced into the model in the form of an equation, which calculates the first matrix from the second one in linear time. Although formally this constraint is not an optimization criterion, in fact it plays the role of a regularizer and can be used in combination with other regularizers within the additive regularization framework ARTM. Experiments on three text collections have shown that the proposed method improves the model in terms of sparseness, difference, logLift and coherence measures of topic quality. The open source libraries BigARTM and TopicNet were used for the experiments.
Indexed in Scopus
Full-text version of the journal is also available on the web site of the scientific electronic library eLIBRARY.RU
The journal is included in the Russian Science Citation Index
The journal is included in the RSCI
International Interdisciplinary Conference "Mathematics. Computing. Education"