Regularization, robustness and sparsity of probabilistic topic models


We propose a generalized probabilistic topic model of text corpora that can incorporate the heuristics of Bayesian regularization, sampling, frequent parameter updates, and robustness in any combination. Well-known models such as PLSA, LDA, CVB0, SWB, and many others can be considered special cases of the proposed broad family of models. We propose the robust PLSA model and show that it is sparser and performs better than regularized models such as LDA.
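For illustration, the following is a minimal sketch of the EM algorithm for PLSA with an additive-smoothing hook where regularization heuristics of this kind plug in. The function name plsa_em and the parameters alpha and beta are our illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def plsa_em(n_dw, T, n_iters=50, alpha=0.0, beta=0.0, rng=None):
    """EM for PLSA on a document-term count matrix n_dw (D x W), T topics.

    alpha, beta: optional additive smoothing of theta = p(t|d) and
    phi = p(w|t). With alpha = beta = 0 this is plain PLSA; positive
    values mimic LDA-style Dirichlet regularization of the counts.
    """
    rng = rng or np.random.default_rng(0)
    D, W = n_dw.shape
    phi = rng.dirichlet(np.ones(W), size=T)    # p(w|t), shape T x W
    theta = rng.dirichlet(np.ones(T), size=D)  # p(t|d), shape D x T
    for _ in range(n_iters):
        n_wt = np.zeros((T, W))
        n_td = np.zeros((D, T))
        for d in range(D):
            # E-step: posterior p(t|d,w) over topics for each word
            p_tdw = phi * theta[d][:, None]            # T x W, unnormalized
            Z = p_tdw.sum(axis=0, keepdims=True)
            p_tdw /= np.maximum(Z, 1e-12)
            # accumulate expected counts weighted by word frequencies
            n_wt += p_tdw * n_dw[d]
            n_td[d] = (p_tdw * n_dw[d]).sum(axis=1)
        # M-step with additive smoothing (the regularization hook)
        phi = n_wt + beta
        phi /= np.maximum(phi.sum(axis=1, keepdims=True), 1e-12)
        theta = n_td + alpha
        theta /= np.maximum(theta.sum(axis=1, keepdims=True), 1e-12)
    return phi, theta
```

With alpha = beta = 0 the updates reduce to maximum-likelihood PLSA; positive alpha and beta add Dirichlet-like smoothing of the expected counts, a correspondence with LDA that the paper's unified family exploits, though the exact update scheme above is our simplification.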

Keywords: text analysis, topic modeling, probabilistic latent semantic analysis, EM-algorithm, latent Dirichlet allocation, Gibbs sampling, Bayesian regularization, perplexity, robustness
Citation in English: Vorontsov K.V., Potapenko A.A. Regularization, robustness and sparsity of probabilistic topic models // Computer Research and Modeling, 2012, vol. 4, no. 4, pp. 693-706
DOI: 10.20537/2076-7633-2012-4-4-693-706
According to Crossref, this article is cited by:
  • Maria Saburova, Archil Maysuradze. Knowledge Engineering and Semantic Web // Communications in Computer and Information Science, 2015, vol. 518, p. 168. DOI: 10.1007/978-3-319-24543-0_13
