Regularization, robustness and sparsity of probabilistic topic models


We propose a generalized probabilistic topic model of text corpora which can incorporate heuristics of Bayesian regularization, sampling, frequent parameter updates, and robustness in any combination. Well-known models such as PLSA, LDA, CVB0, SWB, and many others can be considered special cases of the proposed broad family of models. We propose the robust PLSA model and show that it is sparser and performs better than regularized models such as LDA.
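For context, the baseline PLSA model mentioned in the abstract factorizes word-in-document probabilities as p(w|d) = Σ_t φ_wt θ_td and is fitted by the EM-algorithm. The sketch below is a minimal, illustrative implementation of plain PLSA only (not the authors' generalized family, robust variant, or any regularized model); the function name and interface are hypothetical.

```python
import numpy as np

def plsa_em(ndw, num_topics, num_iters=50, seed=0):
    """Fit plain PLSA to a document-word count matrix ndw (D, W) via EM.

    Returns phi (W, T): p(w|t), and theta (T, D): p(t|d),
    both column-normalized to sum to 1.
    """
    rng = np.random.default_rng(seed)
    D, W = ndw.shape
    T = num_topics
    # Random normalized initialization of the model parameters.
    phi = rng.random((W, T)); phi /= phi.sum(axis=0, keepdims=True)
    theta = rng.random((T, D)); theta /= theta.sum(axis=0, keepdims=True)
    for _ in range(num_iters):
        nwt = np.zeros((W, T))   # expected topic-word counts
        ntd = np.zeros((T, D))   # expected document-topic counts
        for d in range(D):
            # E-step: p(t|d,w) proportional to phi[w,t] * theta[t,d]
            p = phi * theta[:, d]                                  # (W, T)
            pz = p / p.sum(axis=1, keepdims=True).clip(min=1e-12)  # normalize over t
            # Accumulate counts weighted by observed frequencies n_dw.
            weighted = ndw[d][:, None] * pz
            nwt += weighted
            ntd[:, d] = weighted.sum(axis=0)
        # M-step: re-normalize expected counts into distributions.
        phi = nwt / nwt.sum(axis=0, keepdims=True).clip(min=1e-12)
        theta = ntd / ntd.sum(axis=0, keepdims=True).clip(min=1e-12)
    return phi, theta
```

The regularized and robust variants discussed in the paper modify exactly these M-step frequency estimates, which is what makes so many models expressible as special cases of one EM scheme.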

Keywords: text analysis, topic modeling, probabilistic latent semantic analysis, EM-algorithm, latent Dirichlet allocation, Gibbs sampling, Bayesian regularization, perplexity, robustness
Citation in English: Vorontsov K.V., Potapenko A.A. Regularization, robustness and sparsity of probabilistic topic models // Computer Research and Modeling, 2012, vol. 4, no. 4, pp. 693-706
DOI: 10.20537/2076-7633-2012-4-4-693-706
