Automated citation graph building from a corpus of scientific documents


List of references:

  1. E. Agichtein, V. Ganti. Mining reference tables for automatic text segmentation / Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining. — ACM Press, 2004. — P. 20–29.
  2. D. Beeferman, A. Berger, J. Lafferty. Statistical models for text segmentation / Machine Learning. — 1999. — P. 177–210.
  3. M. Bilenko, R. Mooney, W. Cohen, P. Ravikumar, S. Fienberg. Adaptive name matching in information integration // IEEE Intelligent Systems. — 2003. — V. 18. — P. 16–23. — DOI: 10.1109/MIS.2003.1234765.
  4. V. Borkar, K. Deshmukh, S. Sarawagi. Automatic segmentation of text into structured records. — 2001.
  5. T. Brants. Topic-based document segmentation with probabilistic latent semantic analysis / Proceedings of CIKM. — ACM Press, 2002. — P. 211–218. — McLean. — ads: 2002evn..conf..203B.
  6. F. Y. Y. Choi, P. Wiemer-Hastings, J. Moore. Latent semantic analysis for text segmentation / Proceedings of EMNLP. — 2001. — P. 109–117.
  7. P. Christen. A survey of indexing techniques for scalable record linkage and deduplication // IEEE Transactions on Knowledge and Data Engineering. — 2012. — V. 24. — P. 1537–1555. — DOI: 10.1109/TKDE.2011.127.
  8. P. Christen, T. Churches, J. X. Zhu. Probabilistic name and address cleaning and standardisation. — 2002.
  9. W. W. Cohen, J. Richman. Learning to match and cluster large high-dimensional data sets for data integration. — 2002.
  10. W. W. Cohen, P. Ravikumar, S. E. Fienberg. A comparison of string distance metrics for name-matching tasks. — 2003. — P. 73–78.
  11. E. Cortez, A. S. da Silva, M. A. Gonçalves, F. Mesquita, E. S. de Moura. Flux-cim: flexible unsupervised extraction of citation metadata / JCDL ’07: Proceedings of the 7th ACM.
  12. I. G. Councill, C. L. Giles, M.-Y. Kan. ParsCit: An open-source CRF reference string parsing package / International language resources and evaluation. — European Language Resources Association, 2008.
  13. A. K. Elmagarmid, P. G. Ipeirotis, V. S. Verykios. Duplicate record detection: A survey / Transactions on knowledge and data engineering. — 2007. — P. 2007. — MathSciNet: MR2269091.
  14. H. Han, C. L. Giles, E. Manavoglu, H. Zha, Z. Zhang, E. A. Fox. Automatic document metadata extraction using support vector machines / JCDL ’03: Proceedings of the 3rd ACM.
  15. T. Kudoh, Yu. Matsumoto. Use of support vector learning for chunk identification / Proceedings of CoNLL-2000 and LLL-2000. — 2000. — P. 142–144.
  16. J. Lafferty. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. — Morgan Kaufmann, 2001. — P. 282–289.
  17. A. McCallum, K. Nigam, L. H. Ungar. Efficient clustering of high-dimensional data sets with application to reference matching. — 2000.
  18. J. R. Quinlan. C4.5: programs for machine learning. — San Francisco, CA, USA: Morgan Kaufmann Publishers Inc, 1993.
  19. L. R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition // Proceedings of the IEEE. — 1989. — V. 77, no. 2. — P. 257–286. — DOI: 10.1109/5.18626.
  20. M. Skounakis, M. Craven, S. Ray. Hierarchical hidden Markov models for information extraction / Proceedings of the 18th International Joint Conference on Artificial Intelligence. — Morgan Kaufmann, 2003. — P. 427–433.
  21. S.-Z. Yu. Hidden semi-Markov models // Artificial Intelligence. — 2010. — MathSciNet: MR2724430.
  22. T. Zhang, F. Damerau, D. Johnson. Text chunking based on a generalization of Winnow // Journal of Machine Learning Research. — 2001. — no. 2. — P. 615–637.

Indexed in Scopus

Full-text version of the journal is also available on the web site of the scientific electronic library eLIBRARY.RU

The journal is included in the Russian Science Citation Index