An efficient algorithm for ${\mathrm{\LaTeX}}$ documents comparing


The problem addressed is computing the differences that arise when ${\mathrm{\LaTeX}}$ documents are edited. Each document is represented as a parse tree whose nodes are called tokens. First, the smallest possible text representation of the document that leaves the syntax tree unchanged is constructed. The whole text is split into fragments whose boundaries correspond to tokens. A mapping from the fragment sequence of the original text to the corresponding sequence of the edited document, achieving the minimum edit distance, is built with the Hirschberg algorithm. From this fragment mapping, a mapping of individual text characters is constructed. Tokens whose characters are all deleted, all inserted, or all unchanged are then identified in the parse trees. The mapping for the trees formed by the remaining tokens is built using the Zhang–Shasha algorithm.
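The fragment-mapping step can be illustrated with a minimal Python sketch of Hirschberg's divide-and-conquer alignment. It is not the article's implementation: the unit insertion, deletion and substitution costs, the function names `last_row` and `hirschberg`, and the sample fragments are all assumptions made for illustration.

```python
def last_row(a, b):
    """Last row of the edit-distance table for a vs. b, using O(len(b)) memory."""
    prev = list(range(len(b) + 1))
    for x in a:
        cur = [prev[0] + 1]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete x
                           cur[j - 1] + 1,            # insert y
                           prev[j - 1] + (x != y)))   # keep or substitute
        prev = cur
    return prev


def hirschberg(a, b):
    """Return a minimum-cost alignment as (old, new) pairs.

    None on the left marks an insertion, None on the right a deletion,
    and a pair of unequal fragments marks a substitution.
    Unit costs are an assumption of this sketch.
    """
    if not a:
        return [(None, y) for y in b]
    if not b:
        return [(x, None) for x in a]
    if len(a) == 1:
        # Pair the single fragment with an equal one if present,
        # otherwise substitute it for the first fragment of b.
        x = a[0]
        j = b.index(x) if x in b else 0
        return ([(None, y) for y in b[:j]] + [(x, b[j])]
                + [(None, y) for y in b[j + 1:]])
    mid = len(a) // 2
    # Cost of aligning the first half of a with every prefix of b ...
    left = last_row(a[:mid], b)
    # ... and the second half of a with every suffix of b.
    right = last_row(a[mid:][::-1], b[::-1])
    split = min(range(len(b) + 1), key=lambda j: left[j] + right[len(b) - j])
    return hirschberg(a[:mid], b[:split]) + hirschberg(a[mid:], b[split:])


# Hypothetical fragment sequences for an original and an edited document.
old = ["\\section", "{", "Intro", "}", "Hello", "world"]
new = ["\\section", "{", "Introduction", "}", "Hello", "brave", "world"]
pairs = hirschberg(old, new)
cost = sum(1 for x, y in pairs if x is None or y is None or x != y)
```

Only two rows of the dynamic-programming table are held in memory at any time, so the alignment needs space linear in the number of fragments while keeping quadratic running time. That is the usual reason to prefer Hirschberg's method over a full table on long inputs.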

Keywords: automation, editing distance, text analysis, lexeme, machine learning, metric, parse tree, syntax tree, token, ${\mathrm{\LaTeX}}$
Citation in English: Chuvilin K.V. An efficient algorithm for ${\mathrm{\LaTeX}}$ documents comparing // Computer Research and Modeling, 2015, vol. 7, no. 2, pp. 329-345
DOI: 10.20537/2076-7633-2015-7-2-329-345
Views (last year): 2. Citations: 2 (RSCI).

Indexed in Scopus

Full-text version of the journal is also available on the web site of the scientific electronic library eLIBRARY.RU

The journal is included in the Russian Science Citation Index
