Semi-automated detection of controversy in social media content: an approach based on pre-trained models

Detecting controversy in online discussions is critical for managing public relations, as it informs processes ranging from policymaking to business. This work aims to expand approaches to online controversy detection based on expressed emotions. Controversy is defined as the property of online content to provoke disagreement and conflict. The study builds upon prior semantic methods by analyzing estimates of the emotional connotations of messages. Modern language models for emotion recognition and named entity recognition are explored as tools for controversy detection. The outputs of these models are aggregated by entity to estimate each entity’s emotional connotation. An emotional divergence score, based on the dispersion of emotions, is proposed to quantify controversy in user content. Entities whose emotional divergence is sufficiently high relative to the domain of discussion are then selected as markers of controversy. A case study of Reddit data related to the 2022 Sri Lankan political crisis was conducted, demonstrating the capabilities of the emotional divergence score in controversy detection. Two datasets were collected with different methodologies: one aimed at earlier messages and the other at more recent ones. The collected data contained discussions of policy, public figures, organizations and locations tied to the crisis. When measured on manually annotated data samples, the proposed method achieved a recall of 0.705 and a precision close to 0.496 on the first dataset, and a recall of 0.716 and a precision of 0.436 on the second. The main factors limiting precision were found to be the quality of the underlying models and false positives in the form of highly discussed non-controversial markers. Lastly, it was found that a study of the regular emotional distribution of social media content may help improve controversy detection quality.
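
The aggregation and selection steps described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the formula for the emotional divergence score or the selection rule, so everything concrete here is an assumption made for illustration only: dispersion is taken as the mean per-emotion variance across the messages that mention an entity, the domain-relative cutoff is the mean score plus `k` standard deviations over all entities, and the names `emotional_divergence`, `controversy_markers` and `k` are hypothetical. The emotion vectors stand in for the output of an emotion recognition model, and the entity labels for the output of a named entity recognition model.

```python
from collections import defaultdict
from statistics import mean, pstdev, pvariance


def emotional_divergence(emotion_vectors):
    """Assumed dispersion measure (not the paper's formula):
    mean of the per-emotion population variances across messages."""
    if len(emotion_vectors) < 2:
        return 0.0  # a single message carries no dispersion
    per_emotion = zip(*emotion_vectors)  # regroup scores by emotion
    return mean(pvariance(scores) for scores in per_emotion)


def controversy_markers(mentions, k=1.0):
    """Aggregate emotion vectors by entity and flag high-divergence entities.

    mentions: iterable of (entity, emotion_vector) pairs.
    k: assumed threshold parameter, in standard deviations above
       the mean divergence of all entities in the domain.
    """
    by_entity = defaultdict(list)
    for entity, vector in mentions:
        by_entity[entity].append(vector)

    scores = {e: emotional_divergence(v) for e, v in by_entity.items()}
    threshold = mean(scores.values()) + k * pstdev(scores.values())
    markers = [e for e, s in scores.items() if s > threshold]
    return scores, markers


# Toy input: (anger, joy, neutral) probabilities per message, made up for illustration.
mentions = [
    ("A", [0.90, 0.05, 0.05]),
    ("A", [0.05, 0.90, 0.05]),  # same entity, opposite emotion
    ("B", [0.10, 0.10, 0.80]),
    ("B", [0.10, 0.10, 0.80]),
    ("C", [0.20, 0.10, 0.70]),
    ("C", [0.10, 0.20, 0.70]),
]
scores, markers = controversy_markers(mentions)
print(markers)
```

In this toy input only entity "A" attracts sharply opposed emotions, so it is the only one whose score exceeds the cutoff. A different dispersion measure (for example, entropy of the averaged emotion distribution) could be substituted in `emotional_divergence` without changing the rest of the sketch.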

Keywords: controversy detection, social media, natural language processing, sentiment analysis, named entity recognition
Citation in English: Zaida A.V., Savelev A.O. Semi-automated detection of controversy in social media content: an approach based on pre-trained models // Computer Research and Modeling, 2026, vol. 18, no. 2, pp. 501-517
DOI: 10.20537/2076-7633-2026-18-2-501-517

Copyright © 2026 Zaida A.V., Savelev A.O.

Indexed in Scopus

Full-text version of the journal is also available on the web site of the scientific electronic library eLIBRARY.RU

The journal is included in the Russian Science Citation Index
