All issues
- 2025 Vol. 17
- 2024 Vol. 16
- 2023 Vol. 15
- 2022 Vol. 14
- 2021 Vol. 13
- 2020 Vol. 12
- 2019 Vol. 11
- 2018 Vol. 10
- 2017 Vol. 9
- 2016 Vol. 8
- 2015 Vol. 7
- 2014 Vol. 6
- 2013 Vol. 5
- 2012 Vol. 4
- 2011 Vol. 3
- 2010 Vol. 2
- 2009 Vol. 1
-
Computational treatment of natural language text for intent detection
Computer Research and Modeling, 2024, v. 16, no. 7, pp. 1539-1554Intent detection plays a crucial role in task-oriented conversational systems. To understand the user’s goal, the system relies on its intent detector to classify the user’s utterance, which may be expressed in different forms of natural language, into intent classes. However, lack of data, and the efficacy of intent detection systems has been hindered by the fact that the user’s intent text is typically characterized by short, general sentences and colloquial expressions. The process of algorithmically determining user intent from a given statement is known as intent detection. The goal of this study is to develop an intent detection model that will accurately classify and detect user intent. The model calculates the similarity score of the three models used to determine their similarities. The proposed model uses Contextual Semantic Search (CSS) capabilities for semantic search, Latent Dirichlet Allocation (LDA) for topic modeling, the Bidirectional Encoder Representations from Transformers (BERT) semantic matching technique, and the combination of LDA and BERT for text classification and detection. The dataset acquired is from the broad twitter corpus (BTC) and comprises various meta data. To prepare the data for analysis, a pre-processing step was applied. A sample of 1432 instances were selected out of the 5000 available datasets because manual annotation is required and could be time-consuming. To compare the performance of the model with the existing model, the similarity scores, precision, recall, f1 score, and accuracy were computed. The results revealed that LDA-BERT achieved an accuracy of 95.88% for intent detection, BERT with an accuracy of 93.84%, and LDA with an accuracy of 92.23%. This shows that LDA-BERT performs better than other models. It is hoped that the novel model will aid in ensuring information security and social media intelligence. For future work, an unsupervised LDA-BERT without any labeled data can be studied with the model.
-
Automating high-quality concept banks: leveraging LLMs and multimodal evaluation metrics
Computer Research and Modeling, 2024, v. 16, no. 7, pp. 1555-1567Interpretability in recent deep learning models has become an epicenter of research particularly in sensitive domains such as healthcare, and finance. Concept bottleneck models have emerged as a promising approach for achieving transparency and interpretability by leveraging a set of humanunderstandable concepts as an intermediate representation before the prediction layer. However, manual concept annotation is discouraged due to the time and effort involved. Our work explores the potential of large language models (LLMs) for generating high-quality concept banks and proposes a multimodal evaluation metric to assess the quality of generated concepts. We investigate three key research questions: the ability of LLMs to generate concept banks comparable to existing knowledge bases like ConceptNet, the sufficiency of unimodal text-based semantic similarity for evaluating concept-class label associations, and the effectiveness of multimodal information in quantifying concept generation quality compared to unimodal concept-label semantic similarity. Our findings reveal that multimodal models outperform unimodal approaches in capturing concept-class label similarity. Furthermore, our generated concepts for the CIFAR-10 and CIFAR-100 datasets surpass those obtained from ConceptNet and the baseline comparison, demonstrating the standalone capability of LLMs in generating highquality concepts. Being able to automatically generate and evaluate high-quality concepts will enable researchers to quickly adapt and iterate to a newer dataset with little to no effort before they can feed that into concept bottleneck models.
-
Extraction of characters and events from narratives
Computer Research and Modeling, 2024, v. 16, no. 7, pp. 1593-1600Events and character extraction from narratives is a fundamental task in text analysis. The application of event extraction techniques ranges from the summarization of different documents to the analysis of medical notes. We identify events based on a framework named “four W” (Who, What, When, Where) to capture all the essential components like the actors, actions, time, and places. In this paper, we explore two prominent techniques for event extraction: statistical parsing of syntactic trees and semantic role labeling. While these techniques were investigated by different researchers in isolation, we directly compare the performance of the two approaches on our custom dataset, which we have annotated.
Our analysis shows that statistical parsing of syntactic trees outperforms semantic role labeling in event and character extraction, especially in identifying specific details. Nevertheless, semantic role labeling demonstrate good performance in correct actor identification. We evaluate the effectiveness of both approaches by comparing different metrics like precision, recall, and F1-scores, thus, demonstrating their respective advantages and limitations.
Moreover, as a part of our work, we propose different future applications of event extraction techniques that we plan to investigate. The areas where we want to apply these techniques include code analysis and source code authorship attribution. We consider using event extraction to retrieve key code elements as variable assignments and function calls, which can further help us to analyze the behavior of programs and identify the project’s contributors. Our work provides novel understandings of the performance and efficiency of statistical parsing and semantic role labeling techniques, offering researchers new directions for the application of these techniques.
-
Extracting knowledge from text messages: overview and state-of-the-art
Computer Research and Modeling, 2021, v. 13, no. 6, pp. 1291-1315In general, solving the information explosion problem can be delegated to systems for automatic processing of digital data. These systems are intended for recognizing, sorting, meaningfully processing and presenting data in formats readable and interpretable by humans. The creation of intelligent knowledge extraction systems that handle unstructured data would be a natural solution in this area. At the same time, the evident progress in these tasks for structured data contrasts with the limited success of unstructured data processing, and, in particular, document processing. Currently, this research area is undergoing active development and investigation. The present paper is a systematic survey on both Russian and international publications that are dedicated to the leading trend in automatic text data processing: Text Mining (TM). We cover the main tasks and notions of TM, as well as its place in the current AI landscape. Furthermore, we analyze the complications that arise during the processing of texts written in natural language (NLP) which are weakly structured and often provide ambiguous linguistic information. We describe the stages of text data preparation, cleaning, and selecting features which, alongside the data obtained via morphological, syntactic, and semantic analysis, constitute the input for the TM process. This process can be represented as mapping a set of text documents to «knowledge». Using the case of stock trading, we demonstrate the formalization of the problem of making a trade decision based on a set of analytical recommendations. Examples of such mappings are methods of Information Retrieval (IR), text summarization, sentiment analysis, document classification and clustering, etc. The common point of all tasks and techniques of TM is the selection of word forms and their derivatives used to recognize content in NL symbol sequences. Considering IR as an example, we examine classic types of search, such as searching for word forms, phrases, patterns and concepts. Additionally, we consider the augmentation of patterns with syntactic and semantic information. Next, we provide a general description of all NLP instruments: morphological, syntactic, semantic and pragmatic analysis. Finally, we end the paper with a comparative analysis of modern TM tools which can be helpful for selecting a suitable TM platform based on the user’s needs and skills.
Indexed in Scopus
Full-text version of the journal is also available on the web site of the scientific electronic library eLIBRARY.RU
The journal is included in the Russian Science Citation Index
The journal is included in the RSCI
International Interdisciplinary Conference "Mathematics. Computing. Education"