Search results for 'text analysis':
Articles found: 21
  1. Editor’s note
    Computer Research and Modeling, 2024, v. 16, no. 7, pp. 1533-1538
  2. Adekotujo A.S., Enikuomehin T., Aribisala B., Mazzara M., Zubair A.F.
    Computational treatment of natural language text for intent detection
    Computer Research and Modeling, 2024, v. 16, no. 7, pp. 1539-1554

    Intent detection plays a crucial role in task-oriented conversational systems. To understand the user’s goal, the system relies on its intent detector to classify the user’s utterance, which may be expressed in different forms of natural language, into intent classes. However, the efficacy of intent detection systems has been hindered by a lack of data and by the fact that the user’s intent text is typically characterized by short, general sentences and colloquial expressions. Intent detection is the process of algorithmically determining user intent from a given statement. The goal of this study is to develop an intent detection model that accurately classifies and detects user intent. The model calculates similarity scores for the three models used in order to compare them. The proposed model uses Contextual Semantic Search (CSS) capabilities for semantic search, Latent Dirichlet Allocation (LDA) for topic modeling, the Bidirectional Encoder Representations from Transformers (BERT) semantic matching technique, and the combination of LDA and BERT for text classification and detection. The dataset is acquired from the Broad Twitter Corpus (BTC) and comprises various metadata. A pre-processing step was applied to prepare the data for analysis. A sample of 1432 instances was selected out of the 5000 available because manual annotation is required and time-consuming. To compare the performance of the model with the existing model, the similarity scores, precision, recall, F1 score, and accuracy were computed. The results revealed that LDA-BERT achieved an accuracy of 95.88% for intent detection, BERT 93.84%, and LDA 92.23%, showing that LDA-BERT performs better than the other models. It is hoped that the novel model will aid in ensuring information security and social media intelligence. For future work, an unsupervised LDA-BERT without any labeled data can be studied with the model.
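The similarity-scoring step can be sketched in miniature. This is a hypothetical illustration, not the paper's implementation: the toy vectors stand in for real LDA topic proportions and BERT embeddings, and `combine` with its `gamma` weight is an assumed fusion scheme.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def combine(lda_vec, bert_vec, gamma=0.5):
    """Weighted concatenation of topic and embedding features
    (gamma is an illustrative fusion weight, not from the paper)."""
    return [gamma * x for x in lda_vec] + [(1 - gamma) * x for x in bert_vec]

# Toy vectors standing in for LDA topic proportions and BERT embeddings.
utterance = combine([0.7, 0.2, 0.1], [0.3, 0.9])
intents = {
    "intent_a": combine([0.8, 0.1, 0.1], [0.2, 0.8]),
    "intent_b": combine([0.1, 0.1, 0.8], [0.9, 0.1]),
}
scores = {name: cosine(utterance, vec) for name, vec in intents.items()}
best = max(scores, key=scores.get)   # intent class with the highest score
```

An utterance is assigned the intent class whose combined vector scores highest; the same scores can be compared across the LDA-only, BERT-only, and combined variants.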

  3. In recent years, the use of neural network models for solving aerodynamics problems has become widespread. These models, trained on a set of previously obtained solutions, predict solutions to new problems; they are, in essence, interpolation algorithms. An alternative approach is to construct a neural network operator: a neural network that reproduces a numerical method used to solve a problem, allowing the solution to be found iteratively. The paper considers the construction of such an operator using the UNet neural network with a spatial attention mechanism. It solves flow problems on a rectangular uniform grid common to the streamlined body and the flow field. A correction mechanism is proposed to refine the obtained solution. The stability of such an algorithm for solving a stationary problem is analyzed, and a comparison is made with other variants of its construction, including the pushforward trick and positional encoding. The issue of selecting a set of iterations for forming the training dataset is considered, and the behavior of the solution under repeated application of the neural network operator is assessed.

    A demonstration of the method is provided for the case of turbulent flow around a rounded plate, with various rounding options, for fixed parameters of the incoming flow: Reynolds number $\text{Re} = 10^5$ and Mach number $M = 0.15$. Since flows with these incoming-flow parameters can be considered incompressible, only the velocity components are studied directly. The neural network model used to construct the operator has a common decoder for both velocity components. Flow fields and velocity profiles along the normal and along the outline of the body, obtained using the neural network operator and numerical methods, are compared. The analysis is performed both on the plate and on the rounding. Simulation results confirm that the neural network operator allows the solution to be found with high accuracy and stability.
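The fixed-point view of a neural network operator can be illustrated with a stand-in. Below, one damped-Jacobi sweep for a small Poisson-like system plays the role of the learned iteration $u \mapsto N(u)$; the actual paper trains a UNet for this map, so everything here is a simplified assumption.

```python
def apply_operator(u, f, omega=0.8):
    """One damped-Jacobi-style sweep for the 1-D system 2u_i - u_{i-1} - u_{i+1} = f_i
    with zero boundaries; a trained network operator would replace this map."""
    n = len(u)
    out = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        jacobi = 0.5 * (left + right + f[i])
        out.append((1 - omega) * u[i] + omega * jacobi)
    return out

def solve(f, iters=500):
    """Find the solution as the fixed point u = N(u) by repeated application."""
    u = [0.0] * len(f)
    for _ in range(iters):
        u = apply_operator(u, f)
    return u

u = solve([1.0, 1.0, 1.0])   # converges to the exact solution [1.5, 2.0, 1.5]
```

The stability question the paper analyzes corresponds here to the iteration map being a contraction: repeated application then converges rather than amplifying errors.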

  4. Vorontsov K.V., Potapenko A.A.
    Regularization, robustness and sparsity of probabilistic topic models
    Computer Research and Modeling, 2012, v. 4, no. 4, pp. 693-706

    We propose a generalized probabilistic topic model of text corpora which can incorporate heuristics of Bayesian regularization, sampling, frequent parameter updates, and robustness in any combination. Well-known models such as PLSA, LDA, CVB0, SWB, and many others can be considered special cases of the proposed broad family of models. We propose the robust PLSA model and show that it is more sparse and performs better than regularized models like LDA.

    Views (last year): 25. Citations: 12 (RSCI).
  5. Kulikov Y.M., Son E.E.
    CABARET scheme implementation for free shear layer modeling
    Computer Research and Modeling, 2017, v. 9, no. 6, pp. 881-903

    In the present paper we reexamine the properties of the CABARET numerical scheme formulated for weakly compressible fluid flow, based on the results of free shear layer modeling. The Kelvin–Helmholtz instability and the successive generation of two-dimensional turbulence provide a wide field for scheme analysis, including the temporal evolution of the integral energy and enstrophy curves, the vorticity patterns and energy spectra, as well as the dispersion relation for the instability increment. Most of the calculations are performed for Reynolds number $\text{Re} = 4 \times 10^5$ on square grids sequentially refined in the range of $128^2-2048^2$ nodes. Attention is paid to the problem of underresolved layers generating a spurious vortex during the roll-up of the vorticity layers. This phenomenon takes place only on the coarse grid with $128^2$ nodes, while a fully regularized evolution pattern of the vorticity appears only when approaching the $1024^2$-node grid. We also discuss the vorticity resolution properties of the grids used with respect to dimensional estimates for the eddies at the borders of the inertial interval, showing that the available range of grids appears to be sufficient for a good resolution of small-scale vorticity patches. Nevertheless, we claim convergence only for the domains occupied by large-scale structures.

    The evolution of the generated turbulence is consistent with theoretical concepts predicting the emergence of large vortices, which collect all the kinetic energy of motion, and of solitary small-scale eddies. The latter resemble the coherent structures that survive the filamentation process and almost do not interact with other scales. The dissipative characteristics of the numerical method employed are discussed in terms of the kinetic energy dissipation rate, calculated both directly and on the basis of theoretical laws for the incompressible (via enstrophy curves) and compressible (with respect to the strain rate tensor and dilatation) fluid models. The asymptotic behavior of the kinetic energy and enstrophy cascades complies with the two-dimensional turbulence laws $E(k) \propto k^{-3}$, $\omega^2(k) \propto k^{-1}$. Considering the instability increment as a function of the dimensionless wave number shows good agreement with other papers; however, the commonly used method of calculating the instability growth rate is not always accurate, so a modification is proposed. Thus, the implemented CABARET scheme, possessing remarkably small numerical dissipation and good vorticity resolution, is quite a competitive approach compared to other high-order accuracy methods.

    Views (last year): 17.
  6. Zabello K.K., Garbaruk A.V.
    Investigation of the accuracy of the lattice Boltzmann method in calculating acoustic wave propagation
    Computer Research and Modeling, 2025, v. 17, no. 6, pp. 1069-1081

    The article presents a systematic investigation of the capabilities of the lattice Boltzmann method (LBM) for modeling the propagation of acoustic waves. The study considers the problem of wave propagation from a point harmonic source in an unbounded domain, both in a quiescent medium (Mach number $M=0$) and in the presence of a uniform mean flow ($M=0.2$). Both scenarios admit analytical solutions within the framework of linear acoustics, allowing for a quantitative assessment of the accuracy of the numerical method.

    The numerical implementation employs the two-dimensional D2Q9 velocity model and the Bhatnagar – Gross – Krook (BGK) collision operator. The oscillatory source is modeled using Guo's scheme, while spurious high-order moment noise generated by the source is suppressed via a regularization procedure applied to the distribution functions. To minimize wave reflections from the boundaries of the computational domain, a hybrid approach is used, combining characteristic boundary conditions based on Riemann invariants with perfectly matched layers (PML) featuring a parabolic damping profile.

    A detailed analysis is conducted to assess the influence of computational parameters on the accuracy of the method. The dependence of the error on the PML thickness ($L_{\text{PML}}$) and the maximum damping coefficient ($\sigma_{\max}$), the dimensionless source amplitude ($Q'_0$), and the grid resolution is thoroughly examined. The results demonstrate that the LBM is suitable for simulating acoustic wave propagation and exhibits second-order accuracy. It is shown that achieving high accuracy (relative pressure error below $1\,\%$) requires a spatial resolution of at least $20$ grid points per wavelength ($\lambda$). The minimal effective PML parameters ensuring negligible boundary reflections are identified as $\sigma_{\max}\geqslant 0.02$ and $L_{\text{PML}} \geqslant 2\lambda$. Additionally, it is shown that for source amplitudes $Q_0' \geqslant 0.1$, nonlinear effects become significant compared to other sources of error.
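The D2Q9/BGK building block described above can be sketched for a single lattice cell. This is a minimal single-cell illustration with the standard second-order equilibrium and an illustrative relaxation time $\tau$, not the paper's full acoustic setup with source, regularization, and PML:

```python
W = [4/9] + [1/9] * 4 + [1/36] * 4                       # D2Q9 lattice weights
E = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
     (1, 1), (-1, 1), (-1, -1), (1, -1)]                 # discrete velocities

def macroscopic(f):
    """Density and velocity as the zeroth and first moments of f."""
    rho = sum(f)
    ux = sum(fi * e[0] for fi, e in zip(f, E)) / rho
    uy = sum(fi * e[1] for fi, e in zip(f, E)) / rho
    return rho, ux, uy

def equilibrium(rho, ux, uy):
    """Standard second-order equilibrium expansion on the lattice."""
    usq = ux * ux + uy * uy
    return [w * rho * (1 + 3 * (ex * ux + ey * uy)
                       + 4.5 * (ex * ux + ey * uy) ** 2
                       - 1.5 * usq)
            for w, (ex, ey) in zip(W, E)]

def bgk_collide(f, tau=0.6):
    """BGK collision: relax f toward local equilibrium, f' = f + (feq - f)/tau."""
    rho, ux, uy = macroscopic(f)
    feq = equilibrium(rho, ux, uy)
    return [fi + (fe - fi) / tau for fi, fe in zip(f, feq)]

# Perturb an equilibrium state, then collide: mass and momentum are conserved
# exactly, because the equilibrium shares the moments of the pre-collision state.
f_pert = [fi * (1 + 0.01 * k) for k, fi in enumerate(equilibrium(1.0, 0.05, 0.0))]
f_post = bgk_collide(f_pert)
```

A full simulation would stream each post-collision population to the neighboring node along its lattice velocity before the next collision step.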

  7. Kochergin A.V., Kholmatova Z.Sh.
    Extraction of characters and events from narratives
    Computer Research and Modeling, 2024, v. 16, no. 7, pp. 1593-1600

    Event and character extraction from narratives is a fundamental task in text analysis. The application of event extraction techniques ranges from the summarization of different documents to the analysis of medical notes. We identify events based on a framework named “four W” (Who, What, When, Where) to capture all the essential components: the actors, actions, times, and places. In this paper, we explore two prominent techniques for event extraction: statistical parsing of syntactic trees and semantic role labeling. While these techniques have been investigated by different researchers in isolation, we directly compare the performance of the two approaches on our custom dataset, which we have annotated.

    Our analysis shows that statistical parsing of syntactic trees outperforms semantic role labeling in event and character extraction, especially in identifying specific details. Nevertheless, semantic role labeling demonstrates good performance in correct actor identification. We evaluate the effectiveness of both approaches by comparing metrics such as precision, recall, and F1-scores, thus demonstrating their respective advantages and limitations.

    Moreover, as part of our work, we propose different future applications of event extraction techniques that we plan to investigate. The areas where we want to apply these techniques include code analysis and source code authorship attribution. We consider using event extraction to retrieve key code elements such as variable assignments and function calls, which can further help us to analyze the behavior of programs and identify a project’s contributors. Our work provides novel understanding of the performance and efficiency of statistical parsing and semantic role labeling techniques, offering researchers new directions for the application of these techniques.
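The “four W” framing can be made concrete with a toy slot-filler. The role names and input format below are hypothetical; a real pipeline would take them from the syntactic-parsing or semantic-role-labeling stage compared in the paper:

```python
# Illustrative mapping from upstream role labels to the four W slots.
ROLE_TO_W = {
    "agent": "who",
    "predicate": "what",
    "temporal": "when",
    "location": "where",
}

def four_w(labeled_spans):
    """labeled_spans: list of (role, text) pairs from an upstream labeler.
    Returns one event record with Who/What/When/Where slots."""
    event = {"who": None, "what": None, "when": None, "where": None}
    for role, text in labeled_spans:
        slot = ROLE_TO_W.get(role)
        if slot and event[slot] is None:   # keep the first span per slot
            event[slot] = text
    return event

event = four_w([("agent", "Alice"), ("predicate", "met the mayor"),
                ("temporal", "on Monday"), ("location", "in Kazan")])
```

Comparing extractors then amounts to comparing how often each slot is filled correctly against the annotated dataset.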

  8. Orlova I.N., Golubtsova A.N., Orlov V.A., Orlov N.V.
    Research on the achievability of a goal in a medical quest
    Computer Research and Modeling, 2025, v. 17, no. 6, pp. 1149-1179

    The work presents an experimental study of the tree structure that arises during a medical examination. At each visit to a medical specialist, the patient receives a certain number of referrals for consultations with other specialists or for tests. A tree of referrals arises, each branch of which the patient must follow. Depending on the branching of the tree, it can be finite, in which case the examination can be completed, or infinite, in which case the patient’s goal cannot be achieved. The work studies, both experimentally and theoretically, the critical properties of the transition of the system from a forest of finite trees to a forest of infinite ones, depending on the probabilistic characteristics of the tree.

    For the description, a model is proposed in which the discrete probability function of the number of branches at a node reproduces the dynamics of a continuous Gaussian distribution. The characteristics of the Gaussian distribution (mathematical expectation $x_0$, standard deviation $\sigma$) are the model parameters. In this setting, the problem belongs to the class of branching random processes (BRP) in the inhomogeneous Galton – Watson model.

    The experimental study is carried out by numerical modeling on finite lattices. A phase diagram was constructed, and the boundaries of the regions of the various phases were determined. A comparison was made with the phase diagram obtained from theoretical criteria for macrosystems, and an adequate correspondence was established. It is shown that on finite lattices the transition is smeared.

    The smeared phase transition was described using two approaches. In the first, standard approach, the transition is described using the so-called inclusion function, which has the meaning of the fraction of one of the phases in the overall ensemble. It was established that this approach is ineffective for this system, since the position found for the conditional boundary of the smeared transition is determined only by the size of the chosen experimental lattice and carries no objective meaning.

    A second, original approach is proposed, based on introducing an order parameter equal to the inverse of the average tree height and analyzing its behavior. It was established that the dynamics of this order parameter in the $\sigma = \text{const}$ section has, up to very small differences, the form of the Fermi – Dirac distribution ($\sigma$ plays the same role as the temperature in the Fermi – Dirac distribution, and $x_0$ that of the energy). An empirical expression for the order parameter has been selected, and an analogue of the chemical potential is introduced and calculated; it has the meaning of the characteristic scale of the order parameter, that is, the value of $x_0$ at which order can be considered to turn into disorder. This criterion is the basis for determining the boundary of the conditional transition in this approach. It was established that this boundary corresponds to an average tree height of two generations. Based on the properties found, recommendations are proposed for medical institutions to help ensure the finiteness of patients’ examination paths.
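For reference, the standard Fermi – Dirac form that the order parameter's dynamics is compared to, with $\sigma$ in the role of the temperature, $x_0$ in the role of the energy, and $\mu$ the chemical-potential analogue (the paper's own empirical expression is not reproduced in the abstract), is

```latex
f(x_0) = \frac{1}{\exp\bigl((x_0 - \mu)/\sigma\bigr) + 1}
```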

    The model discussed and its description using conditionally infinite trees have applications to many hierarchical systems, including internet routing networks, bureaucratic networks, trade and logistics networks, citation networks, game strategies, population dynamics problems, and others.
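The criticality criterion underlying the model above can be sketched numerically. Assumed details: the offspring distribution is a Gaussian discretized onto nonnegative branch counts, and the extinction probability is computed as the smallest fixed point $q = G(q)$ of the offspring generating function; extinction is certain (all trees finite) exactly when the mean number of offspring is at most one.

```python
import math

def gaussian_pmf(x0, sigma, kmax=20):
    """Discretize N(x0, sigma^2) onto branch counts 0..kmax and renormalize."""
    raw = [math.exp(-0.5 * ((k - x0) / sigma) ** 2) for k in range(kmax + 1)]
    z = sum(raw)
    return [r / z for r in raw]

def mean_offspring(pmf):
    """Mean number of branches per node."""
    return sum(k * p for k, p in enumerate(pmf))

def extinction_probability(pmf, iters=5000):
    """Iterate q <- G(q) = sum_k p_k q^k from q = 0; converges to the smallest
    fixed point, which equals 1 iff the process is critical or subcritical."""
    q = 0.0
    for _ in range(iters):
        q = sum(p * q ** k for k, p in enumerate(pmf))
    return q

sub = gaussian_pmf(x0=0.5, sigma=1.0)   # mean offspring < 1: forest of finite trees
sup = gaussian_pmf(x0=2.0, sigma=1.0)   # mean offspring > 1: infinite trees possible
q_sub = extinction_probability(sub)
q_sup = extinction_probability(sup)
```

Sweeping $x_0$ at fixed $\sigma$ and watching where the extinction probability drops below 1 traces exactly the kind of phase boundary the paper maps experimentally.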

  9. Chuvilin K.V.
    An efficient algorithm for ${\mathrm{\LaTeX}}$ documents comparing
    Computer Research and Modeling, 2015, v. 7, no. 2, pp. 329-345

    The problem is to construct the differences that arise when editing ${\mathrm{\LaTeX}}$ documents. Each document is represented as a parse tree whose nodes are called tokens. The smallest possible text representation of the document that does not change the syntax tree is constructed. All of the text is split into fragments whose boundaries correspond to tokens. A map from the initial text fragment sequence to the similar sequence of the edited document, corresponding to the minimum distance, is built with the Hirschberg algorithm. A map of text characters corresponding to the map of text fragment sequences is constructed. Tokens whose characters are all deleted, all inserted, or all unchanged are selected in the parse trees. The map for the trees formed by the remaining tokens is built using the Zhang – Shasha algorithm.
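The fragment-alignment step can be illustrated with a quadratic-space edit-distance DP. This is a simplified stand-in: Hirschberg's algorithm computes the same optimal alignment in linear space, and Zhang – Shasha then handles the residual trees. The fragments here are toy strings.

```python
def align(a, b):
    """Return (distance, pairs), where pairs maps indices of fragments
    kept unchanged in an optimal edit script from a to b."""
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # delete a[i-1]
                          d[i][j - 1] + 1,         # insert b[j-1]
                          d[i - 1][j - 1] + cost)  # keep or substitute
    # Backtrace to recover which fragments map to which.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1] and d[i][j] == d[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif d[i][j] == d[i - 1][j] + 1:
            i -= 1
        else:
            j -= 1
    return d[n][m], list(reversed(pairs))

dist, pairs = align(["\\section", "{", "Intro", "}"],
                    ["\\section", "{", "Overview", "}"])
```

The unmatched fragments (here the substituted section title) are exactly the ones whose tokens are passed on to the tree-matching stage.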

    Views (last year): 2. Citations: 2 (RSCI).
  10. Makarov I.S., Bagantsova E.R., Iashin P.A., Kovaleva M.D., Zakharova E.M.
    Development of and research into a rigid algorithm for analyzing Twitter publications and its influence on the movements of the cryptocurrency market
    Computer Research and Modeling, 2023, v. 15, no. 1, pp. 157-170

    Social media is a crucial indicator of the position of assets in the financial market. The paper describes a rigid solution for the classification problem of determining the influence of social media activity on financial market movements. Reputable crypto trader influencers are selected, and packages of their Twitter posts are used as data. The preprocessing of the text, which is characterized by the frequent use of slang words and abbreviations, consists of lemmatization with Stanza and the use of regular expressions. A word is considered an element of a vector of a data unit in the course of solving the binary classification problem. The best markup parameters for processing Binance candles are searched for. Methods of feature selection, which is necessary for a precise description of the text data and the subsequent process of establishing dependence, are represented by machine learning and statistical analysis. The first is feature selection based on the information criterion; this approach is implemented in a random forest model and is relevant for selecting features for splitting nodes in a decision tree. The second is based on the rigid compilation of a binary vector during a rough check of the presence or absence of a word in the package, followed by counting the sum of the elements of this vector. A decision is then made depending on whether this sum exceeds a threshold value predetermined by analyzing the frequency distribution of mentions of the word. The algorithm used to solve the problem was named the benchmark and analyzed as a tool; similar algorithms are often used in automated trading strategies. In the course of the study, observations of the influence of frequently occurring words, which are used as a basis of dimension 2 and 3 in vectorization, are described as well.
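The rigid binary-vector check described above can be sketched as follows. The keyword list and threshold are illustrative placeholders; the paper derives its threshold from the frequency distribution of mentions.

```python
KEYWORDS = ["bullish", "moon", "pump", "buy", "breakout"]   # illustrative list
THRESHOLD = 2                                               # illustrative value

def presence_vector(text, keywords=KEYWORDS):
    """Binary vector: 1 if the keyword occurs in the post, else 0."""
    tokens = {w.strip(".,!?") for w in text.lower().split()}
    return [1 if w in tokens else 0 for w in keywords]

def signal(text, threshold=THRESHOLD):
    """Fire a trading signal when enough keywords co-occur in one post."""
    return sum(presence_vector(text)) > threshold
```

The rough presence/absence check deliberately ignores word frequency within a post; only the count of distinct matched keywords is compared with the threshold.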


Indexed in Scopus

Full-text version of the journal is also available on the web site of the scientific electronic library eLIBRARY.RU

The journal is included in the Russian Science Citation Index

