Search results for 'regular expression':
Articles found: 7
  1. Matyushkin I.V., Rubis P.D., Zapletina M.A.
    Experimental study of the dynamics of single and connected in a lattice complex-valued mappings: the architecture and interface of author’s software for modeling
    Computer Research and Modeling, 2021, v. 13, no. 6, pp. 1101-1124

    The paper describes free software for research in the field of holomorphic dynamics, built on the computational capabilities of the MATLAB environment. The software allows constructing not only single complex-valued mappings but also their collectives, connected linearly or on a square or hexagonal lattice. In the first case, analogs of the Julia set (in the form of escaping points with color indication of the escape velocity), the Fatou set (with chaotic dynamics highlighted), and the Mandelbrot set generated by one of two free parameters are constructed. In the second case, only the dynamics of a cellular automaton with complex-valued cell states and complex-valued coefficients in the local transition function is considered. The abstract nature of object-oriented programming makes it possible to combine both types of calculations within a single program that describes the iterated dynamics of one object.

    The presented software provides a set of options for the field shape, initial conditions, neighborhood template, and boundary cells neighborhood features. The mapping display type can be specified by a regular expression for the MATLAB interpreter. This paper provides some UML diagrams, a short introduction to the user interface, and some examples.

    The following cases are considered as example illustrations containing new scientific knowledge:

    1) a linear fractional mapping in the form $Az^{n} +B/z^{n} $, for which the cases $n=2$, $4$, $n>1$, are known. In the portrait of the Fatou set, attention is drawn to the characteristic (for the classical quadratic mapping) figures of <>, showing short-period regimes, components of conventionally chaotic dynamics in the sea;

    2) for the Mandelbrot set with a non-standard position of the parameter in the exponent $z(t+1)\Leftarrow z(t)^{\mu } $ sketch calculations reveal some jagged structures and point clouds resembling Cantor's dust, which are not Cantor's bouquets that are characteristic for exponential mapping. Further detailing of these objects with complex topology is required.
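
    The escape-velocity coloring mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' MATLAB software: the function names, the escape radius, the iteration cap, and the choice to treat the pole at the origin as an immediate escape are all assumptions made for illustration.

```python
def escape_time(z0, A, B, n, max_iter=100, radius=1e3):
    """Count iterations of z -> A*z**n + B/z**n until |z| exceeds radius.

    Returns max_iter if the orbit stays bounded for the whole run.
    The radius and max_iter defaults are illustrative assumptions.
    """
    z = z0
    for step in range(max_iter):
        if abs(z) > radius:
            return step
        try:
            w = z**n
            # Skip the B/w term when B is zero so the map reduces to A*z**n.
            z = A * w + (B / w if B != 0 else 0)
        except (ZeroDivisionError, OverflowError):
            # Assumption: hitting the pole at the origin counts as an escape.
            return step
    return max_iter


def escape_grid(A, B, n, size=5, half_width=2.0, **kwargs):
    """Escape times on a size x size grid covering [-half_width, half_width]^2."""
    step = 2 * half_width / (size - 1)
    return [
        [
            escape_time(
                complex(-half_width + col * step, half_width - row * step),
                A, B, n, **kwargs,
            )
            for col in range(size)
        ]
        for row in range(size)
    ]
```

    Each grid value can then be mapped to a color, with small values marking fast escape and `max_iter` marking points that did not escape within the run.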

  2. Bratsun D.A., Buzmakov M.D.
    Repressilator with time-delayed gene expression. Part II. Stochastic description
    Computer Research and Modeling, 2021, v. 13, no. 3, pp. 587-609

    The repressilator is the first genetic regulatory network in synthetic biology, which was artificially constructed in 2000. It is a closed network of three genetic elements $lacI$, $\lambda cI$ and $tetR$, which have a natural origin, but are not found in nature in such a combination. The promoter of each of the three genes controls the next cistron via negative feedback, suppressing the expression of the neighboring gene. In our previous paper [Bratsun et al., 2018], we proposed a mathematical model of a delayed repressilator and studied its properties within the framework of a deterministic description. We assume that the delay can be either natural, i.e. arising during the transcription/translation of genes due to the multistage nature of these processes, or artificial, i.e. deliberately introduced into the regulatory network using genetic engineering technologies. In this work, we apply the stochastic description of dynamic processes in a delayed repressilator, which is an important addition to deterministic analysis due to the small number of molecules involved in gene regulation. The stochastic study is carried out numerically using the Gillespie algorithm, which is modified for time-delay systems. We present the description of the algorithm, its software implementation, and the results of benchmark simulations for a one-gene delayed autorepressor. When studying the behavior of a repressilator, we show that a stochastic description in a number of cases gives new information about the behavior of the system, which does not reduce to deterministic dynamics even when averaged over a large number of realizations. We show that in the subcritical range of parameters, where deterministic analysis predicts the absolute stability of the system, quasi-regular oscillations may be excited due to the nonlinear interaction of noise and delay.

    Earlier, within the framework of the deterministic description, we discovered a long-lived transient regime, which is represented in the phase space by a slow manifold. This mode reflects the process of long-term synchronization of protein pulsations in the work of the repressilator genes. In this work, we show that the transition to the cooperative mode of gene operation occurs two orders of magnitude faster when the effect of the intrinsic noise is taken into account. We have obtained the probability distribution of the moment when the phase trajectory leaves the slow manifold and have determined the most probable time for such a transition. The influence of the intrinsic noise of chemical reactions on the dynamic properties of the repressilator is discussed.
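
    One common way to adapt the Gillespie algorithm to delayed reactions is to keep a queue of scheduled completions and to discard a drawn waiting time whenever a completion falls inside it. The Python sketch below applies that idea to a one-gene delayed autorepressor. It is an illustration under stated assumptions, not the implementation from the paper: the function name, the parameter names, and the Hill-type repression with exponent 2 are all hypothetical choices.

```python
import heapq
import random


def delayed_gillespie(k_prod, k_deg, n_half, tau, t_end, n0=0, seed=0):
    """Simulate protein count n(t) for a one-gene autorepressor with delay tau.

    Production is initiated at rate k_prod / (1 + (n / n_half)**2) (assumed
    Hill repression) and completes tau time units later; each protein
    degrades at rate k_deg. Returns the event times and the counts after them.
    """
    rng = random.Random(seed)
    t, n = 0.0, n0
    pending = []  # min-heap of completion times of initiated productions
    times, counts = [t], [n]
    while t < t_end:
        a_prod = k_prod / (1.0 + (n / n_half) ** 2)
        a_deg = k_deg * n
        a_total = a_prod + a_deg
        dt = rng.expovariate(a_total)
        if pending and pending[0] <= t + dt:
            # A delayed production completes first: jump to it and redraw
            # the waiting time on the next pass (valid by memorylessness).
            t = heapq.heappop(pending)
            if t > t_end:
                break
            n += 1
        else:
            t += dt
            if t > t_end:
                break
            if rng.random() * a_total < a_prod:
                heapq.heappush(pending, t + tau)  # product appears after tau
            else:
                n -= 1  # degradation acts immediately
        times.append(t)
        counts.append(n)
    return times, counts
```

    Averaging many runs with different seeds gives the kind of ensemble statistics that can be compared against the deterministic description.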

  3. Makarov I.S., Bagantsova E.R., Iashin P.A., Kovaleva M.D., Zakharova E.M.
    Development of and research into a rigid algorithm for analyzing Twitter publications and its influence on the movements of the cryptocurrency market
    Computer Research and Modeling, 2023, v. 15, no. 1, pp. 157-170

    Social media is a crucial indicator of the position of assets in the financial market. The paper describes a rigid solution to the classification problem of determining the influence of social media activity on financial market movements. Reputable crypto-trader influencers are selected. Packages of Twitter posts are used as data. The texts are characterized by heavy use of slang words and abbreviations, so preprocessing consists of Stanza lemmatization and the use of regular expressions. A word is treated as an element of the vector of a data unit in solving the binary classification problem. The best markup parameters for processing Binance candles are searched for. Methods of feature selection, which is necessary for a precise description of text data and the subsequent process of establishing dependence, are represented by machine learning and statistical analysis. The first approach selects features based on an information criterion. It is implemented in a random forest model and is relevant to the task of selecting features for splitting nodes in a decision tree. The second approach is based on the rigid compilation of a binary vector by a rough check of the presence or absence of each word in the package, followed by summing the elements of this vector. A decision is then made depending on whether this sum exceeds a threshold value predetermined by analyzing the frequency distribution of mentions of the word. The algorithm used to solve the problem was named benchmark and analyzed as a tool. Similar algorithms are often used in automated trading strategies. Observations of the influence of frequently occurring words, which are used as a basis of dimension 2 and 3 in vectorization, are described as well.
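
    The second, rigid approach described in this abstract (a binary presence vector, its sum, and a threshold) can be sketched in a few lines of Python. The function names, the keyword list, and the threshold value are hypothetical; the abstract does not give the actual words or thresholds.

```python
def presence_vector(package_words, keywords):
    """Return a 0/1 vector marking which keywords occur in the package."""
    present = set(package_words)
    return [1 if word in present else 0 for word in keywords]


def rigid_decision(package_words, keywords, threshold):
    """Return True when the number of keywords present exceeds the threshold."""
    return sum(presence_vector(package_words, keywords)) > threshold


# Hypothetical keywords and package, for illustration only.
keywords = ["moon", "pump", "bull"]
package = ["btc", "to", "the", "moon", "bull"]
vector = presence_vector(package, keywords)
decision = rigid_decision(package, keywords, threshold=1)
```

    In this sketch the threshold is passed in directly; in the paper it is predetermined from the frequency distribution of word mentions.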

  4. Makarov I.S., Bagantsova E.R., Iashin P.A., Kovaleva M.D., Gorbachev R.A.
    Development of and research on an algorithm for distinguishing features in Twitter publications for a classification problem with known markup
    Computer Research and Modeling, 2023, v. 15, no. 1, pp. 171-183

    Social media posts play an important role in reflecting the state of the financial market, and their analysis is a powerful tool for trading. The article describes the results of a study of the impact of social media activity on the movement of the financial market. The most authoritative influencers are selected. Twitter posts are used as data. Such texts usually include slang and abbreviations, so methods for preparing primary text data, including Stanza and regular expressions, are presented. Two approaches to representing a point in time as text data are considered. The difference between the influence of a single tweet and that of a whole package of tweets collected over a certain period of time is investigated. A statistical approach in the form of frequency analysis is also considered, and metrics defined by the significance of a particular word in identifying the relationship between price changes and Twitter posts are introduced. Frequency analysis involves the study of the occurrence distributions of various words and bigrams in the text for positive, negative or general trends. To build the markup, changes in the market are processed into a binary vector using various parameters, thus setting up a binary classification task. The parameters for Binance candlesticks are searched over to better describe the movement of the cryptocurrency market, and their variability is also explored in this article. Sentiment is studied using Stanford Core NLP. The result of the statistical analysis is relevant to feature selection for further binary or multiclass classification tasks. The presented methods of text analysis help increase the accuracy of models designed to solve natural language processing problems by selecting words and improving the quality of vectorization. Such algorithms are often used in automated trading strategies to predict the price of an asset and the trend of its movement.

  5. Zenkov A.V.
    A novel method of stylometry based on the statistic of numerals
    Computer Research and Modeling, 2017, v. 9, no. 5, pp. 837-850

    A new method of statistical analysis of texts is suggested. The frequency distribution of the first significant digits in numerals of English-language texts is considered. We have taken into account cardinal as well as ordinal numerals expressed both in figures and in words. To identify the author’s use of numerals, we first deleted from the text all idiomatic expressions and set phrases accidentally containing numerals, as well as itemizations, page numbers, etc. Benford’s law is found to hold approximately for the frequencies of various first significant digits in compound literary texts by different authors; a marked predominance of the digit 1 is observed. In coherent authorial texts, characteristic deviations from Benford’s law arise; these are statistically stable, significant authorial peculiarities that allow, under certain conditions, considering the problem of authorship and distinguishing between texts by different authors. The text should be large enough (at least about 200 kB). At the end of the digit row $\{1, 2, \ldots, 9\}$, the frequency distribution is subject to strong fluctuations and is thus unrepresentative for our purpose. A theoretical explanation of the observed empirical regularity is not attempted, which, however, does not preclude the applicability of the proposed methodology for text attribution. The approach suggested and the conclusions are backed by examples of the computer analysis of works by W. M. Thackeray, M. Twain, R. L. Stevenson, J. Joyce, the Brontë sisters, and J. Austen. Using the suggested technique, we examined the authorship of a text earlier ascribed to L. F. Baum (the result agrees with that obtained by different means). We have shown that the authorship of Harper Lee’s “To Kill a Mockingbird” belongs to her, whereas the primary draft, “Go Set a Watchman”, seems to have been written in collaboration with Truman Capote.

    All results are confirmed by the parametric Pearson’s chi-squared test as well as by the non-parametric Mann–Whitney U test and Kruskal–Wallis test.
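
    The first-digit statistic at the core of this method can be sketched in Python as follows. This is a simplified illustration, not the author's procedure: it only handles numerals written in figures, it does not filter idioms, itemizations or page numbers, and the regular expression and function names are assumptions.

```python
import math
import re
from collections import Counter

# Assumed pattern: integers and simple decimals written in figures.
NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")


def first_significant_digit(number_text):
    """Return the first nonzero digit of a numeral string, or None for zero."""
    for ch in number_text:
        if ch in "123456789":
            return int(ch)
    return None


def first_digit_frequencies(text):
    """Relative frequency of each first significant digit found in the text."""
    digits = [first_significant_digit(m) for m in NUMBER_RE.findall(text)]
    counts = Counter(d for d in digits if d is not None)
    total = sum(counts.values())
    return {d: counts[d] / total for d in range(1, 10)} if total else {}


def benford_probability(d):
    """Benford's law: P(d) = log10(1 + 1/d) for d = 1..9."""
    return math.log10(1 + 1 / d)
```

    Comparing `first_digit_frequencies(text)` with `benford_probability(d)` for each digit gives the deviations that the method treats as an authorial signature; the statistical tests named above would then be applied to those distributions.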

  6. Makarov I.S., Bagantsova E.R., Iashin P.A., Kovaleva M.D., Gorbachev R.A.
    Development of and research on machine learning algorithms for solving the classification problem in Twitter publications
    Computer Research and Modeling, 2023, v. 15, no. 1, pp. 185-195

    Posts on social networks can predict the movement of the financial market and in some cases even determine its direction. The analysis of posts on Twitter contributes to the prediction of cryptocurrency prices. The specificity of the community is reflected in a special vocabulary: posts contain slang expressions and abbreviations, which makes it difficult to vectorize text data, so preprocessing methods such as Stanza lemmatization and the use of regular expressions are considered. This paper describes the simplest machine learning models we created, which may work despite such problems as lack of data and a short prediction timeframe. A word is treated as an element of the binary vector of a data unit in solving the binary classification problem. Basic words are determined by frequency analysis of word mentions. The markup is based on Binance candlesticks with variable parameters for a more accurate description of the trend of price changes. The paper introduces metrics that reflect the distribution of words depending on whether they belong to the positive or the negative class. To solve the classification problem, we used a dense model with parameters selected by Keras Tuner, logistic regression, a random forest classifier, a naive Bayesian classifier capable of working with a small sample, which is very important for our task, and the k-nearest neighbors method. The constructed models were compared based on the accuracy of the predicted labels. During the investigation we found that the best approach is to use models that predict the price movements of a single coin. Our model deals with posts that mention the LUNA project, which no longer exists. This approach to binary classification of text data is widely used to predict the price of an asset and the trend of its movement, which is often used in automated trading.
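
    The regular-expression part of such preprocessing might look like the Python sketch below. The patterns, the function name, and the tiny slang dictionary are hypothetical examples, since the abstract does not list the actual rules, and Stanza lemmatization is deliberately left out so that the sketch runs with the standard library alone.

```python
import re

URL_RE = re.compile(r"https?://\S+")            # links
MENTION_RE = re.compile(r"@\w+")                # user mentions
NON_TOKEN_RE = re.compile(r"[^a-z0-9$#\s]")     # punctuation, emoji, etc.

# Hypothetical slang dictionary, for illustration only.
SLANG = {"hodl": "hold", "rekt": "wrecked"}


def clean_tweet(text):
    """Lowercase a tweet, strip links, mentions and punctuation, expand slang."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_TOKEN_RE.sub(" ", text)
    return [SLANG.get(token, token) for token in text.split()]
```

    The cashtag and hashtag markers (`$`, `#`) are kept here on the assumption that they carry signal for coin-specific models; dropping them is a one-character change to `NON_TOKEN_RE`.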

  7. Chuvilin K.V.
    The use of syntax trees in order to automate the correction of LaTeX documents
    Computer Research and Modeling, 2012, v. 4, no. 4, pp. 871-883

    The problem is to automate the correction of LaTeX documents. Each document is represented as a parse tree. The modified Zhang-Shasha algorithm is used to construct a mapping of the tree vertices of the original document to the tree vertices of the edited document that corresponds to the minimum editing distance. Vertex-to-vertex mappings form the training set, which is used to generate rules for automatic correction. For each rule, statistics on its applicability to the edited documents are collected and used to assess and improve the quality of the rules.

    Citations: 5 (RSCI).

Indexed in Scopus

Full-text version of the journal is also available on the web site of the scientific electronic library eLIBRARY.RU

The journal is included in the Russian Science Citation Index
