All issues
- 2024 Vol. 16
- 2023 Vol. 15
- 2022 Vol. 14
- 2021 Vol. 13
- 2020 Vol. 12
- 2019 Vol. 11
- 2018 Vol. 10
- 2017 Vol. 9
- 2016 Vol. 8
- 2015 Vol. 7
- 2014 Vol. 6
- 2013 Vol. 5
- 2012 Vol. 4
- 2011 Vol. 3
- 2010 Vol. 2
- 2009 Vol. 1
-
Development of and research into a rigid algorithm for analyzing Twitter publications and its influence on the movements of the cryptocurrency market
Computer Research and Modeling, 2023, v. 15, no. 1, pp. 157-170Social media is a crucial indicator of the position of assets in the financial market. The paper describes the rigid solution for the classification problem to determine the influence of social media activity on financial market movements. Reputable crypto traders influencers are selected. Twitter posts packages are used as data. The methods of text, which are characterized by the numerous use of slang words and abbreviations, and preprocessing consist in lemmatization of Stanza and the use of regular expressions. A word is considered as an element of a vector of a data unit in the course of solving the problem of binary classification. The best markup parameters for processing Binance candles are searched for. Methods of feature selection, which is necessary for a precise description of text data and the subsequent process of establishing dependence, are represented by machine learning and statistical analysis. First, the feature selection is used based on the information criterion. This approach is implemented in a random forest model and is relevant for the task of feature selection for splitting nodes in a decision tree. The second one is based on the rigid compilation of a binary vector during a rough check of the presence or absence of a word in the package and counting the sum of the elements of this vector. Then a decision is made depending on the superiority of this sum over the threshold value that is predetermined previously by analyzing the frequency distribution of mentions of the word. The algorithm used to solve the problem was named benchmark and analyzed as a tool. Similar algorithms are often used in automated trading strategies. In the course of the study, observations of the influence of frequently occurring words, which are used as a basis of dimension 2 and 3 in vectorization, are described as well.
-
Development of and research on an algorithm for distinguishing features in Twitter publications for a classification problem with known markup
Computer Research and Modeling, 2023, v. 15, no. 1, pp. 171-183Social media posts play an important role in demonstration of financial market state, and their analysis is a powerful tool for trading. The article describes the result of a study of the impact of social media activities on the movement of the financial market. The top authoritative influencers are selected. Twitter posts are used as data. Such texts usually include slang and abbreviations, so methods for preparing primary text data, including Stanza, regular expressions are presented. Two approaches to the representation of a point in time in the format of text data are considered. The difference of the influence of a single tweet or a whole package consisting of tweets collected over a certain period of time is investigated. A statistical approach in the form of frequency analysis is also considered, metrics defined by the significance of a particular word when identifying the relationship between price changes and Twitter posts are introduced. Frequency analysis involves the study of the occurrence distributions of various words and bigrams in the text for positive, negative or general trends. To build the markup, changes in the market are processed into a binary vector using various parameters, thus setting the task of binary classification. The parameters for Binance candlesticks are sorted out for better description of the movement of the cryptocurrency market, their variability is also explored in this article. Sentiment is studied using Stanford Core NLP. The result of statistical analysis is relevant to feature selection for further binary or multiclass classification tasks. The presented methods of text analysis contribute to the increase of the accuracy of models designed to solve natural language processing problems by selecting words, improving the quality of vectorization. Such algorithms are often used in automated trading strategies to predict the price of an asset, the trend of its movement.
Indexed in Scopus
Full-text version of the journal is also available on the web site of the scientific electronic library eLIBRARY.RU
The journal is included in the Russian Science Citation Index
The journal is included in the RSCI
International Interdisciplinary Conference "Mathematics. Computing. Education"