Development of and research into a rigid algorithm for analyzing Twitter publications and its influence on the movements of the cryptocurrency market


Social media activity is a crucial indicator of how assets are positioned in the financial market. The paper describes a rigid solution to a classification problem that determines the influence of social media activity on financial market movements. Reputable crypto-trading influencers are selected, and packages of their Twitter posts are used as data. The texts are characterized by heavy use of slang and abbreviations; they are preprocessed by lemmatization with Stanza and by regular expressions. In solving the binary classification problem, each word is treated as an element of the vector that represents a data unit. The best labeling parameters for processing Binance candles are searched for. Feature selection, which is needed to describe the text data precisely and then to establish the dependence, is carried out with machine learning and statistical analysis methods. The first method selects features based on an information criterion. This approach is implemented in a random forest model and is relevant to the task of choosing features for splitting nodes in a decision tree. The second method rigidly builds a binary vector by a rough check of the presence or absence of each word in the package and sums the elements of this vector. A decision is then made depending on whether this sum exceeds a threshold value determined beforehand by analyzing the frequency distribution of mentions of the word. The algorithm used to solve the problem was named the benchmark and analyzed as a tool. Similar algorithms are often used in automated trading strategies. The study also describes observations on the influence of frequently occurring words, which are used as a basis of dimension 2 and 3 in vectorization.
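The second, rigid method described above (a binary presence vector, its sum, and a threshold comparison) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the three basis words, and the threshold values are hypothetical and chosen only for illustration. It also assumes that the posts are already lemmatized and lowercased, and that the comparison with the threshold is strict.

```python
from typing import Iterable, List, Sequence


def presence_vector(package: Iterable[str], basis: Sequence[str]) -> List[int]:
    """Return 1 for each basis word found in any post of the package, else 0."""
    # Assumption: posts are already lemmatized, so whitespace splitting suffices.
    tokens = {token.lower() for post in package for token in post.split()}
    return [1 if word in tokens else 0 for word in basis]


def benchmark_predict(package: Iterable[str], basis: Sequence[str], threshold: int) -> int:
    """Return class 1 if the sum of the presence vector exceeds the threshold, else 0."""
    # Assumption: "superiority over the threshold" is read as a strict inequality.
    return 1 if sum(presence_vector(package, basis)) > threshold else 0


# Hypothetical basis of dimension 3 and a hypothetical package of two posts.
basis = ["buy", "long", "moon"]
package = ["btc long here", "time to buy"]

vector = presence_vector(package, basis)
label = benchmark_predict(package, basis, threshold=1)
```

In this example two of the three basis words occur in the package, so the sum is 2 and the package is assigned to class 1 for a threshold of 1. In the paper the threshold is fixed beforehand from the frequency distribution of word mentions; here it is simply a parameter.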

Keywords: text analysis, natural language processing, Twitter activity, frequency analysis, feature selection, classification problem, financial markets, decision tree, random forest, benchmark
Citation in English: Makarov I.S., Bagantsova E.R., Iashin P.A., Kovaleva M.D., Zakharova E.M. Development of and research into a rigid algorithm for analyzing Twitter publications and its influence on the movements of the cryptocurrency market // Computer Research and Modeling, 2023, vol. 15, no. 1, pp. 157-170
DOI: 10.20537/2076-7633-2023-15-1-157-170

Indexed in Scopus

Full-text version of the journal is also available on the web site of the scientific electronic library eLIBRARY.RU

The journal is included in the Russian Science Citation Index

The journal is included in the RSCI

International Interdisciplinary Conference "Mathematics. Computing. Education"