Результаты поиска по 'random forest':
Найдено статей: 10
  1. Editor's note
    Computer Research and Modeling, 2021, v. 13, no. 4, pp. 669-671
  2. Editor’s note
    Computer Research and Modeling, 2022, v. 14, no. 6, pp. 1217-1219
  3. Tinkov O.V., Polishchuk P.G., Khachatryan D.S., Kolotaev A.V., Balaev A.N., Osipov V.N., Grigorev B.Y.
    Quantitative analysis of “structure – anticancer activity” and rational molecular design of bi-functional VEGFR-2/HDAC-inhibitors
    Computer Research and Modeling, 2019, v. 11, no. 5, pp. 911-930

    Inhibitors of histone deacetylases (HDACi) have considered as a promising class of drugs for the treatment of cancers because of their effects on cell growth, differentiation, and apoptosis. Angiogenesis play an important role in the growth of most solid tumors and the progression of metastasis. The vascular endothelial growth factor (VEGF) is a key angiogenic agent, which is secreted by malignant tumors, which induces the proliferation and the migration of vascular endothelial cells. Currently, the most promising strategy in the fight against cancer is the creation of hybrid drugs that simultaneously act on several physiological targets. In this work, a series of hybrids bearing N-phenylquinazolin-4-amine and hydroxamic acid moieties were studied as dual VEGFR-2/HDAC inhibitors using simplex representation of the molecular structure and Support Vector Machine (SVM). The total sample of 42 compounds was divided into training and test sets. Five-fold cross-validation (5-fold) was used for internal validation. Satisfactory quantitative structure—activity relationship (QSAR) models were constructed (R2test = 0.64–0.87) for inhibitors of HDAC, VEGFR-2 and human breast cancer cell line MCF-7. The interpretation of the obtained QSAR models was carried out. The coordinated effect of different molecular fragments on the increase of antitumor activity of the studied compounds was estimated. Among the substituents of the N-phenyl fragment, the positive contribution of para bromine for all three types of activity can be distinguished. The results of the interpretation were used for molecular design of potential dual VEGFR-2/HDAC inhibitors. For comparative QSAR research we used physicochemical descriptors calculated by the program HYBOT, the method of Random Forest (RF), and on-line version of the expert system OCHEM (https://ochem.eu). In the modeling of OCHEM PyDescriptor descriptors and extreme gradient boosting was chosen. In addition, the models obtained with the help of the expert system OCHEM were used for virtual screening of 300 compounds to select promising VEGFR-2/HDAC inhibitors for further synthesis and testing.

  4. Zimina S.V., Petrov M.N.
    Application of Random Forest to construct a local operator for flow fields refinement in external aerodynamics problems
    Computer Research and Modeling, 2021, v. 13, no. 4, pp. 761-778

    Numerical modeling of turbulent flows requires finding the balance between accuracy and computational efficiency. For example, DNS and LES models allow to obtain more accurate results, comparing to RANS models, but are more computationally expensive. Because of this, modern applied simulations are mostly performed with RANS models. But even RANS models can be computationally expensive for complex geometries or series simulations due to the necessity of resolving the boundary layer. Some methods, such as wall functions and near-wall domain decomposition, allow to significantly improve the speed of RANS simulations. However, they inevitably lose precision due to using a simplified model in the near-wall domain. To obtain a model that is both accurate and computationally efficient, it is possible to construct a surrogate model based on previously made simulations using the precise model.

    In this paper, an operator is constructed that allows reconstruction of the flow field obtained by an accurate model based on the flow field obtained by the simplified model. Spalart–Allmaras model with approximate nearwall domain decomposition and Spalart–Allmaras model resolving the near-wall region are taken as the simplified and the base models respectively. The operator is constructed using a local approach, i. e. to reconstruct a point in the flow field, only features (flow variables and their derivatives) at this point in the field are used. The operator is constructed using the Random Forest algorithm. The efficiency and accuracy of the obtained surrogate model are demonstrated on the supersonic flow over a compression corner with different values for angle $\alpha$ and Reynolds number. The investigation has been conducted into interpolation and extrapolation both by $Re$ and $\alpha$.

  5. Yifter T.T., Razoumny Y.N., Orlovsky A.V., Lobanov V.K.
    Monitoring the spread of Sosnowskyi’s hogweed using a random forest machine learning algorithm in Google Earth Engine
    Computer Research and Modeling, 2022, v. 14, no. 6, pp. 1357-1370

    Examining the spectral response of plants from data collected using remote sensing has a lot of potential for solving real-world problems in different fields of research. In this study, we have used the spectral property to identify the invasive plant Heracleum sosnowskyi Manden from satellite imagery. H. sosnowskyi is an invasive plant that causes many harms to humans, animals and the ecosystem at large. We have used data collected from the years 2018 to 2020 containing sample geolocation data from the Moscow Region where this plant exists and we have used Sentinel-2 imagery for the spectral analysis towards the aim of detecting it from the satellite imagery. We deployed a Random Forest (RF) machine learning model within the framework of Google Earth Engine (GEE). The algorithm learns from the collected data, which is made up of 12 bands of Sentinel-2, and also includes the digital elevation together with some spectral indices, which are used as features in the algorithm. The approach used is to learn the biophysical parameters of H. sosnowskyi from its reflectances by fitting the RF model directly from the data. Our results demonstrate how the combination of remote sensing and machine learning can assist in locating H. sosnowskyi, which aids in controlling its invasive expansion. Our approach provides a high detection accuracy of the plant, which is 96.93%.

  6. Marchanko L.N., Kasianok Y.A., Gaishun V.E., Bruttan I.V.
    Modeling of rheological characteristics of aqueous suspensions based on nanoscale silicon dioxide particles
    Computer Research and Modeling, 2024, v. 16, no. 5, pp. 1217-1252

    The rheological behavior of aqueous suspensions based on nanoscale silicon dioxide particles strongly depends on the dynamic viscosity, which affects directly the use of nanofluids. The purpose of this work is to develop and validate models for predicting dynamic viscosity from independent input parameters: silicon dioxide concentration SiO2, pH acidity, and shear rate $\gamma$. The influence of the suspension composition on its dynamic viscosity is analyzed. Groups of suspensions with statistically homogeneous composition have been identified, within which the interchangeability of compositions is possible. It is shown that at low shear rates, the rheological properties of suspensions differ significantly from those obtained at higher speeds. Significant positive correlations of the dynamic viscosity of the suspension with SiO2 concentration and pH acidity were established, and negative correlations with the shear rate $\gamma$. Regression models with regularization of the dependence of the dynamic viscosity $\eta$ on the concentrations of SiO2, NaOH, H3PO4, surfactant (surfactant), EDA (ethylenediamine), shear rate γ were constructed. For more accurate prediction of dynamic viscosity, the models using algorithms of neural network technologies and machine learning (MLP multilayer perceptron, RBF radial basis function network, SVM support vector method, RF random forest method) were trained. The effectiveness of the constructed models was evaluated using various statistical metrics, including the average absolute approximation error (MAE), the average quadratic error (MSE), the coefficient of determination $R^2$, and the average percentage of absolute relative deviation (AARD%). The RF model proved to be the best model in the training and test samples. The contribution of each component to the constructed model is determined. It is shown that the concentration of SiO2 has the greatest influence on the dynamic viscosity, followed by pH acidity and shear rate γ. The accuracy of the proposed models is compared to the accuracy of models previously published. The results confirm that the developed models can be considered as a practical tool for studying the behavior of nanofluids, which use aqueous suspensions based on nanoscale particles of silicon dioxide.

  7. Makarov I.S., Bagantsova E.R., Iashin P.A., Kovaleva M.D., Zakharova E.M.
    Development of and research into a rigid algorithm for analyzing Twitter publications and its influence on the movements of the cryptocurrency market
    Computer Research and Modeling, 2023, v. 15, no. 1, pp. 157-170

    Social media is a crucial indicator of the position of assets in the financial market. The paper describes the rigid solution for the classification problem to determine the influence of social media activity on financial market movements. Reputable crypto traders influencers are selected. Twitter posts packages are used as data. The methods of text, which are characterized by the numerous use of slang words and abbreviations, and preprocessing consist in lemmatization of Stanza and the use of regular expressions. A word is considered as an element of a vector of a data unit in the course of solving the problem of binary classification. The best markup parameters for processing Binance candles are searched for. Methods of feature selection, which is necessary for a precise description of text data and the subsequent process of establishing dependence, are represented by machine learning and statistical analysis. First, the feature selection is used based on the information criterion. This approach is implemented in a random forest model and is relevant for the task of feature selection for splitting nodes in a decision tree. The second one is based on the rigid compilation of a binary vector during a rough check of the presence or absence of a word in the package and counting the sum of the elements of this vector. Then a decision is made depending on the superiority of this sum over the threshold value that is predetermined previously by analyzing the frequency distribution of mentions of the word. The algorithm used to solve the problem was named benchmark and analyzed as a tool. Similar algorithms are often used in automated trading strategies. In the course of the study, observations of the influence of frequently occurring words, which are used as a basis of dimension 2 and 3 in vectorization, are described as well.

  8. Gesture recognition is an urgent challenge in developing systems of human-machine interfaces. We analyzed machine learning methods for gesture classification based on electromyographic muscle signals to identify the most effective one. Methods such as the naive Bayesian classifier (NBC), logistic regression, decision tree, random forest, gradient boosting, support vector machine (SVM), $k$-nearest neighbor algorithm, and ensembles (NBC and decision tree, NBC and gradient boosting, gradient boosting and decision tree) were considered. Electromyography (EMG) was chosen as a method of obtaining information about gestures. This solution does not require the location of the hand in the field of view of the camera and can be used to recognize finger movements. To test the effectiveness of the selected methods of gesture recognition, a device was developed for recording the EMG signal, which includes three electrodes and an EMG sensor connected to the microcontroller and the power supply. The following gestures were chosen: clenched fist, “thumb up”, “Victory”, squeezing an index finger and waving a hand from right to left. Accuracy, precision, recall and execution time were used to evaluate the effectiveness of classifiers. These parameters were calculated for three options for the location of EMG electrodes on the forearm. According to the test results, the most effective methods are $k$-nearest neighbors’ algorithm, random forest and the ensemble of NBC and gradient boosting, the average accuracy of ensemble for three electrode positions was 81.55%. The position of the electrodes was also determined at which machine learning methods achieve the maximum accuracy. In this position, one of the differential electrodes is located at the intersection of the flexor digitorum profundus and flexor pollicis longus, the second — above the flexor digitorum superficialis.

  9. Krasnov F.V., Smaznevich I.S., Baskakova E.N.
    Bibliographic link prediction using contrast resampling technique
    Computer Research and Modeling, 2021, v. 13, no. 6, pp. 1317-1336

    The paper studies the problem of searching for fragments with missing bibliographic links in a scientific article using automatic binary classification. To train the model, we propose a new contrast resampling technique, the innovation of which is the consideration of the context of the link, taking into account the boundaries of the fragment, which mostly affects the probability of presence of a bibliographic links in it. The training set was formed of automatically labeled samples that are fragments of three sentences with class labels «without link» and «with link» that satisfy the requirement of contrast: samples of different classes are distanced in the source text. The feature space was built automatically based on the term occurrence statistics and was expanded by constructing additional features — entities (names, numbers, quotes and abbreviations) recognized in the text.

    A series of experiments was carried out on the archives of the scientific journals «Law enforcement review» (273 articles) and «Journal Infectology» (684 articles). The classification was carried out by the models Nearest Neighbors, RBF SVM, Random Forest, Multilayer Perceptron, with the selection of optimal hyperparameters for each classifier.

    Experiments have confirmed the hypothesis put forward. The highest accuracy was reached by the neural network classifier (95%), which is however not as fast as the linear one that showed also high accuracy with contrast resampling (91–94%). These values are superior to those reported for NER and Sentiment Analysis on comparable data. The high computational efficiency of the proposed method makes it possible to integrate it into applied systems and to process documents online.

  10. Makarov I.S., Bagantsova E.R., Iashin P.A., Kovaleva M.D., Gorbachev R.A.
    Development of and research on machine learning algorithms for solving the classification problem in Twitter publications
    Computer Research and Modeling, 2023, v. 15, no. 1, pp. 185-195

    Posts on social networks can both predict the movement of the financial market, and in some cases even determine its direction. The analysis of posts on Twitter contributes to the prediction of cryptocurrency prices. The specificity of the community is represented in a special vocabulary. Thus, slang expressions and abbreviations are used in posts, the presence of which makes it difficult to vectorize text data, as a result of which preprocessing methods such as Stanza lemmatization and the use of regular expressions are considered. This paper describes created simplest machine learning models, which may work despite such problems as lack of data and short prediction timeframe. A word is considered as an element of a binary vector of a data unit in the course of the problem of binary classification solving. Basic words are determined according to the frequency analysis of mentions of a word. The markup is based on Binance candlesticks with variable parameters for a more accurate description of the trend of price changes. The paper introduces metrics that reflect the distribution of words depending on their belonging to a positive or negative classes. To solve the classification problem, we used a dense model with parameters selected by Keras Tuner, logistic regression, a random forest classifier, a naive Bayesian classifier capable of working with a small sample, which is very important for our task, and the k-nearest neighbors method. The constructed models were compared based on the accuracy metric of the predicted labels. During the investigation we recognized that the best approach is to use models which predict price movements of a single coin. Our model deals with posts that mention LUNA project, which no longer exist. This approach to solving binary classification of text data is widely used to predict the price of an asset, the trend of its movement, which is often used in automated trading.

Indexed in Scopus

Full-text version of the journal is also available on the web site of the scientific electronic library eLIBRARY.RU

The journal is included in the Russian Science Citation Index

The journal is included in the RSCI

International Interdisciplinary Conference "Mathematics. Computing. Education"