Результаты поиска по 'data analysis':
Найдено статей: 117
  1. Zavodskikh R.K., Efanov N.N.
    Performance prediction for chosen types of loops over one-dimensional arrays with embedding-driven intermediate representations analysis
    Computer Research and Modeling, 2023, v. 15, no. 1, pp. 211-224

    The method for mapping of intermediate representations (IR) set of C, C++ programs to vector embedding space is considered to create an empirical estimation framework for static performance prediction using LLVM compiler infrastructure. The usage of embeddings makes programs easier to compare due to avoiding Control Flow Graphs (CFG) and Data Flow Graphs (DFG) direct comparison. This method is based on transformation series of the initial IR such as: instrumentation — injection of artificial instructions in an instrumentation compiler’s pass depending on load offset delta in the current instruction compared to the previous one, mapping of instrumented IR into multidimensional vector with IR2Vec and dimension reduction with t-SNE (t-distributed stochastic neighbor embedding) method. The D1 cache miss ratio measured with perf stat tool is considered as performance metric. A heuristic criterion of programs having more or less cache miss ratio is given. This criterion is based on embeddings of programs in 2D-space. The instrumentation compiler’s pass developed in this work is described: how it generates and injects artificial instructions into IR within the used memory model. The software pipeline that implements the performance estimation based on LLVM compiler infrastructure is given. Computational experiments are performed on synthetic tests which are the sets of programs with the same CFGs but with different sequences of offsets used when accessing the one-dimensional array of a given size. The correlation coefficient between performance metric and distance to the worst program’s embedding is measured and proved to be negative regardless of t-SNE initialization. This fact proves the heuristic criterion to be true. The process of such synthetic tests generation is also considered. Moreover, the variety of performance metric in programs set in such a test is proposed as a metric to be improved with exploration of more tests generators.

  2. Reshitko M.A., Usov A.B., Ougolnitsky G.A.
    Water consumption control model for regions with low water availability
    Computer Research and Modeling, 2023, v. 15, no. 5, pp. 1395-1410

    This paper considers the problem of water consumption in the regions of Russia with low water availability. We provide a review of the existing methods to control quality and quantity of water resources at different scales — from households to worldwide. The paper itself considers regions with low “water availability” parameter which is amount of water per person per year. Special attention is paid to the regions, where this parameter is low because of natural features of the region, not because of high population. In such regions many resources are spend on water processing infrastructure to store water and transport water from other regions. In such regions the main water consumers are industry and agriculture.

    We propose dynamic two-level hierarchical model which matches water consumption of a region with its gross regional product. On the top level there is a regional administration (supervisor) and on the lower level there are region enterprises (agents). The supervisor sets fees for water consumption. We study the model with Pontryagin’s maximum principle and provide agents’s optimal control in analytical form. For the supervisor’s control we provide numerical algorithm. The model has six free coefficients, which can be chosen so the model represents a particular region. We use data from Russia Federal State Statistics Service for identification process of a model. For numerical analysis we use trust region reflective algorithms. We provide calculations for a few regions with low water availability. It is shown that it is possible to reduce water consumption of a region more than by 20% while gross regional product drop is less than 10%.

  3. Govorkov D.A., Novikov V.P., Solovyev I.G., Tsibulsky V.R.
    Interval analysis of vegetation cover dynamics
    Computer Research and Modeling, 2020, v. 12, no. 5, pp. 1191-1205

    In the development of the previously obtained result on modeling the dynamics of vegetation cover, due to variations in the temperature background, a new scheme for the interval analysis of the dynamics of floristic images of formations is presented in the case when the parameter of the response rate of the model of the dynamics of each counting plant species is set by the interval of scatter of its possible values. The detailed description of the functional parameters of macromodels of biodiversity, desired in fundamental research, taking into account the essential reasons for the observed evolutionary processes, may turn out to be a problematic task. The use of more reliable interval estimates of the variability of functional parameters “bypasses” the problem of uncertainty in the primary assessment of the evolution of the phyto-resource potential of the developed controlled territories. The solutions obtained preserve not only a qualitative picture of the dynamics of species diversity, but also give a rigorous, within the framework of the initial assumptions, a quantitative assessment of the degree of presence of each plant species. The practical significance of two-sided estimation schemes based on the construction of equations for the upper and lower boundaries of the trajectories of the scatter of solutions depends on the conditions and measure of proportional correspondence of the intervals of scatter of the initial parameters with the intervals of scatter of solutions. For dynamic systems, the desired proportionality is not always ensured. The given examples demonstrate the acceptable accuracy of interval estimation of evolutionary processes. It is important to note that the constructions of the estimating equations generate vanishing intervals of scatter of solutions for quasi-constant temperature perturbations of the system. In other words, the trajectories of stationary temperature states of the vegetation cover are not roughened by the proposed interval estimation scheme. The rigor of the result of interval estimation of the species composition of the vegetation cover of formations can become a determining factor when choosing a method in the problems of analyzing the dynamics of species diversity and the plant potential of territorial systems of resource-ecological monitoring. The possibilities of the proposed approach are illustrated by geoinformation images of the computational analysis of the dynamics of the vegetation cover of the Yamal Peninsula and by the graphs of the retro-perspective analysis of the floristic variability of the formations of the landscapelithological group “Upper” based on the data of the summer temperature background of the Salehard weather station from 2010 to 1935. The developed indicators of floristic variability and the given graphs characterize the dynamics of species diversity, both on average and individually in the form of intervals of possible states for each species of plant.

  4. Drobotenko M.I., Nevecherya A.P.
    Forecasting the labor force dynamics in a multisectoral labor market
    Computer Research and Modeling, 2021, v. 13, no. 1, pp. 235-250

    The article considers the problem of forecasting the number of employed and unemployed persons in a multisectoral labor market using a balance mathematical model of labor force intersectoral dynamics.

    The balance mathematical model makes it possible to calculate the values of intersectoral dynamics indicators using only statistical data on sectoral employment and unemployment provided by the Federal State Statistics Service. Intersectoral dynamics indicators of labor force calculated for several years in a row are used to build trends for each of these indicators. The found trends are used to calculation of forecasted intersectoral dynamics indicators of labor force. The sectoral employment and unemployment of researched multisectoral labor market is forecasted based on values these forecasted indicators.

    The proposed approach was applied to forecast the employed persons in the economic sectors of the Russian Federation in 2011–2016. The following types of trends were used to describe changes of intersectoral dynamics indicators values: linear, non-linear, constant. The procedure for selecting trends is clearly demonstrated by the example of indicators that determine the labor force movements from the “Transport and communications” sector to the “Healthcare and social services” sector, as well as from the “Public administration and military security, social security” sector to the “Education” sector.

    Several approaches to forecasting was compared: a) naive forecast, within which the labor market indicators was forecasted only using a constant trend; b) forecasting based on a balance model using only a constant trend for all intersectoral dynamics indicators of labor force; c) forecasting directly by the number employed persons in economic sectors using the types of trends considered in the article; d) forecasting based on a balance model with the trends choice for each intersectoral dynamics indicators of labor force.

    The article shows that the use of a balance model provides a better forecast quality compared to forecasting directly by the number of employed persons. The use of trends in intersectoral dynamics indicators improves the quality of the forecast. The article also provides analysis examples of the multisectoral labor market in the Russian Federation. Using the balance model, the following information was obtained: the labor force flows distribution outgoing from concrete sectors by sectors of the economy; the sectoral structure of the labor force flows ingoing in concrete sectors. This information is not directly contained in the data provided by the Federal State Statistics Service.

  5. Abramov V.S., Petrov M.N.
    Application of the Dynamic Mode Decomposition in search of unstable modes in laminar-turbulent transition problem
    Computer Research and Modeling, 2023, v. 15, no. 4, pp. 1069-1090

    Laminar-turbulent transition is the subject of an active research related to improvement of economic efficiency of air vehicles, because in the turbulent boundary layer drag increases, which leads to higher fuel consumption. One of the directions of such research is the search for efficient methods, that can be used to find the position of the transition in space. Using this information about laminar-turbulent transition location when designing an aircraft, engineers can predict its performance and profitability at the initial stages of the project. Traditionally, $e^N$ method is applied to find the coordinates of a laminar-turbulent transition. It is a well known approach in industry. However, despite its widespread use, this method has a number of significant drawbacks, since it relies on parallel flow assumption, which limits the scenarios for its application, and also requires computationally expensive calculations in a wide range of frequencies and wave numbers. Alternatively, flow analysis can be done by using Dynamic Mode Decomposition, which allows one to analyze flow disturbances using flow data directly. Since Dynamic Mode Decomposition is a dimensionality reduction method, the number of computations can be dramatically reduced. Furthermore, usage of Dynamic Mode Decomposition expands the applicability of the whole method, due to the absence of assumptions about the parallel flow in its derivation.

    The presented study proposes an approach to finding the location of a laminar-turbulent transition using the Dynamic Mode Decomposition method. The essence of this approach is to divide the boundary layer region into sets of subregions, for each of which the transition point is independently calculated, using Dynamic Mode Decomposition for flow analysis, after which the results are averaged to produce the final result. This approach is validated by laminar-turbulent transition predictions of subsonic and supersonic flows over a 2D flat plate with zero pressure gradient. The results demonstrate the fundamental applicability and high accuracy of the described method in a wide range of conditions. The study focuses on comparison with the $e^N$ method and proves the advantages of the proposed approach. It is shown that usage of Dynamic Mode Decomposition leads to significantly faster execution due to less intensive computations, while the accuracy is comparable to the such of the solution obtained with the $e^N$ method. This indicates the prospects for using the described approach in a real world applications.

  6. Ilyasov D.V., Molchanov A.G., Glagolev M.V., Suvorov G.G., Sirin A.A.
    Modelling of carbon dioxide net ecosystem exchange of hayfield on drained peat soil: land use scenario analysis
    Computer Research and Modeling, 2020, v. 12, no. 6, pp. 1427-1449

    The data of episodic field measurements of carbon dioxide balance components (soil respiration — Rsoil, ecosystem respiration — Reco, net ecosystem exchange — NEE) of hayfields under use and abandoned one are interpreted by modelling. The field measurements were carried within five field campaigns in 2018 and 2019 on the drained part of the Dubna Peatland in Taldom District, Moscow Oblast, Russia. The territory is within humid continental climate zone. Peatland drainage was done out for milled peat extraction. After extraction was stopped, the residual peat deposit (1–1.5 m) was ploughed and grassed (Poa pratensis L.) for hay production. The current ground water level (GWL) varies from 0.3–0.5 m below the surface during wet and up to 1.0 m during dry periods. Daily dynamics of CO2 fluxes was measured using dynamic chamber method in 2018 (August) and 2019 (May, June, August) for abandoned ditch spacing only with sanitary mowing once in 5 years and the ditch spacing with annual mowing. NEE and Reco were measured on the sites with original vegetation, and Rsoil — after vegetation removal. To model a seasonal dynamics of NEE, the dependence of its components (Reco, Rsoil, and Gross ecosystematmosphere exchange of carbon dioxide — GEE) from soil and air temperature, GWL, photosynthetically active radiation, underground and aboveground plant biomass were used. The parametrization of the models has been carried out considering the stability of coefficients estimated by the bootstrap method. R2 (α = 0.05) between simulated and measured Reco was 0.44 (p < 0.0003) on abandoned and 0.59 (p < 0.04) on under use hayfield, and GEE was 0.57 (p < 0.0002) and 0.77 (p < 0.00001), respectively. Numerical experiments were carried out to assess the influence of different haymaking regime on NEE. It was found that NEE for the season (May 15 – September 30) did not differ much between the hayfield without mowing (4.5±1.0 tC·ha–1·season–1) and the abandoned one (6.2±1.4). Single mowing during the season leads to increase of NEE up to 6.5±0.9, and double mowing — up to 7.5±1.4 tC·ha–1·season–1. This means increase of carbon losses and CO2 emission into the atmosphere. Carbon loss on hayfield for both single and double mowing scenario was comparable with abandoned hayfield. The value of removed phytomass for single and double mowing was 0.8±0.1 tC·ha–1·season–1 and 1.4±0.1 (45% carbon content in dry phytomass) or 3.0 and 4.4 t·ha–1·season–1 of hay (17% moisture content). In comparison with the fallow, the removal of biomass of 0.8±0.1 at single and 1.4±0.1 tC·ha–1·season–1 double mowing is accompanied by an increase in carbon loss due to CO2 emissions, i.e., the growth of NEE by 0.3±0.1 and 1.3±0.6 tC·ha–1·season–1, respectively. This corresponds to the growth of NEE for each ton of withdrawn phytomass per hectare of 0.4±0.2 tС·ha–1·season–1 at single mowing, and 0.9±0.7 tС·ha–1·season–1 at double mowing. Therefore, single mowing is more justified in terms of carbon loss than double mowing. Extensive mowing does not increase CO2 emissions into the atmosphere and allows, in addition, to “replace” part of the carbon loss by agricultural production.

  7. Andreeva A.A., Anand M., Lobanov A.I., Nikolaev A.V., Panteleev M.A.
    Using extended ODE systems to investigate the mathematical model of the blood coagulation
    Computer Research and Modeling, 2022, v. 14, no. 4, pp. 931-951

    Many properties of ordinary differential equations systems solutions are determined by the properties of the equations in variations. An ODE system, which includes both the original nonlinear system and the equations in variations, will be called an extended system further. When studying the properties of the Cauchy problem for the systems of ordinary differential equations, the transition to extended systems allows one to study many subtle properties of solutions. For example, the transition to the extended system allows one to increase the order of approximation for numerical methods, gives the approaches to constructing a sensitivity function without using numerical differentiation procedures, allows to use methods of increased convergence order for the inverse problem solution. Authors used the Broyden method belonging to the class of quasi-Newtonian methods. The Rosenbroke method with complex coefficients was used to solve the stiff systems of the ordinary differential equations. In our case, it is equivalent to the second order approximation method for the extended system.

    As an example of the proposed approach, several related mathematical models of the blood coagulation process were considered. Based on the analysis of the numerical calculations results, the conclusion was drawn that it is necessary to include a description of the factor XI positive feedback loop in the model equations system. Estimates of some reaction constants based on the numerical inverse problem solution were given.

    Effect of factor V release on platelet activation was considered. The modification of the mathematical model allowed to achieve quantitative correspondence in the dynamics of the thrombin production with experimental data for an artificial system. Based on the sensitivity analysis, the hypothesis tested that there is no influence of the lipid membrane composition (the number of sites for various factors of the clotting system, except for thrombin sites) on the dynamics of the process.

  8. Moiseev N.A., Nazarova D.I., Semina N.S., Maksimov D.A.
    Changepoint detection on financial data using deep learning approach
    Computer Research and Modeling, 2024, v. 16, no. 2, pp. 555-575

    The purpose of this study is to develop a methodology for change points detection in time series, including financial data. The theoretical basis of the study is based on the pieces of research devoted to the analysis of structural changes in financial markets, description of the proposed algorithms for detecting change points and peculiarities of building classical and deep machine learning models for solving this type of problems. The development of such tools is of interest to investors and other stakeholders, providing them with additional approaches to the effective analysis of financial markets and interpretation of available data.

    To address the research objective, a neural network was trained. In the course of the study several ways of training sample formation were considered, differing in the nature of statistical parameters. In order to improve the quality of training and obtain more accurate results, a methodology for feature generation was developed for the formation of features that serve as input data for the neural network. These features, in turn, were derived from an analysis of mathematical expectations and standard deviations of time series data over specific intervals. The potential for combining these features to achieve more stable results is also under investigation.

    The results of model experiments were analyzed to compare the effectiveness of the proposed model with other existing changepoint detection algorithms that have gained widespread usage in practical applications. A specially generated dataset, developed using proprietary methods, was utilized as both training and testing data. Furthermore, the model, trained on various features, was tested on daily data from the S&P 500 index to assess its effectiveness in a real financial context.

    As the principles of the model’s operation are described, possibilities for its further improvement are considered, including the modernization of the proposed model’s structure, optimization of training data generation, and feature formation. Additionally, the authors are tasked with advancing existing concepts for real-time changepoint detection.

  9. Guskov V.P., Gushchanskiy D.E., Kulabukhova N.V., Abrahamyan S.A., Balyan S.G., Degtyarev A.B., Bogdanov A.V.
    An interactive tool for developing distributed telemedicine systems
    Computer Research and Modeling, 2015, v. 7, no. 3, pp. 521-527

    Getting a qualified medical examination can be difficult for people in remote areas because medical staff available can either be inaccessible or it might lack expert knowledge at proper level. Telemedicine technologies can help in such situations. On one hand, such technologies allow highly qualified doctors to consult remotely, thereby increasing the quality of diagnosis and plan treatment. On the other hand, computer-aided analysis of the research results, anamnesis and information on similar cases assist medical staff in their routine activities and decision-making.

    Creating telemedicine system for a particular domain is a laborious process. It’s not sufficient to pick proper medical experts and to fill the knowledge base of the analytical module. It’s also necessary to organize the entire infrastructure of the system to meet the requirements in terms of reliability, fault tolerance, protection of personal data and so on. Tools with reusable infrastructure elements, which are common to such systems, are able to decrease the amount of work needed for the development of telemedicine systems.

    An interactive tool for creating distributed telemedicine systems is described in the article. A list of requirements for the systems is presented; structural solutions for meeting the requirements are suggested. A composition of such elements applicable for distributed systems is described in the article. A cardiac telemedicine system is described as a foundation of the tool

    Views (last year): 3. Citations: 4 (RSCI).
  10. Kamenev G.K., Kamenev I.G.
    Multicriterial metric data analysis in human capital modelling
    Computer Research and Modeling, 2020, v. 12, no. 5, pp. 1223-1245

    The article describes a model of a human in the informational economy and demonstrates the multicriteria optimizational approach to the metric analysis of model-generated data. The traditional approach using the identification and study involves the model’s identification by time series and its further prediction. However, this is not possible when some variables are not explicitly observed and only some typical borders or population features are known, which is often the case in the social sciences, making some models pure theoretical. To avoid this problem, we propose a method of metric data analysis (MMDA) for identification and study of such models, based on the construction and analysis of the Kolmogorov – Shannon metric nets of the general population in a multidimensional space of social characteristics. Using this method, the coefficients of the model are identified and the features of its phase trajectories are studied. In this paper, we are describing human according to his role in information processing, considering his awareness and cognitive abilities. We construct two lifetime indices of human capital: creative individual (generalizing cognitive abilities) and productive (generalizing the amount of information mastered by a person) and formulate the problem of their multi-criteria (two-criteria) optimization taking into account life expectancy. This approach allows us to identify and economically justify the new requirements for the education system and the information environment of human existence. It is shown that the Pareto-frontier exists in the optimization problem, and its type depends on the mortality rates: at high life expectancy there is one dominant solution, while for lower life expectancy there are different types of Paretofrontier. In particular, the Pareto-principle applies to Russia: a significant increase in the creative human capital of an individual (summarizing his cognitive abilities) is possible due to a small decrease in the creative human capital (summarizing awareness). It is shown that the increase in life expectancy makes competence approach (focused on the development of cognitive abilities) being optimal, while for low life expectancy the knowledge approach is preferable.

Pages: « first previous next

Indexed in Scopus

Full-text version of the journal is also available on the web site of the scientific electronic library eLIBRARY.RU

The journal is included in the Russian Science Citation Index

The journal is included in the RSCI

International Interdisciplinary Conference "Mathematics. Computing. Education"