Search results for 'reinforcement':
Articles found: 24
  1. Zhdanova O.L., Kolbina E.A., Frisman E.Y.
    Evolutionary effects of non-selective sustainable harvesting in a genetically heterogeneous population
    Computer Research and Modeling, 2025, v. 17, no. 4, pp. 717-735

    The problem of harvest optimization remains a central challenge in mathematical biology. The concept of Maximum Sustainable Yield (MSY), widely used in optimal exploitation theory, proposes maintaining target populations at levels ensuring maximum reproduction, theoretically balancing economic benefits with resource conservation. While MSY-based management promotes population stability and system resilience, it faces significant limitations due to complex intrapopulation structures and nonlinear dynamics in exploited species. Of particular concern are the evolutionary consequences of harvesting, as artificial selection may drive changes divergent from natural selection pressures. Empirical evidence confirms that selective harvesting alters behavioral traits, reduces offspring quality, and modifies population gene pools. In contrast, the genetic impacts of non-selective harvesting remain poorly understood and require further investigation.

    This study examines how non-selective harvesting with constant removal rates affects evolution in genetically heterogeneous populations. We model genetic diversity controlled by a single diallelic locus, where different genotypes dominate at high and low densities: r-strategists (high fecundity) versus K-strategists (resource-limited resilience). The classical discrete-time ecological-genetic model is considered. The model assumes that the fitness of each genotype depends linearly on the population size. By including the harvest withdrawal coefficient, the model links the problem of optimizing harvest with that of predicting genotype selection.

    Analytical results demonstrate that under MSY harvesting the equilibrium genetic composition remains unchanged while population size halves. The type of genetic equilibrium may shift, as optimal harvest rates differ between equilibria. Natural K-strategist dominance may reverse toward r-strategists, whose high reproduction compensates for harvest losses. Critical harvesting thresholds triggering strategy shifts were identified.
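
    The halving of population size under MSY can be illustrated with a minimal sketch. It is not the authors' model: the fitness parametrization w_g(N) = 1 + r_g (1 - N / k_g), the harvest-after-reproduction order, and the names `step`, `simulate` and `msy_rate` are assumptions made for illustration. Under these assumptions the equilibrium yield of a monomorphic population is N r (1 - N / k), which is maximal at N = k / 2 and corresponds to the harvest fraction u = r / (2 + r).

```python
GENOTYPES = ("AA", "Aa", "aa")


def step(n, q, u, r, k):
    """Advance population size n and allele-A frequency q by one generation."""
    # Assumed linear density dependence: w_g(N) = 1 + r_g * (1 - N / k_g).
    w = {g: 1.0 + r[g] * (1.0 - n / k[g]) for g in GENOTYPES}
    # Marginal fitnesses of the two alleles under random mating.
    w_allele_a = q * w["AA"] + (1.0 - q) * w["Aa"]
    w_allele_b = q * w["Aa"] + (1.0 - q) * w["aa"]
    w_mean = q * w_allele_a + (1.0 - q) * w_allele_b
    q_next = q * w_allele_a / w_mean
    # Non-selective harvest: remove a fraction u regardless of genotype.
    n_next = (1.0 - u) * w_mean * n
    return n_next, q_next


def simulate(n0, q0, u, r, k, generations=500):
    """Iterate the map and return the final (n, q)."""
    n, q = n0, q0
    for _ in range(generations):
        n, q = step(n, q, u, r, k)
    return n, q


def msy_rate(r_g):
    """Yield-maximising harvest fraction for a monomorphic population
    under the assumed fitness form."""
    return r_g / (2.0 + r_g)


# Genetically uniform case: all genotypes share r = 1 and k = 100.
r = dict.fromkeys(GENOTYPES, 1.0)
k = dict.fromkeys(GENOTYPES, 100.0)
n_free, q_free = simulate(10.0, 0.5, 0.0, r, k)
n_msy, q_msy = simulate(10.0, 0.5, msy_rate(1.0), r, k)
print(n_free, n_msy, q_free, q_msy)
```

    In this uniform case the unharvested population settles near k, the harvested one near k / 2, and the allele frequency stays where it started. Giving the genotypes different r and k values would be the way to explore the shifts between r- and K-strategists that the abstract describes.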

    These findings explain why exploited populations show slow recovery after harvesting cessation: exploitation reinforces adaptations beneficial under removal pressure but maladaptive in natural conditions. For instance, captive arctic foxes select for high-productivity genotypes, whereas wild populations favor lower-fecundity/higher-survival phenotypes. This underscores the necessity of incorporating genetic dynamics into sustainable harvesting management strategies, as MSY policies may inadvertently alter evolutionary trajectories through density-dependent selection processes. Recovery periods must account for genetic adaptation timescales in management frameworks.

  2. Chen J., Lobanov A.V., Rogozin A.V.
    Nonsmooth Distributed Min-Max Optimization Using the Smoothing Technique
    Computer Research and Modeling, 2023, v. 15, no. 2, pp. 469-480

    Distributed saddle point problems (SPPs) have numerous applications in optimization, matrix games and machine learning. For example, the training of generative adversarial networks is represented as a min-max optimization problem, and training regularized linear models can be reformulated as an SPP as well. This paper studies distributed nonsmooth SPPs with Lipschitz-continuous objective functions. The objective function is represented as a sum of several components that are distributed between groups of computational nodes. The nodes, or agents, exchange information through a communication network that may be centralized or decentralized. A centralized network has a universal information aggregator (a server, or master node) that communicates directly with each of the agents and can therefore coordinate the optimization process. In a decentralized network, all the nodes are equal, there is no server node, and each agent communicates only with its immediate neighbors.

    We assume that each of the nodes locally holds its objective and can compute its value at given points, i.e., has access to a zero-order oracle. Zero-order information is used when the gradient of the function is costly or impossible to compute, or when the function is not differentiable. For example, in reinforcement learning one needs to generate a trajectory to evaluate the current policy. This policy evaluation process can be interpreted as the computation of the function value. We propose an approach that uses a smoothing technique, i.e., applies a first-order method to the smoothed version of the initial function. It can be shown that the stochastic gradient of the smoothed function can be viewed as a random two-point gradient approximation of the initial function. Smoothing approaches have been studied for distributed zero-order minimization, and our paper generalizes the smoothing technique to SPPs.
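
    The two-point gradient approximation mentioned above can be sketched as follows. This is a generic single-node estimator using a random direction on the unit sphere, not the distributed scheme from the paper; the names `two_point_gradient` and `average_estimate` and the smoothing radius `tau` are assumptions made for illustration. In a saddle point setting one would presumably apply such an estimate to each block of variables, descending in one and ascending in the other.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Sample a direction uniformly on the unit Euclidean sphere."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def two_point_gradient(f, x, tau, rng):
    """One random two-point estimate using only two values of f."""
    d = len(x)
    e = random_unit_vector(d, rng)
    x_plus = [xi + tau * ei for xi, ei in zip(x, e)]
    x_minus = [xi - tau * ei for xi, ei in zip(x, e)]
    # d / (2 tau) * (f(x + tau e) - f(x - tau e)) * e
    scale = d * (f(x_plus) - f(x_minus)) / (2.0 * tau)
    return [scale * ei for ei in e]


def average_estimate(f, x, tau, samples, seed=0):
    """Average many independent estimates to reduce the variance."""
    rng = random.Random(seed)
    total = [0.0] * len(x)
    for _ in range(samples):
        g = two_point_gradient(f, x, tau, rng)
        total = [t + gi for t, gi in zip(total, g)]
    return [t / samples for t in total]


# Sanity check on a linear function, whose gradient is the coefficient vector.
coeffs = [1.0, -2.0, 0.5]


def linear(x):
    return sum(c * xi for c, xi in zip(coeffs, x))


estimate = average_estimate(linear, [0.0, 0.0, 0.0], tau=0.1, samples=20000)
print(estimate)
```

    A single estimate is noisy, so the check averages many of them. For a nonsmooth Lipschitz function the average approximates the gradient of the smoothed function rather than of the function itself.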

  3. Chuvilin K.V.
    The use of syntax trees in order to automate the correction of LaTeX documents
    Computer Research and Modeling, 2012, v. 4, no. 4, pp. 871-883

    The problem is to automate the correction of LaTeX documents. Each document is represented as a parse tree. The modified Zhang-Shasha algorithm is used to construct a mapping from the tree vertices of the original document to the tree vertices of the edited document that corresponds to the minimum editing distance. Vertex-to-vertex mappings form the training set, which is used to generate rules for automatic correction. For each rule, statistics on its applicability to the edited documents are collected. They are used for quality assessment and improvement of the rules.
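
    The minimum editing distance between two ordered labelled trees can be illustrated with a short sketch. It is not the modified Zhang-Shasha algorithm from the paper: it is a plain memoized recursion over forests that computes the same ordered tree edit distance under unit costs, only less efficiently, and it returns the distance without the vertex mapping. The tuple-based tree encoding and the names `node`, `forest_distance` and `tree_edit_distance` are assumptions made for illustration.

```python
from functools import lru_cache


def node(label, *children):
    """Build an immutable tree as (label, tuple_of_children)."""
    return (label, tuple(children))


@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Edit distance between two ordered forests (tuples of trees), unit costs."""
    if not f and not g:
        return 0
    if not f:
        # Insert the rightmost root of g; its children join the forest.
        _, kids = g[-1]
        return forest_distance(f, g[:-1] + kids) + 1
    if not g:
        # Delete the rightmost root of f; its children join the forest.
        _, kids = f[-1]
        return forest_distance(f[:-1] + kids, g) + 1
    (label_v, kids_v), (label_w, kids_w) = f[-1], g[-1]
    return min(
        forest_distance(f[:-1] + kids_v, g) + 1,   # delete v
        forest_distance(f, g[:-1] + kids_w) + 1,   # insert w
        # Map v to w: match their subtrees and the remaining forests.
        forest_distance(f[:-1], g[:-1])
        + forest_distance(kids_v, kids_w)
        + (label_v != label_w),
    )


def tree_edit_distance(t1, t2):
    return forest_distance((t1,), (t2,))


# Moving the node "c" one level up costs one deletion plus one insertion.
original = node("f", node("d", node("a"), node("c", node("b"))), node("e"))
edited = node("f", node("c", node("d", node("a"), node("b"))), node("e"))
print(tree_edit_distance(original, edited))
```

    Recording which of the three branches attains the minimum at each step would recover a vertex-to-vertex mapping of the kind the abstract uses as training data.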

    Citations: 5 (RSCI).

  4. Vetrin R.L., Koberg K.
    Reinforcement learning in optimisation of financial market trading strategy parameters
    Computer Research and Modeling, 2024, v. 16, no. 7, pp. 1793-1812

    High-frequency algorithmic trading is a subclass of trading focused on gaining basis-point-level profitability on sub-second time frames. Such trading strategies do not depend on most of the factors relevant to longer-term trading and require a specific approach. There have been many attempts to apply machine learning techniques to both high- and low-frequency trading. However, machine learning still has limited application in real-world trading due to high exposure to overfitting, the need for rapid adaptation to new market regimes, and overall instability of the results. We conducted comprehensive research on combining known quantitative theory with reinforcement learning methods in order to derive a more effective and robust approach to constructing an automated trading system, in an attempt to support known algorithmic trading techniques. Using classical price behavior theories as well as modern application cases in sub-millisecond trading, we utilized reinforcement learning models to improve the quality of the algorithms. As a result, we derived a robust model that uses deep reinforcement learning to optimise the parameters of static market-making trading algorithms and is capable of online learning on live data. More specifically, we explored the system in the cryptocurrency derivatives market, which is mostly independent of external factors in the short term. Our research was implemented in a high-frequency environment, and the final models showed the capability to operate within accepted high-frequency trading time frames. We compared various combinations of deep reinforcement learning approaches and the classic algorithms and evaluated the robustness and effectiveness of the improvements for each combination.


Indexed in Scopus

Full-text version of the journal is also available on the web site of the scientific electronic library eLIBRARY.RU

The journal is included in the Russian Science Citation Index


International Interdisciplinary Conference "Mathematics. Computing. Education"