All issues
- 2024 Vol. 16
- 2023 Vol. 15
- 2022 Vol. 14
- 2021 Vol. 13
- 2020 Vol. 12
- 2019 Vol. 11
- 2018 Vol. 10
- 2017 Vol. 9
- 2016 Vol. 8
- 2015 Vol. 7
- 2014 Vol. 6
- 2013 Vol. 5
- 2012 Vol. 4
- 2011 Vol. 3
- 2010 Vol. 2
- 2009 Vol. 1
-
Survey of convex optimization of Markov decision processes
Computer Research and Modeling, 2023, v. 15, no. 2, pp. 329-353This article reviews both historical achievements and modern results in the field of Markov Decision Process (MDP) and convex optimization. This review is the first attempt to cover the field of reinforcement learning in Russian in the context of convex optimization. The fundamental Bellman equation and the criteria of optimality of policy — strategies based on it, which make decisions based on the known state of the environment at the moment, are considered. The main iterative algorithms of policy optimization based on the solution of the Bellman equations are also considered. An important section of this article was the consideration of an alternative to the $Q$-learning approach — the method of direct maximization of the agent’s average reward for the chosen strategy from interaction with the environment. Thus, the solution of this convex optimization problem can be represented as a linear programming problem. The paper demonstrates how the convex optimization apparatus is used to solve the problem of Reinforcement Learning (RL). In particular, it is shown how the concept of strong duality allows us to naturally modify the formulation of the RL problem, showing the equivalence between maximizing the agent’s reward and finding his optimal strategy. The paper also discusses the complexity of MDP optimization with respect to the number of state–action–reward triples obtained as a result of interaction with the environment. The optimal limits of the MDP solution complexity are presented in the case of an ergodic process with an infinite horizon, as well as in the case of a non-stationary process with a finite horizon, which can be restarted several times in a row or immediately run in parallel in several threads. The review also reviews the latest results on reducing the gap between the lower and upper estimates of the complexity of MDP optimization with average remuneration (Averaged MDP, AMDP). In conclusion, the real-valued parametrization of agent policy and a class of gradient optimization methods through maximizing the $Q$-function of value are considered. In particular, a special class of MDPs with restrictions on the value of policy (Constrained Markov Decision Process, CMDP) is presented, for which a general direct-dual approach to optimization with strong duality is proposed.
-
Proof of the connection between the Backman model with degenerate cost functions and the model of stable dynamics
Computer Research and Modeling, 2022, v. 14, no. 2, pp. 335-342Since 1950s the field of city transport modelling has progressed rapidly. The first equilibrium distribution models of traffic flow appeared. The most popular model (which is still being widely used) was the Beckmann model, based on the two Wardrop principles. The core of the model could be briefly described as the search for the Nash equilibrium in a population demand game, in which losses of agents (drivers) are calculated based on the chosen path and demands of this path with correspondences being fixed. The demands (costs) of a path are calculated as the sum of the demands of different path segments (graph edges), that are included in the path. The costs of an edge (edge travel time) are determined by the amount of traffic on this edge (more traffic means larger travel time). The flow on a graph edge is determined by the sum of flows over all paths passing through the given edge. Thus, the cost of traveling along a path is determined not only by the choice of the path, but also by the paths other drivers have chosen. Thus, it is a standard game theory task. The way cost functions are constructed allows us to narrow the search for equilibrium to solving an optimization problem (game is potential in this case). If the cost functions are monotone and non-decreasing, the optimization problem is convex. Actually, different assumptions about the cost functions form different models. The most popular model is based on the BPR cost function. Such functions are massively used in calculations of real cities. However, in the beginning of the XXI century, Yu. E. Nesterov and A. de Palma showed that Beckmann-type models have serious weak points. Those could be fixed using the stable dynamics model, as it was called by the authors. The search for equilibrium here could be also reduced to an optimization problem, moreover, the problem of linear programming. In 2013, A.V.Gasnikov discovered that the stable dynamics model can be obtained by a passage to the limit in the Beckmann model. However, it was made only for several practically important, but still special cases. Generally, the question if this passage to the limit is possible remains open. In this paper, we provide the justification of the possibility of the above-mentioned passage to the limit in the general case, when the cost function for traveling along the edge as a function of the flow along the edge degenerates into a function equal to fixed costs until the capacity is reached and it is equal to plus infinity when the capacity is exceeded.
-
Linear and nonlinear optimization models of multiple covering of a bounded plane domain with circles
Computer Research and Modeling, 2019, v. 11, no. 6, pp. 1101-1110Problems of multiple covering ($k$-covering) of a bounded set $G$ with equal circles of a given radius are well known. They are thoroughly studied under the assumption that $G$ is a finite set. There are several papers concerned with studying this problem in the case where $G$ is a connected set. In this paper, we study the problem of minimizing the number of circles that form a $k$-covering, $k \geqslant 1$, provided that $G$ is a bounded convex plane domain.
For the above-mentioned problem, we state a 0-1 linear model, a general integer linear model, and a nonlinear model, imposing a constraint on the minimum distance between the centers of covering circles. The latter constraint is due to the fact that in practice one can place at most one device at each point. We establish necessary and sufficient solvability conditions for the linear models and describe one (easily realizable) variant of these conditions in the case where the covered set $G$ is a rectangle.
We propose some methods for finding an approximate number of circles of a given radius that provide the desired $k$-covering of the set $G$, both with and without constraints on distances between the circles’ centers. We treat the calculated values as approximate upper bounds for the number of circles. We also propose a technique that allows one to get approximate lower bounds for the number of circles that is necessary for providing a $k$-covering of the set $G$. In the general linear model, as distinct from the 0-1 linear model, we require no additional constraint. The difference between the upper and lower bounds for the number of circles characterizes the quality (acceptability) of the constructed $k$-covering.
We state a nonlinear mathematical model for the $k$-covering problem with the above-mentioned constraints imposed on distances between the centers of covering circles. For this model, we propose an algorithm which (in certain cases) allows one to find more exact solutions to covering problems than those calculated from linear models.
For implementing the proposed approach, we have developed computer programs and performed numerical experiments. Results of numerical experiments demonstrate the effectiveness of the method.
-
On the using the differential schemes to transport equation with drain in grid modeling
Computer Research and Modeling, 2020, v. 12, no. 5, pp. 1149-1164Modern power transportation systems are the complex engineering systems. Such systems include both point facilities (power producers, consumers, transformer substations, etc.) and the distributed elements (f.e. power lines). Such structures are presented in the form of the graphs with different types of nodes under creating the mathematical models. It is necessary to solve the system of partial differential equations of the hyperbolic type to study the dynamic effects in such systems.
An approach similar to one already applied in modeling similar problems earlier used in the work. New variant of the splitting method was used proposed by the authors. Unlike most known works, the splitting is not carried out according to physical processes (energy transport without dissipation, separately dissipative processes). We used splitting to the transport equations with the drain and the exchange between Reimann’s invariants. This splitting makes possible to construct the hybrid schemes for Riemann invariants with a high order of approximation and minimal dissipation error. An example of constructing such a hybrid differential scheme is described for a single-phase power line. The difference scheme proposed is based on the analysis of the properties of the schemes in the space of insufficient coefficients.
Examples of the model problem numerical solutions using the proposed splitting and the difference scheme are given. The results of the numerical calculations shows that the difference scheme allows to reproduce the arising regions of large gradients. It is shown that the difference schemes also allow detecting resonances in such the systems.
Indexed in Scopus
Full-text version of the journal is also available on the web site of the scientific electronic library eLIBRARY.RU
The journal is included in the Russian Science Citation Index
The journal is included in the RSCI
International Interdisciplinary Conference "Mathematics. Computing. Education"