All issues
- 2026 Vol. 18
- 2025 Vol. 17
- 2024 Vol. 16
- 2023 Vol. 15
- 2022 Vol. 14
- 2021 Vol. 13
- 2020 Vol. 12
- 2019 Vol. 11
- 2018 Vol. 10
- 2017 Vol. 9
- 2016 Vol. 8
- 2015 Vol. 7
- 2014 Vol. 6
- 2013 Vol. 5
- 2012 Vol. 4
- 2011 Vol. 3
- 2010 Vol. 2
- 2009 Vol. 1
-
Quantum-inspired episode selection for Monte Carlo reinforcement learning via QUBO optimization
Computer Research and Modeling, 2026, v. 18, no. 2, pp. 273-288Monte Carlo (MC) reinforcement learning suffers from high sample complexity, especially in environments with sparse rewards, large state spaces, and strongly correlated trajectories that reduce the statistical efficiency of return estimation. These well-known limitations often lead to slow convergence and unstable learning dynamics, particularly in settings where only a small fraction of collected trajectories is actually informative for policy improvement. A key challenge is therefore to identify a compact yet diverse subset of episodes that contributes most to the accuracy of value estimates while preserving sufficient exploration of the environment. To address this challenge, we reformulate episode selection as a Quadratic Unconstrained Binary Optimization (QUBO) problem and solve it using quantum-inspired sampling techniques. Our method, MC+ QUBO, inserts a combinatorial filtering step into the standard MC policy-evaluation pipeline: given a batch of trajectories, it selects a subset that maximizes cumulative reward and encourages broad state-space coverage. This selection procedure is expressed as a QUBO model, where linear terms favor high-return episodes, quadratic terms penalize redundancy between trajectories, and additional coupling terms can be used to enforce coverage-related constraints or promote structural diversity. Within this framework, we investigate two black-box QUBO solvers: Simulated Quantum Annealing (SQA), which emulates tunneling-based exploration of the search landscape, and Simulated Bifurcation (SB), a dynamical-systems-based iterative optimization method. Both solvers demonstrate the ability to efficiently navigate the combinatorial structure of the trajectory-selection problem and to handle batch sizes that are otherwise computationally expensive for exhaustive or deterministic search. Experiments in a finite-horizon GridWorld environment show that MC+QUBO consistently outperforms vanilla MC in convergence speed, stability of return estimates, and final policy quality. These results highlight the promise of quantum-inspired optimization as a practical decision-making subroutine within reinforcement-learning algorithms, offering a scalable way to improve sample efficiency without modifying the underlying learning paradigm.
-
Augmented data routing algorithms for satellite delay-tolerant networks. Development and validation
Computer Research and Modeling, 2022, v. 14, no. 4, pp. 983-993The problem of centralized planning for data transmission routes in delay tolerant networks is considered. The original problem is extended with additional requirements to nodes storage and communication process. First, it is assumed that the connection between the nodes of the graph is established using antennas. Second, it is assumed that each node has a storage of finite capacity. The existing works do not consider these requirements. It is assumed that we have in advance information about messages to be processed, information about the network configuration at specified time points taken with a certain time periods, information on time delays for the orientation of the antennas for data transmission and restrictions on the amount of data storage on each satellite of the grouping. Two wellknown algorithms — CGR and Earliest Delivery with All Queues are improved to satisfy the extended requirements. The obtained algorithms solve the optimal message routing problem separately for each message. The problem of validation of the algorithms under conditions of lack of test data is considered as well. Possible approaches to the validation based on qualitative conjectures are proposed and tested, and experiment results are described. A performance comparison of the two implementations of the problem solving algorithms is made. Two algorithms named RDTNAS-CG and RDTNAS-AQ have been developed based on the CGR and Earliest Delivery with All Queues algorithms, respectively. The original algorithms have been significantly expanded and an augmented implementation has been developed. Validation experiments were carried to check the minimum «quality» requirements for the correctness of the algorithms. Comparative analysis of the performance of the two algorithms showed that the RDTNAS-AQ algorithm is several orders of magnitude faster than RDTNAS-CG.
Indexed in Scopus
Full-text version of the journal is also available on the web site of the scientific electronic library eLIBRARY.RU
The journal is included in the Russian Science Citation Index
The journal is included in the RSCI
International Interdisciplinary Conference "Mathematics. Computing. Education"




