Результаты поиска по 'gradient methods':
Найдено статей: 51
  1. Akindinov G.D., Matyukhin V.V., Krivorotko O.I.
    Numerical solving of an inverse problem of a hyperbolic heat equation with small parameter
    Computer Research and Modeling, 2023, v. 15, no. 2, pp. 245-258

    In this paper we describe an algorithm of numerical solving of an inverse problem on a hyperbolic heat equation with additional second time derivative with a small parameter. The problem in this case is finding an initial distribution with given final distribution. This algorithm allows finding a solution to the problem for any admissible given precision. Algorithm allows evading difficulties analogous to the case of heat equation with inverted time. Furthermore, it allows finding an optimal grid size by learning on a relatively big grid size and small amount of iterations of a gradient method and later extrapolates to the required grid size using Richardson’s method. This algorithm allows finding an adequate estimate of Lipschitz constant for the gradient of the target functional. Finally, this algorithm may easily be applied to the problems with similar structure, for example in solving equations for plasma, social processes and various biological problems. The theoretical novelty of the paper consists in the developing of an optimal procedure of finding of the required grid size using Richardson extrapolations for optimization problems with inexact gradient in ill-posed problems.

  2. Chernov I.A., Manicheva S.V.
    Adjoint grid parabolic quazilinear boundary-value problems
    Computer Research and Modeling, 2012, v. 4, no. 2, pp. 275-291

    In the paper we construct the adjoint problem for the explicit and implicit parabolic quazi-linear grid boundary-value problems with one spatial variable; the coefficients of the problems depend on the solution at the same time and earlier times. Dependence on the history of the solution is via the state vector; its evolution is described by the differential equation. Many models of diffusion mass transport are reduced to such boundary-value problems. Having solutions to the direct and adjoint problems, one can obtain the exact value of the gradient of a functional in the space of parameters the problem also depends on. We present solving algorithms, including the parallel one.

    Views (last year): 1.
  3. Demianov A.Y., Dinariev O.Y., Lisitsin D.A.
    Numerical simulation of frequency dependence of dielectric permittivity and electrical conductivity of saturated porous media
    Computer Research and Modeling, 2016, v. 8, no. 5, pp. 765-773

    This article represents numerical simulation technique for determining effective spectral electromagnetic properties (effective electrical conductivity and relative dielectric permittivity) of saturated porous media. Information about these properties is vastly applied during the interpretation of petrophysical exploration data of boreholes and studying of rock core samples. The main feature of the present paper consists in the fact, that it involves three-dimensional saturated digital rock models, which were constructed based on the combined data considering microscopic structure of the porous media and the information about capillary equilibrium of oil-water mixture in pores. Data considering microscopic structure of the model are obtained by means of X-ray microscopic tomography. Information about distributions of saturating fluids is based on hydrodynamic simulations with density functional technique. In order to determine electromagnetic properties of the numerical model time-domain Fourier transform of Maxwell equations is considered. In low frequency approximation the problem can be reduced to solving elliptic equation for the distribution of complex electric potential. Finite difference approximation is based on discretization of the model with homogeneous isotropic orthogonal grid. This discretization implies that each computational cell contains exclusively one medium: water, oil or rock. In order to obtain suitable numerical model the distributions of saturating components is segmented. Such kind of modification enables avoiding usage of heterogeneous grids and disregards influence on the results of simulations of the additional techniques, required in order to determine properties of cells, filled with mixture of media. Corresponding system of differential equations is solved by means of biconjugate gradient stabilized method with multigrid preconditioner. Based on the results of complex electric potential computations average values of electrical conductivity and relative dielectric permittivity is calculated. For the sake of simplicity, this paper considers exclusively simulations with no spectral dependence of conductivities and permittivities of model components. The results of numerical simulations of spectral dependence of effective characteristics of heterogeneously saturated porous media (electrical conductivity and relative dielectric permittivity) in broad range of frequencies and multiple water saturations are represented in figures and table. Efficiency of the presented approach for determining spectral electrical properties of saturated rocks is discussed in conclusion.

    Views (last year): 8.
  4. Gasnikov A.V., Kovalev D.A.
    A hypothesis about the rate of global convergence for optimal methods (Newton’s type) in smooth convex optimization
    Computer Research and Modeling, 2018, v. 10, no. 3, pp. 305-314

    In this paper we discuss lower bounds for convergence of convex optimization methods of high order and attainability of this bounds. We formulate a hypothesis that covers all the cases. It is noticeable that we provide this statement without a proof. Newton method is the most famous method that uses gradient and Hessian of optimized function. However, it converges locally even for strongly convex functions. Global convergence can be achieved with cubic regularization of Newton method [Nesterov, Polyak, 2006], whose iteration cost is comparable with iteration cost of Newton method and is equivalent to inversion of Hessian of optimized function. Yu.Nesterov proposed accelerated variant of Newton method with cubic regularization in 2008 [Nesterov, 2008]. R.Monteiro and B. Svaiter managed to improve global convergence of cubic regularized method in 2013 [Monteiro, Svaiter, 2013]. Y.Arjevani, O. Shamir and R. Shiff showed that convergence bound of Monteiro and Svaiter is optimal (cannot be improved by more than logarithmic factor with any second order method) in 2017 [Arjevani et al., 2017]. They also managed to find bounds for convex optimization methods of p-th order for $p ≥ 2$. However, they got bounds only for first and second order methods for strongly convex functions. In 2018 Yu.Nesterov proposed third order convex optimization methods with rate of convergence that is close to this lower bounds and with similar to Newton method cost of iteration [Nesterov, 2018]. Consequently, it was showed that high order methods can be practical. In this paper we formulate lower bounds for p-th order methods for $p ≥ 3$ for strongly convex unconstrained optimization problems. This paper can be viewed as a little survey of state of the art of high order optimization methods.

    Views (last year): 21. Citations: 1 (RSCI).
  5. Ivanova A.S., Omelchenko S.S., Kotliarova E.V., Matyukhin V.V.
    Calibration of model parameters for calculating correspondence matrix for Moscow
    Computer Research and Modeling, 2020, v. 12, no. 5, pp. 961-978

    In this paper, we consider the problem of restoring the correspondence matrix based on the observations of real correspondences in Moscow. Following the conventional approach [Gasnikov et al., 2013], the transport network is considered as a directed graph whose edges correspond to road sections and the graph vertices correspond to areas that the traffic participants leave or enter. The number of city residents is considered constant. The problem of restoring the correspondence matrix is to calculate all the correspondence from the $i$ area to the $j$ area.

    To restore the matrix, we propose to use one of the most popular methods of calculating the correspondence matrix in urban studies — the entropy model. In our work, which is based on the work [Wilson, 1978], we describe the evolutionary justification of the entropy model and the main idea of the transition to solving the problem of entropy-linear programming (ELP) in calculating the correspondence matrix. To solve the ELP problem, it is proposed to pass to the dual problem. In this paper, we describe several numerical optimization methods for solving this problem: the Sinkhorn method and the Accelerated Sinkhorn method. We provide numerical experiments for the following variants of cost functions: a linear cost function and a superposition of the power and logarithmic cost functions. In these functions, the cost is a combination of average time and distance between areas, which depends on the parameters. The correspondence matrix is calculated for multiple sets of parameters and then we calculate the quality of the restored matrix relative to the known correspondence matrix.

    We assume that the noise in the restored correspondence matrix is Gaussian, as a result, we use the standard deviation as a quality metric. The article provides an overview of gradient-free optimization methods for solving non-convex problems. Since the number of parameters of the cost function is small, we use the grid search method to find the optimal parameters of the cost function. Thus, the correspondence matrix calculated for each set of parameters and then the quality of the restored matrix is evaluated relative to the known correspondence matrix. Further, according to the minimum residual value for each cost function, we determine for which cost function and at what parameter values the restored matrix best describes real correspondence.

  6. Gladin E.L., Zainullina K.E.
    Ellipsoid method for convex stochastic optimization in small dimension
    Computer Research and Modeling, 2021, v. 13, no. 6, pp. 1137-1147

    The article considers minimization of the expectation of convex function. Problems of this type often arise in machine learning and a variety of other applications. In practice, stochastic gradient descent (SGD) and similar procedures are usually used to solve such problems. We propose to use the ellipsoid method with mini-batching, which converges linearly and can be more efficient than SGD for a class of problems. This is verified by our experiments, which are publicly available. The algorithm does not require neither smoothness nor strong convexity of the objective to achieve linear convergence. Thus, its complexity does not depend on the conditional number of the problem. We prove that the method arrives at an approximate solution with given probability when using mini-batches of size proportional to the desired accuracy to the power −2. This enables efficient parallel execution of the algorithm, whereas possibilities for batch parallelization of SGD are rather limited. Despite fast convergence, ellipsoid method can result in a greater total number of calls to oracle than SGD, which works decently with small batches. Complexity is quadratic in dimension of the problem, hence the method is suitable for relatively small dimensionalities.

  7. Bazarova A.I., Beznosikov A.N., Gasnikov A.V.
    Linearly convergent gradient-free methods for minimization of parabolic approximation
    Computer Research and Modeling, 2022, v. 14, no. 2, pp. 239-255

    Finding the global minimum of a nonconvex function is one of the key and most difficult problems of the modern optimization. In this paper we consider special classes of nonconvex problems which have a clear and distinct global minimum.

    In the first part of the paper we consider two classes of «good» nonconvex functions, which can be bounded below and above by a parabolic function. This class of problems has not been widely studied in the literature, although it is rather interesting from an applied point of view. Moreover, for such problems first-order and higher-order methods may be completely ineffective in finding a global minimum. This is due to the fact that the function may oscillate heavily or may be very noisy. Therefore, our new methods use only zero-order information and are based on grid search. The size and fineness of this grid, and hence the guarantee of convergence speed and oracle complexity, depend on the «goodness» of the problem. In particular, we show that if the function is bounded by fairly close parabolic functions, then the complexity is independent of the dimension of the problem. We show that our new methods converge with a linear convergence rate $\log(1/\varepsilon)$ to a global minimum on the cube.

    In the second part of the paper, we consider the nonconvex optimization problem from a different angle. We assume that the target minimizing function is the sum of the convex quadratic problem and a nonconvex «noise» function proportional to the distance to the global solution. Considering functions with such noise assumptions for zero-order methods is new in the literature. For such a problem, we use the classical gradient-free approach with gradient approximation through finite differences. We show how the convergence analysis for our problems can be reduced to the standard analysis for convex optimization problems. In particular, we achieve a linear convergence rate for such problems as well.

    Experimental results confirm the efficiency and practical applicability of all the obtained methods.

  8. Vostrikov D.D., Konin G.O., Lobanov A.V., Matyukhin V.V.
    Influence of the mantissa finiteness on the accuracy of gradient-free optimization methods
    Computer Research and Modeling, 2023, v. 15, no. 2, pp. 259-280

    Gradient-free optimization methods or zeroth-order methods are widely used in training neural networks, reinforcement learning, as well as in industrial tasks where only the values of a function at a point are available (working with non-analytical functions). In particular, the method of error back propagation in PyTorch works exactly on this principle. There is a well-known fact that computer calculations use heuristics of floating-point numbers, and because of this, the problem of finiteness of the mantissa arises.

    In this paper, firstly, we reviewed the most popular methods of gradient approximation: Finite forward/central difference (FFD/FCD), Forward/Central wise component (FWC/CWC), Forward/Central randomization on $l_2$ sphere (FSSG2/CFFG2); secondly, we described current theoretical representations of the noise introduced by the inaccuracy of calculating the function at a point: adversarial noise, random noise; thirdly, we conducted a series of experiments on frequently encountered classes of problems, such as quadratic problem, logistic regression, SVM, to try to determine whether the real nature of machine noise corresponds to the existing theory. It turned out that in reality (at least for those classes of problems that were considered in this paper), machine noise turned out to be something between adversarial noise and random, and therefore the current theory about the influence of the mantissa limb on the search for the optimum in gradient-free optimization problems requires some adjustment.

  9. Gladin E.L., Borodich E.D.
    Variance reduction for minimax problems with a small dimension of one of the variables
    Computer Research and Modeling, 2022, v. 14, no. 2, pp. 257-275

    The paper is devoted to convex-concave saddle point problems where the objective is a sum of a large number of functions. Such problems attract considerable attention of the mathematical community due to the variety of applications in machine learning, including adversarial learning, adversarial attacks and robust reinforcement learning, to name a few. The individual functions in the sum usually represent losses related to examples from a data set. Additionally, the formulation admits a possibly nonsmooth composite term. Such terms often reflect regularization in machine learning problems. We assume that the dimension of one of the variable groups is relatively small (about a hundred or less), and the other one is large. This case arises, for example, when one considers the dual formulation for a minimization problem with a moderate number of constraints. The proposed approach is based on using Vaidya’s cutting plane method to minimize with respect to the outer block of variables. This optimization algorithm is especially effective when the dimension of the problem is not very large. An inexact oracle for Vaidya’s method is calculated via an approximate solution of the inner maximization problem, which is solved by the accelerated variance reduced algorithm Katyusha. Thus, we leverage the structure of the problem to achieve fast convergence. Separate complexity bounds for gradients of different components with respect to different variables are obtained in the study. The proposed approach is imposing very mild assumptions about the objective. In particular, neither strong convexity nor smoothness is required with respect to the low-dimensional variable group. The number of steps of the proposed algorithm as well as the arithmetic complexity of each step explicitly depend on the dimensionality of the outer variable, hence the assumption that it is relatively small.

  10. Kutalev A.A., Lapina A.A.
    Modern ways to overcome neural networks catastrophic forgetting and empirical investigations on their structural issues
    Computer Research and Modeling, 2023, v. 15, no. 1, pp. 45-56

    This paper presents the results of experimental validation of some structural issues concerning the practical use of methods to overcome catastrophic forgetting of neural networks. A comparison of current effective methods like EWC (Elastic Weight Consolidation) and WVA (Weight Velocity Attenuation) is made and their advantages and disadvantages are considered. It is shown that EWC is better for tasks where full retention of learned skills is required on all the tasks in the training queue, while WVA is more suitable for sequential tasks with very limited computational resources, or when reuse of representations and acceleration of learning from task to task is required rather than exact retention of the skills. The attenuation of the WVA method must be applied to the optimization step, i. e. to the increments of neural network weights, rather than to the loss function gradient itself, and this is true for any gradient optimization method except the simplest stochastic gradient descent (SGD). The choice of the optimal weights attenuation function between the hyperbolic function and the exponent is considered. It is shown that hyperbolic attenuation is preferable because, despite comparable quality at optimal values of the hyperparameter of the WVA method, it is more robust to hyperparameter deviations from the optimal value (this hyperparameter in the WVA method provides a balance between preservation of old skills and learning a new skill). Empirical observations are presented that support the hypothesis that the optimal value of this hyperparameter does not depend on the number of tasks in the sequential learning queue. And, consequently, this hyperparameter can be picked up on a small number of tasks and used on longer sequences.

Pages: previous next last »

Indexed in Scopus

Full-text version of the journal is also available on the web site of the scientific electronic library eLIBRARY.RU

The journal is included in the Russian Science Citation Index

The journal is included in the RSCI

International Interdisciplinary Conference "Mathematics. Computing. Education"