More generally, if the objective function is not a quadratic function, then many optimization methods use other methods to ensure that some subsequence of iterations converges to an optimal solution. The first and still popular method for ensuring convergence relies on line searches , which optimize a function along one dimension. A second and increasingly popular method for ensuring convergence uses trust regions . Both line searches and trust regions are used in modern methods of non-differentiable optimization . Usually a global optimizer is much slower than advanced local optimizers (such as BFGS ), so often an efficient global optimizer can be constructed by starting the local optimizer from different starting points.
Previous rigorous approaches for this problem rely on dynamic programming (DP) and, while sample efficient, have running time quadratic in the sample size. As our main contribution, we provide new sample near-linear time algorithms for the problem that – while not being minimax optimal – achieve a significantly better sample-time tradeoff on large datasets compared to the DP approach. Our experimental evaluation shows that, compared with the DP approach, our algorithms provide a convergence rate that is only off by a factor of 2 to 4, while achieving speedups of three orders of magnitude.
We will start this section by running experiments for one optimization technique at a time for various problems (all will be cross-entropy error function minimization for different neural network architectures with the problem of MNIST classification). The goal of these experiments is to find the best parameters’ set for given optimization technique – so for each of the algorithms and problems we will search through the parameter space – using a “brute force” approach (sometimes using “recommended default” values of parameters). We choose two basic evaluation metrics: error function value itself and test accuracy after 42 epochs (20000 iterations with mini batch size of 128). You should probably not stick too much to the test accuracy evaluation as non overfitting preventing routines were applied and so the results might suffer from it. Please consider test set accuracy as a sanity check rather than a mature proper evaluation metric.