Derivative-Free Methods for Policy Optimization: Guarantees for Linear Quadratic Systems

Published on: 2020

We study derivative-free methods for policy optimization over the class of linear policies. We focus on characterizing the convergence rate of these methods when applied to linear-quadratic systems, and study various settings of driving noise and reward feedback. Our main theoretical result provides an explicit bound on the sample or evaluation complexity: we show that these methods are guaranteed to converge to within any pre-specified tolerance of the optimal policy with a number of zero-order evaluations that is an explicit polynomial of the error tolerance, dimension, and curvature properties of the problem. Our analysis reveals some interesting differences between the settings of additive driving noise and random initialization, as well as the settings of one-point and two-point reward feedback. Our theory is corroborated by simulations of derivative-free methods in application to these systems. Along the way, we derive convergence rates for stochastic zero-order optimization algorithms when applied to a certain class of non-convex problems.
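As a rough illustration of the setting described in the abstract, the sketch below runs a two-point zero-order gradient method over linear policies u = -Kx on a small linear-quadratic system with random initialization. It is not taken from the paper: the system matrices, horizon, step size, smoothing radius, and the function names `rollout_cost`, `two_point_gradient`, and `optimize` are all assumptions chosen for illustration, and the estimator is the standard sphere-smoothing form rather than necessarily the authors' exact algorithm.

```python
import numpy as np


def rollout_cost(K, A, B, Q, R, x0, horizon):
    """Finite-horizon quadratic cost of the linear policy u = -K x from x0."""
    x, cost = x0, 0.0
    for _ in range(horizon):
        u = -K @ x
        cost += x @ Q @ x + u @ R @ u
        x = A @ x + B @ u
    return cost


def two_point_gradient(K, cost_fn, radius, rng):
    """Two-point zero-order estimate of the gradient of cost_fn at K.

    Only function evaluations are used; no derivatives are computed.
    """
    U = rng.standard_normal(K.shape)
    U /= np.linalg.norm(U)  # uniform direction on the unit Frobenius sphere
    diff = cost_fn(K + radius * U) - cost_fn(K - radius * U)
    return (K.size / (2.0 * radius)) * diff * U


def optimize(K0, A, B, Q, R, steps=2000, step_size=2e-4,
             radius=0.05, horizon=30, seed=0):
    """Derivative-free descent on K (all hyperparameters are illustrative)."""
    rng = np.random.default_rng(seed)
    K = K0.astype(float)  # astype returns a copy, so K0 is left untouched
    for _ in range(steps):
        # Random initialization: draw a fresh initial state per iteration and
        # reuse it for both evaluations (the two-point feedback setting).
        x0 = rng.standard_normal(A.shape[0])

        def cost_fn(M, x0=x0):
            return rollout_cost(M, A, B, Q, R, x0, horizon)

        K -= step_size * two_point_gradient(K, cost_fn, radius, rng)
    return K
```

For example, with the assumed system `A = 0.8 * np.eye(2)` and `B = Q = R = np.eye(2)`, calling `optimize(np.zeros((2, 2)), A, B, Q, R)` returns a gain matrix whose average rollout cost can be compared against that of the zero policy. A one-point variant would replace the difference of two evaluations with a single noisy evaluation, which the abstract treats as a separate feedback setting.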

GluonCV and GluonNLP: Deep Learning in Computer Vision and Natural Language Processing

Distributed Feature Screening via Componentwise Debiasing

Lower Bounds for Testing Graphical Models: Colorings and Antiferromagnetic Ising Models

A New Class of Time Dependent Latent Factor Models with Applications

Tensor Train Decomposition on TensorFlow (T3F)

Graph-Dependent Implicit Regularisation for Distributed Stochastic Subgradient Descent

Noise Accumulation in High Dimensional Classification and Total Signal Index

Latent Simplex Position Model: High Dimensional Multi-view Clustering with Uncertainty Quantification

Learning Linear Non-Gaussian Causal Models in the Presence of Latent Variables

Switching Regression Models and Causal Inference in the Presence of Discrete Latent Variables

Skill Rating for Multiplayer Games. Introducing Hypernode Graphs and their Spectral Theory

High-Dimensional Inference for Cluster-Based Graphical Models

Fast Rates for General Unbounded Loss Functions: From ERM to Generalized Bayes

Robust Asynchronous Stochastic Gradient-Push: Asymptotically Optimal and Network-Independent Performance for Strongly Convex Functions

Smoothed Nonparametric Derivative Estimation using Weighted Difference Quotients

WONDER: Weighted One-shot Distributed Ridge Regression in High Dimensions

The weight function in the subtree kernel is decisive

Estimation of a Low-rank Topic-Based Model for Information Cascades

(1 + epsilon)-class Classification: an Anomaly Detection Method for Highly Imbalanced or Incomplete Data Sets

High-dimensional Gaussian graphical models on network-linked data

Identifiability of Additive Noise Models Using Conditional Variances

Reliability estimation in a multicomponent stress-strength model for Burr XII distribution under progressive censoring

A Bayesian sparse finite mixture model for clustering data from a heterogeneous population

Bayesian modeling and prior sensitivity analysis for zero–one augmented beta regression models with an application to psychometric data

Adaptive two-treatment three-period crossover design for normal responses

Recent developments in complex and spatially correlated functional data

A note on the “L-logistic regression models: Prior sensitivity analysis, robustness to outliers and applications”

On estimating the location parameter of the selected exponential population under the LINEX loss function

Application of weighted and unordered majorization orders in comparisons of parallel systems with exponentiated generalized gamma components

Multivariate normal approximation of the maximum likelihood estimator via the delta method

Robust Bayesian model selection for heavy-tailed linear regression using finite mixtures

A joint mean-correlation modeling approach for longitudinal zero-inflated count data

Simple step-stress models with a cure fraction

Bayesian approach for the zero-modified Poisson–Lindley regression model

Option pricing with bivariate risk-neutral density via copula and heteroscedastic model: A Bayesian approach

Bayesian modelling of the abilities in dichotomous IRT models via regression with missing values in the covariates

The limiting distribution of the Gibbs sampler for the intrinsic conditional autoregressive model

Keeping the balance—Bridge sampling for marginal likelihood estimation in finite mixture, mixture of experts and Markov mixture models

Spatiotemporal point processes: regression, model specifications and future directions

A note on monotonicity of spatial epidemic models

Estimation of parameters in the $\operatorname{DDRCINAR}(p)$ model

A Jackson network under general regime

Density for solutions to stochastic differential equations with unbounded drift

Spatially adaptive Bayesian image reconstruction through locally-modulated Markov random field models

L-Logistic regression models: Prior sensitivity analysis, robustness to outliers and applications

Influence measures for the Waring regression model

A temporal perspective on the rate of convergence in first-passage percolation under a moment condition

Hierarchical modelling of power law processes for the analysis of repairable systems with different truncation times: An empirical Bayes approach

The Finish Line: Design Features

The Finish Line: Building Walls in the Land Down Under

Meeting Codes with Wall Assemblies

VIDEO: The Great Heights of the Building Arts

Is Gen Z’s Interest in the Trades Just a Dream?

Only 12 per cent of leading charities publicly recognise a trade union, analysis suggests

Companies' 'Green' Efforts Include Products’ Material Content

World Wide Security Goes Green!

Incident involving highwall collapse spurs MSHA safety alert

Conagra Brands Announces Sustainable Development Award Winners

CBC Flooring's Indelval is environmentally friendly

Ultrabond ECO 885 Premium Grade Polyolefin Backed Carpet Adhesive

Tuftex, Anderson partner for Color Coordinates selling system

ARDEX unveils MC Moisture Control Systems

Cooperativa Ceramica d'Imola North America Debuts 5 New Programs
