
Clean sweep: Oregon's Sabrina Ionescu is unanimous Player of the Year after winning Wooden Award

Sabrina Ionescu wins the Wooden Award for the second year in a row, becoming the fifth player in the trophy's history to win in back-to-back seasons. With the honor, she completes a sweep of the national postseason player of the year awards. As a senior, Ionescu matched her own single-season mark with eight triple-doubles in 2019-20, and she was highly efficient from the field with a career-best 51.8 field goal percentage.





Oregon's Sabrina Ionescu, Ruthy Hebard, Satou Sabally share meaning of Naismith Starting 5 honor

Pac-12 Networks' Ashley Adamson speaks with Oregon stars Sabrina Ionescu, Ruthy Hebard and Satou Sabally to hear how special their recent Naismith Starting 5 honor was, with Ducks making up three of the nation's top five players. Ionescu (point guard), Sabally (small forward) and Hebard (power forward) led the Ducks to a 31-2 record in the 2019-20 season before it was cut short.





Sabrina Ionescu, Ruthy Hebard, Satou Sabally on staying connected, WNBA Draft, Oregon's historic season

Pac-12 Networks' Ashley Adamson catches up with Oregon's "Big 3" of Sabrina Ionescu, Ruthy Hebard and Satou Sabally to hear how they're adjusting to the new world without sports while still preparing for the WNBA Draft on April 17. They also share how they're staying hungry for basketball during the hiatus.





Aari McDonald on returning for her senior year at Arizona: 'We're ready to set the bar higher'

Arizona's Aari McDonald and Pac-12 Networks' Ashley Adamson discuss the guard's decision to return for her senior season in Tucson and how she now has the opportunity to be the face of the league. McDonald, the Pac-12 Defensive Player of the Year, was one of the nation's top scorers in 2019-20, averaging 20.6 points per game.





WNBA Draft Profile: Do-it-all OSU talent Mikayla Pivec has her sights set on a pro breakout

Oregon State guard Mikayla Pivec is the epitome of a versatile player. Her 1,030 career rebounds were the most in school history, and she finished just one assist shy of becoming the first in OSU history to tally 1,500 points, 1,000 rebounds and 500 assists. She'll head to the WNBA looking to showcase her talents at the next level following the 2020 WNBA Draft.





WNBA Draft Profile: UCLA guard Japreece Dean ready to lead at the next level

UCLA guard Japreece Dean is primed to shine at the next level as she heads to the WNBA Draft in April. The do-it-all point-woman was an All-Pac-12 honoree last season, and one of only seven D-1 hoopers with at least 13 points and 5.5 assists per game.





Ruthy Hebard, Sabrina Ionescu 'represent everything that is great about basketball'

Ruthy Hebard and Sabrina Ionescu have had a remarkable four years together in Eugene, rewriting the history books and pushing the Ducks into the national spotlight. Catch the debut of "Our Stories Unfinished Business: Sabrina Ionescu and Ruthy Hebard" on Wednesday, April 15 at 7 p.m. PT/ 8 p.m. MT on Pac-12 Network.





Dr. Michelle Tom shares journey from ASU women's hoops to treating COVID-19 patients

Pac-12 Networks' Ashley Adamson speaks with former Arizona State women's basketball player Michelle Tom, who is now a doctor treating COVID-19 patients at Winslow Indian Health Care Center and Little Colorado Medical Center in Eastern Arizona.





Baylor women sign transfer point guard for 3rd year in row

Baylor has signed a transfer point guard for the third year in a row, and this one can play multiple seasons with the Lady Bears. Jaden Owens is transferring from UCLA after signing a national letter of intent with Baylor, which had graduate transfers at point guard each of the past two seasons. The Texas native just completed her freshman season with the Bruins and has three seasons of eligibility remaining.





'A pioneer, a trailblazer' - Reaction to McGraw's retirement

Notre Dame coach Muffet McGraw retired after 33 seasons Wednesday. "What she did for me in those four years, I came in as a girl and left as a woman." - WNBA player Kayla McBride, who played for Notre Dame from 2010-14.





Oregon State's Aleah Goodman, Maddie Washington reflect on earning 2020 Pac-12 Sportsmanship Award

The Pac-12 Student-Athlete Advisory Committee voted to award the Oregon State women’s basketball team with the Pac-12 Sportsmanship Award for the 2019-20 season, honoring their character and sportsmanship before a rivalry game against Oregon in Jan. 2020 -- the day Kobe Bryant, his daughter, Gigi, and seven others passed away in a helicopter crash in Southern California. In the above video, Aleah Goodman and Madison Washington share how the teams came together as one in a circle of prayer before the game.





Natalie Chou breaks through stereotypes, inspires young Asian American girls on 'Our Stories' quick look

Watch the debut of "Our Stories - Natalie Chou" on Sunday, May 10 at 12:30 p.m. PT/ 1:30 p.m. MT on Pac-12 Network.





Stanford's Tara VanDerveer on Haley Jones' versatile freshman year: 'It was really incredible'

During Friday's "Pac-12 Perspective," Stanford head coach Tara VanDerveer spoke about Haley Jones' positionless game and how the Cardinal used the dynamic freshman in 2019-20. Download and listen wherever you get your podcasts.





Pac-12 women's basketball student-athletes reflect on the influence of their moms ahead of Mother's Day

Pac-12 student-athletes give shout-outs to their moms ahead of Mother's Day on May 10, 2020, including UCLA's Michaela Onyenwere, Oregon's Sabrina Ionescu and Satou Sabally, Arizona's Aari McDonald, Cate Reese and Lacie Hull, Stanford's Kiana Williams, USC's Endyia Rogers and Aliyah Jeune, and Utah's Brynna Maxwell.





Drift estimation for stochastic reaction-diffusion systems

Gregor Pasemann, Wilhelm Stannat.

Source: Electronic Journal of Statistics, Volume 14, Number 1, 547--579.

Abstract:
A parameter estimation problem for a class of semilinear stochastic evolution equations is considered. Conditions for consistency and asymptotic normality are given in terms of growth and continuity properties of the nonlinear part. Emphasis is put on the case of stochastic reaction-diffusion systems. Robustness results for statistical inference under model uncertainty are provided.





Gaussian field on the symmetric group: Prediction and learning

François Bachoc, Baptiste Broto, Fabrice Gamboa, Jean-Michel Loubes.

Source: Electronic Journal of Statistics, Volume 14, Number 1, 503--546.

Abstract:
In the framework of the supervised learning of a real function defined on an abstract space $\mathcal{X}$, Gaussian processes are widely used. The Euclidean case for $\mathcal{X}$ is well known and has been widely studied. In this paper, we explore the less classical case where $\mathcal{X}$ is the noncommutative finite group of permutations (namely the so-called symmetric group $S_{N}$). We provide an application to Gaussian process based optimization of Latin Hypercube Designs. We also extend our results to the case of partial rankings.





Univariate mean change point detection: Penalization, CUSUM and optimality

Daren Wang, Yi Yu, Alessandro Rinaldo.

Source: Electronic Journal of Statistics, Volume 14, Number 1, 1917--1961.

Abstract:
The problem of univariate mean change point detection and localization based on a sequence of $n$ independent observations with piecewise constant means has been intensively studied for more than half a century, and serves as a blueprint for change point problems in more complex settings. We provide a complete characterization of this classical problem in a general framework in which the upper bound $\sigma^{2}$ on the noise variance, the minimal spacing $\Delta$ between two consecutive change points and the minimal magnitude $\kappa$ of the changes are allowed to vary with $n$. We first show that consistent localization of the change points is impossible in the low signal-to-noise ratio regime $\frac{\kappa \sqrt{\Delta}}{\sigma}\preceq \sqrt{\log(n)}$. In contrast, when $\frac{\kappa \sqrt{\Delta}}{\sigma}$ diverges with $n$ at the rate of at least $\sqrt{\log(n)}$, we demonstrate that two computationally-efficient change point estimators, one based on the solution to an $\ell_{0}$-penalized least squares problem and the other on the popular wild binary segmentation algorithm, are both consistent and achieve a localization rate of the order $\frac{\sigma^{2}}{\kappa^{2}}\log(n)$. We further show that such rate is minimax optimal, up to a $\log(n)$ term.
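
To make the CUSUM idea in the title concrete, here is a minimal sketch of the textbook CUSUM statistic for locating a single mean change. It is not the paper's penalized or wild-binary-segmentation estimator; the function names and the noise-free toy data are assumptions made for illustration.

```python
import math


def cusum_statistics(y):
    """Return the CUSUM statistic for every candidate split t = 1, ..., n-1."""
    n = len(y)
    total = sum(y)
    stats = []
    left_sum = 0.0
    for t in range(1, n):
        left_sum += y[t - 1]
        right_sum = total - left_sum
        # Scaled gap between the mean of y[:t] and the mean of y[t:].
        scale = math.sqrt(t * (n - t) / n)
        stats.append(scale * abs(left_sum / t - right_sum / (n - t)))
    return stats


def estimate_single_change_point(y):
    """Return (t_hat, max_stat), where t_hat maximizes the CUSUM statistic."""
    stats = cusum_statistics(y)
    best = max(range(len(stats)), key=stats.__getitem__)
    return best + 1, stats[best]


# Hypothetical toy data: mean 0 for five observations, then mean 3 for five.
signal = [0.0] * 5 + [3.0] * 5
t_hat, stat = estimate_single_change_point(signal)
```

On this toy sequence the statistic peaks at the true split after the fifth observation. With noisy data one would compare the maximum against a threshold of order $\sigma\sqrt{\log(n)}$ before declaring a change; the exact constant is a tuning choice not specified here.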





Bayesian variance estimation in the Gaussian sequence model with partial information on the means

Gianluca Finocchio, Johannes Schmidt-Hieber.

Source: Electronic Journal of Statistics, Volume 14, Number 1, 239--271.

Abstract:
Consider the Gaussian sequence model under the additional assumption that a fixed fraction of the means is known. We study the problem of variance estimation from a frequentist Bayesian perspective. The maximum likelihood estimator (MLE) for $\sigma^{2}$ is biased and inconsistent. This raises the question whether the posterior is able to correct the MLE in this case. By developing a new proving strategy that uses refined properties of the posterior distribution, we find that the marginal posterior is inconsistent for any i.i.d. prior on the mean parameters. In particular, no assumption on the decay of the prior needs to be imposed. Surprisingly, we also find that consistency can be retained for a hierarchical prior based on Gaussian mixtures. In this case we also establish a limiting shape result and determine the limit distribution. In contrast to the classical Bernstein-von Mises theorem, the limit is non-Gaussian. We show that the Bayesian analysis leads to new statistical estimators outperforming the correctly calibrated MLE in a numerical simulation study.





Estimation of linear projections of non-sparse coefficients in high-dimensional regression

David Azriel, Armin Schwartzman.

Source: Electronic Journal of Statistics, Volume 14, Number 1, 174--206.

Abstract:
In this work we study estimation of signals when the number of parameters is much larger than the number of observations. A large body of literature assumes for these kinds of problems a sparse structure where most of the parameters are zero or close to zero. When this assumption does not hold, one can focus on low-dimensional functions of the parameter vector. In this work we study one-dimensional linear projections. Specifically, in the context of high-dimensional linear regression, the parameter of interest is ${\boldsymbol{\beta}}$ and we study estimation of $\mathbf{a}^{T}{\boldsymbol{\beta}}$. We show that $\mathbf{a}^{T}\hat{\boldsymbol{\beta}}$, where $\hat{\boldsymbol{\beta}}$ is the least squares estimator, using pseudo-inverse when $p>n$, is minimax and admissible. Thus, for linear projections no regularization or shrinkage is needed. This estimator is easy to analyze and confidence intervals can be constructed. We study a high-dimensional dataset from brain imaging where it is shown that the signal is weak, non-sparse and significantly different from zero.





Monotone least squares and isotonic quantiles

Alexandre Mösching, Lutz Dümbgen.

Source: Electronic Journal of Statistics, Volume 14, Number 1, 24--49.

Abstract:
We consider bivariate observations $(X_{1},Y_{1}),\ldots,(X_{n},Y_{n})$ such that, conditional on the $X_{i}$, the $Y_{i}$ are independent random variables. Precisely, the conditional distribution function of $Y_{i}$ equals $F_{X_{i}}$, where $(F_{x})_{x}$ is an unknown family of distribution functions. Under the sole assumption that $x\mapsto F_{x}$ is isotonic with respect to stochastic order, one can estimate $(F_{x})_{x}$ in two ways: (i) For any fixed $y$ one estimates the antitonic function $x\mapsto F_{x}(y)$ via nonparametric monotone least squares, replacing the responses $Y_{i}$ with the indicators $1_{[Y_{i}\le y]}$. (ii) For any fixed $\beta \in (0,1)$ one estimates the isotonic quantile function $x\mapsto F_{x}^{-1}(\beta)$ via a nonparametric version of regression quantiles. We show that these two approaches are closely related, with (i) being more flexible than (ii). Then, under mild regularity conditions, we establish rates of convergence for the resulting estimators $\hat{F}_{x}(y)$ and $\hat{F}_{x}^{-1}(\beta)$, uniformly over $(x,y)$ and $(x,\beta)$ in certain rectangles as well as uniformly in $y$ or $\beta$ for a fixed $x$.
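
Approach (i) can be sketched with the classical pool-adjacent-violators algorithm applied to the indicator responses. This is an illustrative sketch under the assumption of distinct $X_{i}$ values, not the authors' implementation; the function names and the toy data are hypothetical.

```python
def isotonic_fit(values):
    """Pool-adjacent-violators: least-squares nondecreasing fit to `values`."""
    blocks = []  # each block stores [sum of values, number of values]
    for v in values:
        blocks.append([v, 1])
        # Merge while the previous block mean exceeds the last block mean
        # (means are compared by cross-multiplying with the positive counts).
        while (len(blocks) > 1
               and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fit = []
    for s, c in blocks:
        fit.extend([s / c] * c)
    return fit


def antitonic_cdf_estimate(xs, ys, y):
    """Estimate x -> F_x(y) as a nonincreasing function at the observed xs."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    indicators = [1.0 if ys[i] <= y else 0.0 for i in order]
    # A nonincreasing fit is the negated nondecreasing fit of negated data.
    fit = [-f for f in isotonic_fit([-v for v in indicators])]
    return [xs[i] for i in order], fit


# Hypothetical toy data with distinct covariate values.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [0.2, 1.5, 0.4, 2.0]
sorted_xs, cdf_at_one = antitonic_cdf_estimate(xs, ys, y=1.0)
```

Here the raw indicators 1, 0, 1, 0 violate monotonicity in the middle, so the two middle points are pooled to a common value of 0.5.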





Beta-Binomial stick-breaking non-parametric prior

María F. Gil–Leyva, Ramsés H. Mena, Theodoros Nicoleris.

Source: Electronic Journal of Statistics, Volume 14, Number 1, 1479--1507.

Abstract:
A new class of nonparametric prior distributions, termed Beta-Binomial stick-breaking process, is proposed. By allowing the underlying length random variables to be dependent through a Beta marginals Markov chain, an appealing discrete random probability measure arises. The chain’s dependence parameter controls the ordering of the stick-breaking weights, and thus tunes the model’s label-switching ability. Also, by tuning this parameter, the resulting class contains the Dirichlet process and the Geometric process priors as particular cases, which is of interest for MCMC implementations. Some properties of the model are discussed and a density estimation algorithm is proposed and tested with simulated datasets.
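
The stick-breaking construction behind this class of priors can be sketched as follows. The sketch covers only the two boundary cases named in the abstract, under the assumption that independent Beta(1, theta) lengths give the Dirichlet process weights and identical lengths give the Geometric process weights. The Beta-Binomial Markov chain linking the lengths is not implemented, and all names and the truncation level are hypothetical.

```python
import random


def stick_breaking_weights(lengths):
    """Map lengths v_1, v_2, ... in [0, 1] to w_j = v_j * prod_{i<j} (1 - v_i).

    Returns the weights and the leftover stick mass after truncation.
    """
    weights = []
    remaining = 1.0
    for v in lengths:
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights, remaining


def dirichlet_process_weights(theta, truncation, rng):
    """Independent Beta(1, theta) lengths: truncated Dirichlet process weights."""
    lengths = [rng.betavariate(1.0, theta) for _ in range(truncation)]
    return stick_breaking_weights(lengths)


def geometric_process_weights(v, truncation):
    """Identical lengths v: weights decay geometrically as v * (1 - v)**(j - 1)."""
    return stick_breaking_weights([v] * truncation)


dp_weights, dp_rest = dirichlet_process_weights(2.0, 20, random.Random(1))
geo_weights, geo_rest = geometric_process_weights(0.5, 3)
```

Identical lengths force the weights into decreasing order, while independent lengths leave the order free; this is the ordering behavior that the dependence parameter is described as controlling.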





A Bayesian approach to disease clustering using restricted Chinese restaurant processes

Claudia Wehrhahn, Samuel Leonard, Abel Rodriguez, Tatiana Xifara.

Source: Electronic Journal of Statistics, Volume 14, Number 1, 1449--1478.

Abstract:
Identifying disease clusters (areas with an unusually high incidence of a particular disease) is a common problem in epidemiology and public health. We describe a Bayesian nonparametric mixture model for disease clustering that constrains clusters to be made of adjacent areal units. This is achieved by modifying the exchangeable partition probability function associated with the Ewens sampling distribution. We call the resulting prior the Restricted Chinese Restaurant Process, as the associated full conditional distributions resemble those associated with the standard Chinese Restaurant Process. The model is illustrated using synthetic data sets and in an application to oral cancer mortality in Germany.
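
For readers unfamiliar with the baseline, the standard (unrestricted) Chinese Restaurant Process can be simulated as below. This sketch does not implement the adjacency restriction proposed in the paper; the function name and the concentration value are assumptions for illustration.

```python
import random


def sample_crp_partition(n, theta, rng):
    """Seat n customers sequentially; return per-customer labels and table sizes."""
    labels = []
    table_sizes = []
    for _ in range(n):
        # Existing table k is chosen with weight table_sizes[k];
        # a new table is opened with weight theta.
        weights = table_sizes + [theta]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(table_sizes):
            table_sizes.append(1)
        else:
            table_sizes[k] += 1
        labels.append(k)
    return labels, table_sizes


labels, table_sizes = sample_crp_partition(50, 1.0, random.Random(0))
```

In a disease-mapping setting each "customer" would be an areal unit and each "table" a cluster. A restricted version would additionally have to rule out seatings that produce clusters of non-adjacent units.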





A fast and consistent variable selection method for high-dimensional multivariate linear regression with a large number of explanatory variables

Ryoya Oda, Hirokazu Yanagihara.

Source: Electronic Journal of Statistics, Volume 14, Number 1, 1386--1412.

Abstract:
We put forward a variable selection method for selecting explanatory variables in a normality-assumed multivariate linear regression. It is cumbersome to calculate variable selection criteria for all subsets of explanatory variables when the number of explanatory variables is large. Therefore, we propose a fast and consistent variable selection method based on a generalized $C_{p}$ criterion. The consistency of the method is provided by a high-dimensional asymptotic framework such that the sample size and the sum of the dimensions of response vectors and explanatory vectors divided by the sample size tend to infinity and to some positive constant less than one, respectively. Through numerical simulations, it is shown that the proposed method has a high probability of selecting the true subset of explanatory variables and is fast under a moderate sample size even when the number of dimensions is large.





$k$-means clustering of extremes

Anja Janßen, Phyllis Wan.

Source: Electronic Journal of Statistics, Volume 14, Number 1, 1211--1233.

Abstract:
The $k$-means clustering algorithm and its variant, the spherical $k$-means clustering, are among the most important and popular methods in unsupervised learning and pattern detection. In this paper, we explore how the spherical $k$-means algorithm can be applied in the analysis of only the extremal observations from a data set. By making use of multivariate extreme value analysis we show how it can be adopted to find “prototypes” of extremal dependence and derive a consistency result for our suggested estimator. In the special case of max-linear models we show furthermore that our procedure provides an alternative way of statistical inference for this class of models. Finally, we provide data examples which show that our method is able to find relevant patterns in extremal observations and allows us to classify extremal events.





Conditional density estimation with covariate measurement error

Xianzheng Huang, Haiming Zhou.

Source: Electronic Journal of Statistics, Volume 14, Number 1, 970--1023.

Abstract:
We consider estimating the density of a response conditioning on an error-prone covariate. Motivated by two existing kernel density estimators in the absence of covariate measurement error, we propose a method to correct the existing estimators for measurement error. Asymptotic properties of the resultant estimators under different types of measurement error distributions are derived. Moreover, we adjust bandwidths readily available from existing bandwidth selection methods developed for error-free data to obtain bandwidths for the new estimators. Extensive simulation studies are carried out to compare the proposed estimators with naive estimators that ignore measurement error, which also provide empirical evidence for the effectiveness of the proposed bandwidth selection methods. A real-life data example is used to illustrate implementation of these methods under practical scenarios. An R package, lpme, is developed for implementing all considered methods, which we demonstrate via an R code example in Appendix B.2.





Estimation of a semiparametric transformation model: A novel approach based on least squares minimization

Benjamin Colling, Ingrid Van Keilegom.

Source: Electronic Journal of Statistics, Volume 14, Number 1, 769--800.

Abstract:
Consider the following semiparametric transformation model $\Lambda_{\theta}(Y)=m(X)+\varepsilon$, where $X$ is a $d$-dimensional covariate, $Y$ is a univariate response variable and $\varepsilon$ is an error term with zero mean and independent of $X$. We assume that $m$ is an unknown regression function and that $\{\Lambda_{\theta}:\theta \in \Theta\}$ is a parametric family of strictly increasing functions. Our goal is to develop two new estimators of the transformation parameter $\theta$. The main idea of these two estimators is to minimize, with respect to $\theta$, the $L_{2}$-distance between the transformation $\Lambda_{\theta}$ and one of its fully nonparametric estimators. We consider in particular the nonparametric estimator based on the least-absolute deviation loss constructed in Colling and Van Keilegom (2019). We establish the consistency and the asymptotic normality of the two proposed estimators of $\theta$. We also carry out a simulation study to illustrate and compare the performance of our new parametric estimators to that of the profile likelihood estimator constructed in Linton et al. (2008).





A Statistical Learning Approach to Modal Regression

This paper studies the nonparametric modal regression problem systematically from a statistical learning viewpoint. Originally motivated by pursuing a theoretical understanding of the maximum correntropy criterion based regression (MCCR), our study reveals that MCCR with a tending-to-zero scale parameter is essentially modal regression. We show that the nonparametric modal regression problem can be approached via the classical empirical risk minimization. Some efforts are then made to develop a framework for analyzing and implementing modal regression. For instance, the modal regression function is described, the modal regression risk is defined explicitly and its Bayes rule is characterized; for the sake of computational tractability, the surrogate modal regression risk, which is termed as the generalization risk in our study, is introduced. On the theoretical side, the excess modal regression risk, the excess generalization risk, the function estimation error, and the relations among the above three quantities are studied rigorously. It turns out that under mild conditions, function estimation consistency and convergence may be pursued in modal regression as in vanilla regression protocols such as mean regression, median regression, and quantile regression. On the practical side, the implementation issues of modal regression including the computational algorithm and the selection of the tuning parameters are discussed. Numerical validations on modal regression are also conducted to verify our findings.





Neyman-Pearson classification: parametrics and sample size requirement

The Neyman-Pearson (NP) paradigm in binary classification seeks classifiers that achieve a minimal type II error while enforcing the prioritized type I error controlled under some user-specified level $\alpha$. This paradigm serves naturally in applications such as severe disease diagnosis and spam detection, where people have clear priorities among the two error types. Recently, Tong, Feng, and Li (2018) proposed a nonparametric umbrella algorithm that adapts all scoring-type classification methods (e.g., logistic regression, support vector machines, random forest) to respect the given type I error (i.e., conditional probability of classifying a class $0$ observation as class $1$ under the 0-1 coding) upper bound $\alpha$ with high probability, without specific distributional assumptions on the features and the responses. Universal as the umbrella algorithm is, it demands an explicit minimum sample size requirement on class $0$, which is often the more scarce class, such as in rare disease diagnosis applications. In this work, we employ the parametric linear discriminant analysis (LDA) model and propose a new parametric thresholding algorithm, which does not need the minimum sample size requirements on class $0$ observations and thus is suitable for small sample applications such as rare disease diagnosis. Leveraging both the existing nonparametric and the newly proposed parametric thresholding rules, we propose four LDA-based NP classifiers, for both low- and high-dimensional settings. On the theoretical front, we prove NP oracle inequalities for one proposed classifier, where the rate for excess type II error benefits from the explicit parametric model assumption. Furthermore, as NP classifiers involve a sample splitting step of class $0$ observations, we construct a new adaptive sample splitting scheme that can be applied universally to NP classifiers, and this adaptive strategy reduces the type II error of these classifiers.
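
The minimum sample size requirement of the nonparametric approach can be illustrated with a short sketch. It assumes the usual order-statistic rule: threshold at the $k$-th smallest of $n$ held-out class $0$ scores, where the probability that the type I error exceeds $\alpha$ is bounded by a binomial tail. The tolerance `delta` and the function names are hypothetical, and this is not the interface of the nproc package.

```python
import math


def violation_bound(n, k, alpha):
    """Binomial tail bound on P(type I error > alpha) for the k-th order statistic."""
    return sum(
        math.comb(n, j) * (1 - alpha) ** j * alpha ** (n - j)
        for j in range(k, n + 1)
    )


def np_threshold_rank(n, alpha, delta):
    """Smallest rank k whose violation bound is at most delta, or None."""
    for k in range(1, n + 1):
        if violation_bound(n, k, alpha) <= delta:
            return k
    return None  # too few class-0 scores for any rank to qualify


def min_class0_sample_size(alpha, delta):
    """Smallest n with (1 - alpha)**n <= delta, so that rank k = n qualifies."""
    return math.ceil(math.log(delta) / math.log(1 - alpha))


n_min = min_class0_sample_size(0.05, 0.05)
rank_at_n_min = np_threshold_rank(n_min, 0.05, 0.05)
rank_below_n_min = np_threshold_rank(n_min - 1, 0.05, 0.05)
```

With $\alpha = 0.05$ and tolerance 0.05, no rank qualifies until 59 class $0$ scores are available. This is the kind of constraint that a parametric thresholding rule is meant to avoid.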
The proposed NP classifiers are implemented in the R package nproc.





Perturbation Bounds for Procrustes, Classical Scaling, and Trilateration, with Applications to Manifold Learning

One of the common tasks in unsupervised learning is dimensionality reduction, where the goal is to find meaningful low-dimensional structures hidden in high-dimensional data. Sometimes referred to as manifold learning, this problem is closely related to the problem of localization, which aims at embedding a weighted graph into a low-dimensional Euclidean space. Several methods have been proposed for localization, and also manifold learning. Nonetheless, the robustness property of most of them is little understood. In this paper, we obtain perturbation bounds for classical scaling and trilateration, which are then applied to derive performance bounds for Isomap, Landmark Isomap, and Maximum Variance Unfolding. A new perturbation bound for Procrustes analysis plays a key role.





Practical Locally Private Heavy Hitters

We present new practical local differentially private heavy hitters algorithms achieving optimal or near-optimal worst-case error and running time -- TreeHist and Bitstogram. In both algorithms, server running time is $\tilde O(n)$ and user running time is $\tilde O(1)$, hence improving on the prior state-of-the-art result of Bassily and Smith [STOC 2015] requiring $O(n^{5/2})$ server time and $O(n^{3/2})$ user time. With a typically large number of participants in local algorithms (in the millions), this reduction in time complexity, in particular at the user side, is crucial for making locally private heavy hitters algorithms usable in practice. We implemented Algorithm TreeHist to verify our theoretical analysis and compared its performance with the performance of Google's RAPPOR code.





Derivative-Free Methods for Policy Optimization: Guarantees for Linear Quadratic Systems

We study derivative-free methods for policy optimization over the class of linear policies. We focus on characterizing the convergence rate of these methods when applied to linear-quadratic systems, and study various settings of driving noise and reward feedback. Our main theoretical result provides an explicit bound on the sample or evaluation complexity: we show that these methods are guaranteed to converge to within any pre-specified tolerance of the optimal policy with a number of zero-order evaluations that is an explicit polynomial of the error tolerance, dimension, and curvature properties of the problem. Our analysis reveals some interesting differences between the settings of additive driving noise and random initialization, as well as the settings of one-point and two-point reward feedback. Our theory is corroborated by simulations of derivative-free methods in application to these systems. Along the way, we derive convergence rates for stochastic zero-order optimization algorithms when applied to a certain class of non-convex problems.





A Unified Framework for Structured Graph Learning via Spectral Constraints

Graph learning from data is a canonical problem that has received substantial attention in the literature. Learning a structured graph is essential for interpretability and identification of the relationships among data. In general, learning a graph with a specific structure is an NP-hard combinatorial problem and thus designing a general tractable algorithm is challenging. Some useful structured graphs include connected, sparse, multi-component, bipartite, and regular graphs. In this paper, we introduce a unified framework for structured graph learning that combines Gaussian graphical model and spectral graph theory. We propose to convert combinatorial structural constraints into spectral constraints on graph matrices and develop an optimization framework based on block majorization-minimization to solve structured graph learning problem. The proposed algorithms are provably convergent and practically amenable for a number of graph based applications such as data clustering. Extensive numerical experiments with both synthetic and real data sets illustrate the effectiveness of the proposed algorithms. An open source R package containing the code for all the experiments is available at https://CRAN.R-project.org/package=spectralGraphTopology.





GluonCV and GluonNLP: Deep Learning in Computer Vision and Natural Language Processing

We present GluonCV and GluonNLP, the deep learning toolkits for computer vision and natural language processing based on Apache MXNet (incubating). These toolkits provide state-of-the-art pre-trained models, training scripts, and training logs, to facilitate rapid prototyping and promote reproducible research. We also provide modular APIs with flexible building blocks to enable efficient customization. Leveraging the MXNet ecosystem, the deep learning models in GluonCV and GluonNLP can be deployed onto a variety of platforms with different programming languages. The Apache 2.0 license has been adopted by GluonCV and GluonNLP to allow for software distribution, modification, and usage.





Distributed Feature Screening via Componentwise Debiasing

Feature screening is a powerful tool in processing high-dimensional data. When the sample size N and the number of features p are both large, the implementation of classic screening methods can be numerically challenging. In this paper, we propose a distributed screening framework for big data setup. In the spirit of 'divide-and-conquer', the proposed framework expresses a correlation measure as a function of several component parameters, each of which can be distributively estimated using a natural U-statistic from data segments. With the component estimates aggregated, we obtain a final correlation estimate that can be readily used for screening features. This framework enables distributed storage and parallel computing and thus is computationally attractive. Due to the unbiased distributive estimation of the component parameters, the final aggregated estimate achieves a high accuracy that is insensitive to the number of data segments m. Under mild conditions, we show that the aggregated correlation estimator is as efficient as the centralized estimator in terms of the probability convergence bound and the mean squared error rate; the corresponding screening procedure enjoys sure screening property for a wide range of correlation measures. The promising performances of the new method are supported by extensive numerical examples.





On the consistency of graph-based Bayesian semi-supervised learning and the scalability of sampling algorithms

This paper considers a Bayesian approach to graph-based semi-supervised learning. We show that if the graph parameters are suitably scaled, the graph-posteriors converge to a continuum limit as the size of the unlabeled data set grows. This consistency result has profound algorithmic implications: we prove that when consistency holds, carefully designed Markov chain Monte Carlo algorithms have a uniform spectral gap, independent of the number of unlabeled inputs. Numerical experiments illustrate and complement the theory.





Learning with Fenchel-Young losses

Over the past decades, numerous loss functions have been proposed for a variety of supervised learning tasks, including regression, classification, ranking, and more generally structured prediction. Understanding the core principles and theoretical properties underpinning these losses is key to choosing the right loss for the right problem, as well as to creating new losses which combine their strengths. In this paper, we introduce Fenchel-Young losses, a generic way to construct a convex loss function for a regularized prediction function. We provide an in-depth study of their properties in a very broad setting, covering all the aforementioned supervised learning tasks, and revealing new connections between sparsity, generalized entropies, and separation margins. We show that Fenchel-Young losses unify many well-known loss functions and allow one to create useful new ones easily. Finally, we derive efficient predictive and training algorithms, making Fenchel-Young losses appealing both in theory and practice.
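
As a small worked instance, assume the Fenchel-Young loss takes the form $L_{\Omega}(\theta; y) = \Omega^{*}(\theta) + \Omega(y) - \langle \theta, y \rangle$ and choose $\Omega$ to be the negative Shannon entropy on the probability simplex, whose conjugate is the log-sum-exp function. Under these assumptions the loss reduces to the familiar logistic (cross-entropy) loss for one-hot targets. The function names are illustrative and not taken from any library.

```python
import math


def logsumexp(theta):
    """Conjugate of the negative Shannon entropy restricted to the simplex."""
    m = max(theta)
    return m + math.log(sum(math.exp(t - m) for t in theta))


def neg_shannon_entropy(p):
    """Omega(p) = sum_i p_i log p_i, with the convention 0 log 0 = 0."""
    return sum(pi * math.log(pi) for pi in p if pi > 0)


def softmax(theta):
    """Regularized prediction map associated with this choice of Omega."""
    z = logsumexp(theta)
    return [math.exp(t - z) for t in theta]


def fenchel_young_loss(theta, y):
    """Omega*(theta) + Omega(y) - <theta, y> for the entropy regularizer."""
    inner = sum(t * yi for t, yi in zip(theta, y))
    return logsumexp(theta) + neg_shannon_entropy(y) - inner


scores = [1.0, 2.0, 0.5]
one_hot = [0.0, 1.0, 0.0]
loss_one_hot = fenchel_young_loss(scores, one_hot)
loss_at_prediction = fenchel_young_loss(scores, softmax(scores))
```

The loss is nonnegative and vanishes when the target equals the regularized prediction, here the softmax of the scores.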





Learning Linear Non-Gaussian Causal Models in the Presence of Latent Variables

We consider the problem of learning causal models from observational data generated by linear non-Gaussian acyclic causal models with latent variables. Without considering the effect of latent variables, the inferred causal relationships among the observed variables are often wrong. Under the faithfulness assumption, we propose a method to check whether there exists a causal path between any two observed variables. From this information, we can obtain the causal order among the observed variables. The next question is whether the causal effects can be uniquely identified as well. We show that causal effects among observed variables cannot be identified uniquely under mere assumptions of faithfulness and non-Gaussianity of exogenous noises. However, we are able to propose an efficient method that identifies the set of all possible causal effects that are compatible with the observational data. We present additional structural conditions on the causal graph under which causal effects among observed variables can be determined uniquely. Furthermore, we provide necessary and sufficient graphical conditions for unique identification of the number of variables in the system. Experiments on synthetic data and real-world data show the effectiveness of our proposed algorithm for learning causal models.




ea

Branch and Bound for Piecewise Linear Neural Network Verification

The success of Deep Learning and its potential use in many safety-critical applications has motivated research on formal verification of Neural Network (NN) models. In this context, verification involves proving or disproving that an NN model satisfies certain input-output properties. Despite the reputation of learned NN models as black boxes, and the theoretical hardness of proving useful properties about them, researchers have been successful in verifying some classes of models by exploiting their piecewise linear structure and taking insights from formal methods such as Satisfiability Modulo Theory. However, these methods are still far from scaling to realistic neural networks. To facilitate progress on this crucial area, we exploit the Mixed Integer Linear Programming (MIP) formulation of verification to propose a family of algorithms based on Branch-and-Bound (BaB). We show that our family contains previous verification methods as special cases. With the help of the BaB framework, we make three key contributions. Firstly, we identify new methods that combine the strengths of multiple existing approaches, accomplishing significant performance improvements over previous state of the art. Secondly, we introduce an effective branching strategy on ReLU non-linearities. This branching strategy allows us to efficiently and successfully deal with high input dimensional problems with convolutional network architecture, on which previous methods fail frequently. Finally, we propose comprehensive test data sets and benchmarks which include a collection of previously released testcases. We use the data sets to conduct a thorough experimental comparison of existing and new algorithms and to provide an inclusive analysis of the factors impacting the hardness of verification problems.
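
The general branch-and-bound idea can be sketched on a toy problem. The code below is an illustrative assumption, not the paper's algorithm: it uses plain interval arithmetic for bounding and splits the input box along its widest dimension, whereas the abstract describes an MIP formulation and a branching strategy on ReLU non-linearities. The network format `(W1, b1, w2, b2)` (one hidden ReLU layer, scalar output) and all function names are hypothetical.

```python
import heapq

def relu(v):
    return max(v, 0.0)

def forward(net, x):
    # Evaluate f(x) = w2 . relu(W1 x + b1) + b2
    W1, b1, w2, b2 = net
    hidden = [relu(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2

def interval_lower_bound(net, lo, hi):
    # Sound (possibly loose) lower bound on f over the box [lo, hi]
    W1, b1, w2, b2 = net
    out = b2
    for row, b, w in zip(W1, b1, w2):
        pre_lo = b + sum(wi * (l if wi >= 0 else u) for wi, l, u in zip(row, lo, hi))
        pre_hi = b + sum(wi * (u if wi >= 0 else l) for wi, l, u in zip(row, lo, hi))
        h_lo, h_hi = relu(pre_lo), relu(pre_hi)
        out += w * (h_lo if w >= 0 else h_hi)
    return out

def verify_positive(net, lo, hi, max_splits=1000):
    """True if f(x) > 0 on the whole box, False if a counterexample
    is found, None if the split budget runs out first."""
    heap = [(interval_lower_bound(net, lo, hi), list(lo), list(hi))]
    for _ in range(max_splits):
        lb, lo, hi = heapq.heappop(heap)  # sub-box with the smallest lower bound
        if lb > 0:
            return True  # every remaining sub-box has a bound at least this large
        mid = [(l + u) / 2 for l, u in zip(lo, hi)]
        if forward(net, mid) <= 0:
            return False  # concrete input violating the property
        d = max(range(len(lo)), key=lambda i: hi[i] - lo[i])  # widest dimension
        for new_lo, new_hi in ((lo[d], mid[d]), (mid[d], hi[d])):
            sub_lo, sub_hi = list(lo), list(hi)
            sub_lo[d], sub_hi[d] = new_lo, new_hi
            heapq.heappush(heap, (interval_lower_bound(net, sub_lo, sub_hi), sub_lo, sub_hi))
    return None
```

For example, with `net = ([[1.0], [1.0]], [1.0, 0.0], [1.0, -1.0], 0.5)`, which encodes f(x) = relu(x + 1) − relu(x) + 0.5, the interval bound on [−1, 1] is too loose to prove positivity, but a single split into [−1, 0] and [0, 1] suffices.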




ea

Dynamical Systems as Temporal Feature Spaces

Parametrised state space models in the form of recurrent networks are often used in machine learning to learn from data streams exhibiting temporal dependencies. To break the black box nature of such models it is important to understand the dynamical features of the input-driving time series that are formed in the state space. We propose a framework for rigorous analysis of such state representations in vanishing memory state space models such as echo state networks (ESN). In particular, we consider the state space a temporal feature space and the readout mapping from the state space a kernel machine operating in that feature space. We show that: (1) The usual ESN strategy of randomly generating input-to-state, as well as state coupling leads to shallow memory time series representations, corresponding to a cross-correlation operator with fast exponentially decaying coefficients; (2) Imposing symmetry on dynamic coupling yields a constrained dynamic kernel matching the input time series with straightforward exponentially decaying motifs or exponentially decaying motifs of the highest frequency; (3) Simple ring (cycle) high-dimensional reservoir topology specified only through two free parameters can implement deep memory dynamic kernels with a rich variety of matching motifs. We quantify the richness of feature representations imposed by dynamic kernels and demonstrate that for the dynamic kernel associated with the cycle reservoir topology, the kernel richness undergoes a phase transition close to the edge of stability.




ea

Ensemble Learning for Relational Data

We present a theoretical analysis framework for relational ensemble models. We show that ensembles of collective classifiers can improve predictions for graph data by reducing errors due to variance in both learning and inference. In addition, we propose a relational ensemble framework that combines a relational ensemble learning approach with a relational ensemble inference approach for collective classification. The proposed ensemble techniques are applicable for both single and multiple graph settings. Experiments on both synthetic and real-world data demonstrate the effectiveness of the proposed framework. Finally, our experimental results support the theoretical analysis and confirm that ensemble algorithms that explicitly focus on both learning and inference processes and aim at reducing errors associated with both, are the best performers.




ea

Learning Causal Networks via Additive Faithfulness

In this paper we introduce a statistical model, called additively faithful directed acyclic graph (AFDAG), for causal learning from observational data. Our approach is based on additive conditional independence (ACI), a recently proposed three-way statistical relation that shares many similarities with conditional independence but without resorting to multi-dimensional kernels. This distinct feature strikes a balance between a parametric model and a fully nonparametric model, which makes the proposed model attractive for handling large networks. We develop an estimator for AFDAG based on a linear operator that characterizes ACI, and establish the consistency and convergence rates of this estimator, as well as the uniform consistency of the estimated DAG. Moreover, we introduce a modified PC-algorithm to implement the estimating procedure efficiently, so that its complexity is determined by the level of sparseness rather than the dimension of the network. Through simulation studies we show that our method outperforms existing methods when commonly assumed conditions such as Gaussian or Gaussian copula distributions do not hold. Finally, the usefulness of AFDAG formulation is demonstrated through an application to a proteomics data set.




ea

Expected Policy Gradients for Reinforcement Learning

We propose expected policy gradients (EPG), which unify stochastic policy gradients (SPG) and deterministic policy gradients (DPG) for reinforcement learning. Inspired by expected sarsa, EPG integrates (or sums) across actions when estimating the gradient, instead of relying only on the action in the sampled trajectory. For continuous action spaces, we first derive a practical result for Gaussian policies and quadratic critics and then extend it to a universal analytical method, covering a broad class of actors and critics, including Gaussian, exponential families, and policies with bounded support. For Gaussian policies, we introduce an exploration method that uses covariance proportional to the matrix exponential of the scaled Hessian of the critic with respect to the actions. For discrete action spaces, we derive a variant of EPG based on softmax policies. We also establish a new general policy gradient theorem, of which the stochastic and deterministic policy gradient theorems are special cases. Furthermore, we prove that EPG reduces the variance of the gradient estimates without requiring deterministic policies and with little computational overhead. Finally, we provide an extensive experimental evaluation of EPG and show that it outperforms existing approaches on multiple challenging control domains.
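
The difference between summing across actions and using only the sampled action can be illustrated for a single state with a softmax policy. This is a minimal sketch under stated assumptions: the policy is parametrised directly by its logits, the critic values `q` are taken as given, and the function names are hypothetical rather than taken from the paper.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def epg_gradient(logits, q):
    # Exact gradient of sum_a pi(a) * Q(a) with respect to the logits:
    # d/d logit_i = pi_i * (Q_i - sum_a pi_a * Q_a)
    pi = softmax(logits)
    expected_q = sum(p * qa for p, qa in zip(pi, q))
    return [p * (qa - expected_q) for p, qa in zip(pi, q)]

def spg_sample_gradient(logits, q, action):
    # Single-sample score-function estimate: Q(a) * grad log pi(a),
    # where grad_i log pi(a) = 1[i == a] - pi_i for a softmax policy
    pi = softmax(logits)
    return [q[action] * ((1.0 if i == action else 0.0) - p) for i, p in enumerate(pi)]
```

Averaging `spg_sample_gradient` over actions drawn from the policy recovers `epg_gradient` exactly, so the summed form removes the sampling noise over actions while leaving the expected gradient unchanged.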




ea

Unique Sharp Local Minimum in L1-minimization Complete Dictionary Learning

We study the problem of globally recovering a dictionary from a set of signals via $\ell_1$-minimization. We assume that the signals are generated as i.i.d. random linear combinations of the $K$ atoms from a complete reference dictionary $D^* \in \mathbb{R}^{K \times K}$, where the linear combination coefficients are from either a Bernoulli type model or exact sparse model. First, we obtain a necessary and sufficient norm condition for the reference dictionary $D^*$ to be a sharp local minimum of the expected $\ell_1$ objective function. Our result substantially extends that of Wu and Yu (2015) and allows the combination coefficient to be non-negative. Secondly, we obtain an explicit bound on the region within which the objective value of the reference dictionary is minimal. Thirdly, we show that the reference dictionary is the unique sharp local minimum, thus establishing the first known global property of $\ell_1$-minimization dictionary learning. Motivated by the theoretical results, we introduce a perturbation based test to determine whether a dictionary is a sharp local minimum of the objective function. In addition, we also propose a new dictionary learning algorithm based on Block Coordinate Descent, called DL-BCD, which is guaranteed to decrease the objective function monotonically. Simulation studies show that DL-BCD has competitive performance in terms of recovery rate compared to other state-of-the-art dictionary learning algorithms when the reference dictionary is generated from random Gaussian matrices.




ea

Representation Learning for Dynamic Graphs: A Survey

Graphs arise naturally in many real-world applications including social networks, recommender systems, ontologies, biology, and computational finance. Traditionally, machine learning models for graphs have been mostly designed for static graphs. However, many applications involve evolving graphs. This introduces important challenges for learning and inference since nodes, attributes, and edges change over time. In this survey, we review the recent advances in representation learning for dynamic graphs, including dynamic knowledge graphs. We describe existing models from an encoder-decoder perspective, categorize these encoders and decoders based on the techniques they employ, and analyze the approaches in each category. We also review several prominent applications and widely used datasets and highlight directions for future research.




ea

GADMM: Fast and Communication Efficient Framework for Distributed Machine Learning

When the data is distributed across multiple servers, lowering the communication cost between the servers (or workers) while solving the distributed learning problem is an important problem and is the focus of this paper. In particular, we propose a fast and communication-efficient decentralized framework to solve the distributed machine learning (DML) problem. The proposed algorithm, Group Alternating Direction Method of Multipliers (GADMM), is based on the Alternating Direction Method of Multipliers (ADMM) framework. The key novelty in GADMM is that it solves the problem in a decentralized topology where at most half of the workers are competing for the limited communication resources at any given time. Moreover, each worker exchanges the locally trained model only with two neighboring workers, thereby training a global model with a lower amount of communication overhead in each exchange. We prove that GADMM converges to the optimal solution for convex loss functions, and numerically show that it converges faster and is more communication-efficient than state-of-the-art communication-efficient algorithms such as the Lazily Aggregated Gradient (LAG) and dual averaging, in linear and logistic regression tasks on synthetic and real datasets. Furthermore, we propose Dynamic GADMM (D-GADMM), a variant of GADMM, and prove its convergence under the time-varying network topology of the workers.
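
The alternating head/tail update pattern described above can be sketched on a toy problem. The code below is an assumption-laden illustration, not the authors' implementation: workers sit on a chain, worker i holds the scalar quadratic loss 0.5 * (theta - a[i])**2, neighbouring models are tied by the constraint theta[i] == theta[i+1], and each local minimisation has a closed form. The function name and parameters are hypothetical.

```python
def gadmm_quadratic(a, rho=1.0, iters=500):
    n = len(a)
    theta = [0.0] * n      # one local model per worker
    lam = [0.0] * (n - 1)  # lam[i]: dual variable for theta[i] == theta[i + 1]

    def local_update(i):
        # Closed-form minimiser of the augmented Lagrangian in theta[i],
        # using only the models and duals shared with the two chain neighbours
        rhs, degree = a[i], 0
        if i > 0:
            rhs += lam[i - 1] + rho * theta[i - 1]
            degree += 1
        if i < n - 1:
            rhs += -lam[i] + rho * theta[i + 1]
            degree += 1
        theta[i] = rhs / (1.0 + rho * degree)

    for _ in range(iters):
        for i in range(0, n, 2):  # head workers update (in parallel in practice)
            local_update(i)
        for i in range(1, n, 2):  # tail workers update using the fresh head models
            local_update(i)
        for i in range(n - 1):    # dual ascent on every chain constraint
            lam[i] += rho * (theta[i] - theta[i + 1])
    return theta
```

For these quadratic losses the consensus optimum is the mean of `a`, so `gadmm_quadratic([1.0, 2.0, 3.0, 6.0])` should drive every worker's model towards 3.0. In each phase only every other worker transmits, which mirrors the claim that at most half of the workers compete for communication resources at any time.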




ea

Top books to read at home

Looking for a new book to read while staying safely at home? The Library has expanded its ebook collection to over 6000 




ea

Researching the Pacific: The Pacific Manuscripts Bureau

The State Library holds a superb collection of original documents, illustrations, photographs and books about the Pacifi




ea

Access thousands of newspapers and magazines with PressReader

Want to access thousands of newspapers and magazines wherever you are?




ea

Health & Active Living Challenge




ea

Oriented first passage percolation in the mean field limit

Nicola Kistler, Adrien Schertzer, Marius A. Schmidt.

Source: Brazilian Journal of Probability and Statistics, Volume 34, Number 2, 414--425.

Abstract:
The Poisson clumping heuristic has led Aldous to conjecture the value of the oriented first passage percolation on the hypercube in the limit of large dimensions. Aldous’ conjecture has been rigorously confirmed by Fill and Pemantle (Ann. Appl. Probab. 3 (1993) 593–629) by means of a variance reduction trick. We present here a streamlined and, we believe, more natural proof based on ideas that emerged in the study of Derrida’s random energy models.