
Series 02: Slides of suburbs in Sydney NSW, ca 1960s-1980s





Wedding photographs of William Thomas Cadell and Anne Macansh set in Harriet Scott graphic





Correspondence relating to Lewis Harold Bell Lasseter, 1931





The Most Excellent Order of the British Empire Association (New South Wales) further records, 1979-2012





Selected Poems of Henry Lawson: Correspondence: Vol.1





Hand made homes

The gold rush was a time of opportunism when people came from far and wide to stake their claim.





It's all in the detail

It's been a productive last few weeks on the project.





Oregon State's Destiny Slocum enters transfer portal

Oregon State basketball player Destiny Slocum has opted to enter the transfer portal for her final season of eligibility. Slocum, a 5-foot-7 guard, averaged a team-best 14.9 points and had 4.7 assists a game this past season with the Beavers, who finished the season ranked No. 14 with a 23-9 record. In a statement released by the university on Thursday, Slocum thanked everyone who supported her in the decision.





Clean sweep: Oregon's Sabrina Ionescu is unanimous Player of the Year after winning Wooden Award

Sabrina Ionescu wins the Wooden Award for the second year in a row, becoming the fifth player in the trophy's history to win in back-to-back seasons. With the honor, she completes a sweep of the national postseason player of the year awards. As a senior, Ionescu matched her own single-season mark with eight triple-doubles in 2019-20, and she was incredibly efficient from the field, shooting a career-best 51.8 percent.





WNBA Draft Profile: Transcendent guard Sabrina Ionescu projects as top pick

After sweeping every national player of the year award, Sabrina Ionescu is off to the WNBA, where her skills will make an instant impact, not just for her new team but for the league as a whole. She averaged 17.5 points, 8.6 rebounds and 9.1 assists for the Ducks in 2019-20, rewriting her own NCAA career triple-double record and becoming the first player in college basketball history with at least 2,000 points, 1,000 rebounds and 1,000 assists.





WNBA Draft Profile: Versatile forward Satou Sabally can provide instant impact

Athletic forward Satou Sabally is preparing to take the leap to the WNBA level following three productive seasons at Oregon. As a junior, she averaged 16.2 points and 6.9 rebounds per game while helping the Ducks sweep the Pac-12 regular season and tournament titles. At 6-foot-4, she also drained 45 3-pointers for Oregon in 2019-20 while notching a career-best average of 2.3 assists per game.





WNBA Draft Profile: UCLA guard Japreece Dean ready to lead at the next level

UCLA guard Japreece Dean is primed to shine at the next level as she heads to the WNBA Draft in April. The do-it-all point-woman was an All-Pac-12 honoree last season, and one of only seven D-1 hoopers with at least 13 points and 5.5 assists per game.





Inside Sabrina Ionescu and Ruthy Hebard's lasting bond in a quick look at 'Our Stories'

Learn how Oregon stars Sabrina Ionescu and Ruthy Hebard developed a lasting bond as college freshmen and carried that through storied four-year careers for the Ducks. Watch "Our Stories Unfinished Business: Sabrina Ionescu and Ruthy Hebard" debuting Wednesday, April 15 at 7 p.m. PT/ 8 p.m. MT on Pac-12 Network.





Detroit Mercy hires Gilbert as women's basketball coach

DETROIT (AP) -- Detroit Mercy hired AnnMarie Gilbert as women’s basketball coach.





A Star Wars look at Sabrina Ionescu's Oregon accolades

See some of Sabrina Ionescu's remarkable accomplishments at Oregon set to the Star Wars opening crawl.





UCLA's Natalie Chou on her role models, inspiring Asian-American girls in basketball

Pac-12 Networks' Mike Yam has a conversation with UCLA's Natalie Chou during Wednesday's "Pac-12 Perspective" podcast. Chou reflects on her role models, passion for basketball and how her mom has made a big impact on her hoops career.





Stanford's Tara VanDerveer on Haley Jones' versatile freshman year: 'It was really incredible'

During Friday's "Pac-12 Perspective," Stanford head coach Tara VanDerveer spoke about Haley Jones' positionless game and how the Cardinal used the dynamic freshman in 2019-20. Download and listen wherever you get your podcasts.





Pac-12 women's basketball student-athletes reflect on the influence of their moms ahead of Mother's Day

Pac-12 student-athletes give shout-outs to their moms ahead of Mother's Day on May 10, 2020, including UCLA's Michaela Onyenwere, Oregon's Sabrina Ionescu and Satou Sabally, Arizona's Aari McDonald, Cate Reese and Lacie Hull, Stanford's Kiana Williams, USC's Endyia Rogers and Aliyah Jeune, and Utah's Brynna Maxwell.





The limiting behavior of isotonic and convex regression estimators when the model is misspecified

Eunji Lim.

Source: Electronic Journal of Statistics, Volume 14, Number 1, 2053--2097.

Abstract:
We study the asymptotic behavior of the least squares estimators when the model is possibly misspecified. We consider the setting where we wish to estimate an unknown function $f_{*}:(0,1)^{d}\rightarrow \mathbb{R}$ from observations $(X,Y),(X_{1},Y_{1}),\cdots ,(X_{n},Y_{n})$; our estimator $\hat{g}_{n}$ is the minimizer of $\sum_{i=1}^{n}(Y_{i}-g(X_{i}))^{2}/n$ over $g\in \mathcal{G}$ for some set of functions $\mathcal{G}$. We provide sufficient conditions on the metric entropy of $\mathcal{G}$, under which $\hat{g}_{n}$ converges to $g_{*}$ as $n\rightarrow \infty$, where $g_{*}$ is the minimizer of $\|g-f_{*}\|\triangleq \mathbb{E}(g(X)-f_{*}(X))^{2}$ over $g\in \mathcal{G}$. As corollaries of our theorem, we establish $\|\hat{g}_{n}-g_{*}\|\rightarrow 0$ as $n\rightarrow \infty$ when $\mathcal{G}$ is the set of monotone functions or the set of convex functions. We also make a connection between the convergence rate of $\|\hat{g}_{n}-g_{*}\|$ and the metric entropy of $\mathcal{G}$. As special cases of our finding, we compute the convergence rate of $\|\hat{g}_{n}-g_{*}\|^{2}$ when $\mathcal{G}$ is the set of bounded monotone functions or the set of bounded convex functions.
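As a concrete illustration of this projection phenomenon, the following sketch (not from the paper; it uses scikit-learn's IsotonicRegression and a hypothetical sine-shaped $f_{*}$) fits least squares over monotone functions to data whose true regression function is not monotone, so the fit approaches the best monotone approximation $g_{*}$ rather than $f_{*}$.

```python
# Sketch (not the authors' code): least squares over monotone functions when the
# true regression function is NOT monotone. The fit approximates g*, the L2
# projection of f* onto the monotone class, rather than f* itself.
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)
n = 5000
x = rng.uniform(0, 1, n)
f_star = np.sin(3 * np.pi * x)          # misspecified: f* is not monotone
y = f_star + rng.normal(scale=0.3, size=n)

iso = IsotonicRegression()              # minimizes sum (y_i - g(x_i))^2 over monotone g
g_hat = iso.fit_transform(x, y)

# The distance to f* does not vanish, reflecting convergence to the best
# monotone approximation g* rather than to f* itself.
print("mean squared distance to f*:", np.mean((g_hat - f_star) ** 2))
```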





Nonparametric confidence intervals for conditional quantiles with large-dimensional covariates

Laurent Gardes.

Source: Electronic Journal of Statistics, Volume 14, Number 1, 661--701.

Abstract:
The first part of the paper is dedicated to the construction of a $\gamma$-nonparametric confidence interval for a conditional quantile with a level depending on the sample size. When this level tends to 0 or 1 as the sample size increases, the conditional quantile is said to be extreme and is located in the tail of the conditional distribution. The proposed confidence interval is constructed by approximating the distribution of the order statistics selected with a nearest neighbor approach by a Beta distribution. We show that its coverage probability converges to the preselected probability $\gamma$, and its accuracy is illustrated in a simulation study. When the dimension of the covariate increases, the coverage probability of the confidence interval can be very different from $\gamma$. This is a well-known consequence of data sparsity, especially in the tail of the distribution. In the second part, a dimension reduction procedure is proposed in order to select more appropriate nearest neighbors in the right tail of the distribution and, in turn, to obtain a better coverage probability for extreme conditional quantiles. This procedure is based on the Tail Conditional Independence assumption introduced in (Gardes, Extremes, 18(3), pp. 57–95, 2018).





Statistical convergence of the EM algorithm on Gaussian mixture models

Ruofei Zhao, Yuanzhi Li, Yuekai Sun.

Source: Electronic Journal of Statistics, Volume 14, Number 1, 632--660.

Abstract:
We study the convergence behavior of the Expectation Maximization (EM) algorithm on Gaussian mixture models with an arbitrary number of mixture components and mixing weights. We show that as long as the means of the components are separated by at least $\Omega(\sqrt{\min\{M,d\}})$, where $M$ is the number of components and $d$ is the dimension, the EM algorithm converges locally to the global optimum of the log-likelihood. Further, we show that the convergence rate is linear and characterize the size of the basin of attraction to the global optimum.
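For readers who want the algorithm being analyzed in concrete form, here is a minimal, self-contained EM sketch for a one-dimensional mixture with known unit variances; the component means and the initialization are hypothetical choices for illustration, not the paper's general setting.

```python
# Minimal EM sketch for a 1-D Gaussian mixture with known unit variances
# (illustrative only; the paper studies the general d-dimensional, M-component case).
import numpy as np

rng = np.random.default_rng(1)
true_means = np.array([-3.0, 0.0, 3.0])          # well-separated components (assumed)
z = rng.integers(0, 3, size=2000)
x = rng.normal(loc=true_means[z], scale=1.0)

mu = np.array([-1.0, 0.5, 2.0])                  # initialization inside the basin
w = np.ones(3) / 3.0
for _ in range(100):
    # E-step: posterior responsibilities under the current parameters
    log_dens = -0.5 * (x[:, None] - mu[None, :]) ** 2 + np.log(w)[None, :]
    log_dens -= log_dens.max(axis=1, keepdims=True)
    resp = np.exp(log_dens)
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: update mixing weights and means
    nk = resp.sum(axis=0)
    w = nk / len(x)
    mu = (resp * x[:, None]).sum(axis=0) / nk

print("estimated means:", np.sort(mu))
```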





Generalised cepstral models for the spectrum of vector time series

Maddalena Cavicchioli.

Source: Electronic Journal of Statistics, Volume 14, Number 1, 605--631.

Abstract:
The paper treats the modeling of stationary multivariate stochastic processes via a frequency domain model expressed in terms of cepstrum theory. The proposed model nests the vector exponential model of [20] as a special case, and extends the generalised cepstral model of [36] to the multivariate setting, answering a question raised by the latter authors in their paper. At the same time, we extend the notion of generalised autocovariance function of [35] to vector time series. We then derive explicit matrix formulas connecting generalised cepstral and autocovariance matrices of the process, and prove the consistency and asymptotic properties of the Whittle likelihood estimators of the model parameters. Asymptotic theory for the special case of the vector exponential model is a significant addition to the paper of [20]. We also provide mathematical machinery, based on matrix differentiation, and computational methods to derive our results, which differ significantly from those employed in the univariate case. The utility of the proposed model is illustrated through a Monte Carlo simulation from a bivariate process characterized by a high dynamic range, and an empirical application on time-varying minimum variance hedge ratios through the second moments of futures and spot prices in the corn commodity market.





On the Letac-Massam conjecture and existence of high dimensional Bayes estimators for graphical models

Emanuel Ben-David, Bala Rajaratnam.

Source: Electronic Journal of Statistics, Volume 14, Number 1, 580--604.

Abstract:
The Wishart distribution defined on the open cone of positive-definite matrices plays a central role in multivariate analysis and multivariate distribution theory. Its domain of parameters is often referred to as the Gindikin set. In recent years, a variety of useful extensions of the Wishart distribution have been proposed in the literature for the purposes of studying Markov random fields and graphical models. In particular, generalizations of the Wishart distribution, referred to as Type I and Type II (graphical) Wishart distributions, introduced by Letac and Massam in the Annals of Statistics (2007), play important roles in both frequentist and Bayesian inference for Gaussian graphical models. These distributions have been especially useful in high-dimensional settings due to the flexibility offered by their multiple-shape parameters. A conjecture of Letac and Massam concerns the domain of the multiple-shape parameters of these distributions. The conjecture also has implications for the existence of Bayes estimators corresponding to these high-dimensional priors. The conjecture, first posed in the Annals of Statistics, has now been an open problem for about 10 years. In this paper, we give a necessary condition for the Letac and Massam conjecture to hold. More precisely, we prove that if the Letac and Massam conjecture holds on a decomposable graph, then no two separators of the graph can be nested within each other. For this, we analyze Type I and Type II Wishart distributions on appropriate Markov equivalent perfect DAG models and succeed in deriving the aforementioned necessary condition. This condition in particular identifies a class of counterexamples to the conjecture.





Consistent model selection criteria and goodness-of-fit test for common time series models

Jean-Marc Bardet, Kare Kamila, William Kengne.

Source: Electronic Journal of Statistics, Volume 14, Number 1, 2009--2052.

Abstract:
This paper studies the model selection problem in a large class of causal time series models, which includes ARMA and AR($\infty$) processes, as well as GARCH, ARCH($\infty$), APARCH, ARMA-GARCH and many other processes. To tackle this issue, we consider a penalized contrast based on the quasi-likelihood of the model. We provide sufficient conditions on the penalty term to ensure the consistency of the proposed procedure as well as the consistency and the asymptotic normality of the quasi-maximum likelihood estimator of the chosen model. We also propose a tool for diagnosing the goodness-of-fit of the chosen model based on a Portmanteau test. Monte Carlo experiments and numerical applications on illustrative examples are performed to highlight the obtained asymptotic results. Moreover, using a data-driven choice of the penalty, they show the practical efficiency of this new model selection procedure and Portmanteau test.
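The penalized-contrast idea can be illustrated with a familiar stand-in: the sketch below selects ARMA orders by minimizing BIC with statsmodels, which is one concrete instance of a penalized (quasi-)likelihood criterion, not the paper's general procedure or its Portmanteau test.

```python
# Hedged illustration: order selection for an ARMA model by minimizing a
# penalized likelihood criterion (here BIC), a stand-in for the paper's
# general penalized contrast.
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.tsa.arima.model import ARIMA

np.random.seed(2)
# Simulate an ARMA(1,1): (1 - 0.6 L) y_t = (1 + 0.4 L) e_t
y = ArmaProcess(ar=[1, -0.6], ma=[1, 0.4]).generate_sample(nsample=800)

best = None
for p in range(3):
    for q in range(3):
        res = ARIMA(y, order=(p, 0, q)).fit()
        if best is None or res.bic < best[0]:
            best = (res.bic, p, q)

print("selected (p, q):", best[1:], "BIC:", round(best[0], 1))
```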





Univariate mean change point detection: Penalization, CUSUM and optimality

Daren Wang, Yi Yu, Alessandro Rinaldo.

Source: Electronic Journal of Statistics, Volume 14, Number 1, 1917--1961.

Abstract:
The problem of univariate mean change point detection and localization based on a sequence of $n$ independent observations with piecewise constant means has been intensively studied for more than half a century, and serves as a blueprint for change point problems in more complex settings. We provide a complete characterization of this classical problem in a general framework in which the upper bound $\sigma^{2}$ on the noise variance, the minimal spacing $\Delta$ between two consecutive change points and the minimal magnitude $\kappa$ of the changes are allowed to vary with $n$. We first show that consistent localization of the change points is impossible in the low signal-to-noise ratio regime $\frac{\kappa\sqrt{\Delta}}{\sigma}\preceq \sqrt{\log(n)}$. In contrast, when $\frac{\kappa\sqrt{\Delta}}{\sigma}$ diverges with $n$ at a rate of at least $\sqrt{\log(n)}$, we demonstrate that two computationally efficient change point estimators, one based on the solution to an $\ell_{0}$-penalized least squares problem and the other on the popular wild binary segmentation algorithm, are both consistent and achieve a localization rate of the order $\frac{\sigma^{2}}{\kappa^{2}}\log(n)$. We further show that this rate is minimax optimal, up to a $\log(n)$ term.
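Wild binary segmentation and the $\ell_{0}$ approach both build on the basic CUSUM scan; the following sketch (illustrative only: a single change point and hypothetical parameter values) computes the classical CUSUM statistic and locates its maximizer.

```python
# Sketch of the classical CUSUM statistic for a single mean change in a
# univariate sequence (the paper handles multiple change points and an
# l0-penalized estimator as well; this only illustrates the basic scan).
import numpy as np

rng = np.random.default_rng(3)
n, tau, kappa, sigma = 400, 150, 1.0, 1.0
y = rng.normal(0.0, sigma, n)
y[tau:] += kappa                                   # one mean shift after index tau

def cusum(y):
    """CUSUM statistic at every candidate split point 1 <= t <= n-1."""
    n = len(y)
    t = np.arange(1, n)
    left = np.cumsum(y)[:-1]                       # sum of y_1..y_t
    right = y.sum() - left                         # sum of y_{t+1}..y_n
    return np.sqrt(t * (n - t) / n) * np.abs(left / t - right / (n - t))

stat = cusum(y)
print("estimated change point:", int(np.argmax(stat)) + 1, "(true:", tau, ")")
```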





Asymptotics and optimal bandwidth for nonparametric estimation of density level sets

Wanli Qiao.

Source: Electronic Journal of Statistics, Volume 14, Number 1, 302--344.

Abstract:
Bandwidth selection is crucial in the kernel estimation of density level sets. A risk based on the symmetric difference between the estimated and true level sets is usually used to measure their proximity. In this paper we provide an asymptotic $L^{p}$ approximation to this risk, where $p$ is characterized by the weight function in the risk. In particular the excess risk corresponds to an $L^{2}$ type of risk, and is adopted to derive an optimal bandwidth for nonparametric level set estimation of $d$-dimensional density functions ($d\geq 1$). A direct plug-in bandwidth selector is developed for kernel density level set estimation and its efficacy is verified in numerical studies.
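A plug-in level set estimate is easy to sketch: estimate the density with a kernel estimator and threshold it at the target level. The example below uses scipy's gaussian_kde with Scott's rule bandwidth on a hypothetical bivariate Gaussian sample; the paper's contribution, the risk-optimal bandwidth selector, is not implemented here.

```python
# Plug-in sketch of kernel density level set estimation in 2-D: estimate the
# density on a grid, then threshold at level c. Bandwidth is Scott's rule,
# not the paper's optimal selector.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(4)
x = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], size=2000).T  # shape (2, n)

kde = gaussian_kde(x)
grid = np.linspace(-4, 4, 200)
xx, yy = np.meshgrid(grid, grid)
dens = kde(np.vstack([xx.ravel(), yy.ravel()])).reshape(xx.shape)

c = 0.05                                   # target density level (assumed)
level_set = dens >= c                      # estimated upper level set {f_hat >= c}
print("fraction of grid in estimated level set:", level_set.mean().round(3))
```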





Bayesian variance estimation in the Gaussian sequence model with partial information on the means

Gianluca Finocchio, Johannes Schmidt-Hieber.

Source: Electronic Journal of Statistics, Volume 14, Number 1, 239--271.

Abstract:
Consider the Gaussian sequence model under the additional assumption that a fixed fraction of the means is known. We study the problem of variance estimation from a frequentist Bayesian perspective. The maximum likelihood estimator (MLE) for $\sigma^{2}$ is biased and inconsistent. This raises the question whether the posterior is able to correct the MLE in this case. By developing a new proof strategy that uses refined properties of the posterior distribution, we find that the marginal posterior is inconsistent for any i.i.d. prior on the mean parameters. In particular, no assumption on the decay of the prior needs to be imposed. Surprisingly, we also find that consistency can be retained for a hierarchical prior based on Gaussian mixtures. In this case we also establish a limiting shape result and determine the limit distribution. In contrast to the classical Bernstein-von Mises theorem, the limit is non-Gaussian. We show that the Bayesian analysis leads to new statistical estimators outperforming the correctly calibrated MLE in a numerical simulation study.





Perspective maximum likelihood-type estimation via proximal decomposition

Patrick L. Combettes, Christian L. Müller.

Source: Electronic Journal of Statistics, Volume 14, Number 1, 207--238.

Abstract:
We introduce a flexible optimization model for maximum likelihood-type estimation (M-estimation) that encompasses and generalizes a large class of existing statistical models, including Huber’s concomitant M-estimator, Owen’s Huber/Berhu concomitant estimator, the scaled lasso, support vector machine regression, and penalized estimation with structured sparsity. The model, termed perspective M-estimation, leverages the observation that convex M-estimators with concomitant scale as well as various regularizers are instances of perspective functions, a construction that extends a convex function to a jointly convex one in terms of an additional scale variable. These nonsmooth functions are shown to be amenable to proximal analysis, which leads to principled and provably convergent optimization algorithms via proximal splitting. We derive novel proximity operators for several perspective functions of interest via a geometrical approach based on duality. We then devise a new proximal splitting algorithm to solve the proposed M-estimation problem and establish the convergence of both the scale and regression iterates it produces to a solution. Numerical experiments on synthetic and real-world data illustrate the broad applicability of the proposed framework.





Exact recovery in block spin Ising models at the critical line

Matthias Löwe, Kristina Schubert.

Source: Electronic Journal of Statistics, Volume 14, Number 1, 1796--1815.

Abstract:
We show how to exactly reconstruct the block structure at the critical line in the so-called Ising block model. This model was recently re-introduced by Berthet, Rigollet and Srivastava in [2]. There the authors show how to exactly reconstruct blocks away from the critical line and they give an upper and a lower bound on the number of observations one needs; thereby they establish a minimax optimal rate (up to constants). Our technique relies on a combination of their methods with fluctuation results obtained in [20]. The latter are extended to the full critical regime. We find that the number of necessary observations depends on whether the interaction parameter between two blocks is positive or negative: In the first case, there are about $N\log N$ observations required to exactly recover the block structure, while in the latter case $\sqrt{N}\log N$ observations suffice.





Efficient estimation in expectile regression using envelope models

Tuo Chen, Zhihua Su, Yi Yang, Shanshan Ding.

Source: Electronic Journal of Statistics, Volume 14, Number 1, 143--173.

Abstract:
As a generalization of the classical linear regression, expectile regression (ER) explores the relationship between the conditional expectile of a response variable and a set of predictor variables. ER with respect to different expectile levels can provide a comprehensive picture of the conditional distribution of the response variable given the predictors. We adopt an efficient estimation method called the envelope model ([8]) in ER, and construct a novel envelope expectile regression (EER) model. Estimation of the EER parameters can be performed using the generalized method of moments (GMM). We establish the consistency and derive the asymptotic distribution of the EER estimators. In addition, we show that the EER estimators are asymptotically more efficient than the ER estimators. Numerical experiments and real data examples are provided to demonstrate the efficiency gains attained by EER compared to ER, and the efficiency gains can further lead to improvements in prediction.
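As a point of reference for the ER baseline, the sketch below fits linear expectile regression by iteratively reweighted least squares on simulated data with hypothetical coefficients; the envelope-based EER estimator and its GMM machinery are not shown.

```python
# Minimal sketch of plain expectile regression (ER) via iteratively reweighted
# least squares; the paper's envelope EER estimator is a more efficient
# refinement that is not implemented here.
import numpy as np

def expectile_regression(X, y, tau=0.8, n_iter=50):
    """Linear expectile regression: minimize sum |tau - 1{r<0}| * r^2."""
    Xd = np.column_stack([np.ones(len(y)), X])     # add intercept
    beta = np.linalg.lstsq(Xd, y, rcond=None)[0]   # OLS starting value
    for _ in range(n_iter):
        r = y - Xd @ beta
        w = np.where(r >= 0, tau, 1 - tau)         # asymmetric squared-loss weights
        W = Xd * w[:, None]
        beta = np.linalg.solve(Xd.T @ W, W.T @ y)  # weighted least squares step
    return beta

rng = np.random.default_rng(5)
X = rng.normal(size=(1000, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=1.0, size=1000)
print("tau=0.5 (mean) fit:", expectile_regression(X, y, tau=0.5).round(2))
print("tau=0.9 fit:       ", expectile_regression(X, y, tau=0.9).round(2))
```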





Nonparametric false discovery rate control for identifying simultaneous signals

Sihai Dave Zhao, Yet Tien Nguyen.

Source: Electronic Journal of Statistics, Volume 14, Number 1, 110--142.

Abstract:
It is frequently of interest to identify simultaneous signals, defined as features that exhibit statistical significance across each of several independent experiments. For example, genes that are consistently differentially expressed across experiments in different animal species can reveal evolutionarily conserved biological mechanisms. However, in some problems the test statistics corresponding to these features can have complicated or unknown null distributions. This paper proposes a novel nonparametric false discovery rate control procedure that can identify simultaneous signals even without knowing these null distributions. The method is shown, theoretically and in simulations, to asymptotically control the false discovery rate. It was also used to identify genes that were both differentially expressed and proximal to differentially accessible chromatin in the brains of mice exposed to a conspecific intruder. The proposed method is available in the R package ssa at github.com/sdzhao/ssa.





Model-based clustering with envelopes

Wenjing Wang, Xin Zhang, Qing Mai.

Source: Electronic Journal of Statistics, Volume 14, Number 1, 82--109.

Abstract:
Clustering analysis is an important unsupervised learning technique in multivariate statistics and machine learning. In this paper, we propose a set of new mixture models called CLEMM (short for Clustering with Envelope Mixture Models) that is based on the widely used Gaussian mixture model assumptions and the nascent research area of envelope methodology. Formulated mostly for regression models, envelope methodology aims for simultaneous dimension reduction and efficient parameter estimation, and includes a very recent formulation of the envelope discriminant subspace for classification and discriminant analysis. Motivated by the envelope discriminant subspace pursuit in classification, we consider parsimonious probabilistic mixture models where the cluster analysis can be improved by projecting the data onto a latent lower-dimensional subspace. The proposed CLEMM framework and the associated envelope-EM algorithms thus provide foundations for envelope methods in unsupervised and semi-supervised learning problems. Numerical studies on simulated data and two benchmark data sets show significant improvement of our proposed methods over classical methods such as Gaussian mixture models, K-means and hierarchical clustering algorithms. An R package is available at https://github.com/kusakehan/CLEMM.





Non-parametric adaptive estimation of order 1 Sobol indices in stochastic models, with an application to Epidemiology

Gwenaëlle Castellan, Anthony Cousien, Viet Chi Tran.

Source: Electronic Journal of Statistics, Volume 14, Number 1, 50--81.

Abstract:
Global sensitivity analysis is a set of methods aiming at quantifying the contribution of an uncertain input parameter of the model (or a combination of parameters) to the variability of the response. We consider here the estimation of the Sobol indices of order 1, which are commonly used indicators based on a decomposition of the output's variance. In a deterministic framework, when the same inputs always give the same outputs, these indices are usually estimated by replicated simulations of the model. In a stochastic framework, when the response given a set of input parameters is not unique due to randomness in the model, metamodels are often used to approximate the mean and dispersion of the response by deterministic functions. We propose a new non-parametric estimator of the Sobol indices of order 1 that does not require defining a metamodel. The estimator is based on warped wavelets and is adaptive in the regularity of the model. The convergence of the mean square error to zero, as the number of simulations of the model tends to infinity, is established, and an elbow effect is shown, depending on the regularity of the model. Applications in Epidemiology are carried out to illustrate the use of the non-parametric estimators.
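For contrast with the deterministic "replicated simulations" baseline mentioned above, here is the classical pick-freeze estimator of first-order Sobol indices on the standard Ishigami test function; this is not the paper's warped-wavelet estimator for stochastic models.

```python
# Classical pick-freeze estimator of first-order Sobol indices for a
# deterministic test function (Ishigami), shown as the standard baseline.
import numpy as np

def ishigami(x):
    return np.sin(x[:, 0]) + 7 * np.sin(x[:, 1]) ** 2 + 0.1 * x[:, 2] ** 4 * np.sin(x[:, 0])

rng = np.random.default_rng(6)
N, d = 100_000, 3
A = rng.uniform(-np.pi, np.pi, size=(N, d))
B = rng.uniform(-np.pi, np.pi, size=(N, d))
yA = ishigami(A)

for i in range(d):
    C = B.copy()
    C[:, i] = A[:, i]                    # "freeze" coordinate i, resample the rest
    yC = ishigami(C)
    S_i = (np.mean(yA * yC) - np.mean(yA) * np.mean(yC)) / np.var(yA)
    print(f"S_{i + 1} ~= {S_i:.3f}")     # theoretical values: 0.314, 0.442, 0.000
```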





Simultaneous transformation and rounding (STAR) models for integer-valued data

Daniel R. Kowal, Antonio Canale.

Source: Electronic Journal of Statistics, Volume 14, Number 1, 1744--1772.

Abstract:
We propose a simple yet powerful framework for modeling integer-valued data, such as counts, scores, and rounded data. The data-generating process is defined by Simultaneously Transforming and Rounding (STAR) a continuous-valued process, which produces a flexible family of integer-valued distributions capable of modeling zero-inflation, bounded or censored data, and over- or underdispersion. The transformation is modeled as unknown for greater distributional flexibility, while the rounding operation ensures a coherent integer-valued data-generating process. An efficient MCMC algorithm is developed for posterior inference and provides a mechanism for adaptation of successful Bayesian models and algorithms for continuous data to the integer-valued data setting. Using the STAR framework, we design a new Bayesian Additive Regression Tree model for integer-valued data, which demonstrates impressive predictive distribution accuracy for both synthetic data and a large healthcare utilization dataset. For interpretable regression-based inference, we develop a STAR additive model, which offers greater flexibility and scalability than existing integer-valued models. The STAR additive model is applied to study the recent decline in Amazon river dolphins.
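A hedged sketch of one way to simulate STAR-type data: draw a continuous Gaussian latent variable, apply the inverse of a log transformation, and round down to an integer. The specific transformation and regression coefficients below are illustrative assumptions; in the paper the transformation can be unknown and learned from the data.

```python
# Hedged sketch of a STAR-style data-generating process: a continuous Gaussian
# latent variable is transformed (here via exp, i.e. a log transformation of the
# counts) and then rounded down to an integer. The paper's transformation and
# rounding operator are more general than this fixed choice.
import numpy as np

rng = np.random.default_rng(7)
n = 2000
x = rng.normal(size=n)
latent = 0.5 + 0.8 * x + rng.normal(scale=1.0, size=n)   # continuous Gaussian model
y_star = np.exp(latent)                                  # inverse of the log transformation
y = np.floor(y_star).astype(int)                         # rounding to integers

print("share of zeros:", np.mean(y == 0).round(3))       # rounding naturally produces zeros
print("max count:", y.max())
```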





On change-point estimation under Sobolev sparsity

Aurélie Fischer, Dominique Picard.

Source: Electronic Journal of Statistics, Volume 14, Number 1, 1648--1689.

Abstract:
In this paper, we consider the estimation of a change-point for possibly high-dimensional data in a Gaussian model, using a maximum likelihood method. We are interested in how dimension reduction can affect the performance of the method. We provide an estimator of the change-point that has a minimax rate of convergence, up to a logarithmic factor. The minimax rate is in fact composed of a fast rate, which is dimension-invariant, and a slow rate, which increases with the dimension. Moreover, it is proved that, in the case of sparse data with Sobolev regularity, there is a bound on the separation of the regimes above which there exists an optimal choice of dimension reduction, leading to the fast rate of estimation. We propose an adaptive dimension reduction procedure based on Lepski's method and show that the resulting estimator attains the fast rate of convergence. Our results are then illustrated by a simulation study. In particular, practical strategies are suggested to perform dimension reduction.





Asymptotic seed bias in respondent-driven sampling

Yuling Yan, Bret Hanlon, Sebastien Roch, Karl Rohe.

Source: Electronic Journal of Statistics, Volume 14, Number 1, 1577--1610.

Abstract:
Respondent-driven sampling (RDS) collects a sample of individuals in a networked population by incentivizing the sampled individuals to refer their contacts into the sample. This iterative process is initialized from some seed node(s). Sometimes, this selection creates a large amount of seed bias; other times, the seed bias is small. This paper gains a deeper understanding of this bias by characterizing its effect on the limiting distribution of various RDS estimators. Using classical tools and results from multi-type branching processes [12], we show that the seed bias is negligible for the Generalized Least Squares (GLS) estimator and non-negligible for both the inverse probability weighted and Volz-Heckathorn (VH) estimators. In particular, we show that (i) above a critical threshold, the VH estimator converges to a non-trivial mixture distribution, where the mixture component depends on the seed node, and the mixture distribution is possibly multi-modal; moreover, (ii) the GLS estimator converges to a Gaussian distribution independent of the seed node, under a certain condition on the Markov process. Numerical experiments with both simulated data and empirical social networks suggest that these results hold beyond the Markov conditions of the theorems.
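The Volz-Heckathorn estimator referenced above is simple enough to state in a few lines: it is an inverse-degree-weighted sample mean. The sketch below uses toy values for the trait and reported degrees and is not a simulation of the paper's branching-process asymptotics.

```python
# Sketch of the Volz-Heckathorn (RDS-II) estimator, which reweights sampled
# individuals by the inverse of their reported network degree. The paper shows
# its limit can depend on the seed node, unlike the GLS estimator.
import numpy as np

def volz_heckathorn(y, degrees):
    """Inverse-degree-weighted mean of the trait y over the RDS sample."""
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.asarray(degrees, dtype=float)
    return np.sum(w * y) / np.sum(w)

# Toy sample: trait indicator and reported degrees for 6 recruited individuals.
y = [1, 0, 1, 1, 0, 1]
degrees = [10, 2, 8, 5, 3, 20]
print("VH estimate:", round(volz_heckathorn(y, degrees), 3))
print("unweighted mean:", round(float(np.mean(y)), 3))
```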





Nonconcave penalized estimation in sparse vector autoregression model

Xuening Zhu.

Source: Electronic Journal of Statistics, Volume 14, Number 1, 1413--1448.

Abstract:
High-dimensional time series have received considerable attention recently; their temporal and cross-sectional dependency can be captured by the vector autoregression (VAR) model. To tackle the high dimensionality, penalization methods are widely employed. However, the existing theoretical studies of penalization methods mainly focus on i.i.d. data and therefore cannot quantify the effect of the dependence level on the convergence rate. In this work, we use the spectral properties of the time series to quantify the dependence and derive a nonasymptotic upper bound for the estimation errors. By focusing on nonconcave penalization methods, we establish the oracle properties of the penalized VAR model estimation while accounting for the effects of temporal and cross-sectional dependence. Extensive numerical studies are conducted to compare the finite sample performance using different penalization functions. Lastly, an air pollution dataset from mainland China is analyzed for illustration purposes.





Computing the degrees of freedom of rank-regularized estimators and cousins

Rahul Mazumder, Haolei Weng.

Source: Electronic Journal of Statistics, Volume 14, Number 1, 1348--1385.

Abstract:
Estimating a low rank matrix from its linear measurements is a problem of central importance in contemporary statistical analysis. The choice of tuning parameters for estimators remains an important challenge from a theoretical and practical perspective. Stein's Unbiased Risk Estimate (SURE) provides a well-grounded statistical framework for degrees of freedom estimation. In this paper, we use the SURE framework to obtain degrees of freedom estimates for a general class of spectral regularized matrix estimators; our results generalize beyond the class of estimators that have been studied thus far. To this end, we use a result due to Shapiro (2002) pertaining to the differentiability of symmetric matrix valued functions, developed in the context of semidefinite optimization algorithms. We rigorously verify the applicability of Stein's Lemma towards the derivation of degrees of freedom estimates, and also present new techniques based on Gaussian convolution to estimate the degrees of freedom of a class of spectral estimators for which Stein's Lemma does not directly apply.





Rate optimal Chernoff bound and application to community detection in the stochastic block models

Zhixin Zhou, Ping Li.

Source: Electronic Journal of Statistics, Volume 14, Number 1, 1302--1347.

Abstract:
The Chernoff coefficient is known to be an upper bound on the Bayes error probability in classification problems. In this paper, we develop a rate-optimal Chernoff bound on the Bayes error probability. The new bound is not only an upper bound but also a lower bound on the Bayes error probability, up to a constant factor. Moreover, we apply this result to community detection in stochastic block models. As a clustering problem, the optimal misclassification rate of the community detection problem can be characterized by our rate-optimal Chernoff bound. This is formalized by deriving a minimax error rate over a certain parameter space of stochastic block models, and then achieving this error rate with a feasible algorithm employing multiple steps of EM-type updates.





Consistency and asymptotic normality of Latent Block Model estimators

Vincent Brault, Christine Keribin, Mahendra Mariadassou.

Source: Electronic Journal of Statistics, Volume 14, Number 1, 1234--1268.

Abstract:
The Latent Block Model (LBM) is a model-based method to simultaneously cluster the $d$ columns and $n$ rows of a data matrix. Parameter estimation in the LBM is a difficult and multifaceted problem. Although various estimation strategies have been proposed and are now well understood empirically, theoretical guarantees about their asymptotic behavior are rather sparse and most results are limited to the binary setting. We prove here theoretical guarantees in the valued settings. We show that under some mild conditions on the parameter space, and in an asymptotic regime where $\log(d)/n$ and $\log(n)/d$ tend to $0$ when $n$ and $d$ tend to infinity, (1) the maximum-likelihood estimate of the complete model (with known labels) is consistent and (2) the log-likelihood ratios are equivalent under the complete and observed (with unknown labels) models. This equivalence allows us to transfer the asymptotic consistency, and under mild conditions asymptotic normality, to the maximum likelihood estimate under the observed model. Moreover, the variational estimator is also consistent and, under the same conditions, asymptotically normal.





Conditional density estimation with covariate measurement error

Xianzheng Huang, Haiming Zhou.

Source: Electronic Journal of Statistics, Volume 14, Number 1, 970--1023.

Abstract:
We consider estimating the density of a response conditioning on an error-prone covariate. Motivated by two existing kernel density estimators in the absence of covariate measurement error, we propose a method to correct the existing estimators for measurement error. Asymptotic properties of the resultant estimators under different types of measurement error distributions are derived. Moreover, we adjust bandwidths readily available from existing bandwidth selection methods developed for error-free data to obtain bandwidths for the new estimators. Extensive simulation studies are carried out to compare the proposed estimators with naive estimators that ignore measurement error, which also provide empirical evidence for the effectiveness of the proposed bandwidth selection methods. A real-life data example is used to illustrate implementation of these methods under practical scenarios. An R package, lpme, is developed for implementing all considered methods, which we demonstrate via an R code example in Appendix B.2.





On the distribution, model selection properties and uniqueness of the Lasso estimator in low and high dimensions

Karl Ewald, Ulrike Schneider.

Source: Electronic Journal of Statistics, Volume 14, Number 1, 944--969.

Abstract:
We derive expressions for the finite-sample distribution of the Lasso estimator in the context of a linear regression model in low as well as in high dimensions by exploiting the structure of the optimization problem defining the estimator. In low dimensions, we assume full rank of the regressor matrix and present expressions for the cumulative distribution function as well as the densities of the absolutely continuous parts of the estimator. Our results are presented for the case of normally distributed errors, but do not hinge on this assumption and can easily be generalized. Additionally, we establish an explicit formula for the correspondence between the Lasso and the least-squares estimator. We derive analogous results for the distribution in less explicit form in high dimensions where we make no assumptions on the regressor matrix at all. In this setting, we also investigate the model selection properties of the Lasso and show that possibly only a subset of models might be selected by the estimator, completely independently of the observed response vector. Finally, we present a condition for uniqueness of the estimator that is necessary as well as sufficient.
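For intuition about the point mass at zero and the Lasso/least-squares correspondence, the following sketch works in the special orthonormal-design, single-coefficient case, where the Lasso is an explicit soft-thresholding of the OLS estimate; the constants and penalty scaling are illustrative assumptions, and the paper's general finite-sample formulas are not reproduced here.

```python
# Intuition-only sketch for the orthonormal-design, single-coefficient case:
# there the Lasso is a soft-thresholded version of the least-squares estimator,
# which makes the point mass at zero in its finite-sample distribution visible.
import numpy as np

def soft_threshold(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

rng = np.random.default_rng(8)
n, reps, beta, lam, sigma = 50, 10_000, 0.1, 0.15, 1.0
# OLS estimate of a single coefficient: approximately N(beta, sigma^2 / n)
ols = rng.normal(loc=beta, scale=sigma / np.sqrt(n), size=reps)
lasso = soft_threshold(ols, lam)          # Lasso with a suitably scaled penalty

print("P(Lasso estimate exactly 0):", np.mean(lasso == 0.0).round(3))
print("mean of the nonzero part:", lasso[lasso != 0].mean().round(3))
```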





Reduction problems and deformation approaches to nonstationary covariance functions over spheres

Emilio Porcu, Rachid Senoussi, Enner Mendoza, Moreno Bevilacqua.

Source: Electronic Journal of Statistics, Volume 14, Number 1, 890--916.

Abstract:
The paper considers reduction problems and deformation approaches for nonstationary covariance functions on the $(d-1)$-dimensional spheres, $\mathbb{S}^{d-1}$, embedded in the $d$-dimensional Euclidean space. Given a covariance function $C$ on $\mathbb{S}^{d-1}$, we chase a pair $(R,\Psi)$, for a function $R:[-1,+1]\to \mathbb{R}$ and a smooth bijection $\Psi$, such that $C$ can be reduced to a geodesically isotropic one: $C(\mathbf{x},\mathbf{y})=R(\langle \Psi(\mathbf{x}),\Psi(\mathbf{y})\rangle)$, with $\langle \cdot ,\cdot \rangle$ denoting the dot product. The problem finds motivation in recent statistical literature devoted to the analysis of global phenomena, defined typically over the sphere of $\mathbb{R}^{3}$. The application domains considered in the manuscript make the problem mathematically challenging. We show the uniqueness of the representation in the reduction problem. Then, under some regularity assumptions, we provide an inversion formula to recover the bijection $\Psi$, when it exists, for a given $C$. We also give sufficient conditions for reducibility.





Estimation of a semiparametric transformation model: A novel approach based on least squares minimization

Benjamin Colling, Ingrid Van Keilegom.

Source: Electronic Journal of Statistics, Volume 14, Number 1, 769--800.

Abstract:
Consider the following semiparametric transformation model $\Lambda_{\theta}(Y)=m(X)+\varepsilon$, where $X$ is a $d$-dimensional covariate, $Y$ is a univariate response variable and $\varepsilon$ is an error term with zero mean and independent of $X$. We assume that $m$ is an unknown regression function and that $\{\Lambda_{\theta}:\theta\in\Theta\}$ is a parametric family of strictly increasing functions. Our goal is to develop two new estimators of the transformation parameter $\theta$. The main idea of these two estimators is to minimize, with respect to $\theta$, the $L_{2}$-distance between the transformation $\Lambda_{\theta}$ and one of its fully nonparametric estimators. We consider in particular the nonparametric estimator based on the least-absolute deviation loss constructed in Colling and Van Keilegom (2019). We establish the consistency and the asymptotic normality of the two proposed estimators of $\theta$. We also carry out a simulation study to illustrate and compare the performance of our new parametric estimators to that of the profile likelihood estimator constructed in Linton et al. (2008).





Detection of sparse positive dependence

Ery Arias-Castro, Rong Huang, Nicolas Verzelen.

Source: Electronic Journal of Statistics, Volume 14, Number 1, 702--730.

Abstract:
In a bivariate setting, we consider the problem of detecting a sparse contamination or mixture component, where the effect manifests itself as a positive dependence between the variables, which are otherwise independent in the main component. We first look at this problem in the context of a normal mixture model. In essence, the situation reduces to a univariate setting where the effect is a decrease in variance. In particular, a higher criticism test based on the pairwise differences is shown to achieve the detection boundary defined by the (oracle) likelihood ratio test. We then turn to a Gaussian copula model where the marginal distributions are unknown. Standard invariance considerations lead us to consider rank tests. In fact, a higher criticism test based on the pairwise rank differences achieves the detection boundary in the normal mixture model, although not in the very sparse regime. We do not know of any rank test that has any power in that regime.
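The higher criticism statistic applied to p-values built from the pairwise differences can be sketched as follows; the calibration and the mixture parameters are simplified, illustrative choices rather than the paper's exact construction.

```python
# Hedged sketch of a higher criticism (HC) test on p-values computed from the
# pairwise differences, mirroring the construction described in the abstract
# for the normal mixture model (calibration details simplified).
import numpy as np
from scipy.stats import chi2

def higher_criticism(pvals, alpha0=0.5):
    p = np.sort(pvals)
    n = len(p)
    i = np.arange(1, n + 1)
    hc = np.sqrt(n) * (i / n - p) / np.sqrt(p * (1 - p) + 1e-12)
    return hc[: int(alpha0 * n)].max()

rng = np.random.default_rng(9)
n, eps, rho = 10_000, 0.02, 0.8
# Null pairs are independent N(0,1); a sparse fraction eps is positively correlated.
contaminated = rng.random(n) < eps
x = rng.normal(size=n)
y = np.where(contaminated,
             rho * x + np.sqrt(1 - rho**2) * rng.normal(size=n),
             rng.normal(size=n))
d2 = (x - y) ** 2 / 2.0                      # ~ chi^2_1 under the null
pvals = chi2.cdf(d2, df=1)                   # left tail: small when the variance shrinks
print("HC statistic:", round(higher_criticism(pvals), 2))
```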





A Model of Fake Data in Data-driven Analysis

Data-driven analysis has been increasingly used in various decision-making processes. As more sources, including reviews, news, and pictures, can now be used for data analysis, the authenticity of data sources is in doubt. While previous literature attempted to detect fake data piece by piece, in the current work we try to capture the fake data sender's strategic behavior in order to detect the fake data source. Specifically, we model the tension between a data receiver who makes data-driven decisions and a fake data sender who benefits from misleading the receiver. We propose a potentially infinite horizon, continuous time game-theoretic model with asymmetric information to capture the fact that the receiver does not initially know of the existence of fake data and learns about it during the course of the game. We use point processes to model the data traffic, where each piece of data can occur at any discrete moment in a continuous time flow. We fully solve the model and employ numerical examples to illustrate the players' strategies and payoffs. Specifically, our results show that maintaining some suspicion about the data sources and understanding that the sender can be strategic are very helpful to the data receiver. In addition, based on our model, we propose a methodology for detecting fake data that is complementary to previous studies on this topic, which suggested various approaches to analyzing the data piece by piece. We show that, in addition to analyzing each piece of data, understanding a source by looking at its whole history of pushing data can be helpful.





Universal Latent Space Model Fitting for Large Networks with Edge Covariates

Latent space models are effective tools for statistical modeling and visualization of network data. Due to their close connection to generalized linear models, it is also natural to incorporate covariate information in them. The current paper presents two universal fitting algorithms for networks with edge covariates: one based on nuclear norm penalization and the other based on projected gradient descent. Both algorithms are motivated by maximizing the likelihood function for an existing class of inner-product models, and we establish their statistical rates of convergence for these models. In addition, the theory informs us that both methods work simultaneously for a wide range of different latent space models that allow latent positions to affect edge formation in flexible ways, such as distance models. Furthermore, the effectiveness of the methods is demonstrated on a number of real world network data sets for different statistical tasks, including community detection with and without edge covariates, and network assisted learning.





DESlib: A Dynamic ensemble selection library in Python

DESlib is an open-source Python library providing the implementation of several dynamic selection techniques. The library is divided into three modules: (i) dcs, containing the implementation of dynamic classifier selection (DCS) methods; (ii) des, containing the implementation of dynamic ensemble selection (DES) methods; (iii) static, with the implementation of static ensemble techniques. The library is fully documented (documentation available online on Read the Docs), has high test coverage (codecov.io) and is part of the scikit-learn-contrib supported projects. Documentation, code and examples can be found on its GitHub page: https://github.com/scikit-learn-contrib/DESlib.
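A minimal usage sketch along the lines of the project's documented workflow: train a pool of classifiers with scikit-learn, then fit a DES method (KNORA-E) on a held-out dynamic selection set. Import paths and defaults reflect the library as commonly documented and should be checked against the installed version.

```python
# Usage sketch (check the DESlib docs for the installed version): a bagging pool
# is trained first, then KNORA-E selects a competent ensemble per test sample.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from deslib.des.knora_e import KNORAE

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)
X_train, X_dsel, y_train, y_dsel = train_test_split(X_train, y_train, test_size=0.5, random_state=0)

pool = BaggingClassifier(DecisionTreeClassifier(), n_estimators=10, random_state=0)
pool.fit(X_train, y_train)

des = KNORAE(pool)          # dynamic ensemble selection over the trained pool
des.fit(X_dsel, y_dsel)     # DSEL: the dynamic selection dataset
print("accuracy:", round(des.score(X_test, y_test), 3))
```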





Weighted Message Passing and Minimum Energy Flow for Heterogeneous Stochastic Block Models with Side Information

We study the misclassification error for community detection in general heterogeneous stochastic block models (SBM) with noisy or partial label information. We establish a connection between the misclassification rate and the notion of minimum energy on the local neighborhood of the SBM. We develop an optimally weighted message passing algorithm to reconstruct labels for the SBM based on the minimum energy flow and the eigenvectors of a certain Markov transition matrix. The general SBM considered in this paper allows for unequal-size communities, degree heterogeneity, and different connection probabilities among blocks. We focus on how to optimally weight the message passing to reduce misclassification.





High-Dimensional Interactions Detection with Sparse Principal Hessian Matrix

In statistical learning framework with regressions, interactions are the contributions to the response variable from the products of the explanatory variables. In high-dimensional problems, detecting interactions is challenging due to combinatorial complexity and limited data information. We consider detecting interactions by exploring their connections with the principal Hessian matrix. Specifically, we propose a one-step synthetic approach for estimating the principal Hessian matrix by a penalized M-estimator. An alternating direction method of multipliers (ADMM) is proposed to efficiently solve the encountered regularized optimization problem. Based on the sparse estimator, we detect the interactions by identifying its nonzero components. Our method directly targets at the interactions, and it requires no structural assumption on the hierarchy of the interactions effects. We show that our estimator is theoretically valid, computationally efficient, and practically useful for detecting the interactions in a broad spectrum of scenarios.