
Optimal prediction in the linearly transformed spiked model

Edgar Dobriban, William Leeb, Amit Singer.

Source: The Annals of Statistics, Volume 48, Number 1, 491--513.

Abstract:
We consider the linearly transformed spiked model, where the observations $Y_{i}$ are noisy linear transforms of unobserved signals of interest $X_{i}$: \begin{equation*}Y_{i}=A_{i}X_{i}+\varepsilon_{i},\end{equation*} for $i=1,\ldots,n$. The transform matrices $A_{i}$ are also observed. We model the unobserved signals (or regression coefficients) $X_{i}$ as vectors lying on an unknown low-dimensional space. Given only $Y_{i}$ and $A_{i}$, how should we predict or recover their values? The naive approach of performing regression for each observation separately is inaccurate due to the large noise level. Instead, we develop optimal methods for predicting $X_{i}$ by “borrowing strength” across the different samples. Our linear empirical Bayes methods scale to large datasets and rely on weak moment assumptions. We show that this model has wide-ranging applications in signal processing, deconvolution, cryo-electron microscopy, and missing data with noise. For missing data, we show in simulations that our methods are more robust to noise and to unequal sampling than well-known matrix completion methods.





Uniformly valid confidence intervals post-model-selection

François Bachoc, David Preinerstorfer, Lukas Steinberger.

Source: The Annals of Statistics, Volume 48, Number 1, 440--463.

Abstract:
We suggest general methods to construct asymptotically uniformly valid confidence intervals post-model-selection. The constructions are based on principles recently proposed by Berk et al. (Ann. Statist. 41 (2013) 802–837). In particular, the candidate models used can be misspecified, the target of inference is model-specific, and coverage is guaranteed for any data-driven model selection procedure. After developing a general theory, we apply our methods to practically important situations where the candidate set of models, from which a working model is selected, consists of fixed design homoskedastic or heteroskedastic linear models, or of binary regression models with general link functions. In an extensive simulation study, we find that the proposed confidence intervals perform remarkably well, even when compared to existing methods that are tailored only for specific model selection procedures.





Concentration and consistency results for canonical and curved exponential-family models of random graphs

Michael Schweinberger, Jonathan Stewart.

Source: The Annals of Statistics, Volume 48, Number 1, 374--396.

Abstract:
Statistical inference for exponential-family models of random graphs with dependent edges is challenging. We stress the importance of additional structure and show that additional structure facilitates statistical inference. A simple example of a random graph with additional structure is a random graph with neighborhoods and local dependence within neighborhoods. We develop the first concentration and consistency results for maximum likelihood and $M$-estimators of a wide range of canonical and curved exponential-family models of random graphs with local dependence. All results are nonasymptotic and applicable to random graphs with finite populations of nodes, although asymptotic consistency results can be obtained as well. In addition, we show that additional structure can facilitate subgraph-to-graph estimation, and present concentration results for subgraph-to-graph estimators. As an application, we consider popular curved exponential-family models of random graphs, with local dependence induced by transitivity and parameter vectors whose dimensions depend on the number of nodes.





Testing for principal component directions under weak identifiability

Davy Paindaveine, Julien Remy, Thomas Verdebout.

Source: The Annals of Statistics, Volume 48, Number 1, 324--345.

Abstract:
We consider the problem of testing, on the basis of a $p$-variate Gaussian random sample, the null hypothesis $\mathcal{H}_{0}:\boldsymbol{\theta}_{1}=\boldsymbol{\theta}_{1}^{0}$ against the alternative $\mathcal{H}_{1}:\boldsymbol{\theta}_{1}\neq\boldsymbol{\theta}_{1}^{0}$, where $\boldsymbol{\theta}_{1}$ is the “first” eigenvector of the underlying covariance matrix and $\boldsymbol{\theta}_{1}^{0}$ is a fixed unit $p$-vector. In the classical setup where the eigenvalues $\lambda_{1}>\lambda_{2}\geq\cdots\geq\lambda_{p}$ are fixed, the Anderson (Ann. Math. Stat. 34 (1963) 122–148) likelihood ratio test (LRT) and the Hallin, Paindaveine and Verdebout (Ann. Statist. 38 (2010) 3245–3299) Le Cam optimal test for this problem are asymptotically equivalent under the null hypothesis, hence also under sequences of contiguous alternatives. We show that this equivalence does not survive asymptotic scenarios where $\lambda_{n1}/\lambda_{n2}=1+O(r_{n})$ with $r_{n}=O(1/\sqrt{n})$. For such scenarios, the Le Cam optimal test still asymptotically meets the nominal level constraint, whereas the LRT severely overrejects the null hypothesis. Consequently, the former test should be favored over the latter one whenever the two largest sample eigenvalues are close to each other. By relying on Le Cam’s asymptotic theory of statistical experiments, we study the non-null and optimality properties of the Le Cam optimal test in the aforementioned asymptotic scenarios and show that the null robustness of this test is not obtained at the expense of power. Our asymptotic investigation is extensive in the sense that it allows $r_{n}$ to converge to zero at an arbitrary rate. While we restrict to single-spiked spectra of the form $\lambda_{n1}>\lambda_{n2}=\cdots=\lambda_{np}$ to make our results as striking as possible, we do extend our results to the more general elliptical case. Finally, we present an illustrative real data example.





Sparse high-dimensional regression: Exact scalable algorithms and phase transitions

Dimitris Bertsimas, Bart Van Parys.

Source: The Annals of Statistics, Volume 48, Number 1, 300--323.

Abstract:
We present a novel binary convex reformulation of the sparse regression problem that constitutes a new duality perspective. We devise a new cutting plane method and provide evidence that it can solve the sparse regression problem to provable optimality, in seconds, for sample sizes $n$ and numbers of regressors $p$ in the 100,000s, that is, two orders of magnitude beyond the current state of the art. The ability to solve the problem for very high dimensions allows us to observe new phase transition phenomena. Contrary to traditional complexity theory, which suggests that the difficulty of a problem increases with problem size, the sparse regression problem has the property that, as the number of samples $n$ increases, the problem becomes easier: the solution recovers 100% of the true signal, and our approach solves the problem extremely fast (in fact, faster than Lasso). For small numbers of samples $n$, our approach takes longer to solve the problem, but the optimal solution provides a statistically more relevant regressor. We argue that our exact sparse regression approach presents a superior alternative to heuristic methods available at present.





Bootstrap confidence regions based on M-estimators under nonstandard conditions

Stephen M. S. Lee, Puyudi Yang.

Source: The Annals of Statistics, Volume 48, Number 1, 274--299.

Abstract:
Suppose that a confidence region is desired for a subvector $\theta$ of a multidimensional parameter $\xi=(\theta,\psi)$, based on an M-estimator $\hat{\xi}_{n}=(\hat{\theta}_{n},\hat{\psi}_{n})$ calculated from a random sample of size $n$. Under nonstandard conditions $\hat{\xi}_{n}$ often converges at a nonregular rate $r_{n}$, in which case consistent estimation of the distribution of $r_{n}(\hat{\theta}_{n}-\theta)$, a pivot commonly chosen for confidence region construction, is most conveniently effected by the $m$ out of $n$ bootstrap. The above choice of pivot has three drawbacks: (i) the shape of the region is either subjectively prescribed or controlled by a computationally intensive depth function; (ii) the region is not transformation equivariant; (iii) $\hat{\xi}_{n}$ may not be uniquely defined. To resolve the above difficulties, we propose a one-dimensional pivot derived from the criterion function, and prove that its distribution can be consistently estimated by the $m$ out of $n$ bootstrap, or by a modified version of the perturbation bootstrap. This leads to a new method for constructing confidence regions which are transformation equivariant and have shapes driven solely by the criterion function. A subsampling procedure is proposed for selecting $m$ in practice. Empirical performance of the new method is illustrated with examples drawn from different nonstandard M-estimation settings. Extension of our theory to row-wise independent triangular arrays is also explored.





Statistical inference for model parameters in stochastic gradient descent

Xi Chen, Jason D. Lee, Xin T. Tong, Yichen Zhang.

Source: The Annals of Statistics, Volume 48, Number 1, 251--273.

Abstract:
The stochastic gradient descent (SGD) algorithm has been widely used in statistical estimation for large-scale data due to its computational and memory efficiency. While most existing works focus on the convergence of the objective function or the error of the obtained solution, we investigate the problem of statistical inference of true model parameters based on SGD when the population loss function is strongly convex and satisfies certain smoothness conditions. Our main contributions are twofold. First, in the fixed dimension setup, we propose two consistent estimators of the asymptotic covariance of the average iterate from SGD: (1) a plug-in estimator, and (2) a batch-means estimator, which is computationally more efficient and only uses the iterates from SGD. Both proposed estimators allow us to construct asymptotically exact confidence intervals and hypothesis tests. Second, for high-dimensional linear regression, using a variant of the SGD algorithm, we construct a debiased estimator of each regression coefficient that is asymptotically normal. This gives a one-pass algorithm for computing both the sparse regression coefficients and confidence intervals, which is computationally attractive and applicable to online data.
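
To make the batch-means idea concrete, here is a minimal sketch for averaged SGD on a toy linear regression: the iterates are grouped into equal-size batches and the covariance of the averaged iterate is estimated from the batch means. The model, step-size schedule and batch sizes below are placeholder choices (the paper uses more carefully chosen, growing batch sizes), so this illustrates the idea rather than the authors' exact estimator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear regression data (placeholder model, not from the paper).
n, d = 50_000, 5
beta_true = np.linspace(1.0, 2.0, d)
X = rng.normal(size=(n, d))
y = X @ beta_true + rng.normal(size=n)

# Averaged SGD (Polyak-Ruppert) with a decaying step size (assumed schedule).
theta = np.zeros(d)
iterates = np.empty((n, d))
for i in range(n):
    eta = 0.5 / (i + 1) ** 0.55           # step size
    grad = (X[i] @ theta - y[i]) * X[i]   # stochastic gradient of squared loss
    theta -= eta * grad
    iterates[i] = theta
theta_bar = iterates.mean(axis=0)

# Equal-size batch-means estimate of the asymptotic covariance of theta_bar.
M = 50                      # number of batches (placeholder)
B = n // M                  # batch size
batch_means = iterates[: M * B].reshape(M, B, d).mean(axis=1)
Sigma_hat = B * np.cov(batch_means, rowvar=False)   # approx. Var of sqrt(n) * theta_bar

# Normal-approximation 95% confidence interval for the first coefficient.
se = np.sqrt(Sigma_hat[0, 0] / n)
print(theta_bar[0] - 1.96 * se, theta_bar[0] + 1.96 * se)
```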





Spectral and matrix factorization methods for consistent community detection in multi-layer networks

Subhadeep Paul, Yuguo Chen.

Source: The Annals of Statistics, Volume 48, Number 1, 230--250.

Abstract:
We consider the problem of estimating a consensus community structure by combining information from multiple layers of a multi-layer network, using methods based on spectral clustering or low-rank matrix factorization. As a general theme, these “intermediate fusion” methods involve obtaining a low column rank matrix by optimizing an objective function and then using the columns of the matrix for clustering. However, the theoretical properties of these methods remain largely unexplored. In the absence of statistical guarantees on the objective functions, it is difficult to determine if the algorithms optimizing the objectives will return good community structures. We investigate the consistency properties of the global optimizer of some of these objective functions under the multi-layer stochastic blockmodel. For this purpose, we derive several new asymptotic results showing consistency of the intermediate fusion techniques, along with the spectral clustering of the mean adjacency matrix, under a high-dimensional setup where the number of nodes, the number of layers and the number of communities of the multi-layer graph grow. Our numerical study shows that the intermediate fusion techniques outperform late fusion methods, namely spectral clustering on the aggregate spectral kernel and the module allegiance matrix, in sparse networks, while they outperform the spectral clustering of the mean adjacency matrix in multi-layer networks that contain layers with both homophilic and heterophilic communities.
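
One of the baselines discussed above, spectral clustering of the mean adjacency matrix, is easy to sketch. The multi-layer SBM simulation and the k-means step below are generic placeholder choices for illustration, not the authors' setup.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)

# Simulate an L-layer multi-layer SBM with n nodes and K communities (toy sizes).
n, L, K = 120, 4, 3
labels = rng.integers(K, size=n)
P = np.full((K, K), 0.05) + 0.10 * np.eye(K)   # within > between (homophilic layers)

layers = []
for _ in range(L):
    probs = P[labels][:, labels]
    A = np.triu((rng.random((n, n)) < probs).astype(float), 1)
    layers.append(A + A.T)

# Spectral clustering of the mean adjacency matrix.
A_bar = np.mean(layers, axis=0)
eigvals, eigvecs = np.linalg.eigh(A_bar)
U = eigvecs[:, np.argsort(np.abs(eigvals))[-K:]]   # leading K eigenvectors
est = KMeans(n_clusters=K, n_init=10, random_state=0).fit_predict(U)
print(est[:20], labels[:20])
```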





Optimal rates for community estimation in the weighted stochastic block model

Min Xu, Varun Jog, Po-Ling Loh.

Source: The Annals of Statistics, Volume 48, Number 1, 183--204.

Abstract:
Community identification in a network is an important problem in fields such as social science, neuroscience and genetics. Over the past decade, stochastic block models (SBMs) have emerged as a popular statistical framework for this problem. However, SBMs have an important limitation in that they are suited only for networks with unweighted edges; in various scientific applications, disregarding the edge weights may result in a loss of valuable information. We study a weighted generalization of the SBM, in which observations are collected in the form of a weighted adjacency matrix and the weight of each edge is generated independently from an unknown probability density determined by the community membership of its endpoints. We characterize the optimal rate of misclustering error of the weighted SBM in terms of the Renyi divergence of order 1/2 between the weight distributions of within-community and between-community edges, substantially generalizing existing results for unweighted SBMs. Furthermore, we present a computationally tractable algorithm based on discretization that achieves the optimal error rate. Our method is adaptive in the sense that the algorithm, without assuming knowledge of the weight densities, performs as well as the best algorithm that knows the weight densities.





New $G$-formula for the sequential causal effect and blip effect of treatment in sequential causal inference

Xiaoqin Wang, Li Yin.

Source: The Annals of Statistics, Volume 48, Number 1, 138--160.

Abstract:
In sequential causal inference, two types of causal effects are of practical interest, namely, the causal effect of the treatment regime (called the sequential causal effect) and the blip effect of treatment on the potential outcome after the last treatment. The well-known $G$-formula expresses these causal effects in terms of the standard parameters. In this article, we obtain a new $G$-formula that expresses these causal effects in terms of point observable effects of treatments, analogous to treatment effects in the framework of single-point causal inference. Based on the new $G$-formula, we estimate these causal effects by maximum likelihood via the point observable effects, with methods extended from single-point causal inference. We are able to increase the precision of the estimation, without introducing bias, by using an unsaturated model that imposes constraints on the point observable effects. We are also able to reduce the number of point observable effects in the estimation by means of treatment assignment conditions.





Model assisted variable clustering: Minimax-optimal recovery and algorithms

Florentina Bunea, Christophe Giraud, Xi Luo, Martin Royer, Nicolas Verzelen.

Source: The Annals of Statistics, Volume 48, Number 1, 111--137.

Abstract:
The problem of variable clustering is that of estimating groups of similar components of a $p$-dimensional vector $X=(X_{1},\ldots,X_{p})$ from $n$ independent copies of $X$. There exists a large number of algorithms that return data-dependent groups of variables, but their interpretation is limited to the algorithm that produced them. An alternative is model-based clustering, in which one begins by defining population level clusters relative to a model that embeds notions of similarity. Algorithms tailored to such models yield estimated clusters with a clear statistical interpretation. We take this view here and introduce the class of $G$-block covariance models as a background model for variable clustering. In such models, two variables in a cluster are deemed similar if they have similar associations with all other variables. This can arise, for instance, when groups of variables are noise corrupted versions of the same latent factor. We quantify the difficulty of clustering data generated from a $G$-block covariance model in terms of cluster proximity, measured with respect to two related, but different, cluster separation metrics. We derive minimax cluster separation thresholds, which are the metric values below which no algorithm can recover the model-defined clusters exactly, and show that they are different for the two metrics. We therefore develop two algorithms, COD and PECOK, tailored to $G$-block covariance models, and study their minimax-optimality with respect to each metric. Of independent interest is the fact that the analysis of the PECOK algorithm, which is based on a corrected convex relaxation of the popular $K$-means algorithm, provides the first statistical analysis of such algorithms for variable clustering. Additionally, we compare our methods with another popular clustering method, spectral clustering. Extensive simulation studies, as well as our data analyses, confirm the applicability of our approach.





Robust sparse covariance estimation by thresholding Tyler’s M-estimator

John Goes, Gilad Lerman, Boaz Nadler.

Source: The Annals of Statistics, Volume 48, Number 1, 86--110.

Abstract:
Estimating a high-dimensional sparse covariance matrix from a limited number of samples is a fundamental task in contemporary data analysis. Most proposals to date, however, are not robust to outliers or heavy tails. Toward bridging this gap, in this work we consider estimating a sparse shape matrix from $n$ samples following a possibly heavy-tailed elliptical distribution. We propose estimators based on thresholding either Tyler’s M-estimator or its regularized variant. We prove that in the joint limit as the dimension $p$ and the sample size $n$ tend to infinity with $p/n\to\gamma>0$, our estimators are minimax rate optimal. Results on simulated data support our theoretical analysis.
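
A bare-bones sketch of the estimator family described above: compute Tyler's M-estimator by its standard fixed-point iteration and then hard-threshold the off-diagonal entries. The threshold level, the toy heavy-tailed data and the stopping rule are assumptions made for illustration, not the paper's tuned choices.

```python
import numpy as np

def tyler_m_estimator(X, n_iter=200, tol=1e-8):
    # Fixed-point iteration for Tyler's M-estimator of the shape matrix
    # (data assumed centered), normalized to have trace p.
    n, p = X.shape
    Sigma = np.eye(p)
    for _ in range(n_iter):
        inv = np.linalg.inv(Sigma)
        w = 1.0 / np.einsum("ij,jk,ik->i", X, inv, X)  # 1 / (x_i' Sigma^{-1} x_i)
        new = (p / n) * (X * w[:, None]).T @ X
        new *= p / np.trace(new)
        if np.max(np.abs(new - Sigma)) < tol:
            return new
        Sigma = new
    return Sigma

def threshold(S, tau):
    # Entrywise hard thresholding, keeping the diagonal intact.
    T = np.where(np.abs(S) >= tau, S, 0.0)
    np.fill_diagonal(T, np.diag(S))
    return T

rng = np.random.default_rng(0)
n, p = 200, 50
# Heavy-tailed elliptical sample: multivariate t_3 via Gaussian / chi scaling.
Sigma0 = np.eye(p); Sigma0[0, 1] = Sigma0[1, 0] = 0.4
Z = rng.multivariate_normal(np.zeros(p), Sigma0, size=n)
X = Z / np.sqrt(rng.chisquare(3, size=(n, 1)) / 3)

S = tyler_m_estimator(X)
tau = 2.0 * np.sqrt(np.log(p) / n)    # threshold level of an assumed sqrt(log p / n) form
print(threshold(S, tau)[:3, :3])
```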





Rerandomization in $2^{K}$ factorial experiments

Xinran Li, Peng Ding, Donald B. Rubin.

Source: The Annals of Statistics, Volume 48, Number 1, 43--63.

Abstract:
With many pretreatment covariates and treatment factors, the classical factorial experiment often fails to balance covariates across multiple factorial effects simultaneously. Therefore, it is intuitive to restrict the randomization of the treatment factors to satisfy certain covariate balance criteria, possibly conforming to the tiers of factorial effects and covariates based on their relative importances. This is rerandomization in factorial experiments. We study the asymptotic properties of this experimental design under the randomization inference framework without imposing any distributional or modeling assumptions of the covariates and outcomes. We derive the joint asymptotic sampling distribution of the usual estimators of the factorial effects, and show that it is symmetric, unimodal and more “concentrated” at the true factorial effects under rerandomization than under the classical factorial experiment. We quantify this advantage of rerandomization using the notions of “central convex unimodality” and “peakedness” of the joint asymptotic sampling distribution. We also construct conservative large-sample confidence sets for the factorial effects.
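
As a simplified illustration of rerandomization (a single two-arm comparison with one covariate tier, rather than the full $2^{K}$ factorial design with tiers of effects), one can redraw the assignment until a Mahalanobis imbalance criterion falls below an acceptance threshold; the threshold value and data below are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 4
X = rng.normal(size=(n, k))                 # pretreatment covariates (toy data)

def mahalanobis_imbalance(assign, X):
    # Mahalanobis distance between treated and control covariate means.
    diff = X[assign == 1].mean(axis=0) - X[assign == 0].mean(axis=0)
    n1, n0 = (assign == 1).sum(), (assign == 0).sum()
    cov = np.cov(X, rowvar=False) * (1 / n1 + 1 / n0)
    return diff @ np.linalg.solve(cov, diff)

threshold = 2.0                             # acceptance threshold a (assumed value)
while True:                                 # rerandomize until the balance criterion is met
    assign = rng.permutation(np.repeat([0, 1], n // 2))
    if mahalanobis_imbalance(assign, X) <= threshold:
        break
print("accepted assignment imbalance:", mahalanobis_imbalance(assign, X))
```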





The phase transition for the existence of the maximum likelihood estimate in high-dimensional logistic regression

Emmanuel J. Candès, Pragya Sur.

Source: The Annals of Statistics, Volume 48, Number 1, 27--42.

Abstract:
This paper rigorously establishes that the existence of the maximum likelihood estimate (MLE) in high-dimensional logistic regression models with Gaussian covariates undergoes a sharp “phase transition.” We introduce an explicit boundary curve $h_{\mathrm{MLE}}$, parameterized by two scalars measuring the overall magnitude of the unknown sequence of regression coefficients, with the following property: in the limit of large sample sizes $n$ and number of features $p$ proportioned in such a way that $p/n\rightarrow\kappa$, we show that if the problem is sufficiently high dimensional in the sense that $\kappa>h_{\mathrm{MLE}}$, then the MLE does not exist with probability one. Conversely, if $\kappa<h_{\mathrm{MLE}}$, the MLE asymptotically exists with probability one.





Tracy–Widom limit for Kendall’s tau

Zhigang Bao.

Source: The Annals of Statistics, Volume 47, Number 6, 3504--3532.

Abstract:
In this paper, we study a high-dimensional random matrix model from nonparametric statistics called the Kendall rank correlation matrix, which is a natural multivariate extension of the Kendall rank correlation coefficient. We establish the Tracy–Widom law for its largest eigenvalue. It is the first Tracy–Widom law for a nonparametric random matrix model, and also the first Tracy–Widom law for a high-dimensional U-statistic.
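
The statistic studied here is straightforward to form in small examples: build the $p\times p$ Kendall rank correlation matrix and take its largest eigenvalue. The sketch below omits the centering and scaling constants needed to see the Tracy–Widom limit, and the sample sizes are placeholders.

```python
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(0)
n, p = 300, 100
X = rng.normal(size=(n, p))        # null model: independent coordinates

# Kendall rank correlation matrix.
K = np.eye(p)
for i in range(p):
    for j in range(i + 1, p):
        K[i, j] = K[j, i] = kendalltau(X[:, i], X[:, j])[0]

lam_max = np.linalg.eigvalsh(K).max()   # largest eigenvalue (uncentered, unscaled)
print(lam_max)
```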





Bootstrapping and sample splitting for high-dimensional, assumption-lean inference

Alessandro Rinaldo, Larry Wasserman, Max G’Sell.

Source: The Annals of Statistics, Volume 47, Number 6, 3438--3469.

Abstract:
Several new methods have been recently proposed for performing valid inference after model selection. An older method is sample splitting: use part of the data for model selection and the rest for inference. In this paper, we revisit sample splitting combined with the bootstrap (or the Normal approximation). We show that this leads to a simple, assumption-lean approach to inference and we establish results on the accuracy of the method. In fact, we find new bounds on the accuracy of the bootstrap and the Normal approximation for general nonlinear parameters with increasing dimension, which we then use to assess the accuracy of regression inference. We define new parameters that measure variable importance and that can be inferred with greater accuracy than the usual regression coefficients. Finally, we elucidate an inference-prediction trade-off: splitting increases the accuracy and robustness of inference but can decrease the accuracy of the predictions.
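
A minimal instance of the splitting recipe: select variables on one half of the data (lasso is used here purely as an example of a selection rule) and form Normal-approximation intervals from an OLS fit on the other half. The paper's preferred targets are projection-type importance parameters, so this sketch only illustrates the mechanics of splitting on placeholder data.

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n, p = 400, 50
X = rng.normal(size=(n, p))
beta = np.zeros(p); beta[:3] = [2.0, -1.5, 1.0]
y = X @ beta + rng.normal(size=n)

# Split: first half for model selection, second half for inference.
X1, y1, X2, y2 = X[: n // 2], y[: n // 2], X[n // 2 :], y[n // 2 :]
sel = np.flatnonzero(LassoCV(cv=5).fit(X1, y1).coef_)     # screened variables

# OLS on the held-out half, restricted to the selected variables (plus intercept).
D = np.column_stack([np.ones(len(y2)), X2[:, sel]])
coef, *_ = np.linalg.lstsq(D, y2, rcond=None)
resid = y2 - D @ coef
sigma2 = resid @ resid / (len(y2) - D.shape[1])
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(D.T @ D)))

for j, b, s in zip(sel, coef[1:], se[1:]):                # Normal-approximation CIs
    print(f"variable {j}: {b:.2f} +/- {1.96 * s:.2f}")
```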





Minimax posterior convergence rates and model selection consistency in high-dimensional DAG models based on sparse Cholesky factors

Kyoungjae Lee, Jaeyong Lee, Lizhen Lin.

Source: The Annals of Statistics, Volume 47, Number 6, 3413--3437.

Abstract:
In this paper we study high-dimensional sparse directed acyclic graph (DAG) models under the empirical sparse Cholesky prior. Among our results, strong model selection consistency or graph selection consistency is obtained under more general conditions than those in the existing literature. Compared to Cao, Khare and Ghosh [Ann. Statist. 47 (2019) 319–348], the required conditions are weakened in terms of the dimensionality, sparsity and lower bound of the nonzero elements in the Cholesky factor. Furthermore, our result does not require the irrepresentable condition, which is necessary for Lasso-type methods. We also derive the posterior convergence rates for precision matrices and Cholesky factors with respect to various matrix norms. The obtained posterior convergence rates are the fastest among those of the existing Bayesian approaches. In particular, we prove that our posterior convergence rates for Cholesky factors are minimax, or at least nearly minimax, depending on the relative size of the true sparseness for the entire dimension. The simulation study confirms that the proposed method outperforms the competing methods.





On testing for high-dimensional white noise

Zeng Li, Clifford Lam, Jianfeng Yao, Qiwei Yao.

Source: The Annals of Statistics, Volume 47, Number 6, 3382--3412.

Abstract:
Testing for white noise is a classical yet important problem in statistics, especially for diagnostic checks in time series modeling and linear regression. For high-dimensional time series in the sense that the dimension $p$ is large in relation to the sample size $T$, the popular omnibus tests including the multivariate Hosking and Li–McLeod tests are extremely conservative, leading to substantial power loss. To develop more relevant tests for high-dimensional cases, we propose a portmanteau-type test statistic which is the sum of squared singular values of the first $q$ lagged sample autocovariance matrices. It therefore encapsulates all the serial correlations (up to the time lag $q$) within and across all component series. Using tools from random matrix theory and assuming both $p$ and $T$ diverge to infinity, we derive the asymptotic normality of the test statistic under both the null and a specific VMA(1) alternative hypothesis. As the actual implementation of the test requires the knowledge of three characteristic constants of the population cross-sectional covariance matrix and the value of the fourth moment of the standardized innovations, nontrivial estimations are proposed for these parameters and their integration leads to a practically usable test. Extensive simulation confirms the excellent finite-sample performance of the new test, with accurate size and satisfactory power for a large range of finite $(p,T)$ combinations, thereby ensuring wide applicability in practice. In particular, the new tests are consistently superior to the traditional Hosking and Li–McLeod tests.
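
The core of the proposed statistic, the sum of squared singular values (equivalently, squared Frobenius norms) of the first $q$ lagged sample autocovariance matrices, can be sketched as follows. The normalization convention and the centering/scaling constants needed for the asymptotic-normal calibration are omitted, and the dimensions are placeholders.

```python
import numpy as np

def portmanteau_stat(Y, q):
    # Sum over lags 1..q of the squared singular values (i.e., squared
    # Frobenius norms) of the lagged sample autocovariance matrices.
    T, p = Y.shape
    Yc = Y - Y.mean(axis=0)
    stat = 0.0
    for k in range(1, q + 1):
        Sigma_k = Yc[k:].T @ Yc[:-k] / T      # p x p lag-k sample autocovariance
        stat += np.sum(Sigma_k ** 2)          # sum of squared singular values
    return stat

rng = np.random.default_rng(0)
T, p, q = 500, 100, 3
Y = rng.normal(size=(T, p))                   # high-dimensional white noise
print(portmanteau_stat(Y, q))
```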





A smeary central limit theorem for manifolds with application to high-dimensional spheres

Benjamin Eltzner, Stephan F. Huckemann.

Source: The Annals of Statistics, Volume 47, Number 6, 3360--3381.

Abstract:
The central limit theorems (CLTs) for generalized Fréchet means (data descriptors assuming values in manifolds, such as intrinsic means, geodesics, etc.) on manifolds from the literature are only valid if a certain empirical process of Hessians of the Fréchet function converges suitably, as in the proof of the prototypical BP-CLT [Ann. Statist. 33 (2005) 1225–1259]. This is not valid in many realistic scenarios, and we provide a new, very general CLT. In particular, this includes scenarios where, in a suitable chart, the sample mean fluctuates asymptotically at a scale $n^{\alpha}$ with exponents $\alpha<1/2$ and with a nonnormal distribution. As the BP-CLT yields only fluctuations that are, rescaled with $n^{1/2}$, asymptotically normal, just as the classical CLT for random vectors, these lower rates, somewhat loosely called smeariness, had to date been observed only on the circle. We make the concept of smeariness on manifolds precise, give an example of two-smeariness on spheres of arbitrary dimension, and show that smeariness, although “almost never” occurring, may have serious statistical implications on a continuum of sample scenarios nearby. In fact, this effect increases with dimension, striking in particular in high-dimension, low-sample-size scenarios.





On optimal designs for nonregular models

Yi Lin, Ryan Martin, Min Yang.

Source: The Annals of Statistics, Volume 47, Number 6, 3335--3359.

Abstract:
Classically, Fisher information is the relevant object in defining optimal experimental designs. However, for models that lack certain regularity, the Fisher information does not exist, and hence, there is no notion of design optimality available in the literature. This article seeks to fill the gap by proposing a so-called Hellinger information, which generalizes Fisher information in the sense that the two measures agree in regular problems, but the former also exists for certain types of nonregular problems. We derive a Hellinger information inequality, showing that Hellinger information defines a lower bound on the local minimax risk of estimators. This provides a connection between features of the underlying model—in particular, the design—and the performance of estimators, motivating the use of this new Hellinger information for nonregular optimal design problems. Hellinger optimal designs are derived for several nonregular regression problems, with numerical results empirically demonstrating the efficiency of these designs compared to alternatives.





Sampling and estimation for (sparse) exchangeable graphs

Victor Veitch, Daniel M. Roy.

Source: The Annals of Statistics, Volume 47, Number 6, 3274--3299.

Abstract:
Sparse exchangeable graphs on $\mathbb{R}_{+}$, and the associated graphex framework for sparse graphs, generalize exchangeable graphs on $\mathbb{N}$, and the associated graphon framework for dense graphs. We develop the graphex framework as a tool for statistical network analysis by identifying the sampling scheme that is naturally associated with the models of the framework, formalizing two natural notions of consistent estimation of the parameter (the graphex) underlying these models, and identifying general consistent estimators in each case. The sampling scheme is a modification of independent vertex sampling that throws away vertices that are isolated in the sampled subgraph. The estimators are variants of the empirical graphon estimator, which is known to be a consistent estimator for the distribution of dense exchangeable graphs; both can be understood as graph analogues to the empirical distribution in the i.i.d. sequence setting. Our results may be viewed as a generalization of consistent estimation via the empirical graphon from the dense graph regime to also include sparse graphs.
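
The sampling scheme identified above is simple to state: sample vertices independently, take the induced subgraph and discard vertices that become isolated. The sketch below illustrates only this sampling step on a toy graph (the graphex estimators themselves are not reproduced, and the graph model is a placeholder).

```python
import numpy as np

def p_sample_without_isolates(A, prob, rng):
    # Keep each vertex independently with probability `prob`, take the induced
    # subgraph, then drop vertices that are isolated in that subgraph.
    keep = np.flatnonzero(rng.random(A.shape[0]) < prob)
    sub = A[np.ix_(keep, keep)]
    alive = np.flatnonzero(sub.sum(axis=1) > 0)
    return sub[np.ix_(alive, alive)]

rng = np.random.default_rng(0)
n = 500
A = np.triu((rng.random((n, n)) < 0.02).astype(int), 1)   # toy Erdos-Renyi graph
A = A + A.T
print(p_sample_without_isolates(A, 0.3, rng).shape)
```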





Quantile regression under memory constraint

Xi Chen, Weidong Liu, Yichen Zhang.

Source: The Annals of Statistics, Volume 47, Number 6, 3244--3273.

Abstract:
This paper studies the inference problem in quantile regression (QR) for a large sample size $n$ but under a limited memory constraint, where the memory can only store a small batch of data of size $m$. A natural method is the naive divide-and-conquer approach, which splits data into batches of size $m$, computes the local QR estimator for each batch and then aggregates the estimators via averaging. However, this method only works when $n=o(m^{2})$ and is computationally expensive. This paper proposes a computationally efficient method, which only requires an initial QR estimator on a small batch of data and then successively refines the estimator via multiple rounds of aggregations. Theoretically, as long as $n$ grows polynomially in $m$, we establish the asymptotic normality for the obtained estimator and show that our estimator with only a few rounds of aggregations achieves the same efficiency as the QR estimator computed on all the data. Moreover, our result allows the case that the dimensionality $p$ goes to infinity. The proposed method can also be applied to address the QR problem in a distributed computing environment (e.g., in a large-scale sensor network) or for real-time streaming data.
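
For concreteness, the naive divide-and-conquer baseline described above (a local quantile regression per batch, then averaging) can be sketched as follows; the proposed multi-round refinement is not reproduced, and the model, batch size and quantile level are placeholder choices. The QuantReg routine from statsmodels is used only as a convenient off-the-shelf quantile regression solver.

```python
import numpy as np
from statsmodels.regression.quantile_regression import QuantReg

rng = np.random.default_rng(0)
n, p, m, tau = 20_000, 5, 2_000, 0.5           # n samples, batches of size m
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = np.arange(1.0, p + 1.0)
y = X @ beta + rng.standard_t(3, size=n)        # heavy-tailed errors, median zero

# Naive divide-and-conquer: local median regression per batch, then average.
estimates = []
for start in range(0, n, m):
    Xb, yb = X[start : start + m], y[start : start + m]
    estimates.append(QuantReg(yb, Xb).fit(q=tau).params)
beta_dc = np.mean(estimates, axis=0)
print(beta_dc)
```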





Statistical inference for autoregressive models under heteroscedasticity of unknown form

Ke Zhu.

Source: The Annals of Statistics, Volume 47, Number 6, 3185--3215.

Abstract:
This paper provides an entire inference procedure for the autoregressive model under (conditional) heteroscedasticity of unknown form with a finite variance. We first establish the asymptotic normality of the weighted least absolute deviations estimator (LADE) for the model. Second, we develop the random weighting (RW) method to estimate its asymptotic covariance matrix, leading to the implementation of the Wald test. Third, we construct a portmanteau test for model checking, and use the RW method to obtain its critical values. As a special weighted LADE, the feasible adaptive LADE (ALADE) is proposed and proved to have the same efficiency as its infeasible counterpart. The importance of our entire methodology based on the feasible ALADE is illustrated by simulation results and the real data analysis on three U.S. economic data sets.





Sorted concave penalized regression

Long Feng, Cun-Hui Zhang.

Source: The Annals of Statistics, Volume 47, Number 6, 3069--3098.

Abstract:
The Lasso is biased. Concave penalized least squares estimation (PLSE) takes advantage of signal strength to reduce this bias, leading to sharper error bounds in prediction, coefficient estimation and variable selection. For prediction and estimation, the bias of the Lasso can also be reduced by taking a smaller penalty level than selection consistency requires, but such a smaller penalty level depends on the sparsity of the true coefficient vector. The sorted $\ell_{1}$ penalized estimation (Slope) was proposed for adaptation to such smaller penalty levels. However, the advantages of concave PLSE and Slope do not subsume each other. We propose sorted concave penalized estimation to combine the advantages of concave and sorted penalizations. We prove that sorted concave penalties adaptively choose the smaller penalty level and at the same time benefit from signal strength, especially when a significant proportion of signals are stronger than the corresponding adaptively selected penalty levels. A local convex approximation for sorted concave penalties, which extends the local linear and quadratic approximations for separable concave penalties, is developed to facilitate the computation of sorted concave PLSE and is proven to possess the desired prediction and estimation error bounds. Our analysis of prediction and estimation errors requires only the restricted eigenvalue condition on the design and, in addition, provides selection consistency under a required minimum signal strength condition. Thus, our results also sharpen existing results on concave PLSE by removing the upper sparse eigenvalue component of the sparse Riesz condition.





Testing for independence of large dimensional vectors

Taras Bodnar, Holger Dette, Nestor Parolya.

Source: The Annals of Statistics, Volume 47, Number 5, 2977--3008.

Abstract:
In this paper, new tests for the independence of two high-dimensional vectors are investigated. We consider the case where the dimension of the vectors increases with the sample size and propose multivariate analysis of variance-type statistics for the hypothesis of a block diagonal covariance matrix. The asymptotic properties of the new test statistics are investigated under the null hypothesis and the alternative hypothesis using random matrix theory. For this purpose, we study the weak convergence of linear spectral statistics of central and (conditionally) noncentral Fisher matrices. In particular, a central limit theorem for linear spectral statistics of large dimensional (conditionally) noncentral Fisher matrices is derived which is then used to analyse the power of the tests under the alternative. The theoretical results are illustrated by means of a simulation study where we also compare the new tests with several alternatives, in particular the commonly used corrected likelihood ratio test. It is demonstrated that the latter test does not keep its nominal level if the dimension of one sub-vector is relatively small compared to the dimension of the other sub-vector. On the other hand, the tests proposed in this paper provide a reasonable approximation of the nominal level in such situations. Moreover, we observe that one of the proposed tests is most powerful under a variety of correlation scenarios.





Inference for the mode of a log-concave density

Charles R. Doss, Jon A. Wellner.

Source: The Annals of Statistics, Volume 47, Number 5, 2950--2976.

Abstract:
We study a likelihood ratio test for the location of the mode of a log-concave density. Our test is based on comparison of the log-likelihoods corresponding to the unconstrained maximum likelihood estimator of a log-concave density and the constrained maximum likelihood estimator where the constraint is that the mode of the density is fixed, say at $m$. The constrained estimation problem is studied in detail in Doss and Wellner (2018). Here, the results of that paper are used to show that, under the null hypothesis (and strict curvature of $-\log f$ at the mode), the likelihood ratio statistic is asymptotically pivotal: that is, it converges in distribution to a limiting distribution which is free of nuisance parameters, thus playing the role of the $\chi_{1}^{2}$ distribution in classical parametric statistical problems. By inverting this family of tests, we obtain new (likelihood ratio based) confidence intervals for the mode of a log-concave density $f$. These new intervals do not depend on any smoothing parameters. We study the new confidence intervals via Monte Carlo methods and illustrate them with two real data sets. The new intervals seem to have several advantages over existing procedures. Software implementing the test and confidence intervals is available in the R package logcondens.mode.





Projected spline estimation of the nonparametric function in high-dimensional partially linear models for massive data

Heng Lian, Kaifeng Zhao, Shaogao Lv.

Source: The Annals of Statistics, Volume 47, Number 5, 2922--2949.

Abstract:
In this paper, we consider the local asymptotics of the nonparametric function in a partially linear model, within the framework of the divide-and-conquer estimation. Unlike the fixed-dimensional setting in which the parametric part does not affect the nonparametric part, the high-dimensional setting makes the issue more complicated. In particular, when a sparsity-inducing penalty such as lasso is used to make the estimation of the linear part feasible, the bias introduced will propagate to the nonparametric part. We propose a novel approach for estimation of the nonparametric function and establish the local asymptotics of the estimator. The result is useful for massive data with possibly different linear coefficients in each subpopulation but common nonparametric function. Some numerical illustrations are also presented.





Test for high-dimensional correlation matrices

Shurong Zheng, Guanghui Cheng, Jianhua Guo, Hongtu Zhu.

Source: The Annals of Statistics, Volume 47, Number 5, 2887--2921.

Abstract:
Testing correlation structures has attracted extensive attention in the literature due to both its importance in real applications and several major theoretical challenges. The aim of this paper is to develop a general framework of testing correlation structures for the one-, two- and multiple-sample testing problems under a high-dimensional setting when both the sample size and data dimension go to infinity. Our test statistics are designed to deal with both dense and sparse alternatives. We systematically investigate the asymptotic null distribution, power function and unbiasedness of each test statistic. Theoretically, we make great efforts to deal with the nonindependence among the random matrices that enter the sample correlation matrices. We use simulation studies and real data analysis to illustrate the versatility and practicability of our test statistics.





Eigenvalue distributions of variance components estimators in high-dimensional random effects models

Zhou Fan, Iain M. Johnstone.

Source: The Annals of Statistics, Volume 47, Number 5, 2855--2886.

Abstract:
We study the spectra of MANOVA estimators for variance component covariance matrices in multivariate random effects models. When the dimensionality of the observations is large and comparable to the number of realizations of each random effect, we show that the empirical spectra of such estimators are well approximated by deterministic laws. The Stieltjes transforms of these laws are characterized by systems of fixed-point equations, which are numerically solvable by a simple iterative procedure. Our proof uses operator-valued free probability theory, and we establish a general asymptotic freeness result for families of rectangular orthogonally invariant random matrices, which is of independent interest. Our work is motivated in part by the estimation of components of covariance between multiple phenotypic traits in quantitative genetics, and we specialize our results to common experimental designs that arise in this application.





Exact lower bounds for the agnostic probably-approximately-correct (PAC) machine learning model

Aryeh Kontorovich, Iosif Pinelis.

Source: The Annals of Statistics, Volume 47, Number 5, 2822--2854.

Abstract:
We provide an exact nonasymptotic lower bound on the minimax expected excess risk (EER) in the agnostic probably-approximately-correct (PAC) machine learning classification model and identify minimax learning algorithms as certain maximally symmetric and minimally randomized “voting” procedures. Based on this result, an exact asymptotic lower bound on the minimax EER is provided. This bound is of the simple form $c_{\infty}/\sqrt{\nu}$ as $\nu\to\infty$, where $c_{\infty}=0.16997\dots$ is a universal constant, $\nu=m/d$, $m$ is the size of the training sample and $d$ is the Vapnik–Chervonenkis dimension of the hypothesis class. It is shown that the differences between these asymptotic and nonasymptotic bounds, as well as the differences between these two bounds and the maximum EER of any learning algorithm that minimizes the empirical risk, are asymptotically negligible, and all these differences are due to ties in the mentioned “voting” procedures. A few easy-to-compute nonasymptotic lower bounds on the minimax EER are also obtained, which are shown to be close to the exact asymptotic lower bound $c_{\infty}/\sqrt{\nu}$ even for rather small values of the ratio $\nu=m/d$. As an application of these results, we substantially improve existing lower bounds on the tail probability of the excess risk. Among the tools used are Bayes estimation and apparently new identities and inequalities for binomial distributions.





A unified treatment of multiple testing with prior knowledge using the p-filter

Aaditya K. Ramdas, Rina F. Barber, Martin J. Wainwright, Michael I. Jordan.

Source: The Annals of Statistics, Volume 47, Number 5, 2790--2821.

Abstract:
There is a significant literature on methods for incorporating knowledge into multiple testing procedures so as to improve their power and precision. Some common forms of prior knowledge include (a) beliefs about which hypotheses are null, modeled by nonuniform prior weights; (b) differing importances of hypotheses, modeled by differing penalties for false discoveries; (c) multiple arbitrary partitions of the hypotheses into (possibly overlapping) groups and (d) knowledge of independence, positive or arbitrary dependence between hypotheses or groups, suggesting the use of more aggressive or conservative procedures. We present a unified algorithmic framework called p-filter for global null testing and false discovery rate (FDR) control that allows the scientist to incorporate all four types of prior knowledge (a)–(d) simultaneously, recovering a variety of known algorithms as special cases.





Distance multivariance: New dependence measures for random vectors

Björn Böttcher, Martin Keller-Ressel, René L. Schilling.

Source: The Annals of Statistics, Volume 47, Number 5, 2757--2789.

Abstract:
We introduce two new measures for the dependence of $n\ge 2$ random variables: distance multivariance and total distance multivariance. Both measures are based on the weighted $L^{2}$-distance of quantities related to the characteristic functions of the underlying random variables. These extend distance covariance (introduced by Székely, Rizzo and Bakirov) from pairs of random variables to $n$-tuplets of random variables. We show that total distance multivariance can be used to detect the independence of $n$ random variables and has a simple finite-sample representation in terms of distance matrices of the sample points, where distance is measured by a continuous negative definite function. Under some mild moment conditions, this leads to a test for independence of multiple random vectors which is consistent against all alternatives.





Phase transition in the spiked random tensor with Rademacher prior

Wei-Kuo Chen.

Source: The Annals of Statistics, Volume 47, Number 5, 2734--2756.

Abstract:
We consider the problem of detecting a deformation from a symmetric Gaussian random $p$-tensor $(p\geq 3)$ with a rank-one spike sampled from the Rademacher prior. Recently, in Lesieur et al. (Barbier, Krzakala, Macris, Miolane and Zdeborová (2017)), it was proved that there exists a critical threshold $\beta_{p}$ so that when the signal-to-noise ratio exceeds $\beta_{p}$, one can distinguish the spiked and unspiked tensors and weakly recover the prior via the minimal mean-square-error method. On the other side, Perry, Wein and Bandeira (2017) proved that there exists a $\beta_{p}'<\beta_{p}$ such that any statistical hypothesis test cannot distinguish these two tensors, in the sense that their total variation distance asymptotically vanishes, when the signal-to-noise ratio is less than $\beta_{p}'$. In this work, we show that $\beta_{p}$ is indeed the critical threshold that strictly separates distinguishability from indistinguishability between the two tensors under the total variation distance. Our approach is based on a subtle analysis of the high temperature behavior of the pure $p$-spin model with Ising spin, arising initially from the field of spin glasses. In particular, we identify the signal-to-noise criticality $\beta_{p}$ as the critical temperature, distinguishing the high and low temperature behavior, of the Ising pure $p$-spin mean-field spin glass model.





An operator theoretic approach to nonparametric mixture models

Robert A. Vandermeulen, Clayton D. Scott.

Source: The Annals of Statistics, Volume 47, Number 5, 2704--2733.

Abstract:
When estimating finite mixture models, it is common to make assumptions on the mixture components, such as parametric assumptions. In this work, we make no distributional assumptions on the mixture components and instead assume that observations from the mixture model are grouped, such that observations in the same group are known to be drawn from the same mixture component. We precisely characterize the number of observations $n$ per group needed for the mixture model to be identifiable, as a function of the number $m$ of mixture components. In addition to our assumption-free analysis, we also study the settings where the mixture components are either linearly independent or jointly irreducible. Furthermore, our analysis considers two kinds of identifiability, where the mixture model is the simplest one explaining the data, and where it is the only one. As an application of these results, we precisely characterize identifiability of multinomial mixture models. Our analysis relies on an operator-theoretic framework that associates mixture models in the grouped-sample setting with certain infinite-dimensional tensors. Based on this framework, we introduce a general spectral algorithm for recovering the mixture components.





Linear hypothesis testing for high dimensional generalized linear models

Chengchun Shi, Rui Song, Zhao Chen, Runze Li.

Source: The Annals of Statistics, Volume 47, Number 5, 2671--2703.

Abstract:
This paper is concerned with testing linear hypotheses in high dimensional generalized linear models. To deal with linear hypotheses, we first propose the constrained partial regularization method and study its statistical properties. We further introduce an algorithm for solving regularization problems with folded-concave penalty functions and linear constraints. To test linear hypotheses, we propose a partial penalized likelihood ratio test, a partial penalized score test and a partial penalized Wald test. We show that the limiting null distributions of these three test statistics are $\chi^{2}$ distributions with the same degrees of freedom, and under local alternatives, they asymptotically follow noncentral $\chi^{2}$ distributions with the same degrees of freedom and noncentrality parameter, provided the number of parameters involved in the test hypothesis grows to $\infty$ at a certain rate. Simulation studies are conducted to examine the finite sample performance of the proposed tests. Empirical analysis of a real data example is used to illustrate the proposed testing procedures.





Semiparametrically point-optimal hybrid rank tests for unit roots

Bo Zhou, Ramon van den Akker, Bas J. M. Werker.

Source: The Annals of Statistics, Volume 47, Number 5, 2601--2638.

Abstract:
We propose a new class of unit root tests that exploits invariance properties in the Locally Asymptotically Brownian Functional limit experiment associated to the unit root model. The invariance structures naturally suggest tests that are based on the ranks of the increments of the observations, their average and an assumed reference density for the innovations. The tests are semiparametric in the sense that they are valid, that is, have the correct (asymptotic) size, irrespective of the true innovation density. For a correctly specified reference density, our test is point-optimal and nearly efficient. For arbitrary reference densities, we establish a Chernoff–Savage-type result, that is, our test performs as well as commonly used tests under Gaussian innovations but has improved power under other, for example, fat-tailed or skewed, innovation distributions. To avoid nonparametric estimation, we propose a simplified version of our test that exhibits the same asymptotic properties, except for the Chernoff–Savage result that we are only able to demonstrate by means of simulations.





Semi-supervised inference: General theory and estimation of means

Anru Zhang, Lawrence D. Brown, T. Tony Cai.

Source: The Annals of Statistics, Volume 47, Number 5, 2538--2566.

Abstract:
We propose a general semi-supervised inference framework focused on the estimation of the population mean. As usual in semi-supervised settings, there exists an unlabeled sample of covariate vectors and a labeled sample consisting of covariate vectors along with real-valued responses (“labels”). Otherwise, the formulation is “assumption-lean” in that no major conditions are imposed on the statistical or functional form of the data. We consider both the ideal semi-supervised setting where infinitely many unlabeled samples are available, as well as the ordinary semi-supervised setting in which only a finite number of unlabeled samples is available. Estimators are proposed along with corresponding confidence intervals for the population mean. Theoretical analyses of both the asymptotic distribution and the $\ell_{2}$-risk of the proposed procedures are given. Surprisingly, the proposed estimators, based on a simple form of the least squares method, outperform the ordinary sample mean. The simple, transparent form of the estimator lends confidence to the perception that its asymptotic improvement over the ordinary sample mean also nearly holds even for moderate size samples. The method is further extended to a nonparametric setting, in which the oracle rate can be achieved asymptotically. The proposed estimators are further illustrated by simulation studies and a real data example involving estimation of the homeless population.
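
A plausible least-squares version of the idea, adjusting the labeled-sample mean by a regression fit evaluated at the covariate means of the full pool, is sketched below. This is one reading of how such an estimator can improve on the sample mean, with a placeholder data-generating model; it is not necessarily the authors' exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_lab, n_unlab = 5, 200, 10_000
theta = np.ones(d)

X_lab = rng.normal(size=(n_lab, d))
y_lab = X_lab @ theta + rng.normal(size=n_lab)           # labeled sample
X_unlab = rng.normal(size=(n_unlab, d))                   # unlabeled covariates

# Least-squares fit on the labeled data (with intercept).
Z = np.column_stack([np.ones(n_lab), X_lab])
beta_hat = np.linalg.lstsq(Z, y_lab, rcond=None)[0]

# Adjust the labeled-sample mean using the covariate means of the full pool.
x_bar_lab = X_lab.mean(axis=0)
x_bar_all = np.vstack([X_lab, X_unlab]).mean(axis=0)
theta_ss = y_lab.mean() + (x_bar_all - x_bar_lab) @ beta_hat[1:]

print("sample mean:", y_lab.mean(), " semi-supervised:", theta_ss)
```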





A knockoff filter for high-dimensional selective inference

Rina Foygel Barber, Emmanuel J. Candès.

Source: The Annals of Statistics, Volume 47, Number 5, 2504--2537.

Abstract:
This paper develops a framework for testing for associations in a possibly high-dimensional linear model where the number of features/variables may far exceed the number of observational units. In this framework, the observations are split into two groups, where the first group is used to screen for a set of potentially relevant variables, whereas the second is used for inference over this reduced set of variables; we also develop strategies for leveraging information from the first part of the data at the inference step for greater power. In our work, the inferential step is carried out by applying the recently introduced knockoff filter, which creates a knockoff copy—a fake variable serving as a control—for each screened variable. We prove that this procedure controls the directional false discovery rate (FDR) in the reduced model controlling for all screened variables; this says that our high-dimensional knockoff procedure “discovers” important variables as well as the directions (signs) of their effects, in such a way that the expected proportion of wrongly chosen signs is below the user-specified level (thereby controlling a notion of Type S error averaged over the selected set). This result is nonasymptotic, and holds for any distribution of the original features and any values of the unknown regression coefficients, so that inference is not calibrated under hypothesized values of the effect sizes. We demonstrate the performance of our general and flexible approach through numerical studies, showing more power than existing alternatives. Finally, we apply our method to a genome-wide association study to find locations on the genome that are possibly associated with a continuous phenotype.





The two-to-infinity norm and singular subspace geometry with applications to high-dimensional statistics

Joshua Cape, Minh Tang, Carey E. Priebe.

Source: The Annals of Statistics, Volume 47, Number 5, 2405--2439.

Abstract:
The singular value matrix decomposition plays a ubiquitous role throughout statistics and related fields. Myriad applications including clustering, classification, and dimensionality reduction involve studying and exploiting the geometric structure of singular values and singular vectors. This paper provides a novel collection of technical and theoretical tools for studying the geometry of singular subspaces using the two-to-infinity norm. Motivated by preliminary deterministic Procrustes analysis, we consider a general matrix perturbation setting in which we derive a new Procrustean matrix decomposition. Together with flexible machinery developed for the two-to-infinity norm, this allows us to conduct a refined analysis of the induced perturbation geometry with respect to the underlying singular vectors even in the presence of singular value multiplicity. Our analysis yields singular vector entrywise perturbation bounds for a range of popular matrix noise models, each of which has a meaningful associated statistical inference task. In addition, we demonstrate how the two-to-infinity norm is the preferred norm in certain statistical settings. Specific applications discussed in this paper include covariance estimation, singular subspace recovery, and multiple graph inference. Both our Procrustean matrix decomposition and the technical machinery developed for the two-to-infinity norm may be of independent interest.
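
The norm itself is elementary to compute: the two-to-infinity norm of a matrix is the largest Euclidean norm of its rows, so for a singular-vector basis it measures how "delocalized" the basis is. A tiny illustration on an arbitrary orthonormal basis (the example subspace is a placeholder):

```python
import numpy as np

def two_to_infinity_norm(A):
    # ||A||_{2->infinity}: the maximum Euclidean norm of the rows of A,
    # equivalently sup over unit vectors x of ||A x||_infinity.
    return np.max(np.linalg.norm(A, axis=1))

rng = np.random.default_rng(0)
U = np.linalg.qr(rng.normal(size=(1000, 5)))[0]   # orthonormal basis of a 5-dim subspace
print(two_to_infinity_norm(U))                    # small when the rows are delocalized
print(np.linalg.norm(U, 2))                       # spectral norm = 1, for comparison
```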





Cross validation for locally stationary processes

Stefan Richter, Rainer Dahlhaus.

Source: The Annals of Statistics, Volume 47, Number 4, 2145--2173.

Abstract:
We propose an adaptive bandwidth selector via cross validation for local M-estimators in locally stationary processes. We prove asymptotic optimality of the procedure under mild conditions on the underlying parameter curves. The results are applicable to a wide range of locally stationary processes, such as linear and nonlinear processes. A simulation study shows that the method works fairly well also in misspecified situations.





Dynamic network models and graphon estimation

Marianna Pensky.

Source: The Annals of Statistics, Volume 47, Number 4, 2378--2403.

Abstract:
In the present paper, we consider a dynamic stochastic network model. The objective is estimation of the tensor of connection probabilities $\mathbf{\Lambda}$ when it is generated by a Dynamic Stochastic Block Model (DSBM) or a dynamic graphon. In particular, in the context of the DSBM, we derive a penalized least squares estimator $\widehat{\boldsymbol{\Lambda}}$ of $\mathbf{\Lambda}$ and show that $\widehat{\boldsymbol{\Lambda}}$ satisfies an oracle inequality and also attains minimax lower bounds for the risk. We extend those results to estimation of $\mathbf{\Lambda}$ when it is generated by a dynamic graphon function. The estimators constructed in the paper are adaptive to the unknown number of blocks in the context of the DSBM or to the smoothness of the graphon function. The technique relies on the vectorization of the model and leads to much simpler mathematical arguments than the ones used previously in the stationary setup. In addition, all results in the paper are nonasymptotic and allow a variety of extensions.





Convergence complexity analysis of Albert and Chib’s algorithm for Bayesian probit regression

Qian Qin, James P. Hobert.

Source: The Annals of Statistics, Volume 47, Number 4, 2320--2347.

Abstract:
The use of MCMC algorithms in high dimensional Bayesian problems has become routine. This has spurred so-called convergence complexity analysis, the goal of which is to ascertain how the convergence rate of a Monte Carlo Markov chain scales with sample size, $n$, and/or number of covariates, $p$. This article provides a thorough convergence complexity analysis of Albert and Chib’s [J. Amer. Statist. Assoc. 88 (1993) 669–679] data augmentation algorithm for the Bayesian probit regression model. The main tools used in this analysis are drift and minorization conditions. The usual pitfalls associated with this type of analysis are avoided by utilizing centered drift functions, which are minimized in high posterior probability regions, and by using a new technique to suppress high-dimensionality in the construction of minorization conditions. The main result is that the geometric convergence rate of the underlying Markov chain is bounded below 1 both as $n\rightarrow\infty$ (with $p$ fixed), and as $p\rightarrow\infty$ (with $n$ fixed). Furthermore, the first computable bounds on the total variation distance to stationarity are byproducts of the asymptotic analysis.
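
The algorithm whose convergence complexity is analyzed is the classic data augmentation Gibbs sampler: draw truncated-normal latent variables given the coefficients, then draw the coefficients from their normal full conditional. The compact sketch below uses a flat prior on the coefficients and placeholder data; the paper also covers proper priors, and the conditionals should be checked against the original reference.

```python
import numpy as np
from scipy.stats import truncnorm

def albert_chib(y, X, n_iter=2000, rng=None):
    # Data augmentation Gibbs sampler for Bayesian probit regression with a
    # flat prior on beta (a proper prior would add a prior precision term).
    if rng is None:
        rng = np.random.default_rng()
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    chol = np.linalg.cholesky(XtX_inv)
    beta = np.zeros(p)
    draws = np.empty((n_iter, p))
    for t in range(n_iter):
        # z_i | beta, y_i ~ N(x_i' beta, 1), truncated to (0, inf) if y_i = 1, else (-inf, 0).
        mu = X @ beta
        lo = np.where(y == 1, -mu, -np.inf)
        hi = np.where(y == 1, np.inf, -mu)
        z = mu + truncnorm.rvs(lo, hi, size=n, random_state=rng)
        # beta | z ~ N((X'X)^{-1} X'z, (X'X)^{-1}).
        beta = XtX_inv @ (X.T @ z) + chol @ rng.normal(size=p)
        draws[t] = beta
    return draws

rng = np.random.default_rng(0)
n, p = 500, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([0.5, 1.0, -1.0])
y = (X @ beta_true + rng.normal(size=n) > 0).astype(int)
print(albert_chib(y, X, rng=rng)[1000:].mean(axis=0))   # posterior mean after burn-in
```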




or

Convergence rates of least squares regression estimators with heavy-tailed errors

Qiyang Han, Jon A. Wellner.

Source: The Annals of Statistics, Volume 47, Number 4, 2286--2319.

Abstract:
We study the performance of the least squares estimator (LSE) in a general nonparametric regression model, when the errors are independent of the covariates but may only have a $p$th moment ($p\geq 1$). In such a heavy-tailed regression setting, we show that if the model satisfies a standard “entropy condition” with exponent $\alpha\in(0,2)$, then the $L_{2}$ loss of the LSE converges at a rate \[\mathcal{O}_{\mathbf{P}}\bigl(n^{-\frac{1}{2+\alpha}}\vee n^{-\frac{1}{2}+\frac{1}{2p}}\bigr).\] Such a rate cannot be improved under the entropy condition alone. This rate quantifies both some positive and negative aspects of the LSE in a heavy-tailed regression setting. On the positive side, as long as the errors have $p\geq 1+2/\alpha$ moments, the $L_{2}$ loss of the LSE converges at the same rate as if the errors are Gaussian. On the negative side, if $p<1+2/\alpha$, there are (many) hard models at any entropy level $\alpha$ for which the $L_{2}$ loss of the LSE converges at a strictly slower rate than other robust estimators. The validity of the above rate relies crucially on the independence of the covariates and the errors. In fact, the $L_{2}$ loss of the LSE can converge arbitrarily slowly when the independence fails. The key technical ingredient is a new multiplier inequality that gives sharp bounds for the “multiplier empirical process” associated with the LSE. We further give an application to the sparse linear regression model with heavy-tailed covariates and errors to demonstrate the scope of this new inequality.
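To make the displayed trade-off concrete (an illustrative calculation, not an additional result from the paper), take entropy exponent $\alpha = 1$:
\[
n^{-\frac{1}{2+\alpha}}\vee n^{-\frac{1}{2}+\frac{1}{2p}}
= n^{-\frac{1}{3}}\vee n^{-\frac{1}{2}+\frac{1}{2p}},
\qquad
-\tfrac{1}{3}\geq -\tfrac{1}{2}+\tfrac{1}{2p}
\iff p\geq 3 = 1+\tfrac{2}{\alpha},
\]
so with at least three moments the LSE attains the Gaussian-error rate $n^{-1/3}$, while for $p<3$ the heavy tails slow it down to $n^{-\frac{1}{2}+\frac{1}{2p}}$.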




or

On deep learning as a remedy for the curse of dimensionality in nonparametric regression

Benedikt Bauer, Michael Kohler.

Source: The Annals of Statistics, Volume 47, Number 4, 2261--2285.

Abstract:
Assuming that a smoothness condition and a suitable restriction on the structure of the regression function hold, it is shown that least squares estimates based on multilayer feedforward neural networks are able to circumvent the curse of dimensionality in nonparametric regression. The proof is based on new approximation results concerning multilayer feedforward neural networks with bounded weights and a bounded number of hidden neurons. The estimates are compared with various other approaches by using simulated data.
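A generic illustration of a least squares neural network fit to a regression function with low-dimensional compositional structure is sketched below. It uses scikit-learn with an arbitrary architecture; the paper's estimator imposes bounded weights and a bounded number of hidden neurons, which this sketch does not replicate.

import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(4)
n, d = 2000, 10
X = rng.uniform(-1, 1, size=(n, d))
# Regression function that depends on the covariates only through two simple
# low-dimensional components, despite the ambient dimension d = 10.
f = lambda X: np.sin(np.pi * (X[:, 0] + X[:, 1])) + (X[:, 2] - X[:, 3]) ** 2
y = f(X) + 0.1 * rng.normal(size=n)

net = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0)
net.fit(X, y)                                  # least squares (squared loss) fit

X_test = rng.uniform(-1, 1, size=(1000, d))
print("test MSE:", np.mean((net.predict(X_test) - f(X_test)) ** 2))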




or

Negative association, ordering and convergence of resampling methods

Mathieu Gerber, Nicolas Chopin, Nick Whiteley.

Source: The Annals of Statistics, Volume 47, Number 4, 2236--2260.

Abstract:
We study convergence and convergence rates for resampling schemes. Our first main result is a general consistency theorem based on the notion of negative association, which is applied to establish the almost sure weak convergence of measures output from Kitagawa’s [ J. Comput. Graph. Statist. 5 (1996) 1–25] stratified resampling method. Carpenter, Clifford and Fearnhead’s [ IEE Proc. Radar Sonar Navig. 146 (1999) 2–7] systematic resampling method is similar in structure but can fail to converge depending on the order of the input samples. We introduce a new resampling algorithm based on a stochastic rounding technique of [In 42nd IEEE Symposium on Foundations of Computer Science ( Las Vegas , NV , 2001) (2001) 588–597 IEEE Computer Soc.], which shares some attractive properties of systematic resampling, but which exhibits negative association and, therefore, converges irrespective of the order of the input samples. We confirm a conjecture made by [ J. Comput. Graph. Statist. 5 (1996) 1–25] that ordering input samples by their states in $\mathbb{R}$ yields a faster rate of convergence; we establish that when particles are ordered using the Hilbert curve in $\mathbb{R}^{d}$, the variance of the resampling error is ${\scriptstyle\mathcal{O}}(N^{-(1+1/d)})$ under mild conditions, where $N$ is the number of particles. We use these results to establish asymptotic properties of particle algorithms based on resampling schemes that differ from multinomial resampling.
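For reference, textbook forms of the two classical schemes contrasted above are sketched below (standard implementations, not the paper's exact pseudocode; the proposed stochastic-rounding-based scheme is not reproduced here). Stratified resampling draws one uniform per stratum, while systematic resampling reuses a single uniform across all strata.

import numpy as np

def stratified_resampling(weights, rng):
    # One independent uniform per stratum ((i + U_i)/N), then inverse-CDF lookup.
    N = len(weights)
    u = (np.arange(N) + rng.uniform(size=N)) / N
    return np.searchsorted(np.cumsum(weights), u)

def systematic_resampling(weights, rng):
    # A single shared uniform ((i + U)/N) for all strata.
    N = len(weights)
    u = (np.arange(N) + rng.uniform()) / N
    return np.searchsorted(np.cumsum(weights), u)

rng = np.random.default_rng(5)
w = rng.exponential(size=1000)
w /= w.sum()
print("stratified ancestor counts (first 10):",
      np.bincount(stratified_resampling(w, rng), minlength=len(w))[:10])
print("systematic ancestor counts (first 10):",
      np.bincount(systematic_resampling(w, rng), minlength=len(w))[:10])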




or

Spectral method and regularized MLE are both optimal for top-$K$ ranking

Yuxin Chen, Jianqing Fan, Cong Ma, Kaizheng Wang.

Source: The Annals of Statistics, Volume 47, Number 4, 2204--2235.

Abstract:
This paper is concerned with the problem of top-$K$ ranking from pairwise comparisons. Given a collection of $n$ items and a few pairwise comparisons across them, one wishes to identify the set of $K$ items that receive the highest ranks. To tackle this problem, we adopt the logistic parametric model—the Bradley–Terry–Luce model, where each item is assigned a latent preference score, and where the outcome of each pairwise comparison depends solely on the relative scores of the two items involved. Recent works have made significant progress toward characterizing the performance (e.g., the mean square error for estimating the scores) of several classical methods, including the spectral method and the maximum likelihood estimator (MLE). However, where they stand regarding top-$K$ ranking remains unsettled. We demonstrate that under a natural random sampling model, the spectral method alone, or the regularized MLE alone, is minimax optimal in terms of the sample complexity—the number of paired comparisons needed to ensure exact top-$K$ identification, for the fixed dynamic range regime. This is accomplished via optimal control of the entrywise error of the score estimates. We complement our theoretical studies by numerical experiments, confirming that both methods yield low entrywise errors for estimating the underlying scores. Our theory is established via a novel leave-one-out trick, which proves effective for analyzing both iterative and noniterative procedures. Along the way, we derive an elementary eigenvector perturbation bound for probability transition matrices, which parallels the Davis–Kahan $\sin\Theta$ theorem for symmetric matrices. This also allows us to close the gap between the $\ell_{2}$ error upper bound for the spectral method and the minimax lower limit.
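A minimal sketch of a rank-centrality-type spectral method under the Bradley–Terry–Luce model is given below. It is our simplified rendering, with an arbitrary normalization constant and a complete comparison graph, not the paper's exact estimator or sampling model.

import numpy as np

rng = np.random.default_rng(6)
n, K, L = 50, 5, 20                      # items, top-K, comparisons per pair
theta = rng.normal(scale=0.5, size=n)    # latent preference scores
w = np.exp(theta)

# Observe L comparisons for each pair (complete comparison graph for simplicity).
P = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        p_ij = w[j] / (w[i] + w[j])                 # probability that j beats i
        y = rng.binomial(L, p_ij) / L               # empirical win frequency
        P[i, j], P[j, i] = y, 1 - y

# Probability transition matrix whose stationary distribution is proportional
# to the scores w in the noiseless case; rank items by its stationary vector.
d = 2 * n                                           # crude normalization constant
M = P / d
np.fill_diagonal(M, 1 - M.sum(axis=1))
vals, vecs = np.linalg.eig(M.T)
pi = np.real(vecs[:, np.argmax(np.real(vals))])
pi = np.abs(pi) / np.abs(pi).sum()

top_hat = set(np.argsort(-pi)[:K])
print("estimated top-K matches truth:", top_hat == set(np.argsort(-w)[:K]))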




or

CORBA

(Common Object Request Broker Architecture) Pioneering integration architecture. Developed during the 1990s by the Object Management Group (OMG), CORBA was the first major attempt to define a platform-neutral architecture for combining heterogeneous software resources across a network. A forerunner of today's service-oriented architectures, CORBA was designed for high-end, transaction-heavy, enterprise deployments, and thus it works best for tight coupling of software resources written in traditional programming languages such as C, C++, Java, Smalltalk and COBOL. Although the addition of IIOP (Internet Inter-ORB Protocol) extended CORBA to run over the Internet, it is less flexible than today's more loosely coupled SOAs, which are based on the exchange of XML documents using web services.




or

object-oriented

(OO) Structured around functional units. Object-oriented programming languages such as C++, Smalltalk and Java are designed to build software made up of objects: discrete bundles of functionality that can act on data only in certain pre-defined ways. This modular building-block approach makes complex software development tasks more flexible and easier to manage within a given programming environment. The emergence of object-oriented programming was a stepping stone to the development of componentization and subsequently of service-oriented architectures.
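A minimal illustration of the idea (a generic example of ours, not tied to any of the languages named in the entry): an object bundles data with the only operations allowed to act on it.

class Counter:
    """Data (the count) can be changed only through pre-defined methods."""
    def __init__(self):
        self._count = 0              # internal state, not meant to be touched directly

    def increment(self, by=1):
        if by < 0:
            raise ValueError("use reset() instead of decrementing")
        self._count += by

    def reset(self):
        self._count = 0

    @property
    def value(self):
        return self._count

c = Counter()
c.increment(); c.increment(3)
print(c.value)                       # 4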




or

Correction: Sensitivity analysis for an unobserved moderator in RCT-to-target-population generalization of treatment effects

Trang Quynh Nguyen, Elizabeth A. Stuart.

Source: The Annals of Applied Statistics, Volume 14, Number 1, 518--520.




or

Bayesian mixed effects models for zero-inflated compositions in microbiome data analysis

Boyu Ren, Sergio Bacallado, Stefano Favaro, Tommi Vatanen, Curtis Huttenhower, Lorenzo Trippa.

Source: The Annals of Applied Statistics, Volume 14, Number 1, 494--517.

Abstract:
Detecting associations between microbial compositions and sample characteristics is one of the most important tasks in microbiome studies. Most of the existing methods apply univariate models to single microbial species separately, with adjustments for multiple hypothesis testing. We propose a Bayesian analysis for a generalized mixed effects linear model tailored to this application. The marginal prior on each microbial composition is a Dirichlet process, and dependence across compositions is induced through a linear combination of individual covariates, such as disease biomarkers or the subject’s age, and latent factors. The latent factors capture residual variability and their dimensionality is learned from the data in a fully Bayesian procedure. The proposed model is tested in data analyses and simulation studies with zero-inflated compositions. In these settings and within each sample, a large proportion of counts per microbial species are equal to zero. In our Bayesian model a priori the probability of compositions with absent microbial species is strictly positive. We propose an efficient algorithm to sample from the posterior and visualizations of model parameters which reveal associations between covariates and microbial compositions. We evaluate the proposed method in simulation studies, and then analyze a microbiome dataset for infants with type 1 diabetes which contains a large proportion of zeros in the sample-specific microbial compositions.
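A schematic generative sketch of the kind of data structure described above follows: composition probabilities driven by a linear combination of covariates and latent factors, yielding count data with many exact zeros. This is a simplified logistic-normal stand-in of ours, not the authors' Dirichlet process model or their posterior sampler.

import numpy as np

rng = np.random.default_rng(7)
n_samples, n_species, n_cov, n_factors = 100, 30, 2, 3

X = rng.normal(size=(n_samples, n_cov))          # covariates (e.g., age, a biomarker)
F = rng.normal(size=(n_samples, n_factors))      # latent factors (residual structure)
B = rng.normal(scale=0.5, size=(n_cov, n_species))
W = rng.normal(scale=0.5, size=(n_factors, n_species))
alpha = rng.normal(scale=2.0, size=n_species)    # species-level intercepts

eta = alpha + X @ B + F @ W                      # linear predictor per species
probs = np.exp(eta)
probs /= probs.sum(axis=1, keepdims=True)        # compositions on the simplex

depth = rng.integers(500, 2000, size=n_samples)  # sequencing depth per sample
counts = np.vstack([rng.multinomial(depth[i], probs[i]) for i in range(n_samples)])
print("fraction of zero counts:", np.mean(counts == 0))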