ed

Principal nested shape space analysis of molecular dynamics data

Ian L. Dryden, Kwang-Rae Kim, Charles A. Laughton, Huiling Le.

Source: The Annals of Applied Statistics, Volume 13, Number 4, 2213--2234.

Abstract:
Molecular dynamics simulations produce huge datasets of temporal sequences of molecules. It is of interest to summarize the shape evolution of the molecules in a succinct, low-dimensional representation. However, Euclidean techniques such as principal components analysis (PCA) can be problematic as the data may lie far from in a flat manifold. Principal nested spheres gives a fundamentally different decomposition of data from the usual Euclidean subspace based PCA [ Biometrika 99 (2012) 551–568]. Subspaces of successively lower dimension are fitted to the data in a backwards manner with the aim of retaining signal and dispensing with noise at each stage. We adapt the methodology to 3D subshape spaces and provide some practical fitting algorithms. The methodology is applied to cluster analysis of peptides, where different states of the molecules can be identified. Also, the temporal transitions between cluster states are explored.




ed

Prediction of small area quantiles for the conservation effects assessment project using a mixed effects quantile regression model

Emily Berg, Danhyang Lee.

Source: The Annals of Applied Statistics, Volume 13, Number 4, 2158--2188.

Abstract:
Quantiles of the distributions of several measures of erosion are important parameters in the Conservation Effects Assessment Project, a survey intended to quantify soil and nutrient loss on crop fields. Because sample sizes for domains of interest are too small to support reliable direct estimators, model based methods are needed. Quantile regression is appealing for CEAP because finding a single family of parametric models that adequately describes the distributions of all variables is difficult and small area quantiles are parameters of interest. We construct empirical Bayes predictors and bootstrap mean squared error estimators based on the linearly interpolated generalized Pareto distribution (LIGPD). We apply the procedures to predict county-level quantiles for four types of erosion in Wisconsin and validate the procedures through simulation.




ed

Joint model of accelerated failure time and mechanistic nonlinear model for censored covariates, with application in HIV/AIDS

Hongbin Zhang, Lang Wu.

Source: The Annals of Applied Statistics, Volume 13, Number 4, 2140--2157.

Abstract:
For a time-to-event outcome with censored time-varying covariates, a joint Cox model with a linear mixed effects model is the standard modeling approach. In some applications such as AIDS studies, mechanistic nonlinear models are available for some covariate process such as viral load during anti-HIV treatments, derived from the underlying data-generation mechanisms and disease progression. Such a mechanistic nonlinear covariate model may provide better-predicted values when the covariates are left censored or mismeasured. When the focus is on the impact of the time-varying covariate process on the survival outcome, an accelerated failure time (AFT) model provides an excellent alternative to the Cox proportional hazard model since an AFT model is formulated to allow the influence of the outcome by the entire covariate process. In this article, we consider a nonlinear mixed effects model for the censored covariates in an AFT model, implemented using a Monte Carlo EM algorithm, under the framework of a joint model for simultaneous inference. We apply the joint model to an HIV/AIDS data to gain insights for assessing the association between viral load and immunological restoration during antiretroviral therapy. Simulation is conducted to compare model performance when the covariate model and the survival model are misspecified.




ed

Statistical inference for partially observed branching processes with application to cell lineage tracking of in vivo hematopoiesis

Jason Xu, Samson Koelle, Peter Guttorp, Chuanfeng Wu, Cynthia Dunbar, Janis L. Abkowitz, Vladimir N. Minin.

Source: The Annals of Applied Statistics, Volume 13, Number 4, 2091--2119.

Abstract:
Single-cell lineage tracking strategies enabled by recent experimental technologies have produced significant insights into cell fate decisions, but lack the quantitative framework necessary for rigorous statistical analysis of mechanistic models describing cell division and differentiation. In this paper, we develop such a framework with corresponding moment-based parameter estimation techniques for continuous-time, multi-type branching processes. Such processes provide a probabilistic model of how cells divide and differentiate, and we apply our method to study hematopoiesis , the mechanism of blood cell production. We derive closed-form expressions for higher moments in a general class of such models. These analytical results allow us to efficiently estimate parameters of much richer statistical models of hematopoiesis than those used in previous statistical studies. To our knowledge, the method provides the first rate inference procedure for fitting such models to time series data generated from cellular barcoding experiments. After validating the methodology in simulation studies, we apply our estimator to hematopoietic lineage tracking data from rhesus macaques. Our analysis provides a more complete understanding of cell fate decisions during hematopoiesis in nonhuman primates, which may be more relevant to human biology and clinical strategies than previous findings from murine studies. For example, in addition to previously estimated hematopoietic stem cell self-renewal rate, we are able to estimate fate decision probabilities and to compare structurally distinct models of hematopoiesis using cross validation. These estimates of fate decision probabilities and our model selection results should help biologists compare competing hypotheses about how progenitor cells differentiate. The methodology is transferrable to a large class of stochastic compartmental and multi-type branching models, commonly used in studies of cancer progression, epidemiology and many other fields.




ed

Radio-iBAG: Radiomics-based integrative Bayesian analysis of multiplatform genomic data

Youyi Zhang, Jeffrey S. Morris, Shivali Narang Aerry, Arvind U. K. Rao, Veerabhadran Baladandayuthapani.

Source: The Annals of Applied Statistics, Volume 13, Number 3, 1957--1988.

Abstract:
Technological innovations have produced large multi-modal datasets that include imaging and multi-platform genomics data. Integrative analyses of such data have the potential to reveal important biological and clinical insights into complex diseases like cancer. In this paper, we present Bayesian approaches for integrative analysis of radiological imaging and multi-platform genomic data, where-in our goals are to simultaneously identify genomic and radiomic, that is, radiology-based imaging markers, along with the latent associations between these two modalities, and to detect the overall prognostic relevance of the combined markers. For this task, we propose Radio-iBAG: Radiomics-based Integrative Bayesian Analysis of Multiplatform Genomic Data , a multi-scale Bayesian hierarchical model that involves several innovative strategies: it incorporates integrative analysis of multi-platform genomic data sets to capture fundamental biological relationships; explores the associations between radiomic markers accompanying genomic information with clinical outcomes; and detects genomic and radiomic markers associated with clinical prognosis. We also introduce the use of sparse Principal Component Analysis (sPCA) to extract a sparse set of approximately orthogonal meta-features each containing information from a set of related individual radiomic features, reducing dimensionality and combining like features. Our methods are motivated by and applied to The Cancer Genome Atlas glioblastoma multiforme data set, where-in we integrate magnetic resonance imaging-based biomarkers along with genomic, epigenomic and transcriptomic data. Our model identifies important magnetic resonance imaging features and the associated genomic platforms that are related with patient survival times.




ed

Bayesian methods for multiple mediators: Relating principal stratification and causal mediation in the analysis of power plant emission controls

Chanmin Kim, Michael J. Daniels, Joseph W. Hogan, Christine Choirat, Corwin M. Zigler.

Source: The Annals of Applied Statistics, Volume 13, Number 3, 1927--1956.

Abstract:
Emission control technologies installed on power plants are a key feature of many air pollution regulations in the US. While such regulations are predicated on the presumed relationships between emissions, ambient air pollution and human health, many of these relationships have never been empirically verified. The goal of this paper is to develop new statistical methods to quantify these relationships. We frame this problem as one of mediation analysis to evaluate the extent to which the effect of a particular control technology on ambient pollution is mediated through causal effects on power plant emissions. Since power plants emit various compounds that contribute to ambient pollution, we develop new methods for multiple intermediate variables that are measured contemporaneously, may interact with one another, and may exhibit joint mediating effects. Specifically, we propose new methods leveraging two related frameworks for causal inference in the presence of mediating variables: principal stratification and causal mediation analysis. We define principal effects based on multiple mediators, and also introduce a new decomposition of the total effect of an intervention on ambient pollution into the natural direct effect and natural indirect effects for all combinations of mediators. Both approaches are anchored to the same observed-data models, which we specify with Bayesian nonparametric techniques. We provide assumptions for estimating principal causal effects, then augment these with an additional assumption required for causal mediation analysis. The two analyses, interpreted in tandem, provide the first empirical investigation of the presumed causal pathways that motivate important air quality regulatory policies.




ed

Sequential decision model for inference and prediction on nonuniform hypergraphs with application to knot matching from computational forestry

Seong-Hwan Jun, Samuel W. K. Wong, James V. Zidek, Alexandre Bouchard-Côté.

Source: The Annals of Applied Statistics, Volume 13, Number 3, 1678--1707.

Abstract:
In this paper, we consider the knot-matching problem arising in computational forestry. The knot-matching problem is an important problem that needs to be solved to advance the state of the art in automatic strength prediction of lumber. We show that this problem can be formulated as a quadripartite matching problem and develop a sequential decision model that admits efficient parameter estimation along with a sequential Monte Carlo sampler on graph matching that can be utilized for rapid sampling of graph matching. We demonstrate the effectiveness of our methods on 30 manually annotated boards and present findings from various simulation studies to provide further evidence supporting the efficacy of our methods.




ed

RCRnorm: An integrated system of random-coefficient hierarchical regression models for normalizing NanoString nCounter data

Gaoxiang Jia, Xinlei Wang, Qiwei Li, Wei Lu, Ximing Tang, Ignacio Wistuba, Yang Xie.

Source: The Annals of Applied Statistics, Volume 13, Number 3, 1617--1647.

Abstract:
Formalin-fixed paraffin-embedded (FFPE) samples have great potential for biomarker discovery, retrospective studies and diagnosis or prognosis of diseases. Their application, however, is hindered by the unsatisfactory performance of traditional gene expression profiling techniques on damaged RNAs. NanoString nCounter platform is well suited for profiling of FFPE samples and measures gene expression with high sensitivity which may greatly facilitate realization of scientific and clinical values of FFPE samples. However, methodological development for normalization, a critical step when analyzing this type of data, is far behind. Existing methods designed for the platform use information from different types of internal controls separately and rely on an overly-simplified assumption that expression of housekeeping genes is constant across samples for global scaling. Thus, these methods are not optimized for the nCounter system, not mentioning that they were not developed for FFPE samples. We construct an integrated system of random-coefficient hierarchical regression models to capture main patterns and characteristics observed from NanoString data of FFPE samples and develop a Bayesian approach to estimate parameters and normalize gene expression across samples. Our method, labeled RCRnorm, incorporates information from all aspects of the experimental design and simultaneously removes biases from various sources. It eliminates the unrealistic assumption on housekeeping genes and offers great interpretability. Furthermore, it is applicable to freshly frozen or like samples that can be generally viewed as a reduced case of FFPE samples. Simulation and applications showed the superior performance of RCRnorm.




ed

Spatio-temporal short-term wind forecast: A calibrated regime-switching method

Ahmed Aziz Ezzat, Mikyoung Jun, Yu Ding.

Source: The Annals of Applied Statistics, Volume 13, Number 3, 1484--1510.

Abstract:
Accurate short-term forecasts are indispensable for the integration of wind energy in power grids. On a wind farm, local wind conditions exhibit sizeable variations at a fine temporal resolution. Existing statistical models may capture the in-sample variations in wind behavior, but are often shortsighted to those occurring in the near future, that is, in the forecast horizon. The calibrated regime-switching method proposed in this paper introduces an action of regime dependent calibration on the predictand (here the wind speed variable), which helps correct the bias resulting from out-of-sample variations in wind behavior. This is achieved by modeling the calibration as a function of two elements: the wind regime at the time of the forecast (and the calibration is therefore regime dependent), and the runlength, which is the time elapsed since the last observed regime change. In addition to regime-switching dynamics, the proposed model also accounts for other features of wind fields: spatio-temporal dependencies, transport effect of wind and nonstationarity. Using one year of turbine-specific wind data, we show that the calibrated regime-switching method can offer a wide margin of improvement over existing forecasting methods in terms of both wind speed and power.




ed

A refined Cramér-type moderate deviation for sums of local statistics

Xiao Fang, Li Luo, Qi-Man Shao.

Source: Bernoulli, Volume 26, Number 3, 2319--2352.

Abstract:
We prove a refined Cramér-type moderate deviation result by taking into account of the skewness in normal approximation for sums of local statistics of independent random variables. We apply the main result to $k$-runs, U-statistics and subgraph counts in the Erdős–Rényi random graph. To prove our main result, we develop exponential concentration inequalities and higher-order tail probability expansions via Stein’s method.




ed

Weighted Lépingle inequality

Pavel Zorin-Kranich.

Source: Bernoulli, Volume 26, Number 3, 2311--2318.

Abstract:
We prove an estimate for weighted $p$th moments of the pathwise $r$-variation of a martingale in terms of the $A_{p}$ characteristic of the weight. The novelty of the proof is that we avoid real interpolation techniques.




ed

Matching strings in encoded sequences

Adriana Coutinho, Rodrigo Lambert, Jérôme Rousseau.

Source: Bernoulli, Volume 26, Number 3, 2021--2050.

Abstract:
We investigate the length of the longest common substring for encoded sequences and its asymptotic behaviour. The main result is a strong law of large numbers for a re-scaled version of this quantity, which presents an explicit relation with the Rényi entropy of the source. We apply this result to the zero-inflated contamination model and the stochastic scrabble. In the case of dynamical systems, this problem is equivalent to the shortest distance between two observed orbits and its limiting relationship with the correlation dimension of the pushforward measure. An extension to the shortest distance between orbits for random dynamical systems is also provided.




ed

Optimal functional supervised classification with separation condition

Sébastien Gadat, Sébastien Gerchinovitz, Clément Marteau.

Source: Bernoulli, Volume 26, Number 3, 1797--1831.

Abstract:
We consider the binary supervised classification problem with the Gaussian functional model introduced in ( Math. Methods Statist. 22 (2013) 213–225). Taking advantage of the Gaussian structure, we design a natural plug-in classifier and derive a family of upper bounds on its worst-case excess risk over Sobolev spaces. These bounds are parametrized by a separation distance quantifying the difficulty of the problem, and are proved to be optimal (up to logarithmic factors) through matching minimax lower bounds. Using the recent works of (In Advances in Neural Information Processing Systems (2014) 3437–3445 Curran Associates) and ( Ann. Statist. 44 (2016) 982–1009), we also derive a logarithmic lower bound showing that the popular $k$-nearest neighbors classifier is far from optimality in this specific functional setting.




ed

Influence of the seed in affine preferential attachment trees

David Corlin Marchand, Ioan Manolescu.

Source: Bernoulli, Volume 26, Number 3, 1665--1705.

Abstract:
We study randomly growing trees governed by the affine preferential attachment rule. Starting with a seed tree $S$, vertices are attached one by one, each linked by an edge to a random vertex of the current tree, chosen with a probability proportional to an affine function of its degree. This yields a one-parameter family of preferential attachment trees $(T_{n}^{S})_{ngeq |S|}$, of which the linear model is a particular case. Depending on the choice of the parameter, the power-laws governing the degrees in $T_{n}^{S}$ have different exponents. We study the problem of the asymptotic influence of the seed $S$ on the law of $T_{n}^{S}$. We show that, for any two distinct seeds $S$ and $S'$, the laws of $T_{n}^{S}$ and $T_{n}^{S'}$ remain at uniformly positive total-variation distance as $n$ increases. This is a continuation of Curien et al. ( J. Éc. Polytech. Math. 2 (2015) 1–34), which in turn was inspired by a conjecture of Bubeck et al. ( IEEE Trans. Netw. Sci. Eng. 2 (2015) 30–39). The technique developed here is more robust than previous ones and is likely to help in the study of more general attachment mechanisms.




ed

Estimating the number of connected components in a graph via subgraph sampling

Jason M. Klusowski, Yihong Wu.

Source: Bernoulli, Volume 26, Number 3, 1635--1664.

Abstract:
Learning properties of large graphs from samples has been an important problem in statistical network analysis since the early work of Goodman ( Ann. Math. Stat. 20 (1949) 572–579) and Frank ( Scand. J. Stat. 5 (1978) 177–188). We revisit a problem formulated by Frank ( Scand. J. Stat. 5 (1978) 177–188) of estimating the number of connected components in a large graph based on the subgraph sampling model, in which we randomly sample a subset of the vertices and observe the induced subgraph. The key question is whether accurate estimation is achievable in the sublinear regime where only a vanishing fraction of the vertices are sampled. We show that it is impossible if the parent graph is allowed to contain high-degree vertices or long induced cycles. For the class of chordal graphs, where induced cycles of length four or above are forbidden, we characterize the optimal sample complexity within constant factors and construct linear-time estimators that provably achieve these bounds. This significantly expands the scope of previous results which have focused on unbiased estimators and special classes of graphs such as forests or cliques. Both the construction and the analysis of the proposed methodology rely on combinatorial properties of chordal graphs and identities of induced subgraph counts. They, in turn, also play a key role in proving minimax lower bounds based on construction of random instances of graphs with matching structures of small subgraphs.




ed

Stratonovich stochastic differential equation with irregular coefficients: Girsanov’s example revisited

Ilya Pavlyukevich, Georgiy Shevchenko.

Source: Bernoulli, Volume 26, Number 2, 1381--1409.

Abstract:
In this paper, we study the Stratonovich stochastic differential equation $mathrm{d}X=|X|^{alpha }circ mathrm{d}B$, $alpha in (-1,1)$, which has been introduced by Cherstvy et al. ( New J. Phys. 15 (2013) 083039) in the context of analysis of anomalous diffusions in heterogeneous media. We determine its weak and strong solutions, which are homogeneous strong Markov processes spending zero time at $0$: for $alpha in (0,1)$, these solutions have the form egin{equation*}X_{t}^{ heta }=((1-alpha)B_{t}^{ heta })^{1/(1-alpha )},end{equation*} where $B^{ heta }$ is the $ heta $-skew Brownian motion driven by $B$ and starting at $frac{1}{1-alpha }(X_{0})^{1-alpha }$, $ heta in [-1,1]$, and $(x)^{gamma }=|x|^{gamma }operatorname{sign}x$; for $alpha in (-1,0]$, only the case $ heta =0$ is possible. The central part of the paper consists in the proof of the existence of a quadratic covariation $[f(B^{ heta }),B]$ for a locally square integrable function $f$ and is based on the time-reversion technique for Markovian diffusions.




ed

On stability of traveling wave solutions for integro-differential equations related to branching Markov processes

Pasha Tkachov.

Source: Bernoulli, Volume 26, Number 2, 1354--1380.

Abstract:
The aim of this paper is to prove stability of traveling waves for integro-differential equations connected with branching Markov processes. In other words, the limiting law of the left-most particle of a (time-continuous) branching Markov process with a Lévy non-branching part is demonstrated. The key idea is to approximate the branching Markov process by a branching random walk and apply the result of Aïdékon [ Ann. Probab. 41 (2013) 1362–1426] on the limiting law of the latter one.




ed

Interacting reinforced stochastic processes: Statistical inference based on the weighted empirical means

Giacomo Aletti, Irene Crimaldi, Andrea Ghiglietti.

Source: Bernoulli, Volume 26, Number 2, 1098--1138.

Abstract:
This work deals with a system of interacting reinforced stochastic processes , where each process $X^{j}=(X_{n,j})_{n}$ is located at a vertex $j$ of a finite weighted directed graph, and it can be interpreted as the sequence of “actions” adopted by an agent $j$ of the network. The interaction among the dynamics of these processes depends on the weighted adjacency matrix $W$ associated to the underlying graph: indeed, the probability that an agent $j$ chooses a certain action depends on its personal “inclination” $Z_{n,j}$ and on the inclinations $Z_{n,h}$, with $h eq j$, of the other agents according to the entries of $W$. The best known example of reinforced stochastic process is the Pólya urn. The present paper focuses on the weighted empirical means $N_{n,j}=sum_{k=1}^{n}q_{n,k}X_{k,j}$, since, for example, the current experience is more important than the past one in reinforced learning. Their almost sure synchronization and some central limit theorems in the sense of stable convergence are proven. The new approach with weighted means highlights the key points in proving some recent results for the personal inclinations $Z^{j}=(Z_{n,j})_{n}$ and for the empirical means $overline{X}^{j}=(sum_{k=1}^{n}X_{k,j}/n)_{n}$ given in recent papers (e.g. Aletti, Crimaldi and Ghiglietti (2019), Ann. Appl. Probab. 27 (2017) 3787–3844, Crimaldi et al. Stochastic Process. Appl. 129 (2019) 70–101). In fact, with a more sophisticated decomposition of the considered processes, we can understand how the different convergence rates of the involved stochastic processes combine. From an application point of view, we provide confidence intervals for the common limit inclination of the agents and a test statistics to make inference on the matrix $W$, based on the weighted empirical means. In particular, we answer a research question posed in Aletti, Crimaldi and Ghiglietti (2019).




ed

A unified principled framework for resampling based on pseudo-populations: Asymptotic theory

Pier Luigi Conti, Daniela Marella, Fulvia Mecatti, Federico Andreis.

Source: Bernoulli, Volume 26, Number 2, 1044--1069.

Abstract:
In this paper, a class of resampling techniques for finite populations under $pi $ps sampling design is introduced. The basic idea on which they rest is a two-step procedure consisting in: (i) constructing a “pseudo-population” on the basis of sample data; (ii) drawing a sample from the predicted population according to an appropriate resampling design. From a logical point of view, this approach is essentially based on the plug-in principle by Efron, at the “sampling design level”. Theoretical justifications based on large sample theory are provided. New approaches to construct pseudo populations based on various forms of calibrations are proposed. Finally, a simulation study is performed.




ed

Stable processes conditioned to hit an interval continuously from the outside

Leif Döring, Philip Weissmann.

Source: Bernoulli, Volume 26, Number 2, 980--1015.

Abstract:
Conditioning stable Lévy processes on zero probability events recently became a tractable subject since several explicit formulas emerged from a deep analysis using the Lamperti transformations for self-similar Markov processes. In this article, we derive new harmonic functions and use them to explain how to condition stable processes to hit continuously a compact interval from the outside.




ed

Stochastic differential equations with a fractionally filtered delay: A semimartingale model for long-range dependent processes

Richard A. Davis, Mikkel Slot Nielsen, Victor Rohde.

Source: Bernoulli, Volume 26, Number 2, 799--827.

Abstract:
In this paper, we introduce a model, the stochastic fractional delay differential equation (SFDDE), which is based on the linear stochastic delay differential equation and produces stationary processes with hyperbolically decaying autocovariance functions. The model departs from the usual way of incorporating this type of long-range dependence into a short-memory model as it is obtained by applying a fractional filter to the drift term rather than to the noise term. The advantages of this approach are that the corresponding long-range dependent solutions are semimartingales and the local behavior of the sample paths is unaffected by the degree of long memory. We prove existence and uniqueness of solutions to the SFDDEs and study their spectral densities and autocovariance functions. Moreover, we define a subclass of SFDDEs which we study in detail and relate to the well-known fractionally integrated CARMA processes. Finally, we consider the task of simulating from the defining SFDDEs.




ed

Convergence and concentration of empirical measures under Wasserstein distance in unbounded functional spaces

Jing Lei.

Source: Bernoulli, Volume 26, Number 1, 767--798.

Abstract:
We provide upper bounds of the expected Wasserstein distance between a probability measure and its empirical version, generalizing recent results for finite dimensional Euclidean spaces and bounded functional spaces. Such a generalization can cover Euclidean spaces with large dimensionality, with the optimal dependence on the dimensionality. Our method also covers the important case of Gaussian processes in separable Hilbert spaces, with rate-optimal upper bounds for functional data distributions whose coordinates decay geometrically or polynomially. Moreover, our bounds of the expected value can be combined with mean-concentration results to yield improved exponential tail probability bounds for the Wasserstein error of empirical measures under Bernstein-type or log Sobolev-type conditions.




ed

A Feynman–Kac result via Markov BSDEs with generalised drivers

Elena Issoglio, Francesco Russo.

Source: Bernoulli, Volume 26, Number 1, 728--766.

Abstract:
In this paper, we investigate BSDEs where the driver contains a distributional term (in the sense of generalised functions) and derive general Feynman–Kac formulae related to these BSDEs. We introduce an integral operator to give sense to the equation and then we show the existence of a strong solution employing results on a related PDE. Due to the irregularity of the driver, the $Y$-component of a couple $(Y,Z)$ solving the BSDE is not necessarily a semimartingale but a weak Dirichlet process.




ed

A unified approach to coupling SDEs driven by Lévy noise and some applications

Mingjie Liang, René L. Schilling, Jian Wang.

Source: Bernoulli, Volume 26, Number 1, 664--693.

Abstract:
We present a general method to construct couplings of stochastic differential equations driven by Lévy noise in terms of coupling operators. This approach covers both coupling by reflection and refined basic coupling which are often discussed in the literature. As applications, we prove regularity results for the transition semigroups and obtain successful couplings for the solutions to stochastic differential equations driven by additive Lévy noise.




ed

On frequentist coverage errors of Bayesian credible sets in moderately high dimensions

Keisuke Yano, Kengo Kato.

Source: Bernoulli, Volume 26, Number 1, 616--641.

Abstract:
In this paper, we study frequentist coverage errors of Bayesian credible sets for an approximately linear regression model with (moderately) high dimensional regressors, where the dimension of the regressors may increase with but is smaller than the sample size. Specifically, we consider quasi-Bayesian inference on the slope vector under the quasi-likelihood with Gaussian error distribution. Under this setup, we derive finite sample bounds on frequentist coverage errors of Bayesian credible rectangles. Derivation of those bounds builds on a novel Berry–Esseen type bound on quasi-posterior distributions and recent results on high-dimensional CLT on hyperrectangles. We use this general result to quantify coverage errors of Castillo–Nickl and $L^{infty}$-credible bands for Gaussian white noise models, linear inverse problems, and (possibly non-Gaussian) nonparametric regression models. In particular, we show that Bayesian credible bands for those nonparametric models have coverage errors decaying polynomially fast in the sample size, implying advantages of Bayesian credible bands over confidence bands based on extreme value theory.




ed

Normal approximation for sums of weighted $U$-statistics – application to Kolmogorov bounds in random subgraph counting

Nicolas Privault, Grzegorz Serafin.

Source: Bernoulli, Volume 26, Number 1, 587--615.

Abstract:
We derive normal approximation bounds in the Kolmogorov distance for sums of discrete multiple integrals and weighted $U$-statistics made of independent Bernoulli random variables. Such bounds are applied to normal approximation for the renormalized subgraph counts in the Erdős–Rényi random graph. This approach completely solves a long-standing conjecture in the general setting of arbitrary graph counting, while recovering recent results obtained for triangles and improving other bounds in the Wasserstein distance.




ed

Subspace perspective on canonical correlation analysis: Dimension reduction and minimax rates

Zhuang Ma, Xiaodong Li.

Source: Bernoulli, Volume 26, Number 1, 432--470.

Abstract:
Canonical correlation analysis (CCA) is a fundamental statistical tool for exploring the correlation structure between two sets of random variables. In this paper, motivated by the recent success of applying CCA to learn low dimensional representations of high dimensional objects, we propose two losses based on the principal angles between the model spaces spanned by the sample canonical variates and their population correspondents, respectively. We further characterize the non-asymptotic error bounds for the estimation risks under the proposed error metrics, which reveal how the performance of sample CCA depends adaptively on key quantities including the dimensions, the sample size, the condition number of the covariance matrices and particularly the population canonical correlation coefficients. The optimality of our uniform upper bounds is also justified by lower-bound analysis based on stringent and localized parameter spaces. To the best of our knowledge, for the first time our paper separates $p_{1}$ and $p_{2}$ for the first order term in the upper bounds without assuming the residual correlations are zeros. More significantly, our paper derives $(1-lambda_{k}^{2})(1-lambda_{k+1}^{2})/(lambda_{k}-lambda_{k+1})^{2}$ for the first time in the non-asymptotic CCA estimation convergence rates, which is essential to understand the behavior of CCA when the leading canonical correlation coefficients are close to $1$.




ed

High dimensional deformed rectangular matrices with applications in matrix denoising

Xiucai Ding.

Source: Bernoulli, Volume 26, Number 1, 387--417.

Abstract:
We consider the recovery of a low rank $M imes N$ matrix $S$ from its noisy observation $ ilde{S}$ in the high dimensional framework when $M$ is comparable to $N$. We propose two efficient estimators for $S$ under two different regimes. Our analysis relies on the local asymptotics of the eigenstructure of large dimensional rectangular matrices with finite rank perturbation. We derive the convergent limits and rates for the singular values and vectors for such matrices.




ed

Prediction and estimation consistency of sparse multi-class penalized optimal scoring

Irina Gaynanova.

Source: Bernoulli, Volume 26, Number 1, 286--322.

Abstract:
Sparse linear discriminant analysis via penalized optimal scoring is a successful tool for classification in high-dimensional settings. While the variable selection consistency of sparse optimal scoring has been established, the corresponding prediction and estimation consistency results have been lacking. We bridge this gap by providing probabilistic bounds on out-of-sample prediction error and estimation error of multi-class penalized optimal scoring allowing for diverging number of classes.




ed

Needles and straw in a haystack: Robust confidence for possibly sparse sequences

Eduard Belitser, Nurzhan Nurushev.

Source: Bernoulli, Volume 26, Number 1, 191--225.

Abstract:
In the general signal$+$noise (allowing non-normal, non-independent observations) model, we construct an empirical Bayes posterior which we then use for uncertainty quantification for the unknown, possibly sparse, signal. We introduce a novel excessive bias restriction (EBR) condition, which gives rise to a new slicing of the entire space that is suitable for uncertainty quantification. Under EBR and some mild exchangeable exponential moment condition on the noise, we establish the local (oracle) optimality of the proposed confidence ball. Without EBR, we propose another confidence ball of full coverage, but its radius contains an additional $sigma n^{1/4}$-term. In passing, we also get the local optimal results for estimation , posterior contraction problems, and the problem of weak recovery of sparsity structure . Adaptive minimax results (also for the estimation and posterior contraction problems) over various sparsity classes follow from our local results.




ed

The Thomson family : fisherman in Buckhaven, retailers in Kapunda / compiled by Elizabeth Anne Howell.

Thomson (Family)




ed

List of family history books owned by Roy Klemm.

Family histories -- South Australia -- Bibliography.




ed

The Klemm family : descendants of Johann Gottfried Klemm and Anna Louise Klemm : these forebears are honoured and remembered at a reunion at Gruenberg, Moculta 11th-12th March 1995.

Klemm (Family)




ed

GEDmatch : tools for DNA & genealogy research / by Kerry Farmer.

Genetic genealogy -- Handbooks, manuals, etc.




ed

The Kuerschner story : 1848 - 1999 / compiled by Gerald Kuerschner.

Kuerschner (Family)




ed

Our Lady of Grace family page of history : a bookweek bicentennial project / edited by Janeen Brian.

Our Lady of Grace School (Glengowrie, S.A.)




ed

From Wends we came : the story of Johann and Maria Huppatz & their descendants / compiled by Frank Huppatz and Rone McDonnell.

Huppatz (Family).




ed

Austin-Area District Looks for Digital/Blended Learning Program; Baltimore Seeks High School Literacy Program

The Round Rock Independent School District in Texas is looking for a digital curriculum and blended learning program. Baltimore is looking for a comprehensive high school literacy program.

The post Austin-Area District Looks for Digital/Blended Learning Program; Baltimore Seeks High School Literacy Program appeared first on Market Brief.



  • Purchasing Alert
  • Curriculum / Digital Curriculum
  • Educational Technology/Ed-Tech
  • Learning Management / Student Information Systems
  • Procurement / Purchasing / RFPs

ed

Pearson K12 Spinoff Rebranded as ‘Savvas Learning Company’

Savvas Learning Company will continue to provide its K-12 products and services, and is working to support districts with their remote learning needs during school closures.

The post Pearson K12 Spinoff Rebranded as ‘Savvas Learning Company’ appeared first on Market Brief.




ed

Calif. Ed-Tech Consortium Seeks Media Repository Solutions; Saint Paul District Needs Background Check Services

Saint Paul schools are in the market for a vendor to provide background checks, while the Education Technology Joint Powers Authority is seeking media repositories. A Texas district wants quotes on technology for new campuses.

The post Calif. Ed-Tech Consortium Seeks Media Repository Solutions; Saint Paul District Needs Background Check Services appeared first on Market Brief.




ed

Item 04: Notebook of Colonel Alfred Hobart Sturdee, 8 August 1914 to 25 February 1918




ed

Glass stereoscopic slides of Gallipoli, May 1915 / photographed by Charles Snodgrass Ryan




ed

Item 07: A Journal of ye [the] Proceedings of his Majesty's Sloop Swallow, Captain Phillip [Philip] Carteret Commander, Commencing ye [the] 23 of July 1766 and ended [4 July 1767]




ed

Item 08: A Logg [Log] Book of the proceedings on Board His Majesty's Ship Swallow, Captain Philip Carteret Commander Commencing from the 20th August 1766 and Ending [21st May 1768]




ed

Item 13: Swallow 1767, A journal of the proceedings on Board His Majesty's Sloop Swallow, commencing the 1st of March 1767 and Ended the 7th of July 1767




ed

Item 01: Notebooks (2) containing hand written copies of 123 letters from Major William Alan Audsley to his parents, ca. 1916-ca. 1919, transcribed by his father. Also includes original letters (2) written by Major Audsley.




ed

Item 01: Autograph letter signed, from Hume, Appin, to William E. Riley, concerning an account for money owed by Riley, 4 September 1834




ed

Sydney in 1848 : illustrated by copper-plate engravings of its principal streets, public buildings, churches, chapels, etc. / from drawings by Joseph Fowles.




ed

Art Around the Library - Illuminated letter

Examine some examples of font and decoration used in beautiful medieval manuscripts as inspiration for creating your own illuminated letter design.




ed

3 NY children die from syndrome possibly linked to COVID-19

Three children have now died in New York state from a possible complication from the coronavirus involving swollen blood vessels and heart problems, Gov. Andrew Cuomo said Saturday. At least 73 children in New York have been diagnosed with symptoms similar to Kawasaki disease — a rare inflammatory condition in children — and toxic shock syndrome.