is Correspondence relating to Lewis Harold Bell Lasseter, 1931 By feedproxy.google.com Published On :: 9/10/2015 12:00:00 AM Full Article
is The Most Excellent Order of the British Empire Association (New South Wales) further records, 1979-2012 By feedproxy.google.com Published On :: 9/10/2015 12:00:00 AM Full Article
is Collodion is alive and well! By www.sl.nsw.gov.au Published On :: Thu, 10 Sep 2015 02:50:11 +0000 I just came across this Youtube video submitted by modern day exponent of the collodion process, Quinn Jacobson (http: Full Article
is Digitisation Officer appointed By www.sl.nsw.gov.au Published On :: Thu, 10 Sep 2015 02:50:12 +0000 Digitisation Officer appointed I am pleased to introduce our new Digitisation Officer, Lauren O'Brien. Her main f Full Article
is Oregon's Sabrina Ionescu takes home Naismith Trophy Player of the Year honor By sports.yahoo.com Published On :: Fri, 03 Apr 2020 16:05:43 GMT Sabrina Ionescu is the Naismith Trophy Player of the Year, concluding her illustrious Oregon career with one of the major postseason women's basketball awards. As the only player in college basketball history with 2,000 career points (2,562), 1,000 assists (1,091) and 1,000 rebounds (1,040) and the NCAA all-time leader with 26 triple-doubles, Ionescu has continued to rack up player of the year honors for her remarkable senior season. Full Article video Sports
is Oregon's Ionescu wins women's Naismith Player of the Year By sports.yahoo.com Published On :: Fri, 03 Apr 2020 17:27:00 GMT Already named The Associated Press women's player of the year, Ionescu was awarded the Naismith Trophy for the most outstanding women's basketball player on Friday. Ionescu, who won AP All-American honors three times, shattered the NCAA career triple-double mark with 26 and became the first player in college history to have 2,000 points, 1,000 rebounds and 1,000 assists. Ionescu averaged 17.5 points, 9.1 assists and 8.6 rebounds with eight triple-doubles as a senior this season. Full Article article Sports
is Texas women's basketball coach Karen Aston dismissed By sports.yahoo.com Published On :: Fri, 03 Apr 2020 19:35:13 GMT AUSTIN, Texas (AP) -- Texas dismissed women's basketball coach Karen Aston on Friday, ending an eight-year stint that included four straight trips to the NCAA Tournament Sweet 16 from 2015-2018. Full Article article Sports
is Top three Ruthy Hebard moments: NCAA record for consecutive FGs etched her place in history By sports.yahoo.com Published On :: Fri, 03 Apr 2020 23:08:48 GMT Over four years in Eugene, Ruthy Hebard has made a name for herself with reliability and dynamic play. She's had many memorable moments in a Duck uniform. But her career day against Washington State (34 points), her moment reaching 2,000 career points and her NCAA record for consecutive made FGs (2018) top the list. Against the Trojans, she set the record (30) and later extended it to 33. Full Article video Sports
is Texas hires Schaefer from Mississippi State By sports.yahoo.com Published On :: Sun, 05 Apr 2020 21:54:06 GMT Texas moved quickly to hire a new women's basketball coach, luring Vic Schaefer away from powerhouse Mississippi State on Sunday. Texas athletic director Chris Del Conte announced the move by tweeting a picture of himself with Schaefer and his family holding up the “Hook'em Horns” hand signal. The move comes just two days after Texas dismissed eight-year coach Karen Aston, who had only one losing season in her tenure and had led the Longhorns to the Sweet 16 or farther four times. Full Article article Sports
is Sydney Wiese, recovering from coronavirus, continually talking with friends and family: 'Our world is uniting' By sports.yahoo.com Published On :: Mon, 06 Apr 2020 16:11:35 GMT Hear how former Oregon State guard and current member of the WNBA's LA Sparks Sydney Wiese is recovering from a COVID-19 diagnosis, seeing friends and family show support and love during a trying time. Full Article video Sports
is Clean sweep: Oregon's Sabrina Ionescu is unanimous Player of the Year after winning Wooden Award By sports.yahoo.com Published On :: Mon, 06 Apr 2020 21:21:52 GMT Sabrina Ionescu wins the Wooden Award for the second year in a row, becoming the fifth player in the trophy's history to win in back-to-back seasons. With the honor, she completes a sweep of the national postseason player of the year awards. As a senior, Ionescu matched her own single-season mark with eight triple-doubles in 2019-20, and she was incredibly efficient from the field with a career-best 51.8 field goal percentage. Full Article video Sports
is Oregon's Sabrina Ionescu, Ruthy Hebard, Satou Sabally share meaning of Naismith Starting 5 honor By sports.yahoo.com Published On :: Wed, 08 Apr 2020 19:50:23 GMT Pac-12 Networks' Ashley Adamson speaks with Oregon stars Sabrina Ionescu, Ruthy Hebard and Satou Sabally to hear how special their recent Naismith Starting 5 honor was, as the Ducks comprise three of the nation's top five players. Ionescu (point guard), Sabally (small forward) and Hebard (power forward) led the Ducks to a 31-2 record in the 2019-20 season before it was cut short. Full Article video Sports
is Sabrina Ionescu, Ruthy Hebard, Satou Sabally on staying connected, WNBA Draft, Oregon's historic season By sports.yahoo.com Published On :: Thu, 09 Apr 2020 16:27:12 GMT Pac-12 Networks' Ashley Adamson catches up with Oregon's "Big 3" of Sabrina Ionescu, Ruthy Hebard and Satou Sabally to hear how they're adjusting to the new world without sports while still preparing for the WNBA Draft on April 17. They also share how they're staying hungry for basketball during the hiatus. Full Article video Sports
is Staley thinks No. 1 South Carolina is national champs By sports.yahoo.com Published On :: Thu, 09 Apr 2020 20:31:23 GMT South Carolina coach Dawn Staley believes her top-ranked Gamecocks are the women's basketball national champions, even without an NCAA Tournament trophy to put in their display case due to the pandemic-shortened season. The NCAA decided against officially crowning champions after its signature tournaments were called off due to the coronavirus pandemic that has sent much of the world into lock down. Staley spoke from her home where she's spent the past month managing her program and ensuring her players don't linger too much on what they missed. Full Article article Sports
is Mississippi State hires Nikki McCray-Penson as women's coach By sports.yahoo.com Published On :: Sat, 11 Apr 2020 19:32:26 GMT Mississippi State hired former Old Dominion women’s basketball coach Nikki McCray-Penson to replace Vic Schaefer as the Bulldogs’ head coach. Athletic director John Cohen called McCray-Penson “a proven winner who will lead one of the best programs in the nation” on the department’s website. McCray-Penson, a former Tennessee star and Women’s Basketball Hall of Famer, said it’s been a dream to coach in the Southeastern Conference and she’s “grateful and blessed for this incredible honor and opportunity.” Full Article article Sports
is Ruthy Hebard, Sabrina Ionescu 'represent everything that is great about basketball' By sports.yahoo.com Published On :: Tue, 14 Apr 2020 16:16:41 GMT Ruthy Hebard and Sabrina Ionescu have had a remarkable four years together in Eugene, rewriting the history books and pushing the Ducks into the national spotlight. Catch the debut of "Our Stories Unfinished Business: Sabrina Ionescu and Ruthy Hebard" on Wednesday, April 15 at 7 p.m. PT/ 8 p.m. MT on Pac-12 Network. Full Article video News
is Charli Turner Thorne drops by 'Pac-12 Playlist' to surprise former player Dr. Michelle Tom By sports.yahoo.com Published On :: Thu, 16 Apr 2020 16:51:30 GMT Pac-12 Networks' Ashley Adamson speaks with former Arizona State women's basketball player Michelle Tom, who is now a doctor treating COVID-19 patients in Winslow, Arizona. Full Article video Sports
is Chicago State women's basketball coach Misty Opat resigns By sports.yahoo.com Published On :: Fri, 17 Apr 2020 17:37:52 GMT CHICAGO (AP) -- Chicago State women’s coach Misty Opat resigned Thursday after two seasons and a 3-55 record. Full Article article Sports
is Natalie Chou on why she took a stand against anti-Asian racism in wake of coronavirus By sports.yahoo.com Published On :: Tue, 05 May 2020 16:24:25 GMT During Wednesday's "Pac-12 Perspective" podcast, Natalie Chou shared why she is using her platform to speak out against racism she sees in her community related to the novel coronavirus. Full Article video News
is The limiting behavior of isotonic and convex regression estimators when the model is misspecified By projecteuclid.org Published On :: Tue, 05 May 2020 22:00 EDT Eunji Lim. Source: Electronic Journal of Statistics, Volume 14, Number 1, 2053--2097. Abstract: We study the asymptotic behavior of the least squares estimators when the model is possibly misspecified. We consider the setting where we wish to estimate an unknown function $f_{*}:(0,1)^{d}\rightarrow \mathbb{R}$ from observations $(X,Y),(X_{1},Y_{1}),\cdots ,(X_{n},Y_{n})$; our estimator $\hat{g}_{n}$ is the minimizer of $\sum _{i=1}^{n}(Y_{i}-g(X_{i}))^{2}/n$ over $g\in \mathcal{G}$ for some set of functions $\mathcal{G}$. We provide sufficient conditions on the metric entropy of $\mathcal{G}$, under which $\hat{g}_{n}$ converges to $g_{*}$ as $n\rightarrow \infty $, where $g_{*}$ is the minimizer of $\|g-f_{*}\|\triangleq \mathbb{E}(g(X)-f_{*}(X))^{2}$ over $g\in \mathcal{G}$. As corollaries of our theorem, we establish $\|\hat{g}_{n}-g_{*}\|\rightarrow 0$ as $n\rightarrow \infty $ when $\mathcal{G}$ is the set of monotone functions or the set of convex functions. We also make a connection between the convergence rate of $\|\hat{g}_{n}-g_{*}\|$ and the metric entropy of $\mathcal{G}$. As special cases of our finding, we compute the convergence rate of $\|\hat{g}_{n}-g_{*}\|^{2}$ when $\mathcal{G}$ is the set of bounded monotone functions or the set of bounded convex functions. Full Article
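For the monotone case in the abstract above, the least squares minimizer over nondecreasing functions can be computed with the pool-adjacent-violators algorithm. The sketch below is an illustration only, not code from the paper: it assumes one-dimensional inputs already sorted by $X_{i}$, equal weights, and a hypothetical function name `isotonic_fit`.

```python
def isotonic_fit(y):
    """Least-squares nondecreasing fit to y via pool-adjacent-violators.

    Assumes y is listed in increasing order of the covariate and that all
    observations carry equal weight (illustrative assumptions).
    """
    blocks = []  # each block stores [sum of values, number of values]
    for value in y:
        blocks.append([float(value), 1])
        # Merge the last two blocks while their means violate monotonicity.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)  # every point in a block gets the block mean
    return fitted


print(isotonic_fit([1, 3, 2, 4]))  # [1.0, 2.5, 2.5, 4.0]
```

Each merged block is replaced by its mean, which is what makes the result the least squares projection onto nondecreasing sequences.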
is Statistical convergence of the EM algorithm on Gaussian mixture models By projecteuclid.org Published On :: Tue, 05 May 2020 22:00 EDT Ruofei Zhao, Yuanzhi Li, Yuekai Sun. Source: Electronic Journal of Statistics, Volume 14, Number 1, 632--660. Abstract: We study the convergence behavior of the Expectation Maximization (EM) algorithm on Gaussian mixture models with an arbitrary number of mixture components and mixing weights. We show that as long as the means of the components are separated by at least $\Omega (\sqrt{\min \{M,d\}})$, where $M$ is the number of components and $d$ is the dimension, the EM algorithm converges locally to the global optimum of the log-likelihood. Further, we show that the convergence rate is linear and characterize the size of the basin of attraction to the global optimum. Full Article
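To make the EM iteration discussed in the abstract above concrete, here is a minimal sketch for a one-dimensional Gaussian mixture. It is not the authors' setup: unit component variances, the function name `em_gmm_1d`, and the toy data are all assumptions made for illustration.

```python
import math


def em_gmm_1d(data, means, weights, n_iter=50):
    """EM for a 1-D Gaussian mixture with unit variances (illustrative assumption).

    Updates the component means and mixing weights; variances stay fixed at 1.
    """
    means, weights = list(means), list(weights)
    k = len(means)
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point.
        # The common factor 1/sqrt(2*pi) cancels in the normalisation.
        resp = []
        for x in data:
            dens = [w * math.exp(-0.5 * (x - m) ** 2) for w, m in zip(weights, means)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: responsibility-weighted means and updated mixing weights.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            weights[j] = nj / len(data)
    return means, weights


toy = [-5.2, -5.0, -4.8, 4.8, 5.0, 5.2]  # two well-separated clusters
print(em_gmm_1d(toy, means=[-1.0, 1.0], weights=[0.5, 0.5]))
```

With well-separated clusters and a starting point on the correct side of each cluster, the iterates settle near the cluster centres, which is the kind of local convergence the abstract describes.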
is Generalised cepstral models for the spectrum of vector time series By projecteuclid.org Published On :: Tue, 05 May 2020 22:00 EDT Maddalena Cavicchioli. Source: Electronic Journal of Statistics, Volume 14, Number 1, 605--631.Abstract: The paper treats the modeling of stationary multivariate stochastic processes via a frequency domain model expressed in terms of cepstrum theory. The proposed model nests the vector exponential model of [20] as a special case, and extends the generalised cepstral model of [36] to the multivariate setting, answering a question raised by the last authors in their paper. Contemporarily, we extend the notion of generalised autocovariance function of [35] to vector time series. Then we derive explicit matrix formulas connecting generalised cepstral and autocovariance matrices of the process, and prove the consistency and asymptotic properties of the Whittle likelihood estimators of model parameters. Asymptotic theory for the special case of the vector exponential model is a significant addition to the paper of [20]. We also provide a mathematical machinery, based on matrix differentiation, and computational methods to derive our results, which differ significantly from those employed in the univariate case. The utility of the proposed model is illustrated through Monte Carlo simulation from a bivariate process characterized by a high dynamic range, and an empirical application on time varying minimum variance hedge ratios through the second moments of future and spot prices in the corn commodity market. Full Article
is On the Letac-Massam conjecture and existence of high dimensional Bayes estimators for graphical models By projecteuclid.org Published On :: Tue, 05 May 2020 22:00 EDT Emanuel Ben-David, Bala Rajaratnam. Source: Electronic Journal of Statistics, Volume 14, Number 1, 580--604.Abstract: The Wishart distribution defined on the open cone of positive-definite matrices plays a central role in multivariate analysis and multivariate distribution theory. Its domain of parameters is often referred to as the Gindikin set. In recent years, varieties of useful extensions of the Wishart distribution have been proposed in the literature for the purposes of studying Markov random fields and graphical models. In particular, generalizations of the Wishart distribution, referred to as Type I and Type II (graphical) Wishart distributions introduced by Letac and Massam in Annals of Statistics (2007) play important roles in both frequentist and Bayesian inference for Gaussian graphical models. These distributions have been especially useful in high-dimensional settings due to the flexibility offered by their multiple-shape parameters. Concerning Type I and Type II Wishart distributions, a conjecture of Letac and Massam concerns the domain of multiple-shape parameters of these distributions. The conjecture also has implications for the existence of Bayes estimators corresponding to these high dimensional priors. The conjecture, which was first posed in the Annals of Statistics, has now been an open problem for about 10 years. In this paper, we give a necessary condition for the Letac and Massam conjecture to hold. More precisely, we prove that if the Letac and Massam conjecture holds on a decomposable graph, then no two separators of the graph can be nested within each other. For this, we analyze Type I and Type II Wishart distributions on appropriate Markov equivalent perfect DAG models and succeed in deriving the aforementioned necessary condition. 
This condition in particular identifies a class of counterexamples to the conjecture. Full Article
is Consistent model selection criteria and goodness-of-fit test for common time series models By projecteuclid.org Published On :: Mon, 27 Apr 2020 22:02 EDT Jean-Marc Bardet, Kare Kamila, William Kengne. Source: Electronic Journal of Statistics, Volume 14, Number 1, 2009--2052. Abstract: This paper studies the model selection problem in a large class of causal time series models, which includes both the ARMA or AR($\infty $) processes, as well as the GARCH or ARCH($\infty $), APARCH, ARMA-GARCH and many other processes. To tackle this issue, we consider a penalized contrast based on the quasi-likelihood of the model. We provide sufficient conditions for the penalty term to ensure the consistency of the proposed procedure as well as the consistency and the asymptotic normality of the quasi-maximum likelihood estimator of the chosen model. We also propose a tool for diagnosing the goodness-of-fit of the chosen model based on a Portmanteau test. Monte-Carlo experiments and numerical applications on illustrative examples are performed to highlight the obtained asymptotic results. Moreover, using a data-driven choice of the penalty, they show the practical efficiency of this new model selection procedure and Portmanteau test. Full Article
is Sparse equisigned PCA: Algorithms and performance bounds in the noisy rank-1 setting By projecteuclid.org Published On :: Mon, 27 Apr 2020 22:02 EDT Arvind Prasadan, Raj Rao Nadakuditi, Debashis Paul. Source: Electronic Journal of Statistics, Volume 14, Number 1, 345--385. Abstract: Singular value decomposition (SVD) based principal component analysis (PCA) breaks down in the high-dimensional and limited sample size regime below a certain critical eigen-SNR that depends on the dimensionality of the system and the number of samples. Below this critical eigen-SNR, the estimates returned by the SVD are asymptotically uncorrelated with the latent principal components. We consider a setting where the left singular vector of the underlying rank one signal matrix is assumed to be sparse and the right singular vector is assumed to be equisigned, that is, having either only nonnegative or only nonpositive entries. We consider six different algorithms for estimating the sparse principal component based on different statistical criteria and prove that by exploiting sparsity, we recover consistent estimates in the low eigen-SNR regime where the SVD fails. Our analysis reveals conditions under which a coordinate selection scheme based on a sum-type decision statistic outperforms schemes that utilize the $\ell _{1}$ and $\ell _{2}$ norm-based statistics. We derive lower bounds on the size of detectable coordinates of the principal left singular vector and utilize these lower bounds to derive lower bounds on the worst-case risk. Finally, we verify our findings with numerical simulations and illustrate the performance with video data where the interest is in identifying objects. Full Article
is Kaplan-Meier V- and U-statistics By projecteuclid.org Published On :: Thu, 23 Apr 2020 22:01 EDT Tamara Fernández, Nicolás Rivera. Source: Electronic Journal of Statistics, Volume 14, Number 1, 1872--1916. Abstract: In this paper, we study Kaplan-Meier V- and U-statistics respectively defined as $\theta (\widehat{F}_{n})=\sum _{i,j}K(X_{[i:n]},X_{[j:n]})W_{i}W_{j}$ and $\theta _{U}(\widehat{F}_{n})=\sum _{i\neq j}K(X_{[i:n]},X_{[j:n]})W_{i}W_{j}/\sum _{i\neq j}W_{i}W_{j}$, where $\widehat{F}_{n}$ is the Kaplan-Meier estimator, $\{W_{1},\ldots ,W_{n}\}$ are the Kaplan-Meier weights and $K:(0,\infty )^{2}\to \mathbb{R}$ is a symmetric kernel. As in the canonical setting of uncensored data, we differentiate between two asymptotic behaviours for $\theta (\widehat{F}_{n})$ and $\theta _{U}(\widehat{F}_{n})$. Additionally, we derive an asymptotic canonical V-statistic representation of the Kaplan-Meier V- and U-statistics. By using this representation we study properties of the asymptotic distribution. Applications to hypothesis testing are given. Full Article
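The V-statistic in the abstract above can be evaluated directly once the Kaplan-Meier weights are available. The following sketch assumes right-censored data with no tied observation times and uses the standard product form of the weights (the jumps of the Kaplan-Meier estimator at the ordered observations); the function names and the toy data are hypothetical.

```python
def kaplan_meier_weights(times, events):
    """Return the ordered times and the Kaplan-Meier weights W_1..W_n.

    events[i] is 1 for an observed event and 0 for a censored time.
    Assumes no ties among the times (illustrative simplification).
    """
    order = sorted(range(len(times)), key=lambda i: times[i])
    n = len(times)
    weights, survival = [], 1.0
    for rank, idx in enumerate(order):
        at_risk = n - rank
        # Jump of the Kaplan-Meier estimator at this ordered observation.
        weights.append(survival * events[idx] / at_risk)
        if events[idx]:
            survival *= (at_risk - 1) / at_risk
    return [times[i] for i in order], weights


def km_v_statistic(times, events, kernel):
    """Compute sum_{i,j} K(X_[i:n], X_[j:n]) W_i W_j for a symmetric kernel."""
    xs, ws = kaplan_meier_weights(times, events)
    return sum(kernel(xi, xj) * wi * wj
               for xi, wi in zip(xs, ws)
               for xj, wj in zip(xs, ws))


# Toy data: the middle observation is censored, so its weight is zero.
print(kaplan_meier_weights([1.0, 2.0, 3.0], [1, 0, 1]))
print(km_v_statistic([1.0, 2.0, 3.0], [1, 0, 1], lambda a, b: a * b))
```

Without censoring every weight equals $1/n$, so the quantity reduces to an ordinary V-statistic.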
is Exact recovery in block spin Ising models at the critical line By projecteuclid.org Published On :: Thu, 23 Apr 2020 22:01 EDT Matthias Löwe, Kristina Schubert. Source: Electronic Journal of Statistics, Volume 14, Number 1, 1796--1815. Abstract: We show how to exactly reconstruct the block structure at the critical line in the so-called Ising block model. This model was recently re-introduced by Berthet, Rigollet and Srivastava in [2]. There the authors show how to exactly reconstruct blocks away from the critical line and they give an upper and a lower bound on the number of observations one needs; thereby they establish a minimax optimal rate (up to constants). Our technique relies on a combination of their methods with fluctuation results obtained in [20]. The latter are extended to the full critical regime. We find that the number of necessary observations depends on whether the interaction parameter between two blocks is positive or negative: In the first case, there are about $N\log N$ observations required to exactly recover the block structure, while in the latter case $\sqrt{N}\log N$ observations suffice. Full Article
is Nonparametric false discovery rate control for identifying simultaneous signals By projecteuclid.org Published On :: Thu, 23 Apr 2020 22:01 EDT Sihai Dave Zhao, Yet Tien Nguyen. Source: Electronic Journal of Statistics, Volume 14, Number 1, 110--142.Abstract: It is frequently of interest to identify simultaneous signals, defined as features that exhibit statistical significance across each of several independent experiments. For example, genes that are consistently differentially expressed across experiments in different animal species can reveal evolutionarily conserved biological mechanisms. However, in some problems the test statistics corresponding to these features can have complicated or unknown null distributions. This paper proposes a novel nonparametric false discovery rate control procedure that can identify simultaneous signals even without knowing these null distributions. The method is shown, theoretically and in simulations, to asymptotically control the false discovery rate. It was also used to identify genes that were both differentially expressed and proximal to differentially accessible chromatin in the brains of mice exposed to a conspecific intruder. The proposed method is available in the R package github.com/sdzhao/ssa. Full Article
is Monotone least squares and isotonic quantiles By projecteuclid.org Published On :: Wed, 22 Apr 2020 04:02 EDT Alexandre Mösching, Lutz Dümbgen. Source: Electronic Journal of Statistics, Volume 14, Number 1, 24--49. Abstract: We consider bivariate observations $(X_{1},Y_{1}),\ldots ,(X_{n},Y_{n})$ such that, conditional on the $X_{i}$, the $Y_{i}$ are independent random variables. Precisely, the conditional distribution function of $Y_{i}$ equals $F_{X_{i}}$, where $(F_{x})_{x}$ is an unknown family of distribution functions. Under the sole assumption that $x\mapsto F_{x}$ is isotonic with respect to stochastic order, one can estimate $(F_{x})_{x}$ in two ways: (i) For any fixed $y$ one estimates the antitonic function $x\mapsto F_{x}(y)$ via nonparametric monotone least squares, replacing the responses $Y_{i}$ with the indicators $1_{[Y_{i}\le y]}$. (ii) For any fixed $\beta \in (0,1)$ one estimates the isotonic quantile function $x\mapsto F_{x}^{-1}(\beta )$ via a nonparametric version of regression quantiles. We show that these two approaches are closely related, with (i) being more flexible than (ii). Then, under mild regularity conditions, we establish rates of convergence for the resulting estimators $\hat{F}_{x}(y)$ and $\hat{F}_{x}^{-1}(\beta )$, uniformly over $(x,y)$ and $(x,\beta )$ in certain rectangles as well as uniformly in $y$ or $\beta $ for a fixed $x$. Full Article
is Random distributions via Sequential Quantile Array By projecteuclid.org Published On :: Wed, 08 Apr 2020 22:01 EDT Annalisa Fabretti, Samantha Leorato. Source: Electronic Journal of Statistics, Volume 14, Number 1, 1611--1647. Abstract: We propose a method to generate random distributions with known quantile distribution, or, more generally, with known distribution for some form of generalized quantile. The method takes inspiration from the random Sequential Barycenter Array distributions (SBA) proposed by Hill and Monticino (1998), which generate a Random Probability Measure (RPM) with known expected value. We define the Sequential Quantile Array (SQA) and show how to generate a random SQA from which we can derive RPMs. The distribution of the generated SQA-RPM can have full support and the RPMs can be discrete, continuous or differentiable. We also address the efficient implementation of the procedure, ensuring that the approximation of the SQA-RPM by a finite number of steps stays close to the SQA-RPM obtained theoretically by the procedure. Finally, we compare SQA-RPMs with similar approaches such as the Polya Tree. Full Article
is Estimating piecewise monotone signals By projecteuclid.org Published On :: Wed, 08 Apr 2020 22:01 EDT Kentaro Minami. Source: Electronic Journal of Statistics, Volume 14, Number 1, 1508--1576.Abstract: We study the problem of estimating piecewise monotone vectors. This problem can be seen as a generalization of the isotonic regression that allows a small number of order-violating changepoints. We focus mainly on the performance of the nearly-isotonic regression proposed by Tibshirani et al. (2011). We derive risk bounds for the nearly-isotonic regression estimators that are adaptive to piecewise monotone signals. The estimator achieves a near minimax convergence rate over certain classes of piecewise monotone signals under a weak assumption. Furthermore, we present an algorithm that can be applied to the nearly-isotonic type estimators on general weighted graphs. The simulation results suggest that the nearly-isotonic regression performs as well as the ideal estimator that knows the true positions of changepoints. Full Article
is A Bayesian approach to disease clustering using restricted Chinese restaurant processes By projecteuclid.org Published On :: Wed, 08 Apr 2020 22:01 EDT Claudia Wehrhahn, Samuel Leonard, Abel Rodriguez, Tatiana Xifara. Source: Electronic Journal of Statistics, Volume 14, Number 1, 1449--1478. Abstract: Identifying disease clusters (areas with an unusually high incidence of a particular disease) is a common problem in epidemiology and public health. We describe a Bayesian nonparametric mixture model for disease clustering that constrains clusters to be made of adjacent areal units. This is achieved by modifying the exchangeable partition probability function associated with the Ewens sampling distribution. We call the resulting prior the Restricted Chinese Restaurant Process, as the associated full conditional distributions resemble those associated with the standard Chinese Restaurant Process. The model is illustrated using synthetic data sets and in an application to oral cancer mortality in Germany. Full Article
is A fast and consistent variable selection method for high-dimensional multivariate linear regression with a large number of explanatory variables By projecteuclid.org Published On :: Fri, 27 Mar 2020 22:00 EDT Ryoya Oda, Hirokazu Yanagihara. Source: Electronic Journal of Statistics, Volume 14, Number 1, 1386--1412.Abstract: We put forward a variable selection method for selecting explanatory variables in a normality-assumed multivariate linear regression. It is cumbersome to calculate variable selection criteria for all subsets of explanatory variables when the number of explanatory variables is large. Therefore, we propose a fast and consistent variable selection method based on a generalized $C_{p}$ criterion. The consistency of the method is provided by a high-dimensional asymptotic framework such that the sample size and the sum of the dimensions of response vectors and explanatory vectors divided by the sample size tend to infinity and some positive constant which are less than one, respectively. Through numerical simulations, it is shown that the proposed method has a high probability of selecting the true subset of explanatory variables and is fast under a moderate sample size even when the number of dimensions is large. Full Article
is Consistency and asymptotic normality of Latent Block Model estimators By projecteuclid.org Published On :: Mon, 23 Mar 2020 22:02 EDT Vincent Brault, Christine Keribin, Mahendra Mariadassou. Source: Electronic Journal of Statistics, Volume 14, Number 1, 1234--1268. Abstract: The Latent Block Model (LBM) is a model-based method to cluster simultaneously the $d$ columns and $n$ rows of a data matrix. Parameter estimation in LBM is a difficult and multifaceted problem. Although various estimation strategies have been proposed and are now well understood empirically, theoretical guarantees about their asymptotic behavior are rather sparse and most results are limited to the binary setting. We prove here theoretical guarantees in the valued settings. We show that under some mild conditions on the parameter space, and in an asymptotic regime where $\log (d)/n$ and $\log (n)/d$ tend to $0$ when $n$ and $d$ tend to infinity, (1) the maximum-likelihood estimate of the complete model (with known labels) is consistent and (2) the log-likelihood ratios are equivalent under the complete and observed (with unknown labels) models. This equivalence allows us to transfer the asymptotic consistency, and under mild conditions, asymptotic normality, to the maximum likelihood estimate under the observed model. Moreover, the variational estimator is also consistent and, under the same conditions, asymptotically normal. Full Article
is A general drift estimation procedure for stochastic differential equations with additive fractional noise By projecteuclid.org Published On :: Tue, 25 Feb 2020 22:00 EST Fabien Panloup, Samy Tindel, Maylis Varvenne. Source: Electronic Journal of Statistics, Volume 14, Number 1, 1075--1136.Abstract: In this paper we consider the drift estimation problem for a general differential equation driven by an additive multidimensional fractional Brownian motion, under ergodic assumptions on the drift coefficient. Our estimation procedure is based on the identification of the invariant measure, and we provide consistency results as well as some information about the convergence rate. We also give some examples of coefficients for which the identifiability assumption for the invariant measure is satisfied. Full Article
is Testing goodness of fit for point processes via topological data analysis By projecteuclid.org Published On :: Mon, 24 Feb 2020 04:00 EST Christophe A. N. Biscio, Nicolas Chenavier, Christian Hirsch, Anne Marie Svane. Source: Electronic Journal of Statistics, Volume 14, Number 1, 1024--1074.Abstract: We introduce tests for the goodness of fit of point patterns via methods from topological data analysis. More precisely, the persistent Betti numbers give rise to a bivariate functional summary statistic for observed point patterns that is asymptotically Gaussian in large observation windows. We analyze the power of tests derived from this statistic on simulated point patterns and compare its performance with global envelope tests. Finally, we apply the tests to a point pattern from an application context in neuroscience. As the main methodological contribution, we derive sufficient conditions for a functional central limit theorem on bounded persistent Betti numbers of point processes with exponential decay of correlations. Full Article
is On the distribution, model selection properties and uniqueness of the Lasso estimator in low and high dimensions By projecteuclid.org Published On :: Mon, 17 Feb 2020 22:06 EST Karl Ewald, Ulrike Schneider. Source: Electronic Journal of Statistics, Volume 14, Number 1, 944--969.Abstract: We derive expressions for the finite-sample distribution of the Lasso estimator in the context of a linear regression model in low as well as in high dimensions by exploiting the structure of the optimization problem defining the estimator. In low dimensions, we assume full rank of the regressor matrix and present expressions for the cumulative distribution function as well as the densities of the absolutely continuous parts of the estimator. Our results are presented for the case of normally distributed errors, but do not hinge on this assumption and can easily be generalized. Additionally, we establish an explicit formula for the correspondence between the Lasso and the least-squares estimator. We derive analogous results for the distribution in less explicit form in high dimensions where we make no assumptions on the regressor matrix at all. In this setting, we also investigate the model selection properties of the Lasso and show that possibly only a subset of models might be selected by the estimator, completely independently of the observed response vector. Finally, we present a condition for uniqueness of the estimator that is necessary as well as sufficient. Full Article
is On a Metropolis–Hastings importance sampling estimator By projecteuclid.org Published On :: Mon, 10 Feb 2020 04:01 EST Daniel Rudolf, Björn Sprungk. Source: Electronic Journal of Statistics, Volume 14, Number 1, 857--889.Abstract: A classical approach for approximating expectations of functions w.r.t. partially known distributions is to compute the average of function values along a trajectory of a Metropolis–Hastings (MH) Markov chain. A key part in the MH algorithm is a suitable acceptance/rejection of a proposed state, which ensures the correct stationary distribution of the resulting Markov chain. However, the rejection of proposals causes highly correlated samples. In particular, when a state is rejected it is not taken any further into account. In contrast to that we consider a MH importance sampling estimator which explicitly incorporates all proposed states generated by the MH algorithm. The estimator satisfies a strong law of large numbers as well as a central limit theorem, and, in addition to that, we provide an explicit mean squared error bound. Remarkably, the asymptotic variance of the MH importance sampling estimator does not involve any correlation term in contrast to its classical counterpart. Moreover, although the analyzed estimator uses the same amount of information as the classical MH estimator, it can outperform the latter in scenarios of moderate dimensions as indicated by numerical experiments. Full Article
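The classical approach that the abstract above takes as its baseline, averaging function values along a Metropolis–Hastings trajectory, can be sketched as follows. This shows only the classical estimator, not the importance sampling estimator proposed in the paper; the Gaussian random-walk proposal, the function name `metropolis_hastings_mean`, and the standard normal target in the usage line are assumptions chosen for illustration.

```python
import math
import random


def metropolis_hastings_mean(f, log_target, x0, n_steps, step=1.0, seed=0):
    """Average f along a random-walk Metropolis-Hastings chain.

    log_target is the log of an unnormalised target density. The symmetric
    Gaussian proposal is an illustrative choice, so the acceptance ratio
    reduces to the ratio of target densities.
    """
    rng = random.Random(seed)
    x, total = x0, 0.0
    for _ in range(n_steps):
        y = x + rng.gauss(0.0, step)               # propose a new state
        log_alpha = log_target(y) - log_target(x)  # log acceptance ratio
        if log_alpha >= 0 or rng.random() < math.exp(log_alpha):
            x = y                                  # accept the proposal
        total += f(x)                              # a rejected proposal repeats x
    return total / n_steps


# Estimate E[X^2] under a standard normal target (true value 1).
print(metropolis_hastings_mean(lambda x: x * x, lambda x: -0.5 * x * x, 0.0, 20000))
```

Note that a rejected proposal contributes nothing beyond repeating the current state, which is the loss of information the abstract points to.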
is The bias of isotonic regression By projecteuclid.org Published On :: Tue, 04 Feb 2020 22:03 EST Ran Dai, Hyebin Song, Rina Foygel Barber, Garvesh Raskutti. Source: Electronic Journal of Statistics, Volume 14, Number 1, 801--834. Abstract: We study the bias of the isotonic regression estimator. While there is extensive work characterizing the mean squared error of the isotonic regression estimator, relatively little is known about the bias. In this paper, we provide a sharp characterization, proving that the bias scales as $O(n^{-\beta/3})$ up to log factors, where $1 \leq \beta \leq 2$ is the exponent corresponding to Hölder smoothness of the underlying mean. Importantly, this result only requires a strictly monotone mean and that the noise distribution has subexponential tails, without relying on symmetric noise or other restrictive assumptions. Full Article
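For readers unfamiliar with the estimator under study, the least-squares isotonic fit can be computed with the pool-adjacent-violators algorithm. The sketch below is a generic textbook version, not code from the paper; the function name `isotonic_fit` is hypothetical.

```python
def isotonic_fit(y):
    """Least-squares nondecreasing fit to the sequence y (pool adjacent violators)."""
    blocks = []  # each block is [sum of values, number of values]
    for value in y:
        blocks.append([float(value), 1])
        # Merge the last two blocks while their means violate monotonicity.
        while (
            len(blocks) > 1
            and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        # Every point in a block receives the block mean.
        fitted.extend([total / count] * count)
    return fitted


print(isotonic_fit([1, 3, 2, 4]))  # [1.0, 2.5, 2.5, 4.0]
```

The block averaging is what makes the fit piecewise constant, and it is the behaviour of these block means around the true monotone mean that a bias analysis has to control.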
is A Statistical Learning Approach to Modal Regression By Published On :: 2020 This paper studies the nonparametric modal regression problem systematically from a statistical learning viewpoint. Originally motivated by pursuing a theoretical understanding of the maximum correntropy criterion based regression (MCCR), our study reveals that MCCR with a tending-to-zero scale parameter is essentially modal regression. We show that the nonparametric modal regression problem can be approached via the classical empirical risk minimization. Some efforts are then made to develop a framework for analyzing and implementing modal regression. For instance, the modal regression function is described, the modal regression risk is defined explicitly and its Bayes rule is characterized; for the sake of computational tractability, the surrogate modal regression risk, which is termed as the generalization risk in our study, is introduced. On the theoretical side, the excess modal regression risk, the excess generalization risk, the function estimation error, and the relations among the above three quantities are studied rigorously. It turns out that under mild conditions, function estimation consistency and convergence may be pursued in modal regression as in vanilla regression protocols such as mean regression, median regression, and quantile regression. On the practical side, the implementation issues of modal regression including the computational algorithm and the selection of the tuning parameters are discussed. Numerical validations on modal regression are also conducted to verify our findings. Full Article
is A Model of Fake Data in Data-driven Analysis By Published On :: 2020 Data-driven analysis has been increasingly used in various decision making processes. Now that more sources, including reviews, news, and pictures, can be used for data analysis, the authenticity of data sources is in doubt. While previous literature attempted to detect fake data piece by piece, in the current work, we try to capture the fake data sender's strategic behavior to detect the fake data source. Specifically, we model the tension between a data receiver who makes data-driven decisions and a fake data sender who benefits from misleading the receiver. We propose a potentially infinite horizon continuous time game-theoretic model with asymmetric information to capture the fact that the receiver does not initially know the existence of fake data and learns about it during the course of the game. We use point processes to model the data traffic, where each piece of data can occur at any discrete moment in a continuous time flow. We fully solve the model and employ numerical examples to illustrate the players' strategies and payoffs for insights. Specifically, our results show that maintaining some suspicion about the data sources and understanding that the sender can be strategic are very helpful to the data receiver. In addition, based on our model, we propose a methodology of detecting fake data that is complementary to the previous studies on this topic, which suggested various approaches on analyzing the data piece by piece. We show that after analyzing each piece of data, understanding a source by looking at its whole history of pushing data can be helpful. Full Article
is On Mahalanobis Distance in Functional Settings By Published On :: 2020 Mahalanobis distance is a classical tool in multivariate analysis. We suggest here an extension of this concept to the case of functional data. More precisely, the proposed definition concerns those statistical problems where the sample data are real functions defined on a compact interval of the real line. The obvious difficulty for such a functional extension is the non-invertibility of the covariance operator in infinite-dimensional cases. Unlike other recent proposals, our definition is suggested and motivated in terms of the Reproducing Kernel Hilbert Space (RKHS) associated with the stochastic process that generates the data. The proposed distance is a true metric; it depends on a unique real smoothing parameter which is fully motivated in RKHS terms. Moreover, it shares some properties of its finite dimensional counterpart: it is invariant under isometries, it can be consistently estimated from the data and its sampling distribution is known under Gaussian models. An empirical study for two statistical applications, outliers detection and binary classification, is included. The results are quite competitive when compared to other recent proposals in the literature. Full Article
is Generalized probabilistic principal component analysis of correlated data By Published On :: 2020 Principal component analysis (PCA) is a well-established tool in machine learning and data processing. The principal axes in PCA were shown to be equivalent to the maximum marginal likelihood estimator of the factor loading matrix in a latent factor model for the observed data, assuming that the latent factors are independently distributed as standard normal distributions. However, the independence assumption may be unrealistic for many scenarios such as modeling multiple time series, spatial processes, and functional data, where the outcomes are correlated. In this paper, we introduce the generalized probabilistic principal component analysis (GPPCA) to study the latent factor model for multiple correlated outcomes, where each factor is modeled by a Gaussian process. Our method generalizes the previous probabilistic formulation of PCA (PPCA) by providing the closed-form maximum marginal likelihood estimator of the factor loadings and other parameters. Based on the explicit expression of the precision matrix in the marginal likelihood that we derived, the number of computational operations is linear in the number of output variables. Furthermore, we also provide the closed-form expression of the marginal likelihood when other covariates are included in the mean structure. We highlight the advantage of GPPCA in terms of the practical relevance, estimation accuracy and computational convenience. Numerical studies of simulated and real data confirm the excellent finite-sample performance of the proposed approach. Full Article
is GluonCV and GluonNLP: Deep Learning in Computer Vision and Natural Language Processing By Published On :: 2020 We present GluonCV and GluonNLP, the deep learning toolkits for computer vision and natural language processing based on Apache MXNet (incubating). These toolkits provide state-of-the-art pre-trained models, training scripts, and training logs, to facilitate rapid prototyping and promote reproducible research. We also provide modular APIs with flexible building blocks to enable efficient customization. Leveraging the MXNet ecosystem, the deep learning models in GluonCV and GluonNLP can be deployed onto a variety of platforms with different programming languages. The Apache 2.0 license has been adopted by GluonCV and GluonNLP to allow for software distribution, modification, and usage. Full Article
is Distributed Feature Screening via Componentwise Debiasing By Published On :: 2020 Feature screening is a powerful tool in processing high-dimensional data. When the sample size N and the number of features p are both large, the implementation of classic screening methods can be numerically challenging. In this paper, we propose a distributed screening framework for big data setup. In the spirit of 'divide-and-conquer', the proposed framework expresses a correlation measure as a function of several component parameters, each of which can be distributively estimated using a natural U-statistic from data segments. With the component estimates aggregated, we obtain a final correlation estimate that can be readily used for screening features. This framework enables distributed storage and parallel computing and thus is computationally attractive. Due to the unbiased distributive estimation of the component parameters, the final aggregated estimate achieves a high accuracy that is insensitive to the number of data segments m. Under mild conditions, we show that the aggregated correlation estimator is as efficient as the centralized estimator in terms of the probability convergence bound and the mean squared error rate; the corresponding screening procedure enjoys sure screening property for a wide range of correlation measures. The promising performances of the new method are supported by extensive numerical examples. Full Article
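The divide-and-conquer idea in this abstract can be illustrated with the Pearson correlation, written as a function of five component moments that each data segment estimates on its own. This is a simplified sketch under stated assumptions, not the paper's procedure: the function names are hypothetical, segments are assumed to be of equal size, and plain sample moments stand in for the U-statistics the abstract mentions.

```python
import math


def segment_moments(xs, ys):
    """Component estimates from one data segment: E[X], E[Y], E[XY], E[X^2], E[Y^2]."""
    n = len(xs)
    return (
        sum(xs) / n,
        sum(ys) / n,
        sum(x * y for x, y in zip(xs, ys)) / n,
        sum(x * x for x in xs) / n,
        sum(y * y for y in ys) / n,
    )


def aggregate_correlation(segments):
    """Average component estimates over equal-size segments, then combine them once."""
    m = len(segments)
    per_segment = [segment_moments(xs, ys) for xs, ys in segments]
    ex, ey, exy, exx, eyy = (sum(col) / m for col in zip(*per_segment))
    return (exy - ex * ey) / math.sqrt((exx - ex * ex) * (eyy - ey * ey))


xs = [1, 2, 3, 4, 5, 6]
ys = [2, 1, 4, 3, 6, 5]
centralized = aggregate_correlation([(xs, ys)])
distributed = aggregate_correlation(
    [(xs[0:2], ys[0:2]), (xs[2:4], ys[2:4]), (xs[4:6], ys[4:6])]
)
```

Because the nonlinear combination is applied only after the unbiased component estimates are averaged, `distributed` matches `centralized` in this example regardless of how many equal-size segments are used; averaging per-segment correlations directly would not have that property.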
is Lower Bounds for Testing Graphical Models: Colorings and Antiferromagnetic Ising Models By Published On :: 2020 We study the identity testing problem in the context of spin systems or undirected graphical models, where it takes the following form: given the parameter specification of the model $M$ and a sampling oracle for the distribution $\mu_{M^*}$ of an unknown model $M^*$, can we efficiently determine if the two models $M$ and $M^*$ are the same? We consider identity testing for both soft-constraint and hard-constraint systems. In particular, we prove hardness results in two prototypical cases, the Ising model and proper colorings, and explore whether identity testing is any easier than structure learning. For the ferromagnetic (attractive) Ising model, Daskalakis et al. (2018) presented a polynomial-time algorithm for identity testing. We prove hardness results in the antiferromagnetic (repulsive) setting in the same regime of parameters where structure learning is known to require a super-polynomial number of samples. Specifically, for $n$-vertex graphs of maximum degree $d$, we prove that if $|\beta| d = \omega(\log{n})$ (where $\beta$ is the inverse temperature parameter), then there is no polynomial running time identity testing algorithm unless $RP=NP$. In the hard-constraint setting, we present hardness results for identity testing for proper colorings. Our results are based on the presumed hardness of #BIS, the problem of (approximately) counting independent sets in bipartite graphs. Full Article
is On the consistency of graph-based Bayesian semi-supervised learning and the scalability of sampling algorithms By Published On :: 2020 This paper considers a Bayesian approach to graph-based semi-supervised learning. We show that if the graph parameters are suitably scaled, the graph-posteriors converge to a continuum limit as the size of the unlabeled data set grows. This consistency result has profound algorithmic implications: we prove that when consistency holds, carefully designed Markov chain Monte Carlo algorithms have a uniform spectral gap, independent of the number of unlabeled inputs. Numerical experiments illustrate and complement the theory. Full Article
is On the Complexity Analysis of the Primal Solutions for the Accelerated Randomized Dual Coordinate Ascent By Published On :: 2020 Dual first-order methods are essential techniques for large-scale constrained convex optimization. However, when recovering the primal solutions, we need $T(\epsilon^{-2})$ iterations to achieve an $\epsilon$-optimal primal solution when we apply an algorithm to the non-strongly convex dual problem with $T(\epsilon^{-1})$ iterations to achieve an $\epsilon$-optimal dual solution, where $T(x)$ can be $x$ or $\sqrt{x}$. In this paper, we prove that the iteration complexity of the primal solutions and dual solutions have the same $O\left(\frac{1}{\sqrt{\epsilon}}\right)$ order of magnitude for the accelerated randomized dual coordinate ascent. When the dual function further satisfies the quadratic functional growth condition, by restarting the algorithm at any period, we establish the linear iteration complexity for both the primal solutions and dual solutions even if the condition number is unknown. When applied to the regularized empirical risk minimization problem, we prove the iteration complexity of $O\left(n\log n+\sqrt{\frac{n}{\epsilon}}\right)$ in both primal space and dual space, where $n$ is the number of samples. Our result takes out the $\left(\log \frac{1}{\epsilon}\right)$ factor compared with the methods based on smoothing/regularization or Catalyst reduction. As far as we know, this is the first time that the optimal $O\left(\sqrt{\frac{n}{\epsilon}}\right)$ iteration complexity in the primal space is established for the dual coordinate ascent based stochastic algorithms. We also establish the accelerated linear complexity for some problems with nonsmooth loss, e.g., the least absolute deviation and SVM. Full Article
is Graph-Dependent Implicit Regularisation for Distributed Stochastic Subgradient Descent By Published On :: 2020 We propose graph-dependent implicit regularisation strategies for synchronised distributed stochastic subgradient descent (Distributed SGD) for convex problems in multi-agent learning. Under the standard assumptions of convexity, Lipschitz continuity, and smoothness, we establish statistical learning rates that retain, up to logarithmic terms, single-machine serial statistical guarantees through implicit regularisation (step size tuning and early stopping) with appropriate dependence on the graph topology. Our approach avoids the need for explicit regularisation in decentralised learning problems, such as adding constraints to the empirical risk minimisation rule. Particularly for distributed methods, the use of implicit regularisation allows the algorithm to remain simple, without projections or dual methods. To prove our results, we establish graph-independent generalisation bounds for Distributed SGD that match the single-machine serial SGD setting (using algorithmic stability), and we establish graph-dependent optimisation bounds that are of independent interest. We present numerical experiments to show that the qualitative nature of the upper bounds we derive can be representative of real behaviours. Full Article
is Noise Accumulation in High Dimensional Classification and Total Signal Index By Published On :: 2020 Great attention has been paid to Big Data in recent years. Such data hold promise for scientific discoveries but also pose challenges to analyses. One potential challenge is noise accumulation. In this paper, we explore noise accumulation in high dimensional two-group classification. First, we revisit a previous assessment of noise accumulation with principal component analyses, which yields a different threshold for discriminative ability than originally identified. Then we extend our scope to its impact on classifiers developed with three common machine learning approaches---random forest, support vector machine, and boosted classification trees. We simulate four scenarios with differing amounts of signal strength to evaluate each method. After determining that noise accumulation may affect the performance of these classifiers, we assess factors that impact it. We conduct simulations by varying sample size, signal strength, signal strength proportional to the number of predictors, and signal magnitude with random forest classifiers. These simulations suggest that noise accumulation affects the discriminative ability of high-dimensional classifiers developed using common machine learning methods, which can be modified by sample size, signal strength, and signal magnitude. We developed the measure total signal index (TSI) to track the trends of total signal and noise accumulation. Full Article