on Constrained Bayesian Optimization with Noisy Experiments By projecteuclid.org Published On :: Wed, 13 Mar 2019 22:00 EDT Benjamin Letham, Brian Karrer, Guilherme Ottoni, Eytan Bakshy. Source: Bayesian Analysis, Volume 14, Number 2, 495--519.Abstract: Randomized experiments are the gold standard for evaluating the effects of changes to real-world systems. Data in these tests may be difficult to collect and outcomes may have high variance, resulting in potentially large measurement error. Bayesian optimization is a promising technique for efficiently optimizing multiple continuous parameters, but existing approaches degrade in performance when the noise level is high, limiting its applicability to many randomized experiments. We derive an expression for expected improvement under greedy batch optimization with noisy observations and noisy constraints, and develop a quasi-Monte Carlo approximation that allows it to be efficiently optimized. Simulations with synthetic functions show that optimization performance on noisy, constrained problems outperforms existing methods. We further demonstrate the effectiveness of the method with two real-world experiments conducted at Facebook: optimizing a ranking system, and optimizing server compiler flags. Full Article
on Analysis of the Maximal a Posteriori Partition in the Gaussian Dirichlet Process Mixture Model By projecteuclid.org Published On :: Wed, 13 Mar 2019 22:00 EDT Łukasz Rajkowski. Source: Bayesian Analysis, Volume 14, Number 2, 477--494.Abstract: Mixture models are a natural choice in many applications, but it can be difficult to place an a priori upper bound on the number of components. To circumvent this, investigators are turning increasingly to Dirichlet process mixture models (DPMMs). It is therefore important to develop an understanding of the strengths and weaknesses of this approach. This work considers the MAP (maximum a posteriori) clustering for the Gaussian DPMM (where the cluster means have Gaussian distribution and, for each cluster, the observations within the cluster have Gaussian distribution). Some desirable properties of the MAP partition are proved: ‘almost disjointness’ of the convex hulls of clusters (they may have at most one point in common) and (with natural assumptions) the comparability of sizes of those clusters that intersect any fixed ball with the number of observations (as the latter goes to infinity). Consequently, the number of such clusters remains bounded. Furthermore, if the data arises from independent identically distributed sampling from a given distribution with bounded support then the asymptotic MAP partition of the observation space maximises a function which has a straightforward expression, which depends only on the within-group covariance parameter. As the operator norm of this covariance parameter decreases, the number of clusters in the MAP partition becomes arbitrarily large, which may lead to the overestimation of the number of mixture components. Full Article
on Efficient Bayesian Regularization for Graphical Model Selection By projecteuclid.org Published On :: Wed, 13 Mar 2019 22:00 EDT Suprateek Kundu, Bani K. Mallick, Veera Baladandayuthapani. Source: Bayesian Analysis, Volume 14, Number 2, 449--476.Abstract: There has been an intense development in the Bayesian graphical model literature over the past decade; however, most of the existing methods are restricted to moderate dimensions. We propose a novel graphical model selection approach for large dimensional settings where the dimension increases with the sample size, by decoupling model fitting and covariance selection. First, a full model based on a complete graph is fit under a novel class of mixtures of inverse–Wishart priors, which induce shrinkage on the precision matrix under an equivalence with Cholesky-based regularization, while enabling conjugate updates. Subsequently, a post-fitting model selection step uses penalized joint credible regions to perform model selection. This allows our methods to be computationally feasible for large dimensional settings using a combination of straightforward Gibbs samplers and efficient post-fitting inferences. Theoretical guarantees in terms of selection consistency are also established. Simulations show that the proposed approach compares favorably with competing methods, both in terms of accuracy metrics and computation times. We apply this approach to a cancer genomics data example. Full Article
on A Bayesian Approach to Statistical Shape Analysis via the Projected Normal Distribution By projecteuclid.org Published On :: Wed, 13 Mar 2019 22:00 EDT Luis Gutiérrez, Eduardo Gutiérrez-Peña, Ramsés H. Mena. Source: Bayesian Analysis, Volume 14, Number 2, 427--447.Abstract: This work presents a Bayesian predictive approach to statistical shape analysis. A modeling strategy that starts with a Gaussian distribution on the configuration space, and then removes the effects of location, rotation and scale, is studied. This boils down to an application of the projected normal distribution to model the configurations in the shape space, which together with certain identifiability constraints, facilitates parameter interpretation. Having better control over the parameters allows us to generalize the model to a regression setting where the effect of predictors on shapes can be considered. The methodology is illustrated and tested using both simulated scenarios and a real data set concerning eight anatomical landmarks on a sagittal plane of the corpus callosum in patients with autism and in a group of controls. Full Article
on Control of Type I Error Rates in Bayesian Sequential Designs By projecteuclid.org Published On :: Wed, 13 Mar 2019 22:00 EDT Haolun Shi, Guosheng Yin. Source: Bayesian Analysis, Volume 14, Number 2, 399--425.Abstract: Bayesian approaches to phase II clinical trial designs are usually based on the posterior distribution of the parameter of interest and calibration of certain threshold for decision making. If the posterior probability is computed and assessed in a sequential manner, the design may involve the problem of multiplicity, which, however, is often a neglected aspect in Bayesian trial designs. To effectively maintain the overall type I error rate, we propose solutions to the problem of multiplicity for Bayesian sequential designs and, in particular, the determination of the cutoff boundaries for the posterior probabilities. We present both theoretical and numerical methods for finding the optimal posterior probability boundaries with $alpha$ -spending functions that mimic those of the frequentist group sequential designs. The theoretical approach is based on the asymptotic properties of the posterior probability, which establishes a connection between the Bayesian trial design and the frequentist group sequential method. The numerical approach uses a sandwich-type searching algorithm, which immensely reduces the computational burden. We apply least-square fitting to find the $alpha$ -spending function closest to the target. We discuss the application of our method to single-arm and double-arm cases with binary and normal endpoints, respectively, and provide a real trial example for each case. Full Article
on Variational Message Passing for Elaborate Response Regression Models By projecteuclid.org Published On :: Wed, 13 Mar 2019 22:00 EDT M. W. McLean, M. P. Wand. Source: Bayesian Analysis, Volume 14, Number 2, 371--398.Abstract: We build on recent work concerning message passing approaches to approximate fitting and inference for arbitrarily large regression models. The focus is on regression models where the response variable is modeled to have an elaborate distribution, which is loosely defined to mean a distribution that is more complicated than common distributions such as those in the Bernoulli, Poisson and Normal families. Examples of elaborate response families considered here are the Negative Binomial and $t$ families. Variational message passing is more challenging due to some of the conjugate exponential families being non-standard and numerical integration being needed. Nevertheless, a factor graph fragment approach means the requisite calculations only need to be done once for a particular elaborate response distribution family. Computer code can be compartmentalized, including that involving numerical integration. A major finding of this work is that the modularity of variational message passing extends to elaborate response regression models. Full Article
on Bayesian Effect Fusion for Categorical Predictors By projecteuclid.org Published On :: Wed, 13 Mar 2019 22:00 EDT Daniela Pauger, Helga Wagner. Source: Bayesian Analysis, Volume 14, Number 2, 341--369.Abstract: We propose a Bayesian approach to obtain a sparse representation of the effect of a categorical predictor in regression type models. As this effect is captured by a group of level effects, sparsity cannot only be achieved by excluding single irrelevant level effects or the whole group of effects associated to this predictor but also by fusing levels which have essentially the same effect on the response. To achieve this goal, we propose a prior which allows for almost perfect as well as almost zero dependence between level effects a priori. This prior can alternatively be obtained by specifying spike and slab prior distributions on all effect differences associated to this categorical predictor. We show how restricted fusion can be implemented and develop an efficient MCMC (Markov chain Monte Carlo) method for posterior computation. The performance of the proposed method is investigated on simulated data and we illustrate its application on real data from EU-SILC (European Union Statistics on Income and Living Conditions). Full Article
on Modeling Population Structure Under Hierarchical Dirichlet Processes By projecteuclid.org Published On :: Wed, 13 Mar 2019 22:00 EDT Lloyd T. Elliott, Maria De Iorio, Stefano Favaro, Kaustubh Adhikari, Yee Whye Teh. Source: Bayesian Analysis, Volume 14, Number 2, 313--339.Abstract: We propose a Bayesian nonparametric model to infer population admixture, extending the hierarchical Dirichlet process to allow for correlation between loci due to linkage disequilibrium. Given multilocus genotype data from a sample of individuals, the proposed model allows inferring and classifying individuals as unadmixed or admixed, inferring the number of subpopulations ancestral to an admixed population and the population of origin of chromosomal regions. Our model does not assume any specific mutation process, and can be applied to most of the commonly used genetic markers. We present a Markov chain Monte Carlo (MCMC) algorithm to perform posterior inference from the model and we discuss some methods to summarize the MCMC output for the analysis of population admixture. Finally, we demonstrate the performance of the proposed model in a real application, using genetic data from the ectodysplasin-A receptor (EDAR) gene, which is considered to be ancestry-informative due to well-known variations in allele frequency as well as phenotypic effects across ancestry. The structure analysis of this dataset leads to the identification of a rare haplotype in Europeans. We also conduct a simulated experiment and show that our algorithm outperforms parametric methods. Full Article
on Separable covariance arrays via the Tucker product, with applications to multivariate relational data By projecteuclid.org Published On :: Wed, 13 Jun 2012 14:27 EDT Peter D. HoffSource: Bayesian Anal., Volume 6, Number 2, 179--196.Abstract: Modern datasets are often in the form of matrices or arrays, potentially having correlations along each set of data indices. For example, data involving repeated measurements of several variables over time may exhibit temporal correlation as well as correlation among the variables. A possible model for matrix-valued data is the class of matrix normal distributions, which is parametrized by two covariance matrices, one for each index set of the data. In this article we discuss an extension of the matrix normal model to accommodate multidimensional data arrays, or tensors. We show how a particular array-matrix product can be used to generate the class of array normal distributions having separable covariance structure. We derive some properties of these covariance structures and the corresponding array normal distributions, and show how the array-matrix product can be used to define a semi-conjugate prior distribution and calculate the corresponding posterior distribution. We illustrate the methodology in an analysis of multivariate longitudinal network data which take the form of a four-way array. Full Article
on Maximum Independent Component Analysis with Application to EEG Data By projecteuclid.org Published On :: Tue, 03 Mar 2020 04:00 EST Ruosi Guo, Chunming Zhang, Zhengjun Zhang. Source: Statistical Science, Volume 35, Number 1, 145--157.Abstract: In many scientific disciplines, finding hidden influential factors behind observational data is essential but challenging. The majority of existing approaches, such as the independent component analysis (${mathrm{ICA}}$), rely on linear transformation, that is, true signals are linear combinations of hidden components. Motivated from analyzing nonlinear temporal signals in neuroscience, genetics, and finance, this paper proposes the “maximum independent component analysis” (${mathrm{MaxICA}}$), based on max-linear combinations of components. In contrast to existing methods, ${mathrm{MaxICA}}$ benefits from focusing on significant major components while filtering out ignorable components. A major tool for parameter learning of ${mathrm{MaxICA}}$ is an augmented genetic algorithm, consisting of three schemes for the elite weighted sum selection, randomly combined crossover, and dynamic mutation. Extensive empirical evaluations demonstrate the effectiveness of ${mathrm{MaxICA}}$ in either extracting max-linearly combined essential sources in many applications or supplying a better approximation for nonlinearly combined source signals, such as $mathrm{EEG}$ recordings analyzed in this paper. Full Article
on Statistical Inference for the Evolutionary History of Cancer Genomes By projecteuclid.org Published On :: Tue, 03 Mar 2020 04:00 EST Khanh N. Dinh, Roman Jaksik, Marek Kimmel, Amaury Lambert, Simon Tavaré. Source: Statistical Science, Volume 35, Number 1, 129--144.Abstract: Recent years have seen considerable work on inference about cancer evolution from mutations identified in cancer samples. Much of the modeling work has been based on classical models of population genetics, generalized to accommodate time-varying cell population size. Reverse-time, genealogical views of such models, commonly known as coalescents, have been used to infer aspects of the past of growing populations. Another approach is to use branching processes, the simplest scenario being the classical linear birth-death process. Inference from evolutionary models of DNA often exploits summary statistics of the sequence data, a common one being the so-called Site Frequency Spectrum (SFS). In a bulk tumor sequencing experiment, we can estimate for each site at which a novel somatic point mutation has arisen, the proportion of cells that carry that mutation. These numbers are then grouped into collections of sites which have similar mutant fractions. We examine how the SFS based on birth-death processes differs from those based on the coalescent model. This may stem from the different sampling mechanisms in the two approaches. However, we also show that despite this, they are quantitatively comparable for the range of parameters typical for tumor cell populations. We also present a model of tumor evolution with selective sweeps, and demonstrate how it may help in understanding the history of a tumor as well as the influence of data pre-processing. We illustrate the theory with applications to several examples from The Cancer Genome Atlas tumors. Full Article
on Data Denoising and Post-Denoising Corrections in Single Cell RNA Sequencing By projecteuclid.org Published On :: Tue, 03 Mar 2020 04:00 EST Divyansh Agarwal, Jingshu Wang, Nancy R. Zhang. Source: Statistical Science, Volume 35, Number 1, 112--128.Abstract: Single cell sequencing technologies are transforming biomedical research. However, due to the inherent nature of the data, single cell RNA sequencing analysis poses new computational and statistical challenges. We begin with a survey of a selection of topics in this field, with a gentle introduction to the biology and a more detailed exploration of the technical noise. We consider in detail the problem of single cell data denoising, sometimes referred to as “imputation” in the relevant literature. We discuss why this is not a typical statistical imputation problem, and review current approaches to this problem. We then explore why the use of denoised values in downstream analyses invites novel statistical insights, and how denoising uncertainty should be accounted for to yield valid statistical inference. The utilization of denoised or imputed matrices in statistical inference is not unique to single cell genomics, and arises in many other fields. We describe the challenges in this type of analysis, discuss some preliminary solutions, and highlight unresolved issues. Full Article
on Statistical Molecule Counting in Super-Resolution Fluorescence Microscopy: Towards Quantitative Nanoscopy By projecteuclid.org Published On :: Tue, 03 Mar 2020 04:00 EST Thomas Staudt, Timo Aspelmeier, Oskar Laitenberger, Claudia Geisler, Alexander Egner, Axel Munk. Source: Statistical Science, Volume 35, Number 1, 92--111.Abstract: Super-resolution microscopy is rapidly gaining importance as an analytical tool in the life sciences. A compelling feature is the ability to label biological units of interest with fluorescent markers in (living) cells and to observe them with considerably higher resolution than conventional microscopy permits. The images obtained this way, however, lack an absolute intensity scale in terms of numbers of fluorophores observed. In this article, we discuss state of the art methods to count such fluorophores and statistical challenges that come along with it. In particular, we suggest a modeling scheme for time series generated by single-marker-switching (SMS) microscopy that makes it possible to quantify the number of markers in a statistically meaningful manner from the raw data. To this end, we model the entire process of photon generation in the fluorophore, their passage through the microscope, detection and photoelectron amplification in the camera, and extraction of time series from the microscopic images. At the heart of these modeling steps is a careful description of the fluorophore dynamics by a novel hidden Markov model that operates on two timescales (HTMM). Besides the fluorophore number, information about the kinetic transition rates of the fluorophore’s internal states is also inferred during estimation. We comment on computational issues that arise when applying our model to simulated or measured fluorescence traces and illustrate our methodology on simulated data. Full Article
on A Tale of Two Parasites: Statistical Modelling to Support Disease Control Programmes in Africa By projecteuclid.org Published On :: Tue, 03 Mar 2020 04:00 EST Peter J. Diggle, Emanuele Giorgi, Julienne Atsame, Sylvie Ntsame Ella, Kisito Ogoussan, Katherine Gass. Source: Statistical Science, Volume 35, Number 1, 42--50.Abstract: Vector-borne diseases have long presented major challenges to the health of rural communities in the wet tropical regions of the world, but especially in sub-Saharan Africa. In this paper, we describe the contribution that statistical modelling has made to the global elimination programme for one vector-borne disease, onchocerciasis. We explain why information on the spatial distribution of a second vector-borne disease, Loa loa, is needed before communities at high risk of onchocerciasis can be treated safely with mass distribution of ivermectin, an antifiarial medication. We show how a model-based geostatistical analysis of Loa loa prevalence survey data can be used to map the predictive probability that each location in the region of interest meets a WHO policy guideline for safe mass distribution of ivermectin and describe two applications: one is to data from Cameroon that assesses prevalence using traditional blood-smear microscopy; the other is to Africa-wide data that uses a low-cost questionnaire-based method. We describe how a recent technological development in image-based microscopy has resulted in a change of emphasis from prevalence alone to the bivariate spatial distribution of prevalence and the intensity of infection among infected individuals. We discuss how statistical modelling of the kind described here can contribute to health policy guidelines and decision-making in two ways. One is to ensure that, in a resource-limited setting, prevalence surveys are designed, and the resulting data analysed, as efficiently as possible. The other is to provide an honest quantification of the uncertainty attached to any binary decision by reporting predictive probabilities that a policy-defined condition for action is or is not met. Full Article
on Risk Models for Breast Cancer and Their Validation By projecteuclid.org Published On :: Tue, 03 Mar 2020 04:00 EST Adam R. Brentnall, Jack Cuzick. Source: Statistical Science, Volume 35, Number 1, 14--30.Abstract: Strategies to prevent cancer and diagnose it early when it is most treatable are needed to reduce the public health burden from rising disease incidence. Risk assessment is playing an increasingly important role in targeting individuals in need of such interventions. For breast cancer many individual risk factors have been well understood for a long time, but the development of a fully comprehensive risk model has not been straightforward, in part because there have been limited data where joint effects of an extensive set of risk factors may be estimated with precision. In this article we first review the approach taken to develop the IBIS (Tyrer–Cuzick) model, and describe recent updates. We then review and develop methods to assess calibration of models such as this one, where the risk of disease allowing for competing mortality over a long follow-up time or lifetime is estimated. The breast cancer risk model model and calibration assessment methods are demonstrated using a cohort of 132,139 women attending mammography screening in the State of Washington, USA. Full Article
on Model-Based Approach to the Joint Analysis of Single-Cell Data on Chromatin Accessibility and Gene Expression By projecteuclid.org Published On :: Tue, 03 Mar 2020 04:00 EST Zhixiang Lin, Mahdi Zamanighomi, Timothy Daley, Shining Ma, Wing Hung Wong. Source: Statistical Science, Volume 35, Number 1, 2--13.Abstract: Unsupervised methods, including clustering methods, are essential to the analysis of single-cell genomic data. Model-based clustering methods are under-explored in the area of single-cell genomics, and have the advantage of quantifying the uncertainty of the clustering result. Here we develop a model-based approach for the integrative analysis of single-cell chromatin accessibility and gene expression data. We show that combining these two types of data, we can achieve a better separation of the underlying cell types. An efficient Markov chain Monte Carlo algorithm is also developed. Full Article
on Introduction to the Special Issue By projecteuclid.org Published On :: Tue, 03 Mar 2020 04:00 EST Source: Statistical Science, Volume 35, Number 1, 1--1. Full Article
on Larry Brown’s Work on Admissibility By projecteuclid.org Published On :: Wed, 08 Jan 2020 04:00 EST Iain M. Johnstone. Source: Statistical Science, Volume 34, Number 4, 657--668.Abstract: Many papers in the early part of Brown’s career focused on the admissibility or otherwise of estimators of a vector parameter. He established that inadmissibility of invariant estimators in three and higher dimensions is a general phenomenon, and found deep and beautiful connections between admissibility and other areas of mathematics. This review touches on several of his major contributions, with a focus on his celebrated 1971 paper connecting admissibility, recurrence and elliptic partial differential equations. Full Article
on Gaussianization Machines for Non-Gaussian Function Estimation Models By projecteuclid.org Published On :: Wed, 08 Jan 2020 04:00 EST T. Tony Cai. Source: Statistical Science, Volume 34, Number 4, 635--656.Abstract: A wide range of nonparametric function estimation models have been studied individually in the literature. Among them the homoscedastic nonparametric Gaussian regression is arguably the best known and understood. Inspired by the asymptotic equivalence theory, Brown, Cai and Zhou ( Ann. Statist. 36 (2008) 2055–2084; Ann. Statist. 38 (2010) 2005–2046) and Brown et al. ( Probab. Theory Related Fields 146 (2010) 401–433) developed a unified approach to turn a collection of non-Gaussian function estimation models into a standard Gaussian regression and any good Gaussian nonparametric regression method can then be used. These Gaussianization Machines have two key components, binning and transformation. When combined with BlockJS, a wavelet thresholding procedure for Gaussian regression, the procedures are computationally efficient with strong theoretical guarantees. Technical analysis given in Brown, Cai and Zhou ( Ann. Statist. 36 (2008) 2055–2084; Ann. Statist. 38 (2010) 2005–2046) and Brown et al. ( Probab. Theory Related Fields 146 (2010) 401–433) shows that the estimators attain the optimal rate of convergence adaptively over a large set of Besov spaces and across a collection of non-Gaussian function estimation models, including robust nonparametric regression, density estimation, and nonparametric regression in exponential families. The estimators are also spatially adaptive. The Gaussianization Machines significantly extend the flexibility and scope of the theories and methodologies originally developed for the conventional nonparametric Gaussian regression. This article aims to provide a concise account of the Gaussianization Machines developed in Brown, Cai and Zhou ( Ann. Statist. 36 (2008) 2055–2084; Ann. Statist. 38 (2010) 2005–2046), Brown et al. ( Probab. Theory Related Fields 146 (2010) 401–433). Full Article
on Larry Brown’s Contributions to Parametric Inference, Decision Theory and Foundations: A Survey By projecteuclid.org Published On :: Wed, 08 Jan 2020 04:00 EST James O. Berger, Anirban DasGupta. Source: Statistical Science, Volume 34, Number 4, 621--634.Abstract: This article gives a panoramic survey of the general area of parametric statistical inference, decision theory and foundations of statistics for the period 1965–2010 through the lens of Larry Brown’s contributions to varied aspects of this massive area. The article goes over sufficiency, shrinkage estimation, admissibility, minimaxity, complete class theorems, estimated confidence, conditional confidence procedures, Edgeworth and higher order asymptotic expansions, variational Bayes, Stein’s SURE, differential inequalities, geometrization of convergence rates, asymptotic equivalence, aspects of empirical process theory, inference after model selection, unified frequentist and Bayesian testing, and Wald’s sequential theory. A reasonably comprehensive bibliography is provided. Full Article
on Models as Approximations—Rejoinder By projecteuclid.org Published On :: Wed, 08 Jan 2020 04:00 EST Andreas Buja, Arun Kumar Kuchibhotla, Richard Berk, Edward George, Eric Tchetgen Tchetgen, Linda Zhao. Source: Statistical Science, Volume 34, Number 4, 606--620.Abstract: We respond to the discussants of our articles emphasizing the importance of inference under misspecification in the context of the reproducibility/replicability crisis. Along the way, we discuss the roles of diagnostics and model building in regression as well as connections between our well-specification framework and semiparametric theory. Full Article
on Discussion: Models as Approximations By projecteuclid.org Published On :: Wed, 08 Jan 2020 04:00 EST Dalia Ghanem, Todd A. Kuffner. Source: Statistical Science, Volume 34, Number 4, 604--605. Full Article
on Comment: Models as (Deliberate) Approximations By projecteuclid.org Published On :: Wed, 08 Jan 2020 04:00 EST David Whitney, Ali Shojaie, Marco Carone. Source: Statistical Science, Volume 34, Number 4, 591--598. Full Article
on Comment: Models Are Approximations! By projecteuclid.org Published On :: Wed, 08 Jan 2020 04:00 EST Anthony C. Davison, Erwan Koch, Jonathan Koh. Source: Statistical Science, Volume 34, Number 4, 584--590.Abstract: This discussion focuses on areas of disagreement with the papers, particularly the target of inference and the case for using the robust ‘sandwich’ variance estimator in the presence of moderate mis-specification. We also suggest that existing procedures may be appreciably more powerful for detecting mis-specification than the authors’ RAV statistic, and comment on the use of the pairs bootstrap in balanced situations. Full Article
on Comment: “Models as Approximations I: Consequences Illustrated with Linear Regression” by A. Buja, R. Berk, L. Brown, E. George, E. Pitkin, L. Zhan and K. Zhang By projecteuclid.org Published On :: Wed, 08 Jan 2020 04:00 EST Roderick J. Little. Source: Statistical Science, Volume 34, Number 4, 580--583. Full Article
on Discussion of Models as Approximations I & II By projecteuclid.org Published On :: Wed, 08 Jan 2020 04:00 EST Dag Tjøstheim. Source: Statistical Science, Volume 34, Number 4, 575--579. Full Article
on Comment: Models as Approximations By projecteuclid.org Published On :: Wed, 08 Jan 2020 04:00 EST Nikki L. B. Freeman, Xiaotong Jiang, Owen E. Leete, Daniel J. Luckett, Teeranan Pokaprakarn, Michael R. Kosorok. Source: Statistical Science, Volume 34, Number 4, 572--574. Full Article
on Comment on Models as Approximations, Parts I and II, by Buja et al. By projecteuclid.org Published On :: Wed, 08 Jan 2020 04:00 EST Jerald F. Lawless. Source: Statistical Science, Volume 34, Number 4, 569--571.Abstract: I comment on the papers Models as Approximations I and II, by A. Buja, R. Berk, L. Brown, E. George, E. Pitkin, M. Traskin, L. Zhao and K. Zhang. Full Article
on Discussion of Models as Approximations I & II By projecteuclid.org Published On :: Wed, 08 Jan 2020 04:00 EST Sara van de Geer. Source: Statistical Science, Volume 34, Number 4, 566--568.Abstract: We discuss the papers “Models as Approximations” I & II, by A. Buja, R. Berk, L. Brown, E. George, E. Pitkin, M. Traskin, L. Zao and K. Zhang (Part I) and A. Buja, L. Brown, A. K. Kuchibhota, R. Berk, E. George and L. Zhao (Part II). We present a summary with some details for the generalized linear model. Full Article
on Models as Approximations II: A Model-Free Theory of Parametric Regression By projecteuclid.org Published On :: Wed, 08 Jan 2020 04:00 EST Andreas Buja, Lawrence Brown, Arun Kumar Kuchibhotla, Richard Berk, Edward George, Linda Zhao. Source: Statistical Science, Volume 34, Number 4, 545--565.Abstract: We develop a model-free theory of general types of parametric regression for i.i.d. observations. The theory replaces the parameters of parametric models with statistical functionals, to be called “regression functionals,” defined on large nonparametric classes of joint ${x extrm{-}y}$ distributions, without assuming a correct model. Parametric models are reduced to heuristics to suggest plausible objective functions. An example of a regression functional is the vector of slopes of linear equations fitted by OLS to largely arbitrary ${x extrm{-}y}$ distributions, without assuming a linear model (see Part I). More generally, regression functionals can be defined by minimizing objective functions, solving estimating equations, or with ad hoc constructions. In this framework, it is possible to achieve the following: (1) define a notion of “well-specification” for regression functionals that replaces the notion of correct specification of models, (2) propose a well-specification diagnostic for regression functionals based on reweighting distributions and data, (3) decompose sampling variability of regression functionals into two sources, one due to the conditional response distribution and another due to the regressor distribution interacting with misspecification, both of order $N^{-1/2}$, (4) exhibit plug-in/sandwich estimators of standard error as limit cases of ${x extrm{-}y}$ bootstrap estimators, and (5) provide theoretical heuristics to indicate that ${x extrm{-}y}$ bootstrap standard errors may generally be preferred over sandwich estimators. Full Article
on Models as Approximations I: Consequences Illustrated with Linear Regression By projecteuclid.org Published On :: Wed, 08 Jan 2020 04:00 EST Andreas Buja, Lawrence Brown, Richard Berk, Edward George, Emil Pitkin, Mikhail Traskin, Kai Zhang, Linda Zhao. Source: Statistical Science, Volume 34, Number 4, 523--544.Abstract: In the early 1980s, Halbert White inaugurated a “model-robust” form of statistical inference based on the “sandwich estimator” of standard error. This estimator is known to be “heteroskedasticity-consistent,” but it is less well known to be “nonlinearity-consistent” as well. Nonlinearity, however, raises fundamental issues because in its presence regressors are not ancillary, hence cannot be treated as fixed. The consequences are deep: (1) population slopes need to be reinterpreted as statistical functionals obtained from OLS fits to largely arbitrary joint ${x extrm{-}y}$ distributions; (2) the meaning of slope parameters needs to be rethought; (3) the regressor distribution affects the slope parameters; (4) randomness of the regressors becomes a source of sampling variability in slope estimates of order $1/sqrt{N}$; (5) inference needs to be based on model-robust standard errors, including sandwich estimators or the ${x extrm{-}y}$ bootstrap. In theory, model-robust and model-trusting standard errors can deviate by arbitrary magnitudes either way. In practice, significant deviations between them can be detected with a diagnostic test. Full Article
on A Conversation with Peter Diggle By projecteuclid.org Published On :: Fri, 11 Oct 2019 04:03 EDT Peter M. Atkinson, Jorge Mateu. Source: Statistical Science, Volume 34, Number 3, 504--521.Abstract: Peter John Diggle was born on February 24, 1950, in Lancashire, England. Peter went to school in Scotland, and it was at the end of his school years that he found that he was good at maths and actually enjoyed it. Peter went to Edinburgh to do a maths degree, but transferred halfway through to Liverpool where he completed his degree. Peter studied for a year at Oxford and was then appointed in 1974 as a lecturer in statistics at the University of Newcastle-upon-Tyne where he gained his PhD, and was promoted to Reader in 1983. A sabbatical at the Swedish Royal College of Forestry gave him his first exposure to real scientific data and problems, prompting a move to CSIRO, Australia. After five years with CSIRO where he was Senior, then Principal, then Chief Research Scientist and Chief of the Division of Mathematics and Statistics, he returned to the UK in 1988, to a Chair at Lancaster University. Since 2011 Peter has held appointments at Lancaster and Liverpool, together with honorary appointments at Johns Hopkins, Columbia and Yale. At Lancaster, Peter was the founder and Director of the Medical Statistics Unit (1995–2001), University Dean for Research (1998–2001), EPSRC Senior Fellow (2004–2008), Associate Dean for Research at the School of Health and Medicine (2007–2011), Distinguished University Professor, and leader of the CHICAS Research Group (2007–2017). A Fellow of the Royal Statistical Society since 1974, he was a Member of Council (1983–1985), Joint Editor of JRSSB (1984–1987), Honorary Secretary (1990–1996), awarded the Guy Medal in Silver (1997) and the Barnett Award (2018), Associate Editor of Applied Statistics (1998–2000), Chair of the Research Section Committee (1998–2000), and President (2014–2016). Away from work, Peter enjoys music, playing folk-blues guitar and tenor recorder, and listening to jazz. His running days are behind him, but he can just about hold his own in mixed-doubles badminton with his family. His boyhoood hero was Stirling Moss, and he retains an enthusiasm for classic cars, not least his 1988 Porsche 924S. His favorite authors are George Orwell, Primo Levi and Nigel Slater. This interview was done prior to the fourth Spatial Statistics conference held in Lancaster, July 2017 where a session was dedicated to Peter celebrating his contributions to statistics. Full Article
on Assessing the Causal Effect of Binary Interventions from Observational Panel Data with Few Treated Units By projecteuclid.org Published On :: Fri, 11 Oct 2019 04:03 EDT Pantelis Samartsidis, Shaun R. Seaman, Anne M. Presanis, Matthew Hickman, Daniela De Angelis. Source: Statistical Science, Volume 34, Number 3, 486--503.Abstract: Researchers are often challenged with assessing the impact of an intervention on an outcome of interest in situations where the intervention is nonrandomised, the intervention is only applied to one or few units, the intervention is binary, and outcome measurements are available at multiple time points. In this paper, we review existing methods for causal inference in these situations. We detail the assumptions underlying each method, emphasize connections between the different approaches and provide guidelines regarding their practical implementation. Several open problems are identified thus highlighting the need for future research. Full Article
on Conditionally Conjugate Mean-Field Variational Bayes for Logistic Models By projecteuclid.org Published On :: Fri, 11 Oct 2019 04:03 EDT Daniele Durante, Tommaso Rigon. Source: Statistical Science, Volume 34, Number 3, 472--485.Abstract: Variational Bayes (VB) is a common strategy for approximate Bayesian inference, but simple methods are only available for specific classes of models including, in particular, representations having conditionally conjugate constructions within an exponential family. Models with logit components are an apparently notable exception to this class, due to the absence of conjugacy among the logistic likelihood and the Gaussian priors for the coefficients in the linear predictor. To facilitate approximate inference within this widely used class of models, Jaakkola and Jordan ( Stat. Comput. 10 (2000) 25–37) proposed a simple variational approach which relies on a family of tangent quadratic lower bounds of the logistic log-likelihood, thus restoring conjugacy between these approximate bounds and the Gaussian priors. This strategy is still implemented successfully, but few attempts have been made to formally understand the reasons underlying its excellent performance. Following a review on VB for logistic models, we cover this gap by providing a formal connection between the above bound and a recent Pólya-gamma data augmentation for logistic regression. Such a result places the computational methods associated with the aforementioned bounds within the framework of variational inference for conditionally conjugate exponential family models, thereby allowing recent advances for this class to be inherited also by the methods relying on Jaakkola and Jordan ( Stat. Comput. 10 (2000) 25–37). Full Article
on User-Friendly Covariance Estimation for Heavy-Tailed Distributions By projecteuclid.org Published On :: Fri, 11 Oct 2019 04:03 EDT Yuan Ke, Stanislav Minsker, Zhao Ren, Qiang Sun, Wen-Xin Zhou. Source: Statistical Science, Volume 34, Number 3, 454--471.Abstract: We provide a survey of recent results on covariance estimation for heavy-tailed distributions. By unifying ideas scattered in the literature, we propose user-friendly methods that facilitate practical implementation. Specifically, we introduce elementwise and spectrumwise truncation operators, as well as their $M$-estimator counterparts, to robustify the sample covariance matrix. Different from the classical notion of robustness that is characterized by the breakdown property, we focus on the tail robustness which is evidenced by the connection between nonasymptotic deviation and confidence level. The key insight is that estimators should adapt to the sample size, dimensionality and noise level to achieve optimal tradeoff between bias and robustness. Furthermore, to facilitate practical implementation, we propose data-driven procedures that automatically calibrate the tuning parameters. We demonstrate their applications to a series of structured models in high dimensions, including the bandable and low-rank covariance matrices and sparse precision matrices. Numerical studies lend strong support to the proposed methods. Full Article
on The Geometry of Continuous Latent Space Models for Network Data By projecteuclid.org Published On :: Fri, 11 Oct 2019 04:03 EDT Anna L. Smith, Dena M. Asta, Catherine A. Calder. Source: Statistical Science, Volume 34, Number 3, 428--453.Abstract: We review the class of continuous latent space (statistical) models for network data, paying particular attention to the role of the geometry of the latent space. In these models, the presence/absence of network dyadic ties are assumed to be conditionally independent given the dyads’ unobserved positions in a latent space. In this way, these models provide a probabilistic framework for embedding network nodes in a continuous space equipped with a geometry that facilitates the description of dependence between random dyadic ties. Specifically, these models naturally capture homophilous tendencies and triadic clustering, among other common properties of observed networks. In addition to reviewing the literature on continuous latent space models from a geometric perspective, we highlight the important role the geometry of the latent space plays on properties of networks arising from these models via intuition and simulation. Finally, we discuss results from spectral graph theory that allow us to explore the role of the geometry of the latent space, independent of network size. We conclude with conjectures about how these results might be used to infer the appropriate latent space geometry from observed networks. Full Article
on An Overview of Semiparametric Extensions of Finite Mixture Models By projecteuclid.org Published On :: Fri, 11 Oct 2019 04:03 EDT Sijia Xiang, Weixin Yao, Guangren Yang. Source: Statistical Science, Volume 34, Number 3, 391--404.Abstract: Finite mixture models have offered a very important tool for exploring complex data structures in many scientific areas, such as economics, epidemiology and finance. Semiparametric mixture models, which were introduced into traditional finite mixture models in the past decade, have brought forth exciting developments in their methodologies, theories, and applications. In this article, we not only provide a selective overview of the newly-developed semiparametric mixture models, but also discuss their estimation methodologies, theoretical properties if applicable, and some open questions. Recent developments are also discussed. Full Article
on ROS Regression: Integrating Regularization with Optimal Scaling Regression By projecteuclid.org Published On :: Fri, 11 Oct 2019 04:03 EDT Jacqueline J. Meulman, Anita J. van der Kooij, Kevin L. W. Duisters. Source: Statistical Science, Volume 34, Number 3, 361--390.Abstract: We present a methodology for multiple regression analysis that deals with categorical variables (possibly mixed with continuous ones), in combination with regularization, variable selection and high-dimensional data ($Pgg N$). Regularization and optimal scaling (OS) are two important extensions of ordinary least squares regression (OLS) that will be combined in this paper. There are two data analytic situations for which optimal scaling was developed. One is the analysis of categorical data, and the other the need for transformations because of nonlinear relationships between predictors and outcome. Optimal scaling of categorical data finds quantifications for the categories, both for the predictors and for the outcome variables, that are optimal for the regression model in the sense that they maximize the multiple correlation. When nonlinear relationships exist, nonlinear transformation of predictors and outcome maximize the multiple correlation in the same way. We will consider a variety of transformation types; typically we use step functions for categorical variables, and smooth (spline) functions for continuous variables. Both types of functions can be restricted to be monotonic, preserving the ordinal information in the data. In combination with optimal scaling, three popular regularization methods will be considered: Ridge regression, the Lasso and the Elastic Net. The resulting method will be called ROS Regression (Regularized Optimal Scaling Regression). The OS algorithm provides straightforward and efficient estimation of the regularized regression coefficients, automatically gives the Group Lasso and Blockwise Sparse Regression, and extends them by the possibility to maintain ordinal properties in the data. Extended examples are provided. Full Article
on A Conversation with Noel Cressie By projecteuclid.org Published On :: Thu, 18 Jul 2019 22:01 EDT Christopher K. Wikle, Jay M. Ver Hoef. Source: Statistical Science, Volume 34, Number 2, 349--359.Abstract: Noel Cressie, FAA is Director of the Centre for Environmental Informatics in the National Institute for Applied Statistics Research Australia (NIASRA) and Distinguished Professor in the School of Mathematics and Applied Statistics at the University of Wollongong, Australia. He is also Adjunct Professor at the University of Missouri (USA), Affiliate of Org 398, Science Data Understanding, at NASA’s Jet Propulsion Laboratory (USA), and a member of the Science Team for NASA’s Orbiting Carbon Observatory-2 (OCO-2) satellite. Cressie was awarded a B.Sc. with First Class Honours in Mathematics in 1972 from the University of Western Australia, and an M.A. and Ph.D. in Statistics in 1973 and 1975, respectively, from Princeton University (USA). Two brief postdoctoral periods followed, at the Centre de Morphologie Mathématique, ENSMP, in Fontainebleau (France) from April 1975–September 1975, and at Imperial College, London (UK) from September 1975–January 1976. His past appointments have been at The Flinders University of South Australia from 1976–1983, at Iowa State University (USA) from 1983–1998, and at The Ohio State University (USA) from 1998–2012. He has authored or co-authored four books and more than 280 papers in peer-reviewed outlets, covering areas that include spatial and spatio-temporal statistics, environmental statistics, empirical-Bayesian and Bayesian methods including sequential design, goodness-of-fit, and remote sensing of the environment. Many of his papers also address important questions in the sciences. Cressie is a Fellow of the Australian Academy of Science, the American Statistical Association, the Institute of Mathematical Statistics, and the Spatial Econometrics Association, and he is an Elected Member of the International Statistical Institute. Noel Cressie’s refereed, unrefereed, and other publications are available at: https://niasra.uow.edu.au/cei/people/UOW232444.html. Full Article
on A Conversation with Robert E. Kass By projecteuclid.org Published On :: Thu, 18 Jul 2019 22:01 EDT Sam Behseta. Source: Statistical Science, Volume 34, Number 2, 334--348.Abstract: Rob Kass has been been on the faculty of the Department of Statistics at Carnegie Mellon since 1981; he joined the Center for the Neural Basis of Cognition (CNBC) in 1997, and the Machine Learning Department (in the School of Computer Science) in 2007. He served as Department Head of Statistics from 1995 to 2004 and served as Interim Co-Director of the CNBC 2015–2018. He became the Maurice Falk Professor of Statistics and Computational Neuroscience in 2016. Kass has served as Chair of the Section for Bayesian Statistical Science of the American Statistical Association, Chair of the Statistics Section of the American Association for the Advancement of Science, founding Editor-in-Chief of the journal Bayesian Analysis and Executive Editor of Statistical Science . He is an elected Fellow of the American Statistical Association, the Institute of Mathematical Statistics and the American Association for the Advancement of Science. He has been recognized by the Institute for Scientific Information as one of the 10 most highly cited researchers, 1995–2005, in the category of mathematics. Kass is the recipient of the 2017 Fisher Award and lectureship by the Committee of the Presidents of the Statistical Societies. This interview took place at Carnegie Mellon University in November 2017. Full Article
on Statistical Analysis of Zero-Inflated Nonnegative Continuous Data: A Review By projecteuclid.org Published On :: Thu, 18 Jul 2019 22:01 EDT Lei Liu, Ya-Chen Tina Shih, Robert L. Strawderman, Daowen Zhang, Bankole A. Johnson, Haitao Chai. Source: Statistical Science, Volume 34, Number 2, 253--279.Abstract: Zero-inflated nonnegative continuous (or semicontinuous) data arise frequently in biomedical, economical, and ecological studies. Examples include substance abuse, medical costs, medical care utilization, biomarkers (e.g., CD4 cell counts, coronary artery calcium scores), single cell gene expression rates, and (relative) abundance of microbiome. Such data are often characterized by the presence of a large portion of zero values and positive continuous values that are skewed to the right and heteroscedastic. Both of these features suggest that no simple parametric distribution may be suitable for modeling such type of outcomes. In this paper, we review statistical methods for analyzing zero-inflated nonnegative outcome data. We will start with the cross-sectional setting, discussing ways to separate zero and positive values and introducing flexible models to characterize right skewness and heteroscedasticity in the positive values. We will then present models of correlated zero-inflated nonnegative continuous data, using random effects to tackle the correlation on repeated measures from the same subject and that across different parts of the model. We will also discuss expansion to related topics, for example, zero-inflated count and survival data, nonlinear covariate effects, and joint models of longitudinal zero-inflated nonnegative continuous data and survival. Finally, we will present applications to three real datasets (i.e., microbiome, medical costs, and alcohol drinking) to illustrate these methods. Example code will be provided to facilitate applications of these methods. Full Article
on A Kernel Regression Procedure in the 3D Shape Space with an Application to Online Sales of Children’s Wear By projecteuclid.org Published On :: Thu, 18 Jul 2019 22:01 EDT Gregorio Quintana-Ortí, Amelia Simó. Source: Statistical Science, Volume 34, Number 2, 236--252.Abstract: This paper is focused on kernel regression when the response variable is the shape of a 3D object represented by a configuration matrix of landmarks. Regression methods on this shape space are not trivial because this space has a complex finite-dimensional Riemannian manifold structure (non-Euclidean). Papers about it are scarce in the literature, the majority of them are restricted to the case of a single explanatory variable, and many of them are based on the approximated tangent space. In this paper, there are several methodological innovations. The first one is the adaptation of the general method for kernel regression analysis in manifold-valued data to the three-dimensional case of Kendall’s shape space. The second one is its generalization to the multivariate case and the addressing of the curse-of-dimensionality problem. Finally, we propose bootstrap confidence intervals for prediction. A simulation study is carried out to check the goodness of the procedure, and a comparison with a current approach is performed. Then, it is applied to a 3D database obtained from an anthropometric survey of the Spanish child population with a potential application to online sales of children’s wear. Full Article
on Comment: Variational Autoencoders as Empirical Bayes By projecteuclid.org Published On :: Thu, 18 Jul 2019 22:01 EDT Yixin Wang, Andrew C. Miller, David M. Blei. Source: Statistical Science, Volume 34, Number 2, 229--233. Full Article
on Comment: Empirical Bayes, Compound Decisions and Exchangeability By projecteuclid.org Published On :: Thu, 18 Jul 2019 22:01 EDT Eitan Greenshtein, Ya’acov Ritov. Source: Statistical Science, Volume 34, Number 2, 224--228.Abstract: We present some personal reflections on empirical Bayes/ compound decision (EB/CD) theory following Efron (2019). In particular, we consider the role of exchangeability in the EB/CD theory and how it can be achieved when there are covariates. We also discuss the interpretation of EB/CD confidence interval, the theoretical efficiency of the CD procedure, and the impact of sparsity assumptions. Full Article
on Comment: Empirical Bayes Interval Estimation By projecteuclid.org Published On :: Thu, 18 Jul 2019 22:01 EDT Wenhua Jiang. Source: Statistical Science, Volume 34, Number 2, 219--223.Abstract: This is a contribution to the discussion of the enlightening paper by Professor Efron. We focus on empirical Bayes interval estimation. We discuss the oracle interval estimation rules, the empirical Bayes estimation of the oracle rule and the computation. Some numerical results are reported. Full Article
on A Conversation with Dick Dudley By projecteuclid.org Published On :: Fri, 12 Apr 2019 04:00 EDT Vladimir Koltchinskii, Richard Nickl, Philippe Rigollet. Source: Statistical Science, Volume 34, Number 1, 169--175.Abstract: Richard Mansfield Dudley (Dick Dudley) was born in 1938. He received the A.B. from Harvard in 1952 and the Ph.D. from Princeton in 1962 (under the supervision of Gilbert Hunt and Edward Nelson). Following an appointment at UC Berkeley as an assistant professor, he joined the Department of Mathematics at MIT in 1967. Dick Dudley has made fundamental contributions to the theory of Gaussian processes and Probability in Banach Spaces. Among his major achievements is the development of a general framework for empirical processes theory, in particular, for uniform central limit theorems. These results have had and continue having tremendous impact in contemporary statistics and in mathematical foundations of machine learning. A more extensive biographical sketch is contained in the preface to the Selected works of R. M. Dudley (editors: E. Giné, V. Koltchinskii and R. Norvaisa) published in 2010. This conversation took place (mostly, via email) in the fall of 2017. Full Article
on A Conversation with Piet Groeneboom By projecteuclid.org Published On :: Fri, 12 Apr 2019 04:00 EDT Geurt Jongbloed. Source: Statistical Science, Volume 34, Number 1, 156--168.Abstract: Petrus (Piet) Groeneboom was born in Scheveningen in 1941 and grew up in Voorburg. Both villages are located near The Hague in The Netherlands; Scheveningen actually being part of The Hague. He attended the gymnasium of the Huygens lyceum. In 1959, he entered the University of Amsterdam, where he studied psychology. After his “candidate” exam (comparable to BSc) in 1963, he worked at the psychological laboratory of the University of Amsterdam until 1966. In 1965, he took up mathematics as a part-time study. After having obtained his master’s degree in 1971, he had a position at the psychological laboratory again until 1973, when he was appointed to the Mathematical Center in Amsterdam. There, he wrote between 1975 and 1979 his Ph.D. thesis with Kobus Oosterhoff as advisor, graduating in 1979. After a period of two years as visiting professor at the University of Washington (UW) in Seattle, Piet moved back to the Mathematical Center until he was appointed full professor of statistics at the University of Amsterdam in 1984. Four years later, he moved to Delft University of Technology where he became professor of statistics and stayed until his retirement in 2006. Between 2000 and 2006 he also held a part-time professorship at the Vrije Universiteit in Amsterdam. From 1999 till 2013 he was Affiliate Professor at the statistics department of UW, Seattle. Apart from being visiting professor at the UW in Seattle, he was also visiting professor at Stanford University, Université Paris 6 and ETH Zürich. Piet is well known for his work on shape constrained statistical inference. He worked on asymptotic theory for these problems, created algorithms to compute nonparametric estimates in such models and applied these models to real data. He also worked on interacting particle systems, extreme value analysis and efficiency theory for testing procedures. Piet (co-)authored four books and 64 papers and served as promotor of 13 students. He is the recipient of the 1985 Rollo Davidson prize, a fellow of the IMS and elected member of the ISI. In 2015, he delivered the Wald lecture at the Joint Statistical Meeting in Montreal. Piet and his wife Marijke live in Naarden. He has two sons, Thomas and Tim, and (since June 12, 2018) one grandson, Tarik. This conversation was held at Piet’s house in Naarden, on February 28 and April 24, 2018. Full Article
on Gaussian Integrals and Rice Series in Crossing Distributions—to Compute the Distribution of Maxima and Other Features of Gaussian Processes By projecteuclid.org Published On :: Fri, 12 Apr 2019 04:00 EDT Georg Lindgren. Source: Statistical Science, Volume 34, Number 1, 100--128.Abstract: We describe and compare how methods based on the classical Rice’s formula for the expected number, and higher moments, of level crossings by a Gaussian process stand up to contemporary numerical methods to accurately deal with crossing related characteristics of the sample paths. We illustrate the relative merits in accuracy and computing time of the Rice moment methods and the exact numerical method, developed since the late 1990s, on three groups of distribution problems, the maximum over a finite interval and the waiting time to first crossing, the length of excursions over a level, and the joint period/amplitude of oscillations. We also treat the notoriously difficult problem of dependence between successive zero crossing distances. The exact solution has been known since at least 2000, but it has remained largely unnoticed outside the ocean science community. Extensive simulation studies illustrate the accuracy of the numerical methods. As a historical introduction an attempt is made to illustrate the relation between Rice’s original formulation and arguments and the exact numerical methods. Full Article
on Rejoinder: Response to Discussions and a Look Ahead By projecteuclid.org Published On :: Fri, 12 Apr 2019 04:00 EDT Vincent Dorie, Jennifer Hill, Uri Shalit, Marc Scott, Dan Cervone. Source: Statistical Science, Volume 34, Number 1, 94--99.Abstract: Response to discussion of Dorie (2017), in which the authors of that piece express their gratitude to the discussants, rebut some specific criticisms, and argue that the limitations of the 2016 Atlantic Causal Inference Competition represent an exciting opportunity for future competitions in a similar mold. Full Article
on Comment: Contributions of Model Features to BART Causal Inference Performance Using ACIC 2016 Competition Data By projecteuclid.org Published On :: Fri, 12 Apr 2019 04:00 EDT Nicole Bohme Carnegie. Source: Statistical Science, Volume 34, Number 1, 90--93.Abstract: With a thorough exposition of the methods and results of the 2016 Atlantic Causal Inference Competition, Dorie et al. have set a new standard for reproducibility and comparability of evaluations of causal inference methods. In particular, the open-source R package aciccomp2016, which permits reproduction of all datasets used in the competition, will be an invaluable resource for evaluation of future methodological developments. Building upon results from Dorie et al., we examine whether a set of potential modifications to Bayesian Additive Regression Trees (BART)—multiple chains in model fitting, using the propensity score as a covariate, targeted maximum likelihood estimation (TMLE), and computing symmetric confidence intervals—have a stronger impact on bias, RMSE, and confidence interval coverage in combination than they do alone. We find that bias in the estimate of SATT is minimal, regardless of the BART formulation. For purposes of CI coverage, however, all proposed modifications are beneficial—alone and in combination—but use of TMLE is least beneficial for coverage and results in considerably wider confidence intervals. Full Article