Bayesian Effect Fusion for Categorical Predictors

Daniela Pauger, Helga Wagner.

Source: Bayesian Analysis, Volume 14, Number 2, 341--369.

Abstract:
We propose a Bayesian approach to obtain a sparse representation of the effect of a categorical predictor in regression type models. As this effect is captured by a group of level effects, sparsity can be achieved not only by excluding single irrelevant level effects or the whole group of effects associated with this predictor, but also by fusing levels which have essentially the same effect on the response. To achieve this goal, we propose a prior which allows for almost perfect as well as almost zero dependence between level effects a priori. This prior can alternatively be obtained by specifying spike and slab prior distributions on all effect differences associated with this categorical predictor. We show how restricted fusion can be implemented and develop an efficient MCMC (Markov chain Monte Carlo) method for posterior computation. The performance of the proposed method is investigated on simulated data and we illustrate its application on real data from EU-SILC (European Union Statistics on Income and Living Conditions).





Modeling Population Structure Under Hierarchical Dirichlet Processes

Lloyd T. Elliott, Maria De Iorio, Stefano Favaro, Kaustubh Adhikari, Yee Whye Teh.

Source: Bayesian Analysis, Volume 14, Number 2, 313--339.

Abstract:
We propose a Bayesian nonparametric model to infer population admixture, extending the hierarchical Dirichlet process to allow for correlation between loci due to linkage disequilibrium. Given multilocus genotype data from a sample of individuals, the proposed model allows inferring and classifying individuals as unadmixed or admixed, inferring the number of subpopulations ancestral to an admixed population and the population of origin of chromosomal regions. Our model does not assume any specific mutation process, and can be applied to most of the commonly used genetic markers. We present a Markov chain Monte Carlo (MCMC) algorithm to perform posterior inference from the model and we discuss some methods to summarize the MCMC output for the analysis of population admixture. Finally, we demonstrate the performance of the proposed model in a real application, using genetic data from the ectodysplasin-A receptor (EDAR) gene, which is considered to be ancestry-informative due to well-known variations in allele frequency as well as phenotypic effects across ancestry. The structure analysis of this dataset leads to the identification of a rare haplotype in Europeans. We also conduct a simulated experiment and show that our algorithm outperforms parametric methods.





Separable covariance arrays via the Tucker product, with applications to multivariate relational data

Peter D. Hoff

Source: Bayesian Anal., Volume 6, Number 2, 179--196.

Abstract:
Modern datasets are often in the form of matrices or arrays, potentially having correlations along each set of data indices. For example, data involving repeated measurements of several variables over time may exhibit temporal correlation as well as correlation among the variables. A possible model for matrix-valued data is the class of matrix normal distributions, which is parametrized by two covariance matrices, one for each index set of the data. In this article we discuss an extension of the matrix normal model to accommodate multidimensional data arrays, or tensors. We show how a particular array-matrix product can be used to generate the class of array normal distributions having separable covariance structure. We derive some properties of these covariance structures and the corresponding array normal distributions, and show how the array-matrix product can be used to define a semi-conjugate prior distribution and calculate the corresponding posterior distribution. We illustrate the methodology in an analysis of multivariate longitudinal network data which take the form of a four-way array.
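
The separable covariance structure of the matrix normal case mentioned above can be illustrated with a short sketch. It assumes NumPy and a column-major vec convention; the function names `sample_matrix_normal` and `separable_covariance` are hypothetical and are not taken from the paper. A draw Y = A Z B^T, with Z a matrix of independent standard normals, has Cov(vec(Y)) equal to the Kronecker product of the column and row covariance matrices.

```python
import numpy as np

def sample_matrix_normal(rng, sigma_row, sigma_col, n_draws):
    """Draw n_draws matrices Y = A Z B^T with A A^T = sigma_row, B B^T = sigma_col."""
    a = np.linalg.cholesky(sigma_row)  # row-covariance factor
    b = np.linalg.cholesky(sigma_col)  # column-covariance factor
    z = rng.standard_normal((n_draws, sigma_row.shape[0], sigma_col.shape[0]))
    return a @ z @ b.T  # matmul broadcasts over the leading draw axis

def separable_covariance(sigma_row, sigma_col):
    """Covariance of vec(Y) under column-major stacking: sigma_col (x) sigma_row."""
    return np.kron(sigma_col, sigma_row)
```

For arrays with more than two index sets, the same idea applies one mode at a time, with one covariance factor per mode; this sketch covers only the two-way case.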





Statistical Inference for the Evolutionary History of Cancer Genomes

Khanh N. Dinh, Roman Jaksik, Marek Kimmel, Amaury Lambert, Simon Tavaré.

Source: Statistical Science, Volume 35, Number 1, 129--144.

Abstract:
Recent years have seen considerable work on inference about cancer evolution from mutations identified in cancer samples. Much of the modeling work has been based on classical models of population genetics, generalized to accommodate time-varying cell population size. Reverse-time, genealogical views of such models, commonly known as coalescents, have been used to infer aspects of the past of growing populations. Another approach is to use branching processes, the simplest scenario being the classical linear birth-death process. Inference from evolutionary models of DNA often exploits summary statistics of the sequence data, a common one being the so-called Site Frequency Spectrum (SFS). In a bulk tumor sequencing experiment, we can estimate for each site at which a novel somatic point mutation has arisen, the proportion of cells that carry that mutation. These numbers are then grouped into collections of sites which have similar mutant fractions. We examine how the SFS based on birth-death processes differs from those based on the coalescent model. This may stem from the different sampling mechanisms in the two approaches. However, we also show that despite this, they are quantitatively comparable for the range of parameters typical for tumor cell populations. We also present a model of tumor evolution with selective sweeps, and demonstrate how it may help in understanding the history of a tumor as well as the influence of data pre-processing. We illustrate the theory with applications to several examples from The Cancer Genome Atlas tumors.
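
As a rough illustration of the Site Frequency Spectrum described above, the following sketch counts how many sites carry a mutation in exactly k sampled cells. It assumes a binary cell-by-site mutation matrix is available, which is a simplification of the bulk sequencing setting in the abstract, where mutant fractions are estimated rather than counted. The function name `site_frequency_spectrum` is hypothetical.

```python
import numpy as np

def site_frequency_spectrum(genotypes):
    """Return counts of sites with k mutant cells, for k = 1, ..., n_cells - 1.

    genotypes: array of 0/1 values with shape (n_cells, n_sites).
    """
    genotypes = np.asarray(genotypes, dtype=int)
    n_cells = genotypes.shape[0]
    mutant_counts = genotypes.sum(axis=0)  # mutant cells per site
    spectrum = np.bincount(mutant_counts, minlength=n_cells + 1)
    # Drop sites where the mutation is absent (k = 0) or fixed (k = n_cells).
    return spectrum[1:n_cells]
```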





Data Denoising and Post-Denoising Corrections in Single Cell RNA Sequencing

Divyansh Agarwal, Jingshu Wang, Nancy R. Zhang.

Source: Statistical Science, Volume 35, Number 1, 112--128.

Abstract:
Single cell sequencing technologies are transforming biomedical research. However, due to the inherent nature of the data, single cell RNA sequencing analysis poses new computational and statistical challenges. We begin with a survey of a selection of topics in this field, with a gentle introduction to the biology and a more detailed exploration of the technical noise. We consider in detail the problem of single cell data denoising, sometimes referred to as “imputation” in the relevant literature. We discuss why this is not a typical statistical imputation problem, and review current approaches to this problem. We then explore why the use of denoised values in downstream analyses invites novel statistical insights, and how denoising uncertainty should be accounted for to yield valid statistical inference. The utilization of denoised or imputed matrices in statistical inference is not unique to single cell genomics, and arises in many other fields. We describe the challenges in this type of analysis, discuss some preliminary solutions, and highlight unresolved issues.





Statistical Molecule Counting in Super-Resolution Fluorescence Microscopy: Towards Quantitative Nanoscopy

Thomas Staudt, Timo Aspelmeier, Oskar Laitenberger, Claudia Geisler, Alexander Egner, Axel Munk.

Source: Statistical Science, Volume 35, Number 1, 92--111.

Abstract:
Super-resolution microscopy is rapidly gaining importance as an analytical tool in the life sciences. A compelling feature is the ability to label biological units of interest with fluorescent markers in (living) cells and to observe them with considerably higher resolution than conventional microscopy permits. The images obtained this way, however, lack an absolute intensity scale in terms of numbers of fluorophores observed. In this article, we discuss state of the art methods to count such fluorophores and statistical challenges that come along with it. In particular, we suggest a modeling scheme for time series generated by single-marker-switching (SMS) microscopy that makes it possible to quantify the number of markers in a statistically meaningful manner from the raw data. To this end, we model the entire process of photon generation in the fluorophore, their passage through the microscope, detection and photoelectron amplification in the camera, and extraction of time series from the microscopic images. At the heart of these modeling steps is a careful description of the fluorophore dynamics by a novel hidden Markov model that operates on two timescales (HTMM). Besides the fluorophore number, information about the kinetic transition rates of the fluorophore’s internal states is also inferred during estimation. We comment on computational issues that arise when applying our model to simulated or measured fluorescence traces and illustrate our methodology on simulated data.





Risk Models for Breast Cancer and Their Validation

Adam R. Brentnall, Jack Cuzick.

Source: Statistical Science, Volume 35, Number 1, 14--30.

Abstract:
Strategies to prevent cancer and diagnose it early when it is most treatable are needed to reduce the public health burden from rising disease incidence. Risk assessment is playing an increasingly important role in targeting individuals in need of such interventions. For breast cancer many individual risk factors have been well understood for a long time, but the development of a fully comprehensive risk model has not been straightforward, in part because there have been limited data where joint effects of an extensive set of risk factors may be estimated with precision. In this article we first review the approach taken to develop the IBIS (Tyrer–Cuzick) model, and describe recent updates. We then review and develop methods to assess calibration of models such as this one, where the risk of disease allowing for competing mortality over a long follow-up time or lifetime is estimated. The breast cancer risk model and calibration assessment methods are demonstrated using a cohort of 132,139 women attending mammography screening in the State of Washington, USA.





Model-Based Approach to the Joint Analysis of Single-Cell Data on Chromatin Accessibility and Gene Expression

Zhixiang Lin, Mahdi Zamanighomi, Timothy Daley, Shining Ma, Wing Hung Wong.

Source: Statistical Science, Volume 35, Number 1, 2--13.

Abstract:
Unsupervised methods, including clustering methods, are essential to the analysis of single-cell genomic data. Model-based clustering methods are under-explored in the area of single-cell genomics, and have the advantage of quantifying the uncertainty of the clustering result. Here we develop a model-based approach for the integrative analysis of single-cell chromatin accessibility and gene expression data. We show that combining these two types of data, we can achieve a better separation of the underlying cell types. An efficient Markov chain Monte Carlo algorithm is also developed.





Larry Brown’s Contributions to Parametric Inference, Decision Theory and Foundations: A Survey

James O. Berger, Anirban DasGupta.

Source: Statistical Science, Volume 34, Number 4, 621--634.

Abstract:
This article gives a panoramic survey of the general area of parametric statistical inference, decision theory and foundations of statistics for the period 1965–2010 through the lens of Larry Brown’s contributions to varied aspects of this massive area. The article goes over sufficiency, shrinkage estimation, admissibility, minimaxity, complete class theorems, estimated confidence, conditional confidence procedures, Edgeworth and higher order asymptotic expansions, variational Bayes, Stein’s SURE, differential inequalities, geometrization of convergence rates, asymptotic equivalence, aspects of empirical process theory, inference after model selection, unified frequentist and Bayesian testing, and Wald’s sequential theory. A reasonably comprehensive bibliography is provided.





Models as Approximations—Rejoinder

Andreas Buja, Arun Kumar Kuchibhotla, Richard Berk, Edward George, Eric Tchetgen Tchetgen, Linda Zhao.

Source: Statistical Science, Volume 34, Number 4, 606--620.

Abstract:
We respond to the discussants of our articles emphasizing the importance of inference under misspecification in the context of the reproducibility/replicability crisis. Along the way, we discuss the roles of diagnostics and model building in regression as well as connections between our well-specification framework and semiparametric theory.





Comment: Statistical Inference from a Predictive Perspective

Alessandro Rinaldo, Ryan J. Tibshirani, Larry Wasserman.

Source: Statistical Science, Volume 34, Number 4, 599--603.

Abstract:
What is the meaning of a regression parameter? Why is this the de facto standard object of interest for statistical inference? These are delicate issues, especially when the model is misspecified. We argue that focusing on predictive quantities may be a desirable alternative.





Comment: Models Are Approximations!

Anthony C. Davison, Erwan Koch, Jonathan Koh.

Source: Statistical Science, Volume 34, Number 4, 584--590.

Abstract:
This discussion focuses on areas of disagreement with the papers, particularly the target of inference and the case for using the robust ‘sandwich’ variance estimator in the presence of moderate mis-specification. We also suggest that existing procedures may be appreciably more powerful for detecting mis-specification than the authors’ RAV statistic, and comment on the use of the pairs bootstrap in balanced situations.





Comment: “Models as Approximations I: Consequences Illustrated with Linear Regression” by A. Buja, R. Berk, L. Brown, E. George, E. Pitkin, L. Zhan and K. Zhang

Roderick J. Little.

Source: Statistical Science, Volume 34, Number 4, 580--583.





Models as Approximations II: A Model-Free Theory of Parametric Regression

Andreas Buja, Lawrence Brown, Arun Kumar Kuchibhotla, Richard Berk, Edward George, Linda Zhao.

Source: Statistical Science, Volume 34, Number 4, 545--565.

Abstract:
We develop a model-free theory of general types of parametric regression for i.i.d. observations. The theory replaces the parameters of parametric models with statistical functionals, to be called “regression functionals,” defined on large nonparametric classes of joint ${x\textrm{-}y}$ distributions, without assuming a correct model. Parametric models are reduced to heuristics to suggest plausible objective functions. An example of a regression functional is the vector of slopes of linear equations fitted by OLS to largely arbitrary ${x\textrm{-}y}$ distributions, without assuming a linear model (see Part I). More generally, regression functionals can be defined by minimizing objective functions, solving estimating equations, or with ad hoc constructions. In this framework, it is possible to achieve the following: (1) define a notion of “well-specification” for regression functionals that replaces the notion of correct specification of models, (2) propose a well-specification diagnostic for regression functionals based on reweighting distributions and data, (3) decompose sampling variability of regression functionals into two sources, one due to the conditional response distribution and another due to the regressor distribution interacting with misspecification, both of order $N^{-1/2}$, (4) exhibit plug-in/sandwich estimators of standard error as limit cases of ${x\textrm{-}y}$ bootstrap estimators, and (5) provide theoretical heuristics to indicate that ${x\textrm{-}y}$ bootstrap standard errors may generally be preferred over sandwich estimators.





Models as Approximations I: Consequences Illustrated with Linear Regression

Andreas Buja, Lawrence Brown, Richard Berk, Edward George, Emil Pitkin, Mikhail Traskin, Kai Zhang, Linda Zhao.

Source: Statistical Science, Volume 34, Number 4, 523--544.

Abstract:
In the early 1980s, Halbert White inaugurated a “model-robust” form of statistical inference based on the “sandwich estimator” of standard error. This estimator is known to be “heteroskedasticity-consistent,” but it is less well known to be “nonlinearity-consistent” as well. Nonlinearity, however, raises fundamental issues because in its presence regressors are not ancillary, hence cannot be treated as fixed. The consequences are deep: (1) population slopes need to be reinterpreted as statistical functionals obtained from OLS fits to largely arbitrary joint ${x\textrm{-}y}$ distributions; (2) the meaning of slope parameters needs to be rethought; (3) the regressor distribution affects the slope parameters; (4) randomness of the regressors becomes a source of sampling variability in slope estimates of order $1/\sqrt{N}$; (5) inference needs to be based on model-robust standard errors, including sandwich estimators or the ${x\textrm{-}y}$ bootstrap. In theory, model-robust and model-trusting standard errors can deviate by arbitrary magnitudes either way. In practice, significant deviations between them can be detected with a diagnostic test.
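
To make the contrast between model-trusting and model-robust standard errors concrete, here is a minimal sketch assuming NumPy and the basic (HC0) form of the sandwich estimator. The function name `ols_standard_errors` is hypothetical, and the code is not the authors' implementation or their diagnostic test.

```python
import numpy as np

def ols_standard_errors(x, y):
    """Return OLS coefficients with model-trusting and sandwich (HC0) standard errors.

    x: design matrix of shape (n, p), including an intercept column if wanted.
    y: response vector of shape (n,).
    """
    n, p = x.shape
    xtx_inv = np.linalg.inv(x.T @ x)
    beta = xtx_inv @ x.T @ y
    resid = y - x @ beta

    # Model-trusting: assumes a correct linear model with constant error variance.
    sigma2 = resid @ resid / (n - p)
    se_model = np.sqrt(np.diag(sigma2 * xtx_inv))

    # Sandwich: (X'X)^{-1} [X' diag(resid^2) X] (X'X)^{-1}.
    meat = x.T @ (x * resid[:, None] ** 2)
    se_sandwich = np.sqrt(np.diag(xtx_inv @ meat @ xtx_inv))
    return beta, se_model, se_sandwich
```

Fitting a straight line to data generated from a curved mean function is one simple way to see the two sets of standard errors diverge.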





Assessing the Causal Effect of Binary Interventions from Observational Panel Data with Few Treated Units

Pantelis Samartsidis, Shaun R. Seaman, Anne M. Presanis, Matthew Hickman, Daniela De Angelis.

Source: Statistical Science, Volume 34, Number 3, 486--503.

Abstract:
Researchers are often challenged with assessing the impact of an intervention on an outcome of interest in situations where the intervention is nonrandomised, the intervention is only applied to one or few units, the intervention is binary, and outcome measurements are available at multiple time points. In this paper, we review existing methods for causal inference in these situations. We detail the assumptions underlying each method, emphasize connections between the different approaches and provide guidelines regarding their practical implementation. Several open problems are identified thus highlighting the need for future research.





An Overview of Semiparametric Extensions of Finite Mixture Models

Sijia Xiang, Weixin Yao, Guangren Yang.

Source: Statistical Science, Volume 34, Number 3, 391--404.

Abstract:
Finite mixture models have offered a very important tool for exploring complex data structures in many scientific areas, such as economics, epidemiology and finance. Semiparametric mixture models, which were introduced into traditional finite mixture models in the past decade, have brought forth exciting developments in their methodologies, theories, and applications. In this article, we not only provide a selective overview of the newly-developed semiparametric mixture models, but also discuss their estimation methodologies, theoretical properties if applicable, and some open questions. Recent developments are also discussed.





ROS Regression: Integrating Regularization with Optimal Scaling Regression

Jacqueline J. Meulman, Anita J. van der Kooij, Kevin L. W. Duisters.

Source: Statistical Science, Volume 34, Number 3, 361--390.

Abstract:
We present a methodology for multiple regression analysis that deals with categorical variables (possibly mixed with continuous ones), in combination with regularization, variable selection and high-dimensional data ($P \gg N$). Regularization and optimal scaling (OS) are two important extensions of ordinary least squares regression (OLS) that will be combined in this paper. There are two data analytic situations for which optimal scaling was developed. One is the analysis of categorical data, and the other the need for transformations because of nonlinear relationships between predictors and outcome. Optimal scaling of categorical data finds quantifications for the categories, both for the predictors and for the outcome variables, that are optimal for the regression model in the sense that they maximize the multiple correlation. When nonlinear relationships exist, nonlinear transformation of predictors and outcome maximize the multiple correlation in the same way. We will consider a variety of transformation types; typically we use step functions for categorical variables, and smooth (spline) functions for continuous variables. Both types of functions can be restricted to be monotonic, preserving the ordinal information in the data. In combination with optimal scaling, three popular regularization methods will be considered: Ridge regression, the Lasso and the Elastic Net. The resulting method will be called ROS Regression (Regularized Optimal Scaling Regression). The OS algorithm provides straightforward and efficient estimation of the regularized regression coefficients, automatically gives the Group Lasso and Blockwise Sparse Regression, and extends them by the possibility to maintain ordinal properties in the data. Extended examples are provided.





A Conversation with Noel Cressie

Christopher K. Wikle, Jay M. Ver Hoef.

Source: Statistical Science, Volume 34, Number 2, 349--359.

Abstract:
Noel Cressie, FAA is Director of the Centre for Environmental Informatics in the National Institute for Applied Statistics Research Australia (NIASRA) and Distinguished Professor in the School of Mathematics and Applied Statistics at the University of Wollongong, Australia. He is also Adjunct Professor at the University of Missouri (USA), Affiliate of Org 398, Science Data Understanding, at NASA’s Jet Propulsion Laboratory (USA), and a member of the Science Team for NASA’s Orbiting Carbon Observatory-2 (OCO-2) satellite. Cressie was awarded a B.Sc. with First Class Honours in Mathematics in 1972 from the University of Western Australia, and an M.A. and Ph.D. in Statistics in 1973 and 1975, respectively, from Princeton University (USA). Two brief postdoctoral periods followed, at the Centre de Morphologie Mathématique, ENSMP, in Fontainebleau (France) from April 1975–September 1975, and at Imperial College, London (UK) from September 1975–January 1976. His past appointments have been at The Flinders University of South Australia from 1976–1983, at Iowa State University (USA) from 1983–1998, and at The Ohio State University (USA) from 1998–2012. He has authored or co-authored four books and more than 280 papers in peer-reviewed outlets, covering areas that include spatial and spatio-temporal statistics, environmental statistics, empirical-Bayesian and Bayesian methods including sequential design, goodness-of-fit, and remote sensing of the environment. Many of his papers also address important questions in the sciences. Cressie is a Fellow of the Australian Academy of Science, the American Statistical Association, the Institute of Mathematical Statistics, and the Spatial Econometrics Association, and he is an Elected Member of the International Statistical Institute. Noel Cressie’s refereed, unrefereed, and other publications are available at: https://niasra.uow.edu.au/cei/people/UOW232444.html.





The Importance of Being Clustered: Uncluttering the Trends of Statistics from 1970 to 2015

Laura Anderlucci, Angela Montanari, Cinzia Viroli.

Source: Statistical Science, Volume 34, Number 2, 280--300.

Abstract:
In this paper, we retrace the recent history of statistics by analyzing all the papers published in five prestigious statistical journals since 1970, namely: The Annals of Statistics, Biometrika, Journal of the American Statistical Association, Journal of the Royal Statistical Society, Series B and Statistical Science. The aim is to construct a kind of “taxonomy” of the statistical papers by organizing and clustering them in main themes. In this sense being identified in a cluster means being important enough to be uncluttered in the vast and interconnected world of statistical research. Since the main statistical research topics are naturally born, evolve or die over time, we will also develop a dynamic clustering strategy, where a group in a time period is allowed to migrate or to merge into different groups in the following one. Results show that statistics is a very dynamic and evolving science, stimulated by the rise of new research questions and types of data.





Statistical Analysis of Zero-Inflated Nonnegative Continuous Data: A Review

Lei Liu, Ya-Chen Tina Shih, Robert L. Strawderman, Daowen Zhang, Bankole A. Johnson, Haitao Chai.

Source: Statistical Science, Volume 34, Number 2, 253--279.

Abstract:
Zero-inflated nonnegative continuous (or semicontinuous) data arise frequently in biomedical, economical, and ecological studies. Examples include substance abuse, medical costs, medical care utilization, biomarkers (e.g., CD4 cell counts, coronary artery calcium scores), single cell gene expression rates, and (relative) abundance of microbiome. Such data are often characterized by the presence of a large portion of zero values and positive continuous values that are skewed to the right and heteroscedastic. Both of these features suggest that no simple parametric distribution may be suitable for modeling such type of outcomes. In this paper, we review statistical methods for analyzing zero-inflated nonnegative outcome data. We will start with the cross-sectional setting, discussing ways to separate zero and positive values and introducing flexible models to characterize right skewness and heteroscedasticity in the positive values. We will then present models of correlated zero-inflated nonnegative continuous data, using random effects to tackle the correlation on repeated measures from the same subject and that across different parts of the model. We will also discuss expansion to related topics, for example, zero-inflated count and survival data, nonlinear covariate effects, and joint models of longitudinal zero-inflated nonnegative continuous data and survival. Finally, we will present applications to three real datasets (i.e., microbiome, medical costs, and alcohol drinking) to illustrate these methods. Example code will be provided to facilitate applications of these methods.
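
The idea of separating zero and positive values in the cross-sectional setting can be sketched as a two-part estimate without covariates. This is an illustration only: it assumes NumPy and a log-normal distribution for the positive part, which is one of several possible choices, and the function name `two_part_mean` is hypothetical. It is not the example code referred to in the abstract.

```python
import numpy as np

def two_part_mean(y):
    """Estimate E[Y] for semicontinuous data with a two-part model.

    Part 1: probability of a positive value.
    Part 2: log-normal fit to the positive values (assumed for this sketch).
    """
    y = np.asarray(y, dtype=float)
    positive = y[y > 0]
    p_positive = positive.size / y.size

    log_pos = np.log(positive)
    mu = log_pos.mean()
    sigma2 = log_pos.var(ddof=1)  # needs at least two positive values

    # Mean of a log-normal is exp(mu + sigma2 / 2); weight by P(Y > 0).
    mean_y = p_positive * np.exp(mu + sigma2 / 2)
    return p_positive, mu, sigma2, mean_y
```

With covariates, each part would typically become its own regression, for example a logistic model for the zero part and a model for the positive values.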





A Kernel Regression Procedure in the 3D Shape Space with an Application to Online Sales of Children’s Wear

Gregorio Quintana-Ortí, Amelia Simó.

Source: Statistical Science, Volume 34, Number 2, 236--252.

Abstract:
This paper is focused on kernel regression when the response variable is the shape of a 3D object represented by a configuration matrix of landmarks. Regression methods on this shape space are not trivial because this space has a complex finite-dimensional Riemannian manifold structure (non-Euclidean). Papers about it are scarce in the literature, the majority of them are restricted to the case of a single explanatory variable, and many of them are based on the approximated tangent space. In this paper, there are several methodological innovations. The first one is the adaptation of the general method for kernel regression analysis in manifold-valued data to the three-dimensional case of Kendall’s shape space. The second one is its generalization to the multivariate case and the addressing of the curse-of-dimensionality problem. Finally, we propose bootstrap confidence intervals for prediction. A simulation study is carried out to check the goodness of the procedure, and a comparison with a current approach is performed. Then, it is applied to a 3D database obtained from an anthropometric survey of the Spanish child population with a potential application to online sales of children’s wear.





Rejoinder: Bayes, Oracle Bayes, and Empirical Bayes

Bradley Efron.

Source: Statistical Science, Volume 34, Number 2, 234--235.





Gaussian Integrals and Rice Series in Crossing Distributions—to Compute the Distribution of Maxima and Other Features of Gaussian Processes

Georg Lindgren.

Source: Statistical Science, Volume 34, Number 1, 100--128.

Abstract:
We describe and compare how methods based on the classical Rice’s formula for the expected number, and higher moments, of level crossings by a Gaussian process stand up to contemporary numerical methods to accurately deal with crossing related characteristics of the sample paths. We illustrate the relative merits in accuracy and computing time of the Rice moment methods and the exact numerical method, developed since the late 1990s, on three groups of distribution problems, the maximum over a finite interval and the waiting time to first crossing, the length of excursions over a level, and the joint period/amplitude of oscillations. We also treat the notoriously difficult problem of dependence between successive zero crossing distances. The exact solution has been known since at least 2000, but it has remained largely unnoticed outside the ocean science community. Extensive simulation studies illustrate the accuracy of the numerical methods. As a historical introduction an attempt is made to illustrate the relation between Rice’s original formulation and arguments and the exact numerical methods.





Rejoinder: Response to Discussions and a Look Ahead

Vincent Dorie, Jennifer Hill, Uri Shalit, Marc Scott, Dan Cervone.

Source: Statistical Science, Volume 34, Number 1, 94--99.

Abstract:
Response to discussion of Dorie (2017), in which the authors of that piece express their gratitude to the discussants, rebut some specific criticisms, and argue that the limitations of the 2016 Atlantic Causal Inference Competition represent an exciting opportunity for future competitions in a similar mold.





Comment: Contributions of Model Features to BART Causal Inference Performance Using ACIC 2016 Competition Data

Nicole Bohme Carnegie.

Source: Statistical Science, Volume 34, Number 1, 90--93.

Abstract:
With a thorough exposition of the methods and results of the 2016 Atlantic Causal Inference Competition, Dorie et al. have set a new standard for reproducibility and comparability of evaluations of causal inference methods. In particular, the open-source R package aciccomp2016, which permits reproduction of all datasets used in the competition, will be an invaluable resource for evaluation of future methodological developments. Building upon results from Dorie et al., we examine whether a set of potential modifications to Bayesian Additive Regression Trees (BART)—multiple chains in model fitting, using the propensity score as a covariate, targeted maximum likelihood estimation (TMLE), and computing symmetric confidence intervals—have a stronger impact on bias, RMSE, and confidence interval coverage in combination than they do alone. We find that bias in the estimate of SATT is minimal, regardless of the BART formulation. For purposes of CI coverage, however, all proposed modifications are beneficial—alone and in combination—but use of TMLE is least beneficial for coverage and results in considerably wider confidence intervals.





Comment: Causal Inference Competitions: Where Should We Aim?

Ehud Karavani, Tal El-Hay, Yishai Shimoni, Chen Yanover.

Source: Statistical Science, Volume 34, Number 1, 86--89.

Abstract:
Data competitions proved to be highly beneficial to the field of machine learning, and are thus expected to provide similar advantages in the field of causal inference. As participants in the 2016 and 2017 Atlantic Causal Inference Conference (ACIC) data competitions and co-organizers of the 2018 competition, we discuss the strengths of simulation-based competitions and suggest potential extensions to address their limitations. These suggested augmentations aim at making the data generating processes more realistic and gradually increase in complexity, allowing thorough investigations of algorithms’ performance. We further outline a community-wide competition framework to evaluate an end-to-end causal inference pipeline, beginning with a causal question and a database, and ending with causal estimates.





Comment on “Automated Versus Do-It-Yourself Methods for Causal Inference: Lessons Learned from a Data Analysis Competition”

Susan Gruber, Mark J. van der Laan.

Source: Statistical Science, Volume 34, Number 1, 82--85.

Abstract:
Dorie and co-authors (DHSSC) are to be congratulated for initiating the ACIC Data Challenge. Their project engaged the community and accelerated research by providing a level playing field for comparing the performance of a priori specified algorithms. DHSSC identified themes concerning characteristics of the DGP, properties of the estimators, and inference. We discuss these themes in the context of targeted learning.




re

Matching Methods for Causal Inference: A Review and a Look Forward

Elizabeth A. Stuart

Source: Statist. Sci., Volume 25, Number 1, 1--21.

Abstract:
When estimating causal effects using observational data, it is desirable to replicate a randomized experiment as closely as possible by obtaining treated and control groups with similar covariate distributions. This goal can often be achieved by choosing well-matched samples of the original treated and control groups, thereby reducing bias due to the covariates. Since the 1970s, work on matching methods has examined how best to choose treated and control subjects for comparison. Matching methods are gaining popularity in fields such as economics, epidemiology, medicine and political science. However, until now the literature and related advice have been scattered across disciplines. Researchers who are interested in using matching methods, or in developing methods related to matching, do not have a single place to turn to learn about past and current research. This paper provides a structure for thinking about matching methods and guidance on their use, coalescing the existing research (both old and new) and providing a summary of where the literature on matching methods is now and where it should be headed.




re

Heteromodal Cortical Areas Encode Sensory-Motor Features of Word Meaning

The capacity to process information in conceptual form is a fundamental aspect of human cognition, yet little is known about how this type of information is encoded in the brain. Although the role of sensory and motor cortical areas has been a focus of recent debate, neuroimaging studies of concept representation consistently implicate a network of heteromodal areas that seem to support concept retrieval in general rather than knowledge related to any particular sensory-motor content. We used predictive machine learning on fMRI data to investigate the hypothesis that cortical areas in this "general semantic network" (GSN) encode multimodal information derived from basic sensory-motor processes, possibly functioning as convergence–divergence zones for distributed concept representation. An encoding model based on five conceptual attributes directly related to sensory-motor experience (sound, color, shape, manipulability, and visual motion) was used to predict brain activation patterns associated with individual lexical concepts in a semantic decision task. When the analysis was restricted to voxels in the GSN, the model was able to identify the activation patterns corresponding to individual concrete concepts significantly above chance. In contrast, a model based on five perceptual attributes of the word form performed at chance level. This pattern was reversed when the analysis was restricted to areas involved in the perceptual analysis of written word forms. These results indicate that heteromodal areas involved in semantic processing encode information about the relative importance of different sensory-motor attributes of concepts, possibly by storing particular combinations of sensory and motor features.

SIGNIFICANCE STATEMENT The present study used a predictive encoding model of word semantics to decode conceptual information from neural activity in heteromodal cortical areas. The model is based on five sensory-motor attributes of word meaning (color, shape, sound, visual motion, and manipulability) and encodes the relative importance of each attribute to the meaning of a word. This is the first demonstration that heteromodal areas involved in semantic processing can discriminate between different concepts based on sensory-motor information alone. This finding indicates that the brain represents concepts as multimodal combinations of sensory and motor representations.




re

Cleanair posters to create a smoke-free environment / designed by Biman Mullick ; published by Cleanair.

London (33 Stillness Road, London SE23 1NG) : Cleanair, [198-?]




re

Each year in Britain 9,300 babies are killed by their smoking mums. / Biman Mullick.

[London?], [6th June 1990]




re

How can the smoker and the nonsmoker be equally free in the same place? George Bernard Shaw / Biman Mullick.

[London?], [199-?]




re

[Silhouette of a pregnant woman smoking with death skull inside womb, 29 January 1994] / design: Biman Mullick.

London, [29 January 1994]




re

The Joyful Reduction of Uncertainty: Music Perception as a Window to Predictive Neuronal Processing




re

The 2019 Victoria’s Secret Fashion Show Is Canceled After Facing Backlash for Lack of Body Diversity

The reaction on social media has been fierce.




re

Taylor Swift, Hailey Bieber, and Tons of Other Celebs’ Favorite Leggings Are on Sale Ahead of Black Friday

Here’s where you can snag their Alo Yoga Moto leggings for less.




re

Kourtney Kardashian's Favorite Leggings Are So Good, Everyone Should Own A Pair

And they're on sale for Black Friday. 




re

These Nordstrom Cyber Monday Deals Are Giving Black Friday a Run for Its Money

This is not a drill: You can get up to 50% off at Nordstrom right now.




re

Macy’s Insane Cyber Monday Sale Ends in a Few Hours—Here Are the Best Deals

You've got exactly four hours left to take advantage of these heavily discounted prices.




re

Katie Holmes’s Affordable Sneakers Are the Star of Her Latest Outfit

Meghan Markle is also a fan of the comfy shoes.




re

These Clark Booties Are Actually Comfortable Enough to Wear All Day—and They’re on Sale

You can save 50% right now. 




re

The Comfy Sneakers That Kate Middleton, Kelly Ripa, and More Celebs Love Are on Sale at Amazon

Keep your feet comfy and your wallet fat.




re

Reese Witherspoon and I Wear the Same Comfy Hoka One One Sneakers to Run Errands 

Once you try them, you’ll never want to wear anything else.




re

Forget Black Booties, Amal Clooney and J.Lo Are Wearing This Weather-Resistant Boot Trend Instead

And it’s on sale at Nordstrom.




re

Sweatsuits Should Be Your Cozy Day Uniform—and These Are Our Favorites From Amazon

This retro style is making a comeback for a reason.




re

Nike Launches Zoom Pulse Sneakers for Medical Workers Who Are On Their Feet All Day

The new style is available to shop today.




re

Shoppers Swear These $30 Colorfulkoala Leggings Are the Ultimate Lululemon Dupes

And they’re available in 19 fun prints.




re

Dopamine D1 and D2 Receptor Family Contributions to Modafinil-Induced Wakefulness

Jared W. Young
Mar 4, 2009; 29:2663-2665
Journal Club




re

Allometric Analysis Detects Brain Size-Independent Effects of Sex and Sex Chromosome Complement on Human Cerebellar Organization

Catherine Mankiw
May 24, 2017; 37:5221-5231
Development Plasticity Repair