gen

The Best and Worst Places to be a Woman in Canada 2019 : The Gender Gap in Canada’s 26 Biggest Cities

9781771254434 (print)




gen

Terrestrial hermit crab populations in the Maldives : ecology, distribution and anthropogenic impact

Steibl, Sebastian, author
9783658295417 (electronic bk.)




gen

Rediscovery of genetic and genomic resources for future food security

9811501564




gen

Recent developments on genus Chaetomium

9783030316129 (electronic bk.)




gen

Radiomics and radiogenomics in neuro-oncology : First International Workshop, RNO-AI 2019, held in conjunction with MICCAI 2019, Shenzhen, China, October 13, proceedings

Radiomics and Radiogenomics in Neuro-oncology using AI Workshop (1st : 2019 : Shenzhen Shi, China)
9783030401245




gen

Racing for the surface : pathogenesis of implant infection and advanced antimicrobial strategies

9783030344757 (electronic bk.)




gen

Population genomics : marine organisms

3030379361 electronic book




gen

Plant small RNA : biogenesis, regulation and application

9780128173367 (electronic bk.)




gen

Pathogenesis of periodontal diseases : biological concepts for clinicians

9783319537375




gen

Passive and active measurement : 21st International Conference, PAM 2020, Eugene, Oregon, USA, March 30-31, 2020, Proceedings

PAM (Conference) (21st : 2020 : Eugene, Oregon)
9783030440817




gen

Oxygen transport to tissue XLI

International Society on Oxygen Transport to Tissue. Annual Meeting (46th : 2018 : Seoul, Korea)
9783030344610 (electronic bk.)




gen

Methylotrophs : microbiology, biochemistry and genetics

9781351074513 (electronic bk.)




gen

Mental Conditioning to Perform Common Operations in General Surgery Training

9783319911649 978-3-319-91164-9




gen

LGBTQ cultures : what health care professionals need to know about sexual and gender diversity

Eliason, Michele J., author.
9781496394606 paperback




gen

Intelligent wavelet based techniques for advanced multimedia applications

Singh, Rajiv, author
9783030318734 (electronic bk.)




gen

Genomic designing of climate-smart vegetable crops

9783319974156 (electronic bk.)




gen

Genetic and metabolic engineering for improved biofuel production from lignocellulosic biomass

9780128179543 (electronic bk.)




gen

General medicine and surgery for dental practitioners

Greenwood, M. (Mark), author.
9783319977379 (electronic book)




gen

DNA beyond genes : from data storage and computing to nanobots, nanomedicine, and nanoelectronics

Demidov, Vadim V., author
9783030364342 (electronic bk.)




gen

Conservation genetics in mammals : integrative research using novel approaches

9783030333348 (electronic bk.)




gen

Clinical approaches in endodontic regeneration : current and emerging therapeutic perspectives

9783319968483 (electronic bk.)




gen

Chickpea : crop wild relatives for enhancing genetic gains

9780128183007 (electronic bk.)




gen

Brassica improvement : molecular, genetics and genomic perspectives

9783030346942 (electronic bk.)




gen

Biologic and systemic agents in dermatology

9783319668840 (electronic bk.)




gen

Beyond our genes : pathophysiology of gene and environment interaction and epigenetic inheritance

9783030352134 (electronic bk.)




gen

Atlas of male genital dermatology

Hall, Anthony, author.
9783319997506 (electronic bk.)




gen

General Notices




gen

InBios receives Emergency Use Authorization for its Smart Detect...

InBios International, Inc. announces the U.S. Food and Drug Administration (FDA) issued an emergency use authorization (EUA) for its diagnostic test that can be used immediately by CLIA...

(PRWeb April 08, 2020)

Read the full story at https://www.prweb.com/releases/inbios_receives_emergency_use_authorization_for_its_smart_detect_sars_cov_2_rrt_pcr_kit_for_detection_of_the_virus_causing_covid_19/prweb17036897.htm




gen

Penalized generalized empirical likelihood with a diverging number of general estimating equations for censored data

Niansheng Tang, Xiaodong Yan, Xingqiu Zhao.

Source: The Annals of Statistics, Volume 48, Number 1, 607--627.

Abstract:
This article considers simultaneous variable selection and parameter estimation as well as hypothesis testing in censored survival models where a parametric likelihood is not available. For the problem, we utilize certain growing dimensional general estimating equations and propose a penalized generalized empirical likelihood, where the general estimating equations are constructed based on the semiparametric efficiency bound of estimation with given moment conditions. The proposed penalized generalized empirical likelihood estimators enjoy the oracle properties, and the estimator of any fixed dimensional vector of nonzero parameters achieves the semiparametric efficiency bound asymptotically. Furthermore, we show that the penalized generalized empirical likelihood ratio test statistic has an asymptotic central chi-square distribution. The conditions of local and restricted global optimality of weighted penalized generalized empirical likelihood estimators are also discussed. We present a two-layer iterative algorithm for efficient implementation, and investigate its convergence property. The performance of the proposed methods is demonstrated by extensive simulation studies, and a real data example is provided for illustration.




gen

Asymptotic genealogies of interacting particle systems with an application to sequential Monte Carlo

Jere Koskela, Paul A. Jenkins, Adam M. Johansen, Dario Spanò.

Source: The Annals of Statistics, Volume 48, Number 1, 560--583.

Abstract:
We study weighted particle systems in which new generations are resampled from current particles with probabilities proportional to their weights. This covers a broad class of sequential Monte Carlo (SMC) methods, widely-used in applied statistics and cognate disciplines. We consider the genealogical tree embedded into such particle systems, and identify conditions, as well as an appropriate time-scaling, under which they converge to the Kingman $n$-coalescent in the infinite system size limit in the sense of finite-dimensional distributions. Thus, the tractable $n$-coalescent can be used to predict the shape and size of SMC genealogies, as we illustrate by characterising the limiting mean and variance of the tree height. SMC genealogies are known to be connected to algorithm performance, so that our results are likely to have applications in the design of new methods as well. Our conditions for convergence are strong, but we show by simulation that they do not appear to be necessary.




gen

Joint convergence of sample autocovariance matrices when $p/n o 0$ with application

Monika Bhattacharjee, Arup Bose.

Source: The Annals of Statistics, Volume 47, Number 6, 3470--3503.

Abstract:
Consider a high-dimensional linear time series model where the dimension $p$ and the sample size $n$ grow in such a way that $p/n o 0$. Let $hat{Gamma }_{u}$ be the $u$th order sample autocovariance matrix. We first show that the LSD of any symmetric polynomial in ${hat{Gamma }_{u},hat{Gamma }_{u}^{*},ugeq 0}$ exists under independence and moment assumptions on the driving sequence together with weak assumptions on the coefficient matrices. This LSD result, with some additional effort, implies the asymptotic normality of the trace of any polynomial in ${hat{Gamma }_{u},hat{Gamma }_{u}^{*},ugeq 0}$. We also study similar results for several independent MA processes. We show applications of the above results to statistical inference problems such as in estimation of the unknown order of a high-dimensional MA process and in graphical and significance tests for hypotheses on coefficient matrices of one or several such independent processes.




gen

Minimax posterior convergence rates and model selection consistency in high-dimensional DAG models based on sparse Cholesky factors

Kyoungjae Lee, Jaeyong Lee, Lizhen Lin.

Source: The Annals of Statistics, Volume 47, Number 6, 3413--3437.

Abstract:
In this paper we study the high-dimensional sparse directed acyclic graph (DAG) models under the empirical sparse Cholesky prior. Among our results, strong model selection consistency or graph selection consistency is obtained under more general conditions than those in the existing literature. Compared to Cao, Khare and Ghosh [ Ann. Statist. (2019) 47 319–348], the required conditions are weakened in terms of the dimensionality, sparsity and lower bound of the nonzero elements in the Cholesky factor. Furthermore, our result does not require the irrepresentable condition, which is necessary for Lasso-type methods. We also derive the posterior convergence rates for precision matrices and Cholesky factors with respect to various matrix norms. The obtained posterior convergence rates are the fastest among those of the existing Bayesian approaches. In particular, we prove that our posterior convergence rates for Cholesky factors are the minimax or at least nearly minimax depending on the relative size of true sparseness for the entire dimension. The simulation study confirms that the proposed method outperforms the competing methods.




gen

Distributed estimation of principal eigenspaces

Jianqing Fan, Dong Wang, Kaizheng Wang, Ziwei Zhu.

Source: The Annals of Statistics, Volume 47, Number 6, 3009--3031.

Abstract:
Principal component analysis (PCA) is fundamental to statistical machine learning. It extracts latent principal factors that contribute to the most variation of the data. When data are stored across multiple machines, however, communication cost can prohibit the computation of PCA in a central location and distributed algorithms for PCA are thus needed. This paper proposes and studies a distributed PCA algorithm: each node machine computes the top $K$ eigenvectors and transmits them to the central server; the central server then aggregates the information from all the node machines and conducts a PCA based on the aggregated information. We investigate the bias and variance for the resulting distributed estimator of the top $K$ eigenvectors. In particular, we show that for distributions with symmetric innovation, the empirical top eigenspaces are unbiased, and hence the distributed PCA is “unbiased.” We derive the rate of convergence for distributed PCA estimators, which depends explicitly on the effective rank of covariance, eigengap, and the number of machines. We show that when the number of machines is not unreasonably large, the distributed PCA performs as well as the whole sample PCA, even without full access of whole data. The theoretical results are verified by an extensive simulation study. We also extend our analysis to the heterogeneous case where the population covariance matrices are different across local machines but share similar top eigenstructures.




gen

Eigenvalue distributions of variance components estimators in high-dimensional random effects models

Zhou Fan, Iain M. Johnstone.

Source: The Annals of Statistics, Volume 47, Number 5, 2855--2886.

Abstract:
We study the spectra of MANOVA estimators for variance component covariance matrices in multivariate random effects models. When the dimensionality of the observations is large and comparable to the number of realizations of each random effect, we show that the empirical spectra of such estimators are well approximated by deterministic laws. The Stieltjes transforms of these laws are characterized by systems of fixed-point equations, which are numerically solvable by a simple iterative procedure. Our proof uses operator-valued free probability theory, and we establish a general asymptotic freeness result for families of rectangular orthogonally invariant random matrices, which is of independent interest. Our work is motivated in part by the estimation of components of covariance between multiple phenotypic traits in quantitative genetics, and we specialize our results to common experimental designs that arise in this application.




gen

Linear hypothesis testing for high dimensional generalized linear models

Chengchun Shi, Rui Song, Zhao Chen, Runze Li.

Source: The Annals of Statistics, Volume 47, Number 5, 2671--2703.

Abstract:
This paper is concerned with testing linear hypotheses in high dimensional generalized linear models. To deal with linear hypotheses, we first propose the constrained partial regularization method and study its statistical properties. We further introduce an algorithm for solving regularization problems with folded-concave penalty functions and linear constraints. To test linear hypotheses, we propose a partial penalized likelihood ratio test, a partial penalized score test and a partial penalized Wald test. We show that the limiting null distributions of these three test statistics are $chi^{2}$ distribution with the same degrees of freedom, and under local alternatives, they asymptotically follow noncentral $chi^{2}$ distributions with the same degrees of freedom and noncentral parameter, provided the number of parameters involved in the test hypothesis grows to $infty$ at a certain rate. Simulation studies are conducted to examine the finite sample performance of the proposed tests. Empirical analysis of a real data example is used to illustrate the proposed testing procedures.




gen

Semi-supervised inference: General theory and estimation of means

Anru Zhang, Lawrence D. Brown, T. Tony Cai.

Source: The Annals of Statistics, Volume 47, Number 5, 2538--2566.

Abstract:
We propose a general semi-supervised inference framework focused on the estimation of the population mean. As usual in semi-supervised settings, there exists an unlabeled sample of covariate vectors and a labeled sample consisting of covariate vectors along with real-valued responses (“labels”). Otherwise, the formulation is “assumption-lean” in that no major conditions are imposed on the statistical or functional form of the data. We consider both the ideal semi-supervised setting where infinitely many unlabeled samples are available, as well as the ordinary semi-supervised setting in which only a finite number of unlabeled samples is available. Estimators are proposed along with corresponding confidence intervals for the population mean. Theoretical analysis on both the asymptotic distribution and $ell_{2}$-risk for the proposed procedures are given. Surprisingly, the proposed estimators, based on a simple form of the least squares method, outperform the ordinary sample mean. The simple, transparent form of the estimator lends confidence to the perception that its asymptotic improvement over the ordinary sample mean also nearly holds even for moderate size samples. The method is further extended to a nonparametric setting, in which the oracle rate can be achieved asymptotically. The proposed estimators are further illustrated by simulation studies and a real data example involving estimation of the homeless population.




gen

Isotonic regression in general dimensions

Qiyang Han, Tengyao Wang, Sabyasachi Chatterjee, Richard J. Samworth.

Source: The Annals of Statistics, Volume 47, Number 5, 2440--2471.

Abstract:
We study the least squares regression function estimator over the class of real-valued functions on $[0,1]^{d}$ that are increasing in each coordinate. For uniformly bounded signals and with a fixed, cubic lattice design, we establish that the estimator achieves the minimax rate of order $n^{-min{2/(d+2),1/d}}$ in the empirical $L_{2}$ loss, up to polylogarithmic factors. Further, we prove a sharp oracle inequality, which reveals in particular that when the true regression function is piecewise constant on $k$ hyperrectangles, the least squares estimator enjoys a faster, adaptive rate of convergence of $(k/n)^{min(1,2/d)}$, again up to polylogarithmic factors. Previous results are confined to the case $dleq2$. Finally, we establish corresponding bounds (which are new even in the case $d=2$) in the more challenging random design setting. There are two surprising features of these results: first, they demonstrate that it is possible for a global empirical risk minimisation procedure to be rate optimal up to polylogarithmic factors even when the corresponding entropy integral for the function class diverges rapidly; second, they indicate that the adaptation rate for shape-constrained estimators can be strictly worse than the parametric rate.




gen

Convergence complexity analysis of Albert and Chib’s algorithm for Bayesian probit regression

Qian Qin, James P. Hobert.

Source: The Annals of Statistics, Volume 47, Number 4, 2320--2347.

Abstract:
The use of MCMC algorithms in high dimensional Bayesian problems has become routine. This has spurred so-called convergence complexity analysis, the goal of which is to ascertain how the convergence rate of a Monte Carlo Markov chain scales with sample size, $n$, and/or number of covariates, $p$. This article provides a thorough convergence complexity analysis of Albert and Chib’s [ J. Amer. Statist. Assoc. 88 (1993) 669–679] data augmentation algorithm for the Bayesian probit regression model. The main tools used in this analysis are drift and minorization conditions. The usual pitfalls associated with this type of analysis are avoided by utilizing centered drift functions, which are minimized in high posterior probability regions, and by using a new technique to suppress high-dimensionality in the construction of minorization conditions. The main result is that the geometric convergence rate of the underlying Markov chain is bounded below 1 both as $n ightarrowinfty$ (with $p$ fixed), and as $p ightarrowinfty$ (with $n$ fixed). Furthermore, the first computable bounds on the total variation distance to stationarity are byproducts of the asymptotic analysis.




gen

Convergence rates of least squares regression estimators with heavy-tailed errors

Qiyang Han, Jon A. Wellner.

Source: The Annals of Statistics, Volume 47, Number 4, 2286--2319.

Abstract:
We study the performance of the least squares estimator (LSE) in a general nonparametric regression model, when the errors are independent of the covariates but may only have a $p$th moment ($pgeq1$). In such a heavy-tailed regression setting, we show that if the model satisfies a standard “entropy condition” with exponent $alphain(0,2)$, then the $L_{2}$ loss of the LSE converges at a rate [mathcal{O}_{mathbf{P}}igl(n^{-frac{1}{2+alpha}}vee n^{-frac{1}{2}+frac{1}{2p}}igr).] Such a rate cannot be improved under the entropy condition alone. This rate quantifies both some positive and negative aspects of the LSE in a heavy-tailed regression setting. On the positive side, as long as the errors have $pgeq1+2/alpha$ moments, the $L_{2}$ loss of the LSE converges at the same rate as if the errors are Gaussian. On the negative side, if $p<1+2/alpha$, there are (many) hard models at any entropy level $alpha$ for which the $L_{2}$ loss of the LSE converges at a strictly slower rate than other robust estimators. The validity of the above rate relies crucially on the independence of the covariates and the errors. In fact, the $L_{2}$ loss of the LSE can converge arbitrarily slowly when the independence fails. The key technical ingredient is a new multiplier inequality that gives sharp bounds for the “multiplier empirical process” associated with the LSE. We further give an application to the sparse linear regression model with heavy-tailed covariates and errors to demonstrate the scope of this new inequality.




gen

Negative association, ordering and convergence of resampling methods

Mathieu Gerber, Nicolas Chopin, Nick Whiteley.

Source: The Annals of Statistics, Volume 47, Number 4, 2236--2260.

Abstract:
We study convergence and convergence rates for resampling schemes. Our first main result is a general consistency theorem based on the notion of negative association, which is applied to establish the almost sure weak convergence of measures output from Kitagawa’s [ J. Comput. Graph. Statist. 5 (1996) 1–25] stratified resampling method. Carpenter, Ckiffird and Fearnhead’s [ IEE Proc. Radar Sonar Navig. 146 (1999) 2–7] systematic resampling method is similar in structure but can fail to converge depending on the order of the input samples. We introduce a new resampling algorithm based on a stochastic rounding technique of [In 42nd IEEE Symposium on Foundations of Computer Science ( Las Vegas , NV , 2001) (2001) 588–597 IEEE Computer Soc.], which shares some attractive properties of systematic resampling, but which exhibits negative association and, therefore, converges irrespective of the order of the input samples. We confirm a conjecture made by [ J. Comput. Graph. Statist. 5 (1996) 1–25] that ordering input samples by their states in $mathbb{R}$ yields a faster rate of convergence; we establish that when particles are ordered using the Hilbert curve in $mathbb{R}^{d}$, the variance of the resampling error is ${scriptstylemathcal{O}}(N^{-(1+1/d)})$ under mild conditions, where $N$ is the number of particles. We use these results to establish asymptotic properties of particle algorithms based on resampling schemes that differ from multinomial resampling.




gen

Generalized cluster trees and singular measures

Yen-Chi Chen.

Source: The Annals of Statistics, Volume 47, Number 4, 2174--2203.

Abstract:
In this paper we study the $alpha $-cluster tree ($alpha $-tree) under both singular and nonsingular measures. The $alpha $-tree uses probability contents within a set created by the ordering of points to construct a cluster tree so that it is well defined even for singular measures. We first derive the convergence rate for a density level set around critical points, which leads to the convergence rate for estimating an $alpha $-tree under nonsingular measures. For singular measures, we study how the kernel density estimator (KDE) behaves and prove that the KDE is not uniformly consistent but pointwise consistent after rescaling. We further prove that the estimated $alpha $-tree fails to converge in the $L_{infty }$ metric but is still consistent under the integrated distance. We also observe a new type of critical points—the dimensional critical points (DCPs)—of a singular measure. DCPs are points that contribute to cluster tree topology but cannot be defined using density gradient. Building on the analysis of the KDE and DCPs, we prove the topological consistency of an estimated $alpha $-tree.




gen

Correction: Sensitivity analysis for an unobserved moderator in RCT-to-target-population generalization of treatment effects

Trang Quynh Nguyen, Elizabeth A. Stuart.

Source: The Annals of Applied Statistics, Volume 14, Number 1, 518--520.




gen

A comparison of principal component methods between multiple phenotype regression and multiple SNP regression in genetic association studies

Zhonghua Liu, Ian Barnett, Xihong Lin.

Source: The Annals of Applied Statistics, Volume 14, Number 1, 433--451.

Abstract:
Principal component analysis (PCA) is a popular method for dimension reduction in unsupervised multivariate analysis. However, existing ad hoc uses of PCA in both multivariate regression (multiple outcomes) and multiple regression (multiple predictors) lack theoretical justification. The differences in the statistical properties of PCAs in these two regression settings are not well understood. In this paper we provide theoretical results on the power of PCA in genetic association testings in both multiple phenotype and SNP-set settings. The multiple phenotype setting refers to the case when one is interested in studying the association between a single SNP and multiple phenotypes as outcomes. The SNP-set setting refers to the case when one is interested in studying the association between multiple SNPs in a SNP set and a single phenotype as the outcome. We demonstrate analytically that the properties of the PC-based analysis in these two regression settings are substantially different. We show that the lower order PCs, that is, PCs with large eigenvalues, are generally preferred and lead to a higher power in the SNP-set setting, while the higher-order PCs, that is, PCs with small eigenvalues, are generally preferred in the multiple phenotype setting. We also investigate the power of three other popular statistical methods, the Wald test, the variance component test and the minimum $p$-value test, in both multiple phenotype and SNP-set settings. We use theoretical power, simulation studies, and two real data analyses to validate our findings.




gen

Estimating and forecasting the smoking-attributable mortality fraction for both genders jointly in over 60 countries

Yicheng Li, Adrian E. Raftery.

Source: The Annals of Applied Statistics, Volume 14, Number 1, 381--408.

Abstract:
Smoking is one of the leading preventable threats to human health and a major risk factor for lung cancer, upper aerodigestive cancer and chronic obstructive pulmonary disease. Estimating and forecasting the smoking attributable fraction (SAF) of mortality can yield insights into smoking epidemics and also provide a basis for more accurate mortality and life expectancy projection. Peto et al. ( Lancet 339 (1992) 1268–1278) proposed a method to estimate the SAF using the lung cancer mortality rate as an indicator of exposure to smoking in the population of interest. Here, we use the same method to estimate the all-age SAF (ASAF) for both genders for over 60 countries. We document a strong and cross-nationally consistent pattern of the evolution of the SAF over time. We use this as the basis for a new Bayesian hierarchical model to project future male and female ASAF from over 60 countries simultaneously. This gives forecasts as well as predictive distributions that can be used to find uncertainty intervals for any quantity of interest. We assess the model using out-of-sample predictive validation and find that it provides good forecasts and well-calibrated forecast intervals, comparing favorably with other methods.




gen

Feature selection for generalized varying coefficient mixed-effect models with application to obesity GWAS

Wanghuan Chu, Runze Li, Jingyuan Liu, Matthew Reimherr.

Source: The Annals of Applied Statistics, Volume 14, Number 1, 276--298.

Abstract:
Motivated by an empirical analysis of data from a genome-wide association study on obesity, measured by the body mass index (BMI), we propose a two-step gene-detection procedure for generalized varying coefficient mixed-effects models with ultrahigh dimensional covariates. The proposed procedure selects significant single nucleotide polymorphisms (SNPs) impacting the mean BMI trend, some of which have already been biologically proven to be “fat genes.” The method also discovers SNPs that significantly influence the age-dependent variability of BMI. The proposed procedure takes into account individual variations of genetic effects and can also be directly applied to longitudinal data with continuous, binary or count responses. We employ Monte Carlo simulation studies to assess the performance of the proposed method and further carry out causal inference for the selected SNPs.




gen

Modifying the Chi-square and the CMH test for population genetic inference: Adapting to overdispersion

Kerstin Spitzer, Marta Pelizzola, Andreas Futschik.

Source: The Annals of Applied Statistics, Volume 14, Number 1, 202--220.

Abstract:
Evolve and resequence studies provide a popular approach to simulate evolution in the lab and explore its genetic basis. In this context, Pearson’s chi-square test, Fisher’s exact test as well as the Cochran–Mantel–Haenszel test are commonly used to infer genomic positions affected by selection from temporal changes in allele frequency. However, the null model associated with these tests does not match the null hypothesis of actual interest. Indeed, due to genetic drift and possibly other additional noise components such as pool sequencing, the null variance in the data can be substantially larger than accounted for by these common test statistics. This leads to $p$-values that are systematically too small and, therefore, a huge number of false positive results. Even, if the ranking rather than the actual $p$-values is of interest, a naive application of the mentioned tests will give misleading results, as the amount of overdispersion varies from locus to locus. We therefore propose adjusted statistics that take the overdispersion into account while keeping the formulas simple. This is particularly useful in genome-wide applications, where millions of SNPs can be handled with little computational effort. We then apply the adapted test statistics to real data from Drosophila and investigate how information from intermediate generations can be included when available. We also discuss further applications such as genome-wide association studies based on pool sequencing data and tests for local adaptation.




gen

A general theory for preferential sampling in environmental networks

Joe Watson, James V. Zidek, Gavin Shaddick.

Source: The Annals of Applied Statistics, Volume 13, Number 4, 2662--2700.

Abstract:
This paper presents a general model framework for detecting the preferential sampling of environmental monitors recording an environmental process across space and/or time. This is achieved by considering the joint distribution of an environmental process with a site-selection process that considers where and when sites are placed to measure the process. The environmental process may be spatial, temporal or spatio-temporal in nature. By sharing random effects between the two processes, the joint model is able to establish whether site placement was stochastically dependent of the environmental process under study. Furthermore, if stochastic dependence is identified between the two processes, then inferences about the probability distribution of the spatio-temporal process will change, as will predictions made of the process across space and time. The embedding into a spatio-temporal framework also allows for the modelling of the dynamic site-selection process itself. Real-world factors affecting both the size and location of the network can be easily modelled and quantified. Depending upon the choice of the population of locations considered for selection across space and time under the site-selection process, different insights about the precise nature of preferential sampling can be obtained. The general framework developed in the paper is designed to be easily and quickly fit using the R-INLA package. We apply this framework to a case study involving particulate air pollution over the UK where a major reduction in the size of a monitoring network through time occurred. It is demonstrated that a significant response-biased reduction in the air quality monitoring network occurred, namely the relocation of monitoring sites to locations with the highest pollution levels, and the routine removal of sites at locations with the lowest. We also show that the network was consistently unrepresenting levels of particulate matter seen across much of GB throughout the operating life of the network. Finally we show that this may have led to a severe overreporting of the population-average exposure levels experienced across GB. This could have great impacts on estimates of the health effects of black smoke levels.




gen

A simple, consistent estimator of SNP heritability from genome-wide association studies

Armin Schwartzman, Andrew J. Schork, Rong Zablocki, Wesley K. Thompson.

Source: The Annals of Applied Statistics, Volume 13, Number 4, 2509--2538.

Abstract:
Analysis of genome-wide association studies (GWAS) is characterized by a large number of univariate regressions where a quantitative trait is regressed on hundreds of thousands to millions of single-nucleotide polymorphism (SNP) allele counts, one at a time. This article proposes an estimator of the SNP heritability of the trait, defined here as the fraction of the variance of the trait explained by the SNPs in the study. The proposed GWAS heritability (GWASH) estimator is easy to compute, highly interpretable and is consistent as the number of SNPs and the sample size increase. More importantly, it can be computed from summary statistics typically reported in GWAS, not requiring access to the original data. The estimator takes full account of the linkage disequilibrium (LD) or correlation between the SNPs in the study through moments of the LD matrix, estimable from auxiliary datasets. Unlike other proposed estimators in the literature, we establish the theoretical properties of the GWASH estimator and obtain analytical estimates of the precision, allowing for power and sample size calculations for SNP heritability estimates and forming a firm foundation for future methodological development.




gen

A semiparametric modeling approach using Bayesian Additive Regression Trees with an application to evaluate heterogeneous treatment effects

Bret Zeldow, Vincent Lo Re III, Jason Roy.

Source: The Annals of Applied Statistics, Volume 13, Number 3, 1989--2010.

Abstract:
Bayesian Additive Regression Trees (BART) is a flexible machine learning algorithm capable of capturing nonlinearities between an outcome and covariates and interactions among covariates. We extend BART to a semiparametric regression framework in which the conditional expectation of an outcome is a function of treatment, its effect modifiers, and confounders. The confounders are allowed to have unspecified functional form, while treatment and effect modifiers that are directly related to the research question are given a linear form. The result is a Bayesian semiparametric linear regression model where the posterior distribution of the parameters of the linear part can be interpreted as in parametric Bayesian regression. This is useful in situations where a subset of the variables are of substantive interest and the others are nuisance variables that we would like to control for. An example of this occurs in causal modeling with the structural mean model (SMM). Under certain causal assumptions, our method can be used as a Bayesian SMM. Our methods are demonstrated with simulation studies and an application to dataset involving adults with HIV/Hepatitis C coinfection who newly initiate antiretroviral therapy. The methods are available in an R package called semibart.




gen

Radio-iBAG: Radiomics-based integrative Bayesian analysis of multiplatform genomic data

Youyi Zhang, Jeffrey S. Morris, Shivali Narang Aerry, Arvind U. K. Rao, Veerabhadran Baladandayuthapani.

Source: The Annals of Applied Statistics, Volume 13, Number 3, 1957--1988.

Abstract:
Technological innovations have produced large multi-modal datasets that include imaging and multi-platform genomics data. Integrative analyses of such data have the potential to reveal important biological and clinical insights into complex diseases like cancer. In this paper, we present Bayesian approaches for integrative analysis of radiological imaging and multi-platform genomic data, where-in our goals are to simultaneously identify genomic and radiomic, that is, radiology-based imaging markers, along with the latent associations between these two modalities, and to detect the overall prognostic relevance of the combined markers. For this task, we propose Radio-iBAG: Radiomics-based Integrative Bayesian Analysis of Multiplatform Genomic Data , a multi-scale Bayesian hierarchical model that involves several innovative strategies: it incorporates integrative analysis of multi-platform genomic data sets to capture fundamental biological relationships; explores the associations between radiomic markers accompanying genomic information with clinical outcomes; and detects genomic and radiomic markers associated with clinical prognosis. We also introduce the use of sparse Principal Component Analysis (sPCA) to extract a sparse set of approximately orthogonal meta-features each containing information from a set of related individual radiomic features, reducing dimensionality and combining like features. Our methods are motivated by and applied to The Cancer Genome Atlas glioblastoma multiforme data set, where-in we integrate magnetic resonance imaging-based biomarkers along with genomic, epigenomic and transcriptomic data. Our model identifies important magnetic resonance imaging features and the associated genomic platforms that are related with patient survival times.