era Margin-Based Generalization Lower Bounds for Boosted Classifiers. (arXiv:1909.12518v4 [cs.LG] UPDATED) By arxiv.org Published On :: Boosting is one of the most successful ideas in machine learning. The most well-accepted explanations for the low generalization error of boosting algorithms such as AdaBoost stem from margin theory. The study of margins in the context of boosting algorithms was initiated by Schapire, Freund, Bartlett and Lee (1998) and has inspired numerous boosting algorithms and generalization bounds. To date, the strongest known generalization (upper bound) is the $k$th margin bound of Gao and Zhou (2013). Despite the numerous generalization upper bounds that have been proved over the last two decades, nothing is known about the tightness of these bounds. In this paper, we give the first margin-based lower bounds on the generalization error of boosted classifiers. Our lower bounds nearly match the $k$th margin bound and thus almost settle the generalization performance of boosted classifiers in terms of margins. Full Article
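The margin quantities this abstract refers to can be made concrete with a small sketch. It assumes the usual setup from margin theory, which the abstract itself does not spell out: base classifiers vote in {-1, +1}, the voting weights are normalized to sum to one, and the $k$th margin is the $k$th smallest margin over the training sample. The function names `margins` and `kth_margin` are hypothetical.

```python
def margins(alphas, predictions, labels):
    """Normalized margins y_i * f(x_i) of a voting classifier.

    Assumed layout: predictions[t][i] in {-1, +1} is the vote of base
    classifier t on example i, and alphas[t] is its voting weight.
    """
    total = sum(abs(a) for a in alphas)  # normalize so margins lie in [-1, 1]
    result = []
    for i, y in enumerate(labels):
        vote = sum(a * preds[i] for a, preds in zip(alphas, predictions)) / total
        result.append(y * vote)
    return result


def kth_margin(margin_values, k):
    """k-th smallest margin over the sample (k = 1 gives the minimum margin)."""
    return sorted(margin_values)[k - 1]
```

A positive margin means the weighted vote classifies the example correctly; margin-based bounds control generalization error in terms of how large these values are across the sample.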
era Nonstationary Bayesian modeling for a large data set of derived surface temperature return values. (arXiv:2005.03658v1 [stat.ME]) By arxiv.org Published On :: Heat waves resulting from prolonged extreme temperatures pose a significant risk to human health globally. Given the limitations of observations of extreme temperature, climate models are often used to characterize extreme temperature globally, from which one can derive quantities like return values to summarize the magnitude of a low probability event for an arbitrary geographic location. However, while these derived quantities are useful on their own, it is also often important to apply a spatial statistical model to such data in order to, e.g., understand how the spatial dependence properties of the return values vary over space and emulate the climate model for generating additional spatial fields with corresponding statistical properties. For these objectives, when modeling global data it is critical to use a nonstationary covariance function. Furthermore, given that the output of modern global climate models can be on the order of $\mathcal{O}(10^4)$, it is important to utilize approximate Gaussian process methods to enable inference. In this paper, we demonstrate the application of methodology introduced in Risser and Turek (2020) to conduct a nonstationary and fully Bayesian analysis of a large data set of 20-year return values derived from an ensemble of global climate model runs with over 50,000 spatial locations. This analysis uses the freely available BayesNSGP software package for R. Full Article
era Generative Feature Replay with Orthogonal Weight Modification for Continual Learning. (arXiv:2005.03490v1 [cs.LG]) By arxiv.org Published On :: The ability of intelligent agents to learn and remember multiple tasks sequentially is crucial to achieving artificial general intelligence. Many continual learning (CL) methods have been proposed to overcome catastrophic forgetting. Catastrophic forgetting notoriously impedes the sequential learning of neural networks as the data of previous tasks are unavailable. In this paper we focus on class incremental learning, a challenging CL scenario, in which classes of each task are disjoint and task identity is unknown at test time. For this scenario, generative replay is an effective strategy which generates and replays pseudo data for previous tasks to alleviate catastrophic forgetting. However, it is not trivial to learn a generative model continually for relatively complex data. Based on the recently proposed orthogonal weight modification (OWM) algorithm, which can keep previously learned input-output mappings approximately invariant when learning new tasks, we propose to directly generate and replay features. Empirical results on image and text datasets show our method can improve OWM consistently by a significant margin while conventional generative replay always results in a negative effect. Our method also beats a state-of-the-art generative replay method and is competitive with a strong baseline based on real data storage. Full Article
era A stochastic user-operator assignment game for microtransit service evaluation: A case study of Kussbus in Luxembourg. (arXiv:2005.03465v1 [physics.soc-ph]) By arxiv.org Published On :: This paper proposes a stochastic variant of the stable matching model from Rasulkhani and Chow [1] which allows microtransit operators to evaluate their operation policy and resource allocations. The proposed model takes into account the stochastic nature of users' travel utility perception, resulting in a probabilistic stable operation cost allocation outcome to design ticket price and ridership forecasting. We applied the model for the operation policy evaluation of a microtransit service in Luxembourg and its border area. The methodology for the model parameters estimation and calibration is developed. The results provide useful insights for the operator and the government to improve the ridership of the service. Full Article
era Curious Hierarchical Actor-Critic Reinforcement Learning. (arXiv:2005.03420v1 [cs.LG]) By arxiv.org Published On :: Hierarchical abstraction and curiosity-driven exploration are two common paradigms in current reinforcement learning approaches to break down difficult problems into a sequence of simpler ones and to overcome reward sparsity. However, there is a lack of approaches that combine these paradigms, and it is currently unknown whether curiosity also helps to perform the hierarchical abstraction. As a novelty and scientific contribution, we tackle this issue and develop a method that combines hierarchical reinforcement learning with curiosity. Herein, we extend a contemporary hierarchical actor-critic approach with a forward model to develop a hierarchical notion of curiosity. We demonstrate in several continuous-space environments that curiosity approximately doubles the learning performance and success rates for most of the investigated benchmarking problems. Full Article
era Fair Algorithms for Hierarchical Agglomerative Clustering. (arXiv:2005.03197v1 [cs.LG]) By arxiv.org Published On :: Hierarchical Agglomerative Clustering (HAC) algorithms are extensively utilized in modern data science and machine learning, and seek to partition the dataset into clusters while generating a hierarchical relationship between the data samples themselves. HAC algorithms are employed in a number of applications, such as biology, natural language processing, and recommender systems. Thus, it is imperative to ensure that these algorithms are fair-- even if the dataset contains biases against certain protected groups, the cluster outputs generated should not be discriminatory against samples from any of these groups. However, recent work in clustering fairness has mostly focused on center-based clustering algorithms, such as k-median and k-means clustering. Therefore, in this paper, we propose fair algorithms for performing HAC that enforce fairness constraints 1) irrespective of the distance linkage criteria used, 2) generalize to any natural measures of clustering fairness for HAC, 3) work for multiple protected groups, and 4) have competitive running times to vanilla HAC. To the best of our knowledge, this is the first work that studies fairness for HAC algorithms. We also propose an algorithm with lower asymptotic time complexity than HAC algorithms that can rectify existing HAC outputs and make them subsequently fair as a result. Moreover, we carry out extensive experiments on multiple real-world UCI datasets to demonstrate the working of our algorithms. Full Article
era Shortlists announced for 2020 NSW Premier’s Literary Awards By feedproxy.google.com Published On :: Thu, 19 Mar 2020 21:24:32 +0000 Friday 20 March 2020 Contemporary works by leading and emerging Australian writers have been shortlisted for the 2020 NSW Premier's Literary Awards, the State Library of NSW announced today. Full Article
era 2020 NSW Premier’s Literary Awards announced By feedproxy.google.com Published On :: Sat, 25 Apr 2020 01:29:17 +0000 Sunday 26 April 2020 A total of $295,000 awarded across 12 prize categories. Full Article
era Theranostics approaches to gastric and colon cancer By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Callnumber: Online. ISBN: 9789811520174 (electronic bk.) Full Article
era The interaction of food industry and environment By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Callnumber: Online. ISBN: 9780128175156 (electronic bk.) Full Article
era Salt, fat and sugar reduction : sensory approaches for nutritional reformulation of foods and beverages By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Author: O'Sullivan, Maurice G., author. Callnumber: Online. ISBN: 9780128226124 (electronic bk.) Full Article
era Regulation of cancer immune checkpoints : molecular and cellular mechanisms and therapy By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Callnumber: Online. ISBN: 9789811532665 Full Article
era Priming-mediated stress and cross-stress tolerance in crop plants By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Callnumber: Online. ISBN: 9780128178935 (electronic bk.) Full Article
era Plant-fire interactions : applying ecophysiology to wildfire management By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Author: Resco de Dios, Víctor, author. Callnumber: Online. ISBN: 9783030411923 (electronic book) Full Article
era Personalized food intervention and therapy for autism spectrum disorder management By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Callnumber: Online. ISBN: 9783030304027 (electronic bk.) Full Article
era Ocular therapeutics handbook : a clinical manual By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Author: Onofrey, Bruce E., author. Callnumber: Online. ISBN: 197510904X Full Article
era Mental Conditioning to Perform Common Operations in General Surgery Training By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Callnumber: Online. ISBN: 9783319911649 978-3-319-91164-9 Full Article
era Interaction of nanomaterials with the immune system By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Callnumber: Online. ISBN: 9783030339623 (electronic bk.) Full Article
era Health consequences of microbial interactions with hydrocarbons, oils, and lipids By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Callnumber: Online. ISBN: 9783319724737 (electronic bk.) Full Article
era General medicine and surgery for dental practitioners By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Author: Greenwood, M. (Mark), author. Callnumber: Online. ISBN: 9783319977379 (electronic book) Full Article
era Functional foods in cancer prevention and therapy By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Callnumber: Online. ISBN: 9780128165386 (electronic bk.) Full Article
era Forest-water interactions By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Callnumber: Online. ISBN: 9783030260866 (electronic bk.) Full Article
era Enterprise information systems : 21st International Conference, ICEIS 2019, Heraklion, Crete, Greece, May 3-5, 2019, Revised Selected Papers By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Author: International Conference on Enterprise Information Systems (21st : 2019 : Ērakleion, Greece). Callnumber: Online. ISBN: 9783030407834 (electronic bk.) Full Article
era Cullin-RING ligases and protein neddylation : biology and therapeutics By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Callnumber: Online. ISBN: 9789811510250 (electronic bk.) Full Article
era Consequences of microbial interactions with hydrocarbons, oils, and lipids : biodegradation and bioremediation By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Callnumber: Online. ISBN: 9783319445359 (electronic bk.) Full Article
era Clinical approaches in endodontic regeneration : current and emerging therapeutic perspectives By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Callnumber: Online. ISBN: 9783319968483 (electronic bk.) Full Article
era Climate change and soil interactions By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Callnumber: Online. ISBN: 9780128180334 (electronic bk.) Full Article
era Beyond our genes : pathophysiology of gene and environment interaction and epigenetic inheritance By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Callnumber: Online. ISBN: 9783030352134 (electronic bk.) Full Article
era Bacteriophages : biology, technology, therapy By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Callnumber: Online. ISBN: 9783319405988 electronic book Full Article
era 100 cases in clinical pharmacology, therapeutics and prescribing By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Author: Layne, Kerry, author. Callnumber: Online. ISBN: 9780429624537 electronic book Full Article
era General Notices By www.eastgwillimbury.ca Published On :: Sun, 03 May 2020 16:28:42 GMT Full Article
era Penalized generalized empirical likelihood with a diverging number of general estimating equations for censored data By projecteuclid.org Published On :: Mon, 17 Feb 2020 04:02 EST Niansheng Tang, Xiaodong Yan, Xingqiu Zhao. Source: The Annals of Statistics, Volume 48, Number 1, 607--627. Abstract: This article considers simultaneous variable selection and parameter estimation as well as hypothesis testing in censored survival models where a parametric likelihood is not available. For the problem, we utilize certain growing dimensional general estimating equations and propose a penalized generalized empirical likelihood, where the general estimating equations are constructed based on the semiparametric efficiency bound of estimation with given moment conditions. The proposed penalized generalized empirical likelihood estimators enjoy the oracle properties, and the estimator of any fixed dimensional vector of nonzero parameters achieves the semiparametric efficiency bound asymptotically. Furthermore, we show that the penalized generalized empirical likelihood ratio test statistic has an asymptotic central chi-square distribution. The conditions of local and restricted global optimality of weighted penalized generalized empirical likelihood estimators are also discussed. We present a two-layer iterative algorithm for efficient implementation, and investigate its convergence property. The performance of the proposed methods is demonstrated by extensive simulation studies, and a real data example is provided for illustration. Full Article
era Asymptotic genealogies of interacting particle systems with an application to sequential Monte Carlo By projecteuclid.org Published On :: Mon, 17 Feb 2020 04:02 EST Jere Koskela, Paul A. Jenkins, Adam M. Johansen, Dario Spanò. Source: The Annals of Statistics, Volume 48, Number 1, 560--583. Abstract: We study weighted particle systems in which new generations are resampled from current particles with probabilities proportional to their weights. This covers a broad class of sequential Monte Carlo (SMC) methods, widely used in applied statistics and cognate disciplines. We consider the genealogical tree embedded into such particle systems, and identify conditions, as well as an appropriate time-scaling, under which they converge to the Kingman $n$-coalescent in the infinite system size limit in the sense of finite-dimensional distributions. Thus, the tractable $n$-coalescent can be used to predict the shape and size of SMC genealogies, as we illustrate by characterising the limiting mean and variance of the tree height. SMC genealogies are known to be connected to algorithm performance, so that our results are likely to have applications in the design of new methods as well. Our conditions for convergence are strong, but we show by simulation that they do not appear to be necessary. Full Article
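The resampling mechanism and the tree height mentioned in this abstract can be illustrated with a toy simulation. This is only a sketch under stated assumptions: it uses multinomial resampling (one member of the class the abstract describes), draws stand-in random weights rather than weights from a real SMC model, and all function names are hypothetical.

```python
import random


def resample_parents(weights, rng):
    """Multinomial resampling: each child picks a parent index with
    probability proportional to the parent's weight."""
    n = len(weights)
    return rng.choices(range(n), weights=weights, k=n)


def simulate_genealogy(num_particles, num_generations, rng):
    """Return one list of parent indices per resampling step.

    Assumption: weights are stand-in uniform random numbers, not the
    output of an actual SMC algorithm.
    """
    history = []
    for _ in range(num_generations):
        weights = [rng.random() + 1e-12 for _ in range(num_particles)]
        history.append(resample_parents(weights, rng))
    return history


def tree_height(history, leaves):
    """Number of generations, traced backwards from the final generation,
    until the given leaves share a single common ancestor.
    Returns None if they have not coalesced within the recorded history."""
    lineages = set(leaves)
    for steps, parents in enumerate(reversed(history), start=1):
        lineages = {parents[i] for i in lineages}
        if len(lineages) == 1:
            return steps
    return None
```

Averaging `tree_height` over many simulated histories gives an empirical counterpart to the limiting mean tree height that the abstract characterises through the $n$-coalescent.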
era Averages of unlabeled networks: Geometric characterization and asymptotic behavior By projecteuclid.org Published On :: Mon, 17 Feb 2020 04:02 EST Eric D. Kolaczyk, Lizhen Lin, Steven Rosenberg, Jackson Walters, Jie Xu. Source: The Annals of Statistics, Volume 48, Number 1, 514--538. Abstract: It is becoming increasingly common to see large collections of network data objects, that is, data sets in which a network is viewed as a fundamental unit of observation. As a result, there is a pressing need to develop network-based analogues of even many of the most basic tools already standard for scalar and vector data. In this paper, our focus is on averages of unlabeled, undirected networks with edge weights. Specifically, we (i) characterize a certain notion of the space of all such networks, (ii) describe key topological and geometric properties of this space relevant to doing probability and statistics thereupon, and (iii) use these properties to establish the asymptotic behavior of a generalized notion of an empirical mean under sampling from a distribution supported on this space. Our results rely on a combination of tools from geometry, probability theory and statistical shape analysis. In particular, the lack of vertex labeling necessitates working with a quotient space modding out permutations of labels. This results in a nontrivial geometry for the space of unlabeled networks, which in turn is found to have important implications on the types of probabilistic and statistical results that may be obtained and the techniques needed to obtain them. Full Article
era Rerandomization in $2^{K}$ factorial experiments By projecteuclid.org Published On :: Mon, 17 Feb 2020 04:02 EST Xinran Li, Peng Ding, Donald B. Rubin. Source: The Annals of Statistics, Volume 48, Number 1, 43--63. Abstract: With many pretreatment covariates and treatment factors, the classical factorial experiment often fails to balance covariates across multiple factorial effects simultaneously. Therefore, it is intuitive to restrict the randomization of the treatment factors to satisfy certain covariate balance criteria, possibly conforming to the tiers of factorial effects and covariates based on their relative importance. This is rerandomization in factorial experiments. We study the asymptotic properties of this experimental design under the randomization inference framework without imposing any distributional or modeling assumptions of the covariates and outcomes. We derive the joint asymptotic sampling distribution of the usual estimators of the factorial effects, and show that it is symmetric, unimodal and more “concentrated” at the true factorial effects under rerandomization than under the classical factorial experiment. We quantify this advantage of rerandomization using the notions of “central convex unimodality” and “peakedness” of the joint asymptotic sampling distribution. We also construct conservative large-sample confidence sets for the factorial effects. Full Article
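The idea of restricting randomization to assignments that satisfy a covariate balance criterion can be sketched as an accept/reject loop. The sketch makes simplifying assumptions that are not from the abstract: a single covariate, balance checked only on main effects, and a plain difference-in-means threshold in place of a Mahalanobis-type criterion. All names and the `threshold` parameter are hypothetical.

```python
import random
from itertools import product


def draw_assignment(n_units, k_factors, rng):
    """Completely randomized 2^K design with equal cell sizes.
    Assumes n_units is a multiple of 2**k_factors."""
    cells = list(product((-1, 1), repeat=k_factors))
    per_cell = n_units // len(cells)
    assignment = [cell for cell in cells for _ in range(per_cell)]
    rng.shuffle(assignment)
    return assignment


def main_effect_imbalance(assignment, covariate, factor):
    """Absolute difference in covariate means between the high and low
    levels of one factor."""
    high = [x for a, x in zip(assignment, covariate) if a[factor] == 1]
    low = [x for a, x in zip(assignment, covariate) if a[factor] == -1]
    return abs(sum(high) / len(high) - sum(low) / len(low))


def rerandomize(covariate, k_factors, threshold, rng, max_draws=10000):
    """Redraw the assignment until every main effect is balanced."""
    for _ in range(max_draws):
        assignment = draw_assignment(len(covariate), k_factors, rng)
        if all(
            main_effect_imbalance(assignment, covariate, f) <= threshold
            for f in range(k_factors)
        ):
            return assignment
    raise RuntimeError("no acceptable assignment found; loosen the threshold")
```

A tighter `threshold` gives better balance at the cost of more rejected draws; the tiered criteria mentioned in the abstract would correspond to using different thresholds for different effects or covariates.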
era An operator theoretic approach to nonparametric mixture models By projecteuclid.org Published On :: Fri, 02 Aug 2019 22:04 EDT Robert A. Vandermeulen, Clayton D. Scott. Source: The Annals of Statistics, Volume 47, Number 5, 2704--2733. Abstract: When estimating finite mixture models, it is common to make assumptions on the mixture components, such as parametric assumptions. In this work, we make no distributional assumptions on the mixture components and instead assume that observations from the mixture model are grouped, such that observations in the same group are known to be drawn from the same mixture component. We precisely characterize the number of observations $n$ per group needed for the mixture model to be identifiable, as a function of the number $m$ of mixture components. In addition to our assumption-free analysis, we also study the settings where the mixture components are either linearly independent or jointly irreducible. Furthermore, our analysis considers two kinds of identifiability, where the mixture model is the simplest one explaining the data, and where it is the only one. As an application of these results, we precisely characterize identifiability of multinomial mixture models. Our analysis relies on an operator-theoretic framework that associates mixture models in the grouped-sample setting with certain infinite-dimensional tensors. Based on this framework, we introduce a general spectral algorithm for recovering the mixture components. Full Article
era Linear hypothesis testing for high dimensional generalized linear models By projecteuclid.org Published On :: Fri, 02 Aug 2019 22:04 EDT Chengchun Shi, Rui Song, Zhao Chen, Runze Li. Source: The Annals of Statistics, Volume 47, Number 5, 2671--2703. Abstract: This paper is concerned with testing linear hypotheses in high dimensional generalized linear models. To deal with linear hypotheses, we first propose the constrained partial regularization method and study its statistical properties. We further introduce an algorithm for solving regularization problems with folded-concave penalty functions and linear constraints. To test linear hypotheses, we propose a partial penalized likelihood ratio test, a partial penalized score test and a partial penalized Wald test. We show that the limiting null distributions of these three test statistics are $\chi^{2}$ distributions with the same degrees of freedom, and under local alternatives, they asymptotically follow noncentral $\chi^{2}$ distributions with the same degrees of freedom and noncentrality parameter, provided the number of parameters involved in the test hypothesis grows to $\infty$ at a certain rate. Simulation studies are conducted to examine the finite sample performance of the proposed tests. Empirical analysis of a real data example is used to illustrate the proposed testing procedures. Full Article
era Semi-supervised inference: General theory and estimation of means By projecteuclid.org Published On :: Fri, 02 Aug 2019 22:04 EDT Anru Zhang, Lawrence D. Brown, T. Tony Cai. Source: The Annals of Statistics, Volume 47, Number 5, 2538--2566. Abstract: We propose a general semi-supervised inference framework focused on the estimation of the population mean. As usual in semi-supervised settings, there exists an unlabeled sample of covariate vectors and a labeled sample consisting of covariate vectors along with real-valued responses (“labels”). Otherwise, the formulation is “assumption-lean” in that no major conditions are imposed on the statistical or functional form of the data. We consider both the ideal semi-supervised setting where infinitely many unlabeled samples are available, as well as the ordinary semi-supervised setting in which only a finite number of unlabeled samples is available. Estimators are proposed along with corresponding confidence intervals for the population mean. Theoretical analyses of both the asymptotic distribution and the $\ell_{2}$-risk of the proposed procedures are given. Surprisingly, the proposed estimators, based on a simple form of the least squares method, outperform the ordinary sample mean. The simple, transparent form of the estimator lends confidence to the perception that its asymptotic improvement over the ordinary sample mean also nearly holds even for moderate size samples. The method is further extended to a nonparametric setting, in which the oracle rate can be achieved asymptotically. The proposed estimators are further illustrated by simulation studies and a real data example involving estimation of the homeless population. Full Article
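To make the least-squares idea in this abstract concrete, here is a minimal sketch of one plausible reading: adjust the labeled-sample mean of the responses by a fitted slope times the gap between the covariate mean over all samples and the covariate mean over the labeled sample. The exact estimator in the paper may differ; the single-covariate restriction and the function name `semi_supervised_mean` are assumptions made for illustration.

```python
def semi_supervised_mean(x_labeled, y_labeled, x_unlabeled):
    """Least-squares-adjusted estimate of the mean response (one covariate).

    Assumed form: y_bar + slope * (mean of x over all samples - x_bar),
    where slope is the ordinary least squares slope on the labeled data.
    """
    n = len(x_labeled)
    x_bar = sum(x_labeled) / n
    y_bar = sum(y_labeled) / n
    sxx = sum((x - x_bar) ** 2 for x in x_labeled)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(x_labeled, y_labeled))
    slope = sxy / sxx  # requires at least two distinct labeled x values
    x_all = list(x_labeled) + list(x_unlabeled)
    mu_x = sum(x_all) / len(x_all)
    return y_bar + slope * (mu_x - x_bar)
```

With no unlabeled data the adjustment vanishes and the function returns the ordinary sample mean, which is the baseline the abstract compares against.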
era Isotonic regression in general dimensions By projecteuclid.org Published On :: Fri, 02 Aug 2019 22:04 EDT Qiyang Han, Tengyao Wang, Sabyasachi Chatterjee, Richard J. Samworth. Source: The Annals of Statistics, Volume 47, Number 5, 2440--2471. Abstract: We study the least squares regression function estimator over the class of real-valued functions on $[0,1]^{d}$ that are increasing in each coordinate. For uniformly bounded signals and with a fixed, cubic lattice design, we establish that the estimator achieves the minimax rate of order $n^{-\min\{2/(d+2),1/d\}}$ in the empirical $L_{2}$ loss, up to polylogarithmic factors. Further, we prove a sharp oracle inequality, which reveals in particular that when the true regression function is piecewise constant on $k$ hyperrectangles, the least squares estimator enjoys a faster, adaptive rate of convergence of $(k/n)^{\min(1,2/d)}$, again up to polylogarithmic factors. Previous results are confined to the case $d\leq 2$. Finally, we establish corresponding bounds (which are new even in the case $d=2$) in the more challenging random design setting. There are two surprising features of these results: first, they demonstrate that it is possible for a global empirical risk minimisation procedure to be rate optimal up to polylogarithmic factors even when the corresponding entropy integral for the function class diverges rapidly; second, they indicate that the adaptation rate for shape-constrained estimators can be strictly worse than the parametric rate. Full Article
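For readers unfamiliar with the estimator, the one-dimensional special case ($d=1$) of the least squares isotonic fit can be computed with the classical pool-adjacent-violators algorithm. The sketch below covers only that case, with observations already ordered by their design points; it is not the general-dimension procedure analysed in the paper, and the function name `isotonic_fit` is hypothetical.

```python
def isotonic_fit(y):
    """Least squares nondecreasing fit to a sequence (pool adjacent violators).

    Each block stores [sum of values, number of values]; whenever a new
    block's mean falls below the previous block's mean, the two are merged.
    """
    blocks = []
    for value in y:
        blocks.append([value, 1])
        # Merge backwards while the monotonicity constraint is violated.
        while (
            len(blocks) > 1
            and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted
```

The fitted values are piecewise constant, which is why piecewise constant truths (the $k$ hyperrectangles of the abstract) are the natural setting for adaptive rates.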
era Generalized cluster trees and singular measures By projecteuclid.org Published On :: Tue, 21 May 2019 04:00 EDT Yen-Chi Chen. Source: The Annals of Statistics, Volume 47, Number 4, 2174--2203. Abstract: In this paper we study the $\alpha$-cluster tree ($\alpha$-tree) under both singular and nonsingular measures. The $\alpha$-tree uses probability contents within a set created by the ordering of points to construct a cluster tree so that it is well defined even for singular measures. We first derive the convergence rate for a density level set around critical points, which leads to the convergence rate for estimating an $\alpha$-tree under nonsingular measures. For singular measures, we study how the kernel density estimator (KDE) behaves and prove that the KDE is not uniformly consistent but pointwise consistent after rescaling. We further prove that the estimated $\alpha$-tree fails to converge in the $L_{\infty}$ metric but is still consistent under the integrated distance. We also observe a new type of critical point, the dimensional critical point (DCP), of a singular measure. DCPs are points that contribute to cluster tree topology but cannot be defined using density gradient. Building on the analysis of the KDE and DCPs, we prove the topological consistency of an estimated $\alpha$-tree. Full Article
era interoperability By looselycoupled.com Published On :: 2003-08-07T17:00:00-00:00 Ability to work with each other. In the loosely coupled environment of a service-oriented architecture, separate resources don't need to know the details of how they each work, but they need to have enough common ground to reliably exchange messages without error or misunderstanding. Standardized specifications go a long way towards creating this common ground, but differences in implementation may still lead to breakdowns in communication. Interoperability is when services can interact with each other without encountering such problems. Full Article
era Correction: Sensitivity analysis for an unobserved moderator in RCT-to-target-population generalization of treatment effects By projecteuclid.org Published On :: Wed, 15 Apr 2020 22:05 EDT Trang Quynh Nguyen, Elizabeth A. Stuart. Source: The Annals of Applied Statistics, Volume 14, Number 1, 518--520. Full Article
era A hierarchical dependent Dirichlet process prior for modelling bird migration patterns in the UK By projecteuclid.org Published On :: Wed, 15 Apr 2020 22:05 EDT Alex Diana, Eleni Matechou, Jim Griffin, Alison Johnston. Source: The Annals of Applied Statistics, Volume 14, Number 1, 473--493. Abstract: Environmental changes in recent years have been linked to phenological shifts which in turn are linked to the survival of species. The work in this paper is motivated by capture-recapture data on blackcaps collected by the British Trust for Ornithology as part of the Constant Effort Sites monitoring scheme. Blackcaps overwinter abroad and migrate to the UK annually for breeding purposes. We propose a novel Bayesian nonparametric approach for expressing the bivariate density of individual arrival and departure times at different sites across a number of years as a mixture model. The new model combines the ideas of the hierarchical and the dependent Dirichlet process, allowing the estimation of site-specific weights and year-specific mixture locations, which are modelled as functions of environmental covariates using a multivariate extension of the Gaussian process. The proposed modelling framework is extremely general and can be used in any context where multivariate density estimation is performed jointly across different groups and in the presence of a continuous covariate. Full Article
era Feature selection for generalized varying coefficient mixed-effect models with application to obesity GWAS By projecteuclid.org Published On :: Wed, 15 Apr 2020 22:05 EDT Wanghuan Chu, Runze Li, Jingyuan Liu, Matthew Reimherr. Source: The Annals of Applied Statistics, Volume 14, Number 1, 276--298. Abstract: Motivated by an empirical analysis of data from a genome-wide association study on obesity, measured by the body mass index (BMI), we propose a two-step gene-detection procedure for generalized varying coefficient mixed-effects models with ultrahigh dimensional covariates. The proposed procedure selects significant single nucleotide polymorphisms (SNPs) impacting the mean BMI trend, some of which have already been biologically proven to be “fat genes.” The method also discovers SNPs that significantly influence the age-dependent variability of BMI. The proposed procedure takes into account individual variations of genetic effects and can also be directly applied to longitudinal data with continuous, binary or count responses. We employ Monte Carlo simulation studies to assess the performance of the proposed method and further carry out causal inference for the selected SNPs. Full Article
era A hierarchical Bayesian model for predicting ecological interactions using scaled evolutionary relationships By projecteuclid.org Published On :: Wed, 15 Apr 2020 22:05 EDT Mohamad Elmasri, Maxwell J. Farrell, T. Jonathan Davies, David A. Stephens. Source: The Annals of Applied Statistics, Volume 14, Number 1, 221--240. Abstract: Identifying undocumented or potential future interactions among species is a challenge facing modern ecologists. Recent link prediction methods rely on trait data; however, large species interaction databases are typically sparse and covariates are limited to only a fraction of species. On the other hand, evolutionary relationships, encoded as phylogenetic trees, can act as proxies for underlying traits and historical patterns of parasite sharing among hosts. We show that, using a network-based conditional model, phylogenetic information provides strong predictive power in a recently published global database of host-parasite interactions. By scaling the phylogeny using an evolutionary model, our method allows for biological interpretation often missing from latent variable models. To further improve on the phylogeny-only model, we combine a hierarchical Bayesian latent score framework for bipartite graphs that accounts for the number of interactions per species with host dependence informed by phylogeny. Combining the two information sources yields significant improvement in predictive accuracy over each of the submodels alone. As many interaction networks are constructed from presence-only data, we extend the model by integrating a correction mechanism for missing interactions which proves valuable in reducing uncertainty in unobserved interactions. Full Article
era Surface temperature monitoring in liver procurement via functional variance change-point analysis By projecteuclid.org Published On :: Wed, 15 Apr 2020 22:05 EDT Zhenguo Gao, Pang Du, Ran Jin, John L. Robertson. Source: The Annals of Applied Statistics, Volume 14, Number 1, 143--159. Abstract: Liver procurement experiments with surface-temperature monitoring motivated Gao et al. (J. Amer. Statist. Assoc. 114 (2019) 773–781) to develop a variance change-point detection method under a smoothly-changing mean trend. However, the spotwise change points yielded by their method do not offer immediate information to surgeons since an organ is often transplanted as a whole or in part. We develop a new practical method that can analyze a defined portion of the organ surface at a time. It also provides a novel addition to the developing field of functional data monitoring. Furthermore, a numerical challenge emerges in simultaneously modeling the variance functions of 2D locations and the mean function of location and time. The respective sample sizes, on the scales of 10,000 and 1,000,000 for modeling these functions, make standard spline estimation too costly to be useful. We introduce a multistage subsampling strategy with steps guided by quickly-computable preliminary statistical measures. Extensive simulations show that the new method can efficiently reduce the computational cost and provide reasonable parameter estimates. Application of the new method to our liver surface temperature monitoring data shows its effectiveness in providing accurate status change information for a selected portion of the organ in the experiment. Full Article
era A general theory for preferential sampling in environmental networks By projecteuclid.org Published On :: Wed, 27 Nov 2019 22:01 EST Joe Watson, James V. Zidek, Gavin Shaddick. Source: The Annals of Applied Statistics, Volume 13, Number 4, 2662--2700. Abstract: This paper presents a general model framework for detecting the preferential sampling of environmental monitors recording an environmental process across space and/or time. This is achieved by considering the joint distribution of an environmental process with a site-selection process that considers where and when sites are placed to measure the process. The environmental process may be spatial, temporal or spatio-temporal in nature. By sharing random effects between the two processes, the joint model is able to establish whether site placement was stochastically dependent on the environmental process under study. Furthermore, if stochastic dependence is identified between the two processes, then inferences about the probability distribution of the spatio-temporal process will change, as will predictions made of the process across space and time. The embedding into a spatio-temporal framework also allows for the modelling of the dynamic site-selection process itself. Real-world factors affecting both the size and location of the network can be easily modelled and quantified. Depending upon the choice of the population of locations considered for selection across space and time under the site-selection process, different insights about the precise nature of preferential sampling can be obtained. The general framework developed in the paper is designed to be easily and quickly fit using the R-INLA package. We apply this framework to a case study involving particulate air pollution over the UK where a major reduction in the size of a monitoring network through time occurred. 
It is demonstrated that a significant response-biased reduction in the air quality monitoring network occurred, namely the relocation of monitoring sites to locations with the highest pollution levels, and the routine removal of sites at locations with the lowest. We also show that the network was consistently unrepresentative of the levels of particulate matter seen across much of GB throughout the operating life of the network. Finally, we show that this may have led to a severe overreporting of the population-average exposure levels experienced across GB. This could have great impacts on estimates of the health effects of black smoke levels. Full Article
era Hierarchical infinite factor models for improving the prediction of surgical complications for geriatric patients By projecteuclid.org Published On :: Wed, 27 Nov 2019 22:01 EST Elizabeth Lorenzi, Ricardo Henao, Katherine Heller. Source: The Annals of Applied Statistics, Volume 13, Number 4, 2637--2661. Abstract: Nearly a third of all surgeries performed in the United States occur for patients over the age of 65; these older adults experience a higher rate of postoperative morbidity and mortality. To improve the care for these patients, we aim to identify and characterize high risk geriatric patients to send to a specialized perioperative clinic while leveraging the overall surgical population to improve learning. To this end, we develop a hierarchical infinite latent factor model (HIFM) to appropriately account for the covariance structure across subpopulations in data. We propose a novel Hierarchical Dirichlet Process shrinkage prior on the loadings matrix that flexibly captures the underlying structure of our data while sharing information across subpopulations to improve inference and prediction. The stick-breaking construction of the prior assumes an infinite number of factors and allows for each subpopulation to utilize different subsets of the factor space and select the number of factors needed to best explain the variation. We develop the model into a latent factor regression method that excels at prediction and inference of regression coefficients. Simulations validate this strong performance compared to baseline methods. We apply this work to the problem of predicting surgical complications using electronic health record data for geriatric patients and all surgical patients at Duke University Health System (DUHS). The motivating application demonstrates the improved predictive performance when using HIFM in both area under the ROC curve and area under the PR curve while providing interpretable coefficients that may lead to actionable interventions. 
Full Article
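The stick-breaking construction mentioned in the abstract above can be illustrated with a minimal sketch. This is the generic (truncated) Dirichlet-process stick-breaking scheme, not the Hierarchical Dirichlet Process shrinkage prior of the HIFM paper; the function name `stick_breaking_weights`, the concentration value `alpha=2.0`, and the truncation at 20 sticks are assumptions chosen only for illustration.

```python
import random


def stick_breaking_weights(alpha, num_sticks, seed=0):
    """Truncated stick-breaking weights (illustrative sketch only).

    Draw v_k ~ Beta(1, alpha) and set w_k = v_k * prod_{j<k} (1 - v_j).
    Returns the first `num_sticks` weights and the unbroken remainder.
    """
    rng = random.Random(seed)
    weights = []
    remaining = 1.0  # length of the stick not yet assigned
    for _ in range(num_sticks):
        v = rng.betavariate(1.0, alpha)  # fraction broken off this step
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights, remaining


# Hypothetical settings: alpha and the truncation level are not from the paper.
weights, leftover = stick_breaking_weights(alpha=2.0, num_sticks=20)
print(sum(weights) + leftover)  # weights plus remainder account for the whole stick
```

Because each step takes a fraction of whatever stick remains, the weights tend to decay, so only a modest number of components carry appreciable mass even though the construction is formally infinite.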
era Bayesian indicator variable selection to incorporate hierarchical overlapping group structure in multi-omics applications By projecteuclid.org Published On :: Wed, 27 Nov 2019 22:01 EST Li Zhu, Zhiguang Huo, Tianzhou Ma, Steffi Oesterreich, George C. Tseng. Source: The Annals of Applied Statistics, Volume 13, Number 4, 2611--2636. Abstract: Variable selection is a pervasive problem in modern high-dimensional data analysis where the number of features often exceeds the sample size (a.k.a. small-n-large-p problem). Incorporation of group structure knowledge to improve variable selection has been widely studied. Here, we consider prior knowledge of a hierarchical overlapping group structure to improve variable selection in the regression setting. In genomics applications, for instance, a biological pathway contains tens to hundreds of genes and a gene can be mapped to multiple experimentally measured features (such as its mRNA expression, copy number variation and methylation levels of possibly multiple sites). In addition to the hierarchical structure, the groups at the same level may overlap (e.g., two pathways can share common genes). Incorporating such hierarchical overlapping groups in the traditional penalized regression setting remains a difficult optimization problem. Alternatively, we propose a Bayesian indicator model that can elegantly serve the purpose. We evaluate the model in simulations and two breast cancer examples, and demonstrate its superior performance over existing models. The result not only enhances prediction accuracy but also improves variable selection and model interpretation that lead to deeper biological insight into the disease. Full Article