ul Models of tree and stand dynamics : theory, formulation and application By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Author: Mäkelä, Annikki, author. Callnumber: Online ISBN: 9783030357610 Full Article
ul Microbial endophytes : prospects for sustainable agriculture By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Callnumber: Online ISBN: 0128187255 Full Article
ul Manual of valvular heart disease By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Callnumber: Online ISBN: 9781496310125 paperback Full Article
ul Machine learning in aquaculture : hunger classification of Lates calcarifer By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Author: Mohd Razman, Mohd Azraai, author. Callnumber: Online ISBN: 9789811522376 (electronic bk.) Full Article
ul LGBTQ cultures : what health care professionals need to know about sexual and gender diversity By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Author: Eliason, Michele J., author. Callnumber: Online ISBN: 9781496394606 paperback Full Article
ul Intelligent wavelet based techniques for advanced multimedia applications By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Author: Singh, Rajiv, author. Callnumber: Online ISBN: 9783030318734 (electronic bk.) Full Article
ul Insect sex pheromone research and beyond : from molecules to robots By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Callnumber: Online ISBN: 9789811530821 (electronic bk.) Full Article
ul Insect metamorphosis : from natural history to regulation of development and evolution By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Author: Bellés, X., author. Callnumber: Online ISBN: 9780128130216 Full Article
ul Implants in the aesthetic zone : a guide for treatment of the partially edentulous patient By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Callnumber: Online ISBN: 9783319726014 (electronic bk.) Full Article
ul Imaging of the temporomandibular joint By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Callnumber: Online ISBN: 9783319994680 (electronic book) Full Article
ul Hepatitis B virus infection : molecular virology to antiviral drugs By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Callnumber: Online ISBN: 9789811391514 (electronic bk.) Full Article
ul Handbook of biochemistry and molecular biology By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Callnumber: Online ISBN: 9781315314433 (electronic bk.) Full Article
ul Genetic and metabolic engineering for improved biofuel production from lignocellulosic biomass By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Callnumber: Online ISBN: 9780128179543 (electronic bk.) Full Article
ul Frailty and cardiovascular diseases : research into an elderly population By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Callnumber: Online ISBN: 9783030333300 (electronic bk.) Full Article
ul Encyclopedia of signaling molecules By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Callnumber: Online ISBN: 9781461464389 (electronic bk.) Full Article
ul Encyclopedia of molecular pharmacology By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Callnumber: Online ISBN: 9783030215736 (electronic bk.) Full Article
ul Drying atlas : drying kinetics and quality of agricultural products By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Author: Mühlbauer, Werner, author. Callnumber: Online ISBN: 9780128181638 (electronic bk.) Full Article
ul Cullin-RING ligases and protein neddylation : biology and therapeutics By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Callnumber: Online ISBN: 9789811510250 (electronic bk.) Full Article
ul Characterization of nanoencapsulated food ingredients By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Callnumber: Online ISBN: 9780128156681 (electronic bk.) Full Article
ul Cellular internet of things : from massive deployments to critical 5G applications By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Author: Liberg, Olof, 1943- author. Callnumber: Online ISBN: 9780081029039 (electronic bk.) Full Article
ul Brassica improvement : molecular, genetics and genomic perspectives By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Callnumber: Online ISBN: 9783030346942 (electronic bk.) Full Article
ul Botulinum toxins, fillers and related substances By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Callnumber: Online ISBN: 9783319168029 (electronic bk.) Full Article
ul Berquist's musculoskeletal imaging companion By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Author: Peterson, Jeffrey J., author. Callnumber: Online ISBN: 9781496314994 Full Article
ul Atlas of ulcers in systemic sclerosis : diagnosis and management By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Callnumber: Online ISBN: 9783319984773 (electronic bk.) Full Article
ul Animal agriculture : sustainability, challenges and innovations By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Callnumber: Online ISBN: 9780128170526 Full Article
ul Domestic Gag Rule Reduces Contraceptive Access For Nearly 370,000... By www.prweb.com Published On :: According to data released by Power to Decide, an estimated 369,960 New Jersey women of reproductive age (13-44) in need of publicly funded contraception live in counties impacted by the... (PRWeb April 09, 2020) Read the full story at https://www.prweb.com/releases/domestic_gag_rule_reduces_contraceptive_access_for_nearly_370_000_women_living_in_new_jersey/prweb17040987.htm Full Article
ul Colorado Court Rules STRmix Is “Relevant and Reliable” Practice for... By www.prweb.com Published On :: Defendant’s Motion to Exclude Expert Testimony regarding evidence generated by STRmix denied. (PRWeb May 08, 2020) Read the full story at https://www.prweb.com/releases/colorado_court_rules_strmix_is_relevant_and_reliable_practice_for_interpreting_likelihood_ratios/prweb17101548.htm Full Article
ul Concentration and consistency results for canonical and curved exponential-family models of random graphs By projecteuclid.org Published On :: Mon, 17 Feb 2020 04:02 EST Michael Schweinberger, Jonathan Stewart. Source: The Annals of Statistics, Volume 48, Number 1, 374--396.Abstract: Statistical inference for exponential-family models of random graphs with dependent edges is challenging. We stress the importance of additional structure and show that additional structure facilitates statistical inference. A simple example of a random graph with additional structure is a random graph with neighborhoods and local dependence within neighborhoods. We develop the first concentration and consistency results for maximum likelihood and $M$-estimators of a wide range of canonical and curved exponential-family models of random graphs with local dependence. All results are nonasymptotic and applicable to random graphs with finite populations of nodes, although asymptotic consistency results can be obtained as well. In addition, we show that additional structure can facilitate subgraph-to-graph estimation, and present concentration results for subgraph-to-graph estimators. As an application, we consider popular curved exponential-family models of random graphs, with local dependence induced by transitivity and parameter vectors whose dimensions depend on the number of nodes. Full Article
ul The multi-armed bandit problem: An efficient nonparametric solution By projecteuclid.org Published On :: Mon, 17 Feb 2020 04:02 EST Hock Peng Chan. Source: The Annals of Statistics, Volume 48, Number 1, 346--373.Abstract: Lai and Robbins (Adv. in Appl. Math. 6 (1985) 4–22) and Lai (Ann. Statist. 15 (1987) 1091–1114) provided efficient parametric solutions to the multi-armed bandit problem, showing that arm allocation via upper confidence bounds (UCB) achieves minimum regret. These bounds are constructed from the Kullback–Leibler information of the reward distributions, estimated from specified parametric families. In recent years, there has been renewed interest in the multi-armed bandit problem due to new applications in machine learning algorithms and data analytics. Nonparametric arm allocation procedures like $\epsilon$-greedy, Boltzmann exploration and BESA were studied, and modified versions of the UCB procedure were also analyzed under nonparametric settings. However, unlike UCB these nonparametric procedures are not efficient under general parametric settings. In this paper, we propose efficient nonparametric procedures. Full Article
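As a rough illustration of the upper-confidence-bound idea mentioned in the abstract above, here is a minimal sketch of the textbook UCB1 index rule (sample mean plus an exploration bonus). This is an assumption made for illustration only: it is neither the Kullback–Leibler-based construction of Lai and Robbins nor the nonparametric procedure the paper proposes, and the function name `ucb1`, the Bernoulli arms and the horizon are all hypothetical.

```python
import math
import random


def ucb1(reward_fns, horizon):
    """Run the textbook UCB1 rule and return how often each arm was pulled.

    reward_fns: list of zero-argument callables returning a reward in [0, 1].
    horizon: total number of pulls.
    """
    n_arms = len(reward_fns)
    counts = [0] * n_arms   # number of pulls per arm
    sums = [0.0] * n_arms   # cumulative reward per arm
    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Pull every arm once so that each index is well defined.
            arm = t - 1
        else:
            # Index = sample mean + exploration bonus sqrt(2 ln t / n_a).
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = reward_fns[arm]()
        counts[arm] += 1
        sums[arm] += reward
    return counts


# Hypothetical example: two Bernoulli arms with success rates 0.2 and 0.8.
rng = random.Random(0)
arms = [
    lambda: 1.0 if rng.random() < 0.2 else 0.0,
    lambda: 1.0 if rng.random() < 0.8 else 0.0,
]
counts = ucb1(arms, horizon=2000)
print(counts)  # the second arm should receive most of the pulls
```

The exploration bonus shrinks as an arm is pulled more often, so the rule gradually concentrates on the arm with the higher observed mean while still revisiting the other arm occasionally.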
ul Spectral and matrix factorization methods for consistent community detection in multi-layer networks By projecteuclid.org Published On :: Mon, 17 Feb 2020 04:02 EST Subhadeep Paul, Yuguo Chen. Source: The Annals of Statistics, Volume 48, Number 1, 230--250.Abstract: We consider the problem of estimating a consensus community structure by combining information from multiple layers of a multi-layer network using methods based on the spectral clustering or a low-rank matrix factorization. As a general theme, these “intermediate fusion” methods involve obtaining a low column rank matrix by optimizing an objective function and then using the columns of the matrix for clustering. However, the theoretical properties of these methods remain largely unexplored. In the absence of statistical guarantees on the objective functions, it is difficult to determine if the algorithms optimizing the objectives will return good community structures. We investigate the consistency properties of the global optimizer of some of these objective functions under the multi-layer stochastic blockmodel. For this purpose, we derive several new asymptotic results showing consistency of the intermediate fusion techniques along with the spectral clustering of mean adjacency matrix under a high dimensional setup, where the number of nodes, the number of layers and the number of communities of the multi-layer graph grow. Our numerical study shows that the intermediate fusion techniques outperform late fusion methods, namely spectral clustering on aggregate spectral kernel and module allegiance matrix in sparse networks, while they outperform the spectral clustering of mean adjacency matrix in multi-layer networks that contain layers with both homophilic and heterophilic communities. Full Article
ul New $G$-formula for the sequential causal effect and blip effect of treatment in sequential causal inference By projecteuclid.org Published On :: Mon, 17 Feb 2020 04:02 EST Xiaoqin Wang, Li Yin. Source: The Annals of Statistics, Volume 48, Number 1, 138--160.Abstract: In sequential causal inference, two types of causal effects are of practical interest, namely, the causal effect of the treatment regime (called the sequential causal effect) and the blip effect of treatment on the potential outcome after the last treatment. The well-known $G$-formula expresses these causal effects in terms of the standard parameters. In this article, we obtain a new $G$-formula that expresses these causal effects in terms of the point observable effects of treatments similar to treatment in the framework of single-point causal inference. Based on the new $G$-formula, we estimate these causal effects by maximum likelihood via point observable effects with methods extended from single-point causal inference. We are able to increase precision of the estimation without introducing biases by an unsaturated model imposing constraints on the point observable effects. We are also able to reduce the number of point observable effects in the estimation by treatment assignment conditions. Full Article
ul On optimal designs for nonregular models By projecteuclid.org Published On :: Wed, 30 Oct 2019 22:03 EDT Yi Lin, Ryan Martin, Min Yang. Source: The Annals of Statistics, Volume 47, Number 6, 3335--3359.Abstract: Classically, Fisher information is the relevant object in defining optimal experimental designs. However, for models that lack certain regularity, the Fisher information does not exist, and hence, there is no notion of design optimality available in the literature. This article seeks to fill the gap by proposing a so-called Hellinger information, which generalizes Fisher information in the sense that the two measures agree in regular problems, but the former also exists for certain types of nonregular problems. We derive a Hellinger information inequality, showing that Hellinger information defines a lower bound on the local minimax risk of estimators. This provides a connection between features of the underlying model—in particular, the design—and the performance of estimators, motivating the use of this new Hellinger information for nonregular optimal design problems. Hellinger optimal designs are derived for several nonregular regression problems, with numerical results empirically demonstrating the efficiency of these designs compared to alternatives. Full Article
ul Adaptive estimation of the rank of the coefficient matrix in high-dimensional multivariate response regression models By projecteuclid.org Published On :: Wed, 30 Oct 2019 22:03 EDT Xin Bing, Marten H. Wegkamp. Source: The Annals of Statistics, Volume 47, Number 6, 3157--3184.Abstract: We consider the multivariate response regression problem with a regression coefficient matrix of low, unknown rank. In this setting, we analyze a new criterion for selecting the optimal reduced rank. This criterion differs notably from the one proposed in Bunea, She and Wegkamp ( Ann. Statist. 39 (2011) 1282–1309) in that it does not require estimation of the unknown variance of the noise, nor does it depend on a delicate choice of a tuning parameter. We develop an iterative, fully data-driven procedure, that adapts to the optimal signal-to-noise ratio. This procedure finds the true rank in a few steps with overwhelming probability. At each step, our estimate increases, while at the same time it does not exceed the true rank. Our finite sample results hold for any sample size and any dimension, even when the number of responses and of covariates grow much faster than the number of observations. We perform an extensive simulation study that confirms our theoretical findings. The new method performs better and is more stable than the procedure of Bunea, She and Wegkamp ( Ann. Statist. 39 (2011) 1282–1309) in both low- and high-dimensional settings. Full Article
ul A unified treatment of multiple testing with prior knowledge using the p-filter By projecteuclid.org Published On :: Fri, 02 Aug 2019 22:04 EDT Aaditya K. Ramdas, Rina F. Barber, Martin J. Wainwright, Michael I. Jordan. Source: The Annals of Statistics, Volume 47, Number 5, 2790--2821.Abstract: There is a significant literature on methods for incorporating knowledge into multiple testing procedures so as to improve their power and precision. Some common forms of prior knowledge include (a) beliefs about which hypotheses are null, modeled by nonuniform prior weights; (b) differing importances of hypotheses, modeled by differing penalties for false discoveries; (c) multiple arbitrary partitions of the hypotheses into (possibly overlapping) groups and (d) knowledge of independence, positive or arbitrary dependence between hypotheses or groups, suggesting the use of more aggressive or conservative procedures. We present a unified algorithmic framework called p-filter for global null testing and false discovery rate (FDR) control that allows the scientist to incorporate all four types of prior knowledge (a)–(d) simultaneously, recovering a variety of known algorithms as special cases. Full Article
ul Distance multivariance: New dependence measures for random vectors By projecteuclid.org Published On :: Fri, 02 Aug 2019 22:04 EDT Björn Böttcher, Martin Keller-Ressel, René L. Schilling. Source: The Annals of Statistics, Volume 47, Number 5, 2757--2789.Abstract: We introduce two new measures for the dependence of $n\ge 2$ random variables: distance multivariance and total distance multivariance. Both measures are based on the weighted $L^{2}$-distance of quantities related to the characteristic functions of the underlying random variables. These extend distance covariance (introduced by Székely, Rizzo and Bakirov) from pairs of random variables to $n$-tuplets of random variables. We show that total distance multivariance can be used to detect the independence of $n$ random variables and has a simple finite-sample representation in terms of distance matrices of the sample points, where distance is measured by a continuous negative definite function. Under some mild moment conditions, this leads to a test for independence of multiple random vectors which is consistent against all alternatives. Full Article
ul The two-to-infinity norm and singular subspace geometry with applications to high-dimensional statistics By projecteuclid.org Published On :: Fri, 02 Aug 2019 22:04 EDT Joshua Cape, Minh Tang, Carey E. Priebe. Source: The Annals of Statistics, Volume 47, Number 5, 2405--2439.Abstract: The singular value matrix decomposition plays a ubiquitous role throughout statistics and related fields. Myriad applications including clustering, classification, and dimensionality reduction involve studying and exploiting the geometric structure of singular values and singular vectors. This paper provides a novel collection of technical and theoretical tools for studying the geometry of singular subspaces using the two-to-infinity norm. Motivated by preliminary deterministic Procrustes analysis, we consider a general matrix perturbation setting in which we derive a new Procrustean matrix decomposition. Together with flexible machinery developed for the two-to-infinity norm, this allows us to conduct a refined analysis of the induced perturbation geometry with respect to the underlying singular vectors even in the presence of singular value multiplicity. Our analysis yields singular vector entrywise perturbation bounds for a range of popular matrix noise models, each of which has a meaningful associated statistical inference task. In addition, we demonstrate how the two-to-infinity norm is the preferred norm in certain statistical settings. Specific applications discussed in this paper include covariance estimation, singular subspace recovery, and multiple graph inference. Both our Procrustean matrix decomposition and the technical machinery developed for the two-to-infinity norm may be of independent interest. Full Article
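For readers unfamiliar with the norm named in the abstract above: the two-to-infinity norm of a matrix equals the largest Euclidean norm among its rows, so it controls row-wise (entry-level) error rather than total error. The short sketch below computes it with the standard library and compares it with the Frobenius norm; the function names and the example matrix are illustrative assumptions, not taken from the paper.

```python
import math


def two_to_infinity_norm(matrix):
    """Largest Euclidean norm over the rows of a matrix given as a list of rows."""
    return max(math.sqrt(sum(x * x for x in row)) for row in matrix)


def frobenius_norm(matrix):
    """Square root of the sum of all squared entries."""
    return math.sqrt(sum(x * x for row in matrix for x in row))


# Hypothetical 3 x 2 matrix; its rows have Euclidean norms 5, 1 and 2.
A = [[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]]
print(two_to_infinity_norm(A))  # 5.0
print(frobenius_norm(A))        # sqrt(30), never smaller than the value above
```

Because every row norm is bounded by the Frobenius norm, a bound in the two-to-infinity norm is never weaker than the same bound in the Frobenius norm, and can be much sharper when the error is spread evenly across rows.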
ul Spectral method and regularized MLE are both optimal for top-$K$ ranking By projecteuclid.org Published On :: Tue, 21 May 2019 04:00 EDT Yuxin Chen, Jianqing Fan, Cong Ma, Kaizheng Wang. Source: The Annals of Statistics, Volume 47, Number 4, 2204--2235.Abstract: This paper is concerned with the problem of top-$K$ ranking from pairwise comparisons. Given a collection of $n$ items and a few pairwise comparisons across them, one wishes to identify the set of $K$ items that receive the highest ranks. To tackle this problem, we adopt the logistic parametric model—the Bradley–Terry–Luce model, where each item is assigned a latent preference score, and where the outcome of each pairwise comparison depends solely on the relative scores of the two items involved. Recent works have made significant progress toward characterizing the performance (e.g., the mean square error for estimating the scores) of several classical methods, including the spectral method and the maximum likelihood estimator (MLE). However, where they stand regarding top-$K$ ranking remains unsettled. We demonstrate that under a natural random sampling model, the spectral method alone, or the regularized MLE alone, is minimax optimal in terms of the sample complexity—the number of paired comparisons needed to ensure exact top-$K$ identification, for the fixed dynamic range regime. This is accomplished via optimal control of the entrywise error of the score estimates. We complement our theoretical studies by numerical experiments, confirming that both methods yield low entrywise errors for estimating the underlying scores. Our theory is established via a novel leave-one-out trick, which proves effective for analyzing both iterative and noniterative procedures. Along the way, we derive an elementary eigenvector perturbation bound for probability transition matrices, which parallels the Davis–Kahan $\sin\Theta$ theorem for symmetric matrices. This also allows us to close the gap between the $\ell_{2}$ error upper bound for the spectral method and the minimax lower limit. Full Article
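The Bradley–Terry–Luce model described in the abstract above can be made concrete with a small sketch. It shows only the comparison model and the top-$K$ selection step applied to scores that are already known; it does not implement the spectral method or the regularized MLE studied in the paper. The function names and the example scores are hypothetical.

```python
import math


def btl_win_probability(score_i, score_j):
    """Probability that item i beats item j under the Bradley-Terry-Luce model.

    Scores are on the log scale, so the probability is a logistic function
    of the score difference: exp(s_i) / (exp(s_i) + exp(s_j)).
    """
    return 1.0 / (1.0 + math.exp(-(score_i - score_j)))


def top_k(scores, k):
    """Indices of the k items with the highest scores, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Hypothetical latent scores for four items.
scores = [0.1, 2.0, -1.0, 1.5]
print(btl_win_probability(scores[1], scores[2]))  # item 1 is a strong favourite
print(top_k(scores, 2))  # [1, 3]
```

In practice the scores are unknown and must be estimated from observed comparisons; exact top-$K$ recovery then depends on how small the entrywise estimation error is relative to the gap between the $K$-th and $(K+1)$-th scores.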
ul Generalized cluster trees and singular measures By projecteuclid.org Published On :: Tue, 21 May 2019 04:00 EDT Yen-Chi Chen. Source: The Annals of Statistics, Volume 47, Number 4, 2174--2203.Abstract: In this paper we study the $\alpha$-cluster tree ($\alpha$-tree) under both singular and nonsingular measures. The $\alpha$-tree uses probability contents within a set created by the ordering of points to construct a cluster tree so that it is well defined even for singular measures. We first derive the convergence rate for a density level set around critical points, which leads to the convergence rate for estimating an $\alpha$-tree under nonsingular measures. For singular measures, we study how the kernel density estimator (KDE) behaves and prove that the KDE is not uniformly consistent but pointwise consistent after rescaling. We further prove that the estimated $\alpha$-tree fails to converge in the $L_{\infty}$ metric but is still consistent under the integrated distance. We also observe a new type of critical points—the dimensional critical points (DCPs)—of a singular measure. DCPs are points that contribute to cluster tree topology but cannot be defined using density gradient. Building on the analysis of the KDE and DCPs, we prove the topological consistency of an estimated $\alpha$-tree. Full Article
ul Bayes and empirical-Bayes multiplicity adjustment in the variable-selection problem By projecteuclid.org Published On :: Thu, 05 Aug 2010 15:41 EDT James G. Scott, James O. Berger. Source: Ann. Statist., Volume 38, Number 5, 2587--2619.Abstract: This paper studies the multiplicity-correction effect of standard Bayesian variable-selection priors in linear regression. Our first goal is to clarify when, and how, multiplicity correction happens automatically in Bayesian analysis, and to distinguish this correction from the Bayesian Ockham’s-razor effect. Our second goal is to contrast empirical-Bayes and fully Bayesian approaches to variable selection through examples, theoretical results and simulations. Considerable differences between the two approaches are found. In particular, we prove a theorem that characterizes a surprising asymptotic discrepancy between fully Bayes and empirical Bayes. This discrepancy arises from a different source than the failure to account for hyperparameter uncertainty in the empirical-Bayes estimate. Indeed, even at the extreme, when the empirical-Bayes estimate converges asymptotically to the true variable-inclusion probability, the potential for a serious difference remains. Full Article
ul granularity By looselycoupled.com Published On :: 2004-09-28T15:00:00-00:00 How small the pieces are. When a system is split into components, it's important to get the right degree of componentization. Small, fine-grained components give much greater flexibility in assembling precisely the right combination of functionality, but they are more difficult to co-ordinate. Much larger, coarse-grained components are easier to manage but may become too unwieldy. Performance and management considerations tend to favor the use of more coarsely grained messages in a service oriented architecture, whereas earlier generations of distributed computing have preferred a much finer level of granularity. Full Article
ul Correction: Sensitivity analysis for an unobserved moderator in RCT-to-target-population generalization of treatment effects By projecteuclid.org Published On :: Wed, 15 Apr 2020 22:05 EDT Trang Quynh Nguyen, Elizabeth A. Stuart. Source: The Annals of Applied Statistics, Volume 14, Number 1, 518--520. Full Article
ul A comparison of principal component methods between multiple phenotype regression and multiple SNP regression in genetic association studies By projecteuclid.org Published On :: Wed, 15 Apr 2020 22:05 EDT Zhonghua Liu, Ian Barnett, Xihong Lin. Source: The Annals of Applied Statistics, Volume 14, Number 1, 433--451.Abstract: Principal component analysis (PCA) is a popular method for dimension reduction in unsupervised multivariate analysis. However, existing ad hoc uses of PCA in both multivariate regression (multiple outcomes) and multiple regression (multiple predictors) lack theoretical justification. The differences in the statistical properties of PCAs in these two regression settings are not well understood. In this paper we provide theoretical results on the power of PCA in genetic association testings in both multiple phenotype and SNP-set settings. The multiple phenotype setting refers to the case when one is interested in studying the association between a single SNP and multiple phenotypes as outcomes. The SNP-set setting refers to the case when one is interested in studying the association between multiple SNPs in a SNP set and a single phenotype as the outcome. We demonstrate analytically that the properties of the PC-based analysis in these two regression settings are substantially different. We show that the lower order PCs, that is, PCs with large eigenvalues, are generally preferred and lead to a higher power in the SNP-set setting, while the higher-order PCs, that is, PCs with small eigenvalues, are generally preferred in the multiple phenotype setting. We also investigate the power of three other popular statistical methods, the Wald test, the variance component test and the minimum $p$-value test, in both multiple phenotype and SNP-set settings. We use theoretical power, simulation studies, and two real data analyses to validate our findings. Full Article
ul Regression for copula-linked compound distributions with applications in modeling aggregate insurance claims By projecteuclid.org Published On :: Wed, 15 Apr 2020 22:05 EDT Peng Shi, Zifeng Zhao. Source: The Annals of Applied Statistics, Volume 14, Number 1, 357--380.Abstract: In actuarial research a task of particular interest and importance is to predict the loss cost for individual risks so that informative decisions are made in various insurance operations such as underwriting, ratemaking and capital management. The loss cost is typically viewed to follow a compound distribution where the summation of the severity variables is stopped by the frequency variable. A challenging issue in modeling such outcomes is to accommodate the potential dependence between the number of claims and the size of each individual claim. In this article we introduce a novel regression framework for compound distributions that uses a copula to accommodate the association between the frequency and the severity variables and, thus, allows for arbitrary dependence between the two components. We further show that the new model is very flexible and is easily modified to account for incomplete data due to censoring or truncation. The flexibility of the proposed model is illustrated using both simulated and real data sets. In the analysis of granular claims data from property insurance, we find substantive negative relationship between the number and the size of insurance claims. In addition, we demonstrate that ignoring the frequency-severity association could lead to biased decision-making in insurance operations. Full Article
ul Optimal asset allocation with multivariate Bayesian dynamic linear models By projecteuclid.org Published On :: Wed, 15 Apr 2020 22:05 EDT Jared D. Fisher, Davide Pettenuzzo, Carlos M. Carvalho. Source: The Annals of Applied Statistics, Volume 14, Number 1, 299--338.Abstract: We introduce a fast, closed-form, simulation-free method to model and forecast multiple asset returns and employ it to investigate the optimal ensemble of features to include when jointly predicting monthly stock and bond excess returns. Our approach builds on the Bayesian dynamic linear models of West and Harrison ( Bayesian Forecasting and Dynamic Models (1997) Springer), and it can objectively determine, through a fully automated procedure, both the optimal set of regressors to include in the predictive system and the degree to which the model coefficients, volatilities and covariances should vary over time. When applied to a portfolio of five stock and bond returns, we find that our method leads to large forecast gains, both in statistical and economic terms. In particular, we find that relative to a standard no-predictability benchmark, the optimal combination of predictors, stochastic volatility and time-varying covariances increases the annualized certainty equivalent returns of a leverage-constrained power utility investor by more than 500 basis points. Full Article
ul Modifying the Chi-square and the CMH test for population genetic inference: Adapting to overdispersion By projecteuclid.org Published On :: Wed, 15 Apr 2020 22:05 EDT Kerstin Spitzer, Marta Pelizzola, Andreas Futschik. Source: The Annals of Applied Statistics, Volume 14, Number 1, 202--220.Abstract: Evolve and resequence studies provide a popular approach to simulate evolution in the lab and explore its genetic basis. In this context, Pearson’s chi-square test, Fisher’s exact test as well as the Cochran–Mantel–Haenszel test are commonly used to infer genomic positions affected by selection from temporal changes in allele frequency. However, the null model associated with these tests does not match the null hypothesis of actual interest. Indeed, due to genetic drift and possibly other additional noise components such as pool sequencing, the null variance in the data can be substantially larger than accounted for by these common test statistics. This leads to $p$-values that are systematically too small and, therefore, a huge number of false positive results. Even if the ranking rather than the actual $p$-values is of interest, a naive application of the mentioned tests will give misleading results, as the amount of overdispersion varies from locus to locus. We therefore propose adjusted statistics that take the overdispersion into account while keeping the formulas simple. This is particularly useful in genome-wide applications, where millions of SNPs can be handled with little computational effort. We then apply the adapted test statistics to real data from Drosophila and investigate how information from intermediate generations can be included when available. We also discuss further applications such as genome-wide association studies based on pool sequencing data and tests for local adaptation. Full Article
ul TFisher: A powerful truncation and weighting procedure for combining $p$-values By projecteuclid.org Published On :: Wed, 15 Apr 2020 22:05 EDT Hong Zhang, Tiejun Tong, John Landers, Zheyang Wu. Source: The Annals of Applied Statistics, Volume 14, Number 1, 178--201. Abstract: The $p$-value combination approach is an important statistical strategy for testing global hypotheses with broad applications in signal detection, meta-analysis, data integration, etc. In this paper we extend the classic Fisher’s combination method to a unified family of statistics, called TFisher, which allows a general truncation-and-weighting scheme of input $p$-values. TFisher can significantly improve statistical power over Fisher’s method and related truncation-only methods for detecting both rare and dense “signals.” To address wide applications, analytical calculations for TFisher’s size and power are derived under any two continuous distributions in the null and the alternative hypotheses. The corresponding omnibus test (oTFisher) and its size calculation are also provided for data-adaptive analysis. We study the asymptotically optimal parameters of truncation and weighting based on Bahadur efficiency (BE). A new asymptotic measure, called the asymptotic power efficiency (APE), is also proposed for better reflecting the statistics’ performance in real data analysis. Interestingly, under the Gaussian mixture model in the signal detection problem, both BE and APE indicate that the soft-thresholding scheme is the best; that is, the truncation and weighting parameters should be equal. By simulations of various signal patterns, we systematically compare the power of statistics within the TFisher family as well as some rare-signal-optimal tests. We illustrate the use of TFisher in an exome-sequencing analysis for detecting novel genes of amyotrophic lateral sclerosis. Relevant computation has been implemented in an R package, TFisher, published on the Comprehensive R Archive Network to cater for applications. Full Article
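The truncation-and-weighting idea in the TFisher abstract can be sketched in a few lines of Python. The abstract does not state the formula, so the form used here is an assumption: the statistic sums -2 log(p_i / tau2) over the p-values with p_i <= tau1, where tau1 is the truncation parameter and tau2 the weighting parameter. The function names are hypothetical and are not the API of the TFisher R package.

```python
import math


def fisher_statistic(p_values):
    """Classic Fisher combination statistic: -2 * sum(log p_i)."""
    return sum(-2.0 * math.log(p) for p in p_values)


def tfisher_statistic(p_values, tau1, tau2):
    """Truncated and weighted combination statistic (assumed form).

    Only p-values with p <= tau1 contribute (truncation), and each
    contribution is -2 * log(p / tau2) (weighting). This form is an
    assumption made for illustration, not taken from the abstract.
    """
    if not 0.0 < tau1 <= 1.0:
        raise ValueError("tau1 must lie in (0, 1]")
    if tau2 <= 0.0:
        raise ValueError("tau2 must be positive")
    return sum(-2.0 * math.log(p / tau2) for p in p_values if p <= tau1)


p = [0.01, 0.5, 0.9]

# With tau1 = tau2 = 1 nothing is truncated or reweighted, so the
# statistic coincides with Fisher's.
no_truncation = tfisher_statistic(p, tau1=1.0, tau2=1.0)

# Soft thresholding sets tau1 = tau2; here only p = 0.01 survives.
soft = tfisher_statistic(p, tau1=0.05, tau2=0.05)
```

Under this assumed form, setting tau1 = tau2 gives the soft-thresholding scheme that the abstract reports as best under the Gaussian mixture model. The sketch computes only the statistic; the null distribution needed for a p-value is not reproduced here.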
ul Bayesian indicator variable selection to incorporate hierarchical overlapping group structure in multi-omics applications By projecteuclid.org Published On :: Wed, 27 Nov 2019 22:01 EST Li Zhu, Zhiguang Huo, Tianzhou Ma, Steffi Oesterreich, George C. Tseng. Source: The Annals of Applied Statistics, Volume 13, Number 4, 2611--2636. Abstract: Variable selection is a pervasive problem in modern high-dimensional data analysis where the number of features often exceeds the sample size (a.k.a. small-n-large-p problem). Incorporation of group structure knowledge to improve variable selection has been widely studied. Here, we consider prior knowledge of a hierarchical overlapping group structure to improve variable selection in the regression setting. In genomics applications, for instance, a biological pathway contains tens to hundreds of genes and a gene can be mapped to multiple experimentally measured features (such as its mRNA expression, copy number variation and methylation levels of possibly multiple sites). In addition to the hierarchical structure, the groups at the same level may overlap (e.g., two pathways can share common genes). Incorporating such hierarchical overlapping groups in a traditional penalized regression setting remains a difficult optimization problem. Alternatively, we propose a Bayesian indicator model that can elegantly serve the purpose. We evaluate the model in simulations and two breast cancer examples, and demonstrate its superior performance over existing models. The result not only enhances prediction accuracy but also improves variable selection and model interpretation, leading to deeper biological insight into the disease. Full Article
ul New formulation of the logistic-Gaussian process to analyze trajectory tracking data By projecteuclid.org Published On :: Wed, 27 Nov 2019 22:01 EST Gianluca Mastrantonio, Clara Grazian, Sara Mancinelli, Enrico Bibbona. Source: The Annals of Applied Statistics, Volume 13, Number 4, 2483--2508. Abstract: Improved communication systems, shrinking battery sizes and the price drop of tracking devices have led to an increasing availability of trajectory tracking data. These data are often analyzed to understand animal behavior. In this work, we propose a new model for interpreting animal movement as a mixture of characteristic patterns, which we interpret as different behaviors. The probability that the animal is behaving according to a specific pattern, at each time instant, is nonparametrically estimated using the Logistic-Gaussian process. Owing to a new formalization and the way we specify the coregionalization matrix of the associated multivariate Gaussian process, our model is invariant with respect to the choice of the reference element and of the ordering of the probability vector components. We fit the model under a Bayesian framework, and show that the Markov chain Monte Carlo algorithm we propose is straightforward to implement. We perform a simulation study with the aim of showing the ability of the estimation procedure to retrieve the model parameters. We also test the performance of the information criterion we used to select the number of behaviors. The model is then applied to a real dataset where a wolf has been observed before and after procreation. The results are easy to interpret, and clear differences emerge in the two phases. Full Article
ul Propensity score weighting for causal inference with multiple treatments By projecteuclid.org Published On :: Wed, 27 Nov 2019 22:01 EST Fan Li, Fan Li. Source: The Annals of Applied Statistics, Volume 13, Number 4, 2389--2415. Abstract: Causal or unconfounded descriptive comparisons between multiple groups are common in observational studies. Motivated by a racial disparity study in health services research, we propose a unified propensity score weighting framework, the balancing weights, for estimating causal effects with multiple treatments. These weights incorporate the generalized propensity scores to balance the weighted covariate distribution of each treatment group, all weighted toward a common prespecified target population. The class of balancing weights includes several existing approaches such as the inverse probability weights and trimming weights as special cases. Within this framework, we propose a set of target estimands based on linear contrasts. We further develop the generalized overlap weights, constructed as the product of the inverse probability weights and the harmonic mean of the generalized propensity scores. The generalized overlap weighting scheme corresponds to the target population with the most overlap in covariates across the multiple treatments. These weights are bounded and thus bypass the problem of extreme propensities. We show that the generalized overlap weights minimize the total asymptotic variance of the moment weighting estimators for the pairwise contrasts within the class of balancing weights. We consider two balance check criteria and propose a new sandwich variance estimator for estimating the causal effects with generalized overlap weights. We apply these methods to study the racial disparities in medical expenditure between several racial groups using the 2009 Medical Expenditure Panel Survey (MEPS) data. Simulations were carried out to compare with existing methods. Full Article
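The construction of the generalized overlap weights described in the abstract above (inverse probability weight times the harmonic mean of the generalized propensity scores) can be illustrated with a small Python sketch. The function name and the input layout are hypothetical, the propensity scores are assumed to be already estimated, and the harmonic mean is taken literally as J divided by the sum of reciprocals; any constant factor cancels once weights are normalized within each treatment group.

```python
def generalized_overlap_weights(scores, treatments):
    """Weight for each unit: harmonic mean of its scores / score of its own arm.

    scores:     list of per-unit lists; scores[i][j] is the (assumed known)
                probability that unit i receives treatment j.
    treatments: list of observed treatment indices, one per unit.
    """
    weights = []
    for unit_scores, arm in zip(scores, treatments):
        n_arms = len(unit_scores)
        # Harmonic mean of the generalized propensity scores for this unit.
        harmonic_mean = n_arms / sum(1.0 / e for e in unit_scores)
        # Inverse probability weight for the arm actually received.
        inverse_probability = 1.0 / unit_scores[arm]
        weights.append(inverse_probability * harmonic_mean)
    return weights


# Two treatments: the first unit has an extreme score, the second is balanced.
example_weights = generalized_overlap_weights(
    scores=[[0.2, 0.8], [0.5, 0.5], [0.01, 0.99]],
    treatments=[0, 1, 0],
)
```

In this sketch a unit with score 0.01 for its own arm receives a weight just below 2 rather than the inverse probability weight of 100, which illustrates the boundedness property mentioned in the abstract: with J treatments the weight cannot exceed J.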
ul Predicting paleoclimate from compositional data using multivariate Gaussian process inverse prediction By projecteuclid.org Published On :: Wed, 27 Nov 2019 22:01 EST John R. Tipton, Mevin B. Hooten, Connor Nolan, Robert K. Booth, Jason McLachlan. Source: The Annals of Applied Statistics, Volume 13, Number 4, 2363--2388. Abstract: Multivariate compositional count data arise in many applications including ecology, microbiology, genetics and paleoclimate. A frequent question in the analysis of multivariate compositional count data is what underlying values of the covariate(s) give rise to the observed composition. Learning the relationship between covariates and the compositional count allows for inverse prediction of unobserved covariates given compositional count observations. Gaussian processes provide a flexible framework for modeling functional responses with respect to a covariate without assuming a functional form. Many scientific disciplines use Gaussian process approximations to improve prediction and make inference on latent processes and parameters. When prediction is desired on unobserved covariates given realizations of the response variable, this is called inverse prediction. Because inverse prediction is often mathematically and computationally challenging, predicting unobserved covariates often requires fitting models that are different from the hypothesized generative model. We present a novel computational framework that allows for efficient inverse prediction using a Gaussian process approximation to generative models. Our framework enables scientific learning about how the latent processes co-vary with respect to covariates while simultaneously providing predictions of missing covariates. The proposed framework is capable of efficiently exploring the high dimensional, multi-modal latent spaces that arise in the inverse problem. To demonstrate flexibility, we apply our method in a generalized linear model framework to predict latent climate states given multivariate count data. Based on cross-validation, our model has predictive skill competitive with current methods while simultaneously providing formal statistical inference on the underlying community dynamics of the biological system that was not previously available. Full Article