
Health consequences of microbial interactions with hydrocarbons, oils, and lipids

9783319724737 (electronic bk.)





Handbook of Global Health

9783030053253





Gapenski's understanding healthcare financial management

Pink, George H., author.
9781640551145 (electronic bk.)





Frailty and cardiovascular diseases : research into an elderly population

9783030333300 (electronic bk.)





Feed additives : aromatic plants and herbs in animal nutrition and health

9780128147016 (electronic bk.)





Drying atlas : drying kinetics and quality of agricultural products

Mühlbauer, Werner, author.
9780128181638 (electronic bk.)





Dietary sugar, salt and fat in human health

9780128169193 (electronic bk.)





Apical periodontitis in root-filled teeth : endodontic retreatment and alternative approaches

9783319572505 (electronic bk.)





Animal agriculture : sustainability, challenges and innovations

9780128170526





Agri-food industry strategies for healthy diets and sustainability : new challenges in nutrition and public health

9780128172261





African edible insects as alternative source of food, oil, protein and bioactive components

9783030329525 (electronic bk.)






Concentration and consistency results for canonical and curved exponential-family models of random graphs

Michael Schweinberger, Jonathan Stewart.

Source: The Annals of Statistics, Volume 48, Number 1, 374--396.

Abstract:
Statistical inference for exponential-family models of random graphs with dependent edges is challenging. We stress the importance of additional structure and show that additional structure facilitates statistical inference. A simple example of a random graph with additional structure is a random graph with neighborhoods and local dependence within neighborhoods. We develop the first concentration and consistency results for maximum likelihood and $M$-estimators of a wide range of canonical and curved exponential-family models of random graphs with local dependence. All results are nonasymptotic and applicable to random graphs with finite populations of nodes, although asymptotic consistency results can be obtained as well. In addition, we show that additional structure can facilitate subgraph-to-graph estimation, and present concentration results for subgraph-to-graph estimators. As an application, we consider popular curved exponential-family models of random graphs, with local dependence induced by transitivity and parameter vectors whose dimensions depend on the number of nodes.





The multi-armed bandit problem: An efficient nonparametric solution

Hock Peng Chan.

Source: The Annals of Statistics, Volume 48, Number 1, 346--373.

Abstract:
Lai and Robbins (Adv. in Appl. Math. 6 (1985) 4–22) and Lai (Ann. Statist. 15 (1987) 1091–1114) provided efficient parametric solutions to the multi-armed bandit problem, showing that arm allocation via upper confidence bounds (UCB) achieves minimum regret. These bounds are constructed from the Kullback–Leibler information of the reward distributions, estimated from specified parametric families. In recent years, there has been renewed interest in the multi-armed bandit problem due to new applications in machine learning algorithms and data analytics. Nonparametric arm allocation procedures like $\epsilon$-greedy, Boltzmann exploration and BESA were studied, and modified versions of the UCB procedure were also analyzed under nonparametric settings. However, unlike UCB, these nonparametric procedures are not efficient under general parametric settings. In this paper, we propose efficient nonparametric procedures.
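As a point of reference for the UCB allocation rule discussed above, here is a minimal sketch of the classical UCB1 index policy. It illustrates upper-confidence-bound allocation in general, not the nonparametric procedure proposed in the paper; the two Bernoulli arms are toy assumptions.

```python
# Minimal UCB1-style arm allocation (a classical baseline, not the
# paper's proposed nonparametric procedure). Assumes rewards in [0, 1].
import numpy as np

def ucb1(arms, horizon, rng=np.random.default_rng(0)):
    k = len(arms)
    counts = np.zeros(k)              # times each arm was pulled
    sums = np.zeros(k)                # cumulative reward per arm
    for a in range(k):                # pull each arm once to initialize
        sums[a] += arms[a](rng)
        counts[a] += 1
    for t in range(k, horizon):
        means = sums / counts
        bonus = np.sqrt(2 * np.log(t + 1) / counts)   # exploration bonus
        a = int(np.argmax(means + bonus))
        sums[a] += arms[a](rng)
        counts[a] += 1
    return counts, sums / counts

# Example: two Bernoulli arms with success probabilities 0.4 and 0.6;
# the pull counts concentrate on the better arm as the horizon grows.
counts, means = ucb1([lambda r: r.binomial(1, 0.4),
                      lambda r: r.binomial(1, 0.6)], horizon=10_000)
```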





Spectral and matrix factorization methods for consistent community detection in multi-layer networks

Subhadeep Paul, Yuguo Chen.

Source: The Annals of Statistics, Volume 48, Number 1, 230--250.

Abstract:
We consider the problem of estimating a consensus community structure by combining information from multiple layers of a multi-layer network, using methods based on spectral clustering or low-rank matrix factorization. As a general theme, these “intermediate fusion” methods involve obtaining a low column rank matrix by optimizing an objective function and then using the columns of the matrix for clustering. However, the theoretical properties of these methods remain largely unexplored. In the absence of statistical guarantees on the objective functions, it is difficult to determine whether the algorithms optimizing the objectives will return good community structures. We investigate the consistency properties of the global optimizer of some of these objective functions under the multi-layer stochastic blockmodel. For this purpose, we derive several new asymptotic results showing consistency of the intermediate fusion techniques, along with the spectral clustering of the mean adjacency matrix, under a high-dimensional setup where the number of nodes, the number of layers and the number of communities of the multi-layer graph grow. Our numerical study shows that the intermediate fusion techniques outperform late fusion methods, namely spectral clustering on the aggregate spectral kernel and the module allegiance matrix, in sparse networks, while they outperform the spectral clustering of the mean adjacency matrix in multi-layer networks that contain layers with both homophilic and heterophilic communities.
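One of the baselines the abstract compares against, spectral clustering of the mean adjacency matrix, is straightforward to sketch. The snippet below is an illustrative reading of that baseline (numpy/scikit-learn assumed), not the authors' code; `A_layers` is a hypothetical list of layer adjacency matrices.

```python
# Spectral clustering of the mean adjacency matrix of a multi-layer
# network. Assumes undirected (symmetric) layers on a common node set.
import numpy as np
from sklearn.cluster import KMeans

def mean_adjacency_spectral(A_layers, n_communities):
    A_bar = np.mean(A_layers, axis=0)                 # average over layers
    vals, vecs = np.linalg.eigh(A_bar)                # symmetric eigendecomposition
    idx = np.argsort(np.abs(vals))[::-1][:n_communities]
    X = vecs[:, idx]                                  # spectral embedding
    return KMeans(n_clusters=n_communities, n_init=10).fit_predict(X)
```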





Adaptive risk bounds in univariate total variation denoising and trend filtering

Adityanand Guntuboyina, Donovan Lieu, Sabyasachi Chatterjee, Bodhisattva Sen.

Source: The Annals of Statistics, Volume 48, Number 1, 205--229.

Abstract:
We study trend filtering, a relatively recent method for univariate nonparametric regression. For a given integer $r\geq 1$, the $r$th order trend filtering estimator is defined as the minimizer of the sum of squared errors when we constrain (or penalize) the sum of the absolute $r$th order discrete derivatives of the fitted function at the design points. For $r=1$, the estimator reduces to total variation regularization, which has received much attention in the statistics and image processing literature. In this paper, we study the performance of the trend filtering estimator for every $r\geq 1$, both in the constrained and penalized forms. Our main results show that in the strong sparsity setting, when the underlying function is a (discrete) spline with few “knots,” the risk (under the global squared error loss) of the trend filtering estimator (with an appropriate choice of the tuning parameter) achieves the parametric $n^{-1}$-rate, up to a logarithmic (multiplicative) factor. Our results therefore provide support for the use of trend filtering, for every $r\geq 1$, in the strong sparsity setting.
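The penalized form of the estimator just described is a convex program and can be prototyped directly: minimize $\frac{1}{2}\|y-f\|_2^2+\lambda\|D^{(r)}f\|_1$, where $D^{(r)}$ takes $r$th order discrete differences. A minimal sketch assuming the cvxpy modeling library (the solver and the value of $\lambda$ are illustrative, not the paper's prescription):

```python
# Penalized trend filtering: 0.5*||y - f||^2 + lam * ||r-th differences||_1.
# Sketch only; lam would be tuned (e.g., by cross-validation) in practice.
import numpy as np
import cvxpy as cp

def trend_filter(y, r=1, lam=5.0):
    f = cp.Variable(len(y))
    penalty = cp.norm1(cp.diff(f, k=r))        # sum of |r-th order differences|
    cp.Problem(cp.Minimize(0.5 * cp.sum_squares(y - f) + lam * penalty)).solve()
    return f.value

# r = 1 recovers univariate total variation denoising on a noisy step signal.
rng = np.random.default_rng(0)
y = np.r_[np.zeros(50), np.ones(50)] + 0.2 * rng.normal(size=100)
fhat = trend_filter(y, r=1)
```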





Adaptive estimation of the rank of the coefficient matrix in high-dimensional multivariate response regression models

Xin Bing, Marten H. Wegkamp.

Source: The Annals of Statistics, Volume 47, Number 6, 3157--3184.

Abstract:
We consider the multivariate response regression problem with a regression coefficient matrix of low, unknown rank. In this setting, we analyze a new criterion for selecting the optimal reduced rank. This criterion differs notably from the one proposed in Bunea, She and Wegkamp (Ann. Statist. 39 (2011) 1282–1309) in that it does not require estimation of the unknown variance of the noise, nor does it depend on a delicate choice of a tuning parameter. We develop an iterative, fully data-driven procedure that adapts to the optimal signal-to-noise ratio. This procedure finds the true rank in a few steps with overwhelming probability. At each step, our estimate increases, while at the same time it does not exceed the true rank. Our finite sample results hold for any sample size and any dimension, even when the number of responses and of covariates grow much faster than the number of observations. We perform an extensive simulation study that confirms our theoretical findings. The new method performs better and is more stable than the procedure of Bunea, She and Wegkamp (Ann. Statist. 39 (2011) 1282–1309) in both low- and high-dimensional settings.





Additive models with trend filtering

Veeranjaneyulu Sadhanala, Ryan J. Tibshirani.

Source: The Annals of Statistics, Volume 47, Number 6, 3032--3068.

Abstract:
We study additive models built with trend filtering, that is, additive models whose components are each regularized by the (discrete) total variation of their $k$th (discrete) derivative, for a chosen integer $k\geq 0$. This results in $k$th degree piecewise polynomial components (e.g., $k=0$ gives piecewise constant components, $k=1$ gives piecewise linear, $k=2$ gives piecewise quadratic, etc.). Analogous to its advantages in the univariate case, additive trend filtering has favorable theoretical and computational properties, thanks in large part to the localized nature of the (discrete) total variation regularizer that it uses. On the theory side, we derive fast error rates for additive trend filtering estimates, and show these rates are minimax optimal when the underlying function is additive and has component functions whose derivatives are of bounded variation. We also show that these rates are unattainable by additive smoothing splines (and by additive models built from linear smoothers, in general). On the computational side, we use backfitting to leverage fast univariate trend filtering solvers; we also describe a new backfitting algorithm whose iterations can be run in parallel, which (as far as we can tell) is the first of its kind. Lastly, we present a number of experiments to examine the empirical performance of trend filtering.
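The backfitting strategy mentioned above cycles over components, refitting each one to the partial residual with a univariate trend filtering solve. A rough sketch under illustrative assumptions (cvxpy as the univariate solver; fixed penalty and sweep count; response assumed centered), not the paper's parallel algorithm:

```python
# Backfitting for an additive trend-filtering model: component j is fit
# along the sorted values of covariate j, then centered for identifiability.
import numpy as np
import cvxpy as cp

def tf_1d(resid, k, lam):
    f = cp.Variable(len(resid))
    obj = cp.Minimize(0.5 * cp.sum_squares(resid - f)
                      + lam * cp.norm1(cp.diff(f, k=k + 1)))  # TV of k-th derivative
    cp.Problem(obj).solve()
    return f.value

def additive_tf(X, y, k=1, lam=5.0, n_iter=10):
    n, d = X.shape
    comps = np.zeros((n, d))
    for _ in range(n_iter):                  # backfitting sweeps
        for j in range(d):
            resid = y - comps.sum(axis=1) + comps[:, j]
            srt = np.argsort(X[:, j])        # fit along the sorted covariate
            fit = np.empty(n)
            fit[srt] = tf_1d(resid[srt], k, lam)
            comps[:, j] = fit - fit.mean()   # center each component
    return comps
```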





A unified treatment of multiple testing with prior knowledge using the p-filter

Aaditya K. Ramdas, Rina F. Barber, Martin J. Wainwright, Michael I. Jordan.

Source: The Annals of Statistics, Volume 47, Number 5, 2790--2821.

Abstract:
There is a significant literature on methods for incorporating prior knowledge into multiple testing procedures so as to improve their power and precision. Some common forms of prior knowledge include (a) beliefs about which hypotheses are null, modeled by nonuniform prior weights; (b) differing importances of hypotheses, modeled by differing penalties for false discoveries; (c) multiple arbitrary partitions of the hypotheses into (possibly overlapping) groups; and (d) knowledge of independence, positive or arbitrary dependence between hypotheses or groups, suggesting the use of more aggressive or conservative procedures. We present a unified algorithmic framework called p-filter for global null testing and false discovery rate (FDR) control that allows the scientist to incorporate all four types of prior knowledge (a)–(d) simultaneously, recovering a variety of known algorithms as special cases.
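The p-filter recovers standard FDR procedures as special cases; for orientation, here is a minimal implementation of the simplest such special case, the Benjamini–Hochberg step-up procedure (the p-filter itself, with weights, penalties and overlapping groups, is substantially more involved):

```python
# Benjamini-Hochberg step-up: reject the k smallest p-values, where k is
# the largest index with p_(k) <= alpha * k / m.
import numpy as np

def benjamini_hochberg(pvals, alpha=0.1):
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    thresh = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresh
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    rejected = np.zeros(m, dtype=bool)
    rejected[order[:k]] = True
    return rejected
```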





Distance multivariance: New dependence measures for random vectors

Björn Böttcher, Martin Keller-Ressel, René L. Schilling.

Source: The Annals of Statistics, Volume 47, Number 5, 2757--2789.

Abstract:
We introduce two new measures for the dependence of $n\ge 2$ random variables: distance multivariance and total distance multivariance. Both measures are based on the weighted $L^{2}$-distance of quantities related to the characteristic functions of the underlying random variables. These extend distance covariance (introduced by Székely, Rizzo and Bakirov) from pairs of random variables to $n$-tuplets of random variables. We show that total distance multivariance can be used to detect the independence of $n$ random variables and has a simple finite-sample representation in terms of distance matrices of the sample points, where distance is measured by a continuous negative definite function. Under some mild moment conditions, this leads to a test for independence of multiple random vectors which is consistent against all alternatives.
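One reading of the finite-sample representation mentioned above, using Euclidean distances: doubly center each variable's pairwise distance matrix, multiply the matrices entrywise across variables, and average. Normalization and sign conventions vary across the literature, so treat this as a sketch rather than the authors' reference implementation:

```python
# Sample distance multivariance sketch via doubly centered distance matrices.
import numpy as np
from scipy.spatial.distance import pdist, squareform

def centered_dist(x):
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    D = squareform(pdist(x))                  # pairwise Euclidean distances
    return D - D.mean(axis=0) - D.mean(axis=1)[:, None] + D.mean()

def sample_distance_multivariance(samples):
    # samples: list of arrays, each with the same number N of observations
    prod = np.ones((len(samples[0]),) * 2)
    for x in samples:
        prod *= centered_dist(x)              # entrywise product over variables
    return prod.mean()
```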





A knockoff filter for high-dimensional selective inference

Rina Foygel Barber, Emmanuel J. Candès.

Source: The Annals of Statistics, Volume 47, Number 5, 2504--2537.

Abstract:
This paper develops a framework for testing for associations in a possibly high-dimensional linear model where the number of features/variables may far exceed the number of observational units. In this framework, the observations are split into two groups, where the first group is used to screen for a set of potentially relevant variables, whereas the second is used for inference over this reduced set of variables; we also develop strategies for leveraging information from the first part of the data at the inference step for greater power. In our work, the inferential step is carried out by applying the recently introduced knockoff filter, which creates a knockoff copy—a fake variable serving as a control—for each screened variable. We prove that this procedure controls the directional false discovery rate (FDR) in the reduced model controlling for all screened variables; this says that our high-dimensional knockoff procedure “discovers” important variables as well as the directions (signs) of their effects, in such a way that the expected proportion of wrongly chosen signs is below the user-specified level (thereby controlling a notion of Type S error averaged over the selected set). This result is nonasymptotic, and holds for any distribution of the original features and any values of the unknown regression coefficients, so that inference is not calibrated under hypothesized values of the effect sizes. We demonstrate the performance of our general and flexible approach through numerical studies, showing more power than existing alternatives. Finally, we apply our method to a genome-wide association study to find locations on the genome that are possibly associated with a continuous phenotype.





Bayes and empirical-Bayes multiplicity adjustment in the variable-selection problem

James G. Scott, James O. Berger.

Source: The Annals of Statistics, Volume 38, Number 5, 2587--2619.

Abstract:
This paper studies the multiplicity-correction effect of standard Bayesian variable-selection priors in linear regression. Our first goal is to clarify when, and how, multiplicity correction happens automatically in Bayesian analysis, and to distinguish this correction from the Bayesian Ockham’s-razor effect. Our second goal is to contrast empirical-Bayes and fully Bayesian approaches to variable selection through examples, theoretical results and simulations. Considerable differences between the two approaches are found. In particular, we prove a theorem that characterizes a surprising asymptotic discrepancy between fully Bayes and empirical Bayes. This discrepancy arises from a different source than the failure to account for hyperparameter uncertainty in the empirical-Bayes estimate. Indeed, even at the extreme, when the empirical-Bayes estimate converges asymptotically to the true variable-inclusion probability, the potential for a serious difference remains.





XSLT

(eXtensible Stylesheet Language Transformations) Reformats XML data. Endorsed as a standard by the W3C, XSLT is a language for transforming data extracted from an XML document into a new document. The destination format can be simple text, HTML or XML. The most common use of XSLT is in conjunction with XML Path Language (XPath) in e-business integration, where its role is to transform XML data between the different XML structures used by each participant. XSLT is also widely used in website content management, to convert raw XML into HTML pages.
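A small illustration of the XML-to-HTML use case: an XSLT stylesheet that turns a toy catalog of books into an HTML list, applied here via Python's lxml bindings (the stylesheet and input document are made-up examples):

```python
# XSLT demo: transform a toy XML catalog into an HTML list using lxml.
from lxml import etree

stylesheet = etree.XML(b"""
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/catalog">
    <html><body><ul>
      <!-- XPath expressions select the nodes to transform -->
      <xsl:for-each select="book">
        <li><xsl:value-of select="title"/></li>
      </xsl:for-each>
    </ul></body></html>
  </xsl:template>
</xsl:stylesheet>
""")

doc = etree.XML(b"""
<catalog>
  <book><title>Drying atlas</title></book>
  <book><title>Feed additives</title></book>
</catalog>
""")

transform = etree.XSLT(stylesheet)
print(str(transform(doc)))                    # serialized HTML output
```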





A comparison of principal component methods between multiple phenotype regression and multiple SNP regression in genetic association studies

Zhonghua Liu, Ian Barnett, Xihong Lin.

Source: The Annals of Applied Statistics, Volume 14, Number 1, 433--451.

Abstract:
Principal component analysis (PCA) is a popular method for dimension reduction in unsupervised multivariate analysis. However, existing ad hoc uses of PCA in both multivariate regression (multiple outcomes) and multiple regression (multiple predictors) lack theoretical justification. The differences in the statistical properties of PCAs in these two regression settings are not well understood. In this paper we provide theoretical results on the power of PCA in genetic association testing in both multiple phenotype and SNP-set settings. The multiple phenotype setting refers to the case when one is interested in studying the association between a single SNP and multiple phenotypes as outcomes. The SNP-set setting refers to the case when one is interested in studying the association between multiple SNPs in a SNP set and a single phenotype as the outcome. We demonstrate analytically that the properties of the PC-based analysis in these two regression settings are substantially different. We show that the lower-order PCs, that is, PCs with large eigenvalues, are generally preferred and lead to higher power in the SNP-set setting, while the higher-order PCs, that is, PCs with small eigenvalues, are generally preferred in the multiple phenotype setting. We also investigate the power of three other popular statistical methods, the Wald test, the variance component test and the minimum $p$-value test, in both multiple phenotype and SNP-set settings. We use theoretical power, simulation studies, and two real data analyses to validate our findings.
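In the SNP-set setting described above, a PC-based test regresses the phenotype on the leading PC scores of the genotype matrix. A hedged sketch (numpy/scipy; the choice of two PCs and the F-test against the intercept-only model are illustrative, not the paper's prescription):

```python
# PC-based SNP-set association sketch: phenotype regressed on leading
# principal component scores of the (centered) genotype matrix.
import numpy as np
from scipy import stats

def pc_set_test(G, y, n_pcs=2):
    """G: (n, p) genotype matrix; y: (n,) phenotype. Returns (F, p-value)."""
    n = len(y)
    Gc = G - G.mean(axis=0)
    U, s, _ = np.linalg.svd(Gc, full_matrices=False)
    scores = U[:, :n_pcs] * s[:n_pcs]          # leading PC scores (large eigenvalues)
    X = np.column_stack([np.ones(n), scores])
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    rss1 = resid @ resid
    rss0 = ((y - y.mean()) ** 2).sum()
    df1, df2 = n_pcs, n - n_pcs - 1
    F = ((rss0 - rss1) / df1) / (rss1 / df2)   # F-test vs. intercept-only model
    return F, stats.f.sf(F, df1, df2)
```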





Optimal asset allocation with multivariate Bayesian dynamic linear models

Jared D. Fisher, Davide Pettenuzzo, Carlos M. Carvalho.

Source: The Annals of Applied Statistics, Volume 14, Number 1, 299--338.

Abstract:
We introduce a fast, closed-form, simulation-free method to model and forecast multiple asset returns and employ it to investigate the optimal ensemble of features to include when jointly predicting monthly stock and bond excess returns. Our approach builds on the Bayesian dynamic linear models of West and Harrison (Bayesian Forecasting and Dynamic Models (1997) Springer), and it can objectively determine, through a fully automated procedure, both the optimal set of regressors to include in the predictive system and the degree to which the model coefficients, volatilities and covariances should vary over time. When applied to a portfolio of five stock and bond returns, we find that our method leads to large forecast gains, both in statistical and economic terms. In particular, we find that relative to a standard no-predictability benchmark, the optimal combination of predictors, stochastic volatility and time-varying covariances increases the annualized certainty equivalent returns of a leverage-constrained power utility investor by more than 500 basis points.





Estimating the health effects of environmental mixtures using Bayesian semiparametric regression and sparsity inducing priors

Joseph Antonelli, Maitreyi Mazumdar, David Bellinger, David Christiani, Robert Wright, Brent Coull.

Source: The Annals of Applied Statistics, Volume 14, Number 1, 257--275.

Abstract:
Humans are routinely exposed to mixtures of chemical and other environmental factors, making the quantification of health effects associated with environmental mixtures a critical goal for establishing environmental policy sufficiently protective of human health. The quantification of the effects of exposure to an environmental mixture poses several statistical challenges. It is often the case that exposures to multiple pollutants interact with one another to affect an outcome. Further, the exposure-response relationship between an outcome and some exposures, such as some metals, can exhibit complex, nonlinear forms, since some exposures can be beneficial at some exposure levels and detrimental at others. To estimate the health effects of complex mixtures, we propose a flexible Bayesian approach that allows exposures to interact with each other and have nonlinear relationships with the outcome. We induce sparsity using multivariate spike and slab priors to determine which exposures are associated with the outcome and which exposures interact with each other. The proposed approach is interpretable, as we can use the posterior probabilities of inclusion into the model to identify pollutants that interact with each other. We utilize our approach to study the impact of exposure to metals on child neurodevelopment in Bangladesh and find a nonlinear, interactive relationship between arsenic and manganese.





Bayesian indicator variable selection to incorporate hierarchical overlapping group structure in multi-omics applications

Li Zhu, Zhiguang Huo, Tianzhou Ma, Steffi Oesterreich, George C. Tseng.

Source: The Annals of Applied Statistics, Volume 13, Number 4, 2611--2636.

Abstract:
Variable selection is a pervasive problem in modern high-dimensional data analysis where the number of features often exceeds the sample size (a.k.a. the small-n-large-p problem). Incorporation of group structure knowledge to improve variable selection has been widely studied. Here, we consider prior knowledge of a hierarchical overlapping group structure to improve variable selection in the regression setting. In genomics applications, for instance, a biological pathway contains tens to hundreds of genes and a gene can be mapped to multiple experimentally measured features (such as its mRNA expression, copy number variation and methylation levels of possibly multiple sites). In addition to the hierarchical structure, the groups at the same level may overlap (e.g., two pathways can share common genes). Incorporating such hierarchical overlapping groups in the traditional penalized regression setting remains a difficult optimization problem. Alternatively, we propose a Bayesian indicator model that can elegantly serve the purpose. We evaluate the model in simulations and two breast cancer examples, and demonstrate its superior performance over existing models. The result not only enhances prediction accuracy but also improves variable selection and model interpretation, leading to deeper biological insight into the disease.





Propensity score weighting for causal inference with multiple treatments

Fan Li, Fan Li.

Source: The Annals of Applied Statistics, Volume 13, Number 4, 2389--2415.

Abstract:
Causal or unconfounded descriptive comparisons between multiple groups are common in observational studies. Motivated by a racial disparity study in health services research, we propose a unified propensity score weighting framework, the balancing weights, for estimating causal effects with multiple treatments. These weights incorporate the generalized propensity scores to balance the weighted covariate distribution of each treatment group, all weighted toward a common prespecified target population. The class of balancing weights includes several existing approaches, such as the inverse probability weights and trimming weights, as special cases. Within this framework, we propose a set of target estimands based on linear contrasts. We further develop the generalized overlap weights, constructed as the product of the inverse probability weights and the harmonic mean of the generalized propensity scores. The generalized overlap weighting scheme corresponds to the target population with the most overlap in covariates across the multiple treatments. These weights are bounded and thus bypass the problem of extreme propensities. We show that the generalized overlap weights minimize the total asymptotic variance of the moment weighting estimators for the pairwise contrasts within the class of balancing weights. We consider two balance check criteria and propose a new sandwich variance estimator for estimating the causal effects with generalized overlap weights. We apply these methods to study the racial disparities in medical expenditure between several racial groups using the 2009 Medical Expenditure Panel Survey (MEPS) data. Simulations were carried out to compare with existing methods.
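The generalized overlap weights described above are, up to a normalizing constant, the harmonic mean of the generalized propensity scores divided by the propensity of the treatment actually received. A sketch with propensities estimated by a multinomial logistic model (the model choice is an assumption for illustration, not the paper's requirement):

```python
# Generalized overlap weights: w_k(x) proportional to h(x) / e_k(x),
# where h(x) is the harmonic mean of the generalized propensity scores.
import numpy as np
from sklearn.linear_model import LogisticRegression

def generalized_overlap_weights(X, treatment):
    model = LogisticRegression(max_iter=1000).fit(X, treatment)
    e = model.predict_proba(X)                   # generalized propensity scores
    harmonic = 1.0 / (1.0 / e).sum(axis=1)       # harmonic-mean tilting, up to K
    cols = np.searchsorted(model.classes_, treatment)  # column of received treatment
    e_received = e[np.arange(len(treatment)), cols]
    return harmonic / e_received                 # bounded, avoids extreme propensities
```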





Predicting paleoclimate from compositional data using multivariate Gaussian process inverse prediction

John R. Tipton, Mevin B. Hooten, Connor Nolan, Robert K. Booth, Jason McLachlan.

Source: The Annals of Applied Statistics, Volume 13, Number 4, 2363--2388.

Abstract:
Multivariate compositional count data arise in many applications including ecology, microbiology, genetics and paleoclimate. A frequent question in the analysis of multivariate compositional count data is what underlying values of one or more covariates give rise to the observed composition. Learning the relationship between covariates and the compositional count allows for inverse prediction of unobserved covariates given compositional count observations. Gaussian processes provide a flexible framework for modeling functional responses with respect to a covariate without assuming a functional form. Many scientific disciplines use Gaussian process approximations to improve prediction and to make inference on latent processes and parameters. Predicting unobserved covariates given realizations of the response variable is called inverse prediction. Because inverse prediction is often mathematically and computationally challenging, predicting unobserved covariates often requires fitting models that are different from the hypothesized generative model. We present a novel computational framework that allows for efficient inverse prediction using a Gaussian process approximation to generative models. Our framework enables scientific learning about how the latent processes co-vary with respect to covariates while simultaneously providing predictions of missing covariates. The proposed framework is capable of efficiently exploring the high-dimensional, multi-modal latent spaces that arise in the inverse problem. To demonstrate flexibility, we apply our method in a generalized linear model framework to predict latent climate states given multivariate count data. Based on cross-validation, our model has predictive skill competitive with current methods while simultaneously providing formal statistical inference on the underlying community dynamics of the biological system, previously not available.





A latent discrete Markov random field approach to identifying and classifying historical forest communities based on spatial multivariate tree species counts

Stephen Berg, Jun Zhu, Murray K. Clayton, Monika E. Shea, David J. Mladenoff.

Source: The Annals of Applied Statistics, Volume 13, Number 4, 2312--2340.

Abstract:
The Wisconsin Public Land Survey database describes historical forest composition at high spatial resolution and is of interest in ecological studies of forest composition in Wisconsin just prior to significant Euro-American settlement. For such studies it is useful to identify recurring subpopulations of tree species known as communities, but standard clustering approaches for subpopulation identification do not account for dependence between spatially nearby observations. Here, we develop and fit a latent discrete Markov random field model for the purpose of identifying and classifying historical forest communities based on spatially referenced multivariate tree species counts across Wisconsin. We show empirically for the actual dataset and through simulation that our latent Markov random field modeling approach improves prediction and parameter estimation performance. For model fitting we introduce a new stochastic approximation algorithm which enables computationally efficient estimation and classification of large amounts of spatial multivariate count data.





Fire seasonality identification with multimodality tests

Jose Ameijeiras-Alonso, Akli Benali, Rosa M. Crujeiras, Alberto Rodríguez-Casal, José M. C. Pereira.

Source: The Annals of Applied Statistics, Volume 13, Number 4, 2120--2139.

Abstract:
Understanding the role of vegetation fires in the Earth system is an important environmental problem. Although fire occurrence is influenced by natural factors, human activity related to land use and management has altered the temporal patterns of fire in several regions of the world. Hence, for better insight into fire regimes it is of special interest to analyze where human activity has altered fire seasonality. To do so, multimodality tests are a useful tool for determining the number of annual fire peaks. The periodicity of fires and their complex distributional features motivate the use of nonparametric circular statistics. The unsatisfactory performance of previous circular nonparametric proposals for testing multimodality justifies the introduction of a new approach, considering an adapted version of the excess mass statistic, jointly with a bootstrap calibration algorithm. A systematic application of the test on the Russia–Kazakhstan area is presented in order to determine how many fire peaks can be identified in this region. A False Discovery Rate correction, accounting for the spatial dependence of the data, is also required.





Estimating abundance from multiple sampling capture-recapture data via a multi-state multi-period stopover model

Hannah Worthington, Rachel McCrea, Ruth King, Richard Griffiths.

Source: The Annals of Applied Statistics, Volume 13, Number 4, 2043--2064.

Abstract:
Capture-recapture studies often involve collecting data on numerous capture occasions over a relatively short period of time. For many study species this process is repeated, for example, annually, resulting in capture information spanning multiple sampling periods. To account for the different temporal scales, the robust design class of models has traditionally been applied, providing a framework in which to analyse all of the available capture data in a single likelihood expression. However, these models typically require strong constraints, either the assumption of closure within a sampling period (the closed robust design) or conditioning on the number of individuals captured within a sampling period (the open robust design). For real datasets these assumptions may not be appropriate. We develop a general modelling structure that requires neither assumption by explicitly modelling the movement of individuals into the population both within and between the sampling periods, which in turn permits the estimation of abundance within a single consistent framework. The flexibility of the novel model structure is further demonstrated by including the computationally challenging case of multi-state data where there is individual time-varying discrete covariate information. We derive an efficient likelihood expression for the new multi-state multi-period stopover model using the hidden Markov model framework. We demonstrate the significant improvement in parameter estimation using our new modelling approach, in terms of both the multi-period and multi-state components, through both a simulation study and a real dataset relating to the protected species of great crested newts, Triturus cristatus.





Radio-iBAG: Radiomics-based integrative Bayesian analysis of multiplatform genomic data

Youyi Zhang, Jeffrey S. Morris, Shivali Narang Aerry, Arvind U. K. Rao, Veerabhadran Baladandayuthapani.

Source: The Annals of Applied Statistics, Volume 13, Number 3, 1957--1988.

Abstract:
Technological innovations have produced large multi-modal datasets that include imaging and multi-platform genomics data. Integrative analyses of such data have the potential to reveal important biological and clinical insights into complex diseases like cancer. In this paper, we present Bayesian approaches for integrative analysis of radiological imaging and multi-platform genomic data, wherein our goals are to simultaneously identify genomic and radiomic, that is, radiology-based imaging, markers, along with the latent associations between these two modalities, and to detect the overall prognostic relevance of the combined markers. For this task, we propose Radio-iBAG: Radiomics-based Integrative Bayesian Analysis of Multiplatform Genomic Data, a multi-scale Bayesian hierarchical model that involves several innovative strategies: it incorporates integrative analysis of multi-platform genomic data sets to capture fundamental biological relationships; explores the associations between radiomic markers accompanying genomic information and clinical outcomes; and detects genomic and radiomic markers associated with clinical prognosis. We also introduce the use of sparse principal component analysis (sPCA) to extract a sparse set of approximately orthogonal meta-features, each containing information from a set of related individual radiomic features, reducing dimensionality and combining like features. Our methods are motivated by and applied to The Cancer Genome Atlas glioblastoma multiforme data set, wherein we integrate magnetic resonance imaging-based biomarkers along with genomic, epigenomic and transcriptomic data. Our model identifies important magnetic resonance imaging features and the associated genomic platforms that are related to patient survival times.
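The sPCA step described above can be approximated with off-the-shelf tools. A sketch using scikit-learn's SparsePCA as a stand-in for the authors' implementation, on made-up toy data (the array shape and penalty are illustrative):

```python
# Sparse PCA: extract sparse, approximately orthogonal meta-features from
# a (samples x features) matrix of correlated radiomic measurements.
import numpy as np
from sklearn.decomposition import SparsePCA

rng = np.random.default_rng(0)
radiomic_features = rng.normal(size=(60, 40))    # hypothetical toy data

spca = SparsePCA(n_components=5, alpha=1.0, random_state=0)
meta_features = spca.fit_transform(radiomic_features)
# Each row of spca.components_ is sparse: a meta-feature loads on only
# a small set of related individual radiomic features.
print(np.count_nonzero(spca.components_, axis=1))
```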





Bayesian methods for multiple mediators: Relating principal stratification and causal mediation in the analysis of power plant emission controls

Chanmin Kim, Michael J. Daniels, Joseph W. Hogan, Christine Choirat, Corwin M. Zigler.

Source: The Annals of Applied Statistics, Volume 13, Number 3, 1927--1956.

Abstract:
Emission control technologies installed on power plants are a key feature of many air pollution regulations in the US. While such regulations are predicated on the presumed relationships between emissions, ambient air pollution and human health, many of these relationships have never been empirically verified. The goal of this paper is to develop new statistical methods to quantify these relationships. We frame this problem as one of mediation analysis to evaluate the extent to which the effect of a particular control technology on ambient pollution is mediated through causal effects on power plant emissions. Since power plants emit various compounds that contribute to ambient pollution, we develop new methods for multiple intermediate variables that are measured contemporaneously, may interact with one another, and may exhibit joint mediating effects. Specifically, we propose new methods leveraging two related frameworks for causal inference in the presence of mediating variables: principal stratification and causal mediation analysis. We define principal effects based on multiple mediators, and also introduce a new decomposition of the total effect of an intervention on ambient pollution into the natural direct effect and natural indirect effects for all combinations of mediators. Both approaches are anchored to the same observed-data models, which we specify with Bayesian nonparametric techniques. We provide assumptions for estimating principal causal effects, then augment these with an additional assumption required for causal mediation analysis. The two analyses, interpreted in tandem, provide the first empirical investigation of the presumed causal pathways that motivate important air quality regulatory policies.





Identifying multiple changes for a functional data sequence with application to freeway traffic segmentation

Jeng-Min Chiou, Yu-Ting Chen, Tailen Hsing.

Source: The Annals of Applied Statistics, Volume 13, Number 3, 1430--1463.

Abstract:
Motivated by the study of road segmentation partitioned by shifts in traffic conditions along a freeway, we introduce a two-stage procedure, Dynamic Segmentation and Backward Elimination (DSBE), for identifying multiple changes in the mean functions for a sequence of functional data. The Dynamic Segmentation procedure searches for all possible changepoints using the derived global optimality criterion, coupled with the local strategy of at-most-one-changepoint, by dividing the entire sequence into individual subsequences that are recursively adjusted until convergence. Then, the Backward Elimination procedure verifies these changepoints by iteratively testing the unlikely changes to ensure their significance until no more changepoints can be removed. By combining the local strategy with the global optimal changepoint criterion, the DSBE algorithm is conceptually simple and easy to implement and performs better than the binary segmentation-based approach at detecting multiple small changes. The consistency property of the changepoint estimators and the convergence of the algorithm are proved. We apply DSBE to detect changes in traffic streams through real freeway traffic data. The practical performance of DSBE is also investigated through intensive simulation studies for various scenarios.





Bayesian linear regression for multivariate responses under group sparsity

Bo Ning, Seonghyun Jeong, Subhashis Ghosal.

Source: Bernoulli, Volume 26, Number 3, 2353--2382.

Abstract:
We study frequentist properties of a Bayesian high-dimensional multivariate linear regression model with correlated responses. The predictors are separated into many groups and the group structure is pre-determined. Two features of the model are unique: (i) group sparsity is imposed on the predictors; (ii) the covariance matrix is unknown and its dimensions can also be high. We choose a product of independent spike-and-slab priors on the regression coefficients and a new prior on the covariance matrix based on its eigendecomposition. Each spike-and-slab prior is a mixture of a point mass at zero and a multivariate density involving the $\ell_{2,1}$-norm. We first obtain the posterior contraction rate and bounds on the effective dimension of the model that hold with high posterior probability. We then show that the multivariate regression coefficients can be recovered under certain compatibility conditions. Finally, we quantify the uncertainty for the regression coefficients with frequentist validity through a Bernstein–von Mises type theorem. The result leads to selection consistency for the Bayesian method. We derive the posterior contraction rate using the general theory by constructing a suitable test from first principles using moment bounds for certain likelihood ratios. This leads to posterior concentration around the truth with respect to the average Rényi divergence of order $1/2$. This technique of obtaining the required tests for the posterior contraction rate could be useful in many other problems.





Recurrence of multidimensional persistent random walks. Fourier and series criteria

Peggy Cénac, Basile de Loynes, Yoann Offret, Arnaud Rousselle.

Source: Bernoulli, Volume 26, Number 2, 858--892.

Abstract:
The recurrence and transience of persistent random walks built from variable length Markov chains are investigated. It turns out that these stochastic processes can be seen as Lévy walks for which the persistence times depend on some internal Markov chain: they admit Markov random walk skeletons. A recurrence versus transience dichotomy is highlighted. Assuming the positive recurrence of the driving chain, a sufficient Fourier criterion for recurrence, close to the usual Chung–Fuchs one, is given and a series criterion is derived. The key tool is the Nagaev–Guivarc’h method. Finally, we focus on particular two-dimensional persistent random walks, including directionally reinforced random walks, for which necessary and sufficient Fourier and series criteria are obtained. Inspired by (Adv. Math. 208 (2007) 680–698), we produce a genuine counterexample to the conjecture of (Adv. Math. 117 (1996) 239–252). As in the one-dimensional case studied in (J. Theoret. Probab. 31 (2018) 232–243), it is easier for a persistent random walk than for its skeleton to be recurrent. However, such examples are much more difficult to exhibit in the higher-dimensional context. These results are based on a (to our knowledge) novel upper bound for the Lévy concentration function associated with symmetric distributions.





Stochastic differential equations with a fractionally filtered delay: A semimartingale model for long-range dependent processes

Richard A. Davis, Mikkel Slot Nielsen, Victor Rohde.

Source: Bernoulli, Volume 26, Number 2, 799--827.

Abstract:
In this paper, we introduce a model, the stochastic fractional delay differential equation (SFDDE), which is based on the linear stochastic delay differential equation and produces stationary processes with hyperbolically decaying autocovariance functions. The model departs from the usual way of incorporating this type of long-range dependence into a short-memory model as it is obtained by applying a fractional filter to the drift term rather than to the noise term. The advantages of this approach are that the corresponding long-range dependent solutions are semimartingales and the local behavior of the sample paths is unaffected by the degree of long memory. We prove existence and uniqueness of solutions to the SFDDEs and study their spectral densities and autocovariance functions. Moreover, we define a subclass of SFDDEs which we study in detail and relate to the well-known fractionally integrated CARMA processes. Finally, we consider the task of simulating from the defining SFDDEs.





A Feynman–Kac result via Markov BSDEs with generalised drivers

Elena Issoglio, Francesco Russo.

Source: Bernoulli, Volume 26, Number 1, 728--766.

Abstract:
In this paper, we investigate BSDEs where the driver contains a distributional term (in the sense of generalised functions) and derive general Feynman–Kac formulae related to these BSDEs. We introduce an integral operator to give sense to the equation and then we show the existence of a strong solution employing results on a related PDE. Due to the irregularity of the driver, the $Y$-component of a pair $(Y,Z)$ solving the BSDE is not necessarily a semimartingale but a weak Dirichlet process.





Multivariate count autoregression

Konstantinos Fokianos, Bård Støve, Dag Tjøstheim, Paul Doukhan.

Source: Bernoulli, Volume 26, Number 1, 471--499.

Abstract:
We study linear and log-linear models for multivariate count time series data with Poisson marginals. To study the properties of such processes, we develop a novel conceptual framework based on copulas. Earlier contributions impose the copula on the joint distribution of the vector of counts by employing a continuous extension methodology. Instead, we introduce a copula function on a vector of associated continuous random variables. This construction avoids conceptual difficulties related to the joint distribution of counts yet keeps the properties of the Poisson process marginally. Furthermore, this construction can be employed for modeling multivariate count time series with other marginal count distributions. We employ Markov chain theory and the notion of weak dependence to study ergodicity and stationarity of the models we consider. Suitable estimating equations are suggested for estimating unknown model parameters. The large sample properties of the resulting estimators are studied in detail. The work concludes with some simulations and a real data example.
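A linear specification of the kind studied in this literature takes the intensity recursion $\lambda_t = d + A\lambda_{t-1} + BY_{t-1}$ with $Y_t$ Poisson given $\lambda_t$. The simulation sketch below draws the two counts independently given the intensity, whereas the paper couples them through a copula on associated continuous variables; all parameter values are illustrative:

```python
# Simulation sketch of a bivariate linear Poisson count autoregression:
# lambda_t = d + A @ lambda_{t-1} + B @ y_{t-1}, with y_t ~ Poisson(lambda_t).
# Counts are drawn independently given lambda_t (no copula coupling here).
import numpy as np

def simulate_poisson_var(T, d, A, B, rng=np.random.default_rng(1)):
    k = len(d)
    lam = np.ones(k)
    y = np.zeros(k)
    out = np.zeros((T, k), dtype=int)
    for t in range(T):
        lam = d + A @ lam + B @ y          # linear intensity recursion
        y = rng.poisson(lam)
        out[t] = y
    return out

d = np.array([0.5, 0.5])
A = np.array([[0.3, 0.0], [0.0, 0.3]])     # spectral radius of A + B < 1
B = np.array([[0.2, 0.1], [0.1, 0.2]])     # keeps the process stable
counts = simulate_poisson_var(1000, d, A, B)
```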





Construction results for strong orthogonal arrays of strength three

Chenlu Shi, Boxin Tang.

Source: Bernoulli, Volume 26, Number 1, 418--431.

Abstract:
Strong orthogonal arrays were recently introduced as a class of space-filling designs for computer experiments. The most attractive are those of strength three, for their economical run sizes. Although the existence of strong orthogonal arrays of strength three has been completely characterized, the construction of these arrays has not been explored. In this paper, we provide a systematic and comprehensive study on the construction of these arrays, with the aim of better space-filling properties. Besides various characterizing results, three families of strength-three strong orthogonal arrays are presented. One of these families deserves special mention, as the arrays in this family enjoy almost all of the space-filling properties of strength-four strong orthogonal arrays, and do so with much more economical run sizes than the latter. The theory of maximal designs and their doubling constructions plays a crucial role in many of the theoretical developments.





Prediction and estimation consistency of sparse multi-class penalized optimal scoring

Irina Gaynanova.

Source: Bernoulli, Volume 26, Number 1, 286--322.

Abstract:
Sparse linear discriminant analysis via penalized optimal scoring is a successful tool for classification in high-dimensional settings. While the variable selection consistency of sparse optimal scoring has been established, the corresponding prediction and estimation consistency results have been lacking. We bridge this gap by providing probabilistic bounds on the out-of-sample prediction error and estimation error of multi-class penalized optimal scoring, allowing for a diverging number of classes.





The Klemm family : descendants of Johann Gottfried Klemm and Anna Louise Klemm : these forebears are honoured and remembered at a reunion at Gruenberg, Moculta 11th-12th March 1995.

Klemm (Family)





Austin-Area District Looks for Digital/Blended Learning Program; Baltimore Seeks High School Literacy Program

The Round Rock Independent School District in Texas is looking for a digital curriculum and blended learning program. Baltimore is looking for a comprehensive high school literacy program.






4 Ways to Help Students Cultivate Meaningful Connections Through Tech

The CEO of Move This World isn't big on screen time, but in the midst of the coronavirus pandemic, technology, when used with care, can help strengthen relationships.






Item 02: William Hilton Saunders WWI diary, 1 January 1917 - 24 October 1917





Item 04: William Hilton Saunders WWI diary, 18 February 1919 - 8 July 1919





Item 03: William Hilton Saunders WWI diary, 1 January 1918 - 31 December 1918





Item 01: William Hilton Saunders WWI diary, February 1916 - 2 January 1917





Item 05: William Hilton Saunders WWI 1916-1919 address book with poetry