
Capitals cutting Leipsic after 'offensive' comments

Washington announced it is terminating the contract of forward Brendan Leipsic after his misogynistic comments on Instagram were leaked.





More Than Phonics: How to Boost Comprehension for Early Readers

Learning how to decode words is essential to becoming a reader. But research shows that building a strong vocabulary and knowledge base is crucial as well.





Antike Naturwissenschaft und ihre Rezeption [Ancient Natural Science and Its Reception]: Volume XXIX / Jochen Althoff, Sabine Föllinger, Georg Wöhrle (eds.)

Trier : WVT Wissenschaftlicher Verlag Trier, 2017.





Needle sharing among intravenous drug abusers: national and international perspectives / Editors, Robert J. Battjes, Roy W. Pickens.

Rockville, Maryland : National Institute on Drug Abuse, 1988.





Mississippi State hires Nikki McCray-Penson as women's coach

Mississippi State hired former Old Dominion women’s basketball coach Nikki McCray-Penson to replace Vic Schaefer as the Bulldogs’ head coach. Athletic director John Cohen called McCray-Penson “a proven winner who will lead one of the best programs in the nation” on the department’s website. McCray-Penson, a former Tennessee star and Women’s Basketball Hall of Famer, said it’s been a dream to coach in the Southeastern Conference and she’s “grateful and blessed for this incredible honor and opportunity.”





Nonparametric confidence intervals for conditional quantiles with large-dimensional covariates

Laurent Gardes.

Source: Electronic Journal of Statistics, Volume 14, Number 1, 661--701.

Abstract:
The first part of the paper is dedicated to the construction of a $\gamma$-nonparametric confidence interval for a conditional quantile with a level depending on the sample size. When this level tends to 0 or 1 as the sample size increases, the conditional quantile is said to be extreme and is located in the tail of the conditional distribution. The proposed confidence interval is constructed by approximating the distribution of the order statistics selected with a nearest neighbor approach by a Beta distribution. We show that its coverage probability converges to the preselected probability $\gamma$, and its accuracy is illustrated in a simulation study. When the dimension of the covariate increases, the coverage probability of the confidence interval can be very different from $\gamma$. This is a well-known consequence of data sparsity, especially in the tail of the distribution. In the second part, a dimension reduction procedure is proposed in order to select more appropriate nearest neighbors in the right tail of the distribution and, in turn, to obtain a better coverage probability for extreme conditional quantiles. This procedure is based on the Tail Conditional Independence assumption introduced in (Gardes, Extremes, 18(3), pp. 57–95, 2018).
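
As background, the Beta approximation rests on a standard distribution-free fact (a textbook identity, not a result of this paper): if $X_{(1)}\leq \dots \leq X_{(n)}$ are the order statistics of an i.i.d. sample from a continuous distribution $F$, then $F(X_{(i)})\sim \mathrm{Beta}(i,\,n-i+1)$, and the interval formed by two order statistics has exact coverage
$$\mathbb{P}\big(X_{(i)}\leq q_{\alpha}\leq X_{(j)}\big)=\sum_{k=i}^{j-1}\binom{n}{k}\alpha^{k}(1-\alpha)^{n-k}.$$
The paper's construction applies this logic to the order statistics of the nearest-neighbor subsample rather than of the full sample.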





On the Letac-Massam conjecture and existence of high dimensional Bayes estimators for graphical models

Emanuel Ben-David, Bala Rajaratnam.

Source: Electronic Journal of Statistics, Volume 14, Number 1, 580--604.

Abstract:
The Wishart distribution defined on the open cone of positive-definite matrices plays a central role in multivariate analysis and multivariate distribution theory. Its domain of parameters is often referred to as the Gindikin set. In recent years, a variety of useful extensions of the Wishart distribution has been proposed in the literature for the purposes of studying Markov random fields and graphical models. In particular, generalizations of the Wishart distribution, referred to as Type I and Type II (graphical) Wishart distributions, introduced by Letac and Massam in the Annals of Statistics (2007), play important roles in both frequentist and Bayesian inference for Gaussian graphical models. These distributions have been especially useful in high-dimensional settings due to the flexibility offered by their multiple shape parameters. A conjecture of Letac and Massam concerns the domain of the multiple shape parameters of these distributions. The conjecture also has implications for the existence of Bayes estimators corresponding to these high-dimensional priors. The conjecture, first posed in the Annals of Statistics, has now been an open problem for about 10 years. In this paper, we give a necessary condition for the Letac and Massam conjecture to hold. More precisely, we prove that if the Letac and Massam conjecture holds on a decomposable graph, then no two separators of the graph can be nested within each other. For this, we analyze Type I and Type II Wishart distributions on appropriate Markov equivalent perfect DAG models and succeed in deriving the aforementioned necessary condition. This condition in particular identifies a class of counterexamples to the conjecture.





Parseval inequalities and lower bounds for variance-based sensitivity indices

Olivier Roustant, Fabrice Gamboa, Bertrand Iooss.

Source: Electronic Journal of Statistics, Volume 14, Number 1, 386--412.

Abstract:
The so-called polynomial chaos expansion is widely used in computer experiments. For example, it is a powerful tool to estimate Sobol' sensitivity indices. In this paper, we consider generalized chaos expansions built on a general tensor Hilbert basis. In this framework, we revisit the computation of the Sobol' indices with Parseval equalities and give general lower bounds for these indices obtained by truncation. The case of the eigenfunction system associated with a Poincaré differential operator leads to lower bounds involving the derivatives of the analyzed function and provides an efficient tool for variable screening. These lower bounds are put into action on both toy and real-life models, demonstrating their accuracy.
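
For context, the classical Parseval identity behind such bounds (standard in the polynomial chaos literature; the notation here is generic, not the paper's): if $f=\sum_{\alpha}c_{\alpha}\Phi_{\alpha}$ in an orthonormal tensor basis with $\Phi_{0}\equiv 1$, then
$$\mathrm{Var}(f)=\sum_{\alpha\neq 0}c_{\alpha}^{2},\qquad D_{u}=\sum_{\alpha:\,\mathrm{supp}(\alpha)=u}c_{\alpha}^{2},$$
so truncating the inner sum to any finite index set immediately yields a lower bound on the unnormalized Sobol' index $D_{u}$, and hence on $S_{u}=D_{u}/\mathrm{Var}(f)$ once the variance is known or estimated.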





Asymptotics and optimal bandwidth for nonparametric estimation of density level sets

Wanli Qiao.

Source: Electronic Journal of Statistics, Volume 14, Number 1, 302--344.

Abstract:
Bandwidth selection is crucial in the kernel estimation of density level sets. A risk based on the symmetric difference between the estimated and true level sets is usually used to measure their proximity. In this paper we provide an asymptotic $L^{p}$ approximation to this risk, where $p$ is characterized by the weight function in the risk. In particular, the excess risk corresponds to an $L^{2}$ type of risk, and is adopted to derive an optimal bandwidth for nonparametric level set estimation of $d$-dimensional density functions ($d\geq 1$). A direct plug-in bandwidth selector is developed for kernel density level set estimation and its efficacy is verified in numerical studies.





Estimation of linear projections of non-sparse coefficients in high-dimensional regression

David Azriel, Armin Schwartzman.

Source: Electronic Journal of Statistics, Volume 14, Number 1, 174--206.

Abstract:
In this work we study estimation of signals when the number of parameters is much larger than the number of observations. A large body of literature assumes for these kinds of problems a sparse structure where most of the parameters are zero or close to zero. When this assumption does not hold, one can focus on low-dimensional functions of the parameter vector. In this work we study one-dimensional linear projections. Specifically, in the context of high-dimensional linear regression, the parameter of interest is $\boldsymbol{\beta}$ and we study estimation of $\mathbf{a}^{T}\boldsymbol{\beta}$. We show that $\mathbf{a}^{T}\hat{\boldsymbol{\beta}}$, where $\hat{\boldsymbol{\beta}}$ is the least squares estimator (using the pseudo-inverse when $p>n$), is minimax and admissible. Thus, for linear projections no regularization or shrinkage is needed. This estimator is easy to analyze and confidence intervals can be constructed. We study a high-dimensional dataset from brain imaging where it is shown that the signal is weak, non-sparse and significantly different from zero.
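
A minimal numpy sketch of the plug-in projection estimator described above (synthetic data; the dimensions and the direction $\mathbf{a}$ are illustrative, not from the paper):

    import numpy as np

    rng = np.random.default_rng(0)
    n, p = 50, 200                                # more parameters than observations
    X = rng.standard_normal((n, p))
    beta = rng.standard_normal(p) / np.sqrt(p)    # dense, non-sparse signal
    y = X @ beta + rng.standard_normal(n)
    a = rng.standard_normal(p)                    # projection direction

    # Least-squares estimate via the Moore-Penrose pseudo-inverse (p > n case),
    # then the plug-in estimate of the one-dimensional projection a^T beta.
    beta_hat = np.linalg.pinv(X) @ y
    print(a @ beta_hat, a @ beta)                 # estimate vs. true projection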





A fast and consistent variable selection method for high-dimensional multivariate linear regression with a large number of explanatory variables

Ryoya Oda, Hirokazu Yanagihara.

Source: Electronic Journal of Statistics, Volume 14, Number 1, 1386--1412.

Abstract:
We put forward a variable selection method for selecting explanatory variables in a normality-assumed multivariate linear regression. It is cumbersome to calculate variable selection criteria for all subsets of explanatory variables when the number of explanatory variables is large. Therefore, we propose a fast and consistent variable selection method based on a generalized $C_{p}$ criterion. The consistency of the method is established under a high-dimensional asymptotic framework in which the sample size tends to infinity while the sum of the dimensions of the response and explanatory vectors, divided by the sample size, tends to a positive constant less than one. Through numerical simulations, it is shown that the proposed method has a high probability of selecting the true subset of explanatory variables and is fast under a moderate sample size even when the number of dimensions is large.





Conditional density estimation with covariate measurement error

Xianzheng Huang, Haiming Zhou.

Source: Electronic Journal of Statistics, Volume 14, Number 1, 970--1023.

Abstract:
We consider estimating the density of a response conditional on an error-prone covariate. Motivated by two existing kernel density estimators in the absence of covariate measurement error, we propose a method to correct the existing estimators for measurement error. Asymptotic properties of the resultant estimators under different types of measurement error distributions are derived. Moreover, we adjust bandwidths readily available from existing bandwidth selection methods developed for error-free data to obtain bandwidths for the new estimators. Extensive simulation studies are carried out to compare the proposed estimators with naive estimators that ignore measurement error, which also provide empirical evidence for the effectiveness of the proposed bandwidth selection methods. A real-life data example is used to illustrate implementation of these methods under practical scenarios. An R package, lpme, is developed for implementing all considered methods, which we demonstrate via an R code example in Appendix B.2.





On the distribution, model selection properties and uniqueness of the Lasso estimator in low and high dimensions

Karl Ewald, Ulrike Schneider.

Source: Electronic Journal of Statistics, Volume 14, Number 1, 944--969.

Abstract:
We derive expressions for the finite-sample distribution of the Lasso estimator in the context of a linear regression model in low as well as in high dimensions by exploiting the structure of the optimization problem defining the estimator. In low dimensions, we assume full rank of the regressor matrix and present expressions for the cumulative distribution function as well as the densities of the absolutely continuous parts of the estimator. Our results are presented for the case of normally distributed errors, but do not hinge on this assumption and can easily be generalized. Additionally, we establish an explicit formula for the correspondence between the Lasso and the least-squares estimator. We derive analogous results for the distribution in less explicit form in high dimensions where we make no assumptions on the regressor matrix at all. In this setting, we also investigate the model selection properties of the Lasso and show that possibly only a subset of models might be selected by the estimator, completely independently of the observed response vector. Finally, we present a condition for uniqueness of the estimator that is necessary as well as sufficient.
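
For intuition, the Lasso-least-squares correspondence is explicit in the textbook orthonormal-design special case (a standard fact, not the paper's general formula): if $X^{T}X=I_{p}$ and the objective is $\frac{1}{2}\|y-X\beta\|_{2}^{2}+\lambda\|\beta\|_{1}$, each Lasso coordinate is a soft-thresholded least-squares coordinate,
$$\hat{\beta}_{j}^{\mathrm{Lasso}}=\mathrm{sign}\big(\hat{\beta}_{j}^{\mathrm{LS}}\big)\big(|\hat{\beta}_{j}^{\mathrm{LS}}|-\lambda\big)_{+},$$
which already exhibits the mixed distribution studied above: an atom at zero plus an absolutely continuous part.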





DESlib: A Dynamic ensemble selection library in Python

DESlib is an open-source Python library providing implementations of several dynamic selection techniques. The library is divided into three modules: (i) dcs, containing the implementation of dynamic classifier selection (DCS) methods; (ii) des, containing the implementation of dynamic ensemble selection (DES) methods; (iii) static, with the implementation of static ensemble techniques. The library is fully documented (documentation available online on Read the Docs), has high test coverage (codecov.io) and is part of the scikit-learn-contrib supported projects. Documentation, code and examples can be found on its GitHub page: https://github.com/scikit-learn-contrib/DESlib.
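
A minimal usage sketch, assuming DESlib's documented scikit-learn-style API (the dataset, split sizes and the KNORA-E method choice are illustrative):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier
    from sklearn.model_selection import train_test_split
    from deslib.des.knora_e import KNORAE  # a DES method from the des module

    X, y = make_classification(n_samples=1000, random_state=0)
    X_tr, X_rest, y_tr, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
    X_dsel, X_te, y_dsel, y_te = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

    # Train a pool of base classifiers, then fit the dynamic selector
    # on a held-out dynamic selection (DSEL) set.
    pool = BaggingClassifier(n_estimators=10, random_state=0).fit(X_tr, y_tr)
    des = KNORAE(pool_classifiers=pool).fit(X_dsel, y_dsel)
    print(des.score(X_te, y_te))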





Online Sufficient Dimension Reduction Through Sliced Inverse Regression

Sliced inverse regression is an effective paradigm that achieves the goal of dimension reduction by replacing high-dimensional covariates with a small number of linear combinations. It does not impose parametric assumptions on the dependence structure. More importantly, such a reduction of dimension is sufficient in that it does not cause loss of information. In this paper, we adapt stationary sliced inverse regression to cope with rapidly changing environments. We propose to implement sliced inverse regression in an online fashion. This online learner consists of two steps. In the first step we construct an online estimate for the kernel matrix; in the second step we propose two online algorithms, one motivated by the perturbation method and the other originating from gradient descent optimization, to perform online singular value decomposition. The theoretical properties of this online learner are established. We demonstrate the numerical performance of this online learner through simulations and real-world applications. All numerical studies confirm that this online learner performs as well as the batch learner.
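
For reference, a minimal numpy sketch of the classical batch sliced inverse regression that the online learner adapts (slice count and data are illustrative):

    import numpy as np
    from scipy.linalg import eigh

    def sir_directions(X, y, n_slices=10, n_dirs=2):
        """Eigenvectors of the SIR kernel matrix Cov(E[X | slice of y]),
        solved against Cov(X) as a generalized eigenproblem."""
        n, p = X.shape
        Xc = X - X.mean(axis=0)
        M = np.zeros((p, p))
        for idx in np.array_split(np.argsort(y), n_slices):
            m = Xc[idx].mean(axis=0)
            M += (len(idx) / n) * np.outer(m, m)   # kernel matrix estimate
        vals, vecs = eigh(M, Xc.T @ Xc / n)        # generalized eigendecomposition
        return vecs[:, np.argsort(vals)[::-1][:n_dirs]]

    rng = np.random.default_rng(0)
    X = rng.standard_normal((500, 5))
    y = np.exp(X[:, 0]) + X[:, 1] + 0.1 * rng.standard_normal(500)
    print(sir_directions(X, y).shape)              # (5, 2)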





On lp-Support Vector Machines and Multidimensional Kernels

In this paper, we extend the methodology developed for Support Vector Machines (SVM) using the $\ell_2$-norm ($\ell_2$-SVM) to the more general case of $\ell_p$-norms with $p>1$ ($\ell_p$-SVM). We derive second-order cone formulations for the resulting dual and primal problems. The concept of a kernel function, widely applied in $\ell_2$-SVM, is extended to the more general case of $\ell_p$-norms with $p>1$ by defining a new operator called a multidimensional kernel. This object gives rise to reformulations of the dual problems, in a transformed space of the original data, where the dependence on the original data always appears as homogeneous polynomials. We adapt known solution algorithms to efficiently solve the resulting primal and dual problems, and computational experiments on real-world datasets show rather good behavior in terms of the accuracy of $\ell_p$-SVM with $p>1$.





High-Dimensional Interactions Detection with Sparse Principal Hessian Matrix

In the statistical learning framework with regression, interactions are the contributions to the response variable from products of the explanatory variables. In high-dimensional problems, detecting interactions is challenging due to combinatorial complexity and limited data information. We consider detecting interactions by exploring their connections with the principal Hessian matrix. Specifically, we propose a one-step synthetic approach for estimating the principal Hessian matrix by a penalized M-estimator. An alternating direction method of multipliers (ADMM) algorithm is proposed to efficiently solve the encountered regularized optimization problem. Based on the sparse estimator, we detect the interactions by identifying its nonzero components. Our method directly targets the interactions, and it requires no structural assumption on the hierarchy of the interaction effects. We show that our estimator is theoretically valid, computationally efficient, and practically useful for detecting interactions in a broad spectrum of scenarios.





Targeted Fused Ridge Estimation of Inverse Covariance Matrices from Multiple High-Dimensional Data Classes

We consider the problem of jointly estimating multiple inverse covariance matrices from high-dimensional data consisting of distinct classes. An $\ell_2$-penalized maximum likelihood approach is employed. The suggested approach is flexible and generic, incorporating several other $\ell_2$-penalized estimators as special cases. In addition, the approach allows specification of target matrices through which prior knowledge may be incorporated and which can stabilize the estimation procedure in high-dimensional settings. The result is a targeted fused ridge estimator that is of use when the precision matrices of the constituent classes are believed to chiefly share the same structure while potentially differing in a number of locations of interest. It has many applications in (multi)factorial study designs. We focus on the graphical interpretation of precision matrices with the proposed estimator then serving as a basis for integrative or meta-analytic Gaussian graphical modeling. Situations are considered in which the classes are defined by data sets and subtypes of diseases. The performance of the proposed estimator in the graphical modeling setting is assessed through extensive simulation experiments. Its practical usability is illustrated by the differential network modeling of 12 large-scale gene expression data sets of diffuse large B-cell lymphoma subtypes. The estimator and its related procedures are incorporated into the R-package rags2ridges.





The Maximum Separation Subspace in Sufficient Dimension Reduction with Categorical Response

Sufficient dimension reduction (SDR) is a very useful concept for exploratory analysis and data visualization in regression, especially when the number of covariates is large. Many SDR methods have been proposed for regression with a continuous response, where the central subspace (CS) is the target of estimation. Various conditions, such as the linearity condition and the constant covariance condition, are imposed so that these methods can estimate at least a portion of the CS. In this paper we study SDR for regression and discriminant analysis with categorical response. Motivated by the exploratory analysis and data visualization aspects of SDR, we propose a new geometric framework to reformulate the SDR problem in terms of manifold optimization and introduce a new concept called Maximum Separation Subspace (MASES). The MASES naturally preserves the “sufficiency” in SDR without imposing additional conditions on the predictor distribution, and directly inspires a semi-parametric estimator. Numerical studies show MASES exhibits superior performance as compared with competing SDR methods in specific settings.





Tensor Train Decomposition on TensorFlow (T3F)

Tensor Train decomposition is used across many branches of machine learning. We present T3F—a library for Tensor Train decomposition based on TensorFlow. T3F supports GPU execution, batch processing, automatic differentiation, and versatile functionality for the Riemannian optimization framework, which takes into account the underlying manifold structure to construct efficient optimization methods. The library makes it easier to implement machine learning papers that rely on the Tensor Train decomposition. T3F includes documentation, examples and 94% test coverage.
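
For background, a minimal numpy sketch of the TT-SVD procedure underlying Tensor Train libraries such as T3F (the rank cap and shapes are illustrative; T3F's own API is documented on its GitHub page):

    import numpy as np

    def tt_svd(T, max_rank=8):
        """Decompose a dense tensor into Tensor Train cores via sequential SVDs."""
        shape, cores, r = T.shape, [], 1
        M = T.reshape(r * shape[0], -1)
        for k in range(len(shape) - 1):
            U, s, Vt = np.linalg.svd(M, full_matrices=False)
            rk = min(max_rank, len(s))
            cores.append(U[:, :rk].reshape(r, shape[k], rk))
            M = (s[:rk, None] * Vt[:rk]).reshape(rk * shape[k + 1], -1)
            r = rk
        cores.append(M.reshape(r, shape[-1], 1))
        return cores

    def tt_full(cores):
        """Contract the TT cores back into a dense tensor."""
        out = cores[0]
        for core in cores[1:]:
            out = np.tensordot(out, core, axes=1)  # contract rank dimensions
        return out.reshape([c.shape[1] for c in cores])

    T = np.random.default_rng(0).standard_normal((4, 5, 6))
    print(np.allclose(tt_full(tt_svd(T, max_rank=30)), T))  # exact at full rank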





Noise Accumulation in High Dimensional Classification and Total Signal Index

Great attention has been paid to Big Data in recent years. Such data hold promise for scientific discoveries but also pose challenges to analyses. One potential challenge is noise accumulation. In this paper, we explore noise accumulation in high-dimensional two-group classification. First, we revisit a previous assessment of noise accumulation with principal component analyses, which yields a different threshold for discriminative ability than originally identified. Then we extend our scope to its impact on classifiers developed with three common machine learning approaches: random forest, support vector machine, and boosted classification trees. We simulate four scenarios with differing amounts of signal strength to evaluate each method. After determining that noise accumulation may affect the performance of these classifiers, we assess factors that impact it. We conduct simulations by varying sample size, signal strength, signal strength proportional to the number of predictors, and signal magnitude with random forest classifiers. These simulations suggest that noise accumulation affects the discriminative ability of high-dimensional classifiers developed using common machine learning methods, and that this effect can be modified by sample size, signal strength, and signal magnitude. We developed the total signal index (TSI) measure to track the trends of total signal and noise accumulation.





Latent Simplex Position Model: High Dimensional Multi-view Clustering with Uncertainty Quantification

High-dimensional data often contain multiple facets, and several clustering patterns can co-exist under different variable subspaces, also known as views. While multi-view clustering algorithms have been proposed, uncertainty quantification remains difficult: a particular challenge lies in the high complexity of estimating the cluster assignment probability under each view and in sharing information among views. In this article, we propose an approximate Bayes approach: treating the similarity matrices generated over the views as rough first-stage estimates for the co-assignment probabilities, we obtain, in their Kullback-Leibler neighborhood, a refined low-rank matrix formed by the pairwise product of simplex coordinates. Interestingly, each simplex coordinate directly encodes the cluster assignment uncertainty. For multi-view clustering, we let each view draw a parameterization from a few candidates, leading to dimension reduction. With high model flexibility, the estimation can be efficiently carried out as a continuous optimization problem, and hence enjoys gradient-based computation. The theory establishes the connection of this model to a random partition distribution under multiple views. Compared to single-view clustering approaches, substantially more interpretable results are obtained when clustering brains from a human traumatic brain injury study, using high-dimensional gene expression data.





Ensemble Learning for Relational Data

We present a theoretical analysis framework for relational ensemble models. We show that ensembles of collective classifiers can improve predictions for graph data by reducing errors due to variance in both learning and inference. In addition, we propose a relational ensemble framework that combines a relational ensemble learning approach with a relational ensemble inference approach for collective classification. The proposed ensemble techniques are applicable to both single and multiple graph settings. Experiments on both synthetic and real-world data demonstrate the effectiveness of the proposed framework. Finally, our experimental results support the theoretical analysis and confirm that ensemble algorithms that explicitly focus on both the learning and inference processes, and that aim at reducing the errors associated with both, are the best performers.





High-Dimensional Inference for Cluster-Based Graphical Models

Motivated by modern applications in which one constructs graphical models based on a very large number of features, this paper introduces a new class of cluster-based graphical models, in which variable clustering is applied as an initial step for reducing the dimension of the feature space. We employ model-assisted clustering, in which the clusters contain features that are similar to the same unobserved latent variable. Two different cluster-based Gaussian graphical models are considered: the latent variable graph, corresponding to the graphical model associated with the unobserved latent variables, and the cluster-average graph, corresponding to the vector of features averaged over clusters. Our study reveals that likelihood-based inference for the latent graph, not analyzed previously, is analytically intractable. Our main contribution is the development and analysis of alternative estimation and inference strategies for the precision matrix of an unobservable latent vector $Z$. We replace the likelihood of the data by an appropriate class of empirical risk functions, which can be specialized to the latent graphical model and to the simpler, but under-analyzed, cluster-average graphical model. The estimators thus derived can be used for inference on the graph structure, for instance on edge strength or pattern recovery. Inference is based on the asymptotic limits of the entry-wise estimates of the precision matrices associated with the conditional independence graphs under consideration. While taking the uncertainty induced by the clustering step into account, we establish Berry-Esseen central limit theorems for the proposed estimators. It is noteworthy that, although the clusters are estimated adaptively from the data, the central limit theorems regarding the entries of the estimated graphs are proved under the same conditions one would use if the clusters were known in advance. As an illustration of the usage of these newly developed inferential tools, we show that they can be reliably used for recovery of the sparsity pattern of the graphs we study, under FDR control, which is verified via simulation studies and an fMRI data analysis. These experimental results confirm the theoretically established difference between the two graph structures. Furthermore, the data analysis suggests that the latent variable graph, corresponding to the unobserved cluster centers, can help provide more insight into the understanding of brain connectivity networks relative to the simpler, average-based, graph.





WONDER: Weighted One-shot Distributed Ridge Regression in High Dimensions

In many areas, practitioners need to analyze large data sets that challenge conventional single-machine computing. To scale up data analysis, distributed and parallel computing approaches are increasingly needed. Here we study a fundamental and highly important problem in this area: how to do ridge regression in a distributed computing environment? Ridge regression is an extremely popular method for supervised learning with several optimality properties, and is thus important to study. We study one-shot methods that construct weighted combinations of ridge regression estimators computed on each machine. By analyzing the mean squared error in a high-dimensional random-effects model where each predictor has a small effect, we discover several new phenomena. Infinite-worker limit: the distributed estimator works well for very large numbers of machines, a phenomenon we call the 'infinite-worker limit'. Optimal weights: the optimal weights for combining local estimators sum to more than unity, due to the downward bias of ridge; thus, all averaging methods are suboptimal. We also propose a new Weighted ONe-shot DistributEd Ridge regression algorithm (WONDER). We test WONDER in simulation studies and using the Million Song Dataset as an example, where it can save at least 100x in computation time while nearly preserving test accuracy.
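
A minimal numpy sketch of the one-shot idea (local ridge fits combined with weights; uniform weights are used here as a placeholder, whereas the paper's point is that the optimal weights differ and sum to more than one):

    import numpy as np

    def ridge(X, y, lam):
        p = X.shape[1]
        return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

    rng = np.random.default_rng(0)
    n, p, k, lam = 3000, 100, 5, 1.0                # k machines; sizes illustrative
    X = rng.standard_normal((n, p))
    beta = rng.standard_normal(p) / np.sqrt(p)      # many small effects
    y = X @ beta + rng.standard_normal(n)

    # Each machine fits ridge on its own chunk of rows; one round of
    # communication then combines the local estimators.
    locals_ = [ridge(X[idx], y[idx], lam) for idx in np.array_split(np.arange(n), k)]
    w = np.full(k, 1.0 / k)                         # placeholder weights
    beta_oneshot = sum(wi * bi for wi, bi in zip(w, locals_))
    print(np.linalg.norm(beta_oneshot - beta))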





Union of Low-Rank Tensor Spaces: Clustering and Completion

We consider the problem of clustering and completing a set of tensors with missing data that are drawn from a union of low-rank tensor spaces. In the clustering problem, given partially sampled tensor data composed of a number of subtensors, each chosen from one of a certain number of unknown tensor spaces, we need to group the subtensors that belong to the same tensor space. We provide a geometrical analysis of the sampling pattern and subsequently derive the sampling rate that guarantees correct clustering, under some assumptions, with high probability. Moreover, we investigate the fundamental conditions for finite/unique completability for the union of tensor spaces completion problem. Both deterministic and probabilistic conditions on the sampling pattern to ensure finite/unique completability are obtained. For both the clustering and completion problems, our tensor analysis provides significantly better bounds than the bounds given by the matrix analysis applied to any unfolding of the tensor data.





High-dimensional Gaussian graphical models on network-linked data

Graphical models are commonly used to represent conditional dependence relationships between variables. There are multiple methods available for exploring them from high-dimensional data, but almost all of them rely on the assumption that the observations are independent and identically distributed. At the same time, observations connected by a network are becoming increasingly common, and tend to violate these assumptions. Here we develop a Gaussian graphical model for observations connected by a network with potentially different mean vectors, varying smoothly over the network. We propose an efficient estimation algorithm and demonstrate its effectiveness on both simulated and real data, obtaining meaningful and interpretable results on a statistics coauthorship network. We also prove that our method estimates both the inverse covariance matrix and the corresponding graph structure correctly under the assumption of network “cohesion”, which refers to the empirically observed phenomenon of network neighbors sharing similar traits.





Reliability estimation in a multicomponent stress-strength model for Burr XII distribution under progressive censoring

Raj Kamal Maurya, Yogesh Mani Tripathi.

Source: Brazilian Journal of Probability and Statistics, Volume 34, Number 2, 345--369.

Abstract:
We consider estimation of the multicomponent stress-strength reliability under progressive Type II censoring, under the assumption that the stress and strength variables follow Burr XII distributions with a common shape parameter. Maximum likelihood estimates of the reliability are obtained along with asymptotic intervals, for the cases where the common shape parameter is known or unknown. Bayes estimates are also derived under the squared error loss function using different approximation methods. Further, we obtain exact Bayes and uniformly minimum variance unbiased estimates of the reliability for the case where the common shape parameter is known. The highest posterior density intervals are also obtained. We perform Monte Carlo simulations to compare the performance of the proposed estimates and present a discussion based on this study. Finally, two real data sets are analyzed for illustration purposes.





Bayesian modeling and prior sensitivity analysis for zero–one augmented beta regression models with an application to psychometric data

Danilo Covaes Nogarotto, Caio Lucidius Naberezny Azevedo, Jorge Luis Bazán.

Source: Brazilian Journal of Probability and Statistics, Volume 34, Number 2, 304--322.

Abstract:
Interest in the analysis of the zero–one augmented beta regression (ZOABR) model has been increasing over the last few years. In this work, we developed Bayesian inference for the ZOABR model, providing several contributions: we explored the use of the Jeffreys-rule and independence Jeffreys priors for some of the parameters, performed a sensitivity study of the prior choice, compared the Bayesian estimates with the maximum likelihood ones, and measured the accuracy of the estimates under several scenarios of interest. The results indicate, in general, that the Bayesian approach under the Jeffreys-rule prior was as accurate as the ML one. Also, unlike other approaches, we use the predictive distribution of the response to implement Bayesian residuals. To further illustrate the advantages of our approach, we conduct an analysis of a real psychometric data set, including a Bayesian residual analysis, where it is shown that misleading inference can be obtained when the data are transformed: that is, when the zeros and ones are transformed to suitable values and the usual beta regression model is considered instead of the ZOABR model. Finally, future developments are discussed.





A note on the “L-logistic regression models: Prior sensitivity analysis, robustness to outliers and applications”

Saralees Nadarajah, Yuancheng Si.

Source: Brazilian Journal of Probability and Statistics, Volume 34, Number 1, 183--187.

Abstract:
Da Paz, Balakrishnan and Bazán [Braz. J. Probab. Stat. 33 (2019), 455–479] introduced the L-logistic distribution, studied its properties including estimation issues, and illustrated an application to data. This note derives a closed-form expression for the moment properties of the distribution. Some computational issues are discussed.





Option pricing with bivariate risk-neutral density via copula and heteroscedastic model: A Bayesian approach

Lucas Pereira Lopes, Vicente Garibay Cancho, Francisco Louzada.

Source: Brazilian Journal of Probability and Statistics, Volume 33, Number 4, 801--825.

Abstract:
Multivariate options are adequate tools for multi-asset risk management. The pricing models derived from the pioneering Black-Scholes method in the multivariate case assume that the asset prices follow geometric Brownian motion. However, the construction of such methods imposes some unrealistic constraints on the process of fair option calculation, such as constant volatility over the maturity time and linear correlation between the assets. Therefore, this paper aims to price and analyze the fair-price behavior of the call-on-max (bivariate) option considering marginal heteroscedastic models with a dependence structure modeled via copulas. Concerning inference, we adopt a Bayesian perspective and computationally intensive methods based on Markov chain Monte Carlo (MCMC) simulations. A simulation study examines the bias and the root mean squared errors of the posterior means for the parameters. Real stock prices of Brazilian banks illustrate the approach. For the proposed method, we verify the effects of the strike price and the dependence structure on the fair price of the option. The results show that the prices obtained by our heteroscedastic model approach with copulas differ substantially from the prices obtained by the model derived from Black and Scholes. Empirical results are presented to demonstrate the advantages of our strategy.





Density for solutions to stochastic differential equations with unbounded drift

Christian Olivera, Ciprian Tudor.

Source: Brazilian Journal of Probability and Statistics, Volume 33, Number 3, 520--531.

Abstract:
Via a special transform and by using the techniques of the Malliavin calculus, we analyze the density of the solution to a stochastic differential equation with unbounded drift.





L-Logistic regression models: Prior sensitivity analysis, robustness to outliers and applications

Rosineide F. da Paz, Narayanaswamy Balakrishnan, Jorge Luis Bazán.

Source: Brazilian Journal of Probability and Statistics, Volume 33, Number 3, 455--479.

Abstract:
Tadikamalla and Johnson [Biometrika 69 (1982) 461–465] developed the $L_{B}$ distribution for variables with bounded support by considering a transformation of the standard logistic distribution. In this manuscript, a convenient parametrization of this distribution is proposed in order to develop regression models. This distribution, referred to here as the L-Logistic distribution, provides great flexibility and includes the uniform distribution as a particular case. Several properties of this distribution are studied, and a Bayesian approach is adopted for the parameter estimation. Simulation studies considering prior sensitivity analysis, recovery of parameters, and comparison of algorithms, as well as robustness to outliers, are all discussed, showing that the results are insensitive to the choice of priors, that the adopted MCMC algorithm is efficient, and that the model is robust when compared with the beta distribution. Applications to estimating vulnerability to poverty and to explaining anxiety are performed. The results of the applications show that the L-Logistic regression models provide a better fit than the corresponding beta regression models.





Scalar-on-function regression for predicting distal outcomes from intensively gathered longitudinal data: Interpretability for applied scientists

John J. Dziak, Donna L. Coffman, Matthew Reimherr, Justin Petrovich, Runze Li, Saul Shiffman, Mariya P. Shiyko.

Source: Statistics Surveys, Volume 13, 150--180.

Abstract:
Researchers are sometimes interested in predicting a distal or external outcome (such as smoking cessation at follow-up) from the trajectory of an intensively recorded longitudinal variable (such as urge to smoke). This can be done in a semiparametric way via scalar-on-function regression. However, the resulting fitted coefficient regression function requires special care for correct interpretation, as it represents the joint relationship of time points to the outcome, rather than a marginal or cross-sectional relationship. We provide practical guidelines, based on experience with scientific applications, for helping practitioners interpret their results and illustrate these ideas using data from a smoking cessation study.





Additive monotone regression in high and lower dimensions

Solveig Engebretsen, Ingrid K. Glad.

Source: Statistics Surveys, Volume 13, 1--51.

Abstract:
In numerous problems where the aim is to estimate the effect of a predictor variable on a response, one can assume a monotone relationship. For example, dose-effect models in medicine are of this type. In a multiple regression setting, additive monotone regression models assume that each predictor has a monotone effect on the response. In this paper, we present an overview and comparison of very recent frequentist methods for fitting additive monotone regression models. Three of the methods we present can be used both in the high-dimensional setting, where the number of parameters $p$ exceeds the number of observations $n$, and in the classical multiple setting where $1<p\leq n$. However, many of the most recent methods only apply to the classical setting. The methods are compared through simulation experiments in terms of efficiency, prediction error and variable selection properties in both settings, and they are applied to the Boston housing data. We conclude with some recommendations on when the various methods perform best.





A design-sensitive approach to fitting regression models with complex survey data

Phillip S. Kott.

Source: Statistics Surveys, Volume 12, 1--17.

Abstract:
Fitting complex survey data to regression equations is explored under a design-sensitive model-based framework. A robust version of the standard model assumes that the expected value of the difference between the dependent variable and its model-based prediction is zero no matter what the values of the explanatory variables. The extended model assumes only that the difference is uncorrelated with the covariates. Little is assumed about the error structure of this difference under either model other than independence across primary sampling units. The standard model often fails in practice, but the extended model very rarely does. Under this framework some of the methods developed in the conventional design-based, pseudo-maximum-likelihood framework, such as fitting weighted estimating equations and sandwich mean-squared-error estimation, are retained but their interpretations change. Few of the ideas here are new to the refereed literature. The goal instead is to collect those ideas and put them into a unified conceptual framework.





Curse of dimensionality and related issues in nonparametric functional regression

Gery Geenens.

Source: Statistics Surveys, Volume 5, 30--43.

Abstract:
Recently, some nonparametric regression ideas have been extended to the case of functional regression. Within that framework, the main concern arises from the infinite-dimensional nature of the explanatory objects. Specifically, in the classical multivariate regression context, it is well known that any nonparametric method is affected by the so-called “curse of dimensionality”, caused by the sparsity of data in high-dimensional spaces, resulting in a decrease in the fastest achievable rates of convergence of regression function estimators toward their target curve as the dimension of the regressor vector increases. Therefore, it is not surprising to find dramatically bad theoretical properties for the nonparametric functional regression estimators, leading many authors to condemn the methodology. Nevertheless, a closer look at the meaning of the functional data under study and at the conclusions that the statistician would like to draw from them allows one to consider the problem from another point of view, and to justify the use of slightly modified estimators. In most cases, it can be entirely legitimate to measure the proximity between two elements of the infinite-dimensional functional space via a semi-metric, which can prevent those estimators from suffering from what we will call the “curse of infinite dimensionality”.






Additive Bayesian variable selection under censoring and misspecification. (arXiv:1907.13563v3 [stat.ME] UPDATED)

We study the interplay of two important issues in Bayesian model selection (BMS): censoring and model misspecification. We consider additive accelerated failure time (AAFT), Cox proportional hazards and probit models, and a more general concave log-likelihood structure. A fundamental question is what solution one can hope BMS to provide when (inevitably) models are misspecified. We show that asymptotically BMS keeps any covariate with predictive power for either the outcome or the censoring times, and discards other covariates. Misspecification refers to assuming the wrong model or functional effect on the response, including using a finite basis for a truly non-parametric effect, or omitting truly relevant covariates. We argue for using simple models that are computationally practical yet attain good power to detect potentially complex effects, despite misspecification. Misspecification and censoring both have an asymptotically negligible effect on (suitably-defined) false positives, but their impact on power is exponential. We portray these issues via simple descriptions of early/late censoring and the drop in predictive accuracy due to misspecification. From a methods point of view, we consider local priors and a novel structure that combines local and non-local priors to enforce sparsity. We develop algorithms to capitalize on the AAFT tractability, approximations to the AAFT and probit likelihoods giving significant computational gains, a simple augmented Gibbs sampler to hierarchically explore linear and non-linear effects, and an implementation in the R package mombf. We illustrate the proposed methods and others based on likelihood penalties via extensive simulations under misspecification and censoring. We present two applications concerning the effect of gene expression on colon and breast cancer.





An n-dimensional Rosenbrock Distribution for MCMC Testing. (arXiv:1903.09556v4 [stat.CO] UPDATED)

The Rosenbrock function is a ubiquitous benchmark problem for numerical optimisation, and variants have been proposed to test the performance of Markov Chain Monte Carlo algorithms. In this work we discuss the two-dimensional Rosenbrock density, its current $n$-dimensional extensions, and their advantages and limitations. We then propose a new extension to arbitrary dimensions called the Hybrid Rosenbrock distribution, which is composed of conditional normal kernels arranged in such a way as to preserve the key features of the original kernel. Moreover, due to its structure, the Hybrid Rosenbrock distribution is analytically tractable and possesses several desirable properties, which make it an excellent test model for computational algorithms.
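
A common two-dimensional variant, as a minimal Python sketch (the constants a and b are conventional choices, not necessarily the paper's parameterization):

    import numpy as np

    def log_rosenbrock(x, a=1.0, b=100.0):
        """Unnormalized log-density: minus the Rosenbrock function,
        log pi(x1, x2) = -[(a - x1)^2 + b * (x2 - x1^2)^2]."""
        x1, x2 = x
        return -((a - x1) ** 2 + b * (x2 - x1 ** 2) ** 2)

    print(log_rosenbrock(np.array([1.0, 1.0])))  # 0.0 at the mode (a, a^2)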





Local Cascade Ensemble for Multivariate Data Classification. (arXiv:2005.03645v1 [cs.LG])

We present LCE, a Local Cascade Ensemble for traditional (tabular) multivariate data classification, and its extension LCEM for Multivariate Time Series (MTS) classification. LCE is a new hybrid ensemble method that combines an explicit boosting-bagging approach, to handle the usual bias-variance tradeoff faced by machine learning models, with an implicit divide-and-conquer approach, to individualize classifier errors on different parts of the training data. Our evaluation first shows that the hybrid ensemble method LCE outperforms state-of-the-art classifiers on the UCI datasets and that LCEM outperforms state-of-the-art MTS classifiers on the UEA datasets. Furthermore, LCEM provides explainability by design and manifests robust performance when faced with challenges arising from continuous data collection (varying MTS lengths, missing data and noise).





Phase Transitions of the Maximum Likelihood Estimates in the Tensor Curie-Weiss Model. (arXiv:2005.03631v1 [math.ST])

The $p$-tensor Curie-Weiss model is a two-parameter discrete exponential family for modeling dependent binary data, where the sufficient statistic has a linear term and a term with degree $p \geq 2$. This is a special case of the tensor Ising model and the natural generalization of the matrix Curie-Weiss model, which provides a convenient mathematical abstraction for capturing not just pairwise, but higher-order, dependencies. In this paper we provide a complete description of the limiting properties of the maximum likelihood (ML) estimates of the natural parameters, given a single sample from the $p$-tensor Curie-Weiss model, for $p \geq 3$, complementing the well-known results in the matrix ($p=2$) case (Comets and Gidas (1991)). Our results unearth various new phase transitions and surprising limit theorems, such as the existence of a 'critical' curve in the parameter space, where the limiting distribution of the ML estimates is a mixture with both continuous and discrete components. The number of mixture components is either two or three, depending on, among other things, the sign of one of the parameters and the parity of $p$. Another interesting revelation is the existence of certain 'special' points in the parameter space where the ML estimates exhibit a superefficiency phenomenon, converging to a non-Gaussian limiting distribution at rate $N^{\frac{3}{4}}$. We discuss how these results can be used to construct confidence intervals for the model parameters and, as a byproduct of our analysis, obtain limit theorems for the sample mean, which provide key insights into the statistical properties of the model.





Predictive Modeling of ICU Healthcare-Associated Infections from Imbalanced Data Using Ensembles and a Clustering-Based Undersampling Approach. (arXiv:2005.03582v1 [cs.LG])

Early detection of patients vulnerable to infections acquired in the hospital environment is a challenge in current health systems, given the impact that such infections have on patient mortality and healthcare costs. This work focuses on both the identification of risk factors and the prediction of healthcare-associated infections in intensive-care units by means of machine-learning methods. The aim is to support decision making aimed at reducing the incidence rate of infections. In this field, it is necessary to deal with the problem of building reliable classifiers from imbalanced datasets. We propose a clustering-based undersampling strategy to be used in combination with ensemble classifiers. A comparative study with data from 4616 patients was conducted in order to validate our proposal. We applied several single and ensemble classifiers both to the original dataset and to data preprocessed by means of different resampling methods. The results were analyzed by means of classic and recent metrics specifically designed for imbalanced data classification. They revealed that the proposal is more efficient in comparison with other approaches.
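
A plausible Python sketch of one clustering-based undersampling variant (replacing the majority class with k-means centroids before training an ensemble; the paper's exact strategy may differ):

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=2000, weights=[0.95], random_state=0)
    X_maj, X_min = X[y == 0], X[y == 1]

    # Summarize the majority class with as many centroids as there are
    # minority samples, yielding a balanced training set.
    km = KMeans(n_clusters=len(X_min), n_init=10, random_state=0).fit(X_maj)
    X_bal = np.vstack([km.cluster_centers_, X_min])
    y_bal = np.concatenate([np.zeros(len(X_min)), np.ones(len(X_min))])

    clf = RandomForestClassifier(random_state=0).fit(X_bal, y_bal)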





Sequential Aggregation of Probabilistic Forecasts -- Application to Wind Speed Ensemble Forecasts. (arXiv:2005.03540v1 [stat.AP])

In the field of numerical weather prediction (NWP), the probabilistic distribution of the future state of the atmosphere is sampled with Monte-Carlo-like simulations, called ensembles. These ensembles have deficiencies (such as conditional biases) that can be corrected thanks to statistical post-processing methods. Several ensembles exist and may be corrected with different statistical methods. A further step is to combine these raw or post-processed ensembles. The theory of prediction with expert advice allows us to build combination algorithms with theoretical guarantees on the forecast performance. This article adapts this theory to the case of probabilistic forecasts issued as step-wise cumulative distribution functions (CDFs). The theory is applied to wind speed forecasting, by combining several raw or post-processed ensembles, considered as CDFs. The second goal of this study is to explore the use of two forecast performance criteria: the continuous ranked probability score (CRPS) and the Jolliffe-Primo test. Comparing the results obtained with both criteria leads to reconsidering the usual way to build skillful probabilistic forecasts, based on the minimization of the CRPS. Minimizing the CRPS does not necessarily produce reliable forecasts according to the Jolliffe-Primo test. The Jolliffe-Primo test generally selects reliable forecasts, but could lead to issuing suboptimal forecasts in terms of CRPS. It is proposed to use both criteria to achieve reliable and skillful probabilistic forecasts.
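
For reference, the standard ensemble estimator of the CRPS, $\mathrm{CRPS}(F,y)=\mathbb{E}|X-y|-\frac{1}{2}\mathbb{E}|X-X'|$, in a minimal numpy form (the textbook formula for a step-wise empirical CDF, not this paper's aggregation algorithm):

    import numpy as np

    def crps_ensemble(members, obs):
        """CRPS of an empirical (step-wise CDF) forecast vs. a scalar observation."""
        m = np.asarray(members, dtype=float)
        return np.mean(np.abs(m - obs)) - 0.5 * np.mean(np.abs(m[:, None] - m[None, :]))

    print(crps_ensemble([9.5, 10.0, 11.2, 12.0], obs=10.4))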





Modeling High-Dimensional Unit-Root Time Series. (arXiv:2005.03496v1 [stat.ME])

In this paper, we propose a new procedure to build a structural-factor model for a vector unit-root time series. For a $p$-dimensional unit-root process, we assume that each component consists of a set of common factors, which may be unit-root non-stationary, and a set of stationary components, which contain the cointegrations among the unit-root processes. To further reduce the dimensionality, we also postulate that the stationary part of the series is a nonsingular linear transformation of certain common factors and idiosyncratic white noise components as in Gao and Tsay (2019a, b). The estimation of linear loading spaces of the unit-root factors and the stationary components is achieved by an eigenanalysis of some nonnegative definite matrix, and the separation between the stationary factors and the white noises is based on an eigenanalysis and a projected principal component analysis. Asymptotic properties of the proposed method are established for both fixed $p$ and diverging $p$ as the sample size $n$ tends to infinity. Both simulated and real examples are used to demonstrate the performance of the proposed method in finite samples.





Interpreting Deep Models through the Lens of Data. (arXiv:2005.03442v1 [cs.LG])

Identification of the input data points relevant for the classifier (i.e., those that serve as support vectors) has recently spurred the interest of researchers, for both interpretability and dataset debugging. This paper presents an in-depth analysis of methods which attempt to identify the influence of these data points on the resulting classifier. To quantify the quality of the influence, we curated a set of experiments where we debugged and pruned the dataset based on the influence information obtained from different methods. To do so, we provided the classifier with mislabeled examples that hampered the overall performance. Since the classifier is a combination of both the data and the model, it is essential to also analyze these influences for the interpretability of deep learning models. Analysis of the results shows that some interpretability methods can detect mislabels better than a random approach; however, contrary to the claims of these methods, sample selection based on the training loss showed superior performance.





On a computationally-scalable sparse formulation of the multidimensional and non-stationary maximum entropy principle. (arXiv:2005.03253v1 [stat.CO])

Data-driven modelling and computational predictions based on the maximum entropy principle (MaxEnt principle) aim at finding models that are as simple as possible, but not simpler than necessary, in order to avoid the data overfitting problem. We derive a multivariate non-parametric and non-stationary formulation of the MaxEnt principle and show that its solution can be approximated through a numerical maximisation of a sparse constrained optimization problem with regularization. Application of the resulting algorithm to popular financial benchmarks reveals memoryless models allowing for simple and qualitative descriptions of the major stock market index data. We compare the obtained MaxEnt models to heteroscedastic models from computational econometrics (GARCH, GARCH-GJR, MS-GARCH, GARCH-PML4) in terms of model fit, complexity and prediction quality. We compare the resulting model log-likelihoods, the values of the Bayesian Information Criterion, posterior model probabilities, the quality of the data autocorrelation function fits, as well as the Value-at-Risk prediction quality. We show that all seven of the considered major financial benchmark time series (DJI, SPX, FTSE, STOXX, SMI, HSI and N225) are better described by conditionally memoryless MaxEnt models with non-stationary regime-switching than by the common econometric models with finite memory. This analysis also reveals a sparse network of statistically significant temporal relations for the positive and negative latent variance changes among different markets. The code is provided for open access.





Fast multivariate empirical cumulative distribution function with connection to kernel density estimation. (arXiv:2005.03246v1 [cs.DS])

This paper revisits the problem of computing empirical cumulative distribution functions (ECDF) efficiently on large, multivariate datasets. Computing an ECDF at one evaluation point requires $\mathcal{O}(N)$ operations on a dataset composed of $N$ data points. Therefore, a direct evaluation of ECDFs at $N$ evaluation points requires a quadratic $\mathcal{O}(N^{2})$ operations, which is prohibitive for large-scale problems. Two fast and exact methods are proposed and compared. The first one is based on fast summation in lexicographical order, with a $\mathcal{O}(N\log N)$ complexity and requires the evaluation points to lie on a regular grid. The second one is based on the divide-and-conquer principle, with a $\mathcal{O}(N\log(N)^{(d-1)\vee 1})$ complexity and requires the evaluation points to coincide with the input points. The two fast algorithms are described and detailed in the general $d$-dimensional case, and numerical experiments validate their speed and accuracy. Secondly, the paper establishes a direct connection between cumulative distribution functions and kernel density estimation (KDE) for a large class of kernels. This connection paves the way for fast exact algorithms for multivariate kernel density estimation and kernel regression. Numerical tests with the Laplacian kernel validate the speed and accuracy of the proposed algorithms. A broad range of large-scale multivariate density estimation, cumulative distribution estimation, survival function estimation and regression problems can benefit from the proposed numerical methods.
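
A minimal numpy illustration of the speedup in one dimension (sort once, then binary-search every evaluation point: $\mathcal{O}(N\log N)$ instead of the naive $\mathcal{O}(N^{2})$; the paper's algorithms generalize this idea to $d$ dimensions):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal(2000)

    # Naive ECDF at every sample point: O(N^2) pairwise comparisons.
    ecdf_naive = np.array([np.mean(x <= xi) for xi in x])

    # Fast ECDF: sort once, then binary-search each evaluation point.
    xs = np.sort(x)                                            # O(N log N)
    ecdf_fast = np.searchsorted(xs, x, side="right") / len(x)  # O(N log N)

    print(np.allclose(ecdf_naive, ecdf_fast))                  # True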





Joint Multi-Dimensional Model for Global and Time-Series Annotations. (arXiv:2005.03117v1 [cs.LG])

Crowdsourcing is a popular approach to collecting annotations for unlabeled data instances. It involves collecting a large number of annotations from several, often naive, untrained annotators for each data instance, which are then combined to estimate the ground truth. Further, annotations for constructs such as affect are often multi-dimensional, with annotators rating multiple dimensions, such as valence and arousal, for each instance. However, most annotation fusion schemes ignore this aspect and model each dimension separately. In this work we address this by proposing a generative model for multi-dimensional annotation fusion, which models the dimensions jointly, leading to more accurate ground truth estimates. The model we propose is applicable to both global and time-series annotation fusion problems and treats the ground truth as a latent variable distorted by the annotators. The model parameters are estimated using the Expectation-Maximization algorithm, and we evaluate its performance using synthetic data and real emotion corpora, as well as on an artificial task with human annotations.





mgm: Estimating Time-Varying Mixed Graphical Models in High-Dimensional Data

We present the R package mgm for the estimation of k-order mixed graphical models (MGMs) and mixed vector autoregressive (mVAR) models in high-dimensional data. These are useful extensions of graphical models for a single variable type, since data sets consisting of mixed types of variables (continuous, count, categorical) are ubiquitous. In addition, we allow the stationarity assumption of both models to be relaxed by introducing time-varying versions of MGMs and mVAR models based on a kernel weighting approach. Time-varying models offer a rich description of temporally evolving systems and allow one to identify external influences on the model structure, such as the impact of interventions. We describe the background of all implemented methods and provide fully reproducible examples that illustrate how to use the package.





Salt, fat and sugar reduction : sensory approaches for nutritional reformulation of foods and beverages

O'Sullivan, Maurice G., author
9780128226124 (electronic bk.)