va Robust location estimators in regression models with covariates and responses missing at random. (arXiv:2005.03511v1 [stat.ME]) By arxiv.org Published On :: This paper deals with robust marginal estimation under a general regression model when missing data occur in the response and also in some of covariates. The target is a marginal location parameter which is given through an $M-$functional. To obtain robust Fisher--consistent estimators, properly defined marginal distribution function estimators are considered. These estimators avoid the bias due to missing values by assuming a missing at random condition. Three methods are considered to estimate the marginal distribution function which allows to obtain the $M-$location of interest: the well-known inverse probability weighting, a convolution--based method that makes use of the regression model and an augmented inverse probability weighting procedure that prevents against misspecification. The robust proposed estimators and the classical ones are compared through a numerical study under different missing models including clean and contaminated samples. We illustrate the estimators behaviour under a nonlinear model. A real data set is also analysed. Full Article
va Reference and Document Aware Semantic Evaluation Methods for Korean Language Summarization. (arXiv:2005.03510v1 [cs.CL]) By arxiv.org Published On :: Text summarization refers to the process that generates a shorter form of text from the source document preserving salient information. Recently, many models for text summarization have been proposed. Most of those models were evaluated using recall-oriented understudy for gisting evaluation (ROUGE) scores. However, as ROUGE scores are computed based on n-gram overlap, they do not reflect semantic meaning correspondences between generated and reference summaries. Because Korean is an agglutinative language that combines various morphemes into a word that express several meanings, ROUGE is not suitable for Korean summarization. In this paper, we propose evaluation metrics that reflect semantic meanings of a reference summary and the original document, Reference and Document Aware Semantic Score (RDASS). We then propose a method for improving the correlation of the metrics with human judgment. Evaluation results show that the correlation with human judgment is significantly higher for our evaluation metrics than for ROUGE scores. Full Article
va A stochastic user-operator assignment game for microtransit service evaluation: A case study of Kussbus in Luxembourg. (arXiv:2005.03465v1 [physics.soc-ph]) By arxiv.org Published On :: This paper proposes a stochastic variant of the stable matching model from Rasulkhani and Chow [1] which allows microtransit operators to evaluate their operation policy and resource allocations. The proposed model takes into account the stochastic nature of users' travel utility perception, resulting in a probabilistic stable operation cost allocation outcome to design ticket price and ridership forecasting. We applied the model for the operation policy evaluation of a microtransit service in Luxembourg and its border area. The methodology for the model parameters estimation and calibration is developed. The results provide useful insights for the operator and the government to improve the ridership of the service. Full Article
va Relevance Vector Machine with Weakly Informative Hyperprior and Extended Predictive Information Criterion. (arXiv:2005.03419v1 [stat.ML]) By arxiv.org Published On :: In the variational relevance vector machine, the gamma distribution is representative as a hyperprior over the noise precision of automatic relevance determination prior. Instead of the gamma hyperprior, we propose to use the inverse gamma hyperprior with a shape parameter close to zero and a scale parameter not necessary close to zero. This hyperprior is associated with the concept of a weakly informative prior. The effect of this hyperprior is investigated through regression to non-homogeneous data. Because it is difficult to capture the structure of such data with a single kernel function, we apply the multiple kernel method, in which multiple kernel functions with different widths are arranged for input data. We confirm that the degrees of freedom in a model is controlled by adjusting the scale parameter and keeping the shape parameter close to zero. A candidate for selecting the scale parameter is the predictive information criterion. However the estimated model using this criterion seems to cause over-fitting. This is because the multiple kernel method makes the model a situation where the dimension of the model is larger than the data size. To select an appropriate scale parameter even in such a situation, we also propose an extended prediction information criterion. It is confirmed that a multiple kernel relevance vector regression model with good predictive accuracy can be obtained by selecting the scale parameter minimizing extended prediction information criterion. Full Article
va Fast multivariate empirical cumulative distribution function with connection to kernel density estimation. (arXiv:2005.03246v1 [cs.DS]) By arxiv.org Published On :: This paper revisits the problem of computing empirical cumulative distribution functions (ECDF) efficiently on large, multivariate datasets. Computing an ECDF at one evaluation point requires $mathcal{O}(N)$ operations on a dataset composed of $N$ data points. Therefore, a direct evaluation of ECDFs at $N$ evaluation points requires a quadratic $mathcal{O}(N^2)$ operations, which is prohibitive for large-scale problems. Two fast and exact methods are proposed and compared. The first one is based on fast summation in lexicographical order, with a $mathcal{O}(N{log}N)$ complexity and requires the evaluation points to lie on a regular grid. The second one is based on the divide-and-conquer principle, with a $mathcal{O}(Nlog(N)^{(d-1){vee}1})$ complexity and requires the evaluation points to coincide with the input points. The two fast algorithms are described and detailed in the general $d$-dimensional case, and numerical experiments validate their speed and accuracy. Secondly, the paper establishes a direct connection between cumulative distribution functions and kernel density estimation (KDE) for a large class of kernels. This connection paves the way for fast exact algorithms for multivariate kernel density estimation and kernel regression. Numerical tests with the Laplacian kernel validate the speed and accuracy of the proposed algorithms. A broad range of large-scale multivariate density estimation, cumulative distribution estimation, survival function estimation and regression problems can benefit from the proposed numerical methods. Full Article
va Efficient Characterization of Dynamic Response Variation Using Multi-Fidelity Data Fusion through Composite Neural Network. (arXiv:2005.03213v1 [stat.ML]) By arxiv.org Published On :: Uncertainties in a structure is inevitable, which generally lead to variation in dynamic response predictions. For a complex structure, brute force Monte Carlo simulation for response variation analysis is infeasible since one single run may already be computationally costly. Data driven meta-modeling approaches have thus been explored to facilitate efficient emulation and statistical inference. The performance of a meta-model hinges upon both the quality and quantity of training dataset. In actual practice, however, high-fidelity data acquired from high-dimensional finite element simulation or experiment are generally scarce, which poses significant challenge to meta-model establishment. In this research, we take advantage of the multi-level response prediction opportunity in structural dynamic analysis, i.e., acquiring rapidly a large amount of low-fidelity data from reduced-order modeling, and acquiring accurately a small amount of high-fidelity data from full-scale finite element analysis. Specifically, we formulate a composite neural network fusion approach that can fully utilize the multi-level, heterogeneous datasets obtained. It implicitly identifies the correlation of the low- and high-fidelity datasets, which yields improved accuracy when compared with the state-of-the-art. Comprehensive investigations using frequency response variation characterization as case example are carried out to demonstrate the performance. Full Article
va On the Optimality of Randomization in Experimental Design: How to Randomize for Minimax Variance and Design-Based Inference. (arXiv:2005.03151v1 [stat.ME]) By arxiv.org Published On :: I study the minimax-optimal design for a two-arm controlled experiment where conditional mean outcomes may vary in a given set. When this set is permutation symmetric, the optimal design is complete randomization, and using a single partition (i.e., the design that only randomizes the treatment labels for each side of the partition) has minimax risk larger by a factor of $n-1$. More generally, the optimal design is shown to be the mixed-strategy optimal design (MSOD) of Kallus (2018). Notably, even when the set of conditional mean outcomes has structure (i.e., is not permutation symmetric), being minimax-optimal for variance still requires randomization beyond a single partition. Nonetheless, since this targets precision, it may still not ensure sufficient uniformity in randomization to enable randomization (i.e., design-based) inference by Fisher's exact test to appropriately detect violations of null. I therefore propose the inference-constrained MSOD, which is minimax-optimal among all designs subject to such uniformity constraints. On the way, I discuss Johansson et al. (2020) who recently compared rerandomization of Morgan and Rubin (2012) and the pure-strategy optimal design (PSOD) of Kallus (2018). I point out some errors therein and set straight that randomization is minimax-optimal and that the "no free lunch" theorem and example in Kallus (2018) are correct. Full Article
va Adaptive Invariance for Molecule Property Prediction. (arXiv:2005.03004v1 [q-bio.QM]) By arxiv.org Published On :: Effective property prediction methods can help accelerate the search for COVID-19 antivirals either through accurate in-silico screens or by effectively guiding on-going at-scale experimental efforts. However, existing prediction tools have limited ability to accommodate scarce or fragmented training data currently available. In this paper, we introduce a novel approach to learn predictors that can generalize or extrapolate beyond the heterogeneous data. Our method builds on and extends recently proposed invariant risk minimization, adaptively forcing the predictor to avoid nuisance variation. We achieve this by continually exercising and manipulating latent representations of molecules to highlight undesirable variation to the predictor. To test the method we use a combination of three data sources: SARS-CoV-2 antiviral screening data, molecular fragments that bind to SARS-CoV-2 main protease and large screening data for SARS-CoV-1. Our predictor outperforms state-of-the-art transfer learning methods by significant margin. We also report the top 20 predictions of our model on Broad drug repurposing hub. Full Article
va mgm: Estimating Time-Varying Mixed Graphical Models in High-Dimensional Data By www.jstatsoft.org Published On :: Mon, 27 Apr 2020 00:00:00 +0000 We present the R package mgm for the estimation of k-order mixed graphical models (MGMs) and mixed vector autoregressive (mVAR) models in high-dimensional data. These are a useful extensions of graphical models for only one variable type, since data sets consisting of mixed types of variables (continuous, count, categorical) are ubiquitous. In addition, we allow to relax the stationarity assumption of both models by introducing time-varying versions of MGMs and mVAR models based on a kernel weighting approach. Time-varying models offer a rich description of temporally evolving systems and allow to identify external influences on the model structure such as the impact of interventions. We provide the background of all implemented methods and provide fully reproducible examples that illustrate how to use the package. Full Article
va mvord: An R Package for Fitting Multivariate Ordinal Regression Models By www.jstatsoft.org Published On :: Sat, 18 Apr 2020 03:35:08 +0000 The R package mvord implements composite likelihood estimation in the class of multivariate ordinal regression models with a multivariate probit and a multivariate logit link. A flexible modeling framework for multiple ordinal measurements on the same subject is set up, which takes into consideration the dependence among the multiple observations by employing different error structures. Heterogeneity in the error structure across the subjects can be accounted for by the package, which allows for covariate dependent error structures. In addition, different regression coefficients and threshold parameters for each response are supported. If a reduction of the parameter space is desired, constraints on the threshold as well as on the regression coefficients can be specified by the user. The proposed multivariate framework is illustrated by means of a credit risk application. Full Article
va lmSubsets: Exact Variable-Subset Selection in Linear Regression for R By www.jstatsoft.org Published On :: Tue, 28 Apr 2020 00:00:00 +0000 An R package for computing the all-subsets regression problem is presented. The proposed algorithms are based on computational strategies recently developed. A novel algorithm for the best-subset regression problem selects subset models based on a predetermined criterion. The package user can choose from exact and from approximation algorithms. The core of the package is written in C++ and provides an efficient implementation of all the underlying numerical computations. A case study and benchmark results illustrate the usage and the computational efficiency of the package. Full Article
va Semi-Parametric Joint Modeling of Survival and Longitudinal Data: The R Package JSM By www.jstatsoft.org Published On :: Sat, 18 Apr 2020 03:35:08 +0000 This paper is devoted to the R package JSM which performs joint statistical modeling of survival and longitudinal data. In biomedical studies it has been increasingly common to collect both baseline and longitudinal covariates along with a possibly censored survival time. Instead of analyzing the survival and longitudinal outcomes separately, joint modeling approaches have attracted substantive attention in the recent literature and have been shown to correct biases from separate modeling approaches and enhance information. Most existing approaches adopt a linear mixed effects model for the longitudinal component and the Cox proportional hazards model for the survival component. We extend the Cox model to a more general class of transformation models for the survival process, where the baseline hazard function is completely unspecified leading to semiparametric survival models. We also offer a non-parametric multiplicative random effects model for the longitudinal process in JSM in addition to the linear mixed effects model. In this paper, we present the joint modeling framework that is implemented in JSM, as well as the standard error estimation methods, and illustrate the package with two real data examples: a liver cirrhosis data and a Mayo Clinic primary biliary cirrhosis data. Full Article
va Medieval Ideas about Infertility and Old Age By blog.wellcomelibrary.org Published On :: Fri, 12 Jan 2018 11:01:15 +0000 The next seminar in the 2017–18 History of Pre-Modern Medicine seminar series takes place on Tuesday 16 January. Speaker: Dr Catherine Rider (University of Exeter) Medieval Ideas about Infertility and Old Age Abstract: When they discussed fertility and reproductive disorders it was common… Continue reading Full Article Early Medicine Events and Visits Early Health and Well-being
va Uflacker's atlas of vascular anatomy By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Author: Uflacker, Andre, author.Callnumber: OnlineISBN: 9781496356017 (hardback) Full Article
va The ecology of invasions by animals and plants By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Author: Elton, Charles S. (Charles Sutherland), 1900-1991.Callnumber: OnlineISBN: 9783030347215 (electronic bk.) Full Article
va The Washington manual internship survival guide By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Callnumber: OnlineISBN: 9781975116859 Full Article
va Sustainable agriculture : advances in plant metabolome and microbiome By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Author: Parray, Javid Ahmad, authorCallnumber: OnlineISBN: 9780128173749 (electronic bk.) Full Article
va Sustainability of the food system : sovereignty, waste, and nutrients bioavailability By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Callnumber: OnlineISBN: 9780128182949 (electronic bk.) Full Article
va Racing for the surface : pathogenesis of implant infection and advanced antimicrobial strategies By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Callnumber: OnlineISBN: 9783030344757 (electronic bk.) Full Article
va Microbiological advancements for higher altitude agro-ecosystems and sustainability By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Callnumber: OnlineISBN: 9789811519024 (electronic bk.) Full Article
va Microalgae biotechnology for food, health and high value products By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Callnumber: OnlineISBN: 9789811501692 (electronic bk.) Full Article
va Manual of valvular heart disease By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Callnumber: OnlineISBN: 9781496310125 paperback Full Article
va Kanerva’s Occupational Dermatology By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Callnumber: OnlineISBN: 9783319686172 Full Article
va Intelligent wavelet based techniques for advanced multimedia applications By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Author: Singh, Rajiv, authorCallnumber: OnlineISBN: 9783030318734 (electronic bk.) Full Article
va Information retrieval technology : 15th Asia Information Retrieval Societies Conference, AIRS 2019, Hong Kong, China, November 7-9, 2019, proceedings By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Author: Asia Information Retrieval Societies Conference (15th : 2019 : Hong Kong, China)Callnumber: OnlineISBN: 9783030428358 Full Article
va Green food processing techniques : preservation, transformation and extraction By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Callnumber: OnlineISBN: 9780128153536 Full Article
va Functional and preservative properties of phytochemicals By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Callnumber: OnlineISBN: 9780128196861 (electronic bk.) Full Article
va Frailty and cardiovascular diseases : research into an elderly population By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Callnumber: OnlineISBN: 9783030333300 (electronic bk.) Full Article
va Formation and control of biofilm in various environments By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Author: Kanematsu, Hideyuki, authorCallnumber: OnlineISBN: 9789811522406 (electronic bk.) Full Article
va European whales, dolphins, and porpoises : marine mammal conservation in practice By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Author: Evans, Peter G. H., authorCallnumber: OnlineISBN: 9780128190548 electronic book Full Article
va Ecology, conservation, and restoration of Chilika Lagoon, India By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Callnumber: OnlineISBN: 9783030334246 (electronic bk.) Full Article
va Dynamics of immune activation in viral diseases By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Callnumber: OnlineISBN: 9789811510458 (electronic bk.) Full Article
va Conservation genetics in mammals : integrative research using novel approaches By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Callnumber: OnlineISBN: 9783030333348 (electronic bk.) Full Article
va Biological invasions in South Africa By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Callnumber: OnlineISBN: 9783030323943 (electronic bk.) Full Article
va Arctic plants of Svalbard : what we learn from the green in the treeless white world By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Author: Lee, Yoo Kyung, authorCallnumber: OnlineISBN: 9783030345600 (electronic bk.) Full Article
va Animal agriculture : sustainability, challenges and innovations By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Callnumber: OnlineISBN: 9780128170526 Full Article
va Advances in virus research. By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Callnumber: OnlineISBN: 9780123850348 (electronic bk.) Full Article
va Advances in protein chemistry and structural biology. By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Callnumber: OnlineISBN: 9780123819635 (electronic bk.) Full Article
va Advances in protein chemistry and structural biology. By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Callnumber: OnlineISBN: 9780123864840 (electronic bk.) Full Article
va Advances in parasitology. By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Callnumber: OnlineISBN: 9780123742292 (electronic bk.) Full Article
va Advances in cyanobacterial biology By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Callnumber: OnlineISBN: 9780128193129 (electronic bk.) Full Article
va Advances in applied microbiology. By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Callnumber: OnlineISBN: 1282169459 Full Article
va Advances in applied microbiology. By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Callnumber: OnlineISBN: 1282169416 Full Article
va Advanced age geriatric care : a comprehensive guide By dal.novanet.ca Published On :: Fri, 1 May 2020 19:44:43 -0300 Callnumber: OnlineISBN: 9783319969985 (electronic bk.) Full Article
va Colorado Court Rules STRmix Is “Relevant and Reliable” Practice for... By www.prweb.com Published On :: Defendant’s Motion to Exclude Expert Testimony regarding evidence generated by STRmix denied.(PRWeb May 08, 2020)Read the full story at https://www.prweb.com/releases/colorado_court_rules_strmix_is_relevant_and_reliable_practice_for_interpreting_likelihood_ratios/prweb17101548.htm Full Article
va Markov equivalence of marginalized local independence graphs By projecteuclid.org Published On :: Mon, 17 Feb 2020 04:02 EST Søren Wengel Mogensen, Niels Richard Hansen. Source: The Annals of Statistics, Volume 48, Number 1, 539--559.Abstract: Symmetric independence relations are often studied using graphical representations. Ancestral graphs or acyclic directed mixed graphs with $m$-separation provide classes of symmetric graphical independence models that are closed under marginalization. Asymmetric independence relations appear naturally for multivariate stochastic processes, for instance, in terms of local independence. However, no class of graphs representing such asymmetric independence relations, which is also closed under marginalization, has been developed. We develop the theory of directed mixed graphs with $mu $-separation and show that this provides a graphical independence model class which is closed under marginalization and which generalizes previously considered graphical representations of local independence. Several graphs may encode the same set of independence relations and this means that in many cases only an equivalence class of graphs can be identified from observational data. For statistical applications, it is therefore pivotal to characterize graphs that induce the same independence relations. Our main result is that for directed mixed graphs with $mu $-separation each equivalence class contains a maximal element which can be constructed from the independence relations alone. Moreover, we introduce the directed mixed equivalence graph as the maximal graph with dashed and solid edges. This graph encodes all information about the edges that is identifiable from the independence relations, and furthermore it can be computed efficiently from the maximal graph. Full Article
va Uniformly valid confidence intervals post-model-selection By projecteuclid.org Published On :: Mon, 17 Feb 2020 04:02 EST François Bachoc, David Preinerstorfer, Lukas Steinberger. Source: The Annals of Statistics, Volume 48, Number 1, 440--463.Abstract: We suggest general methods to construct asymptotically uniformly valid confidence intervals post-model-selection. The constructions are based on principles recently proposed by Berk et al. ( Ann. Statist. 41 (2013) 802–837). In particular, the candidate models used can be misspecified, the target of inference is model-specific, and coverage is guaranteed for any data-driven model selection procedure. After developing a general theory, we apply our methods to practically important situations where the candidate set of models, from which a working model is selected, consists of fixed design homoskedastic or heteroskedastic linear models, or of binary regression models with general link functions. In an extensive simulation study, we find that the proposed confidence intervals perform remarkably well, even when compared to existing methods that are tailored only for specific model selection procedures. Full Article
va Adaptive risk bounds in univariate total variation denoising and trend filtering By projecteuclid.org Published On :: Mon, 17 Feb 2020 04:02 EST Adityanand Guntuboyina, Donovan Lieu, Sabyasachi Chatterjee, Bodhisattva Sen. Source: The Annals of Statistics, Volume 48, Number 1, 205--229.Abstract: We study trend filtering, a relatively recent method for univariate nonparametric regression. For a given integer $rgeq1$, the $r$th order trend filtering estimator is defined as the minimizer of the sum of squared errors when we constrain (or penalize) the sum of the absolute $r$th order discrete derivatives of the fitted function at the design points. For $r=1$, the estimator reduces to total variation regularization which has received much attention in the statistics and image processing literature. In this paper, we study the performance of the trend filtering estimator for every $rgeq1$, both in the constrained and penalized forms. Our main results show that in the strong sparsity setting when the underlying function is a (discrete) spline with few “knots,” the risk (under the global squared error loss) of the trend filtering estimator (with an appropriate choice of the tuning parameter) achieves the parametric $n^{-1}$-rate, up to a logarithmic (multiplicative) factor. Our results therefore provide support for the use of trend filtering, for every $rgeq1$, in the strong sparsity setting. Full Article
va Model assisted variable clustering: Minimax-optimal recovery and algorithms By projecteuclid.org Published On :: Mon, 17 Feb 2020 04:02 EST Florentina Bunea, Christophe Giraud, Xi Luo, Martin Royer, Nicolas Verzelen. Source: The Annals of Statistics, Volume 48, Number 1, 111--137.Abstract: The problem of variable clustering is that of estimating groups of similar components of a $p$-dimensional vector $X=(X_{1},ldots ,X_{p})$ from $n$ independent copies of $X$. There exists a large number of algorithms that return data-dependent groups of variables, but their interpretation is limited to the algorithm that produced them. An alternative is model-based clustering, in which one begins by defining population level clusters relative to a model that embeds notions of similarity. Algorithms tailored to such models yield estimated clusters with a clear statistical interpretation. We take this view here and introduce the class of $G$-block covariance models as a background model for variable clustering. In such models, two variables in a cluster are deemed similar if they have similar associations will all other variables. This can arise, for instance, when groups of variables are noise corrupted versions of the same latent factor. We quantify the difficulty of clustering data generated from a $G$-block covariance model in terms of cluster proximity, measured with respect to two related, but different, cluster separation metrics. We derive minimax cluster separation thresholds, which are the metric values below which no algorithm can recover the model-defined clusters exactly, and show that they are different for the two metrics. We therefore develop two algorithms, COD and PECOK, tailored to $G$-block covariance models, and study their minimax-optimality with respect to each metric. Of independent interest is the fact that the analysis of the PECOK algorithm, which is based on a corrected convex relaxation of the popular $K$-means algorithm, provides the first statistical analysis of such algorithms for variable clustering. Additionally, we compare our methods with another popular clustering method, spectral clustering. Extensive simulation studies, as well as our data analyses, confirm the applicability of our approach. Full Article
va Robust sparse covariance estimation by thresholding Tyler’s M-estimator By projecteuclid.org Published On :: Mon, 17 Feb 2020 04:02 EST John Goes, Gilad Lerman, Boaz Nadler. Source: The Annals of Statistics, Volume 48, Number 1, 86--110.Abstract: Estimating a high-dimensional sparse covariance matrix from a limited number of samples is a fundamental task in contemporary data analysis. Most proposals to date, however, are not robust to outliers or heavy tails. Toward bridging this gap, in this work we consider estimating a sparse shape matrix from $n$ samples following a possibly heavy-tailed elliptical distribution. We propose estimators based on thresholding either Tyler’s M-estimator or its regularized variant. We prove that in the joint limit as the dimension $p$ and the sample size $n$ tend to infinity with $p/n ogamma>0$, our estimators are minimax rate optimal. Results on simulated data support our theoretical analysis. Full Article