Can $p$-values be meaningfully interpreted without random sampling?
Norbert Hirschauer, Sven Grüner, Oliver Mußhoff, Claudia Becker, Antje Jantsch. Source: Statistics Surveys, Volume 14, 71--91.
Abstract: Besides the inferential errors that abound in the interpretation of $p$-values, the probabilistic preconditions (i.e., random sampling or equivalent) for using them at all are often not met by observational studies in the social sciences. This paper systematizes different sampling designs and discusses the restrictive requirements of data collection that are the indispensable prerequisite for using $p$-values.

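To make the sampling precondition concrete, here is a minimal base-R simulation (the opt-in mechanism is hypothetical, not taken from the paper) contrasting $p$-values from simple random samples with those from equally sized self-selected samples whose inclusion probability depends on the outcome.

```r
## Sketch (base R): p-values under random vs. self-selected sampling.
## The opt-in mechanism below is hypothetical and purely illustrative.
set.seed(1)
pop <- rnorm(1e5, mean = 0)                 # population; H0 (mean 0) is true

p_random <- replicate(2000, t.test(sample(pop, 100))$p.value)

p_selected <- replicate(2000, {
  opt_in <- pop[runif(1e5) < plogis(pop)]   # units with larger y opt in more
  t.test(sample(opt_in, 100))$p.value
})

mean(p_random < 0.05)     # close to the nominal 5% under H0
mean(p_selected < 0.05)   # grossly inflated: the p-value machinery breaks
```
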
Scalar-on-function regression for predicting distal outcomes from intensively gathered longitudinal data: Interpretability for applied scientists
John J. Dziak, Donna L. Coffman, Matthew Reimherr, Justin Petrovich, Runze Li, Saul Shiffman, Mariya P. Shiyko. Source: Statistics Surveys, Volume 13, 150--180.
Abstract: Researchers are sometimes interested in predicting a distal or external outcome (such as smoking cessation at follow-up) from the trajectory of an intensively recorded longitudinal variable (such as urge to smoke). This can be done in a semiparametric way via scalar-on-function regression. However, the resulting fitted coefficient regression function requires special care for correct interpretation, as it represents the joint relationship of time points to the outcome, rather than a marginal or cross-sectional relationship. We provide practical guidelines, based on experience with scientific applications, for helping practitioners interpret their results and illustrate these ideas using data from a smoking cessation study.

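As a rough illustration of the mechanics (not the authors' analysis), the base-R sketch below simulates trajectories, expands the coefficient function in a B-spline basis, and thereby reduces the scalar-on-function regression to an ordinary linear model; the data and basis dimension are hypothetical.

```r
## Sketch: scalar-on-function regression via a B-spline basis expansion,
## base R only (splines ships with R). Hypothetical simulated data, not
## the smoking-cessation study analysed in the paper.
library(splines)

set.seed(1)
n  <- 200                                  # subjects
m  <- 50                                   # time points per trajectory
tt <- seq(0, 1, length.out = m)

X    <- matrix(rnorm(n * m), n, m) +       # functional predictor X_i(t)
        outer(rnorm(n), sin(2 * pi * tt))
beta <- cos(2 * pi * tt)                   # true coefficient function

## Scalar outcome: y_i = integral of X_i(t) beta(t) dt + noise (Riemann sum)
y <- as.numeric(X %*% beta) / m + rnorm(n, sd = 0.1)

## Expanding beta(t) in a B-spline basis turns the integral into an
## ordinary linear model in the basis coefficients.
B   <- bs(tt, df = 8)                      # m x 8 basis matrix
Z   <- (X %*% B) / m                       # n x 8 matrix of integrals
fit <- lm(y ~ Z)

## The fitted curve represents the joint effect of the whole trajectory,
## not a collection of cross-sectional effects (the paper's key caveat).
beta_hat <- B %*% coef(fit)[-1]
plot(tt, beta_hat, type = "l"); lines(tt, beta, lty = 2)
```
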
PLS for Big Data: A unified parallel algorithm for regularised group PLS
Pierre Lafaye de Micheaux, Benoît Liquet, Matthew Sutton. Source: Statistics Surveys, Volume 13, 119--149.
Abstract: Partial Least Squares (PLS) methods have been heavily exploited to analyse the association between two blocks of data. These powerful approaches can be applied to data sets where the number of variables is greater than the number of observations and in the presence of high collinearity between variables. Different sparse versions of PLS have been developed to integrate multiple data sets while simultaneously selecting the contributing variables. Sparse modeling is a key factor in obtaining better estimators and identifying associations between multiple data sets. The cornerstone of the sparse PLS methods is the link between the singular value decomposition (SVD) of a matrix (constructed from deflated versions of the original data) and least squares minimization in linear regression. We review four popular PLS methods for two blocks of data. A unified algorithm is proposed to perform all four types of PLS including their regularised versions. We present various approaches to decrease the computation time and show how the whole procedure can be scalable to big data sets. The bigsgPLS R package implements our unified algorithm and is available at https://github.com/matt-sutton/bigsgPLS .

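A minimal base-R sketch of the SVD link mentioned above, for the first component of two-block PLS on toy data; the bigsgPLS package implements the full regularised, parallel algorithm, and sparse variants would threshold the weight vectors computed here.

```r
## Sketch: the SVD at the core of two-block PLS, base R, toy data.
set.seed(2)
n <- 100; p <- 500; q <- 3            # more variables than observations
X <- matrix(rnorm(n * p), n, p)
Y <- X[, 1:q] %*% diag(3, q) + matrix(rnorm(n * q), n, q)

Xc <- scale(X, scale = FALSE)
Yc <- scale(Y, scale = FALSE)

## First pair of weight vectors = leading singular vectors of M = Xc'Yc;
## sparse PLS variants would soft-threshold u and v at this step.
M  <- crossprod(Xc, Yc)               # p x q
sv <- svd(M, nu = 1, nv = 1)
u  <- sv$u; v <- sv$v

xi    <- Xc %*% u                     # first X score
omega <- Yc %*% v                     # first Y score
cor(xi, omega)                        # association carried by component 1

## Deflate, then repeat the SVD step for further components.
Xc <- Xc - xi %*% crossprod(xi, Xc) / drop(crossprod(xi))
```
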
Halfspace depth and floating body
Stanislav Nagy, Carsten Schütt, Elisabeth M. Werner. Source: Statistics Surveys, Volume 13, 52--118.
Abstract: We discuss little-known relations of the renowned concept of the halfspace depth for multivariate data with notions from convex and affine geometry. Maximum halfspace depth may be regarded as a measure of symmetry for random vectors. As such, the maximum depth stands as a generalization of a measure of symmetry for convex sets, well studied in geometry. Under a mild assumption, the upper level sets of the halfspace depth coincide with the convex floating bodies of measures used in the definition of the affine surface area for convex bodies in Euclidean spaces. These connections enable us to partially resolve some persistent open problems regarding theoretical properties of the depth.

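For intuition, halfspace depth in the plane can be approximated by scanning directions; a base-R sketch on toy data (exact algorithms exist, this brute-force version is purely illustrative):

```r
## Sketch: approximate halfspace (Tukey) depth in R^2 by scanning random
## directions, base R only.
set.seed(3)
X <- matrix(rnorm(400), ncol = 2)            # 200 points in R^2

halfspace_depth <- function(x, X, ndir = 1000) {
  ang <- runif(ndir, 0, 2 * pi)
  U   <- cbind(cos(ang), sin(ang))           # random unit directions
  ## depth = min over directions u of the fraction of the sample in the
  ## closed halfspace {z : u'z >= u'x}
  min(apply(U, 1, function(u) mean(X %*% u >= sum(u * x))))
}

halfspace_depth(colMeans(X), X)   # deep point: near the maximal depth 1/2
halfspace_depth(c(4, 4), X)       # outlying point: depth near 0
```
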
Pitfalls of significance testing and $p$-value variability: An econometrics perspective
Norbert Hirschauer, Sven Grüner, Oliver Mußhoff, Claudia Becker. Source: Statistics Surveys, Volume 12, 136--172.
Abstract: Data on how many scientific findings are reproducible are generally bleak, and a wealth of papers have warned against misuses of the $p$-value and resulting false findings in recent years. This paper discusses the question of what we can(not) learn from the $p$-value, which is still widely considered the gold standard of statistical validity. We aim to provide a non-technical and easily accessible resource for statistical practitioners who wish to spot and avoid misinterpretations and misuses of statistical significance tests. For this purpose, we first classify and describe the most widely discussed ("classical") pitfalls of significance testing, and review published work on these misuses with a focus on regression-based "confirmatory" studies. This includes a description of the single-study bias and a simulation-based illustration of how proper meta-analysis compares to misleading significance counts ("vote counting"). Going beyond the classical pitfalls, we also use simulation to provide intuition that relying on the statistical estimate "$p$-value" as a measure of evidence, without considering its sample-to-sample variability, falls short of the mark even within an otherwise appropriate interpretation. We conclude with a discussion of the exigencies of informed approaches to statistical inference and corresponding institutional reforms.

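The sample-to-sample variability point can be reproduced in a few lines of base R; the effect size, sample size, and replication count below are arbitrary illustrative choices.

```r
## Sketch: sample-to-sample variability of the p-value, base R.
## Same population, same design, 5000 replicated studies.
set.seed(4)
pvals <- replicate(5000, {
  x <- rnorm(30, mean = 0.4)       # true effect 0.4 sd, n = 30 (power ~ 0.56)
  t.test(x)$p.value
})
quantile(pvals, c(0.1, 0.5, 0.9))  # p ranges over orders of magnitude
mean(pvals < 0.05)                 # the fraction "vote counting" would see
```
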
Basic models and questions in statistical network analysis
Miklós Z. Rácz, Sébastien Bubeck. Source: Statistics Surveys, Volume 11, 1--47.
Abstract: Extracting information from large graphs has become an important statistical problem since network data is now common in various fields. In this minicourse we will investigate the most natural statistical questions for three canonical probabilistic models of networks: (i) community detection in the stochastic block model, (ii) finding the embedding of a random geometric graph, and (iii) finding the original vertex in a preferential attachment tree. Along the way we will cover many interesting topics in probability theory such as Pólya urns, large deviation theory, concentration of measure in high dimension, entropic central limit theorems, and more.

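A base-R sketch of the first of the three models, the stochastic block model, with a naive spectral step for community detection (toy parameters; the minicourse's threshold results characterize when such recovery is possible):

```r
## Sketch: a two-community stochastic block model with a naive spectral
## recovery step, base R only; parameters are toy choices.
set.seed(5)
n <- 200
z <- rep(1:2, each = n / 2)                     # true communities
P <- matrix(c(0.10, 0.02, 0.02, 0.10), 2, 2)    # within / between probabilities

A <- matrix(0, n, n)
for (i in 1:(n - 1)) for (j in (i + 1):n)
  A[i, j] <- A[j, i] <- rbinom(1, 1, P[z[i], z[j]])

v2   <- eigen(A, symmetric = TRUE)$vectors[, 2] # second eigenvector splits blocks
zhat <- ifelse(v2 > 0, 1, 2)
max(mean(zhat == z), mean(zhat != z))           # agreement up to label swap
```
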
A comparison of spatial predictors when datasets could be very large
Jonathan R. Bradley, Noel Cressie, Tao Shi. Source: Statistics Surveys, Volume 10, 100--131.
Abstract: In this article, we review and compare a number of methods of spatial prediction, where each method is viewed as an algorithm that processes spatial data. To demonstrate the breadth of available choices, we consider both traditional and more recently introduced spatial predictors. Specifically, in our exposition we review: traditional stationary kriging, smoothing splines, negative-exponential distance-weighting, fixed rank kriging, modified predictive processes, a stochastic partial differential equation approach, and lattice kriging. This comparison is meant to provide a service to practitioners wishing to decide between spatial predictors. Hence, we provide technical material for the unfamiliar, which includes the definition and motivation for each (deterministic and stochastic) spatial predictor. We use a benchmark dataset of $\mathrm{CO}_{2}$ data from NASA's AIRS instrument to address computational efficiencies that include CPU time and memory usage. Furthermore, the predictive performance of each spatial predictor is assessed empirically using a hold-out subset of the AIRS data.

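Among the reviewed methods, negative-exponential distance-weighting is simple enough to sketch in base R; the surface, decay parameter, and data below are hypothetical, not the AIRS benchmark.

```r
## Sketch: negative-exponential distance-weighting, the simplest of the
## reviewed predictors; base R, hypothetical surface and decay parameter.
set.seed(6)
n    <- 300
locs <- cbind(runif(n), runif(n))                      # observation sites
f    <- function(s) sin(4 * s[, 1]) + cos(3 * s[, 2])  # hidden surface
z    <- f(locs) + rnorm(n, sd = 0.2)                   # noisy observations

edw_predict <- function(s0, locs, z, theta = 10) {
  d <- sqrt(colSums((t(locs) - s0)^2))  # distances from s0 to all sites
  w <- exp(-theta * d)                  # negative-exponential weights
  sum(w * z) / sum(w)
}

s0 <- c(0.5, 0.5)
edw_predict(s0, locs, z)                # predicted value at s0
f(matrix(s0, 1))                        # true surface value, for comparison
```
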
Fundamentals of cone regression
Mariella Dimiccoli. Source: Statistics Surveys, Volume 10, 53--99.
Abstract: Cone regression is a particular case of quadratic programming that minimizes a weighted sum of squared residuals under a set of linear inequality constraints. Several important statistical problems, such as isotonic regression, concave regression, and ANOVA under partial orderings, to name a few, can be considered particular instances of the cone regression problem. Given its relevance in statistics, this paper addresses the fundamentals of cone regression from both a theoretical and a practical point of view. Several formulations of the cone regression problem are considered and, focusing on the particular case of concave regression as an example, several algorithms are analyzed and compared both qualitatively and quantitatively through numerical simulations. Several improvements to enhance numerical stability and to bound the computational cost are proposed. For each analyzed algorithm, the pseudo-code and its corresponding Matlab code are provided. The results of this study demonstrate that the choice of the optimization approach strongly impacts numerical performance. It is also shown that no method is currently available to solve cone regression problems efficiently in large dimensions (more than several thousand points). We suggest further research to fill this gap by exploiting and adapting classical multi-scale strategies to compute an approximate solution.

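Isotonic regression, one of the cone regression instances named above, ships with base R (the paper itself provides Matlab code); a minimal example:

```r
## Sketch: isotonic regression via base R's isoreg, which implements the
## pool-adjacent-violators algorithm; simulated monotone signal.
set.seed(7)
x <- sort(runif(100))
y <- log(1 + 5 * x) + rnorm(100, sd = 0.2)

fit <- isoreg(x, y)  # minimizes sum (y_i - mu_i)^2  s.t.  mu_1 <= ... <= mu_n
plot(fit)            # fitted step function, the projection onto the cone
```
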
A unified treatment for non-asymptotic and asymptotic approaches to minimax signal detection
Clément Marteau, Theofanis Sapatinas. Source: Statistics Surveys, Volume 9, 253--297.
Abstract: We are concerned with minimax signal detection. In this setting, we discuss non-asymptotic and asymptotic approaches through a unified treatment. In particular, we consider a Gaussian sequence model that contains classical models as special cases, such as direct, well-posed inverse and ill-posed inverse problems. Working with certain ellipsoids in the space of squared-summable sequences of real numbers, with a ball of positive radius removed, we compare the construction of lower and upper bounds for the minimax separation radius (non-asymptotic approach) and the minimax separation rate (asymptotic approach) that have been proposed in the literature. Some additional contributions, bringing to light links between non-asymptotic and asymptotic approaches to minimax signal detection, are also presented. An example of a mildly ill-posed inverse problem is used for illustrative purposes. In particular, it is shown that tools used to derive 'asymptotic' results can be exploited to draw 'non-asymptotic' conclusions, and vice-versa, thereby bringing to light hitherto unknown similarities and links between the two minimax signal detection paradigms.

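Schematically, the testing problem in such a Gaussian sequence model can be written as below; this is a standard formulation reconstructed from the abstract's description, not quoted from the paper.

```latex
% Observe the Gaussian sequence  Y_j = \theta_j + \varepsilon \xi_j,
% \xi_j iid N(0,1); the weights a_j encode the (direct or inverse) problem.
\[
  H_0:\ \theta = 0
  \qquad\text{versus}\qquad
  H_1:\ \theta \in \mathcal{E}_a(R),\ \ \|\theta\|_{\ell^2} \ge r,
\]
\[
  \mathcal{E}_a(R) \;=\; \Bigl\{\,\theta \in \ell^2 :
    \sum_{j \ge 1} a_j^2\,\theta_j^2 \le R^2 \Bigr\}.
\]
% The minimax separation radius is the smallest r at which some test keeps
% type I plus maximal type II error below a prescribed level (non-asymptotic
% view); separation rates describe its order as \varepsilon -> 0 (asymptotic view).
```
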
Statistical inference for dynamical systems: A review
Kevin McGoff, Sayan Mukherjee, Natesh Pillai. Source: Statistics Surveys, Volume 9, 209--252.
Abstract: The topic of statistical inference for dynamical systems has been studied widely across several fields. In this survey we focus on methods related to parameter estimation for nonlinear dynamical systems. Our objective is to place results across distinct disciplines in a common setting and highlight opportunities for further research.

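As a toy instance of the estimation problem surveyed, the base-R sketch below recovers the parameter of a noisily observed logistic map by minimizing one-step-ahead prediction error; the model and noise level are hypothetical.

```r
## Sketch: least-squares parameter estimation for a noisily observed
## nonlinear dynamical system (logistic map), base R.
set.seed(8)
Tn <- 200; r_true <- 3.7
x  <- numeric(Tn); x[1] <- 0.4
for (t in 1:(Tn - 1)) x[t + 1] <- r_true * x[t] * (1 - x[t])
y  <- x + rnorm(Tn, sd = 0.01)       # observation noise

## Minimize the one-step-ahead squared prediction error in r
sse <- function(r) sum((y[-1] - r * y[-Tn] * (1 - y[-Tn]))^2)
optimize(sse, c(2.5, 4))$minimum     # close to r_true = 3.7
```
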
Some models and methods for the analysis of observational data
José A. Ferreira. Source: Statistics Surveys, Volume 9, 106--208.
Abstract: This article provides a concise and essentially self-contained exposition of some of the most important models and non-parametric methods for the analysis of observational data, and a substantial number of illustrations of their application. Although for the most part our presentation follows P. Rosenbaum's book, "Observational Studies", and naturally draws on related literature, it contains original elements and simplifies and generalizes some basic results. The illustrations, based on simulated data, show the methods at work in some detail, highlighting pitfalls and emphasizing certain subjective aspects of the statistical analyses.

$M$-functionals of multivariate scatter
Lutz Dümbgen, Markus Pauly, Thomas Schweizer. Source: Statistics Surveys, Volume 9, 32--105.
Abstract: This survey provides a self-contained account of $M$-estimation of multivariate scatter. In particular, we present new proofs for the existence of the underlying $M$-functionals and discuss their weak continuity and differentiability. This is done in a rather general framework with matrix-valued random variables. By doing so we reveal a connection between Tyler's (1987a) $M$-functional of scatter and the estimation of proportional covariance matrices. Moreover, this general framework allows us to treat a new class of scatter estimators, based on symmetrizations of arbitrary order. Finally these results are applied to $M$-estimation of multivariate location and scatter via multivariate $t$-distributions.

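Tyler's $M$-functional mentioned above is computable by a simple fixed-point iteration; a base-R sketch, assuming the location is known (zero here) and normalizing the scale, which is only identified up to a positive factor:

```r
## Sketch: Tyler's M-functional of scatter by fixed-point iteration,
## base R; location assumed known (0), scale fixed by normalization.
set.seed(9)
p <- 3; n <- 500
X <- matrix(rt(n * p, df = 3), n, p)     # heavy-tailed data

V <- diag(p)
for (k in 1:200) {
  Vi   <- solve(V)
  w    <- p / rowSums((X %*% Vi) * X)    # weights p / (x_i' V^{-1} x_i)
  Vnew <- crossprod(sqrt(w) * X) / n     # (p/n) sum_i w_i x_i x_i'
  Vnew <- Vnew / Vnew[1, 1]              # normalize the free scale
  if (max(abs(Vnew - V)) < 1e-8) break
  V <- Vnew
}
V                                        # Tyler's scatter estimate
```
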
Semi-parametric estimation for conditional independence multivariate finite mixture models
Didier Chauveau, David R. Hunter, Michael Levine. Source: Statistics Surveys, Volume 9, 1--31.
Abstract: The conditional independence assumption for nonparametric multivariate finite mixture models, a weaker form of the well-known conditional independence assumption for random effects models for longitudinal data, is the subject of an increasing number of theoretical and algorithmic developments in the statistical literature. After presenting a survey of this literature, including an in-depth discussion of the all-important identifiability results, this article describes and extends an algorithm for estimation of the parameters in these models. The algorithm works for any number of components in three or more dimensions. It possesses a descent property and can be easily adapted to situations where the data are grouped in blocks of conditionally independent variables. We discuss how to adapt this algorithm to various location-scale models that link component densities, and we even adapt it to a particular class of univariate mixture problems in which the components are assumed symmetric. We give a bandwidth selection procedure for our algorithm. Finally, we demonstrate the effectiveness of our algorithm using a simulation study and two psychometric datasets.

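For orientation, the sketch below runs the parametric (Gaussian) analogue of the E- and M-step alternation in base R on toy data; the paper's semi-parametric algorithm instead re-estimates the component densities nonparametrically, with a bandwidth selection step, at each iteration.

```r
## Sketch: parametric stand-in for the E/M alternation, base R, toy data;
## the surveyed algorithm replaces dnorm below by kernel density estimates.
set.seed(10)
y <- c(rnorm(150, -2), rnorm(150, 2))

pi1 <- 0.5; mu <- c(-1, 1); s <- c(1, 1)   # crude starting values
for (it in 1:100) {
  d1 <- pi1 * dnorm(y, mu[1], s[1])
  d2 <- (1 - pi1) * dnorm(y, mu[2], s[2])
  g  <- d1 / (d1 + d2)                     # E-step: membership probabilities
  pi1 <- mean(g)                           # M-step: weighted updates
  mu  <- c(sum(g * y), sum((1 - g) * y)) / c(sum(g), sum(1 - g))
  s   <- sqrt(c(sum(g * (y - mu[1])^2), sum((1 - g) * (y - mu[2])^2)) /
              c(sum(g), sum(1 - g)))
}
round(c(pi1, mu, s), 2)                    # weight, means, sds
```
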
Adaptive clinical trial designs for phase I cancer studies
Oleksandr Sverdlov, Weng Kee Wong, Yevgen Ryeznik. Source: Statistics Surveys, Volume 8, 2--44.
Abstract: Adaptive clinical trials are becoming increasingly popular research designs for clinical investigation. Adaptive designs are particularly useful in phase I cancer studies, where clinical data are scant and the goals are to assess the drug dose-toxicity profile and to determine the maximum tolerated dose while minimizing the number of study patients treated at suboptimal dose levels. In the current work we give an overview of adaptive design methods for phase I cancer trials. We find that the modern statistical literature is replete with novel adaptive designs that have clearly defined objectives and established statistical properties, and are shown to outperform conventional dose-finding methods such as the 3+3 design, both in terms of statistical efficiency and in terms of minimizing the number of patients treated at highly toxic or nonefficacious doses. We discuss statistical, logistical, and regulatory aspects of these designs and present some links to non-commercial statistical software for implementing these methods in practice.

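The conventional 3+3 comparator is easy to simulate; a simplified base-R sketch (hypothetical toxicity curve; real protocols handle the top dose and early stopping more carefully):

```r
## Sketch: simulating the conventional 3+3 dose-escalation design, base R;
## the toxicity curve is hypothetical and edge cases are simplified.
set.seed(11)
ptox <- c(0.05, 0.10, 0.20, 0.35, 0.55)   # true toxicity by dose level

run_3plus3 <- function(ptox) {
  d <- 1
  repeat {
    tox <- rbinom(1, 3, ptox[d])                       # cohort of 3
    if (tox == 1) tox <- tox + rbinom(1, 3, ptox[d])   # expand to 6
    if (tox <= 1 && d < length(ptox)) d <- d + 1       # escalate
    else return(max(d - 1, 1))   # declare MTD = next-lower dose (simplified)
  }
}
table(replicate(2000, run_3plus3(ptox))) / 2000  # MTD selection frequencies
```
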
Analyzing complex functional brain networks: Fusing statistics and network science to understand the brain
Sean L. Simpson, F. DuBois Bowman, Paul J. Laurienti. Source: Statistics Surveys, Volume 7, 1--36.
Abstract: Complex functional brain network analyses have exploded over the last decade, gaining traction due to their profound clinical implications. The application of network science (an interdisciplinary offshoot of graph theory) has facilitated these analyses and enabled examining the brain as an integrated system that produces complex behaviors. While the field of statistics has been integral in advancing activation analyses and some connectivity analyses in functional neuroimaging research, it has yet to play a commensurate role in complex network analyses. Fusing novel statistical methods with network-based functional neuroimage analysis will engender powerful analytical tools that will aid in our understanding of normal brain function as well as alterations due to various brain disorders. Here we survey widely used statistical and network science tools for analyzing fMRI network data and discuss the challenges faced in filling some of the remaining methodological gaps. When applied and interpreted correctly, the fusion of network scientific and statistical methods has a chance to revolutionize the understanding of brain function.

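A minimal base-R sketch of the first step in such analyses, turning multivariate time series into a thresholded connectivity network with simple graph summaries; toy data stand in for fMRI, and the threshold 0.3 is arbitrary.

```r
## Sketch: from multivariate time series to a thresholded functional
## connectivity network, base R; toy data, arbitrary threshold.
set.seed(12)
ts <- matrix(rnorm(120 * 20), 120, 20)     # 120 scans, 20 regions
ts[, 1:5] <- ts[, 1:5] + rnorm(120)        # regions 1-5 form a module

A <- abs(cor(ts)) > 0.3                    # adjacency by thresholding
diag(A) <- FALSE

deg <- rowSums(A)                          # degree of each region
c(mean_degree = mean(deg),
  density     = sum(A) / (20 * 19))        # fraction of possible edges
```
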
The theory and application of penalized methods or Reproducing Kernel Hilbert Spaces made easy
Nancy Heckman. Source: Statistics Surveys, Volume 6, 113--141.
Abstract: The popular cubic smoothing spline estimate of a regression function arises as the minimizer of the penalized sum of squares $\sum_{j}(Y_{j}-\mu(t_{j}))^{2}+\lambda\int_{a}^{b}[\mu''(t)]^{2}\,dt$, where the data are $t_{j},Y_{j}$, $j=1,\ldots,n$. The minimization is taken over an infinite-dimensional function space, the space of all functions with square integrable second derivatives. But the calculations can be carried out in a finite-dimensional space. The reduction from minimizing over an infinite-dimensional space to minimizing over a finite-dimensional space occurs for more general objective functions: the data may be related to the function $\mu$ in another way, the sum of squares may be replaced by a more suitable expression, or the penalty, $\int_{a}^{b}[\mu''(t)]^{2}\,dt$, might take a different form. This paper reviews the Reproducing Kernel Hilbert Space structure that provides a finite-dimensional solution for a general minimization problem. Particular attention is paid to the construction and study of the Reproducing Kernel Hilbert Space corresponding to a penalty based on a linear differential operator. In this case, one can often calculate the minimizer explicitly, using Green's functions.

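Base R's smooth.spline computes exactly this finite-dimensional minimizer (a natural cubic spline with knots at the $t_j$); a minimal example with simulated data:

```r
## Sketch: the cubic smoothing spline as the finite-dimensional solution
## of the penalized least-squares problem, base R.
set.seed(13)
t <- seq(0, 1, length.out = 100)
y <- sin(2 * pi * t) + rnorm(100, sd = 0.3)

fit <- smooth.spline(t, y)   # lambda chosen by generalized cross-validation
plot(t, y); lines(fit, col = 2)
```
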
Statistical inference for disordered sphere packings
Jeffrey Picka. Source: Statistics Surveys, Volume 6, 74--112.
Abstract: This paper gives an overview of statistical inference for disordered sphere packing processes. These processes are used extensively in physics and engineering in order to represent the internal structure of composite materials, packed bed reactors, and powders at rest, and are used as initial arrangements of grains in the study of avalanches and other problems involving powders in motion. Packing processes are spatial processes which are neither stationary nor ergodic. Classical spatial statistical models and procedures cannot be applied to these processes, but alternative models and procedures can be developed based on ideas from statistical physics. Most of the development of models and statistics for sphere packings has been undertaken by scientists and engineers. This review summarizes their results from an inferential perspective.

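One of the simplest disordered packing processes, random sequential adsorption of disks, can be simulated in a few lines of base R; the radius and attempt count below are arbitrary.

```r
## Sketch: random sequential adsorption, a simple disordered packing
## process; disks of radius r in the unit square, base R.
set.seed(14)
r <- 0.04
centers <- matrix(numeric(0), 0, 2)
for (i in 1:5000) {                        # attempted insertions
  s <- runif(2, r, 1 - r)
  if (nrow(centers) == 0 ||
      min(colSums((t(centers) - s)^2)) >= (2 * r)^2)
    centers <- rbind(centers, s)           # accept non-overlapping disk
}
nrow(centers)                              # disks packed
nrow(centers) * pi * r^2                   # area fraction (jams near ~0.55)
```
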
Prediction in several conventional contexts
Bertrand Clarke, Jennifer Clarke. Source: Statistics Surveys, Volume 6, 1--73.
Abstract: We review predictive techniques from several traditional branches of statistics. Starting with prediction based on the normal model and on the empirical distribution function, we proceed to techniques for various forms of regression and classification. Then, we turn to time series, longitudinal data, and survival analysis. Our focus throughout is on the mechanics of prediction more than on the properties of predictors.

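The distinction between predicting a future observation and estimating a mean, central to the normal-model starting point above, shows up directly in base R:

```r
## Sketch: prediction vs. estimation in the normal linear model, base R.
set.seed(15)
d   <- data.frame(x = runif(50))
d$y <- 1 + 2 * d$x + rnorm(50, sd = 0.5)
fit <- lm(y ~ x, data = d)

## The interval for a future observation at x = 0.5 is wider than the
## interval for the mean there: it adds the noise variance.
predict(fit, newdata = data.frame(x = 0.5), interval = "prediction")
predict(fit, newdata = data.frame(x = 0.5), interval = "confidence")
```
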
A review of survival trees
Imad Bou-Hamad, Denis Larocque, Hatem Ben-Ameur. Source: Statistics Surveys, Volume 5, 44--71.
Abstract: This paper presents a non-technical account of the developments in tree-based methods for the analysis of survival data with censoring. This review describes the initial developments, which mainly extended existing basic tree methodologies to censored data, as well as more recent work. We also cover more complex models, more specialized methods, and more specific problems such as multivariate data, the use of time-varying covariates, discrete-scale survival data, and ensemble methods applied to survival trees. A data example is used to illustrate some methods that are implemented in R.

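A minimal survival tree in R, assuming the rpart and survival packages are available (the review's own illustrations are in R); rpart's method = "exp" uses exponential-model scaling to split on censored responses.

```r
## Sketch: a survival tree, assuming the rpart and survival packages.
library(survival)
library(rpart)

fit <- rpart(Surv(time, status) ~ age + sex + ph.ecog,
             data = lung, method = "exp")   # exponential scaling for censoring
plot(fit); text(fit, use.n = TRUE)          # tree grown on censored data
```
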
Curse of dimensionality and related issues in nonparametric functional regression
Gery Geenens. Source: Statistics Surveys, Volume 5, 30--43.
Abstract: Recently, some nonparametric regression ideas have been extended to the case of functional regression. Within that framework, the main concern arises from the infinite-dimensional nature of the explanatory objects. Specifically, in the classical multivariate regression context, it is well known that any nonparametric method is affected by the so-called "curse of dimensionality", caused by the sparsity of data in high-dimensional spaces: the fastest achievable rates of convergence of regression function estimators toward their target curve decrease as the dimension of the regressor vector increases. It is therefore not surprising to find dramatically bad theoretical properties for nonparametric functional regression estimators, leading many authors to condemn the methodology. Nevertheless, a closer look at the meaning of the functional data under study, and at the conclusions that the statistician would like to draw from them, allows the problem to be considered from another point of view and justifies the use of slightly modified estimators. In most cases, it can be entirely legitimate to measure the proximity between two elements of the infinite-dimensional functional space via a semi-metric, which can prevent those estimators from suffering from what we will call the "curse of infinite dimensionality".

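A base-R sketch of a functional Nadaraya-Watson estimator built on a semi-metric, the construction the abstract argues for; the curves, semi-metric, and bandwidth below are illustrative choices, with the plain $L^2$ distance standing in for a problem-adapted (e.g. projection-based) semi-metric.

```r
## Sketch: functional Nadaraya-Watson regression with a semi-metric,
## base R; toy curves in place of real functional data.
set.seed(16)
n <- 150; m <- 50; tt <- seq(0, 1, length.out = m)
a <- runif(n, -1, 1)                             # curve-generating scores
X <- outer(a, sin(2 * pi * tt)) + matrix(rnorm(n * m, sd = 0.1), n, m)
y <- a^2 + rnorm(n, sd = 0.05)                   # scalar response

d_semi <- function(x1, x2) sqrt(mean((x1 - x2)^2))   # L2 distance on curves

fnw_predict <- function(x0, X, y, h) {
  d <- apply(X, 1, d_semi, x2 = x0)
  w <- exp(-(d / h)^2)                           # Gaussian kernel weights
  sum(w * y) / sum(w)
}

x0 <- 0.5 * sin(2 * pi * tt)                     # new curve with score 0.5
fnw_predict(x0, X, y, h = 0.3)                   # should be near 0.25
```
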
Data confidentiality: A review of methods for statistical disclosure limitation and methods for assessing privacy
Gregory J. Matthews, Ofer Harel. Source: Statistics Surveys, Volume 5, 1--29.
Abstract: There is an ever-increasing demand from researchers for access to useful microdata files. However, there are also growing concerns regarding the privacy of the individuals contained in the microdata. Ideally, microdata could be released in such a way that a balance between usefulness of the data and privacy is struck. This paper presents a review of proposed methods of statistical disclosure control and techniques for assessing the privacy of such methods under different definitions of disclosure.

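Two of the classical disclosure-limitation techniques reviewed, additive noise and (loosely) rank swapping, sketched in base R on hypothetical microdata:

```r
## Sketch: two classical disclosure-limitation methods on hypothetical
## microdata, base R; a loose illustration, not a production method.
set.seed(17)
income <- rlnorm(1000, meanlog = 10, sdlog = 0.5)

## (1) Additive noise masking
noisy <- income + rnorm(1000, sd = 0.1 * sd(income))

## (2) Rank swapping (loosely): swap each value with one of similar rank
rk      <- rank(income, ties.method = "first")
swapped <- income
for (i in 1:1000) {
  j <- which(abs(rk - rk[i]) <= 20)        # candidates within 20 ranks
  k <- sample(j, 1)
  swapped[c(i, k)] <- swapped[c(k, i)]
}

## Utility survives in aggregates; exact individual values do not.
round(c(orig = mean(income), noise = mean(noisy), swap = mean(swapped)))
cor(income, swapped)
```
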
The ARMA alphabet soup: A tour of ARMA model variants
Scott H. Holan, Robert Lund, Ginger Davis. Source: Statistics Surveys, Volume 4, 232--274.
Abstract: Autoregressive moving-average (ARMA) difference equations are ubiquitous models for short-memory time series and have parsimoniously described many stationary series. Variants of ARMA models have been proposed to describe more exotic series features such as long-memory autocovariances, periodic autocovariances, and count support set structures. This review paper enumerates, compares, and contrasts the common variants of ARMA models in today's literature. After the basic properties of ARMA models are reviewed, we tour ARMA variants that describe seasonal features, long-memory behavior, multivariate series, changing variances (stochastic volatility), and integer counts. A list of ARMA variant acronyms is provided.

al Primal and dual model representations in kernel-based learning By projecteuclid.org Published On :: Wed, 25 Aug 2010 10:28 EDT Johan A.K. Suykens, Carlos Alzate, Kristiaan Pelckmans. Source: Statist. Surv., Volume 4, 148--183.Abstract: This paper discusses the role of primal and (Lagrange) dual model representations in problems of supervised and unsupervised learning. The specification of the estimation problem is conceived at the primal level as a constrained optimization problem. The constraints relate to the model which is expressed in terms of the feature map. From the conditions for optimality one jointly finds the optimal model representation and the model estimate. At the dual level the model is expressed in terms of a positive definite kernel function, which is characteristic for a support vector machine methodology. It is discussed how least squares support vector machines are playing a central role as core models across problems of regression, classification, principal component analysis, spectral clustering, canonical correlation analysis, dimensionality reduction and data visualization. Full Article
al Discrete variations of the fractional Brownian motion in the presence of outliers and an additive noise By projecteuclid.org Published On :: Thu, 05 Aug 2010 15:41 EDT Sophie Achard, Jean-François Coeurjolly. Source: Statist. Surv., Volume 4, 117--147.Abstract: This paper gives an overview of the problem of estimating the Hurst parameter of a fractional Brownian motion when the data are observed with outliers and/or with an additive noise by using methods based on discrete variations. We show that the classical estimation procedure based on the log-linearity of the variogram of dilated series is made more robust to outliers and/or an additive noise by considering sample quantiles and trimmed means of the squared series or differences of empirical variances. These different procedures are compared and discussed through a large simulation study and are implemented in the R package dvfBm. Full Article
al A survey of cross-validation procedures for model selection By projecteuclid.org Published On :: Thu, 05 Aug 2010 15:41 EDT Sylvain Arlot, Alain Celisse. Source: Statist. Surv., Volume 4, 40--79.Abstract: Used to estimate the risk of an estimator or to perform model selection, cross-validation is a widespread strategy because of its simplicity and its (apparent) universality. Many results exist on the model selection performance of cross-validation procedures. This survey intends to relate these results to the most recent advances of model selection theory, with a particular emphasis on distinguishing empirical statements from rigorous theoretical results. As a conclusion, guidelines are provided for choosing the best cross-validation procedure according to the particular features of the problem in hand. Full Article
al Was one of your ancestors a whaler? By feedproxy.google.com Published On :: Mon, 31 Jul 2017 06:25:29 +0000 Whaling – along with wool production – was one of the first primary industries after the establishment of New South Wales. Full Article
al Arctic Amplification of Anthropogenic Forcing: A Vector Autoregressive Analysis. (arXiv:2005.02535v1 [econ.EM] CROSS LISTED) By arxiv.org Published On :: Arctic sea ice extent (SIE) in September 2019 ranked second-to-lowest in history and is trending downward. The understanding of how internal variability amplifies the effects of external $\text{CO}_2$ forcing is still limited. We propose the VARCTIC, which is a Vector Autoregression (VAR) designed to capture and extrapolate Arctic feedback loops. VARs are dynamic simultaneous systems of equations, routinely estimated to predict and understand the interactions of multiple macroeconomic time series. Hence, the VARCTIC is a parsimonious compromise between full-blown climate models and purely statistical approaches that usually offer little explanation of the underlying mechanism. Our "business as usual" completely unconditional forecast has SIE hitting 0 in September by the 2060s. Impulse response functions reveal that anthropogenic $\text{CO}_2$ emission shocks have a permanent effect on SIE, a property shared by no other shock. Further, we find Albedo- and Thickness-based feedbacks to be the main amplification channels through which $\text{CO}_2$ anomalies impact SIE in the short/medium run. Conditional forecast analyses reveal that the future path of SIE crucially depends on the evolution of $\text{CO}_2$ emissions, with outcomes ranging from recovering SIE to it reaching 0 in the 2050s. Finally, Albedo and Thickness feedbacks are shown to play an important role in accelerating the speed at which predicted SIE heads towards 0. Full Article
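A minimal sketch of the VAR workflow this abstract describes, assuming monthly Arctic series are already assembled in a table; the file name, the column names 'sie', 'co2', 'albedo', 'thickness', and the lag choice are illustrative assumptions, not the authors':

```python
# Minimal sketch of the VAR workflow described above (not the authors' code).
import pandas as pd
from statsmodels.tsa.api import VAR

arctic = pd.read_csv("arctic_panel.csv", index_col=0, parse_dates=True)  # hypothetical file

model = VAR(arctic)
results = model.fit(maxlags=12, ic="aic")      # lag order chosen by AIC

# Impulse responses: how a one-off CO2 emission shock propagates to SIE.
irf = results.irf(periods=24)
irf.plot(impulse="co2", response="sie")

# Unconditional "business as usual" forecast of all series, 10 years ahead.
forecast = results.forecast(arctic.values[-results.k_ar:], steps=120)
```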
al Unsupervised Pre-trained Models from Healthy ADLs Improve Parkinson's Disease Classification of Gait Patterns. (arXiv:2005.02589v2 [cs.LG] UPDATED) By arxiv.org Published On :: The application of deep learning algorithms to different healthcare problems is gaining interest at a steady pace. However, such algorithms can be challenging to use because they require large amounts of training data that capture the different possible variations. This makes them difficult to use in clinical settings, since in most health applications researchers have to work with limited data; too little data can cause a deep learning model to overfit. In this paper, we ask how we can use data from a different environment and use case, with widely differing data distributions. We exemplify this by using single-sensor accelerometer data from healthy subjects performing activities of daily living (ADLs, the source dataset) to extract features relevant to multi-sensor accelerometer gait data (the target dataset) for Parkinson's disease classification. We train a model on the source dataset and use it as a feature extractor, and show that the features extracted for the target dataset can be used to train an effective classification model. Our pre-trained source model consists of a convolutional autoencoder, and the target classification model is a simple multi-layer perceptron. We explore two different pre-trained source models, trained using different activity groups, and analyze the influence the choice of pre-trained model has on the task of Parkinson's disease classification. Full Article
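A hedged sketch of this pipeline: a convolutional autoencoder pre-trained on source ADL accelerometer windows, whose frozen encoder then feeds a small MLP on the target gait data. All shapes and layer sizes below are illustrative assumptions, not the paper's values.

```python
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(3, 16, 5, stride=2, padding=2), nn.ReLU(),   # tri-axial input
            nn.Conv1d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(32, 16, 5, stride=2, padding=2, output_padding=1), nn.ReLU(),
            nn.ConvTranspose1d(16, 3, 5, stride=2, padding=2, output_padding=1),
        )

    def forward(self, x):                      # x: (batch, 3 axes, time)
        return self.decoder(self.encoder(x))

ae = ConvAutoencoder()
# 1) Pre-train `ae` on the source dataset by minimizing reconstruction MSE.

# 2) Freeze the encoder and train an MLP classifier on its features.
for p in ae.encoder.parameters():
    p.requires_grad = False

classifier = nn.Sequential(nn.Flatten(), nn.LazyLinear(64), nn.ReLU(), nn.Linear(64, 2))

def predict(x):
    with torch.no_grad():
        feats = ae.encoder(x)                  # pre-trained feature extractor
    return classifier(feats)
```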
al Statistical errors in Monte Carlo-based inference for random elements. (arXiv:2005.02532v2 [math.ST] UPDATED) By arxiv.org Published On :: Monte Carlo simulation is useful for computing or estimating expected functionals of random elements, provided random samples can be generated from the true distribution. However, when the distribution has unknown parameters, the samples must be generated from an estimated distribution with the parameters replaced by estimators, which introduces a statistical error into the Monte Carlo estimate. This paper considers such statistical errors and investigates the asymptotic distributions of Monte Carlo-based estimators when the random elements are not only real-valued but also function-valued random variables. We also investigate expected functionals of semimartingales in detail. The analysis indicates that Monte Carlo estimation can deteriorate when a semimartingale has a jump part with unremovable unknown parameters. Full Article
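The plug-in effect is easy to see numerically. The toy example below (entirely ours, not the paper's) estimates $E[f(X)]$ for $X \sim N(\mu, 1)$ when $\mu$ must first be estimated from data, and contrasts it with the infeasible oracle that samples from the true distribution:

```python
import numpy as np

def f(x):
    return np.exp(-x**2)                       # some functional of interest

rng = np.random.default_rng(0)
mu_true, n_obs, n_mc = 1.0, 50, 100_000

data = rng.normal(mu_true, 1.0, n_obs)
mu_hat = data.mean()                           # estimated parameter

est_plugin = f(rng.normal(mu_hat, 1.0, n_mc)).mean()   # samples from N(mu_hat, 1)
est_oracle = f(rng.normal(mu_true, 1.0, n_mc)).mean()  # samples from N(mu_true, 1)
print(f"plug-in: {est_plugin:.4f}   oracle: {est_oracle:.4f}")

# The gap reflects the statistical error induced by mu_hat; it is of order
# 1/sqrt(n_obs) and dominates the usual 1/sqrt(n_mc) Monte Carlo error here.
```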
al Generating Thermal Image Data Samples using 3D Facial Modelling Techniques and Deep Learning Methodologies. (arXiv:2005.01923v2 [cs.CV] UPDATED) By arxiv.org Published On :: Methods for generating synthetic data have become increasingly important for building the large datasets required by Convolutional Neural Network (CNN) based deep learning techniques across a wide range of computer vision applications. In this work, we extend existing methodologies to show how 2D thermal facial data can be mapped to 3D facial models. We use the Tufts dataset to generate varying 3D face poses from a single frontal face pose. The system first refines image quality through fusion-based preprocessing operations; the refined outputs have better contrast, lower noise levels, and better exposure of dark regions, making facial landmarks and temperature patterns more discernible than in the original raw data. Several image quality metrics are used to compare the refined images with the originals. In the next phase of the study, the refined images are used to create 3D facial geometry structures with CNNs. The generated outputs are then imported into Blender to extract 3D thermal facial models of both males and females. The same technique is also applied to our own thermal face data, acquired with a prototype thermal camera (developed under the Heliaus EU project) in an indoor lab environment, to generate synthetic 3D face data with varying yaw angles and, finally, facial depth maps. Full Article
al Interpreting Rate-Distortion of Variational Autoencoder and Using Model Uncertainty for Anomaly Detection. (arXiv:2005.01889v2 [cs.LG] UPDATED) By arxiv.org Published On :: Building a scalable machine learning system for unsupervised anomaly detection via representation learning is highly desirable. One prevalent method uses the reconstruction error from a variational autoencoder (VAE) trained by maximizing the evidence lower bound. We revisit VAEs from the perspective of information theory to provide theoretical foundations for using the reconstruction error, and arrive at a simpler and more effective model for anomaly detection. In addition, to enhance the effectiveness of detecting anomalies, we incorporate a practical model uncertainty measure into the metric. We show empirically the competitive performance of our approach on benchmark datasets. Full Article
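One plausible reading of such a score, sketched below, averages the reconstruction error over several latent draws and adds the spread across draws as a crude uncertainty term; this is our illustration, not the authors' exact metric, and `vae.encode`/`vae.decode` are assumed interfaces.

```python
import torch

def anomaly_score(vae, x, n_draws: int = 10) -> torch.Tensor:
    mu, logvar = vae.encode(x)                 # assumed interface: encoder stats
    errs = []
    for _ in range(n_draws):
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterize
        recon = vae.decode(z)
        errs.append(((recon - x) ** 2).flatten(1).mean(dim=1)) # per-sample MSE
    errs = torch.stack(errs)                   # (n_draws, batch)
    return errs.mean(0) + errs.std(0)          # error plus uncertainty penalty
```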
al Is the NUTS algorithm correct?. (arXiv:2005.01336v2 [stat.CO] UPDATED) By arxiv.org Published On :: This paper investigates whether the popular No-U-Turn (NUTS) sampling algorithm is correct, i.e. whether the target probability distribution is exactly conserved by the algorithm. It turns out that one of the Gibbs substeps used in the algorithm cannot always be guaranteed to be correct. Full Article
al Can a powerful neural network be a teacher for a weaker neural network?. (arXiv:2005.00393v2 [cs.LG] UPDATED) By arxiv.org Published On :: Transfer learning is widely used to learn in one context and apply the result in another, i.e. to apply acquired knowledge and skills to new situations. But is it possible to transfer learning from a deep neural network to a weaker one? Can the performance of a weak neural network be improved using the knowledge acquired by a more powerful one? In this work, during the training of a weak network, we add a loss term that minimizes the distance between the features previously learned by a strong neural network and the features the weak network is learning. To demonstrate the effectiveness and robustness of our approach, we conducted a large number of experiments on three well-known datasets and show that a weak neural network can improve its performance when its learning process is driven by a more powerful neural network. Full Article
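The objective can be sketched in a few lines: the weak network's usual task loss plus a term pulling its intermediate features toward those of the frozen strong network. The weighting `alpha` and the MSE feature match are illustrative assumptions, not necessarily the paper's exact choices.

```python
import torch.nn.functional as F_nn

def weak_student_loss(weak_feats, strong_feats, logits, targets, alpha=0.5):
    task = F_nn.cross_entropy(logits, targets)            # usual training loss
    # Distance to the teacher's features; a 1x1 projection may be needed if
    # the two networks have different feature widths (omitted for brevity).
    match = F_nn.mse_loss(weak_feats, strong_feats.detach())
    return task + alpha * match
```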
al A bimodal gamma distribution: Properties, regression model and applications. (arXiv:2004.12491v2 [stat.ME] UPDATED) By arxiv.org Published On :: In this paper we propose a bimodal gamma distribution obtained through a quadratic transformation based on the alpha-skew-normal model. We discuss several properties of this distribution, such as its mean, variance, moments, hazard rate, and entropy measures. Further, we propose a new regression model for censored data based on the bimodal gamma distribution. This regression model can be very useful for the analysis of real data and may give more realistic fits than other special regression models. Monte Carlo simulations were performed to check the bias of the maximum likelihood estimates. The proposed models are applied to two real data sets from the literature. Full Article
al A Global Benchmark of Algorithms for Segmenting Late Gadolinium-Enhanced Cardiac Magnetic Resonance Imaging. (arXiv:2004.12314v3 [cs.CV] UPDATED) By arxiv.org Published On :: Segmentation of cardiac images, particularly late gadolinium-enhanced magnetic resonance imaging (LGE-MRI) widely used for visualizing diseased cardiac structures, is a crucial first step for clinical diagnosis and treatment. However, direct segmentation of LGE-MRIs is challenging due to their attenuated contrast. Since most clinical studies have relied on manual, labor-intensive approaches, automatic methods are of high interest, particularly optimized machine learning approaches. To address this, we organized the "2018 Left Atrium Segmentation Challenge" using 154 3D LGE-MRIs, currently the world's largest cardiac LGE-MRI dataset, with labels for the left atrium segmented by three medical experts, ultimately attracting 27 international teams. In this paper, the submitted algorithms are analyzed extensively using technical and biological metrics, together with subgroup and hyper-parameter analyses, offering an overall picture of the major design choices of convolutional neural networks (CNNs) and practical considerations for achieving state-of-the-art left atrium segmentation. Results show the top method achieved a Dice score of 93.2% and a mean surface-to-surface distance of 0.7 mm, significantly outperforming the prior state of the art. In particular, our analysis demonstrates that double, sequentially used CNNs, in which a first CNN performs automatic region-of-interest localization and a second CNN performs refined regional segmentation, achieve far better results than traditional methods and pipelines containing single CNNs. This large-scale benchmarking study makes a significant step towards much-improved segmentation methods for cardiac LGE-MRIs, and will serve as an important benchmark for evaluating and comparing future work in the field. Full Article
al Excess registered deaths in England and Wales during the COVID-19 pandemic, March 2020 and April 2020. (arXiv:2004.11355v4 [stat.AP] UPDATED) By arxiv.org Published On :: Official counts of COVID-19 deaths have been criticized for potentially including people who did not die of COVID-19 but merely died with COVID-19. I address that critique by fitting a generalized additive model to weekly counts of all registered deaths in England and Wales during the 2010s. The model produces baseline rates of death registrations expected in the absence of the COVID-19 pandemic, and comparing those baselines to recent counts of registered deaths exposes the emergence of excess deaths late in March 2020. Among adults aged 45+, about 38,700 excess deaths were registered in the 5 weeks comprising 21 March through 24 April (612 $\pm$ 416 from 21–27 March, 5675 $\pm$ 439 from 28 March through 3 April, then 9183 $\pm$ 468, 12,712 $\pm$ 589, and 10,511 $\pm$ 567 in April's next 3 weeks). Both the Office for National Statistics's respective count of 26,891 death certificates which mention COVID-19, and the Department of Health and Social Care's hospital-focused count of 21,222 deaths, are appreciably less, implying that their counting methods have underestimated rather than overestimated the pandemic's true death toll. If underreporting rates have held steady, about 45,900 direct and indirect COVID-19 deaths might have been registered by April's end but not yet publicly reported in full. Full Article
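A rough sketch of the baseline-and-excess computation, under the assumption (ours, not the author's specification) that the generalized additive model is a Poisson GAM with a smooth long-run trend and a cyclic week-of-year seasonal term; the data below are synthetic stand-ins.

```python
import numpy as np
from pygam import PoissonGAM, s

rng = np.random.default_rng(1)
weeks = np.arange(520)                          # ten years of weekly counts
woy = weeks % 52                                # week of year
deaths = rng.poisson(10_000 + 1_500 * np.cos(2 * np.pi * woy / 52))

X = np.column_stack([weeks, woy])
gam = PoissonGAM(s(0) + s(1, basis="cp")).fit(X, deaths)  # "cp" = cyclic spline

weeks_2020 = np.arange(520, 537)                # early 2020 up to late April
X_2020 = np.column_stack([weeks_2020, weeks_2020 % 52])
baseline = gam.predict(X_2020)                  # expected registrations, no pandemic
# excess = observed_2020 - baseline             # observed counts come from ONS
```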
al On a phase transition in general order spline regression. (arXiv:2004.10922v2 [math.ST] UPDATED) By arxiv.org Published On :: In the Gaussian sequence model $Y = \theta_0 + \varepsilon$ in $\mathbb{R}^n$, we study the fundamental limit of approximating the signal $\theta_0$ by a class $\Theta(d, d_0, k)$ of (generalized) splines with free knots. Here $d$ is the degree of the spline, $d_0$ is the order of differentiability at each inner knot, and $k$ is the maximal number of pieces. We show that, given any integer $d \geq 0$ and $d_0 \in \{-1, 0, \ldots, d-1\}$, the minimax rate of estimation over $\Theta(d, d_0, k)$ exhibits the following phase transition: \[ \inf_{\widetilde{\theta}} \sup_{\theta \in \Theta(d, d_0, k)} \mathbb{E}_\theta \|\widetilde{\theta} - \theta\|^2 \asymp_d \begin{cases} k \log\log(16n/k), & 2 \leq k \leq k_0, \\ k \log(en/k), & k \geq k_0 + 1. \end{cases} \] The transition boundary $k_0$, which takes the form $\lfloor (d+1)/(d-d_0) \rfloor + 1$, demonstrates the critical role of the regularity parameter $d_0$ in the separation between a faster $\log\log(16n)$ and a slower $\log(en)$ rate. We further show that, once an additional '$d$-monotonicity' shape constraint is enforced (including monotonicity for $d = 0$ and convexity for $d = 1$), the above phase transition is eliminated and the faster $k \log\log(16n/k)$ rate can be achieved for all $k$. These results provide theoretical support for developing $\ell_0$-penalized (shape-constrained) spline regression procedures as useful alternatives to $\ell_1$- and $\ell_2$-penalized ones. Full Article
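A worked instance of the boundary formula (our arithmetic, for illustration only):

```latex
% Cubic splines (d = 3) with d_0 = 2 continuous derivatives at each inner knot:
\[
  k_0 \;=\; \left\lfloor \frac{d+1}{d-d_0} \right\rfloor + 1
      \;=\; \left\lfloor \frac{4}{1} \right\rfloor + 1 \;=\; 5,
\]
% so pieces k with 2 <= k <= 5 enjoy the fast k*loglog(16n/k) rate, while
% k >= 6 incurs the slower k*log(en/k) rate.
```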
al A Critical Overview of Privacy-Preserving Approaches for Collaborative Forecasting. (arXiv:2004.09612v3 [cs.LG] UPDATED) By arxiv.org Published On :: Cooperation between different data owners may lead to improved forecast quality, for instance by exploiting spatio-temporal dependencies in geographically distributed time series. Due to business competition and personal data protection concerns, however, data owners may be unwilling to share their data, which increases the interest in collaborative privacy-preserving forecasting. This paper analyses the state of the art and unveils several shortcomings of existing methods in guaranteeing data privacy when employing Vector Autoregressive (VAR) models. The paper also provides mathematical proofs and numerical analysis to evaluate existing privacy-preserving methods, dividing them into three groups: data transformation, secure multi-party computation, and decomposition methods. The analysis shows that state-of-the-art techniques have limitations in preserving data privacy, such as a trade-off between privacy and forecasting accuracy; moreover, in iterative model-fitting processes where intermediate results are shared, the original data can be inferred after some iterations. Full Article
al Deep transfer learning for improving single-EEG arousal detection. (arXiv:2004.05111v2 [cs.CV] UPDATED) By arxiv.org Published On :: Datasets in sleep science present challenges for machine learning algorithms due to differences in recording setups across clinics. We investigate two deep transfer learning strategies for overcoming the channel mismatch problem in cases where two datasets do not share exactly the same setup, which degrades the performance of single-EEG models. Specifically, we train a baseline model on multivariate polysomnography data and subsequently replace the first two layers to prepare the architecture for single-channel electroencephalography data. Using a fine-tuning strategy, our model yields performance similar to the baseline model (F1 = 0.682 and F1 = 0.694, respectively) and is significantly better than a comparable single-channel model. Our results are promising for researchers working with small databases who wish to use deep learning models pre-trained on larger databases. Full Article
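In PyTorch terms, the strategy might look like the schematic below; the 12-channel PSG input and layer shapes are illustrative assumptions, not the paper's architecture.

```python
import torch.nn as nn

base = nn.Sequential(                                  # stand-in pre-trained model
    nn.Conv1d(12, 32, kernel_size=7, padding=3), nn.ReLU(),   # 12 PSG channels
    nn.Conv1d(32, 64, kernel_size=7, padding=3), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(64, 2),
)
# ... load pre-trained PSG weights into `base` here ...

# Replace the first two layers (conv + activation) with single-channel ones.
base[0] = nn.Conv1d(1, 32, kernel_size=7, padding=3)   # now expects 1 EEG channel
base[1] = nn.ReLU()

# Fine-tuning: either update all parameters, or only the new layer and head.
finetune_params = list(base[0].parameters()) + list(base[-1].parameters())
```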
al Capturing and Explaining Trajectory Singularities using Composite Signal Neural Networks. (arXiv:2003.10810v2 [cs.LG] UPDATED) By arxiv.org Published On :: Spatial trajectories are ubiquitous and complex signals. Their analysis is crucial in many research fields, from urban planning to neuroscience. Several approaches have been proposed to cluster trajectories; they rely either on hand-crafted features, which struggle to capture the spatio-temporal complexity of the signal, or on Artificial Neural Networks (ANNs), which can be more efficient but less interpretable. In this paper we present a novel ANN architecture designed to capture the spatio-temporal patterns characteristic of a set of trajectories while taking into account the demographics of the navigators. Hence, our model extracts markers linked to both behaviour and demographics. We propose a composite signal analyser (CompSNN) combining three simple ANN modules, each of which uses a different signal representation of the trajectory while remaining interpretable. Our CompSNN performs significantly better than its modules taken in isolation and makes it possible to visualise which parts of the signal were most useful in discriminating the trajectories. Full Article
al Mnemonics Training: Multi-Class Incremental Learning without Forgetting. (arXiv:2002.10211v3 [cs.CV] UPDATED) By arxiv.org Published On :: Multi-Class Incremental Learning (MCIL) aims to learn new concepts by incrementally updating a model trained on previous concepts. However, there is an inherent trade-off between effectively learning new concepts and avoiding catastrophic forgetting of previous ones. To alleviate this issue, it has been proposed to keep around a few examples of previous concepts, but the effectiveness of this approach heavily depends on the representativeness of these examples. This paper proposes a novel and automatic framework we call mnemonics, where we parameterize exemplars and make them optimizable in an end-to-end manner. We train the framework through bilevel optimizations, i.e., model-level and exemplar-level. We conduct extensive experiments on three MCIL benchmarks, CIFAR-100, ImageNet-Subset and ImageNet, and show that using mnemonics exemplars can surpass the state-of-the-art by a large margin. Intriguingly, the mnemonics exemplars tend to lie on the boundaries between different classes. Full Article
al A Distributionally Robust Area Under Curve Maximization Model. (arXiv:2002.07345v2 [math.OC] UPDATED) By arxiv.org Published On :: The area under the ROC curve (AUC) is a widely used performance measure for classification models. We propose two new distributionally robust AUC maximization models (DR-AUC) that rely on the Kantorovich metric and approximate the AUC with the hinge loss function. We consider two cases, with fixed and variable support for the worst-case distribution, respectively. We use duality theory to reformulate the DR-AUC models and derive tractable convex optimization problems. Numerical experiments show that the proposed DR-AUC models, benchmarked against the standard deterministic AUC and support vector machine models, perform better in general and in particular improve worst-case out-of-sample performance over the majority of the considered datasets, demonstrating their robustness. The results are particularly encouraging since our numerical experiments use training sets of small size, which are known to be conducive to poor out-of-sample performance. Full Article
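The non-robust building block the DR-AUC models start from can be sketched directly: the AUC of a linear scorer is approximated by a pairwise hinge loss over positive/negative pairs. The Kantorovich-ball robustification described in the abstract is omitted here.

```python
import numpy as np

def hinge_auc_loss(w, X_pos, X_neg):
    """Mean hinge penalty over all (positive, negative) score pairs."""
    s_pos = X_pos @ w                            # scores of positives, (n_pos,)
    s_neg = X_neg @ w                            # scores of negatives, (n_neg,)
    margins = s_pos[:, None] - s_neg[None, :]    # pairwise s(x+) - s(x-)
    return np.maximum(0.0, 1.0 - margins).mean()

# Minimizing this loss (e.g. by gradient descent on w) pushes positive scores
# above negative ones, i.e. it maximizes a surrogate of the AUC.
```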
al Statistical aspects of nuclear mass models. (arXiv:2002.04151v3 [nucl-th] UPDATED) By arxiv.org Published On :: We study the information content of nuclear masses from the perspective of global models of nuclear binding energies. To this end, we employ a number of statistical methods and diagnostic tools, including Bayesian calibration, Bayesian model averaging, chi-square correlation analysis, principal component analysis, and empirical coverage probability. Using a Bayesian framework, we investigate the structure of the 4-parameter Liquid Drop Model by considering discrepant mass domains for calibration. We then use the chi-square correlation framework to analyze the 14-parameter Skyrme energy density functional calibrated using homogeneous and heterogeneous datasets. We show that a quite dramatic parameter reduction can be achieved in both cases. The advantage of Bayesian model averaging for improving uncertainty quantification is demonstrated. The statistical approaches used are described pedagogically; in this sense, this work can serve as a guide for future applications. Full Article
al Cyclic Boosting -- an explainable supervised machine learning algorithm. (arXiv:2002.03425v2 [cs.LG] UPDATED) By arxiv.org Published On :: Supervised machine learning algorithms have seen spectacular advances and have surpassed human-level performance in a wide range of specific applications. However, using complex ensemble or deep learning algorithms typically results in black-box models, where the path leading to individual predictions cannot be followed in detail. To address this issue, we propose the novel "Cyclic Boosting" machine learning algorithm, which performs accurate regression and classification tasks efficiently while allowing a detailed understanding of how each individual prediction was made. Full Article
al On the impact of selected modern deep-learning techniques to the performance and celerity of classification models in an experimental high-energy physics use case. (arXiv:2002.01427v3 [physics.data-an] UPDATED) By arxiv.org Published On :: Beginning from a basic neural-network architecture, we test the potential benefits of a range of advanced techniques for machine learning, in particular deep learning, in the context of a typical classification problem encountered in the domain of high-energy physics, using a well-studied dataset: the 2014 Higgs ML Kaggle dataset. The advantages are evaluated in terms of both performance metrics and the time required to train and apply the resulting models. Techniques examined include domain-specific data augmentation, learning-rate and momentum scheduling, (advanced) ensembling in both model-space and weight-space, and alternative architectures and connection methods. Following the investigation, we arrive at a model that achieves performance equal to the winning solution of the original Kaggle challenge while being significantly quicker to train and apply, and suitable for use with both GPU and CPU hardware setups. These reductions in timing and hardware requirements potentially allow the use of more powerful algorithms in HEP analyses, where models must be retrained frequently, sometimes at short notice, by small groups of researchers with limited hardware resources. Additionally, a new wrapper library for PyTorch called LUMIN is presented, which incorporates all of the techniques studied. Full Article
al A priori generalization error for two-layer ReLU neural network through minimum norm solution. (arXiv:1912.03011v3 [cs.LG] UPDATED) By arxiv.org Published On :: We focus on estimating the a priori generalization error of two-layer ReLU neural networks (NNs) trained by mean squared error, which depends only on the initial parameters and the target function, through the following research line. We first estimate the a priori generalization error of a finite-width two-layer ReLU NN under the constraint of a minimal-norm solution, which Zhang et al. (2019) prove to be an equivalent solution of a linearized (with respect to the parameters) finite-width two-layer NN. As the width goes to infinity, the linearized NN converges to the NN in the Neural Tangent Kernel (NTK) regime (Jacot et al., 2018). Thus, we can derive the a priori generalization error of two-layer ReLU NNs in the NTK regime. The distance between an NN in the NTK regime and a finite-width NN under gradient training is estimated by Arora et al. (2019). Based on those results, our work proves an a priori generalization error bound for two-layer ReLU NNs. This estimate uses the intrinsic implicit bias of the minimum-norm solution without requiring extra regularity in the loss function. The a priori estimate also implies that the NN does not suffer from the curse of dimensionality, and that a small generalization error can be achieved without an exponentially large number of neurons. In addition, the research line proposed in this paper can also be used to study other properties of the finite-width network, such as the posterior generalization error. Full Article
al Sampling random graph homomorphisms and applications to network data analysis. (arXiv:1910.09483v2 [math.PR] UPDATED) By arxiv.org Published On :: A graph homomorphism is a map between two graphs that preserves adjacency relations. We consider the problem of sampling a random graph homomorphism from a graph $F$ into a large network $\mathcal{G}$. We propose two complementary MCMC algorithms for sampling random graph homomorphisms and establish bounds on their mixing times and the concentration of their time averages. Based on our sampling algorithms, we propose a novel framework for network data analysis that circumvents some of the drawbacks of methods based on independent and neighborhood sampling. Various time averages of the MCMC trajectory give us various computable observables, including well-known ones such as homomorphism density and average clustering coefficient, and their generalizations. Furthermore, we show that these network observables are stable with respect to a suitably renormalized cut distance between networks. We provide various examples and simulations demonstrating our framework on synthetic networks. We also apply our framework to network clustering and classification problems using the Facebook100 dataset and Word Adjacency Networks of a set of classic novels. Full Article
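A generic single-site (Glauber-type) update consistent with this setup is sketched below: resample the image of one vertex of $F$ among all choices that keep every edge of $F$ mapped to an edge of $\mathcal{G}$. This is our illustration; the paper's two algorithms may differ in detail.

```python
import random
import networkx as nx

def glauber_step(F: nx.Graph, G: nx.Graph, x: dict) -> dict:
    """One update of the homomorphism x : V(F) -> V(G)."""
    v = random.choice(list(F.nodes))
    # Candidate images of v must be G-adjacent to the images of all F-neighbors.
    allowed = set(G.nodes)
    for u in F.neighbors(v):
        allowed &= set(G.neighbors(x[u]))
    if allowed:                        # nonempty whenever x is already valid
        x[v] = random.choice(list(allowed))
    return x

# Starting from any valid homomorphism and iterating glauber_step, time
# averages along the chain estimate observables such as homomorphism densities.
```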
al Bayesian factor models for multivariate categorical data obtained from questionnaires. (arXiv:1910.04283v2 [stat.AP] UPDATED) By arxiv.org Published On :: Factor analysis is a flexible technique for assessing multivariate dependence and codependence. Besides being an exploratory tool for reducing the dimensionality of multivariate data, it allows estimation of common factors that often have an interesting theoretical interpretation in real problems. However, standard factor analysis is only applicable when the variables are scaled, which is often inappropriate, for example, for data obtained from questionnaires in the field of psychology, where the variables are often categorical. In this framework, we propose a factor model for the analysis of multivariate ordered and non-ordered polychotomous data. Inference is carried out under the Bayesian approach via Markov chain Monte Carlo methods. Two Monte Carlo simulation studies are presented to investigate the performance of this approach in terms of estimation bias, precision, and assessment of the number of factors. We also illustrate the proposed method by analyzing participants' responses to the Motivational State Questionnaire dataset, developed to study emotions in laboratory and field settings. Full Article
al Differentiable Sparsification for Deep Neural Networks. (arXiv:1910.03201v2 [cs.LG] UPDATED) By arxiv.org Published On :: Deep neural networks have relieved human experts of the burden of feature engineering, but comparable effort is instead required to determine an effective architecture. On the other hand, as network sizes have grown, considerable resources are also invested in reducing their size. These problems can be addressed by sparsification of an over-complete model, which removes redundant parameters or connections by pruning them away after training or encouraging them to become zero during training. In general, however, these approaches are not fully differentiable and interrupt end-to-end training with stochastic gradient descent, in that they require either a parameter-selection or a soft-thresholding step. In this paper, we propose a fully differentiable sparsification method for deep neural networks which allows parameters to be exactly zero during training, and can thus learn the sparsified structure and the weights of a network simultaneously using stochastic gradient descent. We apply the proposed method to various popular models to show its effectiveness. Full Article
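One common fully differentiable gating pattern consistent with this description is sketched below; the thresholded-sigmoid gate is our illustration, not necessarily the paper's exact construction.

```python
import torch
import torch.nn as nn

class SparseLinear(nn.Module):
    def __init__(self, d_in: int, d_out: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(d_out, d_in) * 0.1)
        self.alpha = nn.Parameter(torch.zeros(d_out, d_in))   # gate parameters

    def gates(self) -> torch.Tensor:
        # Exactly zero once alpha < 0, yet differentiable almost everywhere,
        # so SGD can learn structure and weights jointly.
        return torch.relu(torch.sigmoid(self.alpha) - 0.5)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return nn.functional.linear(x, self.weight * self.gates())

# Adding e.g. lam * layer.gates().sum() to the training loss drives gates to
# exact zeros, pruning connections during (not after) training.
```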
al DualSMC: Tunneling Differentiable Filtering and Planning under Continuous POMDPs. (arXiv:1909.13003v4 [cs.LG] UPDATED) By arxiv.org Published On :: A major difficulty in solving continuous POMDPs is inferring the multi-modal distribution of the unobserved true states and making the planning algorithm depend on the perceived uncertainty. We cast POMDP filtering and planning as two closely related Sequential Monte Carlo (SMC) processes, one over the real states and the other over the future optimal trajectories, and combine the merits of these two parts in a new model named the DualSMC network. In particular, we first introduce an adversarial particle filter that leverages the adversarial relationship between its internal components. Based on the filtering results, we then propose a planning algorithm that extends the previous SMC planning approach [Piche et al., 2018] to continuous POMDPs with an uncertainty-dependent policy. Crucially, DualSMC not only handles complex observations such as image input but also remains highly interpretable. It is shown to be effective in three continuous POMDP domains: the floor positioning domain, the 3D light-dark navigation domain, and a modified Reacher domain. Full Article