
A new log-linear bimodal Birnbaum–Saunders regression model with application to survival data

Francisco Cribari-Neto, Rodney V. Fonseca.

Source: Brazilian Journal of Probability and Statistics, Volume 33, Number 2, 329--355.

Abstract:
The log-linear Birnbaum–Saunders model has been widely used in empirical applications. We introduce an extension of this model based on a recently proposed version of the Birnbaum–Saunders distribution which is more flexible than the standard Birnbaum–Saunders law since its density may assume both unimodal and bimodal shapes. We show how to perform point estimation, interval estimation and hypothesis testing inferences on the parameters that index the regression model we propose. We also present a number of diagnostic tools, such as residual analysis, local influence, generalized leverage, generalized Cook’s distance and model misspecification tests. We investigate the usefulness of model selection criteria and the accuracy of prediction intervals for the proposed model. Results of Monte Carlo simulations are presented. Finally, we also present and discuss an empirical application.





Failure rate of Birnbaum–Saunders distributions: Shape, change-point, estimation and robustness

Emilia Athayde, Assis Azevedo, Michelli Barros, Víctor Leiva.

Source: Brazilian Journal of Probability and Statistics, Volume 33, Number 2, 301--328.

Abstract:
The Birnbaum–Saunders (BS) distribution has been widely studied and applied. A random variable with a BS distribution is a transformation of another random variable with a standard normal distribution. Generalized BS distributions are obtained when the normally distributed random variable is replaced by another symmetrically distributed random variable. This allows us to obtain a wide class of positively skewed models with lighter and heavier tails than the BS model. The failure rate of these models admits several shapes, including the unimodal case, whose change-point can be used for different purposes, for example, to establish the reduction in a dose and hence in the cost of a medical treatment. We analyze the failure rates of generalized BS distributions obtained from the logistic, normal and Student-t distributions, considering their shape and change-point, estimating them, evaluating their robustness, assessing their performance by simulations, and applying the results to real data from different areas.
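
As a concrete illustration of the failure rate and its change-point for the standard BS law (not the paper's generalized versions), the following Python sketch computes the hazard numerically from the well-known representation F(t) = Phi(xi(t)/alpha) with xi(t) = sqrt(t/beta) - sqrt(beta/t), and locates the mode of the unimodal hazard by a grid search; the grid and parameter values are arbitrary choices for illustration.

```python
import numpy as np
from scipy.stats import norm

def bs_hazard(t, alpha, beta):
    """Failure (hazard) rate of the Birnbaum-Saunders law, using
    F(t) = Phi(xi(t)/alpha) with xi(t) = sqrt(t/beta) - sqrt(beta/t)."""
    xi = np.sqrt(t / beta) - np.sqrt(beta / t)
    dxi = (1.0 / np.sqrt(t * beta) + np.sqrt(beta) / t**1.5) / 2.0
    f = norm.pdf(xi / alpha) * dxi / alpha   # density via change of variables
    return f / norm.sf(xi / alpha)           # h(t) = f(t) / (1 - F(t))

t = np.linspace(0.01, 10.0, 2000)
h = bs_hazard(t, alpha=1.0, beta=1.0)
print(t[np.argmax(h)])   # numerical change-point (mode) of the hazard
```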





Modified information criterion for testing changes in skew normal model

Khamis K. Said, Wei Ning, Yubin Tian.

Source: Brazilian Journal of Probability and Statistics, Volume 33, Number 2, 280--300.

Abstract:
In this paper, we study the change point problem for the skew normal distribution from a model selection perspective. We propose a detection procedure based on the modified information criterion (MIC). By accounting for model complexity, this procedure has an advantage over one based on the traditional Schwarz information criterion, well known as the Bayesian information criterion (BIC), in detecting changes near the beginning or end of a data sequence. Because the analytic asymptotic distribution of the MIC-based test statistic is difficult to derive, bootstrap simulation is used to obtain critical values at different significance levels. Simulations are conducted to compare the performance of MIC, BIC and the likelihood ratio test (LRT). The detection procedure is illustrated on two stock market data sets.
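
To make the criterion-based detection idea concrete, here is a minimal Python sketch for a single change in an i.i.d. normal (not skew normal) model, using one commonly cited form of the MIC penalty, (2p + (2k/n - 1)^2) log n; both the simpler model and the exact penalty form are simplifying assumptions for illustration, not the paper's procedure.

```python
import numpy as np

def normal_loglik(x):
    # Profile log-likelihood of an i.i.d. normal sample with MLEs plugged in.
    n = len(x)
    sigma2 = np.var(x)  # MLE variance (divides by n)
    return -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)

def mic_change_point(x, min_seg=5):
    # Scan candidate change points k; each segment has p = 2 free
    # parameters (mean, variance). The (2k/n - 1)^2 term is the MIC
    # modification: the penalty shrinks for changes near mid-sample.
    n = len(x)
    p = 2
    best_k, best_mic = None, np.inf
    for k in range(min_seg, n - min_seg):
        ll = normal_loglik(x[:k]) + normal_loglik(x[k:])
        mic = -2 * ll + (2 * p + (2 * k / n - 1) ** 2) * np.log(n)
        if mic < best_mic:
            best_k, best_mic = k, mic
    mic_null = -2 * normal_loglik(x) + p * np.log(n)
    return best_k if best_mic < mic_null else None

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 100), rng.normal(1.5, 1, 100)])
print(mic_change_point(x))  # expected: an index near 100
```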





A brief review of optimal scaling of the main MCMC approaches and optimal scaling of additive TMCMC under non-regular cases

Kushal K. Dey, Sourabh Bhattacharya.

Source: Brazilian Journal of Probability and Statistics, Volume 33, Number 2, 222--266.

Abstract:
Transformation based Markov Chain Monte Carlo (TMCMC) was proposed by Dutta and Bhattacharya (Statistical Methodology 16 (2014) 100–116) as an efficient alternative to the Metropolis–Hastings algorithm, especially in high dimensions. The main advantage of this algorithm is that it simultaneously updates all components of a high dimensional parameter using appropriate move types defined by deterministic transformations of a single random variable. This reduces the time complexity of each step of the chain and enhances the acceptance rate. In this paper, we first provide a brief review of the optimal scaling theory for various existing MCMC approaches, comparing and contrasting them with the corresponding TMCMC approaches. The optimal scaling of the simplest form of TMCMC, namely additive TMCMC, has been studied extensively for the Gaussian proposal density in Dey and Bhattacharya (2017a). Here, we discuss the diffusion-based optimal scaling behavior of additive TMCMC for non-Gaussian proposal densities, in particular, uniform, Student's $t$ and Cauchy proposals. Although we could not formally prove our diffusion result for the Cauchy proposal, simulation-based results lead us to conjecture that at least the recipe for obtaining the general optimal scaling and optimal acceptance rate holds for the Cauchy case as well. We also consider diffusion-based optimal scaling of TMCMC when the target density is discontinuous. Such non-regular situations have been studied in the case of the random walk Metropolis–Hastings (RWMH) algorithm by Neal and Roberts (Methodology and Computing in Applied Probability 13 (2011) 583–601) using the expected squared jumping distance (ESJD), but diffusion-theory-based scaling has not been considered. We compare our diffusion-based optimally scaled TMCMC approach with the ESJD-based optimally scaled RWMH in simulation studies involving several target distributions and proposal distributions, including the challenging Cauchy proposal case, showing that additive TMCMC outperforms RWMH in almost all cases considered.
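
A minimal sketch of the additive TMCMC move described above, assuming a Gaussian draw for the single positive epsilon and equal probabilities for the forward and backward moves; since the move is its own inverse with unit Jacobian, the Metropolis acceptance ratio reduces to the ratio of target densities. Function and parameter names are mine, not from the paper.

```python
import numpy as np

def additive_tmcmc(log_target, x0, scale=1.0, n_iter=5000, rng=None):
    """One-epsilon additive TMCMC: every coordinate moves by +/- scale*eps,
    where eps is a single positive draw shared across coordinates and the
    signs are independent fair coin flips."""
    rng = rng or np.random.default_rng()
    x = np.asarray(x0, dtype=float)
    chain = np.empty((n_iter, x.size))
    lp = log_target(x)
    for t in range(n_iter):
        eps = abs(rng.standard_normal())          # Gaussian proposal; a
        signs = rng.choice([-1.0, 1.0], x.size)   # uniform/t/Cauchy draw
        prop = x + signs * scale * eps            # can be swapped in here
        lp_prop = log_target(prop)
        if np.log(rng.uniform()) < lp_prop - lp:  # accept w.p. pi(x')/pi(x)
            x, lp = prop, lp_prop
        chain[t] = x
    return chain

# Example: 10-dimensional standard normal target.
chain = additive_tmcmc(lambda z: -0.5 * z @ z, np.zeros(10), scale=0.4)
```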





Simple tail index estimation for dependent and heterogeneous data with missing values

Ivana Ilić, Vladica M. Veličković.

Source: Brazilian Journal of Probability and Statistics, Volume 33, Number 1, 192--203.

Abstract:
Financial returns are known to be non-normal and tend to have fat-tailed distributions. The dependence of large values in a stochastic process is also an important topic in risk, insurance and finance. In the presence of missing values, we study the asymptotic properties of a simple “median” estimator of the tail index based on random variables with a heavy-tailed distribution function and a certain dependence among the extremes. Weak consistency and asymptotic normality of the proposed estimator are established. The estimator is a special case of a well-known estimator defined in Bacro and Brito [Statistics & Decisions 3 (1993) 133–143]. The advantages of the estimator are its robustness against deviations and that, compared to Hill’s estimator, it is less affected by fluctuations related to the maximum of the sample or by the presence of outliers. Several examples are analyzed in order to support the theory.
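
For context on the comparison with Hill's estimator, here is the classical Hill estimator in Python; the paper's "median" estimator, a robust order-statistic variant, is not reproduced here, and the choice of k and the Pareto test case are illustrative.

```python
import numpy as np

def hill_estimator(x, k):
    """Classical Hill estimator of the tail index alpha based on the
    k largest order statistics: 1 / mean of log(X_(n-i+1) / X_(n-k))."""
    xs = np.sort(x)[::-1]                    # descending order statistics
    logs = np.log(xs[:k]) - np.log(xs[k])
    return 1.0 / logs.mean()

rng = np.random.default_rng(1)
pareto = rng.pareto(2.0, 10_000) + 1.0       # classical Pareto, alpha = 2
print(hill_estimator(pareto, k=200))         # should be near 2
```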





The equivalence of dynamic and static asset allocations under the uncertainty caused by Poisson processes

Yong-Chao Zhang, Na Zhang.

Source: Brazilian Journal of Probability and Statistics, Volume 33, Number 1, 184--191.

Abstract:
We investigate the equivalence of dynamic and static asset allocations in the case where the price process of a risky asset is driven by a Poisson process. Under some mild conditions, we obtain a necessary and sufficient condition for the equivalence of dynamic and static asset allocations. In addition, we provide a simple sufficient condition for the equivalence.





An estimation method for latent traits and population parameters in Nominal Response Model

Caio L. N. Azevedo, Dalton F. Andrade

Source: Braz. J. Probab. Stat., Volume 24, Number 3, 415--433.

Abstract:
The nominal response model (NRM) was proposed by Bock [Psychometrika 37 (1972) 29–51] in order to improve latent trait (ability) estimation in multiple choice tests with nominal items. When the item parameters are known, expectation a posteriori or maximum a posteriori methods are commonly employed to estimate the latent traits, considering a standard symmetric normal distribution as the latent trait prior density. However, when this item set is presented to a new group of examinees, it is necessary to estimate not only their latent traits but also the population parameters of this group. This article has two main purposes: first, to develop a Markov chain Monte Carlo algorithm to estimate both latent traits and population parameters concurrently. This algorithm builds on the Metropolis–Hastings within Gibbs sampling algorithm (MHWGS) proposed by Patz and Junker [Journal of Educational and Behavioral Statistics 24 (1999b) 346–366]. Second, to compare the performance of this method in recovering the latent traits with that of three other methods: maximum likelihood, expectation a posteriori and maximum a posteriori. The comparisons were performed by varying the total number of items (NI), the number of categories and the values of the mean and the variance of the latent trait distribution. The results showed that MHWGS outperforms the other methods in latent trait estimation and properly recovers the population parameters. Furthermore, we found that NI accounts for the highest percentage of the variability in the accuracy of latent trait estimation.
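
For reference, the NRM category probabilities have the familiar multinomial-logit form P(Y = k | theta) proportional to exp(a_k * theta + c_k). A small sketch, with illustrative slopes and intercepts and one category's parameters fixed at zero for identifiability:

```python
import numpy as np

def nrm_probs(theta, slopes, intercepts):
    """Category probabilities of Bock's nominal response model:
    P(Y = k | theta) = exp(a_k*theta + c_k) / sum_j exp(a_j*theta + c_j)."""
    z = np.outer(theta, slopes) + intercepts   # (n_persons, n_categories)
    z -= z.max(axis=1, keepdims=True)          # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# A 4-category item evaluated at three ability levels.
probs = nrm_probs(np.array([-1.0, 0.0, 1.0]),
                  slopes=np.array([0.0, 0.5, 1.0, 1.5]),
                  intercepts=np.array([0.0, 0.2, -0.1, -0.5]))
print(probs.round(3))   # each row sums to 1
```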





Estimating the size of a hidden finite set: Large-sample behavior of estimators

Si Cheng, Daniel J. Eck, Forrest W. Crawford.

Source: Statistics Surveys, Volume 14, 1--31.

Abstract:
A finite set is “hidden” if its elements are not directly enumerable or if its size cannot be ascertained via a deterministic query. In public health, epidemiology, demography, ecology and intelligence analysis, researchers have developed a wide variety of indirect statistical approaches, under different models for sampling and observation, for estimating the size of a hidden set. Some methods make use of random sampling with known or estimable sampling probabilities, and others make structural assumptions about relationships (e.g. ordering or network information) between the elements that comprise the hidden set. In this review, we describe models and methods for learning about the size of a hidden finite set, with special attention to asymptotic properties of estimators. We study the properties of these methods under two asymptotic regimes, “infill” in which the number of fixed-size samples increases, but the population size remains constant, and “outfill” in which the sample size and population size grow together. Statistical properties under these two regimes can be dramatically different.
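
One of the simplest members of this family of methods is two-sample capture-recapture with known sampling mechanism; a sketch of Chapman's bias-corrected version of the Lincoln-Petersen estimator (the numbers below are made up):

```python
def chapman_estimate(n1, n2, m):
    """Chapman's bias-corrected Lincoln-Petersen estimator of a hidden
    population size from two capture samples: n1, n2 are the sample
    sizes and m is the number of elements captured in both."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1

print(chapman_estimate(200, 150, 30))  # (201*151)/31 - 1, about 978
```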





Halfspace depth and floating body

Stanislav Nagy, Carsten Schütt, Elisabeth M. Werner.

Source: Statistics Surveys, Volume 13, 52--118.

Abstract:
Little-known relations of the renowned concept of halfspace depth for multivariate data to notions from convex and affine geometry are discussed. The maximum halfspace depth may be regarded as a measure of symmetry for random vectors. As such, the maximum depth stands as a generalization of a measure of symmetry for convex sets that is well studied in geometry. Under a mild assumption, the upper level sets of the halfspace depth coincide with the convex floating bodies of measures used in the definition of the affine surface area for convex bodies in Euclidean spaces. These connections enable us to partially resolve some persistent open problems regarding theoretical properties of the depth.
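
A minimal numerical sketch of halfspace (Tukey) depth: the exact depth is an infimum over all closed halfspaces containing the point, so scanning finitely many random directions gives an upper-bound approximation. The direction count and test data are illustrative.

```python
import numpy as np

def halfspace_depth(x, data, n_dirs=2000, rng=None):
    """Monte Carlo approximation of Tukey's halfspace depth of x w.r.t.
    the empirical measure of `data`: minimize, over random unit directions
    u, the fraction of points in the halfspace {y : u.(y - x) >= 0}."""
    rng = rng or np.random.default_rng()
    u = rng.standard_normal((n_dirs, data.shape[1]))
    u /= np.linalg.norm(u, axis=1, keepdims=True)
    proj = (data - x) @ u.T                   # (n_points, n_dirs)
    return (proj >= 0).mean(axis=0).min()

rng = np.random.default_rng(2)
cloud = rng.standard_normal((1000, 2))
print(halfspace_depth(np.zeros(2), cloud, rng=rng))  # near 0.5 at the center
```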





A review of dynamic network models with latent variables

Bomin Kim, Kevin H. Lee, Lingzhou Xue, Xiaoyue Niu.

Source: Statistics Surveys, Volume 12, 105--135.

Abstract:
We present a selective review of statistical modeling of dynamic networks. We focus on models with latent variables, specifically, the latent space models and the latent class models (or stochastic blockmodels), which investigate both the observed features and the unobserved structure of networks. We begin with an overview of the static models, and then we introduce the dynamic extensions. For each dynamic model, we also discuss its applications that have been studied in the literature, with the data sources listed in the Appendix. Based on the review, we summarize a list of open problems and challenges in dynamic network modeling with latent variables.





Variable selection methods for model-based clustering

Michael Fop, Thomas Brendan Murphy.

Source: Statistics Surveys, Volume 12, 18--65.

Abstract:
Model-based clustering is a popular approach for clustering multivariate data which has seen applications in numerous fields. Nowadays, high-dimensional data are more and more common and the model-based clustering approach has adapted to deal with the increasing dimensionality. In particular, the development of variable selection techniques has received a lot of attention and research effort in recent years. Even for small size problems, variable selection has been advocated to facilitate the interpretation of the clustering results. This review provides a summary of the methods developed for variable selection in model-based clustering. Existing R packages implementing the different methods are indicated and illustrated in application to two data analysis examples.
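
As a rough stand-in for the criteria surveyed (illustrative only; the R packages discussed in the paper implement the actual methods), a greedy forward search over variables scored by the BIC of a Gaussian mixture can be sketched in Python with scikit-learn:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def greedy_variable_selection(X, n_components=2):
    """Forward search: repeatedly add the variable whose inclusion most
    improves the BIC of the mixture fit; stop when no variable helps.
    A crude proxy for the model-comparison criteria in the literature."""
    selected, remaining = [], list(range(X.shape[1]))
    best_bic = np.inf
    while remaining:
        scores = []
        for j in remaining:
            cols = selected + [j]
            gm = GaussianMixture(n_components, n_init=3, random_state=0)
            gm.fit(X[:, cols])
            scores.append((gm.bic(X[:, cols]), j))
        bic, j = min(scores)
        if bic >= best_bic:
            break
        best_bic = bic
        selected.append(j)
        remaining.remove(j)
    return selected

# Two informative variables (separated means) plus three pure-noise ones.
rng = np.random.default_rng(7)
informative = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(3, 1, (100, 2))])
X = np.hstack([informative, rng.normal(0, 1, (200, 3))])
print(greedy_variable_selection(X))  # ideally picks columns 0 and 1 first
```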





A design-sensitive approach to fitting regression models with complex survey data

Phillip S. Kott.

Source: Statistics Surveys, Volume 12, 1--17.

Abstract:
Fitting complex survey data to regression equations is explored under a design-sensitive model-based framework. A robust version of the standard model assumes that the expected value of the difference between the dependent variable and its model-based prediction is zero no matter what the values of the explanatory variables. The extended model assumes only that the difference is uncorrelated with the covariates. Little is assumed about the error structure of this difference under either model other than independence across primary sampling units. The standard model often fails in practice, but the extended model very rarely does. Under this framework some of the methods developed in the conventional design-based, pseudo-maximum-likelihood framework, such as fitting weighted estimating equations and sandwich mean-squared-error estimation, are retained but their interpretations change. Few of the ideas here are new to the refereed literature. The goal instead is to collect those ideas and put them into a unified conceptual framework.
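
A bare-bones sketch of the weighted estimating-equation fit and PSU-clustered sandwich variance mentioned above; names are mine, and no finite-population or small-sample corrections are applied, so this is an illustration rather than production survey code.

```python
import numpy as np

def weighted_ols_sandwich(X, y, w, psu):
    """Survey-weighted least squares with a PSU-clustered sandwich
    mean-squared-error estimate: bread = (X'WX)^-1, meat = sum over PSUs
    of outer products of summed weighted score contributions."""
    bread = np.linalg.inv(X.T @ (w[:, None] * X))
    beta = bread @ (X.T @ (w * y))
    scores = X * (w * (y - X @ beta))[:, None]
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(psu):
        s = scores[psu == g].sum(axis=0)   # independence across PSUs only
        meat += np.outer(s, s)
    return beta, bread @ meat @ bread

rng = np.random.default_rng(6)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)
beta, cov = weighted_ols_sandwich(X, y, w=np.ones(n),
                                  psu=np.repeat(np.arange(20), 10))
print(beta, np.sqrt(np.diag(cov)))
```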





Basic models and questions in statistical network analysis

Miklós Z. Rácz, Sébastien Bubeck.

Source: Statistics Surveys, Volume 11, 1--47.

Abstract:
Extracting information from large graphs has become an important statistical problem since network data is now common in various fields. In this minicourse we will investigate the most natural statistical questions for three canonical probabilistic models of networks: (i) community detection in the stochastic block model, (ii) finding the embedding of a random geometric graph, and (iii) finding the original vertex in a preferential attachment tree. Along the way we will cover many interesting topics in probability theory such as Pólya urns, large deviation theory, concentration of measure in high dimension, entropic central limit theorems, and more.
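
For the first of these models, a compact sampler for the stochastic block model is easy to write down; the community sizes and block connection probabilities below are illustrative.

```python
import numpy as np

def sample_sbm(sizes, P, rng=None):
    """Sample the adjacency matrix of an undirected stochastic block model
    with given community sizes and block connection-probability matrix P."""
    rng = rng or np.random.default_rng()
    labels = np.repeat(np.arange(len(sizes)), sizes)
    n = labels.size
    probs = P[labels[:, None], labels[None, :]]   # per-pair edge probability
    upper = np.triu(rng.random((n, n)) < probs, 1)
    return (upper + upper.T).astype(int), labels

A, z = sample_sbm([50, 50], np.array([[0.30, 0.05],
                                      [0.05, 0.30]]))
print(A.sum() // 2, "edges")   # dense within blocks, sparse between
```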





A unified treatment for non-asymptotic and asymptotic approaches to minimax signal detection

Clément Marteau, Theofanis Sapatinas.

Source: Statistics Surveys, Volume 9, 253--297.

Abstract:
We are concerned with minimax signal detection. In this setting, we discuss non-asymptotic and asymptotic approaches through a unified treatment. In particular, we consider a Gaussian sequence model that contains classical models as special cases, such as direct, well-posed inverse and ill-posed inverse problems. Working with certain ellipsoids in the space of squared-summable sequences of real numbers, with a ball of positive radius removed, we compare the construction of lower and upper bounds for the minimax separation radius (non-asymptotic approach) and the minimax separation rate (asymptotic approach) that have been proposed in the literature. Some additional contributions, bringing to light links between the non-asymptotic and asymptotic approaches to minimax signal detection, are also presented. An example of a mildly ill-posed inverse problem is used for illustrative purposes. In particular, it is shown that tools used to derive ‘asymptotic’ results can be exploited to draw ‘non-asymptotic’ conclusions, and vice versa, revealing hitherto unknown similarities between the two minimax signal detection paradigms.
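
In a standard formulation of this setting (the notation here is mine, chosen to match common usage rather than copied from the paper), the Gaussian sequence model and the detection problem read:

```latex
% Gaussian sequence model: the direct problem corresponds to b_j = 1 for
% all j; a mildly ill-posed inverse problem corresponds to polynomially
% decaying b_j (an illustrative convention, not the paper's notation).
\[
  Y_j = b_j\,\theta_j + \varepsilon\,\xi_j,
  \qquad \xi_j \stackrel{\text{iid}}{\sim} N(0,1), \quad j = 1,2,\ldots,
\]
\[
  \text{test } H_0:\ \theta = 0
  \quad\text{against}\quad
  H_1:\ \theta \in \mathcal{E}_a \ \text{with}\ \|\theta\|_2 \ge r,
\]
% where \mathcal{E}_a is an ellipsoid in \ell^2 and r > 0 is the radius of
% the removed ball. The minimax separation radius is the smallest r for
% which some test keeps both error probabilities below prescribed levels.
```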





Some models and methods for the analysis of observational data

José A. Ferreira.

Source: Statistics Surveys, Volume 9, 106--208.

Abstract:
This article provides a concise and essentially self-contained exposition of some of the most important models and non-parametric methods for the analysis of observational data, and a substantial number of illustrations of their application. Although for the most part our presentation follows P. Rosenbaum’s book, “Observational Studies”, and naturally draws on related literature, it contains original elements and simplifies and generalizes some basic results. The illustrations, based on simulated data, show the methods at work in some detail, highlighting pitfalls and emphasizing certain subjective aspects of the statistical analyses.





Semi-parametric estimation for conditional independence multivariate finite mixture models

Didier Chauveau, David R. Hunter, Michael Levine.

Source: Statistics Surveys, Volume 9, 1--31.

Abstract:
The conditional independence assumption for nonparametric multivariate finite mixture models, a weaker form of the well-known conditional independence assumption for random effects models for longitudinal data, is the subject of an increasing number of theoretical and algorithmic developments in the statistical literature. After presenting a survey of this literature, including an in-depth discussion of the all-important identifiability results, this article describes and extends an algorithm for estimation of the parameters in these models. The algorithm works for any number of components in three or more dimensions. It possesses a descent property and can be easily adapted to situations where the data are grouped in blocks of conditionally independent variables. We discuss how to adapt this algorithm to various location-scale models that link component densities, and we even adapt it to a particular class of univariate mixture problems in which the components are assumed symmetric. We give a bandwidth selection procedure for our algorithm. Finally, we demonstrate the effectiveness of our algorithm using a simulation study and two psychometric datasets.





Adaptive clinical trial designs for phase I cancer studies

Oleksandr Sverdlov, Weng Kee Wong, Yevgen Ryeznik.

Source: Statistics Surveys, Volume 8, 2--44.

Abstract:
Adaptive clinical trials are becoming increasingly popular research designs for clinical investigation. Adaptive designs are particularly useful in phase I cancer studies where clinical data are scant and the goals are to assess the drug dose-toxicity profile and to determine the maximum tolerated dose while minimizing the number of study patients treated at suboptimal dose levels. In the current work we give an overview of adaptive design methods for phase I cancer trials. We find that modern statistical literature is replete with novel adaptive designs that have clearly defined objectives and established statistical properties, and are shown to outperform conventional dose finding methods such as the 3+3 design, both in terms of statistical efficiency and in terms of minimizing the number of patients treated at highly toxic or nonefficacious doses. We discuss statistical, logistical, and regulatory aspects of these designs and present some links to non-commercial statistical software for implementing these methods in practice.
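
For readers unfamiliar with the conventional benchmark, here is one simplified codification of the 3+3 rules as a simulation (rule variants exist; this is an illustrative sketch, not a recommended design):

```python
import numpy as np

def three_plus_three(tox_probs, rng=None):
    """Simulate one trial under simplified 3+3 rules: 0/3 DLTs -> escalate;
    1/3 -> treat 3 more at the same dose, escalate only if at most 1/6;
    otherwise stop, declaring the next lower dose the MTD (-1 = none)."""
    rng = rng or np.random.default_rng()
    d = 0
    while d < len(tox_probs):
        dlt = rng.binomial(3, tox_probs[d])
        if dlt == 0:
            d += 1
        elif dlt == 1:
            dlt += rng.binomial(3, tox_probs[d])
            if dlt <= 1:
                d += 1
            else:
                return d - 1
        else:
            return d - 1
    return len(tox_probs) - 1   # all doses tolerated

# Operating characteristics under an assumed dose-toxicity curve.
mtds = [three_plus_three([0.05, 0.10, 0.25, 0.40]) for _ in range(1000)]
print(np.bincount(np.array(mtds) + 1))   # counts for none, dose 1, 2, ...
```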





Errata: A survey of Bayesian predictive methods for model assessment, selection and comparison

Aki Vehtari, Janne Ojanen.

Source: Statistics Surveys, Volume 8, 1--1.

Abstract:
Errata for “A survey of Bayesian predictive methods for model assessment, selection and comparison” by A. Vehtari and J. Ojanen, Statistics Surveys 6 (2012), 142–228. doi:10.1214/12-SS102.





Analyzing complex functional brain networks: Fusing statistics and network science to understand the brain

Sean L. Simpson, F. DuBois Bowman, Paul J. Laurienti

Source: Statist. Surv., Volume 7, 1--36.

Abstract:
Complex functional brain network analyses have exploded over the last decade, gaining traction due to their profound clinical implications. The application of network science (an interdisciplinary offshoot of graph theory) has facilitated these analyses and enabled examining the brain as an integrated system that produces complex behaviors. While the field of statistics has been integral in advancing activation analyses and some connectivity analyses in functional neuroimaging research, it has yet to play a commensurate role in complex network analyses. Fusing novel statistical methods with network-based functional neuroimage analysis will engender powerful analytical tools that will aid in our understanding of normal brain function as well as alterations due to various brain disorders. Here we survey widely used statistical and network science tools for analyzing fMRI network data and discuss the challenges faced in filling some of the remaining methodological gaps. When applied and interpreted correctly, the fusion of network scientific and statistical methods has a chance to revolutionize the understanding of brain function.





A survey of Bayesian predictive methods for model assessment, selection and comparison

Aki Vehtari, Janne Ojanen

Source: Statist. Surv., Volume 6, 142--228.

Abstract:
To date, several methods exist in the statistical literature for model assessment that present themselves specifically as Bayesian predictive methods. The decision-theoretic assumptions on which these methods are based are not always clearly stated in the original articles, however. The aim of this survey is to provide a unified review of Bayesian predictive model assessment and selection methods, and of methods closely related to them. We review the various assumptions that are made in this context and discuss the connections between different approaches, with an emphasis on how each method approximates the expected utility of using a Bayesian model for the purpose of predicting future data.
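
One widely used approximation to this expected utility is leave-one-out cross-validation of the log predictive density. A brute-force sketch (with a plug-in normal predictive standing in for a full Bayesian posterior predictive, purely for illustration):

```python
import numpy as np

def loo_lpd(logpdf, data):
    """Brute-force leave-one-out log predictive density: for each point,
    refit on the rest and score the held-out point. logpdf(train, x)
    should return log p(x | train) for the model under assessment."""
    return sum(logpdf(np.delete(data, i), data[i]) for i in range(len(data)))

def normal_plugin(train, x):
    # Plug-in predictive: normal with the training sample's mean and sd.
    m, s = train.mean(), train.std(ddof=1)
    return -0.5 * np.log(2 * np.pi * s**2) - (x - m) ** 2 / (2 * s**2)

rng = np.random.default_rng(5)
print(loo_lpd(normal_plugin, rng.normal(size=50)))
```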





The theory and application of penalized methods or Reproducing Kernel Hilbert Spaces made easy

Nancy Heckman

Source: Statist. Surv., Volume 6, 113--141.

Abstract:
The popular cubic smoothing spline estimate of a regression function arises as the minimizer of the penalized sum of squares $\sum_{j}(Y_{j}-\mu(t_{j}))^{2}+\lambda\int_{a}^{b}[\mu''(t)]^{2}\,dt$, where the data are $(t_{j},Y_{j})$, $j=1,\ldots,n$. The minimization is taken over an infinite-dimensional function space, the space of all functions with square integrable second derivatives. But the calculations can be carried out in a finite-dimensional space. The reduction from minimizing over an infinite-dimensional space to minimizing over a finite-dimensional space occurs for more general objective functions: the data may be related to the function $\mu$ in another way, the sum of squares may be replaced by a more suitable expression, or the penalty, $\int_{a}^{b}[\mu''(t)]^{2}\,dt$, might take a different form. This paper reviews the Reproducing Kernel Hilbert Space structure that provides a finite-dimensional solution for a general minimization problem. Particular attention is paid to the construction and study of the Reproducing Kernel Hilbert Space corresponding to a penalty based on a linear differential operator. In this case, one can often calculate the minimizer explicitly, using Green’s functions.
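
As a quick illustration of the finite-dimensional computation in practice, scipy's cubic smoothing spline can be used; note that scipy's s parameter bounds the residual sum of squares rather than multiplying the penalty, so its correspondence to $\lambda$ above is indirect. The data are simulated.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Noisy observations of a smooth function on [0, 1].
rng = np.random.default_rng(3)
t = np.linspace(0.0, 1.0, 100)
y = np.sin(2 * np.pi * t) + rng.normal(0.0, 0.2, t.size)

# k=3 gives the cubic smoothing spline; a larger s yields a smoother fit,
# playing a role analogous to (but not identical with) lambda.
mu_hat = UnivariateSpline(t, y, k=3, s=1.0)
fitted = mu_hat(t)
```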





Statistical inference for disordered sphere packings

Jeffrey Picka

Source: Statist. Surv., Volume 6, 74--112.

Abstract:
This paper gives an overview of statistical inference for disordered sphere packing processes. These processes are used extensively in physics and engineering in order to represent the internal structure of composite materials, packed bed reactors, and powders at rest, and are used as initial arrangements of grains in the study of avalanches and other problems involving powders in motion. Packing processes are spatial processes which are neither stationary nor ergodic. Classical spatial statistical models and procedures cannot be applied to these processes, but alternative models and procedures can be developed based on ideas from statistical physics. Most of the development of models and statistics for sphere packings has been undertaken by scientists and engineers. This review summarizes their results from an inferential perspective.





Data confidentiality: A review of methods for statistical disclosure limitation and methods for assessing privacy

Gregory J. Matthews, Ofer Harel

Source: Statist. Surv., Volume 5, 1--29.

Abstract:
There is an ever increasing demand from researchers for access to useful microdata files. However, there are also growing concerns regarding the privacy of the individuals contained in the microdata. Ideally, microdata could be released in such a way that a balance between usefulness of the data and privacy is struck. This paper presents a review of proposed methods of statistical disclosure control and techniques for assessing the privacy of such methods under different definitions of disclosure.
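
One of the simplest disclosure notions reviewed in this literature is k-anonymity: every combination of quasi-identifier values in the released file must be shared by at least k records. A minimal check (the toy rows are fabricated):

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Smallest equivalence-class size over the quasi-identifier columns;
    a release is k-anonymous iff this value is at least k."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    return min(Counter(keys).values())

rows = [{"zip": "021**", "age": "30-39", "dx": "flu"},
        {"zip": "021**", "age": "30-39", "dx": "asthma"},
        {"zip": "022**", "age": "40-49", "dx": "flu"}]
print(k_anonymity(rows, ["zip", "age"]))  # 1: the last row is unique
```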






The ARMA alphabet soup: A tour of ARMA model variants

Scott H. Holan, Robert Lund, Ginger Davis

Source: Statist. Surv., Volume 4, 232--274.

Abstract:
Autoregressive moving-average (ARMA) difference equations are ubiquitous models for short memory time series and have parsimoniously described many stationary series. Variants of ARMA models have been proposed to describe more exotic series features such as long memory autocovariances, periodic autocovariances, and count support set structures. This review paper enumerates, compares, and contrasts the common variants of ARMA models in today’s literature. After the basic properties of ARMA models are reviewed, we tour ARMA variants that describe seasonal features, long memory behavior, multivariate series, changing variances (stochastic volatility) and integer counts. A list of ARMA variant acronyms is provided.
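
A minimal simulator for the baseline ARMA recursion that the variants extend (the coefficient values and burn-in length are arbitrary, and causal/invertible parameter choices are assumed):

```python
import numpy as np

def simulate_arma(phi, theta, n, sigma=1.0, burn=500, rng=None):
    """Simulate X_t = phi_1 X_{t-1} + ... + phi_p X_{t-p}
                     + e_t + theta_1 e_{t-1} + ... + theta_q e_{t-q}
    with Gaussian noise, discarding a burn-in so start-up effects fade."""
    rng = rng or np.random.default_rng()
    p, q = len(phi), len(theta)
    e = rng.normal(0.0, sigma, n + burn)
    x = np.zeros(n + burn)
    for t in range(max(p, q), n + burn):
        x[t] = (np.dot(phi, x[t - p:t][::-1])
                + e[t] + np.dot(theta, e[t - q:t][::-1]))
    return x[burn:]

series = simulate_arma(phi=[0.6], theta=[0.3], n=500)
```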





de

Identifying the consequences of dynamic treatment strategies: A decision-theoretic overview

A. Philip Dawid, Vanessa Didelez

Source: Statist. Surv., Volume 4, 184--231.

Abstract:
We consider the problem of learning about and comparing the consequences of dynamic treatment strategies on the basis of observational data. We formulate this within a probabilistic decision-theoretic framework. Our approach is compared with related work by Robins and others: in particular, we show how Robins’s ‘G-computation’ algorithm arises naturally from this decision-theoretic perspective. Careful attention is paid to the mathematical and substantive conditions required to justify the use of this formula. These conditions revolve around a property we term stability, which relates the probabilistic behaviours of observational and interventional regimes. We show how an assumption of ‘sequential randomization’ (or ‘no unmeasured confounders’), or an alternative assumption of ‘sequential irrelevance’, can be used to infer stability. Probabilistic influence diagrams are used to simplify manipulations, and their power and limitations are discussed. We compare our approach with alternative formulations based on causal DAGs or potential response models. We aim to show that formulating the problem of assessing dynamic treatment strategies as a problem of decision analysis brings clarity, simplicity and generality.
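
To fix ideas, here is the G-computation formula in its simplest sequential form, for two treatment stages; the notation is ours, not taken from the paper. With covariates $\ell_1$ observed before treatment $a_1$, covariates $\ell_2$ observed before treatment $a_2$, and outcome $Y$, the consequence of the strategy "set $A_1 = a_1$, then $A_2 = a_2$" is identified as

\[
p(y \mid a_1, a_2) \;=\; \sum_{\ell_1, \ell_2} p(y \mid \ell_1, a_1, \ell_2, a_2)\, p(\ell_2 \mid \ell_1, a_1)\, p(\ell_1).
\]

Every term on the right-hand side is an observational conditional distribution; the stability conditions discussed in the paper are what license reading the left-hand side as an interventional consequence.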

References:
Arjas, E. and Parner, J. (2004). Causal reasoning from longitudinal data. Scandinavian Journal of Statistics 31 171–187.

Arjas, E. and Saarela, O. (2010). Optimal dynamic regimes: Presenting a case for predictive inference. The International Journal of Biostatistics 6. http://tinyurl.com/33dfssf

Cowell, R. G., Dawid, A. P., Lauritzen, S. L. and Spiegelhalter, D. J. (1999). Probabilistic Networks and Expert Systems. Springer, New York.

Dawid, A. P. (1979). Conditional independence in statistical theory (with Discussion). Journal of the Royal Statistical Society, Series B 41 1–31.

Dawid, A. P. (1992). Applications of a general propagation algorithm for probabilistic expert systems. Statistics and Computing 2 25–36.

Dawid, A. P. (1998). Conditional independence. In Encyclopedia of Statistical Science (Update Volume 2) (S. Kotz, C. B. Read and D. L. Banks, eds.) 146–155. Wiley-Interscience, New York.

Dawid, A. P. (2000). Causal inference without counterfactuals (with Discussion). Journal of the American Statistical Association 95 407–448.

Dawid, A. P. (2001). Separoids: A mathematical framework for conditional independence and irrelevance. Annals of Mathematics and Artificial Intelligence 32 335–372.

Dawid, A. P. (2002). Influence diagrams for causal modelling and inference. International Statistical Review 70 161–189. Corrigenda, ibid., 437.

Dawid, A. P. (2003). Causal inference using influence diagrams: The problem of partial compliance (with Discussion). In Highly Structured Stochastic Systems (P. J. Green, N. L. Hjort and S. Richardson, eds.) 45–81. Oxford University Press.

Dawid, A. P. (2010). Beware of the DAG! In Proceedings of the NIPS 2008 Workshop on Causality. Journal of Machine Learning Research Workshop and Conference Proceedings (D. Janzing, I. Guyon and B. Schölkopf, eds.) 6 59–86. http://tinyurl.com/33va7tm

Dawid, A. P. and Didelez, V. (2008). Identifying optimal sequential decisions. In Proceedings of the Twenty-Fourth Annual Conference on Uncertainty in Artificial Intelligence (UAI-08) (D. McAllester and A. Nicholson, eds.) 113–120. AUAI Press, Corvallis, Oregon. http://tinyurl.com/3899qpp

Dechter, R. (2003). Constraint Processing. Morgan Kaufmann Publishers.

Didelez, V., Dawid, A. P. and Geneletti, S. G. (2006). Direct and indirect effects of sequential treatments. In Proceedings of the Twenty-Second Annual Conference on Uncertainty in Artificial Intelligence (UAI-06) (R. Dechter and T. Richardson, eds.) 138–146. AUAI Press, Arlington, Virginia. http://tinyurl.com/32w3f4e

Didelez, V., Kreiner, S. and Keiding, N. (2010). Graphical models for inference under outcome dependent sampling. Statistical Science (to appear).

Didelez, V. and Sheehan, N. S. (2007). Mendelian randomisation: Why epidemiology needs a formal language for causality. In Causality and Probability in the Sciences (F. Russo and J. Williamson, eds.). Texts in Philosophy Series 5 263–292. College Publications, London.

Eichler, M. and Didelez, V. (2010). Granger-causality and the effect of interventions in time series. Lifetime Data Analysis 16 3–32.

Ferguson, T. S. (1967). Mathematical Statistics: A Decision Theoretic Approach. Academic Press, New York, London.

Geneletti, S. G. (2007). Identifying direct and indirect effects in a non–counterfactual framework. Journal of the Royal Statistical Society: Series B 69 199–215.

Geneletti, S. G. and Dawid, A. P. (2010). Defining and identifying the effect of treatment on the treated. In Causality in the Sciences (P. M. Illari, F. Russo and J. Williamson, eds.) Oxford University Press (to appear).

Gill, R. D. and Robins, J. M. (2001). Causal inference for complex longitudinal data: The continuous case. Annals of Statistics 29 1785–1811.

Guo, H. and Dawid, A. P. (2010). Sufficient covariates and linear propensity analysis. In Proceedings of the Thirteenth International Workshop on Artificial Intelligence and Statistics (AISTATS) 2010, Chia Laguna, Sardinia, Italy, May 13–15, 2010. Journal of Machine Learning Research Workshop and Conference Proceedings (Y. W. Teh and D. M. Titterington, eds.) 9 281–288. http://tinyurl.com/33lmuj7

Henderson, R., Ansel, P. and Alshibani, D. (2010). Regret-regression for optimal dynamic treatment regimes. Biometrics (to appear). doi:10.1111/j.1541-0420.2009.01368.x

Hernán, M. A. and Taubman, S. L. (2008). Does obesity shorten life? The importance of well defined interventions to answer causal questions. International Journal of Obesity 32 S8–S14.

Holland, P. W. (1986). Statistics and causal inference (with Discussion). Journal of the American Statistical Association 81 945–970.

Huang, Y. and Valtorta, M. (2006). Identifiability in causal Bayesian networks: A sound and complete algorithm. In AAAI’06: Proceedings of the 21st National Conference on Artificial Intelligence 1149–1154. AAAI Press.

Kang, J. D. Y. and Schafer, J. L. (2007). Demystifying double robustness: A comparison of alternative strategies for estimating a population mean from incomplete data. Statistical Science 22 523–539.

Lauritzen, S. L., Dawid, A. P., Larsen, B. N. and Leimer, H. G. (1990). Independence properties of directed Markov fields. Networks 20 491–505.

Lok, J., Gill, R., van der Vaart, A. and Robins, J. (2004). Estimating the causal effect of a time-varying treatment on time-to-event using structural nested failure time models. Statistica Neerlandica 58 271–295.

Moodie, E. M., Richardson, T. S. and Stephens, D. A. (2007). Demystifying optimal dynamic treatment regimes. Biometrics 63 447–455.

Murphy, S. A. (2003). Optimal dynamic treatment regimes (with Discussion). Journal of the Royal Statistical Society, Series B 65 331–366.

Oliver, R. M. and Smith, J. Q., eds. (1990). Influence Diagrams, Belief Nets and Decision Analysis. John Wiley and Sons, Chichester, United Kingdom.

Pearl, J. (1995). Causal diagrams for empirical research (with Discussion). Biometrika 82 669–710.

Pearl, J. (2009). Causality: Models, Reasoning and Inference, Second ed. Cambridge University Press, Cambridge.

Pearl, J. and Paz, A. (1987). Graphoids: A graph-based logic for reasoning about relevance relations. In Advances in Artificial Intelligence (D. Hogg and L. Steels, eds.) II 357–363. North-Holland, Amsterdam.

Pearl, J. and Robins, J. (1995). Probabilistic evaluation of sequential plans from causal models with hidden variables. In Proceedings of the 11th Conference on Uncertainty in Artificial Intelligence (P. Besnard and S. Hanks, eds.) 444–453. Morgan Kaufmann Publishers, San Francisco.

Raiffa, H. (1968). Decision Analysis. Addison-Wesley, Reading, Massachusetts.

Robins, J. M. (1986). A new approach to causal inference in mortality studies with sustained exposure periods—Application to control of the healthy worker survivor effect. Mathematical Modelling 7 1393–1512.

Robins, J. M. (1987). Addendum to “A new approach to causal inference in mortality studies with sustained exposure periods—Application to control of the healthy worker survivor effect”. Computers & Mathematics with Applications 14 923–945.

Robins, J. M. (1989). The analysis of randomized and nonrandomized AIDS treatment trials using a new approach to causal inference in longitudinal studies. In Health Service Research Methodology: A Focus on AIDS (L. Sechrest, H. Freeman and A. Mulley, eds.) 113–159. NCSHR, U.S. Public Health Service.

Robins, J. M. (1992). Estimation of the time-dependent accelerated failure time model in the presence of confounding factors. Biometrika 79 321–324.

Robins, J. M. (1997). Causal inference from complex longitudinal data. In Latent Variable Modeling and Applications to Causality (M. Berkane, ed.). Lecture Notes in Statistics 120 69–117. Springer-Verlag, New York.

Robins, J. M. (1998). Structural nested failure time models. In Survival Analysis (P. K. Andersen and N. Keiding, eds.). Encyclopedia of Biostatistics 6 4372–4389. John Wiley and Sons, Chichester, UK.

Robins, J. M. (2000). Robust estimation in sequentially ignorable missing data and causal inference models. In Proceedings of the American Statistical Association Section on Bayesian Statistical Science 1999 6–10.

Robins, J. M. (2004). Optimal structural nested models for optimal sequential decisions. In Proceedings of the Second Seattle Symposium on Biostatistics (D. Y. Lin and P. Heagerty, eds.) 189–326. Springer, New York.

Robins, J. M., Greenland, S. and Hu, F. C. (1999). Estimation of the causal effect of a time-varying exposure on the marginal mean of a repeated binary outcome. Journal of the American Statistical Association 94 687–700.

Robins, J. M., Hernán, M. A. and Brumback, B. (2000). Marginal structural models and causal inference in epidemiology. Epidemiology 11 550–560.

Robins, J. M. and Wasserman, L. A. (1997). Estimation of effects of sequential treatments by reparameterizing directed acyclic graphs. In Proceedings of the 13th Annual Conference on Uncertainty in Artificial Intelligence (D. Geiger and P. Shenoy, eds.) 409–420. Morgan Kaufmann Publishers, San Francisco. http://tinyurl.com/33ghsas

Rosthøj, S., Fullwood, C., Henderson, R. and Stewart, S. (2006). Estimation of optimal dynamic anticoagulation regimes from observational data: A regret-based approach. Statistics in Medicine 25 4197–4215.

Shpitser, I. and Pearl, J. (2006a). Identification of conditional interventional distributions. In Proceedings of the 22nd Annual Conference on Uncertainty in Artificial Intelligence (UAI-06) (R. Dechter and T. Richardson, eds.) 437–444. AUAI Press, Corvallis, Oregon. http://tinyurl.com/2um8w47

Shpitser, I. and Pearl, J. (2006b). Identification of joint interventional distributions in recursive semi-Markovian causal models. In Proceedings of the Twenty-First National Conference on Artificial Intelligence 1219–1226. AAAI Press, Menlo Park, California.

Spirtes, P., Glymour, C. and Scheines, R. (2000). Causation, Prediction and Search, Second ed. Springer-Verlag, New York.

Sterne, J. A. C., May, M., Costagliola, D., de Wolf, F., Phillips, A. N., Harris, R., Funk, M. J., Geskus, R. B., Gill, J., Dabis, F., Miro, J. M., Justice, A. C., Ledergerber, B., Fatkenheuer, G., Hogg, R. S., D’Arminio-Monforte, A., Saag, M., Smith, C., Staszewski, S., Egger, M., Cole, S. R. and When To Start Consortium (2009). Timing of initiation of antiretroviral therapy in AIDS-Free HIV-1-infected patients: A collaborative analysis of 18 HIV cohort studies. Lancet 373 1352–1363.

Taubman, S. L., Robins, J. M., Mittleman, M. A. and Hernán, M. A. (2009). Intervening on risk factors for coronary heart disease: An application of the parametric g-formula. International Journal of Epidemiology 38 1599–1611.

Tian, J. (2008). Identifying dynamic sequential plans. In Proceedings of the Twenty-Fourth Annual Conference on Uncertainty in Artificial Intelligence (UAI-08) (D. McAllester and A. Nicholson, eds.) 554–561. AUAI Press, Corvallis, Oregon. http://tinyurl.com/36ufx2h

Verma, T. and Pearl, J. (1990). Causal networks: Semantics and expressiveness. In Uncertainty in Artificial Intelligence 4 (R. D. Shachter, T. S. Levitt, L. N. Kanal and J. F. Lemmer, eds.) 69–76. North-Holland, Amsterdam.





Primal and dual model representations in kernel-based learning

Johan A.K. Suykens, Carlos Alzate, Kristiaan Pelckmans

Source: Statist. Surv., Volume 4, 148--183.

Abstract:
This paper discusses the role of primal and (Lagrange) dual model representations in problems of supervised and unsupervised learning. The specification of the estimation problem is conceived at the primal level as a constrained optimization problem. The constraints relate to the model which is expressed in terms of the feature map. From the conditions for optimality one jointly finds the optimal model representation and the model estimate. At the dual level the model is expressed in terms of a positive definite kernel function, which is characteristic of a support vector machine methodology. It is discussed how least squares support vector machines play a central role as core models across problems of regression, classification, principal component analysis, spectral clustering, canonical correlation analysis, dimensionality reduction and data visualization.
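
As a concrete illustration of the primal-dual viewpoint, the following minimal NumPy sketch solves the dual KKT linear system of least squares support vector machine regression and predicts through the kernel expansion. The RBF bandwidth and regularization constant are arbitrary illustrative choices, not values from the paper.

import numpy as np

def rbf_kernel(X, Z, sigma=1.0):
    # Gram matrix of k(x, z) = exp(-||x - z||^2 / (2 sigma^2)).
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    # Dual LS-SVM regression: solve the KKT linear system
    # [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y].
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate([[0.0], y]))
    return sol[0], sol[1:]          # bias b, dual variables alpha

def lssvm_predict(Xtr, b, alpha, Xte, sigma=1.0):
    # Dual representation: f(x) = sum_i alpha_i k(x_i, x) + b.
    return rbf_kernel(Xte, Xtr, sigma) @ alpha + b

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (60, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(60)
b, alpha = lssvm_fit(X, y)
print(lssvm_predict(X, b, alpha, np.array([[0.0]])))

Note how the model is never formed in the primal weight space: training and prediction go entirely through the kernel, which is the point of the dual representation.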





Finite mixture models and model-based clustering

Volodymyr Melnykov, Ranjan Maitra

Source: Statist. Surv., Volume 4, 80--116.

Abstract:
Finite mixture models have a long history in statistics, having been used to model population heterogeneity, generalize distributional assumptions, and lately, to provide a convenient yet formal framework for clustering and classification. This paper provides a detailed review of mixture models and model-based clustering. Recent trends as well as open problems in the area are also discussed.
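
A minimal sketch of the EM algorithm for a univariate Gaussian mixture, the workhorse behind model-based clustering; initialization and stopping rules here are deliberately naive, and the simulated data are illustrative only.

import numpy as np
from scipy.stats import norm

def em_gmm_1d(x, k=2, iters=200, seed=0):
    # Classic EM: the E-step computes responsibilities, the M-step
    # performs weighted maximum-likelihood updates of the parameters.
    rng = np.random.default_rng(seed)
    w = np.full(k, 1.0 / k)
    mu = rng.choice(x, k, replace=False)
    var = np.full(k, x.var())
    for _ in range(iters):
        # E-step: posterior probability of each component for each point.
        dens = w * norm.pdf(x[:, None], mu, np.sqrt(var))
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: update weights, means and variances.
        nk = r.sum(axis=0)
        w = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return w, mu, var

x = np.concatenate([np.random.normal(-2, 1, 300), np.random.normal(3, 0.5, 200)])
print(em_gmm_1d(x))

Model-based clustering then assigns each point to the component with the largest responsibility.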





A survey of cross-validation procedures for model selection

Sylvain Arlot, Alain Celisse

Source: Statist. Surv., Volume 4, 40--79.

Abstract:
Used to estimate the risk of an estimator or to perform model selection, cross-validation is a widespread strategy because of its simplicity and its (apparent) universality. Many results exist on the model selection performance of cross-validation procedures. This survey intends to relate these results to the most recent advances of model selection theory, with a particular emphasis on distinguishing empirical statements from rigorous theoretical results. As a conclusion, guidelines are provided for choosing the best cross-validation procedure according to the particular features of the problem at hand.
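
A generic K-fold cross-validation risk estimate, sketched in NumPy with ordinary least squares standing in for an arbitrary estimator; the fit/predict interface and the toy data are our own scaffolding.

import numpy as np

def kfold_cv_risk(X, y, fit, predict, k=10, seed=0):
    # Each fold is held out once; the model is trained on the rest and
    # the held-out squared errors are averaged into a risk estimate.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    errs = []
    for f in folds:
        train = np.setdiff1d(idx, f)
        model = fit(X[train], y[train])
        errs.append(np.mean((y[f] - predict(model, X[f])) ** 2))
    return float(np.mean(errs))

# usage with ordinary least squares as the estimator
fit = lambda X, y: np.linalg.lstsq(X, y, rcond=None)[0]
predict = lambda beta, X: X @ beta
X = np.random.randn(100, 3)
y = X @ np.array([1.0, -2.0, 0.5]) + np.random.randn(100)
print(kfold_cv_risk(X, y, fit, predict))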





Wilcoxon-Mann-Whitney or t-test? On assumptions for hypothesis tests and multiple interpretations of decision rules

Michael P. Fay, Michael A. Proschan

Source: Statist. Surv., Volume 4, 1--39.

Abstract:
In a mathematical approach to hypothesis tests, we start with a clearly defined set of hypotheses and choose the test with the best properties for those hypotheses. In practice, we often start with less precise hypotheses. For example, often a researcher wants to know which of two groups generally has the larger responses, and either a t-test or a Wilcoxon-Mann-Whitney (WMW) test could be acceptable. Although both t-tests and WMW tests are usually associated with quite different hypotheses, the decision rule and p-value from either test could be associated with many different sets of assumptions, which we call perspectives. It is useful to have many of the different perspectives to which a decision rule may be applied collected in one place, since each perspective allows a different interpretation of the associated p-value. Here we collect many such perspectives for the two-sample t-test, the WMW test and other related tests. We discuss validity and consistency under each perspective and discuss recommendations between the tests in light of these many different perspectives. Finally, we briefly discuss a decision rule for testing genetic neutrality where knowledge of the many perspectives is vital to the proper interpretation of the decision rule.
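
The two decision rules themselves are a one-liner each in SciPy; which hypotheses the resulting p-values speak to is exactly the issue the paper takes up. The simulated data below are illustrative only.

import numpy as np
from scipy.stats import ttest_ind, mannwhitneyu

rng = np.random.default_rng(1)
a = rng.normal(0.0, 1.0, 50)
b = rng.normal(0.5, 1.0, 50)

# Welch t-test: targets a difference in means without assuming equal variances.
t, p_t = ttest_ind(a, b, equal_var=False)
# WMW test: targets P(A < B) != 1/2; it uses only ranks, so it is invariant
# to monotone transformations of the responses.
u, p_w = mannwhitneyu(a, b, alternative="two-sided")
print(p_t, p_w)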





Unsupervised Pre-trained Models from Healthy ADLs Improve Parkinson's Disease Classification of Gait Patterns. (arXiv:2005.02589v2 [cs.LG] UPDATED)

Application and use of deep learning algorithms for different healthcare applications is gaining interest at a steady pace. However, use of such algorithms can prove to be challenging as they require large amounts of training data that capture different possible variations. This makes it difficult to use them in a clinical setting since in most health applications researchers often have to work with limited data. Less data can cause the deep learning model to over-fit. In this paper, we ask how can we use data from a different environment, different use-case, with widely differing data distributions. We exemplify this use case by using single-sensor accelerometer data from healthy subjects performing activities of daily living - ADLs (source dataset), to extract features relevant to multi-sensor accelerometer gait data (target dataset) for Parkinson's disease classification. We train the pre-trained model using the source dataset and use it as a feature extractor. We show that the features extracted for the target dataset can be used to train an effective classification model. Our pre-trained source model consists of a convolutional autoencoder, and the target classification model is a simple multi-layer perceptron model. We explore two different pre-trained source models, trained using different activity groups, and analyze the influence the choice of pre-trained model has over the task of Parkinson's disease classification.
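
A PyTorch sketch of the transfer recipe described above. The architecture, window length (128 samples), channel counts and the data loader are placeholders of our own choosing, not the paper's model.

import torch
import torch.nn as nn

# Hypothetical shapes: windows of 128 accelerometer samples, 3 axes.
enc = nn.Sequential(nn.Conv1d(3, 16, 5, padding=2), nn.ReLU(),
                    nn.MaxPool1d(2),
                    nn.Conv1d(16, 32, 5, padding=2), nn.ReLU(),
                    nn.AdaptiveAvgPool1d(1), nn.Flatten())
dec = nn.Sequential(nn.Linear(32, 3 * 128), nn.Unflatten(1, (3, 128)))
ae = nn.Sequential(enc, dec)

def pretrain(source_loader, epochs=10):
    # Unsupervised pre-training on the source ADL data: reconstruct input.
    opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for xb in source_loader:          # hypothetical loader of (B, 3, 128) tensors
            opt.zero_grad()
            loss = loss_fn(ae(xb), xb)
            loss.backward()
            opt.step()

def extract(x):
    # Frozen encoder as a feature extractor for the target gait data.
    with torch.no_grad():
        return enc(x)

clf = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 2))
# train clf with cross-entropy on extract(target_windows) versus labels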





Generating Thermal Image Data Samples using 3D Facial Modelling Techniques and Deep Learning Methodologies. (arXiv:2005.01923v2 [cs.CV] UPDATED)

Methods for generating synthetic data have become increasingly important for building the large datasets required by Convolutional Neural Network (CNN) based deep learning techniques across a wide range of computer vision applications. In this work, we extend existing methodologies to show how 2D thermal facial data can be mapped to provide 3D facial models. For the proposed research work we used the Tufts dataset to generate 3D faces with varying poses from a single frontal face pose. The system works by refining the existing image quality through fusion-based image preprocessing operations. The refined outputs have better contrast adjustment, a decreased noise level and higher exposure of the dark regions. This makes the facial landmarks and temperature patterns on the human face more discernible and visible when compared to the original raw data. Different image quality metrics are used to compare the refined version of the images with the original images. In the next phase of the proposed study, the refined version of the images is used to create 3D facial geometry structures with Convolutional Neural Networks (CNN). The generated outputs are then imported into the Blender software to extract the 3D thermal facial outputs of both males and females. The same technique is also applied to our thermal face data acquired using a prototype thermal camera (developed under the Heliaus EU project) in an indoor lab environment, which is then used to generate synthetic 3D face data with varying yaw angles, and lastly a facial depth map is generated.





Interpreting Rate-Distortion of Variational Autoencoder and Using Model Uncertainty for Anomaly Detection. (arXiv:2005.01889v2 [cs.LG] UPDATED)

Building a scalable machine learning system for unsupervised anomaly detection via representation learning is highly desirable. One of the prevalent methods is to use the reconstruction error of a variational autoencoder (VAE) trained by maximizing the evidence lower bound. We revisit the VAE from the perspective of information theory to provide some theoretical foundations for using the reconstruction error, and finally arrive at a simpler and more effective model for anomaly detection. In addition, to enhance the effectiveness of detecting anomalies, we incorporate a practical model uncertainty measure into the metric. We show empirically the competitive performance of our approach on benchmark datasets.





How many modes can a constrained Gaussian mixture have?. (arXiv:2005.01580v2 [math.ST] UPDATED)

We show, by an explicit construction, that a mixture of univariate Gaussians with variance 1 and means in $[-A, A]$ can have $\Omega(A^2)$ modes. This disproves a recent conjecture of Dytso, Yagli, Poor and Shamai [IEEE Trans. Inform. Theory, Apr. 2020], who showed that such a mixture can have at most $O(A^2)$ modes and surmised that the upper bound could be improved to $O(A)$. Our result holds even if an additional variance constraint is imposed on the mixing distribution. Extending the result to higher dimensions, we exhibit a mixture of Gaussians in $\mathbb{R}^d$, with identity covariances and means inside $[-A, A]^d$, that has $\Omega(A^{2d})$ modes.
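
A quick numerical companion: counting local maxima of an equal-weight, unit-variance mixture on a fine grid. The grid resolution and the particular mean placements are arbitrary choices of ours; the paper's $\Omega(A^2)$ construction additionally tunes the mixing weights.

import numpy as np

def count_modes(means, pad=5.0, grid=200001):
    # Count strict local maxima of an equal-weight mixture of unit-variance
    # Gaussians by scanning its density on a fine grid.
    means = np.asarray(means, dtype=float)
    x = np.linspace(means.min() - pad, means.max() + pad, grid)
    dens = np.exp(-0.5 * (x[:, None] - means[None, :]) ** 2).mean(axis=1)
    mid = dens[1:-1]
    return int(((mid > dens[:-2]) & (mid > dens[2:])).sum())

# Two equal unit-variance Gaussians merge exactly at separation 2:
print(count_modes([-0.95, 0.95]))                 # 1 mode
print(count_modes([-1.05, 1.05]))                 # 2 modes
# Well-separated means in [-A, A] give one mode each, i.e. Theta(A) modes:
print(count_modes(np.arange(-10, 10.1, 2.5)))     # 9 modes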





Data-Space Inversion Using a Recurrent Autoencoder for Time-Series Parameterization. (arXiv:2005.00061v2 [stat.ML] UPDATED)

Data-space inversion (DSI) and related procedures represent a family of methods applicable for data assimilation in subsurface flow settings. These methods differ from model-based techniques in that they provide only posterior predictions for quantities (time series) of interest, not posterior models with calibrated parameters. DSI methods require a large number of flow simulations to first be performed on prior geological realizations. Given observed data, posterior predictions can then be generated directly. DSI operates in a Bayesian setting and provides posterior samples of the data vector. In this work we develop and evaluate a new approach for data parameterization in DSI. Parameterization reduces the number of variables to determine in the inversion, and it maintains the physical character of the data variables. The new parameterization uses a recurrent autoencoder (RAE) for dimension reduction, and a long short-term memory (LSTM) network to represent flow-rate time series. The RAE-based parameterization is combined with an ensemble smoother with multiple data assimilation (ESMDA) for posterior generation. Results are presented for two- and three-phase flow in a 2D channelized system and a 3D multi-Gaussian model. The RAE procedure, along with existing DSI treatments, is assessed through comparison to reference rejection sampling (RS) results. The new DSI methodology is shown to consistently outperform existing approaches in terms of statistical agreement with RS results. The method is also shown to accurately capture derived quantities, which are computed from variables considered directly in DSI. This requires correlation and covariance between variables to be properly captured, and accuracy in these relationships is demonstrated. The RAE-based parameterization developed here is clearly useful in DSI, and it may also find application in other subsurface flow problems.
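
The RAE parameterization itself is beyond a short sketch, but the ESMDA update it is paired with has a compact generic form. The NumPy sketch below assumes a user-supplied forward model and omits refinements such as localization; the function name and interface are ours.

import numpy as np

def esmda(m_ens, forward, d_obs, C_e, alphas, seed=0):
    # Ensemble smoother with multiple data assimilation: a sequence of
    # damped Kalman-type updates with inflation factors satisfying
    # sum(1 / alpha_i) = 1.
    rng = np.random.default_rng(seed)
    for a in alphas:
        D = np.array([forward(m) for m in m_ens])     # (Ne, Nd) forecasts
        dm = m_ens - m_ens.mean(axis=0)
        dd = D - D.mean(axis=0)
        C_md = dm.T @ dd / (len(m_ens) - 1)           # cross-covariance
        C_dd = dd.T @ dd / (len(m_ens) - 1)
        K = C_md @ np.linalg.inv(C_dd + a * C_e)
        obs = d_obs + rng.multivariate_normal(
            np.zeros(len(d_obs)), a * C_e, size=len(m_ens))
        m_ens = m_ens + (obs - D) @ K.T
    return m_ens

# e.g. alphas = [4.0, 4.0, 4.0, 4.0], since sum(1/4) = 1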





A bimodal gamma distribution: Properties, regression model and applications. (arXiv:2004.12491v2 [stat.ME] UPDATED)

In this paper we propose a bimodal gamma distribution using a quadratic transformation based on the alpha-skew-normal model. We discuss several properties of this distribution such as the mean, variance, moments, hazard rate and entropy measures. Further, we propose a new regression model with censored data based on the bimodal gamma distribution. This regression model can be very useful for the analysis of real data and could give more realistic fits than other special regression models. Monte Carlo simulations were performed to check the bias in the maximum likelihood estimation. The proposed models are applied to two real data sets found in the literature.





Excess registered deaths in England and Wales during the COVID-19 pandemic, March 2020 and April 2020. (arXiv:2004.11355v4 [stat.AP] UPDATED)

Official counts of COVID-19 deaths have been criticized for potentially including people who did not die of COVID-19 but merely died with COVID-19. I address that critique by fitting a generalized additive model to weekly counts of all registered deaths in England and Wales during the 2010s. The model produces baseline rates of death registrations expected in the absence of the COVID-19 pandemic, and comparing those baselines to recent counts of registered deaths exposes the emergence of excess deaths late in March 2020. Among adults aged 45+, about 38,700 excess deaths were registered in the 5 weeks comprising 21 March through 24 April (612 $\pm$ 416 from 21--27 March, 5675 $\pm$ 439 from 28 March through 3 April, then 9183 $\pm$ 468, 12,712 $\pm$ 589, and 10,511 $\pm$ 567 in April's next 3 weeks). Both the Office for National Statistics's respective count of 26,891 death certificates which mention COVID-19, and the Department of Health and Social Care's hospital-focused count of 21,222 deaths, are appreciably less, implying that their counting methods have underestimated rather than overestimated the pandemic's true death toll. If underreporting rates have held steady, about 45,900 direct and indirect COVID-19 deaths might have been registered by April's end but not yet publicly reported in full.
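
A toy stand-in for the baseline-plus-excess logic: fit trend and annual seasonality to pre-pandemic weekly counts by least squares, then read off excess as observed minus baseline. The harmonic regression below is our simplification, not the paper's generalized additive model.

import numpy as np

def weekly_baseline(deaths, week_of_year, harmonics=2):
    # Least-squares baseline for weekly death registrations: linear trend
    # plus annual harmonics fitted to pre-pandemic data only.
    t = np.arange(len(deaths), dtype=float)
    cols = [np.ones_like(t), t]
    for j in range(1, harmonics + 1):
        cols += [np.sin(2 * np.pi * j * week_of_year / 52.18),
                 np.cos(2 * np.pi * j * week_of_year / 52.18)]
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, deaths, rcond=None)
    return X @ beta

# excess deaths over a window = observed counts minus the baseline
# extrapolated (from the same design columns) into that window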





On a phase transition in general order spline regression. (arXiv:2004.10922v2 [math.ST] UPDATED)

In the Gaussian sequence model $Y = \theta_0 + \varepsilon$ in $\mathbb{R}^n$, we study the fundamental limit of approximating the signal $\theta_0$ by a class $\Theta(d, d_0, k)$ of (generalized) splines with free knots. Here $d$ is the degree of the spline, $d_0$ is the order of differentiability at each inner knot, and $k$ is the maximal number of pieces. We show that, given any integer $d \geq 0$ and $d_0 \in \{-1, 0, \ldots, d-1\}$, the minimax rate of estimation over $\Theta(d, d_0, k)$ exhibits the following phase transition:
\[
\inf_{\widetilde{\theta}} \sup_{\theta \in \Theta(d, d_0, k)} \mathbb{E}_\theta \|\widetilde{\theta} - \theta\|^2 \asymp_d
\begin{cases}
k \log\log(16n/k), & 2 \leq k \leq k_0,\\
k \log(en/k), & k \geq k_0 + 1.
\end{cases}
\]
The transition boundary $k_0$, which takes the form $\lfloor (d+1)/(d-d_0) \rfloor + 1$, demonstrates the critical role of the regularity parameter $d_0$ in the separation between a faster $\log\log(16n)$ and a slower $\log(en)$ rate. We further show that, once an additional '$d$-monotonicity' shape constraint is imposed (including monotonicity for $d = 0$ and convexity for $d = 1$), the above phase transition is eliminated and the faster $k \log\log(16n/k)$ rate can be achieved for all $k$. These results provide theoretical support for developing $\ell_0$-penalized (shape-constrained) spline regression procedures as useful alternatives to $\ell_1$- and $\ell_2$-penalized ones.
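
As a quick check of the boundary formula (our worked example, not the authors'): for cubic splines with maximal interior smoothness, $d = 3$ and $d_0 = 2$, so
\[
k_0 = \left\lfloor \frac{d+1}{d-d_0} \right\rfloor + 1 = \lfloor 4/1 \rfloor + 1 = 5,
\]
and the faster $k \log\log(16n/k)$ rate holds for $2 \leq k \leq 5$ pieces, with the slower $k \log(en/k)$ rate taking over from $k = 6$ onward.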





Deep transfer learning for improving single-EEG arousal detection. (arXiv:2004.05111v2 [cs.CV] UPDATED)

Datasets in sleep science present challenges for machine learning algorithms due to differences in recording setups across clinics. We investigate two deep transfer learning strategies for overcoming the channel mismatch problem in cases where two datasets do not contain exactly the same setup, leading to degraded performance in single-EEG models. Specifically, we train a baseline model on multivariate polysomnography data and subsequently replace the first two layers to prepare the architecture for single-channel electroencephalography data. Using a fine-tuning strategy, our model yields performance similar to the baseline model (F1=0.682 and F1=0.694, respectively) and is significantly better than a comparable single-channel model. Our results are promising for researchers working with small databases who wish to use deep learning models pre-trained on larger databases.





Strong Converse for Testing Against Independence over a Noisy channel. (arXiv:2004.00775v2 [cs.IT] UPDATED)

A distributed binary hypothesis testing (HT) problem over a noisy (discrete and memoryless) channel studied previously by the authors is investigated from the perspective of the strong converse property. It was shown by Ahlswede and Csiszár that a strong converse holds in the above setting when the channel is rate-limited and noiseless. Motivated by this observation, we show that the strong converse continues to hold in the noisy channel setting for a special case of HT known as testing against independence (TAI), under the assumption that the channel transition matrix has non-zero elements. The proof utilizes the blowing-up lemma and the recent change-of-measure technique of Tyagi and Watanabe as the key tools.





Risk-Aware Energy Scheduling for Edge Computing with Microgrid: A Multi-Agent Deep Reinforcement Learning Approach. (arXiv:2003.02157v2 [physics.soc-ph] UPDATED)

In recent years, multi-access edge computing (MEC) has been a key enabler for handling the massive expansion of Internet of Things (IoT) applications and services. However, the energy consumption of a MEC network depends on volatile tasks that induce risk in energy demand estimation. As an energy supplier, a microgrid can facilitate seamless energy supply. However, the risk associated with energy supply is also increased due to unpredictable energy generation from renewable and non-renewable sources. In particular, the risk of energy shortfall involves uncertainties in both energy consumption and generation. In this paper, we study a risk-aware energy scheduling problem for a microgrid-powered MEC network. First, we formulate an optimization problem considering the conditional value-at-risk (CVaR) measurement for both energy consumption and generation, where the objective is to minimize the loss from energy shortfall of the MEC network, and we show that this problem is NP-hard. Second, we analyze our formulated problem using a multi-agent stochastic game that ensures a joint policy Nash equilibrium, and show the convergence of the proposed model. Third, we derive the solution by applying a multi-agent deep reinforcement learning (MADRL)-based asynchronous advantage actor-critic (A3C) algorithm with shared neural networks. This method mitigates the curse of dimensionality of the state space and chooses the best policy among the agents for the proposed problem. Finally, the experimental results establish a significant performance gain from considering CVaR for high-accuracy energy scheduling, compared with both the single-agent and random-agent models.
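
The CVaR ingredient is easy to state on its own. An empirical version via the Rockafellar-Uryasev representation, with made-up sample losses for illustration:

import numpy as np

def cvar(losses, alpha=0.95):
    # CVaR_a(L) = min_t { t + E[(L - t)_+] / (1 - a) }; for an empirical
    # sample the minimizer t* is the a-quantile (the value-at-risk).
    t = np.quantile(losses, alpha)
    return t + np.mean(np.maximum(losses - t, 0.0)) / (1.0 - alpha)

losses = np.random.default_rng(0).lognormal(0.0, 1.0, 100000)
print(cvar(losses, 0.95))   # average loss within the worst 5% of outcomes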





A Distributionally Robust Area Under Curve Maximization Model. (arXiv:2002.07345v2 [math.OC] UPDATED)

Area under the ROC curve (AUC) is a widely used performance measure for classification models. We propose two new distributionally robust AUC maximization models (DR-AUC) that rely on the Kantorovich metric and approximate the AUC with the hinge loss function. We consider two cases, with fixed and variable support for the worst-case distribution, respectively. We use duality theory to reformulate the DR-AUC models and derive tractable convex optimization problems. The numerical experiments show that the proposed DR-AUC models -- benchmarked against the standard deterministic AUC and support vector machine models -- perform better in general, and in particular improve the worst-case out-of-sample performance over the majority of the considered datasets, thereby showing their robustness. The results are particularly encouraging since our numerical experiments are conducted with training sets of small size, which are known to be conducive to low out-of-sample performance.
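
Setting the distributional robustness aside, the hinge approximation of the AUC that both DR-AUC models build on can be sketched for a linear scorer as follows; plain subgradient descent with arbitrary step sizes, on toy data of our own.

import numpy as np

def auc_hinge_fit(Xp, Xn, lr=0.01, epochs=200):
    # The empirical AUC counts pairs with w.x_pos > w.x_neg; the hinge
    # max(0, 1 - w.(x_pos - x_neg)) is its convex pairwise surrogate.
    w = np.zeros(Xp.shape[1])
    diffs = (Xp[:, None, :] - Xn[None, :, :]).reshape(-1, Xp.shape[1])
    for _ in range(epochs):
        margins = diffs @ w
        grad = -diffs[margins < 1.0].sum(axis=0) / len(diffs)
        w -= lr * grad
    return w

def auc(w, Xp, Xn):
    s_p, s_n = Xp @ w, Xn @ w
    return (s_p[:, None] > s_n[None, :]).mean()

rng = np.random.default_rng(0)
Xp = rng.normal(0.5, 1.0, (40, 5))
Xn = rng.normal(0.0, 1.0, (60, 5))
w = auc_hinge_fit(Xp, Xn)
print(auc(w, Xp, Xn))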





Statistical aspects of nuclear mass models. (arXiv:2002.04151v3 [nucl-th] UPDATED)

We study the information content of nuclear masses from the perspective of global models of nuclear binding energies. To this end, we employ a number of statistical methods and diagnostic tools, including Bayesian calibration, Bayesian model averaging, chi-square correlation analysis, principal component analysis, and empirical coverage probability. Using a Bayesian framework, we investigate the structure of the 4-parameter Liquid Drop Model by considering discrepant mass domains for calibration. We then use the chi-square correlation framework to analyze the 14-parameter Skyrme energy density functional calibrated using homogeneous and heterogeneous datasets. We show that a quite dramatic parameter reduction can be achieved in both cases. The advantage of Bayesian model averaging for improving uncertainty quantification is demonstrated. The statistical approaches used are pedagogically described; in this context this work can serve as a guide for future applications.
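
The model-averaging step is, in isolation, a few lines. A sketch assuming equal prior model probabilities; the evidences and predictions below are made-up numbers, not results from the paper.

import numpy as np

def bma_prediction(log_evidences, predictions):
    # Bayesian model averaging: weight each model's prediction by its
    # posterior probability, proportional to the marginal likelihood
    # (evidence) under equal prior model probabilities.
    le = np.asarray(log_evidences, dtype=float)
    w = np.exp(le - le.max())     # stabilized exponentiation
    w /= w.sum()
    return w @ np.asarray(predictions), w

# e.g. three mass models predicting one binding energy (illustrative values)
print(bma_prediction([-1.0, -3.0, -2.5], [8.20, 8.35, 8.28]))

The averaged prediction also carries a between-model spread on top of each model's own uncertainty, which is what improves the uncertainty quantification.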





On the impact of selected modern deep-learning techniques to the performance and celerity of classification models in an experimental high-energy physics use case. (arXiv:2002.01427v3 [physics.data-an] UPDATED)

Beginning from a basic neural-network architecture, we test the potential benefits offered by a range of advanced techniques for machine learning, in particular deep learning, in the context of a typical classification problem encountered in the domain of high-energy physics, using a well-studied dataset: the 2014 Higgs ML Kaggle dataset. The advantages are evaluated in terms of both performance metrics and the time required to train and apply the resulting models. Techniques examined include domain-specific data-augmentation, learning rate and momentum scheduling, (advanced) ensembling in both model-space and weight-space, and alternative architectures and connection methods.

Following the investigation, we arrive at a model which achieves performance equal to the winning solution of the original Kaggle challenge, whilst being significantly quicker to train and apply, and being suitable for use with both GPU and CPU hardware setups. These reductions in timing and hardware requirements potentially allow the use of more powerful algorithms in HEP analyses, where models must be retrained frequently, sometimes at short notice, by small groups of researchers with limited hardware resources. Additionally, a new wrapper library for PyTorch called LUMIN is presented, which incorporates all of the techniques studied.





Bayesian factor models for multivariate categorical data obtained from questionnaires. (arXiv:1910.04283v2 [stat.AP] UPDATED)

Factor analysis is a flexible technique for the assessment of multivariate dependence and codependence. Besides being an exploratory tool used to reduce the dimensionality of multivariate data, it allows estimation of common factors that often have an interesting theoretical interpretation in real problems. However, standard factor analysis is only applicable when the variables are scaled, which is often inappropriate, for example, for data obtained from questionnaires in the field of psychology, where the variables are often categorical. In this framework, we propose a factor model for the analysis of multivariate ordered and non-ordered polychotomous data. The inference procedure is carried out under the Bayesian approach via Markov chain Monte Carlo methods. Two Monte Carlo simulation studies are presented to investigate the performance of this approach in terms of estimation bias, precision and assessment of the number of factors. We also illustrate the proposed method by analyzing participants' responses to the Motivational State Questionnaire dataset, developed to study emotions in laboratory and field settings.





Differentiable Sparsification for Deep Neural Networks. (arXiv:1910.03201v2 [cs.LG] UPDATED)

A deep neural network has relieved the burden of feature engineering by human experts, but comparable effort is instead required to determine an effective architecture. On the other hand, as networks have grown in size, considerable resources are also invested in reducing that size. These problems can be addressed by sparsification of an over-complete model, which removes redundant parameters or connections by pruning them away after training or encouraging them to become zero during training. In general, however, these approaches are not fully differentiable and interrupt an end-to-end training process with stochastic gradient descent, in that they require either a parameter selection or a soft-thresholding step. In this paper, we propose a fully differentiable sparsification method for deep neural networks, which allows parameters to be exactly zero during training, and thus can learn the sparsified structure and the weights of networks simultaneously using stochastic gradient descent. We apply the proposed method to various popular models in order to show its effectiveness.
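
One generic way to obtain exact zeros while remaining differentiable almost everywhere, and hence trainable end-to-end, is a gated layer like the PyTorch sketch below. This illustrates the general idea only; it is not necessarily the paper's exact construction.

import torch
import torch.nn as nn

class SparseLinear(nn.Module):
    # Each weight is scaled by a gate relu(s). Once L1 pressure pushes a
    # gate parameter s below zero, the effective weight is exactly zero
    # and its gradient vanishes, so it stays pruned during SGD.
    def __init__(self, d_in, d_out, l1=1e-3):
        super().__init__()
        self.w = nn.Parameter(torch.randn(d_out, d_in) * 0.1)
        self.s = nn.Parameter(torch.ones(d_out, d_in))
        self.l1 = l1

    def forward(self, x):
        return x @ (torch.relu(self.s) * self.w).t()

    def penalty(self):
        # L1 pressure on the gates drives them to exact zeros.
        return self.l1 * torch.relu(self.s).sum()

# usage: add layer.penalty() to the task loss before calling backward()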





DualSMC: Tunneling Differentiable Filtering and Planning under Continuous POMDPs. (arXiv:1909.13003v4 [cs.LG] UPDATED)

A major difficulty in solving continuous POMDPs is to infer the multi-modal distribution of the unobserved true states and to make the planning algorithm dependent on the perceived uncertainty. We cast POMDP filtering and planning problems as two closely related Sequential Monte Carlo (SMC) processes, one over the real states and the other over the future optimal trajectories, and combine the merits of these two parts in a new model named the DualSMC network. In particular, we first introduce an adversarial particle filter that leverages the adversarial relationship between its internal components. Based on the filtering results, we then propose a planning algorithm that extends the previous SMC planning approach [Piche et al., 2018] to continuous POMDPs with an uncertainty-dependent policy. Crucially, not only can DualSMC handle complex observations such as image input, but it also remains highly interpretable. It is shown to be effective in three continuous POMDP domains: the floor positioning domain, the 3D light-dark navigation domain, and a modified Reacher domain.





Estimating drift parameters in a non-ergodic Gaussian Vasicek-type model. (arXiv:1909.06155v2 [math.PR] UPDATED)

We study the problem of parameter estimation for a non-ergodic Gaussian Vasicek-type model defined as $dX_t = (\mu + \theta X_t)\,dt + dG_t$, $t \geq 0$, with unknown parameters $\theta > 0$ and $\mu \in \mathbb{R}$, where $G$ is a Gaussian process. We provide least-squares-type estimators $\widetilde{\theta}_T$ and $\widetilde{\mu}_T$ for the drift parameters $\theta$ and $\mu$, respectively, based on continuous-time observations $\{X_t,\ t \in [0, T]\}$ as $T \rightarrow \infty$.

Our aim is to derive sufficient conditions on the driving Gaussian process $G$ ensuring that $\widetilde{\theta}_T$ and $\widetilde{\mu}_T$ are strongly consistent, that the limit distribution of $\widetilde{\theta}_T$ is a Cauchy-type distribution, and that $\widetilde{\mu}_T$ is asymptotically normal. We apply our results to fractional Vasicek, subfractional Vasicek and bifractional Vasicek processes. In addition, this work extends the result of \cite{EEO}, which studied the case where $\mu = 0$.
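
A discretized illustration with Brownian driving noise ($G = B$): simulate the SDE with an Euler scheme and regress the increments on $(dt, X_t\,dt)$, whose coefficients estimate $(\mu, \theta)$. The paper's estimators are continuous-time analogues of this least-squares step; the parameter values below are arbitrary.

import numpy as np

rng = np.random.default_rng(0)
mu, theta, T, n = 1.0, 0.5, 10.0, 100000   # true drift parameters; theta > 0
dt = T / n
X = np.zeros(n + 1)
for i in range(n):                          # Euler scheme for the SDE
    X[i + 1] = X[i] + (mu + theta * X[i]) * dt + np.sqrt(dt) * rng.standard_normal()

# regress increments dX on (dt, X dt): the coefficients estimate (mu, theta)
Z = np.column_stack([np.full(n, dt), X[:-1] * dt])
mu_hat, theta_hat = np.linalg.lstsq(Z, np.diff(X), rcond=None)[0]
print(mu_hat, theta_hat)

Note that with $\theta > 0$ the process grows exponentially, which is exactly the non-ergodic regime the paper studies.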





Additive Bayesian variable selection under censoring and misspecification. (arXiv:1907.13563v3 [stat.ME] UPDATED)

We study the interplay of two important issues on Bayesian model selection (BMS): censoring and model misspecification. We consider additive accelerated failure time (AAFT), Cox proportional hazards and probit models, and a more general concave log-likelihood structure. A fundamental question is what solution can one hope BMS to provide, when (inevitably) models are misspecified. We show that asymptotically BMS keeps any covariate with predictive power for either the outcome or censoring times, and discards other covariates. Misspecification refers to assuming the wrong model or functional effect on the response, including using a finite basis for a truly non-parametric effect, or omitting truly relevant covariates. We argue for using simple models that are computationally practical yet attain good power to detect potentially complex effects, despite misspecification. Misspecification and censoring both have an asymptotically negligible effect on (suitably-defined) false positives, but their impact on power is exponential. We portray these issues via simple descriptions of early/late censoring and the drop in predictive accuracy due to misspecification. From a methods point of view, we consider local priors and a novel structure that combines local and non-local priors to enforce sparsity. We develop algorithms to capitalize on the AAFT tractability, approximations to AAFT and probit likelihoods giving significant computational gains, a simple augmented Gibbs sampler to hierarchically explore linear and non-linear effects, and an implementation in the R package mombf. We illustrate the proposed methods and others based on likelihood penalties via extensive simulations under misspecification and censoring. We present two applications concerning the effect of gene expression on colon and breast cancer.





Alternating Maximization: Unifying Framework for 8 Sparse PCA Formulations and Efficient Parallel Codes. (arXiv:1212.4137v2 [stat.ML] UPDATED)

Given a multivariate data set, sparse principal component analysis (SPCA) aims to extract several linear combinations of the variables that together explain the variance in the data as much as possible, while controlling the number of nonzero loadings in these combinations. In this paper we consider 8 different optimization formulations for computing a single sparse loading vector; these are obtained by combining the following factors: we employ two norms for measuring variance (L2, L1) and two sparsity-inducing norms (L0, L1), which are used in two different ways (constraint, penalty). Three of our formulations, notably the one with L0 constraint and L1 variance, have not been considered in the literature. We give a unifying reformulation which we propose to solve via a natural alternating maximization (AM) method. We show that the AM method is nontrivially equivalent to GPower (Journée et al.; JMLR 11:517--553, 2010) for all our formulations. Besides this, we provide 24 efficient parallel SPCA implementations: 3 codes (multi-core, GPU and cluster) for each of the 8 problems. Parallelism in the methods is aimed at i) speeding up computations (our GPU code can be 100 times faster than an efficient serial code written in C++), ii) obtaining solutions explaining more variance and iii) dealing with big data problems (our cluster code is able to solve a 357 GB problem in about a minute).
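
The AM scheme for one of the formulations (L2 variance with an L0 constraint) is short enough to sketch in NumPy; the initialization, iteration count and toy data are simplifications of ours.

import numpy as np

def spca_am(A, s, iters=100, seed=0):
    # Alternating maximization for one sparse loading vector:
    # maximize ||A x||_2 subject to ||x||_2 = 1 and ||x||_0 <= s,
    # alternating over the pair (x, y).
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(A.shape[1])
    x /= np.linalg.norm(x)
    for _ in range(iters):
        y = A @ x
        y /= np.linalg.norm(y)              # y-step: y = Ax / ||Ax||
        g = A.T @ y                         # x-step: keep the s largest |g|
        g[np.argsort(np.abs(g))[:-s]] = 0.0
        x = g / np.linalg.norm(g)
    return x

# usage: leading sparse component of a centered data matrix
X = np.random.default_rng(1).standard_normal((200, 50))
X -= X.mean(axis=0)
x = spca_am(X, s=5)
print(np.nonzero(x)[0], float(np.linalg.norm(X @ x)))

Each step solves one of the two subproblems exactly, so the explained variance ||A x|| is monotonically non-decreasing over the iterations.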





Nonstationary Bayesian modeling for a large data set of derived surface temperature return values. (arXiv:2005.03658v1 [stat.ME])

Heat waves resulting from prolonged extreme temperatures pose a significant risk to human health globally. Given the limitations of observations of extreme temperature, climate models are often used to characterize extreme temperature globally, from which one can derive quantities like return values to summarize the magnitude of a low probability event for an arbitrary geographic location. However, while these derived quantities are useful on their own, it is also often important to apply a spatial statistical model to such data in order to, e.g., understand how the spatial dependence properties of the return values vary over space and emulate the climate model for generating additional spatial fields with corresponding statistical properties. For these objectives, when modeling global data it is critical to use a nonstationary covariance function. Furthermore, given that the output of modern global climate models can be on the order of $\mathcal{O}(10^4)$, it is important to utilize approximate Gaussian process methods to enable inference. In this paper, we demonstrate the application of methodology introduced in Risser and Turek (2020) to conduct a nonstationary and fully Bayesian analysis of a large data set of 20-year return values derived from an ensemble of global climate model runs with over 50,000 spatial locations. This analysis uses the freely available BayesNSGP software package for R.