mo Modified information criterion for testing changes in skew normal model By projecteuclid.org Published On :: Mon, 04 Mar 2019 04:00 EST Khamis K. Said, Wei Ning, Yubin Tian. Source: Brazilian Journal of Probability and Statistics, Volume 33, Number 2, 280--300.Abstract: In this paper, we study the change point problem for the skew normal distribution model from the view of model selection problem. The detection procedure based on the modified information criterion (MIC) for change problem is proposed. Such a procedure has advantage in detecting the changes in early and late stage of a data comparing to the one based on the traditional Schwarz information criterion which is well known as Bayesian information criterion (BIC) by considering the complexity of the models. Due to the difficulty in deriving the analytic asymptotic distribution of the test statistic based on the MIC procedure, the bootstrap simulation is provided to obtain the critical values at the different significance levels. Simulations are conducted to illustrate the comparisons of performance between MIC, BIC and likelihood ratio test (LRT). Such an approach is applied on two stock market data sets to indicate the detection procedure. Full Article
mo An estimation method for latent traits and population parameters in Nominal Response Model By projecteuclid.org Published On :: Thu, 05 Aug 2010 15:41 EDT Caio L. N. Azevedo, Dalton F. AndradeSource: Braz. J. Probab. Stat., Volume 24, Number 3, 415--433.Abstract: The nominal response model (NRM) was proposed by Bock [ Psychometrika 37 (1972) 29–51] in order to improve the latent trait (ability) estimation in multiple choice tests with nominal items. When the item parameters are known, expectation a posteriori or maximum a posteriori methods are commonly employed to estimate the latent traits, considering a standard symmetric normal distribution as the latent traits prior density. However, when this item set is presented to a new group of examinees, it is not only necessary to estimate their latent traits but also the population parameters of this group. This article has two main purposes: first, to develop a Monte Carlo Markov Chain algorithm to estimate both latent traits and population parameters concurrently. This algorithm comprises the Metropolis–Hastings within Gibbs sampling algorithm (MHWGS) proposed by Patz and Junker [ Journal of Educational and Behavioral Statistics 24 (1999b) 346–366]. Second, to compare, in the latent trait recovering, the performance of this method with three other methods: maximum likelihood, expectation a posteriori and maximum a posteriori. The comparisons were performed by varying the total number of items (NI), the number of categories and the values of the mean and the variance of the latent trait distribution. The results showed that MHWGS outperforms the other methods concerning the latent traits estimation as well as it recoveries properly the population parameters. Furthermore, we found that NI accounts for the highest percentage of the variability in the accuracy of latent trait estimation. Full Article
mo Any waking morning By dal.novanet.ca Published On :: Fri, 1 May 2020 19:34:09 -0300 Author: Soutar-Hynes, Mary Lou, author.Callnumber: PS 8587 O9756 A85 2019ISBN: 9781771336413 (softcover) Full Article
mo BETWEEN SPIRIT AND EMOTION. By dal.novanet.ca Published On :: Fri, 1 May 2020 19:34:09 -0300 Author: ROGERS, JANET.Callnumber: PS 8585 O395158 A92 2018ISBN: 1772310832 Full Article
mo Public-private partnerships in Canada : law, policy and value for money By dal.novanet.ca Published On :: Fri, 1 May 2020 19:34:09 -0300 Author: Murphy, Timothy J. (Timothy John), author.Callnumber: KE 1465 M87 2019ISBN: 9780433457985 (Cloth) Full Article
mo Globalizing capital : a history of the international monetary system By dal.novanet.ca Published On :: Fri, 1 May 2020 19:34:09 -0300 Author: Eichengreen, Barry J., author.Callnumber: HG 3881 E347 2019ISBN: 9780691193908 (paperback) Full Article
mo Flexible, boundary adapted, nonparametric methods for the estimation of univariate piecewise-smooth functions By projecteuclid.org Published On :: Tue, 04 Feb 2020 04:00 EST Umberto Amato, Anestis Antoniadis, Italia De Feis. Source: Statistics Surveys, Volume 14, 32--70.Abstract: We present and compare some nonparametric estimation methods (wavelet and/or spline-based) designed to recover a one-dimensional piecewise-smooth regression function in both a fixed equidistant or not equidistant design regression model and a random design model. Wavelet methods are known to be very competitive in terms of denoising and compression, due to the simultaneous localization property of a function in time and frequency. However, boundary assumptions, such as periodicity or symmetry, generate bias and artificial wiggles which degrade overall accuracy. Simple methods have been proposed in the literature for reducing the bias at the boundaries. We introduce new ones based on adaptive combinations of two estimators. The underlying idea is to combine a highly accurate method for non-regular functions, e.g., wavelets, with one well behaved at boundaries, e.g., Splines or Local Polynomial. We provide some asymptotic optimal results supporting our approach. All the methods can handle data with a random design. We also sketch some generalization to the multidimensional setting. To study the performance of the proposed approaches we have conducted an extensive set of simulations on synthetic data. An interesting regression analysis of two real data applications using these procedures unambiguously demonstrates their effectiveness. Full Article
mo Additive monotone regression in high and lower dimensions By projecteuclid.org Published On :: Wed, 19 Jun 2019 22:00 EDT Solveig Engebretsen, Ingrid K. Glad. Source: Statistics Surveys, Volume 13, 1--51.Abstract: In numerous problems where the aim is to estimate the effect of a predictor variable on a response, one can assume a monotone relationship. For example, dose-effect models in medicine are of this type. In a multiple regression setting, additive monotone regression models assume that each predictor has a monotone effect on the response. In this paper, we present an overview and comparison of very recent frequentist methods for fitting additive monotone regression models. Three of the methods we present can be used both in the high dimensional setting, where the number of parameters $p$ exceeds the number of observations $n$, and in the classical multiple setting where $1<pleq n$. However, many of the most recent methods only apply to the classical setting. The methods are compared through simulation experiments in terms of efficiency, prediction error and variable selection properties in both settings, and they are applied to the Boston housing data. We conclude with some recommendations on when the various methods perform best. Full Article
mo A review of dynamic network models with latent variables By projecteuclid.org Published On :: Mon, 03 Sep 2018 04:01 EDT Bomin Kim, Kevin H. Lee, Lingzhou Xue, Xiaoyue Niu. Source: Statistics Surveys, Volume 12, 105--135.Abstract: We present a selective review of statistical modeling of dynamic networks. We focus on models with latent variables, specifically, the latent space models and the latent class models (or stochastic blockmodels), which investigate both the observed features and the unobserved structure of networks. We begin with an overview of the static models, and then we introduce the dynamic extensions. For each dynamic model, we also discuss its applications that have been studied in the literature, with the data source listed in Appendix. Based on the review, we summarize a list of open problems and challenges in dynamic network modeling with latent variables. Full Article
mo Variable selection methods for model-based clustering By projecteuclid.org Published On :: Thu, 26 Apr 2018 04:00 EDT Michael Fop, Thomas Brendan Murphy. Source: Statistics Surveys, Volume 12, 18--65.Abstract: Model-based clustering is a popular approach for clustering multivariate data which has seen applications in numerous fields. Nowadays, high-dimensional data are more and more common and the model-based clustering approach has adapted to deal with the increasing dimensionality. In particular, the development of variable selection techniques has received a lot of attention and research effort in recent years. Even for small size problems, variable selection has been advocated to facilitate the interpretation of the clustering results. This review provides a summary of the methods developed for variable selection in model-based clustering. Existing R packages implementing the different methods are indicated and illustrated in application to two data analysis examples. Full Article
mo A design-sensitive approach to fitting regression models with complex survey data By projecteuclid.org Published On :: Wed, 17 Jan 2018 04:00 EST Phillip S. Kott. Source: Statistics Surveys, Volume 12, 1--17.Abstract: Fitting complex survey data to regression equations is explored under a design-sensitive model-based framework. A robust version of the standard model assumes that the expected value of the difference between the dependent variable and its model-based prediction is zero no matter what the values of the explanatory variables. The extended model assumes only that the difference is uncorrelated with the covariates. Little is assumed about the error structure of this difference under either model other than independence across primary sampling units. The standard model often fails in practice, but the extended model very rarely does. Under this framework some of the methods developed in the conventional design-based, pseudo-maximum-likelihood framework, such as fitting weighted estimating equations and sandwich mean-squared-error estimation, are retained but their interpretations change. Few of the ideas here are new to the refereed literature. The goal instead is to collect those ideas and put them into a unified conceptual framework. Full Article
mo Basic models and questions in statistical network analysis By projecteuclid.org Published On :: Thu, 07 Sep 2017 22:02 EDT Miklós Z. Rácz, Sébastien Bubeck. Source: Statistics Surveys, Volume 11, 1--47.Abstract: Extracting information from large graphs has become an important statistical problem since network data is now common in various fields. In this minicourse we will investigate the most natural statistical questions for three canonical probabilistic models of networks: (i) community detection in the stochastic block model, (ii) finding the embedding of a random geometric graph, and (iii) finding the original vertex in a preferential attachment tree. Along the way we will cover many interesting topics in probability theory such as Pólya urns, large deviation theory, concentration of measure in high dimension, entropic central limit theorems, and more. Full Article
mo Some models and methods for the analysis of observational data By projecteuclid.org Published On :: Tue, 15 Sep 2015 20:40 EDT José A. Ferreira. Source: Statistics Surveys, Volume 9, 106--208.Abstract: This article provides a concise and essentially self-contained exposition of some of the most important models and non-parametric methods for the analysis of observational data, and a substantial number of illustrations of their application. Although for the most part our presentation follows P. Rosenbaum’s book, “Observational Studies”, and naturally draws on related literature, it contains original elements and simplifies and generalizes some basic results. The illustrations, based on simulated data, show the methods at work in some detail, highlighting pitfalls and emphasizing certain subjective aspects of the statistical analyses. Full Article
mo Semi-parametric estimation for conditional independence multivariate finite mixture models By projecteuclid.org Published On :: Fri, 06 Feb 2015 08:39 EST Didier Chauveau, David R. Hunter, Michael Levine. Source: Statistics Surveys, Volume 9, 1--31.Abstract: The conditional independence assumption for nonparametric multivariate finite mixture models, a weaker form of the well-known conditional independence assumption for random effects models for longitudinal data, is the subject of an increasing number of theoretical and algorithmic developments in the statistical literature. After presenting a survey of this literature, including an in-depth discussion of the all-important identifiability results, this article describes and extends an algorithm for estimation of the parameters in these models. The algorithm works for any number of components in three or more dimensions. It possesses a descent property and can be easily adapted to situations where the data are grouped in blocks of conditionally independent variables. We discuss how to adapt this algorithm to various location-scale models that link component densities, and we even adapt it to a particular class of univariate mixture problems in which the components are assumed symmetric. We give a bandwidth selection procedure for our algorithm. Finally, we demonstrate the effectiveness of our algorithm using a simulation study and two psychometric datasets. Full Article
mo Errata: A survey of Bayesian predictive methods for model assessment, selection and comparison By projecteuclid.org Published On :: Wed, 26 Feb 2014 09:10 EST Aki Vehtari, Janne Ojanen. Source: Statistics Surveys, Volume 8, , 1--1.Abstract: Errata for “A survey of Bayesian predictive methods for model assessment, selection and comparison” by A. Vehtari and J. Ojanen, Statistics Surveys , 6 (2012), 142–228. doi:10.1214/12-SS102. Full Article
mo A survey of Bayesian predictive methods for model assessment, selection and comparison By projecteuclid.org Published On :: Thu, 27 Dec 2012 12:22 EST Aki Vehtari, Janne OjanenSource: Statist. Surv., Volume 6, 142--228.Abstract: To date, several methods exist in the statistical literature for model assessment, which purport themselves specifically as Bayesian predictive methods. The decision theoretic assumptions on which these methods are based are not always clearly stated in the original articles, however. The aim of this survey is to provide a unified review of Bayesian predictive model assessment and selection methods, and of methods closely related to them. We review the various assumptions that are made in this context and discuss the connections between different approaches, with an emphasis on how each method approximates the expected utility of using a Bayesian model for the purpose of predicting future data. Full Article
mo The ARMA alphabet soup: A tour of ARMA model variants By projecteuclid.org Published On :: Tue, 07 Dec 2010 09:23 EST Scott H. Holan, Robert Lund, Ginger DavisSource: Statist. Surv., Volume 4, 232--274.Abstract: Autoregressive moving-average (ARMA) difference equations are ubiquitous models for short memory time series and have parsimoniously described many stationary series. Variants of ARMA models have been proposed to describe more exotic series features such as long memory autocovariances, periodic autocovariances, and count support set structures. This review paper enumerates, compares, and contrasts the common variants of ARMA models in today’s literature. After the basic properties of ARMA models are reviewed, we tour ARMA variants that describe seasonal features, long memory behavior, multivariate series, changing variances (stochastic volatility) and integer counts. A list of ARMA variant acronyms is provided. References:Aknouche, A. and Guerbyenne, H. (2006). Recursive estimation of GARCH models. Communications in Statistics-Simulation and Computation 35 925–938.Alzaid, A. A. and Al-Osh, M. (1990). An integer-valued pth-order autoregressive structure (INAR (p)) process. Journal of Applied Probability 27 314–324.Anderson, P. L., Tesfaye, Y. G. and Meerschaert, M. M. (2007). Fourier-PARMA models and their application to river flows. Journal of Hydrologic Engineering 12 462–472.Ansley, C. F. (1979). An algorithm for the exact likelihood of a mixed autoregressive-moving average process. Biometrika 66 59–65.Basawa, I. V. and Lund, R. (2001). Large sample properties of parameter estimates for periodic ARMA models. Journal of Time Series Analysis 22 651–663.Bauwens, L., Laurent, S. and Rombouts, J. V. K. (2006). Multivariate GARCH models: A survey. Journal of Applied Econometrics 21 79–109.Bertelli, S. and Caporin, M. (2002). A note on calculating autocovariances of long-memory processes. Journal of Time Series Analysis 23 503–508.Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics 31 307–327.Bollerslev, T. (2008). Glossary to ARCH (GARCH). CREATES Research Paper 2008-49.Bollerslev, T., Engle, R. F. and Wooldridge, J. M. (1988). A capital asset pricing model with time-varying covariances. The Journal of Political Economy 96 116–131.Bondon, P. and Palma, W. (2007). A class of antipersistent processes. Journal of Time Series Analysis 28 261–273.Bougerol, P. and Picard, N. (1992). Strict stationarity of generalized autoregressive processes. The Annals of Probability 20 1714–1730.Box, G. E. P., Jenkins, G. M. and Reinsel, G. C. (2008). Time Series Analysis: Forecasting and Control, 4th ed. Wiley, New Jersey.Breidt, F. J., Davis, R. A. and Trindade, A. A. (2001). Least absolute deviation estimation for all-pass time series models. Annals of Statistics 29 919–946.Brockwell, P. J. (1994). On continuous-time threshold ARMA processes. Journal of Statistical Planning and Inference 39 291–303.Brockwell, P. J. (2001). Continuous-time ARMA processes. In Stochastic Processes: Theory and Methods, ( D. N. Shanbhag and C. R. Rao, eds.). Handbook of Statistics 19 249–276. Elsevier.Brockwell, P. J. and Davis, R. A. (1991). Time Series: Theory and Methods, 2nd ed. Springer, New York.Brockwell, P. J. and Davis, R. A. (2002). Introduction to Time Series and Forecasting, 2nd ed. Springer, New York.Brockwell, P. J. and Marquardt, T. (2005). Lèvy-driven and fractionally integrated ARMA processes with continuous-time paramaters. Statistica Sinica 15 477–494.Chan, K. S. (1990). Testing for threshold autoregression. Annals of Statistics 18 1886–1894.Chan, N. H. (2002). Time Series: Applications to Finance. John Wiley & Sons, New York.Chan, N. H. and Palma, W. (1998). State space modeling of long-memory processes. Annals of Statistics 26 719–740.Chan, N. H. and Palma, W. (2006). Estimation of long-memory time series models: A survey of different likelihood-based methods. Advances in Econometrics 20 89–121.Chatfield, C. (2003). The Analysis of Time Series: An Introduction, 6th ed. Chapman & Hall/CRC, Boca Raton.Chen, W., Hurvich, C. M. and Lu, Y. (2006). On the correlation matrix of the discrete Fourier transform and the fast solution of large Toeplitz systems for long-memory time series. Journal of the American Statistical Association 101 812–822.Chernick, M. R., Hsing, T. and McCormick, W. P. (1991). Calculating the extremal index for a class of stationary sequences. Advances in Applied Probability 23 835–850.Chib, S., Nardari, F. and Shephard, N. (2006). Analysis of high dimensional multivariate stochastic volatility models. Journal of Econometrics 134 341–371.Cryer, J. D. and Chan, K. S. (2008). Time Series Analysis: With Applications in R. Springer, New York.Cui, Y. and Lund, R. (2009). A new look at time series of counts. Biometrika 96 781–792.Davis, R. A., Dunsmuir, W. T. M. and Wang, Y. (1999). Modeling time series of count data. In Asymptotics, Nonparametrics and Time Series, ( S. Ghosh, ed.). Statistics Textbooks Monograph 63–113. Marcel Dekker, New York.Davis, R. A., Dunsmuir, W. and Streett, S. B. (2003). Observation-driven models for Poisson counts. Biometrika 90 777–790.Davis, R. A. and Resnick, S. I. (1996). Limit theory for bilinear processes with heavy-tailed noise. The Annals of Applied Probability 6 1191–1210.Deistler, M. and Hannan, E. J. (1981). Some properties of the parameterization of ARMA systems with unknown order. Journal of Multivariate Analysis 11 474–484.Dufour, J. M. and Jouini, T. (2005). Asymptotic distribution of a simple linear estimator for VARMA models in echelon form. Statistical Modeling and Analysis for Complex Data Problems 209–240.Dunsmuir, W. and Hannan, E. J. (1976). Vector linear time series models. Advances in Applied Probability 8 339–364.Durbin, J. and Koopman, S. J. (2001). Time Series Analysis by State Space Methods. Oxford University Press, Oxford.Engle, R. F. (1982). Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica 50 987–1007.Engle, R. F. (2002). Dynamic conditional correlation. Journal of Business and Economic Statistics 20 339–350.Engle, R. F. and Bollerslev, T. (1986). Modelling the persistence of conditional variances. Econometric Reviews 5 1–50.Fuller, W. A. (1996). Introduction to Statistical Time Series, 2nd ed. John Wiley & Sons, New York.Geweke, J. and Porter-Hudak, S. (1983). The estimation and application of long memory time series models. Journal of Time Series Analysis 4 221–238.Gladyšhev, E. G. (1961). Periodically correlated random sequences. Soviet Math 2 385–388.Granger, C. W. J. (1982). Acronyms in time series analysis (ATSA). Journal of Time Series Analysis 3 103–107.Granger, C. W. J. and Andersen, A. P. (1978). An Introduction to Bilinear Time Series Models. Vandenhoeck and Ruprecht Göttingen.Granger, C. W. J. and Joyeux, R. (1980). An introduction to long-memory time series models and fractional differencing. Journal of Time Series Analysis 1 15–29.Gray, H. L., Zhang, N. F. and Woodward, W. A. (1989). On generalized fractional processes. Journal of Time Series Analysis 10 233–257.Hamilton, J. D. (1994). Time Series Analysis. Princeton University Press, Princeton, New Jersey.Hannan, E. J. (1955). A test for singularities in Sydney rainfall. Australian Journal of Physics 8 289–297.Hannan, E. J. (1969). The identification of vector mixed autoregressive-moving average system. Biometrika 56 223–225.Hannan, E. J. (1970). Multiple Time Series. John Wiley & Sons, New York.Hannan, E. J. (1976). The identification and parameterization of ARMAX and state space forms. Econometrica 44 713–723.Hannan, E. J. (1979). The Statistical Theory of Linear Systems. In Developments in Statistics ( P. R. Krishnaiah, ed.) 83–121. Academic Press, New York.Hannan, E. J. and Deistler, M. (1987). The Statistical Theory of Linear Systems. John Wiley & Sons, New York.Harvey, A. C. (1989). Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge University Press, Cambridge.Haslett, J. and Raftery, A. E. (1989). Space-time modelling with long-memory dependence: Assessing Ireland’s wind power resource. Applied Statistics 38 1–50.Hosking, J. R. M. (1981). Fractional differencing. Biometrika 68 165–176.Hui, Y. V. and Li, W. K. (1995). On fractionally differenced periodic processes. Sankhyā: The Indian Journal of Statistics, Series B 57 19–31.Jacobs, P. A. and Lewis, P. A. W. (1978a). Discrete time series generated by mixtures. I: Correlational and runs properties. Journal of the Royal Statistical Society. Series B (Methodological) 40 94–105.Jacobs, P. A. and Lewis, P. A. W. (1978b). Discrete time series generated by mixtures II: Asymptotic properties. Journal of the Royal Statistical Society. Series B (Methodological) 40 222–228.Jacobs, P. A. and Lewis, P. A. W. (1983). Stationary discrete autoregressive-moving average time series generated by mixtures. Journal of Time Series Analysis 4 19–36.Jones, R. H. (1980). Maximum likelihood fitting of ARMA models to time series with missing observations. Technometrics 22 389–395.Jones, R. H. and Brelsford, W. M. (1967). Time series with periodic structure. Biometrika 54 403–408.Kedem, B. and Fokianos, K. (2002). Regression Models for Time Series Analysis. John Wiley & Sons, New Jersey.Ko, K. and Vannucci, M. (2006). Bayesian wavelet-based methods for the detection of multiple changes of the long memory parameter. IEEE Transactions on Signal Processing 54 4461–4470.Kohn, R. (1979). Asymptotic estimation and hypothesis testing results for vector linear time series models. Econometrica 47 1005–1030.Kokoszka, P. S. and Taqqu, M. S. (1995). Fractional ARIMA with stable innovations. Stochastic Processes and their Applications 60 19–47.Kokoszka, P. S. and Taqqu, M. S. (1996). Parameter estimation for infinite variance fractional ARIMA. Annals of Statistics 24 1880–1913.Lawrance, A. J. and Lewis, P. A. W. (1980). The exponential autoregressive-moving average EARMA(p,q) process. Journal of the Royal Statistical Society. Series B (Methodological) 42 150–161.Ling, S. and Li, W. K. (1997). On fractionally integrated autoregressive moving-average time series models with conditional heteroscedasticity. Journal of the American Statistical Association 92 1184–1194.Liu, J. and Brockwell, P. J. (1988). On the general bilinear time series model. Journal of Applied Probability 25 553–564.Lund, R. and Basawa, I. V. (2000). Recursive prediction and likelihood evaluation for periodic ARMA models. Journal of Time Series Analysis 21 75–93.Lund, R., Shao, Q. and Basawa, I. (2006). Parsimonious periodic time series modeling. Australian & New Zealand Journal of Statistics 48 33–47.Lütkepohl, H. (1991). Introduction to Multiple Time Series Analysis. Springer-Verlag, New York.Lütkepohl, H. (2005). New Introduction to Multiple Time Series Analysis. Springer, New York.MacDonald, I. L. and Zucchini, W. (1997). Hidden Markov and Other Models for Discrete-Valued Time Series. Chapman & Hall/CRC, Boca Raton.Mann, H. B. and Wald, A. (1943). On the statistical treatment of linear stochastic difference equations. Econometrica 11 173–220.Marriott, J., Ravishanker, N., Gelfand, A. and Pai, J. (1996). Bayesian analysis of ARMA processes: Complete sampling-based inference under exact likelihoods. In Bayesian Analysis in Statistics and Econometrics: Essays in Honor of Arnold Zellner ( D. Berry, K. Challoner and J. Geweke, eds.) 243–256. Wiley, New York.McKenzie, E. (1988). Some ARMA models for dependent sequences of Poisson counts. Advances in Applied Probability 20 822–835.Mikosch, T. and Starica, C. (2004). Nonstationarities in financial time series, the long-range dependence, and the IGARCH effects. Review of Economics and Statistics 86 378–390.Nelson, D. B. (1991). Conditional heteroskedasticity in asset returns: A new approach. Econometrica 59 347–370.Nelson, D. B. and Cao, C. Q. (1992). Inequality constraints in the univariate GARCH model. Journal of Business and Economic Statistics 10 229–235.Ooms, M. and Franses, P. H. (2001). A seasonal periodic long memory model for monthly river flows. Environmental Modelling & Software 16 559–569.Pagano, M. (1978). On periodic and multiple autoregressions. Annals of Statistics 6 1310–1317.Pai, J. S. and Ravishanker, N. (1998). Bayesian analysis of autoregressive fractionally integrated moving-average processes. Journal of Time Series Analysis 19 99–112.Palma, W. (2007). Long-Memory Time Series: Theory and Methods. John Wiley & Sons, New Jersey.Palma, W. and Chan, N. H. (2005). Efficient estimation of seasonal long-range-dependent processes. Journal of Time Series Analysis 26 863–892.Pfeifer, P. E. and Deutsch, S. J. (1980). A three-stage iterative procedure for space-time modeling. Technometrics 22 35–47.Prado, R. and West, M. (2010). Time Series Modeling, Computation and Inference. Chapman & Hall/CRC, Boca Raton.Quoreshi, A. M. M. S. (2008). A long memory count data time series model for financial application. Preprint.R Development Core Team, (2010). R: A Language and Environment for Statistical Computing. http://www.R-project.org.Ravishanker, N. and Ray, B. K. (1997). Bayesian analysis of vector ARMA models using Gibbs sampling. Journal of Forecasting 16 177–194.Ravishanker, N. and Ray, B. K. (2002). Bayesian prediction for vector ARFIMA processes. International Journal of Forecasting 18 207–214.Reinsel, G. C. (1997). Elements of Multivariate Time Series Analysis. Springer, New York.Resnick, S. I. and Willekens, E. (1991). Moving averages with random coefficients and random coefficient autoregressive models. Communications in Statistics. Stochastic Models 7 511–525.Rootzén, H. (1986). Extreme value theory for moving average processes. The Annals of Probability 14 612–652.Scotto, M. G. (2007). Extremes for solutions to stochastic difference equations with regularly varying tails. REVSTAT–Statistical Journal 5 229–247.Shao, Q. and Lund, R. (2004). Computation and characterization of autocorrelations and partial autocorrelations in periodic ARMA models. Journal of Time Series Analysis 25 359–372.Shumway, R. H. and Stoffer, D. S. (2006). Time Series Analysis and its Applications: With R Examples, 2nd ed. Springer, New York.Silvennoinen, A. and Teräsvirta, T. (2009). Multivariate GARCH models. In Handbook of Financial Time Series ( T. Andersen, R. Davis, J. Kreib, and T. Mikosch, eds.) Springer, New York.Sowell, F. (1992). Maximum likelihood estimation of stationary univariate fractionally integrated time series models. Journal of Econometrics 53 165–188.Startz, R. (2008). Binomial autoregressive moving average models with an application to U.S. recessions. Journal of Business and Economic Statistics 26 1–8.Stramer, O., Tweedie, R. L. and Brockwell, P. J. (1996). Existence and stability of continuous time threshold ARMA processes. Statistica Sinica 6 715–732.Subba Rao, T. (1981). On the theory of bilinear time series models. Journal of the Royal Statistical Society. Series B (Methodological) 43 244–255.Tong, H. and Lim, K. S. (1980). Threshold autoregression, limit cycles and cyclical data. Journal of the Royal Statistical Society. Series B (Methodological) 42 245–292.Troutman, B. M. (1979). Some results in periodic autoregression. Biometrika 66 219–228.Tsai, H. (2009). On continuous-time autoregressive fractionally integrated moving average processes. Bernoulli 15 178–194.Tsai, H. and Chan, K. S. (2000). A note on the covariance structure of a continuous-time ARMA process. Statistica Sinica 10 989–998.Tsai, H. and Chan, K. S. (2005). Maximum likelihood estimation of linear continuous time long memory processes with discrete time data. Journal of the Royal Statistical Society. Series B (Statistical Methodology) 67 703–716.Tsai, H. and Chan, K. S. (2008). A note on inequality constraints in the GARCH model. Econometric Theory 24 823–828.Tsay, R. S. (1989). Parsimonious parameterization of vector autoregressive moving average models. Journal of Business and Economic Statistics 7 327–341.Tunnicliffe-Wilson, G. (1979). Some efficient computational procedures for high order ARMA models. Journal of Statistical Computation and Simulation 8 301–309.Ursu, E. and Duchesne, P. (2009). On modelling and diagnostic checking of vector periodic autoregressive time series models. Journal of Time Series Analysis 30 70–96.Vecchia, A. V. (1985a). Maximum likelihood estimation for periodic autoregressive moving average models. Technometrics 27 375–384.Vecchia, A. V. (1985b). Periodic autoregressive-moving average (PARMA) modeling with applications to water resources. Journal of the American Water Resources Association 21 721–730.Vidakovic, B. (1999). Statistical Modeling by Wavelets. John Wiley & Sons, New York.West, M. and Harrison, J. (1997). Bayesian Forecasting and Dynamic Models, 2nd ed. Springer, New York.Wold, H. (1954). A Study in the Analysis of Stationary Time Series. Almquist & Wiksell, Stockholm.Woodward, W. A., Cheng, Q. C. and Gray, H. L. (1998). A k-factor GARMA long-memory model. Journal of Time Series Analysis 19 485–504.Zivot, E. and Wang, J. (2006). Modeling Financial Time Series with S-PLUS, 2nd ed. Springer, New York. Full Article
mo Primal and dual model representations in kernel-based learning By projecteuclid.org Published On :: Wed, 25 Aug 2010 10:28 EDT Johan A.K. Suykens, Carlos Alzate, Kristiaan PelckmansSource: Statist. Surv., Volume 4, 148--183.Abstract: This paper discusses the role of primal and (Lagrange) dual model representations in problems of supervised and unsupervised learning. The specification of the estimation problem is conceived at the primal level as a constrained optimization problem. The constraints relate to the model which is expressed in terms of the feature map. From the conditions for optimality one jointly finds the optimal model representation and the model estimate. At the dual level the model is expressed in terms of a positive definite kernel function, which is characteristic for a support vector machine methodology. It is discussed how least squares support vector machines are playing a central role as core models across problems of regression, classification, principal component analysis, spectral clustering, canonical correlation analysis, dimensionality reduction and data visualization. Full Article
mo Discrete variations of the fractional Brownian motion in the presence of outliers and an additive noise By projecteuclid.org Published On :: Thu, 05 Aug 2010 15:41 EDT Sophie Achard, Jean-François CoeurjollySource: Statist. Surv., Volume 4, 117--147.Abstract: This paper gives an overview of the problem of estimating the Hurst parameter of a fractional Brownian motion when the data are observed with outliers and/or with an additive noise by using methods based on discrete variations. We show that the classical estimation procedure based on the log-linearity of the variogram of dilated series is made more robust to outliers and/or an additive noise by considering sample quantiles and trimmed means of the squared series or differences of empirical variances. These different procedures are compared and discussed through a large simulation study and are implemented in the R package dvfBm. Full Article
mo Finite mixture models and model-based clustering By projecteuclid.org Published On :: Thu, 05 Aug 2010 15:41 EDT Volodymyr Melnykov, Ranjan MaitraSource: Statist. Surv., Volume 4, 80--116.Abstract: Finite mixture models have a long history in statistics, having been used to model population heterogeneity, generalize distributional assumptions, and lately, for providing a convenient yet formal framework for clustering and classification. This paper provides a detailed review into mixture models and model-based clustering. Recent trends as well as open problems in the area are also discussed. Full Article
mo A survey of cross-validation procedures for model selection By projecteuclid.org Published On :: Thu, 05 Aug 2010 15:41 EDT Sylvain Arlot, Alain CelisseSource: Statist. Surv., Volume 4, 40--79.Abstract: Used to estimate the risk of an estimator or to perform model selection, cross-validation is a widespread strategy because of its simplicity and its (apparent) universality. Many results exist on model selection performances of cross-validation procedures. This survey intends to relate these results to the most recent advances of model selection theory, with a particular emphasis on distinguishing empirical statements from rigorous theoretical results. As a conclusion, guidelines are provided for choosing the best cross-validation procedure according to the particular features of the problem in hand. Full Article
mo Unsupervised Pre-trained Models from Healthy ADLs Improve Parkinson's Disease Classification of Gait Patterns. (arXiv:2005.02589v2 [cs.LG] UPDATED) By arxiv.org Published On :: Application and use of deep learning algorithms for different healthcare applications is gaining interest at a steady pace. However, use of such algorithms can prove to be challenging as they require large amounts of training data that capture different possible variations. This makes it difficult to use them in a clinical setting since in most health applications researchers often have to work with limited data. Less data can cause the deep learning model to over-fit. In this paper, we ask how can we use data from a different environment, different use-case, with widely differing data distributions. We exemplify this use case by using single-sensor accelerometer data from healthy subjects performing activities of daily living - ADLs (source dataset), to extract features relevant to multi-sensor accelerometer gait data (target dataset) for Parkinson's disease classification. We train the pre-trained model using the source dataset and use it as a feature extractor. We show that the features extracted for the target dataset can be used to train an effective classification model. Our pre-trained source model consists of a convolutional autoencoder, and the target classification model is a simple multi-layer perceptron model. We explore two different pre-trained source models, trained using different activity groups, and analyze the influence the choice of pre-trained model has over the task of Parkinson's disease classification. Full Article
mo Statistical errors in Monte Carlo-based inference for random elements. (arXiv:2005.02532v2 [math.ST] UPDATED) By arxiv.org Published On :: Monte Carlo simulation is useful to compute or estimate expected functionals of random elements if those random samples are possible to be generated from the true distribution. However, when the distribution has some unknown parameters, the samples must be generated from an estimated distribution with the parameters replaced by some estimators, which causes a statistical error in Monte Carlo estimation. This paper considers such a statistical error and investigates the asymptotic distributions of Monte Carlo-based estimators when the random elements are not only the real valued, but also functional valued random variables. We also investigate expected functionals for semimartingales in details. The consideration indicates that the Monte Carlo estimation can get worse when a semimartingale has a jump part with unremovable unknown parameters. Full Article
mo Generating Thermal Image Data Samples using 3D Facial Modelling Techniques and Deep Learning Methodologies. (arXiv:2005.01923v2 [cs.CV] UPDATED) By arxiv.org Published On :: Methods for generating synthetic data have become of increasing importance to build large datasets required for Convolution Neural Networks (CNN) based deep learning techniques for a wide range of computer vision applications. In this work, we extend existing methodologies to show how 2D thermal facial data can be mapped to provide 3D facial models. For the proposed research work we have used tufts datasets for generating 3D varying face poses by using a single frontal face pose. The system works by refining the existing image quality by performing fusion based image preprocessing operations. The refined outputs have better contrast adjustments, decreased noise level and higher exposedness of the dark regions. It makes the facial landmarks and temperature patterns on the human face more discernible and visible when compared to original raw data. Different image quality metrics are used to compare the refined version of images with original images. In the next phase of the proposed study, the refined version of images is used to create 3D facial geometry structures by using Convolution Neural Networks (CNN). The generated outputs are then imported in blender software to finally extract the 3D thermal facial outputs of both males and females. The same technique is also used on our thermal face data acquired using prototype thermal camera (developed under Heliaus EU project) in an indoor lab environment which is then used for generating synthetic 3D face data along with varying yaw face angles and lastly facial depth map is generated. Full Article
mo Interpreting Rate-Distortion of Variational Autoencoder and Using Model Uncertainty for Anomaly Detection. (arXiv:2005.01889v2 [cs.LG] UPDATED) By arxiv.org Published On :: Building a scalable machine learning system for unsupervised anomaly detection via representation learning is highly desirable. One of the prevalent methods is using a reconstruction error from variational autoencoder (VAE) via maximizing the evidence lower bound. We revisit VAE from the perspective of information theory to provide some theoretical foundations on using the reconstruction error, and finally arrive at a simpler and more effective model for anomaly detection. In addition, to enhance the effectiveness of detecting anomalies, we incorporate a practical model uncertainty measure into the metric. We show empirically the competitive performance of our approach on benchmark datasets. Full Article
mo How many modes can a constrained Gaussian mixture have?. (arXiv:2005.01580v2 [math.ST] UPDATED) By arxiv.org Published On :: We show, by an explicit construction, that a mixture of univariate Gaussians with variance 1 and means in $[-A,A]$ can have $Omega(A^2)$ modes. This disproves a recent conjecture of Dytso, Yagli, Poor and Shamai [IEEE Trans. Inform. Theory, Apr. 2020], who showed that such a mixture can have at most $O(A^2)$ modes and surmised that the upper bound could be improved to $O(A)$. Our result holds even if an additional variance constraint is imposed on the mixing distribution. Extending the result to higher dimensions, we exhibit a mixture of Gaussians in $mathbb{R}^d$, with identity covariances and means inside $[-A,A]^d$, that has $Omega(A^{2d})$ modes. Full Article
mo A bimodal gamma distribution: Properties, regression model and applications. (arXiv:2004.12491v2 [stat.ME] UPDATED) By arxiv.org Published On :: In this paper we propose a bimodal gamma distribution using a quadratic transformation based on the alpha-skew-normal model. We discuss several properties of this distribution such as mean, variance, moments, hazard rate and entropy measures. Further, we propose a new regression model with censored data based on the bimodal gamma distribution. This regression model can be very useful to the analysis of real data and could give more realistic fits than other special regression models. Monte Carlo simulations were performed to check the bias in the maximum likelihood estimation. The proposed models are applied to two real data sets found in literature. Full Article
mo Mnemonics Training: Multi-Class Incremental Learning without Forgetting. (arXiv:2002.10211v3 [cs.CV] UPDATED) By arxiv.org Published On :: Multi-Class Incremental Learning (MCIL) aims to learn new concepts by incrementally updating a model trained on previous concepts. However, there is an inherent trade-off to effectively learning new concepts without catastrophic forgetting of previous ones. To alleviate this issue, it has been proposed to keep around a few examples of the previous concepts but the effectiveness of this approach heavily depends on the representativeness of these examples. This paper proposes a novel and automatic framework we call mnemonics, where we parameterize exemplars and make them optimizable in an end-to-end manner. We train the framework through bilevel optimizations, i.e., model-level and exemplar-level. We conduct extensive experiments on three MCIL benchmarks, CIFAR-100, ImageNet-Subset and ImageNet, and show that using mnemonics exemplars can surpass the state-of-the-art by a large margin. Interestingly and quite intriguingly, the mnemonics exemplars tend to be on the boundaries between different classes. Full Article
mo A Distributionally Robust Area Under Curve Maximization Model. (arXiv:2002.07345v2 [math.OC] UPDATED) By arxiv.org Published On :: Area under ROC curve (AUC) is a widely used performance measure for classification models. We propose two new distributionally robust AUC maximization models (DR-AUC) that rely on the Kantorovich metric and approximate the AUC with the hinge loss function. We consider the two cases with respectively fixed and variable support for the worst-case distribution. We use duality theory to reformulate the DR-AUC models and derive tractable convex optimization problems. The numerical experiments show that the proposed DR-AUC models -- benchmarked with the standard deterministic AUC and the support vector machine models - perform better in general and in particular improve the worst-case out-of-sample performance over the majority of the considered datasets, thereby showing their robustness. The results are particularly encouraging since our numerical experiments are conducted with training sets of small size which have been known to be conducive to low out-of-sample performance. Full Article
mo Statistical aspects of nuclear mass models. (arXiv:2002.04151v3 [nucl-th] UPDATED) By arxiv.org Published On :: We study the information content of nuclear masses from the perspective of global models of nuclear binding energies. To this end, we employ a number of statistical methods and diagnostic tools, including Bayesian calibration, Bayesian model averaging, chi-square correlation analysis, principal component analysis, and empirical coverage probability. Using a Bayesian framework, we investigate the structure of the 4-parameter Liquid Drop Model by considering discrepant mass domains for calibration. We then use the chi-square correlation framework to analyze the 14-parameter Skyrme energy density functional calibrated using homogeneous and heterogeneous datasets. We show that a quite dramatic parameter reduction can be achieved in both cases. The advantage of Bayesian model averaging for improving uncertainty quantification is demonstrated. The statistical approaches used are pedagogically described; in this context this work can serve as a guide for future applications. Full Article
mo On the impact of selected modern deep-learning techniques to the performance and celerity of classification models in an experimental high-energy physics use case. (arXiv:2002.01427v3 [physics.data-an] UPDATED) By arxiv.org Published On :: Beginning from a basic neural-network architecture, we test the potential benefits offered by a range of advanced techniques for machine learning, in particular deep learning, in the context of a typical classification problem encountered in the domain of high-energy physics, using a well-studied dataset: the 2014 Higgs ML Kaggle dataset. The advantages are evaluated in terms of both performance metrics and the time required to train and apply the resulting models. Techniques examined include domain-specific data-augmentation, learning rate and momentum scheduling, (advanced) ensembling in both model-space and weight-space, and alternative architectures and connection methods. Following the investigation, we arrive at a model which achieves equal performance to the winning solution of the original Kaggle challenge, whilst being significantly quicker to train and apply, and being suitable for use with both GPU and CPU hardware setups. These reductions in timing and hardware requirements potentially allow the use of more powerful algorithms in HEP analyses, where models must be retrained frequently, sometimes at short notice, by small groups of researchers with limited hardware resources. Additionally, a new wrapper library for PyTorch called LUMINis presented, which incorporates all of the techniques studied. Full Article
mo Sampling random graph homomorphisms and applications to network data analysis. (arXiv:1910.09483v2 [math.PR] UPDATED) By arxiv.org Published On :: A graph homomorphism is a map between two graphs that preserves adjacency relations. We consider the problem of sampling a random graph homomorphism from a graph $F$ into a large network $mathcal{G}$. We propose two complementary MCMC algorithms for sampling a random graph homomorphisms and establish bounds on their mixing times and concentration of their time averages. Based on our sampling algorithms, we propose a novel framework for network data analysis that circumvents some of the drawbacks in methods based on independent and neigborhood sampling. Various time averages of the MCMC trajectory give us various computable observables, including well-known ones such as homomorphism density and average clustering coefficient and their generalizations. Furthermore, we show that these network observables are stable with respect to a suitably renormalized cut distance between networks. We provide various examples and simulations demonstrating our framework through synthetic networks. We also apply our framework for network clustering and classification problems using the Facebook100 dataset and Word Adjacency Networks of a set of classic novels. Full Article
mo Bayesian factor models for multivariate categorical data obtained from questionnaires. (arXiv:1910.04283v2 [stat.AP] UPDATED) By arxiv.org Published On :: Factor analysis is a flexible technique for assessment of multivariate dependence and codependence. Besides being an exploratory tool used to reduce the dimensionality of multivariate data, it allows estimation of common factors that often have an interesting theoretical interpretation in real problems. However, standard factor analysis is only applicable when the variables are scaled, which is often inappropriate, for example, in data obtained from questionnaires in the field of psychology,where the variables are often categorical. In this framework, we propose a factor model for the analysis of multivariate ordered and non-ordered polychotomous data. The inference procedure is done under the Bayesian approach via Markov chain Monte Carlo methods. Two Monte-Carlo simulation studies are presented to investigate the performance of this approach in terms of estimation bias, precision and assessment of the number of factors. We also illustrate the proposed method to analyze participants' responses to the Motivational State Questionnaire dataset, developed to study emotions in laboratory and field settings. Full Article
mo Estimating drift parameters in a non-ergodic Gaussian Vasicek-type model. (arXiv:1909.06155v2 [math.PR] UPDATED) By arxiv.org Published On :: We study the problem of parameter estimation for a non-ergodic Gaussian Vasicek-type model defined as $dX_t=(mu+ heta X_t)dt+dG_t, tgeq0$ with unknown parameters $ heta>0$ and $muinR$, where $G$ is a Gaussian process. We provide least square-type estimators $widetilde{ heta}_T$ and $widetilde{mu}_T$ respectively for the drift parameters $ heta$ and $mu$ based on continuous-time observations ${X_t, tin[0,T]}$ as $T ightarrowinfty$. Our aim is to derive some sufficient conditions on the driving Gaussian process $G$ in order to ensure that $widetilde{ heta}_T$ and $widetilde{mu}_T$ are strongly consistent, the limit distribution of $widetilde{ heta}_T$ is a Cauchy-type distribution and $widetilde{mu}_T$ is asymptotically normal. We apply our result to fractional Vasicek, subfractional Vasicek and bifractional Vasicek processes. In addition, this work extends the result of cite{EEO} studied in the case where $mu=0$. Full Article
mo Nonstationary Bayesian modeling for a large data set of derived surface temperature return values. (arXiv:2005.03658v1 [stat.ME]) By arxiv.org Published On :: Heat waves resulting from prolonged extreme temperatures pose a significant risk to human health globally. Given the limitations of observations of extreme temperature, climate models are often used to characterize extreme temperature globally, from which one can derive quantities like return values to summarize the magnitude of a low probability event for an arbitrary geographic location. However, while these derived quantities are useful on their own, it is also often important to apply a spatial statistical model to such data in order to, e.g., understand how the spatial dependence properties of the return values vary over space and emulate the climate model for generating additional spatial fields with corresponding statistical properties. For these objectives, when modeling global data it is critical to use a nonstationary covariance function. Furthermore, given that the output of modern global climate models can be on the order of $mathcal{O}(10^4)$, it is important to utilize approximate Gaussian process methods to enable inference. In this paper, we demonstrate the application of methodology introduced in Risser and Turek (2020) to conduct a nonstationary and fully Bayesian analysis of a large data set of 20-year return values derived from an ensemble of global climate model runs with over 50,000 spatial locations. This analysis uses the freely available BayesNSGP software package for R. Full Article
mo Visualisation and knowledge discovery from interpretable models. (arXiv:2005.03632v1 [cs.LG]) By arxiv.org Published On :: Increasing number of sectors which affect human lives, are using Machine Learning (ML) tools. Hence the need for understanding their working mechanism and evaluating their fairness in decision-making, are becoming paramount, ushering in the era of Explainable AI (XAI). In this contribution we introduced a few intrinsically interpretable models which are also capable of dealing with missing values, in addition to extracting knowledge from the dataset and about the problem. These models are also capable of visualisation of the classifier and decision boundaries: they are the angle based variants of Learning Vector Quantization. We have demonstrated the algorithms on a synthetic dataset and a real-world one (heart disease dataset from the UCI repository). The newly developed classifiers helped in investigating the complexities of the UCI dataset as a multiclass problem. The performance of the developed classifiers were comparable to those reported in literature for this dataset, with additional value of interpretability, when the dataset was treated as a binary class problem. Full Article
mo Phase Transitions of the Maximum Likelihood Estimates in the Tensor Curie-Weiss Model. (arXiv:2005.03631v1 [math.ST]) By arxiv.org Published On :: The $p$-tensor Curie-Weiss model is a two-parameter discrete exponential family for modeling dependent binary data, where the sufficient statistic has a linear term and a term with degree $p geq 2$. This is a special case of the tensor Ising model and the natural generalization of the matrix Curie-Weiss model, which provides a convenient mathematical abstraction for capturing, not just pairwise, but higher-order dependencies. In this paper we provide a complete description of the limiting properties of the maximum likelihood (ML) estimates of the natural parameters, given a single sample from the $p$-tensor Curie-Weiss model, for $p geq 3$, complementing the well-known results in the matrix ($p=2$) case (Comets and Gidas (1991)). Our results unearth various new phase transitions and surprising limit theorems, such as the existence of a 'critical' curve in the parameter space, where the limiting distribution of the ML estimates is a mixture with both continuous and discrete components. The number of mixture components is either two or three, depending on, among other things, the sign of one of the parameters and the parity of $p$. Another interesting revelation is the existence of certain 'special' points in the parameter space where the ML estimates exhibit a superefficiency phenomenon, converging to a non-Gaussian limiting distribution at rate $N^{frac{3}{4}}$. We discuss how these results can be used to construct confidence intervals for the model parameters and, as a byproduct of our analysis, obtain limit theorems for the sample mean, which provide key insights into the statistical properties of the model. Full Article
mo Predictive Modeling of ICU Healthcare-Associated Infections from Imbalanced Data. Using Ensembles and a Clustering-Based Undersampling Approach. (arXiv:2005.03582v1 [cs.LG]) By arxiv.org Published On :: Early detection of patients vulnerable to infections acquired in the hospital environment is a challenge in current health systems given the impact that such infections have on patient mortality and healthcare costs. This work is focused on both the identification of risk factors and the prediction of healthcare-associated infections in intensive-care units by means of machine-learning methods. The aim is to support decision making addressed at reducing the incidence rate of infections. In this field, it is necessary to deal with the problem of building reliable classifiers from imbalanced datasets. We propose a clustering-based undersampling strategy to be used in combination with ensemble classifiers. A comparative study with data from 4616 patients was conducted in order to validate our proposal. We applied several single and ensemble classifiers both to the original dataset and to data preprocessed by means of different resampling methods. The results were analyzed by means of classic and recent metrics specifically designed for imbalanced data classification. They revealed that the proposal is more efficient in comparison with other approaches. Full Article
mo Robust location estimators in regression models with covariates and responses missing at random. (arXiv:2005.03511v1 [stat.ME]) By arxiv.org Published On :: This paper deals with robust marginal estimation under a general regression model when missing data occur in the response and also in some of covariates. The target is a marginal location parameter which is given through an $M-$functional. To obtain robust Fisher--consistent estimators, properly defined marginal distribution function estimators are considered. These estimators avoid the bias due to missing values by assuming a missing at random condition. Three methods are considered to estimate the marginal distribution function which allows to obtain the $M-$location of interest: the well-known inverse probability weighting, a convolution--based method that makes use of the regression model and an augmented inverse probability weighting procedure that prevents against misspecification. The robust proposed estimators and the classical ones are compared through a numerical study under different missing models including clean and contaminated samples. We illustrate the estimators behaviour under a nonlinear model. A real data set is also analysed. Full Article
mo On unbalanced data and common shock models in stochastic loss reserving. (arXiv:2005.03500v1 [q-fin.RM]) By arxiv.org Published On :: Introducing common shocks is a popular dependence modelling approach, with some recent applications in loss reserving. The main advantage of this approach is the ability to capture structural dependence coming from known relationships. In addition, it helps with the parsimonious construction of correlation matrices of large dimensions. However, complications arise in the presence of "unbalanced data", that is, when (expected) magnitude of observations over a single triangle, or between triangles, can vary substantially. Specifically, if a single common shock is applied to all of these cells, it can contribute insignificantly to the larger values and/or swamp the smaller ones, unless careful adjustments are made. This problem is further complicated in applications involving negative claim amounts. In this paper, we address this problem in the loss reserving context using a common shock Tweedie approach for unbalanced data. We show that the solution not only provides a much better balance of the common shock proportions relative to the unbalanced data, but it is also parsimonious. Finally, the common shock Tweedie model also provides distributional tractability. Full Article
mo Modeling High-Dimensional Unit-Root Time Series. (arXiv:2005.03496v1 [stat.ME]) By arxiv.org Published On :: In this paper, we propose a new procedure to build a structural-factor model for a vector unit-root time series. For a $p$-dimensional unit-root process, we assume that each component consists of a set of common factors, which may be unit-root non-stationary, and a set of stationary components, which contain the cointegrations among the unit-root processes. To further reduce the dimensionality, we also postulate that the stationary part of the series is a nonsingular linear transformation of certain common factors and idiosyncratic white noise components as in Gao and Tsay (2019a, b). The estimation of linear loading spaces of the unit-root factors and the stationary components is achieved by an eigenanalysis of some nonnegative definite matrix, and the separation between the stationary factors and the white noises is based on an eigenanalysis and a projected principal component analysis. Asymptotic properties of the proposed method are established for both fixed $p$ and diverging $p$ as the sample size $n$ tends to infinity. Both simulated and real examples are used to demonstrate the performance of the proposed method in finite samples. Full Article
mo Generative Feature Replay with Orthogonal Weight Modification for Continual Learning. (arXiv:2005.03490v1 [cs.LG]) By arxiv.org Published On :: The ability of intelligent agents to learn and remember multiple tasks sequentially is crucial to achieving artificial general intelligence. Many continual learning (CL) methods have been proposed to overcome catastrophic forgetting. Catastrophic forgetting notoriously impedes the sequential learning of neural networks as the data of previous tasks are unavailable. In this paper we focus on class incremental learning, a challenging CL scenario, in which classes of each task are disjoint and task identity is unknown during test. For this scenario, generative replay is an effective strategy which generates and replays pseudo data for previous tasks to alleviate catastrophic forgetting. However, it is not trivial to learn a generative model continually for relatively complex data. Based on recently proposed orthogonal weight modification (OWM) algorithm which can keep previously learned input-output mappings invariant approximately when learning new tasks, we propose to directly generate and replay feature. Empirical results on image and text datasets show our method can improve OWM consistently by a significant margin while conventional generative replay always results in a negative effect. Our method also beats a state-of-the-art generative replay method and is competitive with a strong baseline based on real data storage. Full Article
mo Feature Selection Methods for Uplift Modeling. (arXiv:2005.03447v1 [cs.LG]) By arxiv.org Published On :: Uplift modeling is a predictive modeling technique that estimates the user-level incremental effect of a treatment using machine learning models. It is often used for targeting promotions and advertisements, as well as for the personalization of product offerings. In these applications, there are often hundreds of features available to build such models. Keeping all the features in a model can be costly and inefficient. Feature selection is an essential step in the modeling process for multiple reasons: improving the estimation accuracy by eliminating irrelevant features, accelerating model training and prediction speed, reducing the monitoring and maintenance workload for feature data pipeline, and providing better model interpretation and diagnostics capability. However, feature selection methods for uplift modeling have been rarely discussed in the literature. Although there are various feature selection methods for standard machine learning models, we will demonstrate that those methods are sub-optimal for solving the feature selection problem for uplift modeling. To address this problem, we introduce a set of feature selection methods designed specifically for uplift modeling, including both filter methods and embedded methods. To evaluate the effectiveness of the proposed feature selection methods, we use different uplift models and measure the accuracy of each model with a different number of selected features. We use both synthetic and real data to conduct these experiments. We also implemented the proposed filter methods in an open source Python package (CausalML). Full Article
mo Interpreting Deep Models through the Lens of Data. (arXiv:2005.03442v1 [cs.LG]) By arxiv.org Published On :: Identification of input data points relevant for the classifier (i.e. serve as the support vector) has recently spurred the interest of researchers for both interpretability as well as dataset debugging. This paper presents an in-depth analysis of the methods which attempt to identify the influence of these data points on the resulting classifier. To quantify the quality of the influence, we curated a set of experiments where we debugged and pruned the dataset based on the influence information obtained from different methods. To do so, we provided the classifier with mislabeled examples that hampered the overall performance. Since the classifier is a combination of both the data and the model, therefore, it is essential to also analyze these influences for the interpretability of deep learning models. Analysis of the results shows that some interpretability methods can detect mislabels better than using a random approach, however, contrary to the claim of these methods, the sample selection based on the training loss showed a superior performance. Full Article
mo SmartExchange: Trading Higher-cost Memory Storage/Access for Lower-cost Computation. (arXiv:2005.03403v1 [cs.LG]) By arxiv.org Published On :: We present SmartExchange, an algorithm-hardware co-design framework to trade higher-cost memory storage/access for lower-cost computation, for energy-efficient inference of deep neural networks (DNNs). We develop a novel algorithm to enforce a specially favorable DNN weight structure, where each layerwise weight matrix can be stored as the product of a small basis matrix and a large sparse coefficient matrix whose non-zero elements are all power-of-2. To our best knowledge, this algorithm is the first formulation that integrates three mainstream model compression ideas: sparsification or pruning, decomposition, and quantization, into one unified framework. The resulting sparse and readily-quantized DNN thus enjoys greatly reduced energy consumption in data movement as well as weight storage. On top of that, we further design a dedicated accelerator to fully utilize the SmartExchange-enforced weights to improve both energy efficiency and latency performance. Extensive experiments show that 1) on the algorithm level, SmartExchange outperforms state-of-the-art compression techniques, including merely sparsification or pruning, decomposition, and quantization, in various ablation studies based on nine DNN models and four datasets; and 2) on the hardware level, the proposed SmartExchange based accelerator can improve the energy efficiency by up to 6.7$ imes$ and the speedup by up to 19.2$ imes$ over four state-of-the-art DNN accelerators, when benchmarked on seven DNN models (including four standard DNNs, two compact DNN models, and one segmentation model) and three datasets. Full Article
mo CARL: Controllable Agent with Reinforcement Learning for Quadruped Locomotion. (arXiv:2005.03288v1 [cs.LG]) By arxiv.org Published On :: Motion synthesis in a dynamic environment has been a long-standing problem for character animation. Methods using motion capture data tend to scale poorly in complex environments because of their larger capturing and labeling requirement. Physics-based controllers are effective in this regard, albeit less controllable. In this paper, we present CARL, a quadruped agent that can be controlled with high-level directives and react naturally to dynamic environments. Starting with an agent that can imitate individual animation clips, we use Generative Adversarial Networks to adapt high-level controls, such as speed and heading, to action distributions that correspond to the original animations. Further fine-tuning through the deep reinforcement learning enables the agent to recover from unseen external perturbations while producing smooth transitions. It then becomes straightforward to create autonomous agents in dynamic environments by adding navigation modules over the entire process. We evaluate our approach by measuring the agent's ability to follow user control and provide a visual analysis of the generated motion to show its effectiveness. Full Article
mo Classification of pediatric pneumonia using chest X-rays by functional regression. (arXiv:2005.03243v1 [stat.AP]) By arxiv.org Published On :: An accurate and prompt diagnosis of pediatric pneumonia is imperative for successful treatment intervention. One approach to diagnose pneumonia cases is using radiographic data. In this article, we propose a novel parsimonious scalar-on-image classification model adopting the ideas of functional data analysis. Our main idea is to treat images as functional measurements and exploit underlying covariance structures to select basis functions; these bases are then used in approximating both image profiles and corresponding regression coefficient. We re-express the regression model into a standard generalized linear model where the functional principal component scores are treated as covariates. We apply the method to (1) classify pneumonia against healthy and viral against bacterial pneumonia patients, and (2) test the null effect about the association between images and responses. Extensive simulation studies show excellent numerical performance in terms of classification, hypothesis testing, and efficient computation. Full Article
mo Detecting Latent Communities in Network Formation Models. (arXiv:2005.03226v1 [econ.EM]) By arxiv.org Published On :: This paper proposes a logistic undirected network formation model which allows for assortative matching on observed individual characteristics and the presence of edge-wise fixed effects. We model the coefficients of observed characteristics to have a latent community structure and the edge-wise fixed effects to be of low rank. We propose a multi-step estimation procedure involving nuclear norm regularization, sample splitting, iterative logistic regression and spectral clustering to detect the latent communities. We show that the latent communities can be exactly recovered when the expected degree of the network is of order log n or higher, where n is the number of nodes in the network. The finite sample performance of the new estimation and inference methods is illustrated through both simulated and real datasets. Full Article
mo Model Reduction and Neural Networks for Parametric PDEs. (arXiv:2005.03180v1 [math.NA]) By arxiv.org Published On :: We develop a general framework for data-driven approximation of input-output maps between infinite-dimensional spaces. The proposed approach is motivated by the recent successes of neural networks and deep learning, in combination with ideas from model reduction. This combination results in a neural network approximation which, in principle, is defined on infinite-dimensional spaces and, in practice, is robust to the dimension of finite-dimensional approximations of these spaces required for computation. For a class of input-output maps, and suitably chosen probability measures on the inputs, we prove convergence of the proposed approximation methodology. Numerically we demonstrate the effectiveness of the method on a class of parametric elliptic PDE problems, showing convergence and robustness of the approximation scheme with respect to the size of the discretization, and compare our method with existing algorithms from the literature. Full Article
mo MAZE: Data-Free Model Stealing Attack Using Zeroth-Order Gradient Estimation. (arXiv:2005.03161v1 [stat.ML]) By arxiv.org Published On :: Model Stealing (MS) attacks allow an adversary with black-box access to a Machine Learning model to replicate its functionality, compromising the confidentiality of the model. Such attacks train a clone model by using the predictions of the target model for different inputs. The effectiveness of such attacks relies heavily on the availability of data necessary to query the target model. Existing attacks either assume partial access to the dataset of the target model or availability of an alternate dataset with semantic similarities. This paper proposes MAZE -- a data-free model stealing attack using zeroth-order gradient estimation. In contrast to prior works, MAZE does not require any data and instead creates synthetic data using a generative model. Inspired by recent works in data-free Knowledge Distillation (KD), we train the generative model using a disagreement objective to produce inputs that maximize disagreement between the clone and the target model. However, unlike the white-box setting of KD, where the gradient information is available, training a generator for model stealing requires performing black-box optimization, as it involves accessing the target model under attack. MAZE relies on zeroth-order gradient estimation to perform this optimization and enables a highly accurate MS attack. Our evaluation with four datasets shows that MAZE provides a normalized clone accuracy in the range of 0.91x to 0.99x, and outperforms even the recent attacks that rely on partial data (JBDA, clone accuracy 0.13x to 0.69x) and surrogate data (KnockoffNets, clone accuracy 0.52x to 0.97x). We also study an extension of MAZE in the partial-data setting and develop MAZE-PD, which generates synthetic data closer to the target distribution. MAZE-PD further improves the clone accuracy (0.97x to 1.0x) and reduces the query required for the attack by 2x-24x. Full Article