
Density for solutions to stochastic differential equations with unbounded drift

Christian Olivera, Ciprian Tudor.

Source: Brazilian Journal of Probability and Statistics, Volume 33, Number 3, 520--531.

Abstract:
Via a special transform and by using the techniques of the Malliavin calculus, we analyze the density of the solution to a stochastic differential equation with unbounded drift.





Spatially adaptive Bayesian image reconstruction through locally-modulated Markov random field models

Salem M. Al-Gezeri, Robert G. Aykroyd.

Source: Brazilian Journal of Probability and Statistics, Volume 33, Number 3, 498--519.

Abstract:
The use of Markov random field (MRF) models has proven to be a fruitful approach in a wide range of image processing applications. It allows local texture information to be incorporated in a systematic and unified way, and it allows statistical inference theory to be applied, giving rise to novel output summaries and enhanced image interpretation. A great advantage of such low-level approaches is that they lead to flexible models, which can be applied to a wide range of imaging problems without significant modification. This paper proposes and explores the use of conditional MRF models for situations where multiple images are to be processed simultaneously, or where only a single image is to be reconstructed and a sequential approach is taken. Although coupling of image intensity values is a special case of our approach, the main extension over previous proposals is to allow direct coupling of other properties, such as smoothness or texture. This is achieved using a local modulating function that adjusts the influence of global smoothing without the need for a fully inhomogeneous prior model. Several modulating functions are considered, and a detailed simulation study of conditional reconstruction, motivated by remote-sensing applications in archaeological geophysics, is presented. The results demonstrate that a substantial improvement in the quality of the image reconstruction, in terms of errors and residuals, can be achieved using this approach, especially at locations with rapid changes in the underlying intensity.
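A minimal one-dimensional sketch of the modulation idea follows. It assumes a Gaussian MRF prior with first-order neighbours and a quadratic data term. It also uses a hypothetical modulating function `modulated_weights`, which weakens smoothing wherever a reference signal jumps; in the paper's setting the reference could be another image in the sequence. The energy, the modulating function and the Gauss-Seidel solver are illustrative choices, not the authors' model.

```python
def modulated_weights(reference, delta):
    """Hypothetical modulating function: edge weight shrinks where the reference signal jumps."""
    return [1.0 / (1.0 + ((reference[i + 1] - reference[i]) / delta) ** 2)
            for i in range(len(reference) - 1)]

def gmrf_reconstruct(y, weights, lam, sigma2=1.0, n_sweeps=500):
    """Minimise sum (y_i - x_i)^2 / (2 sigma2) + lam * sum_i w_i (x_i - x_{i+1})^2 by Gauss-Seidel."""
    x = list(y)
    n = len(y)
    for _ in range(n_sweeps):
        for i in range(n):
            num = y[i] / sigma2
            den = 1.0 / sigma2
            if i > 0:                                  # edge (i-1, i)
                num += 2.0 * lam * weights[i - 1] * x[i - 1]
                den += 2.0 * lam * weights[i - 1]
            if i < n - 1:                              # edge (i, i+1)
                num += 2.0 * lam * weights[i] * x[i + 1]
                den += 2.0 * lam * weights[i]
            x[i] = num / den                           # exact minimiser of the energy in x_i alone
    return x
```

Usage: with a noise-free step `y = [0.0] * 10 + [1.0] * 10`, `gmrf_reconstruct(y, [1.0] * 19, lam=5.0)` blurs the step. Passing `modulated_weights(y, 0.1)` instead keeps it much sharper, which mirrors the abstract's point about locations with rapid intensity changes.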





Fractional backward stochastic variational inequalities with non-Lipschitz coefficient

Katarzyna Jańczak-Borkowska.

Source: Brazilian Journal of Probability and Statistics, Volume 33, Number 3, 480--497.

Abstract:
We prove the existence and uniqueness of the solution of backward stochastic variational inequalities with respect to fractional Brownian motion and with non-Lipschitz coefficient. We assume that $H>1/2$.





A rank-based Cramér–von-Mises-type test for two samples

Jamye Curry, Xin Dang, Hailin Sang.

Source: Brazilian Journal of Probability and Statistics, Volume 33, Number 3, 425--454.

Abstract:
We study a rank-based univariate two-sample distribution-free test. The test statistic is the difference between the average of between-group rank distances and the average of within-group rank distances. This test statistic is closely related to the two-sample Cramér–von Mises criterion: the two are different empirical versions of the same quantity for testing the equality of two population distributions. Although they may differ for finite samples, they share the same expected value, variance and asymptotic properties. The advantage of the new rank-based test over the classical one is that it generalizes easily to the multivariate case. Rather than using the empirical process approach, we provide a different, easier proof, bringing a different perspective and insight. In particular, we apply the Hájek projection and orthogonal decomposition technique to derive the asymptotics of the proposed rank-based statistic. A numerical study compares the power of the rank formulation test with that of other commonly used nonparametric tests, and recommendations on those tests are provided. Lastly, we propose a multivariate extension of the test based on the spatial rank.
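To make the statistic concrete, here is a minimal Python sketch. It assumes no ties and an equal-weight average of the two within-group terms. The paper's exact weighting, and any normalization that aligns the statistic with the Cramér–von Mises criterion, are not reproduced here. The names `rank_distance_stat` and `permutation_pvalue` are hypothetical. The p-value comes from label permutation, not from the asymptotic distribution derived in the paper.

```python
import random
from itertools import combinations

def pooled_ranks(values):
    """Ranks 1..N of the pooled sample; ties are broken by position (the sketch assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks

def mean_within(rs):
    """Average |R_i - R_j| over all pairs inside one group (needs at least 2 members)."""
    pairs = list(combinations(rs, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)

def rank_distance_stat(x, y):
    r = pooled_ranks(list(x) + list(y))
    rx, ry = r[:len(x)], r[len(x):]
    between = sum(abs(a - b) for a in rx for b in ry) / (len(rx) * len(ry))
    within = 0.5 * (mean_within(rx) + mean_within(ry))   # assumed equal weighting of the groups
    return between - within

def permutation_pvalue(x, y, n_perm=999, seed=0):
    """Distribution-free p-value obtained by randomly re-assigning group labels."""
    rng = random.Random(seed)
    observed = rank_distance_stat(x, y)
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if rank_distance_stat(pooled[:len(x)], pooled[len(x):]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

For two completely separated samples such as `[1, 2, 3]` and `[4, 5, 6]`, the between-group average rank distance is 3 and each within-group average is 4/3, so the statistic is 5/3.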





Influence measures for the Waring regression model

Luisa Rivas, Manuel Galea.

Source: Brazilian Journal of Probability and Statistics, Volume 33, Number 2, 402--424.

Abstract:
In this paper, we present a regression model where the response variable is count data following a Waring distribution. The Waring regression model allows for the analysis of phenomena where the geometric regression model is inadequate because the probability of success on each trial, $p$, differs between individuals and $p$ has an associated distribution. Estimation is performed by maximum likelihood, through maximization of the $Q$-function using the EM algorithm. Diagnostic measures are calculated for this model. To illustrate the results, an application to real data is presented. Some specific details are given in the Appendix of the paper.
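The mixture reading of the Waring model can be sketched as a beta-geometric distribution: draw $p$ from a beta distribution, then count failures before the first success. The sketch below assumes the parameterization $P(X=x) = B(a+1, b+x)/B(a, b)$, $x = 0, 1, \dots$, which may differ from the paper's. The regression link and the EM fitting are not shown, and `waring_pmf` and `waring_sample` are hypothetical names.

```python
import random
from math import lgamma, exp

def log_beta(u, v):
    return lgamma(u) + lgamma(v) - lgamma(u + v)

def waring_pmf(x, a, b):
    """P(X = x) = B(a + 1, b + x) / B(a, b), x = 0, 1, 2, ... (beta-geometric form)."""
    return exp(log_beta(a + 1, b + x) - log_beta(a, b))

def waring_sample(a, b, rng):
    """Draw an individual p ~ Beta(a, b), then count failures before the first success."""
    p = rng.betavariate(a, b)
    x = 0
    while rng.random() >= p:
        x += 1
    return x
```

For $a > 1$ the mean is $b/(a-1)$. With $a = 4$ and $b = 2$, both the pmf-weighted sum and the average of simulated draws should be close to $2/3$.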





A temporal perspective on the rate of convergence in first-passage percolation under a moment condition

Daniel Ahlberg.

Source: Brazilian Journal of Probability and Statistics, Volume 33, Number 2, 397--401.

Abstract:
We study the rate of convergence in the celebrated Shape Theorem in first-passage percolation, obtaining the precise asymptotic rate of decay for the probability of linear order deviations under a moment condition. Our results are presented from a temporal perspective and complement previous work by the same author, in which the rate of convergence was studied from the standard spatial perspective.





Hierarchical modelling of power law processes for the analysis of repairable systems with different truncation times: An empirical Bayes approach

Rodrigo Citton P. dos Reis, Enrico A. Colosimo, Gustavo L. Gilardoni.

Source: Brazilian Journal of Probability and Statistics, Volume 33, Number 2, 374--396.

Abstract:
In the analysis of data from multiple repairable systems, it is usual to observe both different truncation times and heterogeneity among the systems. Among other reasons, the latter is caused by the different manufacturing lines and maintenance teams of the systems. In this paper, a hierarchical model is proposed for the statistical analysis of multiple repairable systems under different truncation times. A reparameterization of the power law process is proposed in order to obtain a quasi-conjugate Bayesian analysis. An empirical Bayes approach is used to estimate the model hyperparameters. The uncertainty in the estimates of these quantities is corrected using a parametric bootstrap approach. The results are illustrated with a real data set of failure times of power transformers from an electric company in Brazil.
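As a building block, the sketch below simulates a single power law process with intensity $\lambda(t) = (\beta/\theta)(t/\theta)^{\beta-1}$, observed up to a truncation time $T$. It then computes the standard closed-form maximum likelihood estimates $\hat\beta = n / \sum_i \log(T/t_i)$ and $\hat\theta = T / n^{1/\hat\beta}$. The sketch uses the conventional $(\beta, \theta)$ parameterization, not the paper's reparameterization. It omits the hierarchical empirical Bayes layer and the bootstrap correction, and the function names are hypothetical.

```python
import math
import random

def simulate_plp(beta, theta, T, rng):
    """Failure times on (0, T] by time-transforming a unit-rate Poisson process."""
    times, s = [], 0.0
    while True:
        s += rng.expovariate(1.0)            # next arrival of a unit-rate Poisson process
        t = theta * s ** (1.0 / beta)        # invert the mean function Lambda(t) = (t / theta) ** beta
        if t > T:
            return times
        times.append(t)

def plp_mle(times, T):
    """Closed-form MLEs for one time-truncated system (requires at least one failure)."""
    n = len(times)
    beta_hat = n / sum(math.log(T / t) for t in times)
    theta_hat = T / n ** (1.0 / beta_hat)
    return beta_hat, theta_hat
```

With $\theta = 1$, $\beta = 1.5$ and $T = 100$, the expected number of failures is $100^{1.5} = 1000$, enough for $\hat\beta$ to land close to 1.5. Systems with different truncation times share no closed form for $\hat\beta$, which is one reason a more elaborate model is needed.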





Necessary and sufficient conditions for the convergence of the consistent maximal displacement of the branching random walk

Bastien Mallein.

Source: Brazilian Journal of Probability and Statistics, Volume 33, Number 2, 356--373.

Abstract:
Consider a supercritical branching random walk on the real line. The consistent maximal displacement is the smallest of the distances between the trajectories followed by individuals at the $n$th generation and the boundary of the process. Fang and Zeitouni, and Faraud, Hu and Shi proved that under some integrability conditions, the consistent maximal displacement grows almost surely at rate $\lambda^{*}n^{1/3}$ for some explicit constant $\lambda^{*}$. We obtain here a necessary and sufficient condition for this asymptotic behaviour to hold.





Failure rate of Birnbaum–Saunders distributions: Shape, change-point, estimation and robustness

Emilia Athayde, Assis Azevedo, Michelli Barros, Víctor Leiva.

Source: Brazilian Journal of Probability and Statistics, Volume 33, Number 2, 301--328.

Abstract:
The Birnbaum–Saunders (BS) distribution has been widely studied and applied. A random variable with a BS distribution is a transformation of another random variable with a standard normal distribution. Generalized BS distributions are obtained when the normally distributed random variable is replaced by another symmetrically distributed random variable. This yields a wide class of positively skewed models with lighter and heavier tails than the BS model. Their failure rate admits several shapes, including the unimodal case, and its change-point can be used for different purposes, for example, to establish a reduction in a dose, and hence in the cost of medical treatment. We analyze the failure rates of generalized BS distributions obtained from the logistic, normal and Student-t distributions, considering their shape and change-point, estimating them, evaluating their robustness, assessing their performance by simulations, and applying the results to real data from different areas.
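A minimal sketch of the classical (normal-based) BS failure rate follows. It assumes the usual parameterization $F(t) = \Phi\{(\sqrt{t/\beta} - \sqrt{\beta/t})/\alpha\}$ with shape $\alpha$ and scale $\beta$. The logistic and Student-t generalizations studied in the paper would replace $\Phi$ and its density, and they are not shown. The change-point is located by a simple grid search, an illustrative choice rather than the paper's estimator.

```python
import math

def _a(t, alpha, beta):
    """Standardizing transform a(t) = (sqrt(t/beta) - sqrt(beta/t)) / alpha, for t > 0."""
    return (math.sqrt(t / beta) - math.sqrt(beta / t)) / alpha

def bs_cdf(t, alpha, beta):
    return 0.5 * math.erfc(-_a(t, alpha, beta) / math.sqrt(2.0))        # Phi(a(t))

def bs_pdf(t, alpha, beta):
    a = _a(t, alpha, beta)
    da = (math.sqrt(t / beta) + math.sqrt(beta / t)) / (2.0 * alpha * t)  # a'(t)
    return math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi) * da

def bs_hazard(t, alpha, beta):
    survival = 0.5 * math.erfc(_a(t, alpha, beta) / math.sqrt(2.0))     # 1 - Phi(a(t)), no cancellation
    return bs_pdf(t, alpha, beta) / survival

def hazard_change_point(alpha, beta, t_max, n_grid=4000):
    """Grid search for the maximizer of the failure rate on (0, t_max]."""
    grid = [t_max * (i + 1) / n_grid for i in range(n_grid)]
    rates = [bs_hazard(t, alpha, beta) for t in grid]
    i = max(range(n_grid), key=rates.__getitem__)
    return grid[i], i
```

For $\alpha = 0.5$ the failure rate rises, peaks, and then decreases towards its limit $1/(2\alpha^2\beta)$. The grid maximum therefore falls strictly inside the search interval.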





Modified information criterion for testing changes in skew normal model

Khamis K. Said, Wei Ning, Yubin Tian.

Source: Brazilian Journal of Probability and Statistics, Volume 33, Number 2, 280--300.

Abstract:
In this paper, we study the change point problem for the skew normal distribution model from the viewpoint of model selection. A detection procedure based on the modified information criterion (MIC) is proposed for the change point problem. Because it takes the complexity of the models into account, this procedure has an advantage in detecting changes at early and late stages of the data compared with the procedure based on the traditional Schwarz information criterion, better known as the Bayesian information criterion (BIC). Because the analytic asymptotic distribution of the MIC-based test statistic is difficult to derive, bootstrap simulation is used to obtain critical values at different significance levels. Simulations are conducted to compare the performance of MIC, BIC and the likelihood ratio test (LRT). The approach is applied to two stock market data sets to illustrate the detection procedure.
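The MIC rule can be sketched as follows, with two important substitutions. To keep the example short, it uses a normal model with a change in mean and variance instead of the skew normal model. It also compares the criteria directly instead of using bootstrap critical values. The penalty $(2d + (2k/n - 1)^2)\log n$ for a change after observation $k$, against $d \log n$ for no change, is the form assumed here. The names `mic_change_point`, `min_seg` and `simulate_mean_shift` are hypothetical.

```python
import math
import random

def normal_loglik(seg):
    """Maximized normal log-likelihood of one segment (MLE mean and variance)."""
    n = len(seg)
    mu = sum(seg) / n
    var = sum((v - mu) ** 2 for v in seg) / n
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)

def mic_change_point(x, d=2, min_seg=5):
    """Return (k_hat, detected): the first segment is x[:k_hat]; d parameters per segment."""
    n = len(x)
    mic_null = -2.0 * normal_loglik(x) + d * math.log(n)
    best_k, best_mic = None, float("inf")
    for k in range(min_seg, n - min_seg + 1):          # both segments keep >= min_seg points
        ll = normal_loglik(x[:k]) + normal_loglik(x[k:])
        mic = -2.0 * ll + (2 * d + (2.0 * k / n - 1.0) ** 2) * math.log(n)
        if mic < best_mic:
            best_k, best_mic = k, mic
    return best_k, best_mic < mic_null

def simulate_mean_shift(n1, n2, shift, seed=0):
    """Hypothetical test data: N(0, 1) followed by N(shift, 1)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n1)] + [rng.gauss(shift, 1.0) for _ in range(n2)]
```

The term $(2k/n - 1)^2$ is smallest for a change in the middle and largest near the ends. The penalty therefore depends on where the candidate change lies, not only on the parameter count.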





The coreset variational Bayes (CVB) algorithm for mixture analysis

Qianying Liu, Clare A. McGrory, Peter W. J. Baxter.

Source: Brazilian Journal of Probability and Statistics, Volume 33, Number 2, 267--279.

Abstract:
The pressing need for improved methods for analysing and coping with big data has opened up a new area of research for statisticians. Image analysis is an area where there is typically a very large number of data points to be processed per image, and often multiple images are captured over time. These issues make it challenging to design methodology that is reliable and yet still efficient enough to be of practical use. One promising emerging approach for this problem is to reduce the amount of data that actually has to be processed by extracting what we call coresets from the full dataset; analysis is then based on the coreset rather than the whole dataset. Coresets are representative subsamples of data that are carefully selected via an adaptive sampling approach. We propose a new approach called coreset variational Bayes (CVB) for mixture modelling; this is an algorithm which can perform a variational Bayes analysis of a dataset based on just an extracted coreset of the data. We apply our algorithm to weed image analysis.
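The coreset idea can be illustrated with a simple importance-sampling construction from the coreset literature. It is swapped in here because the abstract does not specify the authors' adaptive sampling scheme. Each point is sampled with probability $q(x) = \tfrac{1}{2n} + \tfrac{1}{2} d(x, \bar x)^2 / \sum_{x'} d(x', \bar x)^2$ and given weight $1/(m\,q(x))$. The variational Bayes step is not shown, and `lightweight_coreset` is a hypothetical name.

```python
import random

def lightweight_coreset(points, m, seed=0):
    """Importance-sample m points; returns (coreset, weights, sampling probabilities)."""
    n, dim = len(points), len(points[0])
    centre = [sum(p[j] for p in points) / n for j in range(dim)]
    d2 = [sum((p[j] - centre[j]) ** 2 for j in range(dim)) for p in points]
    total = sum(d2)                                   # assumed > 0 (points not all identical)
    q = [0.5 / n + 0.5 * di / total for di in d2]     # half uniform, half distance-based
    rng = random.Random(seed)
    idx = rng.choices(range(n), weights=q, k=m)       # i.i.d. draws with probabilities q
    coreset = [points[i] for i in idx]
    weights = [1.0 / (m * q[i]) for i in idx]         # inverse-probability weights
    return coreset, weights, q
```

With these weights, weighted sums over the coreset are unbiased estimates of the corresponding sums over the full data. A downstream weighted variational update would rely on that property.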





A brief review of optimal scaling of the main MCMC approaches and optimal scaling of additive TMCMC under non-regular cases

Kushal K. Dey, Sourabh Bhattacharya.

Source: Brazilian Journal of Probability and Statistics, Volume 33, Number 2, 222--266.

Abstract:
Transformation-based Markov chain Monte Carlo (TMCMC) was proposed by Dutta and Bhattacharya (Statistical Methodology 16 (2014) 100–116) as an efficient alternative to the Metropolis–Hastings algorithm, especially in high dimensions. The main advantage of this algorithm is that it simultaneously updates all components of a high-dimensional parameter using appropriate move types defined by deterministic transformations of a single random variable. This reduces the time complexity of each step of the chain and enhances the acceptance rate. In this paper, we first provide a brief review of the optimal scaling theory for various existing MCMC approaches, comparing and contrasting them with the corresponding TMCMC approaches. The optimal scaling of the simplest form of TMCMC, namely additive TMCMC, has been studied extensively for the Gaussian proposal density in Dey and Bhattacharya (2017a). Here, we discuss diffusion-based optimal scaling behavior of additive TMCMC for non-Gaussian proposal densities, in particular uniform, Student's $t$ and Cauchy proposals. Although we could not formally prove our diffusion result for the Cauchy proposal, simulation-based results lead us to conjecture that at least the recipe for obtaining the general optimal scaling and optimal acceptance rate holds for the Cauchy case as well. We also consider diffusion-based optimal scaling of TMCMC when the target density is discontinuous. Such non-regular situations have been studied for the Random Walk Metropolis–Hastings (RWMH) algorithm by Neal and Roberts (Methodology and Computing in Applied Probability 13 (2011) 583–601) using expected squared jumping distance (ESJD), but diffusion-theory-based scaling has not been considered.
We compare our diffusion-based optimally scaled TMCMC approach with the ESJD-based optimally scaled RWMH in simulation studies involving several target distributions and proposal distributions, including the challenging Cauchy proposal case, showing that additive TMCMC outperforms RWMH in almost all cases considered.
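Additive TMCMC can be sketched in a few lines. A single draw $\epsilon > 0$ is shared by all components, and each component moves forward or backward by $a_i\epsilon$ with probability 1/2. Because this additive move is symmetric, the acceptance probability reduces to $\min\{1, \pi(\theta')/\pi(\theta)\}$. The sketch below assumes a half-normal $\epsilon$, a common scale for all components and a standard normal target. The scale $2.4/\sqrt{d}$ in the usage line is a placeholder, not the optimal scaling derived in the paper.

```python
import math
import random

def log_target(theta):
    """Standard normal log-density, up to an additive constant."""
    return -0.5 * sum(t * t for t in theta)

def additive_tmcmc(log_pi, theta0, scale, n_iter, seed=0):
    rng = random.Random(seed)
    theta = list(theta0)
    lp = log_pi(theta)
    chain, accepted = [], 0
    for _ in range(n_iter):
        eps = abs(rng.gauss(0.0, 1.0))                # one scalar draw, shared by every component
        prop = [t + rng.choice((-1.0, 1.0)) * scale * eps for t in theta]
        lp_prop = log_pi(prop)
        # Symmetric +/- moves with equal probability: the ratio is pi(prop) / pi(theta)
        if lp_prop >= lp or rng.random() < math.exp(lp_prop - lp):
            theta, lp = prop, lp_prop
            accepted += 1
        chain.append(theta)
    return chain, accepted / n_iter
```

Usage: `chain, acc = additive_tmcmc(log_target, [0.0] * 10, 2.4 / math.sqrt(10), 20000)`. Every iteration costs a single random scale draw plus one sign per component, whatever the dimension.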





Simple tail index estimation for dependent and heterogeneous data with missing values

Ivana Ilić, Vladica M. Veličković.

Source: Brazilian Journal of Probability and Statistics, Volume 33, Number 1, 192--203.

Abstract:
Financial returns are known to be nonnormal and tend to have fat-tailed distributions. Also, the dependence of large values in a stochastic process is an important topic in risk, insurance and finance. In the presence of missing values, we study the asymptotic properties of a simple “median” estimator of the tail index based on random variables with a heavy-tailed distribution function and certain dependence among the extremes. Weak consistency and asymptotic normality of the proposed estimator are established. The estimator is a special case of a well-known estimator defined in Bacro and Brito [Statistics & Decisions 3 (1993) 133–143]. The advantage of the estimator is its robustness against deviations: compared with Hill's estimator, it is less affected by fluctuations related to the sample maximum or by the presence of outliers. Several examples are analyzed to illustrate the theoretical results.
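One natural median-based construction is sketched below under stated assumptions. For an exact Pareto tail with index $\alpha$, the log-excesses over a high order statistic are exponential with rate $\alpha$, so their median is $\log 2/\alpha$. The exact estimator in the paper, a special case of the Bacro–Brito estimator, may be defined differently. The sketch does not model dependence among extremes. It simply drops missing values, which is only reasonable if they occur independently of the data. Hill's estimator, the reciprocal mean of the same log-excesses, is returned for comparison.

```python
import math
import random
import statistics

def pareto_sample(n, alpha, seed=0):
    """Exact Pareto(alpha) draws on [1, inf) by inversion; used only for checking the sketch."""
    rng = random.Random(seed)
    return [(1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]

def tail_index_estimates(sample, k):
    """Return (median-based, Hill) estimates of alpha from the k largest observed values."""
    xs = sorted(v for v in sample if v is not None)     # drop missing values (assumed ignorable)
    n = len(xs)
    threshold = xs[n - k - 1]                            # the (k+1)-th largest value
    log_excess = [math.log(v / threshold) for v in xs[n - k:]]
    median_est = math.log(2.0) / statistics.median(log_excess)
    hill_est = 1.0 / statistics.mean(log_excess)
    return median_est, hill_est
```

A single huge observation inflates the mean of the log-excesses and so pulls Hill's estimate down. It barely moves the median, which is the robustness argument made above.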





The equivalence of dynamic and static asset allocations under the uncertainty caused by Poisson processes

Yong-Chao Zhang, Na Zhang.

Source: Brazilian Journal of Probability and Statistics, Volume 33, Number 1, 184--191.

Abstract:
We investigate the equivalence of dynamic and static asset allocations in the case where the price process of a risky asset is driven by a Poisson process. Under some mild conditions, we obtain a necessary and sufficient condition for the equivalence of dynamic and static asset allocations. In addition, we provide a simple sufficient condition for the equivalence.





An estimation method for latent traits and population parameters in Nominal Response Model

Caio L. N. Azevedo, Dalton F. Andrade.

Source: Brazilian Journal of Probability and Statistics, Volume 24, Number 3, 415--433.

Abstract:
The nominal response model (NRM) was proposed by Bock [Psychometrika 37 (1972) 29–51] in order to improve latent trait (ability) estimation in multiple-choice tests with nominal items. When the item parameters are known, expectation a posteriori or maximum a posteriori methods are commonly employed to estimate the latent traits, using a standard symmetric normal distribution as the latent trait prior density. However, when this item set is presented to a new group of examinees, it is necessary to estimate not only their latent traits but also the population parameters of this group. This article has two main purposes. First, to develop a Markov chain Monte Carlo algorithm to estimate both latent traits and population parameters concurrently. This algorithm comprises the Metropolis–Hastings within Gibbs sampling algorithm (MHWGS) proposed by Patz and Junker [Journal of Educational and Behavioral Statistics 24 (1999b) 346–366]. Second, to compare the performance of this method in recovering latent traits with that of three other methods: maximum likelihood, expectation a posteriori and maximum a posteriori. The comparisons were performed by varying the total number of items (NI), the number of categories and the values of the mean and the variance of the latent trait distribution. The results showed that MHWGS outperforms the other methods in latent trait estimation and also properly recovers the population parameters. Furthermore, we found that NI accounts for the highest percentage of the variability in the accuracy of latent trait estimation.
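For reference, Bock's NRM gives the probability of category $k$ as $\exp(a_k\theta + c_k) / \sum_h \exp(a_h\theta + c_h)$. The sketch below computes these probabilities and an expectation a posteriori (EAP) estimate of $\theta$ by grid quadrature. It uses a normal prior whose mean and standard deviation play the role of the population parameters. This is a simple illustration of the EAP baseline, not the MHWGS algorithm, and the item parameters in the usage example are made up.

```python
import math

def nrm_probs(theta, a, c):
    """Category probabilities exp(a_k*theta + c_k) / sum_h exp(a_h*theta + c_h)."""
    z = [ak * theta + ck for ak, ck in zip(a, c)]
    top = max(z)                                    # subtract the max for numerical stability
    e = [math.exp(v - top) for v in z]
    total = sum(e)
    return [v / total for v in e]

def eap_theta(responses, items, mu=0.0, sigma=1.0, n_grid=121):
    """EAP estimate of theta under a N(mu, sigma^2) prior, on an equally spaced grid over mu +/- 6 sigma."""
    grid = [mu + sigma * (-6.0 + 12.0 * i / (n_grid - 1)) for i in range(n_grid)]
    log_post = []
    for t in grid:
        lp = -0.5 * ((t - mu) / sigma) ** 2         # log prior up to a constant
        for k, (a, c) in zip(responses, items):
            lp += math.log(nrm_probs(t, a, c)[k])   # log-likelihood of observed category k
        log_post.append(lp)
    top = max(log_post)
    w = [math.exp(v - top) for v in log_post]
    return sum(t * wi for t, wi in zip(grid, w)) / sum(w)
```

Usage: with the made-up item `([-1.0, 0.0, 1.0], [0.0, 0.0, 0.0])` administered three times, choosing the highest category each time gives a positive EAP estimate. Choosing the lowest each time gives a negative one.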





Unlikeness is us : fourteen from the Exeter book

Exeter book. Selections. English
9781554471751 (softcover)





NDN coping mechanisms : notes from the field

Belcourt, Billy-Ray, author.
9781487005771 (softcover)





The Grand River watershed : a folk ecology : poems

Houle, Karen, author.
9781554471843 paperback





Nights below Foord Street : literature and popular culture in postindustrial Nova Scotia

Thompson, Peter, 1981- author.
0773559345





Heavy metalloid music : the story of Simply Saucer

Locke, Jesse, 1983- author.
9781771613682 (Paper)





Public-private partnerships in Canada : law, policy and value for money

Murphy, Timothy J. (Timothy John), author.
9780433457985 (Cloth)





Reclaiming indigenous governance : reflections and insights from Australia, Canada, New Zealand, and the United States

9780816539970 (paperback)





Globalizing capital : a history of the international monetary system

Eichengreen, Barry J., author.
9780691193908 (paperback)





Fully grown : why a stagnant economy is a sign of success

Vollrath, Dietrich, author.
9780226666006 hardcover





Documenting rebellions : a study of four lesbian and gay archives in queer times

Sheffield, Rebecka Taves, author.
9781634000918 paperback





Figuring racism in medieval Christianity

Kaplan, M. Lindsay, author.
9780190678241 hardcover alkaline paper





Can $p$-values be meaningfully interpreted without random sampling?

Norbert Hirschauer, Sven Grüner, Oliver Mußhoff, Claudia Becker, Antje Jantsch.

Source: Statistics Surveys, Volume 14, 71--91.

Abstract:
Besides the inferential errors that abound in the interpretation of $p$-values, the probabilistic pre-conditions (i.e. random sampling or equivalent) for using them at all are not often met by observational studies in the social sciences. This paper systematizes different sampling designs and discusses the restrictive requirements of data collection that are the indispensable prerequisite for using $p$-values.





Flexible, boundary adapted, nonparametric methods for the estimation of univariate piecewise-smooth functions

Umberto Amato, Anestis Antoniadis, Italia De Feis.

Source: Statistics Surveys, Volume 14, 32--70.

Abstract:
We present and compare some nonparametric estimation methods (wavelet- and/or spline-based) designed to recover a one-dimensional piecewise-smooth regression function in both fixed (equidistant or non-equidistant) design regression models and random design models. Wavelet methods are known to be very competitive in terms of denoising and compression, owing to their simultaneous localization in time and frequency. However, boundary assumptions, such as periodicity or symmetry, generate bias and artificial wiggles that degrade overall accuracy. Simple methods have been proposed in the literature for reducing the bias at the boundaries. We introduce new ones based on adaptive combinations of two estimators. The underlying idea is to combine a method that is highly accurate for non-regular functions, e.g., wavelets, with one that is well behaved at boundaries, e.g., splines or local polynomials. We provide some asymptotic optimality results supporting our approach. All the methods can handle data with a random design. We also sketch some generalizations to the multidimensional setting. To study the performance of the proposed approaches, we have conducted an extensive set of simulations on synthetic data. An interesting regression analysis of two real data applications using these procedures unambiguously demonstrates their effectiveness.





Estimating the size of a hidden finite set: Large-sample behavior of estimators

Si Cheng, Daniel J. Eck, Forrest W. Crawford.

Source: Statistics Surveys, Volume 14, 1--31.

Abstract:
A finite set is “hidden” if its elements are not directly enumerable or if its size cannot be ascertained via a deterministic query. In public health, epidemiology, demography, ecology and intelligence analysis, researchers have developed a wide variety of indirect statistical approaches, under different models for sampling and observation, for estimating the size of a hidden set. Some methods make use of random sampling with known or estimable sampling probabilities, and others make structural assumptions about relationships (e.g. ordering or network information) between the elements that comprise the hidden set. In this review, we describe models and methods for learning about the size of a hidden finite set, with special attention to asymptotic properties of estimators. We study the properties of these methods under two asymptotic regimes: “infill”, in which the number of fixed-size samples increases but the population size remains constant, and “outfill”, in which the sample size and population size grow together. Statistical properties under these two regimes can be dramatically different.
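A classical example of the random-sampling approach, not a method taken from this review, is two-sample capture-recapture. It estimates $N$ from the overlap $m$ of two independent samples of sizes $n_1$ and $n_2$. The Lincoln–Petersen estimator is $n_1 n_2 / m$, and Chapman's bias-corrected version is $(n_1+1)(n_2+1)/(m+1) - 1$. The simulation helper `two_sample_capture` is hypothetical and assumes simple random sampling without replacement.

```python
import random

def lincoln_petersen(n1, n2, m):
    """Classical estimator n1 * n2 / m (undefined when m = 0)."""
    return n1 * n2 / m

def chapman(n1, n2, m):
    """Chapman's bias-corrected variant, defined even when m = 0."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1

def two_sample_capture(N, n1, n2, seed=0):
    """Overlap of two independent simple random samples drawn from units 0..N-1."""
    rng = random.Random(seed)
    first = set(rng.sample(range(N), n1))
    second = set(rng.sample(range(N), n2))
    return len(first & second)
```

For example, samples of 100 and 80 with 20 units in common give a Lincoln–Petersen estimate of 400.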





Scalar-on-function regression for predicting distal outcomes from intensively gathered longitudinal data: Interpretability for applied scientists

John J. Dziak, Donna L. Coffman, Matthew Reimherr, Justin Petrovich, Runze Li, Saul Shiffman, Mariya P. Shiyko.

Source: Statistics Surveys, Volume 13, 150--180.

Abstract:
Researchers are sometimes interested in predicting a distal or external outcome (such as smoking cessation at follow-up) from the trajectory of an intensively recorded longitudinal variable (such as urge to smoke). This can be done in a semiparametric way via scalar-on-function regression. However, the resulting fitted coefficient regression function requires special care for correct interpretation, as it represents the joint relationship of time points to the outcome, rather than a marginal or cross-sectional relationship. We provide practical guidelines, based on experience with scientific applications, for helping practitioners interpret their results and illustrate these ideas using data from a smoking cessation study.
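A minimal discretized sketch of scalar-on-function regression follows. It assumes trajectories observed on a common, equally spaced grid on $[0, 1]$ and a coefficient function restricted to a straight line, $\beta(t) = b_0 + b_1 t$. Real applications would use a richer basis, such as splines, with a roughness penalty. The model is $y_i = \alpha + \int_0^1 \beta(t) x_i(t)\,dt$, approximated by a Riemann sum, and all function names are hypothetical.

```python
import random

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_sofr(curves, y, grid):
    """Least squares for y_i = alpha + sum_k beta(t_k) x_i(t_k) dt with beta(t) = b0 + b1 t."""
    dt = 1.0 / len(grid)                            # equally spaced grid on [0, 1] assumed
    feats = [[1.0, sum(x) * dt, sum(t * xk for t, xk in zip(grid, x)) * dt] for x in curves]
    xtx = [[sum(f[i] * f[j] for f in feats) for j in range(3)] for i in range(3)]
    xty = [sum(f[i] * yi for f, yi in zip(feats, y)) for i in range(3)]
    return solve(xtx, xty)                          # [alpha, b0, b1]

def demo_data(n, grid, alpha, b0, b1, seed=0):
    """Hypothetical quadratic trajectories with noise-free outcomes, for checking the fit."""
    rng = random.Random(seed)
    dt = 1.0 / len(grid)
    curves, y = [], []
    for _ in range(n):
        u, v, w = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        x = [u + v * t + w * t * t for t in grid]
        curves.append(x)
        y.append(alpha + sum((b0 + b1 * t) * xk for t, xk in zip(grid, x)) * dt)
    return curves, y
```

Every $x_i(t)$ enters the fit jointly. The fitted $\beta(t)$ therefore describes the effect of the trajectory at time $t$ with the rest of the trajectory held fixed, which is the joint rather than marginal interpretation the abstract highlights.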





PLS for Big Data: A unified parallel algorithm for regularised group PLS

Pierre Lafaye de Micheaux, Benoît Liquet, Matthew Sutton.

Source: Statistics Surveys, Volume 13, 119--149.

Abstract:
Partial Least Squares (PLS) methods have been heavily exploited to analyse the association between two blocks of data. These powerful approaches can be applied to data sets where the number of variables is greater than the number of observations and in the presence of high collinearity between variables. Different sparse versions of PLS have been developed to integrate multiple data sets while simultaneously selecting the contributing variables. Sparse modeling is a key factor in obtaining better estimators and identifying associations between multiple data sets. The cornerstone of the sparse PLS methods is the link between the singular value decomposition (SVD) of a matrix (constructed from deflated versions of the original data) and least squares minimization in linear regression. We review four popular PLS methods for two blocks of data. A unified algorithm is proposed to perform all four types of PLS including their regularised versions. We present various approaches to decrease the computation time and show how the whole procedure can be scalable to big data sets. The bigsgPLS R package implements our unified algorithm and is available at https://github.com/matt-sutton/bigsgPLS .
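The SVD link mentioned above can be sketched for the first PLS component. The weight vectors are the leading left and right singular vectors of $X^\top Y$, computed here by power iteration. $X$ is then deflated by its rank-one projection on the score $t = Xu$. This is a generic PLS-SVD sketch in plain Python, assuming column-centred data. It is not the bigsgPLS implementation, and it omits sparsity and group penalties.

```python
import math

def cross_product(X, Y):
    """M = X^T Y for row-major lists X (n x p) and Y (n x q)."""
    n, p, q = len(X), len(X[0]), len(Y[0])
    return [[sum(X[i][j] * Y[i][k] for i in range(n)) for k in range(q)] for j in range(p)]

def normalize(v):
    s = math.sqrt(sum(a * a for a in v))
    return [a / s for a in v]

def first_pls_svd_weights(X, Y, n_iter=200):
    """Leading singular vectors (u, v) of X^T Y by power iteration; X, Y assumed column-centred."""
    M = cross_product(X, Y)
    p, q = len(M), len(M[0])
    v = normalize([1.0] * q)
    u = [0.0] * p
    for _ in range(n_iter):
        u = normalize([sum(M[j][k] * v[k] for k in range(q)) for j in range(p)])   # u proportional to M v
        v = normalize([sum(M[j][k] * u[j] for j in range(p)) for k in range(q)])   # v proportional to M^T u
    return u, v

def deflate(X, u):
    """Remove the rank-one part of X explained by the score t = X u."""
    t = [sum(row[j] * u[j] for j in range(len(u))) for row in X]
    tt = sum(ti * ti for ti in t)
    loading = [sum(X[i][j] * t[i] for i in range(len(X))) / tt for j in range(len(u))]
    return [[X[i][j] - t[i] * loading[j] for j in range(len(u))] for i in range(len(X))]
```

After deflation the score direction carries no information: $X_{\text{deflated}}\,u = 0$. The next component is then extracted from what remains.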





Halfspace depth and floating body

Stanislav Nagy, Carsten Schütt, Elisabeth M. Werner.

Source: Statistics Surveys, Volume 13, 52--118.

Abstract:
Little-known relations of the renowned concept of halfspace depth for multivariate data with notions from convex and affine geometry are discussed. Maximum halfspace depth may be regarded as a measure of symmetry for random vectors. As such, the maximum depth stands as a generalization of a measure of symmetry for convex sets, well studied in geometry. Under a mild assumption, the upper level sets of the halfspace depth coincide with the convex floating bodies of measures used in the definition of the affine surface area for convex bodies in Euclidean spaces. These connections enable us to partially resolve some persistent open problems regarding theoretical properties of the depth.
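For intuition, here is an approximate computation of halfspace (Tukey) depth in the plane. The depth of $x$ is the smallest fraction of data points lying in a closed halfplane whose boundary passes through $x$. The sketch minimizes only over a finite set of equally spaced directions, so it can overstate the exact depth. The function name and the number of directions are assumptions.

```python
import math

def halfspace_depth_2d(x, points, n_dir=360):
    """Approximate Tukey depth: minimum over n_dir directions of the closed-halfplane mass."""
    best = 1.0
    for j in range(n_dir):
        ang = 2.0 * math.pi * j / n_dir
        c, s = math.cos(ang), math.sin(ang)
        px = c * x[0] + s * x[1]
        frac = sum(1 for p in points if c * p[0] + s * p[1] >= px) / len(points)
        best = min(best, frac)
    return best
```

For the four corners of a square, a centrally symmetric configuration, the centre has depth 1/2 and a far-away point has depth 0.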





Pitfalls of significance testing and $p$-value variability: An econometrics perspective

Norbert Hirschauer, Sven Grüner, Oliver Mußhoff, Claudia Becker.

Source: Statistics Surveys, Volume 12, 136--172.

Abstract:
Data on how many scientific findings are reproducible are generally bleak, and a wealth of papers have warned against misuses of the $p$-value and resulting false findings in recent years. This paper discusses the question of what we can(not) learn from the $p$-value, which is still widely considered the gold standard of statistical validity. We aim to provide a non-technical and easily accessible resource for statistical practitioners who wish to spot and avoid misinterpretations and misuses of statistical significance tests. For this purpose, we first classify and describe the most widely discussed (“classical”) pitfalls of significance testing, and review published work on these misuses with a focus on regression-based “confirmatory” studies. This includes a description of the single-study bias and a simulation-based illustration of how proper meta-analysis compares to misleading significance counts (“vote counting”). Going beyond the classical pitfalls, we also use simulation to provide intuition that relying on the statistical estimate “$p$-value” as a measure of evidence without considering its sample-to-sample variability falls short of the mark even within an otherwise appropriate interpretation. We conclude with a discussion of the exigencies of informed approaches to statistical inference and corresponding institutional reforms.
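The sample-to-sample variability of the $p$-value is easy to see by simulation. The sketch below assumes a two-group comparison with known unit variance, so that a $z$-test applies. It also assumes a true standardized effect of 0.5 and 30 observations per group. Under these assumptions the power is roughly one half. The replicated $p$-values then range from below 0.001 to above 0.3, even though every study estimates the same effect.

```python
import random
from statistics import NormalDist

def replicate_p_values(effect, n, reps, seed=0):
    """Two-sided z-test p-values from `reps` independent replications of one two-group study."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    se = (2.0 / n) ** 0.5                         # known unit variance in both groups
    p_values = []
    for _ in range(reps):
        a = [rng.gauss(0.0, 1.0) for _ in range(n)]
        b = [rng.gauss(effect, 1.0) for _ in range(n)]
        z = (sum(b) / n - sum(a) / n) / se
        p_values.append(2.0 * (1.0 - std_normal.cdf(abs(z))))
    return p_values
```

Usage: `ps = replicate_p_values(0.5, 30, 1000)`. In such runs, only about half of the identical studies reach $p < 0.05$, which is the "vote counting" trap described above.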





A review of dynamic network models with latent variables

Bomin Kim, Kevin H. Lee, Lingzhou Xue, Xiaoyue Niu.

Source: Statistics Surveys, Volume 12, 105--135.

Abstract:
We present a selective review of statistical modeling of dynamic networks. We focus on models with latent variables, specifically, the latent space models and the latent class models (or stochastic blockmodels), which investigate both the observed features and the unobserved structure of networks. We begin with an overview of the static models, and then we introduce the dynamic extensions. For each dynamic model, we also discuss its applications that have been studied in the literature, with the data source listed in Appendix. Based on the review, we summarize a list of open problems and challenges in dynamic network modeling with latent variables.
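A minimal sketch of a latent space network model follows. It assumes the distance form, in which nodes $i$ and $j$ are linked with probability $\operatorname{logit}^{-1}(\alpha - \lVert z_i - z_j \rVert)$. It adds a simple dynamic extension in which latent positions follow a Gaussian random walk between snapshots. Both the random-walk dynamics and the function names are illustrative assumptions; the reviewed models differ in their details.

```python
import math
import random

def edge_prob(alpha, zi, zj):
    """P(edge) = 1 / (1 + exp(-(alpha - ||z_i - z_j||)))."""
    return 1.0 / (1.0 + math.exp(-(alpha - math.dist(zi, zj))))

def simulate_dynamic_latent_space(n, T, alpha, tau, seed=0):
    """Simulate T undirected snapshots; positions drift by N(0, tau^2) steps between snapshots."""
    rng = random.Random(seed)
    z = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(n)]
    networks = []
    for _ in range(T):
        A = [[0] * n for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                if rng.random() < edge_prob(alpha, z[i], z[j]):
                    A[i][j] = A[j][i] = 1
        networks.append(A)
        z = [[c + rng.gauss(0.0, tau) for c in zi] for zi in z]
    return networks
```

Two nodes at distance $\alpha$ are linked with probability exactly 1/2; closer pairs are more likely to be linked.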





Variable selection methods for model-based clustering

Michael Fop, Thomas Brendan Murphy.

Source: Statistics Surveys, Volume 12, 18--65.

Abstract:
Model-based clustering is a popular approach for clustering multivariate data which has seen applications in numerous fields. Nowadays, high-dimensional data are more and more common and the model-based clustering approach has adapted to deal with the increasing dimensionality. In particular, the development of variable selection techniques has received a lot of attention and research effort in recent years. Even for small size problems, variable selection has been advocated to facilitate the interpretation of the clustering results. This review provides a summary of the methods developed for variable selection in model-based clustering. Existing R packages implementing the different methods are indicated and illustrated in application to two data analysis examples.





A design-sensitive approach to fitting regression models with complex survey data

Phillip S. Kott.

Source: Statistics Surveys, Volume 12, 1--17.

Abstract:
Fitting complex survey data to regression equations is explored under a design-sensitive model-based framework. A robust version of the standard model assumes that the expected value of the difference between the dependent variable and its model-based prediction is zero no matter what the values of the explanatory variables. The extended model assumes only that the difference is uncorrelated with the covariates. Little is assumed about the error structure of this difference under either model other than independence across primary sampling units. The standard model often fails in practice, but the extended model very rarely does. Under this framework some of the methods developed in the conventional design-based, pseudo-maximum-likelihood framework, such as fitting weighted estimating equations and sandwich mean-squared-error estimation, are retained but their interpretations change. Few of the ideas here are new to the refereed literature. The goal instead is to collect those ideas and put them into a unified conceptual framework.
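The weighted estimating equations and the sandwich mean-squared-error estimator mentioned above can be sketched for a simple linear regression. First solve the survey-weighted normal equations. Then form $A^{-1} B A^{-1}$, where $A = \sum_i w_i x_i x_i^\top$ and $B$ sums the outer products of the weighted score totals of each primary sampling unit. This sketch assumes a single stratum and omits finite-population and degrees-of-freedom corrections; the function name is hypothetical.

```python
def weighted_ls_sandwich(x, y, w, psu):
    """Survey-weighted fit of y = b0 + b1 * x with a PSU-clustered sandwich covariance."""
    s0 = sum(w)
    s1 = sum(wi * xi for wi, xi in zip(w, x))
    s2 = sum(wi * xi * xi for wi, xi in zip(w, x))
    t0 = sum(wi * yi for wi, yi in zip(w, y))
    t1 = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = s0 * s2 - s1 * s1
    ainv = [[s2 / det, -s1 / det], [-s1 / det, s0 / det]]        # inverse of the "bread" A
    b = [ainv[0][0] * t0 + ainv[0][1] * t1, ainv[1][0] * t0 + ainv[1][1] * t1]
    totals = {}                                                   # weighted score total per PSU
    for xi, yi, wi, c in zip(x, y, w, psu):
        e = yi - b[0] - b[1] * xi
        g = totals.setdefault(c, [0.0, 0.0])
        g[0] += wi * e
        g[1] += wi * xi * e
    meat = [[sum(g[r] * g[s] for g in totals.values()) for s in range(2)] for r in range(2)]
    tmp = [[sum(ainv[r][k] * meat[k][s] for k in range(2)) for s in range(2)] for r in range(2)]
    cov = [[sum(tmp[r][k] * ainv[k][s] for k in range(2)) for s in range(2)] for r in range(2)]
    return b, cov
```

Only the PSU totals enter the "meat" $B$, so any correlation within a PSU is left unrestricted. This matches the abstract's "little is assumed about the error structure other than independence across primary sampling units".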





A comparison of spatial predictors when datasets could be very large

Jonathan R. Bradley, Noel Cressie, Tao Shi.

Source: Statistics Surveys, Volume 10, 100--131.

Abstract:
In this article, we review and compare a number of methods of spatial prediction, where each method is viewed as an algorithm that processes spatial data. To demonstrate the breadth of available choices, we consider both traditional and more recently introduced spatial predictors. Specifically, in our exposition we review: traditional stationary kriging, smoothing splines, negative-exponential distance-weighting, fixed rank kriging, modified predictive processes, a stochastic partial differential equation approach, and lattice kriging. This comparison is meant to provide a service to practitioners wishing to decide between spatial predictors. Hence, we provide technical material for the unfamiliar, which includes the definition and motivation for each (deterministic and stochastic) spatial predictor. We use a benchmark dataset of $\mathrm{CO}_{2}$ data from NASA’s AIRS instrument to address computational efficiencies that include CPU time and memory usage. Furthermore, the predictive performance of each spatial predictor is assessed empirically using a hold-out subset of the AIRS data.
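Negative-exponential distance-weighting, one of the deterministic predictors listed above, can be sketched as a weighted average of observed values with weights $\exp(-d/\theta)$. The sketch also computes the hold-out mean squared prediction error used for assessment. The exact weight form and the decay parameter $\theta$ are assumptions here and may not match the parameterization used in the article.

```python
import math

def ned_predict(s0, sites, values, theta):
    """Weighted average of values with weights exp(-d / theta), d = distance from s0."""
    d = [math.dist(s0, s) for s in sites]
    dmin = min(d)
    w = [math.exp(-(di - dmin) / theta) for di in d]    # shifting by dmin avoids underflow; ratios unchanged
    return sum(wi * v for wi, v in zip(w, values)) / sum(w)

def holdout_mspe(train_sites, train_values, test_sites, test_values, theta):
    """Mean squared prediction error on a hold-out set."""
    preds = [ned_predict(s, train_sites, train_values, theta) for s in test_sites]
    return sum((p - v) ** 2 for p, v in zip(preds, test_values)) / len(test_values)
```

Small $\theta$ makes the predictor behave like nearest-neighbour interpolation, and large $\theta$ pushes it towards the global mean. In practice $\theta$ would be tuned, for example on hold-out data.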





Fundamentals of cone regression

Mariella Dimiccoli.

Source: Statistics Surveys, Volume 10, 53--99.

Abstract:
Cone regression is a particular case of quadratic programming that minimizes a weighted sum of squared residuals under a set of linear inequality constraints. Several important statistical problems, such as isotonic regression, concave regression or ANOVA under partial orderings, to name just a few, can be considered particular instances of the cone regression problem. Given its relevance in statistics, this paper addresses the fundamentals of cone regression from a theoretical and practical point of view. Several formulations of the cone regression problem are considered and, focusing on the particular case of concave regression as an example, several algorithms are analyzed and compared both qualitatively and quantitatively through numerical simulations. Several improvements to enhance numerical stability and bound the computational cost are proposed. For each analyzed algorithm, the pseudo-code and the corresponding Matlab code are provided. The results of this study demonstrate that the choice of optimization approach strongly affects numerical performance. It is also shown that no methods are currently available to efficiently solve cone regression problems of large dimension (more than many thousands of points). We suggest further research to fill this gap by exploiting and adapting classical multi-scale strategies to compute an approximate solution.
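Isotonic regression, one of the cone regression instances listed above, has a classical exact algorithm: the pool-adjacent-violators algorithm (PAVA). It merges neighbouring blocks whose means violate the ordering. The Python sketch below is for illustration only. It is not one of the concave regression algorithms analyzed in the paper, whose code is in Matlab.

```python
def pava(y, w=None):
    """Weighted least-squares fit of a non-decreasing sequence to y (pool-adjacent-violators)."""
    if w is None:
        w = [1.0] * len(y)
    blocks = []                                   # each block: [mean, total weight, length]
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # merge backwards while the last two blocks violate monotonicity
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    fit = []
    for mean, _, length in blocks:
        fit.extend([mean] * length)
    return fit
```

For example, `pava([1.0, 3.0, 2.0, 4.0])` pools the violating pair into its average and returns `[1.0, 2.5, 2.5, 4.0]`.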





A survey of bootstrap methods in finite population sampling

Zeinab Mashreghi, David Haziza, Christian Léger.

Source: Statistics Surveys, Volume 10, 1--52.

Abstract:
We review bootstrap methods in the context of survey data, where the effect of the sampling design on the variability of estimators has to be taken into account. We present the methods in a unified way by classifying them into three classes: pseudo-population, direct, and survey weights methods. We cover variance estimation and the construction of confidence intervals for stratified simple random sampling as well as some unequal probability sampling designs. We also address the problem of variance estimation in the presence of imputation to compensate for item non-response.
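As a concrete instance for stratified simple random sampling, the sketch below implements one well-known scheme, the Rao-Wu rescaling bootstrap, for the estimated population total. It assumes $m_h = n_h - 1$ resampled units per stratum and ignores finite population corrections; under these assumptions the rescaled weight of unit $i$ in stratum $h$ is $(N_h/n_h)\cdot n_h/(n_h-1)\cdot m^*_{hi}$, where $m^*_{hi}$ counts how often the unit is redrawn. The function name is hypothetical, and this is not presented as the survey's own classification or code.

```python
import numpy as np

def rao_wu_bootstrap_total(strata, N, B=1000, seed=0):
    """Rao-Wu rescaling bootstrap replicates of the estimated population total.

    strata: list of 1-D arrays of sampled y-values, one per stratum (n_h >= 2).
    N: list of stratum population sizes N_h.
    Assumes m_h = n_h - 1 and ignores finite population corrections.
    """
    rng = np.random.default_rng(seed)
    reps = np.empty(B)
    for b in range(B):
        total = 0.0
        for y, Nh in zip(strata, N):
            y = np.asarray(y, dtype=float)
            nh = len(y)
            # Redraw n_h - 1 units with replacement; m_star[i] counts unit i.
            idx = rng.integers(0, nh, size=nh - 1)
            m_star = np.bincount(idx, minlength=nh)
            # Rescaled weight: design weight N_h/n_h times n_h/(n_h - 1) times m_star.
            w_star = (Nh / nh) * nh / (nh - 1) * m_star
            total += np.dot(w_star, y)
        reps[b] = total
    return reps
```

The bootstrap variance estimate is then `reps.var()`; with this choice of $m_h$ its expectation equals the usual design-based variance $\sum_h N_h^2 s_h^2/n_h$ (without finite population correction).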





A unified treatment for non-asymptotic and asymptotic approaches to minimax signal detection

Clément Marteau, Theofanis Sapatinas.

Source: Statistics Surveys, Volume 9, 253--297.

Abstract:
We are concerned with minimax signal detection. In this setting, we discuss non-asymptotic and asymptotic approaches through a unified treatment. In particular, we consider a Gaussian sequence model that contains classical models as special cases, such as direct, well-posed inverse, and ill-posed inverse problems. Working with certain ellipsoids in the space of square-summable sequences of real numbers, with a ball of positive radius removed, we compare the constructions of lower and upper bounds for the minimax separation radius (non-asymptotic approach) and the minimax separation rate (asymptotic approach) that have been proposed in the literature. Some additional contributions, bringing to light links between non-asymptotic and asymptotic approaches to minimax signal detection, are also presented. An example of a mildly ill-posed inverse problem is used for illustrative purposes. In particular, it is shown that tools used to derive ‘asymptotic’ results can be exploited to draw ‘non-asymptotic’ conclusions, and vice versa. In order to enhance our understanding of these two minimax signal detection paradigms, we bring to light hitherto unknown similarities and links between non-asymptotic and asymptotic approaches.





Statistical inference for dynamical systems: A review

Kevin McGoff, Sayan Mukherjee, Natesh Pillai.

Source: Statistics Surveys, Volume 9, 209--252.

Abstract:
The topic of statistical inference for dynamical systems has been studied widely across several fields. In this survey we focus on methods related to parameter estimation for nonlinear dynamical systems. Our objective is to place results across distinct disciplines in a common setting and highlight opportunities for further research.





Some models and methods for the analysis of observational data

José A. Ferreira.

Source: Statistics Surveys, Volume 9, 106--208.

Abstract:
This article provides a concise and essentially self-contained exposition of some of the most important models and non-parametric methods for the analysis of observational data, and a substantial number of illustrations of their application. Although for the most part our presentation follows P. Rosenbaum’s book, “Observational Studies”, and naturally draws on related literature, it contains original elements and simplifies and generalizes some basic results. The illustrations, based on simulated data, show the methods at work in some detail, highlighting pitfalls and emphasizing certain subjective aspects of the statistical analyses.





$M$-functionals of multivariate scatter

Lutz Dümbgen, Markus Pauly, Thomas Schweizer.

Source: Statistics Surveys, Volume 9, 32--105.

Abstract:
This survey provides a self-contained account of $M$-estimation of multivariate scatter. In particular, we present new proofs for the existence of the underlying $M$-functionals and discuss their weak continuity and differentiability. This is done in a rather general framework with matrix-valued random variables. In doing so, we reveal a connection between Tyler’s (1987a) $M$-functional of scatter and the estimation of proportional covariance matrices. Moreover, this general framework allows us to treat a new class of scatter estimators, based on symmetrizations of arbitrary order. Finally, these results are applied to $M$-estimation of multivariate location and scatter via multivariate $t$-distributions.
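For readers who want to see Tyler's $M$-estimator of scatter in action, the sketch below uses the standard fixed-point iteration $\Sigma \leftarrow (p/n)\sum_i x_i x_i^{\top}/(x_i^{\top}\Sigma^{-1}x_i)$. It assumes the centre is known to be 0, the data are in general position with $n > p$ and no zero rows, and it normalizes to $\operatorname{tr}\Sigma = p$, one convention among several since the scale is not identified. It illustrates the estimator only, not the article's existence proofs; the function name is hypothetical.

```python
import numpy as np

def tyler_scatter(X, tol=1e-9, max_iter=500):
    """Tyler's M-estimator of scatter (shape) for data centred at 0.

    Fixed-point iteration, normalised so that trace(Sigma) = p.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    sigma = np.eye(p)
    for _ in range(max_iter):
        inv = np.linalg.inv(sigma)
        # Quadratic forms x_i^T Sigma^{-1} x_i, one per row of X.
        d = np.einsum("ij,jk,ik->i", X, inv, X)
        # Weighted sum of outer products x_i x_i^T / d_i.
        new = (p / n) * (X / d[:, None]).T @ X
        new *= p / np.trace(new)
        if np.max(np.abs(new - sigma)) < tol:
            return new
        sigma = new
    return sigma
```

Because each term $x_i x_i^{\top}/(x_i^{\top}\Sigma^{-1}x_i)$ is unchanged when $x_i$ is rescaled, the estimate is insensitive to the radial part of each observation, which is the source of its robustness.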





Semi-parametric estimation for conditional independence multivariate finite mixture models

Didier Chauveau, David R. Hunter, Michael Levine.

Source: Statistics Surveys, Volume 9, 1--31.

Abstract:
The conditional independence assumption for nonparametric multivariate finite mixture models, a weaker form of the well-known conditional independence assumption for random effects models for longitudinal data, is the subject of an increasing number of theoretical and algorithmic developments in the statistical literature. After presenting a survey of this literature, including an in-depth discussion of the all-important identifiability results, this article describes and extends an algorithm for estimation of the parameters in these models. The algorithm works for any number of components in three or more dimensions. It possesses a descent property and can be easily adapted to situations where the data are grouped in blocks of conditionally independent variables. We discuss how to adapt this algorithm to various location-scale models that link component densities, and we even adapt it to a particular class of univariate mixture problems in which the components are assumed symmetric. We give a bandwidth selection procedure for our algorithm. Finally, we demonstrate the effectiveness of our algorithm using a simulation study and two psychometric datasets.





Adaptive clinical trial designs for phase I cancer studies

Oleksandr Sverdlov, Weng Kee Wong, Yevgen Ryeznik.

Source: Statistics Surveys, Volume 8, 2--44.

Abstract:
Adaptive clinical trials are becoming increasingly popular research designs for clinical investigation. Adaptive designs are particularly useful in phase I cancer studies where clinical data are scant and the goals are to assess the drug dose-toxicity profile and to determine the maximum tolerated dose while minimizing the number of study patients treated at suboptimal dose levels. In the current work we give an overview of adaptive design methods for phase I cancer trials. We find that modern statistical literature is replete with novel adaptive designs that have clearly defined objectives and established statistical properties, and are shown to outperform conventional dose finding methods such as the 3+3 design, both in terms of statistical efficiency and in terms of minimizing the number of patients treated at highly toxic or nonefficacious doses. We discuss statistical, logistical, and regulatory aspects of these designs and present some links to non-commercial statistical software for implementing these methods in practice.
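The conventional 3+3 design used as the comparison baseline can be simulated in a few lines. The sketch below assumes a common textbook version: treat 3 patients at a dose; escalate after 0/3 dose-limiting toxicities (DLTs); expand to 6 after 1/3 and escalate only if at most 1/6 have DLTs; otherwise stop and declare the previous dose the maximum tolerated dose (MTD). Variants differ in details (for example, adding a cohort at the declared MTD, which is omitted here), and the function name is hypothetical.

```python
import random

def three_plus_three(tox_probs, seed=None):
    """Simulate one trial under a textbook version of the 3+3 rule.

    tox_probs: true DLT probability at each dose level, in increasing dose order.
    Returns (mtd_index, n_patients); mtd_index is -1 if the lowest dose is too toxic.
    """
    rng = random.Random(seed)
    n_patients = 0
    for level, p in enumerate(tox_probs):
        # First cohort of 3 patients at this dose.
        dlts = sum(rng.random() < p for _ in range(3))
        n_patients += 3
        if dlts == 1:
            # Exactly 1/3 DLTs: treat 3 more patients at the same dose.
            dlts += sum(rng.random() < p for _ in range(3))
            n_patients += 3
        if dlts >= 2:
            # Too toxic: declare the previous dose level the MTD.
            return level - 1, n_patients
        # 0/3 or 1/6 DLTs: escalate to the next dose level.
    return len(tox_probs) - 1, n_patients
```

Repeating the simulation over many seeds gives the distribution of the selected MTD and of the sample size under a hypothesized toxicity curve, the kind of operating characteristic on which adaptive designs are compared with the 3+3 design.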





Errata: A survey of Bayesian predictive methods for model assessment, selection and comparison

Aki Vehtari, Janne Ojanen.

Source: Statistics Surveys, Volume 8, 1--1.

Abstract:
Errata for “A survey of Bayesian predictive methods for model assessment, selection and comparison” by A. Vehtari and J. Ojanen, Statistics Surveys, 6 (2012), 142–228. doi:10.1214/12-SS102.





Analyzing complex functional brain networks: Fusing statistics and network science to understand the brain

Sean L. Simpson, F. DuBois Bowman, Paul J. Laurienti.

Source: Statistics Surveys, Volume 7, 1--36.

Abstract:
Complex functional brain network analyses have exploded over the last decade, gaining traction due to their profound clinical implications. The application of network science (an interdisciplinary offshoot of graph theory) has facilitated these analyses and made it possible to examine the brain as an integrated system that produces complex behaviors. While the field of statistics has been integral in advancing activation analyses and some connectivity analyses in functional neuroimaging research, it has yet to play a commensurate role in complex network analyses. Fusing novel statistical methods with network-based functional neuroimage analysis will yield powerful analytical tools that will aid our understanding of normal brain function as well as of alterations due to various brain disorders. Here we survey widely used statistical and network science tools for analyzing fMRI network data and discuss the challenges faced in filling some of the remaining methodological gaps. When applied and interpreted correctly, the fusion of network scientific and statistical methods has the potential to revolutionize the understanding of brain function.





A survey of Bayesian predictive methods for model assessment, selection and comparison

Aki Vehtari, Janne Ojanen.

Source: Statistics Surveys, Volume 6, 142--228.

Abstract:
To date, several methods for model assessment exist in the statistical literature that present themselves specifically as Bayesian predictive methods. However, the decision-theoretic assumptions on which these methods are based are not always clearly stated in the original articles. The aim of this survey is to provide a unified review of Bayesian predictive model assessment and selection methods, and of methods closely related to them. We review the various assumptions that are made in this context and discuss the connections between different approaches, with an emphasis on how each method approximates the expected utility of using a Bayesian model for the purpose of predicting future data.





The theory and application of penalized methods or Reproducing Kernel Hilbert Spaces made easy

Nancy Heckman.

Source: Statistics Surveys, Volume 6, 113--141.

Abstract:
The popular cubic smoothing spline estimate of a regression function arises as the minimizer of the penalized sum of squares $\sum_{j}(Y_{j}-\mu(t_{j}))^{2}+\lambda\int_{a}^{b}[\mu''(t)]^{2}\,dt$, where the data are $(t_{j},Y_{j})$, $j=1,\ldots,n$. The minimization is taken over an infinite-dimensional function space, the space of all functions with square-integrable second derivatives. But the calculations can be carried out in a finite-dimensional space. The reduction from minimizing over an infinite-dimensional space to minimizing over a finite-dimensional space occurs for more general objective functions: the data may be related to the function $\mu$ in another way, the sum of squares may be replaced by a more suitable expression, or the penalty, $\int_{a}^{b}[\mu''(t)]^{2}\,dt$, might take a different form. This paper reviews the Reproducing Kernel Hilbert Space structure that provides a finite-dimensional solution for a general minimization problem. Particular attention is paid to the construction and study of the Reproducing Kernel Hilbert Space corresponding to a penalty based on a linear differential operator. In this case, one can often calculate the minimizer explicitly, using Green’s functions.
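To make the penalized criterion tangible, the sketch below uses a crude discrete analogue rather than the RKHS construction reviewed in the paper: on an equally spaced grid, the integral $\int_a^b[\mu''(t)]^2\,dt$ is replaced by a sum of squared second differences (Whittaker-Henderson smoothing), so the minimizer solves the linear system $(I + \lambda D^{\top}D)\mu = Y$, where $D$ is the second-difference matrix. The grid spacing is absorbed into $\lambda$, and the function name is hypothetical.

```python
import numpy as np

def whittaker_smooth(y, lam):
    """Minimise sum_j (y_j - mu_j)^2 + lam * sum_j (mu_{j-1} - 2 mu_j + mu_{j+1})^2.

    Discrete stand-in for the integral penalty on mu'' (equally spaced grid assumed).
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Second-difference matrix D of shape (n - 2, n): row i is e_i - 2 e_{i+1} + e_{i+2}.
    D = np.diff(np.eye(n), n=2, axis=0)
    A = np.eye(n) + lam * D.T @ D
    return np.linalg.solve(A, y)
```

As with the cubic smoothing spline, linear data are left unchanged for every $\lambda$ (they incur zero penalty), and increasing $\lambda$ trades fidelity to the data for a smoother fit.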





Statistical inference for disordered sphere packings

Jeffrey Picka.

Source: Statistics Surveys, Volume 6, 74--112.

Abstract:
This paper gives an overview of statistical inference for disordered sphere packing processes. These processes are used extensively in physics and engineering in order to represent the internal structure of composite materials, packed bed reactors, and powders at rest, and are used as initial arrangements of grains in the study of avalanches and other problems involving powders in motion. Packing processes are spatial processes which are neither stationary nor ergodic. Classical spatial statistical models and procedures cannot be applied to these processes, but alternative models and procedures can be developed based on ideas from statistical physics. Most of the development of models and statistics for sphere packings has been undertaken by scientists and engineers. This review summarizes their results from an inferential perspective.