it

Propensity score weighting for causal inference with multiple treatments

Fan Li, Fan Li.

Source: The Annals of Applied Statistics, Volume 13, Number 4, 2389--2415.

Abstract:
Causal or unconfounded descriptive comparisons between multiple groups are common in observational studies. Motivated from a racial disparity study in health services research, we propose a unified propensity score weighting framework, the balancing weights, for estimating causal effects with multiple treatments. These weights incorporate the generalized propensity scores to balance the weighted covariate distribution of each treatment group, all weighted toward a common prespecified target population. The class of balancing weights include several existing approaches such as the inverse probability weights and trimming weights as special cases. Within this framework, we propose a set of target estimands based on linear contrasts. We further develop the generalized overlap weights, constructed as the product of the inverse probability weights and the harmonic mean of the generalized propensity scores. The generalized overlap weighting scheme corresponds to the target population with the most overlap in covariates across the multiple treatments. These weights are bounded and thus bypass the problem of extreme propensities. We show that the generalized overlap weights minimize the total asymptotic variance of the moment weighting estimators for the pairwise contrasts within the class of balancing weights. We consider two balance check criteria and propose a new sandwich variance estimator for estimating the causal effects with generalized overlap weights. We apply these methods to study the racial disparities in medical expenditure between several racial groups using the 2009 Medical Expenditure Panel Survey (MEPS) data. Simulations were carried out to compare with existing methods.




it

Predicting paleoclimate from compositional data using multivariate Gaussian process inverse prediction

John R. Tipton, Mevin B. Hooten, Connor Nolan, Robert K. Booth, Jason McLachlan.

Source: The Annals of Applied Statistics, Volume 13, Number 4, 2363--2388.

Abstract:
Multivariate compositional count data arise in many applications including ecology, microbiology, genetics and paleoclimate. A frequent question in the analysis of multivariate compositional count data is what underlying values of a covariate(s) give rise to the observed composition. Learning the relationship between covariates and the compositional count allows for inverse prediction of unobserved covariates given compositional count observations. Gaussian processes provide a flexible framework for modeling functional responses with respect to a covariate without assuming a functional form. Many scientific disciplines use Gaussian process approximations to improve prediction and make inference on latent processes and parameters. When prediction is desired on unobserved covariates given realizations of the response variable, this is called inverse prediction. Because inverse prediction is often mathematically and computationally challenging, predicting unobserved covariates often requires fitting models that are different from the hypothesized generative model. We present a novel computational framework that allows for efficient inverse prediction using a Gaussian process approximation to generative models. Our framework enables scientific learning about how the latent processes co-vary with respect to covariates while simultaneously providing predictions of missing covariates. The proposed framework is capable of efficiently exploring the high dimensional, multi-modal latent spaces that arise in the inverse problem. To demonstrate flexibility, we apply our method in a generalized linear model framework to predict latent climate states given multivariate count data. Based on cross-validation, our model has predictive skill competitive with current methods while simultaneously providing formal, statistical inference on the underlying community dynamics of the biological system previously not available.




it

A latent discrete Markov random field approach to identifying and classifying historical forest communities based on spatial multivariate tree species counts

Stephen Berg, Jun Zhu, Murray K. Clayton, Monika E. Shea, David J. Mladenoff.

Source: The Annals of Applied Statistics, Volume 13, Number 4, 2312--2340.

Abstract:
The Wisconsin Public Land Survey database describes historical forest composition at high spatial resolution and is of interest in ecological studies of forest composition in Wisconsin just prior to significant Euro-American settlement. For such studies it is useful to identify recurring subpopulations of tree species known as communities, but standard clustering approaches for subpopulation identification do not account for dependence between spatially nearby observations. Here, we develop and fit a latent discrete Markov random field model for the purpose of identifying and classifying historical forest communities based on spatially referenced multivariate tree species counts across Wisconsin. We show empirically for the actual dataset and through simulation that our latent Markov random field modeling approach improves prediction and parameter estimation performance. For model fitting we introduce a new stochastic approximation algorithm which enables computationally efficient estimation and classification of large amounts of spatial multivariate count data.




it

Fitting a deeply nested hierarchical model to a large book review dataset using a moment-based estimator

Ningshan Zhang, Kyle Schmaus, Patrick O. Perry.

Source: The Annals of Applied Statistics, Volume 13, Number 4, 2260--2288.

Abstract:
We consider a particular instance of a common problem in recommender systems, using a database of book reviews to inform user-targeted recommendations. In our dataset, books are categorized into genres and subgenres. To exploit this nested taxonomy, we use a hierarchical model that enables information pooling across across similar items at many levels within the genre hierarchy. The main challenge in deploying this model is computational. The data sizes are large and fitting the model at scale using off-the-shelf maximum likelihood procedures is prohibitive. To get around this computational bottleneck, we extend a moment-based fitting procedure proposed for fitting single-level hierarchical models to the general case of arbitrarily deep hierarchies. This extension is an order of magnitude faster than standard maximum likelihood procedures. The fitting method can be deployed beyond recommender systems to general contexts with deeply nested hierarchical generalized linear mixed models.




it

Joint model of accelerated failure time and mechanistic nonlinear model for censored covariates, with application in HIV/AIDS

Hongbin Zhang, Lang Wu.

Source: The Annals of Applied Statistics, Volume 13, Number 4, 2140--2157.

Abstract:
For a time-to-event outcome with censored time-varying covariates, a joint Cox model with a linear mixed effects model is the standard modeling approach. In some applications such as AIDS studies, mechanistic nonlinear models are available for some covariate process such as viral load during anti-HIV treatments, derived from the underlying data-generation mechanisms and disease progression. Such a mechanistic nonlinear covariate model may provide better-predicted values when the covariates are left censored or mismeasured. When the focus is on the impact of the time-varying covariate process on the survival outcome, an accelerated failure time (AFT) model provides an excellent alternative to the Cox proportional hazard model since an AFT model is formulated to allow the influence of the outcome by the entire covariate process. In this article, we consider a nonlinear mixed effects model for the censored covariates in an AFT model, implemented using a Monte Carlo EM algorithm, under the framework of a joint model for simultaneous inference. We apply the joint model to an HIV/AIDS data to gain insights for assessing the association between viral load and immunological restoration during antiretroviral therapy. Simulation is conducted to compare model performance when the covariate model and the survival model are misspecified.




it

Fire seasonality identification with multimodality tests

Jose Ameijeiras-Alonso, Akli Benali, Rosa M. Crujeiras, Alberto Rodríguez-Casal, José M. C. Pereira.

Source: The Annals of Applied Statistics, Volume 13, Number 4, 2120--2139.

Abstract:
Understanding the role of vegetation fires in the Earth system is an important environmental problem. Although fire occurrence is influenced by natural factors, human activity related to land use and management has altered the temporal patterns of fire in several regions of the world. Hence, for a better insight into fires regimes it is of special interest to analyze where human activity has altered fire seasonality. For doing so, multimodality tests are a useful tool for determining the number of annual fire peaks. The periodicity of fires and their complex distributional features motivate the use of nonparametric circular statistics. The unsatisfactory performance of previous circular nonparametric proposals for testing multimodality justifies the introduction of a new approach, considering an adapted version of the excess mass statistic, jointly with a bootstrap calibration algorithm. A systematic application of the test on the Russia–Kazakhstan area is presented in order to determine how many fire peaks can be identified in this region. A False Discovery Rate correction, accounting for the spatial dependence of the data, is also required.




it

Statistical inference for partially observed branching processes with application to cell lineage tracking of in vivo hematopoiesis

Jason Xu, Samson Koelle, Peter Guttorp, Chuanfeng Wu, Cynthia Dunbar, Janis L. Abkowitz, Vladimir N. Minin.

Source: The Annals of Applied Statistics, Volume 13, Number 4, 2091--2119.

Abstract:
Single-cell lineage tracking strategies enabled by recent experimental technologies have produced significant insights into cell fate decisions, but lack the quantitative framework necessary for rigorous statistical analysis of mechanistic models describing cell division and differentiation. In this paper, we develop such a framework with corresponding moment-based parameter estimation techniques for continuous-time, multi-type branching processes. Such processes provide a probabilistic model of how cells divide and differentiate, and we apply our method to study hematopoiesis , the mechanism of blood cell production. We derive closed-form expressions for higher moments in a general class of such models. These analytical results allow us to efficiently estimate parameters of much richer statistical models of hematopoiesis than those used in previous statistical studies. To our knowledge, the method provides the first rate inference procedure for fitting such models to time series data generated from cellular barcoding experiments. After validating the methodology in simulation studies, we apply our estimator to hematopoietic lineage tracking data from rhesus macaques. Our analysis provides a more complete understanding of cell fate decisions during hematopoiesis in nonhuman primates, which may be more relevant to human biology and clinical strategies than previous findings from murine studies. For example, in addition to previously estimated hematopoietic stem cell self-renewal rate, we are able to estimate fate decision probabilities and to compare structurally distinct models of hematopoiesis using cross validation. These estimates of fate decision probabilities and our model selection results should help biologists compare competing hypotheses about how progenitor cells differentiate. The methodology is transferrable to a large class of stochastic compartmental and multi-type branching models, commonly used in studies of cancer progression, epidemiology and many other fields.




it

A semiparametric modeling approach using Bayesian Additive Regression Trees with an application to evaluate heterogeneous treatment effects

Bret Zeldow, Vincent Lo Re III, Jason Roy.

Source: The Annals of Applied Statistics, Volume 13, Number 3, 1989--2010.

Abstract:
Bayesian Additive Regression Trees (BART) is a flexible machine learning algorithm capable of capturing nonlinearities between an outcome and covariates and interactions among covariates. We extend BART to a semiparametric regression framework in which the conditional expectation of an outcome is a function of treatment, its effect modifiers, and confounders. The confounders are allowed to have unspecified functional form, while treatment and effect modifiers that are directly related to the research question are given a linear form. The result is a Bayesian semiparametric linear regression model where the posterior distribution of the parameters of the linear part can be interpreted as in parametric Bayesian regression. This is useful in situations where a subset of the variables are of substantive interest and the others are nuisance variables that we would like to control for. An example of this occurs in causal modeling with the structural mean model (SMM). Under certain causal assumptions, our method can be used as a Bayesian SMM. Our methods are demonstrated with simulation studies and an application to dataset involving adults with HIV/Hepatitis C coinfection who newly initiate antiretroviral therapy. The methods are available in an R package called semibart.




it

Incorporating conditional dependence in latent class models for probabilistic record linkage: Does it matter?

Huiping Xu, Xiaochun Li, Changyu Shen, Siu L. Hui, Shaun Grannis.

Source: The Annals of Applied Statistics, Volume 13, Number 3, 1753--1790.

Abstract:
The conditional independence assumption of the Felligi and Sunter (FS) model in probabilistic record linkage is often violated when matching real-world data. Ignoring conditional dependence has been shown to seriously bias parameter estimates. However, in record linkage, the ultimate goal is to inform the match status of record pairs and therefore, record linkage algorithms should be evaluated in terms of matching accuracy. In the literature, more flexible models have been proposed to relax the conditional independence assumption, but few studies have assessed whether such accommodations improve matching accuracy. In this paper, we show that incorporating the conditional dependence appropriately yields comparable or improved matching accuracy than the FS model using three real-world data linkage examples. Through a simulation study, we further investigate when conditional dependence models provide improved matching accuracy. Our study shows that the FS model is generally robust to the conditional independence assumption and provides comparable matching accuracy as the more complex conditional dependence models. However, when the match prevalence approaches 0% or 100% and conditional dependence exists in the dominating class, it is necessary to address conditional dependence as the FS model produces suboptimal matching accuracy. The need to address conditional dependence becomes less important when highly discriminating fields are used. Our simulation study also shows that conditional dependence models with misspecified dependence structure could produce less accurate record matching than the FS model and therefore we caution against the blind use of conditional dependence models.




it

Sequential decision model for inference and prediction on nonuniform hypergraphs with application to knot matching from computational forestry

Seong-Hwan Jun, Samuel W. K. Wong, James V. Zidek, Alexandre Bouchard-Côté.

Source: The Annals of Applied Statistics, Volume 13, Number 3, 1678--1707.

Abstract:
In this paper, we consider the knot-matching problem arising in computational forestry. The knot-matching problem is an important problem that needs to be solved to advance the state of the art in automatic strength prediction of lumber. We show that this problem can be formulated as a quadripartite matching problem and develop a sequential decision model that admits efficient parameter estimation along with a sequential Monte Carlo sampler on graph matching that can be utilized for rapid sampling of graph matching. We demonstrate the effectiveness of our methods on 30 manually annotated boards and present findings from various simulation studies to provide further evidence supporting the efficacy of our methods.




it

Network classification with applications to brain connectomics

Jesús D. Arroyo Relión, Daniel Kessler, Elizaveta Levina, Stephan F. Taylor.

Source: The Annals of Applied Statistics, Volume 13, Number 3, 1648--1677.

Abstract:
While statistical analysis of a single network has received a lot of attention in recent years, with a focus on social networks, analysis of a sample of networks presents its own challenges which require a different set of analytic tools. Here we study the problem of classification of networks with labeled nodes, motivated by applications in neuroimaging. Brain networks are constructed from imaging data to represent functional connectivity between regions of the brain, and previous work has shown the potential of such networks to distinguish between various brain disorders, giving rise to a network classification problem. Existing approaches tend to either treat all edge weights as a long vector, ignoring the network structure, or focus on graph topology as represented by summary measures while ignoring the edge weights. Our goal is to design a classification method that uses both the individual edge information and the network structure of the data in a computationally efficient way, and that can produce a parsimonious and interpretable representation of differences in brain connectivity patterns between classes. We propose a graph classification method that uses edge weights as predictors but incorporates the network nature of the data via penalties that promote sparsity in the number of nodes, in addition to the usual sparsity penalties that encourage selection of edges. We implement the method via efficient convex optimization and provide a detailed analysis of data from two fMRI studies of schizophrenia.




it

Modeling seasonality and serial dependence of electricity price curves with warping functional autoregressive dynamics

Ying Chen, J. S. Marron, Jiejie Zhang.

Source: The Annals of Applied Statistics, Volume 13, Number 3, 1590--1616.

Abstract:
Electricity prices are high dimensional, serially dependent and have seasonal variations. We propose a Warping Functional AutoRegressive (WFAR) model that simultaneously accounts for the cross time-dependence and seasonal variations of the large dimensional data. In particular, electricity price curves are obtained by smoothing over the $24$ discrete hourly prices on each day. In the functional domain, seasonal phase variations are separated from level amplitude changes in a warping process with the Fisher–Rao distance metric, and the aligned (season-adjusted) electricity price curves are modeled in the functional autoregression framework. In a real application, the WFAR model provides superior out-of-sample forecast accuracy in both a normal functioning market, Nord Pool, and an extreme situation, the California market. The forecast performance as well as the relative accuracy improvement are stable for different markets and different time periods.




it

Distributional regression forests for probabilistic precipitation forecasting in complex terrain

Lisa Schlosser, Torsten Hothorn, Reto Stauffer, Achim Zeileis.

Source: The Annals of Applied Statistics, Volume 13, Number 3, 1564--1589.

Abstract:
To obtain a probabilistic model for a dependent variable based on some set of explanatory variables, a distributional approach is often adopted where the parameters of the distribution are linked to regressors. In many classical models this only captures the location of the distribution but over the last decade there has been increasing interest in distributional regression approaches modeling all parameters including location, scale and shape. Notably, so-called nonhomogeneous Gaussian regression (NGR) models both mean and variance of a Gaussian response and is particularly popular in weather forecasting. Moreover, generalized additive models for location, scale and shape (GAMLSS) provide a framework where each distribution parameter is modeled separately capturing smooth linear or nonlinear effects. However, when variable selection is required and/or there are nonsmooth dependencies or interactions (especially unknown or of high-order), it is challenging to establish a good GAMLSS. A natural alternative in these situations would be the application of regression trees or random forests but, so far, no general distributional framework is available for these. Therefore, a framework for distributional regression trees and forests is proposed that blends regression trees and random forests with classical distributions from the GAMLSS framework as well as their censored or truncated counterparts. To illustrate these novel approaches in practice, they are employed to obtain probabilistic precipitation forecasts at numerous sites in a mountainous region (Tyrol, Austria) based on a large number of numerical weather prediction quantities. It is shown that the novel distributional regression forests automatically select variables and interactions, performing on par or often even better than GAMLSS specified either through prior meteorological knowledge or a computationally more demanding boosting approach.




it

Spatio-temporal short-term wind forecast: A calibrated regime-switching method

Ahmed Aziz Ezzat, Mikyoung Jun, Yu Ding.

Source: The Annals of Applied Statistics, Volume 13, Number 3, 1484--1510.

Abstract:
Accurate short-term forecasts are indispensable for the integration of wind energy in power grids. On a wind farm, local wind conditions exhibit sizeable variations at a fine temporal resolution. Existing statistical models may capture the in-sample variations in wind behavior, but are often shortsighted to those occurring in the near future, that is, in the forecast horizon. The calibrated regime-switching method proposed in this paper introduces an action of regime dependent calibration on the predictand (here the wind speed variable), which helps correct the bias resulting from out-of-sample variations in wind behavior. This is achieved by modeling the calibration as a function of two elements: the wind regime at the time of the forecast (and the calibration is therefore regime dependent), and the runlength, which is the time elapsed since the last observed regime change. In addition to regime-switching dynamics, the proposed model also accounts for other features of wind fields: spatio-temporal dependencies, transport effect of wind and nonstationarity. Using one year of turbine-specific wind data, we show that the calibrated regime-switching method can offer a wide margin of improvement over existing forecasting methods in terms of both wind speed and power.




it

Identifying multiple changes for a functional data sequence with application to freeway traffic segmentation

Jeng-Min Chiou, Yu-Ting Chen, Tailen Hsing.

Source: The Annals of Applied Statistics, Volume 13, Number 3, 1430--1463.

Abstract:
Motivated by the study of road segmentation partitioned by shifts in traffic conditions along a freeway, we introduce a two-stage procedure, Dynamic Segmentation and Backward Elimination (DSBE), for identifying multiple changes in the mean functions for a sequence of functional data. The Dynamic Segmentation procedure searches for all possible changepoints using the derived global optimality criterion coupled with the local strategy of at-most-one-changepoint by dividing the entire sequence into individual subsequences that are recursively adjusted until convergence. Then, the Backward Elimination procedure verifies these changepoints by iteratively testing the unlikely changes to ensure their significance until no more changepoints can be removed. By combining the local strategy with the global optimal changepoint criterion, the DSBE algorithm is conceptually simple and easy to implement and performs better than the binary segmentation-based approach at detecting small multiple changes. The consistency property of the changepoint estimators and the convergence of the algorithm are proved. We apply DSBE to detect changes in traffic streams through real freeway traffic data. The practical performance of DSBE is also investigated through intensive simulation studies for various scenarios.




it

A hidden Markov model approach to characterizing the photo-switching behavior of fluorophores

Lekha Patel, Nils Gustafsson, Yu Lin, Raimund Ober, Ricardo Henriques, Edward Cohen.

Source: The Annals of Applied Statistics, Volume 13, Number 3, 1397--1429.

Abstract:
Fluorescing molecules (fluorophores) that stochastically switch between photon-emitting and dark states underpin some of the most celebrated advancements in super-resolution microscopy. While this stochastic behavior has been heavily exploited, full characterization of the underlying models can potentially drive forward further imaging methodologies. Under the assumption that fluorophores move between fluorescing and dark states as continuous time Markov processes, the goal is to use a sequence of images to select a model and estimate the transition rates. We use a hidden Markov model to relate the observed discrete time signal to the hidden continuous time process. With imaging involving several repeat exposures of the fluorophore, we show the observed signal depends on both the current and past states of the hidden process, producing emission probabilities that depend on the transition rate parameters to be estimated. To tackle this unusual coupling of the transition and emission probabilities, we conceive transmission (transition-emission) matrices that capture all dependencies of the model. We provide a scheme of computing these matrices and adapt the forward-backward algorithm to compute a likelihood which is readily optimized to provide rate estimates. When confronted with several model proposals, combining this procedure with the Bayesian Information Criterion provides accurate model selection.




it

Imputation and post-selection inference in models with missing data: An application to colorectal cancer surveillance guidelines

Lin Liu, Yuqi Qiu, Loki Natarajan, Karen Messer.

Source: The Annals of Applied Statistics, Volume 13, Number 3, 1370--1396.

Abstract:
It is common to encounter missing data among the potential predictor variables in the setting of model selection. For example, in a recent study we attempted to improve the US guidelines for risk stratification after screening colonoscopy ( Cancer Causes Control 27 (2016) 1175–1185), with the aim to help reduce both overuse and underuse of follow-on surveillance colonoscopy. The goal was to incorporate selected additional informative variables into a neoplasia risk-prediction model, going beyond the three currently established risk factors, using a large dataset pooled from seven different prospective studies in North America. Unfortunately, not all candidate variables were collected in all studies, so that one or more important potential predictors were missing on over half of the subjects. Thus, while variable selection was a main focus of the study, it was necessary to address the substantial amount of missing data. Multiple imputation can effectively address missing data, and there are also good approaches to incorporate the variable selection process into model-based confidence intervals. However, there is not consensus on appropriate methods of inference which address both issues simultaneously. Our goal here is to study the properties of model-based confidence intervals in the setting of imputation for missing data followed by variable selection. We use both simulation and theory to compare three approaches to such post-imputation-selection inference: a multiple-imputation approach based on Rubin’s Rules for variance estimation ( Comput. Statist. Data Anal. 71 (2014) 758–770); a single imputation-selection followed by bootstrap percentile confidence intervals; and a new bootstrap model-averaging approach presented here, following Efron ( J. Amer. Statist. Assoc. 109 (2014) 991–1007). We investigate relative strengths and weaknesses of each method. The “Rubin’s Rules” multiple imputation estimator can have severe undercoverage, and is not recommended. The imputation-selection estimator with bootstrap percentile confidence intervals works well. The bootstrap-model-averaged estimator, with the “Efron’s Rules” estimated variance, may be preferred if the true effect sizes are moderate. We apply these results to the colorectal neoplasia risk-prediction problem which motivated the present work.




it

Stratonovich type integration with respect to fractional Brownian motion with Hurst parameter less than $1/2$

Jorge A. León.

Source: Bernoulli, Volume 26, Number 3, 2436--2462.

Abstract:
Let $B^{H}$ be a fractional Brownian motion with Hurst parameter $Hin (0,1/2)$ and $p:mathbb{R} ightarrow mathbb{R}$ a polynomial function. The main purpose of this paper is to introduce a Stratonovich type stochastic integral with respect to $B^{H}$, whose domain includes the process $p(B^{H})$. That is, an integral that allows us to integrate $p(B^{H})$ with respect to $B^{H}$, which does not happen with the symmetric integral given by Russo and Vallois ( Probab. Theory Related Fields 97 (1993) 403–421) in general. Towards this end, we combine the approaches utilized by León and Nualart ( Stochastic Process. Appl. 115 (2005) 481–492), and Russo and Vallois ( Probab. Theory Related Fields 97 (1993) 403–421), whose aims are to extend the domain of the divergence operator for Gaussian processes and to define some stochastic integrals, respectively. Then, we study the relation between this Stratonovich integral and the extension of the divergence operator (see León and Nualart ( Stochastic Process. Appl. 115 (2005) 481–492)), an Itô formula and the existence of a unique solution of some Stratonovich stochastic differential equations. These last results have been analyzed by Alòs, León and Nualart ( Taiwanese J. Math. 5 (2001) 609–632), where the Hurst paramert $H$ belongs to the interval $(1/4,1/2)$.




it

Local law and Tracy–Widom limit for sparse stochastic block models

Jong Yun Hwang, Ji Oon Lee, Wooseok Yang.

Source: Bernoulli, Volume 26, Number 3, 2400--2435.

Abstract:
We consider the spectral properties of sparse stochastic block models, where $N$ vertices are partitioned into $K$ balanced communities. Under an assumption that the intra-community probability and inter-community probability are of similar order, we prove a local semicircle law up to the spectral edges, with an explicit formula on the deterministic shift of the spectral edge. We also prove that the fluctuation of the extremal eigenvalues is given by the GOE Tracy–Widom law after rescaling and centering the entries of sparse stochastic block models. Applying the result to sparse stochastic block models, we rigorously prove that there is a large gap between the outliers and the spectral edge without centering.




it

Frequency domain theory for functional time series: Variance decomposition and an invariance principle

Piotr Kokoszka, Neda Mohammadi Jouzdani.

Source: Bernoulli, Volume 26, Number 3, 2383--2399.

Abstract:
This paper is concerned with frequency domain theory for functional time series, which are temporally dependent sequences of functions in a Hilbert space. We consider a variance decomposition, which is more suitable for such a data structure than the variance decomposition based on the Karhunen–Loéve expansion. The decomposition we study uses eigenvalues of spectral density operators, which are functional analogs of the spectral density of a stationary scalar time series. We propose estimators of the variance components and derive convergence rates for their mean square error as well as their asymptotic normality. The latter is derived from a frequency domain invariance principle for the estimators of the spectral density operators. This principle is established for a broad class of linear time series models. It is a main contribution of the paper.




it

Bayesian linear regression for multivariate responses under group sparsity

Bo Ning, Seonghyun Jeong, Subhashis Ghosal.

Source: Bernoulli, Volume 26, Number 3, 2353--2382.

Abstract:
We study frequentist properties of a Bayesian high-dimensional multivariate linear regression model with correlated responses. The predictors are separated into many groups and the group structure is pre-determined. Two features of the model are unique: (i) group sparsity is imposed on the predictors; (ii) the covariance matrix is unknown and its dimensions can also be high. We choose a product of independent spike-and-slab priors on the regression coefficients and a new prior on the covariance matrix based on its eigendecomposition. Each spike-and-slab prior is a mixture of a point mass at zero and a multivariate density involving the $ell_{2,1}$-norm. We first obtain the posterior contraction rate, the bounds on the effective dimension of the model with high posterior probabilities. We then show that the multivariate regression coefficients can be recovered under certain compatibility conditions. Finally, we quantify the uncertainty for the regression coefficients with frequentist validity through a Bernstein–von Mises type theorem. The result leads to selection consistency for the Bayesian method. We derive the posterior contraction rate using the general theory by constructing a suitable test from the first principle using moment bounds for certain likelihood ratios. This leads to posterior concentration around the truth with respect to the average Rényi divergence of order $1/2$. This technique of obtaining the required tests for posterior contraction rate could be useful in many other problems.




it

Weighted Lépingle inequality

Pavel Zorin-Kranich.

Source: Bernoulli, Volume 26, Number 3, 2311--2318.

Abstract:
We prove an estimate for weighted $p$th moments of the pathwise $r$-variation of a martingale in terms of the $A_{p}$ characteristic of the weight. The novelty of the proof is that we avoid real interpolation techniques.




it

On Sobolev tests of uniformity on the circle with an extension to the sphere

Sreenivasa Rao Jammalamadaka, Simos Meintanis, Thomas Verdebout.

Source: Bernoulli, Volume 26, Number 3, 2226--2252.

Abstract:
Circular and spherical data arise in many applications, especially in biology, Earth sciences and astronomy. In dealing with such data, one of the preliminary steps before any further inference, is to test if such data is isotropic, that is, uniformly distributed around the circle or the sphere. In view of its importance, there is a considerable literature on the topic. In the present work, we provide new tests of uniformity on the circle based on original asymptotic results. Our tests are motivated by the shape of locally and asymptotically maximin tests of uniformity against generalized von Mises distributions. We show that they are uniformly consistent. Empirical power comparisons with several competing procedures are presented via simulations. The new tests detect particularly well multimodal alternatives such as mixtures of von Mises distributions. A practically-oriented combination of the new tests with already existing Sobolev tests is proposed. An extension to testing uniformity on the sphere, along with some simulations, is included. The procedures are illustrated on a real dataset.




it

Exponential integrability and exit times of diffusions on sub-Riemannian and metric measure spaces

Anton Thalmaier, James Thompson.

Source: Bernoulli, Volume 26, Number 3, 2202--2225.

Abstract:
In this article, we derive moment estimates, exponential integrability, concentration inequalities and exit times estimates for canonical diffusions firstly on sub-Riemannian limits of Riemannian foliations and secondly in the nonsmooth setting of $operatorname{RCD}^{*}(K,N)$ spaces. In each case, the necessary ingredients are Itô’s formula and a comparison theorem for the Laplacian, for which we refer to the recent literature. As an application, we derive pointwise Carmona-type estimates on eigenfunctions of Schrödinger operators.




it

Scaling limits for super-replication with transient price impact

Peter Bank, Yan Dolinsky.

Source: Bernoulli, Volume 26, Number 3, 2176--2201.

Abstract:
We prove a scaling limit theorem for the super-replication cost of options in a Cox–Ross–Rubinstein binomial model with transient price impact. The correct scaling turns out to keep the market depth parameter constant while resilience over fixed periods of time grows in inverse proportion with the duration between trading times. For vanilla options, the scaling limit is found to coincide with the one obtained by PDE-methods in ( Math. Finance 22 (2012) 250–276) for models with purely temporary price impact. These models are a special case of our framework and so our probabilistic scaling limit argument allows one to expand the scope of the scaling limit result to path-dependent options.




it

Directional differentiability for supremum-type functionals: Statistical applications

Javier Cárcamo, Antonio Cuevas, Luis-Alberto Rodríguez.

Source: Bernoulli, Volume 26, Number 3, 2143--2175.

Abstract:
We show that various functionals related to the supremum of a real function defined on an arbitrary set or a measure space are Hadamard directionally differentiable. We specifically consider the supremum norm, the supremum, the infimum, and the amplitude of a function. The (usually non-linear) derivatives of these maps adopt simple expressions under suitable assumptions on the underlying space. As an application, we improve and extend to the multidimensional case the results in Raghavachari ( Ann. Statist. 1 (1973) 67–73) regarding the limiting distributions of Kolmogorov–Smirnov type statistics under the alternative hypothesis. Similar results are obtained for analogous statistics associated with copulas. We additionally solve an open problem about the Berk–Jones statistic proposed by Jager and Wellner (In A Festschrift for Herman Rubin (2004) 319–331 IMS). Finally, the asymptotic distribution of maximum mean discrepancies over Donsker classes of functions is derived.




it

Noncommutative Lebesgue decomposition and contiguity with applications in quantum statistics

Akio Fujiwara, Koichi Yamagata.

Source: Bernoulli, Volume 26, Number 3, 2105--2142.

Abstract:
We herein develop a theory of contiguity in the quantum domain based upon a novel quantum analogue of the Lebesgue decomposition. The theory thus formulated is pertinent to the weak quantum local asymptotic normality introduced in the previous paper [Yamagata, Fujiwara, and Gill, Ann. Statist. 41 (2013) 2197–2217], yielding substantial enlargement of the scope of quantum statistics.




it

First-order covariance inequalities via Stein’s method

Marie Ernst, Gesine Reinert, Yvik Swan.

Source: Bernoulli, Volume 26, Number 3, 2051--2081.

Abstract:
We propose probabilistic representations for inverse Stein operators (i.e., solutions to Stein equations) under general conditions; in particular, we deduce new simple expressions for the Stein kernel. These representations allow to deduce uniform and nonuniform Stein factors (i.e., bounds on solutions to Stein equations) and lead to new covariance identities expressing the covariance between arbitrary functionals of an arbitrary univariate target in terms of a weighted covariance of the derivatives of the functionals. Our weights are explicit, easily computable in most cases and expressed in terms of objects familiar within the context of Stein’s method. Applications of the Cauchy–Schwarz inequality to these weighted covariance identities lead to sharp upper and lower covariance bounds and, in particular, weighted Poincaré inequalities. Many examples are given and, in particular, classical variance bounds due to Klaassen, Brascamp and Lieb or Otto and Menz are corollaries. Connections with more recent literature are also detailed.




it

On sampling from a log-concave density using kinetic Langevin diffusions

Arnak S. Dalalyan, Lionel Riou-Durand.

Source: Bernoulli, Volume 26, Number 3, 1956--1988.

Abstract:
Langevin diffusion processes and their discretizations are often used for sampling from a target density. The most convenient framework for assessing the quality of such a sampling scheme corresponds to smooth and strongly log-concave densities defined on $mathbb{R}^{p}$. The present work focuses on this framework and studies the behavior of the Monte Carlo algorithm based on discretizations of the kinetic Langevin diffusion. We first prove the geometric mixing property of the kinetic Langevin diffusion with a mixing rate that is optimal in terms of its dependence on the condition number. We then use this result for obtaining improved guarantees of sampling using the kinetic Langevin Monte Carlo method, when the quality of sampling is measured by the Wasserstein distance. We also consider the situation where the Hessian of the log-density of the target distribution is Lipschitz-continuous. In this case, we introduce a new discretization of the kinetic Langevin diffusion and prove that this leads to a substantial improvement of the upper bound on the sampling error measured in Wasserstein distance.




it

Busemann functions and semi-infinite O’Connell–Yor polymers

Tom Alberts, Firas Rassoul-Agha, Mackenzie Simper.

Source: Bernoulli, Volume 26, Number 3, 1927--1955.

Abstract:
We prove that given any fixed asymptotic velocity, the finite length O’Connell–Yor polymer has an infinite length limit satisfying the law of large numbers with this velocity. By a Markovian property of the quenched polymer this reduces to showing the existence of Busemann functions : almost sure limits of ratios of random point-to-point partition functions. The key ingredients are the Burke property of the O’Connell–Yor polymer and a comparison lemma for the ratios of partition functions. We also show the existence of infinite length limits in the Brownian last passage percolation model.




it

On the best constant in the martingale version of Fefferman’s inequality

Adam Osękowski.

Source: Bernoulli, Volume 26, Number 3, 1912--1926.

Abstract:
Let $X=(X_{t})_{tgeq 0}in H^{1}$ and $Y=(Y_{t})_{tgeq 0}in{mathrm{BMO}} $ be arbitrary continuous-path martingales. The paper contains the proof of the inequality egin{equation*}mathbb{E}int _{0}^{infty }iglvert dlangle X,Y angle_{t}igrvert leq sqrt{2}Vert XVert _{H^{1}}Vert YVert _{mathrm{BMO}_{2}},end{equation*} and the constant $sqrt{2}$ is shown to be the best possible. The proof rests on the construction of a certain special function, enjoying appropriate size and concavity conditions.




it

Functional weak limit theorem for a local empirical process of non-stationary time series and its application

Ulrike Mayer, Henryk Zähle, Zhou Zhou.

Source: Bernoulli, Volume 26, Number 3, 1891--1911.

Abstract:
We derive a functional weak limit theorem for a local empirical process of a wide class of piece-wise locally stationary (PLS) time series. The latter result is applied to derive the asymptotics of weighted empirical quantiles and weighted V-statistics of non-stationary time series. The class of admissible underlying time series is illustrated by means of PLS linear processes and PLS ARCH processes.




it

Logarithmic Sobolev inequalities for finite spin systems and applications

Holger Sambale, Arthur Sinulis.

Source: Bernoulli, Volume 26, Number 3, 1863--1890.

Abstract:
We derive sufficient conditions for a probability measure on a finite product space (a spin system ) to satisfy a (modified) logarithmic Sobolev inequality. We establish these conditions for various examples, such as the (vertex-weighted) exponential random graph model, the random coloring and the hard-core model with fugacity. This leads to two separate branches of applications. The first branch is given by mixing time estimates of the Glauber dynamics. The proofs do not rely on coupling arguments, but instead use functional inequalities. As a byproduct, this also yields exponential decay of the relative entropy along the Glauber semigroup. Secondly, we investigate the concentration of measure phenomenon (particularly of higher order) for these spin systems. We show the effect of better concentration properties by centering not around the mean, but around a stochastic term in the exponential random graph model. From there, one can deduce a central limit theorem for the number of triangles from the CLT of the edge count. In the Erdős–Rényi model the first-order approximation leads to a quantification and a proof of a central limit theorem for subgraph counts.




it

Kernel and wavelet density estimators on manifolds and more general metric spaces

Galatia Cleanthous, Athanasios G. Georgiadis, Gerard Kerkyacharian, Pencho Petrushev, Dominique Picard.

Source: Bernoulli, Volume 26, Number 3, 1832--1862.

Abstract:
We consider the problem of estimating the density of observations taking values in classical or nonclassical spaces such as manifolds and more general metric spaces. Our setting is quite general but also sufficiently rich in allowing the development of smooth functional calculus with well localized spectral kernels, Besov regularity spaces, and wavelet type systems. Kernel and both linear and nonlinear wavelet density estimators are introduced and studied. Convergence rates for these estimators are established and discussed.




it

Optimal functional supervised classification with separation condition

Sébastien Gadat, Sébastien Gerchinovitz, Clément Marteau.

Source: Bernoulli, Volume 26, Number 3, 1797--1831.

Abstract:
We consider the binary supervised classification problem with the Gaussian functional model introduced in ( Math. Methods Statist. 22 (2013) 213–225). Taking advantage of the Gaussian structure, we design a natural plug-in classifier and derive a family of upper bounds on its worst-case excess risk over Sobolev spaces. These bounds are parametrized by a separation distance quantifying the difficulty of the problem, and are proved to be optimal (up to logarithmic factors) through matching minimax lower bounds. Using the recent works of (In Advances in Neural Information Processing Systems (2014) 3437–3445 Curran Associates) and ( Ann. Statist. 44 (2016) 982–1009), we also derive a logarithmic lower bound showing that the popular $k$-nearest neighbors classifier is far from optimality in this specific functional setting.




it

A fast algorithm with minimax optimal guarantees for topic models with an unknown number of topics

Xin Bing, Florentina Bunea, Marten Wegkamp.

Source: Bernoulli, Volume 26, Number 3, 1765--1796.

Abstract:
Topic models have become popular for the analysis of data that consists in a collection of n independent multinomial observations, with parameters $N_{i}inmathbb{N}$ and $Pi_{i}in[0,1]^{p}$ for $i=1,ldots,n$. The model links all cell probabilities, collected in a $p imes n$ matrix $Pi$, via the assumption that $Pi$ can be factorized as the product of two nonnegative matrices $Ain[0,1]^{p imes K}$ and $Win[0,1]^{K imes n}$. Topic models have been originally developed in text mining, when one browses through $n$ documents, based on a dictionary of $p$ words, and covering $K$ topics. In this terminology, the matrix $A$ is called the word-topic matrix, and is the main target of estimation. It can be viewed as a matrix of conditional probabilities, and it is uniquely defined, under appropriate separability assumptions, discussed in detail in this work. Notably, the unique $A$ is required to satisfy what is commonly known as the anchor word assumption, under which $A$ has an unknown number of rows respectively proportional to the canonical basis vectors in $mathbb{R}^{K}$. The indices of such rows are referred to as anchor words. Recent computationally feasible algorithms, with theoretical guarantees, utilize constructively this assumption by linking the estimation of the set of anchor words with that of estimating the $K$ vertices of a simplex. This crucial step in the estimation of $A$ requires $K$ to be known, and cannot be easily extended to the more realistic set-up when $K$ is unknown. This work takes a different view on anchor word estimation, and on the estimation of $A$. We propose a new method of estimation in topic models, that is not a variation on the existing simplex finding algorithms, and that estimates $K$ from the observed data. We derive new finite sample minimax lower bounds for the estimation of $A$, as well as new upper bounds for our proposed estimator. We describe the scenarios where our estimator is minimax adaptive. Our finite sample analysis is valid for any $n,N_{i},p$ and $K$, and both $p$ and $K$ are allowed to increase with $n$, a situation not handled well by previous analyses. We complement our theoretical results with a detailed simulation study. We illustrate that the new algorithm is faster and more accurate than the current ones, although we start out with a computational and theoretical disadvantage of not knowing the correct number of topics $K$, while we provide the competing methods with the correct value in our simulations.




it

Local differential privacy: Elbow effect in optimal density estimation and adaptation over Besov ellipsoids

Cristina Butucea, Amandine Dubois, Martin Kroll, Adrien Saumard.

Source: Bernoulli, Volume 26, Number 3, 1727--1764.

Abstract:
We address the problem of non-parametric density estimation under the additional constraint that only privatised data are allowed to be published and available for inference. For this purpose, we adopt a recent generalisation of classical minimax theory to the framework of local $alpha$-differential privacy and provide a lower bound on the rate of convergence over Besov spaces $mathcal{B}^{s}_{pq}$ under mean integrated $mathbb{L}^{r}$-risk. This lower bound is deteriorated compared to the standard setup without privacy, and reveals a twofold elbow effect. In order to fulfill the privacy requirement, we suggest adding suitably scaled Laplace noise to empirical wavelet coefficients. Upper bounds within (at most) a logarithmic factor are derived under the assumption that $alpha$ stays bounded as $n$ increases: A linear but non-adaptive wavelet estimator is shown to attain the lower bound whenever $pgeq r$ but provides a slower rate of convergence otherwise. An adaptive non-linear wavelet estimator with appropriately chosen smoothing parameters and thresholding is shown to attain the lower bound within a logarithmic factor for all cases.




it

On the probability distribution of the local times of diagonally operator-self-similar Gaussian fields with stationary increments

Kamran Kalbasi, Thomas Mountford.

Source: Bernoulli, Volume 26, Number 2, 1504--1534.

Abstract:
In this paper, we study the local times of vector-valued Gaussian fields that are ‘diagonally operator-self-similar’ and whose increments are stationary. Denoting the local time of such a Gaussian field around the spatial origin and over the temporal unit hypercube by $Z$, we show that there exists $lambdain(0,1)$ such that under some quite weak conditions, $lim_{n ightarrow+infty}frac{sqrt[n]{mathbb{E}(Z^{n})}}{n^{lambda}}$ and $lim_{x ightarrow+infty}frac{-logmathbb{P}(Z>x)}{x^{frac{1}{lambda}}}$ both exist and are strictly positive (possibly $+infty$). Moreover, we show that if the underlying Gaussian field is ‘strongly locally nondeterministic’, the above limits will be finite as well. These results are then applied to establish similar statements for the intersection local times of diagonally operator-self-similar Gaussian fields with stationary increments.




it

Limit theorems for long-memory flows on Wiener chaos

Shuyang Bai, Murad S. Taqqu.

Source: Bernoulli, Volume 26, Number 2, 1473--1503.

Abstract:
We consider a long-memory stationary process, defined not through a moving average type structure, but by a flow generated by a measure-preserving transform and by a multiple Wiener–Itô integral. The flow is described using a notion of mixing for infinite-measure spaces introduced by Krickeberg (In Proc. Fifth Berkeley Sympos. Math. Statist. and Probability (Berkeley, Calif., 1965/66), Vol. II: Contributions to Probability Theory, Part 2 (1967) 431–446 Univ. California Press). Depending on the interplay between the spreading rate of the flow and the order of the multiple integral, one can recover known central or non-central limit theorems, and also obtain joint convergence of multiple integrals of different orders.




it

A characterization of the finiteness of perpetual integrals of Lévy processes

Martin Kolb, Mladen Savov.

Source: Bernoulli, Volume 26, Number 2, 1453--1472.

Abstract:
We derive a criterium for the almost sure finiteness of perpetual integrals of Lévy processes for a class of real functions including all continuous functions and for general one-dimensional Lévy processes that drifts to plus infinity. This generalizes previous work of Döring and Kyprianou, who considered Lévy processes having a local time, leaving the general case as an open problem. It turns out, that the criterium in the general situation simplifies significantly in the situation, where the process has a local time, but we also demonstrate that in general our criterium can not be reduced. This answers an open problem posed in ( J. Theoret. Probab. 29 (2016) 1192–1198).




it

Around the entropic Talagrand inequality

Giovanni Conforti, Luigia Ripani.

Source: Bernoulli, Volume 26, Number 2, 1431--1452.

Abstract:
In this article, we study generalization of the classical Talagrand transport-entropy inequality in which the Wasserstein distance is replaced by the entropic transportation cost. This class of inequalities has been introduced in the recent work ( Probab. Theory Related Fields 174 (2019) 1–47), in connection with the study of Schrödinger bridges. We provide several equivalent characterizations in terms of reverse hypercontractivity for the heat semigroup, contractivity of the Hamilton–Jacobi–Bellman semigroup and dimension-free concentration of measure. Properties such as tensorization and relations to other functional inequalities are also investigated. In particular, we show that the inequalities studied in this article are implied by a Logarithmic Sobolev inequality and imply Talagrand inequality.




it

The moduli of non-differentiability for Gaussian random fields with stationary increments

Wensheng Wang, Zhonggen Su, Yimin Xiao.

Source: Bernoulli, Volume 26, Number 2, 1410--1430.

Abstract:
We establish the exact moduli of non-differentiability of Gaussian random fields with stationary increments. As an application of the result, we prove that the uniform Hölder condition for the maximum local times of Gaussian random fields with stationary increments obtained in Xiao (1997) is optimal. These results are applicable to fractional Riesz–Bessel processes and stationary Gaussian random fields in the Matérn and Cauchy classes.




it

Stratonovich stochastic differential equation with irregular coefficients: Girsanov’s example revisited

Ilya Pavlyukevich, Georgiy Shevchenko.

Source: Bernoulli, Volume 26, Number 2, 1381--1409.

Abstract:
In this paper, we study the Stratonovich stochastic differential equation $mathrm{d}X=|X|^{alpha }circ mathrm{d}B$, $alpha in (-1,1)$, which has been introduced by Cherstvy et al. ( New J. Phys. 15 (2013) 083039) in the context of analysis of anomalous diffusions in heterogeneous media. We determine its weak and strong solutions, which are homogeneous strong Markov processes spending zero time at $0$: for $alpha in (0,1)$, these solutions have the form egin{equation*}X_{t}^{ heta }=((1-alpha)B_{t}^{ heta })^{1/(1-alpha )},end{equation*} where $B^{ heta }$ is the $ heta $-skew Brownian motion driven by $B$ and starting at $frac{1}{1-alpha }(X_{0})^{1-alpha }$, $ heta in [-1,1]$, and $(x)^{gamma }=|x|^{gamma }operatorname{sign}x$; for $alpha in (-1,0]$, only the case $ heta =0$ is possible. The central part of the paper consists in the proof of the existence of a quadratic covariation $[f(B^{ heta }),B]$ for a locally square integrable function $f$ and is based on the time-reversion technique for Markovian diffusions.




it

On stability of traveling wave solutions for integro-differential equations related to branching Markov processes

Pasha Tkachov.

Source: Bernoulli, Volume 26, Number 2, 1354--1380.

Abstract:
The aim of this paper is to prove stability of traveling waves for integro-differential equations connected with branching Markov processes. In other words, the limiting law of the left-most particle of a (time-continuous) branching Markov process with a Lévy non-branching part is demonstrated. The key idea is to approximate the branching Markov process by a branching random walk and apply the result of Aïdékon [ Ann. Probab. 41 (2013) 1362–1426] on the limiting law of the latter one.




it

Consistent structure estimation of exponential-family random graph models with block structure

Michael Schweinberger.

Source: Bernoulli, Volume 26, Number 2, 1205--1233.

Abstract:
We consider the challenging problem of statistical inference for exponential-family random graph models based on a single observation of a random graph with complex dependence. To facilitate statistical inference, we consider random graphs with additional structure in the form of block structure. We have shown elsewhere that when the block structure is known, it facilitates consistency results for $M$-estimators of canonical and curved exponential-family random graph models with complex dependence, such as transitivity. In practice, the block structure is known in some applications (e.g., multilevel networks), but is unknown in others. When the block structure is unknown, the first and foremost question is whether it can be recovered with high probability based on a single observation of a random graph with complex dependence. The main consistency results of the paper show that it is possible to do so under weak dependence and smoothness conditions. These results confirm that exponential-family random graph models with block structure constitute a promising direction of statistical network analysis.




it

Characterization of probability distribution convergence in Wasserstein distance by $L^{p}$-quantization error function

Yating Liu, Gilles Pagès.

Source: Bernoulli, Volume 26, Number 2, 1171--1204.

Abstract:
We establish conditions to characterize probability measures by their $L^{p}$-quantization error functions in both $mathbb{R}^{d}$ and Hilbert settings. This characterization is two-fold: static (identity of two distributions) and dynamic (convergence for the $L^{p}$-Wasserstein distance). We first propose a criterion on the quantization level $N$, valid for any norm on $mathbb{R}^{d}$ and any order $p$ based on a geometrical approach involving the Voronoï diagram. Then, we prove that in the $L^{2}$-case on a (separable) Hilbert space, the condition on the level $N$ can be reduced to $N=2$, which is optimal. More quantization based characterization cases in dimension 1 and a discussion of the completeness of a distance defined by the quantization error function can be found at the end of this paper.




it

A Bayesian nonparametric approach to log-concave density estimation

Ester Mariucci, Kolyan Ray, Botond Szabó.

Source: Bernoulli, Volume 26, Number 2, 1070--1097.

Abstract:
The estimation of a log-concave density on $mathbb{R}$ is a canonical problem in the area of shape-constrained nonparametric inference. We present a Bayesian nonparametric approach to this problem based on an exponentiated Dirichlet process mixture prior and show that the posterior distribution converges to the log-concave truth at the (near-) minimax rate in Hellinger distance. Our proof proceeds by establishing a general contraction result based on the log-concave maximum likelihood estimator that prevents the need for further metric entropy calculations. We further present computationally more feasible approximations and both an empirical and hierarchical Bayes approach. All priors are illustrated numerically via simulations.




it

Degeneracy in sparse ERGMs with functions of degrees as sufficient statistics

Sumit Mukherjee.

Source: Bernoulli, Volume 26, Number 2, 1016--1043.

Abstract:
A sufficient criterion for “non-degeneracy” is given for Exponential Random Graph Models on sparse graphs with sufficient statistics which are functions of the degree sequence. This criterion explains why statistics such as alternating $k$-star are non-degenerate, whereas subgraph counts are degenerate. It is further shown that this criterion is “almost” tight. Existence of consistent estimates is then proved for non-degenerate Exponential Random Graph Models.




it

Stable processes conditioned to hit an interval continuously from the outside

Leif Döring, Philip Weissmann.

Source: Bernoulli, Volume 26, Number 2, 980--1015.

Abstract:
Conditioning stable Lévy processes on zero probability events recently became a tractable subject since several explicit formulas emerged from a deep analysis using the Lamperti transformations for self-similar Markov processes. In this article, we derive new harmonic functions and use them to explain how to condition stable processes to hit continuously a compact interval from the outside.




it

Recurrence of multidimensional persistent random walks. Fourier and series criteria

Peggy Cénac, Basile de Loynes, Yoann Offret, Arnaud Rousselle.

Source: Bernoulli, Volume 26, Number 2, 858--892.

Abstract:
The recurrence and transience of persistent random walks built from variable length Markov chains are investigated. It turns out that these stochastic processes can be seen as Lévy walks for which the persistence times depend on some internal Markov chain: they admit Markov random walk skeletons. A recurrence versus transience dichotomy is highlighted. Assuming the positive recurrence of the driving chain, a sufficient Fourier criterion for the recurrence, close to the usual Chung–Fuchs one, is given and a series criterion is derived. The key tool is the Nagaev–Guivarc’h method. Finally, we focus on particular two-dimensional persistent random walks, including directionally reinforced random walks, for which necessary and sufficient Fourier and series criteria are obtained. Inspired by ( Adv. Math. 208 (2007) 680–698), we produce a genuine counterexample to the conjecture of ( Adv. Math. 117 (1996) 239–252). As for the one-dimensional case studied in ( J. Theoret. Probab. 31 (2018) 232–243), it is easier for a persistent random walk than its skeleton to be recurrent. However, such examples are much more difficult to exhibit in the higher dimensional context. These results are based on a surprisingly novel – to our knowledge – upper bound for the Lévy concentration function associated with symmetric distributions.