en

Modeling seasonality and serial dependence of electricity price curves with warping functional autoregressive dynamics

Ying Chen, J. S. Marron, Jiejie Zhang.

Source: The Annals of Applied Statistics, Volume 13, Number 3, 1590--1616.

Abstract:
Electricity prices are high dimensional, serially dependent and have seasonal variations. We propose a Warping Functional AutoRegressive (WFAR) model that simultaneously accounts for the cross time-dependence and seasonal variations of the large dimensional data. In particular, electricity price curves are obtained by smoothing over the $24$ discrete hourly prices on each day. In the functional domain, seasonal phase variations are separated from level amplitude changes in a warping process with the Fisher–Rao distance metric, and the aligned (season-adjusted) electricity price curves are modeled in the functional autoregression framework. In a real application, the WFAR model provides superior out-of-sample forecast accuracy in both a normal functioning market, Nord Pool, and an extreme situation, the California market. The forecast performance as well as the relative accuracy improvement are stable for different markets and different time periods.





Identifying multiple changes for a functional data sequence with application to freeway traffic segmentation

Jeng-Min Chiou, Yu-Ting Chen, Tailen Hsing.

Source: The Annals of Applied Statistics, Volume 13, Number 3, 1430--1463.

Abstract:
Motivated by the study of road segmentation partitioned by shifts in traffic conditions along a freeway, we introduce a two-stage procedure, Dynamic Segmentation and Backward Elimination (DSBE), for identifying multiple changes in the mean functions for a sequence of functional data. The Dynamic Segmentation procedure searches for all possible changepoints using the derived global optimality criterion coupled with the local strategy of at-most-one-changepoint by dividing the entire sequence into individual subsequences that are recursively adjusted until convergence. Then, the Backward Elimination procedure verifies these changepoints by iteratively testing the unlikely changes to ensure their significance until no more changepoints can be removed. By combining the local strategy with the global optimal changepoint criterion, the DSBE algorithm is conceptually simple and easy to implement and performs better than the binary segmentation-based approach at detecting small multiple changes. The consistency property of the changepoint estimators and the convergence of the algorithm are proved. We apply DSBE to detect changes in traffic streams through real freeway traffic data. The practical performance of DSBE is also investigated through intensive simulation studies for various scenarios.





A hidden Markov model approach to characterizing the photo-switching behavior of fluorophores

Lekha Patel, Nils Gustafsson, Yu Lin, Raimund Ober, Ricardo Henriques, Edward Cohen.

Source: The Annals of Applied Statistics, Volume 13, Number 3, 1397--1429.

Abstract:
Fluorescing molecules (fluorophores) that stochastically switch between photon-emitting and dark states underpin some of the most celebrated advancements in super-resolution microscopy. While this stochastic behavior has been heavily exploited, full characterization of the underlying models can potentially drive forward further imaging methodologies. Under the assumption that fluorophores move between fluorescing and dark states as continuous time Markov processes, the goal is to use a sequence of images to select a model and estimate the transition rates. We use a hidden Markov model to relate the observed discrete time signal to the hidden continuous time process. With imaging involving several repeat exposures of the fluorophore, we show the observed signal depends on both the current and past states of the hidden process, producing emission probabilities that depend on the transition rate parameters to be estimated. To tackle this unusual coupling of the transition and emission probabilities, we conceive transmission (transition-emission) matrices that capture all dependencies of the model. We provide a scheme of computing these matrices and adapt the forward-backward algorithm to compute a likelihood which is readily optimized to provide rate estimates. When confronted with several model proposals, combining this procedure with the Bayesian Information Criterion provides accurate model selection.





Imputation and post-selection inference in models with missing data: An application to colorectal cancer surveillance guidelines

Lin Liu, Yuqi Qiu, Loki Natarajan, Karen Messer.

Source: The Annals of Applied Statistics, Volume 13, Number 3, 1370--1396.

Abstract:
It is common to encounter missing data among the potential predictor variables in the setting of model selection. For example, in a recent study we attempted to improve the US guidelines for risk stratification after screening colonoscopy (Cancer Causes Control 27 (2016) 1175–1185), with the aim of helping to reduce both overuse and underuse of follow-on surveillance colonoscopy. The goal was to incorporate selected additional informative variables into a neoplasia risk-prediction model, going beyond the three currently established risk factors, using a large dataset pooled from seven different prospective studies in North America. Unfortunately, not all candidate variables were collected in all studies, so that one or more important potential predictors were missing on over half of the subjects. Thus, while variable selection was a main focus of the study, it was necessary to address the substantial amount of missing data. Multiple imputation can effectively address missing data, and there are also good approaches to incorporate the variable selection process into model-based confidence intervals. However, there is no consensus on appropriate methods of inference which address both issues simultaneously. Our goal here is to study the properties of model-based confidence intervals in the setting of imputation for missing data followed by variable selection. We use both simulation and theory to compare three approaches to such post-imputation-selection inference: a multiple-imputation approach based on Rubin’s Rules for variance estimation (Comput. Statist. Data Anal. 71 (2014) 758–770); a single imputation-selection followed by bootstrap percentile confidence intervals; and a new bootstrap model-averaging approach presented here, following Efron (J. Amer. Statist. Assoc. 109 (2014) 991–1007). We investigate relative strengths and weaknesses of each method. The “Rubin’s Rules” multiple imputation estimator can have severe undercoverage, and is not recommended. The imputation-selection estimator with bootstrap percentile confidence intervals works well. The bootstrap-model-averaged estimator, with the “Efron’s Rules” estimated variance, may be preferred if the true effect sizes are moderate. We apply these results to the colorectal neoplasia risk-prediction problem which motivated the present work.





Frequency domain theory for functional time series: Variance decomposition and an invariance principle

Piotr Kokoszka, Neda Mohammadi Jouzdani.

Source: Bernoulli, Volume 26, Number 3, 2383--2399.

Abstract:
This paper is concerned with frequency domain theory for functional time series, which are temporally dependent sequences of functions in a Hilbert space. We consider a variance decomposition, which is more suitable for such a data structure than the variance decomposition based on the Karhunen–Loève expansion. The decomposition we study uses eigenvalues of spectral density operators, which are functional analogs of the spectral density of a stationary scalar time series. We propose estimators of the variance components and derive convergence rates for their mean square error as well as their asymptotic normality. The latter is derived from a frequency domain invariance principle for the estimators of the spectral density operators. This principle is established for a broad class of linear time series models. It is a main contribution of the paper.





Convergence of persistence diagrams for topological crackle

Takashi Owada, Omer Bobrowski.

Source: Bernoulli, Volume 26, Number 3, 2275--2310.

Abstract:
In this paper, we study the persistent homology associated with topological crackle generated by distributions with an unbounded support. Persistent homology is a topological and algebraic structure that tracks the creation and destruction of topological cycles (generalizations of loops or holes) in different dimensions. Topological crackle is a term that refers to topological cycles generated by random points far away from the bulk of other points, when the support is unbounded. We establish weak convergence results for persistence diagrams – a point process representation for persistent homology, where each topological cycle is represented by its $(\mathit{birth},\mathit{death})$ coordinates. In this work, we treat persistence diagrams as random closed sets, so that the resulting weak convergence is defined in terms of the Fell topology. Using this framework, we show that the limiting persistence diagrams can be divided into two parts. The first part is a deterministic limit containing a densely-growing number of persistence pairs with a shorter lifespan. The second part is a two-dimensional Poisson process, representing persistence pairs with a longer lifespan.





Concentration of the spectral norm of Erdős–Rényi random graphs

Gábor Lugosi, Shahar Mendelson, Nikita Zhivotovskiy.

Source: Bernoulli, Volume 26, Number 3, 2253--2274.

Abstract:
We present results on the concentration properties of the spectral norm $\|A_{p}\|$ of the adjacency matrix $A_{p}$ of an Erdős–Rényi random graph $G(n,p)$. First, we consider the Erdős–Rényi random graph process and prove that $\|A_{p}\|$ is uniformly concentrated over the range $p\in[C\log n/n,1]$. The analysis is based on delocalization arguments, uniform laws of large numbers, together with the entropy method to prove concentration inequalities. As an application of our techniques, we prove sharp sub-Gaussian moment inequalities for $\|A_{p}\|$ for all $p\in[c\log^{3}n/n,1]$ that improve the general bounds of Alon, Krivelevich, and Vu (Israel J. Math. 131 (2002) 259–267) and some of the more recent results of Erdős et al. (Ann. Probab. 41 (2013) 2279–2375). Both results are consistent with the asymptotic result of Füredi and Komlós (Combinatorica 1 (1981) 233–241) that holds for fixed $p$ as $n\to\infty$.
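
The quantity studied in this abstract can be explored numerically. The following is a minimal sketch, not taken from the paper: it draws one adjacency matrix of $G(n,p)$ and estimates its spectral norm by plain power iteration. The function names `erdos_renyi_adjacency` and `spectral_norm` are illustrative choices, and the fixed iteration count is an assumption rather than a tuned stopping rule.

```python
import math
import random


def erdos_renyi_adjacency(n, p, rng):
    """Symmetric 0/1 adjacency matrix of one G(n, p) draw, zero diagonal."""
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:  # each edge is present independently
                adj[i][j] = adj[j][i] = 1
    return adj


def spectral_norm(adj, iterations=200):
    """Estimate the spectral norm of a symmetric matrix by power iteration."""
    n = len(adj)
    v = [1.0 / math.sqrt(n)] * n  # unit-length start vector
    norm = 0.0
    for _ in range(iterations):
        w = [sum(a * x for a, x in zip(row, v)) for row in adj]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # empty graph: the matrix is zero
            return 0.0
        v = [x / norm for x in w]
    return norm


if __name__ == "__main__":
    rng = random.Random(1)
    a = erdos_renyi_adjacency(100, 0.2, rng)
    print(spectral_norm(a))  # compare with n * p = 20
```

For fixed $p$ the estimate is expected to lie near $np$, in line with the Füredi–Komlós asymptotics cited in the abstract; repeating the draw with different seeds gives a rough picture of how tightly the norm concentrates.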





On Sobolev tests of uniformity on the circle with an extension to the sphere

Sreenivasa Rao Jammalamadaka, Simos Meintanis, Thomas Verdebout.

Source: Bernoulli, Volume 26, Number 3, 2226--2252.

Abstract:
Circular and spherical data arise in many applications, especially in biology, Earth sciences and astronomy. In dealing with such data, one of the preliminary steps before any further inference is to test whether the data are isotropic, that is, uniformly distributed around the circle or the sphere. In view of its importance, there is a considerable literature on the topic. In the present work, we provide new tests of uniformity on the circle based on original asymptotic results. Our tests are motivated by the shape of locally and asymptotically maximin tests of uniformity against generalized von Mises distributions. We show that they are uniformly consistent. Empirical power comparisons with several competing procedures are presented via simulations. The new tests detect multimodal alternatives, such as mixtures of von Mises distributions, particularly well. A practically-oriented combination of the new tests with already existing Sobolev tests is proposed. An extension to testing uniformity on the sphere, along with some simulations, is included. The procedures are illustrated on a real dataset.
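
For orientation, the classical Rayleigh test is often presented as the simplest member of the Sobolev family of uniformity tests. The sketch below implements that classical test only, not the new tests proposed in the paper. It uses the statistic $2n\bar{R}^{2}$, where $\bar{R}$ is the mean resultant length, together with its asymptotic chi-square limit with two degrees of freedom; the function name `rayleigh_test` is an illustrative choice.

```python
import math


def rayleigh_test(angles):
    """Rayleigh test of uniformity for angles given in radians.

    Returns the statistic 2 * n * Rbar**2 and an asymptotic p-value.
    The chi-square(2) survival function is exp(-x / 2).
    """
    n = len(angles)
    c = sum(math.cos(a) for a in angles) / n  # mean cosine
    s = sum(math.sin(a) for a in angles) / n  # mean sine
    statistic = 2.0 * n * (c * c + s * s)
    p_value = math.exp(-statistic / 2.0)
    return statistic, p_value


if __name__ == "__main__":
    spread = [2.0 * math.pi * k / 12 for k in range(12)]
    print(rayleigh_test(spread))       # statistic near 0: no evidence against uniformity
    print(rayleigh_test([0.3] * 10))   # statistic 2n: strong evidence against uniformity
```

The Rayleigh statistic only looks at the first trigonometric moment, so it has little power against symmetric multimodal alternatives, which is the kind of weakness the abstract's remark about mixtures of von Mises distributions points to.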





Exponential integrability and exit times of diffusions on sub-Riemannian and metric measure spaces

Anton Thalmaier, James Thompson.

Source: Bernoulli, Volume 26, Number 3, 2202--2225.

Abstract:
In this article, we derive moment estimates, exponential integrability, concentration inequalities and exit times estimates for canonical diffusions firstly on sub-Riemannian limits of Riemannian foliations and secondly in the nonsmooth setting of $\operatorname{RCD}^{*}(K,N)$ spaces. In each case, the necessary ingredients are Itô’s formula and a comparison theorem for the Laplacian, for which we refer to the recent literature. As an application, we derive pointwise Carmona-type estimates on eigenfunctions of Schrödinger operators.





Scaling limits for super-replication with transient price impact

Peter Bank, Yan Dolinsky.

Source: Bernoulli, Volume 26, Number 3, 2176--2201.

Abstract:
We prove a scaling limit theorem for the super-replication cost of options in a Cox–Ross–Rubinstein binomial model with transient price impact. The correct scaling turns out to keep the market depth parameter constant while resilience over fixed periods of time grows in inverse proportion with the duration between trading times. For vanilla options, the scaling limit is found to coincide with the one obtained by PDE-methods in (Math. Finance 22 (2012) 250–276) for models with purely temporary price impact. These models are a special case of our framework and so our probabilistic scaling limit argument allows one to expand the scope of the scaling limit result to path-dependent options.





Directional differentiability for supremum-type functionals: Statistical applications

Javier Cárcamo, Antonio Cuevas, Luis-Alberto Rodríguez.

Source: Bernoulli, Volume 26, Number 3, 2143--2175.

Abstract:
We show that various functionals related to the supremum of a real function defined on an arbitrary set or a measure space are Hadamard directionally differentiable. We specifically consider the supremum norm, the supremum, the infimum, and the amplitude of a function. The (usually non-linear) derivatives of these maps adopt simple expressions under suitable assumptions on the underlying space. As an application, we improve and extend to the multidimensional case the results in Raghavachari (Ann. Statist. 1 (1973) 67–73) regarding the limiting distributions of Kolmogorov–Smirnov type statistics under the alternative hypothesis. Similar results are obtained for analogous statistics associated with copulas. We additionally solve an open problem about the Berk–Jones statistic proposed by Jager and Wellner (In A Festschrift for Herman Rubin (2004) 319–331 IMS). Finally, the asymptotic distribution of maximum mean discrepancies over Donsker classes of functions is derived.





Matching strings in encoded sequences

Adriana Coutinho, Rodrigo Lambert, Jérôme Rousseau.

Source: Bernoulli, Volume 26, Number 3, 2021--2050.

Abstract:
We investigate the length of the longest common substring for encoded sequences and its asymptotic behaviour. The main result is a strong law of large numbers for a re-scaled version of this quantity, which presents an explicit relation with the Rényi entropy of the source. We apply this result to the zero-inflated contamination model and the stochastic scrabble. In the case of dynamical systems, this problem is equivalent to the shortest distance between two observed orbits and its limiting relationship with the correlation dimension of the pushforward measure. An extension to the shortest distance between orbits for random dynamical systems is also provided.
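
The basic quantity in this abstract, the length of the longest common substring of two sequences, can be computed with a standard dynamic program. The sketch below shows only that computation; it does not model the encoding step or the rescaling used in the paper's limit theorem, and the function name is an illustrative choice.

```python
def longest_common_substring_length(x, y):
    """Length of the longest block of consecutive symbols shared by x and y."""
    best = 0
    prev = [0] * (len(y) + 1)  # prev[j]: common suffix length of x[:i-1], y[:j]
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1  # extend the matching suffix
                best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    print(longest_common_substring_length("abcde", "xbcdy"))  # shared block "bcd"
```

The routine runs in time proportional to the product of the two lengths and works on any pair of indexable sequences, so encoded sequences can be passed in as lists of symbols.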





On sampling from a log-concave density using kinetic Langevin diffusions

Arnak S. Dalalyan, Lionel Riou-Durand.

Source: Bernoulli, Volume 26, Number 3, 1956--1988.

Abstract:
Langevin diffusion processes and their discretizations are often used for sampling from a target density. The most convenient framework for assessing the quality of such a sampling scheme corresponds to smooth and strongly log-concave densities defined on $\mathbb{R}^{p}$. The present work focuses on this framework and studies the behavior of the Monte Carlo algorithm based on discretizations of the kinetic Langevin diffusion. We first prove the geometric mixing property of the kinetic Langevin diffusion with a mixing rate that is optimal in terms of its dependence on the condition number. We then use this result for obtaining improved guarantees of sampling using the kinetic Langevin Monte Carlo method, when the quality of sampling is measured by the Wasserstein distance. We also consider the situation where the Hessian of the log-density of the target distribution is Lipschitz-continuous. In this case, we introduce a new discretization of the kinetic Langevin diffusion and prove that this leads to a substantial improvement of the upper bound on the sampling error measured in Wasserstein distance.
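
To make the object of study concrete, here is a naive one-dimensional Euler–Maruyama discretization of a kinetic (underdamped) Langevin diffusion for a target density proportional to $e^{-f}$. This is an illustrative sketch only: it is not the discretization analyzed or introduced in the paper, and the parameter names `gamma` (friction), `u` (inverse mass) and `step`, together with their default values, are assumptions made for the example.

```python
import math
import random


def kinetic_langevin_euler(grad_f, x0, n_steps, step=0.05, gamma=2.0, u=1.0, seed=0):
    """Simulate position samples from a naive Euler scheme for
        dV = -(gamma * V + u * grad_f(X)) dt + sqrt(2 * gamma * u) dW,
        dX = V dt.
    """
    rng = random.Random(seed)
    x, v = x0, 0.0
    noise_scale = math.sqrt(2.0 * gamma * u * step)
    samples = []
    for _ in range(n_steps):
        # velocity update uses the current position and velocity
        v_new = v - step * (gamma * v + u * grad_f(x)) + noise_scale * rng.gauss(0.0, 1.0)
        x = x + step * v  # position moves with the old velocity
        v = v_new
        samples.append(x)
    return samples


if __name__ == "__main__":
    # Standard normal target: f(x) = x**2 / 2, so grad_f(x) = x.
    draws = kinetic_langevin_euler(lambda x: x, 0.0, 40000)[2000:]
    mean = sum(draws) / len(draws)
    var = sum((d - mean) ** 2 for d in draws) / len(draws)
    print(mean, var)  # should be roughly 0 and roughly 1, up to discretization bias
```

A plain Euler step introduces a bias that depends on the step size, which is one reason more careful discretizations, such as those studied in the abstract, are of interest.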





Kernel and wavelet density estimators on manifolds and more general metric spaces

Galatia Cleanthous, Athanasios G. Georgiadis, Gerard Kerkyacharian, Pencho Petrushev, Dominique Picard.

Source: Bernoulli, Volume 26, Number 3, 1832--1862.

Abstract:
We consider the problem of estimating the density of observations taking values in classical or nonclassical spaces such as manifolds and more general metric spaces. Our setting is quite general but also sufficiently rich in allowing the development of smooth functional calculus with well localized spectral kernels, Besov regularity spaces, and wavelet type systems. Kernel and both linear and nonlinear wavelet density estimators are introduced and studied. Convergence rates for these estimators are established and discussed.





Local differential privacy: Elbow effect in optimal density estimation and adaptation over Besov ellipsoids

Cristina Butucea, Amandine Dubois, Martin Kroll, Adrien Saumard.

Source: Bernoulli, Volume 26, Number 3, 1727--1764.

Abstract:
We address the problem of non-parametric density estimation under the additional constraint that only privatised data are allowed to be published and available for inference. For this purpose, we adopt a recent generalisation of classical minimax theory to the framework of local $\alpha$-differential privacy and provide a lower bound on the rate of convergence over Besov spaces $\mathcal{B}^{s}_{pq}$ under mean integrated $\mathbb{L}^{r}$-risk. This lower bound is deteriorated compared to the standard setup without privacy, and reveals a twofold elbow effect. In order to fulfill the privacy requirement, we suggest adding suitably scaled Laplace noise to empirical wavelet coefficients. Upper bounds within (at most) a logarithmic factor are derived under the assumption that $\alpha$ stays bounded as $n$ increases: A linear but non-adaptive wavelet estimator is shown to attain the lower bound whenever $p\geq r$ but provides a slower rate of convergence otherwise. An adaptive non-linear wavelet estimator with appropriately chosen smoothing parameters and thresholding is shown to attain the lower bound within a logarithmic factor for all cases.
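
The idea of adding scaled Laplace noise to wavelet coefficients can be illustrated in the simplest possible setting. The sketch below is an assumption-laden toy version, not the estimator from the paper: it uses only Haar scaling functions at a single resolution level on $[0,1)$, and calibrates the noise with the standard Laplace mechanism (scale equal to the $\ell_1$ sensitivity divided by $\alpha$). All function names are illustrative.

```python
import math
import random


def laplace(scale, rng):
    """Laplace(0, scale) draw, as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatise(x, level, alpha, rng):
    """One user's release: noisy Haar scaling coefficients of x in [0, 1)."""
    n_bins = 2 ** level
    height = math.sqrt(n_bins)            # value of the scaling function on its bin
    k = min(int(x * n_bins), n_bins - 1)  # bin containing x
    # Two users' coefficient vectors differ by at most 2 * height in l1 norm.
    scale = 2.0 * height / alpha
    return [(height if j == k else 0.0) + laplace(scale, rng) for j in range(n_bins)]


def private_density_estimate(data, level, alpha, seed=0):
    """Average the private releases; return the estimated density on each bin."""
    rng = random.Random(seed)
    n_bins = 2 ** level
    sums = [0.0] * n_bins
    for x in data:
        for j, z in enumerate(privatise(x, level, alpha, rng)):
            sums[j] += z
    coeffs = [s / len(data) for s in sums]
    return [c * math.sqrt(n_bins) for c in coeffs]  # coefficient times bin height


if __name__ == "__main__":
    rng = random.Random(42)
    data = [rng.random() for _ in range(20000)]
    print(private_density_estimate(data, level=3, alpha=1.0))
```

Because the noise scale grows with the resolution level, finer levels are costlier under privacy than in the non-private setting, which gives some intuition for why the rates in the abstract deteriorate.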





On the eigenproblem for Gaussian bridges

Pavel Chigansky, Marina Kleptsyna, Dmytro Marushkevych.

Source: Bernoulli, Volume 26, Number 3, 1706--1726.

Abstract:
Spectral decomposition of the covariance operator is one of the main building blocks in the theory and applications of Gaussian processes. Unfortunately, it is notoriously hard to derive in a closed form. In this paper, we consider the eigenproblem for Gaussian bridges. Given a base process, its bridge is obtained by conditioning the trajectories to start and terminate at the given points. What can be said about the spectrum of a bridge, given the spectrum of its base process? We show how this question can be answered asymptotically for a family of processes, including the fractional Brownian motion.





Influence of the seed in affine preferential attachment trees

David Corlin Marchand, Ioan Manolescu.

Source: Bernoulli, Volume 26, Number 3, 1665--1705.

Abstract:
We study randomly growing trees governed by the affine preferential attachment rule. Starting with a seed tree $S$, vertices are attached one by one, each linked by an edge to a random vertex of the current tree, chosen with a probability proportional to an affine function of its degree. This yields a one-parameter family of preferential attachment trees $(T_{n}^{S})_{n\geq |S|}$, of which the linear model is a particular case. Depending on the choice of the parameter, the power-laws governing the degrees in $T_{n}^{S}$ have different exponents. We study the problem of the asymptotic influence of the seed $S$ on the law of $T_{n}^{S}$. We show that, for any two distinct seeds $S$ and $S'$, the laws of $T_{n}^{S}$ and $T_{n}^{S'}$ remain at uniformly positive total-variation distance as $n$ increases. This is a continuation of Curien et al. (J. Éc. Polytech. Math. 2 (2015) 1–34), which in turn was inspired by a conjecture of Bubeck et al. (IEEE Trans. Netw. Sci. Eng. 2 (2015) 30–39). The technique developed here is more robust than previous ones and is likely to help in the study of more general attachment mechanisms.
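
The growth rule described in this abstract is easy to simulate. The following sketch is one possible reading of it, with the attachment weight of a vertex taken to be its degree plus a constant `alpha`. That parametrization, the function name, and the convention that seed vertices are labelled `0, 1, ..., |S|-1` are all assumptions made for the example.

```python
import random


def affine_preferential_attachment(seed_edges, n_vertices, alpha, seed=0):
    """Grow a tree from a seed tree given as a list of edges.

    Each new vertex attaches to an existing vertex v chosen with
    probability proportional to deg(v) + alpha (requires alpha > -1).
    """
    rng = random.Random(seed)
    edges = list(seed_edges)
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    for new in range(len(degree), n_vertices):
        vertices = list(degree)
        weights = [degree[v] + alpha for v in vertices]
        target = rng.choices(vertices, weights=weights, k=1)[0]
        edges.append((new, target))
        degree[target] += 1
        degree[new] = 1
    return edges, degree


if __name__ == "__main__":
    path_seed = [(0, 1), (1, 2)]   # path on three vertices
    star_seed = [(0, 1), (0, 2)]   # star on three vertices (same tree, relabelled)
    edges, degree = affine_preferential_attachment(path_seed, 1000, alpha=0.0)
    print(max(degree.values()))
```

Comparing statistics such as the maximum degree across many runs from two genuinely different seeds is a simple way to see, empirically, the kind of persistent seed dependence the abstract establishes.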





Estimating the number of connected components in a graph via subgraph sampling

Jason M. Klusowski, Yihong Wu.

Source: Bernoulli, Volume 26, Number 3, 1635--1664.

Abstract:
Learning properties of large graphs from samples has been an important problem in statistical network analysis since the early work of Goodman (Ann. Math. Stat. 20 (1949) 572–579) and Frank (Scand. J. Stat. 5 (1978) 177–188). We revisit a problem formulated by Frank (Scand. J. Stat. 5 (1978) 177–188) of estimating the number of connected components in a large graph based on the subgraph sampling model, in which we randomly sample a subset of the vertices and observe the induced subgraph. The key question is whether accurate estimation is achievable in the sublinear regime where only a vanishing fraction of the vertices are sampled. We show that it is impossible if the parent graph is allowed to contain high-degree vertices or long induced cycles. For the class of chordal graphs, where induced cycles of length four or above are forbidden, we characterize the optimal sample complexity within constant factors and construct linear-time estimators that provably achieve these bounds. This significantly expands the scope of previous results which have focused on unbiased estimators and special classes of graphs such as forests or cliques. Both the construction and the analysis of the proposed methodology rely on combinatorial properties of chordal graphs and identities of induced subgraph counts. They, in turn, also play a key role in proving minimax lower bounds based on construction of random instances of graphs with matching structures of small subgraphs.
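
The subgraph sampling model and a simple estimator for the forest case can be sketched as follows. The code assumes Bernoulli sampling, in which each vertex is kept independently with probability `p`; this is one common version of the model and an assumption here. The forest estimator uses the identity that a forest has $|V|-|E|$ components, with each vertex surviving with probability $p$ and each edge with probability $p^{2}$. It is not the chordal-graph estimator constructed in the paper, and all names are illustrative.

```python
import random


def induced_subgraph(n_vertices, edges, p, rng):
    """Keep each vertex independently with probability p; return the induced subgraph."""
    kept = {v for v in range(n_vertices) if rng.random() < p}
    kept_edges = [(a, b) for a, b in edges if a in kept and b in kept]
    return kept, kept_edges


def count_components(vertices, edges):
    """Number of connected components, via union-find with path halving."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    components = len(parent)
    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            components -= 1
    return components


def forest_estimator(kept, kept_edges, p):
    """Unbiased for forests: E[#kept vertices] = p|V|, E[#kept edges] = p^2 |E|."""
    return len(kept) / p - len(kept_edges) / p ** 2


if __name__ == "__main__":
    # Forest with three components: a path 0-1-2, an edge 3-4, an isolated vertex 5.
    edges = [(0, 1), (1, 2), (3, 4)]
    rng = random.Random(7)
    kept, kept_edges = induced_subgraph(6, edges, 0.5, rng)
    print(count_components(kept, kept_edges), forest_estimator(kept, kept_edges, 0.5))
```

Simply counting components in the sample is biased, since sampling can both drop whole components and split surviving ones; the correction by $1/p$ and $1/p^{2}$ removes the bias for forests at the price of higher variance when $p$ is small.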





Sojourn time dimensions of fractional Brownian motion

Ivan Nourdin, Giovanni Peccati, Stéphane Seuret.

Source: Bernoulli, Volume 26, Number 3, 1619--1634.

Abstract:
We describe the size of the sets of sojourn times $E_{\gamma}=\{t\geq 0:|B_{t}|\leq t^{\gamma}\}$ associated with a fractional Brownian motion $B$ in terms of various large scale dimensions.





Efficient estimation in single index models through smoothing splines

Arun K. Kuchibhotla, Rohit K. Patra.

Source: Bernoulli, Volume 26, Number 2, 1587--1618.

Abstract:
We consider estimation and inference in a single index regression model with an unknown but smooth link function. In contrast to the standard approach of using kernels or regression splines, we use smoothing splines to estimate the smooth link function. We develop a method to compute the penalized least squares estimators (PLSEs) of the parametric and the nonparametric components given independent and identically distributed (i.i.d.) data. We prove the consistency and find the rates of convergence of the estimators. We establish asymptotic normality under mild assumptions and prove asymptotic efficiency of the parametric component under homoscedastic errors. A finite sample simulation corroborates our asymptotic theory. We also analyze a car mileage data set and an ozone concentration data set. The identifiability and existence of the PLSEs are also investigated.





On the probability distribution of the local times of diagonally operator-self-similar Gaussian fields with stationary increments

Kamran Kalbasi, Thomas Mountford.

Source: Bernoulli, Volume 26, Number 2, 1504--1534.

Abstract:
In this paper, we study the local times of vector-valued Gaussian fields that are ‘diagonally operator-self-similar’ and whose increments are stationary. Denoting the local time of such a Gaussian field around the spatial origin and over the temporal unit hypercube by $Z$, we show that there exists $\lambda\in(0,1)$ such that under some quite weak conditions, $\lim_{n\rightarrow+\infty}\frac{\sqrt[n]{\mathbb{E}(Z^{n})}}{n^{\lambda}}$ and $\lim_{x\rightarrow+\infty}\frac{-\log\mathbb{P}(Z>x)}{x^{\frac{1}{\lambda}}}$ both exist and are strictly positive (possibly $+\infty$). Moreover, we show that if the underlying Gaussian field is ‘strongly locally nondeterministic’, the above limits will be finite as well. These results are then applied to establish similar statements for the intersection local times of diagonally operator-self-similar Gaussian fields with stationary increments.





Limit theorems for long-memory flows on Wiener chaos

Shuyang Bai, Murad S. Taqqu.

Source: Bernoulli, Volume 26, Number 2, 1473--1503.

Abstract:
We consider a long-memory stationary process, defined not through a moving average type structure, but by a flow generated by a measure-preserving transform and by a multiple Wiener–Itô integral. The flow is described using a notion of mixing for infinite-measure spaces introduced by Krickeberg (In Proc. Fifth Berkeley Sympos. Math. Statist. and Probability (Berkeley, Calif., 1965/66), Vol. II: Contributions to Probability Theory, Part 2 (1967) 431–446 Univ. California Press). Depending on the interplay between the spreading rate of the flow and the order of the multiple integral, one can recover known central or non-central limit theorems, and also obtain joint convergence of multiple integrals of different orders.





A characterization of the finiteness of perpetual integrals of Lévy processes

Martin Kolb, Mladen Savov.

Source: Bernoulli, Volume 26, Number 2, 1453--1472.

Abstract:
We derive a criterion for the almost sure finiteness of perpetual integrals of Lévy processes for a class of real functions including all continuous functions and for general one-dimensional Lévy processes that drift to plus infinity. This generalizes previous work of Döring and Kyprianou, who considered Lévy processes having a local time, leaving the general case as an open problem. It turns out that the criterion in the general situation simplifies significantly when the process has a local time, but we also demonstrate that in general our criterion cannot be reduced. This answers an open problem posed in (J. Theoret. Probab. 29 (2016) 1192–1198).





Around the entropic Talagrand inequality

Giovanni Conforti, Luigia Ripani.

Source: Bernoulli, Volume 26, Number 2, 1431--1452.

Abstract:
In this article, we study a generalization of the classical Talagrand transport-entropy inequality in which the Wasserstein distance is replaced by the entropic transportation cost. This class of inequalities has been introduced in the recent work (Probab. Theory Related Fields 174 (2019) 1–47), in connection with the study of Schrödinger bridges. We provide several equivalent characterizations in terms of reverse hypercontractivity for the heat semigroup, contractivity of the Hamilton–Jacobi–Bellman semigroup and dimension-free concentration of measure. Properties such as tensorization and relations to other functional inequalities are also investigated. In particular, we show that the inequalities studied in this article are implied by a logarithmic Sobolev inequality and imply the Talagrand inequality.





The moduli of non-differentiability for Gaussian random fields with stationary increments

Wensheng Wang, Zhonggen Su, Yimin Xiao.

Source: Bernoulli, Volume 26, Number 2, 1410--1430.

Abstract:
We establish the exact moduli of non-differentiability of Gaussian random fields with stationary increments. As an application of the result, we prove that the uniform Hölder condition for the maximum local times of Gaussian random fields with stationary increments obtained in Xiao (1997) is optimal. These results are applicable to fractional Riesz–Bessel processes and stationary Gaussian random fields in the Matérn and Cauchy classes.





Stratonovich stochastic differential equation with irregular coefficients: Girsanov’s example revisited

Ilya Pavlyukevich, Georgiy Shevchenko.

Source: Bernoulli, Volume 26, Number 2, 1381--1409.

Abstract:
In this paper, we study the Stratonovich stochastic differential equation $\mathrm{d}X=|X|^{\alpha}\circ\mathrm{d}B$, $\alpha\in(-1,1)$, which has been introduced by Cherstvy et al. (New J. Phys. 15 (2013) 083039) in the context of analysis of anomalous diffusions in heterogeneous media. We determine its weak and strong solutions, which are homogeneous strong Markov processes spending zero time at $0$: for $\alpha\in(0,1)$, these solutions have the form \begin{equation*}X_{t}^{\theta}=((1-\alpha)B_{t}^{\theta})^{1/(1-\alpha)},\end{equation*} where $B^{\theta}$ is the $\theta$-skew Brownian motion driven by $B$ and starting at $\frac{1}{1-\alpha}(X_{0})^{1-\alpha}$, $\theta\in[-1,1]$, and $(x)^{\gamma}=|x|^{\gamma}\operatorname{sign}x$; for $\alpha\in(-1,0]$, only the case $\theta=0$ is possible. The central part of the paper consists in the proof of the existence of a quadratic covariation $[f(B^{\theta}),B]$ for a locally square integrable function $f$ and is based on the time-reversion technique for Markovian diffusions.





On stability of traveling wave solutions for integro-differential equations related to branching Markov processes

Pasha Tkachov.

Source: Bernoulli, Volume 26, Number 2, 1354--1380.

Abstract:
The aim of this paper is to prove stability of traveling waves for integro-differential equations connected with branching Markov processes. In other words, the limiting law of the left-most particle of a (time-continuous) branching Markov process with a Lévy non-branching part is demonstrated. The key idea is to approximate the branching Markov process by a branching random walk and apply the result of Aïdékon [Ann. Probab. 41 (2013) 1362–1426] on the limiting law of the latter one.





A new McKean–Vlasov stochastic interpretation of the parabolic–parabolic Keller–Segel model: The one-dimensional case

Denis Talay, Milica Tomašević.

Source: Bernoulli, Volume 26, Number 2, 1323--1353.

Abstract:
In this paper, we analyze a stochastic interpretation of the one-dimensional parabolic–parabolic Keller–Segel system without cut-off. It involves an original type of McKean–Vlasov interaction kernel. At the particle level, each particle interacts with all the past of each other particle by means of a time integrated functional involving a singular kernel. At the mean-field level studied here, the McKean–Vlasov limit process interacts with all the past time marginals of its probability distribution in a similarly singular way. We prove that the parabolic–parabolic Keller–Segel system in the whole Euclidean space and the corresponding McKean–Vlasov stochastic differential equation are well-posed for any values of the parameters of the model.





Rates of convergence in de Finetti’s representation theorem, and Hausdorff moment problem

Emanuele Dolera, Stefano Favaro.

Source: Bernoulli, Volume 26, Number 2, 1294--1322.

Abstract:
Given a sequence $\{X_{n}\}_{n\geq 1}$ of exchangeable Bernoulli random variables, the celebrated de Finetti representation theorem states that $\frac{1}{n}\sum_{i=1}^{n}X_{i}\stackrel{a.s.}{\longrightarrow}Y$ for a suitable random variable $Y:\Omega\rightarrow[0,1]$ satisfying $\mathsf{P}[X_{1}=x_{1},\dots,X_{n}=x_{n}|Y]=Y^{\sum_{i=1}^{n}x_{i}}(1-Y)^{n-\sum_{i=1}^{n}x_{i}}$. In this paper, we study the rate of convergence in law of $\frac{1}{n}\sum_{i=1}^{n}X_{i}$ to $Y$ under the Kolmogorov distance. After showing that a rate of the type of $1/n^{\alpha}$ can be obtained for any index $\alpha\in(0,1]$, we find a sufficient condition on the distribution of $Y$ for the achievement of the optimal rate of convergence, that is $1/n$. Besides extending and strengthening recent results under the weaker Wasserstein distance, our main result weakens the regularity hypotheses on $Y$ in the context of the Hausdorff moment problem.




en

Strictly weak consensus in the uniform compass model on $\mathbb{Z}$

Nina Gantert, Markus Heydenreich, Timo Hirscher.

Source: Bernoulli, Volume 26, Number 2, 1269--1293.

Abstract:
We investigate a model for opinion dynamics, where individuals (modeled by vertices of a graph) hold certain abstract opinions. As time progresses, neighboring individuals interact with each other, and this interaction results in a realignment of opinions closer towards each other. This mechanism triggers formation of consensus among the individuals. Our main focus is on strong consensus (i.e., global agreement of all individuals) versus weak consensus (i.e., local agreement among neighbors). By extending a known model to a more general opinion space, which lacks a “central” opinion acting as a contraction point, we provide an example of an opinion formation process on the one-dimensional lattice $\mathbb{Z}$ with weak consensus but no strong consensus.




en

Dynamic linear discriminant analysis in high dimensional space

Binyan Jiang, Ziqi Chen, Chenlei Leng.

Source: Bernoulli, Volume 26, Number 2, 1234--1268.

Abstract:
High-dimensional data that evolve dynamically feature predominantly in the modern data era. As a partial response to this, recent years have seen increasing emphasis to address the dimensionality challenge. However, the non-static nature of these datasets is largely ignored. This paper addresses both challenges by proposing a novel yet simple dynamic linear programming discriminant (DLPD) rule for binary classification. Different from the usual static linear discriminant analysis, the new method is able to capture the changing distributions of the underlying populations by modeling their means and covariances as smooth functions of covariates of interest. Under an approximate sparse condition, we show that the conditional misclassification rate of the DLPD rule converges to the Bayes risk in probability uniformly over the range of the variables used for modeling the dynamics, when the dimensionality is allowed to grow exponentially with the sample size. The minimax lower bound of the estimation of the Bayes risk is also established, implying that the misclassification rate of our proposed rule is minimax-rate optimal. The promising performance of the DLPD rule is illustrated via extensive simulation studies and the analysis of a breast cancer dataset.




en

Consistent structure estimation of exponential-family random graph models with block structure

Michael Schweinberger.

Source: Bernoulli, Volume 26, Number 2, 1205--1233.

Abstract:
We consider the challenging problem of statistical inference for exponential-family random graph models based on a single observation of a random graph with complex dependence. To facilitate statistical inference, we consider random graphs with additional structure in the form of block structure. We have shown elsewhere that when the block structure is known, it facilitates consistency results for $M$-estimators of canonical and curved exponential-family random graph models with complex dependence, such as transitivity. In practice, the block structure is known in some applications (e.g., multilevel networks), but is unknown in others. When the block structure is unknown, the first and foremost question is whether it can be recovered with high probability based on a single observation of a random graph with complex dependence. The main consistency results of the paper show that it is possible to do so under weak dependence and smoothness conditions. These results confirm that exponential-family random graph models with block structure constitute a promising direction of statistical network analysis.




en

Characterization of probability distribution convergence in Wasserstein distance by $L^{p}$-quantization error function

Yating Liu, Gilles Pagès.

Source: Bernoulli, Volume 26, Number 2, 1171--1204.

Abstract:
We establish conditions to characterize probability measures by their $L^{p}$-quantization error functions in both $\mathbb{R}^{d}$ and Hilbert settings. This characterization is two-fold: static (identity of two distributions) and dynamic (convergence for the $L^{p}$-Wasserstein distance). We first propose a criterion on the quantization level $N$, valid for any norm on $\mathbb{R}^{d}$ and any order $p$ based on a geometrical approach involving the Voronoï diagram. Then, we prove that in the $L^{2}$-case on a (separable) Hilbert space, the condition on the level $N$ can be reduced to $N=2$, which is optimal. More quantization based characterization cases in dimension 1 and a discussion of the completeness of a distance defined by the quantization error function can be found at the end of this paper.




en

Interacting reinforced stochastic processes: Statistical inference based on the weighted empirical means

Giacomo Aletti, Irene Crimaldi, Andrea Ghiglietti.

Source: Bernoulli, Volume 26, Number 2, 1098--1138.

Abstract:
This work deals with a system of interacting reinforced stochastic processes, where each process $X^{j}=(X_{n,j})_{n}$ is located at a vertex $j$ of a finite weighted directed graph, and it can be interpreted as the sequence of “actions” adopted by an agent $j$ of the network. The interaction among the dynamics of these processes depends on the weighted adjacency matrix $W$ associated to the underlying graph: indeed, the probability that an agent $j$ chooses a certain action depends on its personal “inclination” $Z_{n,j}$ and on the inclinations $Z_{n,h}$, with $h\neq j$, of the other agents according to the entries of $W$. The best known example of reinforced stochastic process is the Pólya urn. The present paper focuses on the weighted empirical means $N_{n,j}=\sum_{k=1}^{n}q_{n,k}X_{k,j}$, since, for example, the current experience is more important than the past one in reinforced learning. Their almost sure synchronization and some central limit theorems in the sense of stable convergence are proven. The new approach with weighted means highlights the key points in proving some recent results for the personal inclinations $Z^{j}=(Z_{n,j})_{n}$ and for the empirical means $\overline{X}^{j}=(\sum_{k=1}^{n}X_{k,j}/n)_{n}$ given in recent papers (e.g. Aletti, Crimaldi and Ghiglietti (2019), Ann. Appl. Probab. 27 (2017) 3787–3844, Crimaldi et al. Stochastic Process. Appl. 129 (2019) 70–101). In fact, with a more sophisticated decomposition of the considered processes, we can understand how the different convergence rates of the involved stochastic processes combine. From an application point of view, we provide confidence intervals for the common limit inclination of the agents and a test statistic to make inference on the matrix $W$, based on the weighted empirical means. In particular, we answer a research question posed in Aletti, Crimaldi and Ghiglietti (2019).
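
To make the objects concrete, here is a minimal sketch of the single-agent special case the abstract names, the Pólya urn, together with a weighted empirical mean of its draws. It does not model the interacting network or the matrix $W$. The function names and the weights proportional to $k$ (so that recent draws count more) are assumptions made for illustration, not the paper's choice of $q_{n,k}$.

```python
import random


def polya_urn(n_draws: int, red: int = 1, black: int = 1, seed: int = 0):
    """Simulate a two-colour Polya urn; return (draws, red, black).

    Each draw X_k is 1 for red and 0 for black, and the drawn colour is
    reinforced by adding one more ball of that colour.
    """
    rng = random.Random(seed)  # fixed seed, for reproducibility only
    draws = []
    for _ in range(n_draws):
        x = 1 if rng.random() < red / (red + black) else 0
        draws.append(x)
        red += x
        black += 1 - x
    return draws, red, black


def weighted_mean(draws: list[int], weights: list[float]) -> float:
    """Weighted empirical mean sum_k q_k X_k with q_k = weights[k] / sum(weights)."""
    if len(draws) != len(weights) or not draws:
        raise ValueError("draws and weights must be non-empty and of equal length")
    total = sum(weights)
    return sum(w * x for w, x in zip(weights, draws)) / total


draws, red, black = polya_urn(5_000)
n = len(draws)
plain = weighted_mean(draws, [1.0] * n)                     # ordinary empirical mean
recent = weighted_mean(draws, [float(k) for k in range(1, n + 1)])  # assumed weights
print(red / (red + black), plain, recent)
```

The three printed numbers are the final urn proportion, the plain empirical mean and the weighted mean. In a long run they are expected to be close to one another, which is the kind of agreement the synchronization results concern.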




en

A Bayesian nonparametric approach to log-concave density estimation

Ester Mariucci, Kolyan Ray, Botond Szabó.

Source: Bernoulli, Volume 26, Number 2, 1070--1097.

Abstract:
The estimation of a log-concave density on $\mathbb{R}$ is a canonical problem in the area of shape-constrained nonparametric inference. We present a Bayesian nonparametric approach to this problem based on an exponentiated Dirichlet process mixture prior and show that the posterior distribution converges to the log-concave truth at the (near-) minimax rate in Hellinger distance. Our proof proceeds by establishing a general contraction result based on the log-concave maximum likelihood estimator that prevents the need for further metric entropy calculations. We further present computationally more feasible approximations and both an empirical and hierarchical Bayes approach. All priors are illustrated numerically via simulations.




en

Degeneracy in sparse ERGMs with functions of degrees as sufficient statistics

Sumit Mukherjee.

Source: Bernoulli, Volume 26, Number 2, 1016--1043.

Abstract:
A sufficient criterion for “non-degeneracy” is given for Exponential Random Graph Models on sparse graphs with sufficient statistics which are functions of the degree sequence. This criterion explains why statistics such as alternating $k$-star are non-degenerate, whereas subgraph counts are degenerate. It is further shown that this criterion is “almost” tight. Existence of consistent estimates is then proved for non-degenerate Exponential Random Graph Models.




en

Distances and large deviations in the spatial preferential attachment model

Christian Hirsch, Christian Mönch.

Source: Bernoulli, Volume 26, Number 2, 927--947.

Abstract:
This paper considers two asymptotic properties of a spatial preferential-attachment model introduced by E. Jacob and P. Mörters (In Algorithms and Models for the Web Graph (2013) 14–25 Springer). First, in a regime of strong linear reinforcement, we show that typical distances are at most of doubly-logarithmic order. Second, we derive a large deviation principle for the empirical neighbourhood structure and express the rate function as solution to an entropy minimisation problem in the space of stationary marked point processes.




en

Convergence of the age structure of general schemes of population processes

Jie Yen Fan, Kais Hamza, Peter Jagers, Fima Klebaner.

Source: Bernoulli, Volume 26, Number 2, 893--926.

Abstract:
We consider a family of general branching processes with reproduction parameters depending on the age of the individual as well as the population age structure and a parameter $K$, which may represent the carrying capacity. These processes are Markovian in the age structure. In a previous paper (Proc. Steklov Inst. Math. 282 (2013) 90–105), the Law of Large Numbers as $K\to\infty$ was derived. Here we prove the central limit theorem, namely the weak convergence of the fluctuation processes in an appropriate Skorokhod space. We also show that the limit is driven by a stochastic partial differential equation.




en

Recurrence of multidimensional persistent random walks. Fourier and series criteria

Peggy Cénac, Basile de Loynes, Yoann Offret, Arnaud Rousselle.

Source: Bernoulli, Volume 26, Number 2, 858--892.

Abstract:
The recurrence and transience of persistent random walks built from variable length Markov chains are investigated. It turns out that these stochastic processes can be seen as Lévy walks for which the persistence times depend on some internal Markov chain: they admit Markov random walk skeletons. A recurrence versus transience dichotomy is highlighted. Assuming the positive recurrence of the driving chain, a sufficient Fourier criterion for the recurrence, close to the usual Chung–Fuchs one, is given and a series criterion is derived. The key tool is the Nagaev–Guivarc’h method. Finally, we focus on particular two-dimensional persistent random walks, including directionally reinforced random walks, for which necessary and sufficient Fourier and series criteria are obtained. Inspired by (Adv. Math. 208 (2007) 680–698), we produce a genuine counterexample to the conjecture of (Adv. Math. 117 (1996) 239–252). As for the one-dimensional case studied in (J. Theoret. Probab. 31 (2018) 232–243), it is easier for a persistent random walk than its skeleton to be recurrent. However, such examples are much more difficult to exhibit in the higher dimensional context. These results are based on a surprisingly novel – to our knowledge – upper bound for the Lévy concentration function associated with symmetric distributions.




en

Stochastic differential equations with a fractionally filtered delay: A semimartingale model for long-range dependent processes

Richard A. Davis, Mikkel Slot Nielsen, Victor Rohde.

Source: Bernoulli, Volume 26, Number 2, 799--827.

Abstract:
In this paper, we introduce a model, the stochastic fractional delay differential equation (SFDDE), which is based on the linear stochastic delay differential equation and produces stationary processes with hyperbolically decaying autocovariance functions. The model departs from the usual way of incorporating this type of long-range dependence into a short-memory model as it is obtained by applying a fractional filter to the drift term rather than to the noise term. The advantages of this approach are that the corresponding long-range dependent solutions are semimartingales and the local behavior of the sample paths is unaffected by the degree of long memory. We prove existence and uniqueness of solutions to the SFDDEs and study their spectral densities and autocovariance functions. Moreover, we define a subclass of SFDDEs which we study in detail and relate to the well-known fractionally integrated CARMA processes. Finally, we consider the task of simulating from the defining SFDDEs.




en

Convergence and concentration of empirical measures under Wasserstein distance in unbounded functional spaces

Jing Lei.

Source: Bernoulli, Volume 26, Number 1, 767--798.

Abstract:
We provide upper bounds of the expected Wasserstein distance between a probability measure and its empirical version, generalizing recent results for finite dimensional Euclidean spaces and bounded functional spaces. Such a generalization can cover Euclidean spaces with large dimensionality, with the optimal dependence on the dimensionality. Our method also covers the important case of Gaussian processes in separable Hilbert spaces, with rate-optimal upper bounds for functional data distributions whose coordinates decay geometrically or polynomially. Moreover, our bounds of the expected value can be combined with mean-concentration results to yield improved exponential tail probability bounds for the Wasserstein error of empirical measures under Bernstein-type or log Sobolev-type conditions.
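
For intuition about the quantity being bounded, the sketch below computes the Wasserstein distance in the simplest setting, the real line, where it has a closed form through order statistics. This is far from the unbounded functional spaces treated in the paper. The function name `wasserstein_1d` and the use of a second independent sample as a stand-in for the true measure are assumptions made for illustration.

```python
import random


def wasserstein_1d(xs: list[float], ys: list[float], p: float = 1.0) -> float:
    """W_p (p >= 1) between two empirical measures on the real line, equal sample sizes.

    In one dimension the optimal coupling matches order statistics, so
    W_p^p is the average of |x_(i) - y_(i)|^p over the sorted samples.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    if p < 1.0:
        raise ValueError("p must be at least 1")
    pairs = zip(sorted(xs), sorted(ys))
    return (sum(abs(x - y) ** p for x, y in pairs) / len(xs)) ** (1.0 / p)


rng = random.Random(0)  # fixed seed, for reproducibility only
for n in (100, 1_000, 10_000):
    a = [rng.gauss(0.0, 1.0) for _ in range(n)]
    b = [rng.gauss(0.0, 1.0) for _ in range(n)]
    print(n, round(wasserstein_1d(a, b, p=2.0), 4))
```

Both samples come from the same standard normal law, so the printed distances reflect sampling error only and would be expected to shrink as the sample size grows.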




en

A Feynman–Kac result via Markov BSDEs with generalised drivers

Elena Issoglio, Francesco Russo.

Source: Bernoulli, Volume 26, Number 1, 728--766.

Abstract:
In this paper, we investigate BSDEs where the driver contains a distributional term (in the sense of generalised functions) and derive general Feynman–Kac formulae related to these BSDEs. We introduce an integral operator to give sense to the equation and then we show the existence of a strong solution employing results on a related PDE. Due to the irregularity of the driver, the $Y$-component of a couple $(Y,Z)$ solving the BSDE is not necessarily a semimartingale but a weak Dirichlet process.




en

A unified approach to coupling SDEs driven by Lévy noise and some applications

Mingjie Liang, René L. Schilling, Jian Wang.

Source: Bernoulli, Volume 26, Number 1, 664--693.

Abstract:
We present a general method to construct couplings of stochastic differential equations driven by Lévy noise in terms of coupling operators. This approach covers both coupling by reflection and refined basic coupling which are often discussed in the literature. As applications, we prove regularity results for the transition semigroups and obtain successful couplings for the solutions to stochastic differential equations driven by additive Lévy noise.




en

On frequentist coverage errors of Bayesian credible sets in moderately high dimensions

Keisuke Yano, Kengo Kato.

Source: Bernoulli, Volume 26, Number 1, 616--641.

Abstract:
In this paper, we study frequentist coverage errors of Bayesian credible sets for an approximately linear regression model with (moderately) high dimensional regressors, where the dimension of the regressors may increase with but is smaller than the sample size. Specifically, we consider quasi-Bayesian inference on the slope vector under the quasi-likelihood with Gaussian error distribution. Under this setup, we derive finite sample bounds on frequentist coverage errors of Bayesian credible rectangles. Derivation of those bounds builds on a novel Berry–Esseen type bound on quasi-posterior distributions and recent results on high-dimensional CLT on hyperrectangles. We use this general result to quantify coverage errors of Castillo–Nickl and $L^{\infty}$-credible bands for Gaussian white noise models, linear inverse problems, and (possibly non-Gaussian) nonparametric regression models. In particular, we show that Bayesian credible bands for those nonparametric models have coverage errors decaying polynomially fast in the sample size, implying advantages of Bayesian credible bands over confidence bands based on extreme value theory.




en

Consistent semiparametric estimators for recurrent event times models with application to virtual age models

Eric Beutner, Laurent Bordes, Laurent Doyen.

Source: Bernoulli, Volume 26, Number 1, 557--586.

Abstract:
Virtual age models are very useful to analyse recurrent events. Among the strengths of these models is their ability to account for treatment (or intervention) effects after an event occurrence. Despite their flexibility for modeling recurrent events, the number of applications is limited. This seems to be a result of the fact that in the semiparametric setting all the existing results assume the virtual age function that describes the treatment (or intervention) effects to be known. This shortcoming can be overcome by considering semiparametric virtual age models with parametrically specified virtual age functions. Yet, fitting such a model is a difficult task. Indeed, it has recently been shown that for these models the standard profile likelihood method fails to lead to consistent estimators. Here we show that consistent estimators can be constructed by smoothing the profile log-likelihood function appropriately. We show that our general result can be applied to most of the relevant virtual age models of the literature. Our approach shows that empirical process techniques may be a worthwhile alternative to martingale methods for studying asymptotic properties of these inference methods. A simulation study is provided to illustrate our consistency results together with an application to real data.




en

Tail expectile process and risk assessment

Abdelaati Daouia, Stéphane Girard, Gilles Stupfler.

Source: Bernoulli, Volume 26, Number 1, 531--556.

Abstract:
Expectiles define a least squares analogue of quantiles. They are determined by tail expectations rather than tail probabilities. For this reason and many other theoretical and practical merits, expectiles have recently received a lot of attention, especially in actuarial and financial risk management. Their estimation, however, typically requires considering non-explicit asymmetric least squares estimates rather than the traditional order statistics used for quantile estimation. This makes the study of the tail expectile process a lot harder than that of the standard tail quantile process. Under the challenging model of heavy-tailed distributions, we derive joint weighted Gaussian approximations of the tail empirical expectile and quantile processes. We then use this powerful result to introduce and study new estimators of extreme expectiles and the standard quantile-based expected shortfall, as well as a novel expectile-based form of expected shortfall. Our estimators are built on general weighted combinations of both top order statistics and asymmetric least squares estimates. Some numerical simulations and applications to actuarial and financial data are provided.
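
The asymmetric least squares idea can be sketched in a few lines. The code below computes an ordinary sample expectile by iteratively reweighted averaging; it is not the extreme-value estimator proposed in the paper, and the function name `sample_expectile`, the tolerance and the iteration cap are assumptions made for illustration.

```python
def sample_expectile(xs: list[float], tau: float,
                     tol: float = 1e-12, max_iter: int = 1000) -> float:
    """Sample tau-expectile: the minimiser over e of
    sum_i |tau - 1{x_i <= e}| * (x_i - e)^2.

    The first-order condition says e is a weighted mean of the data, with
    weight tau on points above e and 1 - tau on the rest, so we iterate
    that weighted mean until it stops moving.
    """
    if not xs:
        raise ValueError("xs must be non-empty")
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    e = sum(xs) / len(xs)  # tau = 0.5 gives the mean, a natural starting point
    for _ in range(max_iter):
        weights = [tau if x > e else 1.0 - tau for x in xs]
        new_e = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
        if abs(new_e - e) <= tol:
            return new_e
        e = new_e
    return e


data = [1.0, 2.0, 3.0, 4.0]
print(sample_expectile(data, 0.5))  # the sample mean
print(sample_expectile(data, 0.9))  # pulled towards the upper tail
```

Unlike a sample quantile, the result is generally not one of the observations, which is why order statistics alone do not suffice.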




en

Subspace perspective on canonical correlation analysis: Dimension reduction and minimax rates

Zhuang Ma, Xiaodong Li.

Source: Bernoulli, Volume 26, Number 1, 432--470.

Abstract:
Canonical correlation analysis (CCA) is a fundamental statistical tool for exploring the correlation structure between two sets of random variables. In this paper, motivated by the recent success of applying CCA to learn low dimensional representations of high dimensional objects, we propose two losses based on the principal angles between the model spaces spanned by the sample canonical variates and their population correspondents, respectively. We further characterize the non-asymptotic error bounds for the estimation risks under the proposed error metrics, which reveal how the performance of sample CCA depends adaptively on key quantities including the dimensions, the sample size, the condition number of the covariance matrices and particularly the population canonical correlation coefficients. The optimality of our uniform upper bounds is also justified by lower-bound analysis based on stringent and localized parameter spaces. To the best of our knowledge, for the first time our paper separates $p_{1}$ and $p_{2}$ for the first order term in the upper bounds without assuming the residual correlations are zeros. More significantly, our paper derives $(1-\lambda_{k}^{2})(1-\lambda_{k+1}^{2})/(\lambda_{k}-\lambda_{k+1})^{2}$ for the first time in the non-asymptotic CCA estimation convergence rates, which is essential to understand the behavior of CCA when the leading canonical correlation coefficients are close to $1$.




en

Construction results for strong orthogonal arrays of strength three

Chenlu Shi, Boxin Tang.

Source: Bernoulli, Volume 26, Number 1, 418--431.

Abstract:
Strong orthogonal arrays were recently introduced as a class of space-filling designs for computer experiments. The most attractive are those of strength three for their economical run sizes. Although the existence of strong orthogonal arrays of strength three has been completely characterized, the construction of these arrays has not been explored. In this paper, we provide a systematic and comprehensive study on the construction of these arrays, with the aim of achieving better space-filling properties. Besides various characterizing results, three families of strength-three strong orthogonal arrays are presented. One of these families deserves special mention, as the arrays in this family enjoy almost all of the space-filling properties of strength-four strong orthogonal arrays, and do so with much more economical run sizes than the latter. The theory of maximal designs and their doubling constructions plays a crucial role in many of the theoretical developments.




en

High dimensional deformed rectangular matrices with applications in matrix denoising

Xiucai Ding.

Source: Bernoulli, Volume 26, Number 1, 387--417.

Abstract:
We consider the recovery of a low rank $M\times N$ matrix $S$ from its noisy observation $\tilde{S}$ in the high dimensional framework when $M$ is comparable to $N$. We propose two efficient estimators for $S$ under two different regimes. Our analysis relies on the local asymptotics of the eigenstructure of large dimensional rectangular matrices with finite rank perturbation. We derive the convergent limits and rates for the singular values and vectors for such matrices.




en

Weak convergence of quantile and expectile processes under general assumptions

Tobias Zwingmann, Hajo Holzmann.

Source: Bernoulli, Volume 26, Number 1, 323--351.

Abstract:
We show weak convergence of quantile and expectile processes to Gaussian limit processes in the space of bounded functions endowed with an appropriate semimetric which is based on the concepts of epi- and hypo-convergence as introduced in A. Bücher, J. Segers and S. Volgushev (2014), ‘When Uniform Weak Convergence Fails: Empirical Processes for Dependence Functions and Residuals via Epi- and Hypographs’, Annals of Statistics 42. We impose assumptions for which it is known that weak convergence with respect to the supremum norm generally fails to hold. For quantiles, we consider stationary observations, where the marginal distribution function is assumed to be strictly increasing and continuous except for finitely many points and to admit strictly positive – possibly infinite – left- and right-sided derivatives. For expectiles, we focus on independent and identically distributed (i.i.d.) observations. Only a finite second moment and continuity at the boundary points but no further smoothness properties of the distribution function are required. We also show consistency of the bootstrap for this mode of convergence in the i.i.d. case for quantiles and expectiles.