un Rate optimal Chernoff bound and application to community detection in the stochastic block models By projecteuclid.org Published On :: Tue, 24 Mar 2020 22:01 EDT Zhixin Zhou, Ping Li. Source: Electronic Journal of Statistics, Volume 14, Number 1, 1302--1347.Abstract: The Chernoff coefficient is known to be an upper bound of Bayes error probability in classification problem. In this paper, we will develop a rate optimal Chernoff bound on the Bayes error probability. The new bound is not only an upper bound but also a lower bound of Bayes error probability up to a constant factor. Moreover, we will apply this result to community detection in the stochastic block models. As a clustering problem, the optimal misclassification rate of community detection problem can be characterized by our rate optimal Chernoff bound. This can be formalized by deriving a minimax error rate over certain parameter space of stochastic block models, then achieving such an error rate by a feasible algorithm employing multiple steps of EM type updates. Full Article
un Sparsely observed functional time series: estimation and prediction By projecteuclid.org Published On :: Thu, 27 Feb 2020 22:04 EST Tomáš Rubín, Victor M. Panaretos. Source: Electronic Journal of Statistics, Volume 14, Number 1, 1137--1210.Abstract: Functional time series analysis, whether based on time or frequency domain methodology, has traditionally been carried out under the assumption of complete observation of the constituent series of curves, assumed stationary. Nevertheless, as is often the case with independent functional data, it may well happen that the data available to the analyst are not the actual sequence of curves, but relatively few and noisy measurements per curve, potentially at different locations in each curve’s domain. Under this sparse sampling regime, neither the established estimators of the time series’ dynamics nor their corresponding theoretical analysis will apply. The subject of this paper is to tackle the problem of estimating the dynamics and of recovering the latent process of smooth curves in the sparse regime. Assuming smoothness of the latent curves, we construct a consistent nonparametric estimator of the series’ spectral density operator and use it to develop a frequency-domain recovery approach, that predicts the latent curve at a given time by borrowing strength from the (estimated) dynamic correlations in the series across time. This new methodology is seen to comprehensively outperform a naive recovery approach that would ignore temporal dependence and use only methodology employed in the i.i.d. setting and hinging on the lag zero covariance. Further to predicting the latent curves from their noisy point samples, the method fills in gaps in the sequence (curves nowhere sampled), denoises the data, and serves as a basis for forecasting. Means of providing corresponding confidence bands are also investigated. A simulation study interestingly suggests that sparse observation for a longer time period may provide better performance than dense observation for a shorter period, in the presence of smoothness. The methodology is further illustrated by application to an environmental data set on fair-weather atmospheric electricity, which naturally leads to a sparse functional time series. Full Article
un On the distribution, model selection properties and uniqueness of the Lasso estimator in low and high dimensions By projecteuclid.org Published On :: Mon, 17 Feb 2020 22:06 EST Karl Ewald, Ulrike Schneider. Source: Electronic Journal of Statistics, Volume 14, Number 1, 944--969.Abstract: We derive expressions for the finite-sample distribution of the Lasso estimator in the context of a linear regression model in low as well as in high dimensions by exploiting the structure of the optimization problem defining the estimator. In low dimensions, we assume full rank of the regressor matrix and present expressions for the cumulative distribution function as well as the densities of the absolutely continuous parts of the estimator. Our results are presented for the case of normally distributed errors, but do not hinge on this assumption and can easily be generalized. Additionally, we establish an explicit formula for the correspondence between the Lasso and the least-squares estimator. We derive analogous results for the distribution in less explicit form in high dimensions where we make no assumptions on the regressor matrix at all. In this setting, we also investigate the model selection properties of the Lasso and show that possibly only a subset of models might be selected by the estimator, completely independently of the observed response vector. Finally, we present a condition for uniqueness of the estimator that is necessary as well as sufficient. Full Article
un Generalized bounds for active subspaces By projecteuclid.org Published On :: Mon, 17 Feb 2020 22:06 EST Mario Teixeira Parente, Jonas Wallin, Barbara Wohlmuth. Source: Electronic Journal of Statistics, Volume 14, Number 1, 917--943.Abstract: In this article, we consider scenarios in which traditional estimates for the active subspace method based on probabilistic Poincaré inequalities are not valid due to unbounded Poincaré constants. Consequently, we propose a framework that allows to derive generalized estimates in the sense that it enables to control the trade-off between the size of the Poincaré constant and a weaker order of the final error bound. In particular, we investigate independently exponentially distributed random variables in dimension two or larger and give explicit expressions for corresponding Poincaré constants showing their dependence on the dimension of the problem. Finally, we suggest possibilities for future work that aim for extending the class of distributions applicable to the active subspace method as we regard this as an opportunity to enlarge its usability. Full Article
un Reduction problems and deformation approaches to nonstationary covariance functions over spheres By projecteuclid.org Published On :: Tue, 11 Feb 2020 22:03 EST Emilio Porcu, Rachid Senoussi, Enner Mendoza, Moreno Bevilacqua. Source: Electronic Journal of Statistics, Volume 14, Number 1, 890--916.Abstract: The paper considers reduction problems and deformation approaches for nonstationary covariance functions on the $(d-1)$-dimensional spheres, $mathbb{S}^{d-1}$, embedded in the $d$-dimensional Euclidean space. Given a covariance function $C$ on $mathbb{S}^{d-1}$, we chase a pair $(R,Psi)$, for a function $R:[-1,+1] o mathbb{R}$ and a smooth bijection $Psi$, such that $C$ can be reduced to a geodesically isotropic one: $C(mathbf{x},mathbf{y})=R(langle Psi (mathbf{x}),Psi (mathbf{y}) angle )$, with $langle cdot ,cdot angle $ denoting the dot product. The problem finds motivation in recent statistical literature devoted to the analysis of global phenomena, defined typically over the sphere of $mathbb{R}^{3}$. The application domains considered in the manuscript makes the problem mathematically challenging. We show the uniqueness of the representation in the reduction problem. Then, under some regularity assumptions, we provide an inversion formula to recover the bijection $Psi$, when it exists, for a given $C$. We also give sufficient conditions for reducibility. Full Article
un Universal Latent Space Model Fitting for Large Networks with Edge Covariates By Published On :: 2020 Latent space models are effective tools for statistical modeling and visualization of network data. Due to their close connection to generalized linear models, it is also natural to incorporate covariate information in them. The current paper presents two universal fitting algorithms for networks with edge covariates: one based on nuclear norm penalization and the other based on projected gradient descent. Both algorithms are motivated by maximizing the likelihood function for an existing class of inner-product models, and we establish their statistical rates of convergence for these models. In addition, the theory informs us that both methods work simultaneously for a wide range of different latent space models that allow latent positions to affect edge formation in flexible ways, such as distance models. Furthermore, the effectiveness of the methods is demonstrated on a number of real world network data sets for different statistical tasks, including community detection with and without edge covariates, and network assisted learning. Full Article
un Lower Bounds for Parallel and Randomized Convex Optimization By Published On :: 2020 We study the question of whether parallelization in the exploration of the feasible set can be used to speed up convex optimization, in the local oracle model of computation and in the high-dimensional regime. We show that the answer is negative for both deterministic and randomized algorithms applied to essentially any of the interesting geometries and nonsmooth, weakly-smooth, or smooth objective functions. In particular, we show that it is not possible to obtain a polylogarithmic (in the sequential complexity of the problem) number of parallel rounds with a polynomial (in the dimension) number of queries per round. In the majority of these settings and when the dimension of the space is polynomial in the inverse target accuracy, our lower bounds match the oracle complexity of sequential convex optimization, up to at most a logarithmic factor in the dimension, which makes them (nearly) tight. Another conceptual contribution of our work is in providing a general and streamlined framework for proving lower bounds in the setting of parallel convex optimization. Prior to our work, lower bounds for parallel convex optimization algorithms were only known in a small fraction of the settings considered in this paper, mainly applying to Euclidean ($ell_2$) and $ell_infty$ spaces. Full Article
un On Mahalanobis Distance in Functional Settings By Published On :: 2020 Mahalanobis distance is a classical tool in multivariate analysis. We suggest here an extension of this concept to the case of functional data. More precisely, the proposed definition concerns those statistical problems where the sample data are real functions defined on a compact interval of the real line. The obvious difficulty for such a functional extension is the non-invertibility of the covariance operator in infinite-dimensional cases. Unlike other recent proposals, our definition is suggested and motivated in terms of the Reproducing Kernel Hilbert Space (RKHS) associated with the stochastic process that generates the data. The proposed distance is a true metric; it depends on a unique real smoothing parameter which is fully motivated in RKHS terms. Moreover, it shares some properties of its finite dimensional counterpart: it is invariant under isometries, it can be consistently estimated from the data and its sampling distribution is known under Gaussian models. An empirical study for two statistical applications, outliers detection and binary classification, is included. The results are quite competitive when compared to other recent proposals in the literature. Full Article
un Perturbation Bounds for Procrustes, Classical Scaling, and Trilateration, with Applications to Manifold Learning By Published On :: 2020 One of the common tasks in unsupervised learning is dimensionality reduction, where the goal is to find meaningful low-dimensional structures hidden in high-dimensional data. Sometimes referred to as manifold learning, this problem is closely related to the problem of localization, which aims at embedding a weighted graph into a low-dimensional Euclidean space. Several methods have been proposed for localization, and also manifold learning. Nonetheless, the robustness property of most of them is little understood. In this paper, we obtain perturbation bounds for classical scaling and trilateration, which are then applied to derive performance bounds for Isomap, Landmark Isomap, and Maximum Variance Unfolding. A new perturbation bound for procrustes analysis plays a key role. Full Article
un A Unified Framework for Structured Graph Learning via Spectral Constraints By Published On :: 2020 Graph learning from data is a canonical problem that has received substantial attention in the literature. Learning a structured graph is essential for interpretability and identification of the relationships among data. In general, learning a graph with a specific structure is an NP-hard combinatorial problem and thus designing a general tractable algorithm is challenging. Some useful structured graphs include connected, sparse, multi-component, bipartite, and regular graphs. In this paper, we introduce a unified framework for structured graph learning that combines Gaussian graphical model and spectral graph theory. We propose to convert combinatorial structural constraints into spectral constraints on graph matrices and develop an optimization framework based on block majorization-minimization to solve structured graph learning problem. The proposed algorithms are provably convergent and practically amenable for a number of graph based applications such as data clustering. Extensive numerical experiments with both synthetic and real data sets illustrate the effectiveness of the proposed algorithms. An open source R package containing the code for all the experiments is available at https://CRAN.R-project.org/package=spectralGraphTopology. Full Article
un Lower Bounds for Testing Graphical Models: Colorings and Antiferromagnetic Ising Models By Published On :: 2020 We study the identity testing problem in the context of spin systems or undirected graphical models, where it takes the following form: given the parameter specification of the model $M$ and a sampling oracle for the distribution $mu_{M^*}$ of an unknown model $M^*$, can we efficiently determine if the two models $M$ and $M^*$ are the same? We consider identity testing for both soft-constraint and hard-constraint systems. In particular, we prove hardness results in two prototypical cases, the Ising model and proper colorings, and explore whether identity testing is any easier than structure learning. For the ferromagnetic (attractive) Ising model, Daskalakis et al. (2018) presented a polynomial-time algorithm for identity testing. We prove hardness results in the antiferromagnetic (repulsive) setting in the same regime of parameters where structure learning is known to require a super-polynomial number of samples. Specifically, for $n$-vertex graphs of maximum degree $d$, we prove that if $|eta| d = omega(log{n})$ (where $eta$ is the inverse temperature parameter), then there is no polynomial running time identity testing algorithm unless $RP=NP$. In the hard-constraint setting, we present hardness results for identity testing for proper colorings. Our results are based on the presumed hardness of #BIS, the problem of (approximately) counting independent sets in bipartite graphs. Full Article
un Generalized Nonbacktracking Bounds on the Influence By Published On :: 2020 This paper develops deterministic upper and lower bounds on the influence measure in a network, more precisely, the expected number of nodes that a seed set can influence in the independent cascade model. In particular, our bounds exploit r-nonbacktracking walks and Fortuin-Kasteleyn-Ginibre (FKG) type inequalities, and are computed by message passing algorithms. Further, we provide parameterized versions of the bounds that control the trade-off between efficiency and accuracy. Finally, the tightness of the bounds is illustrated on various network models. Full Article
un Provably robust estimation of modulo 1 samples of a smooth function with applications to phase unwrapping By Published On :: 2020 Consider an unknown smooth function $f: [0,1]^d ightarrow mathbb{R}$, and assume we are given $n$ noisy mod 1 samples of $f$, i.e., $y_i = (f(x_i) + eta_i) mod 1$, for $x_i in [0,1]^d$, where $eta_i$ denotes the noise. Given the samples $(x_i,y_i)_{i=1}^{n}$, our goal is to recover smooth, robust estimates of the clean samples $f(x_i) mod 1$. We formulate a natural approach for solving this problem, which works with angular embeddings of the noisy mod 1 samples over the unit circle, inspired by the angular synchronization framework. This amounts to solving a smoothness regularized least-squares problem -- a quadratically constrained quadratic program (QCQP) -- where the variables are constrained to lie on the unit circle. Our proposed approach is based on solving its relaxation, which is a trust-region sub-problem and hence solvable efficiently. We provide theoretical guarantees demonstrating its robustness to noise for adversarial, as well as random Gaussian and Bernoulli noise models. To the best of our knowledge, these are the first such theoretical results for this problem. We demonstrate the robustness and efficiency of our proposed approach via extensive numerical simulations on synthetic data, along with a simple least-squares based solution for the unwrapping stage, that recovers the original samples of $f$ (up to a global shift). It is shown to perform well at high levels of noise, when taking as input the denoised modulo $1$ samples. Finally, we also consider two other approaches for denoising the modulo 1 samples that leverage tools from Riemannian optimization on manifolds, including a Burer-Monteiro approach for a semidefinite programming relaxation of our formulation. For the two-dimensional version of the problem, which has applications in synthetic aperture radar interferometry (InSAR), we are able to solve instances of real-world data with a million sample points in under 10 seconds, on a personal laptop. Full Article
un Learning with Fenchel-Young losses By Published On :: 2020 Over the past decades, numerous loss functions have been been proposed for a variety of supervised learning tasks, including regression, classification, ranking, and more generally structured prediction. Understanding the core principles and theoretical properties underpinning these losses is key to choose the right loss for the right problem, as well as to create new losses which combine their strengths. In this paper, we introduce Fenchel-Young losses, a generic way to construct a convex loss function for a regularized prediction function. We provide an in-depth study of their properties in a very broad setting, covering all the aforementioned supervised learning tasks, and revealing new connections between sparsity, generalized entropies, and separation margins. We show that Fenchel-Young losses unify many well-known loss functions and allow to create useful new ones easily. Finally, we derive efficient predictive and training algorithms, making Fenchel-Young losses appealing both in theory and practice. Full Article
un Causal Discovery Toolbox: Uncovering causal relationships in Python By Published On :: 2020 This paper presents a new open source Python framework for causal discovery from observational data and domain background knowledge, aimed at causal graph and causal mechanism modeling. The cdt package implements an end-to-end approach, recovering the direct dependencies (the skeleton of the causal graph) and the causal relationships between variables. It includes algorithms from the `Bnlearn' and `Pcalg' packages, together with algorithms for pairwise causal discovery such as ANM. Full Article
un Latent Simplex Position Model: High Dimensional Multi-view Clustering with Uncertainty Quantification By Published On :: 2020 High dimensional data often contain multiple facets, and several clustering patterns can co-exist under different variable subspaces, also known as the views. While multi-view clustering algorithms were proposed, the uncertainty quantification remains difficult --- a particular challenge is in the high complexity of estimating the cluster assignment probability under each view, and sharing information among views. In this article, we propose an approximate Bayes approach --- treating the similarity matrices generated over the views as rough first-stage estimates for the co-assignment probabilities; in its Kullback-Leibler neighborhood, we obtain a refined low-rank matrix, formed by the pairwise product of simplex coordinates. Interestingly, each simplex coordinate directly encodes the cluster assignment uncertainty. For multi-view clustering, we let each view draw a parameterization from a few candidates, leading to dimension reduction. With high model flexibility, the estimation can be efficiently carried out as a continuous optimization problem, hence enjoys gradient-based computation. The theory establishes the connection of this model to a random partition distribution under multiple views. Compared to single-view clustering approaches, substantially more interpretable results are obtained when clustering brains from a human traumatic brain injury study, using high-dimensional gene expression data. Full Article
un Branch and Bound for Piecewise Linear Neural Network Verification By Published On :: 2020 The success of Deep Learning and its potential use in many safety-critical applicationshas motivated research on formal verification of Neural Network (NN) models. In thiscontext, verification involves proving or disproving that an NN model satisfies certaininput-output properties. Despite the reputation of learned NN models as black boxes,and the theoretical hardness of proving useful properties about them, researchers havebeen successful in verifying some classes of models by exploiting their piecewise linearstructure and taking insights from formal methods such as Satisifiability Modulo Theory.However, these methods are still far from scaling to realistic neural networks. To facilitateprogress on this crucial area, we exploit the Mixed Integer Linear Programming (MIP) formulation of verification to propose a family of algorithms based on Branch-and-Bound (BaB). We show that our family contains previous verification methods as special cases.With the help of the BaB framework, we make three key contributions. Firstly, we identifynew methods that combine the strengths of multiple existing approaches, accomplishingsignificant performance improvements over previous state of the art. Secondly, we introducean effective branching strategy on ReLU non-linearities. This branching strategy allows usto efficiently and successfully deal with high input dimensional problems with convolutionalnetwork architecture, on which previous methods fail frequently. Finally, we proposecomprehensive test data sets and benchmarks which includes a collection of previouslyreleased testcases. We use the data sets to conduct a thorough experimental comparison ofexisting and new algorithms and to provide an inclusive analysis of the factors impactingthe hardness of verification problems. Full Article
un A Convex Parametrization of a New Class of Universal Kernel Functions By Published On :: 2020 The accuracy and complexity of kernel learning algorithms is determined by the set of kernels over which it is able to optimize. An ideal set of kernels should: admit a linear parameterization (tractability); be dense in the set of all kernels (accuracy); and every member should be universal so that the hypothesis space is infinite-dimensional (scalability). Currently, there is no class of kernel that meets all three criteria - e.g. Gaussians are not tractable or accurate; polynomials are not scalable. We propose a new class that meet all three criteria - the Tessellated Kernel (TK) class. Specifically, the TK class: admits a linear parameterization using positive matrices; is dense in all kernels; and every element in the class is universal. This implies that the use of TK kernels for learning the kernel can obviate the need for selecting candidate kernels in algorithms such as SimpleMKL and parameters such as the bandwidth. Numerical testing on soft margin Support Vector Machine (SVM) problems show that algorithms using TK kernels outperform other kernel learning algorithms and neural networks. Furthermore, our results show that when the ratio of the number of training data to features is high, the improvement of TK over MKL increases significantly. Full Article
un Fast Rates for General Unbounded Loss Functions: From ERM to Generalized Bayes By Published On :: 2020 We present new excess risk bounds for general unbounded loss functions including log loss and squared loss, where the distribution of the losses may be heavy-tailed. The bounds hold for general estimators, but they are optimized when applied to $eta$-generalized Bayesian, MDL, and empirical risk minimization estimators. In the case of log loss, the bounds imply convergence rates for generalized Bayesian inference under misspecification in terms of a generalization of the Hellinger metric as long as the learning rate $eta$ is set correctly. For general loss functions, our bounds rely on two separate conditions: the $v$-GRIP (generalized reversed information projection) conditions, which control the lower tail of the excess loss; and the newly introduced witness condition, which controls the upper tail. The parameter $v$ in the $v$-GRIP conditions determines the achievable rate and is akin to the exponent in the Tsybakov margin condition and the Bernstein condition for bounded losses, which the $v$-GRIP conditions generalize; favorable $v$ in combination with small model complexity leads to $ ilde{O}(1/n)$ rates. The witness condition allows us to connect the excess risk to an 'annealed' version thereof, by which we generalize several previous results connecting Hellinger and Rényi divergence to KL divergence. Full Article
un Robust Asynchronous Stochastic Gradient-Push: Asymptotically Optimal and Network-Independent Performance for Strongly Convex Functions By Published On :: 2020 We consider the standard model of distributed optimization of a sum of functions $F(mathbf z) = sum_{i=1}^n f_i(mathbf z)$, where node $i$ in a network holds the function $f_i(mathbf z)$. We allow for a harsh network model characterized by asynchronous updates, message delays, unpredictable message losses, and directed communication among nodes. In this setting, we analyze a modification of the Gradient-Push method for distributed optimization, assuming that (i) node $i$ is capable of generating gradients of its function $f_i(mathbf z)$ corrupted by zero-mean bounded-support additive noise at each step, (ii) $F(mathbf z)$ is strongly convex, and (iii) each $f_i(mathbf z)$ has Lipschitz gradients. We show that our proposed method asymptotically performs as well as the best bounds on centralized gradient descent that takes steps in the direction of the sum of the noisy gradients of all the functions $f_1(mathbf z), ldots, f_n(mathbf z)$ at each step. Full Article
un Unique Sharp Local Minimum in L1-minimization Complete Dictionary Learning By Published On :: 2020 We study the problem of globally recovering a dictionary from a set of signals via $ell_1$-minimization. We assume that the signals are generated as i.i.d. random linear combinations of the $K$ atoms from a complete reference dictionary $D^*in mathbb R^{K imes K}$, where the linear combination coefficients are from either a Bernoulli type model or exact sparse model. First, we obtain a necessary and sufficient norm condition for the reference dictionary $D^*$ to be a sharp local minimum of the expected $ell_1$ objective function. Our result substantially extends that of Wu and Yu (2015) and allows the combination coefficient to be non-negative. Secondly, we obtain an explicit bound on the region within which the objective value of the reference dictionary is minimal. Thirdly, we show that the reference dictionary is the unique sharp local minimum, thus establishing the first known global property of $ell_1$-minimization dictionary learning. Motivated by the theoretical results, we introduce a perturbation based test to determine whether a dictionary is a sharp local minimum of the objective function. In addition, we also propose a new dictionary learning algorithm based on Block Coordinate Descent, called DL-BCD, which is guaranteed to decrease the obective function monotonically. Simulation studies show that DL-BCD has competitive performance in terms of recovery rate compared to other state-of-the-art dictionary learning algorithms when the reference dictionary is generated from random Gaussian matrices. Full Article
un Community-Based Group Graphical Lasso By Published On :: 2020 A new strategy for probabilistic graphical modeling is developed that draws parallels to community detection analysis. The method jointly estimates an undirected graph and homogeneous communities of nodes. The structure of the communities is taken into account when estimating the graph and at the same time, the structure of the graph is accounted for when estimating communities of nodes. The procedure uses a joint group graphical lasso approach with community detection-based grouping, such that some groups of edges co-occur in the estimated graph. The grouping structure is unknown and is estimated based on community detection algorithms. Theoretical derivations regarding graph convergence and sparsistency, as well as accuracy of community recovery are included, while the method's empirical performance is illustrated in an fMRI context, as well as with simulated examples. Full Article
un The weight function in the subtree kernel is decisive By Published On :: 2020 Tree data are ubiquitous because they model a large variety of situations, e.g., the architecture of plants, the secondary structure of RNA, or the hierarchy of XML files. Nevertheless, the analysis of these non-Euclidean data is difficult per se. In this paper, we focus on the subtree kernel that is a convolution kernel for tree data introduced by Vishwanathan and Smola in the early 2000's. More precisely, we investigate the influence of the weight function from a theoretical perspective and in real data applications. We establish on a 2-classes stochastic model that the performance of the subtree kernel is improved when the weight of leaves vanishes, which motivates the definition of a new weight function, learned from the data and not fixed by the user as usually done. To this end, we define a unified framework for computing the subtree kernel from ordered or unordered trees, that is particularly suitable for tuning parameters. We show through eight real data classification problems the great efficiency of our approach, in particular for small data sets, which also states the high importance of the weight function. Finally, a visualization tool of the significant features is derived. Full Article
un Union of Low-Rank Tensor Spaces: Clustering and Completion By Published On :: 2020 We consider the problem of clustering and completing a set of tensors with missing data that are drawn from a union of low-rank tensor spaces. In the clustering problem, given a partially sampled tensor data that is composed of a number of subtensors, each chosen from one of a certain number of unknown tensor spaces, we need to group the subtensors that belong to the same tensor space. We provide a geometrical analysis on the sampling pattern and subsequently derive the sampling rate that guarantees the correct clustering under some assumptions with high probability. Moreover, we investigate the fundamental conditions for finite/unique completability for the union of tensor spaces completion problem. Both deterministic and probabilistic conditions on the sampling pattern to ensure finite/unique completability are obtained. For both the clustering and completion problems, our tensor analysis provides significantly better bound than the bound given by the matrix analysis applied to any unfolding of the tensor data. Full Article
un GADMM: Fast and Communication Efficient Framework for Distributed Machine Learning By Published On :: 2020 When the data is distributed across multiple servers, lowering the communication cost between the servers (or workers) while solving the distributed learning problem is an important problem and is the focus of this paper. In particular, we propose a fast, and communication-efficient decentralized framework to solve the distributed machine learning (DML) problem. The proposed algorithm, Group Alternating Direction Method of Multipliers (GADMM) is based on the Alternating Direction Method of Multipliers (ADMM) framework. The key novelty in GADMM is that it solves the problem in a decentralized topology where at most half of the workers are competing for the limited communication resources at any given time. Moreover, each worker exchanges the locally trained model only with two neighboring workers, thereby training a global model with a lower amount of communication overhead in each exchange. We prove that GADMM converges to the optimal solution for convex loss functions, and numerically show that it converges faster and more communication-efficient than the state-of-the-art communication-efficient algorithms such as the Lazily Aggregated Gradient (LAG) and dual averaging, in linear and logistic regression tasks on synthetic and real datasets. Furthermore, we propose Dynamic GADMM (D-GADMM), a variant of GADMM, and prove its convergence under the time-varying network topology of the workers. Full Article
un Q&A with Tara June Winch By feedproxy.google.com Published On :: Mon, 27 Apr 2020 01:40:36 +0000 Tara June Winch's profound novel The Yield has won three NSW Premier's Literary Awards prizes this year, inclu Full Article
un Youth & Community Initiatives Funding available By www.eastgwillimbury.ca Published On :: Thu, 20 Feb 2020 18:27:25 GMT Full Article
un Town launches new Community Support Hotline By www.eastgwillimbury.ca Published On :: Tue, 28 Apr 2020 23:15:02 GMT Full Article
un Branching random walks with uncountably many extinction probability vectors By projecteuclid.org Published On :: Mon, 04 May 2020 04:00 EDT Daniela Bertacchi, Fabio Zucca. Source: Brazilian Journal of Probability and Statistics, Volume 34, Number 2, 426--438.Abstract: Given a branching random walk on a set $X$, we study its extinction probability vectors $mathbf{q}(cdot,A)$. Their components are the probability that the process goes extinct in a fixed $Asubseteq X$, when starting from a vertex $xin X$. The set of extinction probability vectors (obtained letting $A$ vary among all subsets of $X$) is a subset of the set of the fixed points of the generating function of the branching random walk. In particular here we are interested in the cardinality of the set of extinction probability vectors. We prove results which allow to understand whether the probability of extinction in a set $A$ is different from the one of extinction in another set $B$. In many cases there are only two possible extinction probability vectors and so far, in more complicated examples, only a finite number of distinct extinction probability vectors had been explicitly found. Whether a branching random walk could have an infinite number of distinct extinction probability vectors was not known. We apply our results to construct examples of branching random walks with uncountably many distinct extinction probability vectors. Full Article
un Reliability estimation in a multicomponent stress-strength model for Burr XII distribution under progressive censoring By projecteuclid.org Published On :: Mon, 04 May 2020 04:00 EDT Raj Kamal Maurya, Yogesh Mani Tripathi. Source: Brazilian Journal of Probability and Statistics, Volume 34, Number 2, 345--369.Abstract: We consider estimation of the multicomponent stress-strength reliability under progressive Type II censoring under the assumption that stress and strength variables follow Burr XII distributions with a common shape parameter. Maximum likelihood estimates of the reliability are obtained along with asymptotic intervals when common shape parameter may be known or unknown. Bayes estimates are also derived under the squared error loss function using different approximation methods. Further, we obtain exact Bayes and uniformly minimum variance unbiased estimates of the reliability for the case common shape parameter is known. The highest posterior density intervals are also obtained. We perform Monte Carlo simulations to compare the performance of proposed estimates and present a discussion based on this study. Finally, two real data sets are analyzed for illustration purposes. Full Article
un Recent developments in complex and spatially correlated functional data By projecteuclid.org Published On :: Mon, 04 May 2020 04:00 EDT Israel Martínez-Hernández, Marc G. Genton. Source: Brazilian Journal of Probability and Statistics, Volume 34, Number 2, 204--229.Abstract: As high-dimensional and high-frequency data are being collected on a large scale, the development of new statistical models is being pushed forward. Functional data analysis provides the required statistical methods to deal with large-scale and complex data by assuming that data are continuous functions, for example, realizations of a continuous process (curves) or continuous random field (surfaces), and that each curve or surface is considered as a single observation. Here, we provide an overview of functional data analysis when data are complex and spatially correlated. We provide definitions and estimators of the first and second moments of the corresponding functional random variable. We present two main approaches: The first assumes that data are realizations of a functional random field, that is, each observation is a curve with a spatial component. We call them spatial functional data . The second approach assumes that data are continuous deterministic fields observed over time. In this case, one observation is a surface or manifold, and we call them surface time series . For these two approaches, we describe software available for the statistical analysis. We also present a data illustration, using a high-resolution wind speed simulated dataset, as an example of the two approaches. The functional data approach offers a new paradigm of data analysis, where the continuous processes or random fields are considered as a single entity. We consider this approach to be very valuable in the context of big data. Full Article
un On estimating the location parameter of the selected exponential population under the LINEX loss function By projecteuclid.org Published On :: Mon, 03 Feb 2020 04:00 EST Mohd Arshad, Omer Abdalghani. Source: Brazilian Journal of Probability and Statistics, Volume 34, Number 1, 167--182.Abstract: Suppose that $pi_{1},pi_{2},ldots ,pi_{k}$ be $k(geq2)$ independent exponential populations having unknown location parameters $mu_{1},mu_{2},ldots,mu_{k}$ and known scale parameters $sigma_{1},ldots,sigma_{k}$. Let $mu_{[k]}=max {mu_{1},ldots,mu_{k}}$. For selecting the population associated with $mu_{[k]}$, a class of selection rules (proposed by Arshad and Misra [ Statistical Papers 57 (2016) 605–621]) is considered. We consider the problem of estimating the location parameter $mu_{S}$ of the selected population under the criterion of the LINEX loss function. We consider three natural estimators $delta_{N,1},delta_{N,2}$ and $delta_{N,3}$ of $mu_{S}$, based on the maximum likelihood estimators, uniformly minimum variance unbiased estimator (UMVUE) and minimum risk equivariant estimator (MREE) of $mu_{i}$’s, respectively. The uniformly minimum risk unbiased estimator (UMRUE) and the generalized Bayes estimator of $mu_{S}$ are derived. Under the LINEX loss function, a general result for improving a location-equivariant estimator of $mu_{S}$ is derived. Using this result, estimator better than the natural estimator $delta_{N,1}$ is obtained. We also shown that the estimator $delta_{N,1}$ is dominated by the natural estimator $delta_{N,3}$. Finally, we perform a simulation study to evaluate and compare risk functions among various competing estimators of $mu_{S}$. Full Article
un Application of weighted and unordered majorization orders in comparisons of parallel systems with exponentiated generalized gamma components By projecteuclid.org Published On :: Mon, 03 Feb 2020 04:00 EST Abedin Haidari, Amir T. Payandeh Najafabadi, Narayanaswamy Balakrishnan. Source: Brazilian Journal of Probability and Statistics, Volume 34, Number 1, 150--166.Abstract: Consider two parallel systems, say $A$ and $B$, with respective lifetimes $T_{1}$ and $T_{2}$ wherein independent component lifetimes of each system follow exponentiated generalized gamma distribution with possibly different exponential shape and scale parameters. We show here that $T_{2}$ is smaller than $T_{1}$ with respect to the usual stochastic order (reversed hazard rate order) if the vector of logarithm (the main vector) of scale parameters of System $B$ is weakly weighted majorized by that of System $A$, and if the vector of exponential shape parameters of System $A$ is unordered mojorized by that of System $B$. By means of some examples, we show that the above results can not be extended to the hazard rate and likelihood ratio orders. However, when the scale parameters of each system divide into two homogeneous groups, we verify that the usual stochastic and reversed hazard rate orders can be extended, respectively, to the hazard rate and likelihood ratio orders. The established results complete and strengthen some of the known results in the literature. Full Article
un Nonparametric discrimination of areal functional data By projecteuclid.org Published On :: Mon, 03 Feb 2020 04:00 EST Ahmad Younso. Source: Brazilian Journal of Probability and Statistics, Volume 34, Number 1, 112--126.Abstract: We consider a new nonparametric rule of classification, inspired from the classical moving window rule, that allows for the classification of spatially dependent functional data containing some completely missing curves. We investigate the consistency of this classifier under mild conditions. The practical use of the classifier will be illustrated through simulation studies. Full Article
un A joint mean-correlation modeling approach for longitudinal zero-inflated count data By projecteuclid.org Published On :: Mon, 03 Feb 2020 04:00 EST Weiping Zhang, Jiangli Wang, Fang Qian, Yu Chen. Source: Brazilian Journal of Probability and Statistics, Volume 34, Number 1, 35--50.Abstract: Longitudinal zero-inflated count data are widely encountered in many fields, while modeling the correlation between measurements for the same subject is more challenge due to the lack of suitable multivariate joint distributions. This paper studies a novel mean-correlation modeling approach for longitudinal zero-inflated regression model, solving both problems of specifying joint distribution and parsimoniously modeling correlations with no constraint. The joint distribution of zero-inflated discrete longitudinal responses is modeled by a copula model whose correlation parameters are innovatively represented in hyper-spherical coordinates. To overcome the computational intractability in maximizing the full likelihood function of the model, we further propose a computationally efficient pairwise likelihood approach. We then propose separated mean and correlation regression models to model these key quantities, such modeling approach can also handle irregularly and possibly subject-specific times points. The resulting estimators are shown to be consistent and asymptotically normal. Data example and simulations support the effectiveness of the proposed approach. Full Article
un Bayesian inference on power Lindley distribution based on different loss functions By projecteuclid.org Published On :: Mon, 26 Aug 2019 04:00 EDT Abbas Pak, M. E. Ghitany, Mohammad Reza Mahmoudi. Source: Brazilian Journal of Probability and Statistics, Volume 33, Number 4, 894--914.Abstract: This paper focuses on Bayesian estimation of the parameters and reliability function of the power Lindley distribution by using various symmetric and asymmetric loss functions. Assuming suitable priors on the parameters, Bayes estimates are derived by using squared error, linear exponential (linex) and general entropy loss functions. Since, under these loss functions, Bayes estimates of the parameters do not have closed forms we use lindley’s approximation technique to calculate the Bayes estimates. Moreover, we obtain the Bayes estimates of the parameters using a Markov Chain Monte Carlo (MCMC) method. Simulation studies are conducted in order to evaluate the performances of the proposed estimators under the considered loss functions. Finally, analysis of a real data set is presented for illustrative purposes. Full Article
un Time series of count data: A review, empirical comparisons and data analysis By projecteuclid.org Published On :: Mon, 26 Aug 2019 04:00 EDT Glaura C. Franco, Helio S. Migon, Marcos O. Prates. Source: Brazilian Journal of Probability and Statistics, Volume 33, Number 4, 756--781.Abstract: Observation and parameter driven models are commonly used in the literature to analyse time series of counts. In this paper, we study the characteristics of a variety of models and point out the main differences and similarities among these procedures, concerning parameter estimation, model fitting and forecasting. Alternatively to the literature, all inference was performed under the Bayesian paradigm. The models are fitted with a latent AR($p$) process in the mean, which accounts for autocorrelation in the data. An extensive simulation study shows that the estimates for the covariate parameters are remarkably similar across the different models. However, estimates for autoregressive coefficients and forecasts of future values depend heavily on the underlying process which generates the data. A real data set of bankruptcy in the United States is also analysed. Full Article
un Unions of random walk and percolation on infinite graphs By projecteuclid.org Published On :: Mon, 10 Jun 2019 04:04 EDT Kazuki Okamura. Source: Brazilian Journal of Probability and Statistics, Volume 33, Number 3, 586--637.Abstract: We consider a random object that is associated with both random walks and random media, specifically, the superposition of a configuration of subcritical Bernoulli percolation on an infinite connected graph and the trace of the simple random walk on the same graph. We investigate asymptotics for the number of vertices of the enlargement of the trace of the walk until a fixed time, when the time tends to infinity. This process is more highly self-interacting than the range of random walk, which yields difficulties. We show a law of large numbers on vertex-transitive transient graphs. We compare the process on a vertex-transitive graph with the process on a finitely modified graph of the original vertex-transitive graph and show their behaviors are similar. We show that the process fluctuates almost surely on a certain non-vertex-transitive graph. On the two-dimensional integer lattice, by investigating the size of the boundary of the trace, we give an estimate for variances of the process implying a law of large numbers. We give an example of a graph with unbounded degrees on which the process behaves in a singular manner. As by-products, some results for the range and the boundary, which will be of independent interest, are obtained. Full Article
un Fake uniformity in a shape inversion formula By projecteuclid.org Published On :: Mon, 10 Jun 2019 04:04 EDT Christian Rau. Source: Brazilian Journal of Probability and Statistics, Volume 33, Number 3, 549--557.Abstract: We revisit a shape inversion formula derived by Panaretos in the context of a particle density estimation problem with unknown rotation of the particle. A distribution is presented which imitates, or “fakes”, the uniformity or Haar distribution that is part of that formula. Full Article
un A Jackson network under general regime By projecteuclid.org Published On :: Mon, 10 Jun 2019 04:04 EDT Yair Y. Shaki. Source: Brazilian Journal of Probability and Statistics, Volume 33, Number 3, 532--548.Abstract: We consider a Jackson network in a general heavy traffic diffusion regime with the $alpha$-parametrization . We also assume that each customer may abandon the system while waiting. We show that in this regime the queue-length process converges to a multi-dimensional regulated Ornstein–Uhlenbeck process. Full Article
un Density for solutions to stochastic differential equations with unbounded drift By projecteuclid.org Published On :: Mon, 10 Jun 2019 04:04 EDT Christian Olivera, Ciprian Tudor. Source: Brazilian Journal of Probability and Statistics, Volume 33, Number 3, 520--531.Abstract: Via a special transform and by using the techniques of the Malliavin calculus, we analyze the density of the solution to a stochastic differential equation with unbounded drift. Full Article
un A temporal perspective on the rate of convergence in first-passage percolation under a moment condition By projecteuclid.org Published On :: Mon, 04 Mar 2019 04:00 EST Daniel Ahlberg. Source: Brazilian Journal of Probability and Statistics, Volume 33, Number 2, 397--401.Abstract: We study the rate of convergence in the celebrated Shape Theorem in first-passage percolation, obtaining the precise asymptotic rate of decay for the probability of linear order deviations under a moment condition. Our results are presented from a temporal perspective and complement previous work by the same author, in which the rate of convergence was studied from the standard spatial perspective. Full Article
un Hierarchical modelling of power law processes for the analysis of repairable systems with different truncation times: An empirical Bayes approach By projecteuclid.org Published On :: Mon, 04 Mar 2019 04:00 EST Rodrigo Citton P. dos Reis, Enrico A. Colosimo, Gustavo L. Gilardoni. Source: Brazilian Journal of Probability and Statistics, Volume 33, Number 2, 374--396.Abstract: In the data analysis from multiple repairable systems, it is usual to observe both different truncation times and heterogeneity among the systems. Among other reasons, the latter is caused by different manufacturing lines and maintenance teams of the systems. In this paper, a hierarchical model is proposed for the statistical analysis of multiple repairable systems under different truncation times. A reparameterization of the power law process is proposed in order to obtain a quasi-conjugate bayesian analysis. An empirical Bayes approach is used to estimate model hyperparameters. The uncertainty in the estimate of these quantities are corrected by using a parametric bootstrap approach. The results are illustrated in a real data set of failure times of power transformers from an electric company in Brazil. Full Article
un A new log-linear bimodal Birnbaum–Saunders regression model with application to survival data By projecteuclid.org Published On :: Mon, 04 Mar 2019 04:00 EST Francisco Cribari-Neto, Rodney V. Fonseca. Source: Brazilian Journal of Probability and Statistics, Volume 33, Number 2, 329--355.Abstract: The log-linear Birnbaum–Saunders model has been widely used in empirical applications. We introduce an extension of this model based on a recently proposed version of the Birnbaum–Saunders distribution which is more flexible than the standard Birnbaum–Saunders law since its density may assume both unimodal and bimodal shapes. We show how to perform point estimation, interval estimation and hypothesis testing inferences on the parameters that index the regression model we propose. We also present a number of diagnostic tools, such as residual analysis, local influence, generalized leverage, generalized Cook’s distance and model misspecification tests. We investigate the usefulness of model selection criteria and the accuracy of prediction intervals for the proposed model. Results of Monte Carlo simulations are presented. Finally, we also present and discuss an empirical application. Full Article
un Failure rate of Birnbaum–Saunders distributions: Shape, change-point, estimation and robustness By projecteuclid.org Published On :: Mon, 04 Mar 2019 04:00 EST Emilia Athayde, Assis Azevedo, Michelli Barros, Víctor Leiva. Source: Brazilian Journal of Probability and Statistics, Volume 33, Number 2, 301--328.Abstract: The Birnbaum–Saunders (BS) distribution has been largely studied and applied. A random variable with BS distribution is a transformation of another random variable with standard normal distribution. Generalized BS distributions are obtained when the normally distributed random variable is replaced by another symmetrically distributed random variable. This allows us to obtain a wide class of positively skewed models with lighter and heavier tails than the BS model. Its failure rate admits several shapes, including the unimodal case, with its change-point being able to be used for different purposes. For example, to establish the reduction in a dose, and then in the cost of the medical treatment. We analyze the failure rates of generalized BS distributions obtained by the logistic, normal and Student-t distributions, considering their shape and change-point, estimating them, evaluating their robustness, assessing their performance by simulations, and applying the results to real data from different areas. Full Article
un A brief review of optimal scaling of the main MCMC approaches and optimal scaling of additive TMCMC under non-regular cases By projecteuclid.org Published On :: Mon, 04 Mar 2019 04:00 EST Kushal K. Dey, Sourabh Bhattacharya. Source: Brazilian Journal of Probability and Statistics, Volume 33, Number 2, 222--266.Abstract: Transformation based Markov Chain Monte Carlo (TMCMC) was proposed by Dutta and Bhattacharya ( Statistical Methodology 16 (2014) 100–116) as an efficient alternative to the Metropolis–Hastings algorithm, especially in high dimensions. The main advantage of this algorithm is that it simultaneously updates all components of a high dimensional parameter using appropriate move types defined by deterministic transformation of a single random variable. This results in reduction in time complexity at each step of the chain and enhances the acceptance rate. In this paper, we first provide a brief review of the optimal scaling theory for various existing MCMC approaches, comparing and contrasting them with the corresponding TMCMC approaches.The optimal scaling of the simplest form of TMCMC, namely additive TMCMC , has been studied extensively for the Gaussian proposal density in Dey and Bhattacharya (2017a). Here, we discuss diffusion-based optimal scaling behavior of additive TMCMC for non-Gaussian proposal densities—in particular, uniform, Student’s $t$ and Cauchy proposals. Although we could not formally prove our diffusion result for the Cauchy proposal, simulation based results lead us to conjecture that at least the recipe for obtaining general optimal scaling and optimal acceptance rate holds for the Cauchy case as well. We also consider diffusion based optimal scaling of TMCMC when the target density is discontinuous. Such non-regular situations have been studied in the case of Random Walk Metropolis Hastings (RWMH) algorithm by Neal and Roberts ( Methodology and Computing in Applied Probability 13 (2011) 583–601) using expected squared jumping distance (ESJD), but the diffusion theory based scaling has not been considered. We compare our diffusion based optimally scaled TMCMC approach with the ESJD based optimally scaled RWM with simulation studies involving several target distributions and proposal distributions including the challenging Cauchy proposal case, showing that additive TMCMC outperforms RWMH in almost all cases considered. Full Article
un The equivalence of dynamic and static asset allocations under the uncertainty caused by Poisson processes By projecteuclid.org Published On :: Mon, 14 Jan 2019 04:01 EST Yong-Chao Zhang, Na Zhang. Source: Brazilian Journal of Probability and Statistics, Volume 33, Number 1, 184--191.Abstract: We investigate the equivalence of dynamic and static asset allocations in the case where the price process of a risky asset is driven by a Poisson process. Under some mild conditions, we obtain a necessary and sufficient condition for the equivalence of dynamic and static asset allocations. In addition, we provide a simple sufficient condition for the equivalence. Full Article
un Unlikeness is us : fourteen from the Exeter book By dal.novanet.ca Published On :: Fri, 1 May 2020 19:34:09 -0300 Author: Exeter book. Selections. EnglishCallnumber: PS 8631 A8489 E94 2018ISBN: 9781554471751 (softcover) Full Article
un Odysseus asleep : uncollected sequences, 1994-2019 By dal.novanet.ca Published On :: Fri, 1 May 2020 19:34:09 -0300 Author: Sanger, Peter, 1943- author.Callnumber: PS 8587 A372 O44 2019ISBN: 9781554472048 Full Article
un Reclaiming indigenous governance : reflections and insights from Australia, Canada, New Zealand, and the United States By dal.novanet.ca Published On :: Fri, 1 May 2020 19:34:09 -0300 Callnumber: K 3247 R43 2019ISBN: 9780816539970 (paperback) Full Article