
The wilderness of mind : sacred plants in cross-cultural perspective / Marlene Dobkin De Rios.

Beverly Hills : Sage Publications, 1976.





Methadone substitution therapy : policies and practices / edited by Hamid Ghodse, Carmel Clancy, Adenekan Oyefeso.

London : European Collaborating Centres in Addiction Studies, 1998.





Evaluation of the 'progress' pilot projects "from recovery into work" / by Stephen Burniston, Jo Cutter, Neil Shaw, Michael Dodd.

York : York Consulting, 2001.





The university chemical dependency project : final report : November 1 1986 / Steven A. Bloch, Steven Ungerleider.

[Indiana] : Integrated Research Services, Inc., 1986.





Illuminated address presented to Andrew Lynch, 1925





Ferguson family papers, 1885-1993





David Milliss further papers, 1940s-2010





Top three Mikayla Pivec moments: Pivec's OSU rebounding record highlights her impressive career

All-Pac-12 talent Mikayla Pivec's career in Corvallis has been memorable to say the least. While it's difficult to choose just three, her top moments include a career-high 19 rebounds against Washington, a buzzer-beating layup against ASU, and breaking Ruth Hamblin's Oregon State rebounding record this year against Stanford.





Oregon's Sabrina Ionescu takes home Naismith Trophy Player of the Year honor

Sabrina Ionescu is the Naismith Trophy Player of the Year, concluding her illustrious Oregon career with one of the major postseason women's basketball awards. As the only player in college basketball history with 2,000 career points (2,562), 1,000 assists (1,091) and 1,000 rebounds (1,040) and the NCAA all-time leader with 26 triple-doubles, Ionescu has continued to rack up player of the year honors for her remarkable senior season.





Oregon's Ionescu wins women's Naismith Player of the Year

Already named The Associated Press women's player of the year, Ionescu was awarded the Naismith Trophy for the most outstanding women's basketball player on Friday. Ionescu, who won AP All-American honors three times, shattered the NCAA career triple-double mark with 26 and became the first player in college history to have 2,000 points, 1,000 rebounds and 1,000 assists. Ionescu averaged 17.5 points, 9.1 assists and 8.6 rebounds with eight triple-doubles as a senior this season.





Texas women's basketball coach Karen Aston dismissed

AUSTIN, Texas (AP) -- Texas dismissed women's basketball coach Karen Aston on Friday, ending an eight-year stint that included four straight trips to the NCAA Tournament Sweet 16 from 2015-2018.





Texas hires Schaefer from Mississippi State

Texas moved quickly to hire a new women's basketball coach, luring Vic Schaefer away from powerhouse Mississippi State on Sunday. Texas athletic director Chris Del Conte announced the move by tweeting a picture of himself with Schaefer and his family holding up the “Hook'em Horns” hand signal. The move comes just two days after Texas dismissed eight-year coach Karen Aston, who had only one losing season in her tenure and had led the Longhorns to the Sweet 16 or farther four times.





Sydney Wiese, recovering from coronavirus, continually talking with friends and family: 'Our world is uniting'

Hear how former Oregon State guard and current member of the WNBA's LA Sparks Sydney Wiese is recovering from a COVID-19 diagnosis, seeing friends and family show support and love during a trying time.





Oregon's Sabrina Ionescu, Ruthy Hebard, Satou Sabally share meaning of Naismith Starting 5 honor

Pac-12 Networks' Ashley Adamson speaks with Oregon stars Sabrina Ionescu, Ruthy Hebard and Satou Sabally to hear how special their recent Naismith Starting 5 honor was, as the Ducks comprise three of the nation's top five players. Ionescu (point guard), Sabally (small forward) and Hebard (power forward) led the Ducks to a 31-2 record in the 2019-20 season before it was cut short.





WNBA Draft Profile: Do-it-all OSU talent Mikayla Pivec has her sights set on a pro breakout

Oregon State guard Mikayla Pivec is the epitome of a versatile player. Her 1,030 career rebounds were the most in school history, and she finished just one assist shy of becoming the first in OSU history to tally 1,500 points, 1,000 rebounds and 500 assists. She'll head to the WNBA looking to showcase her talents at the next level following the 2020 WNBA Draft.





Mississippi State hires Nikki McCray-Penson as women's coach

Mississippi State hired former Old Dominion women’s basketball coach Nikki McCray-Penson to replace Vic Schaefer as the Bulldogs’ head coach. Athletic director John Cohen called McCray-Penson “a proven winner who will lead one of the best programs in the nation” on the department’s website. McCray-Penson, a former Tennessee star and Women’s Basketball Hall of Famer, said it’s been a dream to coach in the Southeastern Conference and she’s “grateful and blessed for this incredible honor and opportunity.”





Charli Turner Thorne drops by 'Pac-12 Playlist' to surprise former player Dr. Michelle Tom

Pac-12 Networks' Ashley Adamson speaks with former Arizona State women's basketball player Michelle Tom, who is now a doctor treating COVID-19 patients in Winslow, Arizona.





Dr. Michelle Tom shares journey from ASU women's hoops to treating COVID-19 patients

Pac-12 Networks' Ashley Adamson speaks with former Arizona State women's basketball player Michelle Tom, who is now a doctor treating COVID-19 patients at Winslow Indian Health Care Center and Little Colorado Medical Center in Eastern Arizona.





Chicago State women's basketball coach Misty Opat resigns

CHICAGO (AP) -- Chicago State women’s coach Misty Opat resigned Thursday after two seasons and a 3-55 record.





NCAA women's hoops committee moves away from RPI to NET

The women's basketball committee will start using the NCAA Evaluation Tool instead of RPI to help evaluate teams for the tournament starting with the upcoming season. “It’s an exciting time for the game as we look to the future,” said Nina King, senior deputy athletics director and chief of staff at Duke, who will chair the Division I Women’s Basketball Committee next season. “We felt after much analysis that the women’s basketball NET, which will be determined by who you played, where you played, how efficiently you played and the result of the game, is a more accurate tool and should be used by the committee going forward.”





The limiting behavior of isotonic and convex regression estimators when the model is misspecified

Eunji Lim.

Source: Electronic Journal of Statistics, Volume 14, Number 1, 2053--2097.

Abstract:
We study the asymptotic behavior of the least squares estimators when the model is possibly misspecified. We consider the setting where we wish to estimate an unknown function $f_{*}:(0,1)^{d}\rightarrow \mathbb{R}$ from observations $(X,Y),(X_{1},Y_{1}),\cdots,(X_{n},Y_{n})$; our estimator $\hat{g}_{n}$ is the minimizer of $\sum_{i=1}^{n}(Y_{i}-g(X_{i}))^{2}/n$ over $g\in \mathcal{G}$ for some set of functions $\mathcal{G}$. We provide sufficient conditions on the metric entropy of $\mathcal{G}$, under which $\hat{g}_{n}$ converges to $g_{*}$ as $n\rightarrow \infty$, where $g_{*}$ is the minimizer of $\|g-f_{*}\|\triangleq \mathbb{E}(g(X)-f_{*}(X))^{2}$ over $g\in \mathcal{G}$. As corollaries of our theorem, we establish $\|\hat{g}_{n}-g_{*}\|\rightarrow 0$ as $n\rightarrow \infty$ when $\mathcal{G}$ is the set of monotone functions or the set of convex functions. We also make a connection between the convergence rate of $\|\hat{g}_{n}-g_{*}\|$ and the metric entropy of $\mathcal{G}$. As special cases of our finding, we compute the convergence rate of $\|\hat{g}_{n}-g_{*}\|^{2}$ when $\mathcal{G}$ is the set of bounded monotone functions or the set of bounded convex functions.
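For readers who want to see the object being analysed, the sketch below (assuming NumPy and scikit-learn, which the paper does not mention) computes the least-squares projection of noisy data onto the set of monotone functions, i.e. the estimator $\hat{g}_{n}$ when $\mathcal{G}$ is the monotone class, under a deliberately non-monotone $f_{*}$; it is an illustration of the setting, not the authors' code.

# Illustration only: least-squares projection onto monotone functions,
# computed with scikit-learn's IsotonicRegression (pool-adjacent-violators).
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(0.0, 1.0, n)
f_star = np.sin(3 * x)                     # deliberately non-monotone "true" function
y = f_star + rng.normal(0.0, 0.2, n)

iso = IsotonicRegression()                 # minimizes sum_i (y_i - g(x_i))^2 over monotone g
g_hat = iso.fit(x, y).predict(np.sort(x))  # fitted values of the misspecified estimator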





Statistical convergence of the EM algorithm on Gaussian mixture models

Ruofei Zhao, Yuanzhi Li, Yuekai Sun.

Source: Electronic Journal of Statistics, Volume 14, Number 1, 632--660.

Abstract:
We study the convergence behavior of the Expectation Maximization (EM) algorithm on Gaussian mixture models with an arbitrary number of mixture components and mixing weights. We show that as long as the means of the components are separated by at least $\Omega(\sqrt{\min\{M,d\}})$, where $M$ is the number of components and $d$ is the dimension, the EM algorithm converges locally to the global optimum of the log-likelihood. Further, we show that the convergence rate is linear and characterize the size of the basin of attraction to the global optimum.
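As a concrete illustration of the iteration being analysed (not the paper's experiments), the following sketch runs the EM updates for a well-separated two-component, one-dimensional Gaussian mixture with unit variances and equal weights, using only NumPy; the mean estimates converge when initialized inside the basin of attraction.

# Minimal EM sketch for a two-component 1-D Gaussian mixture with unit
# variances and equal weights; only the means are updated.
import numpy as np

rng = np.random.default_rng(1)
true_means = np.array([-2.0, 2.0])          # separation assumed "large enough"
z = rng.integers(0, 2, size=2000)
x = rng.normal(true_means[z], 1.0)

mu = np.array([-0.5, 0.5])                  # initialization near the optimum
for _ in range(100):
    # E-step: responsibilities of each component for each point
    log_w = -0.5 * (x[:, None] - mu[None, :]) ** 2
    resp = np.exp(log_w - log_w.max(axis=1, keepdims=True))
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: responsibility-weighted means
    mu = (resp * x[:, None]).sum(axis=0) / resp.sum(axis=0)
print(mu)                                   # approaches the true means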





Adaptive estimation in the supremum norm for semiparametric mixtures of regressions

Heiko Werner, Hajo Holzmann, Pierre Vandekerkhove.

Source: Electronic Journal of Statistics, Volume 14, Number 1, 1816--1871.

Abstract:
We investigate a flexible two-component semiparametric mixture of regressions model, in which one of the conditional component distributions of the response given the covariate is unknown but assumed symmetric about a location parameter, while the other is specified up to a scale parameter. The location and scale parameters together with the proportion are allowed to depend nonparametrically on covariates. After settling identifiability, we provide local M-estimators for these parameters which converge in the sup-norm at the optimal rates over Hölder-smoothness classes. We also introduce an adaptive version of the estimators based on the Lepski method. Sup-norm bounds show that the local M-estimator properly estimates the functions globally, and are the first step in the construction of useful inferential tools such as confidence bands. In our analysis we develop general results about rates of convergence in the sup-norm as well as adaptive estimation of local M-estimators which might be of some independent interest, and which can also be applied in various other settings. We investigate the finite-sample behaviour of our method in a simulation study, and give an illustration on a real data set from bioinformatics.





Non-parametric adaptive estimation of order 1 Sobol indices in stochastic models, with an application to Epidemiology

Gwenaëlle Castellan, Anthony Cousien, Viet Chi Tran.

Source: Electronic Journal of Statistics, Volume 14, Number 1, 50--81.

Abstract:
Global sensitivity analysis is a set of methods aiming at quantifying the contribution of an uncertain input parameter of the model (or combination of parameters) to the variability of the response. We consider here the estimation of the Sobol indices of order 1, which are commonly used indicators based on a decomposition of the output’s variance. In a deterministic framework, when the same inputs always give the same outputs, these indices are usually estimated by replicated simulations of the model. In a stochastic framework, when the response given a set of input parameters is not unique due to randomness in the model, metamodels are often used to approximate the mean and dispersion of the response by deterministic functions. We propose a new non-parametric estimator that does not require defining a metamodel to estimate the Sobol indices of order 1. The estimator is based on warped wavelets and is adaptive in the regularity of the model. The convergence of the mean square error to zero, when the number of simulations of the model tends to infinity, is computed and an elbow effect is shown, depending on the regularity of the model. Applications in Epidemiology are carried out to illustrate the use of non-parametric estimators.
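For orientation, the sketch below shows a standard pick-freeze Monte Carlo estimator of first-order Sobol indices for a deterministic model; the toy model, sample size and NumPy implementation are illustrative assumptions, and the paper's warped-wavelet estimator for stochastic models is not reproduced here.

# Pick-freeze Monte Carlo estimator of first-order Sobol indices,
# S_i = Var(E[Y | X_i]) / Var(Y), for a deterministic model on [0,1]^d.
import numpy as np

def sobol_first_order(model, d, n, rng):
    A = rng.uniform(size=(n, d))
    B = rng.uniform(size=(n, d))
    yA = model(A)
    indices = []
    for i in range(d):
        C = B.copy()
        C[:, i] = A[:, i]                   # "freeze" coordinate i at A's values
        yC = model(C)
        # Cov(Y_A, Y_C) / Var(Y) estimates Var(E[Y | X_i]) / Var(Y)
        indices.append(np.cov(yA, yC)[0, 1] / np.var(yA, ddof=1))
    return np.array(indices)

rng = np.random.default_rng(0)
toy_model = lambda X: np.sin(X[:, 0]) + 7 * np.sin(X[:, 1]) ** 2   # illustrative model
print(sobol_first_order(toy_model, d=2, n=100_000, rng=rng))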





Beta-Binomial stick-breaking non-parametric prior

María F. Gil–Leyva, Ramsés H. Mena, Theodoros Nicoleris.

Source: Electronic Journal of Statistics, Volume 14, Number 1, 1479--1507.

Abstract:
A new class of nonparametric prior distributions, termed the Beta-Binomial stick-breaking process, is proposed. By allowing the underlying length random variables to be dependent through a Markov chain with Beta marginals, an appealing discrete random probability measure arises. The chain’s dependence parameter controls the ordering of the stick-breaking weights, and thus tunes the model’s label-switching ability. Also, by tuning this parameter, the resulting class contains the Dirichlet process and the Geometric process priors as particular cases, which is of interest for MCMC implementations. Some properties of the model are discussed and a density estimation algorithm is proposed and tested with simulated datasets.
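The sketch below generates stick-breaking weights with independent Beta(1, theta) lengths, i.e. the Dirichlet process special case mentioned in the abstract; the parameter values are illustrative, and the paper's construction instead makes the length variables dependent through a Markov chain with Beta marginals.

# Stick-breaking weights w_j = v_j * prod_{i<j} (1 - v_i) with independent
# Beta(1, theta) lengths (the Dirichlet process special case).
import numpy as np

def stick_breaking_weights(theta, n_sticks, rng):
    v = rng.beta(1.0, theta, size=n_sticks)            # independent stick lengths
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    return v * remaining

rng = np.random.default_rng(0)
w = stick_breaking_weights(theta=2.0, n_sticks=50, rng=rng)
print(w.sum())                                         # close to 1 after truncation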





Estimation of a semiparametric transformation model: A novel approach based on least squares minimization

Benjamin Colling, Ingrid Van Keilegom.

Source: Electronic Journal of Statistics, Volume 14, Number 1, 769--800.

Abstract:
Consider the following semiparametric transformation model $\Lambda_{\theta}(Y)=m(X)+\varepsilon$, where $X$ is a $d$-dimensional covariate, $Y$ is a univariate response variable and $\varepsilon$ is an error term with zero mean and independent of $X$. We assume that $m$ is an unknown regression function and that $\{\Lambda_{\theta}:\theta\in\Theta\}$ is a parametric family of strictly increasing functions. Our goal is to develop two new estimators of the transformation parameter $\theta$. The main idea of these two estimators is to minimize, with respect to $\theta$, the $L_{2}$-distance between the transformation $\Lambda_{\theta}$ and one of its fully nonparametric estimators. We consider in particular the nonparametric estimator based on the least-absolute deviation loss constructed in Colling and Van Keilegom (2019). We establish the consistency and the asymptotic normality of the two proposed estimators of $\theta$. We also carry out a simulation study to illustrate and compare the performance of our new parametric estimators to that of the profile likelihood estimator constructed in Linton et al. (2008).





A Low Complexity Algorithm with O(√T) Regret and O(1) Constraint Violations for Online Convex Optimization with Long Term Constraints

This paper considers online convex optimization over a complicated constraint set, which typically consists of multiple functional constraints and a set constraint. The conventional online projection algorithm (Zinkevich, 2003) can be difficult to implement due to the potentially high computation complexity of the projection operation. In this paper, we relax the functional constraints by allowing them to be violated at each round but still requiring them to be satisfied in the long term. This type of relaxed online convex optimization (with long term constraints) was first considered in Mahdavi et al. (2012). That prior work proposes an algorithm to achieve $O(\sqrt{T})$ regret and $O(T^{3/4})$ constraint violations for general problems and another algorithm to achieve an $O(T^{2/3})$ bound for both regret and constraint violations when the constraint set can be described by a finite number of linear constraints. A recent extension in Jenatton et al. (2016) can achieve $O(T^{\max\{\theta,1-\theta\}})$ regret and $O(T^{1-\theta/2})$ constraint violations where $\theta\in (0,1)$. The current paper proposes a new simple algorithm that yields improved performance in comparison to prior works. The new algorithm achieves an $O(\sqrt{T})$ regret bound with $O(1)$ constraint violations.
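To make the relaxed setting concrete, here is a generic virtual-queue sketch (not the paper's algorithm; step size, horizon, box constraint and the example losses are illustrative assumptions): each round takes a projected gradient step on the loss plus a queue-weighted constraint penalty, and the queue accumulates any violation of the functional constraint, which is the mechanism such methods use to keep long-term violation controlled.

# Generic online scheme for a single long-term constraint g(x) <= 0 over a box set.
import numpy as np

def online_long_term_constraints(loss_grads, g, g_grad, x0, T, eta=0.05, box=1.0):
    x, Q = np.array(x0, dtype=float), 0.0
    xs = []
    for t in range(T):
        grad = loss_grads[t](x) + Q * g_grad(x)   # penalize violation via the queue
        x = np.clip(x - eta * grad, -box, box)    # projection onto the (simple) box set
        Q = max(Q + g(x), 0.0)                    # virtual-queue update
        xs.append(x.copy())
    return np.array(xs)

# Example: time-varying quadratic losses with the constraint x1 + x2 <= 0.5.
rng = np.random.default_rng(0)
targets = rng.normal(size=(200, 2))
loss_grads = [lambda x, c=c: 2 * (x - c) for c in targets]
g = lambda x: x.sum() - 0.5
g_grad = lambda x: np.ones_like(x)
traj = online_long_term_constraints(loss_grads, g, g_grad, x0=[0.0, 0.0], T=200)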





Lower Bounds for Parallel and Randomized Convex Optimization

We study the question of whether parallelization in the exploration of the feasible set can be used to speed up convex optimization, in the local oracle model of computation and in the high-dimensional regime. We show that the answer is negative for both deterministic and randomized algorithms applied to essentially any of the interesting geometries and nonsmooth, weakly-smooth, or smooth objective functions. In particular, we show that it is not possible to obtain a polylogarithmic (in the sequential complexity of the problem) number of parallel rounds with a polynomial (in the dimension) number of queries per round. In the majority of these settings and when the dimension of the space is polynomial in the inverse target accuracy, our lower bounds match the oracle complexity of sequential convex optimization, up to at most a logarithmic factor in the dimension, which makes them (nearly) tight. Another conceptual contribution of our work is in providing a general and streamlined framework for proving lower bounds in the setting of parallel convex optimization. Prior to our work, lower bounds for parallel convex optimization algorithms were only known in a small fraction of the settings considered in this paper, mainly applying to Euclidean ($\ell_2$) and $\ell_\infty$ spaces.





DESlib: A Dynamic ensemble selection library in Python

DESlib is an open-source Python library providing the implementation of several dynamic selection techniques. The library is divided into three modules: (i) dcs, containing the implementation of dynamic classifier selection methods (DCS); (ii) des, containing the implementation of dynamic ensemble selection methods (DES); (iii) static, with the implementation of static ensemble techniques. The library is fully documented (documentation available online on Read the Docs), has a high test coverage (codecov.io) and is part of the scikit-learn-contrib supported projects. Documentation, code and examples can be found on its GitHub page: https://github.com/scikit-learn-contrib/DESlib.
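A usage sketch following the scikit-learn-style interface the library advertises (the dataset, pool size and choice of the KNORAE method are illustrative; consult the GitHub page above for the authoritative API and examples):

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from deslib.des.knora_e import KNORAE        # one DES method from the `des` module

X, y = make_classification(n_samples=2000, random_state=0)
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.5, random_state=0)
X_dsel, X_test, y_dsel, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

pool = BaggingClassifier(n_estimators=10, random_state=0).fit(X_train, y_train)
des = KNORAE(pool_classifiers=pool).fit(X_dsel, y_dsel)   # dynamic ensemble selection
print(des.score(X_test, y_test))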





Weighted Message Passing and Minimum Energy Flow for Heterogeneous Stochastic Block Models with Side Information

We study the misclassification error for community detection in general heterogeneous stochastic block models (SBM) with noisy or partial label information. We establish a connection between the misclassification rate and the notion of minimum energy on the local neighborhood of the SBM. We develop an optimally weighted message passing algorithm to reconstruct labels for SBM based on the minimum energy flow and the eigenvectors of a certain Markov transition matrix. The general SBM considered in this paper allows for unequal-size communities, degree heterogeneity, and different connection probabilities among blocks. We focus on how to optimally weigh the message passing to improve misclassification.





Derivative-Free Methods for Policy Optimization: Guarantees for Linear Quadratic Systems

We study derivative-free methods for policy optimization over the class of linear policies. We focus on characterizing the convergence rate of these methods when applied to linear-quadratic systems, and study various settings of driving noise and reward feedback. Our main theoretical result provides an explicit bound on the sample or evaluation complexity: we show that these methods are guaranteed to converge to within any pre-specified tolerance of the optimal policy with a number of zero-order evaluations that is an explicit polynomial of the error tolerance, dimension, and curvature properties of the problem. Our analysis reveals some interesting differences between the settings of additive driving noise and random initialization, as well as the settings of one-point and two-point reward feedback. Our theory is corroborated by simulations of derivative-free methods in application to these systems. Along the way, we derive convergence rates for stochastic zero-order optimization algorithms when applied to a certain class of non-convex problems.
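The following toy sketch illustrates the class of methods studied, namely two-point zero-order gradient estimates applied to a linear policy on a small LQR instance; the dynamics, step size, smoothing radius and rollout settings are assumptions for illustration, not the paper's experimental setup.

# Two-point zero-order policy search on a toy 2-dimensional LQR problem.
import numpy as np

rng = np.random.default_rng(0)
A, B = np.array([[1.0, 0.1], [0.0, 1.0]]), np.array([[0.0], [0.1]])
Q, R = np.eye(2), 0.1 * np.eye(1)

def cost(K, horizon=50, n_rollouts=5):
    total = 0.0
    for _ in range(n_rollouts):
        x = rng.normal(size=(2, 1))                 # random initialization setting
        for _ in range(horizon):
            u = -K @ x
            total += float(x.T @ Q @ x + u.T @ R @ u)
            x = A @ x + B @ u
    return total / n_rollouts

K = np.zeros((1, 2))                                # linear policy u = -K x
lr, r = 1e-4, 0.05
for _ in range(300):
    U = rng.normal(size=K.shape)                    # random perturbation direction
    g = (cost(K + r * U) - cost(K - r * U)) / (2 * r) * U   # two-point estimate (up to scaling)
    K -= lr * g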





On the consistency of graph-based Bayesian semi-supervised learning and the scalability of sampling algorithms

This paper considers a Bayesian approach to graph-based semi-supervised learning. We show that if the graph parameters are suitably scaled, the graph-posteriors converge to a continuum limit as the size of the unlabeled data set grows. This consistency result has profound algorithmic implications: we prove that when consistency holds, carefully designed Markov chain Monte Carlo algorithms have a uniform spectral gap, independent of the number of unlabeled inputs. Numerical experiments illustrate and complement the theory.





On the Complexity Analysis of the Primal Solutions for the Accelerated Randomized Dual Coordinate Ascent

Dual first-order methods are essential techniques for large-scale constrained convex optimization. However, when recovering the primal solutions, we need $T(\epsilon^{-2})$ iterations to achieve an $\epsilon$-optimal primal solution when we apply an algorithm to the non-strongly convex dual problem with $T(\epsilon^{-1})$ iterations to achieve an $\epsilon$-optimal dual solution, where $T(x)$ can be $x$ or $\sqrt{x}$. In this paper, we prove that the iteration complexities of the primal solutions and dual solutions have the same $O\left(\frac{1}{\sqrt{\epsilon}}\right)$ order of magnitude for the accelerated randomized dual coordinate ascent. When the dual function further satisfies the quadratic functional growth condition, by restarting the algorithm at any period, we establish the linear iteration complexity for both the primal solutions and dual solutions even if the condition number is unknown. When applied to the regularized empirical risk minimization problem, we prove the iteration complexity of $O\left(n\log n+\sqrt{\frac{n}{\epsilon}}\right)$ in both primal space and dual space, where $n$ is the number of samples. Our result takes out the $\left(\log \frac{1}{\epsilon}\right)$ factor compared with the methods based on smoothing/regularization or Catalyst reduction. As far as we know, this is the first time that the optimal $O\left(\sqrt{\frac{n}{\epsilon}}\right)$ iteration complexity in the primal space is established for the dual coordinate ascent based stochastic algorithms. We also establish the accelerated linear complexity for some problems with nonsmooth loss, e.g., the least absolute deviation and SVM.





Dynamical Systems as Temporal Feature Spaces

Parametrised state space models in the form of recurrent networks are often used in machine learning to learn from data streams exhibiting temporal dependencies. To break the black-box nature of such models it is important to understand the dynamical features of the input-driving time series that are formed in the state space. We propose a framework for rigorous analysis of such state representations in vanishing memory state space models such as echo state networks (ESN). In particular, we consider the state space a temporal feature space and the readout mapping from the state space a kernel machine operating in that feature space. We show that: (1) The usual ESN strategy of randomly generating input-to-state as well as state coupling leads to shallow memory time series representations, corresponding to a cross-correlation operator with fast exponentially decaying coefficients; (2) Imposing symmetry on the dynamic coupling yields a constrained dynamic kernel matching the input time series with straightforward exponentially decaying motifs or exponentially decaying motifs of the highest frequency; (3) A simple ring (cycle) high-dimensional reservoir topology specified only through two free parameters can implement deep memory dynamic kernels with a rich variety of matching motifs. We quantify the richness of feature representations imposed by dynamic kernels and demonstrate that for the dynamic kernel associated with the cycle reservoir topology, the kernel richness undergoes a phase transition close to the edge of stability.
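For concreteness, a minimal echo state network of the kind analysed can be set up as follows (reservoir size, spectral radius, ridge penalty and the one-step-ahead prediction task are illustrative assumptions, not the paper's setup): random input and state coupling, a tanh state update, and a ridge-regression readout acting on the collected states.

# Minimal echo state network with a linear readout trained by ridge regression.
import numpy as np

rng = np.random.default_rng(0)
n_res, washout = 200, 100
u = np.sin(np.arange(2000) * 0.2) + 0.1 * rng.normal(size=2000)   # input series
target = np.roll(u, -1)                                           # one-step-ahead task

W_in = 0.5 * rng.uniform(-1, 1, size=n_res)          # input-to-state weights
W = rng.uniform(-1, 1, size=(n_res, n_res))          # state coupling
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))      # scale spectral radius below 1

x = np.zeros(n_res)
states = []
for u_t in u:
    x = np.tanh(W @ x + W_in * u_t)                  # state update
    states.append(x.copy())
S = np.array(states)[washout:-1]
y = target[washout:-1]
ridge = 1e-6
W_out = np.linalg.solve(S.T @ S + ridge * np.eye(n_res), S.T @ y)  # readout weights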





Exact Guarantees on the Absence of Spurious Local Minima for Non-negative Rank-1 Robust Principal Component Analysis

This work is concerned with the non-negative rank-1 robust principal component analysis (RPCA), where the goal is to recover the dominant non-negative principal components of a data matrix precisely, when a number of measurements could be grossly corrupted with sparse and arbitrarily large noise. Most of the known techniques for solving the RPCA rely on convex relaxation methods by lifting the problem to a higher dimension, which significantly increases the number of variables. As an alternative, the well-known Burer-Monteiro approach can be used to cast the RPCA as a non-convex and non-smooth $\ell_1$ optimization problem with a significantly smaller number of variables. In this work, we show that the low-dimensional formulation of the symmetric and asymmetric positive rank-1 RPCA based on the Burer-Monteiro approach has a benign landscape, i.e., 1) it does not have any spurious local solution, 2) it has a unique global solution, and 3) its unique global solution coincides with the true components. An implication of this result is that simple local search algorithms are guaranteed to achieve a zero global optimality gap when directly applied to the low-dimensional formulation. Furthermore, we provide strong deterministic and probabilistic guarantees for the exact recovery of the true principal components. In particular, it is shown that a constant fraction of the measurements could be grossly corrupted and yet they would not create any spurious local solution.
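As an illustration of the low-dimensional formulation (not the paper's analysis or guarantees), the sketch below runs projected subgradient descent on $f(u)=\|M-uu^{T}\|_{1}$ with $u\ge 0$ for a synthetic symmetric instance with sparse gross corruptions; the problem size, corruption level and step sizes are assumptions.

# Projected subgradient descent on the non-negative rank-1 l1 objective.
import numpy as np

rng = np.random.default_rng(0)
n = 50
u_true = rng.uniform(0.5, 1.5, size=n)
M = np.outer(u_true, u_true)
corrupt = rng.random((n, n)) < 0.05                   # sparse, arbitrarily large noise
M = M + corrupt * rng.normal(0.0, 10.0, size=(n, n))

u = rng.uniform(0.5, 1.5, size=n)                     # random non-negative start
for t in range(1, 2001):
    R = np.sign(np.outer(u, u) - M)
    grad = (R + R.T) @ u                              # subgradient of ||uu^T - M||_1
    u = np.maximum(u - 0.5 / (t * n) * grad, 0.0)     # projected (non-negative) step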





Unique Sharp Local Minimum in L1-minimization Complete Dictionary Learning

We study the problem of globally recovering a dictionary from a set of signals via $\ell_1$-minimization. We assume that the signals are generated as i.i.d. random linear combinations of the $K$ atoms from a complete reference dictionary $D^*\in \mathbb{R}^{K\times K}$, where the linear combination coefficients are from either a Bernoulli-type model or an exact sparse model. First, we obtain a necessary and sufficient norm condition for the reference dictionary $D^*$ to be a sharp local minimum of the expected $\ell_1$ objective function. Our result substantially extends that of Wu and Yu (2015) and allows the combination coefficients to be non-negative. Secondly, we obtain an explicit bound on the region within which the objective value of the reference dictionary is minimal. Thirdly, we show that the reference dictionary is the unique sharp local minimum, thus establishing the first known global property of $\ell_1$-minimization dictionary learning. Motivated by the theoretical results, we introduce a perturbation-based test to determine whether a dictionary is a sharp local minimum of the objective function. In addition, we also propose a new dictionary learning algorithm based on Block Coordinate Descent, called DL-BCD, which is guaranteed to decrease the objective function monotonically. Simulation studies show that DL-BCD has competitive performance in terms of recovery rate compared to other state-of-the-art dictionary learning algorithms when the reference dictionary is generated from random Gaussian matrices.





On Stationary-Point Hitting Time and Ergodicity of Stochastic Gradient Langevin Dynamics

Stochastic gradient Langevin dynamics (SGLD) is a fundamental algorithm in stochastic optimization. Recent work by Zhang et al. (2017) presents an analysis of the hitting time of SGLD for first- and second-order stationary points. The proof in Zhang et al. (2017) is a two-stage procedure through bounding the Cheeger constant, which is rather complicated and leads to loose bounds. In this paper, using intuitions from stochastic differential equations, we provide a direct analysis of the hitting times of SGLD to first- and second-order stationary points. Our analysis is straightforward: it relies only on basic linear algebra and probability theory tools. Our direct analysis also leads to tighter bounds compared to Zhang et al. (2017) and shows the explicit dependence of the hitting time on different factors, including dimensionality, smoothness, noise strength, and step size effects. Under suitable conditions, we show that the hitting time of SGLD to first-order stationary points can be dimension-independent. Moreover, we apply our analysis to study several important online estimation problems in machine learning, including linear regression, matrix factorization, and online PCA.
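For reference, the SGLD iteration discussed above is a stochastic gradient step plus injected Gaussian noise with variance $2\eta/\beta$; the sketch below applies it to a toy linear regression (step size, inverse temperature and batch size are illustrative assumptions, not values from the paper).

# SGLD on a least-squares objective with minibatch gradients.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
theta_true = np.arange(1.0, 6.0)
y = X @ theta_true + 0.1 * rng.normal(size=1000)

def stoch_grad(theta, batch=32):
    idx = rng.integers(0, len(y), batch)
    Xb, yb = X[idx], y[idx]
    return 2.0 * Xb.T @ (Xb @ theta - yb) / batch     # minibatch least-squares gradient

theta, eta, beta = np.zeros(5), 1e-3, 1e4             # step size and inverse temperature
for _ in range(20_000):
    theta = theta - eta * stoch_grad(theta) + np.sqrt(2 * eta / beta) * rng.normal(size=5)
print(theta)                                          # hovers near theta_true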





Representation Learning for Dynamic Graphs: A Survey

Graphs arise naturally in many real-world applications including social networks, recommender systems, ontologies, biology, and computational finance. Traditionally, machine learning models for graphs have been mostly designed for static graphs. However, many applications involve evolving graphs. This introduces important challenges for learning and inference since nodes, attributes, and edges change over time. In this survey, we review the recent advances in representation learning for dynamic graphs, including dynamic knowledge graphs. We describe existing models from an encoder-decoder perspective, categorize these encoders and decoders based on the techniques they employ, and analyze the approaches in each category. We also review several prominent applications and widely used datasets and highlight directions for future research.





Oriented first passage percolation in the mean field limit

Nicola Kistler, Adrien Schertzer, Marius A. Schmidt.

Source: Brazilian Journal of Probability and Statistics, Volume 34, Number 2, 414--425.

Abstract:
The Poisson clumping heuristic has led Aldous to conjecture the value of the oriented first passage percolation on the hypercube in the limit of large dimensions. Aldous’ conjecture has been rigorously confirmed by Fill and Pemantle (Ann. Appl. Probab. 3 (1993) 593–629) by means of a variance reduction trick. We present here a streamlined and, we believe, more natural proof based on ideas that emerged in the study of Derrida’s random energy models.





A Bayesian sparse finite mixture model for clustering data from a heterogeneous population

Erlandson F. Saraiva, Adriano K. Suzuki, Luís A. Milan.

Source: Brazilian Journal of Probability and Statistics, Volume 34, Number 2, 323--344.

Abstract:
In this paper, we introduce a Bayesian approach for clustering data using a sparse finite mixture model (SFMM). The SFMM is a finite mixture model with a large, previously fixed number of components $k$, many of which may be empty. In this model, the number of components $k$ can be interpreted as the maximum number of distinct mixture components. Then, we explore the use of a prior distribution for the weights of the mixture model that takes into account the possibility that the number of clusters $k_{\mathbf{c}}$ (i.e., nonempty components) can be random and smaller than the number of components $k$ of the finite mixture model. In order to determine clusters, we develop an MCMC algorithm denominated the split-merge allocation sampler. In this algorithm, the split-merge strategy is data-driven and was inserted within the algorithm in order to increase the mixing of the Markov chain in relation to the number of clusters. The performance of the method is verified using simulated datasets and three real datasets. The first real data set is the benchmark galaxy data, while the second and third are the publicly available Enzyme and Acidity data sets, respectively.





Symmetrical and asymmetrical mixture autoregressive processes

Mohsen Maleki, Arezo Hajrajabi, Reinaldo B. Arellano-Valle.

Source: Brazilian Journal of Probability and Statistics, Volume 34, Number 2, 273--290.

Abstract:
In this paper, we study the finite mixtures of autoregressive processes assuming that the distribution of innovations (errors) belongs to the class of scale mixtures of skew-normal (SMSN) distributions. The SMSN distributions allow simultaneous modeling of the existence of outliers, heavy tails and asymmetries in the distribution of innovations. Therefore, a statistical methodology based on the SMSN family allows us to apply robust modeling to some non-linear time series with great flexibility, accommodating skewness, heavy tails and heterogeneity simultaneously. The existence of convenient hierarchical representations of the SMSN distributions also facilitates the implementation of an ECME-type algorithm to perform the likelihood inference in the considered model. Simulation studies and an application to a real data set are finally presented to illustrate the usefulness of the proposed model.





Random environment binomial thinning integer-valued autoregressive process with Poisson or geometric marginal

Zhengwei Liu, Qi Li, Fukang Zhu.

Source: Brazilian Journal of Probability and Statistics, Volume 34, Number 2, 251--272.

Abstract:
To predict time series of counts with small values and remarkable fluctuations, an available model is the $r$ states random environment process based on the negative binomial thinning operator and the geometric marginal. However, we argue that the aforementioned model may suffer from the following two drawbacks. First, under the condition of no prior information, the overdispersed property of the geometric distribution may cause the predictions to fluctuate greatly. Second, because of the constraints on the model parameters, some estimated parameters are close to zero in real-data examples, which may not objectively reveal the correlation relationship. For the first drawback, an $r$ states random environment process based on the binomial thinning operator and the Poisson marginal is introduced. For the second drawback, we propose a generalized $r$ states random environment integer-valued autoregressive model based on the binomial thinning operator to model fluctuations of data. Yule–Walker and conditional maximum likelihood estimates are considered and their performances are assessed via simulation studies. Two real-data sets are analysed to illustrate the better performances of the proposed models compared with some existing models.
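For readers unfamiliar with binomial thinning, the sketch below simulates a single-regime INAR(1) process with Poisson innovations, $X_{t}=\alpha\circ X_{t-1}+\varepsilon_{t}$; the $r$ states random environment models of the paper additionally switch the thinning and marginal parameters according to an exogenous environment process, and the parameter values here are purely illustrative.

# INAR(1) with binomial thinning and Poisson innovations.
import numpy as np

def simulate_inar1(alpha, lam, T, rng):
    x = np.zeros(T, dtype=int)
    x[0] = rng.poisson(lam / (1 - alpha))             # start near the stationary mean
    for t in range(1, T):
        survivors = rng.binomial(x[t - 1], alpha)     # binomial thinning of X_{t-1}
        x[t] = survivors + rng.poisson(lam)           # Poisson innovation
    return x

rng = np.random.default_rng(0)
series = simulate_inar1(alpha=0.4, lam=1.5, T=500, rng=rng)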





Nonparametric discrimination of areal functional data

Ahmad Younso.

Source: Brazilian Journal of Probability and Statistics, Volume 34, Number 1, 112--126.

Abstract:
We consider a new nonparametric rule of classification, inspired from the classical moving window rule, that allows for the classification of spatially dependent functional data containing some completely missing curves. We investigate the consistency of this classifier under mild conditions. The practical use of the classifier will be illustrated through simulation studies.





Effects of gene–environment and gene–gene interactions in case-control studies: A novel Bayesian semiparametric approach

Durba Bhattacharya, Sourabh Bhattacharya.

Source: Brazilian Journal of Probability and Statistics, Volume 34, Number 1, 71--89.

Abstract:
Present-day bio-medical research is pointing towards the fact that cognizance of gene–environment interactions along with genetic interactions may help prevent or delay the onset of many complex diseases like cardiovascular disease, cancer, type 2 diabetes, autism or asthma by adjustments to lifestyle. In this regard, we propose a Bayesian semiparametric model to detect not only the roles of genes and their interactions, but also the possible influence of environmental variables on the genes in case-control studies. Our model also accounts for the unknown number of genetic sub-populations via finite mixtures composed of Dirichlet processes. An effective parallel computing methodology developed by us harnesses the power of parallel processing technology to increase the efficiency of our conditionally independent Gibbs sampling and Transformation based MCMC (TMCMC) methods. Applications of our model and methods to simulation studies with biologically realistic genotype datasets and a real, case-control based genotype dataset on early onset of myocardial infarction (MI) have yielded quite interesting results besides providing some insights into the differential effect of gender on MI.





Robust Bayesian model selection for heavy-tailed linear regression using finite mixtures

Flávio B. Gonçalves, Marcos O. Prates, Victor Hugo Lachos.

Source: Brazilian Journal of Probability and Statistics, Volume 34, Number 1, 51--70.

Abstract:
In this paper, we present a novel methodology to perform Bayesian model selection in linear models with heavy-tailed distributions. We consider a finite mixture of distributions to model a latent variable where each component of the mixture corresponds to one possible model within the symmetrical class of normal independent distributions. Naturally, the Gaussian model is one of the possibilities. This allows for a simultaneous analysis based on the posterior probability of each model. Inference is performed via Markov chain Monte Carlo—a Gibbs sampler with Metropolis–Hastings steps for a class of parameters. Simulated examples highlight the advantages of this approach compared to a segregated analysis based on arbitrarily chosen model selection criteria. Examples with real data are presented and an extension to censored linear regression is introduced and discussed.





Bayesian modelling of the abilities in dichotomous IRT models via regression with missing values in the covariates

Flávio B. Gonçalves, Bárbara C. C. Dias.

Source: Brazilian Journal of Probability and Statistics, Volume 33, Number 4, 782--800.

Abstract:
Educational assessment usually considers a contextual questionnaire to extract relevant information from the applicants. This may include items related to socio-economical profile as well as items to extract other characteristics potentially related to the applicant’s performance in the test. A careful analysis of the questionnaires jointly with the test’s results may evidence important relations between profiles and test performance. The most coherent way to perform this task in a statistical context is to use the information from the questionnaire to help explain the variability of the abilities in a joint model-based approach. Nevertheless, the responses to the questionnaire typically present missing values which, in some cases, may be missing not at random. This paper proposes a statistical methodology to model the abilities in dichotomous IRT models using the information of the contextual questionnaires via linear regression. The proposed methodology models the missing data jointly with all the observed data, which allows for the estimation of the former. The missing data modelling is flexible enough to allow the specification of missing not at random structures. Furthermore, even if those structures are not assumed a priori, they can be estimated from the posterior results when assuming missing (completely) at random structures a priori. Statistical inference is performed under the Bayesian paradigm via an efficient MCMC algorithm. Simulated and real examples are presented to investigate the efficiency and applicability of the proposed methodology.





The limiting distribution of the Gibbs sampler for the intrinsic conditional autoregressive model

Marco A. R. Ferreira.

Source: Brazilian Journal of Probability and Statistics, Volume 33, Number 4, 734--744.

Abstract:
We study the limiting behavior of the one-at-a-time Gibbs sampler for the intrinsic conditional autoregressive model with centering on the fly. The intrinsic conditional autoregressive model is widely used as a prior for random effects in hierarchical models for spatial modeling. This model is defined by full conditional distributions that imply an improper joint “density” with a multivariate Gaussian kernel and a singular precision matrix. To guarantee propriety of the posterior distribution, usually at the end of each iteration of the Gibbs sampler the random effects are centered to sum to zero in what is widely known as centering on the fly. While this works well in practice, this informal computational way to recenter the random effects obscures their implied prior distribution and prevents the development of formal Bayesian procedures. Here we show that the implied prior distribution, that is, the limiting distribution of the one-at-a-time Gibbs sampler for the intrinsic conditional autoregressive model with centering on the fly is a singular Gaussian distribution with a covariance matrix that is the Moore–Penrose inverse of the precision matrix. This result has important implications for the development of formal Bayesian procedures such as reference priors and Bayes-factor-based model selection for spatial models.
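The result can be checked numerically on a small graph: build the ICAR precision matrix $Q=D-W$ (singular, with the constant vector in its null space) and take its Moore–Penrose inverse as the implied prior covariance; the path graph and its size below are illustrative choices, not taken from the paper.

# ICAR precision on a path graph and its Moore-Penrose inverse.
import numpy as np

n = 5
W = np.zeros((n, n))
for i in range(n - 1):                                # neighbours on a path graph
    W[i, i + 1] = W[i + 1, i] = 1.0
Q = np.diag(W.sum(axis=1)) - W                        # ICAR precision (singular)
Sigma = np.linalg.pinv(Q)                             # implied prior covariance

print(np.linalg.matrix_rank(Q))                       # n - 1: one zero eigenvalue
print(Sigma.sum(axis=1))                              # rows sum to 0 (sum-to-zero support)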





Keeping the balance—Bridge sampling for marginal likelihood estimation in finite mixture, mixture of experts and Markov mixture models

Sylvia Frühwirth-Schnatter.

Source: Brazilian Journal of Probability and Statistics, Volume 33, Number 4, 706--733.

Abstract:
Finite mixture models and their extensions to Markov mixture and mixture of experts models are very popular in analysing data of various kinds. A challenge for these models is choosing the number of components based on marginal likelihoods. The present paper suggests two innovative, generic bridge sampling estimators of the marginal likelihood that are based on constructing balanced importance densities from the conditional densities arising during Gibbs sampling. The full permutation bridge sampling estimator is derived from considering all possible permutations of the mixture labels for a subset of these densities. For the double random permutation bridge sampling estimator, two levels of random permutations are applied: first to permute the labels of the MCMC draws and second to randomly permute the labels of the conditional densities arising during Gibbs sampling. Various applications show very good performance of these estimators in comparison to importance and to reciprocal importance sampling estimators derived from the same importance densities.





A note on monotonicity of spatial epidemic models

Achillefs Tzioufas.

Source: Brazilian Journal of Probability and Statistics, Volume 33, Number 3, 674--684.

Abstract:
The epidemic process on a graph is considered for which infectious contacts occur at a rate which depends on whether a susceptible is infected for the first time or not. We show that the Vasershtein coupling extends if and only if secondary infections occur at a rate greater than that of initial ones. Nonetheless we show that, with respect to the probability of occurrence of an infinite epidemic, the said proviso may be dropped for the totally asymmetric process in one dimension, thus settling in the affirmative this special case of the conjecture for arbitrary graphs due to [Ann. Appl. Probab. 13 (2003) 669–690].





Fake uniformity in a shape inversion formula

Christian Rau.

Source: Brazilian Journal of Probability and Statistics, Volume 33, Number 3, 549--557.

Abstract:
We revisit a shape inversion formula derived by Panaretos in the context of a particle density estimation problem with unknown rotation of the particle. A distribution is presented which imitates, or “fakes”, the uniformity or Haar distribution that is part of that formula.