x

Convergence and inference for mixed Poisson random sums. (arXiv:2005.03187v1 [math.PR])

In this paper we obtain the limit distribution for partial sums with a random number of terms following a class of mixed Poisson distributions. The resulting weak limit is a mixing between a normal distribution and an exponential family, which we call normal exponential family (NEF) laws. A new stability concept is introduced and a relationship between $\alpha$-stable distributions and NEF laws is established. We propose estimating the parameters of the NEF models by the method of moments and by maximum likelihood, the latter performed via an Expectation-Maximization algorithm. Monte Carlo simulation studies are conducted to check the performance of the proposed estimators, and an empirical illustration on financial market data is presented.
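
To make the object in the abstract above concrete, here is a minimal simulation sketch of a normalized random sum whose number of terms is mixed Poisson. The gamma mixing law, the standard normal summands, and the names `poisson_draw`, `mixed_poisson_count`, `normalized_random_sum`, `lam` and `phi` are all illustrative assumptions; the abstract does not say which mixing class or normalization the paper uses.

```python
import math
import random


def poisson_draw(rng, rate):
    """Draw a Poisson(rate) count with Knuth's multiplication method.

    Adequate for moderate rates; exp(-rate) underflows for very large ones.
    """
    limit = math.exp(-rate)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def mixed_poisson_count(rng, lam, phi):
    """Draw N | W ~ Poisson(lam * W) with W ~ Gamma(shape=phi, scale=1/phi).

    The gamma mixing law (mean 1, variance 1/phi) is an assumption made
    for illustration only.
    """
    w = rng.gammavariate(phi, 1.0 / phi)
    return poisson_draw(rng, lam * w)


def normalized_random_sum(rng, lam, phi):
    """Sum N i.i.d. standard normal terms and scale by sqrt(lam)."""
    n = mixed_poisson_count(rng, lam, phi)
    total = sum(rng.gauss(0.0, 1.0) for _ in range(n))
    return total / math.sqrt(lam)


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [normalized_random_sum(rng, lam=20.0, phi=2.0) for _ in range(5)]
    print(draws)
```

Under these assumptions each draw is, conditionally on the mixing variable, approximately normal with a random variance, which is the kind of normal mixture the abstract describes.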




x

Model Reduction and Neural Networks for Parametric PDEs. (arXiv:2005.03180v1 [math.NA])

We develop a general framework for data-driven approximation of input-output maps between infinite-dimensional spaces. The proposed approach is motivated by the recent successes of neural networks and deep learning, in combination with ideas from model reduction. This combination results in a neural network approximation which, in principle, is defined on infinite-dimensional spaces and, in practice, is robust to the dimension of finite-dimensional approximations of these spaces required for computation. For a class of input-output maps, and suitably chosen probability measures on the inputs, we prove convergence of the proposed approximation methodology. Numerically we demonstrate the effectiveness of the method on a class of parametric elliptic PDE problems, showing convergence and robustness of the approximation scheme with respect to the size of the discretization, and compare our method with existing algorithms from the literature.




x

MAZE: Data-Free Model Stealing Attack Using Zeroth-Order Gradient Estimation. (arXiv:2005.03161v1 [stat.ML])

Model Stealing (MS) attacks allow an adversary with black-box access to a Machine Learning model to replicate its functionality, compromising the confidentiality of the model. Such attacks train a clone model by using the predictions of the target model for different inputs. The effectiveness of such attacks relies heavily on the availability of data necessary to query the target model. Existing attacks either assume partial access to the dataset of the target model or availability of an alternate dataset with semantic similarities.

This paper proposes MAZE -- a data-free model stealing attack using zeroth-order gradient estimation. In contrast to prior works, MAZE does not require any data and instead creates synthetic data using a generative model. Inspired by recent works in data-free Knowledge Distillation (KD), we train the generative model using a disagreement objective to produce inputs that maximize disagreement between the clone and the target model. However, unlike the white-box setting of KD, where the gradient information is available, training a generator for model stealing requires performing black-box optimization, as it involves accessing the target model under attack. MAZE relies on zeroth-order gradient estimation to perform this optimization and enables a highly accurate MS attack.

Our evaluation with four datasets shows that MAZE provides a normalized clone accuracy in the range of 0.91x to 0.99x, and outperforms even the recent attacks that rely on partial data (JBDA, clone accuracy 0.13x to 0.69x) and surrogate data (KnockoffNets, clone accuracy 0.52x to 0.97x). We also study an extension of MAZE in the partial-data setting and develop MAZE-PD, which generates synthetic data closer to the target distribution. MAZE-PD further improves the clone accuracy (0.97x to 1.0x) and reduces the number of queries required for the attack by 2x-24x.
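
The zeroth-order gradient estimation mentioned in this abstract can be illustrated with a generic sketch: a gradient is approximated from function values alone, by averaging forward differences along random Gaussian directions. This is a textbook-style estimator, not necessarily the one MAZE uses; the function name `zeroth_order_gradient` and the parameters `num_directions` and `mu` are hypothetical.

```python
import random


def zeroth_order_gradient(f, x, num_directions=200, mu=1e-3, seed=0):
    """Estimate the gradient of f at x using only evaluations of f.

    For each random direction u ~ N(0, I), the forward difference
    (f(x + mu*u) - f(x)) / mu approximates the directional derivative,
    and averaging that slope times u gives a gradient estimate.
    """
    rng = random.Random(seed)
    dim = len(x)
    fx = f(x)
    grad = [0.0] * dim
    for _ in range(num_directions):
        u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        x_shifted = [xi + mu * ui for xi, ui in zip(x, u)]
        slope = (f(x_shifted) - fx) / mu
        for i in range(dim):
            grad[i] += slope * u[i] / num_directions
    return grad


if __name__ == "__main__":
    # Black-box objective; its true gradient at x is 2*x.
    def sphere(x):
        return sum(v * v for v in x)

    print(zeroth_order_gradient(sphere, [1.0, -2.0, 0.5], num_directions=2000))
```

Each direction costs one extra query to the black box, so accuracy trades off directly against the query budget.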




x

On the Optimality of Randomization in Experimental Design: How to Randomize for Minimax Variance and Design-Based Inference. (arXiv:2005.03151v1 [stat.ME])

I study the minimax-optimal design for a two-arm controlled experiment where conditional mean outcomes may vary in a given set. When this set is permutation symmetric, the optimal design is complete randomization, and using a single partition (i.e., the design that only randomizes the treatment labels for each side of the partition) has minimax risk larger by a factor of $n-1$. More generally, the optimal design is shown to be the mixed-strategy optimal design (MSOD) of Kallus (2018). Notably, even when the set of conditional mean outcomes has structure (i.e., is not permutation symmetric), being minimax-optimal for variance still requires randomization beyond a single partition. Nonetheless, since this targets precision, it may still not ensure sufficient uniformity in randomization to enable randomization (i.e., design-based) inference by Fisher's exact test to appropriately detect violations of the null. I therefore propose the inference-constrained MSOD, which is minimax-optimal among all designs subject to such uniformity constraints. On the way, I discuss Johansson et al. (2020), who recently compared rerandomization of Morgan and Rubin (2012) and the pure-strategy optimal design (PSOD) of Kallus (2018). I point out some errors therein and set straight that randomization is minimax-optimal and that the "no free lunch" theorem and example in Kallus (2018) are correct.




x

Towards Frequency-Based Explanation for Robust CNN. (arXiv:2005.03141v1 [cs.LG])

Current explanation techniques towards a transparent Convolutional Neural Network (CNN) mainly focus on building connections between human-understandable input features and the model's prediction, overlooking an alternative representation of the input: its decomposition into frequency components. In this work, we present an analysis of the connection between the distribution of frequency components in the input dataset and the reasoning process the model learns from the data. We further quantify the contribution of different frequency components toward the model's prediction. We show that the vulnerability of the model to tiny distortions results from the model relying on high-frequency features, which are the target features of adversarial (black- and white-box) attackers, to make its prediction. We further show that if the model develops a stronger association between the low-frequency components and the true labels, the model is more robust, which explains why adversarially trained models are more robust against tiny distortions.




x

Joint Multi-Dimensional Model for Global and Time-Series Annotations. (arXiv:2005.03117v1 [cs.LG])

Crowdsourcing is a popular approach to collect annotations for unlabeled data instances. It involves collecting a large number of annotations from several, often naive, untrained annotators for each data instance, which are then combined to estimate the ground truth. Further, annotations for constructs such as affect are often multi-dimensional, with annotators rating multiple dimensions, such as valence and arousal, for each instance. Most annotation fusion schemes, however, ignore this aspect and model each dimension separately. In this work we address this by proposing a generative model for multi-dimensional annotation fusion, which models the dimensions jointly, leading to more accurate ground truth estimates. The model we propose is applicable to both global and time series annotation fusion problems and treats the ground truth as a latent variable distorted by the annotators. The model parameters are estimated using the Expectation-Maximization algorithm, and we evaluate its performance using synthetic data and real emotion corpora as well as on an artificial task with human annotations.




x

A comparison of group testing architectures for COVID-19 testing. (arXiv:2005.03051v1 [stat.ME])

An important component of every country's COVID-19 response is fast and efficient testing -- to identify and isolate cases, as well as for early detection of local hotspots. For many countries, producing a sufficient number of tests has been a serious limiting factor in their efforts to control COVID-19 infections. Group testing is a well-established mathematical tool, which can provide a serious and rapid improvement to this situation. In this note, we compare several well-established group testing schemes in the context of qPCR testing for COVID-19. We include example calculations, where we indicate which testing architectures yield the greatest efficiency gains in various settings. We find that for identification of individuals with COVID-19, array testing is usually the best choice, while for estimation of COVID-19 prevalence rates in the total population, Gibbs-Gower testing usually provides the most accurate estimates given a fixed and relatively small number of tests. This note is intended as a helpful handbook for labs implementing group testing methods.
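
As a rough illustration of why group testing saves tests, the sketch below computes the expected number of tests per person under classic two-stage (Dorfman) pooling: test each pool once, then retest every member of a positive pool individually. This is a simpler baseline than the array and Gibbs-Gower schemes named in the abstract. The function names are hypothetical, and the formula assumes a perfectly accurate test and independent infections, which real qPCR pooling does not guarantee.

```python
def dorfman_tests_per_person(prevalence, pool_size):
    """Expected tests per person for two-stage Dorfman pooling.

    One pooled test is shared by pool_size people (1/pool_size each);
    a pool is positive with probability 1 - (1 - prevalence)**pool_size,
    in which case every member is retested individually.
    Assumes a perfect test and independent infections.
    """
    if pool_size < 2:
        return 1.0  # no pooling: one test per person
    prob_pool_positive = 1.0 - (1.0 - prevalence) ** pool_size
    return 1.0 / pool_size + prob_pool_positive


def best_pool_size(prevalence, max_pool_size=32):
    """Return the pool size that minimizes expected tests per person."""
    return min(
        range(1, max_pool_size + 1),
        key=lambda k: dorfman_tests_per_person(prevalence, k),
    )


if __name__ == "__main__":
    for p in (0.01, 0.05, 0.20):
        k = best_pool_size(p)
        print(p, k, round(dorfman_tests_per_person(p, k), 4))
```

The savings shrink quickly as prevalence rises, which is one reason the choice of architecture depends on the setting.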




x

Adaptive Invariance for Molecule Property Prediction. (arXiv:2005.03004v1 [q-bio.QM])

Effective property prediction methods can help accelerate the search for COVID-19 antivirals either through accurate in-silico screens or by effectively guiding on-going at-scale experimental efforts. However, existing prediction tools have limited ability to accommodate the scarce or fragmented training data currently available. In this paper, we introduce a novel approach to learn predictors that can generalize or extrapolate beyond the heterogeneous data. Our method builds on and extends recently proposed invariant risk minimization, adaptively forcing the predictor to avoid nuisance variation. We achieve this by continually exercising and manipulating latent representations of molecules to highlight undesirable variation to the predictor. To test the method we use a combination of three data sources: SARS-CoV-2 antiviral screening data, molecular fragments that bind to SARS-CoV-2 main protease and large screening data for SARS-CoV-1. Our predictor outperforms state-of-the-art transfer learning methods by a significant margin. We also report the top 20 predictions of our model on the Broad drug repurposing hub.





x

Flexible Imputation of Missing Data (2nd Edition)




x

mgm: Estimating Time-Varying Mixed Graphical Models in High-Dimensional Data

We present the R package mgm for the estimation of k-order mixed graphical models (MGMs) and mixed vector autoregressive (mVAR) models in high-dimensional data. These are useful extensions of graphical models for only one variable type, since data sets consisting of mixed types of variables (continuous, count, categorical) are ubiquitous. In addition, we allow the stationarity assumption of both models to be relaxed by introducing time-varying versions of MGMs and mVAR models based on a kernel weighting approach. Time-varying models offer a rich description of temporally evolving systems and make it possible to identify external influences on the model structure, such as the impact of interventions. We provide the background of all implemented methods and fully reproducible examples that illustrate how to use the package.




x

lslx: Semi-Confirmatory Structural Equation Modeling via Penalized Likelihood

Sparse estimation via penalized likelihood (PL) is now a popular approach to learn the associations among a large set of variables. This paper describes an R package called lslx that implements PL methods for semi-confirmatory structural equation modeling (SEM). In this semi-confirmatory approach, each model parameter can be specified as free/fixed for theory testing, or penalized for exploration. By incorporating either an L1 or a minimax concave penalty, the sparsity pattern of the parameter matrix can be efficiently explored. Package lslx minimizes the PL criterion through a quasi-Newton method. The algorithm conducts line search and checks the first-order condition in each iteration to ensure the optimality of the obtained solution. A numerical comparison between competing packages shows that lslx can reliably find PL estimates in the least time. The current package also supports other advanced functionalities, including a two-stage method with auxiliary variables for missing data handling and a reparameterized multi-group SEM to explore population heterogeneity.




x

lmSubsets: Exact Variable-Subset Selection in Linear Regression for R

An R package for computing the all-subsets regression problem is presented. The proposed algorithms are based on computational strategies recently developed. A novel algorithm for the best-subset regression problem selects subset models based on a predetermined criterion. The package user can choose from exact and from approximation algorithms. The core of the package is written in C++ and provides an efficient implementation of all the underlying numerical computations. A case study and benchmark results illustrate the usage and the computational efficiency of the package.




x

Anxiety and compassion: emotions and the surgical encounter in early 19th-century Britain

The next seminar in the 2017–18 History of Pre-Modern Medicine seminar series takes place on Tuesday 7 November. Speaker: Dr Michael Brown (University of Roehampton), ‘Anxiety and compassion: emotions and the surgical encounter in early 19th-century Britain’. The historical study of the…




x

Upper extremity injuries in young athletes

9783319566511 (electronic bk.)




x

Tissue engineering : principles, protocols, and practical exercises

9783030396985




x

The complexity of bird behaviour : a facet theory approach

Hackett, Paul, 1960- author
9783030121921 (electronic bk.)




x

Textbook of palliative care

9783319317380 (electronic bk.)




x

Psychoactive medicinal plants and fungal neurotoxins

Singh Saroya, Amritpal, author
9789811523137 (electronic bk.)




x

Pediatric pelvic and proximal femoral osteotomies

9783319780337 978-3-319-78033-7




x

Oxygen transport to tissue XLI

International Society on Oxygen Transport to Tissue. Annual Meeting (46th : 2018 : Seoul, Korea)
9783030344610 (electronic bk.)




x

Mosquitoes, communities, and public health in Texas

9780128145463 (electronic bk.)




x

Mobilities facing hydrometeorological extreme events.

9780081028827 (electronic bk.)




x

Mixed plantations of eucalyptus and leguminous trees : soil, microbiology and ecosystem services

9783030323653 (electronic bk.)




x

Milk proteins : from expression to food

9780128152522 (electronic bk.)




x

Maxillofacial cone beam computed tomography : principles, techniques and clinical applications

9783319620619 (electronic bk.)




x

LGBTQ cultures : what health care professionals need to know about sexual and gender diversity

Eliason, Michele J., author.
9781496394606 paperback




x

Insect sex pheromone research and beyond : from molecules to robots

9789811530821 (electronic bk.)




x

Handbook of flexible and stretchable electronics

9781315112794 (electronic bk.)




x

Handbook of Lower Extremity Reconstruction

9783030410353 978-3-030-41035-3




x

Green food processing techniques : preservation, transformation and extraction

9780128153536




x

Extra-coronal restorations : concepts and clinical application

9783319790930 (electronic bk.)




x

DeJong's the neurologic examination

Campbell, William W., Jr. (William Wesley), author.
9781496386168 (hardcover)




x

Computer security : ESORICS 2019 International Workshops, IOSec, MSTEC, and FINSEC, Luxembourg City, Luxembourg, September 26-27, 2019, Revised Selected Papers

European Symposium on Research in Computer Security (24th : 2019 : Luxembourg, Luxembourg)
9783030420512 (electronic bk.)




x

Comprehensive biochemistry for dentistry : textbook for dental students

Gupta, Anil, author.
9789811310355 (electronic bk.)




x

Complexity and approximation : in memory of Ker-I Ko

9783030416720 (electronic bk.)




x

Botulinum toxins, fillers and related substances

9783319168029 (electronic bk.)




x

Atlas of sexually transmitted diseases : clinical aspects and differential diagnosis

9783319574707 (electronic bk.)




x

Anxiety disorders : rethinking and understanding recent discoveries

9789813297050 (electronic bk.)









x

Almost sure uniqueness of a global minimum without convexity

Gregory Cox.

Source: The Annals of Statistics, Volume 48, Number 1, 584--606.

Abstract:
This paper establishes the argmin of a random objective function to be unique almost surely. This paper first formulates a general result that proves almost sure uniqueness without convexity of the objective function. The general result is then applied to a variety of applications in statistics. Four applications are discussed, including uniqueness of M-estimators, both classical likelihood and penalized likelihood estimators, and two applications of the argmin theorem, threshold regression and weak identification.




x

Concentration and consistency results for canonical and curved exponential-family models of random graphs

Michael Schweinberger, Jonathan Stewart.

Source: The Annals of Statistics, Volume 48, Number 1, 374--396.

Abstract:
Statistical inference for exponential-family models of random graphs with dependent edges is challenging. We stress the importance of additional structure and show that additional structure facilitates statistical inference. A simple example of a random graph with additional structure is a random graph with neighborhoods and local dependence within neighborhoods. We develop the first concentration and consistency results for maximum likelihood and $M$-estimators of a wide range of canonical and curved exponential-family models of random graphs with local dependence. All results are nonasymptotic and applicable to random graphs with finite populations of nodes, although asymptotic consistency results can be obtained as well. In addition, we show that additional structure can facilitate subgraph-to-graph estimation, and present concentration results for subgraph-to-graph estimators. As an application, we consider popular curved exponential-family models of random graphs, with local dependence induced by transitivity and parameter vectors whose dimensions depend on the number of nodes.




x

Sparse high-dimensional regression: Exact scalable algorithms and phase transitions

Dimitris Bertsimas, Bart Van Parys.

Source: The Annals of Statistics, Volume 48, Number 1, 300--323.

Abstract:
We present a novel binary convex reformulation of the sparse regression problem that constitutes a new duality perspective. We devise a new cutting plane method and provide evidence that it can solve to provable optimality the sparse regression problem for sample sizes $n$ and number of regressors $p$ in the 100,000s, that is, two orders of magnitude better than the current state of the art, in seconds. The ability to solve the problem for very high dimensions allows us to observe new phase transition phenomena. Contrary to traditional complexity theory which suggests that the difficulty of a problem increases with problem size, the sparse regression problem has the property that as the number of samples $n$ increases the problem becomes easier in that the solution recovers 100% of the true signal, and our approach solves the problem extremely fast (in fact faster than Lasso), while for small number of samples $n$, our approach takes a larger amount of time to solve the problem, but importantly the optimal solution provides a statistically more relevant regressor. We argue that our exact sparse regression approach presents a superior alternative over heuristic methods available at present.




x

Spectral and matrix factorization methods for consistent community detection in multi-layer networks

Subhadeep Paul, Yuguo Chen.

Source: The Annals of Statistics, Volume 48, Number 1, 230--250.

Abstract:
We consider the problem of estimating a consensus community structure by combining information from multiple layers of a multi-layer network using methods based on the spectral clustering or a low-rank matrix factorization. As a general theme, these “intermediate fusion” methods involve obtaining a low column rank matrix by optimizing an objective function and then using the columns of the matrix for clustering. However, the theoretical properties of these methods remain largely unexplored. In the absence of statistical guarantees on the objective functions, it is difficult to determine if the algorithms optimizing the objectives will return good community structures. We investigate the consistency properties of the global optimizer of some of these objective functions under the multi-layer stochastic blockmodel. For this purpose, we derive several new asymptotic results showing consistency of the intermediate fusion techniques along with the spectral clustering of mean adjacency matrix under a high dimensional setup, where the number of nodes, the number of layers and the number of communities of the multi-layer graph grow. Our numerical study shows that the intermediate fusion techniques outperform late fusion methods, namely spectral clustering on aggregate spectral kernel and module allegiance matrix in sparse networks, while they outperform the spectral clustering of mean adjacency matrix in multi-layer networks that contain layers with both homophilic and heterophilic communities.




x

Model assisted variable clustering: Minimax-optimal recovery and algorithms

Florentina Bunea, Christophe Giraud, Xi Luo, Martin Royer, Nicolas Verzelen.

Source: The Annals of Statistics, Volume 48, Number 1, 111--137.

Abstract:
The problem of variable clustering is that of estimating groups of similar components of a $p$-dimensional vector $X=(X_{1},\ldots ,X_{p})$ from $n$ independent copies of $X$. There exists a large number of algorithms that return data-dependent groups of variables, but their interpretation is limited to the algorithm that produced them. An alternative is model-based clustering, in which one begins by defining population level clusters relative to a model that embeds notions of similarity. Algorithms tailored to such models yield estimated clusters with a clear statistical interpretation. We take this view here and introduce the class of $G$-block covariance models as a background model for variable clustering. In such models, two variables in a cluster are deemed similar if they have similar associations with all other variables. This can arise, for instance, when groups of variables are noise corrupted versions of the same latent factor. We quantify the difficulty of clustering data generated from a $G$-block covariance model in terms of cluster proximity, measured with respect to two related, but different, cluster separation metrics. We derive minimax cluster separation thresholds, which are the metric values below which no algorithm can recover the model-defined clusters exactly, and show that they are different for the two metrics. We therefore develop two algorithms, COD and PECOK, tailored to $G$-block covariance models, and study their minimax-optimality with respect to each metric. Of independent interest is the fact that the analysis of the PECOK algorithm, which is based on a corrected convex relaxation of the popular $K$-means algorithm, provides the first statistical analysis of such algorithms for variable clustering. Additionally, we compare our methods with another popular clustering method, spectral clustering. Extensive simulation studies, as well as our data analyses, confirm the applicability of our approach.




x

Rerandomization in $2^{K}$ factorial experiments

Xinran Li, Peng Ding, Donald B. Rubin.

Source: The Annals of Statistics, Volume 48, Number 1, 43--63.

Abstract:
With many pretreatment covariates and treatment factors, the classical factorial experiment often fails to balance covariates across multiple factorial effects simultaneously. Therefore, it is intuitive to restrict the randomization of the treatment factors to satisfy certain covariate balance criteria, possibly conforming to the tiers of factorial effects and covariates based on their relative importances. This is rerandomization in factorial experiments. We study the asymptotic properties of this experimental design under the randomization inference framework without imposing any distributional or modeling assumptions of the covariates and outcomes. We derive the joint asymptotic sampling distribution of the usual estimators of the factorial effects, and show that it is symmetric, unimodal and more “concentrated” at the true factorial effects under rerandomization than under the classical factorial experiment. We quantify this advantage of rerandomization using the notions of “central convex unimodality” and “peakedness” of the joint asymptotic sampling distribution. We also construct conservative large-sample confidence sets for the factorial effects.
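
The basic mechanism of rerandomization described above can be sketched in a much simpler setting than the paper's $2^{K}$ factorial design: a single two-level factor and a single covariate, where assignments are redrawn until the treated and control covariate means are close. The function name `rerandomize`, the standardized-difference criterion, and the `threshold` and `max_draws` parameters are illustrative assumptions, not the balance criteria studied in the paper.

```python
import random
import statistics


def rerandomize(covariate, threshold=0.1, max_draws=10000, seed=0):
    """Draw balanced half/half assignments until the covariate is balanced.

    An assignment is accepted when the absolute difference between the
    treated and control covariate means is at most `threshold` standard
    deviations of the covariate. Returns the sorted treated unit indices.
    """
    rng = random.Random(seed)
    n = len(covariate)
    units = list(range(n))
    sd = statistics.pstdev(covariate)
    for _ in range(max_draws):
        rng.shuffle(units)
        treated = set(units[: n // 2])
        mean_treated = statistics.fmean(covariate[i] for i in treated)
        mean_control = statistics.fmean(
            covariate[i] for i in range(n) if i not in treated
        )
        if abs(mean_treated - mean_control) <= threshold * sd:
            return sorted(treated)
    raise RuntimeError("no acceptable assignment found within max_draws")


if __name__ == "__main__":
    ages = [float(a) for a in range(20, 40)]
    print(rerandomize(ages, threshold=0.1))
```

A tighter threshold improves covariate balance but rejects more draws, so the acceptance rule effectively restricts the randomization distribution that later inference has to account for.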