Latest mp news

Neyman-Pearson classification: parametrics and sample size requirement

Published On :: 2020

The Neyman-Pearson (NP) paradigm in binary classification seeks classifiers that minimize the type II error while keeping the prioritized type I error below a user-specified level $\alpha$. This paradigm arises naturally in applications such as severe disease diagnosis and spam detection, where there is a clear priority between the two error types. Recently, Tong, Feng, and Li (2018) proposed a nonparametric umbrella algorithm that adapts all scoring-type classification methods (e.g., logistic regression, support vector machines, random forest) to respect the given type I error upper bound $\alpha$ with high probability, without specific distributional assumptions on the features and the responses. Here the type I error is the conditional probability of classifying a class $0$ observation as class $1$ under the 0-1 coding. Universal as the umbrella algorithm is, it demands an explicit minimum sample size on class $0$, which is often the scarcer class, as in rare disease diagnosis. In this work, we employ the parametric linear discriminant analysis (LDA) model and propose a new parametric thresholding algorithm, which does not need the minimum sample size requirement on class $0$ observations and is thus suitable for small-sample applications such as rare disease diagnosis. Leveraging both the existing nonparametric and the newly proposed parametric thresholding rules, we propose four LDA-based NP classifiers, for both low- and high-dimensional settings. On the theoretical front, we prove NP oracle inequalities for one proposed classifier, where the rate for excess type II error benefits from the explicit parametric model assumption. Furthermore, as NP classifiers involve a sample splitting step of class $0$ observations, we construct a new adaptive sample splitting scheme that can be applied universally to NP classifiers, and this adaptive strategy reduces the type II error of these classifiers.
The proposed NP classifiers are implemented in the R package nproc.
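
To make the sample size requirement concrete, here is a minimal Python sketch of the order-statistic thresholding rule behind the nonparametric umbrella algorithm, as described in Tong, Feng, and Li (2018). It is not the nproc implementation. The tolerance `delta` (the allowed probability that the type I error exceeds $\alpha$) is an assumption of this sketch, since the abstract only says "with high probability", and the function names are hypothetical. Here `n` is the number of left-out class $0$ scores used for thresholding, assumed to be drawn from a continuous distribution.

```python
import math


def violation_bound(n, k, alpha):
    """Upper bound on P(type I error > alpha) when the threshold is the
    k-th smallest of n left-out class 0 scores (binomial tail sum)."""
    return sum(
        math.comb(n, j) * (1 - alpha) ** j * alpha ** (n - j)
        for j in range(k, n + 1)
    )


def min_class0_size(alpha, delta):
    """Smallest n for which even the largest score (k = n) meets the
    tolerance, i.e. (1 - alpha)^n <= delta."""
    return math.ceil(math.log(delta) / math.log(1 - alpha))


def choose_order(n, alpha, delta):
    """Smallest order k whose violation bound is at most delta, or None
    when n is below the minimum sample size."""
    for k in range(1, n + 1):
        if violation_bound(n, k, alpha) <= delta:
            return k
    return None


if __name__ == "__main__":
    alpha, delta = 0.05, 0.05  # illustrative values, not taken from the abstract
    print(min_class0_size(alpha, delta))
    print(choose_order(58, alpha, delta), choose_order(59, alpha, delta))
```

With `alpha = delta = 0.05` the minimum works out to 59 left-out class $0$ observations; with 58 no order statistic satisfies the bound. This is the constraint the parametric LDA-based thresholding rule is designed to avoid.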

Generalized probabilistic principal component analysis of correlated data

GluonCV and GluonNLP: Deep Learning in Computer Vision and Natural Language Processing

Distributed Feature Screening via Componentwise Debiasing

On the consistency of graph-based Bayesian semi-supervised learning and the scalability of sampling algorithms

Tensor Train Decomposition on TensorFlow (T3F)

Provably robust estimation of modulo 1 samples of a smooth function with applications to phase unwrapping

On the Complexity Analysis of the Primal Solutions for the Accelerated Randomized Dual Coordinate Ascent

Graph-Dependent Implicit Regularisation for Distributed Stochastic Subgradient Descent

Latent Simplex Position Model: High Dimensional Multi-view Clustering with Uncertainty Quantification

Greedy Attack and Gumbel Attack: Generating Adversarial Examples for Discrete Data

Dynamical Systems as Temporal Feature Spaces

Ancestral Gumbel-Top-k Sampling for Sampling Without Replacement

Robust Asynchronous Stochastic Gradient-Push: Asymptotically Optimal and Network-Independent Performance for Strongly Convex Functions

Exact Guarantees on the Absence of Spurious Local Minima for Non-negative Rank-1 Robust Principal Component Analysis

Unique Sharp Local Minimum in L1-minimization Complete Dictionary Learning

Union of Low-Rank Tensor Spaces: Clustering and Completion

(1 + epsilon)-class Classification: an Anomaly Detection Method for Highly Imbalanced or Incomplete Data Sets

Q&A with Tara June Winch

Q&A with Adam Ferguson

Youth & Community Initiatives Funding available

Health & Active Living Challenge

Have your say on the Highway 404 Employment Corridor Secondary Plan

Reliability estimation in a multicomponent stress-strength model for Burr XII distribution under progressive censoring

Recent developments in complex and spatially correlated functional data

$W^{1,p}$-Solutions of the transport equation by stochastic perturbation

Application of weighted and unordered majorization orders in comparisons of parallel systems with exponentiated generalized gamma components

Simple step-stress models with a cure fraction

Time series of count data: A review, empirical comparisons and data analysis

The limiting distribution of the Gibbs sampler for the intrinsic conditional autoregressive model

Keeping the balance—Bridge sampling for marginal likelihood estimation in finite mixture, mixture of experts and Markov mixture models

Spatiotemporal point processes: regression, model specifications and future directions

Estimation of parameters in the $\operatorname{DDRCINAR}(p)$ model

A rank-based Cramér–von-Mises-type test for two samples

A temporal perspective on the rate of convergence in first-passage percolation under a moment condition

Hierarchical modelling of power law processes for the analysis of repairable systems with different truncation times: An empirical Bayes approach

Simple tail index estimation for dependent and heterogeneous data with missing values

Heavy metalloid music : the story of Simply Saucer

Can $p$-values be meaningfully interpreted without random sampling?

Estimating the size of a hidden finite set: Large-sample behavior of estimators

Pitfalls of significance testing and $p$-value variability: An econometrics perspective

A design-sensitive approach to fitting regression models with complex survey data

A comparison of spatial predictors when datasets could be very large

A survey of bootstrap methods in finite population sampling

A unified treatment for non-asymptotic and asymptotic approaches to minimax signal detection

Errata: A survey of Bayesian predictive methods for model assessment, selection and comparison

Analyzing complex functional brain networks: Fusing statistics and network science to understand the brain

A survey of Bayesian predictive methods for model assessment, selection and comparison

Wilcoxon-Mann-Whitney or t-test? On assumptions for hypothesis tests and multiple interpretations of decision rules

Holtermann and the A&A Photographic Company

Companies' 'Green' Efforts Include Products’ Material Content

Incomplete information can fuel misjudgment: study

Floyd & Associates unveils new website

Crossville’s Wood Impressions Collection

Lakeview Farms to Acquire noosa from Campbell Soup Company

Good Morning, News: City Council to Vote on Clean & Safe Contract, Vision Zero Gets an Audit, and Trump Taps Elon Musk to Lead DOGE (Do You Even Want to Know?)

'Apprehensive and fearful': Federal workers await a dismantling under Trump

What Trump's win means for electric vehicle manufacturers

Congressional leadership under a second Trump administration takes shape

Blue states prepare to fight Trump administration policies

Trump intends to nominate Florida Rep. Matt Gaetz as attorney general

Former heavyweight champ Mike Tyson to fight YouTuber-turned-boxer Jake Paul

A look at the potential impact of shutting down the Department of Education

Basic Black: Voting Matters in Black & White

Basic Black: Thomas Menino's Imprint on the "New Boston"

