inference

Pliops to Highlight Efficient LLM Inference with XDP LightningAI at SC24

SAN JOSE, Calif., Nov. 13, 2024 — Addressing the critical issue of constrained power budgets, Pliops is enabling AI-powered businesses and hyperscalers to achieve impressive performance by optimizing power usage, reducing […]

The post Pliops to Highlight Efficient LLM Inference with XDP LightningAI at SC24 appeared first on HPCwire.




inference

Inference with Gemma using Dataflow and vLLM

vLLM's continuous batching and Dataflow's model manager optimize LLM serving and simplify the deployment process, a powerful combination that lets developers build high-performance LLM inference pipelines more efficiently.




inference

Modular Inference Trees for Expository Reports




inference

A multiregion community model for inference about geographic variation in species richness




inference

A differentiable simulation package for performing inference of synchrotron-radiation-based diagnostics

The direction of particle accelerator development is toward ever-increasing beam quality, currents and repetition rates. This poses a challenge to traditional diagnostics that directly intercept the beam due to the mutual destruction of both the beam and the diagnostic. An alternative approach is to infer beam parameters non-invasively from the synchrotron radiation emitted in bending magnets. However, inferring the beam distribution from a measured radiation pattern is a complex and computationally expensive task. To address this challenge, we present SYRIPY (SYnchrotron Radiation In PYthon), a software package intended as a tool for performing inference of synchrotron-radiation-based diagnostics. SYRIPY has been developed using PyTorch, which makes it both differentiable and able to leverage the high performance of GPUs, two vital characteristics for performing statistical inference. The package consists of three modules: a particle tracker, a Lienard–Wiechert solver and a Fourier optics propagator, allowing start-to-end simulation of synchrotron radiation detection to be carried out. SYRIPY has been benchmarked against SRW, the prevalent numerical package in the field, showing good agreement and up to a 50× speed improvement. Finally, we have demonstrated how SYRIPY can be used to perform Bayesian inference of beam parameters using stochastic variational inference.
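As a rough illustration of the stochastic-variational-inference step described above, the sketch below fits a mean-field Gaussian posterior over two beam parameters by differentiating through a forward model with PyTorch. The simulator here is a stand-in placeholder (`simulate_radiation`), not SYRIPY's actual API, and the Gaussian noise model and priors are assumptions.

```python
import torch

# Placeholder for a differentiable forward model such as SYRIPY's
# tracker -> Lienard-Wiechert solver -> Fourier-optics chain.
# Here it is just a smooth toy function of two beam parameters.
def simulate_radiation(beam_size, beam_energy):
    x = torch.linspace(-1.0, 1.0, 64)
    return beam_energy * torch.exp(-x**2 / (2 * beam_size**2))

torch.manual_seed(0)
true_pattern = simulate_radiation(torch.tensor(0.3), torch.tensor(2.0))
observed = true_pattern + 0.05 * torch.randn_like(true_pattern)

# Mean-field Gaussian variational posterior over log(beam_size), log(energy).
mu = torch.zeros(2, requires_grad=True)
log_sigma = torch.full((2,), -2.0, requires_grad=True)
opt = torch.optim.Adam([mu, log_sigma], lr=0.02)

def log_prior(z):                       # standard normal prior on the latent z
    return -0.5 * (z**2).sum()

def log_likelihood(z):                  # Gaussian observation noise (assumption)
    size, energy = torch.exp(z[0]), torch.exp(z[1])
    pred = simulate_radiation(size, energy)
    return -0.5 * ((observed - pred)**2 / 0.05**2).sum()

for step in range(2000):
    opt.zero_grad()
    sigma = torch.exp(log_sigma)
    eps = torch.randn(2)
    z = mu + sigma * eps                # reparameterization trick, one MC sample
    entropy = log_sigma.sum()           # Gaussian entropy up to a constant
    elbo = log_likelihood(z) + log_prior(z) + entropy
    (-elbo).backward()                  # maximize the ELBO
    opt.step()

print("posterior mean of (beam_size, energy):", torch.exp(mu).detach())
```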




inference

Sivoo and FuriosaAI Implement Inference & Rendering Capabilities

Deployed across the globe, Sivoo and FuriosaAI will offer users video rendering and LLM inference capabilities




inference

UniFab's new inference engine is released, improving conversion speed yet again!

Highlights include Denoise AI, overall acceleration of Enlarger AI processing, and new features for the HDR Upscaler AI module.




inference

NYT catches up to Statistical Modeling, Causal Inference, and Social Science

A colleague pointed to this news article, “Do People in ‘Blue Zones’ Actually Live Longer?”, and wrote that I might find it blog-worthy. I replied that, yeah, the topic is blog-worthy enough that it’s already appeared on the blog, with … Continue reading




inference

Two spans of the bridge of inference

This is Jessica. Larry Hedges relayed a quote to me recently that I thought others here might appreciate. It appears in an old Annals of Mathematical Statistics paper by Tukey and Cornfield: In almost any practical situation where analytical statistics … Continue reading





inference

microsoft/BitNet: Official inference framework for 1-bit LLMs




inference

Interpretation of Shotgun Proteomic Data: The Protein Inference Problem

Alexey I. Nesvizhskii
Oct 1, 2005; 4:1419-1440
Tutorial




inference

MLPerf Releases Latest Inference Results and New Storage Benchmark

MLCommons this week issued the results of its latest MLPerf Inference (v3.1) benchmark exercise. Nvidia was again the top performing accelerator, but Intel (Xeon CPU) and Habana (Gaudi1 and 2) […]

The post MLPerf Releases Latest Inference Results and New Storage Benchmark appeared first on HPCwire.




inference

GDDR7: The Ideal Memory Solution in AI Inference

The generative AI market is experiencing rapid growth, driven by the increasing parameter size of Large Language Models (LLMs). This growth is pushing the boundaries of performance requirements for training hardware within data centers. For an in-depth look at this, consider the insights provided in "HBM3E: All About Bandwidth". Once trained, these models are deployed across a diverse range of applications. They are transforming sectors such as finance, meteorology, image and voice recognition, healthcare, augmented reality, high-speed trading, and industry, to name just a few.

The critical process that utilizes these trained models is called AI inference. Inference is the capability of processing real-time data through a trained model to swiftly and effectively generate predictions that yield actionable outcomes. While the AI market has primarily focused on the requirements of training infrastructure, there is an anticipated shift towards prioritizing inference as these models are deployed.

The computational power and memory bandwidth required for inference are significantly lower than those needed for training. Inference engines typically need between 300-700GB/s of memory bandwidth, compared to 1-3TB/s for training. Additionally, the cost of inference needs to be lower, as these systems will be widely deployed not only in data centers but also at the network's edge (e.g., 5G) and in end-user equipment like security cameras, cell phones, and automobiles.

When designing an AI inference engine, there are several memory options to consider, including DDR, LPDDR, GDDR, and HBM. The choice depends on the specific application, bandwidth, and cost requirements. DDR and LPDDR offer good memory density, HBM provides the highest bandwidth but requires 2.5D packaging, and GDDR offers high bandwidth using standard packaging and PCB technology.

The GDDR7 standard, announced by JEDEC in March of this year, features a data rate of up to 192GB/s per device, a chip density of 32Gb, and the latest data integrity features. The high data rate is achieved by using PAM3 (Pulse Amplitude Modulation) with 3 levels (+1, 0, -1) to transmit 3 bits over 2 cycles, whereas the current GDDR6 generation uses NRZ (non-return-to-zero) to transmit 2 bits over 2 cycles.

GDDR7 offers many advantages for AI inference, providing the best balance of bandwidth and cost. For example, an AI inference system requiring 500GB/s of memory bandwidth needs only 4 GDDR7 DRAMs running at 32Gb/s (32 data bits x 32Gb/s per pin = 1024Gb/s, or 128GB/s, per DRAM). The same system would need 13 LPDDR5X PHYs running at 9.6Gb/s, currently the highest data rate available (32 data bits x 9.6Gb/s = 307Gb/s, or roughly 38GB/s, per DRAM).
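The device-count arithmetic above is easy to reproduce. The sketch below is a minimal, illustrative calculation of how many DRAM devices a target inference bandwidth implies for each memory type, using the per-pin data rates quoted in this article; it is not a sizing tool.

```python
import math

def devices_needed(target_gBps, pins_per_device, gbps_per_pin):
    """Devices needed to reach a target bandwidth in GB/s.

    Per-device bandwidth = pins * per-pin data rate (Gb/s), divided by 8
    to convert bits to bytes.
    """
    per_device_gBps = pins_per_device * gbps_per_pin / 8.0
    return math.ceil(target_gBps / per_device_gBps), per_device_gBps

target = 500  # GB/s, the example used in the article

for name, pins, rate in [("GDDR7 @ 32 Gb/s", 32, 32.0),
                         ("LPDDR5X @ 9.6 Gb/s", 32, 9.6)]:
    count, per_dev = devices_needed(target, pins, rate)
    print(f"{name}: {per_dev:.1f} GB/s per device -> {count} devices")
# GDDR7 @ 32 Gb/s: 128.0 GB/s per device -> 4 devices
# LPDDR5X @ 9.6 Gb/s: 38.4 GB/s per device -> 14 devices
# (the article rounds down to 13, since 13 x 38.4 GB/s = 499 GB/s ~ 500 GB/s)
```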

Cadence stands at the forefront of AI inference hardware support, being the first IP company to roll out GDDR7 PHYs capable of impressive speeds up to 36Gb/s across various process nodes. This milestone builds on Cadence's established leadership in GDDR6 PHY IP, which has been available since 2019. The company caters to a diverse client base spanning AI inference, graphics, automotive, and networking equipment.

While GDDR7 continues to utilize standard PCB board technology, the increased signal speeds seen in GDDR6 (20Gb/s) and now GDDR7 (36Gb/s) call for careful attention to the physical design to ensure optimized system performance. In addition to providing the PHY, Cadence also offers comprehensive PCB and package reference designs, which are essential in helping customers achieve optimal signal and power integrity (SI/PI) in their systems.

Cadence is dedicated to ensuring customer success beyond just providing hardware. The company provides expert SI/PI support, collaborating closely with customers throughout the design process. This approach ensures that customers can benefit from Cadence's expertise in navigating the complexities of high-speed design and achieve optimal performance in their AI inference systems.

As the AI market continues to advance, Cadence remains at the forefront by offering a comprehensive memory IP portfolio tailored for every segment of this dynamic market. From DDR5 and HBM3E, which cater to the intensive demands of training in servers and high-performance computing (HPC), to LPDDR5X designed for low-end inference at the network edge and in consumer devices, Cadence's offerings cover a wide range of applications.

Looking to the future, Cadence is committed to innovating at the forefront of memory system performance, ensuring that the evolving needs of AI training and inference are met with the highest standards of excellence. Whether pushing the boundaries with GDDR7 or exploring new technologies, Cadence is dedicated to driving the AI revolution forward, one breakthrough at a time.

Learn more about Cadence GDDR7 PHY

Learn more about Cadence Simulation VIP for GDDR7.




inference

Interoceptive inference and prediction in food-related disorders [Special Section: Symposium Outlook]

The brain's capacity to predict and anticipate changes in internal and external environments is fundamental to initiating efficient adaptive responses, behaviors, and reflexes that minimize disruptions to physiology. In the context of feeding control, the brain predicts and anticipates responses to the consumption of dietary substances, thus driving adaptive behaviors in the form of food choices, physiological preparation for meals, and engagement of defensive mechanisms. Here, we provide an integrative perspective on the multisensory computation between exteroceptive and interoceptive cues that guides feeding strategy and may result in food-related disorders.




inference

Shift-Share Designs: Theory and Inference [electronic journal].

National Bureau of Economic Research




inference

Robust Bayesian Inference in Proxy SVARs [electronic journal].




inference

Public Expenditure and Private Firm Performance: Using Religious Denominations for Causal Inference [electronic journal].




inference

Partial Identification and Inference for Dynamic Models and Counterfactuals [electronic journal].

National Bureau of Economic Research




inference

Inference in Structural Vector Autoregressions When the Identifying Assumptions are Not Fully Believed: Re-evaluating the Role of Monetary Policy in Economic Fluctuations [electronic journal].

National Bureau of Economic Research




inference

FX Intervention to Stabilize or Manipulate the Exchange Rate? Inference from Profitability [electronic journal].

International Monetary Fund,




inference

Confidence maps: statistical inference of cryo-EM maps

The concept of statistical signal detection by controlling the false-discovery rate (FDR) to aid the atomic model interpretation of cryo-EM density maps is reviewed. The recommended usage of the FDR software tool is presented together with its successful integration into the CCP-EM suite.




inference

Confidence maps: statistical inference of cryo-EM maps

Confidence maps provide complementary information for interpreting cryo-EM densities as they indicate statistical significance with respect to background noise. They can be thresholded by specifying the expected false-discovery rate (FDR), and the displayed volume shows the parts of the map that have the corresponding level of significance. Here, the basic statistical concepts of confidence maps are reviewed and practical guidance is provided for their interpretation and usage inside the CCP-EM suite. Limitations of the approach are discussed and extensions towards other error criteria such as the family-wise error rate are presented. The observed map features can be rendered at a common isosurface threshold, which is particularly beneficial for the interpretation of weak and noisy densities. In the current article, a practical guide is provided to the recommended usage of confidence maps.
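To make the FDR-thresholding idea concrete, here is a minimal sketch (not the CCP-EM/FDR tool itself): voxel values are converted to p-values against an estimated background-noise distribution and thresholded with the Benjamini-Hochberg procedure. The toy map, the Gaussian background model, and the periphery-based noise estimate are all assumptions for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Toy "density map": Gaussian background noise plus a small block of signal.
vol = rng.normal(0.0, 1.0, size=(32, 32, 32))
vol[12:18, 12:18, 12:18] += 4.0

# Estimate background mean/sd from the map periphery (assumed signal-free).
background = np.concatenate([vol[:4].ravel(), vol[-4:].ravel()])
mu, sd = background.mean(), background.std()

# One-sided p-value per voxel under the background model.
p = stats.norm.sf(vol, loc=mu, scale=sd).ravel()

def benjamini_hochberg(pvals, q=0.01):
    """Return a boolean mask of discoveries at FDR level q."""
    order = np.argsort(pvals)
    ranked = pvals[order]
    m = len(pvals)
    below = ranked <= q * (np.arange(1, m + 1) / m)
    mask = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()       # largest rank passing the BH line
        mask[order[:k + 1]] = True
    return mask

confident = benjamini_hochberg(p, q=0.01).reshape(vol.shape)
print("voxels kept at 1% FDR:", int(confident.sum()))
```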




inference

A Concise Course in Statistical Inference: The Free eBook

Check out this freely available book, All of Statistics: A Concise Course in Statistical Inference, and learn the probability and statistics needed for success in data science.




inference

Come up with a logo for causal inference!

Stephen Cole, Jennifer Hill, Luke Keele, Ilya Shpitser, and Dylan Small write: We wanted to provide an update on our efforts to build the Society for Causal Inference (SCI). As you may recall, we are creating the SCI as a home for causal inference research that will increase support and knowledge sharing both within the […]




inference

Simple Bayesian analysis inference of coronavirus infection rate from the Stanford study in Santa Clara county

tl;dr: Their 95% interval for the infection rate, given the data available, is [0.7%, 1.8%]. My Bayesian interval is [0.3%, 2.4%]. Most of what makes my interval wider is the possibility that the specificity and sensitivity of the tests can vary across labs. To get a narrower interval, you’d need additional assumptions regarding the specificity […]
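For readers who want to see the mechanics, here is a minimal sketch of the kind of calculation involved: a grid posterior for the infection rate that marginalizes over uncertain test sensitivity and specificity. The counts and the Beta priors below are illustrative placeholders, not the study's or the post's actual numbers; widening the priors on the test characteristics is what widens the interval.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Illustrative placeholder data: n tests, y positives (not the study's figures).
n, y = 3300, 50

# Assumed priors on test characteristics; more uncertainty -> wider interval.
sens_draws = stats.beta(40, 10).rvs(5000, random_state=rng)   # sensitivity ~ 80%
spec_draws = stats.beta(300, 3).rvs(5000, random_state=rng)   # specificity ~ 99%

prev_grid = np.linspace(0.0, 0.05, 501)

# Marginal likelihood p(y | prevalence), averaging over sens/spec draws.
p_pos = prev_grid[:, None] * sens_draws + (1 - prev_grid[:, None]) * (1 - spec_draws)
like = stats.binom.pmf(y, n, p_pos).mean(axis=1)

post = like / like.sum()                       # flat prior on prevalence
cdf = np.cumsum(post)
lo = prev_grid[np.searchsorted(cdf, 0.025)]
hi = prev_grid[np.searchsorted(cdf, 0.975)]
print(f"95% posterior interval for prevalence: [{lo:.3%}, {hi:.3%}]")
```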




inference

Enhancing Geometric Factors in Model Learning and Inference for Object Detection and Instance Segmentation. (arXiv:2005.03572v1 [cs.CV])

Deep learning-based object detection and instance segmentation have achieved unprecedented progress. In this paper, we propose Complete-IoU (CIoU) loss and Cluster-NMS for enhancing geometric factors in both bounding box regression and Non-Maximum Suppression (NMS), leading to notable gains of average precision (AP) and average recall (AR), without the sacrifice of inference efficiency. In particular, we consider three geometric factors, i.e., overlap area, normalized central point distance and aspect ratio, which are crucial for measuring bounding box regression in object detection and instance segmentation. The three geometric factors are then incorporated into CIoU loss for better distinguishing difficult regression cases. The training of deep models using CIoU loss results in consistent AP and AR improvements in comparison to widely adopted $\ell_n$-norm loss and IoU-based loss. Furthermore, we propose Cluster-NMS, where NMS during inference is done by implicitly clustering detected boxes and usually requires fewer iterations. Cluster-NMS is very efficient due to its pure GPU implementation, and geometric factors can be incorporated to improve both AP and AR. In the experiments, CIoU loss and Cluster-NMS have been applied to state-of-the-art instance segmentation (e.g., YOLACT) and object detection (e.g., YOLO v3, SSD and Faster R-CNN) models. Taking YOLACT on MS COCO as an example, our method achieves performance gains of +1.7 AP and +6.2 AR$_{100}$ for object detection, and +0.9 AP and +3.5 AR$_{100}$ for instance segmentation, with 27.1 FPS on one NVIDIA GTX 1080Ti GPU. All the source code and trained models are available at https://github.com/Zzh-tju/CIoU
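As a concrete reference for the geometric factors described above, here is a compact sketch of a CIoU-style loss for axis-aligned boxes in PyTorch. It follows the quantities named in the abstract (overlap, normalized center distance, aspect-ratio consistency) but is an independent re-implementation, not the authors' released code.

```python
import math
import torch

def ciou_loss(pred, target, eps=1e-7):
    """CIoU loss for boxes given as (x1, y1, x2, y2), shape (N, 4)."""
    # Intersection-over-union.
    ix1 = torch.max(pred[:, 0], target[:, 0])
    iy1 = torch.max(pred[:, 1], target[:, 1])
    ix2 = torch.min(pred[:, 2], target[:, 2])
    iy2 = torch.min(pred[:, 3], target[:, 3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Normalized distance between box centers (rho^2 / c^2).
    cxp, cyp = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    cxt, cyt = (target[:, 0] + target[:, 2]) / 2, (target[:, 1] + target[:, 3]) / 2
    rho2 = (cxp - cxt) ** 2 + (cyp - cyt) ** 2
    ex1 = torch.min(pred[:, 0], target[:, 0])
    ey1 = torch.min(pred[:, 1], target[:, 1])
    ex2 = torch.max(pred[:, 2], target[:, 2])
    ey2 = torch.max(pred[:, 3], target[:, 3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + eps   # enclosing-box diagonal

    # Aspect-ratio consistency term.
    wp, hp = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    wt, ht = target[:, 2] - target[:, 0], target[:, 3] - target[:, 1]
    v = (4 / math.pi ** 2) * (torch.atan(wt / (ht + eps)) - torch.atan(wp / (hp + eps))) ** 2
    with torch.no_grad():
        alpha = v / (1 - iou + v + eps)   # trade-off weight, treated as constant

    return 1 - iou + rho2 / c2 + alpha * v

# Example: one predicted box vs. its ground truth.
pred = torch.tensor([[10., 10., 50., 60.]], requires_grad=True)
gt = torch.tensor([[12., 8., 48., 62.]])
print(ciou_loss(pred, gt))
```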




inference

Strong replica symmetry in high-dimensional optimal Bayesian inference. (arXiv:2005.03115v1 [math.PR])

We consider generic optimal Bayesian inference, namely, models of signal reconstruction where the posterior distribution and all hyperparameters are known. Under a standard assumption on the concentration of the free energy, we show how replica symmetry in the strong sense of concentration of all multioverlaps can be established as a consequence of the Franz-de Sanctis identities; the identities themselves in the current setting are obtained via a novel perturbation of the prior distribution of the signal. Concentration of multioverlaps means that asymptotically the posterior distribution has a particularly simple structure encoded by a random probability measure (or, in the case of binary signal, a non-random probability measure). We believe that such strong control of the model should be key in the study of inference problems with underlying sparse graphical structure (error correcting codes, block models, etc) and, in particular, in the derivation of replica symmetric formulas for the free energy and mutual information in this context.




inference

Inference with Choice Functions Made Practical. (arXiv:2005.03098v1 [cs.AI])

We study how to infer new choices from previous choices in a conservative manner. To make such inferences, we use the theory of choice functions: a unifying mathematical framework for conservative decision making that allows one to impose axioms directly on the represented decisions. We here adopt the coherence axioms of De Bock and De Cooman (2019). We show how to naturally extend any given choice assessment to such a coherent choice function, whenever possible, and use this natural extension to make new choices. We present a practical algorithm to compute this natural extension and provide several methods that can be used to improve its scalability.




inference

Robust summarization and inference in proteome-wide label-free quantification

Adriaan Sticker
Apr 22, 2020; 0:RA119.001624v1-mcp.RA119.001624
Research




inference

Robust summarization and inference in proteome-wide label-free quantification [Research]

Label-free quantitative mass spectrometry-based workflows for differential expression (DE) analysis of proteins impose important challenges on the data analysis due to peptide-specific effects and context-dependent missingness of peptide intensities. Peptide-based workflows, like MSqRob, test for DE directly from peptide intensities and outperform summarization methods which first aggregate MS1 peptide intensities to protein intensities before DE analysis. However, these methods are computationally expensive, often hard to understand for the non-specialised end-user, and do not provide protein summaries, which are important for visualisation or downstream processing. In this work, we therefore evaluate state-of-the-art summarization strategies using a benchmark spike-in dataset and discuss why and when these fail compared to the state-of-the-art peptide-based model, MSqRob. Based on this evaluation, we propose a novel summarization strategy, MSqRobSum, which estimates MSqRob’s model parameters in a two-stage procedure circumventing the drawbacks of peptide-based workflows. MSqRobSum maintains MSqRob’s superior performance, while providing useful protein expression summaries for plotting and downstream analysis. Summarising peptide to protein intensities considerably reduces the computational complexity, the memory footprint and the model complexity, and makes it easier to disseminate DE inferred on protein summaries. Moreover, MSqRobSum provides a highly modular analysis framework, which gives researchers full flexibility to develop data analysis workflows tailored towards their specific applications.




inference

Differential network inference via the fused D-trace loss with cross variables

Yichong Wu, Tiejun Li, Xiaoping Liu, Luonan Chen.

Source: Electronic Journal of Statistics, Volume 14, Number 1, 1269--1301.

Abstract:
Detecting the change of biological interaction networks is of great importance in biological and medical research. We propose a simple loss function, named CrossFDTL, to identify the network change or differential network by estimating the difference between two precision matrices under a Gaussian assumption. CrossFDTL is a natural fusion of the D-trace loss for the two considered networks, imposing an $\ell_{1}$ penalty on the differential matrix to ensure sparsity. The key point of our method is to utilize the cross variables, which correspond to the sum and difference of the two precision matrices, instead of using their original forms. Moreover, we develop an efficient minimization algorithm for the proposed loss function and rigorously prove its convergence. Numerical results show that our method outperforms existing methods in both accuracy and convergence speed on simulated and real data.




inference

Expectation Propagation as a Way of Life: A Framework for Bayesian Inference on Partitioned Data

A common divide-and-conquer approach for Bayesian computation with big data is to partition the data, perform local inference for each piece separately, and combine the results to obtain a global posterior approximation. While being conceptually and computationally appealing, this method involves the problematic need to also split the prior for the local inferences; these weakened priors may not provide enough regularization for each separate computation, thus eliminating one of the key advantages of Bayesian methods. To resolve this dilemma while still retaining the generalizability of the underlying local inference method, we apply the idea of expectation propagation (EP) as a framework for distributed Bayesian inference. The central idea is to iteratively update approximations to the local likelihoods given the state of the other approximations and the prior. The present paper has two roles: we review the steps that are needed to keep EP algorithms numerically stable, and we suggest a general approach, inspired by EP, for approaching data partitioning problems in a way that achieves the computational benefits of parallelism while allowing each local update to make use of relevant information from the other sites. In addition, we demonstrate how the method can be applied in a hierarchical context to make use of partitioning of both data and parameters. The paper describes a general algorithmic framework, rather than a specific algorithm, and presents an example implementation for it.
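The iterative site-update loop described above can be illustrated in one dimension. The sketch below runs EP on data split across a few partitions, with a heavy-tailed (Cauchy) local likelihood and Gaussian site approximations moment-matched on a grid; it is a toy illustration of the general framework under my own modeling assumptions, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 4                                                        # number of data partitions
data = [rng.standard_cauchy(25) + 3.0 for _ in range(K)]     # true location = 3

grid = np.linspace(-10, 20, 4001)
prior_prec, prior_h = 1.0 / 10.0**2, 0.0                     # N(0, 10^2) prior, natural params

# Gaussian site approximations in natural parameters (precision, precision*mean).
site_prec = np.zeros(K)
site_h = np.zeros(K)

def moments_on_grid(logp):
    w = np.exp(logp - logp.max())
    w /= w.sum()                          # uniform grid, so plain sums suffice
    m = (w * grid).sum()
    v = (w * (grid - m) ** 2).sum()
    return m, v

for sweep in range(10):
    for k in range(K):
        # Cavity distribution: prior plus all sites except site k.
        cav_prec = prior_prec + site_prec.sum() - site_prec[k]
        cav_h = prior_h + site_h.sum() - site_h[k]
        cav_mean = cav_h / cav_prec

        # Tilted distribution: cavity times the exact local (Cauchy) likelihood.
        log_tilted = -0.5 * cav_prec * (grid - cav_mean) ** 2
        log_tilted += -np.log1p((data[k][:, None] - grid) ** 2).sum(axis=0)
        m, v = moments_on_grid(log_tilted)

        # New site = moment-matched tilted Gaussian divided by the cavity.
        # (Negative site precisions, a known EP caveat, are not handled here.)
        site_prec[k] = 1.0 / v - cav_prec
        site_h[k] = m / v - cav_h

post_prec = prior_prec + site_prec.sum()
post_mean = (prior_h + site_h.sum()) / post_prec
print(f"EP posterior: mean {post_mean:.3f}, sd {post_prec**-0.5:.3f}")
```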




inference

Switching Regression Models and Causal Inference in the Presence of Discrete Latent Variables

Given a response $Y$ and a vector $X = (X^1, \dots, X^d)$ of $d$ predictors, we investigate the problem of inferring direct causes of $Y$ among the vector $X$. Models for $Y$ that use all of its causal covariates as predictors enjoy the property of being invariant across different environments or interventional settings. Given data from such environments, this property has been exploited for causal discovery. Here, we extend this inference principle to situations in which some (discrete-valued) direct causes of $Y$ are unobserved. Such cases naturally give rise to switching regression models. We provide sufficient conditions for the existence, consistency and asymptotic normality of the MLE in linear switching regression models with Gaussian noise, and construct a test for the equality of such models. These results allow us to prove that the proposed causal discovery method obtains asymptotic false discovery control under mild conditions. We provide an algorithm, make available code, and test our method on simulated data. It is robust against model violations and outperforms state-of-the-art approaches. We further apply our method to a real data set, where we show that it does not only output causal predictors, but also a process-based clustering of data points, which could be of additional interest to practitioners.




inference

High-Dimensional Inference for Cluster-Based Graphical Models

Motivated by modern applications in which one constructs graphical models based on a very large number of features, this paper introduces a new class of cluster-based graphical models, in which variable clustering is applied as an initial step for reducing the dimension of the feature space. We employ model assisted clustering, in which the clusters contain features that are similar to the same unobserved latent variable. Two different cluster-based Gaussian graphical models are considered: the latent variable graph, corresponding to the graphical model associated with the unobserved latent variables, and the cluster-average graph, corresponding to the vector of features averaged over clusters. Our study reveals that likelihood based inference for the latent graph, not analyzed previously, is analytically intractable. Our main contribution is the development and analysis of alternative estimation and inference strategies, for the precision matrix of an unobservable latent vector Z. We replace the likelihood of the data by an appropriate class of empirical risk functions, that can be specialized to the latent graphical model and to the simpler, but under-analyzed, cluster-average graphical model. The estimators thus derived can be used for inference on the graph structure, for instance on edge strength or pattern recovery. Inference is based on the asymptotic limits of the entry-wise estimates of the precision matrices associated with the conditional independence graphs under consideration. While taking the uncertainty induced by the clustering step into account, we establish Berry-Esseen central limit theorems for the proposed estimators. It is noteworthy that, although the clusters are estimated adaptively from the data, the central limit theorems regarding the entries of the estimated graphs are proved under the same conditions one would use if the clusters were known in advance. As an illustration of the usage of these newly developed inferential tools, we show that they can be reliably used for recovery of the sparsity pattern of the graphs we study, under FDR control, which is verified via simulation studies and an fMRI data analysis. These experimental results confirm the theoretically established difference between the two graph structures. Furthermore, the data analysis suggests that the latent variable graph, corresponding to the unobserved cluster centers, can help provide more insight into the understanding of the brain connectivity networks relative to the simpler, average-based, graph.




inference

Generalized Optimal Matching Methods for Causal Inference

We develop an encompassing framework for matching, covariate balancing, and doubly-robust methods for causal inference from observational data called generalized optimal matching (GOM). The framework is given by generalizing a new functional-analytical formulation of optimal matching, giving rise to the class of GOM methods, for which we provide a single unified theory to analyze tractability and consistency. Many commonly used existing methods are included in GOM and, using their GOM interpretation, can be extended to optimally and automatically trade off balance for variance and outperform their standard counterparts. As a subclass, GOM gives rise to kernel optimal matching (KOM), which, as supported by new theoretical and empirical results, is notable for combining many of the positive properties of other methods in one. KOM, which is solved as a linearly-constrained convex-quadratic optimization problem, inherits both the interpretability and model-free consistency of matching but can also achieve the $\sqrt{n}$-consistency of well-specified regression and the bias reduction and robustness of doubly robust methods. In settings of limited overlap, KOM enables a very transparent method for interval estimation for partial identification and robust coverage. We demonstrate this in examples with both synthetic and real data.




inference

Bootstrap-based testing inference in beta regressions

Fábio P. Lima, Francisco Cribari-Neto.

Source: Brazilian Journal of Probability and Statistics, Volume 34, Number 1, 18--34.

Abstract:
We address the issue of performing testing inference in small samples in the class of beta regression models. We consider the likelihood ratio test and its standard bootstrap version. We also consider two alternative resampling-based tests. One of them uses the bootstrap test statistic replicates to numerically estimate a Bartlett correction factor that can be applied to the likelihood ratio test statistic. By doing so, we avoid estimation of quantities located in the tail of the likelihood ratio test statistic null distribution. The second alternative resampling-based test uses a fast double bootstrap scheme in which a single second-level bootstrap resample is performed for each first-level bootstrap replication. It delivers accurate testing inferences at a computational cost that is considerably smaller than that of a standard double bootstrapping scheme. The Monte Carlo results we provide show that the standard likelihood ratio test tends to be quite liberal in small samples. They also show that the bootstrap tests deliver accurate testing inferences even when the sample size is quite small. An empirical application is also presented and discussed.
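To illustrate the bootstrap-estimated Bartlett correction mentioned above, the sketch below applies the same idea to a much simpler likelihood-ratio test (exponential vs. gamma) rather than a beta regression: the null expectation of the LR statistic is estimated from parametric-bootstrap replicates and used to rescale the observed statistic. The model choice and sample size are my assumptions for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

def lr_stat(x):
    """LR statistic for H0: shape = 1 (exponential) within a gamma model."""
    shape, _, scale = stats.gamma.fit(x, floc=0)
    ll_full = stats.gamma.logpdf(x, shape, loc=0, scale=scale).sum()
    ll_null = stats.expon.logpdf(x, scale=x.mean()).sum()   # exponential MLE
    return 2 * (ll_full - ll_null)

x = rng.exponential(scale=2.0, size=30)          # small sample, H0 is true
lr = lr_stat(x)

# Parametric bootstrap under the restricted (null) fit.
B, q = 500, 1                                    # q = degrees of freedom
lr_boot = np.array([lr_stat(rng.exponential(scale=x.mean(), size=x.size))
                    for _ in range(B)])

# Bartlett correction: rescale so the corrected statistic has null mean q.
lr_corrected = lr * q / lr_boot.mean()

print("plain LR p-value:     ", stats.chi2.sf(lr, q))
print("corrected LR p-value: ", stats.chi2.sf(lr_corrected, q))
```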




inference

Bayesian inference on power Lindley distribution based on different loss functions

Abbas Pak, M. E. Ghitany, Mohammad Reza Mahmoudi.

Source: Brazilian Journal of Probability and Statistics, Volume 33, Number 4, 894--914.

Abstract:
This paper focuses on Bayesian estimation of the parameters and reliability function of the power Lindley distribution by using various symmetric and asymmetric loss functions. Assuming suitable priors on the parameters, Bayes estimates are derived by using squared error, linear exponential (linex) and general entropy loss functions. Since, under these loss functions, Bayes estimates of the parameters do not have closed forms, we use Lindley’s approximation technique to calculate the Bayes estimates. Moreover, we obtain the Bayes estimates of the parameters using a Markov chain Monte Carlo (MCMC) method. Simulation studies are conducted in order to evaluate the performances of the proposed estimators under the considered loss functions. Finally, analysis of a real data set is presented for illustrative purposes.




inference

Statistical inference for dynamical systems: A review

Kevin McGoff, Sayan Mukherjee, Natesh Pillai.

Source: Statistics Surveys, Volume 9, 209--252.

Abstract:
The topic of statistical inference for dynamical systems has been studied widely across several fields. In this survey we focus on methods related to parameter estimation for nonlinear dynamical systems. Our objective is to place results across distinct disciplines in a common setting and highlight opportunities for further research.




inference

Statistical inference for disordered sphere packings

Jeffrey Picka

Source: Statist. Surv., Volume 6, 74--112.

Abstract:
This paper gives an overview of statistical inference for disordered sphere packing processes. These processes are used extensively in physics and engineering in order to represent the internal structure of composite materials, packed bed reactors, and powders at rest, and are used as initial arrangements of grains in the study of avalanches and other problems involving powders in motion. Packing processes are spatial processes which are neither stationary nor ergodic. Classical spatial statistical models and procedures cannot be applied to these processes, but alternative models and procedures can be developed based on ideas from statistical physics. Most of the development of models and statistics for sphere packings has been undertaken by scientists and engineers. This review summarizes their results from an inferential perspective.




inference

Statistical errors in Monte Carlo-based inference for random elements. (arXiv:2005.02532v2 [math.ST] UPDATED)

Monte Carlo simulation is useful for computing or estimating expected functionals of random elements when random samples can be generated from the true distribution. However, when the distribution has unknown parameters, the samples must be generated from an estimated distribution with the parameters replaced by estimators, which introduces a statistical error into the Monte Carlo estimation. This paper considers such statistical errors and investigates the asymptotic distributions of Monte Carlo-based estimators when the random elements are not only real-valued but also function-valued random variables. We also investigate expected functionals for semimartingales in detail. The analysis indicates that Monte Carlo estimation can deteriorate when a semimartingale has a jump part with unremovable unknown parameters.




inference

Convergence and inference for mixed Poisson random sums. (arXiv:2005.03187v1 [math.PR])

In this paper we obtain the limit distribution for partial sums with a random number of terms following a class of mixed Poisson distributions. The resulting weak limit is a mixture between a normal distribution and an exponential family, which we call normal exponential family (NEF) laws. A new stability concept is introduced and a relationship between $\alpha$-stable distributions and NEF laws is established. We propose estimation of the parameters of the NEF models through the method of moments and also by the maximum likelihood method, which is performed via an Expectation-Maximization algorithm. Monte Carlo simulation studies are conducted to check the performance of the proposed estimators, and an empirical illustration on financial market data is presented.




inference

On the Optimality of Randomization in Experimental Design: How to Randomize for Minimax Variance and Design-Based Inference. (arXiv:2005.03151v1 [stat.ME])

I study the minimax-optimal design for a two-arm controlled experiment where conditional mean outcomes may vary in a given set. When this set is permutation symmetric, the optimal design is complete randomization, and using a single partition (i.e., the design that only randomizes the treatment labels for each side of the partition) has minimax risk larger by a factor of $n-1$. More generally, the optimal design is shown to be the mixed-strategy optimal design (MSOD) of Kallus (2018). Notably, even when the set of conditional mean outcomes has structure (i.e., is not permutation symmetric), being minimax-optimal for variance still requires randomization beyond a single partition. Nonetheless, since this targets precision, it may still not ensure sufficient uniformity in randomization to enable randomization (i.e., design-based) inference by Fisher's exact test to appropriately detect violations of the null. I therefore propose the inference-constrained MSOD, which is minimax-optimal among all designs subject to such uniformity constraints. On the way, I discuss Johansson et al. (2020), who recently compared the rerandomization of Morgan and Rubin (2012) and the pure-strategy optimal design (PSOD) of Kallus (2018). I point out some errors therein and set straight that randomization is minimax-optimal and that the "no free lunch" theorem and example in Kallus (2018) are correct.




inference

Statistical inference for model parameters in stochastic gradient descent

Xi Chen, Jason D. Lee, Xin T. Tong, Yichen Zhang.

Source: The Annals of Statistics, Volume 48, Number 1, 251--273.

Abstract:
The stochastic gradient descent (SGD) algorithm has been widely used in statistical estimation for large-scale data due to its computational and memory efficiency. While most existing works focus on the convergence of the objective function or the error of the obtained solution, we investigate the problem of statistical inference of true model parameters based on SGD when the population loss function is strongly convex and satisfies certain smoothness conditions. Our main contributions are twofold. First, in the fixed dimension setup, we propose two consistent estimators of the asymptotic covariance of the average iterate from SGD: (1) a plug-in estimator, and (2) a batch-means estimator, which is computationally more efficient and only uses the iterates from SGD. Both proposed estimators allow us to construct asymptotically exact confidence intervals and hypothesis tests. Second, for high-dimensional linear regression, using a variant of the SGD algorithm, we construct a debiased estimator of each regression coefficient that is asymptotically normal. This gives a one-pass algorithm for computing both the sparse regression coefficients and confidence intervals, which is computationally attractive and applicable to online data.
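Here is a minimal sketch of the batch-means idea for averaged SGD on linear regression: run SGD with decaying steps, average the iterates, estimate the long-run covariance from batch means, and form confidence intervals. It uses equal-length batches for simplicity (the paper's estimator uses increasing batch sizes), so read it as an illustration of the construction, not a faithful implementation; the step-size schedule and data-generating model are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Streaming linear regression data: y = x @ theta_true + noise.
d, n_steps = 5, 200_000
theta_true = np.arange(1, d + 1, dtype=float)

theta = np.zeros(d)
iterates = np.empty((n_steps, d))
for t in range(n_steps):
    x = rng.normal(size=d)
    y = x @ theta_true + rng.normal()
    grad = (x @ theta - y) * x                     # squared-loss gradient
    theta -= 0.1 / (t + 1) ** 0.55 * grad          # Robbins-Monro step size
    iterates[t] = theta

theta_bar = iterates.mean(axis=0)                  # Polyak-Ruppert average

# Batch-means estimate of the long-run covariance of the iterates.
M = 40                                             # number of batches
b = n_steps // M                                   # batch length
batch_means = iterates[: M * b].reshape(M, b, d).mean(axis=1)
centered = batch_means - theta_bar
sigma_hat = b * (centered.T @ centered) / (M - 1)
se = np.sqrt(np.diag(sigma_hat) / n_steps)         # std. error of the average

for j in range(d):
    lo, hi = theta_bar[j] - 1.96 * se[j], theta_bar[j] + 1.96 * se[j]
    print(f"theta[{j}] = {theta_bar[j]:.3f}  95% CI [{lo:.3f}, {hi:.3f}]  true {theta_true[j]}")
```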




inference

New $G$-formula for the sequential causal effect and blip effect of treatment in sequential causal inference

Xiaoqin Wang, Li Yin.

Source: The Annals of Statistics, Volume 48, Number 1, 138--160.

Abstract:
In sequential causal inference, two types of causal effects are of practical interest, namely, the causal effect of the treatment regime (called the sequential causal effect) and the blip effect of treatment on the potential outcome after the last treatment. The well-known $G$-formula expresses these causal effects in terms of the standard parameters. In this article, we obtain a new $G$-formula that expresses these causal effects in terms of the point observable effects of treatments similar to treatment in the framework of single-point causal inference. Based on the new $G$-formula, we estimate these causal effects by maximum likelihood via point observable effects with methods extended from single-point causal inference. We are able to increase precision of the estimation without introducing biases by an unsaturated model imposing constraints on the point observable effects. We are also able to reduce the number of point observable effects in the estimation by treatment assignment conditions.




inference

Two-step semiparametric empirical likelihood inference

Francesco Bravo, Juan Carlos Escanciano, Ingrid Van Keilegom.

Source: The Annals of Statistics, Volume 48, Number 1, 1--26.

Abstract:
In both parametric and certain nonparametric statistical models, the empirical likelihood ratio satisfies a nonparametric version of Wilks’ theorem. For many semiparametric models, however, the commonly used two-step (plug-in) empirical likelihood ratio is not asymptotically distribution-free, that is, its asymptotic distribution contains unknown quantities, and hence Wilks’ theorem breaks down. This article suggests a general approach to restore Wilks’ phenomenon in two-step semiparametric empirical likelihood inferences. The main insight consists in using as the moment function in the estimating equation the influence function of the plug-in sample moment. The proposed method is general; it leads to a chi-squared limiting distribution with known degrees of freedom; it is efficient; it does not require undersmoothing; and it is less sensitive to the first-step than alternative methods, which is particularly appealing for high-dimensional settings. Several examples and simulation studies illustrate the general applicability of the procedure and its excellent finite sample performance relative to competing methods.




inference

Bootstrapping and sample splitting for high-dimensional, assumption-lean inference

Alessandro Rinaldo, Larry Wasserman, Max G’Sell.

Source: The Annals of Statistics, Volume 47, Number 6, 3438--3469.

Abstract:
Several new methods have been recently proposed for performing valid inference after model selection. An older method is sample splitting: use part of the data for model selection and the rest for inference. In this paper, we revisit sample splitting combined with the bootstrap (or the Normal approximation). We show that this leads to a simple, assumption-lean approach to inference and we establish results on the accuracy of the method. In fact, we find new bounds on the accuracy of the bootstrap and the Normal approximation for general nonlinear parameters with increasing dimension which we then use to assess the accuracy of regression inference. We define new parameters that measure variable importance and that can be inferred with greater accuracy than the usual regression coefficients. Finally, we elucidate an inference-prediction trade-off: splitting increases the accuracy and robustness of inference but can decrease the accuracy of the predictions.
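A minimal sketch of the split-then-bootstrap recipe: select variables on one half of the data, then refit by least squares on the other half and bootstrap that half to get confidence intervals for the selected coefficients. The specific choices (lasso for selection, a percentile bootstrap, the simulated design) are mine for illustration, not the paper's recommendations.

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(4)

# Sparse linear model: only the first 3 of 50 predictors matter.
n, p = 400, 50
X = rng.normal(size=(n, p))
beta = np.zeros(p); beta[:3] = [2.0, -1.5, 1.0]
y = X @ beta + rng.normal(size=n)

# Split: first half for model selection, second half for inference.
X1, y1 = X[: n // 2], y[: n // 2]
X2, y2 = X[n // 2:], y[n // 2:]

selected = np.flatnonzero(LassoCV(cv=5).fit(X1, y1).coef_ != 0)

def ols(Xs, ys):
    return np.linalg.lstsq(Xs, ys, rcond=None)[0]

coef = ols(X2[:, selected], y2)

# Percentile bootstrap on the inference half only.
B = 1000
boot = np.empty((B, len(selected)))
for b in range(B):
    idx = rng.integers(0, len(y2), len(y2))
    boot[b] = ols(X2[idx][:, selected], y2[idx])
lo, hi = np.percentile(boot, [2.5, 97.5], axis=0)

for j, var in enumerate(selected):
    print(f"x{var}: estimate {coef[j]:.2f}, 95% CI [{lo[j]:.2f}, {hi[j]:.2f}]")
```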




inference

Statistical inference for autoregressive models under heteroscedasticity of unknown form

Ke Zhu.

Source: The Annals of Statistics, Volume 47, Number 6, 3185--3215.

Abstract:
This paper provides an entire inference procedure for the autoregressive model under (conditional) heteroscedasticity of unknown form with a finite variance. We first establish the asymptotic normality of the weighted least absolute deviations estimator (LADE) for the model. Second, we develop the random weighting (RW) method to estimate its asymptotic covariance matrix, leading to the implementation of the Wald test. Third, we construct a portmanteau test for model checking, and use the RW method to obtain its critical values. As a special weighted LADE, the feasible adaptive LADE (ALADE) is proposed and proved to have the same efficiency as its infeasible counterpart. The importance of our entire methodology based on the feasible ALADE is illustrated by simulation results and the real data analysis on three U.S. economic data sets.




inference

Inference for the mode of a log-concave density

Charles R. Doss, Jon A. Wellner.

Source: The Annals of Statistics, Volume 47, Number 5, 2950--2976.

Abstract:
We study a likelihood ratio test for the location of the mode of a log-concave density. Our test is based on comparison of the log-likelihoods corresponding to the unconstrained maximum likelihood estimator of a log-concave density and the constrained maximum likelihood estimator where the constraint is that the mode of the density is fixed, say at $m$. The constrained estimation problem is studied in detail in Doss and Wellner (2018). Here, the results of that paper are used to show that, under the null hypothesis (and strict curvature of $-\log f$ at the mode), the likelihood ratio statistic is asymptotically pivotal: that is, it converges in distribution to a limiting distribution which is free of nuisance parameters, thus playing the role of the $\chi_{1}^{2}$ distribution in classical parametric statistical problems. By inverting this family of tests, we obtain new (likelihood ratio based) confidence intervals for the mode of a log-concave density $f$. These new intervals do not depend on any smoothing parameters. We study the new confidence intervals via Monte Carlo methods and illustrate them with two real data sets. The new intervals seem to have several advantages over existing procedures. Software implementing the test and confidence intervals is available in the R package logcondens.mode.