Abstract:
We propose a simple yet powerful framework for modeling integer-valued data, such as counts, scores, and rounded data. The data-generating process is defined by Simultaneously Transforming and Rounding (STAR) a continuous-valued process, which produces a flexible family of integer-valued distributions capable of modeling zero-inflation, bounded or censored data, and over- or underdispersion. The transformation is modeled as unknown for greater distributional flexibility, while the rounding operation ensures a coherent integer-valued data-generating process. An efficient MCMC algorithm is developed for posterior inference and provides a mechanism for adaptation of successful Bayesian models and algorithms for continuous data to the integer-valued data setting. Using the STAR framework, we design a new Bayesian Additive Regression Tree model for integer-valued data, which demonstrates impressive predictive distribution accuracy for both synthetic data and a large healthcare utilization dataset. For interpretable regression-based inference, we develop a STAR additive model, which offers greater flexibility and scalability than existing integer-valued models. The STAR additive model is applied to study the recent decline in Amazon river dolphins.
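The transform-then-round construction in the abstract can be illustrated with a minimal sketch. It assumes a fixed log transformation and a Gaussian latent variable purely for illustration, whereas the abstract treats the transformation as unknown. The function name `simulate_star` and its parameters are hypothetical and are not taken from the paper or its software.

```python
import numpy as np

def simulate_star(mu, sigma, size, rng, y_max=None):
    """Draw integers by transforming and rounding a Gaussian latent variable.

    Assumption for illustration: the transformation is a fixed log, so the
    latent z* is mapped back with exp before rounding.
    """
    z_star = rng.normal(mu, sigma, size=size)  # latent continuous process
    y_star = np.exp(z_star)                    # invert the assumed log transformation
    y = np.floor(y_star).astype(np.int64)      # rounding sends [j, j + 1) to the integer j
    if y_max is not None:
        y = np.minimum(y, y_max)               # optional upper bound for bounded or censored data
    return y

rng = np.random.default_rng(0)
counts = simulate_star(mu=0.5, sigma=1.0, size=10_000, rng=rng)
zero_share = np.mean(counts == 0)              # share of draws rounded down to zero
bounded = simulate_star(mu=0.5, sigma=1.0, size=1_000, rng=rng, y_max=5)
```

Under these assumptions every latent value below 1 rounds to zero, so the zero probability is set by the mass of the latent distribution below that threshold. This is one way a rounded continuous process can produce many zeros without a separate zero-inflation component.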

Testing goodness of fit for point processes via topological data analysis

By projecteuclid.org
Published On :: Mon, 24 Feb 2020 04:00 EST

Christophe A. N. Biscio, Nicolas Chenavier, Christian Hirsch, Anne Marie Svane.

Generating Thermal Image Data Samples using 3D Facial Modelling Techniques and Deep Learning Methodologies. (arXiv:2005.01923v2 [cs.CV] UPDATED)

By arxiv.org

Methods for generating synthetic data have become increasingly important for building the large datasets required by Convolutional Neural Network (CNN)-based deep learning techniques across a wide range of computer vision applications. In this work, we extend existing methodologies to show how 2D thermal facial data can be mapped to provide 3D facial models. We use the Tufts datasets to generate 3D faces in varying poses from a single frontal face pose. The system first refines image quality through fusion-based image preprocessing operations. The refined outputs have better contrast, lower noise, and higher exposedness of the dark regions, which makes the facial landmarks and temperature patterns on the human face more discernible than in the original raw data. Different image quality metrics are used to compare the refined images with the originals. In the next phase of the study, the refined images are used to create 3D facial geometry structures with a CNN. The generated outputs are then imported into Blender to extract the 3D thermal facial outputs of both males and females. The same technique is also applied to our own thermal face data, acquired with a prototype thermal camera (developed under the Heliaus EU project) in an indoor lab environment; this data is then used to generate synthetic 3D face data with varying yaw angles, and finally a facial depth map is generated.

Data-Space Inversion Using a Recurrent Autoencoder for Time-Series Parameterization. (arXiv:2005.00061v2 [stat.ML] UPDATED)

By arxiv.org

Data-space inversion (DSI) and related procedures represent a family of methods applicable for data assimilation in subsurface flow settings. These methods differ from model-based techniques in that they provide only posterior predictions for quantities (time series) of interest, not posterior models with calibrated parameters. DSI methods require a large number of flow simulations to first be performed on prior geological realizations. Given observed data, posterior predictions can then be generated directly. DSI operates in a Bayesian setting and provides posterior samples of the data vector. In this work we develop and evaluate a new approach for data parameterization in DSI. Parameterization reduces the number of variables to determine in the inversion, and it maintains the physical character of the data variables. The new parameterization uses a recurrent autoencoder (RAE) for dimension reduction and a long short-term memory (LSTM) network to represent flow-rate time series. The RAE-based parameterization is combined with an ensemble smoother with multiple data assimilation (ESMDA) for posterior generation. Results are presented for two- and three-phase flow in a 2D channelized system and a 3D multi-Gaussian model. The RAE procedure, along with existing DSI treatments, is assessed through comparison to reference rejection sampling (RS) results. The new DSI methodology is shown to consistently outperform existing approaches in terms of statistical agreement with RS results. The method is also shown to accurately capture derived quantities, which are computed from variables considered directly in DSI. This requires correlation and covariance between variables to be properly captured, and accuracy in these relationships is demonstrated. The RAE-based parameterization developed here is clearly useful in DSI, and it may also find application in other subsurface flow problems.
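The ESMDA step named in the abstract can be sketched generically as follows. This is a textbook-style ensemble smoother update applied to a toy linear problem, not the authors' DSI implementation: the forward operator `G`, the ensemble size, the noise level, and the inflation coefficients are all illustrative assumptions, and the RAE/LSTM parameterization is not modeled.

```python
import numpy as np

def esmda_step(M, D, d_obs, C_D, alpha, rng):
    """One ES-MDA update.

    M: (n_var, n_ens) ensemble of variables to update.
    D: (n_obs, n_ens) predicted data for each ensemble member.
    alpha: inflation coefficient applied to the observation-error covariance C_D.
    """
    n_ens = M.shape[1]
    dM = M - M.mean(axis=1, keepdims=True)
    dD = D - D.mean(axis=1, keepdims=True)
    C_MD = dM @ dD.T / (n_ens - 1)   # cross-covariance between variables and predicted data
    C_DD = dD @ dD.T / (n_ens - 1)   # covariance of predicted data
    # Perturb the observations with noise drawn from the inflated error covariance.
    noise = rng.multivariate_normal(np.zeros(d_obs.size), alpha * C_D, size=n_ens).T
    innovation = d_obs[:, None] + noise - D
    return M + C_MD @ np.linalg.solve(C_DD + alpha * C_D, innovation)

# Toy linear problem (hypothetical values throughout).
rng = np.random.default_rng(1)
G = np.eye(2)                        # assumed linear forward operator
d_obs = np.array([2.0, -1.0])
C_D = 0.01 * np.eye(2)
alphas = [4.0, 4.0, 4.0, 4.0]        # chosen so that the sum of 1/alpha equals 1

M = rng.normal(size=(2, 500))        # prior ensemble drawn from a standard normal
prior_mean = M.mean(axis=1)
for alpha in alphas:
    M = esmda_step(M, G @ M, d_obs, C_D, alpha, rng)
post_mean = M.mean(axis=1)
```

In this toy setting the ensemble mean moves from near zero toward the observed values over the four assimilation steps. In a DSI workflow the updated variables would be the low-dimensional data parameterization rather than the two illustrative variables used here.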

On the impact of selected modern deep-learning techniques to the performance and celerity of classification models in an experimental high-energy physics use case. (arXiv:2002.01427v3 [physics.data-an] UPDATED)

By arxiv.org

Flexible Imputation of Missing Data (2nd Edition)

By www.jstatsoft.org
Published On :: Sat, 18 Apr 2020 03:35:08 +0000

mgm: Estimating Time-Varying Mixed Graphical Models in High-Dimensional Data

By www.jstatsoft.org
Published On :: Mon, 27 Apr 2020 00:00:00 +0000

We present the R package mgm for the estimation of k-order mixed graphical models (MGMs) and mixed vector autoregressive (mVAR) models in high-dimensional data. These are useful extensions of graphical models for only one variable type, since data sets consisting of mixed types of variables (continuous, count, categorical) are ubiquitous. In addition, we allow the stationarity assumption of both models to be relaxed by introducing time-varying versions of MGMs and mVAR models based on a kernel weighting approach. Time-varying models offer a rich description of temporally evolving systems and make it possible to identify external influences on the model structure, such as the impact of interventions. We provide the background of all implemented methods and fully reproducible examples that illustrate how to use the package.
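The kernel weighting idea behind time-varying models can be sketched as follows. This is not the mgm package's API or estimator: a plain weighted least-squares slope stands in for the model fit, and the function names, bandwidth, and simulated data are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel_weights(times, t_est, bandwidth):
    """Weight each observation by its closeness in time to the estimation point t_est."""
    z = (times - t_est) / bandwidth
    return np.exp(-0.5 * z**2)

def local_slope(x, y, times, t_est, bandwidth):
    """Weighted least-squares slope of y on x (no intercept), localized around t_est."""
    w = gaussian_kernel_weights(times, t_est, bandwidth)
    return np.sum(w * x * y) / np.sum(w * x * x)

# Simulated data in which the true slope at time t equals t (hypothetical example).
rng = np.random.default_rng(2)
n = 2000
times = np.linspace(0.0, 1.0, n)
x = rng.normal(size=n)
y = times * x + 0.1 * rng.normal(size=n)

slope_early = local_slope(x, y, times, t_est=0.2, bandwidth=0.1)
slope_late = local_slope(x, y, times, t_est=0.8, bandwidth=0.1)
```

Refitting the same weighted estimator at a grid of estimation points yields a sequence of local models. The bandwidth trades off sensitivity to change against the effective number of observations used at each point.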

Metadata for information management and retrieval : understanding metadata and its use / David Haynes.

Storytelling with data : a data visualization guide for business professionals / Cole Nussbaumer Knaflic.

Invisible women : exposing data bias in a world designed for men / Caroline Criado Perez.

Indiana Using Data to Build Better Transcripts, College Transitions

Kansas City Data-Sharing Effort Showcases Ballmer Group's Strategy

Enteric fever : its prevalence and modifications, aetiology, pathology and treatment as illustrated by Army data at home and abroad / by Francis H. Welch.

Finding Common Ground Through Data to Improve Idaho's Teacher Pipeline

Simultaneous transformation and rounding (STAR) models for integer-valued data

Testing goodness of fit for point processes via topological data analysis

A Model of Fake Data in Data-driven Analysis

Generalized probabilistic principal component analysis of correlated data

Expectation Propagation as a Way of Life: A Framework for Bayesian Inference on Partitioned Data

Targeted Fused Ridge Estimation of Inverse Covariance Matrices from Multiple High-Dimensional Data Classes

Greedy Attack and Gumbel Attack: Generating Adversarial Examples for Discrete Data

Ensemble Learning for Relational Data

(1 + epsilon)-class Classification: an Anomaly Detection Method for Highly Imbalanced or Incomplete Data Sets

High-dimensional Gaussian graphical models on network-linked data

Measuring symmetry and asymmetry of multiplicative distortion measurement errors data

A Bayesian sparse finite mixture model for clustering data from a heterogeneous population

Bayesian modeling and prior sensitivity analysis for zero–one augmented beta regression models with an application to psychometric data

Recent developments in complex and spatially correlated functional data

Nonparametric discrimination of areal functional data

A joint mean-correlation modeling approach for longitudinal zero-inflated count data

Time series of count data: A review, empirical comparisons and data analysis

A new log-linear bimodal Birnbaum–Saunders regression model with application to survival data

Simple tail index estimation for dependent and heterogeneous data with missing values

Scalar-on-function regression for predicting distal outcomes from intensively gathered longitudinal data: Interpretability for applied scientists

PLS for Big Data: A unified parallel algorithm for regularised group PLS

A design-sensitive approach to fitting regression models with complex survey data

A comparison of spatial predictors when datasets could be very large

Some models and methods for the analysis of observational data

Data confidentiality: A review of methods for statistical disclosure limitation and methods for assessing privacy

Generating Thermal Image Data Samples using 3D Facial Modelling Techniques and Deep Learning Methodologies. (arXiv:2005.01923v2 [cs.CV] UPDATED)

Data-Space Inversion Using a Recurrent Autoencoder for Time-Series Parameterization. (arXiv:2005.00061v2 [stat.ML] UPDATED)

On the impact of selected modern deep-learning techniques to the performance and celerity of classification models in an experimental high-energy physics use case. (arXiv:2002.01427v3 [physics.data-an] UPDATED)

Sampling random graph homomorphisms and applications to network data analysis. (arXiv:1910.09483v2 [math.PR] UPDATED)

Bayesian factor models for multivariate categorical data obtained from questionnaires. (arXiv:1910.04283v2 [stat.AP] UPDATED)

Semiparametric Optimal Estimation With Nonignorable Nonresponse Data. (arXiv:1612.09207v3 [stat.ME] UPDATED)

Nonstationary Bayesian modeling for a large data set of derived surface temperature return values. (arXiv:2005.03658v1 [stat.ME])

Local Cascade Ensemble for Multivariate Data Classification. (arXiv:2005.03645v1 [cs.LG])

Domain Adaptation in Highly Imbalanced and Overlapping Datasets. (arXiv:2005.03585v1 [cs.LG])

Predictive Modeling of ICU Healthcare-Associated Infections from Imbalanced Data. Using Ensembles and a Clustering-Based Undersampling Approach. (arXiv:2005.03582v1 [cs.LG])

On unbalanced data and common shock models in stochastic loss reserving. (arXiv:2005.03500v1 [q-fin.RM])

Deep learning of physical laws from scarce data. (arXiv:2005.03448v1 [cs.LG])

Interpreting Deep Models through the Lens of Data. (arXiv:2005.03442v1 [cs.LG])

Deep Learning Framework for Detecting Ground Deformation in the Built Environment using Satellite InSAR data. (arXiv:2005.03221v1 [cs.CV])

Efficient Characterization of Dynamic Response Variation Using Multi-Fidelity Data Fusion through Composite Neural Network. (arXiv:2005.03213v1 [stat.ML])

MAZE: Data-Free Model Stealing Attack Using Zeroth-Order Gradient Estimation. (arXiv:2005.03161v1 [stat.ML])

Flexible Imputation of Missing Data (2nd Edition)

mgm: Estimating Time-Varying Mixed Graphical Models in High-Dimensional Data

Linking Data on Genetics, Traits and Environment Gives Crop Breeders a Wider Lens

Data from Hawaii observatory helps scientists discover giant planet slingshots around its star

Siemens AG looks to ride India's AI data centre wave

Data-driven safety has arrived

Futures muted ahead of economic data, Powell speech

Stocks and Dollar Rise Before Data, Powell Speech: Markets Wrap

From "Searching" to "Finding": How AI is Unlocking the Power of Unstructured Data

DataCore expands Melbourne operations

RMIT University and NICTA collaborate to open a new data analytics lab in Melbourne

Equinix expands in Melbourne with launch of new data centre

Using Data Recording as a Troubleshooting Aid

New Zealand data - FPI -0.9% in October (prior +0.5%)

More from Musalem: Data since prior meeting suggests economy may be materially stronger

Provision of mobile voice and data services to the OSCE centre in Bishkek, Kyrgyzstan

Solidigm's Monster 122-Terabyte SSD Is Here For Copious Data Center Storage
