modeling

Urban forest restoration cost modeling: a Seattle natural areas case study

Cities have become more committed to ecological restoration and management activities in urban natural areas.




modeling

Modeling nanoconfinement effects using active learning. (arXiv:2005.02587v2 [physics.app-ph] UPDATED)

Predicting the spatial configuration of gas molecules in nanopores of shale formations is crucial for fluid flow forecasting and hydrocarbon reserves estimation. The key challenge in these tight formations is that the majority of the pore sizes are less than 50 nm. At this scale, the fluid properties are affected by nanoconfinement effects due to the increased fluid-solid interactions. For instance, gas adsorption to the pore walls could account for up to 85% of the total hydrocarbon volume in a tight reservoir. Although there are analytical solutions that describe this phenomenon for simple geometries, they are not suitable for describing realistic pores, where surface roughness and geometric anisotropy play important roles. To describe these, molecular dynamics (MD) simulations are used since they consider fluid-solid and fluid-fluid interactions at the molecular level. However, MD simulations are computationally expensive, and are not able to simulate scales larger than a few connected nanopores. We present a method for building and training physics-based deep learning surrogate models to carry out fast and accurate predictions of molecular configurations of gas inside nanopores. Since training deep learning models requires extensive databases that are computationally expensive to create, we employ active learning (AL). AL reduces the overhead of creating comprehensive sets of high-fidelity data by determining where the model uncertainty is greatest, and running simulations on the fly to minimize it. The proposed workflow enables nanoconfinement effects to be rigorously considered at the mesoscale where complex connected sets of nanopores control key applications such as hydrocarbon recovery and CO2 sequestration.
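
As a concrete illustration of the acquisition loop described above, here is a minimal Python sketch of uncertainty-driven active learning: an ensemble surrogate is queried where its members disagree most, and the expensive simulator is run only at those points. The run_md_simulation function is a hypothetical stand-in for an MD code, and the descriptors, batch sizes, and round counts are invented for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def run_md_simulation(x):
    # Hypothetical stand-in for an expensive molecular dynamics run.
    return np.sin(3.0 * x[0]) * np.exp(-x[1])

rng = np.random.default_rng(0)
X_pool = rng.uniform(0.0, 1.0, size=(5000, 2))   # candidate pore descriptors
X_train = X_pool[:20].copy()                     # small initial design
y_train = np.array([run_md_simulation(x) for x in X_train])

for _ in range(10):                              # active-learning rounds
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X_train, y_train)
    # Disagreement across the trees approximates model uncertainty.
    per_tree = np.stack([t.predict(X_pool) for t in model.estimators_])
    uncertainty = per_tree.std(axis=0)
    # Run the expensive simulator only where the surrogate is least sure.
    idx = np.argsort(uncertainty)[-5:]
    y_new = np.array([run_md_simulation(x) for x in X_pool[idx]])
    X_train = np.vstack([X_train, X_pool[idx]])
    y_train = np.concatenate([y_train, y_new])
```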




modeling

Adaptive Dialog Policy Learning with Hindsight and User Modeling. (arXiv:2005.03299v1 [cs.AI])

Reinforcement learning methods have been used to compute dialog policies from language-based interaction experiences. Efficiency is of particular importance in dialog policy learning, because of the considerable cost of interacting with people, and the very poor user experience from low-quality conversations. Aiming at improving the efficiency of dialog policy learning, we develop algorithm LHUA (Learning with Hindsight, User modeling, and Adaptation) that, for the first time, enables dialog agents to adaptively learn with hindsight from both simulated and real users. Simulation and hindsight provide the dialog agent with more experience and more (positive) reinforcements respectively. Experimental results suggest that, in success rate and policy quality, LHUA outperforms competitive baselines from the literature, including its no-simulation, no-adaptation, and no-hindsight counterparts.
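
The hindsight component can be illustrated with a generic relabeling sketch: a failed dialog is copied with its goal replaced by the outcome the agent actually reached, turning it into a positive training example. This shows hindsight experience replay in general, not the authors' exact LHUA algorithm; Transition and achieved_goal are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Transition:           # hypothetical replay-buffer record
    state: str
    action: str
    goal: str
    reward: float
    done: bool

def achieved_goal(dialog):
    # Hypothetical: read off the outcome the agent actually reached.
    return dialog[-1].state

def hindsight_relabel(dialog):
    """Copy a dialog with its goal replaced by the achieved outcome."""
    new_goal = achieved_goal(dialog)
    return [Transition(t.state, t.action, new_goal,
                       1.0 if (t.done and t.state == new_goal) else 0.0,
                       t.done)
            for t in dialog]

failed = [Transition("greet", "ask_cuisine", "book_italian", 0.0, False),
          Transition("book_thai", "confirm", "book_italian", 0.0, True)]
# Store both the failed dialog and its relabeled copy in the replay buffer;
# the copy ends with reward 1.0, giving the learner a positive example.
extra = hindsight_relabel(failed)
```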




modeling

Exploratory Analysis of Covid-19 Tweets using Topic Modeling, UMAP, and DiGraphs. (arXiv:2005.03082v1 [cs.SI])

This paper illustrates five different techniques to assess the distinctiveness of topics, key terms and features, speed of information dissemination, and network behaviors for Covid-19 tweets. First, we use pattern matching and second, topic modeling through Latent Dirichlet Allocation (LDA) to generate twenty different topics that discuss case spread, healthcare workers, and personal protective equipment (PPE). One topic specific to U.S. cases would start to uptick immediately after live White House Coronavirus Task Force briefings, implying that many Twitter users are paying attention to government announcements. We contribute machine learning methods not previously reported in the Covid-19 Twitter literature. This includes our third method, Uniform Manifold Approximation and Projection (UMAP), which identifies unique clustering behavior of distinct topics to improve our understanding of important themes in the corpus and help assess the quality of generated topics. Fourth, we calculated retweeting times to understand how fast information about Covid-19 propagates on Twitter. Our analysis indicates that the median retweeting time of Covid-19 for a sample corpus in March 2020 was 2.87 hours, approximately 50 minutes faster than repostings from Chinese social media about H7N9 in March 2013. Lastly, we sought to understand retweet cascades by visualizing the connections of users over time from fast to slow retweeting. As the time to retweet increases, the density of connections also increases; in our sample, we found distinct users dominating the attention of Covid-19 retweeters. One of the simplest highlights of this analysis is that early-stage descriptive methods like regular expressions can successfully identify high-level themes which were consistently verified as important through every subsequent analysis.
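
A minimal sketch of the LDA step, using scikit-learn on a toy corpus (the study fit twenty topics to a much larger tweet corpus; the documents here are invented):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

tweets = [
    "healthcare workers need more PPE and masks",
    "new confirmed cases reported after the briefing",
    "mask shortage hits hospital staff",
    "governor announces stay at home order",
]
vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(tweets)                       # bag-of-words counts
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# Print the top terms per topic to inspect what each topic captures.
terms = vec.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[-4:]]
    print(f"topic {k}:", ", ".join(top))
```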




modeling

Extracting Headless MWEs from Dependency Parse Trees: Parsing, Tagging, and Joint Modeling Approaches. (arXiv:2005.03035v1 [cs.CL])

An interesting and frequent type of multi-word expression (MWE) is the headless MWE, for which there are no true internal syntactic dominance relations; examples include many named entities ("Wells Fargo") and dates ("July 5, 2020") as well as certain productive constructions ("blow for blow", "day after day"). Despite their special status and prevalence, current dependency-annotation schemes require treating such flat structures as if they had internal syntactic heads, and most current parsers handle them in the same fashion as headed constructions. Meanwhile, outside the context of parsing, taggers are typically used for identifying MWEs, but taggers might benefit from structural information. We empirically compare these two common strategies--parsing and tagging--for predicting flat MWEs. Additionally, we propose an efficient joint decoding algorithm that combines scores from both strategies. Experimental results on the MWE-Aware English Dependency Corpus and on six non-English dependency treebanks with frequent flat structures show that: (1) tagging is more accurate than parsing for identifying flat-structure MWEs, (2) our joint decoder reconciles the two different views and, for non-BERT features, leads to higher accuracies, and (3) most of the gains result from feature sharing between the parsers and taggers.
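
The joint decoding idea can be caricatured as score combination plus a search for compatible spans. The sketch below is a simplified stand-in rather than the paper's algorithm: it adds tagger and parser scores for each candidate MWE span and selects the best non-overlapping subset with a weighted interval-scheduling dynamic program; the candidate spans and scores are invented.

```python
def best_spans(candidates):
    """candidates: (start, end, tagger_score, parser_score) tuples,
    end exclusive. Returns (best total score, chosen spans)."""
    spans = sorted(((s, e, ts + ps) for s, e, ts, ps in candidates),
                   key=lambda c: c[1])
    dp = [(0.0, [])]                    # dp[i]: best using the first i spans
    for i, (s, e, score) in enumerate(spans, start=1):
        # Largest j such that span j ends at or before this one starts.
        j = max(k for k in range(i) if k == 0 or spans[k - 1][1] <= s)
        take = (dp[j][0] + score, dp[j][1] + [(s, e)])
        dp.append(max(dp[i - 1], take, key=lambda t: t[0]))
    return dp[-1]

cands = [(0, 2, 1.2, 0.8), (1, 3, 0.5, 0.9), (3, 5, 0.7, 0.6)]
print(best_spans(cands))   # approximately (3.3, [(0, 2), (3, 5)])
```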




modeling

Modeling of time-variant threshability due to interactions between a crop in a field and atmospheric and soil conditions for prediction of daily opportunity windows for harvest operations using field-level diagnosis and prediction of weather conditions an

A modeling framework for evaluating the impact of weather conditions on farming and harvest operations applies real-time, field-level weather data and forecasts of meteorological and climatological conditions, together with user-provided and/or observed feedback on the present state of a harvest-related condition, to agronomic models to generate a plurality of harvest advisory outputs for precision agriculture. A harvest advisory model simulates and predicts the impacts of this weather information and feedback in one or more physical, empirical, or artificial-intelligence models of precision agriculture to analyze crops, plants, soils, and resulting agricultural commodities, and provides harvest advisory outputs to a diagnostic support tool that enhances farming and harvest decision-making, whether for pre-, post-, or in situ harvest operations and crop analyses.




modeling

Fault localization using condition modeling and return value modeling

Disclosed is a novel computer-implemented system, on-demand service, computer program product, and method that leverage combined concrete and symbolic execution and several fault-localization techniques to automatically detect failures and localize faults in PHP Hypertext Preprocessor (“PHP”) Web applications.




modeling

Method of material modeling for crash test dummy finite element models

A computer method of material modeling for crash test dummy finite element models includes the steps of making a material card for the material, applying the material card to validate a finite element model of a crash test dummy component, determining whether the finite element model is acceptable, ending the method if the finite element model is acceptable, and adjusting a relative volume (J) range for the material to make the material soft or stiff if the finite element model is not acceptable.




modeling

Modeling gate resistance of a multi-fin multi-gate field effect transistor

The embodiments relate to modeling resistance in a multi-fin multi-gate field effect transistor (MUGFET). In these embodiments, a design for a multi-fin MUGFET comprises a gate structure with a horizontal portion traversing multiple semiconductor fins and comprising a plurality of first resistive elements connected in series, with vertical portions adjacent to opposing sides of the semiconductor fins and comprising second resistive elements connected in parallel by the horizontal portion, and with contact(s) comprising third resistive element(s). The total gate resistance is determined based on resistance contributions from the first resistive elements, the second resistive elements and the third resistive element(s), particularly, where each resistive contribution is based on a resistance value of the resistive element, a first fraction of current from the semiconductor fins entering the resistive element and a second fraction of the current from the semiconductor fins exiting the resistive element.




modeling

Method of providing data included in building information modeling data file, recording medium therefor, system using the method, and method of providing data using building information modeling server

A method of providing data included in a building information modeling (BIM) data file using a server is provided. The method includes retrieving mapping data corresponding to a user request, extracting data corresponding to at least one entity mapped to the mapping data from the BIM data file, and transmitting the extracted data to a client.




modeling

Modeling defect management and product handling during the manufacturing process

A method models a defect management routine; both the modeling and the handling are executed within a manufacturing execution system. During an engineering phase: modeling the production process and creating a library of possible defect types which may occur; assigning the defect types to at least one defect group; creating a library of defect specifications; creating a library of defect type specification details; creating at least one runtime defect criterion that is used to link the defect type to a certain production volume; and creating a runtime defect measurement routine that monitors a corrective measure. During a runtime production phase: evaluating the product produced; identifying the respective defect type from the library of defect types; and using the identified defect type, a runtime defect criterion identifying the resource causing the defect type, and a production volume to determine a corrective measure and to run the respective runtime defect management routine.




modeling

Tridimensional modeling apparatuses, system and kit for providing a representation of an exploration network

A tridimensional modeling apparatus, system, and kit represent an exploration network. The apparatus, system, and kit include a transparent hollow cube with six plane surfaces for representing an enclosed volume, a plurality of perforations on at least two of the six plane surfaces, and indicia around each opening for marking polar coordinates and orientation. The apparatus, system, and kit further include a plurality of transparent rods for representing exploration channels. The perforations on the cube are arranged to receive the rods for tridimensional modeling of the exploration network, and each rod is inserted into an opening at an angle and depth, resulting in a visual representation of the exploration network within the represented volume.





modeling

Did Ajak Deng Just Quit Modeling for Good?



The 26-year-old makes a shocking announcement.




modeling

Improving AEDT Modeling for Aircraft Noise Reflection and Diffraction from Terrain and Manmade Structures

Barriers, berms, buildings, and natural terrain may affect the propagation of aircraft noise by shielding or reflecting sound energy. If terrain and manmade structures obstruct the line‐of‐sight between the source and the receiver, then sound energy will be attenuated at the receiver. This attenuation increases with the terrain and structures’ size and proximity to either the source or the receiver. If gaps exist in the terrain or structures, then the potential benefits of acoustical shielding will be su...




modeling

Perils of topic modeling

Today's xkcd illustrates why topic modeling can be tricky, for people as well as for machines: The mouseover title: "As the 'exotic animals in homemade aprons hosting baking shows' YouTube craze reached its peak in March 2020, Andrew Cuomo announced he was replacing the Statue of Liberty with a bronze pangolin in a chef's hat." […]




modeling

Tyra Banks' modeling theme park, ModelLand, is finally opening in Santa Monica

A decade in the making, Tyra Banks' modeling utopia, ModelLand, 'will emulate a fantasy version of the modeling world.' The park opens in Santa Monica in May.




modeling

Polarization of protease-activated receptor 2 (PAR-2) signaling is altered during airway epithelial remodeling and deciliation [Immunology]

Protease-activated receptor 2 (PAR-2) is activated by secreted proteases from immune cells or fungi. PAR-2 is normally expressed basolaterally in differentiated nasal ciliated cells. We hypothesized that epithelial remodeling during diseases characterized by cilial loss and squamous metaplasia may alter PAR-2 polarization. Here, using a fluorescent arrestin assay, we confirmed that the common fungal airway pathogen Aspergillus fumigatus activates heterologously-expressed PAR-2. Endogenous PAR-2 activation in submerged airway RPMI 2650 or NCI–H520 squamous cells increased intracellular calcium levels and granulocyte macrophage–colony-stimulating factor, tumor necrosis factor α, and interleukin (IL)-6 secretion. RPMI 2650 cells cultured at an air–liquid interface (ALI) responded to apically or basolaterally applied PAR-2 agonists. However, well-differentiated primary nasal epithelial ALIs responded only to basolateral PAR-2 stimulation, indicated by calcium elevation, increased cilia beat frequency, and increased fluid and cytokine secretion. We exposed primary cells to disease-related modifiers that alter epithelial morphology, including IL-13, cigarette smoke condensate, and retinoic acid deficiency, at concentrations and times that altered epithelial morphology without causing breakdown of the epithelial barrier to model early disease states. These altered primary cultures responded to both apical and basolateral PAR-2 stimulation. Imaging nasal polyps and control middle turbinate explants, we found that nasal polyps, but not turbinates, exhibit apical calcium responses to PAR-2 stimulation. However, isolated ciliated cells from both polyps and turbinates maintained basolateral PAR-2 polarization, suggesting that the calcium responses originated from nonciliated cells. Altered PAR-2 polarization in disease-remodeled epithelia may enhance apical responses and increase sensitivity to inhaled proteases.




modeling

Modeling COVID-19: A new video describing the types of models used

Below, Mac Hyman, Tulane University, talks about types of mathematical models--their strengths and weaknesses--the data that we currently have and what we really need, and what models can tell us about a possible second wave.

At the beginning of the video, he thanks the mathematics community for its work, and near the end says, "Our mathematical community is really playing a central role in helping to predict the spread, and help mitigate this epidemic, and prioritize our efforts. …Do not underestimate the power that mathematics can have in helping to mitigate this epidemic—we have a role to play."

See the full set of videos on modeling COVID-19 and see media coverage of mathematics' role in modeling the pandemic.




modeling

Infectious disease modeling study casts doubt on impact of Justinianic plague

(University of Maryland) Many historians have claimed the Justinianic Plague (c. 541-750 CE) killed half the population of the Byzantine (Eastern Roman) Empire. New historical research and mathematical modeling challenge the death rate and severity of this first plague pandemic, named for Emperor Justinian I.




modeling

Modeling gas diffusion in aggregated soils

(American Society of Agronomy) Researchers develop a soil-gas diffusivity model based on two agricultural soils.




modeling

Kinetic modeling and test-retest reproducibility of 11C-EKAP and 11C-FEKAP, novel agonist radiotracers for PET imaging of the kappa opioid receptor in humans

The kappa opioid receptor (KOR) is implicated in various neuropsychiatric disorders. We previously evaluated an agonist tracer, 11C-GR103545, for PET imaging of KOR in humans. Although 11C-GR103545 showed high brain uptake, good binding specificity, and selectivity to KOR, it displayed slow kinetics and relatively large test-retest variability (TRV) of distribution volume (VT) estimates (15%). We therefore developed two novel KOR agonist radiotracers, 11C-EKAP and 11C-FEKAP; in nonhuman primates, both tracers exhibited faster kinetics and binding parameters comparable to those of 11C-GR103545. The aim of this study was to assess their kinetic and binding properties in humans. Methods: Six healthy subjects underwent 120-min test-retest PET scans with both 11C-EKAP and 11C-FEKAP. Metabolite-corrected arterial input functions were measured. Regional time-activity curves (TACs) were generated for 14 regions of interest. One- and two-tissue compartment models (1TC, 2TC) and the multilinear analysis-1 (MA1) method were applied to the regional TACs to calculate VT. Time-stability of VT values and test-retest reproducibility were evaluated. Levels of specific binding, as measured by the non-displaceable binding potential (BPND) for the three tracers (11C-EKAP, 11C-FEKAP and 11C-GR103545), were compared using a graphical method. Results: For both tracers, regional TACs were fitted well with the 2TC model and MA1 method (t* = 20 min), but not with the 1TC model. Given unreliably estimated parameters in several fits with the 2TC model and a good match between VT values from MA1 and 2TC, MA1 was chosen as the appropriate model for both tracers. Mean MA1 VT values were highest for 11C-GR103545, followed by 11C-EKAP, then 11C-FEKAP. Minimum scan time for stable VT measurement was 90 and 110 min for 11C-EKAP and 11C-FEKAP, respectively, compared with 140 min for 11C-GR103545. The mean absolute TRV in MA1 VT estimates was 7% and 18% for 11C-EKAP and 11C-FEKAP, respectively. BPND levels were similar for 11C-FEKAP and 11C-GR103545, but ~25% lower for 11C-EKAP. Conclusion: The two novel KOR agonist tracers showed faster tissue kinetics than 11C-GR103545. Even with slightly lower BPND, 11C-EKAP is judged to be a better tracer for imaging and quantification of KOR in humans, based on the shorter minimum scan time and excellent test-retest reproducibility.




modeling

Bayesian Proteoform Modeling Improves Protein Quantification of Global Proteomic Measurements [Technology]

As the capability of mass spectrometry-based proteomics has matured, tens of thousands of peptides can be measured simultaneously, which has the benefit of offering a systems view of protein expression. However, a major challenge is that with an increase in throughput, protein quantification estimation from the native measured peptides has become a computational task. A limitation of existing computationally driven protein quantification methods is that most ignore protein variation, such as alternate splicing of the RNA transcript and post-translational modifications or other possible proteoforms, which will affect a significant fraction of the proteome. The consequence of this assumption is that statistical inference at the protein level, and consequently downstream analyses, such as network and pathway modeling, have only limited power for biomarker discovery. Here, we describe a Bayesian model (BP-Quant) that uses statistically derived peptide signatures to identify peptides that are outside the dominant pattern, or the existence of multiple over-expressed patterns, to improve relative protein abundance estimates. It is a research-driven approach that utilizes the objectives of the experiment, defined in the context of a standard statistical hypothesis, to identify a set of peptides exhibiting similar statistical behavior relating to a protein. This approach infers that changes in relative protein abundance can be used as a surrogate for changes in function, without necessarily taking into account the effect of differential post-translational modifications, processing, or splicing in altering protein function. We verify the approach using a dilution study from mouse plasma samples and demonstrate that BP-Quant achieves similar accuracy as the current state-of-the-art methods at proteoform identification with significantly better specificity. BP-Quant is available as MATLAB® and R packages at https://github.com/PNNL-Comp-Mass-Spec/BP-Quant.




modeling

Erratum. Multiethnic Genome-Wide Association Study of Diabetic Retinopathy Using Liability Threshold Modeling of Duration of Diabetes and Glycemic Control. Diabetes 2019;68:441--456




modeling

Predictive Modeling of Type 1 Diabetes Stages Using Disparate Data Sources

This study aims to model genetic, immunologic, metabolomics, and proteomic biomarkers for development of islet autoimmunity (IA) and progression to type 1 diabetes in a prospective high-risk cohort. We studied 67 children: 42 who developed IA (20 of 42 progressed to diabetes) and 25 control subjects matched for sex and age. Biomarkers were assessed at four time points: earliest available sample, just prior to IA, just after IA, and just prior to diabetes onset. Predictors of IA and progression to diabetes were identified across disparate sources using an integrative machine learning algorithm and optimization-based feature selection. Our integrative approach was predictive of IA (area under the receiver operating characteristic curve [AUC] 0.91) and progression to diabetes (AUC 0.92) based on standard cross-validation (CV). Among the strongest predictors of IA were change in serum ascorbate, 3-methyl-oxobutyrate, and the PTPN22 (rs2476601) polymorphism. Serum glucose, ADP fibrinogen, and mannose were among the strongest predictors of progression to diabetes. This proof-of-principle analysis is the first study to integrate large, diverse biomarker data sets into a limited number of features, highlighting differences in pathways leading to IA from those predicting progression to diabetes. Integrated models, if validated in independent populations, could provide novel clues concerning the pathways leading to IA and type 1 diabetes.




modeling

Multimodality Imaging of Inflammation and Ventricular Remodeling in Pressure-Overload Heart Failure

Inflammation contributes to ventricular remodeling after myocardial ischemia, but its role in nonischemic heart failure is poorly understood. Local tissue inflammation is difficult to assess serially during pathogenesis. Although 18F-FDG accumulates in inflammatory leukocytes and thus may identify inflammation in the myocardial microenvironment, it remains unclear whether this imaging technique can isolate diffuse leukocytes in pressure-overload heart failure. We aimed to evaluate whether inflammation with 18F-FDG can be serially imaged in the early stages of pressure-overload–induced heart failure and to compare the time course with functional impairment assessed by cardiac MRI. Methods: C57Bl6/N mice underwent transverse aortic constriction (TAC) (n = 22), sham surgery (n = 12), or coronary ligation as an inflammation-positive control (n = 5). MRI assessed ventricular geometry and contractile function at 2 and 8 d after TAC. Immunostaining identified the extent of inflammatory leukocyte infiltration early in pressure overload. 18F-FDG PET scans were acquired at 3 and 7 d after TAC, under ketamine-xylazine anesthesia to suppress cardiomyocyte glucose uptake. Results: Pressure overload evoked rapid left ventricular dilation compared with sham (end-systolic volume, day 2: 40.6 ± 10.2 μL vs. 23.8 ± 1.7 μL, P < 0.001). Contractile function was similarly impaired (ejection fraction, day 2: 40.9% ± 9.7% vs. 59.2% ± 4.4%, P < 0.001). The severity of contractile impairment was proportional to histology-defined myocardial macrophage density on day 8 (r = –0.669, P = 0.010). PET imaging identified significantly higher left ventricular 18F-FDG accumulation in TAC mice than in sham mice on day 3 (10.5 ± 4.1 percentage injected dose [%ID]/g vs. 3.8 ± 0.9 %ID/g, P < 0.001) and on day 7 (7.8 ± 3.7 %ID/g vs. 3.0 ± 0.8 %ID/g, P = 0.006), though the efficiency of cardiomyocyte suppression was variable among TAC mice. The 18F-FDG signal correlated with ejection fraction (r = –0.75, P = 0.01) and ventricular volume (r = 0.75, P < 0.01). Western immunoblotting demonstrated a 60% elevation of myocardial glucose transporter 4 expression in the left ventricle at 8 d after TAC, indicating altered glucose metabolism. Conclusion: TAC induces rapid changes in left ventricular geometry and contractile function, with a parallel modest infiltration of inflammatory macrophages. Metabolic remodeling overshadows inflammatory leukocyte signal using 18F-FDG PET imaging. More selective inflammatory tracers are requisite to identify the diffuse local inflammation in pressure overload.




modeling

Dapagliflozin Versus Placebo on Left Ventricular Remodeling in Patients With Diabetes and Heart Failure: The REFORM Trial

OBJECTIVE

To determine the effects of dapagliflozin in patients with heart failure (HF) and type 2 diabetes mellitus (T2DM) on left ventricular (LV) remodeling using cardiac MRI.

RESEARCH DESIGN AND METHODS

We randomized 56 patients with T2DM and HF with LV systolic dysfunction to dapagliflozin 10 mg daily or placebo for 1 year, on top of usual therapy. The primary end point was difference in LV end-systolic volume (LVESV) using cardiac MRI. Key secondary end points included other measures of LV remodeling and clinical and biochemical parameters.

RESULTS

In our cohort, dapagliflozin had no effect on LVESV or any other parameter of LV remodeling. However, it reduced diastolic blood pressure and loop diuretic requirements while increasing hemoglobin, hematocrit, and ketone bodies. There was a trend toward lower weight.

CONCLUSIONS

We were unable to determine with certainty whether dapagliflozin in patients with T2DM and HF had any effect on LV remodeling. Whether the benefits of dapagliflozin in HF are due to remodeling or other mechanisms remains unknown.




modeling

Bayesian modeling and prior sensitivity analysis for zero–one augmented beta regression models with an application to psychometric data

Danilo Covaes Nogarotto, Caio Lucidius Naberezny Azevedo, Jorge Luis Bazán.

Source: Brazilian Journal of Probability and Statistics, Volume 34, Number 2, 304--322.

Abstract:
Interest in the zero–one augmented beta regression (ZOABR) model has been increasing over the last few years. In this work, we develop Bayesian inference for the ZOABR model and provide several contributions: we explore the use of the Jeffreys-rule and independence Jeffreys priors for some of the parameters, perform a sensitivity study of the prior choice, compare the Bayesian estimates with the maximum likelihood (ML) ones, and measure the accuracy of the estimates under several scenarios of interest. The results indicate, in general, that the Bayesian approach, under the Jeffreys-rule prior, is as accurate as the ML one. Also, unlike other approaches, we use the predictive distribution of the response to implement Bayesian residuals. To further illustrate the advantages of our approach, we conduct an analysis of a real psychometric data set, including a Bayesian residual analysis, where it is shown that misleading inference can be obtained when the data are transformed: that is, when the zeros and ones are transformed to suitable values and the usual beta regression model is considered instead of the ZOABR model. Finally, future developments are discussed.
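
For readers unfamiliar with the ZOABR likelihood itself, a compact sketch (point masses at 0 and 1 plus a beta density on the open interval) is given below; the paper's Bayesian machinery sits on top of this. The parameter names and toy data are illustrative only.

```python
import numpy as np
from scipy import stats

def zoabr_loglik(y, p0, p1, mu, phi):
    """Point masses p0 at 0 and p1 at 1; beta(mu*phi, (1-mu)*phi) on (0, 1)."""
    a, b = mu * phi, (1.0 - mu) * phi
    mid = (y > 0) & (y < 1)
    return ((y == 0).sum() * np.log(p0)
            + (y == 1).sum() * np.log(p1)
            + mid.sum() * np.log1p(-(p0 + p1))
            + stats.beta.logpdf(y[mid], a, b).sum())

rng = np.random.default_rng(0)
y = np.concatenate([np.zeros(30), np.ones(10), rng.beta(2.0, 3.0, size=160)])
print(zoabr_loglik(y, p0=0.15, p1=0.05, mu=0.4, phi=5.0))
```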




modeling

A joint mean-correlation modeling approach for longitudinal zero-inflated count data

Weiping Zhang, Jiangli Wang, Fang Qian, Yu Chen.

Source: Brazilian Journal of Probability and Statistics, Volume 34, Number 1, 35--50.

Abstract:
Longitudinal zero-inflated count data are widely encountered in many fields, while modeling the correlation between measurements on the same subject is challenging due to the lack of suitable multivariate joint distributions. This paper studies a novel mean-correlation modeling approach for longitudinal zero-inflated regression models, solving both the problem of specifying a joint distribution and that of parsimoniously modeling correlations without constraints. The joint distribution of zero-inflated discrete longitudinal responses is modeled by a copula whose correlation parameters are innovatively represented in hyper-spherical coordinates. To overcome the computational intractability of maximizing the full likelihood function of the model, we further propose a computationally efficient pairwise likelihood approach. We then propose separate mean and correlation regression models for these key quantities; this modeling approach can also handle irregularly spaced and possibly subject-specific time points. The resulting estimators are shown to be consistent and asymptotically normal. A data example and simulations support the effectiveness of the proposed approach.




modeling

Nonstationary Bayesian modeling for a large data set of derived surface temperature return values. (arXiv:2005.03658v1 [stat.ME])

Heat waves resulting from prolonged extreme temperatures pose a significant risk to human health globally. Given the limitations of observations of extreme temperature, climate models are often used to characterize extreme temperature globally, from which one can derive quantities like return values to summarize the magnitude of a low probability event for an arbitrary geographic location. However, while these derived quantities are useful on their own, it is also often important to apply a spatial statistical model to such data in order to, e.g., understand how the spatial dependence properties of the return values vary over space and emulate the climate model for generating additional spatial fields with corresponding statistical properties. For these objectives, when modeling global data it is critical to use a nonstationary covariance function. Furthermore, given that the output of modern global climate models can be on the order of $\mathcal{O}(10^4)$, it is important to utilize approximate Gaussian process methods to enable inference. In this paper, we demonstrate the application of methodology introduced in Risser and Turek (2020) to conduct a nonstationary and fully Bayesian analysis of a large data set of 20-year return values derived from an ensemble of global climate model runs with over 50,000 spatial locations. This analysis uses the freely available BayesNSGP software package for R.
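
For intuition about what a nonstationary covariance function is, the sketch below implements one standard example, the Gibbs kernel with an input-dependent lengthscale, in plain NumPy. This is only a toy illustration with an invented lengthscale function; the paper's analysis uses the BayesNSGP R package and a fully Bayesian treatment.

```python
import numpy as np

def lengthscale(x):
    # Hypothetical spatially varying lengthscale: smoother for larger x.
    return 0.5 + 0.4 * np.tanh(x)

def gibbs_kernel(x1, x2):
    """Nonstationary Gibbs covariance between 1-D input arrays x1 and x2."""
    l1, l2 = lengthscale(x1)[:, None], lengthscale(x2)[None, :]
    d2 = (x1[:, None] - x2[None, :]) ** 2
    pre = np.sqrt(2.0 * l1 * l2 / (l1**2 + l2**2))
    return pre * np.exp(-d2 / (l1**2 + l2**2))

x = np.linspace(-3, 3, 200)
K = gibbs_kernel(x, x)                       # valid covariance matrix
sample = np.random.default_rng(1).multivariate_normal(
    np.zeros_like(x), K + 1e-8 * np.eye(len(x)))  # field: rough left, smooth right
```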




modeling

Predictive Modeling of ICU Healthcare-Associated Infections from Imbalanced Data. Using Ensembles and a Clustering-Based Undersampling Approach. (arXiv:2005.03582v1 [cs.LG])

Early detection of patients vulnerable to infections acquired in the hospital environment is a challenge in current health systems given the impact that such infections have on patient mortality and healthcare costs. This work is focused on both the identification of risk factors and the prediction of healthcare-associated infections in intensive-care units by means of machine-learning methods. The aim is to support decision making addressed at reducing the incidence rate of infections. In this field, it is necessary to deal with the problem of building reliable classifiers from imbalanced datasets. We propose a clustering-based undersampling strategy to be used in combination with ensemble classifiers. A comparative study with data from 4616 patients was conducted in order to validate our proposal. We applied several single and ensemble classifiers both to the original dataset and to data preprocessed by means of different resampling methods. The results were analyzed by means of classic and recent metrics specifically designed for imbalanced data classification. They revealed that the proposal is more efficient in comparison with other approaches.
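
A hedged sketch of the general strategy, with invented cluster counts and sampling rules rather than the authors' exact configuration: cluster the majority class, keep a few representatives per cluster to balance the classes, then train an ensemble classifier on the reduced data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

def cluster_undersample(X, y, minority=1, n_clusters=10, seed=0):
    """Balance classes by keeping a few majority points per KMeans cluster."""
    rng = np.random.default_rng(seed)
    X_min, X_maj = X[y == minority], X[y != minority]
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(X_maj)
    per_cluster = max(1, len(X_min) // n_clusters)
    keep = []
    for c in range(n_clusters):
        members = np.flatnonzero(km.labels_ == c)
        keep.extend(rng.choice(members, size=min(per_cluster, len(members)),
                               replace=False))
    X_bal = np.vstack([X_min, X_maj[keep]])
    y_bal = np.concatenate([np.ones(len(X_min)), np.zeros(len(keep))])
    return X_bal, y_bal

X, y = make_classification(n_samples=2000, weights=[0.95], random_state=0)
X_bal, y_bal = cluster_undersample(X, y)
clf = RandomForestClassifier(random_state=0).fit(X_bal, y_bal)
```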




modeling

Modeling High-Dimensional Unit-Root Time Series. (arXiv:2005.03496v1 [stat.ME])

In this paper, we propose a new procedure to build a structural-factor model for a vector unit-root time series. For a $p$-dimensional unit-root process, we assume that each component consists of a set of common factors, which may be unit-root non-stationary, and a set of stationary components, which contain the cointegrations among the unit-root processes. To further reduce the dimensionality, we also postulate that the stationary part of the series is a nonsingular linear transformation of certain common factors and idiosyncratic white noise components as in Gao and Tsay (2019a, b). The estimation of linear loading spaces of the unit-root factors and the stationary components is achieved by an eigenanalysis of some nonnegative definite matrix, and the separation between the stationary factors and the white noises is based on an eigenanalysis and a projected principal component analysis. Asymptotic properties of the proposed method are established for both fixed $p$ and diverging $p$ as the sample size $n$ tends to infinity. Both simulated and real examples are used to demonstrate the performance of the proposed method in finite samples.
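
A toy simulation of the eigenanalysis idea: when a few unit-root factors drive many observed series, the leading eigenvectors of the sample covariance matrix recover the factor loading space, because the corresponding eigenvalues diverge with the sample size. This mirrors the flavor of the procedure, not its exact estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, r = 500, 8, 2
factors = np.cumsum(rng.normal(size=(n, r)), axis=0)    # unit-root factors
A = rng.normal(size=(p, r))                             # loading matrix
X = factors @ A.T + rng.normal(scale=0.5, size=(n, p))  # stationary noise

S = (X.T @ X) / n
eigvals, eigvecs = np.linalg.eigh(S)
# The r largest eigenvalues dominate; their eigenvectors span the estimated
# loading space of the nonstationary common factors.
loading_space = eigvecs[:, -r:]
print(np.round(eigvals[::-1][:4], 1))   # sharp drop after the first r values
```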




modeling

Feature Selection Methods for Uplift Modeling. (arXiv:2005.03447v1 [cs.LG])

Uplift modeling is a predictive modeling technique that estimates the user-level incremental effect of a treatment using machine learning models. It is often used for targeting promotions and advertisements, as well as for the personalization of product offerings. In these applications, there are often hundreds of features available to build such models. Keeping all the features in a model can be costly and inefficient. Feature selection is an essential step in the modeling process for multiple reasons: improving the estimation accuracy by eliminating irrelevant features, accelerating model training and prediction speed, reducing the monitoring and maintenance workload for feature data pipeline, and providing better model interpretation and diagnostics capability. However, feature selection methods for uplift modeling have been rarely discussed in the literature. Although there are various feature selection methods for standard machine learning models, we will demonstrate that those methods are sub-optimal for solving the feature selection problem for uplift modeling. To address this problem, we introduce a set of feature selection methods designed specifically for uplift modeling, including both filter methods and embedded methods. To evaluate the effectiveness of the proposed feature selection methods, we use different uplift models and measure the accuracy of each model with a different number of selected features. We use both synthetic and real data to conduct these experiments. We also implemented the proposed filter methods in an open source Python package (CausalML).
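
One simple example of a filter-style criterion for uplift problems, invented here for illustration rather than taken from the paper: split each feature at its median and score it by how much the estimated uplift (treated minus control outcome rate) differs between the two halves, so that effect modifiers score high while ordinary predictive features do not.

```python
import numpy as np

def uplift(y, t):
    return y[t == 1].mean() - y[t == 0].mean()

def filter_scores(X, y, t):
    """Score each feature by the uplift gap across its median split."""
    scores = []
    for j in range(X.shape[1]):
        lo = X[:, j] <= np.median(X[:, j])
        scores.append(abs(uplift(y[lo], t[lo]) - uplift(y[~lo], t[~lo])))
    return np.array(scores)

rng = np.random.default_rng(0)
X = rng.normal(size=(10000, 5))
t = rng.integers(0, 2, size=10000)              # randomized treatment flag
# Feature 0 modifies the treatment effect; the rest are noise.
p = 0.2 + 0.1 * t * (X[:, 0] > 0)
y = rng.binomial(1, p)
print(filter_scores(X, y, t).round(3))          # feature 0 scores highest
```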




modeling

lslx: Semi-Confirmatory Structural Equation Modeling via Penalized Likelihood

Sparse estimation via penalized likelihood (PL) is now a popular approach to learn the associations among a large set of variables. This paper describes an R package called lslx that implements PL methods for semi-confirmatory structural equation modeling (SEM). In this semi-confirmatory approach, each model parameter can be specified as free/fixed for theory testing, or penalized for exploration. By incorporating either an L1 or a minimax concave penalty, the sparsity pattern of the parameter matrix can be efficiently explored. Package lslx minimizes the PL criterion through a quasi-Newton method. The algorithm conducts line search and checks the first-order condition in each iteration to ensure the optimality of the obtained solution. A numerical comparison between competing packages shows that lslx can reliably find PL estimates in the least time. The current package also supports other advanced functionalities, including a two-stage method with auxiliary variables for missing data handling and a reparameterized multi-group SEM to explore population heterogeneity.




modeling

Semi-Parametric Joint Modeling of Survival and Longitudinal Data: The R Package JSM

This paper is devoted to the R package JSM, which performs joint statistical modeling of survival and longitudinal data. In biomedical studies it has been increasingly common to collect both baseline and longitudinal covariates along with a possibly censored survival time. Instead of analyzing the survival and longitudinal outcomes separately, joint modeling approaches have attracted substantive attention in the recent literature and have been shown to correct biases from separate modeling approaches and enhance information. Most existing approaches adopt a linear mixed effects model for the longitudinal component and the Cox proportional hazards model for the survival component. We extend the Cox model to a more general class of transformation models for the survival process, where the baseline hazard function is completely unspecified, leading to semiparametric survival models. We also offer a non-parametric multiplicative random effects model for the longitudinal process in JSM in addition to the linear mixed effects model. In this paper, we present the joint modeling framework that is implemented in JSM, as well as the standard error estimation methods, and illustrate the package with two real data examples: a liver cirrhosis data set and a Mayo Clinic primary biliary cirrhosis data set.




modeling

Multi-body dynamic modeling of multi-legged robots

Mahapatra, Abhijit, author
9789811529535 (electronic bk.)




modeling

Regression for copula-linked compound distributions with applications in modeling aggregate insurance claims

Peng Shi, Zifeng Zhao.

Source: The Annals of Applied Statistics, Volume 14, Number 1, 357--380.

Abstract:
In actuarial research a task of particular interest and importance is to predict the loss cost for individual risks so that informative decisions are made in various insurance operations such as underwriting, ratemaking and capital management. The loss cost is typically viewed as following a compound distribution where the summation of the severity variables is stopped by the frequency variable. A challenging issue in modeling such outcomes is to accommodate the potential dependence between the number of claims and the size of each individual claim. In this article we introduce a novel regression framework for compound distributions that uses a copula to accommodate the association between the frequency and the severity variables and, thus, allows for arbitrary dependence between the two components. We further show that the new model is very flexible and is easily modified to account for incomplete data due to censoring or truncation. The flexibility of the proposed model is illustrated using both simulated and real data sets. In the analysis of granular claims data from property insurance, we find a substantive negative relationship between the number and the size of insurance claims. In addition, we demonstrate that ignoring the frequency-severity association could lead to biased decision-making in insurance operations.
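
A small simulation in the spirit of the construction: a Gaussian copula induces negative dependence between the claim count and the severity level, and the aggregate loss is the resulting random sum. The margins, parameters, and the way severity is shifted are all invented for illustration; the paper's regression framework is much richer.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
rho = -0.5                               # frequency-severity dependence
z = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=20000)
u = stats.norm.cdf(z)                    # Gaussian copula: uniform margins

N = stats.poisson.ppf(u[:, 0], mu=2).astype(int)        # claim counts
sev_level = stats.gamma.ppf(u[:, 1], a=2, scale=500)    # severity driver
# Aggregate loss: a random sum of N gamma severities with mean sev_level.
agg = np.array([rng.gamma(2.0, m / 2.0, size=n).sum() if n else 0.0
                for n, m in zip(N, sev_level)])
avg_sev = agg[N > 0] / N[N > 0]
print(np.corrcoef(N[N > 0], avg_sev)[0, 1])   # negative, as induced by rho
```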




modeling

Modeling wildfire ignition origins in southern California using linear network point processes

Medha Uppala, Mark S. Handcock.

Source: The Annals of Applied Statistics, Volume 14, Number 1, 339--356.

Abstract:
This paper focuses on spatial and temporal modeling of point processes on linear networks. Point processes on linear networks can simply be defined as point events occurring on or near line segment network structures embedded in a certain space. A separable modeling framework is introduced that posits separate formation and dissolution models of point processes on linear networks over time. While the model was inspired by spider web building activity in brick mortar lines, the focus is on modeling wildfire ignition origins near road networks over a span of 14 years. As most wildfires in California have human-related origins, modeling the origin locations with respect to the road network provides insight into how human, vehicular and structural densities affect ignition occurrence. Model results show that roads that traverse different types of regions such as residential, interface and wildland regions have higher ignition intensities compared to roads that only exist in each of the mentioned region types.




modeling

Modeling microbial abundances and dysbiosis with beta-binomial regression

Bryan D. Martin, Daniela Witten, Amy D. Willis.

Source: The Annals of Applied Statistics, Volume 14, Number 1, 94--115.

Abstract:
Using a sample from a population to estimate the proportion of the population with a certain category label is a broadly important problem. In the context of microbiome studies, this problem arises when researchers wish to use a sample from a population of microbes to estimate the population proportion of a particular taxon, known as the taxon’s relative abundance. In this paper, we propose a beta-binomial model for this task. Like existing models, our model allows for a taxon’s relative abundance to be associated with covariates of interest. However, unlike existing models, our proposal also allows for the overdispersion in the taxon’s counts to be associated with covariates of interest. We exploit this model in order to propose tests not only for differential relative abundance, but also for differential variability. The latter is particularly valuable in light of speculation that dysbiosis, the perturbation from a normal microbiome that can occur in certain disease conditions, may manifest as a loss of stability, or increase in variability, of the counts associated with each taxon. We demonstrate the performance of our proposed model using a simulation study and an application to soil microbial data.
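
A minimal sketch of the model class being fit: beta-binomial counts in which both the mean relative abundance and the overdispersion depend on a covariate through link functions, estimated here by maximum likelihood. The links, parameterization, and toy data are assumptions for illustration; they do not reproduce the authors' estimator or tests.

```python
import numpy as np
from scipy.special import betaln, expit
from scipy.optimize import minimize

def negloglik(params, W, M, x):
    # W: taxon counts, M: total counts, x: covariate. The binomial
    # coefficient is dropped because it does not depend on the parameters.
    b0, b1, g0, g1 = params
    mu = expit(b0 + b1 * x)            # mean relative abundance
    phi = expit(g0 + g1 * x)           # overdispersion in (0, 1)
    a = mu * (1 - phi) / phi
    b = (1 - mu) * (1 - phi) / phi
    return -np.sum(betaln(W + a, M - W + b) - betaln(a, b))

rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=200).astype(float)
M = rng.integers(1000, 5000, size=200)
p = rng.beta(2, 38 + 20 * x)           # truth: abundance shifts with x
W = rng.binomial(M, p)
fit = minimize(negloglik, x0=np.zeros(4), args=(W, M, x),
               method="Nelder-Mead", options={"maxiter": 4000})
print(fit.x.round(2))
```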




modeling

Spatial modeling of trends in crime over time in Philadelphia

Cecilia Balocchi, Shane T. Jensen.

Source: The Annals of Applied Statistics, Volume 13, Number 4, 2235--2259.

Abstract:
Understanding the relationship between change in crime over time and the geography of urban areas is an important problem for urban planning. Accurate estimation of changing crime rates throughout a city would aid law enforcement as well as enable studies of the association between crime and the built environment. Bayesian modeling is a promising direction since areal data require principled sharing of information to address spatial autocorrelation between proximal neighborhoods. We develop several Bayesian approaches to spatial sharing of information between neighborhoods while modeling trends in crime counts over time. We apply our methodology to estimate changes in crime throughout Philadelphia over the 2006-15 period while also incorporating spatially-varying economic and demographic predictors. We find that the local shrinkage imposed by a conditional autoregressive model has substantial benefits in terms of out-of-sample predictive accuracy of crime. We also explore the possibility of spatial discontinuities between neighborhoods that could represent natural barriers or aspects of the built environment.




modeling

A semiparametric modeling approach using Bayesian Additive Regression Trees with an application to evaluate heterogeneous treatment effects

Bret Zeldow, Vincent Lo Re III, Jason Roy.

Source: The Annals of Applied Statistics, Volume 13, Number 3, 1989--2010.

Abstract:
Bayesian Additive Regression Trees (BART) is a flexible machine learning algorithm capable of capturing nonlinearities between an outcome and covariates and interactions among covariates. We extend BART to a semiparametric regression framework in which the conditional expectation of an outcome is a function of treatment, its effect modifiers, and confounders. The confounders are allowed to have unspecified functional form, while treatment and effect modifiers that are directly related to the research question are given a linear form. The result is a Bayesian semiparametric linear regression model where the posterior distribution of the parameters of the linear part can be interpreted as in parametric Bayesian regression. This is useful in situations where a subset of the variables are of substantive interest and the others are nuisance variables that we would like to control for. An example of this occurs in causal modeling with the structural mean model (SMM). Under certain causal assumptions, our method can be used as a Bayesian SMM. Our methods are demonstrated with simulation studies and an application to a dataset involving adults with HIV/Hepatitis C coinfection who newly initiate antiretroviral therapy. The methods are available in an R package called semibart.




modeling

Bayesian modeling of the structural connectome for studying Alzheimer’s disease

Arkaprava Roy, Subhashis Ghosal, Jeffrey Prescott, Kingshuk Roy Choudhury.

Source: The Annals of Applied Statistics, Volume 13, Number 3, 1791--1816.

Abstract:
We study possible relations between Alzheimer’s disease progression and the structure of the connectome, the white matter connecting different regions of the brain. Regression models on covariates including age, gender, and disease status are proposed for the extent of white matter connecting each pair of regions of the brain. Subject inhomogeneity is also incorporated in the model through random effects with an unknown distribution. As there is a large number of pairs of regions, we also adopt a dimension reduction technique through graphon (J. Combin. Theory Ser. B 96 (2006) 933–957) functions, which reduces functions of pairs of regions to functions of regions. The connecting graphon functions are considered unknown, but the assumed smoothness allows putting priors of low complexity on these functions. We pursue a nonparametric Bayesian approach by assigning a Dirichlet process scale mixture of zero-mean normals prior on the distributions of the random effects and finite random series of tensor products of B-splines priors on the underlying graphon functions. We develop efficient Markov chain Monte Carlo techniques for drawing samples from the posterior distributions using Hamiltonian Monte Carlo (HMC). The proposed Bayesian method overwhelmingly outperforms a competing method based on ANCOVA models in the simulation setup. The proposed Bayesian approach is applied to a dataset of 100 subjects and 83 brain regions, and key regions implicated in the changing connectome are identified.




modeling

Modeling seasonality and serial dependence of electricity price curves with warping functional autoregressive dynamics

Ying Chen, J. S. Marron, Jiejie Zhang.

Source: The Annals of Applied Statistics, Volume 13, Number 3, 1590--1616.

Abstract:
Electricity prices are high dimensional, serially dependent and have seasonal variations. We propose a Warping Functional AutoRegressive (WFAR) model that simultaneously accounts for the cross time-dependence and seasonal variations of the large dimensional data. In particular, electricity price curves are obtained by smoothing over the $24$ discrete hourly prices on each day. In the functional domain, seasonal phase variations are separated from level amplitude changes in a warping process with the Fisher–Rao distance metric, and the aligned (season-adjusted) electricity price curves are modeled in the functional autoregression framework. In a real application, the WFAR model provides superior out-of-sample forecast accuracy in both a normal functioning market, Nord Pool, and an extreme situation, the California market. The forecast performance as well as the relative accuracy improvement are stable for different markets and different time periods.




modeling

Introduction to papers on the modeling and analysis of network data—II

Stephen E. Fienberg

Source: Ann. Appl. Stat., Volume 4, Number 2, 533--534.




modeling

Joint Modeling of Longitudinal Relational Data and Exogenous Variables

Rajarshi Guhaniyogi, Abel Rodriguez.

Source: Bayesian Analysis, Volume 15, Number 2, 477--503.

Abstract:
This article proposes a framework based on shared, time-varying stochastic latent factor models for modeling relational data in which network and node attributes co-evolve over time. Our proposed framework is flexible enough to handle both categorical and continuous attributes, allows us to estimate the dimension of the latent social space, and automatically yields Bayesian hypothesis tests for the association between network structure and nodal attributes. Additionally, the model is easy to compute and readily yields inference and prediction for missing links between nodes. We employ our model framework to study the co-evolution of international relations between 22 countries and country-specific indicators over a period of 11 years.




modeling

Additive Multivariate Gaussian Processes for Joint Species Distribution Modeling with Heterogeneous Data

Jarno Vanhatalo, Marcelo Hartmann, Lari Veneranta.

Source: Bayesian Analysis, Volume 15, Number 2, 415--447.

Abstract:
Species distribution models (SDMs) are a key tool in ecology, conservation and management of natural resources. Two key components of state-of-the-art SDMs are the description of the species distribution response along environmental covariates and the spatial random effect that captures deviations from the distribution patterns explained by environmental covariates. Joint species distribution models (JSDMs) additionally include interspecific correlations, which have been shown to improve their descriptive and predictive performance compared to single species models. However, current JSDMs are restricted to the hierarchical generalized linear modeling framework. Their limitation is that parametric models have trouble explaining changes in abundance due, for example, to highly non-linear physical tolerance limits, which is particularly important when predicting species distribution in new areas or under scenarios of environmental change. On the other hand, semi-parametric response functions have been shown to improve the predictive performance of SDMs in these tasks in single species models. Here, we propose JSDMs where the responses to environmental covariates are modeled with additive multivariate Gaussian processes coded as linear models of coregionalization. These allow inference for a wide range of functional forms and interspecific correlations between the responses. We also propose an efficient approach for inference with a Laplace approximation and a parameterization of the interspecific covariance matrices on the Euclidean space. We demonstrate the benefits of our model with two small-scale examples and one real-world case study. We use cross-validation to compare the proposed model to analogous semi-parametric single species models and parametric single and joint species models in interpolation and extrapolation tasks. The proposed model outperforms the alternative models in all cases. We also show that the proposed model can be seen as an extension of the current state-of-the-art JSDMs to semi-parametric models.




modeling

Hierarchical Normalized Completely Random Measures for Robust Graphical Modeling

Andrea Cremaschi, Raffaele Argiento, Katherine Shoemaker, Christine Peterson, Marina Vannucci.

Source: Bayesian Analysis, Volume 14, Number 4, 1271--1301.

Abstract:
Gaussian graphical models are useful tools for exploring network structures in multivariate normal data. In this paper we are interested in situations where data show departures from Gaussianity, therefore requiring alternative modeling distributions. The multivariate $t$ -distribution, obtained by dividing each component of the data vector by a gamma random variable, is a straightforward generalization to accommodate deviations from normality such as heavy tails. Since different groups of variables may be contaminated to a different extent, Finegold and Drton (2014) introduced the Dirichlet $t$ -distribution, where the divisors are clustered using a Dirichlet process. In this work, we consider a more general class of nonparametric distributions as the prior on the divisor terms, namely the class of normalized completely random measures (NormCRMs). To improve the effectiveness of the clustering, we propose modeling the dependence among the divisors through a nonparametric hierarchical structure, which allows for the sharing of parameters across the samples in the data set. This desirable feature enables us to cluster together different components of multivariate data in a parsimonious way. We demonstrate through simulations that this approach provides accurate graphical model inference, and apply it to a case study examining the dependence structure in radiomics data derived from The Cancer Imaging Atlas.




modeling

Estimating the Use of Public Lands: Integrated Modeling of Open Populations with Convolution Likelihood Ecological Abundance Regression

Lutz F. Gruber, Erica F. Stuber, Lyndsie S. Wszola, Joseph J. Fontaine.

Source: Bayesian Analysis, Volume 14, Number 4, 1173--1199.

Abstract:
We present an integrated open population model where the population dynamics are defined by a differential equation, and the related statistical model utilizes a Poisson binomial convolution likelihood. Key advantages of the proposed approach over existing open population models include the flexibility to predict related, but unobserved quantities such as total immigration or emigration over a specified time period, and more computationally efficient posterior simulation by elimination of the need to explicitly simulate latent immigration and emigration. The viability of the proposed method is shown in an in-depth analysis of outdoor recreation participation on public lands, where the surveyed populations changed rapidly and demographic population closure cannot be assumed even within a single day.
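
The convolution likelihood itself is easy to state: if the current count is the sum of binomially thinned survivors from the previous abundance and Poisson arrivals, its pmf is a finite convolution of the two. Below is a sketch with fixed survival and arrival rates; the integrated model instead drives these rates with a differential equation.

```python
import numpy as np
from scipy import stats

def convolution_pmf(y, n_prev, survive_p, arrival_rate):
    """P(Y = y | N_prev) with Y = Binomial(N_prev, p) + Poisson(lambda)."""
    s = np.arange(0, min(y, n_prev) + 1)          # possible survivor counts
    return np.sum(stats.binom.pmf(s, n_prev, survive_p) *
                  stats.poisson.pmf(y - s, arrival_rate))

# Probability of observing 12 individuals given 10 previously present,
# 60% survival, and an expected 5 new arrivals.
print(convolution_pmf(y=12, n_prev=10, survive_p=0.6, arrival_rate=5.0))
```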




modeling

Semiparametric Multivariate and Multiple Change-Point Modeling

Stefano Peluso, Siddhartha Chib, Antonietta Mira.

Source: Bayesian Analysis, Volume 14, Number 3, 727--751.

Abstract:
We develop a general Bayesian semiparametric change-point model in which separate groups of structural parameters (for example, location and dispersion parameters) can each follow a separate multiple change-point process, driven by time-dependent transition matrices among the latent regimes. The distribution of the observations within regimes is unknown and given by a Dirichlet process mixture prior. The properties of the proposed model are studied theoretically through the analysis of inter-arrival times and of the number of change-points in a given time interval. The prior-posterior analysis by Markov chain Monte Carlo techniques is based on a forward-backward algorithm for sampling the various regime indicators. Analysis with simulated data under various scenarios and an application to short-term interest rates are used to show the generality and usefulness of the proposed model.




modeling

Modeling Population Structure Under Hierarchical Dirichlet Processes

Lloyd T. Elliott, Maria De Iorio, Stefano Favaro, Kaustubh Adhikari, Yee Whye Teh.

Source: Bayesian Analysis, Volume 14, Number 2, 313--339.

Abstract:
We propose a Bayesian nonparametric model to infer population admixture, extending the hierarchical Dirichlet process to allow for correlation between loci due to linkage disequilibrium. Given multilocus genotype data from a sample of individuals, the proposed model allows inferring and classifying individuals as unadmixed or admixed, inferring the number of subpopulations ancestral to an admixed population and the population of origin of chromosomal regions. Our model does not assume any specific mutation process, and can be applied to most of the commonly used genetic markers. We present a Markov chain Monte Carlo (MCMC) algorithm to perform posterior inference from the model and we discuss some methods to summarize the MCMC output for the analysis of population admixture. Finally, we demonstrate the performance of the proposed model in a real application, using genetic data from the ectodysplasin-A receptor (EDAR) gene, which is considered to be ancestry-informative due to well-known variations in allele frequency as well as phenotypic effects across ancestry. The structure analysis of this dataset leads to the identification of a rare haplotype in Europeans. We also conduct a simulated experiment and show that our algorithm outperforms parametric methods.