predictive modeling

Getting Started with Python Integration to SAS Viya for Predictive Modeling - Comparing Logistic Regression and Decision Tree

Comparing Logistic Regression and Decision Tree - Which of our models is better at predicting our outcome? Learn how to compare models using misclassification, area under the curve (ROC) charts, and lift charts with validation data. In part 6 and part 7 of this series we fit a logistic regression [...]

Getting Started with Python Integration to SAS Viya for Predictive Modeling - Comparing Logistic Regression and Decision Tree was published on SAS Users.




predictive modeling

Getting Started with Python Integration to SAS Viya for Predictive Modeling - Fitting a Random Forest

Learn how to fit a random forest and use your model to score new data. In Part 6 and Part 7 of this series, we fit a logistic regression and decision tree to the Home Equity data we saved in Part 4. In this post we will fit a Random [...]

Getting Started with Python Integration to SAS Viya for Predictive Modeling - Fitting a Random Forest was published on SAS Users.




predictive modeling

Getting Started with Python Integration to SAS Viya for Predictive Modeling - Fitting a Gradient Boosting Model

Fitting a Gradient Boosting Model - Learn how to fit a gradient boosting model and use your model to score new data In Part 6, Part 7, and Part 9 of this series, we fit a logistic regression, decision tree and random forest model to the Home Equity data we [...]

Getting Started with Python Integration to SAS Viya for Predictive Modeling - Fitting a Gradient Boosting Model was published on SAS Users.




predictive modeling

Predictive Modeling of Type 1 Diabetes Stages Using Disparate Data Sources

This study aims to model genetic, immunologic, metabolomics, and proteomic biomarkers for development of islet autoimmunity (IA) and progression to type 1 diabetes in a prospective high-risk cohort. We studied 67 children: 42 who developed IA (20 of 42 progressed to diabetes) and 25 control subjects matched for sex and age. Biomarkers were assessed at four time points: earliest available sample, just prior to IA, just after IA, and just prior to diabetes onset. Predictors of IA and progression to diabetes were identified across disparate sources using an integrative machine learning algorithm and optimization-based feature selection. Our integrative approach was predictive of IA (area under the receiver operating characteristic curve [AUC] 0.91) and progression to diabetes (AUC 0.92) based on standard cross-validation (CV). Among the strongest predictors of IA were change in serum ascorbate, 3-methyl-oxobutyrate, and the PTPN22 (rs2476601) polymorphism. Serum glucose, ADP fibrinogen, and mannose were among the strongest predictors of progression to diabetes. This proof-of-principle analysis is the first study to integrate large, diverse biomarker data sets into a limited number of features, highlighting differences in pathways leading to IA from those predicting progression to diabetes. Integrated models, if validated in independent populations, could provide novel clues concerning the pathways leading to IA and type 1 diabetes.




predictive modeling

Predictive Modeling of ICU Healthcare-Associated Infections from Imbalanced Data. Using Ensembles and a Clustering-Based Undersampling Approach. (arXiv:2005.03582v1 [cs.LG])

Early detection of patients vulnerable to infections acquired in the hospital environment is a challenge in current health systems given the impact that such infections have on patient mortality and healthcare costs. This work is focused on both the identification of risk factors and the prediction of healthcare-associated infections in intensive-care units by means of machine-learning methods. The aim is to support decision making addressed at reducing the incidence rate of infections. In this field, it is necessary to deal with the problem of building reliable classifiers from imbalanced datasets. We propose a clustering-based undersampling strategy to be used in combination with ensemble classifiers. A comparative study with data from 4616 patients was conducted in order to validate our proposal. We applied several single and ensemble classifiers both to the original dataset and to data preprocessed by means of different resampling methods. The results were analyzed by means of classic and recent metrics specifically designed for imbalanced data classification. They revealed that the proposal is more efficient in comparison with other approaches.