Practical Machine Learning for Prognosis: A Case for AUC and Hepatitis Guideline Driven Inputs

Written by SwitchPoint Ventures

At SwitchPoint, we partner with business leaders across industries to drive rapid technological innovation. Give us your toughest challenge and we’ll turn it into your biggest opportunity.


In this paper, co-authored by SwitchPoint’s Chief Data Scientist, Damian Mingle, machine learning was used to predict hepatitis-caused mortality in a way that is both analytically robust and clinically relevant. Specifically, the predictor variables in the dataset were used to generate new features that encode clinical guidelines on test result thresholds and risk levels. The model was optimized to maximize the area under the receiver operating characteristic curve (AUC), a metric that is arguably more clinically relevant than accuracy alone. Finally, the models were evaluated on a holdout sample that was never part of any training set, which serves as a proxy for expected out-of-sample performance. Using this guideline-inspired feature engineering together with a model ensembling approach, our model achieved an AUC of 98% and an accuracy of 96.77% on the holdout data. Based on these results, our approach may serve as a first step toward integrating machine learning models with clinical decision-making.
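
To make the two main ideas concrete, here is a minimal Python sketch of guideline-style feature engineering and of an AUC calculation on held-out rows. It is an illustration under stated assumptions, not the pipeline used in the paper: the lab tests chosen, the cut-off values, the function names, and the toy rows are all hypothetical, and a simple count of risk flags stands in for the ensembled model.

```python
from typing import Sequence

# Hypothetical cut-offs for illustration only; they are not taken from the
# paper or from any specific clinical guideline.
BILIRUBIN_HIGH_MG_DL = 1.2
ALBUMIN_LOW_G_DL = 3.5


def guideline_features(bilirubin: float, albumin: float) -> dict:
    """Turn raw lab values into binary threshold flags plus a simple risk count."""
    high_bilirubin = int(bilirubin > BILIRUBIN_HIGH_MG_DL)
    low_albumin = int(albumin < ALBUMIN_LOW_G_DL)
    return {
        "high_bilirubin": high_bilirubin,
        "low_albumin": low_albumin,
        "risk_flags": high_bilirubin + low_albumin,
    }


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the chance that a random positive case is scored above a
    random negative case, with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up holdout rows: (bilirubin mg/dL, albumin g/dL, died).
holdout = [
    (0.7, 4.2, 0),
    (1.0, 3.9, 0),
    (2.3, 3.8, 1),
    (4.6, 2.9, 1),
    (0.9, 3.1, 0),
    (1.5, 3.0, 1),
]

labels = [died for _, _, died in holdout]
# The risk-flag count stands in for a trained model's predicted score.
scores = [guideline_features(b, a)["risk_flags"] for b, a, _ in holdout]
auc = roc_auc(labels, scores)
print(f"Holdout AUC on toy data: {auc:.3f}")
```

Because AUC depends only on how cases are ranked, it does not require choosing a single decision threshold, which is one reason it can be a more informative summary than accuracy when outcomes such as mortality are imbalanced.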