The number of asthma patients is increasing, and the severity of the condition varies from person to person. Over the next 20 years, the cost of asthma is likely to exceed $1.5 trillion.
It is important to understand severity so that patients with severe exacerbation asthma (SEA) can be referred to a pulmonologist for further treatment. Correctly identifying SEA patients is crucial because treatment is very expensive and very few pulmonologists are available. It is also important to prioritize patients by the severity of their disease; in other words, they should be ranked by their need and urgency to consult a pulmonologist. Our business problem involved identifying SEA patients and ranking them.
How we used Machine Learning to predict Asthma
We built machine learning models to automatically predict and classify the condition and severity of asthma in patients; in other words, to identify SEA patients among asthma patients using binary classification models. The next goal is to rank the SEA patients by the severity of their condition so that they can be referred to pulmonologists in order of rank.
MIMIC-III (Medical Information Mart for Intensive Care III) is a large, freely-available database comprising de-identified health-related data associated with over forty thousand patients who stayed in critical care units of the Beth Israel Deaconess Medical Center between 2001 and 2012.
The database for predicting severe asthma included information such as demographics, vital sign measurements made at the bedside (~1 data point per hour), laboratory test results, procedures, medications, caregiver notes, imaging reports, and mortality (both in and out of hospital).
This step involves extracting asthma-related data from the MIMIC-III database. We tried the different approaches described below:
We retrieved patient-level data based on ICD codes starting with 493. Samples with code 49392 are labelled as SEA (positive samples with severe asthma). Only one admission per patient is taken; if any of a patient's admissions has code 49392, that sample is labelled as having SEA. However, this approach missed some asthma-related data and also included irrelevant data.
There are 214 admissions in the database with an ‘asthma’ condition as the primary diagnosis (icd9_code like ‘493%’ AND diagnoses seq_num = 1). Records with a null severity level are used as scoring data, as they have no target label. 156 records have severity levels defined, on a four-level scale from 1 to 4. Levels 1 and 2 are considered not severe, and levels 3 and 4 are considered severe asthma records.
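The primary-diagnosis filter above can be sketched with pandas; the dataframe below is a toy stand-in for MIMIC-III's diagnoses table, with column names assumed to mirror the MIMIC-III schema and illustrative values only:

```python
import pandas as pd

# Toy stand-in for MIMIC-III diagnosis rows (illustrative values only)
diagnoses = pd.DataFrame({
    "hadm_id":   [100, 100, 101, 102, 103],
    "icd9_code": ["49392", "4280", "49390", "25000", "49322"],
    "seq_num":   [1, 2, 1, 1, 1],
})

# Asthma as primary diagnosis: icd9_code LIKE '493%' AND seq_num = 1
asthma_admissions = diagnoses[
    diagnoses["icd9_code"].str.startswith("493") & (diagnoses["seq_num"] == 1)
]
print(sorted(asthma_admissions["hadm_id"]))
```

On the real database, the same condition applied to the diagnosis table yields the 214 admissions mentioned above.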
The following tables are used to derive the final features fed to the machine learning models: diagnosis_icd, admissions, prescriptions, icustays, drgcodes, noteevents.
From diagnosis_icd table,
– item_code_count (derived) – count of distinct item codes per admission
From admissions table,
From prescriptions table,
– different_drug_count_prescriptions (derived) – number of distinct drugs
From icustays table,
– los_sum_icustays (derived) – sum of los
From drgcodes table,
– drg_severity (used to derive the final target; not used as a feature)
From patients table,
– age (derived using admission time and dob)
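The derived per-admission features above can be sketched with pandas group-bys; the dataframes below are toy stand-ins with column names assumed to mirror the MIMIC-III schema:

```python
import pandas as pd

# Toy inputs (illustrative values only)
diagnoses = pd.DataFrame({"hadm_id": [1, 1, 2],
                          "icd9_code": ["49392", "4280", "49390"]})
prescriptions = pd.DataFrame({"hadm_id": [1, 1, 1, 2],
                              "drug": ["albuterol", "prednisone",
                                       "albuterol", "albuterol"]})
icustays = pd.DataFrame({"hadm_id": [1, 2, 2], "los": [2.5, 1.0, 3.0]})

# item_code_count: count of distinct ICD codes per admission
item_code_count = diagnoses.groupby("hadm_id")["icd9_code"].nunique()

# different_drug_count_prescriptions: number of distinct drugs per admission
drug_count = prescriptions.groupby("hadm_id")["drug"].nunique()

# los_sum_icustays: sum of ICU length of stay per admission
los_sum = icustays.groupby("hadm_id")["los"].sum()

features = pd.concat(
    [item_code_count.rename("item_code_count"),
     drug_count.rename("different_drug_count_prescriptions"),
     los_sum.rename("los_sum_icustays")],
    axis=1,
)
print(features)
```

Each resulting row is one admission, which is the unit of modelling throughout.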
The target variable is target_asthma_status.
Text from the clinical notes in the noteevents table is used for NLP feature extraction. In addition, common lab events related to asthma are listed, and their values are used as features:
– BaseExcess, CalculatedTotalCO2, pCO2, pH, pO2, AnionGap, Bicarbonate, Calcium, Chloride, Creatinine, Glucose, Magnesium, Phosphate, Potassium, Sodium, UreaNitrogen, Basophils, Eosinophils, Hematocrit, Hemoglobin, Lymphocytes, MCH, MCHC, MCV, Monocytes, Neutrophils, Platelet Count, RDW, Red Blood Cells, White Blood Cells
The most commonly prescribed drugs for asthma are listed, and the number of times each drug was prescribed for a patient during an admission is taken as a feature (from the prescriptions table):
– advairdiskus, albuterol, albuterol0083nebsoln, albuterol05, albuterolinhaler, albuterolmdi, albuterolnebsoln, albuterolsulfate, albuterol-ipratropium, beclomethasonediprooral, beclomethasonediproaqnasal, beclomethasonedipropionate, budesonide, cetirizine-pseudoephedrine, cromolynsodium, dexamethasone, epinephrine1:1000, epinephrineinhalation, fluticasonepropionate110mcg, fluticasonepropionatenasal, fluticasone-salmeterol, fluticasone-salmeterol100/50, fluticasone-salmeterol250/50, fluticasone-salmeteroldiskus100/50, fluticasone-salmeteroldiskus250/50, fluticasone-salmeteroldiskus500/50, hydrocortisone, hydrocortisonenasucc, ipratropiumbromide, ipratropiumbromidemdi, ipratropiumbromideneb, levalbuterolhcl, levalbuterolneb, methylprednisolonenasucc, methylprednisolonesodiumsucc, montelukastsodium, norepinephrine, prednisone, pseudoephedrinehcl, racepinephrine, salmeterol, salmeterolxinafoatediskus50mcg, terbutalinesulfate, theophyllineoralsolution, theophyllinesr, tobramycin-dexamethasoneophthoint, xopenex, xopenexhfa, xopenexneb
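Turning this drug list into per-admission count features can be sketched with a pandas cross-tabulation; the list below is abbreviated and the prescription rows are toy data:

```python
import pandas as pd

# Abbreviated version of the asthma drug list above
asthma_drugs = ["albuterol", "prednisone", "montelukastsodium"]

prescriptions = pd.DataFrame({
    "hadm_id": [1, 1, 1, 2],
    "drug": ["albuterol", "albuterol", "prednisone", "montelukastsodium"],
})

# One feature column per drug: times prescribed during the admission;
# reindex guarantees a column (filled with 0) for every listed drug
relevant = prescriptions[prescriptions["drug"].isin(asthma_drugs)]
counts = (pd.crosstab(relevant["hadm_id"], relevant["drug"])
          .reindex(columns=asthma_drugs, fill_value=0))
print(counts)
```

The same pattern applies to the lab-event values, with a mean or last-observed aggregation instead of a count.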
– The raw data is transformed into data with the features listed above, and this data is used for modelling.
– The 156 records with a defined severity level are split into train and test sets in an 80:20 ratio, using stratified random sampling on the target variable with random seed 13.
– Upsampling is applied to the train data of 124 samples; the upsampled data has 180 records, with 45 samples in each of the 4 severity levels.
– The target variable has 2 classes, formed by combining severity levels 1 and 2 into class 0 and severity levels 3 and 4 into class 1.
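The preparation steps above can be sketched with scikit-learn; the dataframe is a synthetic stand-in for the 156 labelled records, and the upsampling brings each severity level up to the majority count (with the real data this gave 45 per level, 180 rows in total; the toy data here will give different counts):

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

rng = np.random.default_rng(13)
# Synthetic stand-in for the 156 labelled records with severity levels 1-4
df = pd.DataFrame({"feat": rng.normal(size=156),
                   "severity": rng.integers(1, 5, size=156)})

# Binary target: levels 1-2 -> class 0 (not severe), levels 3-4 -> class 1
df["target_asthma_status"] = (df["severity"] >= 3).astype(int)

# Stratified 80:20 split with a fixed seed
train, test = train_test_split(df, test_size=0.2, random_state=13,
                               stratify=df["target_asthma_status"])

# Upsample each severity level in the train data to the majority count
target_n = train["severity"].value_counts().max()
train_up = pd.concat([
    resample(group, replace=True, n_samples=target_n, random_state=13)
    for _, group in train.groupby("severity")
])
print(len(train), len(test), len(train_up))
```

Upsampling only the training split (never the test split) keeps the held-out evaluation honest.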
– Explored a few classifier algorithms from scikit-learn: Multinomial Naive Bayes, Logistic Regression, SVM (linear kernel), SVM (RBF kernel), and RandomForestClassifier.
– The best classifier is chosen through parameter tuning and 5-fold cross-validation on the train data.
– Currently using RandomForestClassifier, with different experiments in feature selection, scaling, NLP features, etc.
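The tuning step can be sketched with scikit-learn's GridSearchCV; the data and the parameter grid here are illustrative, not the grid actually used:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Toy binary classification data standing in for the engineered features
X, y = make_classification(n_samples=180, n_features=20, random_state=13)

# 5-fold cross-validated grid search over a small RandomForest grid
grid = GridSearchCV(
    RandomForestClassifier(random_state=13),
    param_grid={"n_estimators": [100, 300], "max_depth": [None, 5]},
    cv=5,
    scoring="roc_auc",
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```

The same harness works for the other classifiers listed above; only the estimator and the grid change.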
– Modelling on data with no NLP features
– Modelling on data with NLP features generated using TF-IDF (text preprocessed with NLTK): all sections of discharge notes, and selected sections of discharge notes
– Modelling on data with NLP features generated using TF-IDF (text preprocessed with NLTK): all sections of discharge notes, and selected sections of discharge notes, along with the other features
– Modelling on data with NLP features generated using TF-IDF (text preprocessed with spaCy)
– Modelling on data with NLP features generated using TF-IDF (text preprocessed with spaCy), along with the other features
– Modelling on data with NLP features generated using a Word2Vec model (trained on discharge notes from MIMIC data)
– Modelling on data with NLP features generated using a Word2Vec model (trained on discharge notes from MIMIC data), along with the other features
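The TF-IDF feature generation in these experiments can be sketched with scikit-learn's TfidfVectorizer; the note texts are invented, and the vectorizer's built-in tokenization stands in for the NLTK/spaCy preprocessing:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Invented snippets standing in for discharge-note text
notes = [
    "patient admitted with acute asthma exacerbation, started on albuterol",
    "wheezing improved after nebulizer treatment, discharged on prednisone",
]

# TF-IDF features over the note text; cap the vocabulary so the feature
# matrix stays manageable relative to ~156 labelled records
vectorizer = TfidfVectorizer(stop_words="english", max_features=500)
X_text = vectorizer.fit_transform(notes)
print(X_text.shape)
```

The resulting sparse matrix can be used alone or horizontally stacked with the structured features for the combined experiments.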
– To report the performance of a machine learning model to stakeholders, we need more than just accuracy.
– Recall (sensitivity), also reported as positive percent agreement (PPA), is the most useful metric for this use case.
– The rule is that it is better to flag irrelevant records that can be discarded upon review than to miss any relevant patients.
– Another score widely used in the healthcare literature is AUC (sometimes called the c-statistic), which measures how well the model separates positive cases from negative cases across all classification thresholds.
– Precision evaluates how many of the cases the model labelled positive are truly positive, recall evaluates how many of the truly positive cases the model labelled positive, and the AUC score summarizes this trade-off across thresholds.
– Rather than presenting just a single error score (accuracy error), a confidence interval can be calculated and presented as part of the model skill.
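The confidence interval on accuracy can be computed with the normal approximation to the binomial distribution; the 0.80 accuracy below is just an example value, not a reported result:

```python
import math

def accuracy_ci(accuracy, n, z=1.96):
    """Confidence interval for accuracy via the normal approximation to
    the binomial (z=1.96 gives a 95% interval); reasonable when both
    n*accuracy and n*(1-accuracy) are not too small."""
    half_width = z * math.sqrt(accuracy * (1 - accuracy) / n)
    return accuracy - half_width, accuracy + half_width

lo, hi = accuracy_ci(0.80, n=32)
print(f"accuracy 0.80 on 32 test samples: 95% CI ({lo:.2f}, {hi:.2f})")
```

With only 32 test samples the interval is wide, which is exactly why reporting it alongside the point estimate matters.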
Train and Test Data sets:
|            | Total Samples               | Class 1 | Class 0 |
|------------|-----------------------------|---------|---------|
| Train Data | 124 (80%), upsampled to 180 | 90      | 90      |
| Test Data  | 32 (20%)                    | 16      | 16      |
The performance measures for the model on the test data are shown in the tables below. The columns show the modelling method, the 95% confidence interval for the accuracy error, the roc_auc_score, and the accuracy score.
The experiment setup follows the research paper Classification of Asthma Severity. However, with the same setup, the number of samples in our data is smaller than what is reported in the paper, so the accuracy obtained may be lower than what was achieved there. The average accuracy, sensitivity, and specificity reported for the experiments in the paper are 91.1%, 93%, and 89.3% respectively.