A Common Data Model for Machine Learning in Healthcare
Most healthcare data, such as EHR, claims, and socioeconomic data, is collected to serve a specific practice domain, so the sources are disparate and diverse. This makes them difficult to work with together. To collect and maintain all of these data types in a single place for analytics and machine learning, we need a common data model.
The Predera Data Model (PDM) is a healthcare common data model designed to facilitate data interoperability. It is built on top of the OMOP CDM, extended using the FHIR data standard, and used to support observational analytics on healthcare data.
The OMOP CDM enables systematic analysis of disparate observational databases by harmonizing their different coding systems into a standardized vocabulary with minimal information loss.
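To make the harmonization idea concrete, here is a minimal Python sketch of mapping codes from different coding systems onto one standard concept. It assumes a tiny in-memory lookup table. In an actual OMOP deployment these pairs would come from the vocabulary tables (CONCEPT and CONCEPT_RELATIONSHIP, via "Maps to" relationships), and the concept ID 900001 used here is a made-up placeholder, not a real OMOP ID. Keeping the original code next to the standard ID is what limits information loss.

```python
from typing import Dict, List, Tuple

# Hypothetical placeholder mapping: (source vocabulary, source code) -> standard concept_id.
# Real OMOP mappings come from the CONCEPT / CONCEPT_RELATIONSHIP tables.
SOURCE_TO_STANDARD: Dict[Tuple[str, str], int] = {
    ("ICD10CM", "E11.9"): 900001,   # illustrative only
    ("ICD9CM", "250.00"): 900001,   # illustrative only
    ("LOCAL", "DM2"): 900001,       # a site-specific code, illustrative only
}

# OMOP convention: concept_id 0 means "no matching concept".
NO_MATCHING_CONCEPT = 0


def harmonize(records: List[Tuple[str, str]]) -> List[dict]:
    """Map each (vocabulary, code) pair to a standard concept, keeping the source value."""
    rows = []
    for vocabulary, code in records:
        key = (vocabulary, code.strip().upper())
        rows.append({
            "condition_concept_id": SOURCE_TO_STANDARD.get(key, NO_MATCHING_CONCEPT),
            "condition_source_value": code,  # original code retained to limit information loss
        })
    return rows


rows = harmonize([("ICD10CM", "E11.9"), ("ICD9CM", "250.00"), ("LOCAL", "dm2"), ("LOCAL", "XYZ")])
for row in rows:
    print(row)
```

Three differently coded inputs land on the same standard concept, while the unknown local code falls back to concept 0 with its source value preserved for later review.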
To populate the PDM:
- First, EHR data in various formats, such as HL7, FHIR, and CSV files, is collected in a data warehouse (DW). The DW also serves as a historical data store and supplies data to other applications.
- Based on the type and schema of the data, each field is first mapped to its equivalent FHIR element to create FHIR resources.
- This brings all the standardized and non-standardized data into FHIR, a healthcare standard widely used in the industry, so that FHIR's additional features can be leveraged as well.
- These FHIR resources are then mapped to the fields of the pre-built PDM schema to populate it.
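The two mapping steps above (source field to FHIR resource, then FHIR resource to PDM table) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the PDM's actual mapping code. The CSV layout (mrn, sex, dob), the identifier system URI `urn:example:mrn`, and the assumption that the PDM person table mirrors the OMOP CDM PERSON table are all hypothetical. The FHIR gender codes and the OMOP gender concept IDs 8507 (MALE) and 8532 (FEMALE) are the standard ones.

```python
import csv
import io
from datetime import date

# Hypothetical CSV export from one source system.
RAW_CSV = """mrn,sex,dob
A123,M,1980-04-02
B456,F,1975-11-30
"""

# Source sex codes -> FHIR administrative-gender codes.
SEX_TO_FHIR = {"M": "male", "F": "female", "O": "other", "U": "unknown"}

# FHIR gender -> OMOP gender concept IDs; anything else maps to 0 (no matching concept).
FHIR_GENDER_TO_CONCEPT = {"male": 8507, "female": 8532}


def row_to_fhir_patient(row: dict) -> dict:
    """Step 1: map source fields onto a FHIR Patient resource (as a plain dict)."""
    return {
        "resourceType": "Patient",
        "identifier": [{"system": "urn:example:mrn", "value": row["mrn"]}],  # hypothetical system
        "gender": SEX_TO_FHIR.get(row["sex"].strip().upper(), "unknown"),
        "birthDate": row["dob"],  # assumes full YYYY-MM-DD dates
    }


def fhir_patient_to_person(patient: dict, person_id: int) -> dict:
    """Step 2: map the FHIR Patient onto an OMOP-style PERSON row (assumed PDM layout)."""
    dob = date.fromisoformat(patient["birthDate"])
    return {
        "person_id": person_id,
        "gender_concept_id": FHIR_GENDER_TO_CONCEPT.get(patient["gender"], 0),
        "year_of_birth": dob.year,
        "month_of_birth": dob.month,
        "day_of_birth": dob.day,
        "person_source_value": patient["identifier"][0]["value"],
        "gender_source_value": patient["gender"],
    }


patients = [row_to_fhir_patient(r) for r in csv.DictReader(io.StringIO(RAW_CSV))]
persons = [fhir_patient_to_person(p, i) for i, p in enumerate(patients, start=1)]
for person in persons:
    print(person)
```

Routing every source through FHIR first means only one FHIR-to-PDM mapping has to be maintained, no matter how many source formats feed the warehouse.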
Apache Spark is used for all of this ETL, data stitching, and related processing.
- Apache Spark is a high-performance, in-memory, parallel processing engine that can read data from data lakes, data warehouses, relational databases, and other big data sources.
- An application built with Spark can also connect to APIs exposed by EMR systems, or read data exported from the EMRs to various locations.
- Transformations ranging from simple to complex can be performed here before the data is finally loaded into the PDM.
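Stitching often means combining feeds that describe the same patient. In Spark this would typically be a union of DataFrames (for example with `unionByName`) followed by `row_number()` over a `Window.partitionBy(...).orderBy(...)` to keep one record per key. The sketch below reproduces that logic in plain Python so it runs without a cluster. The field names (`mrn`, `updated`, `source`) and the assumption that both feeds share a patient key are hypothetical.

```python
from typing import Dict, Iterable, List


def stitch_latest(*sources: Iterable[dict]) -> List[dict]:
    """Union several record feeds and keep the most recently updated record per patient key.

    Plain-Python stand-in for a Spark union + window/row_number deduplication.
    Assumes every feed uses the same (hypothetical) 'mrn' key and ISO-8601 'updated' dates,
    which compare correctly as strings.
    """
    latest: Dict[str, dict] = {}
    for source in sources:
        for record in source:
            key = record["mrn"]
            current = latest.get(key)
            if current is None or record["updated"] > current["updated"]:
                latest[key] = record
    return sorted(latest.values(), key=lambda r: r["mrn"])


# Hypothetical feeds from two upstream systems.
ehr_feed = [
    {"mrn": "A123", "updated": "2021-03-01", "source": "ehr"},
    {"mrn": "B456", "updated": "2021-01-15", "source": "ehr"},
]
claims_feed = [
    {"mrn": "A123", "updated": "2021-02-10", "source": "claims"},
    {"mrn": "C789", "updated": "2021-04-20", "source": "claims"},
]

stitched = stitch_latest(ehr_feed, claims_feed)
for record in stitched:
    print(record)
```

"Latest record wins" is only one possible stitching rule. Real pipelines may instead merge fields or apply source-priority rules, and usually need identity resolution when sources do not share a key.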
Because the Predera Data Model (PDM) implicitly contains all the resources supported by FHIR, an industry standard, it can work with any data format that can be mapped to FHIR.
It is designed to evolve as new data sources and formats are brought in to support additional applications and use cases.
Using the data in the PDM, components such as feature stores, machine learning models, and rule engines can be built, and a variety of analytics can be performed.
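As one example of building on PDM data, a feature-store job might aggregate condition records into per-patient features. The sketch below assumes the PDM exposes OMOP-style CONDITION_OCCURRENCE rows. The rows, the target concept ID, and the feature names are all made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical rows shaped like a subset of OMOP CONDITION_OCCURRENCE columns.
condition_occurrence = [
    {"person_id": 1, "condition_concept_id": 900001, "condition_start_date": "2020-05-01"},
    {"person_id": 1, "condition_concept_id": 900002, "condition_start_date": "2021-01-12"},
    {"person_id": 1, "condition_concept_id": 900001, "condition_start_date": "2021-06-30"},
    {"person_id": 2, "condition_concept_id": 900002, "condition_start_date": "2019-09-09"},
]

TARGET_CONCEPT_ID = 900001  # hypothetical condition of interest


def build_features(rows: List[dict]) -> Dict[int, dict]:
    """Aggregate condition records into simple per-person features."""
    concepts = defaultdict(set)   # person_id -> distinct condition concepts
    counts = defaultdict(int)     # person_id -> number of condition records
    for row in rows:
        concepts[row["person_id"]].add(row["condition_concept_id"])
        counts[row["person_id"]] += 1
    return {
        person_id: {
            "n_condition_records": counts[person_id],
            "n_distinct_conditions": len(concepts[person_id]),
            "has_target_condition": int(TARGET_CONCEPT_ID in concepts[person_id]),
        }
        for person_id in concepts
    }


features = build_features(condition_occurrence)
for person_id, feature_row in features.items():
    print(person_id, feature_row)
```

Because the features are keyed on standard concept IDs rather than source codes, the same job would work unchanged across every source that has been harmonized into the model.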