Covid Definition:

COVID-19 is a mild to severe respiratory illness caused by a coronavirus. It is transmitted chiefly by contact with infectious material (such as respiratory droplets) or with objects or surfaces contaminated by the causative virus. It is characterized especially by fever, cough, and shortness of breath, and may progress to pneumonia and respiratory failure. Coronaviruses are a large family of viruses that cause illness ranging from the common cold to more severe diseases such as Middle East Respiratory Syndrome (MERS) and Severe Acute Respiratory Syndrome (SARS). 2019-nCoV is a new strain that had not previously been identified in humans and causes coronavirus disease (COVID-19).

Business Problem:

Artificial intelligence (AI) has the potential to revolutionize disease diagnosis and management by performing classifications that are difficult for human experts and by rapidly reviewing very large numbers of images. Chest X-rays may offer a better way to diagnose COVID.

Data Problem:

Build a deep learning model that learns patterns from chest X-rays and classifies each chest X-ray image as either normal or COVID.

Data Source:

The data comes from two datasets: a COVID dataset and a Pneumonia dataset. “Normal” class images are taken from the Pneumonia dataset.

The Pneumonia dataset is organized into 3 folders (train, test, val) and contains subfolders for each image category (Pneumonia/Normal). There are 5,863 X-Ray images (JPEG) and 2 categories (Pneumonia/Normal). 

Chest X-ray images (anterior-posterior) were selected from retrospective cohorts of pediatric patients of one to five years old from Guangzhou Women and Children’s Medical Center, Guangzhou. All chest X-ray imaging was performed as part of patients’ routine clinical care.

The COVID dataset is a database of COVID-19 cases with chest X-ray or CT images. It contains COVID-19 cases as well as MERS, SARS, and ARDS cases. It consists of a metadata.csv file and a folder of images. Here is a list of the metadata fields, with explanations:

– Patientid (internal identifier, just for this dataset)

– offset (number of days since the start of symptoms or hospitalization for each image. This is very important when there are multiple images for the same patient, because it allows progression to be tracked across images. If a report says “after a few days”, assume 5 days.)

– sex (M, F, or blank)

– age (age of the patient in years)

– finding (which pneumonia)

– survival (did they survive? Y or N)

– view (for example, PA, AP, or L for X-rays and Axial or Coronal for CT scans)

– modality (CT, X-ray, or something else)

– date (date the image was acquired)

– location (hospital name, city, state, country; the fields are ordered so that importance increases from left to right, with country the most important)

– filename

– doi (DOI of the research article)

– url (URL of the paper or website where the image came from)

– license

– clinical notes (about the radiograph in particular, not just the patient)

– other notes (e.g. credit)

Covid data source


The COVID images are selected by filtering metadata.csv. Images whose metadata row has the value ‘COVID-19’ in the ‘finding’ column are taken as COVID class images. This yields 351 COVID images in total from the COVID dataset.
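The filtering step described above can be sketched with Python's standard csv module. This is a minimal illustration, not the original code: the column names `finding` and `filename` come from the metadata field list, while the helper name `covid_filenames` and the sample rows are hypothetical.

```python
import csv
import io


def covid_filenames(csv_file):
    """Return the filenames of rows whose 'finding' column is 'COVID-19'."""
    reader = csv.DictReader(csv_file)
    return [
        row["filename"]
        for row in reader
        if row["finding"].strip() == "COVID-19"
    ]


# Hypothetical stand-in for metadata.csv with only a few of its columns.
sample = io.StringIO(
    "patientid,finding,modality,filename\n"
    "1,COVID-19,X-ray,a.jpg\n"
    "2,SARS,X-ray,b.jpg\n"
    "3,COVID-19,CT,c.jpg\n"
)

selected = covid_filenames(sample)
print(selected)
```

With the real file, the same function could be called on `open("metadata.csv", newline="")`. Further filters (for example on `modality` or `view`) could be added in the same condition if needed; the text above only states the filter on `finding`.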

1341 normal images are taken from the train and test folders of the Pneumonia dataset.

The combined 1692 images are split into train and test sets in an 80:20 ratio. 20% of the train set is then held out as a validation set.

Thus the training set has 1083 images covering both classes, the validation set has 270 images, and the test set has 339 images.
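The split sizes can be checked with a short sketch. The rounding convention here (round the test count up, round the validation count down) is an assumption chosen because it reproduces the reported counts; the original split code is not shown, and the helper `split_dataset` is hypothetical. The shuffle is not stratified by class.

```python
import math
import random


def split_dataset(items, test_frac=0.2, val_frac=0.2, seed=0):
    """Shuffle items, then split them into train, validation, and test lists."""
    items = list(items)
    random.Random(seed).shuffle(items)

    # Assumed rounding: test size rounded up, validation size rounded down.
    n_test = math.ceil(len(items) * test_frac)
    test, rest = items[:n_test], items[n_test:]

    n_val = math.floor(len(rest) * val_frac)
    val, train = rest[:n_val], rest[n_val:]
    return train, val, test


# 351 COVID images and 1341 normal images, represented by their labels only.
labels = ["covid"] * 351 + ["normal"] * 1341
train, val, test = split_dataset(labels)
print(len(train), len(val), len(test))  # 1083 270 339
```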

Deep Learning Approach:

Transfer learning was used: models pretrained on ImageNet serve as base models, new layers are built on top of them, and these new layers are trained with the train data while reusing the base models’ learned weights, giving the final trained models. The pretrained model used is DenseNet121.

The model has DenseNet121 as its base, with weights initialized from ‘imagenet’ and average pooling applied to its output. On top of it, two dense layers are added, with ReLU and sigmoid activation functions.
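The architecture described above might look roughly like the following Keras sketch. Several details are assumptions, not taken from the original: the 224x224x3 input size, the 64 hidden units, freezing the base model, the Adam optimizer, and the binary cross-entropy loss. The function name `build_model` is hypothetical.

```python
import tensorflow as tf


def build_model(weights="imagenet", input_shape=(224, 224, 3), hidden_units=64):
    """DenseNet121 base with average pooling, plus two dense layers on top."""
    base = tf.keras.applications.DenseNet121(
        include_top=False,      # drop the ImageNet classification head
        weights=weights,        # 'imagenet' loads the pretrained weights
        input_shape=input_shape,
        pooling="avg",          # global average pooling on the last feature map
    )
    base.trainable = False      # assumption: only the new layers are trained

    inputs = tf.keras.Input(shape=input_shape)
    x = base(inputs, training=False)
    x = tf.keras.layers.Dense(hidden_units, activation="relu")(x)
    # A single sigmoid unit gives the probability of one of the two classes.
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer="adam",
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Calling `build_model()` downloads the ImageNet weights on first use, while `build_model(weights=None)` builds the same architecture with random initialization. Input images would typically be passed through `tf.keras.applications.densenet.preprocess_input` before training or prediction.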

Model Performance: (Test performance)

Test Metrics: (339 images)

Accuracy of the model: 97.9%
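For reference, accuracy for a binary classifier with a sigmoid output can be computed as sketched below. The 0.5 decision threshold, the label encoding, and the sample values are assumptions made for illustration; they are not the model's actual predictions.

```python
def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of images whose thresholded prediction matches the label."""
    if len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must have the same length")
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical labels (1 = COVID, 0 = normal) and sigmoid outputs.
y_true = [1, 0, 0, 1, 0]
y_prob = [0.92, 0.10, 0.65, 0.80, 0.30]

acc = accuracy(y_true, y_prob)
print(f"{acc:.1%}")  # 80.0%
```

Because the two classes are imbalanced (351 COVID images versus 1341 normal images overall), per-class metrics such as precision and recall would complement the accuracy figure.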