Human Factors

The emergence and spread of infectious diseases, like COVID-19, are on the rise. Can you identify patterns between population density and COVID-19 cases and identify factors that could help predict hotspots of disease spread?

The Human Factor Theory

Summary

The random forest algorithm we developed predicts COVID-19 cases in an area from four parameters: longitude, latitude, population, and current cases. The algorithm is first trained on 60% of the dataset and then tested on the remaining 40%, and its accuracy is calculated from that test. A random forest constructs multiple decision trees and outputs the modal class (the class predicted by the most trees), which generally makes it more accurate and less prone to overfitting than a single decision tree.
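The 60/40 train-and-test procedure described above might look roughly like the following scikit-learn sketch. It is not the team's actual code. The column names (`longitude`, `latitude`, `population`, `current_cases`), the binary target `hotspot`, and the randomly generated data are all hypothetical placeholders. The target is assumed to be a class label because the summary describes the forest as outputting a modal class. If the real target was the raw case count, `RandomForestRegressor` and a regression metric would replace the classifier and accuracy score.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data standing in for the real dataset (not real figures).
rng = np.random.default_rng(seed=0)
n_regions = 500
df = pd.DataFrame({
    "longitude": rng.uniform(-140.0, -50.0, n_regions),
    "latitude": rng.uniform(42.0, 70.0, n_regions),
    "population": rng.integers(1_000, 3_000_000, n_regions),
    "current_cases": rng.integers(0, 20_000, n_regions),
})

# Hypothetical label: 1 = hotspot if cases per capita exceed the median, else 0.
cases_per_capita = df["current_cases"] / df["population"]
df["hotspot"] = (cases_per_capita > cases_per_capita.median()).astype(int)

features = ["longitude", "latitude", "population", "current_cases"]
X = df[features]
y = df["hotspot"]

# 60% of the rows are used for training, the remaining 40% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.6, test_size=0.4, random_state=42
)

# Fit a forest of 100 decision trees on the training split.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Accuracy is measured only on the held-out 40%.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.2%}")
```

Fixing `random_state` in both the split and the forest makes the run reproducible, which helps when comparing feature sets.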

How We Addressed This Challenge

The Human Factor Theory addresses the Human Factors challenge with an algorithm that predicts the number of COVID-19 cases in a region from the data supplied for its input parameters. This allows officials to anticipate potential surge areas and take control of the situation before an area becomes a hotspot. In doing so, the Human Factor Theory meets the challenge's deliverable of forecasting coronavirus cases in regions based on the behavior of their populations.

How We Developed This Project

Motivation:

Living in a nation where we've all walked down streets so crowded that it's nearly impossible to ride even a motorcycle through them, we have seen first-hand the havoc COVID-19 is wreaking. Such areas are prone to diseases spreading like wildfire; a prime example is Mumbai, Maharashtra, India's "City of Dreams." This beloved city is the 7th most populous city in the world and the 2nd most populous in India after Delhi, and it has been brought to its knees by this pandemic. To do our part in helping the people and families behind these statistics, Team Clusteris chose this challenge. Identifying hotspots, and understanding how human factors drive the spread of the disease, is the need of the hour. In the midst of a pandemic, it is crucial to identify surge areas in order to control the outbreak, especially as many countries look to lift their lockdowns.

Approach: We evaluated several datasets and built a predictive model to forecast area-based hotspots. We chose a random forest algorithm because it combines the predictions of many decision trees, which generally makes it more accurate and less prone to overfitting than a single decision tree. We achieved an accuracy of 95%.
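To make the "multiple decision trees" idea concrete, the following sketch fits a small forest on hypothetical synthetic data (four features, loosely mirroring longitude, latitude, population, and current cases), collects each tree's vote, and takes the modal class per sample. One caveat: scikit-learn's `RandomForestClassifier.predict` averages the trees' class probabilities (soft voting) rather than counting hard votes, so the hand-computed mode can occasionally disagree with `predict` on close calls.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical synthetic data: 4 features and a binary "hotspot" label.
X, y = make_classification(
    n_samples=300, n_features=4, n_informative=3, n_redundant=1, random_state=0
)

# An odd number of trees avoids ties in a two-class hard vote.
forest = RandomForestClassifier(n_estimators=25, random_state=0)
forest.fit(X, y)

# Collect each tree's vote. Trees inside the forest are trained on encoded class
# indices, so their outputs are mapped back to labels through forest.classes_.
votes = np.array([
    forest.classes_[tree.predict(X).astype(int)] for tree in forest.estimators_
])  # shape: (n_trees, n_samples)

# Modal class per sample: the label that received the most tree votes.
modal_class = np.array([np.bincount(column).argmax() for column in votes.T])

# Compare the hard-vote mode with the forest's own (probability-averaged) prediction.
agreement = np.mean(modal_class == forest.predict(X))
print(f"Hard-vote mode matches forest.predict on {agreement:.1%} of samples")
```

Averaging many trees trained on different bootstrap samples is what reduces the variance of a single deep tree; it costs more computation, not less.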

Space Agency Data: Data from the Canadian Space Agency (CSA) was used to test the model on various Canadian provinces. Additional COVID-19 case data was obtained from data.world, a platform that aggregates data from multiple sources such as Johns Hopkins, the Harvard Global Health Institute, the WHO, and others.

Coding tools and languages:

  1. Language: Python
  2. Libraries: Pandas, NumPy, Matplotlib, scikit-learn (sklearn)
  3. Algorithm: Random forest

Problems & Achievements:

The first challenge we faced was finding a dataset that fit our requirements. It took us the better part of the first day to narrow our search down to the resources we currently use.

While running the algorithm, we ran into technical issues: the laptop running it crashed twice because it lacked the computing power to handle the datasets we were using. This was the clearest warning against adding more parameters or sourcing additional datasets.

Project Demo

https://www.canva.com/design/DAD92T7mRYo/8QlX1PLY4CbLF1e9D_4aog/view?utm_content=DAD92T7mRYo&utm_campaign=designshare&utm_medium=link&utm_source=sharebutton

Tags
#predictive model #covid19 #machine learning #hotspots
Global Judging
This project was submitted for consideration during the Space Apps Global Judging process.