
To gain insight into the human factors most essential to Covid-19's infection rate, we gathered diverse datasets encompassing demographics, NASA/JAXA satellite-based data, government policies, mobility, and historical Covid-19 data, among other sources. Due to time constraints, we focused only on county-level data from the United States. We then created a Long Short-Term Memory (LSTM) machine learning model to predict future "hotspot indices," which forecast outbreaks at the county level. This would allow citizens and local governments to take appropriate action ahead of time while avoiding a "one size fits all" approach across counties. In other words, the model's main benefit is that it is specific enough to pinpoint county-level outbreaks, allowing localities with a low hotspot index to re-open while keeping others with higher values closed.
With 20% unemployment and over 100,000 deaths in the U.S., the impact of the Coronavirus has been devastating. Since the majority of us are Asian Americans, we were also significantly affected by negative racial stereotypes. Because of these unfortunate factors, we wanted to use our knowledge to make a positive and substantial impact on the worldwide fight against the Coronavirus.
In this project, we decided to focus on an initiative that would mitigate a much-feared "second wave" of cases, as weakening this phenomenon would reduce the societal toll. To do this, we looked at geographic and temporal patterns, which are vital to examine before the next major outbreak. Understanding the relationship between the Coronavirus and different factors (pollution, environment, ethnicity and communities, government action, and health) can guide more impactful and precise actions to stifle the spread of Covid-19.
More specifically, we gathered satellite data of UV indices, NO2 emissions, and land surface temperatures from NASA's Earthdata Search (OMI/Aura and MODIS/Aqua). We also used precipitation data from JAXA's GCOM-W1, night-light data from NASA's SNPP (Suomi National Polar-orbiting Partnership) VIIRS (Visible Infrared Imaging Radiometer Suite) instrument, as well as non-time-series data such as the socioeconomic and health rankings data from the Robert Wood Johnson Foundation. We then processed the data with Python using libraries such as h5py and Pandas. Because Covid-19 was first identified in late December 2019, we filtered and obtained the satellite data from January through the end of May.
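The h5py-to-Pandas step above can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: the dataset name "NO2_column" and the fill-value handling are stand-ins (real OMI/GCOM-W granules use instrument-specific group paths), so a tiny synthetic HDF5 file is created first to keep the example self-contained.

```python
import h5py
import numpy as np
import pandas as pd

# Create a tiny synthetic HDF5 file standing in for a satellite granule.
# "NO2_column" is an illustrative name, not the real OMI dataset path.
with h5py.File("granule.h5", "w") as f:
    ds = f.create_dataset("NO2_column", data=np.array([[1.0, 2.0], [-999.0, 4.0]]))
    ds.attrs["_FillValue"] = -999.0

# Read it back, mask fill values as NaN, and flatten into a tidy DataFrame.
with h5py.File("granule.h5", "r") as f:
    ds = f["NO2_column"]
    fill = ds.attrs["_FillValue"]
    grid = ds[...]
grid = np.where(grid == fill, np.nan, grid)

df = pd.DataFrame({
    "row": np.repeat(np.arange(grid.shape[0]), grid.shape[1]),
    "col": np.tile(np.arange(grid.shape[1]), grid.shape[0]),
    "no2": grid.ravel(),
})
```

Once each granule is a DataFrame, the per-county aggregation and merging with non-spatial sources becomes ordinary Pandas work.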
Upon gathering and processing all the data, we created a Long Short-Term Memory (LSTM) machine learning model using TensorFlow, which served as the foundation for our interactive choropleth map. We trained the model on Google Colab to predict the future severity of outbreaks as measured by our "hotspot index." This metric is a combination of the incidence rate, mortality rate, new-case growth percentages, and total confirmed cases; it is predicted using data from the prior two weeks. Using HTML, CSS, JavaScript, and Leaflet JS, we created an interactive county-level choropleth map that shows the user the current number of cases per 1,000 residents as well as the model's predicted "hotspot index."
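The model described above (two weeks of daily features in, one hotspot index out) can be sketched in TensorFlow as below. Layer sizes, the feature count, and the random stand-in data are illustrative assumptions, not the tuned values from the actual project.

```python
import numpy as np
import tensorflow as tf

WINDOW = 14      # days of history fed to the model (the "prior two weeks")
N_FEATURES = 6   # e.g. cases, mobility, NO2, UV, temperature, precipitation

# Minimal sketch of the architecture; 32 units is an arbitrary choice here.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(WINDOW, N_FEATURES)),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(1),  # predicted hotspot index for the county
])
model.compile(optimizer="adam", loss="mae")

# Synthetic stand-in data: 100 county-windows of 14 daily feature vectors.
X = np.random.rand(100, WINDOW, N_FEATURES).astype("float32")
y = np.random.rand(100, 1).astype("float32")
model.fit(X, y, epochs=1, verbose=0)
pred = model.predict(X[:1], verbose=0)  # one index per input window
```

In production, the predicted indices per county would then be serialized (e.g. to JSON) and joined to county geometries for the Leaflet choropleth.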
One of the most difficult challenges we faced was cleaning the data and transforming it into Pandas DataFrames that could be processed correctly. For example, we discovered that over 40% of the precipitation data from JAXA's GCOM-W1 was missing for the U.S. To resolve this issue, we filled each blank value with the next available value. Devising the hotspot index was also a challenge: the process involved fine-tuning parameters to minimize error, avoid underfitting and overfitting, and ensure that the model followed prevailing scientific knowledge. Additionally, the machine learning models we created occasionally produced invalid values, failed to record values for some counties, and had outliers that made us doubt our approach. We almost abandoned the machine learning model in favor of a statistical regression model. Nonetheless, during the final two days, we trained the model repeatedly until its error, measured by mean absolute percentage error, was small enough.
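Two of the steps above are compact in Pandas/NumPy: filling each gap with the next available value is a backward fill, and mean absolute percentage error is a one-line formula. The example series and values below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Daily precipitation series with gaps, mimicking the missing GCOM-W1 data.
precip = pd.Series([0.2, np.nan, np.nan, 1.5, np.nan, 0.0])

# Fill each gap with the next available observation (backward fill),
# the strategy described above; interpolation would be an alternative.
filled = precip.bfill()  # [0.2, 1.5, 1.5, 1.5, 0.0, 0.0]

# Mean absolute percentage error, the metric used to judge the model.
def mape(actual, predicted):
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return np.mean(np.abs((actual - predicted) / actual)) * 100

err = mape([100, 200], [92, 210])  # 6.5 (percent)
```

Note that MAPE is undefined when an actual value is zero, which is one reason a composite index (rather than raw new-case counts) is a more stable training target.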
Despite the adversity, we successfully created a website map representing our model's forecasted Covid-19 hotspots. Our machine learning model predicts within roughly an 8% margin of error. We also inferred the importance of each input feature through perturbation and permutation tests. Most importantly, we identified the specific counties most susceptible to the Coronavirus and mapped our data effectively so that proactive action may be taken.
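The permutation test mentioned above follows a standard recipe: shuffle one feature column at a time and measure how much the model's error grows. A minimal sketch, where `model_error` is a hypothetical helper (not from the project) returning the error of a fitted model on a feature matrix:

```python
import numpy as np

def permutation_importance(model_error, X, y, seed=0):
    """Shuffle each feature column and record the resulting error increase.
    A larger increase means the model relied more on that feature."""
    rng = np.random.default_rng(seed)
    baseline = model_error(X, y)
    scores = []
    for j in range(X.shape[1]):
        X_perm = X.copy()
        rng.shuffle(X_perm[:, j])  # destroy feature j's signal
        scores.append(model_error(X_perm, y) - baseline)
    return scores

# Toy check: y depends only on the first feature, so permuting it hurts
# while permuting the constant second feature changes nothing.
X = np.arange(20, dtype=float).reshape(10, 2)
X[:, 1] = 1.0
y = X[:, 0] * 2

def model_error(Xm, ym):
    return float(np.mean(np.abs(Xm[:, 0] * 2 - ym)))

scores = permutation_importance(model_error, X, y)
```

Averaging the scores over several shuffles (different seeds) makes the ranking more robust against a lucky permutation.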
Video: https://youtu.be/KAUdL_GG324
Website: https://covid19mrc.us/