Michiganders Researching Coronavirus | Human Factors

Awards & Nominations

Michiganders Researching Coronavirus has received the following awards and nominations.

Best Use of Science

The solution that makes the best and most valid use of science and/or the scientific method.

Human Factors

The emergence and spread of infectious diseases, like COVID-19, are on the rise. Can you identify patterns between population density and COVID-19 cases and identify factors that could help predict hotspots of disease spread?

Project Prometheus

Summary

The purpose of Project Prometheus is to use machine learning tools to predict which areas of the U.S. (and, in the future, the world) will be most affected by Coronavirus. By forecasting future severity (measured by our "Hotspot Index"), the model points us toward a better balance between government intervention and economic activity, minimizing the negative impact on communities.

How We Addressed This Challenge

In order to gain insight into the human factors most essential to Covid-19's infection rate, we focused on gathering diverse datasets encompassing demographics, NASA/JAXA satellite-based data, government policies, mobility, and historic Covid-19 data, among other sources. Due to time constraints, we focused only on county-level data from the United States. We then created a Long Short-Term Memory (LSTM) machine learning model to predict future "hotspot indices," which forecast outbreaks at the county level. This would allow citizens and local governments to take appropriate action ahead of time while avoiding a "one size fits all" approach across counties. In other words, the model's main benefit is that it is specific enough to pinpoint county-level outbreaks, allowing localities with low hotspot indices to reopen while keeping those with higher values closed.

How We Developed This Project

With 20% unemployment and over 100,000 deaths in the U.S., the impact of Coronavirus has been devastating. Since the majority of us are Asian Americans, we were also significantly affected by negative racial stereotypes. Because of these unfortunate factors, we wanted to use our knowledge to make a positive and substantial impact on the worldwide fight against the Coronavirus.

In this project, we decided to focus on an initiative that would mitigate a much-feared "second wave" of cases, as weakening this phenomenon would result in less harm to society. To do this, we looked at geographic and temporal patterns, as these are vital to examine before the next major outbreak. Understanding the relationship between Coronavirus and different factors (pollution, environment, ethnicity and communities, government action, and health) can guide more impactful and precise actions to stifle the spread of Covid-19.

More specifically, we gathered satellite data on UV indices, NO2 levels, and land surface temperatures from NASA's Earthdata Search (OMI/Aura and MODIS/Aqua). We also used precipitation data from JAXA's GCOM-W1, nighttime light data from the VIIRS (Visible Infrared Imaging Radiometer Suite) instrument aboard NASA's Suomi NPP (Suomi National Polar-orbiting Partnership) satellite, and non-time-series data such as the socioeconomic and health rankings data from the Robert Wood Johnson Foundation. We then processed these datasets in Python using libraries such as h5py and Pandas. Because Covid-19 was first identified in late December 2019, we filtered the satellite data to the period from January to the end of May 2020.
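The filtering step above can be sketched in plain Python. This is a minimal illustration, not the team's pipeline: it assumes each satellite measurement has already been matched to a county FIPS code (the pixel-to-county step is not shown), and the record layout, window bounds, and `daily_county_means` helper are all hypothetical. It keeps observations from January 1 through May 31, 2020, drops NaN fill values, and averages what remains per county per day.

```python
import math
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical layout: (county_fips, observation_date, value), where each value is
# one satellite measurement already matched to a county. The spatial matching step
# (pixel -> county) is an assumption of this sketch and is not shown.
Record = tuple[str, date, float]

START = date(2020, 1, 1)   # assumed start of the study window
END = date(2020, 5, 31)    # "the end of May"


def daily_county_means(records: list[Record]) -> dict[tuple[str, date], float]:
    """Drop out-of-window and NaN observations, then average per county per day."""
    buckets: dict[tuple[str, date], list[float]] = defaultdict(list)
    for fips, day, value in records:
        if START <= day <= END and not math.isnan(value):
            buckets[(fips, day)].append(value)
    return {key: mean(values) for key, values in buckets.items()}


# Example with made-up values for two example county FIPS codes.
records = [
    ("26161", date(2020, 3, 1), 2.0),
    ("26161", date(2020, 3, 1), 4.0),
    ("26161", date(2019, 12, 15), 9.0),           # before the window: dropped
    ("26163", date(2020, 3, 1), float("nan")),    # missing pixel: dropped
]
print(daily_county_means(records))  # {('26161', datetime.date(2020, 3, 1)): 3.0}
```

In Pandas the same step would typically be a boolean date mask followed by a `groupby` on county and date.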

Upon gathering and processing all the data, we created a Long Short-Term Memory (LSTM) machine learning model using TensorFlow, which served as the foundation for our interactive choropleth map. We trained the model on Google Colab to predict the future severity of outbreaks as measured by our "hotspot index." This metric combines the incidence rate, mortality rate, new-case growth percentage, and total confirmed cases, and it is predicted from data covering the prior two weeks. Using HTML, CSS, JavaScript, and Leaflet.js, we created an interactive county-level choropleth map that shows the user the current number of cases per 1,000 people as well as the model's predicted "hotspot index."
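The paragraph above names the four ingredients of the hotspot index and a two-week input window, but not the exact formula. The sketch below is one plausible reading, offered under stated assumptions: each ingredient is min-max scaled across counties and the results are averaged with equal weights (the weighting is hypothetical), and a daily series is cut into 14-day histories paired with the next day's value, which is the usual way to build supervised examples for a sequence model. The names `hotspot_index` and `sliding_windows` are illustrative, not the project's code.

```python
# Hypothetical composite: the write-up names the four ingredients of the hotspot
# index but not how they are combined, so equal weights after min-max scaling
# across counties are an assumption.
FEATURES = ("incidence_rate", "mortality_rate", "case_growth_pct", "total_cases")
WINDOW = 14  # "data from the prior two weeks"


def min_max(values: list[float]) -> list[float]:
    """Rescale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def hotspot_index(rows: list[dict[str, float]]) -> list[float]:
    """One score in [0, 1] per county row, averaging the scaled features."""
    scaled = {name: min_max([row[name] for row in rows]) for name in FEATURES}
    return [
        sum(scaled[name][i] for name in FEATURES) / len(FEATURES)
        for i in range(len(rows))
    ]


def sliding_windows(series: list[float], window: int = WINDOW):
    """Pair each `window`-day history with the value on the following day."""
    return [(series[i:i + window], series[i + window])
            for i in range(len(series) - window)]


rows = [
    {"incidence_rate": 5.0, "mortality_rate": 0.1, "case_growth_pct": 2.0, "total_cases": 100},
    {"incidence_rate": 15.0, "mortality_rate": 0.5, "case_growth_pct": 8.0, "total_cases": 900},
]
print(hotspot_index(rows))  # [0.0, 1.0]

pairs = sliding_windows([float(d) for d in range(20)])
print(len(pairs))  # 6
```

TensorFlow's `tf.keras.layers.LSTM` expects input shaped `(samples, timesteps, features)`, so windows like these, with several features recorded per day, would be stacked into an array with `timesteps = 14` before training.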

One of the most difficult challenges we faced was cleaning the data and transforming it into Pandas DataFrames that could be processed correctly. For example, we discovered that over 40% of the precipitation data from JAXA's GCOM-W1 was missing for the U.S. To resolve this issue, we filled in blank values with the next available value (a backward fill). Devising the hotspot index was also a challenge, as the process involved fine-tuning parameters to minimize error, avoid underfitting and overfitting, and keep the model consistent with prevailing scientific knowledge. Additionally, the machine learning models we trained occasionally produced invalid values, failed to output values for some counties, and generated outliers, which made us doubt the model. We almost abandoned machine learning in favor of a statistical regression model. Nonetheless, over the past two days, we trained the model continuously until its error, measured by mean absolute percentage error (MAPE), was small enough.
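Two techniques from the paragraph above, the backward fill used for the missing precipitation values and the MAPE metric used to judge the model, are small enough to show directly. This is a stdlib sketch rather than the team's code (in Pandas the fill step is `Series.bfill()`). Treating `None` and NaN as "missing", and skipping pairs whose actual value is zero in MAPE, are assumptions of this sketch.

```python
import math


def backfill(values):
    """Replace each missing entry (None or NaN) with the next available observation.

    Trailing gaps have no later value to borrow, so they are returned as None.
    """
    filled = list(values)
    next_value = None
    for i in range(len(filled) - 1, -1, -1):  # walk backward from the end
        v = filled[i]
        if v is None or (isinstance(v, float) and math.isnan(v)):
            filled[i] = next_value
        else:
            next_value = v
    return filled


def mape(actual, predicted):
    """Mean absolute percentage error in percent; pairs with actual == 0 are skipped."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)


print(backfill([None, 1.0, float("nan"), 3.0, None]))  # [1.0, 1.0, 3.0, 3.0, None]
print(mape([4, 8], [3, 10]))  # 25.0
```

Backward fill copies a future reading into the past, so for a forecasting model it can leak information across time. A forward fill or interpolation is a common alternative when that matters.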

Despite the adversity, we successfully created a website with a map of our model's forecasted Covid-19 hotspots. Our machine learning model predicts with a mean absolute percentage error of roughly 8%. We also inferred the importance of each input feature through perturbation and permutation tests. Most importantly, we identified specific counties that were most susceptible to Coronavirus and mapped our data effectively so that proactive action can be taken.
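Permutation importance, one of the tests mentioned above, measures how much a model's error grows when a single input feature is shuffled across samples, breaking its link to the target while leaving its distribution intact. The sketch below is model-agnostic and uses a hypothetical toy linear model in place of the LSTM. It scores error with mean absolute error, which is an assumption. For the real LSTM inputs, one would shuffle a feature across samples while keeping each sample's 14 timesteps together.

```python
import random


def mean_abs_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """For each feature column, shuffle it across rows and return the average
    increase in mean absolute error over the unshuffled baseline."""
    rng = random.Random(seed)
    baseline = mean_abs_error(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mean_abs_error(y, [predict(r) for r in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def toy_predict(row):
    # Hypothetical stand-in model: uses feature 0 and ignores feature 1.
    return 3.0 * row[0]


X = [[float(i), float((i * 7) % 5)] for i in range(20)]
y = [3.0 * i for i in range(20)]
# Feature 1 scores 0.0 because the toy model never reads it.
print(permutation_importance(toy_predict, X, y))
```

A feature whose shuffling barely changes the error contributes little to the predictions, although correlated features can share or mask each other's importance.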

Data & Resources
  1. Berrick, S. (2004, October 1). Earthdata Search. Retrieved May 30, 2020, from https://search.earthdata.nasa.gov/search/granules?p=C1266136111-GES_DISC
  2. Berrick, S. (2020, May 30). Find Environmental Impacts Data. Retrieved May 30, 2020, from https://earthdata.nasa.gov/learn/pathfinders/covid-19/environmental-impacts
  3. Berrick, S. (2020, May 12). Find Seasonality Data. Retrieved May 30, 2020, from https://earthdata.nasa.gov/learn/pathfinders/covid-19/seasonality
  4. Jari Hovila, Antti Arola, and Johanna Tamminen (2014), OMI/Aura Surface UVB Irradiance and Erythemal Dose Daily L2 Global Gridded 0.25 degree x 0.25 degree V3, NASA Goddard Space Flight Center, Goddard Earth Sciences Data and Information Services Center (GES DISC), Accessed: [Data Access Date], 10.5067/Aura/OMI/DATA2028
  5. “VIIRS Stray Light Corrected Nighttime Day/Night Band Composites Version 1.” Google, Google, 2014, developers.google.com/earth-engine/datasets/catalog/NOAA_VIIRS_DNB_MONTHLY_V1_VCMSLCFG#description.
  6. Wan, Z., Hook, S., Hulley, G. (2015). MYD11C1 MODIS/Aqua Land Surface Temperature/Emissivity Daily L3 Global 0.05Deg CMG V006 [Data set]. NASA EOSDIS Land Processes DAAC. Accessed 2020-05-28 from https://doi.org/10.5067/MODIS/MYD11C1.006
  7. E. (Ed.). (2020, May 18). JHU Centers for Civic Impact Covid-19 County Cases (Daily Update). Retrieved May 30, 2020, from https://coronavirus-resources.esri.com/datasets/4cb598ae041348fb92270f102a6783cb/data?layer=1
  8. Adams, M. (2020, August 3). Early Release - Population-Based Estimates of Chronic Conditions Affecting Risk for Complications from Coronavirus Disease, United States - Volume 26, Number 8-August 2020 - Emerging Infectious Diseases journal - CDC. Retrieved May 30, 2020, from https://wwwnc.cdc.gov/eid/article/26/8/20-0679_article
  9. G. (2020). COVID-19 Community Mobility Report. Retrieved May 30, 2020, from https://www.google.com/covid19/mobility/
  10. Hernandez, M. (2020). This is the effect coronavirus has had on air pollution all across the world. Retrieved May 30, 2020, from https://www.weforum.org/agenda/2020/04/coronavirus-covid19-air-pollution-enviroment-nature-lockdown/
  11. IHME: COVID-19 Projections. (2020, May 26). Retrieved May 29, 2020, from https://covid19.healthdata.org/united-states-of-america
  12. J. (Ed.). (2020, May 27). Mortality Analyses. Retrieved May 30, 2020, from https://coronavirus.jhu.edu/data/mortality
  13. Leonard, A. (2020, April 24). Hokkaido Forced to Reinstate Lockdown After Coronavirus Returned. Retrieved May 29, 2020, from https://time.com/5826918/hokkaido-coronavirus-lockdown/
  14. Lusk, J. (2020, May 01). NGF: Only four states remain closed to golf with no announced dates to reopen. Retrieved May 31, 2020, from https://golfweek.usatoday.com/2020/05/01/ngf-only-four-states-closed-golf/
  15. N. (Ed.). (2020, April 22). COVID-19 in Racial and Ethnic Minority Groups. Retrieved May 29, 2020, from https://www.cdc.gov/coronavirus/2019-ncov/need-extra-precautions/racial-ethnic-minorities.html
  16. Ogden, C. (2017, August 01). Overweight & Obesity Statistics. Retrieved May 30, 2020, from https://www.niddk.nih.gov/health-information/health-statistics/overweight-obesity
  17. Powell, A. (2020, April 14). Warm weather may have no impact on COVID-19. Retrieved May 29, 2020, from https://news.harvard.edu/gazette/story/2020/04/covid-19-may-not-go-away-in-warmer-weather-as-do-colds/
  18. T. (2020, April 24). UPDATE State officials say reopening will be regional, data-driven. Retrieved June 01, 2020, from https://www.dailyitem.com/news/local_news/update-state-officials-say-reopening-will-be-regional-data-driven/article_26f4cc1c-8580-11ea-a7ae-f3cb259dd4ac.html
  19. U. (2020). Explore Health Rankings: Rankings Data & Documentation. Retrieved May 30, 2020, from https://www.countyhealthrankings.org/explore-health-rankings/rankings-data-documentation?fbclid=IwAR0sUOv2BU7j4UH8I5EL-g-vt1peVY37ZKQeQ6z94l7DQn6cNNIBw_zaZ00
  20. Xie, J., & Zhu, Y. (2020, July 1). Association between ambient temperature and COVID-19 infection in 122 cities from China. Retrieved May 29, 2020, from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7142675/
Tags
#machinelearning, #bigdata, #JAXA, #NightLights, #ArtificialIntelligence, #ForecastModeling
Global Judging
This project was submitted for consideration during the Space Apps Global Judging process.