An Integrated Assessment

Your challenge is to integrate various Earth Observation-derived features with available socio-economic data in order to discover or enhance our understanding of COVID-19 impacts.

BANDERSNATCH-19

Summary

The solution provided by AI-geos is a Digital Twin powered by an Artificial Intelligence that extracts the significant patterns behind the spread of the virus and simulates its global spread and social impact as a function of variables such as PM2.5, CO2 and population density. Users can adjust these variables in a smart, interactive web app.

How We Addressed This Challenge

The idea is to find correlations between anthropogenic and natural features that are available for every country in the world, in order to identify the main factors that could affect the spread of COVID-19. Our project also predicts how changing those main factors could affect the pandemic: we integrate various EO-derived features with available and/or derived socio-economic data in order to discover and enhance our understanding of COVID-19 impacts.
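As a minimal sketch of the correlation step, the snippet below ranks features by the absolute value of their Pearson correlation with a log-scaled case count. The function names, feature names and numbers are hypothetical toy values chosen for illustration, not the project's data or code.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: correlation is undefined, treated as 0 here
    return cov / math.sqrt(var_x * var_y)

def rank_features(features, target):
    """Sort (name, correlation) pairs by absolute correlation, strongest first."""
    scores = {name: pearson(values, target) for name, values in features.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Hypothetical toy values for four countries (not real data).
features = {
    "population_density": [10.0, 50.0, 200.0, 400.0],
    "pm25": [5.0, 30.0, 12.0, 20.0],
}
log_cases = [2.0, 3.0, 4.5, 5.5]
ranking = rank_features(features, log_cases)
```

A ranking like this only flags linear association; it says nothing about causation, which is one reason to feed the selected features into a model afterwards.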

How We Developed This Project
  • What inspired your team to choose this challenge?

Our country ended its lockdown recently, but we noticed that governments did not know which factors would be decisive in lowering the number of infections. By joining multi-source and multi-temporal data, we developed a solution to help countries that are still fighting the spread of COVID-19 predict which actions can have a significant impact in defeating the pandemic and in facing future disease outbreaks.

  • What was your approach to developing this project?

Taking advantage of our experience in the space, environment and data science fields, we carefully searched for and screened the most significant satellite data and the most up-to-date socio-economic data that could be correlated with the spread of COVID-19. We then selected both natural and anthropogenic variables to create a dataset that, combined with the socio-economic data, could feed an AI model able to extract COVID-19 patterns from all of those data. The neural network can generalize when the values of the most important features are changed, showing new scenarios based on these variations. All this information has been collected and visualized on a map in an innovative web app, where you can experiment with combinations of parameters (e.g. population density, number of physicians, pollution) to see what effect they could have.
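The scenario idea described above can be sketched as a small "what-if" helper: copy a country's feature values, override a few of them, and compare the model output before and after. Everything here is an assumption made for illustration: `what_if`, `toy_predict`, the feature names and the coefficients are hypothetical, and the hand-written linear rule merely stands in for the trained network.

```python
def what_if(predict, baseline, **overrides):
    """Return (baseline prediction, scenario prediction) for one country."""
    unknown = set(overrides) - set(baseline)
    if unknown:
        raise KeyError(f"unknown feature(s): {sorted(unknown)}")
    # Build the scenario on a copy so the baseline values stay untouched.
    scenario = {**baseline, **overrides}
    return predict(baseline), predict(scenario)

def toy_predict(row):
    """Hypothetical stand-in for the trained model (made-up coefficients)."""
    return (
        0.004 * row["population_density"]
        + 0.05 * row["pm25"]
        - 0.3 * row["physicians_per_1000"]
    )

country = {"population_density": 200.0, "pm25": 20.0, "physicians_per_1000": 2.0}
before, after = what_if(toy_predict, country, pm25=10.0)
```

In a web app, the overrides would come from the user's sliders or input fields, and the two predictions would drive the map display.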

The data used are: meteorological data (temperature, precipitation, wind, humidity), land use data (e.g. permanent crops, land area, forest coverage, urban area) and pollution data (NO2, ozone, PM2.5, CO2) from NASA satellite instruments such as OMI; population data (e.g. global population density, life expectancy, physician rates, annual rate of population increase); and socio-economic data (e.g. GDP per capita, import/export balance, index of industrial production, employment rate, night-time light intensity) from many sources, such as the JRC GHSL layer and the United Nations and World Bank datasets.

Together they generated more than 130 features to train our model!

  • How did you use space agency data in your project?

We used NASA satellite data to add fundamental features to our dataset, which are necessary to reach high prediction accuracy in the AI model. They also provide the worldwide coverage needed to obtain a complete overview of the situation.

  • What tools, coding languages, hardware, software did you use to develop your project?
  1. Coding Language: Python 3.7
  2. AI framework: fast.ai, PyTorch
  3. App and Data visualization Tools: Dash, Plotly
  4. Hardware: Radeon Pro 555 2 GB / Intel HD Graphics 630 (GPU, for training), Intel Core i7-8665U (CPU, for pre- and post-processing)
  5. Tools for satellite data download: Wget, NASA Open Data Portal - Giovanni, earthdata.nasa.gov tool
  • What problems and achievements did your team have?

We faced some problems related to collecting, standardizing and fusing data coming from multiple EO and non-EO sources. We also had to choose the most appropriate variables that could be correlated with the impact of the epidemic.

Since some nations had null values, we implemented an algorithm that fills those gaps with high accuracy by reconstructing each missing value from nations with similar characteristics.
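One common way to reconstruct a missing value from similar nations is nearest-neighbour imputation, sketched below. This is an assumption about the general technique, not the team's actual algorithm: the function name, the distance measure, the choice of `k` and the toy rows are all hypothetical, and the features are assumed to be already scaled to comparable ranges.

```python
import math

def impute_from_neighbours(rows, k=2):
    """Fill None entries with the mean of that feature over the k most similar rows.

    Similarity is the root-mean-square difference over the features that
    both rows have observed. The input rows are not modified.
    """
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue  # a neighbour must have the missing feature observed
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            if candidates:
                candidates.sort(key=lambda c: c[0])
                nearest = candidates[:k]
                filled[i][j] = sum(v for _, v in nearest) / len(nearest)
    return filled

# Hypothetical scaled features for four nations; the second lacks its third value.
rows = [
    [0.1, 0.2, 1.0],
    [0.1, 0.3, None],
    [0.9, 0.8, 5.0],
    [0.2, 0.2, 1.2],
]
filled = impute_from_neighbours(rows, k=2)
```

Here the gap is filled from the first and fourth rows, which are closest on the two observed features, while the dissimilar third row is ignored.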

The model, whose target is the spread of the virus on a logarithmic scale, reaches a root mean square error (RMSE) of about 1.2.
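To make the metric concrete, here is a minimal sketch of an RMSE computed on log-transformed case counts. The text does not state the base of the logarithm or how zero counts are handled, so the base-10 logarithm and the `1 + cases` offset are assumptions, as is the function name.

```python
import math

def rmse_log10(predicted_cases, actual_cases):
    """RMSE between log10(1 + cases) of predictions and of observations.

    Assumptions: base-10 logarithm, and a +1 offset so zero counts are valid.
    """
    errors = [
        math.log10(1 + p) - math.log10(1 + a)
        for p, a in zip(predicted_cases, actual_cases)
    ]
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Predicting 99 cases where 9 were observed is off by exactly one order of magnitude.
one_decade_off = rmse_log10([99], [9])
```

On a logarithmic target, the error is multiplicative in the original units: an RMSE of 1.2 would correspond to a typical factor of about 16 if the logarithm is base 10, or about 3.3 if it is the natural logarithm.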

The network has 2 hidden layers, of 200 and 100 neurons respectively, to better extract all the required patterns and information.
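The layer sizes above can be illustrated with a small, dependency-free sketch of a fully connected network. Only the hidden sizes (200 and 100) come from the text; the 130 inputs (the text says "more than 130"), the single output, the ReLU activation and the weight initialisation are assumptions, and the project itself used fast.ai and PyTorch rather than plain Python.

```python
import random

def init_mlp(layer_sizes, seed=0):
    """Create random weights and zero biases for a fully connected network."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        scale = (2.0 / n_in) ** 0.5  # He-style scaling, an illustrative choice
        weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers

def forward(layers, x):
    """Run one input vector through the network; ReLU on hidden layers only."""
    for index, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        if index < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x

def count_parameters(layers):
    """Total number of weights and biases."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)

# 130 inputs and 1 output are assumptions; 200 and 100 are from the text.
layers = init_mlp([130, 200, 100, 1])
prediction = forward(layers, [0.0] * 130)
n_params = count_parameters(layers)
```

Under these assumptions the network has 26,200 + 20,100 + 101 = 46,401 trainable parameters, which is small enough to train on modest hardware.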

Tags
#artificialintelligence #AI #digitaltwin #webapp #datascience #earthobservation #socialeconomic #datafusion #ai-geos
Global Judging
This project was submitted for consideration during the Space Apps Global Judging process.