An Integrated Assessment

Your challenge is to integrate various Earth Observation-derived features with available socio-economic data in order to discover or enhance our understanding of COVID-19 impacts.

Thoth: Using data science to understand causes and impacts of COVID-19

Summary

Our project is a broad study of the environmental and socioeconomic factors that correlate with the spread of the virus. We aim to give governments, decision makers and NGOs information to prioritize actions in the locations most likely to see more infections. Our solution combines NASA, ESA and JAXA spatial data with government datasets to understand which factors affect the number of infections and deaths, and which parameters are being affected by the virus and by lockdown measures.

How We Addressed This Challenge

The Problem

The COVID-19 pandemic has reached every part of the globe. This unprecedented situation led world leaders to respond in different ways. Border closures, social distancing, lockdowns and other measures were used to flatten the infection curve. Some countries were stricter than others, which lets us study the impact of these social measures. However, some of the factors that could affect transmission were not well understood, mainly because we had never lived through a situation like this. Now, with the help of highly sophisticated sources, we can use a wide range of data to analyze how environmental and socioeconomic characteristics affect the spread of the virus.

Our Solution

To address this unique global challenge, we need unique solutions. But in the information era, we can look at solutions to problems we have already faced and combine those ideas into a new one. Inspired by studies of environmental effects on other diseases, such as malaria and dengue, and of socioeconomic effects on the spread of contagious diseases, we designed a model that integrates both for the coronavirus outbreak.

Using Meteomatics resources and NASA, ESA and JAXA data together with US government information for each state, we built a method to analyze the impact of different environmental and socioeconomic parameters on the spread of the virus. We used data science and statistical methods, such as the Pearson and Spearman correlation coefficients, to understand which variables had a positive or negative influence on the outbreak and how strong that influence was. This allowed us to build a correlation matrix between daily COVID-19 cases and deaths in each American state and environmental characteristics, such as temperature and air density, for the same day and location. For the socioeconomic impact, we used the latest data provided by the US Census Bureau and compared the total cases and deaths in each state with its unemployment rate, average age and HDI.
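The daily, same-location join described above can be sketched with pandas as follows. This is a minimal sketch, assuming case and death counts arrive as cumulative totals per state and date, and weather values as one row per state and date. The column names (`state`, `date`, `cases`, `deaths`, `t_2m_c`, `air_density_kgm3`), the helper `build_daily_table` and the toy numbers are hypothetical, not the project's actual data or code.

```python
import pandas as pd

# Hypothetical cumulative COVID-19 counts per state and day (toy numbers).
covid = pd.DataFrame({
    "state": ["NY", "NY", "NY", "TX", "TX", "TX"],
    "date": pd.to_datetime(["2020-03-15", "2020-03-16", "2020-03-17"] * 2),
    "cases": [700, 950, 1400, 60, 75, 95],
    "deaths": [5, 7, 12, 1, 1, 2],
})

# Hypothetical daily weather values for the same states and days.
weather = pd.DataFrame({
    "state": ["NY", "NY", "NY", "TX", "TX", "TX"],
    "date": pd.to_datetime(["2020-03-15", "2020-03-16", "2020-03-17"] * 2),
    "t_2m_c": [4.0, 6.5, 8.0, 18.0, 21.0, 22.5],
    "air_density_kgm3": [1.27, 1.26, 1.25, 1.20, 1.19, 1.18],
})


def build_daily_table(covid, weather, start="2020-03-15", end="2020-05-29"):
    """Join daily increases in cases/deaths with same-day weather, per state."""
    covid = covid.sort_values(["state", "date"]).copy()
    # Turn cumulative totals into daily increases, computed separately per state.
    covid["new_cases"] = covid.groupby("state")["cases"].diff()
    covid["new_deaths"] = covid.groupby("state")["deaths"].diff()
    # Match each state-day with the weather for the same state and day.
    merged = covid.merge(weather, on=["state", "date"], how="inner")
    # Keep the study window; each state's first day has no increase, so drop it.
    in_window = merged["date"].between(pd.Timestamp(start), pd.Timestamp(end))
    return merged[in_window].dropna(subset=["new_cases", "new_deaths"])


daily = build_daily_table(covid, weather)
print(daily[["new_cases", "new_deaths", "t_2m_c", "air_density_kgm3"]].corr())
```

Differencing within each state before the merge keeps one state's last day from being subtracted from the next state's first day.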

How We Developed This Project

Inspiration and Approach

Our goal in choosing this challenge was to use different data science tools to give governments, NGOs and decision makers around the world information that helps them focus their efforts in the fight against the coronavirus. We were inspired by different projects, such as DiSARM, which uses spatial data to find the most probable locations of malaria cases, and by studies of environmental effects on COVID-19 infection.

We used data provided by Meteomatics and images from NASA, ESA and JAXA satellites to obtain environmental data that could be associated with increases or decreases in COVID-19 cases, as well as to assess the impact of lockdown measures on the environment. For the latter, we used images provided by the space agencies and compared NDVI values with the evolution of cases. Our goal was not only to understand causes and impacts, but also to identify which factors were not important to the outbreak. We understand that other characteristics, such as government measures, matter more for the outbreak in each country and state, but the factors we studied must also be examined to determine whether they matter too.

Data Sources and Coding

We coded in Python, using the pandas, statsmodels, NumPy and seaborn libraries to examine the correlations. For the environmental studies of NDVI variations, we used the Google Earth Engine platform with NASA, ESA and JAXA satellite images. We obtained socioeconomic data from US government sites, such as the US Census Bureau. To get environmental data such as temperature and air density, we used the Meteomatics platform, which was available for the NASA challenge.

Problems and Achievements

One problem we encountered was building a model compatible with the information we found, so that it could be applied uniformly to every state and type of data. We wanted a solution that other organizations could replicate with their own data to understand the influence of their own environmental and socioeconomic characteristics. Another finding, which we do not regard as a problem, was that some of the factors we researched did not produce a result significant enough to indicate an impact (positive or negative) on the spread of the virus. These included air density, wind speed, relative humidity and precipitation.
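One way to flag factors whose correlation is not significant is to compute a p-value for each factor and compare it with a chosen significance level. This is a minimal sketch, not the project's code: it swaps in `scipy.stats.pearsonr` (SciPy is not among the libraries listed above), assumes a 0.05 threshold, and uses synthetic data in which only the hypothetical `temperature` column drives `new_cases`.

```python
import numpy as np
import pandas as pd
from scipy import stats


def significant_factors(df, target, factors, alpha=0.05):
    """Pearson r and p-value of each factor against the target, with a significance flag."""
    rows = []
    for name in factors:
        r, p = stats.pearsonr(df[name], df[target])
        rows.append({"factor": name, "r": r, "p_value": p, "significant": p < alpha})
    return pd.DataFrame(rows).set_index("factor")


# Synthetic data (hypothetical): new_cases depends on temperature only.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "temperature": rng.normal(15, 5, n),
    "wind_speed": rng.normal(12, 4, n),
})
df["new_cases"] = 10 * df["temperature"] + rng.normal(0, 5, n)

print(significant_factors(df, "new_cases", ["temperature", "wind_speed"]))
```

The same table can be produced for every state with the same columns, which keeps the procedure identical across states and data types.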

With these methods we obtained some interesting results. We used the p-values and the coefficient of determination (R²) of a multiple linear regression of the number of cases and deaths on HDI, air density and average population age. This regression did not find any substantial correlation of HDI or average population age with cases and deaths across the American states. It did, however, find a correlation between air density and the number of cases, and by extension the number of deaths. The correlation was negative: higher air density was associated with fewer new cases.
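The regression described above can be sketched with the statsmodels formula API. The column names (`hdi`, `air_density`, `avg_age`, `cases`) and all values are hypothetical and synthetic; the toy outcome is deliberately built with a negative air-density effect so the output is easy to read, and it does not reproduce the project's data or results. Deaths would be modeled the same way with a second fit.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic per-state table: one row per state (hypothetical values).
rng = np.random.default_rng(42)
n_states = 50
states = pd.DataFrame({
    "hdi": rng.uniform(0.85, 0.95, n_states),
    "air_density": rng.uniform(1.05, 1.30, n_states),  # kg/m^3
    "avg_age": rng.uniform(31, 45, n_states),           # years
})
# Toy outcome constructed so that only air density matters, with a negative sign.
states["cases"] = 5000 - 3000 * states["air_density"] + rng.normal(0, 50, n_states)

# Multiple linear regression: cases ~ HDI + air density + average age.
model = smf.ols("cases ~ hdi + air_density + avg_age", data=states).fit()

print(f"R^2 = {model.rsquared:.3f}")
print(model.pvalues.round(4))  # one p-value per coefficient
print(model.params.round(2))   # the sign shows the direction of each association
```

`model.summary()` prints the same p-values and R² in a single report.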

With the Pearson, Spearman and Kendall methods, we built several correlation matrices. Because the Pearson coefficient measures linear association, whereas Spearman and Kendall measure rank-based (monotonic) association, we relied mainly on Pearson to relate the number of cases and deaths, and their daily increase, to: temperature at 2 m above ground (°C), relative humidity at 2 m above ground (%), wind speed at 10 m above ground (km/h), air density at 10 m above ground (kg/m³), precipitation over a 24 h period (mm) and carbon monoxide concentration (µg/m³). This method showed correlations only for air density and temperature. Air density had a negative correlation, consistent with the earlier p-value result. In addition, we found a positive relation between the increase in cases and rising temperatures, suggesting that temperature may be an influential factor.
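A minimal pandas sketch of the three correlation matrices, assuming one row per state and day with the variables listed above. The column names, the helper `correlation_matrices` and the synthetic values are hypothetical; the project also used seaborn to plot the matrices, which is omitted here.

```python
import numpy as np
import pandas as pd


def correlation_matrices(df, columns):
    """Pearson, Spearman and Kendall correlation matrices for the same columns."""
    return {method: df[columns].corr(method=method)
            for method in ("pearson", "spearman", "kendall")}


# Hypothetical daily rows for one state, March 15 to May 29 inclusive (76 days).
rng = np.random.default_rng(1)
n = 76
df = pd.DataFrame({
    "t_2m_c": rng.normal(18, 6, n),
    "relative_humidity_pct": rng.uniform(30, 90, n),
    "wind_speed_10m_kmh": rng.gamma(2.0, 6.0, n),
    "air_density_kgm3": rng.normal(1.2, 0.03, n),
    "precip_24h_mm": rng.exponential(2.0, n),
    "co_ugm3": rng.normal(200, 30, n),
})
# Synthetic outcome that rises with temperature.
df["new_cases"] = 40 + 3 * df["t_2m_c"] + rng.normal(0, 10, n)

mats = correlation_matrices(df, list(df.columns))
# Correlation of each factor with new cases, one column per method.
summary = pd.DataFrame({m: mat["new_cases"].drop("new_cases") for m, mat in mats.items()})
print(summary.round(2))
```

Putting the three methods side by side is a quick check: when Pearson is much weaker than Spearman or Kendall for a factor, the relationship may be monotonic but not linear.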

To understand the effects of the COVID-19 outbreak on the environment, we used satellite images to obtain NDVI values and compared this year's data with previous years. NDVI (Normalized Difference Vegetation Index) is used to measure plant health. We took the average NDVI over the last 3 years and compared it with the value for the same dates this year, obtaining the percentage difference. The p-values showed no significant correlation between NDVI variation and the COVID-19 outbreak in the US. However, the impact may only become visible over a longer time scale: plants would not respond quickly to lockdown measures, but over time the drop in pollutant emissions could change this result. Finally, this is a useful model for evaluating such variations and could be applied in other locations, for example Brazil, which has been suffering more forest fires than in previous months.
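The NDVI and percentage-difference steps described above can be written out as below. NDVI is the standard normalized difference of near-infrared and red reflectance; the baseline is taken here as the plain mean of the previous years for the same dates, which is an assumption about how the 3-year average was formed. The function names and all numbers are hypothetical.

```python
import numpy as np


def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red), computed element-wise on reflectances."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red)


def ndvi_percent_change(current, previous_years):
    """Percentage difference between this year's NDVI and the mean of earlier years.

    current: NDVI for the study dates this year, shape (n_dates,)
    previous_years: NDVI for the same dates in earlier years, shape (n_years, n_dates)
    """
    baseline = np.mean(np.asarray(previous_years, dtype=float), axis=0)
    return (np.asarray(current, dtype=float) - baseline) / baseline * 100.0


# Hypothetical reflectances: NDVI of about 0.667 and 0.333.
print(ndvi([0.5, 0.4], [0.1, 0.2]))

# Hypothetical NDVI for two dates this year and in the three previous years.
this_year = [0.66, 0.55]
last_three = [[0.60, 0.50], [0.62, 0.52], [0.58, 0.48]]
# Both dates come out about 10% above the 3-year baseline.
print(ndvi_percent_change(this_year, last_three))
```

The resulting percentage differences per state and date can then be tested against case counts with the same p-value procedure used for the weather variables.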

Every analysis used data from each day between March 15 and May 29 for each American state. We believe that if the information we provide is put to use, and the project is expanded, thousands of lives could be saved every day.


Future plans

Our solution was a study restricted to American states, because the US had more reliable information than other countries such as Brazil. But our model can be used to understand the impacts in more locations, building a large dataset for comparison across the globe. Data from different locations could increase our understanding of these correlations.

The impact of our project can be extended with more and new types of data, so we could compare more factors across different countries and continents, analyzing the influence of factors such as GDP per capita and population density. This is important to help us understand which environmental and socioeconomic characteristics influence the spread of the virus.

Another contribution we can make is to build more understanding of contagious diseases in general, not only COVID-19, so we are better prepared when new situations arise in the future. Like DiSARM and other projects, ours can inspire more studies of how environmental and socioeconomic characteristics affect public health, helping countries and NGOs tackle different problems.

Project Demo

https://youtu.be/Zrbt6jlVnZI

Data & Resources

- Total population of each American state https://www.census.gov/data/tables/time-series/demo/popest/2010s-state-total.html

- DiSARM project: supporting data-driven disease elimination https://www.disarm.io/

- Measurements Of Pollution In The Troposphere https://mopitt.physics.utoronto.ca/

- Google Earth Engine https://earthengine.google.com/

- Census 2000 Geographic Terms and Concepts https://www2.census.gov/geo/pdfs/reference/glossry2.pdf

- Median age in 2018, by US Census Bureau http://www.statsamerica.org/sip/rank_list.aspx?rank_label=pop46&ct=S09

- Meteomatics Platform https://www.meteomatics.com/en/home-en/

- US Bureau of Labor Statistics https://www.bls.gov/

- Environment and COVID relations study https://www.genevaenvironmentnetwork.org/resources/updates/updates-on-covid-19-and-the-environment/

- Assessing the Relationship between Socioeconomic Conditions and Urban Environmental Quality in Accra, Ghana - US National Library of Medicine

- Temperature significantly changes COVID-19 transmission in (sub)tropical cities of Brazil - ScienceDirect

- COVID-19 and surface water quality: Improved lake water quality during the lockdown - ScienceDirect

Other scientific resources were used to analyze the relation between NDVI and plant health and to compute the statistical correlations.

Tags
#environmental impact, #data science, #socioeconomic impact, #COVID combat, #earth science, #satellite, #python, #thoth