Quiet Planet

The COVID-19 outbreak and the resulting social distancing recommendations and related restrictions have led to numerous short-term changes in economic and social activity around the world, all of which may have impacts on our environment. Your challenge is to use space-based data to document the local to global environmental changes caused by COVID-19 and the associated societal responses.

SIGHPy (Sometimes It Gets Hot Py)

Summary

Data centres (DCs) provide the computing power behind everyone's access to the Internet, consuming large amounts of energy to power and cool equipment and generating heat and carbon emissions in the process. COVID-19 has created a socially-distanced setting in which DCs are in higher demand: more people are relying on Internet applications for work and daily life. We've created a Python module and web-app to compare the environmental impact of DCs before and during the pandemic, allowing users to calculate and visualize their own impact.

How We Addressed This Challenge

Our project was created in response to the “Quiet Planet” challenge.  We believe there are many ways in which the COVID-19 pandemic has improved environmental prospects worldwide.  For example, people staying home from work and school produce fewer greenhouse gas emissions from cars, and restrictions on most travel have reduced emissions from planes as well.  However, we are also aware of the negative environmental impact that activities like watching Netflix or video calling friends and family can have, and we believe this impact is often overlooked.  Given our personal experiences of increased Internet usage during this period, for school, work, and entertainment, we wanted to see whether COVID-19 has created a negative environmental impact through the heat production and carbon emissions of the data centres that provide people with these kinds of Internet-based services.

To quantitatively assess the environmental impact of data centres during the COVID-19 pandemic, we created a Python module and a web-app that let users identify nearby data centres and visualize their impact on the environment via carbon emissions (using JAXA GOSAT data) and land temperature (using NASA AIRS data).  Most importantly, our tool lets users slice data by time, allowing for a comparison of impact before and during the pandemic.

We hope this serves to illustrate the need for stronger public accountability from cloud-driven organizations and services during and after COVID-19.

How We Developed This Project

We were inspired to work on this challenge because we are deeply concerned about human impacts on the environment and strive to be as eco-friendly as possible.  However, one aspect of our lives we feel less knowledgeable about, in terms of environmental impact, is our Internet usage and streaming habits.  We have only recently become aware of how these activities negatively affect the environment, and we wanted to create a tool that helps people understand these effects and hopefully motivates them to make a positive change in their computing time and habits.

We used the NASA Earthdata Standards and References page to survey the data formats we would need to handle, and wrote a Python script to parse each file type we came across (using tools such as h5py, netCDF4, rasterio, and pandas); a sketch of the approach follows.
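The core of that script is a simple dispatch on file extension. Below is a minimal sketch of the idea; the function name and the exact set of extensions are illustrative, not our module's actual API:

```python
import os

import h5py            # HDF5 (e.g., AIRS granules)
import netCDF4         # NetCDF files
import rasterio        # GeoTIFF and other rasters
import pandas as pd    # CSV / tabular data

def open_dataset(path):
    """Dispatch on file extension and return an open dataset handle."""
    ext = os.path.splitext(path)[1].lower()
    if ext in (".h5", ".hdf", ".hdf5", ".he5"):
        return h5py.File(path, "r")
    if ext in (".nc", ".nc4"):
        return netCDF4.Dataset(path, "r")
    if ext in (".tif", ".tiff"):
        return rasterio.open(path)
    if ext == ".csv":
        return pd.read_csv(path)
    raise ValueError(f"Unsupported file type: {ext}")
```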

We located appropriate data sources, landing on monthly JAXA GOSAT CO2 data, which we interpolated to a daily level (GOSAT 2020); daily NASA AIRS surface skin temperature data (AIRS Science Team/Joao Teixeira 2013); and a map of data centres to locate the DCs (Data Center Map). Our data sets exclude proprietary DCs (i.e. those belonging solely to one vendor, e.g. AWS), but we believe that, given how geographically clustered DCs are, a wide enough data grain would capture their emissions as well.
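For the monthly-to-daily interpolation step, a straightforward approach is to upsample the monthly series with pandas and fill the gaps linearly. This is a simplified sketch with made-up CO2 values for a single grid cell, not our production pipeline:

```python
import pandas as pd

# Hypothetical monthly mean CO2 concentrations (ppm) for one grid cell.
monthly = pd.Series(
    [413.2, 413.9, 414.6],
    index=pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01"]),
)

# Upsample to daily resolution, then fill the new daily gaps by
# linear interpolation between the monthly values.
daily = monthly.resample("D").interpolate(method="linear")

print(daily.head())  # 2020-01-01: 413.2, then ~0.023 ppm/day increments
```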

We parsed the JAXA and NASA data and drew geographical boxes around each data point in order to link measurements with DCs.  In this way, we can attribute carbon emissions or temperature impacts to the DCs located within a data point’s geographical box, and take a rough average to estimate how much an individual DC contributes to those values.  Similarly, we can divide the data values by the number of people within a box to get a per-person average.  We recognize that this approach is not perfect; however, we operate under the assumption that all emissions should theoretically have decreased along with the decline in travel and business operations during the pandemic, so we attribute any anomalies seen to DC operations.  With more granular data, more data on annual CO2 and temperature trends, and a breakdown of different industries' contributions to these factors, we could refine our solution and obtain a better picture of the impact of DCs on the environment.
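A minimal sketch of the box-linking idea, assuming a fixed-size grid (the cell size is a parameter, and the helper names are hypothetical):

```python
import numpy as np

def grid_cell(lat, lon, cell_deg=1.0):
    """Snap a lat/lon pair to the south-west corner of its
    cell_deg x cell_deg geographical box."""
    return (float(np.floor(lat / cell_deg)) * cell_deg,
            float(np.floor(lon / cell_deg)) * cell_deg)

def per_dc_share(cell_value, n_dcs_in_cell):
    """Rough average: attribute an equal share of a box's CO2 or
    temperature value to each data centre inside the box."""
    return cell_value / n_dcs_in_cell if n_dcs_in_cell else 0.0

# Example: a DC at (51.51, -0.13) falls in the 1-degree box (51.0, -1.0);
# if that box holds 4 DCs, each is assigned a quarter of the box's value.
box = grid_cell(51.51, -0.13)
share = per_dc_share(2.4, 4)  # e.g. a 2.4 ppm anomaly -> 0.6 ppm per DC
```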

After programmatically parsing this data into a local SQLite database, we set up a data pipeline for easy import into a Google Cloud SQL instance, which provided a scalable home for our large data sets. Sticking with the theme of using Flask as a cornerstone of our stack, we developed a web application for visualizing and slicing the emissions data in a manageable way.
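The local staging step can be sketched as follows; the table and column names here are illustrative, not our actual schema:

```python
import sqlite3
import pandas as pd

# Illustrative records only; the real tables hold the parsed GOSAT and
# AIRS values keyed by date and grid location.
records = pd.DataFrame({
    "date": ["2020-03-01", "2020-03-01"],
    "lat": [37.0, 51.0],
    "lon": [-122.0, -1.0],
    "co2_ppm": [414.1, 413.8],
})

# Stage locally in SQLite; the same table is later exported to Cloud SQL.
with sqlite3.connect("sighpy.db") as conn:
    records.to_sql("emissions", conn, if_exists="append", index=False)
```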

Calculations were performed primarily with NumPy and Python's built-in mathematical functions (e.g. computing Haversine distances to handle data given in latitude/longitude coordinates). We did face IOPS bottlenecks because we had not set up a multi-worker or distributed computation method for accessing the larger data sets; remedying this going forward would improve the user experience. Visualization on the web application was done with Flask-GoogleMaps, Matplotlib, and the Google Maps JavaScript API. A user request is answered with a random sample of DCs, for which the relevant tables of emissions data are queried and visualized/rendered as HTML objects held in an in-memory buffer. This process faces a similar concern in that it is restricted to synchronous, blocking requests to the database; a multi-worker or asynchronous approach would improve load times (which are currently sub-optimal).
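For reference, a vectorized NumPy version of the Haversine calculation mentioned above might look like this; it is the standard formulation, not necessarily our exact implementation:

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between points given in
    degrees. Vectorized, so the arguments may also be NumPy arrays."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

# Example: (51.5, -0.13) to (51.5, -0.6) is roughly 33 km.
```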

We also encountered some capacity concerns with our existing DB solution, similar to the IOPS limitations mentioned above. AppCPU and DbCPU resources are aggressively consumed by the queries we currently use, so query optimization would also help.
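One low-effort optimization along these lines would be indexing the columns our time-slice queries filter on. A sketch against the local SQLite staging database, with the same hypothetical column names as above:

```python
import sqlite3

# Composite index covering the filters used by time-slice queries
# (location first, then date); names are illustrative.
with sqlite3.connect("sighpy.db") as conn:
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_emissions_loc_date "
        "ON emissions (lat, lon, date)"
    )
```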

Data & Resources

AIRS Science Team/Joao Teixeira (2013), AIRS/Aqua L3 Daily Standard Physical Retrieval (AIRS-only) 1 degree x 1 degree V006, Greenbelt, MD, USA, Goddard Earth Sciences Data and Information Services Center (GES DISC), Accessed: 30 May 2020, doi:10.5067/Aqua/AIRS/DATA303

Data Center Map. https://www.datacentermap.com/.

GOSAT. (2020). L3 Global CO2 Distribution (SWIR). Tsukuba; GOSAT.

Jones, N. (2018, September 13). The Information Factories. Nature, 561, 163–167.

NASA. Standards and References. https://earthdata.nasa.gov/esdis/eso/standards-and-references.

Tags
#emissions #greenhousegases #netflixandwarm #infrastructure #python #pythonthesnake #flask #bigdata #carbondioxide #landtemperature #warmplanet #pandas #pandastheanimal #sighpy #sighpynotscipy #sometimesitbelikethat
Global Judging
This project was submitted for consideration during the Space Apps Global Judging process.