Using data on population, infection rate, deaths, and average wealth, we want to find areas with a greater need for testing sites and identify areas with an abundance of testing sites, so that resources can be diverted to more deprived areas.
What inspired your team to choose this challenge?
We are an international group of programmers who met mostly through Stanford's Code in Place class and united over a common interest in applying our new skills to the fight against COVID-19. Through many virtual meetings and collaboration across 4 different time zones, we worked together to determine which common problems we wanted to solve. Many of our lives were upended by the virus, and we felt that the problem is personal as well as societal. It pains us to see our communities struggling to get the right resources and help that they need to contain this novel virus. We wanted to offer our skill sets and ideas to help healthcare and essential workers combat this pandemic.
We were inspired by this article and wondered if there was a way to use socioeconomic data to address social justice during the pandemic.
What was your approach to developing this project?
We used up-to-date coronavirus testing and case numbers from city/county public health data, geolocation data, and the government census to understand which factors can put an individual or community at risk of an outbreak. We then used this data to find out which areas need more testing centers in order to effectively control the spread.
Method
Determining Case Sites
We used NASA's SEDAC data to determine where our two case studies would be. Two counties in California were chosen based on two metrics from SEDAC's COVID-19 viewer:
Data Collection for Each Case Site
Based on those metrics, San Francisco County and Kern County were chosen. We then pulled data from the US Census and each county's COVID-19 dashboard, collecting COVID-19 data and demographic data by zip code. The data was compiled into a spreadsheet. The data can be found here.
Data Analysis
Once the data was collected, we ran a multivariate regression to determine which demographic factors were most indicative of test rate. These turned out to be population size and the percentage of people without health insurance. Those variables were then used to generate a predicted case rate per zip code. If the current case rate in a zip code was higher than its predicted case rate, that zip code was flagged. A flagged zip code means that the testing site capacity might not be sufficient to support the current case rate and that more testing sites are recommended.
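As a rough illustration of the flagging step described above, here is a minimal sketch in Python. It assumes an ordinary least squares fit with NumPy; the write-up does not say which regression tool was actually used. The zip code labels, the numbers, and the variable names are all made up for the example and are not taken from our spreadsheet.

```python
import numpy as np

# Illustrative, made-up values for five hypothetical zip codes (NOT the project's data).
zip_codes = ["A", "B", "C", "D", "E"]
population = np.array([30000, 45000, 52000, 28000, 61000], dtype=float)
pct_uninsured = np.array([3.0, 8.5, 5.0, 12.0, 6.5])
case_rate = np.array([12.0, 31.0, 20.0, 48.0, 24.0])  # assumed unit: cases per 10,000 residents

# Design matrix: intercept, population (in thousands), percent without health insurance.
X = np.column_stack([np.ones_like(population), population / 1000.0, pct_uninsured])

# Ordinary least squares fit of case rate on the two predictors.
coef, *_ = np.linalg.lstsq(X, case_rate, rcond=None)

# Predicted case rate per zip code from the fitted model.
predicted = X @ coef

# Flag a zip code when its current case rate exceeds the predicted case rate.
flagged = [z for z, actual, pred in zip(zip_codes, case_rate, predicted) if actual > pred]

for z, actual, pred in zip(zip_codes, case_rate, predicted):
    status = "FLAGGED" if z in flagged else "ok"
    print(f"{z}: current={actual:.1f} predicted={pred:.1f} {status}")
```

Because the model includes an intercept, the residuals of a least squares fit sum to zero, so some zip codes will always land above their prediction and some below. The flag is therefore a relative signal of where testing capacity may be lagging, not an absolute threshold.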
Implementation of a Model
Using GeoPandas and Bokeh in a Jupyter Notebook, we created a map of San Francisco County by zip code. Each zip code area is flagged red if the current case rate is higher than the predicted case rate. The map is also interactive, with a drop-down menu that gives policy-makers an integrated view of the social characteristics of each zip code.
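The mapping step can be sketched roughly as follows. This is a simplified illustration, not our notebook code: two hypothetical square areas stand in for the real zip code boundaries, the column names and rates are invented, and the drop-down menu is left out. It assumes `geopandas`, `shapely`, and `bokeh` are installed.

```python
import geopandas as gpd
from shapely.geometry import Polygon
from bokeh.models import GeoJSONDataSource
from bokeh.plotting import figure

# Two hypothetical square "zip code" areas with made-up rates (NOT real boundaries or data).
gdf = gpd.GeoDataFrame(
    {
        "zip_code": ["A", "B"],
        "case_rate": [30.0, 12.0],
        "predicted_rate": [22.0, 15.0],
    },
    geometry=[
        Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
        Polygon([(1, 0), (2, 0), (2, 1), (1, 1)]),
    ],
    crs="EPSG:4326",
)

# Red when the current case rate exceeds the predicted case rate, gray otherwise.
is_flagged = gdf["case_rate"] > gdf["predicted_rate"]
gdf["color"] = ["red" if flag else "lightgray" for flag in is_flagged]

# Bokeh reads the polygons and their attribute columns from GeoJSON.
source = GeoJSONDataSource(geojson=gdf.to_json())

plot = figure(
    title="Flagged zip codes (illustrative)",
    tools="hover",
    tooltips=[
        ("Zip code", "@zip_code"),
        ("Current case rate", "@case_rate"),
        ("Predicted case rate", "@predicted_rate"),
    ],
)
plot.patches("xs", "ys", source=source,
             fill_color={"field": "color"}, line_color="black")
```

In a notebook, calling `output_notebook()` and then `show(plot)` from `bokeh.io` would render the map inline. With real data, the GeoDataFrame would instead be loaded from a zip code boundary file and joined to the case rate table.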
In the future
Currently, the map model is static. The data is from May 31, 2020. In the future, we plan to use APIs to pull in real-time case rate data.
We also only had time to create this program for San Francisco County. In the future, we plan to extend our calculations and mapping to Kern County as well (and hopefully, the rest of the state, and the country! And the world!).
What problems or achievements did your team have?
The biggest problem we faced was coordination: because our team members are spread across several countries, our sleep cycles did not line up. Another problem was the lack of uniform data reporting among different city/county health departments; some reports were in-depth but still lacked valuable information, e.g. cases by zip code, testing requirements, or a map/directory of testing sites. As for our achievement, we are just happy that we finished our project on time and worked as a cohesive unit.
Data sources:
Non-data sources: