One of the most pressing questions about COVID-19 is which factors affect how the virus spreads. If we can determine how much each of these factors contributes to its spread, we can identify geographic regions that are more prone to becoming virus hotspots and allocate more resources to them. Our team saw the value in identifying these key regions at a time when resources are scarce, so we brainstormed different solutions to the problem.
Because we needed a predictive model that could accurately identify COVID-19 hotspots, our team decided that machine learning in Python would be an effective way to measure how much these factors actually affect the risk of a location becoming a hotspot. We used data collected in the USA on coronavirus rates in different metropolitan areas and compared it against potential factors such as GDP per capita, city and state population density, median age, mask requirements, and the number of airports. NASA's extensive databases were especially useful in this step, since they were comprehensive and completely free. Along with NASA, our team used databases from organizations such as the US government and Harvard University. Using this model, we built an interface where users can enter their local human factors and receive a boolean value indicating whether they are in a region at risk of becoming a COVID-19 hotspot. The interface was built with Flutter, an open-source UI software development kit.
One of our team's key strengths was a very diverse skill set, so everyone was able to contribute in a different way, from data structuring to research, design, and programming. We also collaborated efficiently, with set meeting times that were flexible enough to fit each member's availability. One of the few problems we faced was that the machine learning program would sometimes throw an error, and every debugging attempt required the program to 'relearn' the data from scratch. Fortunately, this did not become a major issue, because our well-structured timetable left adequate room to fix bugs.
To start, the six of us brainstormed ways to approach this problem, mainly deciding which human factors we would analyze and the scope of that analysis. From there, we searched various sources, including space agencies and other government sites, to collect our datasets. Once the data had been analyzed and sorted, the first stage of our project was complete.
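Combining several sources into one table amounts to joining them on a shared key. Below is a minimal sketch using only Python's standard library, assuming each source has been exported to CSV with a common metropolitan-area column. The column names (`metro`, `confirmed_cases`, `per_capita_income`), the area names, and the helpers `load_table` and `merge_sources` are hypothetical; the write-up doesn't say how the team actually merged its data.

```python
import csv
import io


def load_table(csv_text, key_field):
    """Read CSV text into a dict of rows keyed by key_field (e.g. a metro-area name)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key_field].strip(): row for row in reader}


def merge_sources(tables):
    """Inner-join keyed tables: keep only keys that appear in every source."""
    common = set.intersection(*(set(t) for t in tables))
    merged = {}
    for key in sorted(common):
        combined = {}
        for table in tables:
            combined.update(table[key])  # later sources add their columns
        merged[key] = combined
    return merged


# Placeholder data, not real figures.
cases_csv = "metro,confirmed_cases\nArea A,120\nArea B,45\n"
income_csv = "metro,per_capita_income\nArea A,52000\nArea C,61000\n"
merged = merge_sources([load_table(cases_csv, "metro"), load_table(income_csv, "metro")])
print(merged)
```

An inner join drops any area that is missing from one of the sources. That keeps every row complete for training, at the cost of discarding partially covered areas.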
Next, we designed the machine learning algorithm that would put our data to use. Two of our members, Sahil and Dhruv, took the lead on the coding side because of their in-depth knowledge of Python and machine learning algorithms. Our code used Scikit-Learn's random forest algorithm, fit on a training set, and achieved 91% accuracy on the test set. Given values for all eight factors, the model predicts whether the user's location will become a COVID-19 hotspot, and by varying the inputs, a user can see how each factor changes the result. Python and the Scikit-Learn library together formed the base algorithm for our program.
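The training-and-prediction flow described above can be sketched as follows, assuming scikit-learn and NumPy are installed. The team's code isn't included in the write-up, so this is not their implementation. The data is randomly generated placeholder data with a made-up labeling rule, and the helper names (`make_synthetic_data`, `train_hotspot_model`, `is_hotspot`) and hyperparameters are hypothetical. The accuracy it reports says nothing about the 91% figure above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

N_FACTORS = 8  # the write-up mentions eight input factors


def make_synthetic_data(n_samples=500, seed=0):
    """Generate placeholder data; NOT the team's real dataset."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, N_FACTORS))
    # Hypothetical labeling rule: a weighted sum of the first two factors decides the label.
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
    return X, y


def train_hotspot_model(X, y, seed=0):
    """Fit a random forest on a training split and report test-set accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    return model, acc


def is_hotspot(model, factors):
    """Return True if the model predicts the given location is a hotspot (label 1)."""
    if len(factors) != N_FACTORS:
        raise ValueError(f"expected {N_FACTORS} factor values, got {len(factors)}")
    row = np.asarray(factors, dtype=float).reshape(1, -1)
    return bool(model.predict(row)[0] == 1)


if __name__ == "__main__":
    X, y = make_synthetic_data()
    model, acc = train_hotspot_model(X, y)
    print(f"test accuracy on synthetic data: {acc:.2f}")
    print(is_hotspot(model, [1.0] * N_FACTORS))
```

To train on real data, one would presumably replace `make_synthetic_data` with the merged table (one row per metropolitan area, one column per factor, plus a hotspot label). `is_hotspot` returns the kind of boolean the interface displays.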
With the backend complete, we needed a GUI for the user to interact with. We built our interface with Flutter, a cross-platform development framework, because it is very user-friendly and makes collaboration easy. Our team member Aaditya was responsible for designing and programming the front end because of his experience with Flutter and cross-platform applications. The user enters each value into a text box, and the algorithm then decides whether their location will become a COVID-19 hotspot. We chose this over having the user simply enter their city because it is more universal: since the user supplies the data directly, the tool can in principle work for any city or metropolitan area, not just those in our datasets. Overall, Flutter and an input-driven interface let us turn our algorithm into a working program.
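Before the model can use the text-box contents, each value has to be converted to a number. Below is a minimal Python sketch of that step, assuming the values arrive as strings keyed by factor name. The write-up doesn't describe how the Flutter front end passes data to the Python model. The field names below are hypothetical labels for factors named earlier, and encoding mask requirements as 0 or 1 is an assumption. Only six of the eight factors are named in the write-up, so the list is illustrative rather than complete.

```python
# Hypothetical field names for the text boxes; the write-up names these
# factors but does not say how the front end labels or encodes them.
FACTOR_NAMES = [
    "gdp_per_capita",
    "city_population_density",
    "state_population_density",
    "median_age",
    "mask_requirement",  # assumed encoding: 1 = masks required, 0 = not
    "num_airports",
]


def parse_inputs(raw):
    """Convert text-box strings into floats, in FACTOR_NAMES order.

    Raises ValueError if a box is empty or does not contain a number.
    """
    values = []
    for name in FACTOR_NAMES:
        text = raw.get(name, "").strip().replace(",", "")  # accept "52,000"
        if not text:
            raise ValueError(f"missing value for {name!r}")
        try:
            values.append(float(text))
        except ValueError:
            raise ValueError(f"{name!r} must be a number, got {text!r}") from None
    return values
```

Rejecting empty or non-numeric boxes with a named error lets the interface tell the user which field to fix, rather than passing bad values to the model.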
Project slides: https://docs.google.com/presentation/d/1REJA_YpKnaYiW2I-zc1Obu88xxxH3wpTWQTiMt7X8EM/edit?usp=sharing
Personal Income: https://apps.bea.gov/itable/iTable.cfm?ReqID=70&step=1
Population Info: https://hub.arcgis.com/datasets/4d29eb6f07e94b669c0b90c2aa267100_0
State Population Density: https://sedac.ciesin.columbia.edu/mapping/popest/covid-19/
Confirmed Cases, City Data: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/HIDLTK