The challenge chosen by the group “My Nav” was “Human Factors”:
The emergence and spread of infectious diseases, like COVID-19, are on the rise. Can you identify patterns between population density and COVID-19 cases and identify factors that could help predict hotspots of disease spread?
The emergence and spread of the coronavirus has changed human activities both directly and indirectly. The horizontal social isolation required in most countries has had economic, environmental and social impacts. Economically, the shutdown of industries led to a reduction in GDI; environmentally, the lack of enforcement has aggravated trends in deforestation and water crises; socially, reduced interaction is expected to increase the incidence of psychological disorders, as a consequence of its negative effect on mental health and well-being.
Because much of the available data on COVID-19 is still inconclusive, the project correlates socioeconomic indicators from different countries (HDR, GDI, improved drinking water, improved sanitation, population density, and the percentage of urban and rural population) with the COVID-19 contagion rate. From these correlations, the aim is to predict hotspots or future epicenters of the disease and to develop containment measures that place the least strain on governments and society.
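As a minimal sketch of the correlation step described above, the snippet below computes a Pearson coefficient between each indicator and a contagion measure, then ranks the indicators by the strength of the relationship. This is an illustration under stated assumptions, not the team's actual code: the function names `pearson` and `rank_indicators`, the column names, and the rows in `placeholder_rows` are all hypothetical, and the numbers are placeholders rather than real measurements.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def rank_indicators(rows, target, indicators):
    """Return (indicator, coefficient) pairs, strongest absolute correlation first."""
    target_values = [row[target] for row in rows]
    scores = {
        name: pearson([row[name] for row in rows], target_values)
        for name in indicators
    }
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Placeholder rows for illustration only; NOT real country data.
placeholder_rows = [
    {"country": "A", "population_density": 50.0, "urban_pct": 60.0, "cases_per_100k": 120.0},
    {"country": "B", "population_density": 200.0, "urban_pct": 85.0, "cases_per_100k": 310.0},
    {"country": "C", "population_density": 15.0, "urban_pct": 40.0, "cases_per_100k": 45.0},
    {"country": "D", "population_density": 400.0, "urban_pct": 70.0, "cases_per_100k": 280.0},
]

ranking = rank_indicators(
    placeholder_rows,
    target="cases_per_100k",
    indicators=["population_density", "urban_pct"],
)
for name, coefficient in ranking:
    print(f"{name}: r = {coefficient:+.3f}")
```

A coefficient near +1 or -1 suggests a strong linear relationship and a value near 0 a weak one; with only a handful of countries, any such coefficient should be read as exploratory rather than conclusive.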
Based on data analysis and machine learning, the proposal is to predict new epicenters by correlating the expansion of the virus in each country with socioeconomic parameters. Python was used as the programming language, together with Excel and Colab as supporting software. With these tools, we gathered data for different countries, such as HDR, GDI, improved drinking water coverage, improved sanitation coverage, population density and the percentage of urban and rural population. These data, which were used to define comparable parameters across the countries analyzed in the project, were made available by the space agency through sources such as the Johns Hopkins University & Medicine Coronavirus Resource Center and the SEDAC Global COVID-19 Viewer. With the resulting database, the influence of several socioeconomic factors on the spread of COVID-19 was assessed statistically. The historical data showed, for example, that countries with higher HDR and GNI values had a greater spread of the coronavirus. Based on this, and using a machine learning model developed by the team, future rates of spread of the virus were predicted for different countries of interest.
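The text does not specify which machine learning model was used, so the following is only a stand-in to illustrate the idea of predicting a spread rate from an indicator: a one-variable ordinary least squares fit written in plain Python. The function names `fit_line` and `predict`, the choice of a single indicator, and the sample values are assumptions made for this sketch and are not taken from the project.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("cannot fit a line when all x values are identical")
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new indicator value."""
    slope, intercept = model
    return slope * x + intercept


# Placeholder values for illustration only; NOT real country data.
indicator_values = [0.55, 0.70, 0.80, 0.92]   # hypothetical development index
spread_rates = [40.0, 110.0, 190.0, 260.0]    # hypothetical cases per 100k

model = fit_line(indicator_values, spread_rates)
estimate = predict(model, 0.85)  # hypothetical country of interest
print(f"slope={model[0]:.1f}, intercept={model[1]:.1f}, estimate={estimate:.1f}")
```

A single-variable linear fit is the simplest possible baseline; a model that combines several indicators, as the project describes, would need a multivariate method and a held-out set of countries to check its predictions.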
Finally, the team's cohesion in carrying out the work was its most important accomplishment. The division of tasks was well defined and worked efficiently, from the search for data, through the development of the code, to the preparation of the final deliverable. In addition, it is both gratifying and important to understand how socioeconomic factors affect individual countries in times of crisis, since this understanding makes it possible to prevent and combat major problems.
The main problem the team faced was errors in the modeling; with more time, more variables could have been incorporated, making the model more complete. In addition, the engineers who programmed the model had experience only with scientific programming for statistical analysis and vibration analysis, and no previous contact with data analytics or machine learning, so writing the code was a great challenge. Minor problems included searching for data across different databases and learning to handle the software.