The Challenge proposes that environmental and anthropogenic factors, including commerce, travel, and social activities, among others, can contribute to the spread of COVID-19. However, it is still debated which conditions and factors contribute significantly to the number of pandemic cases measured directly in the population. Our project is fully in line with the challenge: it addresses a real problem by using scientific methods to extract relevant information from publicly accessible databases with worldwide reach. Using human, environmental, and social factors extracted from such databases, it is possible to estimate the number of COVID-19 cases to guide containment actions.
In the current pandemic, public administrators, especially in developing countries, clearly struggled to anticipate cases of contagion in their cities or regions. Because the number of cases could not be forecast efficiently from the human factors characterizing these places (owing to economic, access, social, and other limitations), the spread of contagion contributed to the depletion of hospital resources. We therefore sought to relate socio-economic criteria to the official COVID-19 case counts in order to create a forecasting model.
These criteria were selected based on their relationship with COVID-19, as reported in scientific articles and publications. The criteria were then quantified using information from the available databases: global (e.g. the Socioeconomic Data and Applications Center (SEDAC) | Earthdata, by NASA) and local (e.g. for Brazil: https://www.ibge.gov.br/cidades-e-estados/pr/curitiba.html). The local databases were used for development and validation, and the global databases for validation. A relational table of cities × criteria × number of COVID-19 cases was created and loaded into machine learning software (WEKA - https://www.cs.waikato.ac.nz/ml/weka/). The model was obtained using the linear regression method.
The initial development challenge was defining the criteria and their relationship with COVID-19. Extracting and compiling the criteria values from the databases was a major obstacle and consumed most of the time spent creating the solution. The main achievement was bringing together people with different backgrounds, from distant locations, to develop a solution to a problem of global reach.
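The table-plus-regression step described above can be illustrated with a minimal sketch. The project itself used WEKA; the pure-Python ordinary least squares below is only a stand-in for the same idea. The two criteria (`density`, `sanitation`), the function names, and every number are hypothetical placeholders, not values from the project's data. The synthetic targets are generated from a known linear rule so that the fit can be checked.

```python
# Sketch only: the criteria, rows, and case counts are synthetic placeholders,
# NOT data from the project. The project used WEKA's linear regression.

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations apply to both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("criteria are collinear; the model cannot be fit")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_linear_regression(rows, targets):
    """Ordinary least squares via the normal equations.

    Returns [intercept, coefficient_1, coefficient_2, ...].
    """
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 is the intercept term
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefs, row):
    """Estimate the case count for one city from its criteria values."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


# Hypothetical "cities x criteria" table: (density, sanitation) per city.
rows = [(1.0, 0.9), (2.0, 0.8), (4.0, 0.5), (3.0, 0.7), (5.0, 0.6)]
# Synthetic case counts built from a known rule, so the fit is verifiable.
targets = [100.0 + 50.0 * d - 80.0 * s for d, s in rows]

coefs = fit_linear_regression(rows, targets)
print("intercept and coefficients:", coefs)
print("estimate for a new city:", predict(coefs, (2.5, 0.75)))
```

With real data the targets would not lie exactly on a plane, so the fitted coefficients would only approximate the underlying relationship, and validation on held-out cities (as done here with the global databases) would be needed to judge the model.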
https://drive.google.com/file/d/1KJvDPsGoj4DtORK6npty2AKMb65SfBxC/view?usp=sharing
* Local analysis/modelling/validation:
https://www.ibge.gov.br/cidades-e-estados/pr/curitiba.html
https://data.brasil.io/dataset/covid19/_meta/list.html
https://www.curitiba.pr.gov.br/dadosabertos/busca/
https://www.ibge.gov.br/geociencias/organizacao-do-territorio/redes-e-fluxos-geograficos/15798-regioes-de-influencia-das-cidades.html?edicao=27334&t=acesso-ao-produto
http://www.tratabrasil.org.br/images/estudos/itb/ranking-2019/PRESS_RELEASE___Ranking_do_Saneamento___NOVO.pdf
* Global validation (ESA, JAXA, CSA, CNES ...):
https://earthdata.nasa.gov/eosdis/daacs/sedac
https://coronavirus.jhu.edu/map.html
http://www.measureofamerica.org/wp-content/uploads/2013/06/MOA-III.pdf
https://visualization.covid19mobility.org/?date=2020-05-27&dates=2020-03-18_2020-05-27&region=36
https://www.opendatanetwork.com/entity/310M200US19740/Denver_Metro_Area_CO/economy.gdp.per_capita_gdp?year=2017
http://edr.state.fl.us/Content/area-profiles/county/marion.pdf