A wide variety of data sets is available for anyone to explore, and much of the data they contain can be used to estimate progress toward a specific SDG or, more precisely, toward a given target. This enormous volume of data, however, poses a couple of problems and threats to objective analysis. In particular, the data sets are very dissimilar from one another in format and access methods.
Ultimately, overall progress toward a specific SDG or target is better expressed as a single number within a known range of possible values. This holds not only for the SDGs but for any objective goal system. That is not to say that qualitative analysis isn't useful: when the time comes to act on a given goal, qualitative analysis is arguably even more important than the numbers themselves. But for the goal system to be objective, and to be able to answer yes or no to the question "did we get better?", we need a number that can be tracked over time and compared with earlier or later results.
This number (which we will call the "score" from here on) can be calculated from many different inputs; hence there will be a function, getScore, that takes those inputs and returns the score for a given target.
Because how much a given input actually tells you about a specific target is rather subjective, we believe it is a good idea to make getScore a sum of products, in which each input (from here on, a "pre-score") is multiplied by a coefficient representing how important that pre-score is for the target. All the coefficients must sum to 1.
For this to work, the functions that calculate the pre-scores must all have the same output range; for practical purposes, we decided to make this range go from 0 to 1, with 0 meaning bad and 1 meaning good.
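The sum of products described above can be sketched in a few lines. The project itself is written in C#; this illustration uses Python for brevity, and the name get_score, its signature, and the rejection of negative coefficients are assumptions made for the example rather than the project's actual code.

```python
import math
from typing import Sequence


def get_score(pre_scores: Sequence[float], coefficients: Sequence[float]) -> float:
    """Combine pre-scores into a single score as a sum of products."""
    if len(pre_scores) != len(coefficients):
        raise ValueError("one coefficient is needed per pre-score")
    if any(not 0.0 <= p <= 1.0 for p in pre_scores):
        raise ValueError("every pre-score must lie between 0 and 1")
    # Assumption for this sketch: coefficients are non-negative importances.
    if any(c < 0.0 for c in coefficients):
        raise ValueError("coefficients must not be negative")
    if not math.isclose(sum(coefficients), 1.0):
        raise ValueError("coefficients must sum to 1")
    # Each pre-score is weighted by its coefficient and the products are added.
    return sum(p * c for p, c in zip(pre_scores, coefficients))
```

For instance, pre-scores of 0.8 and 0.4 with coefficients 0.75 and 0.25 give 0.8 × 0.75 + 0.4 × 0.25 = 0.7. Because the pre-scores lie between 0 and 1 and the coefficients sum to 1, the score itself stays between 0 and 1.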
Now, as we said before, the available data sets are very dissimilar from one another in terms of format and access methods. Because of this, we decided to abstract the task of accessing the data and calculating the pre-scores into an object-oriented interface, IPreScoreProvider, with a single function called getPreScore, which takes a timestamp and returns the value of that pre-score (remember, a value between 0 and 1) at that moment.
Each implementation of IPreScoreProvider must take care of all the details of accessing the data, such as handling API keys, parsing the UI of a website that provides no API, or downloading a file from an FTP server, and then of computing the pre-score for the moment requested by the caller.
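As a rough illustration of this interface, here is a minimal Python sketch (the project itself is written in C#). PreScoreProvider and get_pre_score mirror the names IPreScoreProvider and getPreScore; the InMemoryPollutionProvider class, its worst/best parameters, and the linear rescaling are hypothetical and stand in for whatever a real provider would do after fetching its data.

```python
from abc import ABC, abstractmethod
from datetime import datetime
from typing import Mapping


class PreScoreProvider(ABC):
    """Python counterpart of the IPreScoreProvider interface."""

    @abstractmethod
    def get_pre_score(self, timestamp: datetime) -> float:
        """Return a pre-score between 0 (bad) and 1 (good) for the given moment."""


class InMemoryPollutionProvider(PreScoreProvider):
    """Hypothetical provider; a real one would call an API, parse a page or fetch a file."""

    def __init__(self, readings: Mapping[datetime, float], worst: float, best: float) -> None:
        self._readings = readings  # raw measurements keyed by timestamp
        self._worst = worst        # raw value mapped to pre-score 0
        self._best = best          # raw value mapped to pre-score 1

    def get_pre_score(self, timestamp: datetime) -> float:
        # Use the latest reading taken at or before the requested moment.
        known = [t for t in self._readings if t <= timestamp]
        if not known:
            raise LookupError("no data available at or before the requested timestamp")
        raw = self._readings[max(known)]
        # Linear rescaling into [0, 1], clamped so outliers stay in range.
        scaled = (raw - self._worst) / (self._best - self._worst)
        return min(1.0, max(0.0, scaled))
```

The caller only ever sees get_pre_score, so a provider backed by a REST API and one backed by a downloaded file are interchangeable as long as both return values in the agreed 0 to 1 range.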
This frees analysts from worrying about how to get and combine the data; they simply choose a set of pre-score providers, give each of them the importance they believe its pre-score has for a specific target, and see the overall score evolve over time.
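The analyst's workflow can be sketched as follows, again in Python rather than the project's C#. The function score_over_time, the two stand-in providers, and all their values are made up for illustration; they are not real data or part of the project's code.

```python
from datetime import datetime
from typing import Callable, Dict, List, Tuple

# For brevity a provider is modelled as any callable mapping a moment to a
# pre-score in [0, 1]; in the project this role is played by getPreScore.
PreScoreFn = Callable[[datetime], float]


def score_over_time(
    weighted_providers: List[Tuple[PreScoreFn, float]],
    moments: List[datetime],
) -> Dict[datetime, float]:
    """Return the combined score at each requested moment."""
    total_weight = sum(weight for _, weight in weighted_providers)
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return {
        moment: sum(weight * provider(moment) for provider, weight in weighted_providers)
        for moment in moments
    }


# Two stand-in providers with made-up values (not real data).
def air_quality(moment: datetime) -> float:
    return 0.2 if moment < datetime(2020, 4, 1) else 0.8


def health_coverage(moment: datetime) -> float:
    return 0.5


timeline = score_over_time(
    [(air_quality, 0.5), (health_coverage, 0.5)],
    [datetime(2020, 1, 1), datetime(2020, 6, 1)],
)
for moment, score in timeline.items():
    print(moment.date(), round(score, 2))
```

With equal weights, the made-up values give a score of about 0.35 for January 2020 and about 0.65 for June 2020; changing the weights changes how strongly each provider moves the result.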
The implementation presented today is a simple console application that serves as a proof of concept (PoC) for the homogenization of the data. The full solution, however, would ship with a UI in which users navigate a gallery of providers offering pre-scores from many different data sets, select as many of them as they want, and assign a weight to each one to combine the data into a single result. That result would be displayed in a plot showing the evolution of the score over time. The plot can include specific highlights in the timeline, such as the first case of COVID-19, the first death, or the first country to lock down. A mockup displaying this idea is now hosted at https://beedata.us/ .
We chose this challenge because we wanted to help society by helping to understand the effects of COVID-19 on the SDGs. We believe the SDGs are great goals to pursue for better human development and better management of the planet's resources.
As mentioned above, one of the problems we encountered was the great heterogeneity of the data, so we decided to encapsulate the logic of accessing it. The idea is that the analyst won't have to worry about data homogenization but will instead work with data already converted to the same format. We use relevant data as an example to show the practicality of the project.
First, we used C# and open data from different organizations, including NASA, to develop the first part of the project, which comprises the data aggregation engine and a console demo that shows the usefulness of the solution (code available at https://github.com/nprieto95/BeeData/tree/master/BeeData.CovidSpaceAppsDemo).
Then, a React SPA was built using Material Design (now hosted at https://beedata.us/) to showcase how a UI for the solution would look.
Slideshow here.
https://climate.nasa.gov/vital-signs/global-temperature/
http://www.temis.nl/airpollution/no2col/no2regio_tropomi.php
https://search.earthdata.nasa.gov/search?q=population
https://developers.google.com/earth-engine/datasets/catalog/ESA_GLOBCOVER_L4_200901_200912_V2_3
https://unstats.un.org/sdgs/indicators/database/
http://datatopics.worldbank.org/universal-health-coverage/coronavirus/
https://microdata.worldbank.org/index.php/catalog/dhs
https://www.theia-land.fr/en/data-and-services-for-the-land/