We used statistical modeling to assess the accuracy of a given coronavirus model and to compare its performance across states with differing demographic factors. To clarify, the goal was not to compare the number of affected patients in different states, but to compare how accurate the model was in each of them.
See our model and process here: https://docs.google.com/document/d/1P0__ntRiBoeXo2IELruKKCo3_A4HH7N8OpFlT-YpA8w/edit?usp=sharing
Screenshots of our model are included in the above link.
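To make "accuracy" concrete, here is a minimal sketch of one way a per-state score could be computed. It is not taken from our linked document: it assumes accuracy is measured as the mean absolute percentage error (MAPE) between predicted and observed counts, and the function name and numbers are hypothetical placeholders.

```python
from typing import Sequence


def mean_absolute_percentage_error(
    predicted: Sequence[float], observed: Sequence[float]
) -> float:
    """Return the MAPE, in percent, between predicted and observed counts."""
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Skip observations of zero to avoid dividing by zero.
    pairs = [(p, o) for p, o in zip(predicted, observed) if o != 0]
    if not pairs:
        raise ValueError("no non-zero observations to score against")
    return 100.0 * sum(abs(p - o) / abs(o) for p, o in pairs) / len(pairs)


# Hypothetical placeholder counts for one state, not real data.
example_predicted = [110.0, 190.0, 330.0]
example_observed = [100.0, 200.0, 300.0]
example_error = mean_absolute_percentage_error(example_predicted, example_observed)
```

A lower score means the model tracked that state more closely. A percentage error is used here so that large and small states can be compared on the same scale, but other metrics would work as well.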
The model was more accurate in states with higher incomes, greater urbanization, better healthcare, and smaller minority populations. This highlights a problem with models of the outbreak: the model predicts the number of cases and deaths more accurately in more affluent communities, but less accurately in the locations where COVID-19 is likely to have a deeper impact. There are several possible explanations. More urban states may be modeled more accurately because their populations are denser, so behavior is more uniform and predictable than in rural, spread-out states. States with higher income and better healthcare may also have more consistent and predictable treatment patterns.
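A comparison of this kind can be sketched as a rank correlation between a demographic ranking and a per-state error score. The code below is an illustrative assumption rather than our exact procedure: it uses Spearman's rank correlation, and the state values are made-up placeholders, not our results.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical placeholders: income rank (1 = highest income) and the model's
# percentage error for five unnamed states. Not real data.
income_rank = [1, 2, 3, 4, 5]
model_error = [4.0, 6.5, 5.0, 9.0, 12.0]
rho = spearman_correlation(income_rank, model_error)
```

With these placeholder values, a rho near +1 would mean the error grows as income rank worsens. Because the demographic sources are rankings rather than raw measurements, a rank-based statistic is a natural fit here.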
There are two possible reasons why the model is less accurate in states with larger minority populations. First, non-white communities may have behavior patterns that the model does not capture. For example, the model fails to account for events like Ramadan, a major observance in Islam. Alternatively, minorities may be more likely to live in areas with lower income and weaker healthcare, in which case the effect may reflect those factors instead.
A model can be judged less accurate even when the actual number of cases turns out lower than the number predicted. Even so, the increased uncertainty in these specific areas poses a disturbing problem. Areas with lower income and weaker healthcare are less capable of handling an outbreak, so they are generally the ones where an accurate model matters most. Because models are often used to inform policy decisions, inaccurate predictions can harm these communities, which have also been more deeply affected by the economic crisis caused by the outbreak.
Statistical modeling can help us expose biases in healthcare and social justice. However, we must be wary of the biases in these models themselves.
[1] Paper with Model https://www.jstor.org/stable/40588987?seq=1
[2] NASA SEDAC Urbanization https://sedac.ciesin.columbia.edu/mapping/popest/covid-19/
[3] Ranking by Income https://en.wikipedia.org/wiki/List_of_U.S._states_and_territories_by_income
[4] Ranking by Whiteness https://en.wikipedia.org/wiki/List_of_U.S._states_by_non-Hispanic_white_population
[5] Ranking by Internet https://getinternet.com/which-states-have-best-worst-internet/
[6] Ranking by Healthcare https://www.usnews.com/news/best-states/rankings/health-care