One of the least expected aspects of 2020 has been the fact that epidemiological models have become both front-page news and a political football. Public health officials have consulted with epidemiological modelers for decades as they’ve attempted to handle diseases ranging from HIV to the seasonal flu. Before 2020, it had been rare for the role these models play to be recognized outside of this small circle of health policymakers.
Some of that tradition hasn’t changed with the SARS-CoV-2 pandemic. International bodies, individual countries, most states, and even some cities have worked with modelers to try to shape policy responses to the threat of COVID-19. But other aspects of epidemiological modeling clearly have changed. The models, some of which produce eye-catching estimates of fatalities, have driven headlines in addition to policy responses. And those policy responses have ended up being far more controversial than anyone might have expected heading into the pandemic.
With the severity of COVID-19, it’s no surprise that there has been increased scrutiny of epidemiological models. Models have become yet another aspect of life embroiled in political controversy. And it’s fair for the public to ask why different models—or even the same model run a few days apart—can produce dramatically different estimates of future fatalities.
What’s much less fair is that the models and the scientists behind them have come under attack by people who don’t understand why these different numbers are an expected outcome of the modeling process. And it’s downright unfortunate that these attacks are often politically motivated—driven by a focus on whether the numbers are convenient from a partisan perspective.
So why have models produced so many different numbers, and why have the numbers seemingly changed so often? There’s no simple answer to those questions. But that’s only because there are a lot of pretty simple answers.
There are different models
The fact that we refer to “models” (plural) should indicate that there’s more than a single software package that we just drop a few numbers into. Instead, many researchers have developed models, motivated by the desire to solve different problems or because they felt that a different approach would produce more accurate numbers. Almost all of these (see sidebar) are based on a simple premise: diseases are spread when humans come into contact with one another, so the model has to account for a combination of these contacts and the disease’s properties.
The disease’s properties tend to be things like what percentage of contacts results in the transfer of an infection, how long a person remains infectious, the disease’s incubation period, and so on. These considerations will vary from disease to disease. HIV, for example, is primarily transferred through activities like intercourse and sharing needles, so it spreads far less readily than the flu, which can be spread when two people simply share the same space.
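None of the real models discussed here is this simple, but the core premise—combining contacts with disease properties—can be sketched with the classic SIR (Susceptible-Infectious-Recovered) compartmental model. Every number below is illustrative, not an estimate for any real disease:

```python
# Minimal SIR sketch. beta bundles the contact rate with the per-contact
# transmission probability; gamma is 1 / (average days a person stays
# infectious). All parameter values here are made up for illustration.

def run_sir(population, initial_infected, beta, gamma, days):
    s = population - initial_infected  # susceptible
    i = float(initial_infected)        # infectious
    r = 0.0                            # recovered
    history = []
    for _ in range(days):
        new_infections = beta * s * i / population
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history

# Same infectious period, but one disease transmits twice as easily per day:
slower = run_sir(1_000_000, 10, beta=0.3, gamma=0.2, days=200)
faster = run_sir(1_000_000, 10, beta=0.6, gamma=0.2, days=200)
```

Comparing the final "recovered" counts of the two runs shows how changing a single disease property—here, how readily it transmits—shifts the epidemic's total size, which is why getting these properties right matters so much.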
Other diseases, like malaria and dengue fever, involve an intermediate host for spread, so a model that focuses on direct person-to-person interactions won’t be sufficient. A completely different approach to modeling—one that takes into account things like mosquito control—may be required for these diseases.
In any case, the models don’t just have to be adjusted for the disease; they have to handle our own behavior, as well. And there are a lot of options here. The Imperial College model that helped drive policy in the US and UK early in the pandemic is incredibly sophisticated, taking into account things like average classroom and office sizes to estimate likely transmission opportunities. Other models have used cellphone data to inform contact estimates. Still others may take much simpler approaches to estimating human contact, trading a bit of precision for the ability to perform multiple model runs quickly.
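To see why these contact estimates matter, here's a deliberately crude sketch of how a model might turn per-setting contact numbers into expected transmissions—and what closing one setting does to that figure. The settings and all numbers are hypothetical:

```python
# Hypothetical (average daily contacts, per-contact transmission probability)
# by setting. Real models like Imperial College's are far more detailed.
CONTACT_SETTINGS = {
    "household": (3, 0.10),
    "school":    (15, 0.03),
    "workplace": (8, 0.03),
    "community": (10, 0.01),
}

def expected_daily_transmissions(settings, closed=()):
    """Sum contacts * transmission probability, skipping closed settings."""
    return sum(contacts * prob
               for name, (contacts, prob) in settings.items()
               if name not in closed)

baseline = expected_daily_transmissions(CONTACT_SETTINGS)
with_school_closures = expected_daily_transmissions(CONTACT_SETTINGS,
                                                    closed=("school",))
```

A model built this way can estimate the effect of an intervention—school closures, working from home—by zeroing out the corresponding contacts, which is one reason models that represent contacts in detail are useful for policy, not just prediction.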
Naturally, the different approaches will produce different numbers. It’s not a question of whether the numbers are necessarily right or wrong; it’s not even a question of whether they’re useful or not. The key question is whether they’re appropriate for a specific use.
Different things go into the models
We mentioned just above that the models need to have the properties of the disease supplied. But unlike with the flu, we simply don’t have definitive numbers for a recently emerged pathogen like SARS-CoV-2. We know people are infectious in advance of the onset of symptoms, but how far in advance? How long do they remain infectious? How long after infection do they start experiencing symptoms?
For now, we at least have estimates for all of these numbers. In fact, we have more than one estimate for modelers to choose from. Should they take the numbers from something like a cruise ship, where the small, contained population can help provide a degree of precision to the estimates? Do they take numbers from a country like South Korea, where contact tracing was done efficiently? That gives us a good sense of what transmission looks like in a mobile population, but South Korea also managed to isolate cases effectively, making it a poor model for many other countries. Finally, data from a country like Italy may provide some good overall estimates of the disease’s progression, but that data will suffer from limited overall testing and a likely undercount of the total fatalities.
There are logical cases to be made for using any of these numbers, and researchers can reasonably disagree over the “best” properties to feed into their models. But again, the different choices will almost certainly produce somewhat different numbers.
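How much can a defensible difference in inputs matter? A toy projection makes the point. The reproduction numbers below are hypothetical, not real SARS-CoV-2 estimates, and the model ignores everything (immunity, interventions) except compounding growth:

```python
# Illustrative only: two plausible-looking parameter estimates for the same
# disease diverge substantially once growth compounds.

def project_cases(initial_cases, reproduction_number, serial_interval_days, days):
    """Naive exponential projection: each case causes reproduction_number
    new cases every serial_interval_days (no depletion of susceptibles)."""
    generations = days / serial_interval_days
    return initial_cases * reproduction_number ** generations

# Two estimates a modeler might reasonably choose between:
low_estimate = project_cases(100, reproduction_number=2.2,
                             serial_interval_days=5, days=30)
high_estimate = project_cases(100, reproduction_number=2.7,
                              serial_interval_days=5, days=30)
```

After just 30 days, the higher estimate projects roughly 3.4 times as many cases as the lower one, even though both inputs could be justified from real-world data. That gap is not a flaw in the model; it's a direct consequence of the uncertainty in what we feed into it.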