Opportunities and challenges
in modeling emerging infectious diseases
C. Jessica E. Metcalf1,2 and Justin Lessler3
The term “pathogen emergence” encompasses everything from previously unidentified
viruses entering the human population to established pathogens invading new populations
and the evolution of drug resistance. Mathematical models of emergent pathogens allow
forecasts of case numbers, investigation of transmission mechanisms, and evaluation of
control options. Yet, there are numerous limitations and pitfalls to their use, often driven
by data scarcity. Growing availability of data on pathogen genetics and human ecology,
coupled with computational and methodological innovations, is amplifying the power of
models to inform the public health response to emergence events. Tighter integration of
infectious disease models with public health practice and development of resources at the
ready has the potential to increase the timeliness and quality of responses.
Public health emergencies driven by emerging infectious diseases are at the forefront of global awareness. From HIV in the 1980s to Zika virus’s (ZIKV’s) recent invasion of the Americas, models that mathematically
capture disease processes have played a role in
assessing the risk and framing the response to
emerging pathogens. The most prominent, and
perhaps most fraught, role of such models is to
forecast the course of epidemics (1, 2). Yet, explicit representation of mechanisms of spread and
persistence can help us to do far more than forecast incidence. Models can elucidate the properties of emergent pathogens (3, 4), uncover general
principles of emergence (5), and compare potential mechanisms of spread and persistence (6).
Models are only as good as the data on which
they rely. Data scarcity is the norm when a previously unknown pathogen emerges, amplifying
uncertainty and obscuring key drivers of the epidemic. Misrepresentation of core mechanisms
can bias inferences and potentially misdirect intervention efforts. The strengths of models must
be considered in the context of the limitations
and pitfalls of their use.
Here, we focus on emergent viruses, both because of the speed with which they can spread
death and disease and because the dynamics of
viral epidemics exemplify key principles in the
modeling of infectious diseases. This focus should
not, however, detract from the importance of nonviral emergence events, nor from the particular
issues involved in modeling nonviral pathogens.
The “classical” dynamic modeling toolkit
The past decade has seen several viral emergence
events, including the 2009 pandemic of H1N1
influenza, the emergence of Middle East Respiratory Syndrome–associated coronavirus (MERS-CoV)
in the Arabian peninsula, the West African
Ebola outbreak, and ZIKV’s invasion of the Americas. These diseases are very different: Pandemic
H1N1 is spread person-to-person and is closely
related to seasonally circulating influenza viruses
(3); since its emergence, MERS-CoV has failed to
persist outside of the Middle East, and the epidemic appears to be largely driven by zoonotic
infections from camels (although a rapidly contained human-driven outbreak occurred in South
Korea) (6); Ebola is extremely virulent and spread
mostly through direct contact with very sick or dead
cases (7); and ZIKV is a mosquito-transmitted
virus, known for decades but recently discovered
to be a cause of severe pathogenic disease after
emerging in the Americas (8).
Despite these differences, the response to each
emerging virus has relied on models grounded in
the same dynamic principles and key data that
have informed the response to emerging infections
since at least the 1980s (Fig. 1). Then, Anderson
and May used mathematical models to elucidate
the key variables required for forecasting the
future trajectory and impact of the emerging HIV
epidemic (Box 1) (9). Although there have been
substantial advances in our ability to assess disease threats—ranging from increased statistical
rigor enabled by powerful computers to entirely
new methods of inference driven by analyses of
pathogen genetics (3, 4)—the underlying core
principles remain the same.
When responding to an emerging virus, perhaps the first priority is measuring the distributions of R0 and generation time (Box 1). R0 is of
particular interest because it determines whether
the disease will die out after introduction. For
example, early estimates of R0 for MERS-CoV
were well below 1 (4, 10), whereas estimates for
pandemic H1N1 were in the neighborhood of
1.5 (3, 11). The former remains confined to the
Arabian peninsula, and to persist, it apparently
requires continuous reseeding into the human
population from camels, whereas the latter has
established itself globally. Knowing the generation time allows us to estimate R0 from the growth
in case numbers during the early (exponential
growth) phase of an epidemic. Likewise, using
these two values, relatively accurate short-term
forecasts can be made early on with simple models (Fig. 2).
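The link between early growth, generation time, and R0 can be illustrated with a minimal sketch. It assumes that early case counts are positive and grow exponentially, and it uses two standard conversions from growth rate r to R0: R0 = 1 + r·Tg when the generation time is exponentially distributed with mean Tg, and R0 = exp(r·Tg) when every generation takes exactly Tg. The function names and the case counts in the usage example are hypothetical and are not taken from the analyses cited here.

```python
import math


def fit_growth(cases):
    """Least-squares fit of log(cases) against time index 0, 1, 2, ...

    Returns (r, intercept): the exponential growth rate per time step
    and the fitted log case count at time 0.
    Assumes every count is positive (log of zero is undefined).
    """
    n = len(cases)
    times = list(range(n))
    logs = [math.log(c) for c in cases]
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    sxx = sum((t - t_mean) ** 2 for t in times)
    r = sxy / sxx
    return r, y_mean - r * t_mean


def r0_from_growth(r, generation_time, fixed=False):
    """Convert a growth rate into R0 under an assumed generation-time shape.

    fixed=False: exponentially distributed generation time, R0 = 1 + r*Tg.
    fixed=True:  constant generation time, R0 = exp(r*Tg).
    """
    if fixed:
        return math.exp(r * generation_time)
    return 1.0 + r * generation_time


def forecast(r, intercept, start, horizon):
    """Extrapolate the fitted exponential curve for `horizon` future steps,
    beginning at time index `start`."""
    return [math.exp(intercept + r * t) for t in range(start, start + horizon)]


if __name__ == "__main__":
    # Hypothetical daily case counts from the early phase of an outbreak.
    daily_cases = [3, 4, 4, 6, 7, 9, 10, 13, 16, 19]
    r, intercept = fit_growth(daily_cases)
    mean_generation_time = 5.0  # days; an assumed value for illustration

    print("growth rate per day:", round(r, 3))
    print("R0 (exponential generation time):",
          round(r0_from_growth(r, mean_generation_time), 2))
    print("R0 (fixed generation time):",
          round(r0_from_growth(r, mean_generation_time, fixed=True), 2))
    print("next 3 days:",
          [round(x, 1) for x in forecast(r, intercept, len(daily_cases), 3)])
```

The two conversions bracket a range of plausible values, which shows why the assumed shape of the generation-time distribution matters as much as its mean. The extrapolation is only reasonable over short horizons, before depletion of susceptibles or control measures slow growth.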
Moving from forecasting cases to forecasting
disease burden requires estimates of the risk of
severe illness and mortality after infection. Dynamic aspects of both the disease and reporting processes, and potentially large numbers of
unobserved infections (Fig. 1), mean that models
of both are often necessary to estimate these quantities (12). Biases can go both ways; models of the
1Department of Ecology and Evolutionary Biology, Princeton
University, Princeton, NJ, USA. 2Office of Population Research,
Princeton University, Princeton, NJ, USA. 3Department of
Epidemiology, Johns Hopkins Bloomberg School of Public
Health, Johns Hopkins University, Baltimore, MD, USA.
*Corresponding author. Email: email@example.com
Box 1. Key dynamic quantities estimated early after an emergence event. There are
several key dynamic quantities that determine the course of the epidemic and indicate needs
for the structure of the response that should be identified rapidly after emergence:
Basic reproductive number (R0): The number of cases expected to be directly infected by
a single index case in an immunologically naive population. This provides an estimate
of the transmissibility of an emergent pathogen. If R0 < 1, the emerging pathogen will
die out, whereas if R0 > 1, the pathogen can spread widely and cause a major epidemic
or pandemic. R0 further determines the final size of epidemics in the absence of
Reproductive number (R): The number of cases expected to be directly infected by a single
infected individual in a population in which there is some underlying immunity.
Generation time: The time between a case becoming infected and that case causing other
infections. Combined with R0, this determines the speed at which an epidemic spreads through
Incubation period: The time from infection to symptom onset.
Latent period: The time from infection to becoming infectious.
Infectious period: The length of time that infected individuals can transmit.
Case fatality ratio: The proportion of cases that prove fatal.
Hospitalization rate/clinical attack rate: The proportion of cases in which disease is sufficiently
severe as to result in hospitalization, potentially affecting detection via passive surveillance.
Asymptomatic proportion: The percent of infected individuals that do not develop