Prediction and prevention of pandemics via graphical model inference and convex programming

We follow our previous work1 in justification for the use of the Graphical Models (GM) to study and mitigate pandemics. Therefore, we start from providing a brief recap of the prior literature on modeling of the epidemics, describe the logic which led us in1 to the Ising Model (IM) formulation, and then state formally the inference and prevention problems addressed in the manuscript.

Difficulty in both predicting and neutralizing the spread of pandemics is a major social challenge of humanity. Technically speaking, we are yet to design a coherent data lifecycle for modeling and prevention both in terms of the global strategies and local tactics. To address the challenge, we must devise a hierarchy of spatio-temporal models with different resolutions — from individual to community, county to the city, and from the moment a pathogen first enters our bodies, to days of disease development and to community transmission. Importantly, the models should be efficient in computing probabilistic predictions (for instance, offering the marginal probability heat map for the city neighborhoods to transition from the current/prior state of infection to the projected/a-posteriori state in 2 weeks).

Epidemiology and Mathematical Biology experts have relied in the past on a number of modeling approaches. The Agent-Based-Models (ABMs), introduced in epidemiology in 2004–20082,3,4,5,6,7have complemented the earlier compartmental models8,9,10,11. Using ABMs, even though not exclusive to epidemiology12.13, became a breakthrough in the field, as they allowed to make a significant improvement in the quality of predictions, especially in the spatio-temporal resolution of how the disease spreads and how one can mitigate its spread. The models became and remained a core part of the epidemiology data life-cycle. (See for instance14.15 for most recent bibliography.) The ABMs provide a detailed prediction of how pandemics spread within counties, cities, and regions. A majority of the country-, city- or county-scale testbeds testing various mitigation strategies are resolved nowadays with ABMs. In particular, recently ABMs have been used extensively to inform public health in (non-pharmaceutical) interventions against the spread of COVID-1916,17,18,19,20and verify new strategies like test-trace-quarantine15among many other applications.

There are two major problems with the modeling of pandemic. First, many parameters need to be calibrated on data. Second, even when calibrated for the current state of pandemic the models which are too detailed become impractical for making a forecast and for developing prevention strategies — both requiring checking multiple (forecast and/or prevention) scenarios. Using ABMs, which are clearly over-modeled (too detailed) is especially problematic in the context of the latter. For example, the open-source ABM solver FLUTE21 developed originally for modeling influenza, works with data that are acquired through Geographic Information Systems (GIS) on the scale of census tracts or communities, which is a very reasonable scale of spatial resolution to understand the dynamics of pandemics on a local scale. FLUTE populates each of the communities with thousands to millions of inhabitants in order to account for their daily patterns of travel. We believe that constructing effective Graphical Models (GM) of Pandemics with community-scale spatial resolution and then modeling pairwise (and possibly higher-order) epidemic interactions between communities directly, without introducing the thousands-to-millions of dummy agents, will complement ( as discussed in the next paragraph), but also improve upon ABMs by being more efficient, robust and easier to calibrate.

An important, and possibly one of the first, Graphical Model (GM) of the COVID-19 pandemic was proposed in22. Dynamic bi-partite GMs connecting census tracts to specific Points Of Interest (non-residential locations that people visit such as restaurants, grocery stores and religious establishments) within the city and studying dynamics of the four-state (Susceptible, Exposed, Infectious and Removed ) of a census tract (graph node) on the graph, were constructed in22 for major metro-area in USA based on the SafeGraph mobility data23.

In fact, similar dynamic GMs, eg of the Independent Cascade Model (ICM) type24,25,26,27,28, were introduced even earlier in the CS/AI literature in the context of modeling how the rumors spread over social networks (with a side reference on using ICM in epidemiology). As argued in1 the Independent Cascade Models (ICMs) can be adapted to modeling pandemics. (Another interesting use of the ICM to model COVID-19 pandemic was discussed in29.) In its minimal version, an ICM of Pandemic can be built as follows. Assume that the virus spreads in the community (census tract) sufficiently fast, say within five days — which is the estimate for the early versions of COVID-19 median incubation period. If an infected person enters a community/neighborhood but does not stay there, he infects others with some probability. If a single resident of the community becomes infected, all other residents are assumed infected as well (instantaneously). The model is a discrete-time dynamic model in which nodes in a network are in one of the three states: Susceptible, Infected, or Removed. The nodes represent communities/neighborhoods. A contact between an Infected community/node and another community which is Susceptible has an assigned probability of disease transmission, which can also be interpreted as the probability of turning the S state into I state. Consistently with what was described above, the network is represented as a graph, where nodes are tracts and edges, connecting two tracts, have an associated strength of interaction representing the probability for the infection to spread from one node to its neighbor. A seed of the infection is injected initially at random, for example, mimicking an exogenous super-spreader infection event in the area; examples could include political or religious gatherings. See Fig. 1 illustrating dynamics of the cascade model over 3-by-3 grid graph. Color coding of nodes is according to Susceptible = blue, Infected = red, Removed = black. Given the starting infection configuration, each infected community can infect its graph-neighbor community during the next time step with the probability associated with the edge connecting the two communities. Then the infected community moves into the removed state. The attempt to infect each neighbor is independent of all other neighbors. This creates a cascading spread of the virus across the network. The cascade stops in a finite number of steps, thereby generating a random Removed pattern, shown in black in the Fig. 1, while other communities which were never infected (remain Susceptible) are shown in blue.

Figure 1

An exemplary random sequence (top-left to top-right to bottom-left to bottom-right) of the Independent Cascade Model (ICM) dynamics over (3 times 3 ) grid. Nodes colored red, blue, and black are Infected, Susceptible, and Removed at the respective stage of the dynamical process. This (shown) sample of the dynamic process terminates in 3 steps. Ising Model of Pandemic (IMP), which is the focal point of this manuscript, describes a regularized version of the ICM terminal state, where only two states (S—Blue and R—Black) are left. (See text for details).

It was shown in1 that with some regularization applied, statistics of the terminal state of the Cascade Model of Pandemic turns into a Graphical Model of the attractive Ising Model type.

This manuscript Road Map. Working with the Ising Model of Pandemic, we start the technical part of the manuscript by posing the Inference/Prediction Challenge in Section “Ising model of pandemic”. Here, the problem is stated, first, as the Maximum A-Posteriori over an attractive Ising Model, and we argue, following the approach which is classic in the GM literature, that problem can be re-stated as a tractable LP. We then proceed to Section “Prevention challenge” to pose the main challenge addressed in the manuscript — the Prevention Challenge—As the two-level optimization with inner step requiring resolution of the aforementioned Prediction Challenge. Aiming to reduce the complexity of the Prevention problem, we turn in Section “Geometry of the MAP states” to the analysis of the conditions in the formulation of the Prediction Challenge, describing the Safety domain in the space of the Ising Model parameters. We show the Safety domain is actually a polytope, even though exponential in the size of the system. We proceed in Section “Projecting to the safe polytope” with analysis of the Prevention Challenge, discussing the interpretation of the problem as a projection to the Safety Polytope from the polytope exterior, needed when the bare prediction suggests that system will be found with high probability outside of the Safety Polytope. Section “Two polarized modes” is devoted to approximation which allows an enormous reduction in the problem complexity. We suggest here that if the graph of the system is sufficiently dense, the resulting MAP solution may only be in one of the two polarized states (a) completely safe (no other nodes except the initially infected) pick the infection, or (b) the infection is spread over the entire system. We support this remarkable simplification by detailed empirical analysis and also by some theoretical arguments. Section “Results” is devoted to the experimental illustration of the methodology on the practical example of the Graphical Model of Seattle. The manuscript is concluded in Section “Discussion” with a brief summary and discussion of the path forward.

We conclude this introductory section with a general comment about relation between ABM and GM approaches. Focusing on GMs we did not mean to suggest that the GM approach is outright better than the ABM approach. In fact, we do believe that the two are complementary. ABM is more accurate but computationally heavy, while GM is coarser but computationally lighter. We are also convinced that it will be advantageous in the future to consider the two in tandem, eg, with the parameters of the GM trained on the synthetic data generated by the ABM.

Leave a Comment