## Modelling COVID-19 infection across England

In this tutorial we’ll cover some work on fitting a purely spatial Bayesian model to predict the COVID-19 infection rate across England.

## Study aim and data description

This study describes how to fit a purely spatial Bayesian hierarchical model (BHM) based on Markov Chain Monte Carlo (MCMC) simulation method to estimate the spatial pattern of COVID-19 infection rate in England. The first thing is to load all the packages used in the COVID-19 case study.

### Install CARBayes

This tutorial requires the CARBayes package, please install it before continuing through the tutorial.

#### Retrieving data

The study region is mainland England, which is partitioned into 6789
neighbourhoods at the Middle Layer Super Output Area (MSOA) scale. The
infections data are the total reported number of COVID-19 cases in each
MSOA from Jan 8, 2022 to March 26, 2022. The shapefile of the study
region is a `SpatialPolygonsDataFrame`

, which is used to map
the data. It stores the location, shape and attributes of geographic
features for the neighbourhoods. We’ll use
`retrieve_tutorial_data`

to first retrieve the dataset for
this tutorial.

`fdmr::retrieve_tutorial_data(dataset = "covid_mcmc")`

```
##
## Tutorial data extracted to /home/runner/fdmr/tutorial_data/covid_mcmc
```

The COVID-19 data and the related covariate information are included
in our tutorial data package. We’ll load in the data using the
`load_tutorial_data`

function.

`s_covid <- fdmr::load_tutorial_data(dataset = "covid_mcmc", filename = "s_covid.rds")`

Next we’ll use the `load_tutorial_data`

function to load
in the spatial data we want.

`sp_data <- fdmr::load_tutorial_data(dataset = "covid_mcmc", filename = "spatial_data.rds")`

In this study, we use the areal unit modelling approach to fit the BHM and then make model inference using MCMC method. To do this, we need to construct a non-negative symmetric $n \times n$ neighbourhood or adjacency matrix $\boldsymbol{W}$ that accounts for the spatio-temporal autocorrelation structure, where $n=6789$ is the number of areal units. The neighbourhood matrix specifies the spatial closeness between pairs of areal units. The elements $\{w_{ij}\}$ in $\boldsymbol{W}$ can be either continuous or binary, and a larger value of $w_{ij}$ represents that MSOAs $(i,j)$ are spatially closer to each other. Here we use the border sharing specification, so that $w_{ij}=1$ if MSOAs $(i,j)$ share a common geographical border, and $w_{ij}=0$ otherwise.

## Model specification

We use a Bayesian hierarchical model to predict the spatial COVID-19 infection rate at the MSOA level in England. Let $Y_{i}$ denote the total number of reported COVID cases for neighbourhood $i=1,\ldots, n(=6789)$ during the study period, and $N_{i}$ denote the (official) estimated population living in MSOA $i$. $Y_{i}$ is assumed to have a Poisson distribution with parameters ($N_{i}$, $\theta_{i}$), where $\theta_{i}$ is the true unobserved COVID-19 infection rate in MSAO $i$. We follow a standard path in modelling $\theta_{i}$ with a log link to the Poisson and start with a model where the linear predictor decomposes additively into a set of covariates and a Gaussian Markov Random Field process, which characterises the infection of the disease after the covariate effects have been accounted for. A general Bayesian hierarchical model commonly specified is given by

$\begin{align} \nonumber Y_{i}\vert N_{i}, \theta_{i} &\sim \text{Poisson}(N_{i}\theta_{i}),\ \ i=1,\ldots,n,\\ log(\theta_{i} )&=\boldsymbol{x_{i}^{\top}}\boldsymbol{\beta}+\phi_{i}. \end{align}$

The spatial random effects $\{\phi_i\}$ are included in the model to account for any residual spatio-temporal autocorrelation after adjusting for covariates $\boldsymbol{x_{i}}$. Here we utilise the spatial modelling structure “BYM”, proposed by Besag, York, and Mollié (1991), to model $\{\phi_i\}$. It is given by

$\begin{align} \nonumber \phi_i &=\phi_i^{(1)}+\phi_i^{(2)}\\ \phi_i^{(1)}\vert\boldsymbol\phi_{-i}^{(1)}&\sim \text{N}\left( \frac{\sum_{j=1}^{n}w_{ij}\phi_j^{(1)}}{\sum_{j=1}^{n}w_{ij}}, \frac{\tau_1^2}{\sum_{j=1}^{n}w_{ij}}\right)\\ \nonumber \phi_i^{(2)}&\sim \text{N}(0, \tau_2^2), \end{align}$

where $\phi_i$ now consists of two components. $\phi_i^{(1)}$ is assigned the intrinsic CAR prior (Besag et al., 1991), and $\phi_i^{(2)}$ is a set of independent and identically normally distributed random effects, with mean zero and common variance $\tau_2^2$.

## Define the model formula

In order to fit the spatial model, a model formula needs to be defined, by including the response in the left-hand side and the fixed and random effects in the right-hand side. First, we consider the scenario of including no covariates.

## Fit the model

Finally, we fit the spatial model using the function
`S.CARbym()`

of the package `CARBayes`

developed
by Lee (2013).

```
MCMC_model <- CARBayes::S.CARbym(
formula = form,
data = s_covid,
family = "poisson",
W = w,
burnin = 10000,
n.sample = 30000,
thin = 10,
verbose = F
)
```

`## Error in loadNamespace(x): there is no package called 'CARBayes'`

For comparison purpose, we fit a separate BHM to the same dataset using the INLA approach. Likewise, it uses the BYM to model the spatial random effects.

```
s_covid$ID <- seq(1, nrow(s_covid))
formula <- total.cases ~ 1 + f(ID,
model = "bym",
graph = w
)
INLA_model <- INLA::inla(formula,
data = s_covid,
family = "poisson",
E = s_covid$population,
control.compute = list(
dic = TRUE,
waic = TRUE,
cpo = TRUE
),
verbose = FALSE
)
```

Finally, we fit a BHM to the same dataset using the INLA-SPDE approach.

```
initial_range <- diff(range(sp_data@data[, "LONG"])) / 5
max_edge <- initial_range / 8
mesh <- fmesher::fm_mesh_2d_inla(
loc = sp_data@data[, c("LONG", "LAT")],
max.edge = c(1, 2) * max_edge,
offset = c(initial_range / 4, initial_range),
cutoff = max_edge / 7
)
prior_range <- initial_range
spde <- inla.spde2.pcmatern(
mesh = mesh,
prior.range = c(prior_range, 0.5),
prior.sigma = c(1, 0.01)
)
s_covid_cp <- s_covid
sp::coordinates(s_covid_cp) <- c("LONG", "LAT")
cmp <- total.cases ~ 0 + Intercept(1) + f(main = coordinates, model = spde)
inlabru_model <- inlabru::bru(cmp,
data = s_covid_cp,
family = "poisson",
E = s_covid_cp$population,
control.family = list(link = "log"),
options = list(
verbose = FALSE
)
)
```

## Model comparison

In terms of model selection criteria, we show the different values for each model based on DIC and WAIC. The model fitted using the INLA-SPDE approach performs better than the other two models in terms of the lowest DIC and WAIC values.

```
modfit <- data.frame(
DIC = c(MCMC_model$modelfit[1], INLA_model$dic$dic, inlabru_model$dic$dic),
WAIC = c(MCMC_model$modelfit[3], INLA_model$waic$waic, inlabru_model$waic$waic)
)
```

`## Error in eval(expr, envir, enclos): object 'MCMC_model' not found`

`## Error: object 'modfit' not found`

`modfit`

`## Error in eval(expr, envir, enclos): object 'modfit' not found`

We compare the posterior COVID-19 infection rate estimates between the models. In general, all models provide similar posterior COVID-19 infection rate estimates.

`## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'mean': object 'MCMC_model' not found`

```
mcmc_lc <- exp(quantile(MCMC_model$samples$beta, 0.025) +
apply(MCMC_model$samples$psi, 2, quantile, 0.025))
```

`## Error in eval(expr, envir, enclos): object 'MCMC_model' not found`

```
mcmc_uc <- exp(quantile(MCMC_model$samples$beta, 0.975) +
apply(MCMC_model$samples$psi, 2, quantile, 0.975))
```

`## Error in eval(expr, envir, enclos): object 'MCMC_model' not found`

```
comb <- data.frame(
"INLA_BYM" = INLA_model$summary.fitted.values$mean,
"INLA_lc" = INLA_model$summary.fitted.values$`0.025quant`,
"INLA_uc" = INLA_model$summary.fitted.values$`0.975quant`,
"INLA_SPDE" = inlabru_model$summary.fitted.values$mean[1:nrow(s_covid)],
"INLA_SPDE_lc" = inlabru_model$summary.fitted.values$`0.025quant`[1:nrow(s_covid)],
"INLA_SPDE_uc" = inlabru_model$summary.fitted.values$`0.975quant`[1:nrow(s_covid)],
"MCMC" = mcmc_fitted_prev,
"MCMC_lc" = mcmc_lc,
"MCMC_uc" = mcmc_uc
)
```

`## Error in eval(expr, envir, enclos): object 'mcmc_fitted_prev' not found`

```
pairs(comb[, c("MCMC", "INLA_SPDE", "INLA_BYM")],
pch = 19, cex = 0.3, col = "orange", lower.panel = panel.smooth
)
```

`## Error in eval(expr, envir, enclos): object 'comb' not found`

`## Error in eval(expr, envir, enclos): object 'comb' not found`

The spatial patterns of the infection rate estimates for each model are displayed below.

`sp_data@data$est.rate.mcmc <- mcmc_fitted_prev`

`## Error in eval(expr, envir, enclos): object 'mcmc_fitted_prev' not found`

```
domain <- sp_data@data$est.rate.mcmc
legend_values <- sp_data@data$est.rate.mcmc
fdmr::plot_map_leaflet(
polygon_data = sp_data,
domain = domain,
palette = "Reds",
legend_title = "Rate",
add_scale_bar = TRUE,
polygon_fill_opacity = 0.8,
)
```

`## Error: 'plot_map_leaflet' is not an exported object from 'namespace:fdmr'`

```
sp_data@data$est.rate.inlabym <- INLA_model$summary.fitted.values$mean
domain <- sp_data@data$est.rate.inlabym
legend_values <- sp_data@data$est.rate.inlabym
fdmr::plot_map_leaflet(
polygon_data = sp_data,
domain = domain,
palette = "Reds",
legend_title = "Rate",
add_scale_bar = TRUE,
polygon_fill_opacity = 0.8,
)
```

`## Error: 'plot_map_leaflet' is not an exported object from 'namespace:fdmr'`

```
sp_data@data$est.rate.inlaspde <- inlabru_model$summary.fitted.values$mean[1:nrow(s_covid)]
domain <- sp_data@data$est.rate.inlaspde
legend_values <- sp_data@data$est.rate.inlaspde
fdmr::plot_map_leaflet(
polygon_data = sp_data,
domain = domain,
palette = "Reds",
legend_title = "Rate",
add_scale_bar = TRUE,
polygon_fill_opacity = 0.8,
)
```

`## Error: 'plot_map_leaflet' is not an exported object from 'namespace:fdmr'`

Now we consider the scenario of fitting the models with covariates of interest. We select several risk factors used in the COVID-19 tutorial. The models described above are fitted and compared again after incorporating covariate information. First we fit the BHM using MCMC method.

```
form <- total.cases ~ 1 + offset(log(population)) +
IMD + perc.wb + perc.ba + age1 + pm25
MCMC_model2 <- CARBayes::S.CARbym(
formula = form,
data = s_covid,
family = "poisson",
W = w,
burnin = 10000,
n.sample = 30000,
thin = 10,
verbose = F
)
```

`## Error in loadNamespace(x): there is no package called 'CARBayes'`

Next we fit the BHM using the INLA-BYM approach.

```
formula <- total.cases ~ 1 + f(ID,
model = "bym",
graph = w
) + IMD + perc.wb + perc.ba + age1 + pm25
INLA_model2 <- INLA::inla(formula,
data = s_covid,
family = "poisson",
E = s_covid$population,
control.compute = list(
dic = TRUE,
waic = TRUE,
cpo = TRUE
),
verbose = FALSE
)
```

Next we fit the BHM using the INLA-SPDE approach.

```
cmp <- total.cases ~ 0 + Intercept(1) + f(main = coordinates, model = spde) +
IMD + perc.wb + perc.ba + age1 + pm25
inlabru_model2 <- inlabru::bru(cmp,
data = s_covid_cp,
family = "poisson",
E = s_covid_cp$population,
control.family = list(link = "log"),
options = list(
verbose = FALSE
)
)
```

The regression coefficients of the selected covariates for all models are compared. In general, the models have similar regression coefficients estimates.

```
regr_est <- cbind.data.frame(
"MCMC" = MCMC_model2$summary.results[1:(nrow(MCMC_model2$summary.results) - 2), 1],
"INLA_BYM" = INLA_model2$summary.fixed[, 1],
"INLA_SPDE" = inlabru_model2$summary.fixed[, 1]
)
```

`## Error in eval(expr, envir, enclos): object 'MCMC_model2' not found`

`regr_est`

`## Error in eval(expr, envir, enclos): object 'regr_est' not found`

The modelling performance and results of the MCMC-BYM, INLA-BYM and INLA-SPDE approaches can be affected by the choice of the spatial model, the inference method and the data characteristics. MCMC-BYM and INLA-BYM use different inference methods, with the latter being faster. In the MCMC-BYM approach, spatial correlation structure is captured by a $n\times n$ adjacency matrix. Specifically, if two locations are adjacent in geography, they are spatially correlated, leading to data smoothing between them. However, the INLA-SPDE approach uses the Matern model, where spatial correlation depends on the distance between two locations, rather than their geographic adjacency. Thus, even if two locations are not geographically adjacent, they can still exhibit correlation.

*Annals of the Institute of Statistical Mathematics*43: 1–20.

*Journal of Statistical Software*55 (13): 1–24.