### Setting up the R environment

If you do not have `INLA`

or `inlabru`

installed please check our installation
instructions before continuing. First we load INLA and then download
the data for this tutorial from our data repository using
`retrieve_tutorial_data`

.

`library(INLA)`

`## Loading required package: Matrix`

`## Loading required package: sp`

```
## This is INLA_24.06.27 built 2024-06-27 02:36:04 UTC.
## - See www.r-inla.org/contact-us for how to get help.
## - List available models/likelihoods/etc with inla.list.models()
## - Use inla.doc(<NAME>) to access documentation
```

`fdmr::retrieve_tutorial_data(dataset = "hydro")`

```
##
## Tutorial data extracted to /home/runner/fdmr/tutorial_data/hydro
```

### Kvilldal dam area

First we will look at the location of the dam by making a map and plotting the stream gauges on it.

```
norway_polygon_path <- fdmr::get_tutorial_datapath(dataset = "hydro", filename = "Kvilldal_Catch_Boundary.geojson")
norway_polygon <- sf::read_sf(norway_polygon_path) %>% sf::st_zm()
sfc <- sf::st_transform(norway_polygon, crs = "+proj=longlat +datum=WGS84")
```

We want to mark the position of the dam and the stream gauges on the
map so we’ll create a `data.frame`

to pass into the
`plot_map`

function with the names and positions. We’ll
create a list for each of the points and then use the
(`dplyr::bind_rows`

)[https://dplyr.tidyverse.org/reference/bind.html]
function to create a `data.frame`

from these.

```
suldalsvatnet_dam <- list(longitude = 6.517174, latitude = 59.490720, label = "Suldalsvatnet Dam")
stream_gauge_13 <- list(longitude = 6.5395789, latitude = 59.5815849, label = "Stream Gauge 13")
stream_gauge_14 <- list(longitude = 6.7897968, latitude = 59.7531662, label = "Stream Gauge 14")
points <- list(suldalsvatnet_dam, stream_gauge_13, stream_gauge_14)
markers <- dplyr::bind_rows(points)
```

We’re now ready to plot the map, passing in the polygon data, markers and optionally the fill opacity.

`fdmr::plot_map(polygon_data = sfc, markers = markers, polygon_fill_opacity = 0.5)`

On the map you can see the dam, the resevoir, the catchment area, and the two stream gauges. The area inside the shape is where water accumulates, the area outside of the boundary has water that goes elsewhere. This will be important later when we are including ERA5-land precipitation data.

### Stream gauge data

The stream gauge data is measured as the average daily liters/second that pass through the area. This data goes back for many years, sometimes decades, so we’ll take a subset of the data for October 2021.

```
streamdata_13 <- fdmr::get_tutorial_datapath(dataset = "hydro", filename = "NVEobservations_s36_13.csv")
streamdata_14 <- fdmr::get_tutorial_datapath(dataset = "hydro", filename = "NVEobservations_s36_14.csv")
data_13 <- read.csv(streamdata_13)
data_13$date <- as.Date(data_13$time)
data_13 <-
subset(data_13, date >= "2021-10-01" & date <= "2021-10-31")
row.names(data_13) <- NULL
data_13$time_index <- seq(1, 31, 1)
data_14 <- read.csv(streamdata_14)
data_14$date <- as.Date(data_14$time)
data_14 <-
subset(data_14, date >= "2021-10-01" & date <= "2021-10-31")
row.names(data_14) <- NULL
data_14$time_index <- seq(1, 31, 1)
```

We can now plot the average stream flow using
`plot_timeseries`

`fdmr::plot_timeseries(data = data_13, x = "time", y = "value", x_label = "Time", y_label = "Stream Flow: Daily Average (Liter/s)", line_colour = "violet")`

`fdmr::plot_timeseries(data = data_14, x = "time", y = "value", x_label = "Time", y_label = "Stream Flow: Daily Average (Liter/s)", line_colour = "limegreen")`

We rescale the data with the Z-score since the values are so different between the two stream gauges

You can see, these two different stream gauges record a very different amount of stream flow. Therefore we should likely normalize this data using the zscore. The response variable in this study is the stream flow measurement. The amount of water that passes through the stream gauge is representative of the amount of water that will accumulate in the resevoir. Thus, in order to understand how this changes over time, it will be important to understand what physical processes drive water into the streams that feed the resevoir.

### ERA5-land data

ERA5-land data includes many variables. Here we use only the daily precipitation data. ERA5-land precipitation comes from reanalysis of the climate model. ERA5-land data is gridded however we need the data to be assigned to catchment shapes. We’ll first load in the data, ensure it is projected using the correct coordinate reference system (CRS) and then plot it on a map.

```
era5_location <- fdmr::get_tutorial_datapath(dataset = "hydro", filename = "era5_land_daily.nc")
era5_precip <- raster::stack(era5_location)
```

`## Loading required namespace: ncdf4`

```
sr <- "+proj=utm +zone=32"
projected_raster <- raster::projectRaster(era5_precip, crs = sr)
era5_precip_cropped <- raster::mask(projected_raster, norway_polygon)
era5_precip_cropped <- terra::crop(era5_precip_cropped, raster::extent(norway_polygon), snap = "near")
```

Next we create a `data.frame`

`raster_df`

containing precipitation data. We convert the `RasterBrick`

of precipitation data we just plotted into a `data.frame`

. We
then want to get only the day component from the date of each date. If
we have a look at the names of each layer in the
`RasterBrick`

you can see that each is prepended with an X,
this is due to `raster`

not allowing layer names that start
with numbers. First we need to remove the X, which we do with
`numbers_only`

. Then we convert the strings to
`Date`

objects using `to_dates`

before finally
using `lubridate::day`

to get just the day component of the
date. Finally, we set the names of the columns in the
`data.frame`

.

```
raster_df <- data.frame(raster::rasterToPoints(era5_precip_cropped))
raster_df <- data.table::melt(data.table::setDT(raster_df), id.vars = c("x", "y"), variable.name = "time")
raster_df$time <- lubridate::day(fdmr::to_dates(fdmr::numbers_only(raster_df$time)))
raster_df <- setNames(raster_df, c("x", "y", "time", "precip"))
```

We need to figure out which ERA5 precipitation pixels are closest to which stream gauge so we can match the data for the regression. First we need to calculate the distance between the pixel and each stream gauge, then use the minimum distance to choose the stream gauge. We first convert the lat/long coordinates to UTM and then match the pixel precipation data to the nearest stream gauge.

```
pixel_coords <- unique(sp::coordinates(era5_precip_cropped))
s13_utm <- fdmr::latlong_to_utm(
lat = stream_gauge_13$latitude,
lon = stream_gauge_13$longitude
)
s14_utm <- fdmr::latlong_to_utm(
lat = stream_gauge_14$latitude,
lon = stream_gauge_14$longitude
)
s13 <- data.frame(x = s13_utm[[1]], y = s13_utm[[2]])
s14 <- data.frame(x = s14_utm[[1]], y = s14_utm[[2]])
pixel_dist_gauge <- data.frame(
x = pixel_coords[, 1],
y = pixel_coords[, 2],
dist_s1 = rep(NA, nrow(pixel_coords)),
dist_s2 = rep(NA, nrow(pixel_coords)),
min_s = rep(NA, nrow(pixel_coords)),
streamflow = rep(NA, nrow(pixel_coords))
)
```

```
for (i in seq_len(nrow(pixel_coords))) {
pixel_dist_gauge$dist_s1[i] <- sqrt((pixel_dist_gauge$x[i] - s13$x)^2 + (pixel_dist_gauge$y[i] - s13$y)^2)
pixel_dist_gauge$dist_s2[i] <- sqrt((pixel_dist_gauge$x[i] - s14$x)^2 + (pixel_dist_gauge$y[i] - s14$y)^2)
pixel_dist_gauge$min_s[i] <- which.min(c(pixel_dist_gauge$dist_s1[i], pixel_dist_gauge$dist_s2[i]))
}
```

The `data.frame`

`pixel_dist_gauge`

then needs
to be replicated by the number of time points, in this case, layers in
the `RasterBrick`

.

```
n_time <- raster::nlayers(era5_precip)
pixel_dist_gauge <- do.call(rbind, replicate(n_time, pixel_dist_gauge, simplify = FALSE))
pixel_dist_gauge$time <- rep(1:n_time, each = nrow(pixel_coords))
streamdata <- list(data_13, data_14)
get_stream_data <- function(row) {
which_stream_data <- streamdata[[row["min_s"]]]
data_at_row_time_index <- which_stream_data[which_stream_data[, c("time_index")] == row["time"], ]
return(data_at_row_time_index$value)
}
pixel_dist_gauge$streamflow <- apply(pixel_dist_gauge, 1, get_stream_data)
```

Now we create a new `SpatialPointsDataFrame`

called
`inla_data`

for model fitting. The `data.frame`

contains the response and predictor data at each spatial location and
time point.

```
inla_data <- merge(pixel_dist_gauge, raster_df, by = c("time", "x", "y"))
inla_data <- subset(inla_data, select = c("x", "y", "streamflow", "precip", "time"))
sp::coordinates(inla_data) <- c("x", "y")
head(inla_data@data)
```

```
## streamflow precip time
## 1 -0.7379358 0.007305657 1
## 2 -0.7379358 0.006965242 1
## 3 -0.7379358 0.006590370 1
## 4 -0.7379358 0.006557765 1
## 5 -0.7379358 0.007222239 1
## 6 -0.7379358 0.006858168 1
```

`head(inla_data@coords)`

```
## x y
## 1 331417.8 6583370
## 2 342677.8 6594470
## 3 348307.8 6594470
## 4 353937.8 6594470
## 5 353937.8 6605570
## 6 359567.8 6594470
```

## Creating the BHM using 4D-Modeller

In order to model this with the 4D-Modeller we need to:

- Create a spacial mesh which the SPDE model can be evaluated over
- Build the SPDE model
- Define how the process evolves over time

### Mesh resolution

Create the triangulated mesh of the study region and have a quick look at it.

```
e <- era5_precip_cropped@extent
resolution <- raster::res(era5_precip_cropped)
crs <- "+proj=utm +zone=32"
xres <- resolution[1] * 2
yres <- resolution[2] * 2
xy <- sp::coordinates(era5_precip_cropped)
xy <- xy[seq(1, nrow(xy), by = 2), ]
colnames(xy) <- c("LONG", "LAT")
mesh <- fmesher::fm_mesh_2d_inla(loc = xy, max.edge = c(xres * 1, xres * 100), cutoff = 75, crs = crs)
fdmr::plot_mesh(mesh)
```

### Priors

The priors here are describing how the unobserved variannce is distributed over the region. That is, how far away from the center of the process should it cease to spatially correlate with the surrounding environment. To put it a different way, if it is raining on a hill top, how far from that hill must you be to not notice if it’s raining or not.

In this case, we set the prior range to be 20km.

```
# prior.range<-0.296
# the prior range is the distance that the process should stop effecting, so in this case it is currently 20km away from the node center
prior_range <- 20.0
spde <- INLA::inla.spde2.pcmatern(
mesh = mesh,
prior.range = c(prior_range, 0.5),
prior.sigma = c(1, 0.01)
)
```

the model has some time dependency.

In order to fit the model, we also need to define a temporal index and the number of discrete time points we want to model. The INLA model requires that time indicies must be an integer starting at 1.

### Model Inference

The model below predicts the streamflow given the precipitation as a fixed effect and an SPDE model which assumes there is some other correlation structure in the streamflow data due to unobserved variables. Likely unobserved co-variates could be local elevation, temperature, soil permeability, or presence of vegetation.

```
formula1 <- streamflow ~ 0 + Intercept(1) + precip
formula2 <- streamflow ~ 0 + Intercept(1) + precip +
f(
main = coordinates,
model = spde,
group = time,
ngroup = n_groups,
control.group = list(
model = "ar1",
hyper = rhoprior
)
)
```

```
inlabru_model1 <- inlabru::bru(formula1,
data = inla_data,
family = "gaussian",
options = list(
verbose = FALSE
)
)
inlabru_model2 <- inlabru::bru(formula2,
data = inla_data,
family = "gaussian",
options = list(
verbose = FALSE
)
)
```

Now it’s time to see its output, we can use the
`fdmr::model_viewer`

and pass in our model output, mesh and
measurement data. This Shiny app helps analyse model output by
automatically creating plots and a map of predictions for your data.
Make sure you have a CRS set on your mesh. If no CRS can be read a
default of `"+proj=longlat +datum=WGS84"`

will be used which
may lead to errors with data using UTM as we’ve done here.

`fdmr::model_viewer(model_output = inlabru_model2, mesh = mesh, measurement_data = inla_data, data_distribution = "Gaussian")`

We can also view the summary of the model outputs

`summary(inlabru_model1)`

```
## inlabru version: 2.11.1
## INLA version: 24.06.27
## Components:
## Intercept: main = linear(1), group = exchangeable(1L), replicate = iid(1L)
## precip: main = linear(precip), group = exchangeable(1L), replicate = iid(1L)
## Likelihoods:
## Family: 'gaussian'
## Data class: 'SpatialPointsDataFrame'
## Predictor: streamflow ~ .
## Time used:
## Pre = 0.393, Running = 0.191, Post = 0.421, Total = 1.01
## Fixed effects:
## mean sd 0.025quant 0.5quant 0.975quant mode kld
## Intercept -0.248 0.049 -0.343 -0.248 -0.152 -0.248 0
## precip 23.923 3.503 17.052 23.923 30.792 23.923 0
##
## Model hyperparameters:
## mean sd 0.025quant 0.5quant
## Precision for the Gaussian observations 1.09 0.052 0.99 1.09
## 0.975quant mode
## Precision for the Gaussian observations 1.20 1.09
##
## Deviance Information Criterion (DIC) ...............: 2394.72
## Deviance Information Criterion (DIC, saturated) ....: 873.36
## Effective number of parameters .....................: 2.98
##
## Watanabe-Akaike information criterion (WAIC) ...: 2394.28
## Effective number of parameters .................: 2.53
##
## Marginal log-Likelihood: -1215.68
## is computed
## Posterior summaries for the linear predictor and the fitted values are computed
## (Posterior marginals needs also 'control.compute=list(return.marginals.predictor=TRUE)')
```

`summary(inlabru_model2)`

```
## inlabru version: 2.11.1
## INLA version: 24.06.27
## Components:
## Intercept: main = linear(1), group = exchangeable(1L), replicate = iid(1L)
## precip: main = linear(precip), group = exchangeable(1L), replicate = iid(1L)
## f: main = spde(coordinates), group = ar1(time), replicate = iid(1L)
## Likelihoods:
## Family: 'gaussian'
## Data class: 'SpatialPointsDataFrame'
## Predictor: streamflow ~ .
## Time used:
## Pre = 0.368, Running = 43, Post = 2.4, Total = 45.8
## Fixed effects:
## mean sd 0.025quant 0.5quant 0.975quant mode kld
## Intercept -0.165 0.298 -0.753 -0.165 0.421 -0.165 0
## precip 14.653 6.155 2.593 14.643 26.767 14.643 0
##
## Random effects:
## Name Model
## f SPDE2 model
##
## Model hyperparameters:
## mean sd 0.025quant 0.5quant
## Precision for the Gaussian observations 3.01e+04 3366.005 2.47e+04 2.97e+04
## Range for f 6.69e+04 3758.489 5.99e+04 6.68e+04
## Stdev for f 8.21e-01 0.049 7.31e-01 8.19e-01
## GroupRho for f 7.28e-01 0.023 6.82e-01 7.28e-01
## 0.975quant mode
## Precision for the Gaussian observations 3.78e+04 2.85e+04
## Range for f 7.47e+04 6.65e+04
## Stdev for f 9.22e-01 8.14e-01
## GroupRho for f 7.71e-01 7.29e-01
##
## Deviance Information Criterion (DIC) ...............: -5638.18
## Deviance Information Criterion (DIC, saturated) ....: 1703.77
## Effective number of parameters .....................: 836.86
##
## Watanabe-Akaike information criterion (WAIC) ...: -5869.45
## Effective number of parameters .................: 438.54
##
## Marginal log-Likelihood: 63.78
## is computed
## Posterior summaries for the linear predictor and the fitted values are computed
## (Posterior marginals needs also 'control.compute=list(return.marginals.predictor=TRUE)')
```