Comparing model types - site replacement

Author

Murray Logan

Published

06/07/2025

1 Purpose

The purpose of this page is to contrast a few different approaches to modelling broad scale spatio-temporal that are suitable for both very large and small sample data sets in the context of the upcoming GCRMN report. More specifically, we have identified the following modelling challenges for which analyses of simulated data will hopefully provide model choice guidance:

Site replacements: when monitored sites are removed from a design and replaced with alternative sites. Of particular concern is the situation in which a poor site is discontinued (perhaps it becomes so degraded it no longer functions as a coral reef) and is replaced by a relatively good new site (with the argument that there is no value in establishing a new poor site that might similarly disappear in the near future). Unfortunately, this shifts the bias and has the potential to confound the timeseries.
Unsampled years: when the timeseries are punctuated with missing years. Ideally, it is important that the analyses are able to estimate trends across the full timeseries. Hence estimates are necessary, even when data are unavailable.
Use of covariates: whilst other covariates (such as sea surface temperature and cyclones could be used to inform trends at locations and times for which observed data are lacking, they also have the potential to induce new biases. For example, if models are “trained” with wave energy (or wind) data in which no major storms occurred and coral cover grew, it is possible that the model could “learn” that coral cover is positively correlated with wave energy. When predicting to a different area that did experience substantial stores (that might be expected to have a negative impact on coral cover), such a model might erroneously predict substantial increases in coral cover.

There are also additional complications of using covariates that are likely to represent acute impacts. Whilst, coral cover can decline rapidly in response to a disturbance, it typically takes time to recover (if at all) to its pre-disturbance state. As a result, the state of coral at any specific time is the result of an accumulation of conditions over the past and not just the most recent conditions. To account for this in a modelling perspective, is likely to require either:
- Inclusion of the previous years cover estimate in a model (which effectively means the model is parameterised on changes in cover). This is obviously infeasible for the first year of a monitoring program or a program without fixed sites and also becomes a proportionally bigger issue with shorter the time series.
- Inclusion of multiple lagged versions of each of the covariates that are likely to represent causal disturbances.

2 Preparations

[1] "/home/runner/work/gcrmn_model_dev/gcrmn_model_dev/docs"

Load the necessary R libraries

        library(tidyverse) # for data manipulation and visualisation
        library(sf) # for spatial data handling and visualisation
        library(glmmTMB) # for frequentist GLMM
        library(emmeans) # for post-hoc analysis
        library(DHARMa) # for model diagnostics
        library(patchwork)
        library(brms) # for Bayesian GLMM
        library(rstan) # for Bayesian modelling
        library(bayesplot) # for Bayesian plotting
        library(tidybayes) # for tidy bayesian analysis
        library(posterior) # for posterior analysis
        library(gbm) # for gradient boosted regression trees
        library(dbarts) # for Bayesian additive regression trees
        library(HDInterval) # for Bayesian credible intervals

Establish the global environment and paths

        assign(x = "data_path", value = "../data/", envir = .GlobalEnv)
        assign(x = "output_path", value = "../output/", envir = .GlobalEnv)
        paths <- list(
          data_path = data_path,
          synthetic_path = paste0(data_path, "synthetic/"),
          output_path = output_path,
          fig_path = paste0(output_path, "figures/")
        )
        lapply(paths, function(x) {
          if (!dir.exists(x)) {
            dir.create(x)
          }
        })

Load any helper functions

        source("model_functions.R")
        ## source("helper_functions.R")

import pandas as pd
import numpy as np
import xarray as xr
import pymc as pm
import arviz as az
import preliz as pz
import pymc_bart as pmb
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import os
import pickle

# Assign a variable to the global namespace
data_path = "../data/"
output_path = "../output/"
# Create a dictionary of paths
paths = {
    "data_path": data_path,
    "synthetic_path": f"{data_path}synthetic/",
    "fig_path": f"{output_path}figures/"
}
# Ensure the directories exist
for path in paths.values():
    if not os.path.exists(path):
        os.makedirs(path)

3 Import data

These data represent the synthesized “truth” and are thereby useful to compare to the various modelled outcomes.

      benthos_reefs_sf <- readRDS(file = paste0(
        data_path,
        "synthetic/benthos_reefs_sf.rds"
      ))

benthos_reefs_sf

Simple feature collection with 267132 features and 8 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 0.1086254 ymin: -20.96124 xmax: 9.508625 ymax: -9.981244
Geodetic CRS:  WGS 84
# A tibble: 267,132 × 9
    Year Reef              geometry   HCC     SC    MA    CYC   DHW OTHER
   <int> <chr>          <POINT [°]> <dbl>  <dbl> <dbl>  <dbl> <dbl> <dbl>
 1     1 Reef1 (9.398625 -19.95124) 0.310 0.0750 0.565 0      0.399 0.512
 2     2 Reef1 (9.398625 -19.95124) 0.270 0.0627 0.592 0.570  0.379 0.745
 3     3 Reef1 (9.398625 -19.95124) 0.279 0.0652 0.586 0      0.417 0.486
 4     4 Reef1 (9.398625 -19.95124) 0.291 0.0691 0.578 0.0180 0.340 0.615
 5     5 Reef1 (9.398625 -19.95124) 0.301 0.0722 0.571 0      0.419 0.429
 6     6 Reef1 (9.398625 -19.95124) 0.316 0.0769 0.561 0      0.318 0.728
 7     7 Reef1 (9.398625 -19.95124) 0.328 0.0811 0.553 0      0.355 0.648
 8     8 Reef1 (9.398625 -19.95124) 0.309 0.0748 0.565 0      0.637 0.691
 9     9 Reef1 (9.398625 -19.95124) 0.261 0.0599 0.599 0      0.961 0.575
10    10 Reef1 (9.398625 -19.95124) 0.246 0.0557 0.609 0      0.650 0.519
# ℹ 267,122 more rows

Show the code

      benthos_reefs_temporal_summary <- benthos_reefs_sf |>
        st_drop_geometry() |>
        group_by(Year) |>
        summarise(Mean = mean(HCC),
          Median = median(HCC),
          SD = sd(HCC),
          Lower = quantile(HCC, 0.025),
          Upper = quantile(HCC, 0.975))
      benthos_reefs_temporal_summary |> 
        saveRDS(
          file = paste0(
            data_path,
            "synthetic/benthos_reefs_temporal_summary.rds"
          )
        )

Show the code

      benthos_reefs_temporal_summary <- 
        readRDS(
          file = paste0(
            data_path,
            "synthetic/benthos_reefs_temporal_summary.rds"
          )
        )
      g <- benthos_reefs_temporal_summary |>
        ggplot() +
        geom_ribbon(aes(x = Year, ymin = Lower, ymax = Upper), alpha = 0.2) +
        geom_line(aes(x = Year, y = Mean, colour = "mean")) +
        geom_line(aes(x = Year, y = Median, colour = "median")) +
        theme_bw()
      ggsave(
        filename = paste0(
          fig_path, "R_all_temporal_summary_plot.png"
        ),
        g,
        width = 8, height = 6, dpi = 72
      )

These data represent a simulated sampling design in which 25 fixed (though randomly selected from among all the defined reefs in the synthetic spatial domain) reefs are monitored over a continuous 12 year period.

      benthos_fixed_locs_obs <- readRDS(file = paste0(
        data_path,
        "synthetic/benthos_fixed_locs_obs.rds"
      ))
      benthos_fixed_locs_obs <- benthos_fixed_locs_obs |>
        left_join(
          benthos_reefs_sf |>
            st_drop_geometry() |>
            dplyr::select(Year, Reef, CYC, DHW, OTHER) |>
            group_by(Year, Reef) |>
            summarise(across(c(CYC, DHW, OTHER), mean)),
          by = c("Year", "Reef")
        )

`summarise()` has grouped output by 'Year'. You can override using the
`.groups` argument.

benthos_fixed_locs_obs

# A tibble: 3,000 × 13
   Reef    Longitude Latitude Site  Transect  Year   HCC    SC    MA
   <chr>       <dbl>    <dbl> <chr> <chr>    <int> <dbl> <dbl> <dbl>
 1 Reef103      3.66    -19.5 S1    T1           1  18.6  3.99  54.9
 2 Reef103      3.66    -19.5 S1    T1           2  19.0  4.32  50.5
 3 Reef103      3.66    -19.5 S1    T1           3  19.0  4.18  48.7
 4 Reef103      3.66    -19.5 S1    T1           4  21.6  4.81  50.2
 5 Reef103      3.66    -19.5 S1    T1           5  25.4  5.20  49.0
 6 Reef103      3.66    -19.5 S1    T1           6  21.2  5.47  46.3
 7 Reef103      3.66    -19.5 S1    T1           7  22.2  5.26  45.6
 8 Reef103      3.66    -19.5 S1    T1           8  23.2  5.18  46.1
 9 Reef103      3.66    -19.5 S1    T1           9  19.7  4.43  47.2
10 Reef103      3.66    -19.5 S1    T1          10  17.4  4.09  54.4
# ℹ 2,990 more rows
# ℹ 4 more variables: Date <dttm>, CYC <dbl>, DHW <dbl>, OTHER <dbl>

Show the code

      benthos_fixed_locs_obs <- benthos_fixed_locs_obs |>
        mutate(
          fYear = as.factor(Year),
          Reef = as.factor(Reef),
          Site = interaction(Reef, Site),
          Transect = interaction(Site, Transect),
          cover = HCC/100
        )
      print(benthos_fixed_locs_obs)
      ## Raw summary means
      all_sampled_sum <-
        benthos_fixed_locs_obs |>
        group_by(fYear) |>
        summarise(
          mean_mean_response = mean(cover),
          median_median_response = median(cover),
          mean_sd = sd(cover),
          mean_lower_lower = mean_mean_response - 1.96 * mean_sd,
          mean_upper_upper = mean_mean_response + 1.96 * mean_sd,
          median_lower_lower = quantile(cover, 0.025),
          median_upper_upper = quantile(cover, 0.975)
        ) |>
        dplyr::select(-mean_sd) |>
        pivot_longer(
          cols = c(
            mean_mean_response, median_median_response, mean_lower_lower,
            mean_upper_upper, median_lower_lower, median_upper_upper
          ),
          names_to = c("type", "variable", "stat"),
          names_sep = "_",
          values_to = "values"
        ) |>
        mutate(type = paste("all_sampled_", type)) |>
        dplyr::select(-variable) |>
        pivot_wider(
          names_from = stat,
          values_from = values
        ) |>
        mutate(Year = as.numeric(as.character(fYear)))
      print(all_sampled_sum)

Show the code

      g <- all_sampled_sum |>
        ggplot(aes(x = Year, y = response, color = type, fill = type)) +
        geom_line() +
        geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.2, color = NA) +
        scale_y_continuous(
          name = "Hard Coral Cover (%)",
          limits = c(0, 1),
          labels = scales::percent_format(accuracy = 1)
        ) +
        theme_bw()
      ggsave(
        filename = paste0(
          fig_path, "R_full_simple_raw_means_plot.png"
        ),
        g,
        width = 8, height = 6, dpi = 72
      )

These data represent the fixed 25 sites, yet the site with the lowest coral cover is discontinued after 9 years and the highest performing reef is only introduced after 9 years. In this way, these data are intended to mimic the scenario in which a poor reef is replaced by a relatively good reef.

      benthos_fixed_locs_obs_1 <- readRDS(
        file = paste0(
          data_path,
          "synthetic/benthos_fixed_locs_obs_1.rds"
        )
      )
      benthos_fixed_locs_obs_1 <- benthos_fixed_locs_obs_1 |>
        left_join(
          benthos_reefs_sf |>
            st_drop_geometry() |>
            dplyr::select(Year, Reef, CYC, DHW, OTHER) |>
            group_by(Year, Reef) |>
            summarise(across(c(CYC, DHW, OTHER), mean)),
          by = c("Year", "Reef")
        )

`summarise()` has grouped output by 'Year'. You can override using the
`.groups` argument.

benthos_fixed_locs_obs_1

# A tibble: 2,880 × 13
   Reef    Longitude Latitude Site  Transect  Year   HCC    SC    MA
   <chr>       <dbl>    <dbl> <chr> <chr>    <int> <dbl> <dbl> <dbl>
 1 Reef103      3.66    -19.5 S1    T1           1  18.6  3.99  54.9
 2 Reef103      3.66    -19.5 S1    T1           2  19.0  4.32  50.5
 3 Reef103      3.66    -19.5 S1    T1           3  19.0  4.18  48.7
 4 Reef103      3.66    -19.5 S1    T1           4  21.6  4.81  50.2
 5 Reef103      3.66    -19.5 S1    T1           5  25.4  5.20  49.0
 6 Reef103      3.66    -19.5 S1    T1           6  21.2  5.47  46.3
 7 Reef103      3.66    -19.5 S1    T1           7  22.2  5.26  45.6
 8 Reef103      3.66    -19.5 S1    T1           8  23.2  5.18  46.1
 9 Reef103      3.66    -19.5 S1    T1           9  19.7  4.43  47.2
10 Reef103      3.66    -19.5 S1    T1          10  17.4  4.09  54.4
# ℹ 2,870 more rows
# ℹ 4 more variables: Date <dttm>, CYC <dbl>, DHW <dbl>, OTHER <dbl>

Show the code

      g <- benthos_fixed_locs_obs_2 |>
        ggplot() +
        geom_line(aes(
          y = HCC, x = Year, colour = Site,
          group = interaction(Site, Transect)
        )) +
        facet_wrap(~Reef) +
        theme_bw()
      ggsave(
        filename = paste0(
          fig_path, "R_sampled_reefs_2_plot.png"
        ),
        g,
        width = 8, height = 6, dpi = 72
      )

These data represent a more extreme version of the above reef replacement situation in that there are only a total of five reefs and thus the replacement represents a higher fraction of reefs.

      benthos_fixed_locs_obs_2 <- readRDS(
        file = paste0(
          data_path,
          "synthetic/benthos_fixed_locs_obs_2.rds"
        )
      )
      benthos_fixed_locs_obs_2 <- benthos_fixed_locs_obs_2|>
        left_join(
          benthos_reefs_sf |>
            st_drop_geometry() |>
            dplyr::select(Year, Reef, CYC, DHW, OTHER) |>
            group_by(Year, Reef) |>
            summarise(across(c(CYC, DHW, OTHER), mean)),
          by = c("Year", "Reef")
        )

`summarise()` has grouped output by 'Year'. You can override using the
`.groups` argument.

benthos_fixed_locs_obs_2

# A tibble: 480 × 13
   Reef    Longitude Latitude Site  Transect  Year   HCC    SC    MA
   <chr>       <dbl>    <dbl> <chr> <chr>    <int> <dbl> <dbl> <dbl>
 1 Reef170      1.34    -20.9 S1    T1           1  13.0  3.41  55.0
 2 Reef170      1.34    -20.9 S1    T1           2  13.8  3.53  55.0
 3 Reef170      1.34    -20.9 S1    T1           3  15.6  3.66  55.6
 4 Reef170      1.34    -20.9 S1    T1           4  16.8  4.02  54.3
 5 Reef170      1.34    -20.9 S1    T1           5  16.6  4.24  52.6
 6 Reef170      1.34    -20.9 S1    T1           6  18.3  4.69  58.3
 7 Reef170      1.34    -20.9 S1    T1           7  17.6  4.66  53.6
 8 Reef170      1.34    -20.9 S1    T1           8  15.3  4.32  54.6
 9 Reef170      1.34    -20.9 S1    T1           9  14.1  3.74  51.3
10 Reef170      1.34    -20.9 S1    T2           1  14.5  3.33  54.8
# ℹ 470 more rows
# ℹ 4 more variables: Date <dttm>, CYC <dbl>, DHW <dbl>, OTHER <dbl>

These data represent the synthesized “truth” and are thereby useful to compare to the various modelled outcomes.

benthos_reefs_sf = pd.read_csv(f"{paths['data_path']}synthetic/benthos_reefs_sf.csv")

pd.set_option("display.max_columns", None)  # Show all columns
benthos_reefs_sf

        Year     Reef                              geometry       HCC  \
0          1    Reef1  c(9.398625412978, -19.9512440470767)  0.309773   
1          2    Reef1  c(9.398625412978, -19.9512440470767)  0.270281   
2          3    Reef1  c(9.398625412978, -19.9512440470767)  0.278780   
3          4    Reef1  c(9.398625412978, -19.9512440470767)  0.291293   
4          5    Reef1  c(9.398625412978, -19.9512440470767)  0.301213   
...      ...      ...                                   ...       ...   
267127     8  Reef187  c(1.478625412978, -14.1412440470767)  0.194059   
267128     9  Reef187  c(1.478625412978, -14.1412440470767)  0.172821   
267129    10  Reef187  c(1.478625412978, -14.1412440470767)  0.157513   
267130    11  Reef187  c(1.478625412978, -14.1412440470767)  0.167410   
267131    12  Reef187  c(1.478625412978, -14.1412440470767)  0.178224   

              SC        MA       CYC       DHW     OTHER  
0       0.074957  0.565184  0.000000  0.398843  0.511869  
1       0.062682  0.592401  0.570183  0.378864  0.745148  
2       0.065237  0.586457  0.000000  0.417448  0.485992  
3       0.069083  0.577790  0.018019  0.339762  0.615030  
4       0.072207  0.570994  0.000000  0.419087  0.428713  
...          ...       ...       ...       ...       ...  
267127  0.022101  0.628042  0.000000  0.453441  0.828005  
267128  0.019233  0.646412  0.000000  0.766604  0.586145  
267129  0.017246  0.659733  0.000000  0.701376  0.603980  
267130  0.018523  0.651113  0.000000  0.321212  0.666380  
267131  0.019950  0.641727  0.000000  0.351262  0.487009  

[267132 rows x 9 columns]

pd.reset_option("display.max_columns")

Show the code

benthos_reefs_temporal_summary = (
    benthos_reefs_sf.groupby("Year").agg(
        Mean=("HCC", "mean"),
        Median=("HCC", "median"),
        SD=("HCC", "std"),
        Lower=("HCC", lambda x: x.quantile(0.025)),
        Upper=("HCC", lambda x: x.quantile(0.975))
    ).reset_index()
)

Show the code

# Create the plot
plt.figure(figsize=(10, 6))
# Add ribbon (shaded area)
plt.fill_between(
    benthos_reefs_temporal_summary["Year"],
    benthos_reefs_temporal_summary["Lower"],
    benthos_reefs_temporal_summary["Upper"],
    alpha=0.2,
    label="Confidence Interval"
)
# Add mean line
plt.plot(
    benthos_reefs_temporal_summary["Year"],
    benthos_reefs_temporal_summary["Mean"],
    label="Mean",
    color="blue"
)
# Add median line
plt.plot(
    benthos_reefs_temporal_summary["Year"],
    benthos_reefs_temporal_summary["Median"],
    label="Median",
    color="orange"
)
# Add labels and theme
plt.xlabel("Year")
plt.ylabel("Value")
plt.title("Temporal Summary")
plt.legend()
plt.grid(True)
# plt.show()
# Save the plot as a PNG file
plt.savefig(f"{paths['fig_path']}Python_all_temporal_summary_plot.png", dpi=300, bbox_inches="tight")

benthos_fixed_locs_obs = pd.read_csv(f"{paths['data_path']}synthetic/benthos_fixed_locs_obs.csv")
benthos_reefs_sf = pd.read_csv(f"{paths['data_path']}synthetic/benthos_reefs_sf.csv")
# Group and summarize the benthos_reefs_sf DataFrame
benthos_reefs_summary = (
    benthos_reefs_sf
    .drop(columns=["geometry"])  # Equivalent to st_drop_geometry()
    .groupby(["Year", "Reef"], as_index=False)
    .agg({
        "CYC": "mean",
        "DHW": "mean",
        "OTHER": "mean"
    })
)
# Perform the left join
benthos_fixed_locs_obs = benthos_fixed_locs_obs.merge(
    benthos_reefs_summary,
    on=["Year", "Reef"],
    how="left"
)

pd.set_option("display.max_columns", None)  # Show all columns
benthos_fixed_locs_obs

         Reef  Longitude   Latitude Site Transect  Year        HCC        SC  \
0     Reef103   3.658625 -19.531244   S1       T1     1  18.625436  3.985485   
1     Reef103   3.658625 -19.531244   S1       T1     2  18.974974  4.319049   
2     Reef103   3.658625 -19.531244   S1       T1     3  19.045657  4.176435   
3     Reef103   3.658625 -19.531244   S1       T1     4  21.643148  4.812177   
4     Reef103   3.658625 -19.531244   S1       T1     5  25.442402  5.198853   
...       ...        ...        ...  ...      ...   ...        ...       ...   
2995   Reef99   3.758625 -13.031244   S2       T5     8  22.853000  2.448697   
2996   Reef99   3.758625 -13.031244   S2       T5     9  21.445286  2.063128   
2997   Reef99   3.758625 -13.031244   S2       T5    10  21.156792  1.837525   
2998   Reef99   3.758625 -13.031244   S2       T5    11  18.795713  1.968396   
2999   Reef99   3.758625 -13.031244   S2       T5    12  21.955909  2.082333   

             MA                Date  CYC       DHW     OTHER  
0     54.913906   1-01-01T14:00:00Z  0.0  0.411396  0.345324  
1     50.489073   2-01-01T14:00:00Z  0.0  0.383427  0.500119  
2     48.694751   3-01-01T14:00:00Z  0.0  0.527733  0.561641  
3     50.156755   4-01-01T14:00:00Z  0.0  0.192189  0.416467  
4     48.970873   5-01-01T14:00:00Z  0.0  0.349549  0.587900  
...         ...                 ...  ...       ...       ...  
2995  55.811516   8-01-01T14:00:00Z  0.0  0.482534  0.349079  
2996  56.844522   9-01-01T14:00:00Z  0.0  0.864747  0.393180  
2997  59.206188  10-01-01T14:00:00Z  0.0  0.773226  0.368701  
2998  58.821929  11-01-01T14:00:00Z  0.0  0.371080  0.558885  
2999  57.269170  12-01-01T14:00:00Z  0.0  0.385996  0.491746  

[3000 rows x 13 columns]

pd.reset_option("display.max_columns")

Show the code

benthos_fixed_locs_obs = benthos_fixed_locs_obs.assign(
    fYear = benthos_fixed_locs_obs['Year'].astype('category'),
    Reef = benthos_fixed_locs_obs['Reef'].astype('category'),
    Site = benthos_fixed_locs_obs['Reef'].astype(str) + "_" +
    benthos_fixed_locs_obs['Site'].astype(str),
    Transect = benthos_fixed_locs_obs['Site'] + "_" +
    benthos_fixed_locs_obs['Transect'].astype(str),
    cover = benthos_fixed_locs_obs['HCC'] / 100
)
all_sampled_sum = (
    benthos_fixed_locs_obs.groupby("fYear").agg(
        mean_mean_response = ("cover", "mean"),
        median_median_response = ("cover", "median"),
        mean_sd = ("cover", "std"),
        median_lower_lower = ("cover", lambda x: x.quantile(0.025)),
        median_upper_upper = ("cover", lambda x: x.quantile(0.975))
    ).reset_index()
)
# Calculate additional columns
all_sampled_sum['mean_lower_lower'] = all_sampled_sum['mean_mean_response'] - 1.96 * all_sampled_sum['mean_sd']
all_sampled_sum['mean_upper_upper'] = all_sampled_sum['mean_mean_response'] + 1.96 * all_sampled_sum['mean_sd']
# Drop the 'mean_sd' column
all_sampled_sum = all_sampled_sum.drop(columns=['mean_sd'])
# Pivot longer
all_sampled_sum = all_sampled_sum.melt(
    id_vars=['fYear'],
    value_vars=[
        'mean_mean_response', 'median_median_response', 'mean_lower_lower',
        'mean_upper_upper', 'median_lower_lower', 'median_upper_upper'
    ],
    var_name='names',
    value_name='values'
)
# Split the 'names' column into 'type', 'variable', and 'stat'
all_sampled_sum[['type', 'variable', 'stat']] = all_sampled_sum['names'].str.split('_', expand=True)
# Modify the 'type' column
all_sampled_sum['type'] = 'all_sampled_' + all_sampled_sum['type']
# Drop the 'variable' column
all_sampled_sum = all_sampled_sum.drop(columns=['variable', 'names'])
# Pivot wider
all_sampled_sum = all_sampled_sum.pivot(
    index=['fYear', 'type'],
    columns='stat',
    values='values'
).reset_index()
# Convert 'fYear' to numeric
all_sampled_sum['Year'] = pd.to_numeric(all_sampled_sum['fYear'])
all_sampled_sum

Show the code

# Create the plot
plt.figure(figsize=(10, 6))
# Add ribbon (shaded area)
for _, group in all_sampled_sum.groupby('type'):
    plt.fill_between(
        group['Year'],
        group['lower'],
        group['upper'],
        alpha=0.2,
        label=_
    )
# Add mean line
for _, group in all_sampled_sum.groupby('type'):
    plt.plot(
        group["Year"],
        group["response"],
        label=_
    )
# Customize the y-axis
plt.ylim(0, 1)
plt.ylabel("Hard Coral Cover (%)")
plt.gca().yaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f"{x*100:.0f}%"))
# Add theme and legend
plt.grid(True, which='both', linestyle='--', linewidth=0.5)
plt.legend(title='Type')
plt.title("Hard Coral Cover Over Time")
plt.tight_layout()
# Show the plot
#plt.show()
# Save the plot as a PNG file
plt.savefig(f"{paths['fig_path']}Python_full_simple_raw_means_plot.png",
            dpi=72, bbox_inches="tight")

benthos_fixed_locs_obs_1 = pd.read_csv(f"{paths['data_path']}synthetic/benthos_fixed_locs_obs_1.csv")
benthos_reefs_sf = pd.read_csv(f"{paths['data_path']}synthetic/benthos_reefs_sf.csv")
# Group and summarize the benthos_reefs_sf DataFrame
benthos_reefs_summary = (
    benthos_reefs_sf
    .drop(columns=["geometry"])  # Equivalent to st_drop_geometry()
    .groupby(["Year", "Reef"], as_index=False)
    .agg({
        "CYC": "mean",
        "DHW": "mean",
        "OTHER": "mean"
    })
)
# Perform the left join
benthos_fixed_locs_obs_1 = benthos_fixed_locs_obs_1.merge(
    benthos_reefs_summary,
    on=["Year", "Reef"],
    how="left"
)

pd.set_option("display.max_columns", None)  # Show all columns
benthos_fixed_locs_obs_1

         Reef  Longitude   Latitude Site Transect  Year        HCC        SC  \
0     Reef103   3.658625 -19.531244   S1       T1     1  18.625436  3.985485   
1     Reef103   3.658625 -19.531244   S1       T1     2  18.974974  4.319049   
2     Reef103   3.658625 -19.531244   S1       T1     3  19.045657  4.176435   
3     Reef103   3.658625 -19.531244   S1       T1     4  21.643148  4.812177   
4     Reef103   3.658625 -19.531244   S1       T1     5  25.442402  5.198853   
...       ...        ...        ...  ...      ...   ...        ...       ...   
2875   Reef99   3.758625 -13.031244   S2       T5     8  22.853000  2.448697   
2876   Reef99   3.758625 -13.031244   S2       T5     9  21.445286  2.063128   
2877   Reef99   3.758625 -13.031244   S2       T5    10  21.156792  1.837525   
2878   Reef99   3.758625 -13.031244   S2       T5    11  18.795713  1.968396   
2879   Reef99   3.758625 -13.031244   S2       T5    12  21.955909  2.082333   

             MA                Date  CYC       DHW     OTHER  
0     54.913906   1-01-01T14:00:00Z  0.0  0.411396  0.345324  
1     50.489073   2-01-01T14:00:00Z  0.0  0.383427  0.500119  
2     48.694751   3-01-01T14:00:00Z  0.0  0.527733  0.561641  
3     50.156755   4-01-01T14:00:00Z  0.0  0.192189  0.416467  
4     48.970873   5-01-01T14:00:00Z  0.0  0.349549  0.587900  
...         ...                 ...  ...       ...       ...  
2875  55.811516   8-01-01T14:00:00Z  0.0  0.482534  0.349079  
2876  56.844522   9-01-01T14:00:00Z  0.0  0.864747  0.393180  
2877  59.206188  10-01-01T14:00:00Z  0.0  0.773226  0.368701  
2878  58.821929  11-01-01T14:00:00Z  0.0  0.371080  0.558885  
2879  57.269170  12-01-01T14:00:00Z  0.0  0.385996  0.491746  

[2880 rows x 13 columns]

pd.reset_option("display.max_columns")

Show the code

# Create the plot
g = sns.FacetGrid(
    data=benthos_fixed_locs_obs_1,
    col="Reef",
    col_wrap=5,  # Adjust the number of columns in the facet grid
    height=2,
    aspect=1.6,
    sharey=True,
    sharex=True
)

# Add line plots to each facet
g.map_dataframe(
    sns.lineplot,
    x="Year",
    y="HCC",
    hue="Site",
    style="Transect",
    estimator=None,
    units="Transect",
    legend=False
)

# Customize the plot
g.set_axis_labels("Year", "Hard Coral Cover (HCC)")
g.set_titles(col_template="{col_name}")
g.add_legend(title="Site")
g.fig.tight_layout()

# Show the plot
#plt.show()

# Save the plot as a PNG file
plt.savefig(f"{paths['fig_path']}Python_sampled_reefs_1_plot.png", dpi=72, bbox_inches="tight")

These data represent a more extreme version of the above reef replacement situation in that there are only a total of five reefs and thus the replacement represents a higher fraction of reefs.

benthos_fixed_locs_obs_2 = pd.read_csv(f"{paths['data_path']}synthetic/benthos_fixed_locs_obs_2.csv")
benthos_reefs_sf = pd.read_csv(f"{paths['data_path']}synthetic/benthos_reefs_sf.csv")
# Group and summarize the benthos_reefs_sf DataFrame
benthos_reefs_summary = (
    benthos_reefs_sf
    .drop(columns=["geometry"])  # Equivalent to st_drop_geometry()
    .groupby(["Year", "Reef"], as_index=False)
    .agg({
        "CYC": "mean",
        "DHW": "mean",
        "OTHER": "mean"
    })
)
# Perform the left join
benthos_fixed_locs_obs_2 = benthos_fixed_locs_obs_2.merge(
    benthos_reefs_summary,
    on=["Year", "Reef"],
    how="left"
)

pd.set_option("display.max_columns", None)  # Show all columns
benthos_fixed_locs_obs_2

        Reef  Longitude   Latitude Site Transect  Year        HCC        SC  \
0    Reef170   1.338625 -20.901244   S1       T1     1  13.027202  3.413521   
1    Reef170   1.338625 -20.901244   S1       T1     2  13.841143  3.526345   
2    Reef170   1.338625 -20.901244   S1       T1     3  15.559364  3.658558   
3    Reef170   1.338625 -20.901244   S1       T1     4  16.838991  4.017966   
4    Reef170   1.338625 -20.901244   S1       T1     5  16.595779  4.244544   
..       ...        ...        ...  ...      ...   ...        ...       ...   
475   Reef90   4.038625 -14.171244   S2       T5     8  24.022636  3.109348   
476   Reef90   4.038625 -14.171244   S2       T5     9  21.299862  2.709066   
477   Reef90   4.038625 -14.171244   S2       T5    10  22.066287  2.528670   
478   Reef90   4.038625 -14.171244   S2       T5    11  21.524609  2.581125   
479   Reef90   4.038625 -14.171244   S2       T5    12  21.649099  2.688958   

            MA                Date  CYC       DHW     OTHER  
0    55.028757   1-01-01T14:00:00Z  0.0  0.474864  0.349540  
1    55.015181   2-01-01T14:00:00Z  0.0  0.452288  0.386229  
2    55.572046   3-01-01T14:00:00Z  0.0  0.444938  0.490269  
3    54.276855   4-01-01T14:00:00Z  0.0  0.304818  0.595595  
4    52.587258   5-01-01T14:00:00Z  0.0  0.409084  0.492792  
..         ...                 ...  ...       ...       ...  
475  54.103665   8-01-01T14:00:00Z  0.0  0.525576  0.329536  
476  56.139564   9-01-01T14:00:00Z  0.0  0.785465  0.509434  
477  59.131197  10-01-01T14:00:00Z  0.0  0.650841  0.488150  
478  56.729758  11-01-01T14:00:00Z  0.0  0.400474  0.677549  
479  58.822425  12-01-01T14:00:00Z  0.0  0.407944  0.519343  

[480 rows x 13 columns]

pd.reset_option("display.max_columns")

Show the code

# Create the plot
g = sns.FacetGrid(
    data=benthos_fixed_locs_obs_2,
    col="Reef",
    col_wrap=3,  # Adjust the number of columns in the facet grid
    height=3,
    aspect=1.6,
    sharey=True,
    sharex=True
)

# Add line plots to each facet
g.map_dataframe(
    sns.lineplot,
    x="Year",
    y="HCC",
    hue="Site",
    style="Transect",
    estimator=None,
    units="Transect",
    legend=False
)

# Customize the plot
g.set_axis_labels("Year", "Hard Coral Cover (HCC)")
g.set_titles(col_template="{col_name}")
g.add_legend(title="Site")
g.fig.tight_layout()

# Show the plot
#plt.show()

# Save the plot as a PNG file
plt.savefig(f"{paths['fig_path']}Python_sampled_reefs_2_plot.png", dpi=72, bbox_inches="tight")

4 Data preparations

Perform the following data preparation steps:

specifically declare any categorical variables (as factors in R, categories in python)
ensure that all levels of hierarchical (varying) effects have unique IDs
express cover as a proportion out of 1 as is necessary for modelling against a beta distribution
create a prediction grid containing all years in the monitoring range

      benthos_fixed_locs_obs_0 <- benthos_fixed_locs_obs |>
        mutate(
          fYear = as.factor(Year),
          Reef = as.factor(Reef),
          Site = interaction(Reef, Site),
          Transect = interaction(Site, Transect),
          cover = HCC/100
        )

      newdata_0 <-
        benthos_fixed_locs_obs_0 |>
        tidyr::expand(fYear)

      benthos_fixed_locs_obs_1 <- benthos_fixed_locs_obs_1 |>
        mutate(
          fYear = as.factor(Year),
          Reef = as.factor(Reef),
          Site = interaction(Reef, Site),
          Transect = interaction(Site, Transect),
          cover = HCC/100
        )

      newdata_1 <-
        benthos_fixed_locs_obs_1 |>
        tidyr::expand(fYear)

      benthos_fixed_locs_obs_2 <- benthos_fixed_locs_obs_2 |>
        mutate(
          fYear = as.factor(Year),
          Reef = as.factor(Reef),
          Site = interaction(Reef, Site),
          Transect = interaction(Site, Transect),
          cover = HCC/100
        )

      newdata_2 <-
        benthos_fixed_locs_obs_2 |>
        tidyr::expand(fYear)

benthos_fixed_locs_obs_0 = (
    benthos_fixed_locs_obs.assign(
        fYear=lambda df: df["Year"].astype("category"),
        Reef=lambda df: df["Reef"].astype("category"),
        Site=lambda df: (df["Reef"].astype("str") + "_" +
                         df["Site"]).astype("category"),  # Interaction of Reef and Site
        Transect=lambda df: (df["Site"].astype("str") + "_" +
                             df["Transect"]).astype("category"),  # Interaction of Site and Transect
        cover=lambda df: df["HCC"] / 100  # Calculate cover
    )
)

newdata_0 = pd.MultiIndex.from_product(
    [benthos_fixed_locs_obs_0["fYear"].unique()],
    names=["fYear"]
).to_frame(index=False)

benthos_fixed_locs_obs_1 = (
    benthos_fixed_locs_obs_1.assign(
        fYear=lambda df: df["Year"].astype("category"),
        Reef=lambda df: df["Reef"].astype("category"),
        Site=lambda df: (df["Reef"].astype("str") + "_" +
                         df["Site"]).astype("category"),  # Interaction of Reef and Site
        Transect=lambda df: (df["Site"].astype("str") + "_" +
                             df["Transect"]).astype("category"),  # Interaction of Site and Transect
        cover=lambda df: df["HCC"] / 100  # Calculate cover
    )
)

newdata_1 = pd.MultiIndex.from_product(
    [benthos_fixed_locs_obs_1["fYear"].unique()],
    names=["fYear"]
).to_frame(index=False)

benthos_fixed_locs_obs_2 = (
    benthos_fixed_locs_obs_2.assign(
        fYear=lambda df: df["Year"].astype("category"),
        Reef=lambda df: df["Reef"].astype("category"),
        Site=lambda df: (df["Reef"].astype("str") + "_" +
                         df["Site"]).astype("category"),  # Interaction of Reef and Site
        Transect=lambda df: (df["Site"].astype("str") + "_" +
                             df["Transect"]).astype("category"),  # Interaction of Site and Transect
        cover=lambda df: df["HCC"] / 100  # Calculate cover
    )
)

newdata_2 = pd.MultiIndex.from_product(
    [benthos_fixed_locs_obs_2["fYear"].unique()],
    names=["fYear"]
).to_frame(index=False)

5 Modelled trends

The following table lists datasets that are used in the comparison figures along with their respective legend entries.

view table

Dataset	Description	Legend label(s)
`benthos_reefs_sf`	All reef-level data
`benthos_reefs_temporal_summary`	All global-level data	all mean / all median
`all_sampled_sum`	Simple stats of all sampled data (25 sites)	all_sampled_mean / all_sampled_median

`benthos_fixed_locs_obs_0`	All sampled data (25 sites)
`mod_glmmTMB_0`	glmmTMB model of all sampled data (25 sites)
`glmmTMB_0_sum`	Estimated marginal means from all sampled data (25 sites)	glmmTMB
`mod_brms_0`	brms model of all sampled data (25 sites)
`brms_0_sum`	Estimated marginal means from all sampled data (25 sites)	brms
`mod_stan_0`	stan model of all sampled data (25 sites)
`stan_0_sum`	Estimated marginal means from all sampled data (25 sites)	stan
`mod_gbm_0`	gbm model of all sampled data (25 sites)
`gbm_0_sum`	Estimated predictions from all sampled data (25 sites)	gbm
`mod_gbm_0b`	gbm model with covariates of all sampled data (25 sites)
`gbm_0b_sum`	Estimated predictions with covariates from all sampled data (25 sites)	gbm
`gbm_0c_sum`	Estimated predictions with covariates from all sampled data (all sites)	gbm
`mod_dbarts_0`	bart model with covariates of all sampled ata (25 sites)
`dbarts_0_sum`	Estimated predictions from all sampled data (25 sites)	dbarts
`mod_dbarts_0b`	bart model with covariates of all sampled ata (25 sites)
`dbarts_0b_sum`	Estimated predictions with covariates from all sampled data (25 sites)	dbarts
`dbarts_0c_sum`	Estimated predictions with covariates from all data (all sites)	dbarts

`benthos_fixed_locs_obs_1`	Replaced sampled data (25 sites)
`mod_glmmTMB_1`	glmmTMB model of replaced sampled data (25 sites)
`glmmTMB_1_sum`	Estimated marginal means from replaced sampled data (25 sites)	glmmTMB
`mod_brms_1`	brms model of replaced sampled data (25 sites)
`brms_1_sum`	Estimated marginal means from replaced sampled data (25 sites)	brms
`mod_stan_1`	stan model of replaced sampled data (25 sites)
`stan_1_sum`	Estimated marginal means from replaced sampled data (25 sites)	stan
`mod_gbm_1`	gbm model of replaced sampled data (25 sites)
`gbm_1_sum`	Estimated predictions from replaced sampled data (25 sites)	gbm
`mod_gbm_1b`	gbm model with covariates of replaced sampled data (25 sites)
`gbm_1b_sum`	Estimated predictions with covariates from replaced sampled data (25 sites)	gbm
`gbm_1c_sum`	Estimated predictions with covariates from replaced sampled data (replaced sites)	gbm
`mod_dbarts_1`	bart model with covariates of replaced sampled ata (25 sites)
`dbarts_1_sum`	Estimated predictions from replaced sampled data (25 sites)	dbarts
`mod_dbarts_1b`	bart model with covariates of replaced sampled ata (25 sites)
`dbarts_1b_sum`	Estimated predictions with covariates from replaced sampled data (25 sites)	dbarts
`dbarts_1c_sum`	Estimated predictions with covariates from replaced data (all sites)	dbarts

`benthos_fixed_locs_obs_2`	Replaced sampled data (5 sites)
`mod_glmmTMB_2`	glmmTMB model of replaced sampled data (5 sites)
`glmmTMB_2_sum`	Estimated marginal means from replaced sampled data (5 sites)	glmmTMB
`mod_brms_2`	brms model of replaced sampled data (5 sites)
`brms_2_sum`	Estimated marginal means from replaced sampled data (5 sites)	brms
`mod_stan_2`	stan model of replaced sampled data (5 sites)
`stan_2_sum`	Estimated marginal means from replaced sampled data (5 sites)	stan
`mod_gbm_2`	gbm model of replaced sampled data (5 sites)
`gbm_2_sum`	Estimated predictions from replaced sampled data (5 sites)	gbm
`mod_gbm_2b`	gbm model with covariates of replaced sampled data (5 sites)
`gbm_2b_sum`	Estimated predictions with covariates from replaced sampled data (5 sites)	gbm
`gbm_2c_sum`	Estimated predictions with covariates from replaced sampled data (replaced sites)	gbm
`mod_dbarts_2`	bart model with covariates of replaced sampled ata (5 sites)
`dbarts_2_sum`	Estimated predictions from replaced sampled data (5 sites)	dbarts
`mod_dbarts_2b`	bart model with covariates of replaced sampled ata (5 sites)
`dbarts_2b_sum`	Estimated predictions with covariates from replaced sampled data (5 sites)	dbarts
`dbarts_2c_sum`	Estimated predictions with covariates from replaced data (all sites)	dbarts

Model	Description
Simple	Simple hierarchical means/medians

Show the code

      mod_glmmTMB_0 <- glmmTMB(cover ~ fYear + (1 | Site) + (1 | Transect),
        data = benthos_fixed_locs_obs_0,
        family = "beta_family"
      )
      saveRDS(mod_glmmTMB_0,
        file = paste0(data_path, "synthetic/mod_glmmTMB_0.rds")
      )

Show the code

      glmmTMB_0_sum <- 
        mod_glmmTMB_0 |> emmeans(~fYear, at = newdata_0, type = "response") |>
        as.data.frame() |> 
        rename(median = response, lower = asymp.LCL, upper = asymp.UCL) |>
        mutate(Year = as.numeric(as.character(fYear))) |>
        mutate(type = "glmmTMB")
      saveRDS(glmmTMB_0_sum,
        file = paste0(data_path, "synthetic/glmmTMB_0_sum.rds")
      )

Show the code

      g <- glmmTMB_0_sum |>
        ggplot() +
        geom_ribbon(aes(x = Year, ymin = lower, ymax = upper), alpha = 0.2) +
        geom_line(aes(x = Year, y = median, color = "glmmTMB")) +
        geom_line(data = benthos_reefs_temporal_summary,
          aes(x = Year, y = Mean, colour = "all mean"), linetype = "dashed") +
        geom_line(data = benthos_reefs_temporal_summary,
          aes(x = Year, y = Median, colour = "all median"), linetype = "dashed") +
        geom_line(data = all_sampled_sum,
          aes(x = Year, y = response, colour = type), linetype = "dashed") +
        theme_bw()
      ggsave(
        filename = paste0(
          fig_path, "R_pdp_mod_glmmTMB_0.png"
        ),
        g,
        width = 8, height = 6, dpi = 72
      )

sampled reefs (blue and green dashed lines) are lower than the true global average (red and khaki)
glmmTMB modelled estimates are consistent with the 25 site same data, yet biased lower than the “truth”

Show the code

      glmmTMB_0_dharma <- mod_glmmTMB_0 |> 
        simulateResiduals(n = 1000)
      saveRDS(glmmTMB_0_dharma,
        file = paste0(data_path, "synthetic/glmmTMB_0_dharma.rds")
      )

Show the code

      glmmTMB_0_dharma <- readRDS(
        file = paste0(data_path, "synthetic/glmmTMB_0_dharma.rds")
      )
      g <- wrap_elements(~testUniformity(glmmTMB_0_dharma)) +
        wrap_elements(~plotResiduals(glmmTMB_0_dharma)) +
        wrap_elements(~testDispersion(glmmTMB_0_dharma))
      ggsave(
        filename = paste0(
          fig_path, "R_dharma_mod_glmmTMB_0.png"
        ),
        g,
        width = 10, height = 4, dpi = 72
      )

Show the code

mod_glmmTMB_0 <- readRDS(
  file = paste0(data_path, "synthetic/mod_glmmTMB_0.rds")
)
mod_glmmTMB_0 |> summary()

 Family: beta  ( logit )
Formula:          cover ~ fYear + (1 | Site) + (1 | Transect)
Data: benthos_fixed_locs_obs_0

      AIC       BIC    logLik -2*log(L)  df.resid 
 -13130.3  -13040.2    6580.2  -13160.3      2985 

Random effects:

Conditional model:
 Groups   Name        Variance Std.Dev.
 Site     (Intercept) 0.08491  0.2914  
 Transect (Intercept) 0.02920  0.1709  
Number of obs: 3000, groups:  Site, 50; Transect, 250

Dispersion parameter for beta family ():  320 

Conditional model:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -1.26554    0.04346 -29.122  < 2e-16 ***
fYear2       0.01799    0.01205   1.493   0.1353    
fYear3       0.04983    0.01199   4.154 3.26e-05 ***
fYear4       0.10552    0.01192   8.853  < 2e-16 ***
fYear5       0.14555    0.01186  12.275  < 2e-16 ***
fYear6       0.18332    0.01180  15.539  < 2e-16 ***
fYear7       0.21699    0.01176  18.457  < 2e-16 ***
fYear8       0.18269    0.01180  15.486  < 2e-16 ***
fYear9       0.03129    0.01201   2.605   0.0092 ** 
fYear10     -0.06817    0.01218  -5.596 2.19e-08 ***
fYear11     -0.02343    0.01211  -1.935   0.0530 .  
fYear12     -0.02145    0.01212  -1.770   0.0767 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Show the code

      benthos_fixed_locs_obs_0 |>
        group_by(fYear) |>
        summarise(mean_abundance = qlogis(mean(cover)),
          median_abundance = qlogis(median(cover)),
          sd_abundance = sd(qlogis(cover)) ,
          mad_abundance = mad(qlogis(cover)) 
        ) 
      priors <- c(
        prior(normal(0, 0.5), class = "b"),
        prior(normal(-1.5, 0.5), class = "Intercept"),
        prior(student_t(3, 0, 0.5), class = "sd")
      )
      mod_form <- bf(cover ~ fYear + (1 | Site) + (1 | Transect),
        family = "Beta"
      )

Show the code

      mod_brms_0 <- brm(mod_form,
        data = benthos_fixed_locs_obs_0,
        iter = 5000,
        warmup = 1000,
        chains = 3,
        cores = 3,
        prior = priors,
        thin =  5,
        control = list(adapt_delta = 0.99),
        backend = "cmdstanr"
      )
      saveRDS(mod_brms_0,
        file = paste0(data_path, "synthetic/mod_brms_0.rds")
      )

Show the code

      brms_0_sum <- 
        mod_brms_0 |> emmeans(~fYear, at = newdata_0, type = "response") |>
        as.data.frame() |> 
        rename(median = response, lower = lower.HPD, upper = upper.HPD) |>
        mutate(Year = as.numeric(as.character(fYear))) |>
        mutate(type = "brms")
      saveRDS(brms_0_sum,
        file = paste0(data_path, "synthetic/brms_0_sum.rds")
      )

Show the code

      brms_0_sum <- readRDS(
        file = paste0(data_path, "synthetic/brms_0_sum.rds")
      )
      g <- 
        brms_0_sum |>
        ggplot() +
        geom_ribbon(aes(x = Year, ymin = lower, ymax = upper), alpha = 0.2) +
        geom_line(aes(x = Year, y = median, color = "brms")) +
        geom_line(data = benthos_reefs_temporal_summary,
          aes(x = Year, y = Mean, colour = "all mean"), linetype = "dashed") +
        geom_line(data = benthos_reefs_temporal_summary,
          aes(x = Year, y = Median, colour = "all median"), linetype = "dashed") +
        geom_line(data = all_sampled_sum,
          aes(x = Year, y = response, colour = type), linetype = "dashed") +
        theme_bw()
      ggsave(
        filename = paste0(
          fig_path, "R_pdp_mod_brms_0.png"
        ),
        g,
        width = 8, height = 6, dpi = 72
      )

sampled reefs (blue and green dashed lines) are lower than the true global average (red and khaki)
brms modelled estimates are consistent with the 25 site same data, yet biased lower than the “truth”

Show the code

      mod_brms_0 <- readRDS(
        file = paste0(data_path, "synthetic/mod_brms_0.rds")
      )
      vars <- mod_brms_0 |>
        brms::variables() |>
        str_subset("^b.*")
      g <- mod_brms_0$fit |> stan_trace(pars = vars)
      ggsave(
        filename = paste0(
          fig_path, "R_brms_trace_0.png"
        ),
        g,
        width = 10, height = 8, dpi = 72
      )

Show the code

      mod_brms_0 <- readRDS(
        file = paste0(data_path, "synthetic/mod_brms_0.rds")
      )
      vars <- mod_brms_0 |>
        brms::variables() |>
        str_subset("^b.*")
      g <- mod_brms_0$fit |> stan_ac(pars = vars)
      ggsave(
        filename = paste0(
          fig_path, "R_brms_ac_0.png"
        ),
        g,
        width = 10, height = 8, dpi = 72
      )

Show the code

      mod_brms_0 <- readRDS(
        file = paste0(data_path, "synthetic/mod_brms_0.rds")
      )
      g <- mod_brms_0$fit |> stan_rhat()
      ggsave(
        filename = paste0(
          fig_path, "R_brms_rhat_0.png"
        ),
        g,
        width = 10, height = 8, dpi = 72
      )

Show the code

      mod_brms_0 <- readRDS(
        file = paste0(data_path, "synthetic/mod_brms_0.rds")
      )
      g <- mod_brms_0$fit |> stan_ess()
      ggsave(
        filename = paste0(
          fig_path, "R_brms_ess_0.png"
        ),
        g,
        width = 10, height = 8, dpi = 72
      )

Show the code

      mod_brms_0 <- readRDS(
        file = paste0(data_path, "synthetic/mod_brms_0.rds")
      )
      g <- mod_brms_0 |> pp_check( type='dens_overlay', ndraws=100)
      ggsave(
        filename = paste0(
          fig_path, "R_brms_ppc_0.png"
        ),
        g,
        width = 10, height = 8, dpi = 72
      )

Show the code

mod_brms_0 <- readRDS(
  file = paste0(data_path, "synthetic/mod_brms_0.rds")
)
mod_brms_0 |> summary()

 Family: beta 
  Links: mu = logit; phi = identity 
Formula: cover ~ fYear + (1 | Site) + (1 | Transect) 
   Data: benthos_fixed_locs_obs_0 (Number of observations: 3000) 
  Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
         total post-warmup draws = 2400

Multilevel Hyperparameters:
~Site (Number of levels: 50) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.30      0.03     0.24     0.37 1.00     2084     2289

~Transect (Number of levels: 250) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.17      0.01     0.15     0.19 1.00     2193     2210

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    -1.27      0.04    -1.35    -1.18 1.00     1643     2218
fYear2        0.02      0.01    -0.01     0.04 1.00     2004     2227
fYear3        0.05      0.01     0.03     0.07 1.00     2036     2327
fYear4        0.10      0.01     0.08     0.13 1.00     2206     2315
fYear5        0.15      0.01     0.12     0.17 1.00     2127     2197
fYear6        0.18      0.01     0.16     0.21 1.00     2213     2212
fYear7        0.22      0.01     0.19     0.24 1.00     2159     2181
fYear8        0.18      0.01     0.16     0.21 1.00     2177     2452
fYear9        0.03      0.01     0.01     0.06 1.00     2189     2136
fYear10      -0.07      0.01    -0.09    -0.04 1.00     2052     2191
fYear11      -0.02      0.01    -0.05     0.00 1.00     2272     2281
fYear12      -0.02      0.01    -0.05     0.00 1.00     2336     2369

Further Distributional Parameters:
    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
phi   317.91      8.51   301.48   334.14 1.00     2363     2265

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Show the code

      benthos_fixed_locs_obs_0 <-
        benthos_fixed_locs_obs_0 |>
        mutate(
          cYear = fYear,
          grid_id = factor(Reef),
          cSite = factor(interaction(Reef, Site)),
          cReplicate = ifelse(is.na(Transect),
            interaction(Site, Year),
            interaction(cSite, Transect)),
          Cover = cover,
          area = 1,
          sum = 1
        )
      saveRDS(benthos_fixed_locs_obs_0,
        file = paste0(data_path, "synthetic/saveRDS(benthos_fixed_locs_obs_0_forstan.rds")
      ) 
      stan_data <- prepare_data_for_stan(benthos_fixed_locs_obs_0, yrs = NULL)
      model_stan <- cmdstanr::cmdstan_model(stan_file = "model1.stan")
      ## model_stan <- cmdstanr::cmdstan_model(stan_file = "mod1a.stan")
      ## model_stan <- cmdstanr::cmdstan_model(stan_file = "mod2a.stan")

Show the code

      mod_stan_0 <- model_stan$sample(
        data = stan_data,
        seed = 123,
        iter_sampling = 5000,
        iter_warmup = 1000,
        thin = 5,
        chains = 3,
        parallel_chains = 3,
        adapt_delta = 0.99,
        output_dir = paste0(data_path, "synthetic/"),
      )
      saveRDS(mod_stan_0,
        file = paste0(data_path, "synthetic/mod_stan_0.rds")
      )

Show the code

      stan_0_sum <-
        mod_stan_0$draws(variables = "cellmeans") |>
        posterior::as_draws_df() |>
        posterior::summarise_draws(
          median,
          HDInterval::hdi,
          ~ HDInterval::hdi(., credMass = c(0.9)),
          rhat,
          ess_bulk,
          ess_tail
        ) |>
        rename(lower_90 = V4, upper_90 = V5) |>
        bind_cols(newdata_0) |>
        mutate(Year = as.numeric(as.character(fYear)))
      saveRDS(stan_0_sum,
        file = paste0(data_path, "synthetic/stan_0_sum.rds")
      )

Show the code

      stan_0_sum <- readRDS(
        file = paste0(data_path, "synthetic/stan_0_sum.rds")
      )
      g <- stan_0_sum |>
        ggplot() +
        geom_ribbon(aes(x = Year, ymin = lower, ymax = upper), alpha = 0.2) +
        geom_line(aes(x = Year, y = median, color = "stan")) +
        geom_line(data = benthos_reefs_temporal_summary,
          aes(x = Year, y = Mean, colour = "all mean"), linetype = "dashed") +
        geom_line(data = benthos_reefs_temporal_summary,
          aes(x = Year, y = Median, colour = "all median"), linetype = "dashed") +
        geom_line(data = all_sampled_sum,
          aes(x = Year, y = response, colour = type), linetype = "dashed") +
        theme_bw()
      ggsave(
        filename = paste0(
          fig_path, "R_pdp_mod_stan_0.png"
        ),
        g,
        width = 8, height = 6, dpi = 72
      )

sampled reefs (blue and green dashed lines) are lower than the true global average (red and khaki)
stan modelled estimates are consistent with the 25 site same data, yet biased lower than the “truth”

Show the code

      mod_stan_0 <- readRDS(
        file = paste0(data_path, "synthetic/mod_stan_0.rds")
      )
      color_scheme_set("viridis")
      g <-
        mod_stan_0$draws(variables = c("beta", "phi", "sd_1", "sd_2", "sd_3")) |>
        mcmc_trace() +
        theme_minimal()
      ggsave(
        filename = paste0(
          fig_path, "R_stan_trace_0.png"
        ),
        g,
        width = 10, height = 8, dpi = 72
      )

Show the code

      mod_stan_0 <- readRDS(
        file = paste0(data_path, "synthetic/mod_stan_0.rds")
      )
      color_scheme_set("viridis")
      g <-
        mod_stan_0$draws(variables = c("beta", "phi", "sd_1", "sd_2", "sd_3")) |>
        mcmc_acf() +
        theme_minimal()
      ggsave(
        filename = paste0(
          fig_path, "R_stan_ac_0.png"
        ),
        g,
        width = 10, height = 8, dpi = 72
      )

Show the code

      mod_stan_0 <- readRDS(
        file = paste0(data_path, "synthetic/mod_stan_0.rds")
      )
      color_scheme_set("viridis")
      g <-
        mod_stan_0 |> bayesplot::rhat() |> 
        mcmc_rhat_hist() +
        theme_minimal()
      ggsave(
        filename = paste0(
          fig_path, "R_stan_rhat_0.png"
        ),
        g,
        width = 10, height = 8, dpi = 72
      )

Show the code

      mod_stan_0 <- readRDS(
        file = paste0(data_path, "synthetic/mod_stan_0.rds")
      )
      color_scheme_set("viridis")
      g <-
        mod_stan_0 |> bayesplot::neff_ratio() |> 
        mcmc_neff_hist() +
        theme_minimal()
      ggsave(
        filename = paste0(
          fig_path, "R_stan_ess_0.png"
        ),
        g,
        width = 10, height = 8, dpi = 72
      )

Show the code

      mod_stan_0 <- readRDS(
        file = paste0(data_path, "synthetic/mod_stan_0.rds")
      )
      g <- 
        bayesplot::pp_check(
          benthos_fixed_locs_obs_0$cover,
          mod_stan_0$draws("ypred", format = "matrix")[1:100, ],
          ppc_dens_overlay
        ) +
        theme_classic()
      ggsave(
        filename = paste0(
          fig_path, "R_stan_ppc_0.png"
        ),
        g,
        width = 10, height = 8, dpi = 72
      )

Show the code

mod_stan_0 <- readRDS(
  file = paste0(data_path, "synthetic/mod_stan_0.rds")
)
mod_stan_0$print(
  variables = c("phi", "beta", "sd_1", "sd_2", "sd_3"),
  max_rows = 20
)

 variable   mean median   sd  mad     q5    q95 rhat ess_bulk ess_tail
 phi      317.92 317.93 8.69 8.74 303.53 331.90 1.00     2854     2685
 beta[1]   -1.26  -1.26 0.05 0.04  -1.33  -1.18 1.00     2148     2386
 beta[2]   -1.24  -1.24 0.05 0.05  -1.32  -1.17 1.00     2210     2406
 beta[3]   -1.21  -1.21 0.05 0.05  -1.28  -1.13 1.00     2205     2393
 beta[4]   -1.15  -1.15 0.05 0.05  -1.23  -1.08 1.00     2164     2251
 beta[5]   -1.11  -1.11 0.05 0.05  -1.19  -1.04 1.00     2186     2374
 beta[6]   -1.08  -1.08 0.05 0.05  -1.15  -1.00 1.00     2149     2254
 beta[7]   -1.04  -1.04 0.05 0.05  -1.12  -0.97 1.00     2164     2571
 beta[8]   -1.08  -1.08 0.05 0.05  -1.15  -1.00 1.00     2168     2299
 beta[9]   -1.23  -1.23 0.05 0.05  -1.30  -1.15 1.00     2170     2365
 beta[10]  -1.33  -1.32 0.05 0.05  -1.40  -1.25 1.00     2174     2289
 beta[11]  -1.28  -1.28 0.05 0.05  -1.36  -1.21 1.00     2168     2391
 beta[12]  -1.28  -1.28 0.05 0.05  -1.35  -1.20 1.00     2206     2487
 sd_1[1]    0.30   0.29 0.05 0.05   0.22   0.39 1.00     2892     2909
 sd_2[1]    0.09   0.09 0.03 0.02   0.05   0.14 1.00     1525     1450
 sd_3[1]    0.17   0.17 0.01 0.01   0.16   0.19 1.00     2469     2860

Show the code

mod_stan_0$print(
  variables = "cellmeans",
  max_rows = 20
)

      variable mean median   sd  mad   q5  q95 rhat ess_bulk ess_tail
 cellmeans[1]  0.22   0.22 0.01 0.01 0.21 0.23 1.00     2148     2386
 cellmeans[2]  0.22   0.22 0.01 0.01 0.21 0.24 1.00     2210     2393
 cellmeans[3]  0.23   0.23 0.01 0.01 0.22 0.24 1.00     2205     2393
 cellmeans[4]  0.24   0.24 0.01 0.01 0.23 0.25 1.00     2164     2251
 cellmeans[5]  0.25   0.25 0.01 0.01 0.23 0.26 1.00     2186     2374
 cellmeans[6]  0.25   0.25 0.01 0.01 0.24 0.27 1.00     2149     2254
 cellmeans[7]  0.26   0.26 0.01 0.01 0.25 0.28 1.00     2164     2571
 cellmeans[8]  0.25   0.25 0.01 0.01 0.24 0.27 1.00     2168     2299
 cellmeans[9]  0.23   0.23 0.01 0.01 0.21 0.24 1.00     2170     2365
 cellmeans[10] 0.21   0.21 0.01 0.01 0.20 0.22 1.00     2174     2289
 cellmeans[11] 0.22   0.22 0.01 0.01 0.20 0.23 1.00     2168     2391
 cellmeans[12] 0.22   0.22 0.01 0.01 0.21 0.23 1.00     2207     2487

Show the code

      mod_gbm_0 <- gbm(cover ~ fYear,
        data =  benthos_fixed_locs_obs_0,
        distribution = "gaussian",
        n.trees = 10000,
        interaction.depth = 5,
        shrinkage = 0.001,
        bag.fraction = 0.5,
        cv.folds = 5,
        verbose = TRUE
      )
      saveRDS(mod_gbm_0,
        file = paste0(data_path, "synthetic/mod_gbm_0.rds")
      )

Show the code

      n.trees <- gbm.perf(mod_gbm_0, method = "cv")

Partial dependency plot

Show the code

      gbm_0_sum <- readRDS(
        file = paste0(data_path, "synthetic/gbm_0_sum.rds")
      )
      g <-
        gbm_0_sum |>
        ggplot() +
        geom_line(aes(x = Year, y = median, color = "gbm")) +
        geom_line(data = benthos_reefs_temporal_summary,
          aes(x = Year, y = Mean, colour = "all mean"), linetype = "dashed") +
        geom_line(data = benthos_reefs_temporal_summary,
          aes(x = Year, y = Median, colour = "all median"), linetype = "dashed") +
        geom_line(data = all_sampled_sum,
          aes(x = Year, y = response, colour = type), linetype = "dashed") +
        theme_bw()
      ggsave(
        filename = paste0(
          fig_path, "R_pdp_mod_gbm_0.png"
        ),
        g,
        width = 8, height = 6, dpi = 72
      )

sampled reefs (blue and green dashed lines) are lower than the true global average (red and khaki)
gbm modelled estimates are consistent with the 25 site same data, yet biased lower than the “truth”
gbm derived predictions lack uncertainty estimates

Show the code

      mod_gbm_0b <- gbm(cover ~ fYear + Latitude + Longitude + CYC + DHW + OTHER,
        data =  benthos_fixed_locs_obs_0,
        distribution = "gaussian",
        var.monotone = c(0, 0, 0, -1, -1, -1),
        n.trees = 10000,
        interaction.depth = 5,
        shrinkage = 0.001,
        bag.fraction = 0.5,
        cv.folds = 5,
        verbose = TRUE
      )
      saveRDS(mod_gbm_0b,
        file = paste0(data_path, "synthetic/mod_gbm_0b.rds")
      )

Show the code

      n.trees <- gbm.perf(mod_gbm_0b, method = "cv")

Show the code

      newdata_0b <- benthos_fixed_locs_obs_0
      gbm_0b_sum <- newdata_0b |>
        mutate(median = predict(mod_gbm_0b, newdata_0b, n.trees = n.trees, type = "response")) |>
        mutate(Year = as.numeric(as.character(fYear))) |>
        ## group_by(Year, Site, Transect) |>
        ## summarise(median = median(median)) |>
        ## group_by(Year, Site, .add = FALSE) |>
        ## summarise(median = median(median)) |>
        group_by(Year, .add = FALSE) |> 
        summarise(median = median(median)) |> 
        mutate(type = "gbm")
      saveRDS(gbm_0b_sum,
        file = paste0(data_path, "synthetic/gbm_0b_sum.rds")
      )

sampled reefs (blue and green dashed lines) are lower than the true global average (red and khaki)
gbm modelled estimates are consistent with the 25 site same data, yet biased lower than the “truth”
gbm derived predictions lack uncertainty estimates
relative influences mainly space/time

Show the code

      dbarts_0c_sum <- preds_0c |> 
        t() |>
        as_tibble(.name_repair = ~ paste0("V", seq_along(.))) |>
        bind_cols(benthos_reefs_sf) |>
        pivot_longer(
          cols = matches("^V[0-9]*"),
          names_to = ".draw", values_to = "fit"
        ) |>
        group_by(Year, .draw) |>
        summarise(fit = mean(fit)) |>
        ## dplyr::select(Year, fit, .draw) |> 
        group_by(Year, .add = FALSE) |> 
        summarise_draws(median, HDInterval::hdi)
      saveRDS(dbarts_0c_sum,
        file = paste0(data_path, "synthetic/dbarts_0c_sum.rds")
      )

sampled reefs (blue and green dashed lines) are lower than the true global average (red and khaki)
gbm modelled estimates are consistent with the 25 site same data, yet biased lower than the “truth”
gbm derived predictions lack uncertainty estimates
relative influences mainly space/time

Show the code

      print(head(benthos_fixed_locs_obs_0))
      mod_dbarts_0 <- bart2(log(cover) ~ fYear,
        data =   benthos_fixed_locs_obs_0,
        keepTrees = TRUE
      )
      saveRDS(mod_dbarts_0,
        file = paste0(data_path, "synthetic/mod_dbarts_0.rds")
      )

Partial dependency plot

Show the code

      dbarts_0b_sum <- preds |> 
        t() |>
        as_tibble(.name_repair = ~ paste0("V", seq_along(.))) |>
        bind_cols(benthos_fixed_locs_obs_0) |>
        pivot_longer(
          cols = matches("^V[0-9]*"),
          names_to = ".draw", values_to = "fit"
        ) |>
        group_by(Year, .draw) |>
        summarise(fit = mean(fit)) |>
        ## dplyr::select(Year, fit, .draw) |> 
        group_by(Year, .add = FALSE) |> 
        summarise_draws(median, HDInterval::hdi)
      saveRDS(dbarts_0b_sum,
        file = paste0(data_path, "synthetic/dbarts_0b_sum.rds")
      )

sampled reefs (blue and green dashed lines) are lower than the true global average (red and khaki)
gbm modelled estimates are consistent with the 25 site same data, yet biased lower than the “truth”
gbm derived predictions lack uncertainty estimates

Show the code

      mod_dbarts_0b <- bart2(log(cover) ~ fYear + Latitude + Longitude + CYC + DHW + OTHER,
        data =   benthos_fixed_locs_obs_0,
        keepTrees = TRUE
      )
      saveRDS(mod_dbarts_0b,
        file = paste0(data_path, "synthetic/mod_dbarts_0b.rds")
      )

sampled reefs (blue and green dashed lines) are lower than the true global average (red and khaki)
dbarts modelled estimates are consistent with the 25 site same data, yet biased lower than the “truth”

sampled reefs (blue and green dashed lines) are lower than the true global average (red and khaki)
dbarts modelled estimates are slightly elevated above the 25 site same data, yet still biased lower than the “truth”

Show the code

      mod_glmmTMB_1 <- glmmTMB(cover ~ fYear + (1 | Site) + (1 | Transect),
        data = benthos_fixed_locs_obs_1,
        family = "beta_family"
      )
      saveRDS(mod_glmmTMB_1,
        file = paste0(data_path, "synthetic/mod_glmmTMB_1.rds")
      )

Show the code

      glmmTMB_1_sum <- 
        mod_glmmTMB_1 |> emmeans(~fYear, at = newdata_1, type = "response") |>
        as.data.frame() |> 
        rename(median = response, lower = asymp.LCL, upper = asymp.UCL) |>
        mutate(Year = as.numeric(as.character(fYear))) |>
        mutate(type = "glmmTMB")
      saveRDS(glmmTMB_1_sum,
        file = paste0(data_path, "synthetic/glmmTMB_1_sum.rds")
      )

Show the code

      g1 <- glmmTMB_1_sum |>
        ggplot() +
        geom_ribbon(aes(x = Year, ymin = lower, ymax = upper), alpha = 0.2) +
        geom_line(aes(x = Year, y = median, color = "glmmTMB")) +
        geom_line(data = mod_simple_1,
          aes(x = Year, y = Mean, colour = "simple data mean"), linetype = "dashed") +
        geom_line(data = mod_simple_1,
          aes(x = Year, y = Median, colour = "simple data median"), linetype = "dashed") +
        ## geom_line(data = all_sampled_sum,
        ##   aes(x = Year, y = response, colour = type), linetype = "dashed") +
        theme_bw()
      g2 <- glmmTMB_1_sum |>
        ggplot() +
        geom_ribbon(aes(x = Year, ymin = lower, ymax = upper), alpha = 0.2) +
        geom_line(aes(x = Year, y = median, color = "glmmTMB")) +
        geom_line(data = benthos_reefs_temporal_summary,
          aes(x = Year, y = Mean, colour = "true mean"), linetype = "dashed") +
        geom_line(data = benthos_reefs_temporal_summary,
          aes(x = Year, y = Median, colour = "true median"), linetype = "dashed") +
        geom_line(data = all_sampled_sum,
          aes(x = Year, y = response, colour = type), linetype = "dashed") +
        theme_bw()
      ggsave(
        filename = paste0(
          fig_path, "R_pdp_mod_glmmTMB_1.png"
        ),
        g1 + g2,
        width = 12, height = 6, dpi = 72
      )

left figure compares glmmTMB modelled estimates to the sample data
right figure compares glmmTMB modelled estimates to the simple hierarchical sampled means/medians and the true means/medians
sampled reefs (blue and khaki dashed lines) are lower than the true global average (blue and purple) - right side figure
glmmTMB modelled estimates are consistent with the 25 site, replacement data, yet biased lower than the “truth”

Show the code

      glmmTMB_1_dharma <- mod_glmmTMB_1 |> 
        simulateResiduals(n = 1000)
      saveRDS(glmmTMB_1_dharma,
        file = paste0(data_path, "synthetic/glmmTMB_1_dharma.rds")
      )

Show the code

      glmmTMB_1_dharma <- readRDS(
        file = paste0(data_path, "synthetic/glmmTMB_1_dharma.rds")
      )
      g <- wrap_elements(~testUniformity(glmmTMB_1_dharma)) +
        wrap_elements(~plotResiduals(glmmTMB_1_dharma)) +
        wrap_elements(~testDispersion(glmmTMB_1_dharma))
      ggsave(
        filename = paste0(
          fig_path, "R_dharma_mod_glmmTMB_1.png"
        ),
        g,
        width = 10, height = 4, dpi = 72
      )

Show the code

mod_glmmTMB_1 <- readRDS(
  file = paste0(data_path, "synthetic/mod_glmmTMB_1.rds")
)
mod_glmmTMB_1 |> summary()

 Family: beta  ( logit )
Formula:          cover ~ fYear + (1 | Site) + (1 | Transect)
Data: benthos_fixed_locs_obs_1

      AIC       BIC    logLik -2*log(L)  df.resid 
 -12668.6  -12579.1    6349.3  -12698.6      2865 

Random effects:

Conditional model:
 Groups   Name        Variance Std.Dev.
 Site     (Intercept) 0.08257  0.2874  
 Transect (Intercept) 0.02895  0.1701  
Number of obs: 2880, groups:  Site, 50; Transect, 250

Dispersion parameter for beta family ():  328 

Conditional model:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -1.25922    0.04293 -29.331  < 2e-16 ***
fYear2       0.01431    0.01219   1.174 0.240579    
fYear3       0.04112    0.01215   3.384 0.000714 ***
fYear4       0.09460    0.01208   7.834 4.73e-15 ***
fYear5       0.13212    0.01202  10.993  < 2e-16 ***
fYear6       0.16764    0.01196  14.017  < 2e-16 ***
fYear7       0.19975    0.01192  16.758  < 2e-16 ***
fYear8       0.16788    0.01196  14.039  < 2e-16 ***
fYear9       0.01501    0.01219   1.232 0.217997    
fYear10     -0.07794    0.01234  -6.318 2.64e-10 ***
fYear11     -0.03260    0.01226  -2.658 0.007852 ** 
fYear12     -0.03354    0.01228  -2.731 0.006320 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Show the code

      benthos_fixed_locs_obs_1 |>
        group_by(fYear) |>
        summarise(mean_abundance = qlogis(mean(cover)),
          median_abundance = qlogis(median(cover)),
          sd_abundance = sd(qlogis(cover)) ,
          mad_abundance = mad(qlogis(cover)) 
        ) 
      priors <- c(
        prior(normal(0, 0.5), class = "b"),
        prior(normal(-1.5, 0.5), class = "Intercept"),
        prior(student_t(3, 0, 0.5), class = "sd")
      )
      mod_form <- bf(cover ~ fYear + (1 | Site) + (1 | Transect),
        family = "Beta"
      )

Show the code

      mod_brms_1 <- brm(mod_form,
        data = benthos_fixed_locs_obs_1,
        iter = 5000,
        warmup = 1000,
        chains = 3,
        cores = 3,
        prior = priors,
        thin =  5,
        control = list(adapt_delta = 0.99),
        backend = "cmdstanr"
      )
      saveRDS(mod_brms_1,
        file = paste0(data_path, "synthetic/mod_brms_1.rds")
      )

Show the code

      brms_1_sum <- 
        mod_brms_1 |> emmeans(~fYear, at = newdata_1, type = "response") |>
        as.data.frame() |> 
        rename(median = response, lower = lower.HPD, upper = upper.HPD) |>
        mutate(Year = as.numeric(as.character(fYear))) |>
        mutate(type = "brms")
      saveRDS(brms_1_sum,
        file = paste0(data_path, "synthetic/brms_1_sum.rds")
      )

Show the code

      brms_1_sum <- readRDS(
        file = paste0(data_path, "synthetic/brms_1_sum.rds")
      )
      g1 <- 
        brms_1_sum |>
        ggplot() +
        geom_ribbon(aes(x = Year, ymin = lower, ymax = upper), alpha = 0.2) +
        geom_line(aes(x = Year, y = median, color = "brms")) +
        geom_line(data = mod_simple_1,
          aes(x = Year, y = Mean, colour = "simple data mean"), linetype = "dashed") +
        geom_line(data = mod_simple_1,
          aes(x = Year, y = Median, colour = "simple data median"), linetype = "dashed") +
        theme_bw()
      g2 <- 
        brms_1_sum |>
        ggplot() +
        geom_ribbon(aes(x = Year, ymin = lower, ymax = upper), alpha = 0.2) +
        geom_line(aes(x = Year, y = median, color = "brms")) +
        geom_line(data = benthos_reefs_temporal_summary,
          aes(x = Year, y = Mean, colour = "all mean"), linetype = "dashed") +
        geom_line(data = benthos_reefs_temporal_summary,
          aes(x = Year, y = Median, colour = "all median"), linetype = "dashed") +
        geom_line(data = all_sampled_sum,
          aes(x = Year, y = response, colour = type), linetype = "dashed") +
        theme_bw()
      ggsave(
        filename = paste0(
          fig_path, "R_pdp_mod_brms_1.png"
        ),
        g1 + g2,
        width = 12, height = 6, dpi = 72
      )

left figure compares brms modelled estimates to the sample data
right figure compares brms modelled estimates to the simple hierarchical sampled means/medians and the true means/medians
sampled reefs (blue and green dashed lines) are lower than the true global average (orange and khaki) - right side figure
brms modelled estimates are consistent with the 25 site, replacement data, yet biased lower than the “truth”

Show the code

      mod_brms_1 <- readRDS(
        file = paste0(data_path, "synthetic/mod_brms_1.rds")
      )
      vars <- mod_brms_1 |>
        brms::variables() |>
        str_subset("^b.*")
      g <- mod_brms_1$fit |> stan_trace(pars = vars)
      ggsave(
        filename = paste0(
          fig_path, "R_brms_trace_1.png"
        ),
        g,
        width = 10, height = 8, dpi = 72
      )

Show the code

      mod_brms_1 <- readRDS(
        file = paste0(data_path, "synthetic/mod_brms_1.rds")
      )
      vars <- mod_brms_1 |>
        brms::variables() |>
        str_subset("^b.*")
      g <- mod_brms_1$fit |> stan_ac(pars = vars)
      ggsave(
        filename = paste0(
          fig_path, "R_brms_ac_1.png"
        ),
        g,
        width = 10, height = 8, dpi = 72
      )

Show the code

      mod_brms_1 <- readRDS(
        file = paste0(data_path, "synthetic/mod_brms_1.rds")
      )
      g <- mod_brms_1$fit |> stan_rhat()
      ggsave(
        filename = paste0(
          fig_path, "R_brms_rhat_1.png"
        ),
        g,
        width = 10, height = 8, dpi = 72
      )

Show the code

      mod_brms_1 <- readRDS(
        file = paste0(data_path, "synthetic/mod_brms_1.rds")
      )
      g <- mod_brms_1$fit |> stan_ess()
      ggsave(
        filename = paste0(
          fig_path, "R_brms_ess_1.png"
        ),
        g,
        width = 10, height = 8, dpi = 72
      )

Show the code

      mod_brms_1 <- readRDS(
        file = paste0(data_path, "synthetic/mod_brms_1.rds")
      )
      g <- mod_brms_1 |> pp_check( type='dens_overlay', ndraws=100)
      ggsave(
        filename = paste0(
          fig_path, "R_brms_ppc_1.png"
        ),
        g,
        width = 10, height = 8, dpi = 72
      )

Show the code

mod_brms_1 <- readRDS(
  file = paste0(data_path, "synthetic/mod_brms_1.rds")
)
mod_brms_1 |> summary()

 Family: beta 
  Links: mu = logit; phi = identity 
Formula: cover ~ fYear + (1 | Site) + (1 | Transect) 
   Data: benthos_fixed_locs_obs_1 (Number of observations: 2880) 
  Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
         total post-warmup draws = 2400

Multilevel Hyperparameters:
~Site (Number of levels: 50) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.30      0.03     0.24     0.38 1.00     2200     2249

~Transect (Number of levels: 250) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.17      0.01     0.15     0.19 1.00     2226     2052

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    -1.26      0.04    -1.35    -1.17 1.00     2547     2162
fYear2        0.01      0.01    -0.01     0.04 1.00     2269     2234
fYear3        0.04      0.01     0.02     0.07 1.00     2351     2286
fYear4        0.09      0.01     0.07     0.12 1.00     2536     2363
fYear5        0.13      0.01     0.11     0.16 1.00     2438     2411
fYear6        0.17      0.01     0.14     0.19 1.00     2490     2057
fYear7        0.20      0.01     0.18     0.22 1.00     2513     2060
fYear8        0.17      0.01     0.14     0.19 1.00     2499     2288
fYear9        0.01      0.01    -0.01     0.04 1.00     2467     2180
fYear10      -0.08      0.01    -0.10    -0.05 1.00     2441     2345
fYear11      -0.03      0.01    -0.06    -0.01 1.00     2368     2409
fYear12      -0.03      0.01    -0.06    -0.01 1.00     2514     2325

Further Distributional Parameters:
    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
phi   325.42      8.69   308.35   342.07 1.00     1910     2056

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Show the code

      benthos_fixed_locs_obs_1 <-
        benthos_fixed_locs_obs_1 |>
        mutate(
          cYear = fYear,
          grid_id = factor(Reef),
          cSite = factor(interaction(Reef, Site)),
          cReplicate = ifelse(is.na(Transect),
            interaction(Site, Year),
            interaction(cSite, Transect)),
          Cover = cover,
          area = 1,
          sum = 1
        )
      saveRDS(benthos_fixed_locs_obs_1,
        file = paste0(data_path, "synthetic/saveRDS(benthos_fixed_locs_obs_1_forstan.rds")
      ) 
      stan_data <- prepare_data_for_stan(benthos_fixed_locs_obs_1, yrs = NULL)
      model_stan <- cmdstanr::cmdstan_model(stan_file = "model1.stan")
      ## model_stan <- cmdstanr::cmdstan_model(stan_file = "mod1a.stan")
      ## model_stan <- cmdstanr::cmdstan_model(stan_file = "mod2a.stan")

Show the code

      mod_stan_1 <- model_stan$sample(
        data = stan_data,
        seed = 123,
        iter_sampling = 5000,
        iter_warmup = 1000,
        thin = 5,
        chains = 3,
        parallel_chains = 3,
        adapt_delta = 0.99,
        output_dir = paste0(data_path, "synthetic/"),
      )
      saveRDS(mod_stan_1,
        file = paste0(data_path, "synthetic/mod_stan_1.rds")
      )

Show the code

      stan_1_sum <-
        mod_stan_1$draws(variables = "cellmeans") |>
        posterior::as_draws_df() |>
        posterior::summarise_draws(
          median,
          HDInterval::hdi,
          ~ HDInterval::hdi(., credMass = c(0.9)),
          rhat,
          ess_bulk,
          ess_tail
        ) |>
        rename(lower_90 = V4, upper_90 = V5) |>
        bind_cols(newdata_1) |>
        mutate(Year = as.numeric(as.character(fYear)))
      saveRDS(stan_1_sum,
        file = paste0(data_path, "synthetic/stan_1_sum.rds")
      )

Show the code

      stan_1_sum <- readRDS(
        file = paste0(data_path, "synthetic/stan_1_sum.rds")
      )
      g1 <- stan_1_sum |>
        ggplot() +
        geom_ribbon(aes(x = Year, ymin = lower, ymax = upper), alpha = 0.2) +
        geom_line(aes(x = Year, y = median, color = "stan")) +
        geom_line(data = mod_simple_1,
          aes(x = Year, y = Mean, colour = "simple data mean"), linetype = "dashed") +
        geom_line(data = mod_simple_1,
          aes(x = Year, y = Median, colour = "simple data median"), linetype = "dashed") +
        theme_bw()
      g2 <- stan_1_sum |>
        ggplot() +
        geom_ribbon(aes(x = Year, ymin = lower, ymax = upper), alpha = 0.2) +
        geom_line(aes(x = Year, y = median, color = "stan")) +
        geom_line(data = benthos_reefs_temporal_summary,
          aes(x = Year, y = Mean, colour = "all mean"), linetype = "dashed") +
        geom_line(data = benthos_reefs_temporal_summary,
          aes(x = Year, y = Median, colour = "all median"), linetype = "dashed") +
        geom_line(data = all_sampled_sum,
          aes(x = Year, y = response, colour = type), linetype = "dashed") +
        theme_bw()
      ggsave(
        filename = paste0(
          fig_path, "R_pdp_mod_stan_1.png"
        ),
        g1 + g2,
        width = 12, height = 6, dpi = 72
      )

left figure compares stan modelled estimates to the sample data
right figure compares stan modelled estimates to the simple hierarchical sampled means/medians and the true means/medians
sampled reefs (blue and green dashed lines) are lower than the true global average (orange and khaki) - right side figure
stan modelled estimates are consistent with the 25 site, replacement data, yet biased lower than the “truth”

Show the code

      mod_stan_1 <- readRDS(
        file = paste0(data_path, "synthetic/mod_stan_1.rds")
      )
      color_scheme_set("viridis")
      g <-
        mod_stan_1$draws(variables = c("beta", "phi", "sd_1", "sd_2", "sd_3")) |>
        mcmc_trace() +
        theme_minimal()
      ggsave(
        filename = paste0(
          fig_path, "R_stan_trace_1.png"
        ),
        g,
        width = 10, height = 8, dpi = 72
      )

Show the code

      mod_stan_1 <- readRDS(
        file = paste0(data_path, "synthetic/mod_stan_1.rds")
      )
      color_scheme_set("viridis")
      g <-
        mod_stan_1$draws(variables = c("beta", "phi", "sd_1", "sd_2", "sd_3")) |>
        mcmc_acf() +
        theme_minimal()
      ggsave(
        filename = paste0(
          fig_path, "R_stan_ac_1.png"
        ),
        g,
        width = 10, height = 8, dpi = 72
      )

Show the code

      mod_stan_1 <- readRDS(
        file = paste0(data_path, "synthetic/mod_stan_1.rds")
      )
      color_scheme_set("viridis")
      g <-
        mod_stan_1 |> bayesplot::rhat() |> 
        mcmc_rhat_hist() +
        theme_minimal()
      ggsave(
        filename = paste0(
          fig_path, "R_stan_rhat_1.png"
        ),
        g,
        width = 10, height = 8, dpi = 72
      )

Show the code

      mod_stan_1 <- readRDS(
        file = paste0(data_path, "synthetic/mod_stan_1.rds")
      )
      color_scheme_set("viridis")
      g <-
        mod_stan_1 |> bayesplot::neff_ratio() |> 
        mcmc_neff_hist() +
        theme_minimal()
      ggsave(
        filename = paste0(
          fig_path, "R_stan_ess_1.png"
        ),
        g,
        width = 10, height = 8, dpi = 72
      )

Show the code

      mod_stan_1 <- readRDS(
        file = paste0(data_path, "synthetic/mod_stan_1.rds")
      )
      g <- 
        bayesplot::pp_check(
          benthos_fixed_locs_obs_1$cover,
          mod_stan_1$draws("ypred", format = "matrix")[1:100, ],
          ppc_dens_overlay
        ) +
        theme_classic()
      ggsave(
        filename = paste0(
          fig_path, "R_stan_ppc_1.png"
        ),
        g,
        width = 10, height = 8, dpi = 72
      )

Show the code

mod_stan_1 <- readRDS(
  file = paste0(data_path, "synthetic/mod_stan_1.rds")
)
mod_stan_1$print(
  variables = c("phi", "beta", "sd_1", "sd_2", "sd_3"),
  max_rows = 20
)

 variable   mean median   sd  mad     q5    q95 rhat ess_bulk ess_tail
 phi      325.64 325.55 8.82 8.87 311.07 340.37 1.00     2928     2945
 beta[1]   -1.26  -1.26 0.04 0.04  -1.33  -1.19 1.00     1909     2397
 beta[2]   -1.25  -1.25 0.04 0.04  -1.32  -1.18 1.00     1876     2383
 beta[3]   -1.22  -1.22 0.04 0.04  -1.29  -1.15 1.00     1905     2370
 beta[4]   -1.17  -1.17 0.04 0.04  -1.24  -1.09 1.00     1932     2295
 beta[5]   -1.13  -1.13 0.04 0.04  -1.20  -1.06 1.00     1879     2198
 beta[6]   -1.10  -1.10 0.04 0.04  -1.17  -1.02 1.00     1899     2318
 beta[7]   -1.06  -1.06 0.04 0.04  -1.14  -0.99 1.00     1953     2448
 beta[8]   -1.10  -1.10 0.04 0.04  -1.17  -1.02 1.00     1920     2298
 beta[9]   -1.25  -1.25 0.04 0.04  -1.32  -1.17 1.00     1912     2453
 beta[10]  -1.34  -1.34 0.04 0.04  -1.41  -1.26 1.00     1880     2133
 beta[11]  -1.30  -1.30 0.04 0.04  -1.37  -1.22 1.00     1914     2238
 beta[12]  -1.30  -1.30 0.04 0.04  -1.37  -1.22 1.00     1835     2141
 sd_1[1]    0.29   0.29 0.05 0.05   0.22   0.38 1.00     2775     2808
 sd_2[1]    0.09   0.09 0.03 0.02   0.05   0.14 1.00     1503     1960
 sd_3[1]    0.17   0.17 0.01 0.01   0.16   0.19 1.00     2510     2905

Show the code

mod_stan_1$print(
  variables = "cellmeans",
  max_rows = 20
)

      variable mean median   sd  mad   q5  q95 rhat ess_bulk ess_tail
 cellmeans[1]  0.22   0.22 0.01 0.01 0.21 0.23 1.00     1909     2397
 cellmeans[2]  0.22   0.22 0.01 0.01 0.21 0.24 1.00     1876     2383
 cellmeans[3]  0.23   0.23 0.01 0.01 0.22 0.24 1.00     1905     2370
 cellmeans[4]  0.24   0.24 0.01 0.01 0.22 0.25 1.00     1932     2295
 cellmeans[5]  0.24   0.24 0.01 0.01 0.23 0.26 1.00     1879     2198
 cellmeans[6]  0.25   0.25 0.01 0.01 0.24 0.26 1.00     1899     2318
 cellmeans[7]  0.26   0.26 0.01 0.01 0.24 0.27 1.00     1953     2448
 cellmeans[8]  0.25   0.25 0.01 0.01 0.24 0.26 1.00     1920     2298
 cellmeans[9]  0.22   0.22 0.01 0.01 0.21 0.24 1.00     1912     2453
 cellmeans[10] 0.21   0.21 0.01 0.01 0.20 0.22 1.00     1880     2133
 cellmeans[11] 0.22   0.21 0.01 0.01 0.20 0.23 1.00     1914     2238
 cellmeans[12] 0.21   0.21 0.01 0.01 0.20 0.23 1.00     1835     2141

Show the code

      mod_gbm_1 <- gbm(cover ~ fYear,
        data =  benthos_fixed_locs_obs_1,
        distribution = "gaussian",
        n.trees = 10000,
        interaction.depth = 5,
        shrinkage = 0.001,
        bag.fraction = 0.5,
        cv.folds = 5,
        verbose = TRUE
      )
      saveRDS(mod_gbm_1,
        file = paste0(data_path, "synthetic/mod_gbm_1.rds")
      )

Show the code

      n.trees <- gbm.perf(mod_gbm_1, method = "cv")

Partial dependency plot

Show the code

      gbm_1_sum <- readRDS(
        file = paste0(data_path, "synthetic/gbm_1_sum.rds")
      )
      g1 <-
        gbm_1_sum |>
        ggplot() +
        geom_line(aes(x = Year, y = median, color = "gbm")) +
        geom_line(data = mod_simple_1,
          aes(x = Year, y = Mean, colour = "simple data mean"), linetype = "dashed") +
        geom_line(data = mod_simple_1,
          aes(x = Year, y = Median, colour = "simple data median"), linetype = "dashed") +
        theme_bw()
      g2 <-
        gbm_1_sum |>
        ggplot() +
        geom_line(aes(x = Year, y = median, color = "gbm")) +
        geom_line(data = benthos_reefs_temporal_summary,
          aes(x = Year, y = Mean, colour = "all mean"), linetype = "dashed") +
        geom_line(data = benthos_reefs_temporal_summary,
          aes(x = Year, y = Median, colour = "all median"), linetype = "dashed") +
        geom_line(data = all_sampled_sum,
          aes(x = Year, y = response, colour = type), linetype = "dashed") +
        theme_bw()
      ggsave(
        filename = paste0(
          fig_path, "R_pdp_mod_gbm_1.png"
        ),
        g1 + g2,
        width = 12, height = 6, dpi = 72
      )

left figure compares gbm modelled estimates to the sample data
right figure compares gbm modelled estimates to the simple hierarchical sampled means/medians and the true means/medians
sampled reefs (blue and green dashed lines) are lower than the true global average (orange and khaki) - right side figure
gbm modelled estimates are consistent with the 25 site, replacement data, yet biased lower than the “truth”
gbm derived predictions lack uncertainty estimates

Show the code

      mod_gbm_1b <- gbm(cover ~ fYear + Latitude + Longitude + CYC + DHW + OTHER,
        data =  benthos_fixed_locs_obs_1,
        distribution = "gaussian",
        var.monotone = c(0, 0, 0, -1, -1, -1),
        n.trees = 10000,
        interaction.depth = 5,
        shrinkage = 0.001,
        bag.fraction = 0.5,
        cv.folds = 5,
        verbose = TRUE
      )
      saveRDS(mod_gbm_1b,
        file = paste0(data_path, "synthetic/mod_gbm_1b.rds")
      )

Show the code

      n.trees <- gbm.perf(mod_gbm_1b, method = "cv")

Show the code

      newdata_1b <- benthos_fixed_locs_obs_1
      gbm_1b_sum <- newdata_1b |>
        mutate(median = predict(mod_gbm_1b, newdata_1b, n.trees = n.trees, type = "response")) |>
        mutate(Year = as.numeric(as.character(fYear))) |>
        ## group_by(Year, Site, Transect) |>
        ## summarise(median = median(median)) |>
        ## group_by(Year, Site, .add = FALSE) |>
        ## summarise(median = median(median)) |>
        group_by(Year, .add = FALSE) |> 
        summarise(median = median(median)) |> 
        mutate(type = "gbm")
      saveRDS(gbm_1b_sum,
        file = paste0(data_path, "synthetic/gbm_1b_sum.rds")
      )

left figure compares gbm modelled estimates to the sample data
right figure compares gbm modelled estimates to the simple hierarchical sampled means/medians and the true means/medians
sampled reefs (blue and green dashed lines) are lower than the true global average (orange and khaki) - right side figure
gbm modelled estimates are consistent with the 25 site, replacement data, yet biased lower than the “truth”
gbm derived predictions lack uncertainty estimates
relative influences mainly space/time

Show the code

      dbarts_1c_sum <- preds_1c |> 
        t() |>
        as_tibble(.name_repair = ~ paste0("V", seq_along(.))) |>
        bind_cols(benthos_reefs_sf) |>
        pivot_longer(
          cols = matches("^V[0-9]*"),
          names_to = ".draw", values_to = "fit"
        ) |>
        group_by(Year, .draw) |>
        summarise(fit = mean(fit)) |>
        ## dplyr::select(Year, fit, .draw) |> 
        group_by(Year, .add = FALSE) |> 
        summarise_draws(median, HDInterval::hdi)
      saveRDS(dbarts_1c_sum,
        file = paste0(data_path, "synthetic/dbarts_1c_sum.rds")
      )

left figure compares gbm modelled estimates to the sample data
right figure compares gbm modelled estimates to the simple hierarchical sampled means/medians and the true means/medians
sampled reefs (blue and green dashed lines) are lower than the true global average (orange and khaki) - right side figure
gbm modelled estimates are consistent with the 25 site, replacement data, yet biased lower than the “truth”
gbm derived predictions lack uncertainty estimates
relative influences mainly space/time (though note that this is because this partial is the result of the same model, just predicted onto the full data grid)

Show the code

      print(head(benthos_fixed_locs_obs_1))
      mod_dbarts_1 <- bart2(log(cover) ~ fYear,
        data =   benthos_fixed_locs_obs_1,
        keepTrees = TRUE
      )
      saveRDS(mod_dbarts_1,
        file = paste0(data_path, "synthetic/mod_dbarts_1.rds")
      )

Partial dependency plot

Show the code

      dbarts_1b_sum <- preds |> 
        t() |>
        as_tibble(.name_repair = ~ paste0("V", seq_along(.))) |>
        bind_cols(benthos_fixed_locs_obs_1) |>
        pivot_longer(
          cols = matches("^V[0-9]*"),
          names_to = ".draw", values_to = "fit"
        ) |>
        group_by(Year, .draw) |>
        summarise(fit = mean(fit)) |>
        ## dplyr::select(Year, fit, .draw) |> 
        group_by(Year, .add = FALSE) |> 
        summarise_draws(median, HDInterval::hdi)
      saveRDS(dbarts_1b_sum,
        file = paste0(data_path, "synthetic/dbarts_1b_sum.rds")
      )

left figure compares dbarts modelled estimates to the sample data
right figure compares dbarts modelled estimates to the simple hierarchical sampled means/medians and the true means/medians
sampled reefs (blue and green dashed lines) are lower than the true global average (orange and khaki) - right side figure
dbarts modelled estimates are slightly lower than the 25 site, replacement data, and thus biased lower than the “truth”

Show the code

      mod_dbarts_1b <- bart2(log(cover) ~ fYear + Latitude + Longitude + CYC + DHW + OTHER,
        data =   benthos_fixed_locs_obs_1,
        keepTrees = TRUE
      )
      saveRDS(mod_dbarts_1b,
        file = paste0(data_path, "synthetic/mod_dbarts_1b.rds")
      )

left figure compares dbarts modelled estimates to the sample data
right figure compares dbarts modelled estimates to the simple hierarchical sampled means/medians and the true means/medians
sampled reefs (blue and green dashed lines) are lower than the true global average (orange and khaki) - right side figure
dbarts modelled estimates are consistent with the 25 site, replacement data, yet biased lower than the “truth”
relative influences mainly space/time

left figure compares dbarts modelled estimates to the sample data
right figure compares dbarts modelled estimates to the simple hierarchical sampled means/medians and the true means/medians
sampled reefs (blue and green dashed lines) are lower than the true global average (orange and khaki) - right side figure
dbarts modelled estimates are slightly higher than the 25 site, replacement data, yet biased lower than the “truth”
relative influences mainly space/time (though note that this is because this partial is the result of the same model, just predicted onto the full data grid)

Show the code

      mod_glmmTMB_2 <- glmmTMB(cover ~ fYear + (1 | Site) + (1 | Transect),
        data = benthos_fixed_locs_obs_2,
        family = "beta_family"
      )
      saveRDS(mod_glmmTMB_2,
        file = paste0(data_path, "synthetic/mod_glmmTMB_2.rds")
      )

Show the code

      glmmTMB_2_sum <- 
        mod_glmmTMB_2 |> emmeans(~fYear, at = newdata_2, type = "response") |>
        as.data.frame() |> 
        rename(median = response, lower = asymp.LCL, upper = asymp.UCL) |>
        mutate(Year = as.numeric(as.character(fYear))) |>
        mutate(type = "glmmTMB")
      saveRDS(glmmTMB_2_sum,
        file = paste0(data_path, "synthetic/glmmTMB_2_sum.rds")
      )

Show the code

      g1 <- glmmTMB_2_sum |>
        ggplot() +
        geom_ribbon(aes(x = Year, ymin = lower, ymax = upper), alpha = 0.2) +
        geom_line(aes(x = Year, y = median, color = "glmmTMB")) +
        geom_line(data = mod_simple_2,
          aes(x = Year, y = Mean, colour = "simple data mean"), linetype = "dashed") +
        geom_line(data = mod_simple_2,
          aes(x = Year, y = Median, colour = "simple data median"), linetype = "dashed") +
        theme_bw()
      g2 <- glmmTMB_2_sum |>
        ggplot() +
        geom_ribbon(aes(x = Year, ymin = lower, ymax = upper), alpha = 0.2) +
        geom_line(aes(x = Year, y = median, color = "glmmTMB")) +
        geom_line(data = benthos_reefs_temporal_summary,
          aes(x = Year, y = Mean, colour = "all mean"), linetype = "dashed") +
        geom_line(data = benthos_reefs_temporal_summary,
          aes(x = Year, y = Median, colour = "all median"), linetype = "dashed") +
        geom_line(data = all_sampled_sum,
          aes(x = Year, y = response, colour = type), linetype = "dashed") +
        theme_bw()
      ggsave(
        filename = paste0(
          fig_path, "R_pdp_mod_glmmTMB_2.png"
        ),
        g1 + g2,
        width = 12, height = 6, dpi = 72
      )

left figure compares glmmTMB modelled estimates to the sample data
right figure compares glmmTMB modelled estimates to the simple hierarchical sampled means/medians and the true means/medians
sampled reefs (blue and green dashed lines) are lower than the true global average (orange and khaki) - right side figure
glmmTMB modelled estimates are slightly higher than the 5 site, replacement data, yet biased lower than the “truth”
the uncertainty is relatively high

Show the code

      glmmTMB_2_dharma <- mod_glmmTMB_2 |> 
        simulateResiduals(n = 1000)
      saveRDS(glmmTMB_2_dharma,
        file = paste0(data_path, "synthetic/glmmTMB_2_dharma.rds")
      )

Show the code

      glmmTMB_2_dharma <- readRDS(
        file = paste0(data_path, "synthetic/glmmTMB_2_dharma.rds")
      )
      g <- wrap_elements(~testUniformity(glmmTMB_2_dharma)) +
        wrap_elements(~plotResiduals(glmmTMB_2_dharma)) +
        wrap_elements(~testDispersion(glmmTMB_2_dharma))
      ggsave(
        filename = paste0(
          fig_path, "R_dharma_mod_glmmTMB_2.png"
        ),
        g,
        width = 10, height = 4, dpi = 72
      )

Show the code

mod_glmmTMB_2 <- readRDS(
  file = paste0(data_path, "synthetic/mod_glmmTMB_2.rds")
)
mod_glmmTMB_2 |> summary()

 Family: beta  ( logit )
Formula:          cover ~ fYear + (1 | Site) + (1 | Transect)
Data: benthos_fixed_locs_obs_2

      AIC       BIC    logLik -2*log(L)  df.resid 
  -2114.5   -2051.9    1072.2   -2144.5       465 

Random effects:

Conditional model:
 Groups   Name        Variance Std.Dev.
 Site     (Intercept) 0.11926  0.3453  
 Transect (Intercept) 0.02793  0.1671  
Number of obs: 480, groups:  Site, 10; Transect, 50

Dispersion parameter for beta family ():  378 

Conditional model:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -1.131216   0.113515  -9.965  < 2e-16 ***
fYear2      -0.010299   0.027511  -0.374 0.708140    
fYear3      -0.004781   0.027507  -0.174 0.862022    
fYear4       0.047843   0.027351   1.749 0.080248 .  
fYear5       0.091099   0.027192   3.350 0.000807 ***
fYear6       0.159378   0.026997   5.904 3.56e-09 ***
fYear7       0.131914   0.027094   4.869 1.12e-06 ***
fYear8       0.121663   0.027124   4.485 7.28e-06 ***
fYear9      -0.048782   0.027734  -1.759 0.078584 .  
fYear10     -0.118284   0.028027  -4.220 2.44e-05 ***
fYear11     -0.068882   0.027907  -2.468 0.013577 *  
fYear12     -0.101854   0.028192  -3.613 0.000303 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Show the code

      benthos_fixed_locs_obs_2 |>
        group_by(fYear) |>
        summarise(mean_abundance = qlogis(mean(cover)),
          median_abundance = qlogis(median(cover)),
          sd_abundance = sd(qlogis(cover)) ,
          mad_abundance = mad(qlogis(cover)) 
        ) 
      priors <- c(
        prior(normal(0, 0.5), class = "b"),
        prior(normal(-1.5, 0.5), class = "Intercept"),
        prior(student_t(3, 0, 0.5), class = "sd")
      )
      mod_form <- bf(cover ~ fYear + (1 | Site) + (1 | Transect),
        family = "Beta"
      )

Show the code

      mod_brms_2 <- brm(mod_form,
        data = benthos_fixed_locs_obs_2,
        iter = 5000,
        warmup = 1000,
        chains = 3,
        cores = 3,
        prior = priors,
        thin =  5,
        control = list(adapt_delta = 0.99),
        backend = "cmdstanr"
      )
      saveRDS(mod_brms_2,
        file = paste0(data_path, "synthetic/mod_brms_2.rds")
      )

Show the code

      brms_2_sum <- 
        mod_brms_2 |> emmeans(~fYear, at = newdata_2, type = "response") |>
        as.data.frame() |> 
        rename(median = response, lower = lower.HPD, upper = upper.HPD) |>
        mutate(Year = as.numeric(as.character(fYear))) |>
        mutate(type = "brms")
      saveRDS(brms_2_sum,
        file = paste0(data_path, "synthetic/brms_2_sum.rds")
      )

Show the code

      brms_2_sum <- readRDS(
        file = paste0(data_path, "synthetic/brms_2_sum.rds")
      )
      g1 <- 
        brms_2_sum |>
        ggplot() +
        geom_ribbon(aes(x = Year, ymin = lower, ymax = upper), alpha = 0.2) +
        geom_line(aes(x = Year, y = median, color = "brms")) +
        geom_line(data = mod_simple_2,
          aes(x = Year, y = Mean, colour = "simple data mean"), linetype = "dashed") +
        geom_line(data = mod_simple_2,
          aes(x = Year, y = Median, colour = "simple data median"), linetype = "dashed") +
        theme_bw()
      g2 <- 
        brms_2_sum |>
        ggplot() +
        geom_ribbon(aes(x = Year, ymin = lower, ymax = upper), alpha = 0.2) +
        geom_line(aes(x = Year, y = median, color = "brms")) +
        geom_line(data = benthos_reefs_temporal_summary,
          aes(x = Year, y = Mean, colour = "all mean"), linetype = "dashed") +
        geom_line(data = benthos_reefs_temporal_summary,
          aes(x = Year, y = Median, colour = "all median"), linetype = "dashed") +
        geom_line(data = all_sampled_sum,
          aes(x = Year, y = response, colour = type), linetype = "dashed") +
        theme_bw()
      ggsave(
        filename = paste0(
          fig_path, "R_pdp_mod_brms_2.png"
        ),
        g1 + g2,
        width = 12, height = 6, dpi = 72
      )

left figure compares brms modelled estimates to the sample data
right figure compares brms modelled estimates to the simple hierarchical sampled means/medians and the true means/medians
sampled reefs (blue and green dashed lines) are lower than the true global average (orange and khaki) - right side figure
brms modelled estimates are slightly higher than the 5 site, replacement data, yet biased lower than the “truth”
the uncertainty is relatively high

Show the code

      mod_brms_2 <- readRDS(
        file = paste0(data_path, "synthetic/mod_brms_2.rds")
      )
      vars <- mod_brms_2 |>
        brms::variables() |>
        str_subset("^b.*")
      g <- mod_brms_2$fit |> stan_trace(pars = vars)
      ggsave(
        filename = paste0(
          fig_path, "R_brms_trace_2.png"
        ),
        g,
        width = 10, height = 8, dpi = 72
      )

Show the code

      mod_brms_2 <- readRDS(
        file = paste0(data_path, "synthetic/mod_brms_2.rds")
      )
      vars <- mod_brms_2 |>
        brms::variables() |>
        str_subset("^b.*")
      g <- mod_brms_2$fit |> stan_ac(pars = vars)
      ggsave(
        filename = paste0(
          fig_path, "R_brms_ac_2.png"
        ),
        g,
        width = 10, height = 8, dpi = 72
      )

Show the code

      mod_brms_2 <- readRDS(
        file = paste0(data_path, "synthetic/mod_brms_2.rds")
      )
      g <- mod_brms_2$fit |> stan_rhat()
      ggsave(
        filename = paste0(
          fig_path, "R_brms_rhat_2.png"
        ),
        g,
        width = 10, height = 8, dpi = 72
      )

Show the code

      mod_brms_2 <- readRDS(
        file = paste0(data_path, "synthetic/mod_brms_2.rds")
      )
      g <- mod_brms_2$fit |> stan_ess()
      ggsave(
        filename = paste0(
          fig_path, "R_brms_ess_2.png"
        ),
        g,
        width = 10, height = 8, dpi = 72
      )

Show the code

      mod_brms_2 <- readRDS(
        file = paste0(data_path, "synthetic/mod_brms_2.rds")
      )
      g <- mod_brms_2 |> pp_check( type='dens_overlay', ndraws=100)
      ggsave(
        filename = paste0(
          fig_path, "R_brms_ppc_2.png"
        ),
        g,
        width = 10, height = 8, dpi = 72
      )

Show the code

mod_brms_2 <- readRDS(
  file = paste0(data_path, "synthetic/mod_brms_2.rds")
)
mod_brms_2 |> summary()

 Family: beta 
  Links: mu = logit; phi = identity 
Formula: cover ~ fYear + (1 | Site) + (1 | Transect) 
   Data: benthos_fixed_locs_obs_2 (Number of observations: 480) 
  Draws: 3 chains, each with iter = 5000; warmup = 1000; thin = 5;
         total post-warmup draws = 2400

Multilevel Hyperparameters:
~Site (Number of levels: 10) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.40      0.11     0.24     0.65 1.00     2192     2106

~Transect (Number of levels: 50) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.17      0.02     0.14     0.22 1.00     1984     2330

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    -1.15      0.13    -1.42    -0.90 1.00     2414     2273
fYear2       -0.01      0.03    -0.06     0.04 1.00     2338     2430
fYear3       -0.00      0.03    -0.06     0.05 1.00     2524     2270
fYear4        0.05      0.03    -0.00     0.10 1.00     2165     2134
fYear5        0.09      0.03     0.04     0.15 1.00     2360     2410
fYear6        0.16      0.03     0.11     0.21 1.00     2515     2306
fYear7        0.13      0.03     0.08     0.18 1.00     2454     2311
fYear8        0.12      0.03     0.07     0.18 1.00     2560     2266
fYear9       -0.05      0.03    -0.10     0.01 1.00     2075     2061
fYear10      -0.12      0.03    -0.18    -0.06 1.00     2571     2322
fYear11      -0.07      0.03    -0.12    -0.01 1.00     2375     2369
fYear12      -0.10      0.03    -0.16    -0.04 1.00     2267     2270

Further Distributional Parameters:
    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
phi   361.86     25.09   314.73   413.59 1.00     2414     2080

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Show the code

      benthos_fixed_locs_obs_2 <-
        benthos_fixed_locs_obs_2 |>
        mutate(
          cYear = fYear,
          grid_id = factor(Reef),
          cSite = factor(interaction(Reef, Site)),
          cReplicate = ifelse(is.na(Transect),
            interaction(Site, Year),
            interaction(cSite, Transect)),
          Cover = cover,
          area = 1,
          sum = 1
        )
      saveRDS(benthos_fixed_locs_obs_2,
        file = paste0(data_path, "synthetic/saveRDS(benthos_fixed_locs_obs_2_forstan.rds")
      ) 
      stan_data <- prepare_data_for_stan(benthos_fixed_locs_obs_2, yrs = NULL)
      model_stan <- cmdstanr::cmdstan_model(stan_file = "model1.stan")
      ## model_stan <- cmdstanr::cmdstan_model(stan_file = "mod1a.stan")
      ## model_stan <- cmdstanr::cmdstan_model(stan_file = "mod2a.stan")

Show the code

      mod_stan_2 <- model_stan$sample(
        data = stan_data,
        seed = 123,
        iter_sampling = 5000,
        iter_warmup = 1000,
        thin = 5,
        chains = 3,
        parallel_chains = 3,
        adapt_delta = 0.99,
        output_dir = paste0(data_path, "synthetic/"),
      )
      saveRDS(mod_stan_2,
        file = paste0(data_path, "synthetic/mod_stan_2.rds")
      )

Show the code

      stan_2_sum <-
        mod_stan_2$draws(variables = "cellmeans") |>
        posterior::as_draws_df() |>
        posterior::summarise_draws(
          median,
          HDInterval::hdi,
          ~ HDInterval::hdi(., credMass = c(0.9)),
          rhat,
          ess_bulk,
          ess_tail
        ) |>
        rename(lower_90 = V4, upper_90 = V5) |>
        bind_cols(newdata_2) |>
        mutate(Year = as.numeric(as.character(fYear)))
      saveRDS(stan_2_sum,
        file = paste0(data_path, "synthetic/stan_2_sum.rds")
      )

Show the code

      stan_2_sum <- readRDS(
        file = paste0(data_path, "synthetic/stan_2_sum.rds")
      )
      g1 <- stan_2_sum |>
        ggplot() +
        geom_ribbon(aes(x = Year, ymin = lower, ymax = upper), alpha = 0.2) +
        geom_line(aes(x = Year, y = median, color = "stan")) +
        geom_line(data = mod_simple_2,
          aes(x = Year, y = Mean, colour = "simple data mean"), linetype = "dashed") +
        geom_line(data = mod_simple_2,
          aes(x = Year, y = Median, colour = "simple data median"), linetype = "dashed") +
        theme_bw()
      g2 <- stan_2_sum |>
        ggplot() +
        geom_ribbon(aes(x = Year, ymin = lower, ymax = upper), alpha = 0.2) +
        geom_line(aes(x = Year, y = median, color = "stan")) +
        geom_line(data = benthos_reefs_temporal_summary,
          aes(x = Year, y = Mean, colour = "all mean"), linetype = "dashed") +
        geom_line(data = benthos_reefs_temporal_summary,
          aes(x = Year, y = Median, colour = "all median"), linetype = "dashed") +
        geom_line(data = all_sampled_sum,
          aes(x = Year, y = response, colour = type), linetype = "dashed") +
        theme_bw()
      ggsave(
        filename = paste0(
          fig_path, "R_pdp_mod_stan_2.png"
        ),
        g1 + g2,
        width = 12, height = 6, dpi = 72
      )

left figure compares brms modelled estimates to the sample data
right figure compares brms modelled estimates to the simple hierarchical sampled means/medians and the true means/medians
sampled reefs (blue and green dashed lines) are lower than the true global average (orange and khaki) - right side figure
brms modelled estimates are slightly higher than the 5 site, replacement data, yet biased lower than the “truth”
the uncertainty is relatively high, but not as high as brms or glmmTMB

Show the code

      mod_stan_2 <- readRDS(
        file = paste0(data_path, "synthetic/mod_stan_2.rds")
      )
      color_scheme_set("viridis")
      g <-
        mod_stan_2$draws(variables = c("beta", "phi", "sd_2", "sd_2", "sd_3")) |>
        mcmc_trace() +
        theme_minimal()
      ggsave(
        filename = paste0(
          fig_path, "R_stan_trace_2.png"
        ),
        g,
        width = 10, height = 8, dpi = 72
      )

Show the code

      mod_stan_2 <- readRDS(
        file = paste0(data_path, "synthetic/mod_stan_2.rds")
      )
      color_scheme_set("viridis")
      g <-
        mod_stan_2$draws(variables = c("beta", "phi", "sd_2", "sd_2", "sd_3")) |>
        mcmc_acf() +
        theme_minimal()
      ggsave(
        filename = paste0(
          fig_path, "R_stan_ac_2.png"
        ),
        g,
        width = 10, height = 8, dpi = 72
      )

Show the code

      mod_stan_2 <- readRDS(
        file = paste0(data_path, "synthetic/mod_stan_2.rds")
      )
      color_scheme_set("viridis")
      g <-
        mod_stan_2 |> bayesplot::rhat() |> 
        mcmc_rhat_hist() +
        theme_minimal()
      ggsave(
        filename = paste0(
          fig_path, "R_stan_rhat_2.png"
        ),
        g,
        width = 10, height = 8, dpi = 72
      )

Show the code

      mod_stan_2 <- readRDS(
        file = paste0(data_path, "synthetic/mod_stan_2.rds")
      )
      color_scheme_set("viridis")
      g <-
        mod_stan_2 |> bayesplot::neff_ratio() |> 
        mcmc_neff_hist() +
        theme_minimal()
      ggsave(
        filename = paste0(
          fig_path, "R_stan_ess_2.png"
        ),
        g,
        width = 10, height = 8, dpi = 72
      )

Show the code

      mod_stan_2 <- readRDS(
        file = paste0(data_path, "synthetic/mod_stan_2.rds")
      )
      g <- 
        bayesplot::pp_check(
          benthos_fixed_locs_obs_2$cover,
          mod_stan_2$draws("ypred", format = "matrix")[1:100, ],
          ppc_dens_overlay
        ) +
        theme_classic()
      ggsave(
        filename = paste0(
          fig_path, "R_stan_ppc_2.png"
        ),
        g,
        width = 10, height = 8, dpi = 72
      )

Show the code

mod_stan_2 <- readRDS(
  file = paste0(data_path, "synthetic/mod_stan_2.rds")
)
mod_stan_2$print(
  variables = c("phi", "beta", "sd_2", "sd_2", "sd_3"),
  max_rows = 20
)

 variable   mean median    sd   mad     q5    q95 rhat ess_bulk ess_tail
 phi      361.47 360.40 24.95 24.24 321.72 405.20 1.00     2875     2749
 beta[1]   -1.21  -1.21  0.06  0.06  -1.31  -1.10 1.00     1887     2322
 beta[2]   -1.21  -1.21  0.06  0.06  -1.32  -1.11 1.00     1906     2203
 beta[3]   -1.21  -1.21  0.06  0.06  -1.31  -1.10 1.00     1959     2377
 beta[4]   -1.16  -1.16  0.07  0.06  -1.26  -1.05 1.00     1839     2060
 beta[5]   -1.11  -1.12  0.07  0.06  -1.22  -1.00 1.00     1933     2278
 beta[6]   -1.05  -1.05  0.07  0.06  -1.16  -0.94 1.00     1860     2147
 beta[7]   -1.08  -1.08  0.07  0.06  -1.18  -0.97 1.00     1916     2408
 beta[8]   -1.09  -1.09  0.07  0.06  -1.19  -0.98 1.00     1898     2322
 beta[9]   -1.25  -1.25  0.07  0.06  -1.36  -1.14 1.00     1858     2493
 beta[10]  -1.32  -1.32  0.07  0.06  -1.42  -1.21 1.00     1862     2209
 beta[11]  -1.27  -1.28  0.07  0.06  -1.38  -1.16 1.00     1907     2227
 beta[12]  -1.30  -1.31  0.07  0.06  -1.41  -1.19 1.00     1978     2210
 sd_2[1]    0.14   0.12  0.09  0.07   0.03   0.32 1.00     2015     2457
 sd_3[1]    0.17   0.17  0.02  0.02   0.14   0.21 1.00     2645     2723

Show the code

mod_stan_2$print(
  variables = "cellmeans",
  max_rows = 20
)

      variable mean median   sd  mad   q5  q95 rhat ess_bulk ess_tail
 cellmeans[1]  0.23   0.23 0.01 0.01 0.21 0.25 1.00     1887     2322
 cellmeans[2]  0.23   0.23 0.01 0.01 0.21 0.25 1.00     1906     2203
 cellmeans[3]  0.23   0.23 0.01 0.01 0.21 0.25 1.00     1959     2377
 cellmeans[4]  0.24   0.24 0.01 0.01 0.22 0.26 1.00     1839     2060
 cellmeans[5]  0.25   0.25 0.01 0.01 0.23 0.27 1.00     1933     2278
 cellmeans[6]  0.26   0.26 0.01 0.01 0.24 0.28 1.00     1860     2147
 cellmeans[7]  0.25   0.25 0.01 0.01 0.23 0.28 1.00     1916     2408
 cellmeans[8]  0.25   0.25 0.01 0.01 0.23 0.27 1.00     1898     2322
 cellmeans[9]  0.22   0.22 0.01 0.01 0.20 0.24 1.00     1858     2493
 cellmeans[10] 0.21   0.21 0.01 0.01 0.19 0.23 1.00     1862     2209
 cellmeans[11] 0.22   0.22 0.01 0.01 0.20 0.24 1.00     1907     2227
 cellmeans[12] 0.21   0.21 0.01 0.01 0.20 0.23 1.00     1978     2210

Show the code

      mod_gbm_2 <- gbm(cover ~ fYear,
        data =  benthos_fixed_locs_obs_2,
        distribution = "gaussian",
        n.trees = 10000,
        interaction.depth = 5,
        shrinkage = 0.001,
        bag.fraction = 0.5,
        cv.folds = 5,
        verbose = TRUE
      )
      saveRDS(mod_gbm_2,
        file = paste0(data_path, "synthetic/mod_gbm_2.rds")
      )

Show the code

      n.trees <- gbm.perf(mod_gbm_2, method = "cv")

Partial dependency plot

Show the code

      gbm_2_sum <- readRDS(
        file = paste0(data_path, "synthetic/gbm_2_sum.rds")
      )
      g1 <-
        gbm_2_sum |>
        ggplot() +
        geom_line(aes(x = Year, y = median, color = "gbm")) +
        geom_line(data = mod_simple_2,
          aes(x = Year, y = Mean, colour = "simple data mean"), linetype = "dashed") +
        geom_line(data = mod_simple_2,
          aes(x = Year, y = Median, colour = "simple data median"), linetype = "dashed") +
        theme_bw()
      g2 <-
        gbm_2_sum |>
        ggplot() +
        geom_line(aes(x = Year, y = median, color = "gbm")) +
        geom_line(data = benthos_reefs_temporal_summary,
          aes(x = Year, y = Mean, colour = "all mean"), linetype = "dashed") +
        geom_line(data = benthos_reefs_temporal_summary,
          aes(x = Year, y = Median, colour = "all median"), linetype = "dashed") +
        geom_line(data = all_sampled_sum,
          aes(x = Year, y = response, colour = type), linetype = "dashed") +
        theme_bw()
      ggsave(
        filename = paste0(
          fig_path, "R_pdp_mod_gbm_2.png"
        ),
        g1 + g2,
        width = 12, height = 6, dpi = 72
      )

left figure compares gbm modelled estimates to the sample data
right figure compares gbm modelled estimates to the simple hierarchical sampled means/medians and the true means/medians
sampled reefs (blue and green dashed lines) are lower than the true global average (orange and khaki) - right side figure
gbm modelled estimates deviate substantially from the full 25 site dataset, particularly after year 9
gbm modelled estimates are consistent with the observed 5 site replacement data - e.g. adversely impacted by the replacement of a poor reef with a good reef.
gbm derived predictions lack uncertainty estimates

Show the code

      mod_gbm_2b <- gbm(cover ~ fYear + Latitude + Longitude + CYC + DHW + OTHER,
        data =  benthos_fixed_locs_obs_2,
        distribution = "gaussian",
        var.monotone = c(0, 0, 0, -1, -1, -1),
        n.trees = 10000,
        interaction.depth = 5,
        shrinkage = 0.001,
        bag.fraction = 0.5,
        cv.folds = 5,
        verbose = TRUE
      )
      saveRDS(mod_gbm_2b,
        file = paste0(data_path, "synthetic/mod_gbm_2b.rds")
      )

Show the code

      n.trees <- gbm.perf(mod_gbm_2b, method = "cv")

Show the code

      newdata_2b <- benthos_fixed_locs_obs_2
      gbm_2b_sum <- newdata_2b |>
        mutate(median = predict(mod_gbm_2b, newdata_2b, n.trees = n.trees, type = "response")) |>
        mutate(Year = as.numeric(as.character(fYear))) |>
        ## group_by(Year, Site, Transect) |>
        ## summarise(median = median(median)) |>
        ## group_by(Year, Site, .add = FALSE) |>
        ## summarise(median = median(median)) |>
        group_by(Year, .add = FALSE) |> 
        summarise(median = median(median)) |> 
        mutate(type = "gbm")
      saveRDS(gbm_2b_sum,
        file = paste0(data_path, "synthetic/gbm_2b_sum.rds")
      )

left figure compares gbm modelled estimates to the sample data
right figure compares gbm modelled estimates to the simple hierarchical sampled means/medians and the true means/medians
sampled reefs (blue and green dashed lines) are lower than the true global average (orange and khaki) - right side figure
gbm modelled estimates deviate substantially from the full 25 site dataset, particularly after year 9
gbm modelled estimates are consistent with the observed 5 site replacement data - e.g. adversely impacted by the replacement of a poor reef with a good reef.
gbm derived predictions lack uncertainty estimates
relative influences mainly space/time

Show the code

      dbarts_2c_sum <- preds_2c |> 
        t() |>
        as_tibble(.name_repair = ~ paste0("V", seq_along(.))) |>
        bind_cols(benthos_reefs_sf) |>
        pivot_longer(
          cols = matches("^V[0-9]*"),
          names_to = ".draw", values_to = "fit"
        ) |>
        group_by(Year, .draw) |>
        summarise(fit = mean(fit)) |>
        ## dplyr::select(Year, fit, .draw) |> 
        group_by(Year, .add = FALSE) |> 
        summarise_draws(median, HDInterval::hdi)
      saveRDS(dbarts_2c_sum,
        file = paste0(data_path, "synthetic/dbarts_2c_sum.rds")
      )

left figure compares gbm modelled estimates to the sample data
right figure compares gbm modelled estimates to the simple hierarchical sampled means/medians and the true means/medians
sampled reefs (blue and green dashed lines) are lower than the true global average (orange and khaki) - right side figure
gbm modelled estimates are more consistent with the full 25 site dataset, e.g., they are less impacted by the replacement of a poor reef with a good reef.
gbm modelled estimates deviate from the observed 5 site replacement data after year 9.
gbm derived predictions lack uncertainty estimates
relative influences mainly space/time

Show the code

      print(head(benthos_fixed_locs_obs_2))
      mod_dbarts_2 <- bart2(log(cover) ~ fYear,
        data =   benthos_fixed_locs_obs_2,
        keepTrees = TRUE
      )
      saveRDS(mod_dbarts_2,
        file = paste0(data_path, "synthetic/mod_dbarts_2.rds")
      )

Partial dependency plot

Show the code

      dbarts_2_sum <- newdata_2 |>
        bind_cols(preds) |> 
        mutate(Year = as.numeric(as.character(fYear))) |>
        mutate(type = "dbarts")
      saveRDS(dbarts_2_sum,
        file = paste0(data_path, "synthetic/dbarts_2_sum.rds")
      )

Show the code

      mod_dbarts_2b <- bart2(log(cover) ~ fYear + Latitude + Longitude + CYC + DHW + OTHER,
        data =   benthos_fixed_locs_obs_2,
        keepTrees = TRUE
      )
      saveRDS(mod_dbarts_2b,
        file = paste0(data_path, "synthetic/mod_dbarts_2b.rds")
      )

Show the code

      dbarts_2b_sum <- preds |> 
        t() |>
        as_tibble(.name_repair = ~ paste0("V", seq_along(.))) |>
        bind_cols(benthos_fixed_locs_obs_2) |>
        pivot_longer(
          cols = matches("^V[0-9]*"),
          names_to = ".draw", values_to = "fit"
        ) |>
        group_by(Year, .draw) |>
        summarise(fit = mean(fit)) |>
        ## dplyr::select(Year, fit, .draw) |> 
        group_by(Year, .add = FALSE) |> 
        summarise_draws(median, HDInterval::hdi)
      saveRDS(dbarts_2b_sum,
        file = paste0(data_path, "synthetic/dbarts_2b_sum.rds")
      )