Comparing model types - site replacement

Author

Murray Logan

Published

06/07/2025

1 Purpose

The purpose of this page is to contrast a few different approaches to modelling broad scale spatio-temporal that are suitable for both very large and small sample data sets in the context of the upcoming GCRMN report. More specifically, we have identified the following modelling challenges for which analyses of simulated data will hopefully provide model choice guidance:

  • Site replacements: when monitored sites are removed from a design and replaced with alternative sites. Of particular concern is the situation in which a poor site is discontinued (perhaps it becomes so degraded it no longer functions as a coral reef) and is replaced by a relatively good new site (with the argument that there is no value in establishing a new poor site that might similarly disappear in the near future). Unfortunately, this shifts the bias and has the potential to confound the timeseries.

  • Unsampled years: when the timeseries are punctuated with missing years. Ideally, it is important that the analyses are able to estimate trends across the full timeseries. Hence estimates are necessary, even when data are unavailable.

  • Use of covariates: whilst other covariates (such as sea surface temperature and cyclones could be used to inform trends at locations and times for which observed data are lacking, they also have the potential to induce new biases. For example, if models are “trained” with wave energy (or wind) data in which no major storms occurred and coral cover grew, it is possible that the model could “learn” that coral cover is positively correlated with wave energy. When predicting to a different area that did experience substantial stores (that might be expected to have a negative impact on coral cover), such a model might erroneously predict substantial increases in coral cover.

    There are also additional complications of using covariates that are likely to represent acute impacts. Whilst, coral cover can decline rapidly in response to a disturbance, it typically takes time to recover (if at all) to its pre-disturbance state. As a result, the state of coral at any specific time is the result of an accumulation of conditions over the past and not just the most recent conditions. To account for this in a modelling perspective, is likely to require either:

    • Inclusion of the previous years cover estimate in a model (which effectively means the model is parameterised on changes in cover). This is obviously infeasible for the first year of a monitoring program or a program without fixed sites and also becomes a proportionally bigger issue with shorter the time series.
    • Inclusion of multiple lagged versions of each of the covariates that are likely to represent causal disturbances.

2 Preparations

[1] "/home/runner/work/gcrmn_model_dev/gcrmn_model_dev/docs"

Load the necessary R libraries

        library(tidyverse) # for data manipulation and visualisation
        library(sf) # for spatial data handling and visualisation
        library(glmmTMB) # for frequentist GLMM
        library(emmeans) # for post-hoc analysis
        library(DHARMa) # for model diagnostics
        library(patchwork)
        library(brms) # for Bayesian GLMM
        library(rstan) # for Bayesian modelling
        library(bayesplot) # for Bayesian plotting
        library(tidybayes) # for tidy bayesian analysis
        library(posterior) # for posterior analysis
        library(gbm) # for gradient boosted regression trees
        library(dbarts) # for Bayesian additive regression trees
        library(HDInterval) # for Bayesian credible intervals

Establish the global environment and paths

        assign(x = "data_path", value = "../data/", envir = .GlobalEnv)
        assign(x = "output_path", value = "../output/", envir = .GlobalEnv)
        paths <- list(
          data_path = data_path,
          synthetic_path = paste0(data_path, "synthetic/"),
          output_path = output_path,
          fig_path = paste0(output_path, "figures/")
        )
        lapply(paths, function(x) {
          if (!dir.exists(x)) {
            dir.create(x)
          }
        })

Load any helper functions

        source("model_functions.R")
        ## source("helper_functions.R")
import pandas as pd
import numpy as np
import xarray as xr
import pymc as pm
import arviz as az
import preliz as pz
import pymc_bart as pmb
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import os
import pickle
# Assign a variable to the global namespace
data_path = "../data/"
output_path = "../output/"
# Create a dictionary of paths
paths = {
    "data_path": data_path,
    "synthetic_path": f"{data_path}synthetic/",
    "fig_path": f"{output_path}figures/"
}
# Ensure the directories exist
for path in paths.values():
    if not os.path.exists(path):
        os.makedirs(path)

3 Import data

These data represent the synthesized “truth” and are thereby useful to compare to the various modelled outcomes.

      benthos_reefs_sf <- readRDS(file = paste0(
        data_path,
        "synthetic/benthos_reefs_sf.rds"
      ))
benthos_reefs_sf 
Simple feature collection with 267132 features and 8 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 0.1086254 ymin: -20.96124 xmax: 9.508625 ymax: -9.981244
Geodetic CRS:  WGS 84
# A tibble: 267,132 × 9
    Year Reef              geometry   HCC     SC    MA    CYC   DHW OTHER
   <int> <chr>          <POINT [°]> <dbl>  <dbl> <dbl>  <dbl> <dbl> <dbl>
 1     1 Reef1 (9.398625 -19.95124) 0.310 0.0750 0.565 0      0.399 0.512
 2     2 Reef1 (9.398625 -19.95124) 0.270 0.0627 0.592 0.570  0.379 0.745
 3     3 Reef1 (9.398625 -19.95124) 0.279 0.0652 0.586 0      0.417 0.486
 4     4 Reef1 (9.398625 -19.95124) 0.291 0.0691 0.578 0.0180 0.340 0.615
 5     5 Reef1 (9.398625 -19.95124) 0.301 0.0722 0.571 0      0.419 0.429
 6     6 Reef1 (9.398625 -19.95124) 0.316 0.0769 0.561 0      0.318 0.728
 7     7 Reef1 (9.398625 -19.95124) 0.328 0.0811 0.553 0      0.355 0.648
 8     8 Reef1 (9.398625 -19.95124) 0.309 0.0748 0.565 0      0.637 0.691
 9     9 Reef1 (9.398625 -19.95124) 0.261 0.0599 0.599 0      0.961 0.575
10    10 Reef1 (9.398625 -19.95124) 0.246 0.0557 0.609 0      0.650 0.519
# ℹ 267,122 more rows
Show the code
      benthos_reefs_temporal_summary <- benthos_reefs_sf |>
        st_drop_geometry() |>
        group_by(Year) |>
        summarise(Mean = mean(HCC),
          Median = median(HCC),
          SD = sd(HCC),
          Lower = quantile(HCC, 0.025),
          Upper = quantile(HCC, 0.975))
      benthos_reefs_temporal_summary |> 
        saveRDS(
          file = paste0(
            data_path,
            "synthetic/benthos_reefs_temporal_summary.rds"
          )
        )
Show the code
      benthos_reefs_temporal_summary <- 
        readRDS(
          file = paste0(
            data_path,
            "synthetic/benthos_reefs_temporal_summary.rds"
          )
        )
      g <- benthos_reefs_temporal_summary |>
        ggplot() +
        geom_ribbon(aes(x = Year, ymin = Lower, ymax = Upper), alpha = 0.2) +
        geom_line(aes(x = Year, y = Mean, colour = "mean")) +
        geom_line(aes(x = Year, y = Median, colour = "median")) +
        theme_bw()
      ggsave(
        filename = paste0(
          fig_path, "R_all_temporal_summary_plot.png"
        ),
        g,
        width = 8, height = 6, dpi = 72
      )

These data represent a simulated sampling design in which 25 fixed (though randomly selected from among all the defined reefs in the synthetic spatial domain) reefs are monitored over a continuous 12 year period.

      benthos_fixed_locs_obs <- readRDS(file = paste0(
        data_path,
        "synthetic/benthos_fixed_locs_obs.rds"
      ))
      benthos_fixed_locs_obs <- benthos_fixed_locs_obs |>
        left_join(
          benthos_reefs_sf |>
            st_drop_geometry() |>
            dplyr::select(Year, Reef, CYC, DHW, OTHER) |>
            group_by(Year, Reef) |>
            summarise(across(c(CYC, DHW, OTHER), mean)),
          by = c("Year", "Reef")
        )
`summarise()` has grouped output by 'Year'. You can override using the
`.groups` argument.
benthos_fixed_locs_obs 
# A tibble: 3,000 × 13
   Reef    Longitude Latitude Site  Transect  Year   HCC    SC    MA
   <chr>       <dbl>    <dbl> <chr> <chr>    <int> <dbl> <dbl> <dbl>
 1 Reef103      3.66    -19.5 S1    T1           1  18.6  3.99  54.9
 2 Reef103      3.66    -19.5 S1    T1           2  19.0  4.32  50.5
 3 Reef103      3.66    -19.5 S1    T1           3  19.0  4.18  48.7
 4 Reef103      3.66    -19.5 S1    T1           4  21.6  4.81  50.2
 5 Reef103      3.66    -19.5 S1    T1           5  25.4  5.20  49.0
 6 Reef103      3.66    -19.5 S1    T1           6  21.2  5.47  46.3
 7 Reef103      3.66    -19.5 S1    T1           7  22.2  5.26  45.6
 8 Reef103      3.66    -19.5 S1    T1           8  23.2  5.18  46.1
 9 Reef103      3.66    -19.5 S1    T1           9  19.7  4.43  47.2
10 Reef103      3.66    -19.5 S1    T1          10  17.4  4.09  54.4
# ℹ 2,990 more rows
# ℹ 4 more variables: Date <dttm>, CYC <dbl>, DHW <dbl>, OTHER <dbl>
Show the code
      benthos_fixed_locs_obs <- benthos_fixed_locs_obs |>
        mutate(
          fYear = as.factor(Year),
          Reef = as.factor(Reef),
          Site = interaction(Reef, Site),
          Transect = interaction(Site, Transect),
          cover = HCC/100
        )
      print(benthos_fixed_locs_obs)
      ## Raw summary means
      all_sampled_sum <-
        benthos_fixed_locs_obs |>
        group_by(fYear) |>
        summarise(
          mean_mean_response = mean(cover),
          median_median_response = median(cover),
          mean_sd = sd(cover),
          mean_lower_lower = mean_mean_response - 1.96 * mean_sd,
          mean_upper_upper = mean_mean_response + 1.96 * mean_sd,
          median_lower_lower = quantile(cover, 0.025),
          median_upper_upper = quantile(cover, 0.975)
        ) |>
        dplyr::select(-mean_sd) |>
        pivot_longer(
          cols = c(
            mean_mean_response, median_median_response, mean_lower_lower,
            mean_upper_upper, median_lower_lower, median_upper_upper
          ),
          names_to = c("type", "variable", "stat"),
          names_sep = "_",
          values_to = "values"
        ) |>
        mutate(type = paste("all_sampled_", type)) |>
        dplyr::select(-variable) |>
        pivot_wider(
          names_from = stat,
          values_from = values
        ) |>
        mutate(Year = as.numeric(as.character(fYear)))
      print(all_sampled_sum)
Show the code
      g <- all_sampled_sum |>
        ggplot(aes(x = Year, y = response, color = type, fill = type)) +
        geom_line() +
        geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.2, color = NA) +
        scale_y_continuous(
          name = "Hard Coral Cover (%)",
          limits = c(0, 1),
          labels = scales::percent_format(accuracy = 1)
        ) +
        theme_bw()
      ggsave(
        filename = paste0(
          fig_path, "R_full_simple_raw_means_plot.png"
        ),
        g,
        width = 8, height = 6, dpi = 72
      )

These data represent the fixed 25 sites, yet the site with the lowest coral cover is discontinued after 9 years and the highest performing reef is only introduced after 9 years. In this way, these data are intended to mimic the scenario in which a poor reef is replaced by a relatively good reef.

      benthos_fixed_locs_obs_1 <- readRDS(
        file = paste0(
          data_path,
          "synthetic/benthos_fixed_locs_obs_1.rds"
        )
      )
      benthos_fixed_locs_obs_1 <- benthos_fixed_locs_obs_1 |>
        left_join(
          benthos_reefs_sf |>
            st_drop_geometry() |>
            dplyr::select(Year, Reef, CYC, DHW, OTHER) |>
            group_by(Year, Reef) |>
            summarise(across(c(CYC, DHW, OTHER), mean)),
          by = c("Year", "Reef")
        )
`summarise()` has grouped output by 'Year'. You can override using the
`.groups` argument.
benthos_fixed_locs_obs_1 
# A tibble: 2,880 × 13
   Reef    Longitude Latitude Site  Transect  Year   HCC    SC    MA
   <chr>       <dbl>    <dbl> <chr> <chr>    <int> <dbl> <dbl> <dbl>
 1 Reef103      3.66    -19.5 S1    T1           1  18.6  3.99  54.9
 2 Reef103      3.66    -19.5 S1    T1           2  19.0  4.32  50.5
 3 Reef103      3.66    -19.5 S1    T1           3  19.0  4.18  48.7
 4 Reef103      3.66    -19.5 S1    T1           4  21.6  4.81  50.2
 5 Reef103      3.66    -19.5 S1    T1           5  25.4  5.20  49.0
 6 Reef103      3.66    -19.5 S1    T1           6  21.2  5.47  46.3
 7 Reef103      3.66    -19.5 S1    T1           7  22.2  5.26  45.6
 8 Reef103      3.66    -19.5 S1    T1           8  23.2  5.18  46.1
 9 Reef103      3.66    -19.5 S1    T1           9  19.7  4.43  47.2
10 Reef103      3.66    -19.5 S1    T1          10  17.4  4.09  54.4
# ℹ 2,870 more rows
# ℹ 4 more variables: Date <dttm>, CYC <dbl>, DHW <dbl>, OTHER <dbl>
Show the code
      g <- benthos_fixed_locs_obs_2 |>
        ggplot() +
        geom_line(aes(
          y = HCC, x = Year, colour = Site,
          group = interaction(Site, Transect)
        )) +
        facet_wrap(~Reef) +
        theme_bw()
      ggsave(
        filename = paste0(
          fig_path, "R_sampled_reefs_2_plot.png"
        ),
        g,
        width = 8, height = 6, dpi = 72
      )

These data represent a more extreme version of the above reef replacement situation in that there are only a total of five reefs and thus the replacement represents a higher fraction of reefs.

      benthos_fixed_locs_obs_2 <- readRDS(
        file = paste0(
          data_path,
          "synthetic/benthos_fixed_locs_obs_2.rds"
        )
      )
      benthos_fixed_locs_obs_2 <- benthos_fixed_locs_obs_2|>
        left_join(
          benthos_reefs_sf |>
            st_drop_geometry() |>
            dplyr::select(Year, Reef, CYC, DHW, OTHER) |>
            group_by(Year, Reef) |>
            summarise(across(c(CYC, DHW, OTHER), mean)),
          by = c("Year", "Reef")
        )
`summarise()` has grouped output by 'Year'. You can override using the
`.groups` argument.
benthos_fixed_locs_obs_2 
# A tibble: 480 × 13
   Reef    Longitude Latitude Site  Transect  Year   HCC    SC    MA
   <chr>       <dbl>    <dbl> <chr> <chr>    <int> <dbl> <dbl> <dbl>
 1 Reef170      1.34    -20.9 S1    T1           1  13.0  3.41  55.0
 2 Reef170      1.34    -20.9 S1    T1           2  13.8  3.53  55.0
 3 Reef170      1.34    -20.9 S1    T1           3  15.6  3.66  55.6
 4 Reef170      1.34    -20.9 S1    T1           4  16.8  4.02  54.3
 5 Reef170      1.34    -20.9 S1    T1           5  16.6  4.24  52.6
 6 Reef170      1.34    -20.9 S1    T1           6  18.3  4.69  58.3
 7 Reef170      1.34    -20.9 S1    T1           7  17.6  4.66  53.6
 8 Reef170      1.34    -20.9 S1    T1           8  15.3  4.32  54.6
 9 Reef170      1.34    -20.9 S1    T1           9  14.1  3.74  51.3
10 Reef170      1.34    -20.9 S1    T2           1  14.5  3.33  54.8
# ℹ 470 more rows
# ℹ 4 more variables: Date <dttm>, CYC <dbl>, DHW <dbl>, OTHER <dbl>

These data represent the synthesized “truth” and are thereby useful to compare to the various modelled outcomes.

benthos_reefs_sf = pd.read_csv(f"{paths['data_path']}synthetic/benthos_reefs_sf.csv")
pd.set_option("display.max_columns", None)  # Show all columns
benthos_reefs_sf 
        Year     Reef                              geometry       HCC  \
0          1    Reef1  c(9.398625412978, -19.9512440470767)  0.309773   
1          2    Reef1  c(9.398625412978, -19.9512440470767)  0.270281   
2          3    Reef1  c(9.398625412978, -19.9512440470767)  0.278780   
3          4    Reef1  c(9.398625412978, -19.9512440470767)  0.291293   
4          5    Reef1  c(9.398625412978, -19.9512440470767)  0.301213   
...      ...      ...                                   ...       ...   
267127     8  Reef187  c(1.478625412978, -14.1412440470767)  0.194059   
267128     9  Reef187  c(1.478625412978, -14.1412440470767)  0.172821   
267129    10  Reef187  c(1.478625412978, -14.1412440470767)  0.157513   
267130    11  Reef187  c(1.478625412978, -14.1412440470767)  0.167410   
267131    12  Reef187  c(1.478625412978, -14.1412440470767)  0.178224   

              SC        MA       CYC       DHW     OTHER  
0       0.074957  0.565184  0.000000  0.398843  0.511869  
1       0.062682  0.592401  0.570183  0.378864  0.745148  
2       0.065237  0.586457  0.000000  0.417448  0.485992  
3       0.069083  0.577790  0.018019  0.339762  0.615030  
4       0.072207  0.570994  0.000000  0.419087  0.428713  
...          ...       ...       ...       ...       ...  
267127  0.022101  0.628042  0.000000  0.453441  0.828005  
267128  0.019233  0.646412  0.000000  0.766604  0.586145  
267129  0.017246  0.659733  0.000000  0.701376  0.603980  
267130  0.018523  0.651113  0.000000  0.321212  0.666380  
267131  0.019950  0.641727  0.000000  0.351262  0.487009  

[267132 rows x 9 columns]
pd.reset_option("display.max_columns")
Show the code
benthos_reefs_temporal_summary = (
    benthos_reefs_sf.groupby("Year").agg(
        Mean=("HCC", "mean"),
        Median=("HCC", "median"),
        SD=("HCC", "std"),
        Lower=("HCC", lambda x: x.quantile(0.025)),
        Upper=("HCC", lambda x: x.quantile(0.975))
    ).reset_index()
)
Show the code
# Create the plot
plt.figure(figsize=(10, 6))
# Add ribbon (shaded area)
plt.fill_between(
    benthos_reefs_temporal_summary["Year"],
    benthos_reefs_temporal_summary["Lower"],
    benthos_reefs_temporal_summary["Upper"],
    alpha=0.2,
    label="Confidence Interval"
)
# Add mean line
plt.plot(
    benthos_reefs_temporal_summary["Year"],
    benthos_reefs_temporal_summary["Mean"],
    label="Mean",
    color="blue"
)
# Add median line
plt.plot(
    benthos_reefs_temporal_summary["Year"],
    benthos_reefs_temporal_summary["Median"],
    label="Median",
    color="orange"
)
# Add labels and theme
plt.xlabel("Year")
plt.ylabel("Value")
plt.title("Temporal Summary")
plt.legend()
plt.grid(True)
# plt.show()
# Save the plot as a PNG file
plt.savefig(f"{paths['fig_path']}Python_all_temporal_summary_plot.png", dpi=300, bbox_inches="tight")

These data represent a simulated sampling design in which 25 fixed (though randomly selected from among all the defined reefs in the synthetic spatial domain) reefs are monitored over a continuous 12 year period.

benthos_fixed_locs_obs = pd.read_csv(f"{paths['data_path']}synthetic/benthos_fixed_locs_obs.csv")
benthos_reefs_sf = pd.read_csv(f"{paths['data_path']}synthetic/benthos_reefs_sf.csv")
# Group and summarize the benthos_reefs_sf DataFrame
benthos_reefs_summary = (
    benthos_reefs_sf
    .drop(columns=["geometry"])  # Equivalent to st_drop_geometry()
    .groupby(["Year", "Reef"], as_index=False)
    .agg({
        "CYC": "mean",
        "DHW": "mean",
        "OTHER": "mean"
    })
)
# Perform the left join
benthos_fixed_locs_obs = benthos_fixed_locs_obs.merge(
    benthos_reefs_summary,
    on=["Year", "Reef"],
    how="left"
)
pd.set_option("display.max_columns", None)  # Show all columns
benthos_fixed_locs_obs
         Reef  Longitude   Latitude Site Transect  Year        HCC        SC  \
0     Reef103   3.658625 -19.531244   S1       T1     1  18.625436  3.985485   
1     Reef103   3.658625 -19.531244   S1       T1     2  18.974974  4.319049   
2     Reef103   3.658625 -19.531244   S1       T1     3  19.045657  4.176435   
3     Reef103   3.658625 -19.531244   S1       T1     4  21.643148  4.812177   
4     Reef103   3.658625 -19.531244   S1       T1     5  25.442402  5.198853   
...       ...        ...        ...  ...      ...   ...        ...       ...   
2995   Reef99   3.758625 -13.031244   S2       T5     8  22.853000  2.448697   
2996   Reef99   3.758625 -13.031244   S2       T5     9  21.445286  2.063128   
2997   Reef99   3.758625 -13.031244   S2       T5    10  21.156792  1.837525   
2998   Reef99   3.758625 -13.031244   S2       T5    11  18.795713  1.968396   
2999   Reef99   3.758625 -13.031244   S2       T5    12  21.955909  2.082333   

             MA                Date  CYC       DHW     OTHER  
0     54.913906   1-01-01T14:00:00Z  0.0  0.411396  0.345324  
1     50.489073   2-01-01T14:00:00Z  0.0  0.383427  0.500119  
2     48.694751   3-01-01T14:00:00Z  0.0  0.527733  0.561641  
3     50.156755   4-01-01T14:00:00Z  0.0  0.192189  0.416467  
4     48.970873   5-01-01T14:00:00Z  0.0  0.349549  0.587900  
...         ...                 ...  ...       ...       ...  
2995  55.811516   8-01-01T14:00:00Z  0.0  0.482534  0.349079  
2996  56.844522   9-01-01T14:00:00Z  0.0  0.864747  0.393180  
2997  59.206188  10-01-01T14:00:00Z  0.0  0.773226  0.368701  
2998  58.821929  11-01-01T14:00:00Z  0.0  0.371080  0.558885  
2999  57.269170  12-01-01T14:00:00Z  0.0  0.385996  0.491746  

[3000 rows x 13 columns]
pd.reset_option("display.max_columns")
Show the code
benthos_fixed_locs_obs = benthos_fixed_locs_obs.assign(
    fYear = benthos_fixed_locs_obs['Year'].astype('category'),
    Reef = benthos_fixed_locs_obs['Reef'].astype('category'),
    Site = benthos_fixed_locs_obs['Reef'].astype(str) + "_" +
    benthos_fixed_locs_obs['Site'].astype(str),
    Transect = benthos_fixed_locs_obs['Site'] + "_" +
    benthos_fixed_locs_obs['Transect'].astype(str),
    cover = benthos_fixed_locs_obs['HCC'] / 100
)
all_sampled_sum = (
    benthos_fixed_locs_obs.groupby("fYear").agg(
        mean_mean_response = ("cover", "mean"),
        median_median_response = ("cover", "median"),
        mean_sd = ("cover", "std"),
        median_lower_lower = ("cover", lambda x: x.quantile(0.025)),
        median_upper_upper = ("cover", lambda x: x.quantile(0.975))
    ).reset_index()
)
# Calculate additional columns
all_sampled_sum['mean_lower_lower'] = all_sampled_sum['mean_mean_response'] - 1.96 * all_sampled_sum['mean_sd']
all_sampled_sum['mean_upper_upper'] = all_sampled_sum['mean_mean_response'] + 1.96 * all_sampled_sum['mean_sd']
# Drop the 'mean_sd' column
all_sampled_sum = all_sampled_sum.drop(columns=['mean_sd'])
# Pivot longer
all_sampled_sum = all_sampled_sum.melt(
    id_vars=['fYear'],
    value_vars=[
        'mean_mean_response', 'median_median_response', 'mean_lower_lower',
        'mean_upper_upper', 'median_lower_lower', 'median_upper_upper'
    ],
    var_name='names',
    value_name='values'
)
# Split the 'names' column into 'type', 'variable', and 'stat'
all_sampled_sum[['type', 'variable', 'stat']] = all_sampled_sum['names'].str.split('_', expand=True)
# Modify the 'type' column
all_sampled_sum['type'] = 'all_sampled_' + all_sampled_sum['type']
# Drop the 'variable' column
all_sampled_sum = all_sampled_sum.drop(columns=['variable', 'names'])
# Pivot wider
all_sampled_sum = all_sampled_sum.pivot(
    index=['fYear', 'type'],
    columns='stat',
    values='values'
).reset_index()
# Convert 'fYear' to numeric
all_sampled_sum['Year'] = pd.to_numeric(all_sampled_sum['fYear'])
all_sampled_sum
Show the code
# Create the plot
plt.figure(figsize=(10, 6))
# Add ribbon (shaded area)
for _, group in all_sampled_sum.groupby('type'):
    plt.fill_between(
        group['Year'],
        group['lower'],
        group['upper'],
        alpha=0.2,
        label=_
    )
# Add mean line
for _, group in all_sampled_sum.groupby('type'):
    plt.plot(
        group["Year"],
        group["response"],
        label=_
    )
# Customize the y-axis
plt.ylim(0, 1)
plt.ylabel("Hard Coral Cover (%)")
plt.gca().yaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f"{x*100:.0f}%"))
# Add theme and legend
plt.grid(True, which='both', linestyle='--', linewidth=0.5)
plt.legend(title='Type')
plt.title("Hard Coral Cover Over Time")
plt.tight_layout()
# Show the plot
#plt.show()
# Save the plot as a PNG file
plt.savefig(f"{paths['fig_path']}Python_full_simple_raw_means_plot.png",
            dpi=72, bbox_inches="tight")

These data represent the fixed 25 sites, yet the site with the lowest coral cover is discontinued after 9 years and the highest performing reef is only introduced after 9 years. In this way, these data are intended to mimic the scenario in which a poor reef is replaced by a relatively good reef.

benthos_fixed_locs_obs_1 = pd.read_csv(f"{paths['data_path']}synthetic/benthos_fixed_locs_obs_1.csv")
benthos_reefs_sf = pd.read_csv(f"{paths['data_path']}synthetic/benthos_reefs_sf.csv")
# Group and summarize the benthos_reefs_sf DataFrame
benthos_reefs_summary = (
    benthos_reefs_sf
    .drop(columns=["geometry"])  # Equivalent to st_drop_geometry()
    .groupby(["Year", "Reef"], as_index=False)
    .agg({
        "CYC": "mean",
        "DHW": "mean",
        "OTHER": "mean"
    })
)
# Perform the left join
benthos_fixed_locs_obs_1 = benthos_fixed_locs_obs_1.merge(
    benthos_reefs_summary,
    on=["Year", "Reef"],
    how="left"
)
pd.set_option("display.max_columns", None)  # Show all columns
benthos_fixed_locs_obs_1
         Reef  Longitude   Latitude Site Transect  Year        HCC        SC  \
0     Reef103   3.658625 -19.531244   S1       T1     1  18.625436  3.985485   
1     Reef103   3.658625 -19.531244   S1       T1     2  18.974974  4.319049   
2     Reef103   3.658625 -19.531244   S1       T1     3  19.045657  4.176435   
3     Reef103   3.658625 -19.531244   S1       T1     4  21.643148  4.812177   
4     Reef103   3.658625 -19.531244   S1       T1     5  25.442402  5.198853   
...       ...        ...        ...  ...      ...   ...        ...       ...   
2875   Reef99   3.758625 -13.031244   S2       T5     8  22.853000  2.448697   
2876   Reef99   3.758625 -13.031244   S2       T5     9  21.445286  2.063128   
2877   Reef99   3.758625 -13.031244   S2       T5    10  21.156792  1.837525   
2878   Reef99   3.758625 -13.031244   S2       T5    11  18.795713  1.968396   
2879   Reef99   3.758625 -13.031244   S2       T5    12  21.955909  2.082333   

             MA                Date  CYC       DHW     OTHER  
0     54.913906   1-01-01T14:00:00Z  0.0  0.411396  0.345324  
1     50.489073   2-01-01T14:00:00Z  0.0  0.383427  0.500119  
2     48.694751   3-01-01T14:00:00Z  0.0  0.527733  0.561641  
3     50.156755   4-01-01T14:00:00Z  0.0  0.192189  0.416467  
4     48.970873   5-01-01T14:00:00Z  0.0  0.349549  0.587900  
...         ...                 ...  ...       ...       ...  
2875  55.811516   8-01-01T14:00:00Z  0.0  0.482534  0.349079  
2876  56.844522   9-01-01T14:00:00Z  0.0  0.864747  0.393180  
2877  59.206188  10-01-01T14:00:00Z  0.0  0.773226  0.368701  
2878  58.821929  11-01-01T14:00:00Z  0.0  0.371080  0.558885  
2879  57.269170  12-01-01T14:00:00Z  0.0  0.385996  0.491746  

[2880 rows x 13 columns]
pd.reset_option("display.max_columns")
Show the code
# Create the plot
g = sns.FacetGrid(
    data=benthos_fixed_locs_obs_1,
    col="Reef",
    col_wrap=5,  # Adjust the number of columns in the facet grid
    height=2,
    aspect=1.6,
    sharey=True,
    sharex=True
)

# Add line plots to each facet
g.map_dataframe(
    sns.lineplot,
    x="Year",
    y="HCC",
    hue="Site",
    style="Transect",
    estimator=None,
    units="Transect",
    legend=False
)

# Customize the plot
g.set_axis_labels("Year", "Hard Coral Cover (HCC)")
g.set_titles(col_template="{col_name}")
g.add_legend(title="Site")
g.fig.tight_layout()

# Show the plot
#plt.show()

# Save the plot as a PNG file
plt.savefig(f"{paths['fig_path']}Python_sampled_reefs_1_plot.png", dpi=72, bbox_inches="tight")

These data represent a more extreme version of the above reef replacement situation in that there are only a total of five reefs and thus the replacement represents a higher fraction of reefs.

benthos_fixed_locs_obs_2 = pd.read_csv(f"{paths['data_path']}synthetic/benthos_fixed_locs_obs_2.csv")
benthos_reefs_sf = pd.read_csv(f"{paths['data_path']}synthetic/benthos_reefs_sf.csv")
# Group and summarize the benthos_reefs_sf DataFrame
benthos_reefs_summary = (
    benthos_reefs_sf
    .drop(columns=["geometry"])  # Equivalent to st_drop_geometry()
    .groupby(["Year", "Reef"], as_index=False)
    .agg({
        "CYC": "mean",
        "DHW": "mean",
        "OTHER": "mean"
    })
)
# Perform the left join
benthos_fixed_locs_obs_2 = benthos_fixed_locs_obs_2.merge(
    benthos_reefs_summary,
    on=["Year", "Reef"],
    how="left"
)
pd.set_option("display.max_columns", None)  # Show all columns
benthos_fixed_locs_obs_2
        Reef  Longitude   Latitude Site Transect  Year        HCC        SC  \
0    Reef170   1.338625 -20.901244   S1       T1     1  13.027202  3.413521   
1    Reef170   1.338625 -20.901244   S1       T1     2  13.841143  3.526345   
2    Reef170   1.338625 -20.901244   S1       T1     3  15.559364  3.658558   
3    Reef170   1.338625 -20.901244   S1       T1     4  16.838991  4.017966   
4    Reef170   1.338625 -20.901244   S1       T1     5  16.595779  4.244544   
..       ...        ...        ...  ...      ...   ...        ...       ...   
475   Reef90   4.038625 -14.171244   S2       T5     8  24.022636  3.109348   
476   Reef90   4.038625 -14.171244   S2       T5     9  21.299862  2.709066   
477   Reef90   4.038625 -14.171244   S2       T5    10  22.066287  2.528670   
478   Reef90   4.038625 -14.171244   S2       T5    11  21.524609  2.581125   
479   Reef90   4.038625 -14.171244   S2       T5    12  21.649099  2.688958   

            MA                Date  CYC       DHW     OTHER  
0    55.028757   1-01-01T14:00:00Z  0.0  0.474864  0.349540  
1    55.015181   2-01-01T14:00:00Z  0.0  0.452288  0.386229  
2    55.572046   3-01-01T14:00:00Z  0.0  0.444938  0.490269  
3    54.276855   4-01-01T14:00:00Z  0.0  0.304818  0.595595  
4    52.587258   5-01-01T14:00:00Z  0.0  0.409084  0.492792  
..         ...                 ...  ...       ...       ...  
475  54.103665   8-01-01T14:00:00Z  0.0  0.525576  0.329536  
476  56.139564   9-01-01T14:00:00Z  0.0  0.785465  0.509434  
477  59.131197  10-01-01T14:00:00Z  0.0  0.650841  0.488150  
478  56.729758  11-01-01T14:00:00Z  0.0  0.400474  0.677549  
479  58.822425  12-01-01T14:00:00Z  0.0  0.407944  0.519343  

[480 rows x 13 columns]
pd.reset_option("display.max_columns")
Show the code
# Create the plot
g = sns.FacetGrid(
    data=benthos_fixed_locs_obs_2,
    col="Reef",
    col_wrap=3,  # Adjust the number of columns in the facet grid
    height=3,
    aspect=1.6,
    sharey=True,
    sharex=True
)

# Add line plots to each facet
g.map_dataframe(
    sns.lineplot,
    x="Year",
    y="HCC",
    hue="Site",
    style="Transect",
    estimator=None,
    units="Transect",
    legend=False
)

# Customize the plot
g.set_axis_labels("Year", "Hard Coral Cover (HCC)")
g.set_titles(col_template="{col_name}")
g.add_legend(title="Site")
g.fig.tight_layout()

# Show the plot
#plt.show()

# Save the plot as a PNG file
plt.savefig(f"{paths['fig_path']}Python_sampled_reefs_2_plot.png", dpi=72, bbox_inches="tight")

4 Data preparations

Perform the following data preparation steps:

  • specifically declare any categorical variables (as factors in R, categories in python)
  • ensure that all levels of hierarchical (varying) effects have unique IDs
  • express cover as a proportion out of 1 as is necessary for modelling against a beta distribution
  • create a prediction grid containing all years in the monitoring range
      benthos_fixed_locs_obs_0 <- benthos_fixed_locs_obs |>
        mutate(
          fYear = as.factor(Year),
          Reef = as.factor(Reef),
          Site = interaction(Reef, Site),
          Transect = interaction(Site, Transect),
          cover = HCC/100
        )
      newdata_0 <-
        benthos_fixed_locs_obs_0 |>
        tidyr::expand(fYear)
      benthos_fixed_locs_obs_1 <- benthos_fixed_locs_obs_1 |>
        mutate(
          fYear = as.factor(Year),
          Reef = as.factor(Reef),
          Site = interaction(Reef, Site),
          Transect = interaction(Site, Transect),
          cover = HCC/100
        )
      newdata_1 <-
        benthos_fixed_locs_obs_1 |>
        tidyr::expand(fYear)
      benthos_fixed_locs_obs_2 <- benthos_fixed_locs_obs_2 |>
        mutate(
          fYear = as.factor(Year),
          Reef = as.factor(Reef),
          Site = interaction(Reef, Site),
          Transect = interaction(Site, Transect),
          cover = HCC/100
        )
      newdata_2 <-
        benthos_fixed_locs_obs_2 |>
        tidyr::expand(fYear)
benthos_fixed_locs_obs_0 = (
    benthos_fixed_locs_obs.assign(
        fYear=lambda df: df["Year"].astype("category"),
        Reef=lambda df: df["Reef"].astype("category"),
        Site=lambda df: (df["Reef"].astype("str") + "_" +
                         df["Site"]).astype("category"),  # Interaction of Reef and Site
        Transect=lambda df: (df["Site"].astype("str") + "_" +
                             df["Transect"]).astype("category"),  # Interaction of Site and Transect
        cover=lambda df: df["HCC"] / 100  # Calculate cover
    )
)
newdata_0 = pd.MultiIndex.from_product(
    [benthos_fixed_locs_obs_0["fYear"].unique()],
    names=["fYear"]
).to_frame(index=False)
benthos_fixed_locs_obs_1 = (
    benthos_fixed_locs_obs_1.assign(
        fYear=lambda df: df["Year"].astype("category"),
        Reef=lambda df: df["Reef"].astype("category"),
        Site=lambda df: (df["Reef"].astype("str") + "_" +
                         df["Site"]).astype("category"),  # Interaction of Reef and Site
        Transect=lambda df: (df["Site"].astype("str") + "_" +
                             df["Transect"]).astype("category"),  # Interaction of Site and Transect
        cover=lambda df: df["HCC"] / 100  # Calculate cover
    )
)
newdata_1 = pd.MultiIndex.from_product(
    [benthos_fixed_locs_obs_1["fYear"].unique()],
    names=["fYear"]
).to_frame(index=False)
benthos_fixed_locs_obs_2 = (
    benthos_fixed_locs_obs_2.assign(
        fYear=lambda df: df["Year"].astype("category"),
        Reef=lambda df: df["Reef"].astype("category"),
        Site=lambda df: (df["Reef"].astype("str") + "_" +
                         df["Site"]).astype("category"),  # Interaction of Reef and Site
        Transect=lambda df: (df["Site"].astype("str") + "_" +
                             df["Transect"]).astype("category"),  # Interaction of Site and Transect
        cover=lambda df: df["HCC"] / 100  # Calculate cover
    )
)
newdata_2 = pd.MultiIndex.from_product(
    [benthos_fixed_locs_obs_2["fYear"].unique()],
    names=["fYear"]
).to_frame(index=False)