Creates the input list for the Stan model

mixmustr_wrangle_input(
  data_streams_list,
  tracer_list,
  model_path,
  sigma_ln_rho,
  sample_tracer,
  fix_unsampled,
  hierarchical,
  ...
)

Arguments

data_streams_list

A list containing the input data streams for the model. It should include two data frames named df_stream_1 and df_stream_2. See Details for the expected structure of these.

tracer_list

A named list containing 1–3 data frames of the same size: one for the mean signatures, one for their standard deviations, and one for the sample sizes. The second data frame is mandatory if sample_tracer is TRUE, or if fix_unsampled is FALSE and sample_tracer is FALSE. The third data frame is mandatory if sample_tracer is TRUE. See Details for the exact structure of these data frames.

model_path

A character string specifying the path to the Stan model file.

sigma_ln_rho

A numeric value or matrix specifying the confidence around the log mixing proportions. If a matrix, it must be of dimensions N X J, with N being the number of observations and J being the number of sources.

sample_tracer

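A logical vector, defaults to FALSE. If TRUE, the model estimates uncertainty around sampled source signatures.

fix_unsampled

A logical vector, defaults to FALSE. If TRUE, the model fixes unsampled source signatures to the mean across all sampled sources.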

hierarchical

A logical vector, defaults to FALSE. If TRUE, the model includes a hierarchical grouping structure; if FALSE, all observations are treated as independent.

...

Additional unused arguments.
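
The rule stated under tracer_list for which data frames are mandatory can be sketched in base R as follows. The helper required_tracer_elements is hypothetical and is not part of MixMustR; it only restates the conditions given above, plus the statement in Details that mus is required by every model.

```r
# Hypothetical helper (not a MixMustR function): lists the tracer_list
# elements that this page describes as mandatory for a given configuration.
required_tracer_elements <- function(sample_tracer = FALSE,
                                     fix_unsampled = FALSE) {
  required <- "mus" # described in Details as required across all models
  # sigmas: mandatory if sample_tracer is TRUE, or if both flags are FALSE
  if (sample_tracer || (!fix_unsampled && !sample_tracer)) {
    required <- c(required, "sigmas")
  }
  # ns: mandatory only if sample_tracer is TRUE
  if (sample_tracer) {
    required <- c(required, "ns")
  }
  required
}

required_tracer_elements(sample_tracer = TRUE, fix_unsampled = FALSE)
# "mus" "sigmas" "ns"
required_tracer_elements(sample_tracer = FALSE, fix_unsampled = TRUE)
# "mus"
```

Under these conditions, sigmas can only be omitted when sample_tracer is FALSE and fix_unsampled is TRUE.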

Value

A list containing all the data elements necessary to run the Stan code contained within model_path.

Details

The mixmustr_wrangle_input function prepares the input data for the Stan model:

  • data_streams_list: A list containing two data frames:

    • df_stream_1: Contains the observed data. If this is a hierarchical dataset, the first column should be named group, representing the group labels, and the remaining columns should contain numeric tracer values.

    • df_stream_2: Contains the log mixing proportions. It must have the same number of rows (N) as df_stream_1. If the design is hierarchical, the first column should be named group, matching the group column in df_stream_1. The remaining columns (e.g., source1, source2, etc.) must contain proportions (i.e., each row sums to 1, with no negative values).

  • tracer_list: A list containing up to three data frames:

    • mus: A data frame containing the mean tracer signatures for each source. The first column should be named source, with source names matching those from data_streams_list$df_stream_2. The remaining columns should contain numeric tracer values, with names matching those in data_streams_list$df_stream_1. This data frame is required across all models in MixMustR.

    • sigmas: A data frame containing the standard deviations of tracer signatures for each source. The structure must match that of mus. It cannot contain NAs.

    • ns: A data frame containing the sample sizes used to calculate mus and sigmas. The structure must match that of mus. It cannot contain NAs.

  • model_path: A character string specifying the path to the Stan model file.

  • sigma_ln_rho: A numeric value or matrix specifying the confidence around the log mixing proportions. If a matrix, it must have dimensions N x (J + 1), with N being the number of observations and J being the number of sources. The additional J + 1 column represents the unsampled source, making the final row sums across the J + 1 columns total 1.

  • sample_tracer: Logical. If TRUE, the model estimates uncertainty around sampled source signatures.

  • fix_unsampled: Logical. If TRUE, the model fixes unsampled source signatures to the mean across all sampled sources.

  • hierarchical: Logical. If TRUE, the model includes a hierarchical grouping structure.
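
The structural requirements listed above can be checked before calling the function. The following is a minimal base R sketch, not the validation MixMustR itself performs. The helper check_mixmustr_inputs, the tolerance of 1e-8, and the reading of "matching the group column" as row-by-row equality are all assumptions made for illustration.

```r
# Hypothetical helper (not a MixMustR function): checks the structure
# described in Details and stops with an error on the first violation.
check_mixmustr_inputs <- function(data_streams_list, tracer_list,
                                  hierarchical = FALSE) {
  s1 <- data_streams_list$df_stream_1
  s2 <- data_streams_list$df_stream_2
  stopifnot(is.data.frame(s1), is.data.frame(s2), nrow(s1) == nrow(s2))
  if (hierarchical) {
    # The first column of both streams must be named "group"
    stopifnot(names(s1)[1] == "group", names(s2)[1] == "group")
    # Assumption: "matching" means identical labels, row by row
    stopifnot(all(as.character(s1$group) == as.character(s2$group)))
    s1 <- s1[, -1, drop = FALSE]
    s2 <- s2[, -1, drop = FALSE]
  }
  # Source columns must be non-negative proportions with rows summing to 1
  props <- as.matrix(s2)
  stopifnot(is.numeric(props), all(props >= 0))
  stopifnot(all(abs(rowSums(props) - 1) < 1e-8)) # tolerance is an assumption
  # mus: first column "source"; sources and tracers must match the streams
  mus <- tracer_list$mus
  stopifnot(is.data.frame(mus), names(mus)[1] == "source")
  stopifnot(setequal(mus$source, colnames(props)))
  stopifnot(setequal(names(mus)[-1], names(s1)))
  # sigmas and ns, when supplied, must mirror mus and contain no NAs
  for (nm in intersect(c("sigmas", "ns"), names(tracer_list))) {
    x <- tracer_list[[nm]]
    stopifnot(identical(dim(x), dim(mus)), identical(names(x), names(mus)))
    stopifnot(!any(is.na(x)))
  }
  invisible(TRUE)
}

# Small illustrative data set (values are made up)
streams <- list(
  df_stream_1 = data.frame(group = c("A", "B"), tracer1 = c(1.2, 2.1)),
  df_stream_2 = data.frame(
    group = c("A", "B"), source1 = c(0.6, 0.3), source2 = c(0.4, 0.7)
  )
)
tracers <- list(
  mus = data.frame(source = c("source1", "source2"), tracer1 = c(1.25, 2.15)),
  sigmas = data.frame(source = c("source1", "source2"), tracer1 = c(0.05, 0.1))
)
ok <- check_mixmustr_inputs(streams, tracers, hierarchical = TRUE)
```

The helper returns TRUE invisibly when every check passes, so it can be placed directly before the call to mixmustr_wrangle_input.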

Examples

library(MixMustR)

# Example input data
synthetic_streams_list <- list(
  df_stream_1 = data.frame(
    group = c("A", "A", "B", "B"),
    tracer1 = c(1.2, 1.3, 2.1, 2.2),
    tracer2 = c(3.4, 3.5, 4.6, 4.7)
  ),
  df_stream_2 = data.frame(
    group = c("A", "A", "B", "B"),
    source1 = c(0.6, 0.4, 0.7, 0.3),
    source2 = c(0.4, 0.6, 0.3, 0.7)
  )
)

synthetic_tracer_list <- list(
  mus = data.frame(
    source = c("source1", "source2"),
    tracer1 = c(1.25, 2.15),
    tracer2 = c(3.45, 4.65)
  ),
  sigmas = data.frame(
    source = c("source1", "source2"),
    tracer1 = c(0.05, 0.1),
    tracer2 = c(0.15, 0.2)
  ),
  ns = data.frame(
    source = c("source1", "source2"),
    tracer1 = c(5L, 10L), # make sure these are integers
    tracer2 = c(7L, 9L)
  )
)

# Example parameters
model_path <- tempfile(fileext = ".stan")
sigma_ln_rho <- 0.01
sample_tracer <- TRUE
fix_unsampled <- FALSE
hierarchical <- TRUE

# Call the function
input_list <- mixmustr_wrangle_input(
  data_streams_list = synthetic_streams_list,
  tracer_list = synthetic_tracer_list,
  model_path = model_path,
  sigma_ln_rho = sigma_ln_rho,
  sample_tracer = sample_tracer,
  fix_unsampled = fix_unsampled,
  hierarchical = hierarchical
)