Creates the input list for the Stan model
mixmustr_wrangle_input(
  data_streams_list,
  tracer_list,
  model_path,
  sigma_ln_rho,
  sample_tracer,
  fix_unsampled,
  hierarchical,
  ...
)
data_streams_list: A list containing the input data streams for the model. It should include two data frames named df_stream_1 and df_stream_2. See details for the expected structure of these.
tracer_list: A named list containing 1–3 data frames of the same size: one for the mean signatures, one for their standard deviations, and one for the sample sizes. The second data frame is mandatory if sample_tracer is TRUE, or if both fix_unsampled and sample_tracer are FALSE. The third data frame is mandatory if sample_tracer is TRUE. See details for the exact structure of these data frames.
model_path: A character string specifying the path to the Stan model file.
sigma_ln_rho: A numeric value or matrix specifying the confidence around the log mixing proportions. If a matrix, it must be of dimensions N x J, with N being the number of observations and J being the number of sources.
sample_tracer: A logical vector, defaults to FALSE. If TRUE, the model estimates uncertainty around sampled source signatures.
fix_unsampled: A logical vector, defaults to FALSE. If TRUE, the model fixes unsampled source signatures to the mean across all sampled sources.
hierarchical: A logical vector, defaults to FALSE. Should all observations be treated as independent, or should the model include a hierarchical grouping structure?
...: Additional unused arguments.
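The rules stated above for which tracer_list data frames are mandatory can be sketched as a small helper. This is only an illustration: required_tracer_frames() is a hypothetical function and not part of MixMustR, and it assumes the three data frames are named mus, sigmas, and ns as described in the details.

```r
# Hypothetical helper (NOT part of MixMustR): returns the names of the
# tracer_list data frames that the documented rules make mandatory.
required_tracer_frames <- function(sample_tracer = FALSE,
                                   fix_unsampled = FALSE) {
  stopifnot(
    is.logical(sample_tracer), length(sample_tracer) == 1L,
    is.logical(fix_unsampled), length(fix_unsampled) == 1L
  )
  required <- "mus" # the means are required across all models
  # sigmas: mandatory if sample_tracer is TRUE, or if both flags are FALSE
  if (sample_tracer || (!fix_unsampled && !sample_tracer)) {
    required <- c(required, "sigmas")
  }
  # ns: mandatory only if sample_tracer is TRUE
  if (sample_tracer) {
    required <- c(required, "ns")
  }
  required
}

# Compare the requirement against what a (toy) tracer_list provides
tracer_names <- c("mus", "sigmas")
missing_frames <- setdiff(
  required_tracer_frames(sample_tracer = TRUE, fix_unsampled = FALSE),
  tracer_names
)
print(missing_frames)
```

Under these assumptions, a list that supplies only mus and sigmas would be missing ns whenever sample_tracer is TRUE.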
A list containing all the data elements necessary to run the Stan code contained within model_path.
The mixmustr_wrangle_input function prepares the input data for the Stan model:
data_streams_list: A list containing two data frames:
  df_stream_1: Contains the observed data. If this is a hierarchical dataset, the first column should be named group, representing the group labels, and the remaining columns should contain numeric tracer values.
  df_stream_2: Contains the log mixing proportions. It must have the same number of rows (N) as df_stream_1. If the design is hierarchical, the first column should be named group, matching the group column in df_stream_1. The remaining columns (e.g., source1, source2, etc.) must contain proportions (i.e., each row sums to 1, with no negative values).
tracer_list: A list containing up to three data frames:
  mus: A data frame containing the mean tracer signatures for each source. The first column should be named source, with source names matching those from data_streams_list$df_stream_2. The remaining columns should contain numeric tracer values, with names matching those in data_streams_list$df_stream_1. This data frame is required across all models in MixMustR.
  sigmas: A data frame containing the standard deviations of tracer signatures for each source. The structure must match that of mus. It cannot contain NAs.
  ns: A data frame containing the sample sizes used to calculate mus and sigmas. The structure must match that of mus. It cannot contain NAs.
model_path: A character string specifying the path to the Stan model file.
sigma_ln_rho: A numeric value or matrix specifying the confidence around the log mixing proportions. If a matrix, it must have dimensions N x (J + 1), with N being the number of observations and J being the number of sources. The additional column, J + 1, represents the unsampled source, so that each row sums to 1 across the J + 1 columns.
sample_tracer: Logical. If TRUE, the model estimates uncertainty around sampled source signatures.
fix_unsampled: Logical. If TRUE, the model fixes unsampled source signatures to the mean across all sampled sources.
hierarchical: Logical. If TRUE, the model includes a hierarchical grouping structure.
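The structural requirements listed above can be checked before calling the function. The sketch below uses base R only. check_mixmustr_inputs() is a hypothetical helper, not part of MixMustR, and the package may run its own, possibly different, validation. It assumes that "matching" group columns means element-wise equality and that rows summing to 1 is tested within a small numerical tolerance (tol).

```r
# Hypothetical pre-flight checks (NOT part of MixMustR), base R only.
check_mixmustr_inputs <- function(data_streams_list, tracer_list,
                                  hierarchical = FALSE, tol = 1e-8) {
  s1 <- data_streams_list$df_stream_1
  s2 <- data_streams_list$df_stream_2
  stopifnot(is.data.frame(s1), is.data.frame(s2), nrow(s1) == nrow(s2))
  if (hierarchical) {
    # Both streams must lead with a `group` column holding the same labels
    stopifnot(
      names(s1)[1] == "group", names(s2)[1] == "group",
      all(as.character(s1$group) == as.character(s2$group))
    )
    s1 <- s1[, -1, drop = FALSE]
    s2 <- s2[, -1, drop = FALSE]
  }
  # Proportions: numeric, non-negative, each row summing to 1 (within tol)
  props <- as.matrix(s2)
  stopifnot(
    is.numeric(props), all(props >= 0),
    all(abs(rowSums(props) - 1) < tol)
  )
  # mus: first column `source`; sources and tracers match the data streams
  mus <- tracer_list$mus
  stopifnot(
    is.data.frame(mus), names(mus)[1] == "source",
    setequal(as.character(mus$source), names(s2)),
    setequal(names(mus)[-1], names(s1))
  )
  # sigmas and ns (when supplied) mirror mus and contain no NAs
  for (nm in intersect(c("sigmas", "ns"), names(tracer_list))) {
    df <- tracer_list[[nm]]
    stopifnot(
      identical(names(df), names(mus)),
      identical(as.character(df$source), as.character(mus$source)),
      !anyNA(df)
    )
  }
  invisible(TRUE)
}

# Toy inputs invented for this illustration
streams <- list(
  df_stream_1 = data.frame(group = c("A", "B"), tracer1 = c(1.2, 2.1)),
  df_stream_2 = data.frame(group = c("A", "B"),
                           source1 = c(0.6, 0.7), source2 = c(0.4, 0.3))
)
tracers <- list(
  mus = data.frame(source = c("source1", "source2"), tracer1 = c(1.25, 2.15)),
  sigmas = data.frame(source = c("source1", "source2"), tracer1 = c(0.05, 0.1))
)
check_mixmustr_inputs(streams, tracers, hierarchical = TRUE)
```

The helper stops with an error at the first violated condition and returns TRUE invisibly otherwise, so it can sit directly in front of the call to mixmustr_wrangle_input.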
library(MixMustR)

# Example input data
synthetic_streams_list <- list(
  df_stream_1 = data.frame(
    group = c("A", "A", "B", "B"),
    tracer1 = c(1.2, 1.3, 2.1, 2.2),
    tracer2 = c(3.4, 3.5, 4.6, 4.7)
  ),
  df_stream_2 = data.frame(
    group = c("A", "A", "B", "B"),
    source1 = c(0.6, 0.4, 0.7, 0.3),
    source2 = c(0.4, 0.6, 0.3, 0.7)
  )
)
synthetic_tracer_list <- list(
  mus = data.frame(
    source = c("source1", "source2"),
    tracer1 = c(1.25, 2.15),
    tracer2 = c(3.45, 4.65)
  ),
  sigmas = data.frame(
    source = c("source1", "source2"),
    tracer1 = c(0.05, 0.1),
    tracer2 = c(0.15, 0.2)
  ),
  ns = data.frame(
    source = c("source1", "source2"),
    tracer1 = c(5L, 10L), # make sure these are integers
    tracer2 = c(7L, 9L)
  )
)

# Example parameters
model_path <- tempfile(fileext = ".stan")
sigma_ln_rho <- 0.01
sample_tracer <- TRUE
fix_unsampled <- FALSE
hierarchical <- TRUE

# Call the function
input_list <- mixmustr_wrangle_input(
  data_streams_list = synthetic_streams_list,
  tracer_list = synthetic_tracer_list,
  model_path = model_path,
  sigma_ln_rho = sigma_ln_rho,
  sample_tracer = sample_tracer,
  fix_unsampled = fix_unsampled,
  hierarchical = hierarchical
)