Prepare input data and parameters for REMA model
Source: R/prepare_rema_input.R
prepare_rema_input.Rd
After the data are read into R (either manually from a .csv or other data file,
or by using read_admb_re), this function prepares the data and parameter
settings for fit_rema. The model can be set up to run in single-survey mode
with one or more strata, or in multi-survey mode, which uses an additional
relative abundance index (i.e., CPUE) to inform predicted biomass. The optional
inputs described below that relate to the CPUE survey data or the scaling
parameter q, such as cpue_dat and q_options, are only used when
multi_survey = 1. The function structure and documentation are modeled after
wham::prepare_wham_input.
Usage
prepare_rema_input(
  model_name = "REMA for unnamed stock",
  multi_survey = 0,
  admb_re = NULL,
  biomass_dat = NULL,
  cpue_dat = NULL,
  sum_cpue_index = FALSE,
  start_year = NULL,
  end_year = NULL,
  wt_biomass = NULL,
  wt_cpue = NULL,
  PE_options = NULL,
  q_options = NULL,
  zeros = NULL,
  extra_biomass_cv = NULL,
  extra_cpue_cv = NULL
)
Arguments
- model_name
name of stock or other identifier for REMA model
- multi_survey
switch to run model in single or multi-survey mode. 0 (default) = single survey, 1 = multi-survey.
- admb_re
list object returned from read_admb_re, which includes biomass survey data (admb_re$biomass_dat), optional CPUE survey data (admb_re$cpue_dat), years for model predictions (admb_re$model_yrs), and model predictions of log biomass by strata in the correct format for input into REMA (admb_re$init_log_biomass_pred). If supplied, the user does not need to enter biomass_dat or cpue_dat.
- biomass_dat
data.frame of biomass survey data in long format with the following columns:
strata
character; the survey name, survey region, management unit, or depth strata. Note that the user must include this column even if there is only one survey stratum
year
integer; survey year. Note that the user only needs to include years for which there are observations (i.e., there is no need to supply NULL or NA values for missing survey years)
biomass
numeric; the biomass estimate/observation (e.g., bottom trawl survey biomass in mt). By default, if biomass == 0, this value will be treated as an NA (i.e., a failed survey). If the user wants to make other assumptions about zeros (e.g., adding a small constant), they must define it in the data manually.
cv
numeric; the coefficient of variation (CV) of the biomass estimate (i.e. sd(biomass)/biomass)
- cpue_dat
(optional) data.frame of relative abundance index (i.e. cpue) data in long format with the following columns:
strata
character; the survey name, survey region, management unit, or depth strata (note that the user must include this column even if there is only one survey stratum)
year
integer; survey year. Note that the user only needs to include years for which there are observations (i.e., there is no need to supply NULL or NA values for missing survey years)
cpue
numeric; the CPUE estimate/observation (e.g., longline survey CPUE or relative population number). By default, if cpue == 0, this value will be treated as an NA (i.e., a failed survey). If the user wants to make other assumptions about zeros (e.g., adding a small constant), they must define it in the data manually.
cv
numeric; the coefficient of variation (CV) of the cpue estimate (i.e. sd(cpue)/cpue)
- sum_cpue_index
logical (TRUE/FALSE or 1/0); can the CPUE survey index be summed across strata to obtain a total CPUE survey index? For example, longline survey relative population numbers (RPNs) are summable, but longline survey numbers per hachi (CPUE) are not. Default = FALSE.
- start_year
(optional) integer value specifying the start year for estimation in the model; if admb_re is supplied, this value defaults to start_year = min(admb_re$model_yrs); if admb_re is not supplied, this value defaults to the first year in either biomass_dat or cpue_dat
- end_year
(optional) integer value specifying the last year for estimation in the model; if admb_re is supplied, this value defaults to end_year = max(admb_re$model_yrs); if admb_re is not supplied, this value defaults to the last year in either biomass_dat or cpue_dat
- wt_biomass
(optional) a multiplier on the biomass survey data component of the negative log likelihood. For example, nll = wt_biomass * nll. Defaults to wt_biomass = 1
- wt_cpue
(optional) a multiplier on the CPUE survey data component of the negative log likelihood. For example, nll = wt_cpue * nll. Defaults to wt_cpue = 1
- PE_options
(optional) customize implementation of process error (PE) parameters, including options to share PE across biomass survey strata, change starting values, fix parameters, and add penalties or priors (see details)
- q_options
(optional) customize implementation of scaling parameters (q), including options to define q by biomass or CPUE survey strata, change starting values, fix parameters, and add penalties or priors (see details). Only used when multi_survey = 1
- zeros
(optional) define assumptions about how to treat zero biomass or CPUE observations, including treating zeros as NAs, changing the zeros to small constants with fixed CVs, or modeling the zeros using a Tweedie distribution (see details).
- extra_biomass_cv
(optional) estimate additional observation error for the biomass survey data (see details). By default, assumption = "extra_cv" will estimate one extra CV parameter, regardless of the number of biomass survey strata.
- extra_cpue_cv
(optional) estimate additional observation error for the CPUE survey data (see details). By default, assumption = "extra_cv" will estimate one extra CV parameter, regardless of the number of CPUE survey strata.
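As a minimal sketch of the long-format input described above, the following builds a biomass_dat data.frame for a hypothetical two-stratum survey and passes it to prepare_rema_input in single-survey mode. The stratum names, years, biomass values, CVs, and stock name are all made up for illustration, and the call is only attempted when the rema package is installed.

```r
# Hypothetical biomass survey data in long format (all values are made up).
biomass_dat <- data.frame(
  strata  = rep(c("East", "West"), each = 4),
  year    = rep(c(2015L, 2017L, 2019L, 2021L), times = 2),
  biomass = c(1200, 1100, 1050, 950, 400, 380, 420, 390),
  cv      = c(0.20, 0.25, 0.30, 0.22, 0.35, 0.30, 0.28, 0.33)
)

# Only observed survey years are listed; missing years are simply omitted.
str(biomass_dat)

# Single-survey mode (multi_survey = 0) with all other settings at defaults.
if (requireNamespace("rema", quietly = TRUE)) {
  input <- rema::prepare_rema_input(
    model_name   = "Hypothetical stock",
    multi_survey = 0,
    biomass_dat  = biomass_dat
  )
  print(names(input))
}
```

The strata column is included even though it could be dropped conceptually for a one-stratum survey, because the argument description requires it in every case.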
Value
This function returns a named list with the following components:
data
Named list of data, passed to TMB::MakeADFun
par
Named list of parameters, passed to TMB::MakeADFun
map
Named list defining how to optionally collect and fix parameters, passed to TMB::MakeADFun
random
Character vector of parameters to treat as random effects, passed to TMB::MakeADFun
model_name
Name of stock or other identifier for REMA model
biomass_dat
A tidied long-format data.frame of the biomass survey observations and associated CVs by strata. This data.frame will be 'complete' in that it will include all modeled years, with missing values treated as NAs. Note that this data.frame could differ from admb_re$biomass_dat or the input biomass data if assumptions about zero biomass observations are different between the ADMB model and what the user specifies for REMA. The user can change their assumptions about zeros using the zeros argument.
cpue_dat
If optional CPUE survey data are provided and multi_survey = 1, this will be a tidied long-format data.frame of the CPUE survey observations and associated CVs by strata. This data.frame will be 'complete' in that it will include all modeled years, with missing values treated as NAs. Note that this data.frame could differ from admb_re$cpue_dat or the input CPUE data if assumptions about zero CPUE observations are different between the ADMB model and what the user specifies for REMA. The user can change their assumptions about zeros using the zeros argument. If optional CPUE survey data are not provided or multi_survey = 0, this object will be NULL.
Details
PE_options allows the user to specify options for process error (PE)
parameters. If NULL, default PE specifications are used: one PE parameter is
estimated for each biomass survey stratum, initial values for log_PE are set
to 1, and no penalties or priors are added. The user can modify the default
PE_options using the following list of entries:
- $pointer_PE_biomass
An index to customize the assignment of PE parameters to individual biomass strata. Vector with length = number of biomass strata, starting with an index of 1 and ending with the number of unique PE parameters estimated. For example, if there are three biomass survey strata and the user wants to estimate only one PE, they would specify pointer_PE_biomass = c(1, 1, 1). By default there is one unique log_PE estimated for each unique biomass survey stratum.
- $initial_pars
A vector of initial values for log_PE. The default initial value for each log_PE is 1.
- $fix_pars
Option to fix PE parameters, where the user specifies the index value of the PE parameter they would like to fix at the initial value. For example, if there are three biomass survey strata, and the user wants to fix the log_PE for the second stratum but estimate the log_PE for the first and third strata, they would specify fix_pars = c(2). Note that this option is not recommended.
- $penalty_options
Warning: the following options are experimental and not well-tested. Options for penalizing the PE likelihood or adding a prior on log_PE include the following:
- "none"
(default) no penalty or prior used
- "wt"
a multiplier on the PE and random effects component of the negative log likelihood. For example, nll = wt * nll, where wt = 1.5 is specified as a single value in the penalty_values argument
- "squared_penalty"
As implemented in an earlier version of the RE.tpl, this penalty prevents the PE from shrinking to zero. For example, nll = nll + (log_PE + squared_penalty)^2, where squared_penalty = 1.5. A vector of squared_penalty values is specified for each PE in the penalty_values argument
- "normal_prior"
Normal prior in log space, where nll = nll - dnorm(log_PE, pmu_log_PE, psig_log_PE, 1) and pmu_log_PE and psig_log_PE are specified for each PE parameter in the penalty_values argument
- penalty_values
user-defined values for the penalty_options. Each penalty type is entered as follows:
- "none"
(default) NULL. For example, penalty_values = NULL
- "wt"
a single numeric value. For example, penalty_values = 1.5
- "squared_penalty"
a vector of numeric values with length = number of PE parameters. For example, if three PE parameters are being estimated and the user wants them to have the same penalty for each one, they would use penalty_values = c(1.5, 1.5, 1.5)
- "normal_prior"
a vector of paired values for each PE parameter, where each vector pair is the prior mean of log_PE (pmu_log_PE) and the associated standard deviation (psig_log_PE). For example, if three PE parameters are being estimated and the user wants them to have the same normal prior of log_PE ~ N(1.0, 0.08), penalty_values = c(c(1.0, 0.08), c(1.0, 0.08), c(1.0, 0.08))
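The PE_options entries above can be sketched as a plain R list. This example assumes a hypothetical stock with three biomass strata in which the first two share a PE parameter. It also evaluates the normal-prior term from the formula given under "normal_prior" at made-up log_PE values, purely to show how the paired penalty_values are read; it is not the package's internal code.

```r
# Three biomass strata; strata 1 and 2 share one PE, stratum 3 has its own.
PE_options <- list(
  pointer_PE_biomass = c(1, 1, 2),
  penalty_options    = "normal_prior",
  # One (mean, sd) pair per unique PE parameter.
  penalty_values     = c(c(1.0, 0.08), c(1.0, 0.08))
)

# Nested c() calls flatten to one numeric vector; reshape it into pairs.
prior <- matrix(
  PE_options$penalty_values, ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("pmu_log_PE", "psig_log_PE"))
)

# Prior contribution to the negative log likelihood at hypothetical log_PE
# values, following nll = nll - dnorm(log_PE, pmu_log_PE, psig_log_PE, 1).
log_PE <- c(0.9, 1.0)
nll_prior <- -sum(dnorm(log_PE, mean = prior[, "pmu_log_PE"],
                        sd = prior[, "psig_log_PE"], log = TRUE))
nll_prior

# The list would then be supplied as prepare_rema_input(..., PE_options = PE_options).
```

Because the prior options are flagged as experimental in the text above, a setup like this is best compared against a fit with penalty_options = "none".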
q_options allows the user to specify options for the CPUE survey scaling
parameters (q). If multi_survey = 0 (default), no q parameters are estimated
regardless of what the user defines in q_options. If multi_survey = 1 and
q_options = NULL, default q specifications are used: one q parameter is
estimated for each CPUE survey stratum, biomass and CPUE surveys are assumed to
share strata definitions (i.e., biomass_dat and cpue_dat have the same number
of columns and the columns represent the same strata), initial values for
log_q are set to 1, and no penalties or priors are added. The user can modify
the default q_options using the following list of entries:
- $pointer_q_cpue
An index to customize the assignment of q parameters to individual CPUE survey strata. Vector with length = number of CPUE strata, starting with an index of 1 and ending with the number of unique q parameters estimated. For example, if there are three CPUE survey strata and the user wanted to estimate only one q, they would specify pointer_q_cpue = c(1, 1, 1). The recommended model configuration is to estimate one log_q for each CPUE survey stratum.
- $pointer_biomass_cpue_strata
An index to customize the assignment of biomass predictions to individual CPUE survey strata. Vector with length = the number of biomass survey strata, starting with an index of 1 and ending with the number of unique CPUE survey strata. This pointer only needs to be defined if the number of biomass and CPUE strata are not equal. The pointer_biomass_cpue_strata option allows the user to calculate predicted biomass at the CPUE survey strata level under the scenario where the biomass survey strata are at a higher resolution than the CPUE survey strata. For example, if there are 3 biomass survey strata that are represented by only 2 CPUE survey strata, the user may specify pointer_biomass_cpue_strata = c(1, 1, 2). This specification would assign the first 2 biomass strata to the first CPUE stratum, and the third biomass stratum to the second CPUE stratum. If there are no CPUE data to complement a specific biomass stratum, the user can populate these with NAs. For example, if pointer_biomass_cpue_strata = c(1, NA, 3), it means there are CPUE data for biomass strata 1 and 3 but not 2. NOTE: there cannot be a scenario where there are more CPUE survey strata than biomass survey strata because the CPUE survey is used to inform the biomass survey trend. An error will be thrown if q_options$pointer_biomass_cpue_strata is not defined and the biomass and CPUE survey strata definitions are not the same.
- $initial_pars
A vector of initial values for log_q. The default initial value for each log_q is 1.
- $fix_pars
Option to fix q parameters, where the user specifies the index value of the q parameter they would like to fix at the initial value. For example, if there are three CPUE survey strata, and the user wants to fix the log_q for the second stratum but estimate the log_q for the first and third strata, they would specify fix_pars = c(2)
- $penalty_options
Options for penalizing the q likelihood or adding a prior on log_q include the following:
- "none"
(default) no penalty or prior used
- "normal_prior"
Warning: experimental and not well-tested. Normal prior in log space, where nll = nll - dnorm(log_q, pmu_log_q, psig_log_q, 1) and pmu_log_q and psig_log_q are specified for each q parameter in the penalty_values argument
- penalty_values
user-defined values for the penalty_options. Each penalty type is entered as follows:
- "none"
(default) NULL. For example, penalty_values = NULL
- "normal_prior"
a vector of paired values for each q parameter, where each vector pair is the prior mean of log_q (pmu_log_q) and the associated standard deviation (psig_log_q). For example, if 2 q parameters are being estimated and the user wants them to have the same normal prior of log_q ~ N(1.0, 0.05), penalty_values = c(c(1.0, 0.05), c(1.0, 0.05))
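The pointer entries in q_options can be illustrated with a short sketch. It uses the three-biomass-strata, two-CPUE-strata example from the text and assumes, for illustration only, that assigning several biomass strata to one CPUE stratum means summing their predicted biomass. The predicted biomass values and stratum labels are made up.

```r
# Three biomass strata mapped onto two CPUE strata, one log_q per CPUE stratum.
q_options <- list(
  pointer_biomass_cpue_strata = c(1, 1, 2),
  pointer_q_cpue              = c(1, 2)
)

# Hypothetical predicted biomass by biomass stratum for a single year.
pred_biomass <- c(A = 500, B = 300, C = 200)

# Aggregate to the CPUE-stratum level using the pointer (assumed to be a sum).
pred_biomass_cpue_strata <- tapply(
  pred_biomass, q_options$pointer_biomass_cpue_strata, sum
)
pred_biomass_cpue_strata

# The list would then be supplied as
# prepare_rema_input(..., multi_survey = 1, q_options = q_options).
```

The pointer has one element per biomass stratum, and its largest value equals the number of CPUE strata, which is why more CPUE strata than biomass strata cannot be represented.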
zeros allows the user to specify options for how to treat zero biomass or
CPUE survey observations. By default, zero observations are treated as NAs and
a warning message to that effect is returned to the console. zeros allows the
user to specify non-default zero assumptions using the following list of
entries:
- $assumption
character, name of the assumption to use. Only three alternatives are currently implemented: zeros = list(assumption = c("NA", "small_constant", "tweedie")). "NA" is the default; this option assumes the zero estimates are failed surveys and removes them. "small_constant" is an ad hoc method where a fixed value is added to the zero with an assumed CV. By default, the small constant = 0.0001 and the CV is the value entered by the user in the data. The user can change the assumed value and CV using options_small_constant. "tweedie" uses the Tweedie as the assumed error distribution of the survey data, which allows zeros. This alternative estimates one additional power parameter. The assumed CV for zero biomass or zero CPUE survey observations defaults to 1.5. The user can change this assumed CV, change initial values for the inverse logit transformed power parameter, or fix it at initial values using options_tweedie.
- $options_small_constant
a numeric vector of length two. The first value is the small constant to add to the zero observation; the second is the user-defined coefficient of variation (CV) for this value. The user can specify the small value but use the input CV by specifying an NA for the second value, e.g., options_small_constant = c(0.0001, NA).
- $options_tweedie
a list of entries to control initial or fixed values for Tweedie parameters. Currently, this argument accepts the following entries:
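To make the "NA" and "small_constant" assumptions concrete, the sketch below applies each one by hand to a made-up set of observations containing a single zero. This is an illustration of the behaviour described in the text, written in base R, and not the package's own implementation.

```r
# Hypothetical observations with one zero biomass estimate (values made up).
obs <- data.frame(
  year    = c(2017L, 2019L, 2021L),
  biomass = c(120, 0, 95),
  cv      = c(0.20, 0.30, 0.25)
)

# "NA" (default): the zero is treated as a failed survey.
obs_na <- obs
obs_na$biomass[obs_na$biomass == 0] <- NA

# "small_constant": replace the zero with a small value; an NA in the second
# position keeps the CV that was supplied in the data.
options_small_constant <- c(0.0001, NA)
obs_sc <- obs
is_zero <- obs_sc$biomass == 0
obs_sc$biomass[is_zero] <- options_small_constant[1]
if (!is.na(options_small_constant[2])) {
  obs_sc$cv[is_zero] <- options_small_constant[2]
}

# The matching argument for prepare_rema_input would be:
zeros <- list(
  assumption             = "small_constant",
  options_small_constant = options_small_constant
)
```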
extra_biomass_cv allows the user to specify options for estimating an
additional CV parameter (log_tau_biomass in the source code, estimated in
log-space) for the biomass survey observations. If extra_biomass_cv = NULL
(default), no extra CV is estimated. The user can modify the default
extra_biomass_cv options using the following list of entries:
- $assumption
A string identifying what assumption is used for the biomass survey observations. Options include "none" (default, in which no extra CV is estimated) or "extra_cv". If assumption = "extra_cv", by default only one extra CV will be estimated, regardless of how many biomass strata are defined. If extra_biomass_cv is not NULL, the user must define an appropriate assumption.
- $pointer_extra_biomass_cv
An index to customize the assignment of extra CV parameters to individual biomass survey strata. Vector with length = number of biomass strata, starting with an index of 1 and ending with the number of unique extra CV parameters estimated. If there are three biomass survey strata and the user wanted to estimate an extra CV per stratum, they would specify pointer_extra_biomass_cv = c(1, 2, 3). By default, only one additional parameter is estimated, regardless of how many strata are defined (i.e., pointer_extra_biomass_cv = c(1, 1, 1)).
- $initial_pars
A vector of initial values for the extra biomass log_tau_biomass. The default initial value for each log_tau_biomass is log(1e-7) (approximately 0 on the arithmetic scale).
- $fix_pars
Option to fix extra biomass CV parameters, where the user specifies the index value of the parameter they would like to fix at the initial value. For example, if there are three biomass survey strata defined in pointer_extra_biomass_cv, and the user wants to fix the log_tau_biomass for the second stratum but estimate the log_tau_biomass for the first and third strata, they would specify fix_pars = c(2).
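A short sketch of an extra_biomass_cv specification, assuming a hypothetical stock with three biomass strata where the first two share one extra CV parameter. The count of parameters and the default starting value are derived from the description above; nothing here calls the package.

```r
# One extra CV shared by strata 1 and 2, and a separate one for stratum 3.
extra_biomass_cv <- list(
  assumption               = "extra_cv",
  pointer_extra_biomass_cv = c(1, 1, 2)
)

# Number of extra CV parameters implied by the pointer.
n_tau <- length(unique(extra_biomass_cv$pointer_extra_biomass_cv))

# Default starting value on the log scale, as stated in the text.
log_tau_init <- rep(log(1e-7), n_tau)

# Back-transformed, the starting values are approximately 0.
exp(log_tau_init)

# The list would then be supplied as
# prepare_rema_input(..., extra_biomass_cv = extra_biomass_cv).
```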
extra_cpue_cv allows the user to specify options for estimating an additional
CV parameter (log_tau_cpue in the source code, estimated in log-space) for the
CPUE survey observations. If extra_cpue_cv = NULL (default), no extra CV is
estimated. The user can modify the default extra_cpue_cv options using the
following list of entries:
- $assumption
A string identifying what assumption is used for the CPUE survey observations. Options include "none" (default, in which no extra CV is estimated) or "extra_cv". If assumption = "extra_cv", by default only one extra CV will be estimated, regardless of how many CPUE strata are defined. If extra_cpue_cv is not NULL, the user must define an appropriate assumption.
- $pointer_extra_cpue_cv
An index to customize the assignment of extra CV parameters to individual CPUE survey strata. Vector with length = number of CPUE strata, starting with an index of 1 and ending with the number of unique extra CV parameters estimated. If there are three CPUE survey strata and the user wanted to estimate an extra CV per stratum, they would specify pointer_extra_cpue_cv = c(1, 2, 3). By default, only one additional parameter is estimated, regardless of how many strata are defined (i.e., pointer_extra_cpue_cv = c(1, 1, 1)).
- $initial_pars
A vector of initial values for the extra CPUE log_tau_cpue. The default initial value for each log_tau_cpue is log(1e-7) (approximately 0 on the arithmetic scale).
- $fix_pars
Option to fix extra CPUE CV parameters, where the user specifies the index value of the parameter they would like to fix at the initial value. For example, if there are three CPUE survey strata defined in pointer_extra_cpue_cv, and the user wants to fix the log_tau_cpue for the second stratum but estimate the log_tau_cpue for the first and third strata, they would specify fix_pars = c(2).
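Putting the multi-survey pieces together, the sketch below sets up hypothetical biomass and CPUE surveys that share the same two strata and requests one extra CPUE CV shared across both. All data values and names are made up, and the call to prepare_rema_input is only attempted when the rema package is installed.

```r
# Hypothetical biomass and CPUE surveys sharing two strata (values made up).
biomass_dat <- data.frame(
  strata  = rep(c("East", "West"), each = 3),
  year    = rep(c(2017L, 2019L, 2021L), times = 2),
  biomass = c(1200, 1100, 950, 400, 380, 390),
  cv      = c(0.20, 0.25, 0.22, 0.35, 0.30, 0.33)
)
cpue_dat <- data.frame(
  strata = rep(c("East", "West"), each = 4),
  year   = rep(2018L:2021L, times = 2),
  cpue   = c(3.1, 2.9, 2.7, 2.8, 1.2, 1.1, 1.3, 1.2),
  cv     = rep(0.10, 8)
)

# Estimate a single extra CV shared across both CPUE strata.
extra_cpue_cv <- list(
  assumption            = "extra_cv",
  pointer_extra_cpue_cv = c(1, 1)
)

# CPUE-related inputs are only used when multi_survey = 1.
if (requireNamespace("rema", quietly = TRUE)) {
  input <- rema::prepare_rema_input(
    model_name    = "Hypothetical multi-survey stock",
    multi_survey  = 1,
    biomass_dat   = biomass_dat,
    cpue_dat      = cpue_dat,
    extra_cpue_cv = extra_cpue_cv
  )
}
```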