Base model structure for a single survey and stratum
The base REMA model can be represented as a simple state-space random walk model with added noise. The observation model is comprised of an index of log-transformed annual survey biomass estimates with associated standard deviations , where is approximated using the coefficient of variation of () such that The measurement or observation equation, which describes the relationship between the observed survey biomass and the latent state variable, population biomass , is expressed as The state equation and associated process error variance is defined as
where the initial is constrained by the random walk process. In the base model, the process error variance is the only fixed effect parameter estimated, and the unobserved population biomass is estimated as a series of random effects. The model is fit using marginal maximum likelihood estimation, where the marginal negative log-likelihood is obtained via the Laplace approximation (Skaug and Fournier 2006).
A reproducible example of fitting the univariate (i.e., single stratum, single survey) version of the TMB model and comparing those results to ADMB is available in the REMA basics vignette.
Extending to multiple biomass survey strata
The single survey, single stratum version of the model can be extended to include one or more additional strata from the same survey. The inclusion of multiple strata in the same model is advantageous in scenarios where the apportionment of biomass among areas is a desired result. This extension assumes no correlation in the observation errors among survey strata, though PE variance can be shared or estimated independently among strata.
The ADMB and TMB versions of the REMA model utilize different methods
for estimating variance of the total predicted biomass. Therefore, while
strata-specific estimates of predicted biomass and associated confidence
intervals will be close to identical between the ADMB and TMB versions
of the model, the confidence intervals of the total predicted biomass
will differ slightly. In the original ADMB code, the Marlow (1967)
method is applied, such that the total variance
is approximated as
In the updated rema package,
the total variance is estimated using the standard Delta method and can
be replicated in ADMB using an sdreport_number
and in TMB
using the ADREPORT
macro. As described in Monnahan et
al. (2021), the exploration of methods for summing log-normal variables
is a research topic that has potential impacts beyond the scope of this
package.
A reproducible example of fitting the multivariate version of the REMA model in TMB and a comparison of results to ADMB is available in the REMA basics vignette. In this example, we also show how Akaike Information Criteria (AIC) can be used to explore the inclusion of strata-specific versus a single, shared process error variance.
Addition of an auxiliary catch per unit effort (CPUE) survey
In situations where an auxiliary biomass or catch per unit effort
(CPUE) survey
and associated variance
are available, an additional scaling parameter
(estimated in log-space) can be estimated to facilitate the inclusion of
the new information into biomass predictions. The predicted annual CPUE
survey index
is calculated as
and the CPUE survey observations has an
additional measurement equation and likelihood component similar to the
biomass survey:
By default, rema estimates a
single
for each stratum; however, users can alternatively specify shared
parameters among strata. If the strata definitions are not the same for
t,he biomass and CPUE surveys (e.g. biomass is estimated at a finer
geographic resolution that CPUE), the user can define the relationship
between the two surveys’ strata using the q_options
argument to the prepare_rema_input()
function (see q_options
and
pointer_biomass_cpue_strata)
. However, since the the
auxiliary CPUE survey index is related to unobserved population biomass
at the level of the biomass survey strata, the REMA model can only
accommodate scenarios where the CPUE survey strata have a coarser
resolution or are equivalent to the biomass survey strata.
A reproducible example of fitting to the two-survey version of the REMA model is provided in the Fitting to an additional CPUE survey vignette. This example also describes an error in the previously used ADMB version of the REMA model, and presents several alternative models that estimate additional observation error. These topics are described in more detail in the following two sections.
The ADMB version of REMA
Separability, as implemented in the SEPARABLE_FUNCTION
in ADMB and through automatic detection in TMB, increases the
computational efficiency of the Laplace approximation by breaking
complex, multivariate integrals into a product of simpler, univariate
integrals (Fournier et al. 2012). The use of separability is
particularly relevant for state-space models as it relates to the
efficiency of the Laplace approximation of the marginal negative
log-likelihood and integration of the random effects from the joint
negative log-likelihood. In the random effects implementation of ADMB,
parameters defined inside the PARAMETER_SECTION
of the
template file cannot be used within the SEPARABLE_FUNCTION
unless they are passed as arguments to the function (Skaug and Fournier
2013). In the ADMB version of the REMA model, the calculation of
predicted CPUE
[i.e. ]
occurred outside rather than inside the SEPARABLE_FUNCTION
,
and as a result, violated this rule and affecting the accuracy of the
Laplace approximation.
We explored this error in a simplified
example of the REMA model in both ADMB and TMB. We were able to
reproduce results from the TMB version of REMA in ADMB by passing
as an argument to the SEPARABLE_FUNCTION
and performing the
calculation inside the SEPARABLE_FUNCTION
. When we used
Markov Chain Monte Carlo (MCMC) methods for statistical inference
instead of MLE, the results from both ADMB versions closely matched the
correct version of REMA. Finally, a comparison of the individual
negative log-likelihood (NLL) components, along with the joint and
marginal NLLs, revealed that all were the same between the TMB model,
correct “inside” ADMB version, and incorrect “outside” ADMB version,
except the marginal NLL. Taken together, this analysis confirms
there is a bug in the “outside” ADMB version, which affects the accuracy
of the Laplace approximation and model results.
Estimation of additional observation error
Based on experience gained using alternative observed index estimates (e.g., relative CPUE indices), there appears to be cases where the estimates of observation error variances for the biomass and/or CPUE survey are too low (e.g., Echave et al. 2020). That is, there is a mismatch between biologically reasonable inter-annual variability and the precision of index estimates. In these instances, the model estimates of may be lower than what should be expected based on an individual species life history traits. For example, if the ratio of observation to PE variation is low, model predictions of population biomass may exhibit high inter-annual variability. This behavior would be unexpected in low productivity species, which should exhibit low inter-annual variation in biomass (i.e. low PE variance), especially in situations when fishing exploitation is low.
One approach to addressing this issue is to estimate additional observation error. This method is commonly implemented in Stock Synthesis, has been implemented in Alaskan crab stock assessments (Zheng and Siddeek 2020), and has been explored in several groundfish assessments as well. Using the biomass survey variance as an example, the extra estimated error () is specified as:
This approach is a new method for Tier 5 stock assessments and apportionment methods at the AFSC. A reproducible example with GOA shortraker is outlined in the Fitting to an additional CPUE survey. In this example, the estimation of additional observation error for the biomass and CPUE survey resulted in a better fit by AIC than status quo approaches and has a more biologically realistic trend in predicted population biomass.
Exploration of the Tweedie distribution for zero biomass observations
The Tweedie distributions are a family of probability distributions that can be generalized to include the Gaussian, inverse Gaussian, gamma, Poisson, and compound Poisson-gamma distributions. The Tweedie distribution can be defined with three parameters, a mean (), power parameter (), and dispersion (), where the relationship between the variance and these parameters is defined by . When is bounded between 1 and 2, the Tweedie is a positive, continuous distribution that can handle zero values, thus allowing it to more naturally handle survey time series with one or more zero observations. Values of on these bounds are special cases of the Tweedie distribution, where =1 is equivalent to a Poisson distribution and =2 is a gamma distribution. In rema, the Tweedie is constrained between 1 and 2 using a logit-transformation.
Similar to the normal distribution, observation error variances are treated as known for the Tweedie distribution in the REMA model. Using the biomass survey observation as an example, the dispersion of the biomass observation in strata in year is derived as
The result is that only one additional parameter
()
is estimated when applying this alternative distribution. The
measurement equation for the Tweedie becomes
Because
is undefined when
=0,
the zeros are assumed to have a CV=1.5. This assumption can be explored
by the user in rema
, for example, if the user wanted to
define a CV=2.0 for zeros, they would define
zeros = list(assumption = 'tweedie', options_tweedie = list(zeros_cv = 2.0))
within the prepare_rema_input()
function.
A reproducible example using the Tweedie distribution for observation error is outlined in the Strategies for handling zero biomass observations vignette. We use the non-shortspine thornyhead component of the Bering Sea/Aleutian Islands other rockfish stock, which is unique in that 13 of 38 bottom trawl survey biomass estimates are zeros. While the Tweedie performs well for this case, further exploration revealed that the Tweedie models can be very slow to run and often did not converge, especially in instances where the observation error variance estimates were low. It is possible that further development on estimating additional observation error variance (e.g. in the previous section) could alleviate these convergence errors. However, in the interim, the Tweedie distribution in rema should be considered experimental, and we do not recommend it as a viable alternative for tactical assessments at this time.
Experimental One-Step Ahead (OSA) residuals
The use of one-step ahead (OSA) residuals, also referred to as forecast residuals or prediction errors, is crucial for validation of state-space models like REMA (Thygesen et al. 2017). Instead of comparing the observed and expected values at the same time step, OSA residuals use forecasted values based on all previous observations (i.e. they exclude the observed value at the current time step from prediction). Traditional residuals (e.g. Pearson’s residuals) are inappropriate for state-space models, because process error variance may be over-estimated in cases where the model is mispecified, thus leading to artificially small residuals (see Section 3 of Thygesen et al. 2017 for an example).
Methods for calculating OSA residuals have been implemented in TMB
through the TMB::oneStepPredict()
function. While these
methods are straight forward to implement in TMB, they are
computationally demanding, and the validity (and accuracy) of the OSA
residuals may vary by situation and method used. Methods to implement
OSA residuals in rema are under development and should be
considered experimental. Currently, OSA residuals are implemented using
the method = "cdf"
option in
TMB::oneStepPredict()
, which has the benefit of speeding up
residual calculations by saving copies of the one-step cumulative
density function at each data point and thus reducing the number of
calculations at each function evaluation. While OSA residuals appear to
be calculated correctly for some REMA models, occasionally
NaN
values for residuals are returned, especially in cases
with small measurement errors. One potential cause of this error may lie
in the initialization of the state process within the REMA model and
will be explored further in future versions of this package.
References
Echave, K. B., P. J. F. Hulson, and K. A. Siwicke. 2020. Assessment of the Thornyhead stock complex in the Gulf of Alaska. In: Stock assessment and fishery evaluation report for the groundfish resources of the Gulf of Alaska as projected for 2021. North Pacific Fishery Management Council, 605 W. 4th. Avenue, Suite 306, Anchorage, AK 99501.
Fournier D. A., H. J. Skaug , J. Ancheta , J. Ianelli , A. Magnusson, M. N. Maunder , A. Nielsen, and J. Sibert. 2012. AD Model Builder: using automatic differentiation for statistical inference of highly parameterized complex nonlinear models, Optimization Methods and Software, 27:2, 33-249, DOI: 10.1080/10556788.2011.597854
Marlow, N. A. 1967. A normal limit theorem for power sums of independent normal random variables. Bell System Technical Journal. 46 (9): 2081–2089. doi:10.1002/j.1538-7305.1967.tb04244.x.
Methot, R. D. and Wetzel, C. R. 2013. Stock Synthesis: A biological and statistical framework for fish stock assessment and fishery management. Fisheries Research, 142: 86-99. https://doi.org/10.1016/j.fishres.2012.10.012
Skaug H. and D. Fournier. 2013. Random effects in AD Model Builder: ADMB-RE User Guide. http://ftp.admb-project.org/admb-11.2pre/admbre-11.2pre.pdf
Skaug, H. J. and D. A. Fournier. 2006. Automatic approximation of the marginal likelihood in non-Gaussian hierarchical models. Computational Statistics & Data Analysis, 51(2): 699–709. https://doi.org/10.1016/j.csda.2006.03.005
Thygesen, U. H., Albertsen, C. M., Berg, C. W., Kristensen, K., Nielsen, A. 2017. Validation of ecological state space models using the Laplace approximation. Environ Ecol Stat 24, 317–339. https://doi.org/10.1007/s10651-017-0372-4
Zheng, J., and M. S. M. Siddeek. 2020. Bristol Bay red king crab stock assessment in fall 2020. In: Stock assessment and fishery evaluation report the king and tanner crab fisheries of the Bering Sea and Aleutian Islands regions. North Pacific Fishery Management Council, 605 W 4th Ave, Suite 306 Anchorage, AK 99501.