Fitting to an additional CPUE survey

library(rema)
library(ggplot2)
library(dplyr)
library(cowplot) # install.packages('cowplot') # helpful plotting utilities
library(knitr)

ggplot2::theme_set(cowplot::theme_cowplot(font_size = 14) +
                     cowplot::background_grid() +
                     cowplot::panel_border())

This vignette demonstrates how to fit the REMA model to a biomass survey and auxiliary CPUE index. We show the transition from the formerly used ADMB REMA model, which was determined to have invalid statistical inference due to an error in the specification of the SEPARABLE_FUNCTION. We fit the correctly-specified version of the same model using rema and discuss differences in the results and trade-offs between observation and process error (PE) variance. Next, we introduce a model that estimates additional observation error for the biomass and CPUE surveys and compare these models using Akaike Information Criteria (AIC). Finally, we demonstrate how rema can be used to compare apportionment results from alternative models.

This example is based on GOA shortraker rockfish biomass estimates from the GOA bottom trawl survey and Relative Population Weights (RPWs) from the NMFS longline survey. The model is stratified by management area, fitting to data in the eastern GOA (EGOA), central GOA (CGOA), and western GOA (WGOA). The original model estimates a single, shared PE parameter across all strata but estimates unique scaling parameters $q$ for each strata. It applies a likelihood weight of 0.5 on the CPUE survey component of the likelihood. We have replicated this structure in the TMB model fit using rema.

Model 1: Fit to the biomass and CPUE survey through the estimation of a scaling parameter $q$

# read in the data using the report file from the original ADMB model
admb_re <- read_admb_re(filename = 'goasr_rwout.rep',
                        biomass_strata_names = c('CGOA', 'EGOA', 'WGOA'),
                        cpue_strata_names = c('CGOA', 'EGOA', 'WGOA'),
                        model_name = 'M0: ADMB status quo (invalid)')

input <- prepare_rema_input(model_name = 'M1: TMB status quo',
                            # fit to both biomass and CPUE survey data
                            multi_survey = 1, 
                            admb_re = admb_re,
                            # the RPW index is summable across strata
                            sum_cpue_index = TRUE,
                            # likelihood weight 
                            wt_cpue = 0.5,
                            # one process error parameters (log_PE) estimated
                            PE_options = list(pointer_PE_biomass = c(1, 1, 1)),
                            # three scaling parameters (log_q) estimated, indexed as
                            # follows for each biomass survey stratum:
                            q_options = list(pointer_biomass_cpue_strata = c(1, 2, 3)))

m1 <- fit_rema(input)
#> Model runtime: 0.1 seconds 
#> stats::nlminb thinks the model has converged: mod$opt$convergence == 0
#> Maximum gradient component: 4.13e-04 
#> Max gradient parameter: log_PE 
#> TMB:sdreport() was performed successfully for this model
output <- tidy_rema(m1)
output$parameter_estimates
#>           model_name           parameter  estimate    std_err       lci
#> 1 M1: TMB status quo       process_error 0.1696963 0.03914611 0.1079728
#> 2 M1: TMB status quo scaling_parameter_q 0.4123880 0.04474591 0.3333857
#> 3 M1: TMB status quo scaling_parameter_q 1.2127808 0.10874666 1.0173197
#> 4 M1: TMB status quo scaling_parameter_q 2.1164193 0.30469404 1.5960886
#>         uci
#> 1 0.2667045
#> 2 0.5101115
#> 3 1.4457965
#> 4 2.8063798

plots <- plot_rema(output, biomass_ylab = 'Biomass (t)', cpue_ylab = 'Relative Population Weights')

cowplot::plot_grid(plots$biomass_by_strata,
                   plots$cpue_by_strata,
                   ncol = 1)

A comparison of the ADMB and TMB models

When we compare the fits between the status quo ADMB model and the identically structured but correctly-specified TMB model, we see large differences in model predictions. In particular the PE variance are very different, with the ADMB model producing a PE that is more than seven times the PE in the TMB model: M0: ADMB status quo (invalid) PE = 0.0233 M1: TMB status quo PE = 0.1697

compare <- compare_rema_models(rema_models = list(m1),
                               admb_re = admb_re,
                               biomass_ylab = 'Biomass (t)',
                               cpue_ylab = 'Relative Population Weights')
compare$plots$biomass_by_strata + theme(legend.position = 'top')

Model 2: Estimating additional observation error in the two surveys

The outcome in the TMB model is problematic because we would not expect shortraker rockfish, a notably long-lived species, to exhibit such extreme inter-annual variability. In part, this finding is attributed to relatively low observation error in the survey data. In the following section we explore an alternative model that estimates additional biomass and CPUE survey observation error. We then use AIC to conduct model selection.


input2 <- prepare_rema_input(model_name = 'M2: TMB extra obs error',
                            # fit to both biomass and CPUE survey data
                            multi_survey = 1, 
                            admb_re = admb_re,
                            # the RPW index is summable across strata
                            sum_cpue_index = TRUE,
                            # likelihood weight 
                            wt_cpue = 0.5,
                            # one process error parameters (log_PE) estimated
                            PE_options = list(pointer_PE_biomass = c(1, 1, 1)),
                            # three scaling parameters (log_q) estimated, indexed as
                            # follows for each biomass survey stratum:
                            q_options = list(pointer_biomass_cpue_strata = c(1, 2, 3)),
                            # estimate additional obs error on the biomass
                            # survey (one additional CV shared across all 3
                            # strata)
                            extra_biomass_cv = list(assumption = 'extra_cv'),
                            # estimate additional obs error on the CPUE
                            # survey (one additional CV shared across all 3
                            # strata)
                            extra_cpue_cv = list(assumption = 'extra_cv'))

m2 <- fit_rema(input2)
#> Model runtime: 0.1 seconds 
#> stats::nlminb thinks the model has converged: mod$opt$convergence == 0
#> Maximum gradient component: 2.77e-04 
#> Max gradient parameter: log_q 
#> TMB:sdreport() was performed successfully for this model
out2 <- tidy_rema(m2)

The following figure shows the fits to the data as before; however, here we used the tidy_extra_cv() and plot_extra_cv() functions to obtain the total 95% confidence intervals for survey biomass or CPUE observations (i.e., fixed + estimated observation error). Note that the 95% confidence intervals of the observations (i.e., obs_lci/obs_uci and tot_obs_lci/tot_obs_uci in output$biomass_by_strata and output$cpue_by_strata are based on the assumption of normality in log-space; therefore, they are asymmetric on the arithmetic scale.

In the figure below, the error bars around the survey observations show the 95% confidence intervals based on the assumed design-based estimates (in bold with whiskers) and the 95% confidence intervals based on the total observation error (design-based estimates of CV + additional estimated CV; no whiskers).

# adds new columns with total observation error and 95% confidence intervals to
# the biomass_by_strata and cpue_by_strata
out2 <- tidy_extra_cv(out2)
cvplots <- plot_extra_cv(out2)
cowplot::plot_grid(cvplots$biomass_by_strata,
                   cvplots$cpue_by_strata,
                   ncol = 1)

Model comparison

A comparison of the the three models shows that estimation of additional observation error substantially reduces PE variance, resulting in a smoother trajectory of biomass predictions.


compare <- compare_rema_models(rema_models = list(m1, m2),
                               admb_re = admb_re,
                               biomass_ylab = 'Biomass (t)',
                               cpue_ylab = 'Relative Population Weights')

compare$plots$biomass_by_strata + theme(legend.position = 'top')

Model selection

Next, we can compare the fits to both biomass and CPUE survey indices from the alternative TMB models if we remove the admb_re object from the comparison function as demonstrated below. An examination of the parameter estimates shows that the M2 (extra observation error) estimates a much lower PE, approximately 50% of the M1 PE. Using AIC, we find that M2 has a superior fit to M1 and that there is statistical support for the additional parameters (note that this example is intended to be illustrative of the model fitting and model selection process and is not exhaustive).


compare <- compare_rema_models(rema_models = list(m1, m2),
                               biomass_ylab = 'Biomass (t)',
                               cpue_ylab = 'Relative Population Weights')

cowplot::plot_grid(compare$plots$biomass_by_strata + theme(legend.position = 'top'),
                   compare$plots$cpue_by_strata + theme(legend.position = 'none'),
                   ncol = 1, rel_widths = c(0.65, 0.35))


kable(compare$output$parameter_estimates)

model_name	parameter	estimate	std_err	lci	uci
M1: TMB status quo	process_error	0.1696963	0.0391461	0.1079728	0.2667045
M1: TMB status quo	scaling_parameter_q	0.4123880	0.0447459	0.3333857	0.5101115
M1: TMB status quo	scaling_parameter_q	1.2127808	0.1087467	1.0173197	1.4457965
M1: TMB status quo	scaling_parameter_q	2.1164193	0.3046940	1.5960886	2.8063798
M2: TMB extra obs error	process_error	0.0808477	0.0400208	0.0306416	0.2133167
M2: TMB extra obs error	scaling_parameter_q	0.4184250	0.0704177	0.3008617	0.5819267
M2: TMB extra obs error	scaling_parameter_q	1.1860894	0.1834338	0.8759422	1.6060513
M2: TMB extra obs error	scaling_parameter_q	2.0383602	0.4008311	1.3864280	2.9968468
M2: TMB extra obs error	extra_biomass_cv	0.2483632	0.0920616	0.1201074	0.5135761
M2: TMB extra obs error	extra_cpue_cv	0.1224046	0.0462574	0.0583611	0.2567270

kable(compare$aic)

model_name	objective_function	n_parameters	aic	delta_aic
M2: TMB extra obs error	95.79626	6	203.6	0.0
M1: TMB status quo	111.37317	4	230.7	27.1

Apportionment results

The proportion of prediction biomass by strata (management area in this example), is often used for apportionment of Acceptable Biological Catch estimates. Results from the compare_rema_models() function include a table of proportioned biomass by management area for this purpose.

compare$output$proportion_biomass_by_strata %>% 
  filter(year == max(year)) %>% 
  kable()

model_name	year	CGOA	EGOA	WGOA
M1: TMB status quo	2022	0.3861332	0.5610169	0.0528499
M2: TMB extra obs error	2022	0.4175339	0.5169581	0.0655080