REMA basics

library(rema)
library(ggplot2)
library(dplyr)
library(knitr)

ggplot2::theme_set(cowplot::theme_cowplot(font_size = 14) +
                     cowplot::background_grid() +
                     cowplot::panel_border())

In this series of vignettes we walk through several examples of how to prepare and fit a suite of random effects (RE) models for biomass estimation and apportionment using the rema package.

The `rema` workflow:

Load rema and data. The user can read biomass or other abundance index data from file (e.g. .csv file), or they can use the rwout.rep report file from the ADMB version of the RE model using read_admb_re().
Specify model structure and assumptions using prepare_rema_input(). This function allows users to quickly transition from a single to two survey model, specify alternative process error structures, add likelihood penalties or priors on parameters, and evaluate alternative assumptions about zero biomass observations.
Fit the specified REMA model using fit_rema() and determine whether the model has met basic convergence criteria (e.g., Hessian is positive definite, a maximum gradient component approximately equal to zero).
Extract rema model output into clean, consistently formatted data frames using tidy_rema(). The user can visualize this model output using plot_rema(), or quickly format it into tables for a report.
Compare alternative REMA models and conduct model selection using compare_rema_models(). Output from this function includes a table of Akaike Information Criteria (AIC) when appropriate, figures, and tidied data frames. This function also accepts model output from the ADMB version of the RE model for easy comparison to past models.

Taken together, these functions allow R users to quickly fit and interrogate a suite of simple statistical models in TMB without needing software-specific training or expertise.

Example 1: Univariate random effects (RE) model with a single survey and stratum

This example uses Aleutian Islands shortraker (aisr_rwout.rep) and fits to NMFS bottom trawl survey estimates.

Read in the rwout.rep, a custom report file generated by the ADMB version of the random effects model.

# ?read_admb_re
admb_re <- read_admb_re(filename = 'aisr_rwout.rep',
                        # optional label for the single biomass survey stratum
                        biomass_strata_names = 'Aleutians Islands',
                        model_name = 'ADMB: AI shortraker')
names(admb_re)
#> [1] "biomass_dat"           "cpue_dat"              "model_yrs"            
#> [4] "init_log_biomass_pred" "admb_re_results"
kable(admb_re$biomass_dat)

strata	year	biomass	cv
Aleutians Islands	1980	6829.3	0.5531111
Aleutians Islands	1983	26275.5	0.2006878
Aleutians Islands	1986	11667.2	0.2505718
Aleutians Islands	1991	21835.4	0.6916255
Aleutians Islands	1994	26284.4	0.2198947
Aleutians Islands	1997	36058.3	0.2694324
Aleutians Islands	2000	37151.8	0.4460828
Aleutians Islands	2002	15342.0	0.2009741
Aleutians Islands	2004	32611.6	0.3743298
Aleutians Islands	2006	11780.6	0.2522081
Aleutians Islands	2010	18223.7	0.2336468
Aleutians Islands	2012	15667.7	0.2636677
Aleutians Islands	2014	16400.5	0.3771796
Aleutians Islands	2016	15024.0	0.3199997

Prepare REMA model inputs using the admb_re data object.

# ?prepare_rema_input 
input <- prepare_rema_input(model_name = 'TMB: AI shortraker', admb_re = admb_re)
names(input) 
#> [1] "data"        "par"         "map"         "random"      "model_name" 
#> [6] "osa"         "biomass_dat"

The input$data, input$par, input$map, and input$random are used by TMB to fit the REMA model. The other objects are used by other functions to process model results or conduct residual analyses.

Data could alternatively be entered into prepare_rema_input() using the biomass_dat argument:

# (not run)
input <- prepare_rema_input(model_name = 'tmb_rema_aisr', biomass_dat = admb_re$biomass_dat)

Fit REMA model

# ?fit_rema
m <- fit_rema(input)
#> Model runtime: 0.1 seconds 
#> stats::nlminb thinks the model has converged: mod$opt$convergence == 0
#> Maximum gradient component: 1.92e-11 
#> Max gradient parameter: log_PE 
#> TMB:sdreport() was performed successfully for this model

Get tidied model output and plot results

Note that the 95% confidence intervals of the observations (i.e., obs_lci and obs_uci in output$biomass_by_strata are based on the assumption of normality in log-space; therefore, they are asymmetric on the arithmetic scale.

# ?tidy_rema
output <- tidy_rema(rema_model = m)
kable(output$parameter_estimates) # estimated fixed effects parameters

model_name	parameter	estimate	std_err	lci	uci
TMB: AI shortraker	process_error	0.1557774	0.1044475	0.0418582	0.5797331

kable(head(output$biomass_by_strata, 5)) # data.frame of predicted and observed biomass by stratum

model_name	strata	variable	year	log_pred	sd_log_pred	pred	pred_lci	pred_uci	obs	obs_cv	log_obs	sd_log_obs	obs_lci	obs_uci
TMB: AI shortraker	Aleutians Islands	biomass_pred	1980	9.689900	0.3344751	16153.63	8386.187	31115.44	6829.3	0.5531111	8.828978	0.51664	2480.912	18799.27
TMB: AI shortraker	Aleutians Islands	biomass_pred	1981	9.768171	0.2704638	17468.78	10281.190	29681.22	NA	NA	NA	NA	NA	NA
TMB: AI shortraker	Aleutians Islands	biomass_pred	1982	9.846441	0.2155807	18891.00	12380.903	28824.22	NA	NA	NA	NA	NA	NA
TMB: AI shortraker	Aleutians Islands	biomass_pred	1983	9.924711	0.1784564	20429.01	14399.418	28983.42	26275.5	0.2006878	10.176392	0.19871	17799.538	38787.63
TMB: AI shortraker	Aleutians Islands	biomass_pred	1984	9.848307	0.1919366	18926.28	12992.375	27570.33	NA	NA	NA	NA	NA	NA

# ?plot_rema
plots <- plot_rema(tidy_rema = output, biomass_ylab = 'Biomass (t)') # optional y-axis label
plots$biomass_by_strata

Compare with ADMB RE model results

# ?compare_rema_models
compare <- compare_rema_models(rema_models = list(m),
                               admb_re = admb_re,
                               biomass_ylab = 'Biomass (t)')
compare$plots$biomass_by_strata

Example 2: Multivariate random effects model (REM) with a single survey and multiple strata

This example uses Bering Sea and Aleutian Islands shortspine thornyhead (bsaisst_rwout.rep) and fits to the NMFS EBS slope and AI bottom trawl survey estimates. The original ADMB model has a single, shared process error (PE) over the three strata. This example demonstrates how to define alternative PE structures, and how to u to perform model selection using AIC.


admb_re <- read_admb_re(filename = 'bsaisst_rwout.rep',
                      biomass_strata_names = c('AI survey', 'EBS slope survey', 'S. Bering Sea (AI survey)'),
                      model_name = 'ADMB: single PE')

# the original ADMB model shared PE across all strata
input1 <- prepare_rema_input(model_name = 'TMB: single PE',
                            admb_re = admb_re,
                            PE_options = list(pointer_PE_biomass = c(1, 1, 1)))
m1 <- fit_rema(input1)
#> Model runtime: 0.1 seconds 
#> stats::nlminb thinks the model has converged: mod$opt$convergence == 0
#> Maximum gradient component: 9.80e-10 
#> Max gradient parameter: log_PE 
#> TMB:sdreport() was performed successfully for this model
output <- tidy_rema(rema_model = m1)
kable(output$parameter_estimates) # estimated fixed effects parameters

model_name	parameter	estimate	std_err	lci	uci
TMB: single PE	process_error	0.121168	0.0286548	0.0762236	0.1926136

kable(head(output$biomass_by_strata, 5)) # data.frame of predicted and observed biomass by stratum

model_name	strata	variable	year	log_pred	sd_log_pred	pred	pred_lci	pred_uci	obs	obs_cv	log_obs	sd_log_obs	obs_lci	obs_uci
TMB: single PE	AI survey	biomass_pred	1982	8.792938	0.4034014	6587.561	2987.781	14524.48	NA	9.000024	NA	NA	NA	NA
TMB: single PE	AI survey	biomass_pred	1983	8.792938	0.3847739	6587.561	3098.878	14003.77	NA	9.000024	NA	NA	NA	NA
TMB: single PE	AI survey	biomass_pred	1984	8.792938	0.3651976	6587.561	3220.089	13476.63	NA	9.000024	NA	NA	NA	NA
TMB: single PE	AI survey	biomass_pred	1985	8.792938	0.3445107	6587.561	3353.333	12941.14	NA	9.000024	NA	NA	NA	NA
TMB: single PE	AI survey	biomass_pred	1986	8.792938	0.3224995	6587.561	3501.165	12394.72	NA	9.000024	NA	NA	NA	NA

kable(head(output$total_predicted_biomass, 5)) # combined/total predicted biomass

model_name	variable	year	pred	pred_lci	pred_uci
TMB: single PE	tot_biomass_pred	1982	24821.5	11212.94	54946.08
TMB: single PE	tot_biomass_pred	1983	24821.5	11446.45	53825.14
TMB: single PE	tot_biomass_pred	1984	24821.5	11691.42	52697.34
TMB: single PE	tot_biomass_pred	1985	24821.5	11948.96	51561.55
TMB: single PE	tot_biomass_pred	1986	24821.5	12220.35	50416.46

We can use ggplot2 functions to modify formatting of plots:


plots <- plot_rema(tidy_rema = output, biomass_ylab = 'Biomass (t)')
plots$biomass_by_strata + facet_wrap(~strata, ncol = 1, scales = 'free_y')

One of the primary uses for the RE model is apportioning catch by management area. The tidy_rema() function output includes a table with proportions of predicted biomass by strata:

kable(tail(output$proportion_biomass_by_strata, 3))

model_name	year	AI survey	EBS slope survey	S. Bering Sea (AI survey)
TMB: single PE	2018	0.2882088	0.6856694	0.0261218
TMB: single PE	2019	0.2882088	0.6856694	0.0261218
TMB: single PE	2020	0.2882088	0.6856694	0.0261218

# figure:
# plots$proportion_biomass_by_strata

We can compare the TMB model results with the original ADMB model. Note different confidence intervals in the ADMB and TMB versions. The ADMB code uses the Marlow method to sum the variances of biomass in log-space, whereas the TMB version applies the standard Delta method.

compare <- compare_rema_models(rema_models = list(m1),
                               admb_re = admb_re,
                               biomass_ylab = 'Biomass (t)')

# Note different confidence intervals between the ADMB version (Marlow method) and the TMB version (Delta method)
compare$plots$biomass_by_strata + facet_wrap(~strata, ncol = 1, scales = 'free_y')

compare$plots$total_predicted_biomass + ggplot2::ggtitle('BSAI Shortspine thornyhead')

We can easily fit an alternative REMA model with strata-specific PE parameters, and compare the two models using AIC. In this case we see the single PE model has the lowest AIC value. Note that ADMB models cannot currently be compared with REMA models using AIC, therefore we omit the admb_re = admb_re argument in the compare_rema_models() function call below:


# REMA defaults to strata-specific parameters, which could be explicitly defined
# as: PE_options = list(pointer_PE_biomass = c(1, 2, 3))
input2 <- prepare_rema_input(model_name = 'TMB: strata-specific PE',
                            admb_re = admb_re) 

m2 <- fit_rema(input2)
#> Model runtime: 0.1 seconds 
#> stats::nlminb thinks the model has converged: mod$opt$convergence == 0
#> Maximum gradient component: 2.31e-07 
#> Max gradient parameter: log_PE 
#> TMB:sdreport() was performed successfully for this model

compare <- compare_rema_models(rema_models = list(m1, m2),
                               biomass_ylab = 'Biomass (t)')

compare$plots$biomass_by_strata + facet_wrap(~strata, ncol = 1, scales = 'free_y')

kable(compare$aic)

model_name	objective_function	n_parameters	aic	delta_aic
TMB: single PE	34.73786	1	71.5	0.0
TMB: strata-specific PE	34.65104	3	75.3	3.8

kable(compare$output$parameter_estimates)

model_name	parameter	estimate	std_err	lci	uci
TMB: single PE	process_error	0.1211680	0.0286548	0.0762236	0.1926136
TMB: strata-specific PE	process_error	0.1257423	0.0384807	0.0690223	0.2290726
TMB: strata-specific PE	process_error	0.1084952	0.0419428	0.0508566	0.2314588
TMB: strata-specific PE	process_error	0.1552381	0.1289141	0.0304885	0.7904241

The rema workflow:

Example 1: Univariate random effects (RE) model with a single survey and stratum

Example 2: Multivariate random effects model (REM) with a single survey and multiple strata

The `rema` workflow: