Strategies for handling zero biomass observations • rema

library(rema)
library(ggplot2)
library(dplyr)
library(cowplot) # install.packages('cowplot') # helpful plotting utilities
library(knitr)

ggplot2::theme_set(cowplot::theme_cowplot(font_size = 14) +
                     cowplot::background_grid() +
                     cowplot::panel_border())

A zero biomass observation occurs when a species is not detected in a survey strata, resulting in an estimated biomass equal to zero and no estimate of variance. These zeros are problematic in the REMA model because biomass is estimated in log-space, and therefore cannot handle zeros. Currently, the default method in REMA is to treat zeros as NAs, or missing values. The REMA package contains alternative methods that allow easy exploration of zero assumptions, including an option to add a user-defined constant (default small constant = 0.0001) and manually define a CV for the zero values (default CV = 1.5).

Here we compare these alternatives with a third, experimental method, using the Tweedie distribution to model observation errors. The Tweedie distribution is a positive, continuous distribution that accepts zeros and is therefore an ideal candidate for situations with zero biomass observations. The tradeoffs of this situation is that it requires the estimation of another parameter (the power parameter $\rho$ ), is computationally expensive, and at least in trial runs, appears to suffer from convergence issues.

The following example uses BSAI non-shortspine thornyhead (non-SST) other rockfish in the EBS shelf bottom trawl survey. The time series of biomass has 13 zeros out of 38 total observations.


nonsst <- read.csv('ebsshelf_orox_biomass.csv')
kable(nonsst)

strata	year	biomass	cv
EBS Shelf	1982	0.00	NA
EBS Shelf	1983	0.00	NA
EBS Shelf	1984	17.75	1.0001177
EBS Shelf	1985	36.17	0.9999791
EBS Shelf	1986	0.00	NA
EBS Shelf	1987	49.72	0.9999973
EBS Shelf	1988	0.00	NA
EBS Shelf	1989	0.00	NA
EBS Shelf	1990	0.00	NA
EBS Shelf	1991	857.40	0.9427366
EBS Shelf	1992	13.75	0.9996411
EBS Shelf	1993	86.07	1.0000160
EBS Shelf	1994	46.61	0.9999849
EBS Shelf	1995	75.58	0.7025223
EBS Shelf	1996	35.63	0.9999759
EBS Shelf	1997	126.24	0.9999773
EBS Shelf	1998	537.59	0.6792890
EBS Shelf	1999	397.58	0.7501752
EBS Shelf	2000	0.00	NA
EBS Shelf	2001	0.00	NA
EBS Shelf	2002	0.00	NA
EBS Shelf	2003	55.35	0.6986692
EBS Shelf	2004	0.00	NA
EBS Shelf	2005	36.32	1.0000123
EBS Shelf	2006	356.64	0.8470749
EBS Shelf	2007	0.00	NA
EBS Shelf	2008	0.00	NA
EBS Shelf	2009	121.56	0.5781683
EBS Shelf	2010	56.95	0.9201803
EBS Shelf	2011	55.82	1.0000285
EBS Shelf	2012	36.63	1.0000276
EBS Shelf	2013	39.73	1.0001188
EBS Shelf	2014	28.25	0.9999542
EBS Shelf	2015	142.91	0.9999827
EBS Shelf	2016	20.06	0.9997865
EBS Shelf	2017	169.32	0.7267037
EBS Shelf	2018	1592.69	0.6999401
EBS Shelf	2019	0.00	NA

Model 1: Zeros as NAs


nonsst <- read.csv('ebsshelf_orox_biomass.csv')

input1 <- prepare_rema_input(model_name = 'M1: Zeros as NAs',
                             biomass_dat = nonsst,
                             zeros = list(assumption = 'NA'))
m1 <- fit_rema(input1)
#> Model runtime: 0.1 seconds 
#> stats::nlminb thinks the model has converged: mod$opt$convergence == 0
#> Maximum gradient component: 4.23e-10 
#> Max gradient parameter: log_PE 
#> TMB:sdreport() was performed successfully for this model

Model 2: Add a small constant


input2 <- prepare_rema_input(model_name = 'M2: Small constant=0.01, CV=1.5',
                            biomass_dat = nonsst,
                            zeros = list(assumption = 'small_constant',
                                         # values: 1) small constant, 2) assumed CV
                                         options_small_constant = c(0.01, 1.5)))
m2 <- fit_rema(input2)
#> Model runtime: 0.1 seconds 
#> stats::nlminb thinks the model has converged: mod$opt$convergence == 0
#> Maximum gradient component: 1.33e-07 
#> Max gradient parameter: log_PE 
#> TMB:sdreport() was performed successfully for this model

Model 3: Tweedie distribution


input3 <- prepare_rema_input(model_name = 'M3: Tweedie',
                             biomass_dat = nonsst,
                             zeros = list(assumption = 'tweedie'))

m3 <- fit_rema(input3)
#> Model runtime: 0.1 seconds 
#> stats::nlminb thinks the model has converged: mod$opt$convergence == 0
#> Maximum gradient component: 1.72e-10 
#> Max gradient parameter: log_PE 
#> TMB:sdreport() was performed successfully for this model

Compare model results

compare <- compare_rema_models(rema_models = list(m1, m2, m3))
compare$plots$biomass_by_strata

Models 1 (Zeros as NAs) and 3 (Tweedie) have the most similar results. While the Tweedie appears to perform well in this case, it can be very slow to run and suffer from convergence issues, especially when observation errors are relatively small. The Tweedie alternative should be considered experimental until these issues can be resolved.