Sampler Documentation

Author

Jim Ianelli

GitHub: afsc-assessments/sampler

1 Overview

This repository contains two operational paths for generating sampler outputs:

  1. ADMB executable workflow (legacy and still supported).
  2. R refactor workflow (R/sampler_refactor.R) that preserves the same input file formats and writes tidy Parquet outputs.

Both paths use the same core input files:

  • sam*.dat control files
  • age*.dat age sample files
  • len*.dat length sample files
  • bs_setup.dat bootstrap settings

2 Repository Map

  • src/: ADMB template (sam.tpl) and Makefile for compiling sam.
  • admb/: alternate ADMB build area.
  • R/: R scripts, including the refactor entry point.
  • example/: small synthetic inputs for smoke testing.
  • cases/: historical species/assessment-specific data and scripts.
  • index.qmd: this documentation source.

3 Input File Formats

3.1 Control File (sam*.dat)

Token order expected by both ADMB (src/sam.tpl) and R (R/sampler_refactor.R):

| Position | Name | Description |
|---|---|---|
| 1 | year | Model year label |
| 2 | agefile | Path to age file |
| 3 | lenfile | Path to length file |
| 4 | na_rcrds | Number of age records |
| 5 | nl_rcrds | Number of length records |
| 6 | a1 | Min age |
| 7 | a2 | Max age |
| 8 | l1 | Min length bin |
| 9 | l2 | Max length bin |
| 10 | nstrata | Number of strata |
| 11..(10+nstrata) | strata_catch | Catch by stratum |
| last | outfile | Main output file label |

Example (example/sam2020.dat):

2020 age2020.dat len2020.dat 6 6 2 5 30 38 1 100 results/Est_2020.dat
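The token order above can be checked with a short script. The following is a minimal Python sketch, not code from the repository: `SamControl` and `parse_sam_control` are hypothetical names, and the parser assumes the file holds exactly the whitespace-delimited tokens listed in the table, with no comments or trailing fields.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SamControl:
    """Hypothetical container mirroring the documented token order."""
    year: int
    agefile: str
    lenfile: str
    na_rcrds: int
    nl_rcrds: int
    a1: int
    a2: int
    l1: int
    l2: int
    nstrata: int
    strata_catch: List[float]
    outfile: str


def parse_sam_control(text: str) -> SamControl:
    """Split on whitespace and assign tokens by position."""
    tok = text.split()
    if len(tok) < 11:
        raise ValueError("control file is too short")
    nstrata = int(tok[9])
    expected = 10 + nstrata + 1  # ten fixed fields, catches, outfile
    if len(tok) != expected:
        raise ValueError(f"expected {expected} tokens, found {len(tok)}")
    return SamControl(
        year=int(tok[0]), agefile=tok[1], lenfile=tok[2],
        na_rcrds=int(tok[3]), nl_rcrds=int(tok[4]),
        a1=int(tok[5]), a2=int(tok[6]), l1=int(tok[7]), l2=int(tok[8]),
        nstrata=nstrata,
        strata_catch=[float(t) for t in tok[10:10 + nstrata]],
        outfile=tok[10 + nstrata],
    )


ctl = parse_sam_control(
    "2020 age2020.dat len2020.dat 6 6 2 5 30 38 1 100 results/Est_2020.dat"
)
print(ctl.year, ctl.nstrata, ctl.strata_catch, ctl.outfile)
```

A strict token count is a design choice for catching a mismatch between nstrata and the number of catch values early; the ADMB and R readers may be more permissive.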

3.2 Age File (age*.dat)

Six columns:

strata haul sex age wt len

3.3 Length File (len*.dat)

Five columns:

strata haul sex len freq
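As an illustration of the two record layouts, the Python sketch below reads whitespace-delimited rows into dictionaries keyed by the documented column names. It assumes the files have no header or comment lines and that every field is numeric; `read_records` is a hypothetical helper, and the sample rows are made up for the example.

```python
from typing import Dict, Iterable, List

AGE_COLS = ["strata", "haul", "sex", "age", "wt", "len"]  # age*.dat
LEN_COLS = ["strata", "haul", "sex", "len", "freq"]       # len*.dat


def read_records(lines: Iterable[str], cols: List[str]) -> List[Dict[str, float]]:
    """Parse whitespace-delimited rows, one record per non-empty line."""
    rows = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(cols):
            raise ValueError(
                f"line {lineno}: expected {len(cols)} fields, found {len(fields)}"
            )
        rows.append({c: float(v) for c, v in zip(cols, fields)})
    return rows


# Illustrative rows only, not taken from example/.
age_rows = read_records(["1 101 1 3 0.45 34", "1 101 2 4 0.62 37"], AGE_COLS)
len_rows = read_records(["1 101 1 34 12"], LEN_COLS)
print(age_rows[0]["age"], len_rows[0]["freq"])
```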

3.4 Bootstrap Setup (bs_setup.dat)

Expected numeric values, one per line:

  1. nbs (number of bootstrap iterations; 0 or 1 is typically used for a point-estimate run with no bootstrap resampling)
  2. sam_level_age_tows
  3. sam_level_ages
  4. sam_level_lf_tows
  5. sam_level_lfreqs

Example (example/bs_setup.dat):

5
2
2
2
2
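A small Python sketch of reading this file follows. It assumes the five values are integers and that nothing else appears in the file; `parse_bs_setup` is a hypothetical name used only for illustration.

```python
BS_FIELDS = [
    "nbs",
    "sam_level_age_tows",
    "sam_level_ages",
    "sam_level_lf_tows",
    "sam_level_lfreqs",
]


def parse_bs_setup(text: str) -> dict:
    """Map the five numeric values, in file order, to their documented names."""
    values = [int(tok) for tok in text.split()]
    if len(values) != len(BS_FIELDS):
        raise ValueError(f"expected {len(BS_FIELDS)} values, found {len(values)}")
    return dict(zip(BS_FIELDS, values))


bs = parse_bs_setup("5\n2\n2\n2\n2\n")
print(bs)
```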

4 ADMB Workflow

4.1 Build

From repository root:

cd src
make clean
make

This compiles sam from sam.tpl.

4.2 Run

Run from a directory containing sam*.dat, age*.dat, len*.dat, and bs_setup.dat (for example example/):

cd example
../src/sam -ind sam2020.dat

4.3 ADMB Runtime Options

Options parsed in sam.tpl:

  • -io: enable additional diagnostic printing.
  • -sim: simulation mode flag.

Example:

../src/sam -nox -io -ind sam2020.dat

4.4 ADMB Outputs

ADMB writes multiple .rep files under results/ (relative to the run directory), including:

  • sex_catage<year>.rep
  • catage<year>.rep
  • str_catage<year>.rep
  • sex_wtage<year>.rep
  • wtage<year>.rep
  • str_wtage<year>.rep
  • LF_<year>.rep
  • alk_<year>.rep
  • wl_<year>.rep
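After a run, it can be useful to confirm that the listed files were written. The Python sketch below checks only the nine file name patterns listed above (the list is introduced with "including", so ADMB may write others); `missing_rep_files` is a hypothetical helper.

```python
from pathlib import Path
from typing import List

# File name patterns taken from the list above.
REP_PATTERNS = [
    "sex_catage{year}.rep", "catage{year}.rep", "str_catage{year}.rep",
    "sex_wtage{year}.rep", "wtage{year}.rep", "str_wtage{year}.rep",
    "LF_{year}.rep", "alk_{year}.rep", "wl_{year}.rep",
]


def missing_rep_files(results_dir: str, year: int) -> List[str]:
    """Return the expected .rep names that are not present in results_dir."""
    base = Path(results_dir)
    names = [pattern.format(year=year) for pattern in REP_PATTERNS]
    return [name for name in names if not (base / name).is_file()]


# Run from the same directory used for the ADMB run (for example example/).
print(missing_rep_files("results", 2020))
```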

5 R Refactor Workflow

Refactor entry point: run_sampler_refactor() in R/sampler_refactor.R.

5.1 Prerequisites

R packages:

  • data.table
  • arrow

5.3 Function Arguments

| Argument | Default | Description |
|---|---|---|
| control_file | required | Path to sam*.dat |
| bs_file | "bs_setup.dat" | Path to bootstrap file |
| out_dir | "results" | Output directory |
| out_prefix | "sampler" | Output file prefix |

5.4 R Outputs

  • results/<prefix>_tidy.parquet
  • results/<prefix>_bootstrap_<year>.parquet (if nbs >= 1)
  • results/<prefix>_refactor.log

5.5 Notes on Paths

sam*.dat usually stores agefile and lenfile as relative names (for example age2020.dat). Run from the directory that contains those files (for example example/) so that the relative paths resolve.

6 Core Equations

The sampler combines age and length samples from fishery and survey data using empirical (multinomial-style) proportions, and it summarizes uncertainty with bootstrap replicates.

6.1 Length-frequency proportions

For year \(y\), sex \(s\), and length bin \(l\):

\[ p_{y,s,l} = \frac{n_{y,s,l}}{\sum_{l'} n_{y,s,l'}} \]

where \(n_{y,s,l}\) is the sampled count at length.
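As a worked example of this formula for a single year and sex, the Python sketch below normalizes counts by length bin. The counts are made up, `length_proportions` is a hypothetical name, and returning zeros when no fish were measured is an assumption of the sketch.

```python
from typing import Dict


def length_proportions(counts: Dict[int, float]) -> Dict[int, float]:
    """Divide each length-bin count by the total over all bins."""
    total = sum(counts.values())
    if total == 0:
        # Assumption: an empty sample yields all-zero proportions.
        return {length: 0.0 for length in counts}
    return {length: n / total for length, n in counts.items()}


counts = {30: 2, 31: 5, 32: 3}  # n_{y,s,l} for one year and sex
p = length_proportions(counts)
print(p)
```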

6.2 Age-length key (ALK)

For age \(a\) and length bin \(l\):

\[ P(a \mid l, y, s) = \frac{n_{y,s,a,l}}{\sum_{a'} n_{y,s,a',l}} \]

This maps observed length composition to age composition.
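The key can be built directly from paired age and length observations. The following Python sketch does this for one year and sex; the observations are illustrative and `build_alk` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def build_alk(pairs: Iterable[Tuple[int, int]]) -> Dict[int, Dict[int, float]]:
    """Return P(age | length) from (age, length) observations."""
    counts = defaultdict(lambda: defaultdict(int))
    for age, length in pairs:
        counts[length][age] += 1
    alk = {}
    for length, by_age in counts.items():
        total = sum(by_age.values())  # always > 0 for an observed length
        alk[length] = {age: n / total for age, n in by_age.items()}
    return alk


pairs = [(2, 30), (2, 30), (3, 30), (3, 31), (4, 31), (4, 31), (4, 31)]
alk = build_alk(pairs)
print(alk)
```

Each row of the key (one length bin) sums to 1, so applying it to a length composition preserves the total.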

6.3 FTNIR-to-TMA age mapping (2024 onward)

Beginning in 2024, FTNIR ages are first mapped to traditional microscope ages (TMA) using an age-age key, analogous to an age-length key. In this step, FTNIR and TMA data are sex-aggregated.

For FTNIR age \(a_F\) and TMA age \(a_T\):

\[ P(a_T \mid a_F, y) = \frac{n_{y,a_F,a_T}}{\sum_{a_T'} n_{y,a_F,a_T'}} \]

where \(n_{y,a_F,a_T}\) are observed paired readings used to build the key.

In practice, this step converts FTNIR age compositions to TMA-equivalent age compositions first; then the standard sampler workflow proceeds (ALK application, catch-at-age, and roll-ups) using those mapped ages.
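One way to picture this conversion is the Python sketch below, which builds the age-age key from paired readings and applies it to an FTNIR age composition as a weighted sum. Applying the key to a composition this way is an assumption made for illustration and may differ from how the repository scripts apply it; the function names and data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def build_age_age_key(paired: Iterable[Tuple[int, int]]) -> Dict[int, Dict[int, float]]:
    """Return P(a_T | a_F) from paired (FTNIR age, TMA age) readings."""
    counts = defaultdict(lambda: defaultdict(int))
    for a_f, a_t in paired:
        counts[a_f][a_t] += 1
    return {
        a_f: {a_t: n / sum(row.values()) for a_t, n in row.items()}
        for a_f, row in counts.items()
    }


def map_composition(comp_f: Dict[int, float],
                    key: Dict[int, Dict[int, float]]) -> Dict[int, float]:
    """Spread each FTNIR age count over TMA ages using the key."""
    out = defaultdict(float)
    for a_f, n in comp_f.items():
        # FTNIR ages with no paired readings contribute nothing here.
        for a_t, prob in key.get(a_f, {}).items():
            out[a_t] += n * prob
    return dict(out)


key = build_age_age_key([(3, 3), (3, 3), (3, 4), (4, 4)])
tma = map_composition({3: 30.0, 4: 10.0}, key)
print(tma)
```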

Updated 2024 bootstrap comparisons (FTNIR vs. age-error adjusted) are shown in two figures:

[Figure: Weight-at-age bootstrap comparison by data source]

[Figure: Catch-at-age bootstrap comparison by data source]

6.4 Conditional ALK fallback for sparse cells

The ADMB implementation uses a fallback hierarchy when age data are missing within a given stratum/sex/length cell.

Let \(n_{y,k,s,l,a}\) be age counts for year \(y\), stratum \(k\), sex \(s\), length bin \(l\), age \(a\). Define three normalized ALKs at each length:

\[ A^{(G)}_{y,l,a} = \begin{cases} \dfrac{n_{y,\cdot,\cdot,l,a}}{\sum_{a'} n_{y,\cdot,\cdot,l,a'}}, & \sum_{a'} n_{y,\cdot,\cdot,l,a'} > 0 \\ 0, & \text{otherwise} \end{cases} \]

\[ A^{(S)}_{y,s,l,a} = \begin{cases} \dfrac{n_{y,\cdot,s,l,a}}{\sum_{a'} n_{y,\cdot,s,l,a'}}, & \sum_{a'} n_{y,\cdot,s,l,a'} > 0 \\ A^{(G)}_{y,l,a}, & \text{otherwise} \end{cases} \]

\[ A_{y,k,s,l,a} = \begin{cases} \dfrac{n_{y,k,s,l,a}}{\sum_{a'} n_{y,k,s,l,a'}}, & \sum_{a'} n_{y,k,s,l,a'} > 0 \\ A^{(S)}_{y,s,l,a}, & \text{otherwise} \end{cases} \]

So missing cells are filled as: stratum+sex+length -> sex+length -> global length.

If a length bin has no age observations at any stratum or sex, then \(A^{(G)}_{y,l,a}=0\) for all \(a\), and downstream fallbacks remain zero at that length.
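The hierarchy can be sketched in a few lines of Python for a single year. This is an illustration of the three equations above, not a translation of sam.tpl; `alk_row` and the counts are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Counts = Dict[Tuple[int, int, int, int], float]  # (stratum, sex, length, age) -> n


def normalize(vec: List[float]) -> Optional[List[float]]:
    """Return proportions, or None when the vector sums to zero."""
    total = sum(vec)
    return [v / total for v in vec] if total > 0 else None


def alk_row(counts: Counts, k: int, s: int, l: int, ages: List[int]) -> List[float]:
    """ALK row for (k, s, l): stratum+sex cell, then sex pool, then global pool."""
    cell = [counts.get((k, s, l, a), 0) for a in ages]
    sex_pool = [
        sum(n for (kk, ss, ll, aa), n in counts.items()
            if ss == s and ll == l and aa == a)
        for a in ages
    ]
    global_pool = [
        sum(n for (kk, ss, ll, aa), n in counts.items() if ll == l and aa == a)
        for a in ages
    ]
    for candidate in (cell, sex_pool, global_pool):
        row = normalize(candidate)
        if row is not None:
            return row
    return [0.0] * len(ages)  # no age data at this length anywhere


counts = {(1, 1, 30, 2): 3, (1, 1, 30, 3): 1, (2, 2, 31, 3): 2, (1, 2, 31, 4): 2}
ages = [2, 3, 4]
print(alk_row(counts, 1, 1, 30, ages))  # uses the stratum+sex cell
print(alk_row(counts, 2, 1, 30, ages))  # falls back to the sex pool
print(alk_row(counts, 1, 1, 31, ages))  # falls back to the global pool
print(alk_row(counts, 1, 1, 32, ages))  # no data at this length
```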

6.5 Catch-at-length and catch-at-age

Given total catch in numbers \(C_{y,s}\):

\[ C_{y,s,l} = C_{y,s} \cdot p_{y,s,l} \]

and catch-at-age is obtained by marginalizing over length:

\[ C_{y,s,a} = \sum_l C_{y,s,l} \cdot P(a \mid l, y, s) \]
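A worked Python sketch of these two equations for one year and sex follows. The total catch, length proportions, and key values are made up, and `catch_at_age` is a hypothetical name.

```python
from typing import Dict


def catch_at_age(total_catch: float,
                 p_len: Dict[int, float],
                 alk: Dict[int, Dict[int, float]]) -> Dict[int, float]:
    """Split total catch by length, then sum over lengths within each age."""
    out: Dict[int, float] = {}
    for length, p in p_len.items():
        c_len = total_catch * p  # catch-at-length C_{y,s,l}
        for age, prob in alk.get(length, {}).items():
            out[age] = out.get(age, 0.0) + c_len * prob
    return out


p_len = {30: 0.4, 31: 0.6}
alk = {30: {2: 0.5, 3: 0.5}, 31: {3: 0.25, 4: 0.75}}
caa = catch_at_age(1000.0, p_len, alk)
print(caa)
```

Because each key row sums to 1 and the length proportions sum to 1, the catch-at-age values add back up to the total catch.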

6.6 Mean weight-at-age and mean weight-at-length

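For sampled weights \(w_i\), where \(N_{y,s,a}\) and \(N_{y,s,l}\) in this subsection denote the number of weighed specimens in each age or length cell: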

\[ \bar{w}_{y,s,a} = \frac{1}{N_{y,s,a}} \sum_{i=1}^{N_{y,s,a}} w_i \]

\[ \bar{w}_{y,s,l} = \frac{1}{N_{y,s,l}} \sum_{i=1}^{N_{y,s,l}} w_i \]

6.7 Strata roll-up in output files

In results/Est_<year>.dat, each row holds six fields in the order year, type, stratum, sex, age, value, where:

  • stratum = 1..H for explicit strata (H is nstrata from the control file),
  • stratum = 99 for all-strata aggregate,
  • sex = 99 for both-sex aggregate.

For stratum \(k\), sex \(s\), and age \(a\), numbers-at-age are:

\[ N_{y,k,s,a} = \sum_l \mathrm{LF}_{y,k,s,l}\, \mathrm{ALK}_{y,k,s,l,a} \]

where \(\mathrm{LF}\) is the stratum-specific length frequency and \(\mathrm{ALK}\) is the stratum-specific age-length key.

Define stratum-sex-age biomass-at-age as:

\[ B_{y,k,s,a} = \sum_l \left(\mathrm{LF}_{y,k,s,l}\,\bar{w}_{y,k,s,l}\right)\, \mathrm{ALK}_{y,k,s,l,a} \]

and mean weight-at-age as:

\[ \bar{w}_{y,k,s,a} = \frac{B_{y,k,s,a}}{N_{y,k,s,a}} \]
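For a single stratum and sex, the three quantities above can be computed as in this Python sketch. Inputs are illustrative, `numbers_and_weight_at_age` is a hypothetical name, and returning NaN when no fish are assigned to an age is an assumption of the sketch.

```python
from typing import Dict, List, Tuple


def numbers_and_weight_at_age(
    lf: Dict[int, float],
    wbar_len: Dict[int, float],
    alk: Dict[int, Dict[int, float]],
    ages: List[int],
) -> Dict[int, Tuple[float, float]]:
    """Return {age: (N, mean weight)} with mean weight = B / N."""
    result = {}
    for a in ages:
        n = sum(lf[l] * alk.get(l, {}).get(a, 0.0) for l in lf)
        b = sum(lf[l] * wbar_len[l] * alk.get(l, {}).get(a, 0.0) for l in lf)
        result[a] = (n, b / n if n > 0 else float("nan"))
    return result


lf = {30: 40.0, 31: 60.0}        # length frequency in numbers
wbar_len = {30: 0.30, 31: 0.35}  # mean weight-at-length
alk = {30: {2: 0.5, 3: 0.5}, 31: {3: 0.25, 4: 0.75}}
res = numbers_and_weight_at_age(lf, wbar_len, alk, [2, 3, 4])
print(res)
```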

Roll-up equations used for stratum=99 and/or sex=99 are:

\[ N_{y,k,99,a} = \sum_s N_{y,k,s,a}, \qquad N_{y,99,s,a} = \sum_k N_{y,k,s,a}, \qquad N_{y,99,99,a} = \sum_k \sum_s N_{y,k,s,a} \]

\[ \bar{w}_{y,99,99,a} = \frac{\sum_k \sum_s N_{y,k,s,a}\,\bar{w}_{y,k,s,a}} {\sum_k \sum_s N_{y,k,s,a}} \]

So the annual/combined output is a numbers-weighted aggregation across strata and sexes, not an unweighted average of the stratum-level means.
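The difference between the weighted roll-up and a simple average is easy to see numerically. The Python sketch below applies the combined (99, 99) equations to made-up values for one year and age; `roll_up` is a hypothetical name.

```python
from typing import Dict, Tuple


def roll_up(cells: Dict[Tuple[int, int], Tuple[float, float]]) -> Tuple[float, float]:
    """cells maps (stratum, sex) -> (N, mean weight); returns combined (N, mean weight)."""
    n_total = sum(n for n, _ in cells.values())
    if n_total == 0:
        return 0.0, float("nan")
    biomass = sum(n * w for n, w in cells.values())
    return n_total, biomass / n_total


cells = {
    (1, 1): (100.0, 0.5),
    (1, 2): (300.0, 0.7),
    (2, 1): (50.0, 0.4),
    (2, 2): (50.0, 0.6),
}
n99, w99 = roll_up(cells)
unweighted = sum(w for _, w in cells.values()) / len(cells)
print(n99, w99, unweighted)
```

Here the cell with the most fish pulls the combined mean weight above the unweighted average of the four cell means.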

6.8 Bootstrap summary statistics

From bootstrap replicates \(\theta^{(b)}\), \(b = 1,\dots,B\):

\[ \hat{\mu}_\theta = \frac{1}{B} \sum_{b=1}^{B} \theta^{(b)} \]

\[ \widehat{\mathrm{sd}}(\theta) = \sqrt{\frac{1}{B-1} \sum_{b=1}^{B} \left(\theta^{(b)} - \hat{\mu}_\theta\right)^2} \]

\[ \mathrm{CV}(\theta) = \frac{\widehat{\mathrm{sd}}(\theta)}{\hat{\mu}_\theta} \]
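These three statistics map directly onto Python's standard library, as in the sketch below. The replicate values are illustrative, `bootstrap_summary` is a hypothetical name, and returning NaN for the CV when the mean is zero is an assumption of the sketch.

```python
import statistics
from typing import List, Tuple


def bootstrap_summary(reps: List[float]) -> Tuple[float, float, float]:
    """Return (mean, sample sd with B-1 denominator, CV) of bootstrap replicates."""
    mu = statistics.fmean(reps)
    sd = statistics.stdev(reps)  # requires at least two replicates
    cv = sd / mu if mu != 0 else float("nan")
    return mu, sd, cv


reps = [10.0, 12.0, 11.0, 9.0, 13.0]
mu, sd, cv = bootstrap_summary(reps)
print(mu, sd, cv)
```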

7 cases/ebswpSAM Focus

This directory contains the Eastern Bering Sea walleye pollock workflow, with two practical data layouts:

  1. Archived fishery inputs in cases/ebswpSAM/data/fishery/ using samYYYY.dat, ageYYYY.dat, lenYYYY.dat.
  2. Generated working inputs expected by some scripts/Makefile in cases/ebswpSAM/data/ using sam_YYYY.dat, ageYYYY.dat, lenYYYY.dat.

7.2 Alternate: generated-data run (cases/ebswpSAM/Makefile)

cases/ebswpSAM/Makefile runs:

./sam -ind data/sam<year>.dat

Example:

cd cases/ebswpSAM
make yr=2024

Use this when data/sam<year>.dat exists (no underscore naming).

7.3 Consolidated driver (year ranges)

Use the single fishery driver to run ADMB over data/fishery/samYYYY.dat and write a rolled-up summary CSV:

cd cases/ebswpSAM
Rscript EBS_sampler_fishery.R start_year=1991 end_year=2024 nbs=1 levels=1,1,1,1

Common options:

  • start_year, end_year: year range to run/read.
  • run_sampler=true|false: run ADMB, or only summarize existing results/Est_<year>.dat files.
  • nbs: first line for bs_setup.dat (for example 1 for point estimates, 1000 for bootstrap mode).
  • levels: four comma-separated sampling controls for bs_setup.dat (for example 1,1,1,1).
  • io=true|false: when true, pass -nox -io to sam.
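To make the relationship between nbs, levels, and bs_setup.dat concrete, the Python sketch below renders the file content from those two options. It assumes the four levels are written in the same order as lines 2 to 5 of bs_setup.dat and that all values are integers; `render_bs_setup` is a hypothetical name and is not the driver's own write_bs_setup().

```python
def render_bs_setup(nbs: int, levels: str) -> str:
    """Return bs_setup.dat text: nbs first, then the four levels, one per line."""
    parts = [int(x) for x in levels.split(",")]
    if len(parts) != 4:
        raise ValueError("levels must contain exactly four comma-separated values")
    return "\n".join(str(v) for v in [nbs] + parts) + "\n"


text = render_bs_setup(1, "1,1,1,1")
print(text, end="")
```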

Summary-only example (no ADMB rerun):

cd cases/ebswpSAM
Rscript EBS_sampler_fishery.R start_year=1991 end_year=2024 run_sampler=false

7.4 Key ebswpSAM scripts

  • cases/ebswpSAM/EBS_sampler_fishery.R: canonical fishery driver for year ranges in data/fishery (ADMB run + summary).
  • cases/ebswpSAM/EBS_sampler.R: compatibility wrapper to run the consolidated driver with default full range.
  • cases/ebswpSAM/EBS_sampler2021.R, cases/ebswpSAM/EBS_sampler2022.R, cases/ebswpSAM/EBS_sampler2023.R: compatibility wrappers pinned to historical end years.
  • cases/ebswpSAM/sampler.R: legacy helper definitions (Sampler(), SetBS()) used by earlier workflows.
  • cases/ebswpSAM/ebs_survey.R: survey-oriented conversion and sampler runs.
  • cases/ebswpSAM/imported/poll_age.sql and cases/ebswpSAM/imported/poll_len.sql: source SQL templates.

7.5 Known path caveats

  • Several legacy scripts in this case directory contain machine-specific setwd()/source() paths and may need local edits before execution.
  • Some scripts reference sampleR/R/sampler_EBSwp.R; verify that file exists in your local clone or update the source() target to the intended sampler helper.

8 Query-Based Data Preparation

These query scripts live in R/:

  • R/CallQueriesForSamplerData.R
  • R/QueryAgesForSampler.R
  • R/QueryLengthsForSampler.R
  • R/QueryCatchByStrata.R

These use AFSC/AKFIN ODBC connections and keyring credentials to generate age*.dat, len*.dat, and sam*.dat files.

9 Appendix: Roxygen Function Documentation

Source: cases/ebswpSAM/EBS_sampler_fishery.R

The table below is a concise rendering of the roxygen function documentation in the consolidated driver.

| Function | Purpose | Key Parameters | Return |
|---|---|---|---|
| get_script_dir() | Resolve script directory for CLI or sourced execution. | none | Absolute directory path (character). |
| as_bool(x, default = FALSE) | Parse truthy/falsey CLI text to logical. | x, default | Logical scalar. |
| parse_cli_args(args) | Parse key=value CLI tokens into a named list. | args | Named list of string values. |
| discover_control_years(fishery_dir) | Detect available years from samYYYY.dat controls. | fishery_dir | Sorted integer year vector. |
| write_bs_setup(fishery_dir, nbs, levels) | Write bs_setup.dat controls for ADMB. | fishery_dir, nbs, levels (length 4) | Path to written file. |
| run_sampler_year(year, fishery_dir, sam_bin, io = FALSE, verbose = TRUE) | Run ADMB sam for one control year. | year, fishery_dir, sam_bin, io, verbose | Invisibly TRUE on success. |
| read_estimates(year, fishery_dir) | Read one results/Est_<year>.dat file. | year, fishery_dir | data.table or NULL if missing. |
| summarize_estimates(est_dt) | Roll up combined (stratum=99) annual numbers and mean age. | est_dt | Yearly summary data.table. |
| run_ebswp_fishery(...) | Main consolidated workflow for year-range run + summary. | start_year, end_year, run_sampler, nbs, levels, io, fishery_dir, sam_bin, summary_csv, verbose | Invisible list: years, summary_csv, n_rows. |

10 Render This Documentation

From repository root:

quarto render index.qmd

This produces index.html in the repository root.

11 Push to GitHub

Typical update sequence:

git add index.qmd index.html
git commit -m "Update sampler documentation"
git push origin <branch>

If GitHub Pages is served from the repository root or from a gh-pages branch, commit index.html to the corresponding branch.

12 Troubleshooting

  • Package 'arrow' is required for Parquet output.
    • Install arrow in your R library before running the refactor.
  • quarto: command not found
    • Install Quarto and ensure it is on your PATH.
  • ADMB binary not found
    • Ensure ADMB is installed and that the compiled sam binary exists and is executable (see 4.1 Build).