Sampler Documentation
GitHub: afsc-assessments/sampler
1 Overview
This repository contains two operational paths for generating sampler outputs:
- ADMB executable workflow (legacy and still supported).
- R refactor workflow (R/sampler_refactor.R) that preserves the same input file formats and writes tidy Parquet outputs.
Both paths use the same core input files:
- sam*.dat: control files
- age*.dat: age sample files
- len*.dat: length sample files
- bs_setup.dat: bootstrap settings
2 Repository Map
- src/: ADMB template (sam.tpl) and Makefile for compiling sam.
- admb/: alternate ADMB build area.
- R/: R scripts, including the refactor entry point.
- example/: small synthetic inputs for smoke testing.
- cases/: historical species/assessment-specific data and scripts.
- index.qmd: this documentation source.
3 Input File Formats
3.1 Control File (sam*.dat)
Token order expected by both ADMB (src/sam.tpl) and R (R/sampler_refactor.R):
| Position | Name | Description |
|---|---|---|
| 1 | year | Model year label |
| 2 | agefile | Path to age file |
| 3 | lenfile | Path to length file |
| 4 | na_rcrds | Number of age records |
| 5 | nl_rcrds | Number of length records |
| 6 | a1 | Min age |
| 7 | a2 | Max age |
| 8 | l1 | Min length bin |
| 9 | l2 | Max length bin |
| 10 | nstrata | Number of strata |
| 11..(10+nstrata) | strata_catch | Catch by stratum |
| last | outfile | Main output file label |
Example (example/sam2020.dat):
2020 age2020.dat len2020.dat 6 6 2 5 30 38 1 100 results/Est_2020.dat
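To make the token order concrete, here is a minimal Python sketch that reads the example control line into named fields. It is an illustration only, not part of the repository: the function name `parse_control` is hypothetical, and it assumes the tokens are whitespace-separated with no comment lines.

```python
def parse_control(text):
    """Split a control file into named fields, following the documented token order."""
    tok = text.split()
    if len(tok) < 10:
        raise ValueError("control file is missing tokens before strata_catch")
    nstrata = int(tok[9])
    expected = 10 + nstrata + 1  # fixed tokens + strata_catch values + outfile
    if len(tok) < expected:
        raise ValueError(f"expected at least {expected} tokens, found {len(tok)}")
    return {
        "year": int(tok[0]),
        "agefile": tok[1],
        "lenfile": tok[2],
        "na_rcrds": int(tok[3]),
        "nl_rcrds": int(tok[4]),
        "a1": int(tok[5]),
        "a2": int(tok[6]),
        "l1": int(tok[7]),
        "l2": int(tok[8]),
        "nstrata": nstrata,
        "strata_catch": [float(t) for t in tok[10:10 + nstrata]],
        "outfile": tok[10 + nstrata],
    }


# The example/sam2020.dat content shown above
ctl = parse_control(
    "2020 age2020.dat len2020.dat 6 6 2 5 30 38 1 100 results/Est_2020.dat"
)
```

With one stratum, the single catch value (100) sits at position 11 and the output label follows immediately after it.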
3.2 Age File (age*.dat)
Six columns:
strata haul sex age wt len
3.3 Length File (len*.dat)
Five columns:
strata haul sex len freq
3.4 Bootstrap Setup (bs_setup.dat)
Expected numeric values, one per line:
- nbs (number of bootstrap iterations; 0/1 means no bootstrap expansion in many use cases)
- sam_level_age_tows
- sam_level_ages
- sam_level_lf_tows
- sam_level_lfreqs
Example (example/bs_setup.dat):
5
2
2
2
2
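As a quick check of the layout, the following Python sketch pairs the five values with the names listed above. The helper name `parse_bs_setup` is hypothetical, and the sketch assumes all five values are integers, as in the example.

```python
BS_FIELDS = [
    "nbs",
    "sam_level_age_tows",
    "sam_level_ages",
    "sam_level_lf_tows",
    "sam_level_lfreqs",
]


def parse_bs_setup(text):
    """Map the five numeric values (one per line) to their documented names."""
    values = [int(tok) for tok in text.split()]
    if len(values) < len(BS_FIELDS):
        raise ValueError(f"expected {len(BS_FIELDS)} values, found {len(values)}")
    return dict(zip(BS_FIELDS, values))


# The example/bs_setup.dat content shown above
bs = parse_bs_setup("5\n2\n2\n2\n2\n")
```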
4 ADMB Workflow
4.1 Build
From repository root:
cd src
make clean
make
This compiles sam from sam.tpl.
4.2 Run
Run from a directory containing sam*.dat, age*.dat, len*.dat, and bs_setup.dat (for example example/):
cd example
../src/sam -ind sam2020.dat
4.3 ADMB Runtime Options
Options parsed in sam.tpl:
- -io: enable additional diagnostic printing.
- -sim: simulation mode flag.
Example:
../src/sam -nox -io -ind sam2020.dat
4.4 ADMB Outputs
ADMB writes multiple .rep files under results/ (relative to the run directory), including:
- sex_catage<year>.rep
- catage<year>.rep
- str_catage<year>.rep
- sex_wtage<year>.rep
- wtage<year>.rep
- str_wtage<year>.rep
- LF_<year>.rep
- alk_<year>.rep
- wl_<year>.rep
5 R Refactor Workflow
Refactor entry point: run_sampler_refactor() in R/sampler_refactor.R.
5.1 Prerequisites
R packages:
- data.table
- arrow
5.2 Recommended Run Commands
From repository root:
make example
Equivalent direct call:
cd example
R -q -e "source('../R/sampler_refactor.R'); run_sampler_refactor('sam2020.dat','bs_setup.dat','../results')"
5.3 Function Arguments
| Argument | Default | Description |
|---|---|---|
| control_file | required | Path to sam*.dat |
| bs_file | "bs_setup.dat" | Path to bootstrap file |
| out_dir | "results" | Output directory |
| out_prefix | "sampler" | Output file prefix |
5.4 R Outputs
- results/<prefix>_tidy.parquet
- results/<prefix>_bootstrap_<year>.parquet (if nbs >= 1)
- results/<prefix>_refactor.log
5.5 Notes on Paths
sam*.dat usually stores agefile and lenfile as relative names (for example age2020.dat). Use the same working directory as those files (for example example/) when running.
6 Core Equations
The model composes age and length information from fishery and survey samples with standard multinomial-style proportions and bootstrap summaries.
6.1 Length-frequency proportions
For year \(y\), sex \(s\), and length bin \(l\):
\[ p_{y,s,l} = \frac{n_{y,s,l}}{\sum_{l'} n_{y,s,l'}} \]
where \(n_{y,s,l}\) is the sampled count at length.
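The proportion formula can be sketched in a few lines of Python. This is an illustration, not the ADMB or R implementation; the function name `length_proportions` is hypothetical, and the input is assumed to be (len, freq) pairs already filtered to one year and sex.

```python
from collections import Counter


def length_proportions(len_freq_pairs):
    """Return p_l = n_l / sum(n_l') for one year and sex."""
    counts = Counter()
    for length_bin, freq in len_freq_pairs:
        counts[length_bin] += freq  # repeated bins (e.g. several hauls) accumulate
    total = sum(counts.values())
    if total == 0:
        return {}
    return {l: n / total for l, n in sorted(counts.items())}


# Two hauls report bin 30; their frequencies are pooled before normalizing.
p = length_proportions([(30, 2), (31, 6), (30, 2)])
```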
6.2 Age-length key (ALK)
For age \(a\) and length bin \(l\):
\[ P(a \mid l, y, s) = \frac{n_{y,s,a,l}}{\sum_{a'} n_{y,s,a',l}} \]
This maps observed length composition to age composition.
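A minimal Python sketch of building such a key from aged fish follows. The function name `build_alk` is hypothetical, and the input is assumed to be (age, len) pairs for a single year and sex; it omits the sparse-cell handling described later in this section.

```python
from collections import defaultdict


def build_alk(age_len_pairs):
    """Return {length: {age: P(age | length)}} from individual aged fish."""
    counts = defaultdict(lambda: defaultdict(int))
    for age, length in age_len_pairs:
        counts[length][age] += 1
    alk = {}
    for length, by_age in counts.items():
        total = sum(by_age.values())
        alk[length] = {age: n / total for age, n in by_age.items()}
    return alk


# Three fish in bin 30 (ages 2, 3, 3) and one fish in bin 31 (age 3)
alk = build_alk([(2, 30), (3, 30), (3, 30), (3, 31)])
```

Each row of the key (one length bin) sums to 1 whenever that bin has at least one aged fish.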
6.3 FTNIR-to-TMA age mapping (2024 onward)
Beginning in 2024, FTNIR ages are first mapped to traditional microscope ages (TMA) using an age-age key, analogous to an age-length key. In this step, FTNIR and TMA data are sex-aggregated.
For FTNIR age \(a_F\) and TMA age \(a_T\):
\[ P(a_T \mid a_F, y) = \frac{n_{y,a_F,a_T}}{\sum_{a_T'} n_{y,a_F,a_T'}} \]
where \(n_{y,a_F,a_T}\) are observed paired readings used to build the key.
In practice, this step converts FTNIR age compositions to TMA-equivalent age compositions first; then the standard sampler workflow proceeds (ALK application, catch-at-age, and roll-ups) using those mapped ages.
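The mapping step can be illustrated with a short Python sketch. Both function names are hypothetical, the data are made up, and the sketch assumes that FTNIR ages with no paired readings simply contribute nothing; how the actual workflow treats such ages is not stated here.

```python
from collections import defaultdict


def age_age_key(paired_readings):
    """Return {a_F: {a_T: P(a_T | a_F)}} from (a_F, a_T) paired readings."""
    counts = defaultdict(lambda: defaultdict(int))
    for a_f, a_t in paired_readings:
        counts[a_f][a_t] += 1
    return {
        a_f: {a_t: n / sum(row.values()) for a_t, n in row.items()}
        for a_f, row in counts.items()
    }


def map_ftnir_to_tma(ftnir_comp, key):
    """Redistribute FTNIR numbers-at-age onto TMA ages using the key."""
    tma = defaultdict(float)
    for a_f, n in ftnir_comp.items():
        # Assumption: FTNIR ages absent from the key are dropped.
        for a_t, prob in key.get(a_f, {}).items():
            tma[a_t] += n * prob
    return dict(tma)


key = age_age_key([(3, 3), (3, 4), (4, 4), (4, 4)])
tma_comp = map_ftnir_to_tma({3: 10, 4: 20}, key)
```

When every FTNIR age in the composition appears in the key, the total number of fish is unchanged by the mapping.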
Updated 2024 bootstrap comparisons (FTNIR vs age-error adjusted) are shown below.
6.4 Conditional ALK fallback for sparse cells
The ADMB implementation uses a fallback hierarchy when age data are missing within a given stratum/sex/length cell.
Let \(n_{y,k,s,l,a}\) be age counts for year \(y\), stratum \(k\), sex \(s\), length bin \(l\), age \(a\). Define three normalized ALKs at each length:
\[ A^{(G)}_{y,l,a} = \begin{cases} \dfrac{n_{y,\cdot,\cdot,l,a}}{\sum_{a'} n_{y,\cdot,\cdot,l,a'}}, & \sum_{a'} n_{y,\cdot,\cdot,l,a'} > 0 \\ 0, & \text{otherwise} \end{cases} \]
\[ A^{(S)}_{y,s,l,a} = \begin{cases} \dfrac{n_{y,\cdot,s,l,a}}{\sum_{a'} n_{y,\cdot,s,l,a'}}, & \sum_{a'} n_{y,\cdot,s,l,a'} > 0 \\ A^{(G)}_{y,l,a}, & \text{otherwise} \end{cases} \]
\[ A_{y,k,s,l,a} = \begin{cases} \dfrac{n_{y,k,s,l,a}}{\sum_{a'} n_{y,k,s,l,a'}}, & \sum_{a'} n_{y,k,s,l,a'} > 0 \\ A^{(S)}_{y,s,l,a}, & \text{otherwise} \end{cases} \]
So missing cells are filled as: stratum+sex+length -> sex+length -> global length.
If a length bin has no age observations at any stratum or sex, then \(A^{(G)}_{y,l,a}=0\) for all \(a\), and downstream fallbacks remain zero at that length.
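The hierarchy above can be expressed as a short Python sketch. It follows the three equations as written but is not a translation of sam.tpl; the function names are hypothetical, and an empty dictionary stands in for the all-zero row.

```python
from collections import defaultdict


def _normalize(by_age):
    """Return proportions, or None when the level has no observations."""
    total = sum(by_age.values())
    if total == 0:
        return None
    return {age: n / total for age, n in by_age.items()}


def alk_with_fallback(records, stratum, sex, length):
    """records: (stratum, sex, length, age) tuples for one year.

    Tries stratum+sex+length, then sex+length, then global length.
    """
    cell = defaultdict(int)
    sex_level = defaultdict(int)
    global_level = defaultdict(int)
    for k, s, l, a in records:
        if l != length:
            continue
        global_level[a] += 1
        if s == sex:
            sex_level[a] += 1
            if k == stratum:
                cell[a] += 1
    for level in (cell, sex_level, global_level):
        row = _normalize(level)
        if row is not None:
            return row
    return {}  # no ages at this length anywhere: all probabilities are zero


records = [(1, 1, 30, 2), (1, 1, 30, 3), (2, 2, 31, 4)]
own_cell = alk_with_fallback(records, 1, 1, 30)    # data in the cell itself
from_sex = alk_with_fallback(records, 2, 1, 30)    # falls back to sex+length
from_global = alk_with_fallback(records, 1, 1, 31)  # falls back to global length
empty = alk_with_fallback(records, 1, 1, 35)       # no ages at this length
```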
6.5 Catch-at-length and catch-at-age
Given total catch in numbers \(C_{y,s}\):
\[ C_{y,s,l} = C_{y,s} \cdot p_{y,s,l} \]
and catch-at-age is obtained by marginalizing over length:
\[ C_{y,s,a} = \sum_l C_{y,s,l} \cdot P(a \mid l, y, s) \]
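The two equations combine into a single pass over length bins, sketched below in Python for one year and sex. The function name `catch_at_age` and the numbers are illustrative assumptions; length bins without a key row contribute nothing in this sketch.

```python
def catch_at_age(total_catch, p_len, alk):
    """Apply C_l = C * p_l, then sum C_l * P(a | l) over length bins."""
    caa = {}
    for length, p in p_len.items():
        c_l = total_catch * p  # catch-at-length
        for age, prob in alk.get(length, {}).items():
            caa[age] = caa.get(age, 0.0) + c_l * prob
    return caa


p_len = {30: 0.4, 31: 0.6}
alk = {30: {2: 0.5, 3: 0.5}, 31: {3: 1.0}}
caa = catch_at_age(100.0, p_len, alk)
```

Here bin 30 splits its 40 fish evenly between ages 2 and 3, and bin 31 sends all 60 fish to age 3, so the total of 100 is preserved.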
6.6 Mean weight-at-age and mean weight-at-length
For sampled weights \(w_i\):
\[ \bar{w}_{y,s,a} = \frac{1}{N_{y,s,a}} \sum_{i=1}^{N_{y,s,a}} w_i \]
\[ \bar{w}_{y,s,l} = \frac{1}{N_{y,s,l}} \sum_{i=1}^{N_{y,s,l}} w_i \]
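Both means are plain group averages over individual fish, which a short Python sketch makes explicit. The function name `mean_weight` is hypothetical, the rows are made-up values in the age-file column order (strata haul sex age wt len), and all weights are assumed to be present.

```python
from collections import defaultdict

# Column positions in an age-file row: strata haul sex age wt len
SEX, AGE, WT, LEN = 2, 3, 4, 5


def mean_weight(rows, by):
    """Mean sampled weight per (sex, age) when by="age", else per (sex, length)."""
    col = AGE if by == "age" else LEN
    total = defaultdict(float)
    count = defaultdict(int)
    for row in rows:
        group = (row[SEX], row[col])
        total[group] += row[WT]
        count[group] += 1
    return {group: total[group] / count[group] for group in total}


rows = [
    (1, 1, 1, 3, 0.50, 30),
    (1, 2, 1, 3, 0.70, 32),
    (1, 2, 2, 4, 0.90, 35),
]
w_age = mean_weight(rows, "age")
w_len = mean_weight(rows, "len")
```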
6.7 Strata roll-up in output files
In results/Est_<year>.dat, rows are stored as year type stratum sex age value with:
- stratum = 1..H for explicit strata,
- stratum = 99 for all-strata aggregate,
- sex = 99 for both-sex aggregate.
For stratum \(k\), sex \(s\), and age \(a\), numbers-at-age are:
\[ N_{y,k,s,a} = \sum_l \mathrm{LF}_{y,k,s,l}\, \mathrm{ALK}_{y,k,s,l,a} \]
where \(\mathrm{LF}\) is the stratum-specific length frequency and \(\mathrm{ALK}\) is the stratum-specific age-length key.
Define stratum-sex-age biomass-at-age as:
\[ B_{y,k,s,a} = \sum_l \left(\mathrm{LF}_{y,k,s,l}\,\bar{w}_{y,k,s,l}\right)\, \mathrm{ALK}_{y,k,s,l,a} \]
and mean weight-at-age as:
\[ \bar{w}_{y,k,s,a} = \frac{B_{y,k,s,a}}{N_{y,k,s,a}} \]
Roll-up equations used for stratum=99 and/or sex=99 are:
\[ N_{y,k,99,a} = \sum_s N_{y,k,s,a}, \qquad N_{y,99,s,a} = \sum_k N_{y,k,s,a}, \qquad N_{y,99,99,a} = \sum_k \sum_s N_{y,k,s,a} \]
\[ \bar{w}_{y,99,99,a} = \frac{\sum_k \sum_s N_{y,k,s,a}\,\bar{w}_{y,k,s,a}} {\sum_k \sum_s N_{y,k,s,a}} \]
So the annual/combined output is a stratum-weighted aggregation, not an unweighted average across strata.
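A small numeric sketch in Python shows why the weighting matters. The function name `rollup_all` and the values are illustrative assumptions; inputs are dictionaries keyed by (stratum, sex, age).

```python
def rollup_all(N, wbar, age):
    """Return (N_{99,99,a}, wbar_{99,99,a}) for one age."""
    numbers = 0.0
    weighted = 0.0
    for (k, s, a), n in N.items():
        if a != age:
            continue
        numbers += n
        weighted += n * wbar[(k, s, a)]
    mean_w = weighted / numbers if numbers > 0 else float("nan")
    return numbers, mean_w


N = {(1, 1, 3): 10.0, (1, 2, 3): 30.0, (2, 1, 3): 60.0}
wbar = {(1, 1, 3): 0.5, (1, 2, 3): 0.6, (2, 1, 3): 0.7}
total_n, combined_w = rollup_all(N, wbar, age=3)
unweighted_w = sum(wbar.values()) / len(wbar)
```

In this example the combined mean weight is 0.65, pulled toward the cell with the most fish, while the unweighted average of the three cells is 0.6.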
6.8 Bootstrap summary statistics
From bootstrap replicates \(\theta^{(b)}\), \(b = 1,\dots,B\):
\[ \hat{\mu}_\theta = \frac{1}{B} \sum_{b=1}^{B} \theta^{(b)} \]
\[ \widehat{\mathrm{sd}}(\theta) = \sqrt{\frac{1}{B-1} \sum_{b=1}^{B} \left(\theta^{(b)} - \hat{\mu}_\theta\right)^2} \]
\[ \mathrm{CV}(\theta) = \frac{\widehat{\mathrm{sd}}(\theta)}{\hat{\mu}_\theta} \]
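These three statistics map directly onto Python's standard library, as the sketch below shows. The function name `bootstrap_summary` and the replicate values are illustrative; `statistics.stdev` uses the \(B-1\) denominator, matching the formula above, and requires at least two replicates.

```python
import statistics


def bootstrap_summary(replicates):
    """Mean, sample standard deviation (B-1 denominator), and CV."""
    mu = statistics.fmean(replicates)
    sd = statistics.stdev(replicates)
    return {"mean": mu, "sd": sd, "cv": sd / mu}


summary = bootstrap_summary([8.0, 10.0, 12.0])
```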
7 cases/ebswpSAM Focus
This directory contains the Eastern Bering Sea walleye pollock workflow, with two practical data layouts:
- Archived fishery inputs in cases/ebswpSAM/data/fishery/ using samYYYY.dat, ageYYYY.dat, lenYYYY.dat.
- Generated working inputs expected by some scripts/Makefile in cases/ebswpSAM/data/ using sam_YYYY.dat, ageYYYY.dat, lenYYYY.dat.
7.1 Recommended: archived fishery run (data/fishery)
Run from the directory that contains the samYYYY.dat control files so relative age/len filenames resolve correctly.
7.1.1 ADMB run for one year
cd cases/ebswpSAM/data/fishery
cat > bs_setup.dat <<'TXT'
1
1
1
1
1
TXT
../../../src/sam -ind sam2024.dat
7.1.2 R refactor run for one year
cd cases/ebswpSAM/data/fishery
cat > bs_setup.dat <<'TXT'
1000
2
2
2
2
TXT
R -q -e "source('../../../R/sampler_refactor.R'); run_sampler_refactor('sam2024.dat','bs_setup.dat','results','ebswp')"
7.2 Alternate: generated-data run (cases/ebswpSAM/Makefile)
cases/ebswpSAM/Makefile runs:
./sam -ind data/sam<year>.dat
Example:
cd cases/ebswpSAM
make yr=2024
Use this when data/sam<year>.dat exists (no underscore naming).
7.3 Consolidated driver (year ranges)
Use the single fishery driver to run ADMB over data/samYYYY.dat and write a rolled-up summary CSV:
cd cases/ebswpSAM
Rscript EBS_sampler_fishery.R start_year=1991 end_year=2024 nbs=1 levels=1,1,1,1
Common options:
- start_year, end_year: year range to run/read.
- run_sampler=true|false: run ADMB, or only summarize existing results/Est_<year>.dat files.
- nbs: first line for bs_setup.dat (for example 1 for point estimates, 1000 for bootstrap mode).
- levels: four comma-separated sampling controls for bs_setup.dat (for example 1,1,1,1).
- io=true|false: pass -nox -io to sam when needed.
Summary-only example (no ADMB rerun):
cd cases/ebswpSAM
Rscript EBS_sampler_fishery.R start_year=1991 end_year=2024 run_sampler=false
7.4 Key ebswpSAM scripts
- cases/ebswpSAM/EBS_sampler_fishery.R: canonical fishery driver for year ranges in data/fishery (ADMB run + summary).
- cases/ebswpSAM/EBS_sampler.R: compatibility wrapper to run the consolidated driver with default full range.
- cases/ebswpSAM/EBS_sampler2021.R, cases/ebswpSAM/EBS_sampler2022.R, cases/ebswpSAM/EBS_sampler2023.R: compatibility wrappers pinned to historical end years.
- cases/ebswpSAM/sampler.R: legacy helper definitions (Sampler(), SetBS()) used by earlier workflows.
- cases/ebswpSAM/ebs_survey.R: survey-oriented conversion and sampler runs.
- cases/ebswpSAM/imported/poll_age.sql and cases/ebswpSAM/imported/poll_len.sql: source SQL templates.
7.5 Known path caveats
- Several legacy scripts in this case directory contain machine-specific setwd()/source() paths and may need local edits before execution.
- Some scripts reference sampleR/R/sampler_EBSwp.R; verify that file exists in your local clone or update the source() target to the intended sampler helper.
8 Query-Based Data Preparation
These query scripts live in R/:
- R/CallQueriesForSamplerData.R
- R/QueryAgesForSampler.R
- R/QueryLengthsForSampler.R
- R/QueryCatchByStrata.R
These use AFSC/AKFIN ODBC connections and keyring credentials to generate age*.dat, len*.dat, and sam*.dat files.
9 Appendix: (Roxygen)
Source: cases/ebswpSAM/EBS_sampler_fishery.R
The table below is a concise rendering of the roxygen function documentation in the consolidated driver.
| Function | Purpose | Key Parameters | Return |
|---|---|---|---|
| get_script_dir() | Resolve script directory for CLI or sourced execution. | none | Absolute directory path (character). |
| as_bool(x, default = FALSE) | Parse truthy/falsey CLI text to logical. | x, default | Logical scalar. |
| parse_cli_args(args) | Parse key=value CLI tokens into a named list. | args | Named list of string values. |
| discover_control_years(fishery_dir) | Detect available years from samYYYY.dat controls. | fishery_dir | Sorted integer year vector. |
| write_bs_setup(fishery_dir, nbs, levels) | Write bs_setup.dat controls for ADMB. | fishery_dir, nbs, levels (length 4) | Path to written file. |
| run_sampler_year(year, fishery_dir, sam_bin, io = FALSE, verbose = TRUE) | Run ADMB sam for one control year. | year, fishery_dir, sam_bin, io, verbose | Invisibly TRUE on success. |
| read_estimates(year, fishery_dir) | Read one results/Est_<year>.dat file. | year, fishery_dir | data.table or NULL if missing. |
| summarize_estimates(est_dt) | Roll up combined (stratum=99) annual numbers and mean age. | est_dt | Yearly summary data.table. |
| run_ebswp_fishery(...) | Main consolidated workflow for year-range run + summary. | start_year, end_year, run_sampler, nbs, levels, io, fishery_dir, sam_bin, summary_csv, verbose | Invisible list: years, summary_csv, n_rows. |
10 Render This Documentation
From repository root:
quarto render index.qmd
This produces index.html in the repository root.
11 Push to GitHub
Typical update sequence:
git add index.qmd index.html
git commit -m "Update sampler documentation"
git push origin <branch>
If your repository uses GitHub Pages from root or gh-pages, commit index.html accordingly.
12 Troubleshooting
- Package 'arrow' is required for Parquet output.
  - Install arrow in your R library before running the refactor.
- quarto: command not found
  - Install Quarto and ensure it is on your PATH.
- ADMB binary not found
  - Ensure admb and/or compiled sam exists and is executable.