Sampler Documentation

Author

Jim Ianelli

GitHub: afsc-assessments/sampler

1 Overview

This repository contains two operational paths for generating sampler outputs:

ADMB executable workflow (legacy and still supported).
R refactor workflow (R/sampler_refactor.R) that preserves the same input file formats and writes tidy Parquet outputs.

Both paths use the same core input files:

sam*.dat control files
age*.dat age sample files
len*.dat length sample files
bs_setup.dat bootstrap settings

2 Repository Map

src/: ADMB template (sam.tpl) and Makefile for compiling sam.
admb/: alternate ADMB build area.
R/: R scripts, including the refactor entry point.
example/: small synthetic inputs for smoke testing.
cases/: historical species/assessment-specific data and scripts.
index.qmd: this documentation source.

3 Input File Formats

3.1 Control File (`sam*.dat`)

Token order expected by both ADMB (src/sam.tpl) and R (R/sampler_refactor.R):

Position	Name	Description
1	`year`	Model year label
2	`agefile`	Path to age file
3	`lenfile`	Path to length file
4	`na_rcrds`	Number of age records
5	`nl_rcrds`	Number of length records
6	`a1`	Min age
7	`a2`	Max age
8	`l1`	Min length bin
9	`l2`	Max length bin
10	`nstrata`	Number of strata
11..(10+`nstrata`)	`strata_catch`	Catch by stratum
last	`outfile`	Main output file label

Example (example/sam2020.dat):

2020 age2020.dat len2020.dat 6 6 2 5 30 38 1 100 results/Est_2020.dat
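The token order in the table above can be checked with a small parser. The following is a minimal Python sketch, not the parser used by sam.tpl or R/sampler_refactor.R; the function name `parse_control` is hypothetical, and it assumes the file contains only whitespace-separated tokens with no comment lines.

```python
def parse_control(text):
    """Split the text of a sam*.dat control file into named fields.

    Assumes whitespace-separated tokens and no comment lines.
    """
    tok = text.split()
    if len(tok) < 10:
        raise ValueError("control file is too short to contain nstrata")
    nstrata = int(tok[9])            # position 10 in the table (1-based)
    expected = 10 + nstrata + 1      # fixed fields + one catch per stratum + outfile
    if len(tok) != expected:
        raise ValueError(f"expected {expected} tokens, found {len(tok)}")
    names = ("na_rcrds", "nl_rcrds", "a1", "a2", "l1", "l2")
    ctl = {"year": int(tok[0]), "agefile": tok[1], "lenfile": tok[2]}
    ctl.update({name: int(t) for name, t in zip(names, tok[3:9])})
    ctl["nstrata"] = nstrata
    ctl["strata_catch"] = [float(t) for t in tok[10:10 + nstrata]]
    ctl["outfile"] = tok[-1]
    return ctl


# The example line from example/sam2020.dat shown above.
ctl = parse_control(
    "2020 age2020.dat len2020.dat 6 6 2 5 30 38 1 100 results/Est_2020.dat"
)
print(ctl["nstrata"], ctl["strata_catch"], ctl["outfile"])
```

Checking the token count against `nstrata` catches the most common editing mistake, which is changing the number of strata without adding or removing the matching catch values.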

3.2 Age File (`age*.dat`)

Six columns:

strata haul sex age wt len

3.3 Length File (`len*.dat`)

Five columns:

strata haul sex len freq
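Both record layouts (six-column age files and five-column length files) can be read the same way. The sketch below is illustrative only: `read_records` is a hypothetical helper, the sample rows are invented, and it assumes every field is numeric with no header row, which should be verified against your own files.

```python
AGE_COLS = ("strata", "haul", "sex", "age", "wt", "len")
LEN_COLS = ("strata", "haul", "sex", "len", "freq")


def read_records(text, columns):
    """Parse whitespace-separated rows into dicts keyed by column name.

    Assumes all fields are numeric and that there is no header row.
    """
    rows = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(columns):
            raise ValueError(f"line {lineno}: expected {len(columns)} fields")
        rows.append(dict(zip(columns, map(float, fields))))
    return rows


# Hypothetical rows for illustration only (not taken from the example files).
age_rows = read_records("1 101 1 3 0.45 32\n1 101 2 4 0.61 36\n", AGE_COLS)
len_rows = read_records("1 101 1 32 5\n", LEN_COLS)
print(len(age_rows), age_rows[0]["age"], len_rows[0]["freq"])
```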

3.4 Bootstrap Setup (`bs_setup.dat`)

Expected numeric values, one per line:

nbs (number of bootstrap iterations; 0 or 1 gives point estimates with no bootstrap resampling in most use cases)
sam_level_age_tows
sam_level_ages
sam_level_lf_tows
sam_level_lfreqs
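The five values listed above can be read into named settings as follows. This is a sketch under assumptions: `parse_bs_setup` is a hypothetical name, the values are assumed to be integers, and the sample input reuses the bootstrap settings shown in the ebswpSAM section of this document rather than the contents of example/bs_setup.dat.

```python
BS_FIELDS = (
    "nbs",
    "sam_level_age_tows",
    "sam_level_ages",
    "sam_level_lf_tows",
    "sam_level_lfreqs",
)


def parse_bs_setup(text):
    """Map the first five numeric values in bs_setup.dat to their names.

    Assumes integer values, one per line (any whitespace separation works).
    """
    values = text.split()
    if len(values) < len(BS_FIELDS):
        raise ValueError(f"expected {len(BS_FIELDS)} values, found {len(values)}")
    return {name: int(v) for name, v in zip(BS_FIELDS, values)}


bs = parse_bs_setup("1000\n2\n2\n2\n2\n")
print(bs["nbs"], bs["sam_level_ages"])
```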

Example (example/bs_setup.dat):

4 ADMB Workflow

4.1 Build

From repository root:

cd src
make clean
make

This compiles sam from sam.tpl.

4.2 Run

Run from a directory containing sam*.dat, age*.dat, len*.dat, and bs_setup.dat (for example example/):

cd example
../src/sam -ind sam2020.dat

4.3 ADMB Runtime Options

Options parsed in sam.tpl:

-io: enable additional diagnostic printing.
-sim: simulation mode flag.

Example (-nox is a standard ADMB option that reduces screen output):

../src/sam -nox -io -ind sam2020.dat

4.4 ADMB Outputs

ADMB writes multiple .rep files under results/ (relative to the run directory), including:

sex_catage<year>.rep
catage<year>.rep
str_catage<year>.rep
sex_wtage<year>.rep
wtage<year>.rep
str_wtage<year>.rep
LF_<year>.rep
alk_<year>.rep
wl_<year>.rep

5 R Refactor Workflow

Refactor entry point: run_sampler_refactor() in R/sampler_refactor.R.

5.1 Prerequisites

R packages:

data.table
arrow

5.2 Recommended Run Commands

From repository root:

make example

Equivalent direct call:

cd example
R -q -e "source('../R/sampler_refactor.R'); run_sampler_refactor('sam2020.dat','bs_setup.dat','../results')"

5.3 Function Arguments

Argument	Default	Description
`control_file`	required	Path to `sam*.dat`
`bs_file`	`"bs_setup.dat"`	Path to bootstrap file
`out_dir`	`"results"`	Output directory
`out_prefix`	`"sampler"`	Output file prefix

5.4 R Outputs

results/<prefix>_tidy.parquet
results/<prefix>_bootstrap_<year>.parquet (if nbs >= 1)
results/<prefix>_refactor.log

5.5 Notes on Paths

sam*.dat usually stores agefile and lenfile as relative names (for example age2020.dat). Use the same working directory as those files (for example example/) when running.

6 Core Equations

The sampler combines age and length samples from fishery and survey data into compositions using simple sample proportions, then summarizes their uncertainty with bootstrap statistics.

6.1 Length-frequency proportions

For year \(y\), sex \(s\), and length bin \(l\):

\[ p_{y,s,l} = \frac{n_{y,s,l}}{\sum_{l'} n_{y,s,l'}} \]

where \(n_{y,s,l}\) is the sampled count at length.
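For a single year, the proportion formula above amounts to dividing each count by the total for its sex. A minimal Python sketch follows; the function name and the sample counts are hypothetical, and this is not the code path used by either workflow.

```python
from collections import defaultdict


def length_proportions(records):
    """Return {(sex, length_bin): proportion} for one year.

    records: iterable of (sex, length_bin, freq) tuples.
    """
    counts = defaultdict(float)
    totals = defaultdict(float)
    for sex, length_bin, freq in records:
        counts[(sex, length_bin)] += freq
        totals[sex] += freq
    # Divide each count by the total over all length bins for the same sex.
    return {key: n / totals[key[0]] for key, n in counts.items() if totals[key[0]] > 0}


# Hypothetical counts: sex 1 has 2 fish at 30 cm and 6 at 31 cm; sex 2 has 4 at 30 cm.
p = length_proportions([(1, 30, 2), (1, 31, 6), (2, 30, 4)])
print(p[(1, 30)], p[(1, 31)], p[(2, 30)])
```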

6.2 Age-length key (ALK)

For age \(a\) and length bin \(l\):

\[ P(a \mid l, y, s) = \frac{n_{y,s,a,l}}{\sum_{a'} n_{y,s,a',l}} \]

This maps observed length composition to age composition.
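As an illustration of the key above, the sketch below builds \(P(a \mid l)\) from individual (length, age) observations for one year and sex. The function name and data are hypothetical, and the sketch ignores strata.

```python
from collections import Counter, defaultdict


def age_length_key(pairs):
    """Return {length_bin: {age: P(age | length_bin)}} from (length_bin, age) pairs."""
    counts = defaultdict(Counter)
    for length_bin, age in pairs:
        counts[length_bin][age] += 1
    alk = {}
    for length_bin, by_age in counts.items():
        total = sum(by_age.values())  # all aged fish in this length bin
        alk[length_bin] = {age: n / total for age, n in by_age.items()}
    return alk


# Hypothetical aged fish: three at 30 cm (ages 2, 2, 3) and one at 31 cm (age 3).
alk = age_length_key([(30, 2), (30, 2), (30, 3), (31, 3)])
print(alk[30], alk[31])
```

Each row of the key sums to one, so applying it to a length composition preserves the total.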

6.3 FTNIR-to-TMA age mapping (2024 onward)

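Beginning in 2024, ages derived from Fourier transform near-infrared spectroscopy (FTNIR) are first mapped to traditional microscope ages (TMA) using an age-age key, which is built and applied in the same way as an age-length key. In this step, FTNIR and TMA data are aggregated across sexes.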

For FTNIR age \(a_F\) and TMA age \(a_T\):

\[ P(a_T \mid a_F, y) = \frac{n_{y,a_F,a_T}}{\sum_{a_T'} n_{y,a_F,a_T'}} \]

where \(n_{y,a_F,a_T}\) are observed paired readings used to build the key.

In practice, this step converts FTNIR age compositions to TMA-equivalent age compositions first; then the standard sampler workflow proceeds (ALK application, catch-at-age, and roll-ups) using those mapped ages.
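The mapping step can be sketched in two parts: build the age-age key from paired readings, then redistribute an FTNIR age composition across TMA ages. The function names and the numbers below are hypothetical, and the handling of FTNIR ages that have no paired readings (dropped here) is an assumption rather than documented behavior.

```python
from collections import Counter, defaultdict


def age_age_key(paired):
    """Return {ftnir_age: {tma_age: P(tma_age | ftnir_age)}} from paired readings."""
    counts = defaultdict(Counter)
    for a_f, a_t in paired:
        counts[a_f][a_t] += 1
    return {
        a_f: {a_t: n / sum(c.values()) for a_t, n in c.items()}
        for a_f, c in counts.items()
    }


def map_to_tma(ftnir_comp, key):
    """Redistribute an FTNIR age composition over TMA ages using the key.

    FTNIR ages absent from the key are dropped (an assumption of this sketch).
    """
    tma = defaultdict(float)
    for a_f, amount in ftnir_comp.items():
        for a_t, prob in key.get(a_f, {}).items():
            tma[a_t] += amount * prob
    return dict(tma)


# Hypothetical paired readings: FTNIR age 3 is read as TMA 3 or 4 equally often.
key = age_age_key([(3, 3), (3, 3), (3, 4), (3, 4), (4, 4)])
tma_comp = map_to_tma({3: 10.0, 4: 6.0}, key)
print(tma_comp)
```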

Updated 2024 bootstrap comparisons (FTNIR vs. age-error adjusted) are shown in the following figures.

[Figure: Weight-at-age bootstrap comparison by data source]

[Figure: Catch-at-age bootstrap comparison by data source]

6.4 Conditional ALK fallback for sparse cells

The ADMB implementation uses a fallback hierarchy when age data are missing within a given stratum/sex/length cell.

Let \(n_{y,k,s,l,a}\) be age counts for year \(y\), stratum \(k\), sex \(s\), length bin \(l\), age \(a\). Define three normalized ALKs at each length:

\[ A^{(G)}_{y,l,a} = \begin{cases} \dfrac{n_{y,\cdot,\cdot,l,a}}{\sum_{a'} n_{y,\cdot,\cdot,l,a'}}, & \sum_{a'} n_{y,\cdot,\cdot,l,a'} > 0 \\ 0, & \text{otherwise} \end{cases} \]

\[ A^{(S)}_{y,s,l,a} = \begin{cases} \dfrac{n_{y,\cdot,s,l,a}}{\sum_{a'} n_{y,\cdot,s,l,a'}}, & \sum_{a'} n_{y,\cdot,s,l,a'} > 0 \\ A^{(G)}_{y,l,a}, & \text{otherwise} \end{cases} \]

\[ A_{y,k,s,l,a} = \begin{cases} \dfrac{n_{y,k,s,l,a}}{\sum_{a'} n_{y,k,s,l,a'}}, & \sum_{a'} n_{y,k,s,l,a'} > 0 \\ A^{(S)}_{y,s,l,a}, & \text{otherwise} \end{cases} \]

So missing cells are filled as: stratum+sex+length -> sex+length -> global length.

If a length bin has no age observations at any stratum or sex, then \(A^{(G)}_{y,l,a}=0\) for all \(a\), and downstream fallbacks remain zero at that length.
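The fallback order described above (stratum+sex+length, then sex+length, then global length) can be sketched as a loop over progressively looser filters. This is an illustrative Python rendering of the equations, not a translation of sam.tpl; the function name, the data layout, and the counts are hypothetical.

```python
from collections import defaultdict


def fallback_alk(counts, stratum, sex, length_bin):
    """Return {age: proportion} for one cell, falling back when the cell is empty.

    counts: {(stratum, sex, length_bin, age): n} for a single year.
    """
    def normalized(match):
        by_age = defaultdict(float)
        for (k, s, l, a), n in counts.items():
            if l == length_bin and match(k, s):
                by_age[a] += n
        total = sum(by_age.values())
        return {a: n / total for a, n in by_age.items()} if total > 0 else None

    levels = (
        lambda k, s: k == stratum and s == sex,  # stratum + sex + length
        lambda k, s: s == sex,                   # sex + length
        lambda k, s: True,                       # global length
    )
    for match in levels:
        key = normalized(match)
        if key is not None:
            return key
    return {}  # no ages observed at this length: every proportion is zero


# Hypothetical age counts for one year.
counts = {(1, 1, 30, 2): 3, (1, 1, 30, 3): 1, (2, 2, 31, 4): 5}
print(fallback_alk(counts, 1, 1, 30))  # own cell has data
print(fallback_alk(counts, 2, 1, 30))  # stratum 2 is empty, falls back to sex level
print(fallback_alk(counts, 1, 1, 31))  # sex 1 is empty at 31, falls back to global
```

An empty dictionary corresponds to the all-zero case in the last paragraph above.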

6.5 Catch-at-length and catch-at-age

Given total catch in numbers \(C_{y,s}\):

\[ C_{y,s,l} = C_{y,s} \cdot p_{y,s,l} \]

and catch-at-age is obtained by marginalizing over length:

\[ C_{y,s,a} = \sum_l C_{y,s,l} \cdot P(a \mid l, y, s) \]
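The two equations above combine into a short loop: scale the length proportions by total catch, then spread each length bin over ages with the key. The sketch below uses hypothetical names and numbers for one year and sex.

```python
def catch_at_age(total_catch, length_props, alk):
    """Compute C_a = sum over l of (C * p_l) * P(a | l) for one year and sex.

    length_props: {length_bin: p_l}; alk: {length_bin: {age: P(a | l)}}.
    """
    result = {}
    for length_bin, p_l in length_props.items():
        catch_l = total_catch * p_l  # catch-at-length
        for age, p_a in alk.get(length_bin, {}).items():
            result[age] = result.get(age, 0.0) + catch_l * p_a
    return result


# Hypothetical inputs: 100 fish, two length bins, two ages.
caa = catch_at_age(
    100.0,
    {30: 0.25, 31: 0.75},
    {30: {2: 1.0}, 31: {2: 0.5, 3: 0.5}},
)
print(caa)
```

When every length bin with catch has a row in the key, the catch-at-age values sum to the total catch.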

6.6 Mean weight-at-age and mean weight-at-length

For sampled weights \(w_i\):

\[ \bar{w}_{y,s,a} = \frac{1}{N_{y,s,a}} \sum_{i=1}^{N_{y,s,a}} w_i \]

\[ \bar{w}_{y,s,l} = \frac{1}{N_{y,s,l}} \sum_{i=1}^{N_{y,s,l}} w_i \]

6.7 Strata roll-up in output files

In results/Est_<year>.dat, rows are stored as year type stratum sex age value with:

stratum = 1..H for explicit strata (H = nstrata from the control file),
stratum = 99 for all-strata aggregate,
sex = 99 for both-sex aggregate.

For stratum \(k\), sex \(s\), and age \(a\), numbers-at-age are:

\[ N_{y,k,s,a} = \sum_l \mathrm{LF}_{y,k,s,l}\, \mathrm{ALK}_{y,k,s,l,a} \]

where \(\mathrm{LF}\) is the stratum-specific length frequency and \(\mathrm{ALK}\) is the stratum-specific age-length key.

Define stratum-sex-age biomass-at-age as:

\[ B_{y,k,s,a} = \sum_l \left(\mathrm{LF}_{y,k,s,l}\,\bar{w}_{y,k,s,l}\right)\, \mathrm{ALK}_{y,k,s,l,a} \]

and mean weight-at-age as:

\[ \bar{w}_{y,k,s,a} = \frac{B_{y,k,s,a}}{N_{y,k,s,a}} \]

Roll-up equations used for stratum=99 and/or sex=99 are:

\[ N_{y,k,99,a} = \sum_s N_{y,k,s,a}, \qquad N_{y,99,s,a} = \sum_k N_{y,k,s,a}, \qquad N_{y,99,99,a} = \sum_k \sum_s N_{y,k,s,a} \]

\[ \bar{w}_{y,99,99,a} = \frac{\sum_k \sum_s N_{y,k,s,a}\,\bar{w}_{y,k,s,a}} {\sum_k \sum_s N_{y,k,s,a}} \]

So the combined (stratum=99, sex=99) mean weight-at-age is weighted by the numbers-at-age in each stratum and sex; it is not an unweighted average across strata.
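To see the difference between the weighted roll-up and a plain average, consider the following sketch. The function name and the numbers are hypothetical; it implements only the combined (99, 99) equations above.

```python
def rollup_all(numbers, mean_wt):
    """Return {age: (N, mean weight)} for stratum=99, sex=99.

    numbers, mean_wt: {(stratum, sex, age): value}.
    """
    n_tot, b_tot = {}, {}
    for (k, s, a), n in numbers.items():
        n_tot[a] = n_tot.get(a, 0.0) + n
        b_tot[a] = b_tot.get(a, 0.0) + n * mean_wt[(k, s, a)]  # biomass contribution
    return {
        a: (n_tot[a], b_tot[a] / n_tot[a] if n_tot[a] > 0 else float("nan"))
        for a in n_tot
    }


# Hypothetical age-3 values in two strata: 10 fish at 0.4 kg and 30 fish at 0.8 kg.
numbers = {(1, 1, 3): 10.0, (2, 1, 3): 30.0}
mean_wt = {(1, 1, 3): 0.4, (2, 1, 3): 0.8}
n_all, w_all = rollup_all(numbers, mean_wt)[3]
print(n_all, round(w_all, 3))
```

With these inputs the combined mean weight is (10 × 0.4 + 30 × 0.8) / 40 = 0.7, whereas the unweighted average of the two stratum means would be 0.6.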

6.8 Bootstrap summary statistics

From bootstrap replicates \(\theta^{(b)}\), \(b = 1,\dots,B\):

\[ \hat{\mu}_\theta = \frac{1}{B} \sum_{b=1}^{B} \theta^{(b)} \]

\[ \widehat{\mathrm{sd}}(\theta) = \sqrt{\frac{1}{B-1} \sum_{b=1}^{B} \left(\theta^{(b)} - \hat{\mu}_\theta\right)^2} \]

\[ \mathrm{CV}(\theta) = \frac{\widehat{\mathrm{sd}}(\theta)}{\hat{\mu}_\theta} \]
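These three statistics map directly onto Python's standard `statistics` module, whose `stdev` uses the same \(B-1\) denominator. The replicate values below are hypothetical, and `bootstrap_summary` is an illustrative name.

```python
import statistics


def bootstrap_summary(replicates):
    """Return (mean, sample sd, CV) of bootstrap replicates; needs at least two values."""
    mean = statistics.fmean(replicates)
    sd = statistics.stdev(replicates)  # sample sd: divides by B - 1
    cv = sd / mean if mean != 0 else float("nan")
    return mean, sd, cv


# Hypothetical replicates of one quantity (for example, catch-at-age for one age).
mean, sd, cv = bootstrap_summary([8.0, 10.0, 12.0])
print(mean, sd, cv)
```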

7 `cases/ebswpSAM` Focus

This directory contains the Eastern Bering Sea walleye pollock workflow, with two practical data layouts:

Archived fishery inputs in cases/ebswpSAM/data/fishery/ using samYYYY.dat, ageYYYY.dat, lenYYYY.dat.
Generated working inputs expected by some scripts/Makefile in cases/ebswpSAM/data/ using sam_YYYY.dat, ageYYYY.dat, lenYYYY.dat.

7.1 Recommended: archived fishery run (`data/fishery`)

Run from the directory that contains the samYYYY.dat control files so relative age/len filenames resolve correctly.

7.1.1 ADMB run for one year

cd cases/ebswpSAM/data/fishery
cat > bs_setup.dat <<'TXT'
1
1
1
1
1
TXT
../../../../src/sam -ind sam2024.dat

7.1.2 R refactor run for one year

cd cases/ebswpSAM/data/fishery
cat > bs_setup.dat <<'TXT'
1000
2
2
2
2
TXT
R -q -e "source('../../../../R/sampler_refactor.R'); run_sampler_refactor('sam2024.dat','bs_setup.dat','results','ebswp')"

7.2 Alternate: generated-data run (`cases/ebswpSAM/Makefile`)

cases/ebswpSAM/Makefile runs:

./sam -ind data/sam<year>.dat

Example:

cd cases/ebswpSAM
make yr=2024

Use this when data/sam<year>.dat exists (no underscore naming).

7.3 Consolidated driver (year ranges)

Use the single fishery driver to run ADMB over the samYYYY.dat control files in data/fishery and write a rolled-up summary CSV:

cd cases/ebswpSAM
Rscript EBS_sampler_fishery.R start_year=1991 end_year=2024 nbs=1 levels=1,1,1,1

Common options:

start_year, end_year: year range to run/read.
run_sampler=true|false: run ADMB, or only summarize existing results/Est_<year>.dat files.
nbs: first line for bs_setup.dat (for example 1 for point estimates, 1000 for bootstrap mode).
levels: four comma-separated sampling controls for bs_setup.dat (for example 1,1,1,1).
io=true|false: pass -nox -io to sam when needed.

Summary-only example (no ADMB rerun):

cd cases/ebswpSAM
Rscript EBS_sampler_fishery.R start_year=1991 end_year=2024 run_sampler=false

7.4 Key ebswpSAM scripts

cases/ebswpSAM/EBS_sampler_fishery.R: canonical fishery driver for year ranges in data/fishery (ADMB run + summary).
cases/ebswpSAM/EBS_sampler.R: compatibility wrapper to run the consolidated driver with default full range.
cases/ebswpSAM/EBS_sampler2021.R, cases/ebswpSAM/EBS_sampler2022.R, cases/ebswpSAM/EBS_sampler2023.R: compatibility wrappers pinned to historical end years.
cases/ebswpSAM/sampler.R: legacy helper definitions (Sampler(), SetBS()) used by earlier workflows.
cases/ebswpSAM/ebs_survey.R: survey-oriented conversion and sampler runs.
cases/ebswpSAM/imported/poll_age.sql and cases/ebswpSAM/imported/poll_len.sql: source SQL templates.

7.5 Known path caveats

Several legacy scripts in this case directory contain machine-specific setwd()/source() paths and may need local edits before execution.
Some scripts reference sampleR/R/sampler_EBSwp.R; verify that file exists in your local clone or update the source() target to the intended sampler helper.

8 Query-Based Data Preparation

These query scripts live in R/:

R/CallQueriesForSamplerData.R
R/QueryAgesForSampler.R
R/QueryLengthsForSampler.R
R/QueryCatchByStrata.R

These use AFSC/AKFIN ODBC connections and keyring credentials to generate age*.dat, len*.dat, and sam*.dat files.

9 Appendix: (Roxygen)

Source: cases/ebswpSAM/EBS_sampler_fishery.R

The table below is a concise rendering of the roxygen function documentation in the consolidated driver.

Function	Purpose	Key Parameters	Return
`get_script_dir()`	Resolve script directory for CLI or sourced execution.	none	Absolute directory path (`character`).
`as_bool(x, default = FALSE)`	Parse truthy/falsey CLI text to logical.	`x`, `default`	Logical scalar.
`parse_cli_args(args)`	Parse `key=value` CLI tokens into a named list.	`args`	Named list of string values.
`discover_control_years(fishery_dir)`	Detect available years from `samYYYY.dat` controls.	`fishery_dir`	Sorted integer year vector.
`write_bs_setup(fishery_dir, nbs, levels)`	Write `bs_setup.dat` controls for ADMB.	`fishery_dir`, `nbs`, `levels` (length 4)	Path to written file.
`run_sampler_year(year, fishery_dir, sam_bin, io = FALSE, verbose = TRUE)`	Run ADMB `sam` for one control year.	`year`, `fishery_dir`, `sam_bin`, `io`, `verbose`	Invisibly `TRUE` on success.
`read_estimates(year, fishery_dir)`	Read one `results/Est_<year>.dat` file.	`year`, `fishery_dir`	`data.table` or `NULL` if missing.
`summarize_estimates(est_dt)`	Roll up combined (`stratum=99`) annual numbers and mean age.	`est_dt`	Yearly summary `data.table`.
`run_ebswp_fishery(...)`	Main consolidated workflow for year-range run + summary.	`start_year`, `end_year`, `run_sampler`, `nbs`, `levels`, `io`, `fishery_dir`, `sam_bin`, `summary_csv`, `verbose`	Invisible list: `years`, `summary_csv`, `n_rows`.

10 Render This Documentation

From repository root:

quarto render index.qmd

This produces index.html in the repository root.

11 Push to GitHub

Typical update sequence:

git add index.qmd index.html
git commit -m "Update sampler documentation"
git push origin <branch>

If the repository is published with GitHub Pages (from the repository root or a gh-pages branch), commit index.html to the branch that Pages serves.

12 Troubleshooting

Package 'arrow' is required for Parquet output.
- Install arrow in your R library before running the refactor.
quarto: command not found
- Install Quarto and ensure it is on your PATH.
ADMB binary not found
- Ensure admb and/or compiled sam exists and is executable.
