PERSEUS Forest Intelligence
Methods note · 2026-05-29

Inventory stratification dominates the libcbm vs GCBM engine gap

A five-state intercomparison of libcbm and GCBM under the CBM-CFS3 modeling family finds that the previously reported +24% engine gap is dominated by how the libcbm bundle stratifies state forest inventory, not by fundamental differences between the two engines.

PERSEUS multi-model framework · GCBM2hpc pipeline · Aaron Weiskittel et al. · github.com/holoros/GCBM2hpc

TL;DR

The +24% libcbm-under-GCBM gap reported across Maine, Minnesota, Indiana, Washington, and Georgia is reproduced when the libcbm bundle allocates equal area to every (forest type, ecoregion, owner) stratum. Replacing the equal allocation with FIA EXPNS expansion factors (the canonical FIA Total Area Estimator) brings Minnesota to essentially exact parity (libcbm-over-GCBM ratio 0.999). The other four states show smaller, real engine differences: Maine 0.71, Washington 1.20, Indiana 1.34, Georgia 1.52. Most of what was called the engine gap was actually an inventory-stratification artifact; the real cross-engine uncertainty is state-dependent and much smaller. Neither of the two other candidates, tested independently, explains the gap: refitting Boudewyn component proportions shifts the ratio by less than 0.003, and the F3 Q10 temperature sweep moves Georgia's ratio from 1.05 to 1.10, away from the cluster.

Headline numbers

| Metric | Value | Note |
|---|---|---|
| States | 48 | Complete lower-48 CONUS coverage |
| CONUS forest area | 275 Mha | All lower-48 states |
| CONUS C stock | 60.3 PgC | B1.3 FIA EXPNS, 219.6 Mg/ha mean |
| B1.1 ratio range | 0.74 – 1.05 | uniform-FT inventory |
| B1.2 ratio range | 1.05 – 1.67 | pixel-weighted inventory |
| B1.3 ratio range | 0.71 – 1.52 | FIA EXPNS, MN at parity |
| Stage 2 shift | <0.003 | FIA Boudewyn proportions |

The finding

Bar chart of libcbm/GCBM year-5 carbon density ratio for ME, MN, IN, WA, GA under four inventory stratification hypotheses. B1.1 uniform-FT and Stage 2 sit in the previously reported +24% cluster around 0.74-0.79 for every state except GA; B1.2 pixel-weighted moves four of five states above parity, with GA reaching 1.67.
Figure 1. Five-state libcbm-over-GCBM ratio under four inventory stratification hypotheses. The legacy B1.1 uniform allocation (dark purple) reproduces the previously reported +24% cluster (blue band). Stage 2 (blue) replaces Ontario donor Boudewyn proportions with FIA-fit empirical proportions per species and barely moves the ratio. B1.2 (teal) weights by TreeMap pixel counts and overshoots in four of five states. B1.3 (green) uses canonical FIA EXPNS expansion factors and brings Minnesota to parity (0.999); the residual state-dependent gap is the real engine-comparison signal.

How we got here

The five-state pilot in PERSEUS established three sequential controls. First, we refit Boudewyn volume-to-biomass component proportions per species from each state's FIA TREE panel (DRYBIO_STEM / DRYBIO_STEM_BARK / DRYBIO_BRANCH / DRYBIO_FOLIAGE relative to DRYBIO_AG), replacing the Ontario Mixedwood Plains donor values that warm-climate states inherit by default. Across all five states, this Stage 2 refit moves the libcbm-over-GCBM ratio by less than 0.003 in either direction. Component proportions are not the lever.
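A minimal sketch of how per-species component proportions could be tabulated from TREE rows, assuming a plain ratio of summed biomass per species code (SPCD). The function name, the toy rows, and the unweighted ratio-of-sums are illustrative assumptions; the pipeline's actual Boudewyn fitter may weight trees differently or fit a different functional form.

```python
from collections import defaultdict

# FIA TREE biomass columns named in the text above.
COMPONENTS = ("DRYBIO_STEM", "DRYBIO_STEM_BARK", "DRYBIO_BRANCH", "DRYBIO_FOLIAGE")


def component_proportions(tree_rows):
    """Return {SPCD: {component: sum(component) / sum(DRYBIO_AG)}}.

    Hypothetical helper: an unweighted ratio of sums per species.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for row in tree_rows:
        ag = row.get("DRYBIO_AG")
        # Skip trees with missing or zero aboveground biomass.
        if not ag or ag <= 0:
            continue
        # Skip trees missing any component (e.g. legacy panels).
        if any(row.get(c) is None for c in COMPONENTS):
            continue
        sums = totals[row["SPCD"]]
        sums["DRYBIO_AG"] += ag
        for comp in COMPONENTS:
            sums[comp] += row[comp]
    return {
        spcd: {comp: sums[comp] / sums["DRYBIO_AG"] for comp in COMPONENTS}
        for spcd, sums in totals.items()
    }


# Toy rows with made-up biomass values, for illustration only.
tree_rows = [
    {"SPCD": 202, "DRYBIO_AG": 1000.0, "DRYBIO_STEM": 700.0,
     "DRYBIO_STEM_BARK": 100.0, "DRYBIO_BRANCH": 200.0, "DRYBIO_FOLIAGE": 50.0},
    {"SPCD": 202, "DRYBIO_AG": 500.0, "DRYBIO_STEM": 340.0,
     "DRYBIO_STEM_BARK": 60.0, "DRYBIO_BRANCH": 100.0, "DRYBIO_FOLIAGE": 30.0},
]
props = component_proportions(tree_rows)
print(props[202])
```

Rows that lack any of the four components are dropped rather than imputed, which is one simple way to handle panels missing a column such as DRYBIO_STEM_BARK.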

Second, we patched Georgia's spatial-unit mean annual temperature across the range 6.32 to 17.46 °C, sweeping the F3 Q10 decay rate scaling that warm states inherit from the cold donor. Lowering MAT toward the donor (6.32 °C) raises the ratio from 1.05 to 1.10, the opposite of the direction needed to collapse Georgia into the cluster. DOM accumulates as decay slows; live carbon is unaffected. F3 Q10 is not the lever either.

GA F3 Q10 sensitivity sweep line plot. Lowering mean annual temperature from 17.46C to 6.32C raises the libcbm/GCBM ratio from 1.05 to 1.10. The +24% cluster band sits at 0.74-0.79 (blue).
Figure 2. Georgia F3 Q10 sensitivity sweep. The spatial-unit mean annual temperature drives Q10 decay scaling at runtime on the global decay parameter table. Sweeping from baseline (17.46 °C) to the Ontario donor MAT (6.32 °C) lifts the ratio from 1.05 to 1.10 as DOM accumulates. Live carbon is unchanged. The sweep rules out F3 Q10 as the explanation for GA's distinctness from the cluster.
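The direction of the sweep follows from the generic Q10 form of temperature scaling, sketched below. The Q10 value of 2.0 and the 10 °C reference temperature are illustrative placeholders, not values read from the decay parameter table, and the function name is hypothetical.

```python
def q10_multiplier(mat_c, q10=2.0, ref_temp_c=10.0):
    """Generic Q10 scaling: decay rate multiplier relative to the reference temperature.

    q10 and ref_temp_c are placeholder values for illustration.
    """
    return q10 ** ((mat_c - ref_temp_c) / 10.0)


baseline = q10_multiplier(17.46)  # Georgia baseline MAT from the sweep
donor = q10_multiplier(6.32)      # Ontario donor MAT from the sweep

# A lower MAT gives a smaller multiplier, so decay slows and DOM accumulates.
print(f"baseline multiplier: {baseline:.3f}")
print(f"donor multiplier:    {donor:.3f}")
```

Under any Q10 greater than 1, cooling the spatial unit can only slow decay, which is why the sweep raises libcbm DOM and pushes the ratio up rather than down.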

Third, we decomposed each state's GCBM-minus-libcbm gap into live and DOM components. The +24% gap turns out to be heterogeneous per state. WA is dominated by live carbon (+53.9 Mg/ha vs +18.7 DOM); MN is DOM-dominated (+39.4 vs +11.2); IN balanced; ME DOM-leaning; GA opposite-sign (libcbm overstates live by 6.6 Mg/ha). No single per-pool explanation accounts for the cluster.

Per-pool decomposition of GCBM minus libcbm year-5 carbon density. WA has the largest live gap (+53.9 Mg/ha); MN largest DOM gap (+39.4); GA is the lone opposite-sign case (-6.6 live).
Figure 3. Per-pool decomposition of the engine gap (GCBM year-5 density minus libcbm year-5 density). Positive values mean GCBM higher. WA is live-dominated (+53.9 vs +18.7); MN is DOM-dominated (+39.4 vs +11.2); ME and IN are mixed. GA is the lone opposite-sign case. The heterogeneity motivated looking inside the bundle builder for a structural rather than science-level explanation.
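As a consistency check, the live and DOM gaps quoted above can be reconciled with the ratios in the five-state table in this note. The sketch assumes the decomposition was run on the B1.1 inventory, which the note does not state explicitly; the helper name is hypothetical.

```python
# Live and DOM gaps (GCBM minus libcbm, Mg/ha) as quoted in the text.
pool_gaps = {
    "WA": {"live": 53.9, "dom": 18.7},
    "MN": {"live": 11.2, "dom": 39.4},
}
# GCBM year-5 density (Mg/ha) and B1.1 ratio from the five-state table.
gcbm_b11 = {"WA": (283.8, 0.744), "MN": (209.2, 0.758)}


def summarize(state):
    """Return (summed pool gap, gap implied by the ratio, live share of the gap)."""
    live = pool_gaps[state]["live"]
    dom = pool_gaps[state]["dom"]
    density, ratio = gcbm_b11[state]
    total_gap = live + dom
    implied_gap = density * (1.0 - ratio)  # GCBM minus libcbm
    return total_gap, implied_gap, live / total_gap


for state in pool_gaps:
    total_gap, implied_gap, live_share = summarize(state)
    print(f"{state}: pools {total_gap:.1f}, from ratio {implied_gap:.1f}, "
          f"live share {live_share:.0%}")
```

For both states the summed pool gaps agree with the ratio-implied gap to within rounding, while the live share differs sharply between them.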

The smoking gun

In build_libcbm_state_bundle.py, the line

area_per_stratum = total_area_ha / len(inv_tuples)

allocated equal area to every (FT, eco, owner) tuple, regardless of real spatial composition. Verified across all five states: the coefficient of variation of per-FT area share was exactly 0.000 in every case. The TreeMap raster stack underneath the GCBM run honors real spatial composition (Pacific Northwest is Douglas-fir dominated; Lake States are aspen + spruce-fir dominated; etc.). When libcbm then replayed a stratum-mean carbon trajectory, the strata it was averaging over differed from the strata GCBM was running spatially. The two engines were modeling different inventories.

WA gave the cleanest signal: libcbm yield at age 56 ranged from 7.5 m³/ha (FT=160) to 170.8 m³/ha (FT=100), a roughly 23-fold range. Real WA forest is Douglas-fir dominated (well over half the state). Equal weighting flattened the live-biomass mean below GCBM's spatial mean by about 35 Mg/ha. That accounts for most (about 35 of 54 Mg/ha) of the WA live gap in Figure 3.
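The effect of the equal-area line can be sketched with two strata carrying the age-56 yields quoted above. The area shares, ecoregion and owner labels, and helper names below are hypothetical and chosen only to show the mechanism; they are not WA's actual composition.

```python
from statistics import mean, pstdev


def equal_area(total_area_ha, inv_tuples):
    """Legacy behaviour: every (FT, eco, owner) tuple gets the same area."""
    area_per_stratum = total_area_ha / len(inv_tuples)
    return {t: area_per_stratum for t in inv_tuples}


def weighted_area(total_area_ha, shares):
    """Allocate area in proportion to a per-stratum share (normalised here)."""
    norm = sum(shares.values())
    return {t: total_area_ha * s / norm for t, s in shares.items()}


def coefficient_of_variation(values):
    values = list(values)
    return pstdev(values) / mean(values)


def area_weighted_mean(per_stratum_value, areas):
    total = sum(areas.values())
    return sum(per_stratum_value[t] * a for t, a in areas.items()) / total


# Yields at age 56 (m3/ha) from the text; eco/owner labels are placeholders.
high, low = ("100", "eco", "owner"), ("160", "eco", "owner")
yields = {high: 170.8, low: 7.5}
shares = {high: 0.7, low: 0.3}  # hypothetical area shares

eq = equal_area(1000.0, list(yields))
wt = weighted_area(1000.0, shares)

print("CV of equal allocation:", coefficient_of_variation(eq.values()))
print("equal-weight mean yield:", area_weighted_mean(yields, eq))
print("share-weighted mean yield:", area_weighted_mean(yields, wt))
```

A coefficient of variation of zero across strata areas is the signature of the equal-allocation path; any real composition pulls the weighted mean toward the dominant forest type.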

The patches

B1.2 (commit 33f2fc8) reads per-FT pixel counts from pixel_attributes.csv (already loaded for the age distribution work in B1.1) and weights each stratum's area proportional to its FT's pixel share, splitting equally among (eco, owner) sub-strata of that FT. This overshoots in 4/5 states because the TreeMap Count column is not the canonical FIA expansion factor — it reflects how many 30 m pixels were assigned to each imputed plot, not the FIA stratified sample design.
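The B1.2 weighting rule described above can be sketched as follows. The function name and the toy forest type labels and pixel counts are hypothetical; only the rule itself (area proportional to the FT's pixel share, split equally among that FT's sub-strata) comes from the text.

```python
from collections import Counter


def pixel_weighted_areas(total_area_ha, inv_tuples, ft_pixel_counts):
    """Weight each FT by its pixel share, then split equally among its
    (eco, owner) sub-strata. Only FTs present in the inventory are counted."""
    n_sub = Counter(ft for ft, _eco, _owner in inv_tuples)
    total_pixels = sum(ft_pixel_counts[ft] for ft in n_sub)
    return {
        (ft, eco, owner):
            total_area_ha * ft_pixel_counts[ft] / total_pixels / n_sub[ft]
        for ft, eco, owner in inv_tuples
    }


# Toy inventory: forest type "A" has two sub-strata, "B" has one.
inv_tuples = [("A", "e1", "o1"), ("A", "e1", "o2"), ("B", "e1", "o1")]
ft_pixel_counts = {"A": 300, "B": 100}  # made-up pixel counts

areas = pixel_weighted_areas(1000.0, inv_tuples, ft_pixel_counts)
for stratum, area in areas.items():
    print(stratum, area)
```

The total area is conserved, but the weights inherit whatever bias the pixel counts carry relative to the FIA sample design.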

B1.3 (commit 3122159) adds compute_fia_expns_areas.py, a state-portable computer for the canonical FIA Total Area Estimator. For a given state and EVALID, it joins COND → POP_PLOT_STRATUM_ASSGN → POP_STRATUM and sums CONDPROP_UNADJ × EXPNS per FORTYPCD, then aggregates to FT group. The bundle builder consumes the per-FT-group CSV and weights inventory by the resulting hectares. The result for Minnesota is parity. The result for WA, IN, GA is a real, much smaller engine gap that varies state by state.
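The join described above can be sketched against an in-memory SQLite database with toy rows. This is not the contents of compute_fia_expns_areas.py: the function name, the forest-land filter (COND_STATUS_CD = 1), the acres-to-hectares conversion, the toy EVALID, and the FORTYPCD-to-group mapping are all assumptions made for illustration.

```python
import sqlite3

ACRES_TO_HA = 0.404686  # assumes EXPNS is expressed in acres

AREA_SQL = """
SELECT c.FORTYPCD, SUM(c.CONDPROP_UNADJ * s.EXPNS) AS acres
FROM COND AS c
JOIN POP_PLOT_STRATUM_ASSGN AS a ON a.PLT_CN = c.PLT_CN
JOIN POP_STRATUM AS s ON s.CN = a.STRATUM_CN
WHERE a.EVALID = ? AND c.COND_STATUS_CD = 1  -- forest conditions only (assumed)
GROUP BY c.FORTYPCD
"""


def fia_area_by_group(conn, evalid, ft_group_of):
    """Sum CONDPROP_UNADJ x EXPNS per FORTYPCD, then roll up to FT group (ha)."""
    areas = {}
    for fortypcd, acres in conn.execute(AREA_SQL, (evalid,)):
        group = ft_group_of[fortypcd]
        areas[group] = areas.get(group, 0.0) + acres * ACRES_TO_HA
    return areas


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE COND (PLT_CN TEXT, CONDID INTEGER, COND_STATUS_CD INTEGER,
                   FORTYPCD INTEGER, CONDPROP_UNADJ REAL);
CREATE TABLE POP_PLOT_STRATUM_ASSGN (PLT_CN TEXT, STRATUM_CN TEXT, EVALID INTEGER);
CREATE TABLE POP_STRATUM (CN TEXT, EVALID INTEGER, EXPNS REAL);
""")
# Toy rows: plot P2 is half forest, half nonforest.
conn.executemany("INSERT INTO COND VALUES (?,?,?,?,?)", [
    ("P1", 1, 1, 201, 1.0),
    ("P2", 1, 1, 201, 0.5),
    ("P2", 2, 2, None, 0.5),
    ("P3", 1, 1, 301, 1.0),
])
conn.executemany("INSERT INTO POP_PLOT_STRATUM_ASSGN VALUES (?,?,?)", [
    ("P1", "S1", 1), ("P2", "S1", 1), ("P3", "S2", 1),
])
conn.executemany("INSERT INTO POP_STRATUM VALUES (?,?,?)", [
    ("S1", 1, 6000.0), ("S2", 1, 5000.0),
])

ft_group_of = {201: 200, 301: 300}  # toy FORTYPCD -> FT group mapping
areas = fia_area_by_group(conn, evalid=1, ft_group_of=ft_group_of)
print(areas)
```

Because each plot's weight comes from its stratum's expansion factor rather than from a pixel count, the resulting hectares follow the FIA sample design directly.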

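| State | GCBM yr-5 (Mg/ha) | B1.1 ratio | Stage 2 ratio | B1.2 ratio | B1.3 FIA EXPNS ratio | Direction |
|---|---|---|---|---|---|---|
| ME | 306.6 | 0.751 | 0.751 | 0.751 | 0.712 | residual gap below cluster |
| WA | 283.8 | 0.744 | 0.744 | 1.203 | 1.200 | real engine overshoot |
| IN | 211.8 | 0.786 | 0.786 | 1.315 | 1.336 | real engine overshoot |
| MN | 209.2 | 0.758 | 0.758 | 1.054 | 0.999 | parity |
| GA | 124.9 | 1.051 | 1.048 | 1.668 | 1.516 | warm-donor Stage 1 placeholder |
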
B1.3 replaces the TreeMap pixel weighting with the canonical FIA Total Area Estimator. Minnesota reaches essentially exact parity (0.999, a 0.14% gap). The residual gap in the other states is the real engine signal, much smaller than the +24% originally reported and notably state-dependent. WA at 1.20 and IN at 1.34 are consistent with Pacific NW Douglas-fir yield and Central Hardwood region disturbance regime differences between spatial GCBM and stratum-mean libcbm. GA at 1.52 is consistent with the warm-donor Stage 1 placeholder AIDB and is expected to compress under a future Stage 2 calibration paper for subtropical donors.

Why this matters

Multi-model forest carbon intercomparisons routinely report inter-engine spread in carbon stocks of 20-40% even at the same spatial scope and the same input data. The implicit assumption is that this spread tracks fundamental differences in how each engine represents growth, decay, and disturbance. PERSEUS's finding here is that one important component of that spread — the libcbm-vs-GCBM gap on CBM-CFS3 implementations — is substantially attributable to how the input inventory is stratified for each engine. That is, two engines running the same science with the same forcing can disagree by 20-30% if one represents inventory spatially and the other replays a stratum mean built with the wrong weights.

The implication for the methods paper is large: the engine gap is real and worth reporting, but the magnitude is highly sensitive to a choice that is easy to overlook in tooling. Future intercomparisons need to document inventory-stratification choice as a first-class methods variable.

CONUS-complete: n=48 finalized (2026-05-30)

Phase 5 lands the last 8 lower-48 states: ND, SD, NE, KS (Plains; Boreal Plains donor for ND/SD/NE, Mixedwood Plains for KS), DE, MD (Mid-Atlantic; Mixedwood Plains), and AZ, NM (arid Southwest; Mixedwood Plains as warm Stage 1 placeholder with large F3 stretch, same posture as GA). The pipeline now runs end to end across the complete lower-48 CONUS forest area.

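| Region | n states | Forest area (Mha) | Total C (TgC) | Mean density (Mg/ha) |
|---|---|---|---|---|
| Pacific NW | 3 | 34.4 | 11,557 | 336 |
| Northeast | 11 | 29.1 | 7,429 | 255 |
| Mountain West | 11 | 60.8 | 13,316 | 219 |
| Midwest | 7 | 21.7 | 4,456 | 205 |
| Lake States | 3 | 21.9 | 4,262 | 195 |
| South + Southeast | 13 | 106.8 | 19,319 | 181 |
| CONUS lower-48 | 48 | 274.8 | 60,339 | 220 |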

The 60.34 PgC CONUS total under B1.3 FIA EXPNS-weighted libcbm sits in the literature range: Pan et al. 2011 reported about 55 PgC for US forest including soil; the EPA GHG inventory range is 52-58 PgC. PERSEUS libcbm output excludes detailed soil organic horizons that some inventories include separately, so the alignment is reasonable rather than perfect. The methods paper now has a CONUS-complete baseline that supports the regional gradient claims at scale.

CONUS-scale finding: B1.1 vs B1.3 at n=40 (2026-05-30)

With 40 states in hand, the inventory-stratification finding scales nationally. Rerunning all 40 with the legacy B1.1 uniform-FT inventory and comparing against B1.3 FIA EXPNS shows that the stratification choice alone adds +6.76 PgC to the CONUS total carbon stock — a +14% shift on 246.5 Mha of forested area.

| Region | Forest area (Mha) | B1.1 stock (TgC) | B1.3 stock (TgC) | B1.1 mean (Mg/ha) | B1.3 mean (Mg/ha) |
|---|---|---|---|---|---|
| Pacific NW (WA, OR, CA) | 34.4 | 9,176 | 11,557 | 267 | 336 |
| Northeast (9 states) | 27.9 | 6,681 | 7,175 | 239 | 257 |
| Mountain West (6 states) | 36.2 | 7,651 | 8,264 | 212 | 229 |
| Midwest (6 states) | 19.3 | 3,352 | 4,001 | 174 | 207 |
| Lake States (MN, WI, MI) | 21.9 | 3,654 | 4,262 | 167 | 195 |
| South + Southeast (13 states) | 106.8 | 17,305 | 19,319 | 162 | 181 |
| CONUS | 246.5 | 47,818 | 54,578 | 194 | 221 |

Two regional patterns stand out. The Pacific Northwest gets the biggest mean-density boost (267 → 336 Mg/ha; the +26% reflects how strongly Pacific Doug-fir weighting matters) and, at +2.4 PgC, the largest absolute stock gain; the South + Southeast follows at +2.0 PgC because of its scale. Per-state B1.3 / B1.1 ratios span 0.88 (TX and WY, where uniform weighting overestimated) to 1.94 (IN, where the spatially dominant oak-hickory yields are very different from the equal-weight mean). The 14% CONUS shift is of the same order as the +24% libcbm-under-GCBM gap that originally motivated this work, supporting the methods paper claim that stratification choice is the dominant component of the cross-engine uncertainty.
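The n=40 totals can be re-derived from the regional rows of the table above; the short sketch below does the arithmetic. Region keys are shortened labels, and the only conversion used is that 1 TgC per Mha equals 1 Mg/ha.

```python
# (forest area Mha, B1.1 stock TgC, B1.3 stock TgC) per region, from the table.
regions = {
    "Pacific NW":        (34.4,  9176, 11557),
    "Northeast":         (27.9,  6681,  7175),
    "Mountain West":     (36.2,  7651,  8264),
    "Midwest":           (19.3,  3352,  4001),
    "Lake States":       (21.9,  3654,  4262),
    "South + Southeast": (106.8, 17305, 19319),
}

area_mha = sum(a for a, _, _ in regions.values())
b11_tgc = sum(b11 for _, b11, _ in regions.values())
b13_tgc = sum(b13 for _, _, b13 in regions.values())

shift_pgc = (b13_tgc - b11_tgc) / 1000.0        # TgC -> PgC
shift_pct = 100.0 * (b13_tgc - b11_tgc) / b11_tgc
b11_mean = b11_tgc / area_mha                   # TgC / Mha == Mg/ha
b13_mean = b13_tgc / area_mha

print(f"area {area_mha:.1f} Mha, shift {shift_pgc:.2f} PgC ({shift_pct:.1f}%)")
print(f"mean density {b11_mean:.0f} -> {b13_mean:.0f} Mg/ha")

# Absolute stock change per region, largest first.
for name, (_, b11, b13) in sorted(
        regions.items(), key=lambda kv: kv[1][2] - kv[1][1], reverse=True):
    print(f"{name}: {b13 - b11:+d} TgC")
```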

Phase 2 + 3 + 4 update: n=40 (2026-05-30)

The pipeline now spans 40 states across the conterminous US. Phase 4 added 17 Southern and Midwestern states (AL, AR, FL, KY, LA, MS, NC, SC, TN, TX, VA, WV, OK, OH, IL, IA, MO) in a single parallel batch. Five Phase 4 states (AL, FL, MS, SC, TN) had pre-existing legacy FIA panels that lacked the DRYBIO_STEM_BARK column the Boudewyn fitter requires; these panels were auto-refreshed via rFIA. With the working pipeline + auto-generator, adding the remaining 8 lower-48 states (Plains + Mid-Atlantic + AZ/NM) is mostly mechanical.

Top of range (PNW Doug-fir + Atlantic Maritime northern hardwood): OR 386, VT 347, WA 341, NY 321, NH 300 Mg/ha. Bottom (warm-donor + dry-sparse forest): TX 146, LA 163, AL 175, FL 180 Mg/ha. The full sorted matrix is in the n=40 CSV linked below. The 23-state listing remains visible below as the Phase 2 + Phase 3 reference; Phase 4 numbers are summarized rather than tabled in full to keep the page navigable.

Phase 2 + Phase 3 update: n=23 (2026-05-30)

The pipeline now runs end-to-end across 23 states under canonical FIA EXPNS B1.3 inventory weighting. Phase 2 added 10 Northeast and Lake States; Phase 3 added 8 Pacific Northwest and Mountain West states (OR, ID, MT, WY, CO, UT, NV, CA). The n=23 libcbm year-5 total carbon densities span 187 to 386 Mg/ha — a 2.07x range. The auto-config generator (tools/generate_state_config_template.py) produces state_config.yml + ft_species_composition.yml from a FIA panel and a one-row STATE_META entry; new donors (Pacific Maritime, Montane Cordillera, Boreal Plains) handle PNW and Mountain West climate.

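| State | Tier | Live (Mg/ha) | DOM (Mg/ha) | Total (Mg/ha) |
|---|---|---|---|---|
| OR | Phase 3 | 135.3 | 250.6 | 385.8 |
| VT | Phase 2 | 79.6 | 267.1 | 346.7 |
| WA | pilot | 113.2 | 227.3 | 340.5 |
| NY | Phase 2 | 68.2 | 252.5 | 320.6 |
| NH | Phase 2 | 66.7 | 233.3 | 300.0 |
| CA | Phase 3 | 82.6 | 205.1 | 287.8 |
| ID | Phase 3 | 73.9 | 211.8 | 285.7 |
| IN | pilot | 111.6 | 171.2 | 282.8 |
| NV | Phase 3 | 64.3 | 187.7 | 252.0 |
| MT | Phase 3 | 74.2 | 175.8 | 250.0 |
| UT | Phase 3 | 67.9 | 166.3 | 234.2 |
| RI | Phase 2 | 59.7 | 165.4 | 225.1 |
| NJ | Phase 2 | 55.2 | 165.0 | 220.2 |
| ME | pilot | 52.3 | 166.0 | 218.4 |
| CT | Phase 2 | 50.9 | 162.2 | 213.1 |
| WY | Phase 3 | 60.9 | 151.1 | 212.1 |
| MN | pilot | 66.3 | 142.6 | 208.9 |
| PA | Phase 2 | 49.5 | 158.1 | 207.6 |
| MA | Phase 2 | 52.7 | 150.6 | 203.2 |
| CO | Phase 3 | 54.4 | 139.9 | 194.3 |
| WI | Phase 2 | 47.4 | 142.5 | 189.9 |
| GA | pilot | 90.3 | 99.0 | 189.3 |
| MI | Phase 2 | 47.7 | 138.9 | 186.6 |
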
Oregon and California at the high end carry Pacific Northwest Doug-fir / hemlock with the largest absolute live biomass (OR 135 Mg/ha live alone). VT/NH/NY/WA reflect the cool moist Atlantic Maritime + Pacific Maritime biome with dense northern hardwood / mixed conifer + heavy DOM. Lake States WI/MI at the low end pair the boreal-leaning donor with frequent disturbance. The arid Mountain West (CO/UT/NV/WY) sits mid-range with moderate live biomass and low-moderate DOM. GA, second from the bottom with the lowest DOM in the table, is consistent with the warm-donor Stage 1 placeholder. With GCBM aggregates pending for Phase 2 and Phase 3 states (statewide SLURM chains queued separately), the n=23 libcbm-vs-GCBM ratio matrix is the remaining deliverable for the methods paper.

CONUS extension

With Phase 2 in hand, the CONUS extension is a known quantity. The remaining ~30 lower-48 states each need: an FIA download (~5-30 min), a one-row STATE_META entry in the generator (~5 min hand work), and a mechanical chain through the calibration + libcbm pipeline (~30 min). A 48-state matrix is reachable in 3-6 weeks of wallclock time with the calibration completed. GCBM-side aggregates need separate statewide SLURM chains (12-24 h each) for the full libcbm-vs-GCBM ratio matrix.

The phased plan (full document in the project hand-off):

1. Calibration (completed 2026-05-30). Pre-scaling engineering, methodological. FIA EXPNS calibration of inventory stratification is shipped as B1.3 (commit 3122159) and validated against the five-state pilot. Remaining Phase 1 work: extend the species crosswalk to cover SW and Pacific species, auto-generate state configuration templates from FIA per-state aggregates, build an AIDB donor mapping decision tree for warm states.
2. Lake States + Northeast (4-6 weeks). 10 states added, n=15 total. WI, MI, NY, PA, VT, NH, MA, CT, RI, NJ. Most similar to the existing ME and MN templates, lowest donor uncertainty. Publishable result for the methods paper.
3. Pacific NW + Mountain West (4-6 weeks). 8 states added, n=23 total. OR, ID, MT, WY, CO, UT, NV, CA. Donors stretch but stay defensible (BC + AB). Adds the WA-style live-biomass-dominated gap to the cross-state matrix.
4. South + Southeast (6-8 weeks). 17 states added, n=40 total. AL, AR, FL, KY, LA, MS, NC, SC, TN, TX, VA, WV, OK, OH, IL, IA, MO. Phase 1 calibration work pays off for the warm-donor problem.
5. Plains + remaining (3-4 weeks). 6 states added, n=46 total. ND, SD, NE, KS, DE, MD. AZ and NM held for a warm-donor Stage 2 paper.
6. Synthesis (3-4 weeks). CONUS-wide analysis, methods paper. Regional gradients in inventory artifact magnitude. Comparison against EPA GHG inventory + NCASI national-scale benchmarks. Submission.

Total estimated wallclock: 22-31 weeks (~5.5-8 months) for n=48. The publishable n=15 milestone is reachable in 8-10 weeks with Phase 1 calibration completed first. The recommendation in the project hand-off is to ship the methods paper at n=15 and treat n=48 CONUS as a follow-on regional-gradients paper.

Data and code

📦 GCBM2hpc pipeline 📊 PERSEUS explorer 📋 Per-state run order 📈 Figure generator (R)

Acknowledgments and provenance

PERSEUS is the multi-model forest carbon intercomparison framework at the Center for Research on Sustainable Forests (CRSF), University of Maine. Compute ran on the Ohio Supercomputer Center Cardinal cluster (allocation PUOM0008). The engines compared here are CBM-CFS3 (Canadian Forest Service), libcbm (CFS Python/C++ reimplementation), and GCBM on moja FLINT (spatially explicit per-pixel implementation, run on SLURM).