Accountability surface

Accuracy, reported honestly

Three numbers govern how ContentRX evaluates its own calibration. They are kept separate on purpose: a single “accuracy score” would obscure the self-drift ceiling and misrepresent what the measurement can actually say. This follows the Model Cards guidance (Mitchell et al., 2019) on honest metric reporting with intervals and disaggregation.

Built 2026-04-25. Snapshot from 2026-04-24.

Measured system κ

System vs Robo’s held-out golden verdicts

pending

No standards have completed the weekly κ series yet

Measured self-drift κ

Robo vs past-Robo (quarterly blind re-label)

pending

Session 7 drift panel awaiting blind re-label + score

Design target κ

A design assumption, not a measurement

0.900

Design assumption · stated separately from measurements
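
The two measured values above are agreement statistics between two sets of labels over the same items. As a minimal sketch only, assuming κ here means Cohen's kappa and that the 95% interval is a percentile bootstrap (the page states neither), the computation could look like this. The function names `cohen_kappa` and `bootstrap_ci` are hypothetical and are not ContentRX's API.

```python
import random
from collections import Counter
from typing import Optional, Sequence, Tuple


def cohen_kappa(a: Sequence[str], b: Sequence[str]) -> Optional[float]:
    """Cohen's kappa for two label sequences; None when it is undefined."""
    if len(a) != len(b):
        raise ValueError("label sequences must have equal length")
    n = len(a)
    if n == 0:
        return None  # nothing measured yet: report "pending", not 0
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from each rater's marginal label frequencies.
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return None  # both raters used one identical label; kappa undefined
    return (observed - expected) / (1.0 - expected)


def bootstrap_ci(
    a: Sequence[str],
    b: Sequence[str],
    n_resamples: int = 2000,  # assumption: resample count is illustrative
    seed: int = 0,
) -> Optional[Tuple[float, float]]:
    """Percentile-bootstrap 95% interval for kappa, resampling items."""
    n = len(a)
    if n == 0:
        return None
    rng = random.Random(seed)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        k = cohen_kappa([a[i] for i in idx], [b[i] for i in idx])
        if k is not None:  # skip degenerate resamples
            stats.append(k)
    if not stats:
        return None
    stats.sort()
    lower = stats[int(0.025 * (len(stats) - 1))]
    upper = stats[int(0.975 * (len(stats) - 1))]
    return lower, upper
```

Returning `None` instead of a number for an empty or degenerate sample keeps an unmeasured value from being mistaken for a real κ of zero.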

Graduation ladder

Every standard starts at robo_labels. It graduates when (a) its measured weekly κ stays above the threshold derived from the self-drift ceiling and (b) enough novel counterparts have been seen across moments and content types. Thresholds adjust automatically when the ceiling re-measures; see the graduation dashboard for the mechanics.

robo_labels: 43
batch_approval: 0
autonomous: 0

autonomous threshold κ ≥ 0.846 · batch_approval threshold κ ≥ 0.747
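
The ladder described above can be sketched as a small decision function. This is an illustration under stated assumptions, not the graduation dashboard's actual mechanics: the two thresholds are the values shown on this page, while `min_weeks`, `min_novel`, the reading of "stays above" as "the lowest of the most recent weekly values meets the threshold", and the use of one novelty requirement for both levels are all hypothetical.

```python
from typing import Sequence

# Thresholds as currently shown on this page; they move when the
# self-drift ceiling is re-measured.
AUTONOMOUS_KAPPA = 0.846
BATCH_APPROVAL_KAPPA = 0.747


def graduation_level(
    weekly_kappa: Sequence[float],
    novel_seen: int,
    min_novel: int,      # hypothetical: required count of novel items
    min_weeks: int = 4,  # hypothetical: how many weeks "stays above" covers
) -> str:
    """Return the ladder level a standard would sit at."""
    # Every standard starts at robo_labels and stays there until both
    # conditions (a) and (b) hold.
    if len(weekly_kappa) < min_weeks or novel_seen < min_novel:
        return "robo_labels"
    # Condition (a): the weakest recent week must still clear the threshold.
    floor = min(weekly_kappa[-min_weeks:])
    if floor >= AUTONOMOUS_KAPPA:
        return "autonomous"
    if floor >= BATCH_APPROVAL_KAPPA:
        return "batch_approval"
    return "robo_labels"
```

With an empty weekly series, as for all 43 standards in this snapshot, the function returns `robo_labels`, which matches the counts reported above.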

Per-standard measurements

43 standards tracked. Cells show the per-standard κ alongside a sparkline of the most recent weekly measurements. “Pending” means the weekly κ series hasn't been populated yet; a pending cell is never shown as zero and never filled from the design target.

Standard  Level        κ (95% CI)  n  Trend
ACC-01    robo_labels  pending
ACC-02    robo_labels  pending
ACC-05    robo_labels  pending
ACC-07    robo_labels  pending
ACT-01    robo_labels  pending
ACT-02    robo_labels  pending
ACT-03    robo_labels  pending
ACT-04    robo_labels  pending
CLR-01    robo_labels  pending
CLR-02    robo_labels  pending
CLR-03    robo_labels  pending
CLR-04    robo_labels  pending
CLR-05    robo_labels  pending
CON-01    robo_labels  pending
CON-02    robo_labels  pending
CON-03    robo_labels  pending
CON-04    robo_labels  pending
GRM-01    robo_labels  pending
GRM-02    robo_labels  pending
GRM-03    robo_labels  pending
GRM-04    robo_labels  pending
GRM-05    robo_labels  pending
GRM-06    robo_labels  pending
PRF-01    robo_labels  pending
PRF-03    robo_labels  pending
PRF-04    robo_labels  pending
PRF-07    robo_labels  pending
PRF-09    robo_labels  pending
PRF-10    robo_labels  pending
PRF-11    robo_labels  pending
STR-01    robo_labels  pending
STR-03    robo_labels  pending
STR-04    robo_labels  pending
TRN-01    robo_labels  pending
TRN-02    robo_labels  pending
TRN-04    robo_labels  pending
TRN-06    robo_labels  pending
TRN-07    robo_labels  pending
VT-01     robo_labels  pending
VT-02     robo_labels  pending
VT-03     robo_labels  pending
VT-04     robo_labels  pending
VT-05     robo_labels  pending
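
The “Pending” convention used in this table can be made explicit in code by modelling an unmeasured κ as a missing value rather than a number. The sketch below is one possible way to render a cell; `format_kappa_cell` and the three-decimal format are assumptions for illustration, not the page's actual rendering code.

```python
from typing import Optional, Tuple


def format_kappa_cell(
    kappa: Optional[float],
    ci: Optional[Tuple[float, float]] = None,
) -> str:
    """Render a kappa table cell; an unmeasured value shows as 'pending'."""
    if kappa is None:
        # Never substitute 0 or the design target for a missing measurement.
        return "pending"
    if ci is None:
        return f"{kappa:.3f}"
    return f"{kappa:.3f} ({ci[0]:.3f}, {ci[1]:.3f})"
```

Testing `kappa is None` rather than truthiness matters here: a genuinely measured κ of 0.0 renders as `0.000`, so it stays distinguishable from `pending`.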

Known failure modes

Things ContentRX doesn't reliably catch yet, or reporting choices that might otherwise look like bugs.

Review queue phase

No standards have a populated κ series yet. The review queue is in its seeding phase: Robo is annotating the industry corpus.

Current phase: early