ContentRX ethical framework
How I collect, attribute, and use external signal
ContentRX learns from public sources: design systems, style guides, and OSS repos that demonstrate content-design craft. Those sources deserve clear rules about how their work is used. Below are the five commitments I make. If any of them fail in practice, report it to hello@contentrx.io.
Commitment 1
Transparency
You can see every source I've ingested, when it was last crawled, and what it contributed.
The /sources page lists every design system, style guide, and OSS repo that has informed ContentRX. For each entry you'll see the last crawl timestamp, the license, and how the source contributed — moment weights, standard influences, examples corpus, or training signal. Hidden contributions would be indistinguishable from theft; the page is the accountability surface.
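To make the shape of a /sources entry concrete, here is a minimal sketch of one row as a Python record. The names SourceEntry, Contribution, and summary are hypothetical, and using SPDX identifiers in the license field is an assumption. Only the columns (last crawl timestamp, license, contribution kinds) and the four contribution kinds come from the description above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import FrozenSet


class Contribution(Enum):
    # The four ways a source can contribute, as listed on /sources.
    MOMENT_WEIGHTS = "moment weights"
    STANDARD_INFLUENCES = "standard influences"
    EXAMPLES_CORPUS = "examples corpus"
    TRAINING_SIGNAL = "training signal"


@dataclass(frozen=True)
class SourceEntry:
    # Hypothetical shape of one /sources row; field names are illustrative.
    name: str
    url: str
    license: str  # assumed to be an SPDX identifier, e.g. "MIT"
    last_crawled: datetime  # expected to be timezone-aware
    contributions: FrozenSet[Contribution] = field(default_factory=frozenset)

    def summary(self) -> str:
        # One human-readable line per source, contributions in a stable (sorted) order.
        kinds = ", ".join(sorted(c.value for c in self.contributions)) or "none"
        stamp = self.last_crawled.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
        return f"{self.name} ({self.license}), last crawled {stamp}: {kinds}"
```

Sorting the contribution labels keeps the rendered page stable between crawls, so a diff of /sources shows only real changes.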
Calibration reporting is the other half: /accuracy shows the measured system κ and the measured self-drift κ, each with a 95% confidence interval, alongside per-standard breakdowns. The design target is stated separately from the measurements. There is no composite “accuracy score.”
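A sketch of how a κ with a 95% interval could be computed. It assumes Cohen's κ between two labelings of the same strings (for system κ, engine verdicts against reference labels; for self-drift κ, one rater's labels on the same items at two points in time) and a percentile bootstrap for the interval. Both the statistic and the interval method are assumptions, not a description of what /accuracy actually runs; cohens_kappa and kappa_with_ci are illustrative names.

```python
import random
from collections import Counter
from typing import Sequence, Tuple


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa: agreement between two labelings, corrected for chance."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(a)
    agree = sum(x == y for x, y in zip(a, b))
    ca, cb = Counter(a), Counter(b)
    chance = sum(ca[k] * cb[k] for k in ca)  # expected chance agreements, times n
    if chance == n * n:
        # Both labelings use one identical label throughout; kappa is
        # undefined here, and returning 1.0 is a convention of this sketch.
        return 1.0
    p_o = agree / n
    p_e = chance / (n * n)
    return (p_o - p_e) / (1 - p_e)


def kappa_with_ci(
    a: Sequence[str], b: Sequence[str], n_boot: int = 2000, alpha: float = 0.05, seed: int = 0
) -> Tuple[float, float, float]:
    """Point estimate plus a percentile-bootstrap (1 - alpha) interval."""
    rng = random.Random(seed)  # seeded so a published interval is reproducible
    n = len(a)
    boots = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample items with replacement
        boots.append(cohens_kappa([a[i] for i in idx], [b[i] for i in idx]))
    boots.sort()
    lo = boots[int(alpha / 2 * n_boot)]
    hi = boots[int((1 - alpha / 2) * n_boot) - 1]
    return cohens_kappa(a, b), lo, hi
```

Resampling whole items (rather than each labeling independently) keeps every pair of labels together, which is what an agreement statistic needs.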
Commitment 2
Attribution
Examples drawn from public sources are cited with source, commit or URL, and license.
When a ContentRX standard is influenced by a specific design system or style guide, the influence is recorded on the standard itself. When examples in the docs or product come from a public source, the source is named inline — with a commit hash or URL when the exact revision matters. Anyone can trace a rule back to its lineage.
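A minimal sketch of an attribution record that enforces the citation rule above: a source, a license, and at least one of a URL or a commit hash. Attribution and cite are hypothetical names, and the comma-separated output format is illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Attribution:
    # Hypothetical record: the commitment requires source, commit or URL, and license.
    source: str
    license: str
    url: Optional[str] = None
    commit: Optional[str] = None

    def __post_init__(self) -> None:
        # Refuse to create a citation that can't be traced back to anything.
        if not (self.url or self.commit):
            raise ValueError("attribution needs a URL or a commit hash")

    def cite(self) -> str:
        # Inline citation: source, then URL and/or pinned commit, then license.
        parts = [self.source]
        if self.url:
            parts.append(self.url)
        if self.commit:
            parts.append(f"commit {self.commit}")
        parts.append(self.license)
        return ", ".join(parts)
```

Validating in __post_init__ means an untraceable citation fails when the record is created, not when someone later tries to follow it.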
Commitment 3
Respect
robots.txt, rate limits, a named bot, and a real opt-out path.
The external-signal crawler identifies itself as contentrx-research-bot, with a contact URL in the user-agent string. It honors robots.txt. It rate-limits requests so that no host sees a traffic spike from ContentRX. Projects that ask to be excluded are excluded, and signal already derived from them is removed from subsequent training.
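The crawler rules above map onto a small gate that runs before every fetch. This sketch uses Python's standard urllib.robotparser to evaluate robots.txt for the contentrx-research-bot agent, and enforces a per-host minimum interval that widens to any Crawl-delay the site declares. The 5-second default, the placeholder contact URL, the PoliteGate name, and the choice to refuse fetches until robots.txt has been loaded are all assumptions. The gate performs no network I/O itself.

```python
import time
import urllib.robotparser
from typing import Callable, Dict
from urllib.parse import urlsplit

BOT_NAME = "contentrx-research-bot"
CONTACT_URL = "https://example.invalid/bot"  # placeholder: the real contact URL isn't given here
USER_AGENT = f"{BOT_NAME} (+{CONTACT_URL})"  # sent by whatever HTTP client does the fetch


class PoliteGate:
    """Decides whether, and how soon, a URL may be fetched. Performs no network I/O."""

    def __init__(self, min_interval: float = 5.0, clock: Callable[[], float] = time.monotonic):
        self.min_interval = min_interval  # assumed floor, in seconds, between hits to one host
        self.clock = clock
        self.robots: Dict[str, urllib.robotparser.RobotFileParser] = {}
        self.last_hit: Dict[str, float] = {}

    def load_robots(self, host: str, robots_txt: str) -> None:
        # Parse robots.txt text that was fetched separately for this host.
        parser = urllib.robotparser.RobotFileParser()
        parser.parse(robots_txt.splitlines())
        self.robots[host] = parser

    def allowed(self, url: str) -> bool:
        parser = self.robots.get(urlsplit(url).netloc)
        if parser is None:
            return False  # conservative: no fetch until robots.txt is known
        return parser.can_fetch(BOT_NAME, url)

    def wait_seconds(self, url: str) -> float:
        # Seconds to wait before the next request to this URL's host.
        host = urlsplit(url).netloc
        delay = self.min_interval
        parser = self.robots.get(host)
        if parser is not None and parser.crawl_delay(BOT_NAME) is not None:
            delay = max(delay, float(parser.crawl_delay(BOT_NAME)))
        last = self.last_hit.get(host)
        if last is None:
            return 0.0
        return max(0.0, last + delay - self.clock())

    def record_hit(self, url: str) -> None:
        self.last_hit[urlsplit(url).netloc] = self.clock()
```

Injecting the clock keeps the rate limit testable without sleeping; a real crawler would call time.sleep(gate.wait_seconds(url)) before each allowed fetch and record_hit after it.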
To opt out, email hello@contentrx.io with subject line [OPTOUT] <source name>. I confirm receipt within a week and land the removal in the following release cycle. If you haven't heard back within seven days, email again — something went wrong with the routing on my end.
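A sketch of how the subject-line convention and the seven-day follow-up rule could be checked mechanically. The regular expression accepts only subjects that begin with [OPTOUT] followed by a non-empty source name. The function names and the choice to count the window in whole calendar days are assumptions.

```python
import re
from datetime import date, timedelta
from typing import Optional

# "[OPTOUT] <source name>" with a non-empty name; surrounding whitespace tolerated.
OPTOUT_SUBJECT = re.compile(r"^\s*\[OPTOUT\]\s+(?P<source>\S.*?)\s*$")
ACK_WINDOW = timedelta(days=7)


def parse_optout_subject(subject: str) -> Optional[str]:
    """Return the source name from an opt-out subject line, or None."""
    match = OPTOUT_SUBJECT.match(subject)
    return match.group("source") if match else None


def should_resend(sent_on: date, today: date, acknowledged: bool) -> bool:
    """True once more than seven days have passed with no acknowledgement."""
    return not acknowledged and (today - sent_on) > ACK_WINDOW
```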
Commitment 4
License-awareness
Permissive licenses default-in with attribution. GPL takes case-by-case review. No verbatim reproduction without credit.
- Permissive licenses — MIT, Apache 2.0, BSD, CC-BY — are default-in as training signal with attribution on /sources.
- GPL code is not ingested as training data without case-by-case review. The derivative-work surface on GPL is non-trivial and I err on the side of not creating it.
- Source strings are never reproduced verbatim as product output without attribution. If a standard's example is lifted directly from a public source, the source is named at the point of use.
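The three license rules can be sketched as a single gate. Several details are assumptions for illustration: the SPDX identifiers in PERMISSIVE, the decision to treat LGPL and AGPL like GPL, and the fallback for licenses the framework doesn't name (kept out until reviewed). The exact-match check in render_example is a deliberately naive stand-in for verbatim detection.

```python
from enum import Enum


class Decision(Enum):
    INCLUDE_WITH_ATTRIBUTION = "include, attribute on /sources"
    CASE_BY_CASE_REVIEW = "hold for case-by-case review"
    EXCLUDE_PENDING_REVIEW = "exclude until reviewed"


# Illustrative SPDX identifiers for the permissive licenses named above.
PERMISSIVE = frozenset({"MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause", "CC-BY-4.0"})
# Assumption: the whole GPL family (GPL, LGPL, AGPL) gets the same review.
GPL_FAMILY = ("GPL-", "LGPL-", "AGPL-")


def license_decision(spdx_id: str) -> Decision:
    if spdx_id in PERMISSIVE:
        return Decision.INCLUDE_WITH_ATTRIBUTION
    if spdx_id.startswith(GPL_FAMILY):
        return Decision.CASE_BY_CASE_REVIEW
    # Assumption: licenses the framework doesn't name stay out until someone looks.
    return Decision.EXCLUDE_PENDING_REVIEW


def render_example(text: str, source_text: str, citation: str) -> str:
    # Naive verbatim check: an exact match means the source is named at the point of use.
    if text.strip() == source_text.strip():
        return f"{text} (source: {citation})"
    return text
```

Defaulting unknown licenses to exclusion mirrors the "err on the side of not creating it" stance taken for GPL.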
Commitment 5
PII avoidance
User-submitted strings evaluated ephemerally by default. Stored evaluations are hashed.
- User text submitted to the engine is evaluated and then discarded by default. The evaluation result returns; the text doesn't persist.
- When evaluations ARE stored (logged violations, team analytics), the text is replaced with its SHA-256 hash before it touches the database. Raw strings never reach disk.
- The override dataset — what informs future rule calibration — carries no user identity. Only the actor's role-bucket (designer / engineer / PM / other) rides along for signal weighting.
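A minimal sketch of the storage path described above. The checked text is hashed with SHA-256 before anything is persisted, nothing is written unless a log is supplied, and override records carry only a coarse role bucket. The names evaluate, StoredViolation, and OverrideRecord, and the convention that check returns a violated rule id or None, are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, List, Optional

ROLE_BUCKETS = frozenset({"designer", "engineer", "pm", "other"})


def hash_text(text: str) -> str:
    # SHA-256 hex digest of the UTF-8 bytes; this is what gets persisted.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def role_bucket(role: str) -> str:
    # Collapse any role label into one of the four coarse buckets.
    r = role.strip().lower()
    return r if r in ROLE_BUCKETS else "other"


@dataclass(frozen=True)
class StoredViolation:
    text_sha256: str
    rule_id: str  # hypothetical identifier for the violated standard


@dataclass(frozen=True)
class OverrideRecord:
    rule_id: str
    role_bucket: str  # no user id, email, or raw text


def evaluate(
    text: str,
    check: Callable[[str], Optional[str]],
    log: Optional[List[StoredViolation]] = None,
) -> Optional[str]:
    """Run check on text; persist only a hash, and only when a log is supplied."""
    rule_id = check(text)  # the violated rule id, or None
    if log is not None and rule_id is not None:
        log.append(StoredViolation(hash_text(text), rule_id))
    return rule_id  # the raw text is never stored by this function
```

One caveat applies to any scheme like this: a plain SHA-256 digest of a short, guessable string can be matched by hashing candidate strings, so a keyed hash (for example HMAC-SHA-256 via Python's hmac module) is a common hardening step.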
How to opt out
If you maintain a project and don't want ContentRX to train on it, email hello@contentrx.io with subject [OPTOUT] <source name>. Include the repo URL or style-guide URL you're speaking for. No justification required.
The commitment on my side: confirm receipt within a week, stop fresh crawls in the next cycle, and best-effort remove already-derived signal in the release that follows.
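One way to apply an opt-out to both halves of the commitment (future crawls and already-derived signal) is a single registry consulted by both the crawl queue and the training-data loader. This is a sketch under assumptions: matching on host plus path prefix, the OptOutRegistry name, and the SignalRecord shape are illustrative. Filtering stored records is only as complete as the source URLs recorded on them, which is one reason removal of derived signal is best-effort.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple
from urllib.parse import urlsplit


def _key(url: str) -> Tuple[str, str]:
    # Compare on lower-cased host plus path, ignoring scheme, query, and a trailing slash.
    parts = urlsplit(url.strip())
    return parts.netloc.lower(), parts.path.rstrip("/")


@dataclass(frozen=True)
class SignalRecord:
    source_url: str  # where the signal was derived from
    payload: str  # hypothetical: whatever was extracted


class OptOutRegistry:
    def __init__(self) -> None:
        self._excluded: Set[Tuple[str, str]] = set()

    def add(self, url: str) -> None:
        self._excluded.add(_key(url))

    def is_excluded(self, url: str) -> bool:
        # Excluded if same host and the path is the opted-out path or lies beneath it.
        host, path = _key(url)
        return any(
            host == h and (path == p or path.startswith(p + "/"))
            for h, p in self._excluded
        )

    def filter_crawl_queue(self, urls: Iterable[str]) -> List[str]:
        # Stop fresh crawls of opted-out sources.
        return [u for u in urls if not self.is_excluded(u)]

    def filter_signal(self, records: Iterable[SignalRecord]) -> List[SignalRecord]:
        # Drop already-derived signal before the next training run.
        return [r for r in records if not self.is_excluded(r.source_url)]
```

Matching on a path prefix followed by "/" keeps an opt-out for one repo from accidentally excluding a sibling repo whose name merely starts the same way.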