wsref
In-band invisible references embedded in a document’s bytes. The in-band complement to
out-of-band critpath: a sidecar can drift from its content; bytes physically attached to the content
can’t. A reference, a lineage edge, or an integrity stamp rides invisibly with the text —
including across copy-paste and export to plaintext / chat / PDF, where visible metadata is
lost.
Stdlib-only Python.
How it works
Payload bytes → Unicode variation selectors (U+FE00–FE0F + U+E0100–E01EF, 256 values =
1 byte each), framed in a tiny envelope (magic · version · type · length · payload · CRC32).
Variation selectors modify the preceding glyph; with no applicable variant they render as
nothing, so a run rides invisibly after any anchor character.
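The mapping and envelope described above can be sketched in a few lines of stdlib Python. This is a minimal illustration, not wsref's actual wire format: the magic bytes, the version number, the field widths, and the function names `byte_to_vs`, `vs_to_byte`, `frame`, `embed`, and `extract` are all assumptions, since the text names the envelope fields but not their sizes or values.

```python
import struct
import zlib
from typing import Optional, Tuple

# Hypothetical envelope constants; the real magic/version values are not given in the text.
MAGIC = b"WS"
VERSION = 1

def byte_to_vs(b: int) -> str:
    """Map one byte to one selector: 0-15 -> U+FE00..FE0F, 16-255 -> U+E0100..E01EF."""
    return chr(0xFE00 + b) if b < 16 else chr(0xE0100 + b - 16)

def vs_to_byte(ch: str) -> Optional[int]:
    """Inverse mapping; returns None for any character that is not a variation selector."""
    cp = ord(ch)
    if 0xFE00 <= cp <= 0xFE0F:
        return cp - 0xFE00
    if 0xE0100 <= cp <= 0xE01EF:
        return cp - 0xE0100 + 16
    return None

def frame(ptype: int, payload: bytes) -> bytes:
    """magic · version · type · length · payload · CRC32 (assumed widths: 2, 1, 1, 2, n, 4)."""
    body = MAGIC + struct.pack(">BBH", VERSION, ptype, len(payload)) + payload
    return body + struct.pack(">I", zlib.crc32(body))

def embed(text: str, anchor: str, ptype: int, payload: bytes) -> str:
    """Insert the invisible run right after the first occurrence of the anchor."""
    idx = text.index(anchor) + len(anchor)
    run = "".join(byte_to_vs(b) for b in frame(ptype, payload))
    return text[:idx] + run + text[idx:]

def extract(text: str) -> Optional[Tuple[int, bytes, bool]]:
    """Decode a single frame; assumes every selector in the text belongs to it."""
    raw = bytes(b for b in (vs_to_byte(ch) for ch in text) if b is not None)
    if len(raw) < 10 or raw[:2] != MAGIC:
        return None
    _version, ptype, length = struct.unpack(">BBH", raw[2:6])
    end = 6 + length
    if len(raw) < end + 4:
        return None
    (crc,) = struct.unpack(">I", raw[end:end + 4])
    return ptype, raw[6:end], crc == zlib.crc32(raw[:end])

doc = embed("The publisher owns the thesis.", "publisher", 1, b"CANON.md#thesis")
print(extract(doc))
```

The visible text is unchanged: dropping every character for which `vs_to_byte` returns a value gives back the original string, and the embedded run adds one codepoint per framed byte.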
Commands

    cd tools/wsref

    # embed an invisible reference after the first match of an anchor
    python3 -m wsref encode doc.md --type ref --payload "orgs/openpanel/CANON.md#thesis" --anchor "publisher" -i
    python3 -m wsref decode doc.md      # list embedded payloads (+ crc status)

    # tamper-evident integrity (hash of the VISIBLE content, embedded invisibly)
    python3 -m wsref stamp doc.md -i
    python3 -m wsref verify doc.md      # exit 0 = unchanged, 1 = tampered, 2 = no stamp

    python3 -m wsref strip doc.md       # remove every invisible codepoint
    python3 -m wsref lint doc.md *.md   # surface every invisible codepoint, with location

    python3 -m wsref threat             # danger matrix: which payload modalities are dangerous
    python3 -m wsref scan doc.md *.md   # classify embedded payloads; exit 1 on HIGH/CRITICAL

Observability is the default
`wsref lint` reports every invisible codepoint (variation selectors, zero-width characters, and the
Unicode tag block) with line and column. You own this channel by reading it: a channel you parse
and validate by default is one you can’t be silently surprised on. Run lint in a pre-commit
hook and the repo refuses to absorb invisible characters you didn’t put there.
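To make the idea concrete, here is a rough sketch of what a lint pass of this kind could look like. It is not wsref's implementation: the function names `invisible_kind`, `lint`, and `check_files` are hypothetical, and the exact set of zero-width codepoints is an assumption (the text only says "zero-width").

```python
from typing import List, Optional, Tuple

# Assumed zero-width set: ZWSP, ZWNJ, ZWJ, WORD JOINER, ZERO WIDTH NO-BREAK SPACE (BOM).
ZERO_WIDTH = {0x200B, 0x200C, 0x200D, 0x2060, 0xFEFF}

def invisible_kind(ch: str) -> Optional[str]:
    """Classify a character as one of the invisible carriers, or None if it is ordinary."""
    cp = ord(ch)
    if 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF:
        return "variation-selector"
    if cp in ZERO_WIDTH:
        return "zero-width"
    if 0xE0000 <= cp <= 0xE007F:
        return "tag-block"
    return None

def lint(text: str) -> List[Tuple[int, int, str, str]]:
    """Return (line, column, codepoint, kind) for every invisible codepoint, 1-based."""
    findings = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            kind = invisible_kind(ch)
            if kind is not None:
                findings.append((lineno, col, f"U+{ord(ch):04X}", kind))
    return findings

def check_files(paths) -> int:
    """Pre-commit style status: 1 if any file carries an invisible codepoint, else 0."""
    status = 0
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            for lineno, col, cp, kind in lint(fh.read()):
                print(f"{path}:{lineno}:{col}: {cp} {kind}")
                status = 1
    return status
```

A hook would pass the staged file paths to `check_files` and use the return value as its exit status. Columns here count codepoints, not display cells or bytes.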
Threat model & in-construct defenses
Danger is not uniform. It depends on whether a consumer acts on the payload:
| Danger | Modality | Why |
|---|---|---|
| · inert | integrity stamp | opaque data; safe for anyone to read |
| ▫ low | relative reflink, provenance watermark | only a context redirect if auto-followed |
| ✗ high | external-URL / path-traversal reflink, bidi override | SSRF · exfil · redirect · Trojan-Source on resolve |
| ☠ critical | prompt-injection text, tag-block ASCII smuggle | LLM hijack on ingest |
So an innocuous watermark is not a disclosure problem; reflinks and injections are. The construct defeats specific vectors in-band:
- `encode` refuses to emit external/scheme/traversal reflinks (relative-path allowlist;
  `--allow-external` to override) and refuses payloads matching prompt-injection patterns. It
  will not author the dangerous shapes.
- `scan` flags foreign carriers wsref never emits: the Unicode tag block (ASCII smuggling), stray
  variation-selector runs (unknown smuggled data), zero-width characters, and bidi overrides. It
  exits nonzero on HIGH/CRITICAL, so wire it into CI / pre-commit.
- CRC catches frame corruption and casual tampering, but it is not cryptographic: anyone who
  edits a frame can recompute it. (Next: optional HMAC keying to cryptographically reject any
  payload not authored by you, the strongest defense against injected frames.)
- For consumers of untrusted docs: run `wsref strip` before feeding text to an LLM.
`wsref threat` prints the full matrix above from live fixtures (not stored payloads).
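As an illustration of the relative-path allowlist idea, the sketch below sorts a reflink target into the "low" or "high" row of the matrix. It is a simplified stand-in, not wsref's actual rule set: the names `reflink_risk` and `scan_exit_code` and the specific checks (URL scheme, absolute or protocol-relative path, `..` segment) are assumptions chosen to match the categories in the table.

```python
import re
from pathlib import PurePosixPath

# RFC 3986 style scheme prefix such as "https:" or "file:" (also matches "C:" drive letters).
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*:")

def reflink_risk(target: str) -> str:
    """Return 'low' for a plain relative path, 'high' for anything that can leave the tree."""
    # Ignore the fragment and treat backslashes as separators before inspecting the path.
    path = target.split("#", 1)[0].replace("\\", "/")
    if _SCHEME.match(target) or path.startswith("/"):
        return "high"  # external URL, protocol-relative URL, or absolute path
    if ".." in PurePosixPath(path).parts:
        return "high"  # path traversal
    return "low"

def scan_exit_code(risks) -> int:
    """Mirror the documented behavior: nonzero when anything is HIGH or CRITICAL."""
    return 1 if any(r in ("high", "critical") for r in risks) else 0

for target in ("orgs/openpanel/CANON.md#thesis", "https://example.com/x", "../../etc/passwd"):
    print(target, "->", reflink_risk(target))
```

An encoder following this pattern would refuse any target rated "high" unless the caller explicitly opts in, which is the role the text gives to `--allow-external`.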
Honest caveats (no overclaiming)
- “Invisible” is renderer-dependent. Compliant renderers and GitHub/most editors suppress
  lone variation selectors; some terminals and fonts draw a `.notdef` box. It’s invisible in the
  places that matter, not magically everywhere. Test your target surface.
- Fragile to deliberate normalization. `wsref strip`, normalization pipelines that drop selectors
  (NFC by itself keeps them), or aggressive sanitizers will remove the payload. That’s a feature
  for the defender (you can strip) and a constraint for the author (control your round-trip).
- Disclosure is a policy choice, not the tool’s. Declared-but-quiet (a visible note says
  “this doc carries an invisible layer”) vs covert (watermark / leak-trace). If you watermark
  artifacts that reach other people, disclose it per your own responsible-disclosure norms.
  The tool gives you the capability and the `lint` to keep it honest; the policy is yours.
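One way to "control your round-trip" is to test each transformation the text will pass through before relying on it. The sketch below is a hedged example of such a check, not part of wsref: `selectors`, `survives`, `nfc`, and `ascii_only` are hypothetical helper names, and the two pipelines are only stand-ins for whatever export or sanitizing step you actually use.

```python
import unicodedata
from typing import Callable, List

def selectors(text: str) -> List[str]:
    """All variation selectors in the text, in order."""
    return [ch for ch in text
            if 0xFE00 <= ord(ch) <= 0xFE0F or 0xE0100 <= ord(ch) <= 0xE01EF]

def survives(pipeline: Callable[[str], str], sample: str) -> bool:
    """True if the pipeline leaves the run of selectors in the sample untouched."""
    return selectors(pipeline(sample)) == selectors(sample)

def nfc(text: str) -> str:
    """Plain Unicode normalization, which has no mapping for variation selectors."""
    return unicodedata.normalize("NFC", text)

def ascii_only(text: str) -> str:
    """An aggressive sanitizer stand-in: discards everything outside ASCII."""
    return text.encode("ascii", "ignore").decode("ascii")

# A sample with two selectors after the anchor word; use a real encoded document in practice.
SAMPLE = "publisher\ufe01\U000e0142 owns the thesis"

for name, step in (("nfc", nfc), ("ascii_only", ascii_only)):
    print(name, "preserves payload:", survives(step, SAMPLE))
```

The check is only meaningful when the sample actually contains selectors; an empty sample passes trivially.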
Sits beside
critpath (visible, in-repo, diff-able lineage) · landgrab (namespace clearance) · this
(in-band provenance that survives leaving the repo).