
wsref

In-band invisible references embedded in a document's bytes. This is the in-band complement to critpath, which tracks lineage out-of-band: a sidecar can drift from its content, but bytes physically attached to the content can't. A reference, a lineage edge, or an integrity stamp rides invisibly with the text, including across copy-paste and export to plaintext, chat, or PDF, where visible metadata is lost.

Stdlib-only Python.

Payload bytes → Unicode variation selectors (U+FE00–FE0F + U+E0100–E01EF, 256 values = 1 byte each), framed in a tiny envelope (magic · version · type · length · payload · CRC32). Variation selectors modify the preceding glyph; with no applicable variant they render as nothing, so a run rides invisibly after any anchor character.
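The byte mapping and envelope can be sketched in a few lines of stdlib Python. This is a minimal illustration, not wsref's code: the magic bytes (`WS`), version number, type codes, and field widths (1-byte version and type, 2-byte big-endian length, 4-byte big-endian CRC32) are hypothetical choices, since the exact wire layout isn't documented here. The selector ranges and the one-selector-per-byte mapping follow the description above.

```python
from __future__ import annotations

import struct
import zlib

# Hypothetical envelope constants: wsref's real magic, version, type codes and
# field widths are not documented here, so these are illustrative only.
MAGIC = b"WS"
VERSION = 1
TYPES = {"ref": 1, "stamp": 2}


def byte_to_vs(b: int) -> str:
    """Map one byte (0-255) to exactly one variation selector."""
    if b < 16:
        return chr(0xFE00 + b)          # VS1..VS16: U+FE00-FE0F
    return chr(0xE0100 + b - 16)        # VS17..VS256: U+E0100-E01EF


def vs_to_byte(ch: str) -> int | None:
    """Inverse of byte_to_vs; returns None for anything that isn't a selector."""
    cp = ord(ch)
    if 0xFE00 <= cp <= 0xFE0F:
        return cp - 0xFE00
    if 0xE0100 <= cp <= 0xE01EF:
        return cp - 0xE0100 + 16
    return None


def frame(kind: str, payload: bytes) -> bytes:
    """Build magic · version · type · length · payload · CRC32."""
    header = MAGIC + struct.pack(">BBH", VERSION, TYPES[kind], len(payload))
    body = header + payload
    return body + struct.pack(">I", zlib.crc32(body))


def encode(kind: str, payload: str) -> str:
    """Turn a UTF-8 payload into an invisible run of variation selectors."""
    return "".join(byte_to_vs(b) for b in frame(kind, payload.encode("utf-8")))


def decode(run: str) -> tuple[int, str, bool]:
    """Parse one selector run back into (type code, payload, crc_ok)."""
    data = bytes(vs_to_byte(ch) for ch in run)  # assumes run is all selectors
    if data[:2] != MAGIC:
        raise ValueError("not a wsref-style frame")
    _version, kind, length = struct.unpack(">BBH", data[2:6])
    body = data[: 6 + length]
    (crc,) = struct.unpack(">I", data[6 + length : 10 + length])
    payload = data[6 : 6 + length].decode("utf-8")
    return kind, payload, zlib.crc32(body) == crc
```

With this layout, `"publisher" + encode("ref", "orgs/openpanel/CANON.md#thesis")` appends a 40-selector run (10 bytes of framing plus the 30-byte payload) that displays as plain "publisher" wherever lone selectors are suppressed.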

```sh
cd tools/wsref
# embed an invisible reference after the first match of an anchor
python3 -m wsref encode doc.md --type ref --payload "orgs/openpanel/CANON.md#thesis" --anchor "publisher" -i
python3 -m wsref decode doc.md # list embedded payloads (+ crc status)
# tamper-evident integrity (hash of the VISIBLE content, embedded invisibly)
python3 -m wsref stamp doc.md -i
python3 -m wsref verify doc.md # exit 0 = unchanged, 1 = tampered, 2 = no stamp
python3 -m wsref strip doc.md # remove every invisible codepoint
python3 -m wsref lint doc.md *.md # surface every invisible codepoint, with location
python3 -m wsref threat # danger matrix: which payload modalities are dangerous
python3 -m wsref scan doc.md *.md # classify embedded payloads; exit 1 on HIGH/CRITICAL
```
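The stamp/verify pair works by hashing only the visible content and hiding the digest in the invisible channel, so editing any visible character breaks the match. A minimal sketch of that idea follows. It assumes SHA-256 and a bare 32-selector run appended at the very end of the text, with no envelope; wsref's actual hash, placement, and framing are not documented here, so treat those details (and the helper names) as hypothetical. The exit codes mirror the ones listed for `verify`.

```python
import hashlib

VS_RANGES = ((0xFE00, 0xFE0F), (0xE0100, 0xE01EF))


def is_vs(ch: str) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in VS_RANGES)


def to_vs(b: int) -> str:
    return chr(0xFE00 + b) if b < 16 else chr(0xE0100 + b - 16)


def from_vs(ch: str) -> int:
    cp = ord(ch)
    return cp - 0xFE00 if cp <= 0xFE0F else cp - 0xE0100 + 16


def visible(text: str) -> str:
    """The content the stamp covers: everything except variation selectors."""
    return "".join(ch for ch in text if not is_vs(ch))


def stamp(text: str) -> str:
    """Append SHA-256(visible text) as 32 invisible selectors (assumed layout)."""
    digest = hashlib.sha256(visible(text).encode("utf-8")).digest()
    return text + "".join(to_vs(b) for b in digest)


def verify(text: str) -> int:
    """0 = unchanged, 1 = tampered, 2 = no stamp."""
    tail = text[-32:]
    if len(tail) < 32 or not all(is_vs(ch) for ch in tail):
        return 2
    claimed = bytes(from_vs(ch) for ch in tail)
    actual = hashlib.sha256(visible(text).encode("utf-8")).digest()
    return 0 if claimed == actual else 1
```

Appending at the end keeps the sketch short; a real tool would more likely anchor the run after a visible character, since a selector with no base glyph is exactly the kind of thing some renderers draw as a box.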

wsref lint reports every invisible codepoint (variation selectors, zero-width characters, and the Unicode tag block) with its line and column. You own this channel by reading it: a channel you parse and validate by default can't surprise you silently. Run lint in a pre-commit hook and the repo refuses to absorb invisible characters you didn't put there.
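A lint pass of this kind is a straightforward scan over codepoints. The sketch below is illustrative only: the `CARRIERS` table and the `lint` and `classify` names are hypothetical, and wsref's exact ranges may differ. It reports 1-based line and column, counting columns in codepoints rather than bytes.

```python
from __future__ import annotations

# Hypothetical carrier table; wsref's exact ranges and labels may differ.
CARRIERS = (
    (0xFE00, 0xFE0F, "variation selector"),     # VS1-VS16
    (0xE0100, 0xE01EF, "variation selector"),   # VS17-VS256
    (0x200B, 0x200D, "zero-width"),             # ZWSP, ZWNJ, ZWJ
    (0x2060, 0x2060, "zero-width"),             # word joiner
    (0xFEFF, 0xFEFF, "zero-width"),             # ZWNBSP / byte-order mark
    (0xE0000, 0xE007F, "tag block"),            # Unicode tag characters
)


def classify(ch: str) -> str | None:
    cp = ord(ch)
    for lo, hi, kind in CARRIERS:
        if lo <= cp <= hi:
            return kind
    return None


def lint(text: str) -> list[tuple[int, int, str, str]]:
    """Return (line, column, codepoint, kind) for every invisible codepoint."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            kind = classify(ch)
            if kind is not None:
                findings.append((lineno, col, f"U+{ord(ch):04X}", kind))
    return findings
```

In a pre-commit hook, a non-empty result for any staged file is the signal to reject the commit.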

Danger is not uniform. It's proportional to whether a consumer acts on the payload:

| Danger | Modality | Why |
|---|---|---|
| · inert | integrity stamp | opaque data; safe for anyone to read |
| ▫ low | relative reflink, provenance watermark | only a context redirect if auto-followed |
| ✗ high | external-URL / path-traversal reflink, bidi override | SSRF · exfil · redirect · Trojan-Source on resolve |
| ☠ critical | prompt-injection text, tag-block ASCII smuggle | LLM hijack on ingest |

So an innocuous watermark is not a disclosure problem; reflinks and injections are. The construct defeats specific vectors in-band:

  • encode refuses to emit external/scheme/traversal reflinks (relative-path allowlist; --allow-external to override) and refuses payloads matching prompt-injection patterns. It will not author the dangerous shapes.
  • scan flags foreign carriers that wsref never emits (Unicode tag-block characters used for ASCII smuggling, stray variation-selector runs carrying unknown data, zero-width characters, and bidi overrides) and exits nonzero on HIGH/CRITICAL findings, so it can gate CI or a pre-commit hook.
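  • The CRC32 catches accidental frame corruption. It is not an authenticity check: anyone who can edit the document can recompute it, so it does not stop forged frames. (Next: optional HMAC keying to cryptographically reject any payload not authored by you, the strongest defense against injected frames.)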
  • For consumers of untrusted docs: wsref strip before feeding text to an LLM.
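The encode-side refusal can be pictured as a relative-path allowlist plus a payload filter. This is a hedged sketch: `check_reflink`, `check_payload`, and the injection regexes are hypothetical stand-ins, since wsref's actual rules aren't listed here, and a short regex list is a tripwire, not a complete defense against prompt injection. The `allow_external` flag plays the role of `--allow-external`.

```python
import re
from urllib.parse import unquote, urlsplit

# Illustrative heuristics only; not wsref's real pattern list.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.I),
    re.compile(r"\bsystem prompt\b", re.I),
    re.compile(r"\byou are now\b", re.I),
]


def check_reflink(target: str, allow_external: bool = False) -> str:
    """Accept only repo-relative paths unless allow_external is set."""
    if allow_external:
        return target
    parts = urlsplit(target)
    if parts.scheme or parts.netloc:
        raise ValueError(f"external or scheme reflink refused: {target!r}")
    path = unquote(parts.path)  # catch %2e%2e-style encoded traversal
    if path.startswith("/") or "\\" in path:
        raise ValueError(f"absolute path refused: {target!r}")
    if ".." in path.split("/"):
        raise ValueError(f"path traversal refused: {target!r}")
    return target


def check_payload(text: str) -> str:
    """Refuse payload text that matches a known injection shape."""
    for pattern in INJECTION_PATTERNS:
        if pattern.search(text):
            raise ValueError(f"payload matches injection pattern {pattern.pattern!r}")
    return text
```

Rejecting any `..` segment, even one that would normalize back inside the repo, is deliberately conservative: it is simpler to reason about than resolving paths against a root.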

wsref threat prints the full matrix above from live fixtures (not stored payloads).
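A scan-style classifier that maps foreign carriers onto the danger matrix can be sketched as follows. Everything here is hypothetical naming: the tag-block (CRITICAL) and bidi (HIGH) severities follow the table, while the LOW rating for zero-width characters is this sketch's own assumption. Detecting stray variation-selector runs is omitted because it needs frame validation. Tag characters U+E0020–E007E mirror printable ASCII, so the sketch also decodes what was smuggled.

```python
BIDI = set(range(0x202A, 0x202F)) | set(range(0x2066, 0x206A))  # embeds, overrides, isolates
ZERO_WIDTH = {0x200B, 0x200C, 0x200D, 0x2060}
BLOCKING = {"HIGH", "CRITICAL"}


def scan(text: str) -> list[tuple[str, str, str]]:
    """Return (severity, carrier, detail) for each foreign carrier found."""
    findings = []
    tag_run = []
    for ch in text + "\n":  # trailing sentinel flushes a final tag run
        cp = ord(ch)
        if 0xE0000 <= cp <= 0xE007F:
            tag_run.append(cp)
            continue
        if tag_run:
            # Tag characters mirror printable ASCII: U+E0041 is a hidden "A".
            hidden = "".join(chr(c - 0xE0000) for c in tag_run if 0xE0020 <= c <= 0xE007E)
            findings.append(("CRITICAL", "tag block", repr(hidden)))
            tag_run = []
        if cp in BIDI:
            findings.append(("HIGH", "bidi control", f"U+{cp:04X}"))
        elif cp in ZERO_WIDTH:
            findings.append(("LOW", "zero-width", f"U+{cp:04X}"))
    return findings


def exit_code(findings: list[tuple[str, str, str]]) -> int:
    """Nonzero when anything HIGH or CRITICAL was found, for CI gating."""
    return 1 if any(sev in BLOCKING for sev, _, _ in findings) else 0
```

A CI wrapper would read each file, call `scan`, print the findings, and pass `exit_code(...)` to `sys.exit`.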

  • “Invisible” is renderer-dependent. Compliant renderers and GitHub/most editors suppress lone variation selectors; some terminals and fonts draw a .notdef box. It’s invisible in the places that matter, not magically everywhere. Test your target surface.
  • Fragile to deliberate stripping. wsref strip, or any sanitizer that removes default-ignorable codepoints, will remove the payload. Unicode normalization alone (NFC or NFKC) leaves variation selectors in place, so only pipelines that explicitly drop them are a problem. That's a feature for the defender (you can strip) and a constraint for the author (control your round-trip).
  • Disclosure is a policy choice, not the tool's. There are two postures: declared-but-quiet (a visible note says "this doc carries an invisible layer") or covert (watermark / leak-trace). If you watermark artifacts that reach other people, disclose it per your own responsible-disclosure norms. The tool gives you the capability and the lint to keep it honest; the policy is yours.
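The fragility caveat is easy to check with the standard library: variation selectors have no decomposition mappings, so NFC and NFKC pass them through untouched, and their general category is `Mn` (not `Cf`), so a sanitizer that only drops format characters misses them too. Only explicit stripping removes them. `strip_invisible` below is a hypothetical stand-in for what a strip pass does, not wsref's implementation.

```python
import unicodedata

INVISIBLE_RANGES = (
    (0xFE00, 0xFE0F),      # variation selectors VS1-VS16
    (0xE0100, 0xE01EF),    # variation selectors supplement VS17-VS256
    (0xE0000, 0xE007F),    # tag block
    (0x200B, 0x200D),      # ZWSP, ZWNJ, ZWJ
    (0x2060, 0x2060),      # word joiner
)


def strip_invisible(text: str) -> str:
    """Drop every codepoint in INVISIBLE_RANGES."""
    return "".join(
        ch for ch in text
        if not any(lo <= ord(ch) <= hi for lo, hi in INVISIBLE_RANGES)
    )


marked = "publisher" + "\ufe01\U000e0105"

# Normalization leaves the payload intact.
assert unicodedata.normalize("NFC", marked) == marked
assert unicodedata.normalize("NFKC", marked) == marked
# Selectors are nonspacing marks, so a "drop Cf" filter would keep them.
assert unicodedata.category("\ufe01") == "Mn"
# Explicit stripping removes the payload.
assert strip_invisible(marked) == "publisher"
```

Blanket stripping has a cost: it also removes legitimate VS16 emoji-presentation selectors and the ZWJ joins inside multi-codepoint emoji, which is one more reason to control the round-trip rather than rely on it.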

critpath (visible, in-repo, diff-able lineage) · landgrab (namespace clearance) · this (in-band provenance that survives leaving the repo).