The Clean Shop Floor

UCCA Data Architecture — Ownership, Provenance and Zero Retention

10 March 2026 | Tim Rignold + Claude

Strategic document. Investor-ready. Enterprise-ready. Defence-ready. Not a build brief.


1. The Five Principles

Everything in this document flows from five principles. If a design decision conflicts with any of these, the decision is wrong.

P1 — Your data is yours. Content generated by the UCCA engine belongs to the client. UCCA facilitated its creation. UCCA does not own it, does not hold it, and does not have rights over it. The client's R2 bucket is the sole store of record.

P2 — We do not housekeep other people's curiosities. UCCA is not a data warehouse. We run jobs, not archives. Every job produces an envelope that goes to the client. UCCA retains only what is necessary to issue an invoice and defend a billing dispute. Nothing else.

P3 — The shop floor is clean after every job. When a job completes, nothing remains in UCCA's systems except the transaction record. No content. No envelopes. No provenance chains. The signing key proves we were there. The transaction record proves what it cost. That is all UCCA needs.

P4 — Verification without possession. UCCA can prove any envelope is genuine without holding a copy of it. The client presents the envelope. UCCA verifies the cryptographic signature. The public key is published openly so anyone can verify independently — including auditors, regulators, and defence procurement officers — without UCCA being in the room.

P5 — Storage is the client's infrastructure decision. The default is a Cloudflare R2 bucket provisioned at onboarding, credentials transferred immediately, relationship directly with Cloudflare. But the client can nominate any storage backend — their AWS, their Azure, their on-premise server, their classified government cloud. UCCA writes to wherever they nominate. UCCA does not own the bucket, the credentials, or the relationship.

The One-Line Position: UCCA runs the job, hands you everything it produced, wipes the floor, and locks the door. Your data never lives here. Our shop floor is clean after every job.


2. The Job Envelope

Every job in the UCCA system travels with a job envelope. The envelope is created at the moment of client initiation and travels through every stage of the system, collecting stamps as it goes. The final signed envelope is the complete, self-contained, tamper-evident record of everything that happened: any alteration after signing invalidates the signature.

The envelope is both the routing instruction and the audit trail. Same object, dual purpose.

2.1 — Envelope structure

| Field | Type | Description |
|---|---|---|
| envelope_id | UUID | Globally unique. Referenced in billing record. Client uses this to request regeneration or modification. |
| parent_envelope_id | UUID / null | If this job is a modification or recontextualisation of a prior job, references the parent envelope ID. |
| schema_version | string | Envelope format version. Ensures future envelope readers can parse historical records. |
| initiated_at | ISO timestamp | When the client placed the order. L4 timestamp. |
| client_id | string | The client who initiated the job. |
| world_id | string | Which world: rtopacks, defence, usapacks, etc. |
| order_spec | object | What was ordered: output formats, cultural flavour, subprocessor preference, contextualisation parameters. |
| source_unit_code | string | The TGA or other corpus unit code this job was generated from. |
| triumvirate_hash | string | SHA-256 of the Triumvirate object that was fed to the engine. Proves the input was not altered. |
| engine_receipt | object | Stamped by engine on receipt: model used, engine version, queue position, receipt timestamp. |
| engine_completion | object | Stamped by engine on completion: quality contract result (PASS/FAIL per element), token counts, generation time, cost. |
| renderer_outputs | array | One entry per output module invoked (PDF, SCORM, script, VO, video). Each entry has its own completion stamp and output reference. |
| storage_destination | string | Where the envelope and outputs were written. Client's nominated storage backend. |
| signing_key_version | string | Which version of UCCA's signing key was used. Enables verification after key rotation. |
| signature | string | Cryptographic signature of the complete envelope. Produced by UCCA's private key. Verifiable against published public key. |
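
As a rough illustration of the structure above, the envelope could be modelled as a plain data class. This is a sketch under assumptions, not the engine's actual schema: the class name `JobEnvelope`, the helper `hash_triumvirate`, the sorted-key JSON form used before hashing, the `"1.0"` schema version, and all example values are hypothetical.

```python
import hashlib
import json
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


def hash_triumvirate(triumvirate: Dict[str, Any]) -> str:
    """SHA-256 over a canonical JSON form (sorted keys, compact separators).

    The canonical form is an assumption; the document only specifies SHA-256.
    """
    canonical = json.dumps(triumvirate, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass
class JobEnvelope:
    # Set at client initiation (stage 1 of the lifecycle).
    client_id: str
    world_id: str
    order_spec: Dict[str, Any]
    source_unit_code: str
    triumvirate_hash: str
    parent_envelope_id: Optional[str] = None
    schema_version: str = "1.0"  # hypothetical version label
    envelope_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    initiated_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
    # Stamped later by the engine, renderer layer and storage adapter.
    engine_receipt: Optional[Dict[str, Any]] = None
    engine_completion: Optional[Dict[str, Any]] = None
    renderer_outputs: List[Dict[str, Any]] = field(default_factory=list)
    storage_destination: Optional[str] = None
    signing_key_version: Optional[str] = None
    signature: Optional[str] = None


envelope = JobEnvelope(
    client_id="client_rtopacks_0042",
    world_id="rtopacks",
    order_spec={"outputs": ["pdf", "scorm"]},
    source_unit_code="EXAMPLE-UNIT",  # placeholder, not a real unit code
    triumvirate_hash=hash_triumvirate({"outcomes": ["a", "b"]}),
)
```

A freshly created envelope carries no stamps and no signature; those fields are filled in as the job moves through the stages described in 2.2.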

2.2 — The envelope lifecycle

| Stage | Actor | What happens | Envelope state |
|---|---|---|---|
| 1 | L4 Client | Client places order. Envelope created with order spec, client ID, world ID, initiated_at timestamp. | Created, unsigned |
| 2 | L3 → L2 → L1 | Envelope travels up the layer stack. Each layer validates permissions and can add context within its authority. Each touch is logged to the transaction record. | Validated, countersigned per layer |
| 3 | Engine | Engine receives envelope. Stamps engine_receipt. Processes Triumvirate. Generates content bundle. Stamps engine_completion with quality contract result, cost, model used. | Engine complete, content bundle attached |
| 4 | Renderer Layer | Renderer reads requested outputs from order_spec. Routes to each output module. Each module stamps its result onto renderer_outputs array. | Rendered, all outputs complete |
| 5 | Storage Adapter | Final envelope assembled. UCCA signs with private key. Envelope + all outputs written to client's nominated storage. Signing key version recorded. | Signed, written to client store |
| 6 | Transaction Log | UCCA records transaction: envelope_id, client_id, model, token counts, cost, timestamp, signing key version. Nothing else. | UCCA retains transaction record only |
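
The six stages can be read as a strict, forward-only state machine. The following is a minimal sketch of that reading, assuming hypothetical state names derived from the "Envelope state" column; the real system may track state differently.

```python
from enum import IntEnum


class EnvelopeState(IntEnum):
    CREATED = 1          # stage 1: client places order, envelope unsigned
    VALIDATED = 2        # stage 2: layers L3 -> L2 -> L1 have validated
    ENGINE_COMPLETE = 3  # stage 3: content bundle attached
    RENDERED = 4         # stage 4: all output modules finished
    SIGNED = 5           # stage 5: signed and written to the client store
    LOGGED = 6           # stage 6: only the transaction record remains


def advance(current: EnvelopeState) -> EnvelopeState:
    """Move to the next stage; stages can be neither skipped nor repeated."""
    if current is EnvelopeState.LOGGED:
        raise ValueError("job already complete")
    return EnvelopeState(current + 1)


# Walk one job through the full lifecycle.
state = EnvelopeState.CREATED
history = [state]
while state is not EnvelopeState.LOGGED:
    state = advance(state)
    history.append(state)
```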

2.3 — Regeneration and modification

If a client wants to regenerate a prior job identically, or modify and recontextualise it, they provide the original envelope from their storage. UCCA verifies the signature, reads the original Triumvirate hash and order spec, and creates a new envelope with parent_envelope_id referencing the original. The lineage is preserved in the envelope chain, not in UCCA's systems.
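
A sketch of how a child envelope might be derived from a client-supplied original. The function name `derive_envelope`, the dictionary representation, and the `verify` callable (a stand-in for the signature check described in section 5) are assumptions made for illustration.

```python
import uuid
from typing import Any, Callable, Dict, Optional

Envelope = Dict[str, Any]


def derive_envelope(
    original: Envelope,
    verify: Callable[[Envelope], bool],
    order_spec_changes: Optional[Dict[str, Any]] = None,
) -> Envelope:
    """Create a new, unsigned envelope whose parent is the verified original."""
    if not verify(original):
        raise ValueError("original envelope failed signature verification")
    # Identical regeneration keeps the order spec; modification overlays changes.
    new_spec = {**original["order_spec"], **(order_spec_changes or {})}
    return {
        "envelope_id": str(uuid.uuid4()),
        "parent_envelope_id": original["envelope_id"],
        "schema_version": original["schema_version"],
        "client_id": original["client_id"],
        "world_id": original["world_id"],
        "order_spec": new_spec,
        "source_unit_code": original["source_unit_code"],
        "triumvirate_hash": original["triumvirate_hash"],
        "signature": None,  # signed only at the end of the new job
    }


original = {
    "envelope_id": "11111111-1111-4111-8111-111111111111",
    "parent_envelope_id": None,
    "schema_version": "1.0",
    "client_id": "client_rtopacks_0042",
    "world_id": "rtopacks",
    "order_spec": {"outputs": ["pdf"], "flavour": "au"},
    "source_unit_code": "EXAMPLE-UNIT",
    "triumvirate_hash": "0" * 64,
    "signature": "stand-in",
}
child = derive_envelope(
    original,
    verify=lambda e: e["signature"] == "stand-in",  # placeholder check
    order_spec_changes={"outputs": ["pdf", "scorm"]},
)
```

The lineage lives entirely in `parent_envelope_id`: nothing in this sketch requires UCCA to have kept the original.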


3. The Renderer Layer

The engine's job ends when the content bundle is produced. The renderer layer's job begins there. The renderer reads the order spec from the envelope and routes the content bundle to the appropriate output modules.

Each output module is discrete, swappable, and independently versioned. The router does not care what modules exist — it reads the order spec and invokes whatever is registered.

3.1 — Output modules

| Module | Status | What it produces |
|---|---|---|
| PDF Renderer | Exists (to extract) | Formatted course document. Currently in engine; needs extraction to renderer layer. |
| SCORM Packer | Exists (to extract) | LMS-ready SCORM 1.2 package. 780 lines exist in backend/scripts/export_scorm_12.py; needs extraction. |
| Script Writer | Exists (to extract) | Narration-ready video script. Currently in generator/video_script.py; needs extraction. |
| Workbook Generator | Planned | Structured assessment workbook from Triumvirate outcomes. ~2hr build. Requires schema restructure first. |
| VO Generator | Future | Audio files from script. Third-party TTS integration. Module interface defined now, implementation later. |
| Video Processor | Future | Assembled video modules from VO + content. Complex pipeline. Module interface defined now. |
| Raw Stream | Planned | Content bundle returned as structured JSON stream. For clients who want the data and will render it themselves. |
| Marketing Copy | Exists (to extract) | Sales copy, SEO description, video script for marketing. Currently in generator/marketing.py. |

3.2 — Why the renderer layer increases throughput

The engine is almost certainly single-threaded — one job processes at a time through the content generation pipeline. This is correct for the generation stage, which is inherently sequential (each module builds on the prior one).

But rendering is embarrassingly parallel. Once the content bundle exists, the PDF renderer, SCORM packer, and script writer can all run simultaneously. They don't depend on each other. Keeping them inside the engine serialises work that doesn't need to be serial.

Extracting the renderer layer means the engine completes its job faster, returns to the queue sooner, and the output modules run in parallel on the content bundle. Throughput increases without touching the engine's core generation logic.
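
To make the parallelism concrete, here is a minimal sketch using Python's standard `concurrent.futures`. The three module functions are toy stand-ins (the `time.sleep` call simulates rendering work) and `render_all` is a hypothetical name; the real modules would be the extracted PDF, SCORM and script code.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def render_pdf(bundle: dict) -> str:
    time.sleep(0.05)  # stand-in for real rendering work
    return "pdf:" + bundle["title"]


def pack_scorm(bundle: dict) -> str:
    time.sleep(0.05)
    return "scorm:" + bundle["title"]


def write_script(bundle: dict) -> str:
    time.sleep(0.05)
    return "script:" + bundle["title"]


MODULES: Dict[str, Callable[[dict], str]] = {
    "pdf": render_pdf,
    "scorm": pack_scorm,
    "script": write_script,
}


def render_all(bundle: dict, requested: List[str]) -> Dict[str, str]:
    """Run the requested modules concurrently on the same read-only bundle."""
    with ThreadPoolExecutor(max_workers=max(1, len(requested))) as pool:
        futures = {name: pool.submit(MODULES[name], bundle) for name in requested}
        # result() blocks until that module finishes and re-raises its errors.
        return {name: future.result() for name, future in futures.items()}


outputs = render_all({"title": "Unit 1"}, ["pdf", "scorm", "script"])
```

Threads suit modules that mostly wait on I/O. If a module turns out to be CPU-bound, a process pool or separate workers would likely be the better fit.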


4. The Storage Adapter

The storage adapter is the interface between the renderer layer and wherever the client's data lives. Every storage backend implements the same interface. The renderer layer does not know or care which backend is active.

4.1 — The adapter interface contract

Any storage backend must implement these four operations:

  • write(envelope_id, path, data) — write a file to the client's store at the given path
  • read(envelope_id, path) — read a file from the client's store (used for regeneration)
  • exists(envelope_id, path) — check if a file exists without reading it
  • list(envelope_id, prefix) — list files under a given prefix

That is the complete interface. Four operations. Any backend that implements these four operations is a valid storage adapter.
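
The contract could be expressed as a Python `Protocol`, as in the sketch below. The type annotations (bytes in, bytes out, a list of path strings) and the `InMemoryAdapter` test double are assumptions; only the four operation names and their arguments come from the list above.

```python
from typing import Dict, List, Protocol, Tuple, runtime_checkable


@runtime_checkable
class StorageAdapter(Protocol):
    """The four operations every storage backend must implement."""

    def write(self, envelope_id: str, path: str, data: bytes) -> None: ...
    def read(self, envelope_id: str, path: str) -> bytes: ...
    def exists(self, envelope_id: str, path: str) -> bool: ...
    def list(self, envelope_id: str, prefix: str) -> List[str]: ...


class InMemoryAdapter:
    """Test double; a real backend would implement the same four methods."""

    def __init__(self) -> None:
        self._files: Dict[Tuple[str, str], bytes] = {}

    def write(self, envelope_id: str, path: str, data: bytes) -> None:
        self._files[(envelope_id, path)] = data

    def read(self, envelope_id: str, path: str) -> bytes:
        return self._files[(envelope_id, path)]

    def exists(self, envelope_id: str, path: str) -> bool:
        return (envelope_id, path) in self._files

    def list(self, envelope_id: str, prefix: str) -> List[str]:
        return sorted(
            p for (eid, p) in self._files
            if eid == envelope_id and p.startswith(prefix)
        )


adapter = InMemoryAdapter()
adapter.write("env-1", "outputs/course.pdf", b"%PDF")
adapter.write("env-1", "outputs/package.zip", b"PK")
adapter.write("env-1", "envelope.json", b"{}")
```

Because the renderer layer would depend only on `StorageAdapter`, swapping R2 for another backend should not touch renderer code.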

4.2 — Storage backends

| Backend | Target client | Status | Notes |
|---|---|---|---|
| Cloudflare R2 | Default (all clients) | Built | Provisioned at onboarding. Credentials transferred immediately. Client owns the bucket. UCCA has no ongoing access. |
| AWS S3 | Enterprise / AWS-native clients | Planned | Client provides bucket name and write credentials. UCCA writes, does not read except on explicit regeneration request. |
| Azure Blob Storage | Enterprise / Microsoft-native clients | Planned | Same pattern as S3. Client credentials, client container. |
| AWS GovCloud / classified | Defence clients | Future | Requires security assessment. Interface is identical; the backend implementation handles classification controls. |
| On-premise / private server | Sovereign / air-gapped clients | Future | SFTP or S3-compatible interface. Client manages their own infrastructure. UCCA writes via agreed protocol. |
| Client Cloudflare account | Clients with existing CF account | Planned | Client provides R2 credentials from their own CF account rather than UCCA-provisioned bucket. |

4.3 — Onboarding flow

At onboarding, the client is asked: where do you want your data?

  • Default — UCCA provisions an R2 bucket, transfers credentials to the client, relationship ends there. UCCA has no ongoing access to the bucket.
  • Bring your own — client provides storage credentials for their preferred backend. UCCA configures the appropriate adapter. UCCA never owns the storage relationship.

Credential transfer is a one-time event. When UCCA provisions an R2 bucket for a client, the credentials are transferred to the client at onboarding and UCCA's copy is deleted. From that point forward the client is the sole credential holder. UCCA cannot access the bucket. This is not a policy — it is a technical fact.


5. The Signing Key Model

UCCA holds exactly one thing that relates to client data: the private signing key. This key is used to sign the final envelope before it is written to the client's storage. It proves the envelope was produced by UCCA's engine and was not tampered with after leaving UCCA's systems.

5.1 — Key architecture

  • Private key: held by UCCA in Cloudflare Secrets Store. Never transmitted. Never stored alongside content. Used only to sign completed envelopes.
  • Public key: published openly at ucca.online/.well-known/signing-key-v{N}.pub. Anyone can verify a UCCA-signed envelope independently without contacting UCCA.
  • Key version: every envelope records which key version signed it. Historical envelopes remain verifiable after key rotation.

5.2 — What verification looks like

An auditor, regulator, or client presents an envelope and wants to verify it is genuine:

  1. Read the signing_key_version field from the envelope
  2. Fetch the corresponding public key from ucca.online/.well-known/signing-key-v{N}.pub
  3. Verify the envelope's signature field against the public key
  4. Signature validates → envelope is genuine, unaltered, produced by UCCA

UCCA does not need to be contacted. The auditor does not need access to UCCA's systems. The verification is entirely self-contained. This is the correct posture for a defence or enterprise audit.
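
The four steps can be sketched as a single function with the key fetch and the signature check passed in. Several things here are assumptions: the `https://` scheme, the rule that the signature covers every field except `signature` itself, and the sorted-key JSON canonical form. The document does not name a signature algorithm, so `demo_check` uses an HMAC purely as a stand-in that runs on the standard library. It is not public-key cryptography; a real verifier would call an asymmetric verify routine from a vetted crypto library.

```python
import hashlib
import hmac
import json
from typing import Any, Callable, Dict

KEY_URL_TEMPLATE = "https://ucca.online/.well-known/signing-key-{version}.pub"


def signed_payload(envelope: Dict[str, Any]) -> bytes:
    """Canonical bytes assumed to be covered by the signature."""
    body = {k: v for k, v in envelope.items() if k != "signature"}
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def verify_envelope(
    envelope: Dict[str, Any],
    fetch_key: Callable[[str], bytes],
    check: Callable[[bytes, bytes, str], bool],
) -> bool:
    version = envelope["signing_key_version"]                         # step 1
    public_key = fetch_key(KEY_URL_TEMPLATE.format(version=version))  # step 2
    # Steps 3 and 4: a passing check means genuine and unaltered.
    return check(public_key, signed_payload(envelope), envelope["signature"])


def demo_check(key: bytes, payload: bytes, signature: str) -> bool:
    """Stand-in only: symmetric HMAC instead of a real public-key verify."""
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)


demo_key = b"demo-key"
fetched_urls = []


def fake_fetch(url: str) -> bytes:
    fetched_urls.append(url)  # a real fetch would be an HTTP GET
    return demo_key


envelope = {"envelope_id": "e1", "client_id": "c1", "signing_key_version": "v2"}
envelope["signature"] = hmac.new(
    demo_key, signed_payload(envelope), hashlib.sha256
).hexdigest()

genuine = verify_envelope(envelope, fake_fetch, demo_check)
tampered = verify_envelope({**envelope, "client_id": "other"}, fake_fetch, demo_check)
```

In this sketch `genuine` is true and `tampered` is false: changing any signed field after signing breaks verification.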

5.3 — Key rotation policy

  • Keys are rotated on a defined schedule — annually at minimum, immediately on any suspected compromise
  • Old public keys are never removed from the well-known endpoint — they are retained permanently so historical envelopes remain verifiable
  • The current active key version is published at ucca.online/.well-known/signing-key-current.json
  • Each new key version increments the version number — v1, v2, v3 etc.

6. What UCCA Keeps — The Transaction Log

This is the complete list of what UCCA retains after a job completes. Nothing else is stored, cached, or retained anywhere in UCCA's systems.

| Field | Example | Kept | Purpose |
|---|---|---|---|
| envelope_id | env_01JXYZ... | Yes | Billing record |
| client_id | client_rtopacks_0042 | Yes | Billing record |
| world_id | rtopacks | Yes | Billing record |
| subprocessor | anthropic/claude-sonnet-4-6 | Yes | Billing record |
| tokens_in | 12,450 | Yes | Billing record |
| tokens_out | 8,230 | Yes | Billing record |
| cost_to_ucca_aud | 0.43 | Yes | Billing record |
| billed_to_client_aud | 1.20 | Yes | Billing record |
| completed_at | 2026-03-10T14:23:11Z | Yes | Billing record |
| signing_key_version | v2 | Yes | Billing record |
| status | completed | Yes | Billing record |
| content | (the generated course) | No | Never stored |
| envelope_body | (the full envelope JSON) | No | Never stored |
| triumvirate_data | (the input Triumvirate) | No | Never stored |
| client_storage_credentials | (R2 keys etc) | No | Never stored |
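
One way to enforce the "nothing else" rule is a record type that simply has no place to put content. The sketch below uses the field names and example values from the table; the function name `to_transaction_record` and the internal keys of `engine_completion` (`model`, `tokens_in`, `tokens_out`, `cost_aud`, `completed_at`) are assumptions.

```python
from dataclasses import asdict, dataclass
from decimal import Decimal


@dataclass(frozen=True)
class TransactionRecord:
    envelope_id: str
    client_id: str
    world_id: str
    subprocessor: str
    tokens_in: int
    tokens_out: int
    cost_to_ucca_aud: Decimal
    billed_to_client_aud: Decimal
    completed_at: str
    signing_key_version: str
    status: str


def to_transaction_record(envelope: dict, billed_aud: Decimal) -> TransactionRecord:
    """Copy only the billing fields; content is never read or copied."""
    done = envelope["engine_completion"]
    return TransactionRecord(
        envelope_id=envelope["envelope_id"],
        client_id=envelope["client_id"],
        world_id=envelope["world_id"],
        subprocessor=done["model"],
        tokens_in=done["tokens_in"],
        tokens_out=done["tokens_out"],
        cost_to_ucca_aud=done["cost_aud"],
        billed_to_client_aud=billed_aud,
        completed_at=done["completed_at"],
        signing_key_version=envelope["signing_key_version"],
        status="completed",
    )


envelope = {
    "envelope_id": "env-example",
    "client_id": "client_rtopacks_0042",
    "world_id": "rtopacks",
    "signing_key_version": "v2",
    "content": "the generated course",  # present on the envelope, never logged
    "engine_completion": {
        "model": "anthropic/claude-sonnet-4-6",
        "tokens_in": 12450,
        "tokens_out": 8230,
        "cost_aud": Decimal("0.43"),
        "completed_at": "2026-03-10T14:23:11Z",
    },
}
record = to_transaction_record(envelope, billed_aud=Decimal("1.20"))
```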

UCCA cannot reconstruct a lost envelope. If a client loses their envelope — deletes their bucket, lets credentials expire, suffers a storage failure on their end — UCCA cannot recover it. The transaction record proves the job ran and what it cost. It does not contain the content or the provenance chain. This is by design and clients must be informed at onboarding.

The billing record is self-sufficient. If a client disputes a charge, they present their envelope. UCCA verifies the signature and confirms the envelope_id matches the transaction log entry. Cost is confirmed. Dispute is resolved. No content needs to be examined.
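
The dispute flow reduces to two checks, sketched below. The function name `resolve_dispute`, the dictionary-backed log, the result strings, and the `verify` callable (a stand-in for the signature check in section 5) are hypothetical.

```python
from typing import Any, Callable, Dict


def resolve_dispute(
    envelope: Dict[str, Any],
    transaction_log: Dict[str, Dict[str, Any]],
    verify: Callable[[Dict[str, Any]], bool],
) -> str:
    """Confirm a charge from the signature and the log entry alone."""
    if not verify(envelope):
        return "rejected: signature invalid"
    record = transaction_log.get(envelope["envelope_id"])
    if record is None:
        return "rejected: no matching transaction"
    # No content is examined; only the billed amount is reported.
    return "confirmed: " + record["billed_to_client_aud"] + " AUD"


log = {"env-1": {"billed_to_client_aud": "1.20"}}


def accept(envelope: Dict[str, Any]) -> bool:
    return True  # placeholder for a real signature check


def reject(envelope: Dict[str, Any]) -> bool:
    return False


confirmed = resolve_dispute({"envelope_id": "env-1"}, log, accept)
unknown = resolve_dispute({"envelope_id": "env-9"}, log, accept)
forged = resolve_dispute({"envelope_id": "env-1"}, log, reject)
```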


7. The Commercial Position

This architecture is not just technically correct. It is a commercial differentiator in every market UCCA addresses.

7.1 — VET / RTOpacks

RTOs are acutely sensitive to data ownership. Their scope, their content, their staff records — these are their business assets. A platform that holds those assets has leverage over the RTO. UCCA explicitly does not. The content generated from an RTO's scope is their IP from the moment it is produced. UCCA has no claim on it and no copy of it.

7.2 — Enterprise

Data sovereignty is a procurement requirement in most large organisations. Legal, HR, and IT security will all ask where data lives and who has access. 'Your data lives in your cloud, we have no access' closes procurement conversations that 'your data lives in our platform' loses. The storage adapter model makes this technically provable, not just a policy claim.

7.3 — Defence

Defence procurement has specific requirements around data classification, sovereignty, and access. Many of these requirements make conventional SaaS platforms categorically ineligible. UCCA's architecture — client-owned storage, zero retention, cryptographic verification without UCCA involvement, classified cloud backend support — addresses every one of these requirements by design rather than by exception.

The Defence pitch is: we are not a vendor. We are infrastructure. We run the job, we prove we ran it correctly, we hand you everything, we retain nothing. You don't need to trust us with your data because we never have it.

7.4 — The anti-pattern to the market

Many AI platforms in the market right now train on client data, retain client content, or both. UCCA is the explicit counter-position. We don't improve our models on your content. We don't retain your work. We don't accumulate your IP. We are the infrastructure play and the data ownership model is the moat: it cannot be easily replicated by a platform that has already built its business model around data retention.


8. Open Questions

These are not resolved in this document. They need decisions before the storage adapter is built.

  • Retention policy communication — how explicitly does UCCA communicate the no-retention policy at onboarding? Does the client sign an acknowledgement that UCCA cannot recover a lost envelope? This is a legal and UX question.
  • Minimum retention period — UCCA retains the transaction log for billing. How long? Seven years matching ASQA record-keeping requirements seems correct for RTOpacks. Shorter for other worlds? Client-deletable?
  • Key escrow — should UCCA offer key escrow for enterprise clients who need guaranteed verification capability even if UCCA ceases to operate? This is a trust product extension worth considering.
  • Envelope versioning — when a client modifies an existing piece of content, the new envelope references the parent. How deep can the chain go? Is there a maximum lineage depth? Probably not, but worth noting.
  • Offline verification — the public key endpoint requires internet access. For classified or air-gapped environments, UCCA should provide the public key as a deliverable at onboarding so verification works offline permanently.

"Our shop floor is clean after every job." This is not a feature. It is the architecture.

Version History

| Version | Date | Change | Author |
|---|---|---|---|
| 1.0 | 2026-03-11 | Converted from UCCA-Clean-Shop-Floor-Data-Architecture.docx | Claude Code |