The Clean Shop Floor
UCCA Data Architecture — Ownership, Provenance and Zero Retention
10 March 2026 | Tim Rignold + Claude
Strategic document. Investor-ready. Enterprise-ready. Defence-ready. Not a build brief.
1. The Five Principles
Everything in this document flows from five principles. If a design decision conflicts with any of these, the decision is wrong.
P1 — Your data is yours. Content generated by the UCCA engine belongs to the client. UCCA facilitated its creation. UCCA does not own it, does not hold it, and does not have rights over it. The client's R2 bucket is the sole store of record.
P2 — We do not housekeep other people's curiosities. UCCA is not a data warehouse. We run jobs, not archives. Every job produces an envelope that goes to the client. UCCA retains only what is necessary to issue an invoice and defend a billing dispute. Nothing else.
P3 — The shop floor is clean after every job. When a job completes, nothing remains in UCCA's systems except the transaction record. No content. No envelopes. No provenance chains. The signature on each envelope proves we were there. The transaction record proves what it cost. That is all UCCA needs.
P4 — Verification without possession. UCCA can prove any envelope is genuine without holding a copy of it. The client presents the envelope. UCCA verifies the cryptographic signature. The public key is published openly so anyone can verify independently — including auditors, regulators, and defence procurement officers — without UCCA being in the room.
P5 — Storage is the client's infrastructure decision. The default is a Cloudflare R2 bucket provisioned at onboarding, credentials transferred immediately, relationship directly with Cloudflare. But the client can nominate any storage backend — their AWS, their Azure, their on-premise server, their classified government cloud. UCCA writes to wherever they nominate. UCCA does not own the bucket, the credentials, or the relationship.
The One-Line Position: UCCA runs the job, hands you everything it produced, wipes the floor, and locks the door. Your data never lives here. Our shop floor is clean after every job.
2. The Job Envelope
Every job in the UCCA system travels with a job envelope. The envelope is created at the moment of client initiation and travels through every stage of the system, collecting stamps as it goes. The final signed envelope is the complete, self-contained, tamper-evident record of everything that happened.
The envelope is both the routing instruction and the audit trail. Same object, dual purpose.
2.1 — Envelope structure
| Field | Type | Description |
|---|---|---|
| `envelope_id` | UUID | Globally unique. Referenced in the billing record. The client uses this to request regeneration or modification. |
| `parent_envelope_id` | UUID / null | If this job is a modification or recontextualisation of a prior job, references the parent envelope ID. |
| `schema_version` | string | Envelope format version. Ensures future envelope readers can parse historical records. |
| `initiated_at` | ISO 8601 timestamp | When the client placed the order. Stamped at L4. |
| `client_id` | string | The client who initiated the job. |
| `world_id` | string | Which world: rtopacks, defence, usapacks, etc. |
| `order_spec` | object | What was ordered: output formats, cultural flavour, subprocessor preference, contextualisation parameters. |
| `source_unit_code` | string | The TGA or other corpus unit code this job was generated from. |
| `triumvirate_hash` | string | SHA-256 hash of the Triumvirate object that was fed to the engine. Proves the input was not altered. |
| `engine_receipt` | object | Stamped by the engine on receipt: model used, engine version, queue position, receipt timestamp. |
| `engine_completion` | object | Stamped by the engine on completion: quality contract result (PASS/FAIL per element), token counts, generation time, cost. |
| `renderer_outputs` | array | One entry per output module invoked (PDF, SCORM, script, VO, video). Each entry has its own completion stamp and output reference. |
| `storage_destination` | string | Where the envelope and outputs were written: the client's nominated storage backend. |
| `signing_key_version` | string | Which version of UCCA's signing key was used. Enables verification after key rotation. |
| `signature` | string | Cryptographic signature over every other field of the envelope. Produced with UCCA's private key; verifiable against the published public key. |
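For readers who think in code, the table above can be read as a record type. The sketch below is a minimal Python dataclass, assuming the envelope is held as plain JSON-compatible values. The class name `JobEnvelope`, the field defaults, and the `schema_version` value `"1.0"` are illustrative assumptions, not a published schema.

```python
from __future__ import annotations

import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass
class JobEnvelope:
    """Hypothetical in-memory shape of the job envelope (fields from section 2.1)."""

    # Set by the client at stage 1 (required, so listed first).
    client_id: str
    world_id: str
    order_spec: dict[str, Any]
    source_unit_code: str
    triumvirate_hash: str = ""
    envelope_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    parent_envelope_id: Optional[str] = None
    schema_version: str = "1.0"  # assumed value; the document does not fix one
    initiated_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
    # Stamped by later stages; empty until then.
    engine_receipt: dict[str, Any] = field(default_factory=dict)
    engine_completion: dict[str, Any] = field(default_factory=dict)
    renderer_outputs: list[dict[str, Any]] = field(default_factory=list)
    storage_destination: str = ""
    signing_key_version: str = ""
    signature: str = ""  # empty until the storage adapter stage signs it

    def to_dict(self) -> dict[str, Any]:
        """Plain dict form, ready for JSON serialisation."""
        return asdict(self)
```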
2.2 — The envelope lifecycle
| Stage | Actor | What happens | Envelope state |
|---|---|---|---|
| 1 | L4 Client | Client places order. Envelope created with order spec, client ID, world ID, initiated_at timestamp. | Created — unsigned |
| 2 | L3 → L2 → L1 | Envelope travels up the layer stack. Each layer validates permissions and can add context within its authority. Each touch is logged to the transaction record. | Validated — countersigned per layer |
| 3 | Engine | Engine receives envelope. Stamps engine_receipt. Processes Triumvirate. Generates content bundle. Stamps engine_completion with quality contract result, cost, model used. | Engine complete — content bundle attached |
| 4 | Renderer Layer | The renderer reads the requested outputs from order_spec and routes the content bundle to each output module. Each module stamps its result onto the renderer_outputs array. | Rendered — all outputs complete |
| 5 | Storage Adapter | Final envelope assembled. UCCA signs with private key. Envelope + all outputs written to client's nominated storage. Signing key version recorded. | Signed — written to client store |
| 6 | Transaction Log | UCCA records transaction: envelope_id, client_id, model, token counts, cost, timestamp, signing key version. Nothing else. | UCCA retains transaction record only |
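The lifecycle is strictly ordered: an envelope cannot be signed before it is rendered. One way to enforce that is a small state machine that only permits single forward steps. This is a sketch, assuming one state per stage in the table; `EnvelopeState` and `advance` are hypothetical names.

```python
from enum import IntEnum


class EnvelopeState(IntEnum):
    """Envelope states from the lifecycle table, numbered by stage."""

    CREATED = 1          # L4 client places the order; unsigned
    VALIDATED = 2        # passed up L3 -> L2 -> L1
    ENGINE_COMPLETE = 3  # content bundle attached
    RENDERED = 4         # all requested outputs complete
    SIGNED = 5           # signed and written to the client store
    RECORDED = 6         # UCCA keeps the transaction record only


def advance(current: EnvelopeState, target: EnvelopeState) -> EnvelopeState:
    """Permit exactly one forward step; skipping or rewinding a stage is an error."""
    if target != current + 1:
        raise ValueError(f"cannot move from {current.name} to {target.name}")
    return target
```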
2.3 — Regeneration and modification
If a client wants to regenerate a prior job from the same inputs, or to modify and recontextualise it, they provide the original envelope from their storage. UCCA verifies the signature, reads the original Triumvirate hash and order spec, and creates a new envelope whose parent_envelope_id references the original. The lineage is preserved in the envelope chain, not in UCCA's systems.
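A minimal sketch of the regeneration path, assuming envelopes are handled as dicts and that signature verification is supplied by the caller. `new_child_envelope` and its `order_overrides` parameter are hypothetical. Only fields named in section 2.1 are carried over; everything after initiation is left for later stages to stamp.

```python
from __future__ import annotations

import uuid
from datetime import datetime, timezone
from typing import Any, Callable


def new_child_envelope(
    parent: dict[str, Any],
    verify: Callable[[dict[str, Any]], bool],
    order_overrides: dict[str, Any] | None = None,
) -> dict[str, Any]:
    """Start a new, unsigned envelope whose lineage points at a client-held parent."""
    # The client supplies the parent from their own storage; UCCA holds no copy.
    if not verify(parent):
        raise ValueError("parent envelope signature does not verify")
    # Regeneration passes no overrides; modification changes selected order_spec keys.
    order_spec = {**parent["order_spec"], **(order_overrides or {})}
    return {
        "envelope_id": str(uuid.uuid4()),
        "parent_envelope_id": parent["envelope_id"],
        "schema_version": parent["schema_version"],
        "initiated_at": datetime.now(timezone.utc).isoformat(),
        "client_id": parent["client_id"],
        "world_id": parent["world_id"],
        "order_spec": order_spec,
        "source_unit_code": parent["source_unit_code"],
        # Carried over so the engine can check the Triumvirate it receives matches.
        "triumvirate_hash": parent["triumvirate_hash"],
        # Engine, renderer and signing fields are stamped by later stages.
    }
```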
3. The Renderer Layer
The engine's job ends when the content bundle is produced. The renderer layer's job begins there. The renderer reads the order spec from the envelope and routes the content bundle to the appropriate output modules.
Each output module is discrete, swappable, and independently versioned. The router does not care what modules exist — it reads the order spec and invokes whatever is registered.
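The "read the order spec, invoke whatever is registered" behaviour is essentially a registry lookup. The sketch below assumes each module is a plain function that takes the content bundle and returns an output reference string. The registry, the `register` decorator, the `outputs` key in `order_spec`, and the two placeholder modules are all hypothetical.

```python
from typing import Any, Callable

# A renderer module takes the content bundle and returns an output reference.
RendererModule = Callable[[dict[str, Any]], str]

_REGISTRY: dict[str, RendererModule] = {}


def register(name: str) -> Callable[[RendererModule], RendererModule]:
    """Decorator that adds a module to the registry under `name`."""
    def wrap(fn: RendererModule) -> RendererModule:
        _REGISTRY[name] = fn
        return fn
    return wrap


@register("pdf")
def render_pdf(bundle: dict[str, Any]) -> str:
    return f"pdf:{bundle['title']}"  # placeholder output reference


@register("scorm12")
def pack_scorm(bundle: dict[str, Any]) -> str:
    return f"scorm12:{bundle['title']}"  # placeholder output reference


def route(order_spec: dict[str, Any], bundle: dict[str, Any]) -> list[dict[str, str]]:
    """Invoke each requested module; an unregistered output fails loudly."""
    outputs = []
    for name in order_spec["outputs"]:
        if name not in _REGISTRY:
            raise KeyError(f"no renderer registered for {name!r}")
        outputs.append({"module": name, "output_ref": _REGISTRY[name](bundle)})
    return outputs
```

Adding the planned Workbook Generator would then mean registering one more function, with no change to `route`.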
3.1 — Output modules
| Module | Status | What it produces |
|---|---|---|
| PDF Renderer | Exists — extract | Formatted course document. Currently in engine — needs extraction to renderer layer. |
| SCORM Packer | Exists — extract | LMS-ready SCORM 1.2 package. 780 lines exist in backend/scripts/export_scorm_12.py — needs extraction. |
| Script Writer | Exists — extract | Narration-ready video script. Currently in generator/video_script.py — needs extraction. |
| Workbook Generator | Planned | Structured assessment workbook from Triumvirate outcomes. ~2hr build. Requires schema restructure first. |
| VO Generator | Future | Audio files from script. Third-party TTS integration. Module interface defined now, implementation later. |
| Video Processor | Future | Assembled video modules from VO + content. Complex pipeline. Module interface defined now. |
| Raw Stream | Planned | Content bundle returned as structured JSON stream. For clients who want the data and will render it themselves. |
| Marketing Copy | Exists — extract | Sales copy, SEO description, video script for marketing. Currently in generator/marketing.py. |
3.2 — Why the renderer layer increases throughput
The engine is almost certainly single-threaded — one job processes at a time through the content generation pipeline. This is correct for the generation stage, which is inherently sequential (each module builds on the prior one).
But rendering is embarrassingly parallel. Once the content bundle exists, the PDF renderer, SCORM packer, and script writer can all run simultaneously. They don't depend on each other. Keeping them inside the engine serialises work that doesn't need to be serial.
Extracting the renderer layer means the engine completes its job faster, returns to the queue sooner, and the output modules run in parallel on the content bundle. Throughput increases without touching the engine's core generation logic.
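Because the output modules do not depend on each other, the fan-out can be sketched with Python's standard `concurrent.futures`. This assumes modules are I/O-bound (calling external services, writing files); CPU-heavy rendering in CPython would more likely need `ProcessPoolExecutor` or separate workers because of the global interpreter lock. `render_parallel` and the `slow_module` stand-in are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable

Module = Callable[[dict[str, Any]], str]


def render_parallel(modules: dict[str, Module], bundle: dict[str, Any]) -> dict[str, str]:
    """Run independent output modules concurrently on one finished content bundle."""
    with ThreadPoolExecutor(max_workers=max(1, len(modules))) as pool:
        futures = {name: pool.submit(fn, bundle) for name, fn in modules.items()}
        # .result() blocks until each module finishes and re-raises its exceptions.
        return {name: fut.result() for name, fut in futures.items()}


def slow_module(label: str) -> Module:
    """Stand-in module that simulates 0.2 s of I/O-bound rendering work."""
    def run(bundle: dict[str, Any]) -> str:
        time.sleep(0.2)
        return f"{label}:{bundle['title']}"
    return run
```

With three stand-in modules that each sleep 0.2 s, the call takes roughly 0.2 s rather than 0.6 s, which is the throughput argument above in miniature.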
4. The Storage Adapter
The storage adapter is the interface between the renderer layer and wherever the client's data lives. Every storage backend implements the same interface. The renderer layer does not know or care which backend is active.
4.1 — The adapter interface contract
Any storage backend must implement these four operations:
- `write(envelope_id, path, data)` — write a file to the client's store at the given path
- `read(envelope_id, path)` — read a file from the client's store (used for regeneration)
- `exists(envelope_id, path)` — check whether a file exists without reading it
- `list(envelope_id, prefix)` — list files under a given prefix
That is the complete interface. Four operations. Any backend that implements these four operations is a valid storage adapter.
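The four-operation contract maps naturally onto a Python `typing.Protocol`, which lets any backend satisfy the interface structurally without inheriting from a base class. The in-memory adapter below is a test double, assuming objects are keyed as `<envelope_id>/<path>`; it is not one of the backends in section 4.2, and both class names are hypothetical.

```python
from __future__ import annotations

from typing import Protocol, runtime_checkable


@runtime_checkable
class StorageAdapter(Protocol):
    """The complete contract: any backend implementing these four is valid."""

    def write(self, envelope_id: str, path: str, data: bytes) -> None: ...
    def read(self, envelope_id: str, path: str) -> bytes: ...
    def exists(self, envelope_id: str, path: str) -> bool: ...
    def list(self, envelope_id: str, prefix: str) -> list[str]: ...


class InMemoryAdapter:
    """Test double only. Objects are keyed as '<envelope_id>/<path>'."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def _key(self, envelope_id: str, path: str) -> str:
        return f"{envelope_id}/{path}"

    def write(self, envelope_id: str, path: str, data: bytes) -> None:
        self._objects[self._key(envelope_id, path)] = data

    def read(self, envelope_id: str, path: str) -> bytes:
        return self._objects[self._key(envelope_id, path)]

    def exists(self, envelope_id: str, path: str) -> bool:
        return self._key(envelope_id, path) in self._objects

    def list(self, envelope_id: str, prefix: str) -> list[str]:
        start = self._key(envelope_id, prefix)
        # Strip the '<envelope_id>/' part so callers see paths as they wrote them.
        return sorted(k[len(envelope_id) + 1 :] for k in self._objects if k.startswith(start))
```

An S3-compatible backend such as R2 or AWS S3 would map these operations onto the S3 `PutObject`, `GetObject`, `HeadObject` and `ListObjectsV2` calls.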
4.2 — Storage backends
| Backend | Target client | Status | Notes |
|---|---|---|---|
| Cloudflare R2 | Default — all clients | Built | Provisioned at onboarding. Credentials transferred immediately. Client owns the bucket. UCCA has no ongoing access. |
| AWS S3 | Enterprise / AWS-native clients | Planned | Client provides bucket name and write credentials. UCCA writes, does not read except on explicit regeneration request. |
| Azure Blob Storage | Enterprise / Microsoft-native clients | Planned | Same pattern as S3. Client credentials, client bucket. |
| AWS GovCloud / classified | Defence clients | Future | Requires security assessment. Interface is identical — backend implementation handles classification controls. |
| On-premise / private server | Sovereign / air-gapped clients | Future | SFTP or S3-compatible interface. Client manages their own infrastructure. UCCA writes via agreed protocol. |
| Client Cloudflare account | Clients with existing CF account | Planned | Client provides R2 credentials from their own CF account rather than UCCA-provisioned bucket. |
4.3 — Onboarding flow
At onboarding, the client is asked: where do you want your data?
- Default — UCCA provisions an R2 bucket, transfers credentials to the client, relationship ends there. UCCA has no ongoing access to the bucket.
- Bring your own — client provides storage credentials for their preferred backend. UCCA configures the appropriate adapter. UCCA never owns the storage relationship.
Credential transfer is a one-time event. When UCCA provisions an R2 bucket for a client, the credentials are transferred to the client at onboarding and UCCA's copy is deleted. From that point forward the client is the sole credential holder. UCCA cannot access the bucket. This is not a policy — it is a technical fact.
5. The Signing Key Model
UCCA holds exactly one thing that relates to client data: the private signing key. This key is used to sign the final envelope before it is written to the client's storage. It proves the envelope was produced by UCCA's engine and was not tampered with after leaving UCCA's systems.
5.1 — Key architecture
- Private key — held by UCCA in Cloudflare Secrets Manager. Never transmitted. Never stored alongside content. Used only to sign completed envelopes.
- Public key — published openly at `ucca.online/.well-known/signing-key-v{N}.pub`. Anyone can verify a UCCA-signed envelope independently without contacting UCCA.
- Key version — every envelope records which key version signed it. Historical envelopes remain verifiable after key rotation.
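A signature is only reproducible if signer and verifier agree on the exact bytes being signed. The document does not specify a serialisation or signature algorithm, so the sketch below assumes the envelope is serialised as JSON with sorted keys and no whitespace, with the `signature` field excluded (it cannot cover itself). The same canonical form gives a stable `triumvirate_hash`. A production system would more likely adopt RFC 8785 (JSON Canonicalization Scheme), which also pins down number formatting; the helper names here are hypothetical.

```python
import hashlib
import json
from typing import Any

# The signature cannot sign itself, so it is left out of the signed payload.
UNSIGNED_FIELDS: frozenset[str] = frozenset({"signature"})


def canonical_bytes(obj: dict[str, Any], exclude: frozenset[str] = frozenset()) -> bytes:
    """Deterministic JSON: sorted keys, no insignificant whitespace, UTF-8 encoded."""
    payload = {k: v for k, v in obj.items() if k not in exclude}
    return json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def signing_payload(envelope: dict[str, Any]) -> bytes:
    """The exact bytes the private key would sign and a verifier would re-create."""
    return canonical_bytes(envelope, UNSIGNED_FIELDS)


def triumvirate_hash(triumvirate: dict[str, Any]) -> str:
    """SHA-256 over the canonical form, so key order cannot change the hash."""
    return hashlib.sha256(canonical_bytes(triumvirate)).hexdigest()
```

The resulting bytes are what an asymmetric scheme such as Ed25519 would sign with the private key.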
5.2 — What verification looks like
An auditor, regulator, or client presents an envelope and wants to verify it is genuine:
- Read the `signing_key_version` field from the envelope
- Fetch the corresponding public key from `ucca.online/.well-known/signing-key-v{N}.pub`
- Verify the envelope's `signature` field against the public key and the rest of the envelope's contents
- Signature validates → envelope is genuine, unaltered, produced by UCCA
UCCA does not need to be contacted. The auditor does not need access to UCCA's systems. The verification is entirely self-contained. This is the correct posture for a defence or enterprise audit.
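The four verification steps can be sketched as a single function. Key retrieval and the signature primitive are injected as callables so the sketch stays library-neutral; that is an assumption, as are the `https://` scheme, the hex encoding of `signature`, and the reading of a `v2` version string as `signing-key-v2.pub`. The payload is re-created with the same sorted-key JSON form assumed for signing.

```python
import json
from typing import Any, Callable

# Assumed: https scheme; the document gives the path without one.
KEY_URL_TEMPLATE = "https://ucca.online/.well-known/signing-key-v{n}.pub"


def public_key_url(envelope: dict[str, Any]) -> str:
    """Steps 1-2: map signing_key_version (e.g. "v2") to its well-known URL."""
    version = envelope["signing_key_version"]
    return KEY_URL_TEMPLATE.format(n=version.removeprefix("v"))


def verify_envelope(
    envelope: dict[str, Any],
    fetch_key: Callable[[str], bytes],
    verify_sig: Callable[[bytes, bytes, bytes], bool],
) -> bool:
    """Steps 1-4 of section 5.2; crypto and key retrieval are supplied by the caller."""
    public_key = fetch_key(public_key_url(envelope))
    # Re-create the signed bytes: every field except the signature itself.
    payload = {k: v for k, v in envelope.items() if k != "signature"}
    message = json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    signature = bytes.fromhex(envelope["signature"])  # assumed hex encoding
    # Step 3; a True result means genuine and unaltered (step 4).
    return verify_sig(public_key, signature, message)
```

For air-gapped use, `fetch_key` could read a public key delivered at onboarding from local disk instead of the network. If the published key were a raw 32-byte Ed25519 key, `verify_sig` could wrap the `cryptography` package's `Ed25519PublicKey.from_public_bytes(key).verify(signature, message)`, which raises `InvalidSignature` on failure rather than returning `False`.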
5.3 — Key rotation policy
- Keys are rotated on a defined schedule — annually at minimum, immediately on any suspected compromise
- Old public keys are never removed from the well-known endpoint — they are retained permanently so historical envelopes remain verifiable
- The current active key version is published at `ucca.online/.well-known/signing-key-current.json`
- Each new key version increments the version number — v1, v2, v3 etc.
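The rotation rules above amount to an append-only key ring. The sketch assumes integer versions rendered as `v1`, `v2`, and so on, and a `signing-key-current.json` payload of the form `{"current": "v2"}`; the document names the file but not its contents, and `PublicKeyRing` is a hypothetical name.

```python
from dataclasses import dataclass, field


@dataclass
class PublicKeyRing:
    """Hypothetical model of what the well-known endpoint publishes."""

    keys: dict[int, bytes] = field(default_factory=dict)

    @property
    def current_version(self) -> int:
        return max(self.keys) if self.keys else 0

    def rotate(self, new_public_key: bytes) -> str:
        """Publish a new key as v{N+1}. Earlier keys are never removed."""
        version = self.current_version + 1
        self.keys[version] = new_public_key
        return f"v{version}"

    def current_json(self) -> dict[str, str]:
        """Assumed payload for signing-key-current.json."""
        return {"current": f"v{self.current_version}"}

    def key_for(self, version: str) -> bytes:
        """Look up any historical key, e.g. "v1", for verifying old envelopes."""
        return self.keys[int(version.removeprefix("v"))]
```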
6. What UCCA Keeps — The Transaction Log
This is the complete list of what UCCA retains after a job completes. Nothing else is stored, cached, or retained anywhere in UCCA's systems.
| Field | Example | Kept | Purpose |
|---|---|---|---|
| `envelope_id` | `env_01JXYZ...` | Yes | Billing record |
| `client_id` | `client_rtopacks_0042` | Yes | Billing record |
| `world_id` | `rtopacks` | Yes | Billing record |
| `subprocessor` | `anthropic/claude-sonnet-4-6` | Yes | Billing record |
| `tokens_in` | 12,450 | Yes | Billing record |
| `tokens_out` | 8,230 | Yes | Billing record |
| `cost_to_ucca_aud` | 0.43 | Yes | Billing record |
| `billed_to_client_aud` | 1.20 | Yes | Billing record |
| `completed_at` | `2026-03-10T14:23:11Z` | Yes | Billing record |
| `signing_key_version` | `v2` | Yes | Billing record |
| `status` | `completed` | Yes | Billing record |
| `content` | (the generated course) | No | Never stored |
| `envelope_body` | (the full envelope JSON) | No | Never stored |
| `triumvirate_data` | (the input Triumvirate) | No | Never stored |
| `client_storage_credentials` | (R2 keys, etc.) | No | Never stored |
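The table above can be enforced in the type itself: if the record has no field for content, content cannot be persisted by accident. This sketch assumes the envelope's `engine_receipt` carries a `model` key and `engine_completion` carries `tokens_in`, `tokens_out` and `cost_aud`; those sub-field names are hypothetical, as are `TransactionRecord` and `record_from_envelope`. Money is held as `Decimal` to avoid binary floating-point rounding.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal
from typing import Any


@dataclass(frozen=True)
class TransactionRecord:
    """Everything UCCA keeps per job: the 'Yes' rows of the table, nothing more."""

    envelope_id: str
    client_id: str
    world_id: str
    subprocessor: str
    tokens_in: int
    tokens_out: int
    cost_to_ucca_aud: Decimal
    billed_to_client_aud: Decimal
    completed_at: str
    signing_key_version: str
    status: str


def record_from_envelope(envelope: dict[str, Any], billed_aud: Decimal) -> TransactionRecord:
    """Copy billing fields out of the envelope; everything else is simply not kept."""
    receipt = envelope["engine_receipt"]        # hypothetical sub-field: "model"
    completion = envelope["engine_completion"]  # hypothetical: tokens_in/out, cost_aud
    return TransactionRecord(
        envelope_id=envelope["envelope_id"],
        client_id=envelope["client_id"],
        world_id=envelope["world_id"],
        subprocessor=receipt["model"],
        tokens_in=completion["tokens_in"],
        tokens_out=completion["tokens_out"],
        cost_to_ucca_aud=Decimal(str(completion["cost_aud"])),
        billed_to_client_aud=billed_aud,
        completed_at=datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        signing_key_version=envelope["signing_key_version"],
        status="completed",
    )
```

With the example values, the margin on the job is 1.20 - 0.43 = 0.77 AUD, computed exactly because both amounts are `Decimal`.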
UCCA cannot reconstruct a lost envelope. If a client loses their envelope — deletes their bucket, lets credentials expire, suffers a storage failure on their end — UCCA cannot recover it. The transaction record proves the job ran and what it cost. It does not contain the content or the provenance chain. This is by design and clients must be informed at onboarding.
The billing record is self-sufficient. If a client disputes a charge, they present their envelope. UCCA verifies the signature and confirms the envelope_id matches the transaction log entry. Cost is confirmed. Dispute is resolved. No content needs to be examined.
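The dispute path above can be sketched as a lookup keyed by `envelope_id`, run only after the signature verifies. The client-ID cross-check is an extra safeguard assumed here, not something the document specifies; `resolve_dispute` and the log's dict shape are hypothetical.

```python
from typing import Any, Callable


def resolve_dispute(
    envelope: dict[str, Any],
    transaction_log: dict[str, dict[str, Any]],
    verify: Callable[[dict[str, Any]], bool],
) -> dict[str, Any]:
    """Confirm the billed amount for a genuine, logged envelope."""
    if not verify(envelope):
        raise ValueError("envelope signature does not verify")
    record = transaction_log.get(envelope["envelope_id"])
    if record is None:
        raise LookupError("no transaction record for this envelope_id")
    if record["client_id"] != envelope["client_id"]:
        raise ValueError("envelope belongs to a different client")
    # Only billing fields are consulted; no content is examined.
    return {
        "envelope_id": envelope["envelope_id"],
        "billed_to_client_aud": record["billed_to_client_aud"],
    }
```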
7. The Commercial Position
This architecture is not just technically correct. It is a commercial differentiator in every market UCCA addresses.
7.1 — VET / RTOpacks
RTOs are acutely sensitive to data ownership. Their scope, their content, their staff records — these are their business assets. A platform that holds those assets has leverage over the RTO. UCCA explicitly does not. The content generated from an RTO's scope is their IP from the moment it is produced. UCCA has no claim on it and no copy of it.
7.2 — Enterprise
Data sovereignty is a procurement requirement in most large organisations. Legal, HR, and IT security will all ask where data lives and who has access. 'Your data lives in your cloud, we have no access' closes procurement conversations that 'your data lives in our platform' loses. The storage adapter model makes this technically provable, not just a policy claim.
7.3 — Defence
Defence procurement has specific requirements around data classification, sovereignty, and access. Many of these requirements make conventional SaaS platforms categorically ineligible. UCCA's architecture — client-owned storage, zero retention, cryptographic verification without UCCA involvement, classified cloud backend support — addresses every one of these requirements by design rather than by exception.
The Defence pitch is: we are not a vendor. We are infrastructure. We run the job, we prove we ran it correctly, we hand you everything, we retain nothing. You don't need to trust us with your data because we never have it.
7.4 — The anti-pattern to the market
Every major AI platform in the market right now either trains on client data, retains client content, or both. UCCA is the explicit counter-position. We don't improve our models on your content. We don't retain your work. We don't accumulate your IP. We are the infrastructure play and the data ownership model is the moat — it cannot be easily replicated by a platform that has already built its business model around data retention.
8. Open Questions
These are not resolved in this document. They need decisions before the storage adapter is built.
- Retention policy communication — how explicitly does UCCA communicate the no-retention policy at onboarding? Does the client sign an acknowledgement that UCCA cannot recover a lost envelope? This is a legal and UX question.
- Minimum retention period — UCCA retains the transaction log for billing. How long? Seven years matching ASQA record-keeping requirements seems correct for RTOpacks. Shorter for other worlds? Client-deletable?
- Key escrow — should UCCA offer key escrow for enterprise clients who need guaranteed verification capability even if UCCA ceases to operate? This is a trust product extension worth considering.
- Envelope versioning — when a client modifies an existing piece of content, the new envelope references the parent. How deep can the chain go? Is there a maximum lineage depth? Probably not, but worth noting.
- Offline verification — the public key endpoint requires internet access. For classified or air-gapped environments, UCCA should provide the public key as a deliverable at onboarding so verification works offline permanently.
"Our shop floor is clean after every job." This is not a feature. It is the architecture.
Version History
| Version | Date | Change | Author |
|---|---|---|---|
| 1.0 | 2026-03-11 | Converted from UCCA-Clean-Shop-Floor-Data-Architecture.docx | Claude Code |