05 - SIDK, and what it is NOT

Status. Draft v0.1 · First draft: 17-03-2026 · Co-authored from a structured discussion with the team.

Why this matters. Three things people get wrong about SIDK that this doc exists to fix. First, they think SIDK is a carbon platform that happens to be industry-neutral. It is the other way round: SIDK is an industry-neutral data-anchoring kernel that acquired a carbon engine because carbon was a natural fit. Second, they think SIDK is GREMI’s backend. It is not, in any sense that matters; the boundary between SIDK and GREMI is one of the most consequential design decisions in the entire architecture and it is not negotiable. Third, they think SIDK’s neutrality is a marketing posture. It is a structural property earned over six years of substrate work across multiple research domains, validated internally before carbon was ever the use case. This doc states what SIDK is, what it is explicitly not, and why the structural decisions that look unusual on first read are load-bearing.

Read this doc after 04 - Carbon as trust infrastructure. Doc 04 names the seven properties any serious carbon platform must deliver; this doc explains what SIDK is such that it delivers them.

1. What SIDK actually is

SIDK is the Sphuran Industrial Data Kernel: a vertical-agnostic infrastructure layer built to anchor data against physical reality (asset topologies, lot lineages, transformation events, and the sensors and actuators that interact with them) so that downstream models, operational tooling, traceability claims, and verifiable attestations have something concrete to attach to. It is six years old. Sphuran built it first as research infrastructure; it then became a research output that demonstrated the architecture worked, and it is now an engineering output suitable for productisation. The carbon engine on top of it is a recent build, but it rests on years of methodology work in research validation and measurement uncertainty, disciplines that happen to be exactly what the carbon-regulation moment needs.

SIDK does not have a vertical opinion. The same kernel that runs the steel pack today can run a cement pack, an aluminium pack, a pharmaceutical pack, a food-processing pack, or an agricultural pack tomorrow. The vertical is content (the pack), not code. The kernel stays stable; new verticals are author-the-pack work, not engineer-the-kernel work. This is the architectural commitment that lets the platform grow into new industries without forking the substrate.

SIDK has been used inside Sphuran for years on non-carbon problems: tracing how a product’s properties are affected by the processes, machines, and environmental parameters in its production chain; anchoring insights for medical equipment; and running operational control loops in test conditions. Carbon, via GREMI, is the first product taken to market on SIDK, but inside the building the kernel is a generalised tool that has been earning its design through multi-domain use for years.

2. What SIDK is NOT

The anti-list matters more than the positive description in this case, because the misconceptions about SIDK are specific and persistent.

SIDK is NOT GREMI’s backend. This is the single most important thing to clear up. A product backend is the application-server tier of a specific product: it implements the product’s workflows, the product’s user-facing data shapes, and the product’s commercial logic. GREMI has its own product backend. That backend consumes SIDK through stable contracts (OpenAPI, AsyncAPI, webhooks, SDKs; what the SIDK docs call the “Projection Boundary Layer”), but it is not part of SIDK. SIDK does not know about GREMI’s UI, GREMI’s customer-facing workflows, GREMI’s commercial roles, or GREMI’s product-side notifications. The boundary test (see §4 below) is the rule: if no signed, verified, regulated, or externally-relied-upon artefact needs to resolve and verify it later, it does not belong in SIDK. Most of GREMI’s product code is by definition not regulator-facing and therefore lives in GREMI’s backend, not in SIDK.

SIDK is NOT a carbon platform. It is a vertical-neutral substrate. The carbon engine is one capability built into it - significant, mature, and currently the differentiating feature commercially - but conceptually one of many possible. The same primitives serve any industry that produces physical artefacts through traceable transformation events with measurable inputs and outputs. Strip the carbon engine away and SIDK is still a useful substrate for materials informatics, operational analytics, supply-chain traceability, and verifiable supply-chain provenance generally.

SIDK is NOT a UI or a workflow product. It exposes data, events, and primitives. How a user-facing application renders them, what its persona model is, what its navigation looks like, where its notifications go: all of these are product decisions made by the Product Owner (Grevoro for GREMI; for any other product built on the same substrate, that product’s own Product Owner). SIDK’s design discipline includes a strict separation between “what is canonical and verifiable” (in SIDK) and “what is presented to a user” (in the product). See Foundation §4.7 (Canonical vs derived presentation).

SIDK is NOT mutable by inference. Anything an AI model, machine-learning system, or other non-deterministic process produces is derived, not canonical. It can read SIDK’s data; it cannot write to SIDK’s canonical state. Where an inference output proposes a change - a factor adjustment, an anomaly resolution, a methodology update - the proposal routes through a Submission, with a Maker and a Checker, both human, with the change recorded on the audit chain. This is the discipline that lets AI scale on the platform without compromising the trust substrate. See §8 below for what this enables.
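A minimal Python sketch of that rule, assuming a simplified Submission object. Principal, Kind, Submission, make, check, and apply are hypothetical names for illustration, not SIDK’s API. It shows the two invariants the paragraph states: a model may propose a change, but canonical state changes only after two distinct human identities have approved it, and both approvals are recorded.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Kind(Enum):
    HUMAN = "human"
    SERVICE = "service"  # e.g. an ML model or other non-deterministic process


@dataclass(frozen=True)
class Principal:
    id: str
    kind: Kind


@dataclass
class Submission:
    proposal: dict           # e.g. a factor adjustment proposed by a model
    proposed_by: Principal   # may be a service: proposing is not writing
    maker: Principal | None = None
    checker: Principal | None = None
    audit: list = field(default_factory=list)

    def make(self, who: Principal) -> None:
        if who.kind is not Kind.HUMAN:
            raise PermissionError("Maker must be human")
        self.maker = who
        self.audit.append(("made", who.id))

    def check(self, who: Principal) -> None:
        if self.maker is None:
            raise PermissionError("no Maker yet")
        if who.kind is not Kind.HUMAN:
            raise PermissionError("Checker must be human")
        if who.id == self.maker.id:
            raise PermissionError("Checker must differ from Maker")
        self.checker = who
        self.audit.append(("checked", who.id))


def apply(canonical: dict, sub: Submission) -> dict:
    """Return a new canonical state; refuse anything not Maker-Checker approved."""
    if sub.maker is None or sub.checker is None:
        raise PermissionError("Submission not approved")
    return {**canonical, **sub.proposal}
```

In this sketch the model appears only as `proposed_by`; nothing it holds can call `apply` successfully on its own.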

SIDK is NOT cross-tenant by default. Each Tenant is authenticated against its own dedicated identity realm; cross-Tenant data access is architecturally impossible for Tenant-scoped principals. Verifiers operate cross-Tenant within their accredited firm’s portfolio, with access to any specific Tenant’s data gated by an active engagement record. Sphuran personnel running platform-level governance see registry-level data, not Tenant operational data. The five-realm isolation model (Tenant, Grevoro, Verifier, Sphuran, Public) is enforced by the kernel; crossing a realm boundary is a system-design failure, not a user-flow convenience.
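Here is a minimal Python sketch of the access rule this paragraph describes, under simplifying assumptions: Principal, Engagement, and can_read_tenant_data are hypothetical names, engagement scopes are plain strings, and permissions inside a Tenant are ignored. It shows the behaviours stated above: Tenant principals never cross Tenants, Verifiers cross only through an active, in-scope engagement for their firm, and Sphuran personnel get no Tenant operational data. Denying Grevoro and Public principals as well is an assumption of the sketch; the paragraph only states the Sphuran case.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date

# The five realms named in the doc.
REALMS = frozenset({"Tenant", "Grevoro", "Verifier", "Sphuran", "Public"})


@dataclass(frozen=True)
class Principal:
    id: str
    realm: str
    tenant_id: str | None = None  # set only for Tenant-scoped principals
    firm_id: str | None = None    # set only for Verifier principals

    def __post_init__(self):
        if self.realm not in REALMS:
            raise ValueError(f"unknown realm: {self.realm}")


@dataclass(frozen=True)
class Engagement:
    firm_id: str
    tenant_id: str
    scopes: frozenset[str]
    start: date
    end: date  # inclusive; access expires after this date

    def permits(self, firm_id: str | None, tenant_id: str, scope: str, today: date) -> bool:
        return (self.firm_id == firm_id and self.tenant_id == tenant_id
                and scope in self.scopes and self.start <= today <= self.end)


def can_read_tenant_data(p: Principal, tenant_id: str, scope: str,
                         today: date, engagements: list[Engagement]) -> bool:
    if p.realm == "Tenant":
        # A Tenant-scoped principal can only ever see its own Tenant.
        return p.tenant_id == tenant_id
    if p.realm == "Verifier":
        # Cross-Tenant only via an active, in-scope engagement for the verifier's firm.
        return any(e.permits(p.firm_id, tenant_id, scope, today) for e in engagements)
    # Sphuran (registry-level only), and by assumption Grevoro and Public:
    # no Tenant operational data in this sketch.
    return False
```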

SIDK is NOT a marketplace or a cross-tenant search system. There is no inter-Tenant discovery surface, no benchmark database that aggregates across customers, no “industry comparison” feature that pulls from Tenant data. Aggregation across Tenants is out of scope by design - both because the trust model would be compromised and because no Tenant has consented to its data being read by another. If such a primitive ever ships, it will be a deliberate addition with a substantial governance discussion, not a quiet feature add.

3. The seven primitives, in plain language

SIDK exposes its capability through seven primitives. The list below paraphrases architecture.md; the source is canonical.

1. Identity & Isolation Layer. Who is authenticated, in which realm, with what scope. Tenant identities, Verifier-firm identities, Grevoro-engineer identities, Sphuran-staff identities. JWT scope vocabulary. Row-level security at the storage layer. The discipline that makes Tenant data un-leakable across the five realms.

2. Live Data Layer. Edge agents, ingestion contracts, telemetry streaming over Server-Sent Events, replay on disconnection, quality flags per data point. How real sensor data physically reaches the kernel and stays addressable. See 07 - Inputs and sensors for the operational details.

3. Canonical Fact Layer. The structured representation of everything that matters about what a Tenant produces. Variables, factors, methods, asset topology, lot lineage, calculation results, locked snapshots, evidence, compliance filings. Canonical means: produced by the kernel, audit-grade, citable as evidence. The CalculationResults the Carbon Engine produces (06 - The Carbon Engine) live here.

4. Governance Layer. Submission lifecycle. Maker-Checker discipline. Period lock and restatement. Append-only audit chain. Cascade orchestration. The machinery that ensures every state-changing operation has two independent identities recorded on it, every change is in a tamper-evident chain, and history is not silently mutable. This is what turns “database” into “audit-grade record”.

5. Trust Layer. Passports - verifiable credentials describing a product’s, process’s, lot’s, or shipment’s carbon footprint, signed by the Tenant under its own DID-registered key. W3C Verifiable Credentials Data Model. JCS canonicalisation. Signing keys, revocation lists, the public-verification payload that resolves end-to-end on a buyer’s phone. This is where the three-signature chain composes.

6. Verifier Authorization Layer. Engagements (which verifier is engaged with which Tenant, on what scope, for what period). Accreditation binding (the verifier’s accredited status is bound to the registry). Engagement scopes (a verifier sees the data their engagement authorises, no more). Access windows (engagements have start and end dates; data access expires). VerificationStatements (the verifier’s signed output). Externally-relied-upon findings (the verifier’s challenges and the Tenant’s responses, retained on the chain). This is the part of SIDK that most people are surprised exists as a first-class primitive - it usually lives informally inside an audit firm’s processes. Making it a kernel primitive is part of what makes the trust chain externally auditable.

7. Projection Boundary Layer. The stable contracts a product backend consumes - OpenAPI (synchronous reads and writes), AsyncAPI (events), webhooks (server-to-server callbacks), SDK packages. The Projection Boundary is the seam between what is canonical (in SIDK) and what is product-specific (in GREMI’s backend, or any other Product Owner’s backend). It is also where the “many products on one substrate” property lives: every product reads SIDK through these contracts, none of them entangles with SIDK’s internals.
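The Governance Layer’s append-only, tamper-evident audit chain can be illustrated with a classic hash chain, in which each entry’s hash covers the previous entry’s hash plus the entry’s own content, so editing or reordering any past entry breaks every link after it. This Python sketch assumes SHA-256 and a sorted-key JSON encoding; AuditChain, entry_hash, and GENESIS are hypothetical names, and the normative chain format is whatever architecture.md specifies.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    # Sorted keys and no whitespace, so equal payloads always encode identically.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


class AuditChain:
    """Append-only: there is deliberately no update or delete method."""

    def __init__(self) -> None:
        self._entries: list[tuple[dict, str]] = []

    def append(self, payload: dict) -> str:
        prev = self._entries[-1][1] if self._entries else GENESIS
        h = entry_hash(prev, payload)
        self._entries.append((dict(payload), h))  # store a copy, not the caller's dict
        return h

    def verify(self) -> bool:
        # Recompute every link; an edited payload or reordered entry fails.
        prev = GENESIS
        for payload, h in self._entries:
            if entry_hash(prev, payload) != h:
                return False
            prev = h
        return True
```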

4. The boundary test - the rule that determines what belongs in SIDK

The README of the SIDK doc set states the rule as:

Must a signed, verified, regulated, or externally-relied-upon artefact be able to resolve and verify this later?

If yes, it belongs in SIDK. If no, it belongs in the product backend.

The test sounds simple. It is load-bearing. Apply it consistently and the architecture works: SIDK stays small enough to maintain trustworthy invariants on, and product backends stay free to evolve at product speed. Apply it sloppily and one of two failure modes appears: either the product backend accretes responsibilities that should be in the substrate (and the audit chain develops holes) or SIDK accretes responsibilities that should be in the product (and the substrate becomes brittle and slow to evolve).

Concrete examples of how the test resolves:

A passport signed by the Tenant attesting to a lot’s carbon footprint. Externally-relied-upon, regulated, must resolve later. SIDK.
A notification telling the Tenant’s sustainability lead that the passport was signed. Product-side concern. GREMI backend.
A CalculationResult with method versions, factor IDs, evidence breakdown, and uncertainty bands. The verifier walks this. The regulator audits this. SIDK.
A dashboard summarising the CalculationResults across the month. Presentation. GREMI backend.
An engagement record naming which verifier is engaged with which Tenant on what scope. Anchors authorisation; verifier-relied-upon. SIDK.
Whose Slack channel is notified when the engagement begins. Product-side. GREMI backend.
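The six examples above can be encoded to show the test’s shape: one question, binary outcome. This is an illustrative Python sketch only; Item, needed_later_by, TRUSTED_KINDS, and belongs_in are hypothetical names. In practice, deciding which artefacts must resolve an item later is the human judgment described in classification-methodology.md, not something a function infers.

```python
from __future__ import annotations

from dataclasses import dataclass

# The four kinds of artefact the boundary test names.
TRUSTED_KINDS = frozenset({"signed", "verified", "regulated", "externally_relied_upon"})


@dataclass(frozen=True)
class Item:
    name: str
    # Which kinds of trusted artefact must be able to resolve and verify this item later.
    # Filling this in is the human judgment; belongs_in only applies the rule.
    needed_later_by: frozenset[str] = frozenset()


def belongs_in(item: Item) -> str:
    return "SIDK" if item.needed_later_by & TRUSTED_KINDS else "product backend"


# The worked examples from this section, encoded.
EXAMPLES = [
    Item("lot passport", frozenset({"signed", "regulated", "externally_relied_upon"})),
    Item("passport-signed notification"),
    Item("CalculationResult", frozenset({"verified", "regulated"})),
    Item("monthly results dashboard"),
    Item("verifier engagement record", frozenset({"externally_relied_upon"})),
    Item("engagement-start Slack notification"),
]
```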

The test resolves every interesting question this way. The discipline in applying it consistently is what keeps SIDK clean.

5. Pack-as-data - the architectural commitment that makes generalisation cheap

The single most consequential design choice in SIDK is that the vertical content lives as data, not code. The kernel ships with a defined set of capabilities - telemetry ingestion, lot lineage, transformation events, factor registry, calculation engine, audit chain, passport framework. What kind of plant produces what kind of products through what kind of process - the steel-specific knowledge of BF-BOF routes, the cement-specific knowledge of clinker production, the pharmaceutical-specific knowledge of batch processes - lives in a pack. The pack declares the node classes (a BLAST_FURNACE is a NodeClass; a BOF is another), the expected variables, the preferred calculation methods per source, the factor library, the allocation rules, the regime mappings.

What this gives you:

Adding a new vertical is a pack-authoring task, not a kernel-engineering task. A cement pack involves deep research on cement production - clinker chemistry, kiln operations, supply-chain inputs - and the result is structured into a data file the kernel reads. The kernel itself stays stable. No risk of breaking steel customers when cement is added.
Different plant types within a vertical are also pack-content questions. The steel pack captures all the principal production routes (BF-BOF, EAF, DRI-EAF, scrap-EAF, H₂-DRI). A new plant configuration that uses, say, a different combination of pre-heater plus EAF is a pack-content adjustment, not a code change.
Different products in the same vertical, similarly. A steel pack handles HRC, plate, rebar, structural sections, wire rod - each a different product category with different allocation rules, but all expressible inside the same pack format.
The product code stays stable. GREMI’s code does not change when a new pack is added. It learns about the new pack through the kernel’s metadata APIs and renders accordingly.
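A minimal Python sketch of the pack-as-data split: the pack is a plain data document and the kernel code is pack-agnostic. The schema keys (nodeClasses, expectedVariables), the variable names, and the Kernel methods are hypothetical; real packs also carry preferred calculation methods, factor libraries, allocation rules, and regime mappings, all omitted here.

```python
import json

# A pack is data. This JSON shape is illustrative, not SIDK's pack format.
STEEL_PACK = json.loads("""
{
  "pack": "steel",
  "nodeClasses": {
    "BLAST_FURNACE": {"expectedVariables": ["coke_t", "hot_metal_t"]},
    "BOF": {"expectedVariables": ["hot_metal_t", "scrap_t", "crude_steel_t"]}
  }
}
""")

CEMENT_PACK = {
    "pack": "cement",
    "nodeClasses": {"KILN": {"expectedVariables": ["clinker_t", "kiln_fuel_gj"]}},
}


class Kernel:
    """Pack-agnostic: nothing below mentions steel, cement, or any NodeClass by name."""

    def __init__(self) -> None:
        self._packs: dict[str, dict] = {}

    def load_pack(self, pack: dict) -> None:
        # Adding a vertical is a data load, not a code change.
        self._packs[pack["pack"]] = pack

    def node_classes(self, pack_name: str) -> list[str]:
        # Metadata a product backend could render from without knowing the vertical.
        return sorted(self._packs[pack_name]["nodeClasses"])

    def missing_variables(self, pack_name: str, node_class: str, reading: dict) -> list[str]:
        # Checks a reading against whatever the pack declares for that NodeClass.
        spec = self._packs[pack_name]["nodeClasses"][node_class]
        return [v for v in spec["expectedVariables"] if v not in reading]
```

Loading CEMENT_PACK next to STEEL_PACK touches no kernel code, which is the property the list above describes.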

The alternative - vertical-specific kernel forks, vertical-specific microservices, vertical-specific code branches - was deliberately rejected. Each of those alternatives produces a maintenance burden that grows with the number of verticals and a risk surface that grows with every vertical added. Pack-as-data trades upfront design effort (the kernel’s primitives must be vertical-neutral and rich enough to express any vertical’s needs through configuration) for downstream stability (every new vertical is additive, not invasive). Six years of substrate work is what paid for that upfront design effort.

This is the technical heart of why the platform can credibly expand to cement, aluminium, pharma, agri, and other verticals without re-engineering the substrate. The kernel does not learn new tricks; it interprets new pack content. The trick was authoring the kernel correctly in the first place - and that was done with the discipline of a research organisation that builds its tools to last.

6. Why six years of substrate work matters

The six-year timeline matters because it predates the current carbon-regulation moment. The European Commission proposed CBAM in its current form in 2021. CSRD entered into force in January 2023, with first reports covering financial year 2024. The ISSB issued IFRS S2 in 2023. California signed SB 253 into law in late 2023. India notified its Carbon Credit Trading Scheme (CCTS) in mid-2023. The wave of binding carbon-disclosure regulation that defines the 2026 commercial landscape did not yet exist when Sphuran began building SIDK.

SIDK was therefore not built reactively to any single regulation. It was built to be the right shape for anchoring data against physical reality - which is the load-bearing requirement for any externally-attested industrial claim, whether about carbon today, water tomorrow, biodiversity the day after, supply-chain provenance the day after that. The 2026 carbon-regulation moment happens to be the moment that demands the substrate’s properties; the substrate was already there.

The carbon engine itself is recent. But it was built by people whose day job is research and validation methodology. The four-method dispatch, the boundary-as-input design, the GUM and Monte Carlo uncertainty propagation, the shadow-calc reconciliation, the locked-snapshot replay discipline - these are not carbon-domain software innovations. They are standard disciplines in scientific measurement, applied to carbon. The reason they look unusually rigorous in the carbon space is that most carbon software was built by carbon-domain teams without that background; the engine inherits its rigour from research practice that predates the team’s involvement in carbon.
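To make the GUM and Monte Carlo point concrete, here is a minimal Python sketch for the simplest carbon case: emissions E = A × F (activity × emission factor) with uncorrelated, normally distributed inputs. For a product, the GUM first-order rule adds relative standard uncertainties in quadrature, (u_E/E)² = (u_A/A)² + (u_F/F)²; Monte Carlo propagates the same distributions by sampling. The numbers are illustrative, and this is not a description of the Carbon Engine’s implementation (see 06 - The Carbon Engine).

```python
import math
import random
import statistics


def gum_product(a: float, u_a: float, f: float, u_f: float) -> tuple[float, float]:
    """First-order GUM propagation for E = a * f with uncorrelated inputs:
    relative standard uncertainties add in quadrature."""
    e = a * f
    u_rel = math.sqrt((u_a / a) ** 2 + (u_f / f) ** 2)
    return e, abs(e) * u_rel


def monte_carlo_product(a: float, u_a: float, f: float, u_f: float,
                        n: int = 100_000, seed: int = 42) -> tuple[float, float]:
    """Propagate normal input distributions by sampling.
    A fixed seed makes the run exactly reproducible."""
    rng = random.Random(seed)
    samples = [rng.gauss(a, u_a) * rng.gauss(f, u_f) for _ in range(n)]
    return statistics.fmean(samples), statistics.stdev(samples)
```

With A = 1000 ± 20 t and F = 3.0 ± 0.09 tCO2e/t, GUM gives E = 3000 tCO2e with u ≈ 108.2 tCO2e; the seeded Monte Carlo run lands within sampling noise of that, and the fixed seed keeps it replayable, in line with the determinism commitment described in the next section.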

The combination - six years of substrate work plus a research-grade engine on top - is the moat. It is not a moat of corporate structure or commercial positioning. It is a moat of work: work that another team could in principle do, but not without accumulating its own six years of design choices and methodology refinement. Doc 04a (Sphuran/Grevoro separation) names the structural property that adds to this. Doc 06 (Carbon Engine) describes the methodology specifically. This doc names the substrate property they both rest on.

7. Deterministic by design

SIDK is purely deterministic. Given the same inputs and the same version pinning, every calculation produces the same result. Every audit-chain hash composes the same way. Every passport canonicalises its JSON-LD to the same bytes. Every signature verifies against the same key set. This is not a happy accident of the implementation; it is an architectural commitment, enforced by the kernel’s design.
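Canonicalisation is what makes “the same passport” mean “the same bytes”, so hashes and signatures verify identically however the JSON was assembled. The sketch below approximates RFC 8785 (JCS) with Python’s json module. It is an approximation, not a conformant JCS implementation (JCS also fixes number serialisation and sorts keys by UTF-16 code units), and the credential fields are placeholders; credentialSubject and issuer are W3C VC terms, while lot and kgCO2e are hypothetical.

```python
import hashlib
import json


def canonical_bytes(doc: dict) -> bytes:
    """Approximate JCS: sorted keys, no insignificant whitespace, UTF-8 output.
    Not full RFC 8785: number formatting and UTF-16 key ordering are not reproduced."""
    return json.dumps(doc, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def digest(doc: dict) -> str:
    # What a signature would actually be computed over: the canonical bytes.
    return hashlib.sha256(canonical_bytes(doc)).hexdigest()


# The same placeholder credential, assembled with different key orders.
a = {"credentialSubject": {"lot": "L-001", "kgCO2e": 1870}, "issuer": "did:example:tenant"}
b = {"issuer": "did:example:tenant", "credentialSubject": {"kgCO2e": 1870, "lot": "L-001"}}
```

Here `canonical_bytes(a)` and `canonical_bytes(b)` are identical, so `digest(a) == digest(b)`; changing any value changes the digest.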

The Carbon Engine (06 §11) is the most visible expression - calculationVersionId, methodVersionIds, factorIdsUsed pinned on every result; LockedSnapshot capturing the full pinning at lock time; replay-ready engine code; no wall-clock dependencies; no hidden state. But determinism runs through the whole kernel. The audit chain is content-addressed. The Submission lifecycle is deterministic state-machine transitions. The trust-chain signatures are over canonicalised, deterministically-encoded content.
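A minimal sketch of replay from a locked snapshot, assuming a deliberately simplified calculation (activity quantity × pinned factor, summed). The snake_case fields mirror the identifiers named above (calculationVersionId, methodVersionIds, factorIdsUsed), but the structure, the identifiers in SNAPSHOT, and the fingerprint helper are hypothetical. The calculation reads nothing but the snapshot: no clock, no randomness, no global state. Decimal arithmetic keeps the stored decimal values exact.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class LockedSnapshot:
    calculation_version_id: str
    method_version_ids: tuple[str, ...]
    factors: tuple[tuple[str, str], ...]   # (factorId, value as decimal string), pinned at lock time
    activity: tuple[tuple[str, str], ...]  # (factorId applied, quantity as decimal string)


def calculate(s: LockedSnapshot) -> dict:
    """Pure function of the snapshot: same snapshot in, same result out."""
    factors = {fid: Decimal(value) for fid, value in s.factors}
    total = sum((Decimal(qty) * factors[fid] for fid, qty in s.activity), Decimal("0"))
    return {
        "calculationVersionId": s.calculation_version_id,
        "methodVersionIds": list(s.method_version_ids),
        "factorIdsUsed": sorted({fid for fid, _ in s.activity}),
        "total": str(total),
    }


def fingerprint(result: dict) -> str:
    """Stable digest of a result, for checking that a replay matches the original."""
    body = json.dumps(result, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


# Hypothetical identifiers, for illustration only.
SNAPSHOT = LockedSnapshot(
    calculation_version_id="calc-v3",
    method_version_ids=("m-combustion@2",),
    factors=(("F-coke@v6", "3.10"), ("F-power@v2", "0.71")),
    activity=(("F-coke@v6", "1000"), ("F-power@v2", "250.5")),
)
```

Replaying `calculate(SNAPSHOT)` at any later date yields the same result and the same fingerprint; a different pinned factor value yields a different one.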

The architectural rule that protects this is Foundation §4.8: non-canonical inference cannot mutate canonical state. Anything an AI model, ML system, or stochastic process produces is derived, not canonical. It can read SIDK; it cannot write to SIDK’s canonical state. Where it proposes a state change, the proposal routes through a Submission with human Maker and Checker. The inference layer is an assistant; the Submission is the authoritative path.

This is the discipline that lets the platform scale inference capability over time without compromising the trust substrate. AI can become arbitrarily sophisticated on top of SIDK; it can drive better dashboards, better recommendations, better what-if scenarios, better natural-language interfaces - and none of it touches the canonical record without going through the human-approved Submission path. The audit chain stays clean. The verifier’s ability to walk the evidence stays intact. The buyer’s ability to verify the signature stays unaffected.

The next section is where this property pays off in an unexpected way.

8. SIDK as substrate for trustworthy AI

This is the section to read carefully because it is one of the most important consequences of the entire architecture, and it does not appear in most carbon-industry conversations.

The seven properties from doc 04 - stated boundary, traceable to source data, reproducible under pinned versions, uncertainty and evidence quality, governance by approvals, multi-regime rendering, independent attestation - turn out to also be the requirements list for trustworthy AI on industrial data. This is not a coincidence. Both carbon claims and AI outputs are derived assertions about physical reality that someone other than the person making them needs to act on. Both require the same substrate properties to be defensible.

The implication is that SIDK is the right substrate for AI on industrial data, in a way that generic operational-data platforms cannot be. Specifically:

AI on spreadsheets is pattern-matching on artefacts of unknown provenance. The model has no idea where the underlying numbers came from, what units they are in, what boundaries they were computed against, or whether any human ever vetted them. Outputs from such systems can be vivid and wrong, and the wrongness is undetectable without going back to the original sources, which is exactly the work the AI was supposed to obviate.

AI on ERP exports inherits the ERP’s organisational boundaries (which are commercial, not physical), has no audit chain back to physical measurements, and can confidently produce conclusions that depend on accounting choices the model does not understand it has inherited.

AI on historian / SCADA dumps has time-series fidelity but no canonical structure - no Lots, no lineage, no allocation, no boundary discipline, no evidence tiering. It can spot patterns in raw signals but cannot answer “what did this signal contribute to the carbon footprint of this product?”

AI on SIDK reads canonical, traceable, evidence-tiered, boundary-explicit, version-pinned, verifier-walked data. Every input the AI sees has provenance. Every recommendation it makes can be drilled into back to specific sensor readings, lab results, factor versions, and the Maker-Checker approvals that admitted them. The AI’s recommendation remains non-canonical per Foundation §4.8 - it cannot mutate the kernel’s state - but the evidence the recommendation reads from is canonical. The user can ask “show me the data that drove this suggestion” and get a verifiable answer.
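A minimal sketch of “show me the data that drove this suggestion”, assuming the AI layer holds only references to canonical record IDs and resolves them through a read-only view. CANONICAL, Recommendation, show_evidence, and the record IDs are hypothetical. MappingProxyType is only a shallow read-only view, so a real system would enforce read-only access at the storage and authorisation layers rather than in application code.

```python
from __future__ import annotations

from dataclasses import dataclass
from types import MappingProxyType

# Hypothetical canonical records keyed by id, exposed to the AI layer read-only (shallowly).
CANONICAL = MappingProxyType({
    "reading:bf1-coke-2026-03-01": {"kind": "sensor_reading", "value": 412.0, "unit": "t"},
    "factor:F-coke@v6": {"kind": "factor", "value": 3.10, "unit": "tCO2e/t"},
    "submission:S-118": {"kind": "approval", "maker": "alice", "checker": "bob"},
})


@dataclass(frozen=True)
class Recommendation:
    """Derived and non-canonical: it carries references, never canonical values."""
    text: str
    evidence_refs: tuple[str, ...]


def show_evidence(rec: Recommendation, store=CANONICAL) -> dict:
    # Refuse a recommendation that cites anything not anchored in the canonical store.
    missing = [r for r in rec.evidence_refs if r not in store]
    if missing:
        raise LookupError(f"unanchored evidence: {missing}")
    return {r: store[r] for r in rec.evidence_refs}
```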

The qualitative difference is not subtle. The same prompt, against bad inputs, produces vibes-grade output that is usually wrong in non-obvious ways. Against SIDK-anchored inputs, the same prompt produces output the user can interrogate, audit, and defend. For industrial users in regulated or competitive contexts - which is most of them - only the second category of AI is actually usable.

This is also where the deterministic-substrate commitment compounds with the industry-neutral-pack architecture. AI on SIDK can read across packs - steel today, cement tomorrow, aluminium the day after - without the model learning vertical-specific data formats. The pack normalises the world for the AI just as it normalises the world for the calculation engine. A model trained on one vertical’s anchored data generalises to another vertical’s anchored data because the shape of the data is consistent.

For positioning purposes, the cleanest framing of this is: most “AI for industry” tools today are pattern-matching on bad inputs and producing outputs of unverifiable quality. AI built on SIDK reads from a deterministic substrate with full audit traceability, which means the AI’s outputs can be interrogated and defended in a way no generic operational-data AI tool can match. This is not a feature claim; it is a structural property of building on a substrate engineered for auditable trust.

The platform is already wired to support AI integration. The phasing of when specific AI surfaces ship is a release-cadence question covered in 08 - GREMI phased. The substrate property described in this section is true today regardless of which AI surfaces are user-facing in which release.

9. How SIDK fits with everything else in the knowledge base

A short orientation for navigating the rest of the KB:

01 - Embodied carbon, 02 - Scopes, 02a - Boundaries, 03 - Regulations, 03a - CBAM - the world SIDK operates in.
04 - Carbon as trust infrastructure - the seven requirements any serious carbon platform must satisfy. The conceptual frame this doc operationalises.
04a - Sphuran, Grevoro, and what their separation means - the corporate-structure context for SIDK and its first product on top.
This doc, 05 - what SIDK is, what it is not, why it satisfies the seven properties at substrate level.
06 - The Carbon Engine - the deterministic calculation subsystem that lives inside SIDK; the realisation layer for properties 1, 3, 4, and 5.
07 - Inputs and sensors - how real data reaches SIDK.
08 - GREMI, phased - the first product on SIDK; current state, release trajectory, AI integration in context.

The architecture spec at Docs/SIDK Handoff Docs/architecture.md is the canonical reference for any precise question about SIDK’s primitives, contracts, or implementation. This doc is the explainer; the spec is normative.

References & further reading

Source documents (canonical)

Docs/SIDK Handoff Docs/README.md - the entry point to the SIDK developer documentation. Names the seven primitives, the boundary test, and the explicit anti-list. (Reads in 5 minutes; required.)
Docs/SIDK Handoff Docs/architecture.md - the canonical specification of the seven primitives, the boundary tests, and the citation rule. Authoritative for any precise question about SIDK’s contracts.
Docs/SIDK Handoff Docs/classification-methodology.md - the methodology for deciding what belongs in SIDK vs a product backend. The full version of §4’s boundary test, with worked examples.
Docs/SIDK Handoff Docs/product-architecture-patterns.md - the three common patterns for building a product backend on SIDK. How GREMI relates to SIDK is one specific instance of these patterns.
Docs/SIDK Handoff Docs/data-patterns.md - when to cache, query live, or project against SIDK; late-data handling; idempotency at consumers.
Docs/GREMI-App-Ecosystem/foundation-v1.4.md - Foundation. Especially §1.5 (Platform), §4.7 (Canonical vs derived presentation), §4.8 (Non-canonical inference cannot mutate canonical state), §4.10 (Realm isolation), §5 (Trust chain).

External authoritative sources

W3C Verifiable Credentials Data Model 2.0. The credential data model underpinning SIDK’s Trust Layer. Required reading if you want to understand the passport/signature mechanics in depth. https://www.w3.org/TR/vc-data-model-2.0/
W3C Decentralized Identifiers (DIDs) v1.0. The identifier framework the Tenant signing keys are registered under. https://www.w3.org/TR/did-core/
OpenAPI Initiative - OpenAPI Specification 3.1. The standard SIDK’s synchronous Projection Boundary contracts are published in. https://spec.openapis.org/oas/v3.1.0
AsyncAPI Initiative - AsyncAPI Specification. The standard SIDK’s event Projection Boundary contracts are published in. https://www.asyncapi.com/docs/reference/specification
ISO 14064-3:2019 - Specification with guidance for the verification and validation of greenhouse gas statements. Verifier-side standard; SIDK’s Verifier Authorization Layer operationalises the procedural expectations of this standard. https://www.iso.org/standard/66455.html
GHG Protocol - Corporate Accounting and Reporting Standard (revised 2004). The framework the Canonical Fact Layer’s Scope 1/2/3 partition implements. https://ghgprotocol.org/corporate-standard