Private briefing

07 - Inputs and sensors

Status. Draft v0.1 · First draft: 17-03-2026 · Pre-discussion. This doc explains how real data gets into the Carbon Engine; for what the engine then does with it, see 06 - The Carbon Engine.

Why this matters. A deterministic, audit-grade calculation engine is only as good as its inputs. If the inputs are unreliable, the engine produces a precise, uncertainty-bounded, audit-trailed wrong number. The platform’s discipline around inputs - what is captured, how, by whom, with what quality flags, under what calibration regime, at what evidence tier - is what stops that. This doc covers the data side of the platform: how telemetry arrives at the kernel, how lab results and ERP transactions land, how supplier PCFs are ingested, and the lifecycle a sensor goes through from planned to live. For the internal team, the value of understanding this is being able to answer the question “where does that 2.18 tCO₂e/t number actually come from?” in concrete terms rather than with architecture-diagram hand-waving.


1. Four input families

The Carbon Engine consumes four broad families of input:

  • Telemetry - continuous, time-series measurements from sensors and meters. Concentrations, flows, temperatures, fuel meter readings, electricity sub-meters, weighbridge tickets. Arrives via the edge agent. The Tier-3 and Tier-4 evidence backbone.
  • Lab results - periodic analytical results from samples taken at intervals. Carbon content of coal, calorific value of fuel, sulphur in coke, ash fraction in slag. Arrives via LIMS integration or manual upload through templates. Tier-2 to Tier-3 evidence.
  • ERP transactions - discrete commercial and operational events from the Tenant’s enterprise systems. Fuel purchase receipts, electricity invoices, scrap intake, material movements, finished-goods shipments. Arrives via the ERP connector and template uploads. Tier-1 to Tier-2 evidence for what it does well (quantities), supporting evidence for the rest.
  • Supplier declarations - Product Carbon Footprints (PCFs), Environmental Product Declarations (EPDs), and verified emission factors received from a Tenant’s suppliers, attached to specific shipments or product categories. Tier-2 evidence; can become higher-tier when the supplier itself is on the platform and the data is signed-and-walkable to the supplier’s own production data.

The engine combines them per emission source. A blast-furnace mass balance reads telemetry (flow, mass, concentration), lab results (carbon content of coke and PCI), ERP transactions (purchase quantities for reconciliation), and - for the upstream slag-cement credit, if applicable - supplier-declared substitution factors. Each input retains its tier; the result’s evidenceBreakdown aggregates them.


2. The edge agent - how telemetry reaches the kernel

Industrial sites have unreliable connectivity, sporadic outages, and tens of thousands of tags. The platform handles this with an on-site edge agent - a Rust binary running on a gateway, PLC, or dedicated collector at the site - that captures telemetry continuously, buffers it durably, and uploads it to the SIDK ingestion API. [1]

Protocols supported in v1. OPC-UA (the dominant industrial protocol), MQTT (lighter-weight; used for greenfield IIoT), and Modbus TCP (for legacy PLCs). The agent has a pluggable collector architecture; additional protocols can be added without changing the core pipeline.

Durable buffering for offline resilience. The agent writes every captured point to a local durable buffer (RocksDB in production) before acknowledging receipt to the upstream system. If the connection to SIDK drops, the agent keeps capturing and accumulating points. When the connection returns, the agent uploads everything from the buffer. The v1 guarantee is a 90-day replay window - a site can be offline for up to 90 days without data loss.
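
The write-before-acknowledge pattern described above can be sketched as follows. This is a minimal illustration, not the agent's implementation: the production agent is Rust on RocksDB, whereas this sketch uses Python's built-in sqlite3 as a stand-in, and the names DurableBuffer, capture, and drain are hypothetical. It does not enforce the 90-day replay window.

```python
import sqlite3
from typing import Callable


class DurableBuffer:
    """Write-ahead buffer: a point is persisted before it is acknowledged."""

    def __init__(self, path: str = ":memory:") -> None:
        # sqlite3 is a stand-in for the production RocksDB store (assumption).
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS points ("
            "seq INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
        )
        self._db.commit()

    def capture(self, payload: str) -> None:
        # Commit first; only after this returns may the caller acknowledge.
        self._db.execute("INSERT INTO points (payload) VALUES (?)", (payload,))
        self._db.commit()

    def pending(self) -> int:
        return self._db.execute("SELECT COUNT(*) FROM points").fetchone()[0]

    def drain(self, upload: Callable[[str], bool], batch: int = 500) -> int:
        """Upload buffered points oldest-first; stop at the first failure."""
        sent = 0
        while True:
            rows = self._db.execute(
                "SELECT seq, payload FROM points ORDER BY seq LIMIT ?", (batch,)
            ).fetchall()
            if not rows:
                return sent
            for seq, payload in rows:
                if not upload(payload):
                    return sent  # connection dropped; keep the rest buffered
                # Delete only after the upload was confirmed.
                self._db.execute("DELETE FROM points WHERE seq = ?", (seq,))
                self._db.commit()
                sent += 1
```

Because a point is deleted only after a confirmed upload, a crash between upload and delete can resend a point; that is safe only because the ingestion endpoint deduplicates.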

Two identities per data point. Every captured TimeSeriesPoint carries two identifiers: a tag_id (adapter-side - e.g., the OPC-UA NodeId or the MQTT topic) and a variable_id (kernel-side - a stable ULID like var_01HXJG…). The mapping between them is part of the Tenant’s topology, owned by the Onboarding Engineer in Grevoro Console (per the provisioning/operating seam, Principle 9 in 08b §5). The agent does not know what a variable means; it knows which kernel ID to send.

Idempotency. Every uploaded point carries a deterministic ID derived from (tenantId, variableId, timestamp, source). Replays after a network glitch are idempotent - the ingestion endpoint deduplicates on this ID. This is one of the load-bearing patterns that lets the agent retry aggressively without poisoning the time series.
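
A minimal sketch of how a deterministic ID makes replays harmless. The source states only that the ID is derived from (tenantId, variableId, timestamp, source); the SHA-256 hash, the UTC normalisation, the field separator, and the IngestionStore class are all assumptions made for illustration.

```python
import hashlib
from datetime import datetime, timezone
from typing import Dict


def point_id(tenant_id: str, variable_id: str,
             timestamp: datetime, source: str) -> str:
    """Derive a stable ID from the four identifying fields."""
    if timestamp.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    # Normalise to UTC so the same instant always hashes identically.
    ts = timestamp.astimezone(timezone.utc).isoformat(timespec="microseconds")
    key = "\x1f".join((tenant_id, variable_id, ts, source))
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


class IngestionStore:
    """Toy stand-in for the ingestion endpoint's deduplication."""

    def __init__(self) -> None:
        self._points: Dict[str, float] = {}

    def ingest(self, tenant_id: str, variable_id: str,
               timestamp: datetime, source: str, value: float) -> bool:
        pid = point_id(tenant_id, variable_id, timestamp, source)
        if pid in self._points:
            return False  # replayed point: accepted, but not stored twice
        self._points[pid] = value
        return True

    def __len__(self) -> int:
        return len(self._points)
```

With this shape, an agent that is unsure whether an upload landed can simply send it again.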

Quality is per-point, not per-stream. Each point carries a quality field (GOOD / UNCERTAIN / BAD / SUBSTITUTED / MANUAL). The kernel respects these throughout the calculation pipeline - see 06 §3 on how the engine handles quality flags and substitution.

3. Variables, Streams, ProcessUnits - the topology a sensor lives in

Telemetry does not enter the calculation pipeline as raw points; it enters as readings of a Variable, which sits inside a Stream, which is attached to a ProcessUnit, which lives at a Site inside a Zone. This is the topology the calculation engine reads when constructing the boundary scope.

  • Site, Zone - the geographical/operational hierarchy. A site is a plant; a zone is a region within a plant (the ironmaking area, the rolling mill).
  • ProcessUnit - a piece of process equipment. The blast furnace BF-1, the coke oven battery, the BOF converter, the reheat furnace.
  • Stream - a flow of material or energy in or out of a ProcessUnit. The blast-furnace top-gas stream, the coke feed stream, the electricity import line, the slag export stream.
  • Variable - a measurable quantity on a Stream or piece of equipment. The CO₂ concentration in the BF top-gas, the mass flow of coke into the BF, the kWh imported on the substation line. Variables have a kind (e.g., EMISSION_CONCENTRATION, MASS_FLOW, ELECTRICITY_KWH), a unit, a measurement basis (wet/dry, gross/net), and a preferredMethod declaration that drives engine method selection.
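
The hierarchy above can be pictured as nested records. The sketch below is illustrative only: the class and field names are assumptions rather than the kernel's schema, and the variable ID and preferred-method values are made-up placeholders.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Variable:
    variable_id: str       # kernel-side ID (placeholder values below)
    kind: str              # e.g. EMISSION_CONCENTRATION, MASS_FLOW
    unit: str
    basis: str             # measurement basis: wet/dry, gross/net
    preferred_method: str  # drives engine method selection


@dataclass
class Stream:
    name: str
    variables: List[Variable] = field(default_factory=list)


@dataclass
class ProcessUnit:
    name: str
    streams: List[Stream] = field(default_factory=list)


@dataclass
class Zone:
    name: str
    process_units: List[ProcessUnit] = field(default_factory=list)


@dataclass
class Site:
    name: str
    zones: List[Zone] = field(default_factory=list)


def iter_variables(site: Site) -> Iterator[Tuple[str, Variable]]:
    """Walk Site -> Zone -> ProcessUnit -> Stream and yield each Variable
    together with its path, as a boundary-scope builder might."""
    for zone in site.zones:
        for unit in zone.process_units:
            for stream in unit.streams:
                for variable in stream.variables:
                    path = "/".join((site.name, zone.name, unit.name, stream.name))
                    yield path, variable


co2 = Variable("var_EXAMPLE_CO2", "EMISSION_CONCENTRATION", "%vol", "dry", "CEMS")
coke = Variable("var_EXAMPLE_COKE", "MASS_FLOW", "t/h", "wet", "METHOD_2")
site = Site("Plant-A", [Zone("Ironmaking", [ProcessUnit("BF-1", [
    Stream("top-gas", [co2]),
    Stream("coke-feed", [coke]),
])])])
```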

The topology is provisioned by Grevoro Engineers, not by the Tenant. The Tenant operates with the topology that exists; if a Variable is missing or misbehaving, the Tenant raises a ticket through Grevoro Console and an Engineer changes the topology. This is the provisioning/operating seam (Principle 9 in 08b §5). It is also one of the reasons sensor data on the platform is calibration-disciplined: the people configuring the sensors are engineers who do this for a living, not the Tenant’s shift coordinator.

4. Sensor lifecycle - PLANNED → LIVE

A Variable does not appear in calculations the moment it is created. It goes through a five-state lifecycle, plus a terminal FAILED state for retired Variables, owned by the Onboarding Engineer in Grevoro Console:

PLANNED ──→ CONFIRMED ──→ INSTALLED ──→ COMMISSIONING ──→ LIVE
                                                          └─→ FAILED

  • PLANNED - the Variable has been scoped during Tenant onboarding. The sensor is on the install list but does not exist yet. The mapping carries it as a planned-but-not-live gap.
  • CONFIRMED - the hardware is purchased and on the way. The Tenant’s site team knows when to expect it.
  • INSTALLED - the sensor is physically in place but not yet sending validated data.
  • COMMISSIONING - data is flowing, but the agent is in commissioning mode. Points are captured and stored but not consumed by calculations. This is where calibration is verified, ranges are sanity-checked, and the Engineer either decides the Variable is ready or pulls it back to INSTALLED for rework.
  • LIVE - the Variable enters the calculation pipeline. Its data starts contributing to CalculationResults from the LIVE transition date forward.
  • FAILED - the Variable has been retired (sensor broken beyond repair, source replaced, source removed). Historical data remains queryable; calculations after the FAILED date do not consume it.
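
The lifecycle can be read as a small state machine in which every transition needs a named Engineer. The sketch below is a hedged illustration: the transition table is inferred from the diagram and bullets (forward steps, the COMMISSIONING to INSTALLED rework path, and LIVE to FAILED), and the class and method names are hypothetical. Whether FAILED can be reached from other states is not stated in the source and is not modelled.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import List, Tuple


class State(Enum):
    PLANNED = "PLANNED"
    CONFIRMED = "CONFIRMED"
    INSTALLED = "INSTALLED"
    COMMISSIONING = "COMMISSIONING"
    LIVE = "LIVE"
    FAILED = "FAILED"


# Allowed transitions, inferred from the lifecycle description (assumption).
ALLOWED = {
    State.PLANNED: {State.CONFIRMED},
    State.CONFIRMED: {State.INSTALLED},
    State.INSTALLED: {State.COMMISSIONING},
    State.COMMISSIONING: {State.LIVE, State.INSTALLED},  # rework path
    State.LIVE: {State.FAILED},
    State.FAILED: set(),
}


@dataclass
class VariableLifecycle:
    state: State = State.PLANNED
    history: List[Tuple[State, State, str, datetime]] = field(default_factory=list)

    def transition(self, target: State, engineer: str, at: datetime) -> None:
        # No state change without a recorded Engineer action.
        if not engineer:
            raise ValueError("a transition requires a named Engineer")
        if target not in ALLOWED[self.state]:
            raise ValueError(f"{self.state.value} -> {target.value} is not allowed")
        self.history.append((self.state, target, engineer, at))
        self.state = target

    def consumed_by_calculations(self) -> bool:
        # Only LIVE Variables feed CalculationResults.
        return self.state is State.LIVE
```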

The state is visible to the Tenant in GREMI (read-only); the Engineer manages it in Grevoro Console (read-write). This visibility is part of the trust model: the Tenant always knows which Variables are LIVE and which are still being commissioned, and the engine never quietly upgrades a Variable’s state without an Engineer’s recorded action.

The mapping surface in GREMI also shows what would improve if a planned-but-not-live Variable went LIVE - typically, a calculation currently using a default factor that would shift to a site-empirical value when the relevant sensor is live. This makes the commissioning backlog operationally meaningful, not just bureaucratic.

5. Lab results

Some inputs cannot be measured continuously - the carbon content of coke, the calorific value of natural gas, the ash content of pulverised coal injection. They are measured periodically by lab analysis on samples, with results expressed as values plus uncertainties per analyte.

Ingestion. Lab results enter via two paths:

  • LIMS integration. Where the Tenant’s laboratory information management system exposes results via API, an Engineer-configured connector pulls results directly. This is the preferred path - auditable, low-touch, fast.
  • Manual upload via templates. Where LIMS integration is not feasible, lab staff upload results through structured templates (CSV / spreadsheet formats authored per Tenant by Grevoro Engineers). Each upload becomes a LabResult record with timestamps, lot reference, analyte breakdown, and uncertainty.

Lifecycle. A LabResult progresses through DRAFT → REVIEWED → RELEASED. The Carbon Engine consumes only RELEASED results. Released-then-corrected results trigger restatement of any locked period that consumed them.

Per-analyte uncertainty. Each analyte (carbon, sulphur, ash, calorific value) carries its own measurement uncertainty, typically declared by the lab at k=2 (95% coverage). The engine reads coverageFactor and converts internally to k=1 for propagation, then back to k=2 at output for verifier-statement consistency.
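
As a worked illustration of the k=2 to k=1 round trip: an expanded uncertainty U declared at coverage factor k corresponds to a standard uncertainty u = U / k. The sketch below assumes independent inputs multiplied together, so relative standard uncertainties combine in quadrature; that propagation rule is the textbook one and is an assumption here, not a statement of what the engine does (see 06 §8). The function names and the input percentages are illustrative.

```python
import math


def to_standard(expanded: float, coverage_factor: float) -> float:
    """Convert an expanded uncertainty (e.g. k=2) to a standard one (k=1)."""
    if coverage_factor <= 0:
        raise ValueError("coverage factor must be positive")
    return expanded / coverage_factor


def to_expanded(standard: float, coverage_factor: float = 2.0) -> float:
    """Convert a standard uncertainty back to an expanded one for output."""
    return standard * coverage_factor


def combine_relative(*standard_relative: float) -> float:
    """Root-sum-of-squares of independent relative standard uncertainties."""
    return math.sqrt(sum(u * u for u in standard_relative))


# Illustrative inputs: carbon content declared at 3% (k=2), mass at 4% (k=2).
u_carbon = to_standard(0.03, 2.0)           # 1.5% at k=1
u_mass = to_standard(0.04, 2.0)             # 2.0% at k=1
u_result = combine_relative(u_carbon, u_mass)   # about 2.5% at k=1
U_result = to_expanded(u_result, 2.0)           # about 5% at k=2
```

Combining the two k=2 figures directly in quadrature gives the same 5% only because both inputs share the same coverage factor; converting to k=1 first keeps the arithmetic correct when labs declare different k values.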

The lab as an evidence tier. A Method 2 (fuel-combustion) calculation prefers a lab-confirmed NCV (Tier 3) over a supplier-declared NCV (Tier 2) over an IPCC default (Tier 1). The engine selects in this order and records the tier on the component. A verifier reviewing the period can see exactly how many of the fuel components were lab-confirmed versus defaulted.
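
The selection order can be sketched as a simple fallback chain. The names NcvChoice, select_ncv, and tier_counts are hypothetical, and the NCV figures in the usage lines are illustrative numbers, not reference values.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class NcvChoice:
    value_gj_per_t: float
    tier: int
    source: str


def select_ncv(lab: Optional[float], supplier: Optional[float],
               ipcc_default: float) -> NcvChoice:
    """Prefer lab-confirmed (Tier 3), then supplier-declared (Tier 2),
    then the IPCC default (Tier 1), recording the tier on the result."""
    if lab is not None:
        return NcvChoice(lab, 3, "lab-confirmed")
    if supplier is not None:
        return NcvChoice(supplier, 2, "supplier-declared")
    return NcvChoice(ipcc_default, 1, "IPCC default")


def tier_counts(choices: Iterable[NcvChoice]) -> Dict[int, int]:
    """Summarise how many fuel components resolved at each tier."""
    return dict(Counter(c.tier for c in choices))


# Illustrative values only.
choices = [
    select_ncv(lab=29.1, supplier=28.5, ipcc_default=28.0),
    select_ncv(lab=None, supplier=28.5, ipcc_default=28.0),
    select_ncv(lab=None, supplier=None, ipcc_default=28.0),
]
```

A summary such as tier_counts(choices) is the kind of view a verifier would use to see how much of a period was lab-confirmed versus defaulted.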

6. ERP transactions and template uploads

Two categories of data flow through the Tenant’s enterprise systems rather than its sensors:

Commercial quantities. Fuel purchases, electricity invoices, scrap intakes, alloy receipts, finished-goods shipments. These are reliable for quantity (a tonne is a tonne; the freight bill agrees with the weighbridge) but say nothing about emissions intensity. The engine uses ERP quantities as the AD (activity data) input in Method 2 calculations where the meter is unavailable or untrusted, and as cross-checks for sensor-derived mass flows.

Material lineage. The TransformationEvent and MovementEvent records that describe how Lots flow through the plant - which heat produced which billets, which coil came from which slab, which shipment included which coils. This is the substrate the Carbon Engine’s lot-lineage walk reads when computing product-level PCFs (06 §7).

Integration patterns. Most Tenants integrate via one or more of three routes:

  • Direct ERP API. Where the Tenant’s ERP (SAP S/4HANA, Oracle Cloud, IFS, Microsoft Dynamics) exposes the relevant transactions via API, an Engineer-configured connector pulls them on a cadence.
  • Database mirror / replication. For older ERPs, a CDC (change-data-capture) feed against a database mirror.
  • Manual template upload. For the remainder. Templates are authored per Tenant per data type by Grevoro Engineers (a template for “coal receipts”, a template for “shift production summaries”), versioned, and updated when supplier file formats change. Failed uploads (column-format mismatches, value-range violations) raise a ticket to the Engineer.

The platform’s stance on templates is that they are a provisioning concern, not an operational burden the Tenant carries - an upload template that breaks because a supplier changed their CSV column order is a ticket the Tenant raises and the Engineer resolves.
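
A minimal sketch of the two failure classes named above, column-format mismatches and value-range violations. The Template structure, the column names, and the error strings are hypothetical; the real templates are authored per Tenant by Grevoro Engineers.

```python
import csv
import io
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Template:
    name: str
    version: int
    columns: Tuple[str, ...]                 # expected header, in order
    ranges: Dict[str, Tuple[float, float]]   # inclusive numeric bounds


def validate_upload(template: Template, text: str) -> List[str]:
    """Return a list of problems; an empty list means the upload passes."""
    reader = csv.DictReader(io.StringIO(text))
    header = tuple(reader.fieldnames or ())
    if header != template.columns:
        # Column order or naming changed: the whole file is rejected.
        return [f"column mismatch: expected {list(template.columns)}, "
                f"got {list(header)}"]
    errors: List[str] = []
    for line_no, row in enumerate(reader, start=2):
        for column, (low, high) in template.ranges.items():
            try:
                value = float(row[column])
            except (TypeError, ValueError):
                errors.append(f"line {line_no}: {column} is not numeric")
                continue
            if not low <= value <= high:
                errors.append(
                    f"line {line_no}: {column}={value} outside [{low}, {high}]")
    return errors


coal_receipts = Template(
    name="coal receipts", version=1,
    columns=("receipt_id", "tonnes"),
    ranges={"tonnes": (0.0, 5000.0)},
)
```

In this sketch a non-empty result is what would raise the ticket to the Engineer.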

7. Supplier declarations - PCFs and EPDs

For Scope 3 inputs (purchased materials, supplier-provided precursors), the most informative evidence is a Product Carbon Footprint declared by the supplier for the specific shipment, batch, or product. The Carbon Engine has a five-tier resolution order for these, in order of preference [2]:

  1. Tenant-private supplier PCF for this specific PO / shipment - match on the purchase-order ID. Most specific, highest priority.
  2. Tenant-private aggregate supplier PCF for the supplier-product combination.
  3. Tenant-private corrected default factor - a tenant-managed override for the input category and geography.
  4. Platform-shared factor - a Sphuran-managed default for the category and geography.
  5. IPCC / industry default - last resort.

The resolution tier is recorded on the component’s factorIdsUsed[] array as resolutionTier, with values from 1_SUPPLIER_PCF_PO_SPECIFIC through 5_INDUSTRY_DEFAULT. A verifier can see exactly how primary the data was for each input.
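
The five-tier order can be sketched as a first-match lookup. Only the tier names 1_SUPPLIER_PCF_PO_SPECIFIC and 5_INDUSTRY_DEFAULT come from the text above; the names for tiers 2 to 4, the FactorSources structure, and the lookup keys are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

RESOLUTION_TIERS = (
    "1_SUPPLIER_PCF_PO_SPECIFIC",
    "2_SUPPLIER_PCF_AGGREGATE",    # hypothetical name
    "3_TENANT_CORRECTED_DEFAULT",  # hypothetical name
    "4_PLATFORM_SHARED",           # hypothetical name
    "5_INDUSTRY_DEFAULT",
)


@dataclass
class FactorSources:
    by_po: Dict[str, float] = field(default_factory=dict)
    by_supplier_product: Dict[Tuple[str, str], float] = field(default_factory=dict)
    tenant_default: Dict[Tuple[str, str], float] = field(default_factory=dict)
    platform_shared: Dict[Tuple[str, str], float] = field(default_factory=dict)
    industry_default: Dict[str, float] = field(default_factory=dict)


def resolve_factor(sources: FactorSources, po_id: Optional[str], supplier: str,
                   product: str, category: str,
                   geography: str) -> Tuple[float, str]:
    """Return (factor, resolutionTier) from the most specific source available."""
    candidates = (
        sources.by_po.get(po_id) if po_id is not None else None,
        sources.by_supplier_product.get((supplier, product)),
        sources.tenant_default.get((category, geography)),
        sources.platform_shared.get((category, geography)),
        sources.industry_default.get(category),
    )
    for tier, factor in zip(RESOLUTION_TIERS, candidates):
        if factor is not None:
            return factor, tier
    raise LookupError(f"no factor available for category {category!r}")
```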

Where supplier PCFs come from. Today they mostly arrive as spreadsheet attachments to commercial documents, emails, and PDF EPDs published on supplier websites. The platform’s intent is that, over time, suppliers on the platform issue signed PCFs as cryptographically verifiable artefacts that travel with shipments: the buyer’s import process picks up the supplier’s signed PCF, the buyer’s Tenant ingests it into the platform with provenance intact, and the buyer’s Scope 3 number for that lot is the supplier’s signed Scope 1+2 with full lineage. This is the passport / lineage mechanism in SIDK; it will be covered in the SIDK doc once that section is written.

Quality criteria for contractual instruments. Where a supplier PCF rests on a market-based Scope 2 contractual instrument (a PPA, a REC), the Quality Criteria from the GHG Protocol Scope 2 Guidance apply (see 02 §3). The engine reads a qualityCriteriaStatus field on the factor and flags FAILED or NOT_VALIDATED accordingly. The platform itself does not validate Quality Criteria - that is a Factor Registry concern - but it surfaces the status to downstream consumers.

8. Quality flags and substitution

Every data point that enters the calculation pipeline carries a quality flag. The engine treats them as follows:

Quality | Engine behaviour
GOOD | Use directly.
UNCERTAIN (sensor flagged uncertain - e.g., recent drift) | Use, but inflate the contributing uncertainty by 50% (configurable).
BAD | Exclude from sum; flag period as missing for that interval.
SUBSTITUTED (value substituted upstream per a rule) | Use, but record the substitution rule in evidence.
MANUAL (manually entered by operator) | Use, but record the actor’s identity in evidence. Triggers a MANUAL_OVERRIDE audit event.
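
The per-flag behaviour can be sketched as one pass over the points of an interval. The Point and Outcome structures and the evidence strings are assumptions made for illustration; only the five flag names, the 50% default inflation, and the MANUAL_OVERRIDE event name come from the text above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List, Optional


class Quality(Enum):
    GOOD = "GOOD"
    UNCERTAIN = "UNCERTAIN"
    BAD = "BAD"
    SUBSTITUTED = "SUBSTITUTED"
    MANUAL = "MANUAL"


@dataclass
class Point:
    value: float
    uncertainty: float
    quality: Quality
    substitution_rule: Optional[str] = None
    actor: Optional[str] = None


@dataclass
class Outcome:
    total: float
    uncertainties: List[float]
    missing_intervals: int
    evidence: List[str]
    audit_events: List[str]


def apply_quality(points: Iterable[Point],
                  uncertain_inflation: float = 0.5) -> Outcome:
    out = Outcome(0.0, [], 0, [], [])
    for p in points:
        if p.quality is Quality.BAD:
            out.missing_intervals += 1  # excluded from the sum
            continue
        u = p.uncertainty
        if p.quality is Quality.UNCERTAIN:
            u *= 1.0 + uncertain_inflation  # default: inflate by 50%
        elif p.quality is Quality.SUBSTITUTED:
            out.evidence.append(f"substitution_rule={p.substitution_rule}")
        elif p.quality is Quality.MANUAL:
            out.evidence.append(f"actor={p.actor}")
            out.audit_events.append("MANUAL_OVERRIDE")
        out.total += p.value
        out.uncertainties.append(u)
    return out
```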

For BAD or missing periods, the engine applies a substitution hierarchy inspired by EPA Part 75’s tiered missing-data substitution philosophy [3]:

  • Short gap (default <4 hours, recent availability ≥90%) - linear interpolation between adjacent good values.
  • Medium gap (4–72 hours, availability 75–90%) - 90th-percentile substitute over the prior 30 days.
  • Long gap - conservative worst-case (95th or 99th percentile), flagged SUBSTITUTED_HIGH_CONSERVATIVE.

If the substituted fraction of a period exceeds 10%, a Readiness Check WARNING is raised; above 25%, CRITICAL. The period cannot be locked at CRITICAL without an explicit override Submission. Substitution is never silent.
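
A minimal sketch of the gap tiers and the Readiness Check thresholds, using the defaults stated above. Several details are assumptions: a gap that fails either criterion of a tier falls through to the next, more conservative tier; the percentile uses the nearest-rank method; and the function names and the "OK" status are hypothetical.

```python
import math
from typing import List, Sequence


def classify_gap(gap_hours: float, availability: float) -> str:
    """Pick the substitution tier; failing either test falls to the next tier."""
    if gap_hours < 4 and availability >= 0.90:
        return "SHORT"    # linear interpolation
    if gap_hours <= 72 and availability >= 0.75:
        return "MEDIUM"   # 90th percentile of the prior 30 days
    return "LONG"         # conservative worst case


def interpolate(before: float, after: float, missing: int) -> List[float]:
    """Linear interpolation for `missing` points between two good values."""
    step = (after - before) / (missing + 1)
    return [before + step * i for i in range(1, missing + 1)]


def percentile(values: Sequence[float], q: float) -> float:
    """Nearest-rank percentile (one of several common definitions)."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no history available for substitution")
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def readiness(substituted_fraction: float) -> str:
    """Map the substituted share of a period to a Readiness Check level."""
    if substituted_fraction > 0.25:
        return "CRITICAL"
    if substituted_fraction > 0.10:
        return "WARNING"
    return "OK"
```

For example, four missing points between good readings of 10 and 20 are filled with 12, 14, 16, and 18.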

9. Calibration and EN 14181 / EN 15259 hooks

The engine itself does not enforce EN 14181 (the European standard for CEMS quality assurance, with QAL1/2/3 procedures and AST tests). EN 14181 is a measurement-side regime owned by the CEMS hardware and the calibration teams. But the engine integrates with EN-conformant measurement:

  • It reads MeasurementDevice.uncertaintyClaim and MeasurementDevice.calibrationDueAt from the topology.
  • It treats calibrationDueAt < periodEnd as a Readiness Check CRITICAL by default - most regulatory regimes (EU ETS MRR, EPA Part 75, EN 14181) treat overdue calibration as a serious data-quality issue.
  • It uses the device’s declared uncertainty as an input to uncertainty propagation (see 06 §8).
  • It refuses to silently downgrade an EN-claimed CEMS to Method 2 if the CEMS data is unavailable - it flags the gap explicitly rather than substituting a fuel-combustion calculation in its place.
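
The calibration rule in the second bullet reduces to a date comparison per device. The sketch below is illustrative: the MeasurementDevice fields mirror the names in the text in snake_case, while calibration_findings and the device IDs are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class MeasurementDevice:
    device_id: str
    uncertainty_claim: float      # declared uncertainty, fed to propagation
    calibration_due_at: datetime


def calibration_findings(devices: Iterable[MeasurementDevice],
                         period_end: datetime) -> List[Tuple[str, str]]:
    """Flag every device whose calibration falls due before the period ends."""
    return [
        (d.device_id, "CRITICAL")
        for d in devices
        if d.calibration_due_at < period_end
    ]


period_end = datetime(2026, 4, 1, tzinfo=timezone.utc)
devices = [
    MeasurementDevice("cems-bf1", 0.05, datetime(2026, 3, 10, tzinfo=timezone.utc)),
    MeasurementDevice("flow-bf1", 0.02, datetime(2026, 9, 1, tzinfo=timezone.utc)),
]
findings = calibration_findings(devices, period_end)
```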

This is the integration with the broader measurement-quality regime that lets the platform’s data hold up under ISO 14064-3 verifier scrutiny.

10. The provisioning / operating seam, restated

Everything in this doc is on the provisioning side of the seam (Principle 9 in 08b §5):

  • The Onboarding Engineer maps and configures sensors.
  • The Onboarding Engineer authors templates.
  • The Data Engineer maintains live connectors.
  • The Support Engineer handles tickets when something breaks.

The Tenant operates with what the Engineers have built:

  • Reads sensor data; flags anomalies via tickets.
  • Releases lab results.
  • Files Submissions for factor overrides, manual entries, restatements.
  • Locks periods, issues passports, engages verifiers.

The seam matters for inputs specifically because the data discipline that makes the engine’s output trustworthy depends on the people configuring the inputs being engineers, not operators. A Tenant configuring its own sensors would compromise the calibration discipline that makes sensor data audit-grade. A Product Owner operating the Tenant’s daily workflow would put the Engineer inside the Tenant’s audit scope, blurring the verifier’s view of who controls what. The seam is the discipline; the engine’s evidence model assumes it.


References & further reading

Source spec (canonical)

  • Docs/SIDK Handoff Docs/architecture.md - the seven primitives, including the Live Data Layer (edge agents, ingestion, telemetry) and Canonical Fact Layer (variables, lots, calculations).
  • Docs/SIDK Handoff Docs/codebase-guide/sphuran-edge.md - the Rust edge agent’s architecture: collectors, durable buffer, uploader, control plane.
  • Docs/SIDK Handoff Docs/data-patterns.md - late-data handling, idempotency, replay on restart, projection vs cache vs live-query.
  • Docs/GREMI-App-Ecosystem/quantification-loop-v1.2.md - the cross-app flow of which the input pipeline is the upstream end; sensor lifecycle definition.
  • Docs/05-carbon-engine.md §10.1 - sources of uncertainty, including how instrument and lab uncertainty enter the engine.

Footnotes

  1. SIDK edge agent codebase guide. Rust-implemented; supports OPC-UA, MQTT, Modbus TCP in v1; durable RocksDB buffer with 90-day replay window. Docs/SIDK Handoff Docs/codebase-guide/sphuran-edge.md

  2. Carbon Engine spec, §4.3.1 - Supplier-PCF resolution (extended five-tier order). The factor resolution hierarchy for Scope 3 inputs and contracted Scope 2 inputs. Docs/05-carbon-engine.md

  3. US EPA - 40 CFR Part 75 (Continuous Emission Monitoring), Subpart D - Missing Data Substitution. The tiered missing-data substitution philosophy that the engine’s substitution hierarchy is inspired by (with SIDK defaults, not a literal port - Part 75 uses 24-hour thresholds and PMA-based lookback). https://www.ecfr.gov/current/title-40/chapter-I/subchapter-C/part-75