Introduction
The OpenPacketCore SDK is a toolkit for building 5G Core Network Functions (CNFs) that run on Kubernetes. It combines Rust-based policy engines with Go-based Kubernetes orchestration to give operators both safety and flexibility.
What the SDK Provides
- Rust crates for protocol codecs (GTP-U, PFCP, NAS-5GS, NGAP v0, experimental Diameter base/application dictionaries, the experimental `opc-proto-gtpv2c` S2b subset, and the experimental `opc-proto-ikev2` header/payload-chain scaffold), session management, configuration consensus, alarms, and runtime chassis.
- Go packages (`operator-sdk-go`) for Kubernetes operators: conditions, bridge to Rust policy, drain orchestration, workload synthesis, runtime-gate helpers, Multus/SR-IOV attachment helpers, and metrics. Newly added packet-core helper surfaces are experimental mechanism helpers, not product CRDs or production controller claims.
- Reference operator (`sdk-reference-operator`) demonstrating end-to-end reconciliation of a network function custom resource.
Getting Started
See Quickstart for environment setup and your first `SdkManagedNetworkFunction` deployment.
Architecture
The SDK is documented through RFCs (high-level design) and ADRs (decision records). Start with:
Architecture
Layered view of the SDK. Arrows point in the dependency direction (inward).
```mermaid
flowchart TB
    subgraph L1["Layer 1 — pure codecs & types (no async, no I/O)"]
        types[opc-types]
        codecs["opc-proto-* (pfcp, gtpu, gtpv2c, ngap, nas, diameter, ikev2)"]
        protocol[opc-protocol]
    end
    subgraph L2["Layer 2 — models & ports"]
        cfgmodel[opc-config-model]
        ports["opc-mgmt-* ports (schema, path, errors, principal, limits, audit, authz, opstate, transport)"]
        nacm[opc-nacm]
    end
    subgraph L3["Layer 3 — app orchestrator"]
        bus["opc-config-bus (validate → authorize → persist → publish; commit-confirmed expiry rollback; recovery fence)"]
    end
    subgraph L4["Layer 4 — adapters (async)"]
        netconf["opc-netconf-server (SSH/russh)"]
        gnmi["opc-gnmi-server (tonic, mTLS)"]
        persist[opc-persist]
        tls["opc-tls / opc-identity (SPIFFE)"]
    end
    subgraph L5["Layer 5 — runtime & operators"]
        runtime[opc-runtime]
        oplc["operator-lifecycle / operator-controller (Rust)"]
        gosdk["operators/operator-sdk-go + sdk-reference-operator (Go)"]
    end
    facade["opc-sdk (facade / prelude)"]

    netconf --> bus
    gnmi --> bus
    bus --> ports
    bus --> cfgmodel
    persist --> ports
    ports --> types
    cfgmodel --> types
    codecs --> protocol
    nacm --> ports
    tls --> ports
    runtime --> ports
    oplc --> runtime
    gosdk -. bridge CLI contract .-> oplc
    facade --> netconf
    facade --> gnmi
    facade --> bus
    facade --> codecs
    facade --> runtime
```
Legend: solid arrows are Cargo dependencies (direction = "depends on"); the dashed
edge is the Go↔Rust policy-CLI process boundary (JSON contract, versioned by
scripts/check-downstream-import.sh on the Go side).
OpenPacketCore SDK RFC Index
This directory contains the foundational RFCs for the OpenPacketCore SDK and CNF architecture. These documents are intended to be implementation inputs for engineers.
Foundation Set
| RFC | Title | Primary Scope |
|---|---|---|
| 001 | Transactional Management Substrate | Config commits, persistence, recovery, NACM boundary |
| 002 | YANG-to-Rust Projection | Codegen, RFC 7951, validation, memory layout |
| 003 | Security Substrate | SPIFFE, gNSI, tenant identity, keys, audit |
| 004 | High-Performance Session Store | Session state, leases, fencing, handover, geo-redundancy |
| 005 | Zero-Copy Protocol Framework | Parsers, codecs, lifetimes, fuzzing, spec tags |
| 006 | Conformance and Evidence Pipeline | SBOM, VEX, provenance, signing, known gaps |
| 007 | SBI Service Framework | TS 29.500/29.510, NRF, OAuth2, overload, retries |
| 008 | CNF Runtime Chassis | Startup, supervision, shutdown, health, resource budgets |
| 009 | Operator Lifecycle and Upgrade | CRDs, rollout, migration, drain, rollback |
| 010 | Data Governance and Privacy | Data classes, redaction, retention, LI, regulated records |
| 011 | Node and Data-Plane Resource Contract | SR-IOV, Multus, AF_XDP, CPU, NUMA, pod security |
| 012 | Testbed and Simulator Framework | Scenario DSL, simulators, fixtures, virtual time |
| 013 | Fault Management and Alarm Substrate | Alarms, severity, probable cause, FM sinks |
Recommended Reading Order
- RFC 008: runtime chassis.
- RFC 003: security substrate.
- RFC 001: management substrate.
- RFC 002: YANG projection.
- RFC 007: SBI framework.
- RFC 004: session store.
- RFC 005: protocol framework.
- RFC 009: operator lifecycle.
- RFC 010: data governance.
- RFC 011: node/data-plane resources.
- RFC 013: fault management.
- RFC 012: testbed framework.
- RFC 006: evidence pipeline.
RFC 006 should be revisited after each implementation slice because it defines the evidence required to claim that the slice is complete.
OPC-SDK-RFC-001: Transactional Management Substrate
Status: Draft for Implementation
Version: 2.0.0
Date: 2026-05-19
Audience: SDK implementers, NF owners, security reviewers, test authors
1. Abstract
This RFC defines the transactional management substrate for OpenPacketCore network functions. It specifies the configuration commit state machine, the isolation boundary between the management plane and data plane, the reference persistent store, recovery behavior, authorization hooks, observability, and implementation acceptance criteria.
The core invariant is:
An NF's running configuration is a deterministic, validated, authorized, and durable projection of its YANG-defined configuration.
This RFC corrects the initial draft in four important ways:
- The commit pipeline is a single-writer state machine, not a long-held async mutex.
- The management plane is explicitly resource-isolated from the data plane.
- SQLite WAL is allowed only as a reference management-plane store with container storage preflight checks.
- Persistence, encryption, audit, rollback, and recovery are made explicit enough for independent implementation by multiple contributors.
2. Scope
2.1 In Scope
- gNMI, NETCONF, and local operator configuration commits.
- Candidate, running, startup, rollback, and shadow-security configuration stores.
- Authorization of configuration mutations.
- Durable commit history and audit trail.
- Deterministic change notification to NF subsystems.
- Reference SQLite persistence backend.
- Interfaces that allow other persistence backends later.
2.2 Out of Scope
- User-plane packet forwarding.
- High-rate session state. See RFC 004.
- Protocol parsing. See RFC 005.
- Full supply-chain evidence generation. See RFC 006.
- Cluster-wide consensus. This RFC covers per-replica local persistence and commit sequencing. Cluster-level orchestration must be layered above it.
3. Design Goals
3.1 Security
- Default-deny authorization for all write operations.
- Fail-closed behavior for corrupt storage, invalid identity, failed decryption, failed validation, and incomplete recovery.
- No unredacted secret material in audit logs, telemetry, traces, or error messages.
- Cryptographic binding between config payload, schema version, transaction metadata, and principal identity.
- Tamper-evident audit history.
3.2 Performance
- Configuration commits must not starve data-plane workers.
- Data-plane readers must see configuration through wait-free or bounded-time snapshot access.
- Commit admission must provide bounded memory growth and clear backpressure.
- Heavy validation, serialization, compression, encryption, and fsync must not run on the async I/O worker set.
3.3 Maintainability
- The state machine must be explicit and testable.
- Generated and hand-written validation must use the same error model.
- Storage backends must implement a narrow trait with deterministic semantics.
- Each phase must have owner modules, metrics, logs, and fault injection tests.
3.4 Functionality
- Support create, update, replace, delete, validate-only, commit-confirmed, rollback, and startup restore.
- Support path-level audit and change notifications.
- Support rollback points and schema migrations.
- Support shadow-security configuration that is not exposed through ordinary gNMI `Get`.
4. Core Concepts
4.1 Stores
The SDK defines the following logical stores:
| Store | Purpose | Durable | Exposed By gNMI Get |
|---|---|---|---|
| `candidate` | Transaction-local mutable config | No | No |
| `running` | Active immutable config | Yes | Yes, after NACM filtering |
| `startup` | Optional boot config alias or snapshot | Yes | Operator controlled |
| `rollback` | Explicit rollback points | Yes | Metadata only |
| `shadow-security` | gNSI/certificate/authz material | Yes | No |
The data plane MUST consume only immutable snapshots of running plus any
explicitly subscribed derived state. It MUST NOT read from candidate,
startup, or the raw persistence backend.
4.2 Config Snapshot
Generated root configs MUST implement:
```rust
pub trait OpcConfig: Clone + Send + Sync + 'static {
    type Delta: Send + Sync + core::fmt::Debug + 'static;

    fn schema_digest(&self) -> SchemaDigest;
    fn diff(&self, previous: &Self) -> Result<Vec<Self::Delta>, ConfigError>;
    fn apply_delta(&mut self, delta: Self::Delta) -> Result<(), ConfigError>;
    fn validate_syntax(&self) -> Result<(), ValidationError>;
    fn validate_semantics(&self, ctx: &ValidationContext) -> Result<(), ValidationError>;
}
```
Clone is required for the reference implementation, but large generated
configs SHOULD use structural sharing internally so candidate creation does not
copy every leaf for small patches.
4.3 Runtime Snapshot Access
The running config MUST be published through an atomic snapshot mechanism such
as arc-swap or an equivalent SDK type:
```rust
pub trait ConfigSnapshot<C>: Send + Sync {
    fn load(&self) -> std::sync::Arc<C>;
    fn version(&self) -> ConfigVersion;
}
```
Data-plane reads MUST NOT acquire the commit lock, await I/O, allocate large buffers, or call validation hooks.
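As an illustration of the publish/load split, here is a minimal std-only sketch. `SnapshotCell` and its `publish` method are hypothetical names, and `ConfigVersion` is modeled as a plain counter because this section does not define it. The `RwLock` is only a stand-in: it is not wait-free, so a conforming implementation would replace it with `arc-swap` or an equivalent atomic pointer type.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for the RFC's `ConfigVersion`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ConfigVersion(pub u64);

/// Std-only stand-in for an atomic snapshot cell (NOT wait-free).
pub struct SnapshotCell<C> {
    inner: RwLock<(Arc<C>, ConfigVersion)>,
}

impl<C> SnapshotCell<C> {
    pub fn new(initial: C) -> Self {
        SnapshotCell {
            inner: RwLock::new((Arc::new(initial), ConfigVersion(0))),
        }
    }

    /// Reader side: clone the `Arc` and release the lock immediately.
    /// The caller keeps a consistent snapshot even if a publish follows.
    pub fn load(&self) -> Arc<C> {
        Arc::clone(&self.inner.read().unwrap().0)
    }

    pub fn version(&self) -> ConfigVersion {
        self.inner.read().unwrap().1
    }

    /// Writer side (commit worker only): replace the pointer and bump the version.
    pub fn publish(&self, next: C) -> ConfigVersion {
        let mut guard = self.inner.write().unwrap();
        let ConfigVersion(previous) = guard.1;
        let version = ConfigVersion(previous + 1);
        *guard = (Arc::new(next), version);
        version
    }
}
```

A reader that called `load()` before a publish keeps its old `Arc` until it drops it, which is the property the data plane relies on.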
5. Commit State Machine
5.1 States
Each commit moves through the following states:
| State | Description | May Fail | Durable Side Effect |
|---|---|---|---|
| `Admitted` | Request accepted into bounded queue | Yes | No |
| `Authenticated` | Peer identity verified | Yes | No |
| `Authorized` | NACM/path policy passed | Yes | Audit denial |
| `Staged` | Candidate built from running snapshot | Yes | No |
| `SyntaxValidated` | YANG constraints passed | Yes | No |
| `SemanticallyValidated` | NF validation passed | Yes | No |
| `Prepared` | Serialized, encrypted, and ready to write | Yes | No |
| `Persisted` | Commit record and audit record fsynced | Yes | Yes |
| `Published` | Running pointer atomically swapped | No in normal operation | Yes |
| `Notified` | Subscribers informed | Best effort per subscriber | Metrics/audit only |
No state is allowed to panic as part of ordinary error handling. A panic in the
commit worker is a process bug and MUST be treated as StateMachineFault.
5.2 Corrected Phase Ordering
The commit worker MUST serialize commits, but it MUST NOT hold a
tokio::sync::Mutex across .await, blocking validation, encryption,
serialization, or database I/O. The recommended structure is:
- Northbound handlers push `CommitRequest` into a bounded mpsc queue.
- A single commit worker owns sequencing and transaction IDs.
- CPU-heavy validation runs through a bounded blocking/CPU pool.
- Crypto and serialization run through a bounded crypto pool.
- Persistence runs through a single writer backend handle.
- Publication is an atomic pointer swap.
This keeps ordering deterministic without turning the async runtime lock into a global bottleneck.
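The queue-plus-single-worker structure can be sketched with the standard library alone. This is an assumption-heavy illustration, not the SDK's implementation: `Request`, `spawn_worker`, and `admit` are hypothetical names, a std thread and `sync_channel` stand in for the async runtime, and the validation, crypto, and persistence pools are reduced to a comment.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};
use std::thread;

/// Hypothetical, heavily simplified request; the RFC's `CommitRequest` carries far more.
pub struct Request {
    pub payload: String,
    /// Channel on which the worker reports the assigned transaction ID.
    pub reply: SyncSender<u64>,
}

/// Spawns the single commit worker. Only this thread assigns transaction IDs,
/// so commit order is the queue order and no shared mutex is held across phases.
pub fn spawn_worker(capacity: usize) -> (SyncSender<Request>, thread::JoinHandle<Vec<String>>) {
    let (tx, rx): (SyncSender<Request>, Receiver<Request>) = sync_channel(capacity);
    let handle = thread::spawn(move || {
        let mut next_tx_id = 1u64;
        let mut applied = Vec::new();
        // The loop ends when every sender has been dropped.
        for req in rx {
            // Validation, crypto, and persistence would be dispatched to
            // their bounded pools here before publication.
            applied.push(req.payload);
            let _ = req.reply.send(next_tx_id);
            next_tx_id += 1;
        }
        applied
    });
    (tx, handle)
}

/// Admission never blocks the northbound handler: a full queue is a rejection.
pub fn admit(queue: &SyncSender<Request>, req: Request) -> Result<(), &'static str> {
    match queue.try_send(req) {
        Ok(()) => Ok(()),
        Err(TrySendError::Full(_)) => Err("queue-full"),
        Err(TrySendError::Disconnected(_)) => Err("worker-stopped"),
    }
}
```

Because `try_send` is used instead of `send`, backpressure surfaces as an immediate error that a transport can translate into a retryable status.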
5.3 Commit Request
```rust
pub struct CommitRequest<C: OpcConfig> {
    pub request_id: RequestId,
    pub principal: TrustedPrincipal,
    pub transport: TransportType,
    pub source: RequestSource,
    pub operation: ConfigOperation,
    pub mode: CommitMode,
    pub deadline: std::time::Instant,
    pub idempotency_key: Option<IdempotencyKey>,
    pub base_version: ConfigVersion,
    pub candidate: Option<C>,
    pub changed_paths: Vec<YangPath>,
}

pub enum CommitMode {
    Commit,
    ValidateOnly,
    CommitConfirmed { timeout: std::time::Duration },
    Rollback { target: RollbackTarget },
}
```
idempotency_key SHOULD be supported for northbound clients that retry after
UNAVAILABLE.
Candidate-bearing requests MUST carry the running config base_version used to
build the candidate. The ConfigBus worker MUST reject the request before
validation or publication when that value no longer matches the current running
version, so stale full-candidate writers cannot overwrite an intervening
commit.
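The stale-writer rule is an optimistic concurrency check. A minimal sketch, assuming `ConfigVersion` is a plain counter and using hypothetical names `StaleBase` and `check_base_version`:

```rust
/// Hypothetical stand-in for the RFC's `ConfigVersion`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ConfigVersion(pub u64);

/// Returned when a candidate was built from a running config that has since changed.
#[derive(Debug, PartialEq, Eq)]
pub struct StaleBase {
    pub request_base: ConfigVersion,
    pub running: ConfigVersion,
}

/// Runs in the commit worker before validation or publication.
pub fn check_base_version(
    running: ConfigVersion,
    request_base: ConfigVersion,
) -> Result<(), StaleBase> {
    if running == request_base {
        Ok(())
    } else {
        Err(StaleBase { request_base, running })
    }
}
```

If two clients both build candidates from version 7, the first commit moves running to version 8 and the second is rejected instead of silently overwriting it.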
5.4 Commit Result
```rust
pub struct CommitResult {
    pub tx_id: TxId,
    pub base_version: ConfigVersion,
    pub new_version: Option<ConfigVersion>,
    pub status: CommitStatus,
    pub changed_paths: Vec<YangPath>,
    pub apply_plan: Option<ApplyPlan>,
}
```
Failed commits MUST include stable machine-readable error codes. Error strings MUST NOT contain secrets or raw config fragments.
Candidate-bearing commit, commit-confirmed, and validate-only requests SHOULD
return an ApplyPlan that classifies the operational impact of the
SDK-derived changed paths after validation and before durable append. The
default classifier returns hot plans so existing products remain compatible;
products MAY install a ConfigImpactClassifier for domain-specific warm,
drain-required, restart-required, or forbidden-live behavior.
forbidden-live and apply-plan hard errors MUST fail closed before durable
append/publication and attach the rejected plan to CommitError.apply_plan.
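To make the classification step concrete, the sketch below models the impact levels named above and a prefix-based classifier. All type and function names (`Impact`, `ImpactClassifier`, `PrefixClassifier`, `Plan`, `build_plan`) and the example paths are hypothetical; the real `ApplyPlan` and `ConfigImpactClassifier` shapes are not given in this section.

```rust
/// Impact levels from the text, ordered from least to most disruptive.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum Impact {
    Hot,
    Warm,
    DrainRequired,
    RestartRequired,
    ForbiddenLive,
}

pub trait ImpactClassifier {
    fn classify(&self, path: &str) -> Impact;
}

/// Mirrors the stated default: every change is classified as hot.
pub struct DefaultClassifier;

impl ImpactClassifier for DefaultClassifier {
    fn classify(&self, _path: &str) -> Impact {
        Impact::Hot
    }
}

/// Product-specific classifier: the most disruptive matching prefix wins.
pub struct PrefixClassifier {
    pub rules: Vec<(&'static str, Impact)>,
}

impl ImpactClassifier for PrefixClassifier {
    fn classify(&self, path: &str) -> Impact {
        self.rules
            .iter()
            .filter(|(prefix, _)| path.starts_with(*prefix))
            .map(|(_, impact)| *impact)
            .max()
            .unwrap_or(Impact::Hot)
    }
}

#[derive(Debug, PartialEq, Eq)]
pub struct Plan {
    pub overall: Impact,
    pub per_path: Vec<(String, Impact)>,
}

/// Builds a plan from the changed paths. A forbidden-live plan is returned as
/// `Err` so the caller fails closed before durable append and can attach it
/// to the commit error.
pub fn build_plan(classifier: &dyn ImpactClassifier, changed: &[&str]) -> Result<Plan, Plan> {
    let per_path: Vec<(String, Impact)> = changed
        .iter()
        .map(|p| (p.to_string(), classifier.classify(p)))
        .collect();
    let overall = per_path.iter().map(|(_, i)| *i).max().unwrap_or(Impact::Hot);
    let plan = Plan { overall, per_path };
    if overall == Impact::ForbiddenLive {
        Err(plan)
    } else {
        Ok(plan)
    }
}
```

Taking the maximum over all paths means one restart-required leaf makes the whole commit restart-required, which is the conservative reading of the text.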
6. Management Thread Boundary
6.1 Required Execution Domains
The initial "Three-Pool" model is directionally correct but underspecified. The SDK MUST implement the following boundaries:
| Domain | Work | Requirement |
|---|---|---|
| Async I/O | gNMI, NETCONF, gNSI, health, metrics | Never perform CPU-heavy work or fsync |
| Commit worker | Sequencing, state machine ownership | Single logical writer, bounded queue |
| Validation pool | Generated and NF semantic validation | Bounded threads and timeout |
| Crypto/serialization pool | RFC 7951 serialization, compression, AEAD | Bounded threads and memory |
| Persistence writer | SQLite or backend write transaction | Single writer per local store |
| Data-plane workers | Packet/session fast path | No dependency on management pools |
Implementations MAY combine validation and crypto pools for small deployments, but the default carrier profile MUST expose independent limits for both.
6.2 Starvation Protection
The SDK MUST provide:
- Separate semaphores for validation, crypto, and persistence work.
- Configurable max queued commits, default `32`.
- Configurable max pending bytes across staged candidates, default `64 MiB`.
- Per-request deadline propagation.
- Admission rejection with gRPC `UNAVAILABLE` and retry metadata when queues are full.
- A hard rule that data-plane threads never run management-plane blocking work.
Carrier CNF deployments SHOULD pin data-plane workers and management workers to different CPU sets using Kubernetes CPU Manager or an equivalent runtime mechanism. The SDK MUST work without CPU pinning, but the documented production profile MUST include it.
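The two admission limits can be enforced with a small accounting structure. The sketch below is one possible shape, with hypothetical names (`Admission`, `Rejection`); it uses the default values stated above and leaves the mapping of a rejection to gRPC `UNAVAILABLE` to the transport layer. The mutex here guards only management-plane bookkeeping and is never touched by data-plane threads.

```rust
use std::sync::Mutex;

pub const DEFAULT_MAX_QUEUED: usize = 32;
pub const DEFAULT_MAX_PENDING_BYTES: usize = 64 * 1024 * 1024; // 64 MiB

#[derive(Debug, PartialEq, Eq)]
pub enum Rejection {
    QueueFull,
    MemoryBudgetExceeded,
}

struct Usage {
    queued: usize,
    pending_bytes: usize,
}

pub struct Admission {
    max_queued: usize,
    max_bytes: usize,
    usage: Mutex<Usage>,
}

impl Admission {
    pub fn new(max_queued: usize, max_bytes: usize) -> Self {
        Admission {
            max_queued,
            max_bytes,
            usage: Mutex::new(Usage { queued: 0, pending_bytes: 0 }),
        }
    }

    pub fn with_defaults() -> Self {
        Self::new(DEFAULT_MAX_QUEUED, DEFAULT_MAX_PENDING_BYTES)
    }

    /// Reserves a queue slot and the candidate's bytes, or rejects without side effects.
    pub fn try_admit(&self, candidate_bytes: usize) -> Result<(), Rejection> {
        let mut u = self.usage.lock().unwrap();
        if u.queued >= self.max_queued {
            return Err(Rejection::QueueFull);
        }
        if u.pending_bytes.saturating_add(candidate_bytes) > self.max_bytes {
            return Err(Rejection::MemoryBudgetExceeded);
        }
        u.queued += 1;
        u.pending_bytes += candidate_bytes;
        Ok(())
    }

    /// Called once the commit finishes, whatever its outcome.
    pub fn release(&self, candidate_bytes: usize) {
        let mut u = self.usage.lock().unwrap();
        u.queued = u.queued.saturating_sub(1);
        u.pending_bytes = u.pending_bytes.saturating_sub(candidate_bytes);
    }
}
```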
6.3 Time Budgets
Default phase budgets:
| Phase | Default Budget |
|---|---|
| Admission wait | 2 seconds |
| Syntax validation | 5 seconds |
| Semantic validation | 30 seconds |
| Serialization/encryption | 10 seconds |
| Persistence | 10 seconds |
| Notification fanout | 2 seconds per subscriber batch |
Budgets MUST be configurable per NF. Expired commits MUST fail before publication. Persistence timeouts after partial backend work MUST be resolved by backend recovery logic before the next commit is accepted.
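One way to combine the per-phase budgets with the request deadline is to give each phase the smaller of its own budget and the time left on the request. The sketch below hard-codes the defaults from the table; `Phase`, `default_budget`, and `phase_timeout` are hypothetical names, and notification fanout is omitted because it runs after publication.

```rust
use std::time::{Duration, Instant};

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Phase {
    Admission,
    SyntaxValidation,
    SemanticValidation,
    SerializeEncrypt,
    Persistence,
}

/// Default budgets from the table above; a real SDK would load these per NF.
pub fn default_budget(phase: Phase) -> Duration {
    match phase {
        Phase::Admission => Duration::from_secs(2),
        Phase::SyntaxValidation => Duration::from_secs(5),
        Phase::SemanticValidation => Duration::from_secs(30),
        Phase::SerializeEncrypt => Duration::from_secs(10),
        Phase::Persistence => Duration::from_secs(10),
    }
}

/// Time the phase may use, or `None` when the request deadline has already
/// passed and the commit must fail before publication.
pub fn phase_timeout(phase: Phase, now: Instant, deadline: Instant) -> Option<Duration> {
    let remaining = deadline.checked_duration_since(now)?;
    if remaining.is_zero() {
        return None;
    }
    Some(remaining.min(default_budget(phase)))
}
```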
7. Persistence Abstraction
7.1 Trait
```rust
#[async_trait::async_trait]
pub trait ConfigStore: Send + Sync {
    async fn load_latest(&self) -> Result<Option<StoredConfig>, PersistError>;
    async fn load_rollback(&self, target: RollbackTarget) -> Result<StoredConfig, PersistError>;
    async fn append_commit(&self, record: CommitRecord, audit: Vec<AuditRecord>) -> Result<(), PersistError>;
    async fn mark_confirmed(&self, tx_id: TxId) -> Result<(), PersistError>;
    async fn create_rollback_point(&self, tx_id: TxId, label: Option<String>) -> Result<(), PersistError>;
    async fn preflight(&self) -> Result<PersistCapabilities, PersistError>;
}
```
append_commit MUST be atomic: either the commit record and its audit records
are durable together, or neither is visible during recovery.
7.2 Commit Record
```rust
pub struct CommitRecord {
    pub tx_id: TxId,
    pub parent_tx_id: Option<TxId>,
    pub version: ConfigVersion,
    pub committed_at: Timestamp,
    pub principal: TrustedPrincipal,
    pub source: RequestSource,
    pub schema_digest: SchemaDigest,
    pub plaintext_digest: Sha256Digest,
    pub encrypted_blob: EncryptedBlob,
    pub rollback_point: bool,
    pub confirmed_deadline: Option<Timestamp>,
}
```
The plaintext digest is verified only after successful AEAD decryption. It is not a substitute for AEAD integrity.
8. SQLite Reference Backend
8.1 Positioning
SQLite WAL is a sound reference backend for a single NF replica's management configuration and audit history because commits are low-rate, read access is local, recovery is simple, and the operational footprint is small.
SQLite MUST NOT be treated as a distributed consensus system. It MUST NOT be used for high-rate session state or cross-replica active/active configuration coordination.
8.2 Mandatory Container Storage Preflight
Before accepting writes, the SQLite backend MUST verify and report:
- Database path is on a persistent volume when persistence is required.
- Filesystem supports POSIX byte-range locking compatible with SQLite.
- WAL, SHM, and database files are on the same filesystem.
- The volume is not a known-unsafe network filesystem unless explicitly overridden by an operator with an evidence waiver.
- `fsync` is not disabled by mount options or runtime configuration.
- The database directory is writable only by the NF service account UID/GID.
- Free space is above configured threshold.
- Startup can create, checkpoint, close, and reopen a test WAL transaction.
If preflight fails, the NF MUST fail closed unless configured for an explicit ephemeral development mode.
8.3 PRAGMA Profile
The reference backend MUST apply and verify:
```sql
PRAGMA journal_mode = WAL;
PRAGMA synchronous = EXTRA;
PRAGMA foreign_keys = ON;
PRAGMA locking_mode = NORMAL;
PRAGMA busy_timeout = 5000;
PRAGMA temp_store = MEMORY;
```
locking_mode = EXCLUSIVE SHOULD NOT be the default in containers because it
can break sidecar backup, online inspection, and some recovery workflows. The
backend MAY offer exclusive mode for sealed appliances, but the default is
NORMAL with a single SDK writer and no external writers.
synchronous = EXTRA is acceptable as a conservative default, but the backend
MUST document that durability still depends on the underlying filesystem and
storage class. Production deployments MUST use tested PVC/storage classes, not
overlay filesystem layers for durable config.
8.4 Schema
```sql
CREATE TABLE schema_version (
    id INTEGER PRIMARY KEY CHECK (id = 1),
    schema_digest BLOB NOT NULL,
    sdk_version TEXT NOT NULL,
    created_at TEXT NOT NULL
);

CREATE TABLE config_history (
    tx_id BLOB PRIMARY KEY,
    parent_tx_id BLOB NULL REFERENCES config_history(tx_id),
    version INTEGER NOT NULL UNIQUE,
    committed_at TEXT NOT NULL,
    principal TEXT NOT NULL,
    source TEXT NOT NULL,
    schema_digest BLOB NOT NULL,
    plaintext_digest BLOB NOT NULL,
    encrypted_blob BLOB NOT NULL,
    rollback_point INTEGER NOT NULL DEFAULT 0,
    confirmed_deadline TEXT NULL,
    confirmed_at TEXT NULL
);

CREATE TABLE audit_trail (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    tx_id BLOB NOT NULL REFERENCES config_history(tx_id) ON DELETE RESTRICT,
    sequence INTEGER NOT NULL,
    yang_path TEXT NOT NULL,
    op_type TEXT NOT NULL CHECK(op_type IN ('CREATE', 'UPDATE', 'REPLACE', 'DELETE')),
    previous_value TEXT NULL,
    new_value TEXT NULL,
    redaction_applied INTEGER NOT NULL DEFAULT 0,
    previous_hash BLOB NOT NULL,
    entry_hmac BLOB NOT NULL,
    UNIQUE(tx_id, sequence)
);

CREATE INDEX audit_trail_tx_id_idx ON audit_trail(tx_id);
CREATE INDEX config_history_rollback_idx ON config_history(version, rollback_point);
```
8.5 WAL Maintenance
The backend MUST:
- Set a bounded WAL autocheckpoint threshold.
- Run explicit checkpoints during graceful shutdown and after large commits.
- Export metrics for WAL size and checkpoint failures.
- Refuse startup when WAL recovery fails.
- Avoid deleting WAL or SHM files manually.
9. Encryption at Rest
Configuration encryption is specified here at the envelope level and governed by RFC 003 for key management.
9.1 Algorithm
- Default AEAD: `AES-256-GCM-SIV`.
- Alternative for non-AES-accelerated targets: `XChaCha20-Poly1305`, if allowed by the deployment security profile.
- Random nonce generation is still REQUIRED even when using nonce-misuse resistant AEAD.
9.2 Envelope
```text
struct ConfigEnvelopeV1 {
    magic: [u8; 4] = "OPCE";
    version: u16 = 1;
    alg_id: u16;
    key_id_len: u16;
    nonce_len: u16;
    aad_len: u32;
    key_id: [u8; key_id_len];
    nonce: [u8; nonce_len];
    aad: [u8; aad_len];
    ciphertext_and_tag: [u8; remaining];
}
```
AAD MUST include:
- `tx_id`
- `parent_tx_id`
- `version`
- `committed_at`
- `principal`
- `schema_digest`
- `store_kind`
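A bounds-checked parser for the envelope layout might look like the sketch below. The envelope definition does not state a byte order, so big-endian integers are an assumption here, as are the names `Envelope`, `EnvelopeError`, and `parse`. The sketch only splits the fields; AEAD decryption and AAD verification are out of its scope.

```rust
#[derive(Debug, PartialEq, Eq)]
pub struct Envelope<'a> {
    pub alg_id: u16,
    pub key_id: &'a [u8],
    pub nonce: &'a [u8],
    pub aad: &'a [u8],
    pub ciphertext_and_tag: &'a [u8],
}

#[derive(Debug, PartialEq, Eq)]
pub enum EnvelopeError {
    Truncated,
    BadMagic,
    UnsupportedVersion(u16),
}

/// Splits `n` bytes off the front of the cursor, failing instead of panicking.
fn take<'a>(buf: &mut &'a [u8], n: usize) -> Result<&'a [u8], EnvelopeError> {
    let current: &'a [u8] = *buf;
    if current.len() < n {
        return Err(EnvelopeError::Truncated);
    }
    let (head, tail) = current.split_at(n);
    *buf = tail;
    Ok(head)
}

// ASSUMPTION: integers are big-endian; the layout above does not say.
fn be_u16(buf: &mut &[u8]) -> Result<u16, EnvelopeError> {
    let b = take(buf, 2)?;
    Ok(u16::from_be_bytes([b[0], b[1]]))
}

fn be_u32(buf: &mut &[u8]) -> Result<u32, EnvelopeError> {
    let b = take(buf, 4)?;
    Ok(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
}

pub fn parse(input: &[u8]) -> Result<Envelope<'_>, EnvelopeError> {
    let mut buf = input;
    if take(&mut buf, 4)? != &b"OPCE"[..] {
        return Err(EnvelopeError::BadMagic);
    }
    let version = be_u16(&mut buf)?;
    if version != 1 {
        return Err(EnvelopeError::UnsupportedVersion(version));
    }
    let alg_id = be_u16(&mut buf)?;
    let key_id_len = be_u16(&mut buf)? as usize;
    let nonce_len = be_u16(&mut buf)? as usize;
    let aad_len = be_u32(&mut buf)? as usize;
    let key_id = take(&mut buf, key_id_len)?;
    let nonce = take(&mut buf, nonce_len)?;
    let aad = take(&mut buf, aad_len)?;
    // Everything left is ciphertext followed by the AEAD tag.
    Ok(Envelope { alg_id, key_id, nonce, aad, ciphertext_and_tag: buf })
}
```

Every length field is checked against the remaining input before slicing, so a hostile or truncated blob yields `Truncated` rather than a panic.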
9.3 Key Derivation
When using a master secret, per-commit keys MUST be derived with HKDF-SHA256:
```text
salt = tx_id || schema_digest
info = "openpacketcore/config/v1" || store_kind || key_id
key = HKDF(master_secret, salt, info, 32)
```
The backend MUST support key rotation by retaining enough key metadata to read old commits until the operator performs re-encryption or retention expiry.
10. Authorization Boundary
10.1 Auth Context
```rust
pub struct AuthContext {
    pub principal: TrustedPrincipal,
    pub spiffe_id: Option<SpiffeId>,
    pub transport: TransportType,
    pub source_ip: std::net::IpAddr,
    pub tenant: TenantId,
    pub authenticated_at: Timestamp,
}
```
10.2 NACM Requirements
The NACM engine MUST:
- Normalize YANG paths before policy evaluation.
- Reject ambiguous module prefixes.
- Treat missing policy as deny.
- Authorize every changed path, not just the top-level request path.
- Authorize `read`, `create`, `update`, `replace`, `delete`, `exec`, and `subscribe` actions separately.
- Enforce policy before candidate mutation and again before publication if the policy changed during a long-running commit.
Trie evaluation is acceptable for performance, but wildcard, subtree, module, and default-deny semantics MUST be tested against RFC 8341 behavior.
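Two of the requirements above, default deny and authorizing every changed path, can be shown in a few lines. This is a deliberately reduced sketch with hypothetical names (`Rule`, `authorize`) and allow-only subtree rules. It does not implement RFC 8341 rule ordering, deny rules, or module matching, and it assumes paths are already normalized.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Action {
    Read,
    Create,
    Update,
    Replace,
    Delete,
    Exec,
    Subscribe,
}

/// Hypothetical allow rule: `group` may perform `action` anywhere under `subtree`.
pub struct Rule {
    pub group: &'static str,
    pub subtree: &'static str,
    pub action: Action,
}

/// Matches on whole path segments so `/a/b` does not cover `/a/bc`.
fn in_subtree(path: &str, subtree: &str) -> bool {
    path == subtree
        || path
            .strip_prefix(subtree)
            .map_or(false, |rest| rest.starts_with('/'))
}

/// Every changed path must be covered by an allow rule for this exact action.
/// No rules means no access. On denial, the first offending path is returned.
pub fn authorize(
    rules: &[Rule],
    group: &str,
    action: Action,
    changed_paths: &[&str],
) -> Result<(), String> {
    for path in changed_paths {
        let allowed = rules
            .iter()
            .any(|r| r.group == group && r.action == action && in_subtree(path, r.subtree));
        if !allowed {
            return Err(path.to_string());
        }
    }
    Ok(())
}
```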
11. Notifications
After publication, the ConfigBus MUST notify subscribers with:
```rust
pub struct ConfigChange<C: OpcConfig> {
    pub tx_id: TxId,
    pub version: ConfigVersion,
    pub previous: std::sync::Arc<C>,
    pub current: std::sync::Arc<C>,
    pub deltas: Vec<C::Delta>,
    pub changed_paths: Vec<YangPath>,
}
```
Subscriber channels MUST be bounded. Slow subscribers MUST be isolated so they cannot block publication of future commits. Each subscriber must choose one of:
- `drop_oldest`
- `drop_newest`
- `disconnect_on_lag`
- `force_resync`
Critical NF subsystems that cannot tolerate missed notifications MUST expose a
resync method and compare local applied version against ConfigBus::version().
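As an example of one lag policy, the sketch below implements a bounded per-subscriber queue with `drop_oldest` semantics plus the version comparison a critical subscriber would use to decide on a resync. `LagQueue` and `needs_resync` are hypothetical names, and versions are modeled as plain `u64` values.

```rust
use std::collections::VecDeque;

/// Bounded per-subscriber queue with `drop_oldest` semantics.
pub struct LagQueue<T> {
    cap: usize,
    items: VecDeque<T>,
    dropped: u64,
}

impl<T> LagQueue<T> {
    pub fn new(cap: usize) -> Self {
        assert!(cap > 0, "capacity must be non-zero");
        LagQueue { cap, items: VecDeque::with_capacity(cap), dropped: 0 }
    }

    /// Never blocks the publisher: when full, the oldest entry is evicted.
    pub fn push(&mut self, item: T) {
        if self.items.len() == self.cap {
            self.items.pop_front();
            self.dropped += 1;
        }
        self.items.push_back(item);
    }

    pub fn pop(&mut self) -> Option<T> {
        self.items.pop_front()
    }

    /// Number of evicted notifications, suitable for a lag metric.
    pub fn dropped(&self) -> u64 {
        self.dropped
    }
}

/// A subscriber that has applied an older version than the bus reports
/// has missed at least one change and should resync from the snapshot.
pub fn needs_resync(applied_version: u64, bus_version: u64) -> bool {
    applied_version < bus_version
}
```

Since `push` never waits, a slow subscriber costs the publisher at most one eviction per notification.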
12. Recovery
12.1 Startup
Startup MUST:
- Run storage preflight.
- Recover or checkpoint WAL if required.
- Load highest confirmed config version.
- Decrypt and authenticate envelope.
- Verify plaintext digest.
- Verify schema compatibility or run migration.
- Run syntax validation.
- Run semantic validation in startup mode.
- Publish running snapshot.
- Start northbound write admission only after running is published.
12.2 Rollback
If the latest config fails startup semantic validation, the NF MAY try rollback points in descending version order. It MUST audit the rollback decision on the next successful write-capable startup. If no rollback point validates, the NF MUST fail closed and expose a read-only recovery endpoint only if explicitly enabled.
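The selection rule can be sketched as a pure function. `StoredPoint` and `select_startup_config` are hypothetical names, and validation is passed in as a closure so the sketch stays independent of any real validator. The returned flag records whether a rollback happened so the caller can audit it later.

```rust
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct StoredPoint {
    pub version: u64,
    pub payload: String,
}

/// Returns the config to publish and whether it came from a rollback point.
/// `None` means nothing validated and the NF must fail closed.
pub fn select_startup_config<F>(
    latest: StoredPoint,
    mut rollback_points: Vec<StoredPoint>,
    validate: F,
) -> Option<(StoredPoint, bool)>
where
    F: Fn(&StoredPoint) -> bool,
{
    if validate(&latest) {
        return Some((latest, false));
    }
    // Try rollback points from the highest version downwards.
    rollback_points.sort_by(|a, b| b.version.cmp(&a.version));
    rollback_points
        .into_iter()
        .find(|p| validate(p))
        .map(|p| (p, true))
}
```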
12.3 Commit-Confirmed
commit-confirmed MUST:
- Persist the tentative config with a deadline.
- Publish it as running.
- Require explicit confirmation before deadline.
- Automatically roll back to the parent config if not confirmed.
- Emit warning telemetry before rollback.
The rollback timer MUST survive process restart by reading persisted
confirmed_deadline.
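Because the deadline is persisted, the decision after a restart depends only on stored fields and the current time. A minimal sketch, assuming timestamps are Unix seconds and that a record without a `confirmed_deadline` is an ordinary commit; `ConfirmState` and `confirm_state` are hypothetical names:

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum ConfirmState {
    /// Ordinary commit, or a commit-confirmed that was explicitly confirmed.
    Confirmed,
    /// Still tentative: re-arm the rollback timer for the remaining time.
    Pending { remaining_secs: u64 },
    /// Deadline passed while unconfirmed: roll back to the parent config.
    Expired,
}

/// Evaluated at startup from the persisted `confirmed_at` and `confirmed_deadline`.
pub fn confirm_state(
    confirmed_at: Option<u64>,
    confirmed_deadline: Option<u64>,
    now: u64,
) -> ConfirmState {
    match (confirmed_at, confirmed_deadline) {
        (Some(_), _) | (None, None) => ConfirmState::Confirmed,
        (None, Some(deadline)) if now < deadline => ConfirmState::Pending {
            remaining_secs: deadline - now,
        },
        (None, Some(_)) => ConfirmState::Expired,
    }
}
```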
13. Observability
Required metrics:
- `opc_config_commits_total{outcome,reason,transport}`
- `opc_config_commit_duration_seconds{phase}`
- `opc_config_commit_queue_depth`
- `opc_config_commit_queue_rejections_total{reason}`
- `opc_config_running_version`
- `opc_config_subscriber_lag{subscriber}`
- `opc_persist_wal_bytes`
- `opc_persist_checkpoint_total{outcome}`
- `opc_persist_fsync_duration_seconds`
- `opc_nacm_decisions_total{action,outcome}`
Required structured log fields:
- `request_id`
- `tx_id`
- `version`
- `principal`
- `tenant`
- `transport`
- `phase`
- `outcome`
- `error_code`
Logs MUST NOT contain secret values or raw config blobs.
14. Testing Requirements
14.1 Unit Tests
- State transition table.
- NACM path normalization and default deny.
- Candidate patch behavior.
- Encryption envelope parse/decrypt failures.
- Audit hash chain validation.
- Subscriber lag policies.
14.2 Integration Tests
- Concurrent commits serialize deterministically.
- Validation timeout does not block health/read endpoints.
- Persistence crash before commit is invisible after restart.
- Persistence crash after commit is visible after restart.
- WAL checkpoint and recovery on restart.
- Commit-confirmed rollback after process restart.
- Rollback point selection when latest config fails validation.
14.3 Fault Injection
- Disk full.
- `fsync` failure.
- Corrupt WAL.
- Corrupt encrypted blob.
- Missing key.
- Expired SPIFFE identity.
- NACM policy change during long commit.
- Slow or disconnected subscriber.
14.4 Performance Tests
Minimum carrier profile gates:
- Data-plane config snapshot load p99 under 1 microsecond in-process.
- Northbound read path remains available during 30 second semantic validation.
- Commit queue rejects rather than exceeding configured memory limit.
- 10,000 path-level audit records commit without unbounded memory growth.
- SQLite backend sustains 10 commits/second for 60 seconds on reference PVC.
15. Module Ownership
Contributors should implement these modules independently with the listed ownership:
| Module | Responsibility |
|---|---|
| `opc-config-bus` | Commit worker, snapshot publication, subscriber fanout |
| `opc-config-model` | Shared IDs, errors, request/result types |
| `opc-nacm` | Path normalization and authorization decisions |
| `opc-persist` | `ConfigStore` trait and SQLite backend |
| `opc-crypto` | Envelope encryption/decryption and key lookup adapter |
| `opc-audit` | Audit records, redaction markers, hash chain |
| `opc-config-testkit` | Fault injection, mock store, mock NACM |
Each module MUST expose a narrow public API, avoid cyclic dependencies, and include doc examples for the primary workflow.
16. Acceptance Criteria
This RFC is implemented when:
- A commit cannot publish unless authorization, validation, encryption, and durable append all succeed.
- Data-plane snapshot access is independent of commit queue and persistence health.
- SQLite preflight rejects unsafe durable deployments.
- Recovery handles clean restart, crash restart, rollback, and commit-confirmed expiry.
- Audit logs are tamper-evident and redacted.
- Metrics expose queue, phase latency, persistence, and authorization health.
- Fault injection tests cover all failures listed in Section 14.3.
OPC-SDK-RFC-002: YANG-to-Rust Projection and Codegen Engine
Status: Draft for Implementation
Version: 2.0.0
Date: 2026-05-19
Audience: SDK implementers, YANG model authors, NF teams, operator authors
1. Abstract
This RFC defines how OpenPacketCore projects YANG models into Rust data structures, validators, serializers, patch applicators, metadata tables, and operator-facing schemas. The generated code must preserve YANG semantics, support RFC 7951 JSON encoding, avoid stack blowups on large configurations, and provide deterministic APIs for the management substrate in RFC 001.
The key correction from the initial draft is that code generation MUST NOT rely on ad hoc recursive traversal or direct translation of arbitrary XPath strings into Rust closures. The SDK must compile YANG into a typed intermediate representation with bounded validation behavior, stable metadata, and differential tests against a reference YANG engine.
2. Scope
2.1 In Scope
- YANG 1.1 module loading and schema resolution.
- RFC 7951 JSON serialization and deserialization.
- Rust type generation for config and state trees.
- Validation for type constraints, `must`, `when`, `leafref`, `unique`, `min-elements`, `max-elements`, `mandatory`, and defaults.
- gNMI/NETCONF patch application metadata.
- Secret/redaction metadata for RFC 001 and RFC 003.
- Runtime schema metadata consumed by gNMI, NETCONF, NACM, audit, and operator policy helpers.
- Conformance tags for RFC 006.
2.2 Out of Scope
- Runtime session state schema. See RFC 004.
- Protocol wire codecs. See RFC 005.
- UI form generation.
- Go/Kubernetes CRD generation. Product operators own their API shape and may consume the generated Rust schema/policy metadata through RFC 009 helpers.
- Support for proprietary YANG extensions unless explicitly registered in the extension registry defined here.
3. Design Goals
3.1 Security
- Generated deserializers must reject unknown, ambiguous, duplicate, or malformed fields unless the relevant protocol explicitly allows them.
- Secret leaves must use secret-aware generated types and redaction metadata.
- Generated validators must not panic on hostile input.
- Generated code must avoid `unsafe` unless an RFC-specific exception is approved and fuzzed.
3.2 Performance
- Validation must be linear or near-linear in the size of the config for common cases.
- Large lists must validate through generated indices, not repeated global depth-first searches.
- Generated root structs must keep stack footprint bounded.
- Patch application must avoid full-tree clone when structural sharing is enabled.
3.3 Maintainability
- Code generation must be deterministic for identical inputs.
- Generated files must have stable names, stable item order, and stable formatting.
- Constraint lowering must go through a typed IR that can be inspected, tested, and rendered.
- Generated APIs must be boring and consistent across all NFs.
3.4 Functionality
- Support canonical YANG schema features required by 3GPP and IETF models.
- Preserve presence, default, namespace, ordering, and key semantics.
- Emit enough metadata for NACM, audit, gNMI paths, and conformance mapping.
- Support schema migrations between SDK releases.
4. Inputs and Outputs
4.1 Inputs
The code generator consumes:
- YANG module files.
- A module lockfile containing exact module names, revisions, and checksums.
- A generation profile.
- Optional extension registry.
4.2 Outputs
For each generation unit, the tool emits:
- Rust structs, enums, newtypes, validators, serializers, and patch applicators.
- Static schema metadata tables.
- Path constants and path parser helpers.
- Redaction and NACM metadata.
- Property test fixtures.
- `schema-digest.json` for runtime compatibility checks.
- `conformance-tags.json` for RFC 006.
Generated output MUST be reproducible from the lockfile and profile.
5. Schema Resolution Pipeline
5.1 Frontend
The frontend MUST parse YANG 1.1 and preserve:
- Module and submodule identity.
- Revision.
- Namespace and prefix.
- Imports and includes.
- Extension statements.
- Source locations for diagnostics.
The implementation MAY use libyang2 through a safe wrapper or a native Rust
parser. In either case, the SDK MUST include differential tests against at least
one reference YANG implementation for supported constructs.
5.2 Middle-End
The middle-end MUST produce a flattened schema IR by resolving:
- `typedef`
- `grouping` and `uses`
- `augment`
- `deviation`
- `refine`
- `feature` and `if-feature`
- `identity` inheritance
- module prefixes and namespaces
The flattened model MUST retain enough source mapping to produce diagnostics that point back to the original YANG module and line.
5.3 Backend
The backend emits Rust and schema metadata. It MUST:
- Sort emitted items deterministically.
- Use stable generated filenames.
- Run generated Rust through rustfmt.
- Fail generation if generated code does not compile.
- Emit compile-time size checks.
6. Rust Type Mapping
6.1 Scalar Leaves
| YANG Type | Rust Representation | RFC 7951 JSON Notes |
|---|---|---|
| int8, int16, int32 | i8, i16, i32 | JSON number |
| uint8, uint16, uint32 | u8, u16, u32 | JSON number |
| int64, uint64 | i64, u64 | JSON string to avoid precision loss |
| decimal64 | generated fixed-scale newtype or rust_decimal::Decimal | JSON string |
| string | String or generated constrained newtype | JSON string |
| boolean | bool | JSON boolean |
| empty | generated unit marker | RFC 7951 [null] |
| enumeration | generated Rust enum | renamed variants preserve YANG names |
| bits | generated bitflags/newtype | space-separated string |
| binary | bytes::Bytes or Vec<u8> | base64 string |
| identityref | generated enum or IdentityRef newtype | namespace-qualified string when needed |
| instance-identifier | YangInstanceIdentifier | namespace-aware path string |
| leafref | generated newtype over target type | encoded like target leaf |
| union | generated ordered enum | parse order follows YANG union member order |
Generated constrained newtypes MUST enforce range, length, and pattern constraints during deserialization and validation.
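As an illustration of the rule above, the following is a minimal sketch of a constrained newtype, assuming a hypothetical uint16 leaf with `range "576..9216"`. The names `Mtu` and `RangeError` are invented for this example and are not generator output.

```rust
use std::convert::TryFrom;

/// Hypothetical newtype for a `uint16` leaf with `range "576..9216"`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Mtu(u16);

/// Hypothetical error carrying the rejected value.
#[derive(Debug, PartialEq, Eq)]
pub struct RangeError {
    pub value: u16,
}

impl Mtu {
    pub const MIN: u16 = 576;
    pub const MAX: u16 = 9216;

    /// Returns the validated inner value.
    pub fn get(self) -> u16 {
        self.0
    }
}

impl TryFrom<u16> for Mtu {
    type Error = RangeError;

    // The inner field is private, so this is the only constructor outside
    // the defining module. Deserialization and patch application therefore
    // cannot skip the range check.
    fn try_from(value: u16) -> Result<Self, Self::Error> {
        if (Self::MIN..=Self::MAX).contains(&value) {
            Ok(Mtu(value))
        } else {
            Err(RangeError { value })
        }
    }
}
```

Keeping the field private and routing construction through `TryFrom` means a value of type `Mtu` is always in range once it exists.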
6.2 Containers
YANG containers map to Rust structs. The generator must distinguish:
- Presence containers.
- Non-presence containers.
- Optional generated fields.
- Mandatory generated fields.
Large or optional containers SHOULD be boxed. The generator MUST box a field if embedding it would make the parent exceed the configured stack budget.
Default stack budget:
max_size_of_root = 4096 bytes
max_size_of_any_struct = 1024 bytes
Budgets are profile-configurable. Generated code MUST include compile-time assertions for these limits.
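One way to express such assertions is a `const` item that evaluates `assert!` at compile time. The sketch below assumes hypothetical types `QosProfile` and `UpfConfig`; only the two budget values come from the defaults above.

```rust
use std::mem::size_of;

/// Default budgets from this section, in bytes.
pub const MAX_SIZE_OF_ROOT: usize = 4096;
pub const MAX_SIZE_OF_ANY_STRUCT: usize = 1024;

/// Hypothetical nested container that is large but within the struct budget.
pub struct QosProfile {
    pub table: [u8; 1000],
}

/// Hypothetical root type. Embedding both profiles inline would cost about
/// 2000 bytes; boxing them leaves only two pointers in the parent.
pub struct UpfConfig {
    pub mtu: u16,
    pub uplink_qos: Box<QosProfile>,
    pub downlink_qos: Box<QosProfile>,
}

// Evaluated during compilation: the build fails if a budget is exceeded.
const _: () = assert!(size_of::<QosProfile>() <= MAX_SIZE_OF_ANY_STRUCT);
const _: () = assert!(size_of::<UpfConfig>() <= MAX_SIZE_OF_ROOT);
```

Because the checks are `const` items, a budget regression surfaces as a compile error in the generated crate rather than as a runtime test failure.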
6.3 Lists
YANG list projection depends on key and ordering:
| YANG List Kind | Rust Representation |
|---|---|
| keyed, ordered-by system | BTreeMap<Key, Value> |
| keyed, ordered-by user | Vec<Value> plus generated key index |
| unkeyed config list | Vec<Value> with min/max validation |
| config false operational list | Vec<Value> or backend-specific iterator |
The key type MUST be a generated struct when there are multiple key leaves. Duplicate keys MUST be rejected during deserialization and patch application.
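A minimal sketch of the keyed, ordered-by system case with a multi-leaf key follows. The list shape (`sst` and `sd` key leaves) and all type names are assumptions made for the example.

```rust
use std::collections::btree_map::Entry;
use std::collections::BTreeMap;

/// Hypothetical composite key for a list keyed by `sst` and `sd`.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct SliceKey {
    pub sst: u8,
    pub sd: u32,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Slice {
    pub key: SliceKey,
    pub description: String,
}

/// Hypothetical error naming the key that appeared twice.
#[derive(Debug, PartialEq, Eq)]
pub struct DuplicateKey(pub SliceKey);

/// Builds the list, rejecting a duplicate key instead of silently
/// overwriting the earlier entry.
pub fn build_slice_list(
    entries: Vec<Slice>,
) -> Result<BTreeMap<SliceKey, Slice>, DuplicateKey> {
    let mut list = BTreeMap::new();
    for entry in entries {
        match list.entry(entry.key.clone()) {
            Entry::Vacant(slot) => {
                slot.insert(entry);
            }
            Entry::Occupied(slot) => return Err(DuplicateKey(slot.key().clone())),
        }
    }
    Ok(list)
}
```

The derived `Ord` on the key struct compares key leaves in declaration order, which gives the map a deterministic iteration order.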
6.4 Leaf-Lists
Leaf-lists map to Vec<T> plus generated validation for:
- min-elements
- max-elements
- uniqueness, when required by YANG semantics
- user ordering
- default values
Generated code SHOULD build a temporary set for uniqueness checks rather than performing O(n^2) comparisons.
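The set-based check can be sketched as follows. The function name `first_duplicate` is hypothetical; the technique is an ordinary `HashSet` membership test.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Returns the index of the first repeated value, or `None` when all values
/// are unique. Expected O(n) time using a temporary set of references.
pub fn first_duplicate<T: Eq + Hash>(values: &[T]) -> Option<usize> {
    let mut seen = HashSet::with_capacity(values.len());
    for (index, value) in values.iter().enumerate() {
        // `insert` returns false when the value was already present.
        if !seen.insert(value) {
            return Some(index);
        }
    }
    None
}
```

The set borrows the elements, so it is dropped when the function returns and adds no lasting memory to the validated model.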
6.5 Choices and Cases
choice maps to a generated enum. The generator MUST preserve:
- default case
- mandatory choice behavior
- when conditions on cases
- removal of sibling case data when a different case is selected
Patch application MUST enforce case exclusivity.
7. Presence and Defaults
YANG requires distinguishing absent, defaulted, and explicitly set values. The
generator MUST NOT collapse these states into plain Option<T> when protocol
semantics require the distinction.
Generated fields SHOULD use a profile-selected representation such as:
pub enum LeafPresence<T> {
    Absent,
    Defaulted(T),
    Explicit(T),
}
For ergonomic NF logic, generated structs MAY expose helper accessors:
// Signatures only; the bodies are emitted by the generator.
impl UpfInterface {
    pub fn mtu(&self) -> u16;
    pub fn mtu_presence(&self) -> LeafPresence<&u16>;
}
RFC 7951 serialization MUST follow the selected output mode:
- ExplicitOnly: omit defaults unless explicitly set.
- WithDefaults: include effective defaults.
- Operational: include state and effective values.
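A minimal sketch of how a serializer might combine leaf presence with the output mode is shown below. The `OutputMode` enum and `value_to_emit` function are hypothetical names, and the treatment of `Operational` for a config leaf (same as `WithDefaults`) is an assumption of this example.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LeafPresence<T> {
    Absent,
    Defaulted(T),
    Explicit(T),
}

/// Hypothetical enum for the three output modes listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OutputMode {
    ExplicitOnly,
    WithDefaults,
    Operational,
}

/// Returns the value to serialize for a config leaf, or `None` to omit it.
/// Assumption: `Operational` emits the effective value like `WithDefaults`.
pub fn value_to_emit<T: Copy>(leaf: LeafPresence<T>, mode: OutputMode) -> Option<T> {
    match (leaf, mode) {
        (LeafPresence::Absent, _) => None,
        (LeafPresence::Explicit(v), _) => Some(v),
        (LeafPresence::Defaulted(_), OutputMode::ExplicitOnly) => None,
        (LeafPresence::Defaulted(v), OutputMode::WithDefaults | OutputMode::Operational) => {
            Some(v)
        }
    }
}
```

A plain `Option<T>` could not drive this decision, because it cannot tell a defaulted value from an explicitly set one.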
8. RFC 7951 Encoding Requirements
The serializer/deserializer MUST handle:
- Namespace-qualified member names where required.
- 64-bit integers as strings.
- decimal64 as strings.
- empty as [null].
- Base64 for binary.
- Identity names with module prefixes when the identity is not in the parent namespace.
- Instance identifiers with namespace-aware path segments.
- Duplicate JSON object member rejection.
- Unknown field handling according to protocol profile.
Round-trip tests MUST cover all scalar mappings.
9. Constraint IR and Validation
9.1 Constraint IR
The generator MUST lower must, when, range, length, pattern, and other
constraints into a typed IR:
pub enum ConstraintExpr {
    Path(PathExpr),
    Literal(Literal),
    Function(FunctionCall),
    Compare {
        op: CompareOp,
        left: Box<ConstraintExpr>,
        right: Box<ConstraintExpr>,
    },
    Boolean {
        op: BooleanOp,
        terms: Vec<ConstraintExpr>,
    },
}
Direct string-to-Rust closure generation is forbidden because it is difficult to audit, hard to fuzz, and prone to semantic drift.
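To show why a typed IR is easier to audit than generated closures, here is a small evaluator over a deliberately simplified stand-in for the IR. Paths are plain strings, literals are integers, function calls are omitted, and every name in the block is hypothetical. The sketch recurses over the expression tree for brevity; an evaluator that follows Section 11.2 would bound expression depth or use an explicit stack.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for the constraint IR (no function calls).
pub enum Expr {
    Path(String),
    Literal(i64),
    Compare { op: CompareOp, left: Box<Expr>, right: Box<Expr> },
    Boolean { op: BooleanOp, terms: Vec<Expr> },
}

pub enum CompareOp { Eq, Lt, Le }
pub enum BooleanOp { And, Or }

#[derive(Debug, PartialEq, Eq)]
pub enum Value { Int(i64), Bool(bool) }

#[derive(Debug, PartialEq, Eq)]
pub enum EvalError { MissingPath(String), TypeMismatch }

/// Evaluates an expression against a flat map of path to integer value.
pub fn eval(expr: &Expr, data: &BTreeMap<String, i64>) -> Result<Value, EvalError> {
    match expr {
        Expr::Path(p) => data
            .get(p)
            .copied()
            .map(Value::Int)
            .ok_or_else(|| EvalError::MissingPath(p.clone())),
        Expr::Literal(n) => Ok(Value::Int(*n)),
        Expr::Compare { op, left, right } => match (eval(left, data)?, eval(right, data)?) {
            (Value::Int(l), Value::Int(r)) => Ok(Value::Bool(match op {
                CompareOp::Eq => l == r,
                CompareOp::Lt => l < r,
                CompareOp::Le => l <= r,
            })),
            _ => Err(EvalError::TypeMismatch),
        },
        Expr::Boolean { op, terms } => {
            // Identity element: true for And, false for Or.
            let mut acc = matches!(op, BooleanOp::And);
            for term in terms {
                match eval(term, data)? {
                    Value::Bool(b) => match op {
                        BooleanOp::And => acc = acc && b,
                        BooleanOp::Or => acc = acc || b,
                    },
                    Value::Int(_) => return Err(EvalError::TypeMismatch),
                }
            }
            Ok(Value::Bool(acc))
        }
    }
}
```

Because the expression is data, the same tree can be printed for review, fuzzed, and compared against a reference engine without executing generated code.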
9.2 Supported XPath Profile
The initial SDK profile MUST support the XPath subset required by OpenPacketCore YANG models and selected IETF/3GPP dependencies. Unsupported expressions MUST fail generation with a clear diagnostic, not become runtime warnings.
The supported function list must be versioned. Each function implementation MUST have:
- Unit tests.
- Source-location diagnostics.
- Differential tests against the reference YANG engine.
9.3 Validation Engine
Generated validation MUST be split:
- validate_types
- validate_cardinality
- validate_choices
- validate_when
- validate_must
- validate_leafrefs
- validate_unique
- validate_semantics hook for NF-owned logic
Validators MUST return structured errors:
pub struct ValidationError {
    pub path: YangPath,
    pub code: ValidationCode,
    pub message: String,
    pub source: Option<YangSourceLocation>,
}
Messages MUST be safe for northbound clients and MUST NOT expose secrets.
10. Leafref and Indexing
The initial draft required a depth-first search for each leafref. That is not
acceptable for large configs.
The generator MUST create validation indices for referenced lists and leaves:
pub struct ValidationIndices<'a> {
    pub interfaces_by_name: BTreeMap<&'a str, &'a Interface>,
    pub slices_by_s_nssai: BTreeMap<SNssaiKeyRef<'a>, &'a Slice>,
}
Validation flow:
- Build indices in deterministic order.
- Reject duplicate keys.
- Validate all leafref constraints using the indices.
- Drop indices before publication.
Index building MUST be iterative and bounded by the configured validation memory budget.
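The validation flow above can be sketched with a single borrowed index. The `Route` type, its `out_interface` leafref, and the error enum are hypothetical; the memory budget check is omitted from this example.

```rust
use std::collections::BTreeMap;

pub struct Interface {
    pub name: String,
}

/// Hypothetical list entry whose `out_interface` is a leafref to an
/// interface name.
pub struct Route {
    pub prefix: String,
    pub out_interface: String,
}

#[derive(Debug, PartialEq, Eq)]
pub enum IndexError {
    DuplicateKey(String),
    DanglingLeafref { route: String, target: String },
}

pub fn validate_leafrefs(interfaces: &[Interface], routes: &[Route]) -> Result<(), IndexError> {
    // Steps 1 and 2: build the index in input order, rejecting duplicates.
    let mut by_name: BTreeMap<&str, &Interface> = BTreeMap::new();
    for iface in interfaces {
        if by_name.insert(iface.name.as_str(), iface).is_some() {
            return Err(IndexError::DuplicateKey(iface.name.clone()));
        }
    }
    // Step 3: each leafref is one O(log n) lookup, not a tree search.
    for route in routes {
        if !by_name.contains_key(route.out_interface.as_str()) {
            return Err(IndexError::DanglingLeafref {
                route: route.prefix.clone(),
                target: route.out_interface.clone(),
            });
        }
    }
    // Step 4: `by_name` only borrows the config and is dropped on return.
    Ok(())
}
```

With n interfaces and m routes the whole pass costs O((n + m) log n), which is the shape the performance gate in Section 17.4 asks for.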
11. Memory Safety and Stack Discipline
Generated code MUST be safe Rust by default.
11.1 Stack Budget
The generator MUST calculate size_of::<T>() for generated root and nested
types through compile-time tests. Any type exceeding budget must be boxed,
interned, or represented through a collection.
11.2 Traversal
Generated validation and serialization MUST avoid unbounded recursive traversal. Implementations SHOULD use explicit stacks:
let mut work = Vec::with_capacity(initial_capacity);
work.push(NodeRef::Root(root));
while let Some(node) = work.pop() {
    // validate node and push children
}
The SDK MUST define a maximum schema depth and maximum instance depth. Exceeding either MUST fail parsing or validation with a structured error.
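A self-contained version of the explicit-stack pattern with an instance depth limit might look like the sketch below. The `Node` type, the `DepthExceeded` error, and the convention that the root is at depth 1 are assumptions of this example.

```rust
/// Hypothetical instance tree node.
pub struct Node {
    pub name: &'static str,
    pub children: Vec<Node>,
}

/// Hypothetical structured error for the depth limit.
#[derive(Debug, PartialEq, Eq)]
pub struct DepthExceeded {
    pub limit: usize,
}

/// Visits every node without recursion and returns how many were visited.
/// The root counts as depth 1.
pub fn count_nodes(root: &Node, max_depth: usize) -> Result<usize, DepthExceeded> {
    let mut work: Vec<(&Node, usize)> = vec![(root, 1)];
    let mut visited = 0;
    while let Some((node, depth)) = work.pop() {
        if depth > max_depth {
            return Err(DepthExceeded { limit: max_depth });
        }
        visited += 1;
        // Children carry their depth with them, so no call stack is needed.
        for child in &node.children {
            work.push((child, depth + 1));
        }
    }
    Ok(visited)
}
```

Stack usage stays constant regardless of tree depth; the heap-allocated work list grows instead, and a production traversal would also account for it against the validation memory budget.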
11.3 Drop Behavior
Generated models MUST NOT create recursive self-referential types. If future extensions introduce recursive structures, the generator must provide iterative drop or arena ownership to avoid stack overflow.
11.4 Large Configs
The generator MUST support configs with:
- 100,000 list entries in a single keyed list.
- 1,000,000 scalar leaves across the tree in stress tests.
- Deep but valid schemas up to the configured maximum depth.
Stress tests must verify no stack overflow and bounded peak memory.
12. Patch Application
Generated patch applicators MUST support:
- gNMI Update
- gNMI Replace
- gNMI Delete
- NETCONF merge
- NETCONF replace
- NETCONF create
- NETCONF delete
- NETCONF remove
Patch behavior MUST be generated from schema metadata, not hand-written per NF.
Patch application MUST:
- Validate path existence and key predicates.
- Preserve YANG default semantics.
- Enforce list key immutability.
- Enforce choice/case exclusivity.
- Track changed paths for NACM and audit.
- Avoid mutating running; only candidate may be modified.
13. Secret and Redaction Metadata
The generator MUST mark fields as secret when indicated by:
- opc:secret
- tailf:display-hint "password"
- configured extension registry entries
- explicit projection profile overrides
Generated secret fields SHOULD use a secret-aware type:
pub struct SecretLeaf<T> {
    inner: secrecy::SecretBox<T>,
}
Generated Debug, audit, telemetry, and error rendering MUST redact these
values. Serialization for persistence may include encrypted secret values only
through the RFC 001/RFC 003 envelope.
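The Debug redaction requirement can be illustrated with a std-only stand-in. This sketch replaces `secrecy::SecretBox<T>` with a plain field so it runs without external crates, which also means it does not zeroize memory; `RadiusServer` and `expose` are hypothetical names.

```rust
use std::fmt;

/// Std-only stand-in for the secret-aware leaf type. Unlike the real
/// wrapper, it does not zeroize the value on drop.
pub struct SecretLeaf<T> {
    inner: T,
}

impl<T> SecretLeaf<T> {
    pub fn new(value: T) -> Self {
        Self { inner: value }
    }

    /// Explicit access point, easy to find in code review.
    pub fn expose(&self) -> &T {
        &self.inner
    }
}

// Manual impl: the inner value is never formatted, whatever `T` is.
impl<T> fmt::Debug for SecretLeaf<T> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("SecretLeaf(<redacted>)")
    }
}

/// Hypothetical generated struct; deriving Debug is safe because the
/// secret field redacts itself.
#[derive(Debug)]
pub struct RadiusServer {
    pub host: String,
    pub shared_secret: SecretLeaf<String>,
}
```

Putting the redaction in the leaf type means every containing struct can keep `#[derive(Debug)]` without leaking the value.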
14. Operator Schema Boundary
The generator MUST expose enough Rust schema metadata for operator policy code to validate compatibility, migrations, admission, and config-apply decisions without hand-maintained side schemas.
Generated schema metadata MUST include:
- canonical YANG paths and module identity;
- config/state classification;
- list-key ordering;
- NACM action mapping;
- redaction data classes;
- schema digest data for compatibility checks.
The SDK does not generate Go structs or Kubernetes CRD fragments from
opc-yanggen. Product operators own their Kubernetes API shape and may use the
Rust operator-lifecycle, operator-controller, and operator-lifecycle-cli
contracts to bridge those APIs into the SDK policy surface. Large NF configs are
therefore split, referenced, or summarized by the product operator rather than by
the YANG generator.
15. Schema Migration
Generated code MUST include schema digest metadata. On startup, RFC 001 uses the digest to determine whether persisted config can be loaded directly or requires migration.
Migration support MUST provide:
pub trait ConfigMigration {
    fn from_schema(&self) -> SchemaDigest;
    fn to_schema(&self) -> SchemaDigest;
    fn migrate(&self, input: serde_json::Value) -> Result<serde_json::Value, MigrationError>;
}
Migrations MUST be deterministic and tested with golden inputs.
16. Implementation Contracts
To keep the generated system modular and reviewable, every generated module MUST follow this layout:
generated/<module_name>/
mod.rs
types.rs
paths.rs
serde.rs
validate.rs
patch.rs
metadata.rs
redaction.rs
tests/
roundtrip.rs
validation.rs
patch.rs
Rules:
- Hand-written code MUST NOT edit generated files.
- Generated files MUST contain a header with generator version and schema digest.
- Public generated APIs MUST be documented with YANG path and source module.
- Each generated validation function MUST be small enough for review and have a stable name derived from the YANG path.
- Conformance tags for RFC 006 MUST be emitted near the generated item that implements the requirement.
17. Testing Requirements
17.1 Generator Tests
- Deterministic output for identical inputs.
- Stable schema digest.
- Unsupported YANG feature fails generation.
- Differential validation against reference YANG engine.
- Source-location diagnostics.
17.2 Generated Code Tests
- RFC 7951 round trips for every scalar type.
- Presence/default serialization modes.
- Leafref validation with large lists.
- must and when validation.
- Choice/case exclusivity.
- Patch operation matrix.
- Secret redaction.
- Stack size compile-time checks.
17.3 Fuzzing
Fuzz targets MUST include:
- RFC 7951 JSON deserialization.
- Path parsing.
- Patch application.
- Constraint evaluator.
Fuzz failures MUST be minimized and committed as regression tests.
17.4 Performance Gates
Minimum gates for a generated carrier profile:
- Deserialize 10 MiB RFC 7951 config without stack overflow.
- Validate 100,000 keyed list entries with leafrefs in O(n log n) or better.
- Patch a single leaf in a large config without full serialization.
- Generated root size_of below configured budget.
- No unbounded recursion in validation or serialization paths.
18. Extension Registry
The SDK MUST maintain a versioned extension registry:
[[extension]]
name = "opc:secret"
behavior = "secret"
[[extension]]
name = "tailf:display-hint"
value = "password"
behavior = "secret"
Unknown extensions default to ignore-with-warning only if the generation
profile allows it. Carrier profiles SHOULD fail generation for unknown
extensions that affect config, security, or validation behavior.
19. Acceptance Criteria
This RFC is implemented when:
- Generated Rust preserves YANG presence, defaults, ordering, keys, and namespace semantics.
- RFC 7951 round trips pass for all supported types.
- Large config validation is bounded and does not use unbounded recursive DFS.
- Unsupported XPath/YANG constructs fail generation with diagnostics.
- Generated patch applicators support gNMI and NETCONF operation semantics.
- Secret metadata integrates with audit redaction and persistence.
- Operator policy helpers can consume generated schema metadata without a hand-maintained side schema or generated Go/Kubernetes projection.
- Output is deterministic and suitable for parallel implementation.
OPC-SDK-RFC-003: Security Substrate
Status: Draft for Implementation
Version: 2.0.0
Date: 2026-05-19
Audience: SDK implementers, security engineers, operator authors, NF teams
1. Abstract
This RFC defines the OpenPacketCore security substrate: workload identity, transport security, authorization, key management, secret handling, audit integrity, and runtime security administration. It integrates SPIFFE/SPIRE, gNSI, NACM, AEAD envelope encryption, and tenant-aware policy into a coherent boundary suitable for carrier-grade cloud-native network functions.
The initial draft correctly selected SPIFFE and gNSI, but it did not define a strong enough multi-tenant carrier boundary, key lifecycle, replay controls, or break-glass governance. This version makes those contracts explicit.
2. Security Objectives
2.1 Security
- Authenticate every workload and operator action with cryptographic identity.
- Authorize every operation by tenant, role, transport, method, and YANG path.
- Encrypt all sensitive persistent configuration and session state.
- Keep secret material out of logs, telemetry, panic messages, and ordinary gNMI reads.
- Provide tamper-evident audit and durable security event trails.
- Fail closed on invalid identity, unknown issuer, expired certificate, failed authorization, key lookup failure, or audit integrity failure.
2.2 Performance
- TLS rotation must not drop established data-plane sessions unless policy requires it.
- Authorization decisions must be cacheable and bounded.
- Crypto operations must use the RFC 001 crypto pool or equivalent offload so they do not starve async or data-plane workers.
- Security checks on high-rate paths must avoid heap allocation in the common case.
2.3 Maintainability
- Identity parsing, authorization, key lookup, and redaction must be separate modules with narrow APIs.
- Policy documents must be versioned, validated, and testable offline.
- Security defaults must live in one profile file, not scattered constants.
- The same security metadata must drive NACM, audit, and evidence generation.
2.4 Functionality
- Support SPIFFE X.509-SVID identity.
- Support trust domain federation.
- Support gNSI certificate and authorization services.
- Support break-glass with strict governance.
- Support tenant-aware policy.
- Support key rotation and historical decryption.
3. Threat Model
The SDK assumes attackers may:
- Control an unprivileged pod in the same Kubernetes cluster.
- Control another tenant namespace.
- Replay old management-plane requests.
- Attempt confused-deputy attacks through the operator.
- Read persistent volumes or backend snapshots offline.
- Corrupt local database files.
- Delay, drop, or reorder network packets.
- Trigger malformed gNMI, NETCONF, gNSI, or protocol inputs.
- Observe timing, status codes, and logs.
- Compromise a single NF replica.
The SDK does not claim to survive:
- Total compromise of the root trust domain signing keys.
- Compromise of the active KMS/HSM root keys without detection.
- Kernel-level compromise of the node running the NF.
- Malicious code compiled into the NF binary.
These residual risks MUST be documented in RFC 006 known gaps.
4. Identity Model
4.1 SPIFFE Workload Identity
Every NF replica MUST obtain an X.509-SVID from the local SPIRE Workload API.
Default SPIFFE ID format:
spiffe://<trust-domain>/tenant/<tenant-id>/ns/<namespace>/sa/<service-account>/nf/<nf-kind>/instance/<instance-id>
The original namespace/service-account pattern is insufficient for
multi-tenant carrier isolation because namespaces are often operational
boundaries, not contractual tenant boundaries. tenant-id MUST be explicit
unless the deployment uses one trust domain per tenant.
4.2 Identity Claims
The SDK MUST parse the SVID into:
pub struct WorkloadIdentity {
    pub trust_domain: TrustDomain,
    pub tenant: TenantId,
    pub namespace: Namespace,
    pub service_account: ServiceAccount,
    pub nf_kind: NetworkFunctionKind,
    pub instance: InstanceId,
    pub spiffe_id: SpiffeId,
    pub expires_at: Timestamp,
}
Identity parsing MUST reject:
- Unknown path formats.
- Missing tenant.
- Invalid NF kind.
- Expired SVID.
- SVIDs with trust domains not present in the active bundle set.
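A minimal sketch of the path-format part of these checks follows, assuming the default SPIFFE ID layout from Section 4.1. It uses plain strings instead of the typed fields above, and it does not check NF kind validity, SVID expiry, or bundle membership, which need data outside the URI. All names are hypothetical.

```rust
#[derive(Debug, PartialEq, Eq)]
pub struct ParsedSpiffeId {
    pub trust_domain: String,
    pub tenant: String,
    pub namespace: String,
    pub service_account: String,
    pub nf_kind: String,
    pub instance: String,
}

#[derive(Debug, PartialEq, Eq)]
pub enum IdentityError {
    NotSpiffe,
    UnknownPathFormat,
    EmptySegment,
}

pub fn parse_spiffe_id(uri: &str) -> Result<ParsedSpiffeId, IdentityError> {
    let rest = uri.strip_prefix("spiffe://").ok_or(IdentityError::NotSpiffe)?;
    let parts: Vec<&str> = rest.split('/').collect();
    // Trust domain followed by exactly five label/value pairs, in order.
    match parts.as_slice() {
        [td, "tenant", tenant, "ns", ns, "sa", sa, "nf", nf, "instance", inst] => {
            let fields = [td, tenant, ns, sa, nf, inst];
            if fields.iter().any(|f| f.is_empty()) {
                return Err(IdentityError::EmptySegment);
            }
            Ok(ParsedSpiffeId {
                trust_domain: td.to_string(),
                tenant: tenant.to_string(),
                namespace: ns.to_string(),
                service_account: sa.to_string(),
                nf_kind: nf.to_string(),
                instance: inst.to_string(),
            })
        }
        // Anything else, including a path without a tenant, is rejected.
        _ => Err(IdentityError::UnknownPathFormat),
    }
}
```

Matching the whole slice against one fixed pattern keeps the parser strict: extra segments, reordered labels, and a missing tenant all fall through to the reject arm.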
4.3 Workload Attestation
SPIRE registration entries MUST bind identity to Kubernetes selectors such as:
- namespace
- service account
- pod label set
- node attestation policy
- image digest, when available through the attestor
The SDK MUST document the required SPIRE registration pattern. Relying only on service account name is not sufficient for production carrier profiles.
4.4 Trust Domain Federation
Federation MUST be explicit. The SDK MUST load and validate trust bundles for:
- local workload trust domain
- management/operator trust domain
- optional peer-region trust domains
Federation policy MUST define which remote trust domains may perform which actions. Accepting a federated bundle MUST NOT automatically grant management privileges.
Example:
[[federation]]
trust_domain = "operator.openpacketcore.example"
allowed_tenants = ["tenant-a"]
allowed_roles = ["config-admin", "security-admin"]
allowed_transports = ["gnmi", "gnsi"]
4.5 Rotation
The SDK MUST watch SVID and bundle updates and hot-reload TLS acceptors and clients without process restart.
Rotation requirements:
- New connections use the latest identity immediately after reload.
- Existing connections are reauthenticated on stream boundaries or at a configurable maximum connection age.
- Expired identities are not accepted.
- Bundle removal revokes future handshakes.
- Rotation failures emit critical telemetry.
5. Transport Security
5.1 gRPC Transports
gNMI, gNSI, and internal gRPC APIs MUST use mTLS with SPIFFE identity verification.
Requirements:
- TLS 1.3 required by default.
- TLS 1.2 disabled by default and only allowed by explicit compatibility profile.
- Peer certificate SAN MUST contain a valid SPIFFE URI.
- Common Name MUST NOT be used for authorization.
- ALPN and service/method authorization MUST be enforced.
- Certificates MUST be validated against active SPIFFE bundles, not system web PKI.
5.2 Cipher Suites
Default modern profile:
- TLS_AES_256_GCM_SHA384
- TLS_CHACHA20_POLY1305_SHA256
FIPS profile:
- MUST use a FIPS 140-3 validated module and only approved algorithms.
- MUST document any difference from the modern profile.
- MUST disable algorithms not available through the validated boundary.
The SDK MUST expose the selected security profile in metrics and evidence.
5.3 NETCONF over SSH
If NETCONF/SSH is enabled:
- SSH host keys MUST be generated or provisioned through the security substrate.
- Client identity MUST map to a TrustedPrincipal.
- Password authentication MUST be disabled by default.
- SSH certificate authorities SHOULD be used when SPIFFE-native SSH identity is unavailable.
- SSH authorization MUST flow through the same NACM engine as gNMI.
6. Authorization
6.1 Principal Model
pub struct TrustedPrincipal {
    pub identity: WorkloadIdentity,
    pub tenant: TenantId,
    pub roles: Vec<Role>,
    pub groups: Vec<Group>,
    pub auth_strength: AuthStrength,
}
Roles and groups MUST come from signed policy or trusted identity attributes. They MUST NOT be accepted from unsigned client metadata.
6.2 Policy Layers
Authorization is evaluated in this order:
- Transport and peer authentication.
- Trust domain allowlist.
- Tenant boundary check.
- gRPC service/method authorization.
- NACM/YANG path authorization.
- Operation-specific guardrails, such as break-glass or key export denial.
Any deny at any layer is final unless a governed break-glass flow applies.
6.3 NACM Requirements
NACM MUST authorize:
- read
- create
- update
- replace
- delete
- exec
- subscribe
- security-admin
The engine MUST evaluate all changed paths after patch expansion. It is not enough to authorize the request's root path.
Authorization decisions SHOULD be cached by:
- principal digest
- tenant
- policy version
- normalized path
- action
Cache entries MUST be invalidated on policy updates and SVID rotation.
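One possible shape for such a cache is sketched below. The key fields mirror the list above; the `u64` principal digest, the string-typed path and action, and the clear-when-full eviction are simplifying assumptions, not the SDK's design.

```rust
use std::collections::HashMap;

/// Cache key built from the fields listed above.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct DecisionKey {
    pub principal_digest: u64,
    pub tenant: String,
    pub policy_version: u64,
    pub path: String,
    pub action: String,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision {
    Permit,
    Deny,
}

/// Bounded decision cache. Eviction is deliberately crude: when full,
/// everything is dropped.
pub struct DecisionCache {
    capacity: usize,
    entries: HashMap<DecisionKey, Decision>,
}

impl DecisionCache {
    pub fn new(capacity: usize) -> Self {
        Self { capacity, entries: HashMap::new() }
    }

    pub fn get(&self, key: &DecisionKey) -> Option<Decision> {
        self.entries.get(key).copied()
    }

    pub fn put(&mut self, key: DecisionKey, decision: Decision) {
        if self.entries.len() >= self.capacity && !self.entries.contains_key(&key) {
            self.entries.clear();
        }
        self.entries.insert(key, decision);
    }

    /// Call on policy update and on SVID rotation.
    pub fn invalidate_all(&mut self) {
        self.entries.clear();
    }
}
```

Including the policy version in the key means a lookup under a newer policy can never hit an entry computed under an older one, even before `invalidate_all` runs; the explicit invalidation then reclaims the memory.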
6.4 Multi-Tenant Boundary
Cross-tenant access is denied by default. A principal from tenant A MUST NOT
read or mutate tenant B config, session state, keys, or audit records unless a
federated policy explicitly grants a scoped operation.
The tenant boundary MUST be enforced in:
- identity parsing
- authorization
- persistence key namespace
- session key namespace
- audit query filters
- telemetry labels, with cardinality controls
- operator reconciliation
7. gNSI Services
The SDK MUST provide server-side support for:
| Service | Purpose | SDK Component |
|---|---|---|
| gnsi.certz.v1 | Certificate and trust material distribution | opc-gnsi-server |
| gnsi.pathz.v1 | Path authorization policy | opc-nacm |
| gnsi.authz.v1 | gRPC service/method authorization | opc-nacm |
gNSI endpoints are security-critical. Access MUST require security-admin or a
more specific role. gNSI mutations MUST be audited and persisted through the
shadow-security store from RFC 001.
7.1 Shadow Security Store
Security material pushed through gNSI is stored in shadow-security.
Rules:
- Not visible through ordinary gNMI Get.
- Exportable only through explicitly authorized security APIs.
- Encrypted at rest with a distinct key purpose from normal config.
- Included in backup only when backup policy allows secret material.
- Redacted in audit and telemetry.
7.2 Policy Staging
Authorization policy updates MUST support validate-only and staged apply. A policy that would lock out all security administrators MUST be rejected unless a break-glass recovery policy exists.
8. Break-Glass
Break-glass is dangerous and MUST be treated as an exceptional workflow, not a convenience override.
Requirements:
- Disabled by default in production profiles unless explicitly enabled.
- Requires a high-assurance principal.
- Requires reason, ticket/reference, requested scope, and duration.
- Maximum default duration: 15 minutes.
- Requires dual authorization or an externally signed emergency token in carrier profiles.
- Cannot bypass cryptographic verification, tenant boundary, or audit logging.
- Cannot export raw key material unless a separate key recovery policy allows it.
- Emits critical audit events at start, use, and expiry.
- Emits high-priority telemetry.
Break-glass must grant the narrowest possible action set and path set.
9. Key Management
9.1 Key Hierarchy
The SDK uses purpose-separated keys:
| Purpose | Example Use |
|---|---|
| config | RFC 001 encrypted config blobs |
| shadow-security | gNSI security material |
| session | RFC 004 session store data |
| audit | HMAC hash chains |
| backup | encrypted export bundles |
Keys MUST be separated by KMS key ID or HKDF info labels. Reusing one raw key
for multiple purposes is forbidden.
9.2 Key Sources
Production profiles MUST obtain root or wrapping keys from one of:
- KMS plugin.
- HSM plugin.
- Kubernetes Secret encrypted by a cluster KMS provider, only for lower assurance profiles.
- SPIRE/SVID-authenticated key service.
Environment variables are forbidden for production key material.
9.3 Key Lookup API
#[async_trait::async_trait]
pub trait KeyProvider: Send + Sync {
    async fn get_active_key(&self, purpose: KeyPurpose, tenant: &TenantId) -> Result<KeyHandle, KeyError>;
    async fn get_key_by_id(&self, key_id: &KeyId) -> Result<KeyHandle, KeyError>;
    async fn rotate_key(&self, purpose: KeyPurpose, tenant: &TenantId) -> Result<KeyId, KeyError>;
}
KeyHandle MUST avoid exposing raw bytes unless required by the crypto module.
If raw bytes are materialized, they MUST be zeroized after use where the crypto
backend permits.
9.4 Rotation
Key rotation MUST support:
- New writes using the active key.
- Old reads using key ID from the envelope.
- Optional background re-encryption.
- Retention windows.
- Emergency key revocation.
If a key is unavailable, the SDK MUST fail closed for writes and for reads that require the missing key.
10. AEAD Envelope Encryption
10.1 Default Profile
Default persistent encryption uses AES-256-GCM-SIV for misuse resistance.
Nonce reuse is still a bug and MUST be monitored.
10.2 FIPS Profile
Some FIPS validated modules may not expose AES-GCM-SIV. A FIPS profile MAY use
AES-256-GCM only when:
- Nonces are generated by a validated DRBG or deterministic counter scheme.
- Nonce uniqueness is guaranteed per key.
- The uniqueness state is crash-safe.
- Tests prove duplicate nonce detection.
The active AEAD algorithm MUST be recorded in each envelope and in RFC 006 evidence.
10.3 Associated Data
AAD MUST bind ciphertext to:
- tenant
- purpose
- tx/session identifier
- schema digest or state type
- key ID
- version
- principal, for config commits
AAD mismatch MUST produce a generic integrity error without exposing which field failed.
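A minimal sketch of an unambiguous AAD encoding is shown below. The field list follows this section; the struct name, field types, and the length-prefixed wire layout are assumptions of the example, not a format defined by this RFC.

```rust
/// Hypothetical AAD input covering the bindings listed above.
pub struct EnvelopeAad<'a> {
    pub tenant: &'a str,
    pub purpose: &'a str,
    pub tx_id: u64,
    pub schema_digest: &'a [u8],
    pub key_id: &'a str,
    pub version: u64,
    /// Present for config commits only.
    pub principal: Option<&'a str>,
}

// A 4-byte big-endian length prefix keeps field boundaries unambiguous, so
// ("ab", "c") and ("a", "bc") cannot encode to the same bytes.
fn put_field(out: &mut Vec<u8>, bytes: &[u8]) {
    out.extend_from_slice(&(bytes.len() as u32).to_be_bytes());
    out.extend_from_slice(bytes);
}

impl EnvelopeAad<'_> {
    /// Deterministic encoding: fixed field order, fixed integer width.
    pub fn encode(&self) -> Vec<u8> {
        let mut out = Vec::new();
        put_field(&mut out, self.tenant.as_bytes());
        put_field(&mut out, self.purpose.as_bytes());
        put_field(&mut out, &self.tx_id.to_be_bytes());
        put_field(&mut out, self.schema_digest);
        put_field(&mut out, self.key_id.as_bytes());
        put_field(&mut out, &self.version.to_be_bytes());
        // One tag byte distinguishes "no principal" from an empty principal.
        match self.principal {
            Some(p) => {
                out.push(1);
                put_field(&mut out, p.as_bytes());
            }
            None => out.push(0),
        }
        out
    }
}
```

The encoded bytes would be passed as associated data to the AEAD seal and open calls, so a record moved to another tenant, purpose, or version fails authentication as a whole.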
10.4 Replay and Rollback
Encryption alone does not prevent replay of an old valid blob. The management store MUST enforce monotonic config versions as specified in RFC 001. Session store backends MUST use generation numbers or lease fencing as specified in RFC 004.
11. Audit Security
11.1 Hash Chain
Audit records MUST include:
entry_hmac = HMAC(audit_key, tenant || sequence || canonical_entry || previous_hash)
The hash chain MUST be tenant-scoped and purpose-separated. Startup MUST verify the local audit chain unless the operator explicitly configures degraded recovery mode.
11.2 External Audit Sink
Carrier profiles SHOULD stream audit events to an external append-only system. Local SQLite audit is necessary for recovery and debugging but is not sufficient against host-level compromise.
11.3 Time
Audit timestamps MUST use UTC. The SDK SHOULD record both wall-clock timestamp and monotonic sequence number. Security decisions MUST NOT rely only on wall clock when monotonic ordering is required.
12. Redaction
The redaction subsystem consumes metadata generated by RFC 002.
Redaction MUST apply to:
- Debug
- structured logs
- audit records
- metrics labels
- error messages
- traces
- panic hooks where possible
- gNMI read responses after NACM filtering
Redaction MUST preserve enough information for debugging, such as value presence, length class, or stable digest when explicitly allowed by policy.
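As a sketch of redaction that keeps presence and a length class, the function below renders a secret string without revealing its content. The function name, the output format, and the class thresholds (16 and 64 bytes) are arbitrary choices for this example.

```rust
/// Renders a secret for logs or audit: reveals presence and a coarse
/// length class, never the value. Thresholds are illustrative.
pub fn redact(value: Option<&str>) -> String {
    match value {
        None => "<absent>".to_string(),
        Some(v) => {
            let class = match v.len() {
                0 => "empty",
                1..=16 => "short",
                17..=64 => "medium",
                _ => "long",
            };
            format!("<redacted len={}>", class)
        }
    }
}
```

A coarse class lets an operator tell "unset" from "set to something" and spot truncated values, while leaking far less than an exact length would.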
13. Observability
Required metrics:
- opc_security_authn_total{outcome,reason,transport}
- opc_security_authz_total{outcome,reason,action}
- opc_security_svid_expires_seconds
- opc_security_bundle_version
- opc_security_rotation_total{kind,outcome}
- opc_security_key_lookup_total{purpose,outcome}
- opc_security_breakglass_active
- opc_security_breakglass_total{outcome}
- opc_security_audit_chain_verify_total{outcome}
- opc_security_redactions_total{source}
Metrics MUST control label cardinality. Raw SPIFFE IDs SHOULD be exposed through logs, not high-cardinality metrics, unless explicitly enabled.
14. Module Ownership
| Module | Responsibility |
|---|---|
| opc-identity | SPIFFE ID parsing, SVID watch, trust bundle watch |
| opc-tls | TLS acceptor/client reload and peer extraction |
| opc-authz | Principal, roles, method policy, decision cache |
| opc-nacm | YANG path authorization and RFC 8341 semantics |
| opc-gnsi-server | gNSI service handlers and staged policy apply |
| opc-key | KeyProvider trait and KMS/HSM adapters |
| opc-crypto | AEAD envelopes and key derivation |
| opc-redaction | Secret metadata and safe rendering |
| opc-audit | HMAC chain, external sink adapter |
| opc-security-testkit | fake SPIRE, fake KMS, policy fixtures |
Agents must not mix transport identity parsing with NACM path logic. Each module should have deterministic test fixtures and no hidden global state.
15. Testing Requirements
15.1 Unit Tests
- SPIFFE ID parser accepts valid pattern and rejects malformed identities.
- Federation allowlist denies unknown trust domains.
- Authorization cache invalidates on policy version change.
- NACM denies missing rules.
- Redaction covers generated secret fields.
- AEAD envelope rejects wrong AAD, wrong key, corrupted tag, and wrong tenant.
- Break-glass scope and TTL enforcement.
15.2 Integration Tests
- SVID rotation without process restart.
- Trust bundle rotation revokes removed trust domain.
- gNSI policy staging and rollback.
- Management commit rejected after NACM policy update removes permission.
- Shadow-security store not visible through ordinary gNMI Get.
- Key rotation reads old commits and writes new commits.
- External audit sink outage does not drop local audit.
15.3 Fault Injection
- SPIRE socket unavailable.
- Expired SVID.
- Corrupt trust bundle.
- KMS timeout.
- Missing historical key.
- Duplicate AEAD nonce detector trigger, when applicable.
- Audit HMAC mismatch.
- Break-glass token replay.
15.4 Performance Gates
- Authorization decision cache p99 under 50 microseconds for hot entries.
- TLS reload completes without blocking new accepts longer than 100 milliseconds on reference hardware.
- Key lookup cache hit p99 under 25 microseconds.
- Redaction of a 10 MiB config audit diff completes within configured commit budget.
16. Acceptance Criteria
This RFC is implemented when:
- Every management connection is authenticated with SPIFFE-aware mTLS or an explicitly configured SSH identity profile.
- Tenant identity is explicit and enforced across authz, persistence, audit, and telemetry.
- gNSI services can stage, validate, apply, audit, and roll back security policy.
- Config, shadow-security, session, and audit keys are purpose-separated and rotatable.
- AEAD envelopes bind ciphertext to tenant, purpose, version, and schema/state metadata.
- Break-glass is scoped, time-limited, audited, and disabled by default in production unless carrier policy enables it.
- Security failure modes fail closed and are covered by fault injection tests.
OPC-SDK-RFC-004: High-Performance Session Store
Status: Draft for Implementation
Version: 2.0.0
Date: 2026-05-19
Audience: SDK implementers, NF owners, data-plane engineers, reliability engineers
1. Abstract
This RFC defines opc-session-store, the SDK substrate for high-rate network
function state such as PDU sessions, PFCP associations, TEID mappings, QoS flow
state, handover coordination metadata, and data-plane derived counters that
need controlled persistence.
The initial draft correctly identified the need for partitioning, local-first operation, and distributed leases. It was not strict enough for 5G continuity: last-writer-wins based on synchronized clocks is not safe for authoritative session state. This version requires monotonic fencing tokens, compare-and-set updates, owner epochs, explicit handover state transitions, and a documented consistency model per data class.
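To make the difference from last-writer-wins concrete, here is a minimal in-memory sketch of a fenced compare-and-set. The store, record layout, and error names are hypothetical; a real backend would perform the check atomically on the server side.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Record {
    pub owner_epoch: u64,
    pub generation: u64,
    pub payload: Vec<u8>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum WriteError {
    /// The writer holds an epoch older than the one already recorded.
    StaleOwner { current_epoch: u64 },
    /// The writer did not read the latest generation before updating.
    GenerationMismatch { current: u64 },
}

#[derive(Default)]
pub struct MemStore {
    records: HashMap<String, Record>,
}

impl MemStore {
    /// Applies the write only if the writer's epoch is not older than the
    /// stored one and the expected generation matches. A missing record
    /// counts as epoch 0, generation 0. Returns the new generation.
    pub fn compare_and_set(
        &mut self,
        key: &str,
        writer_epoch: u64,
        expected_generation: u64,
        payload: Vec<u8>,
    ) -> Result<u64, WriteError> {
        let (cur_epoch, cur_gen) = self
            .records
            .get(key)
            .map(|r| (r.owner_epoch, r.generation))
            .unwrap_or((0, 0));
        if writer_epoch < cur_epoch {
            return Err(WriteError::StaleOwner { current_epoch: cur_epoch });
        }
        if expected_generation != cur_gen {
            return Err(WriteError::GenerationMismatch { current: cur_gen });
        }
        let next = cur_gen + 1;
        self.records.insert(
            key.to_string(),
            Record { owner_epoch: writer_epoch, generation: next, payload },
        );
        Ok(next)
    }
}
```

No wall clock appears anywhere in the decision: a former owner is rejected because its epoch is lower, regardless of how its clock compares with the new owner's.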
2. Scope
2.1 In Scope
- Per-session control-plane state needed by AMF, SMF, UPF, and related NFs.
- Data-plane lookup state that can be safely snapshotted or reconstructed.
- Lease and fencing mechanisms for single-owner session mutation.
- Local cache and distributed backend abstraction.
- Geo-redundant replication for disaster recovery and warm standby.
- Serialization, encryption, integrity, TTL, metrics, and fault injection.
2.2 Out of Scope
- Configuration management. See RFC 001.
- Packet parsing and protocol codecs. See RFC 005.
- Full 3GPP procedure implementation. This RFC provides storage primitives and state-machine support used by NF-specific procedure logic.
- Hard real-time packet forwarding in the remote store. Packet fast paths must use local data-plane structures.
3. Design Goals
3.1 Security
- Encrypt session state before it leaves process memory unless the backend is explicitly trusted by profile.
- Bind encrypted records to tenant, NF kind, session key, generation, and state type through AEAD AAD.
- Prevent stale owners from overwriting newer session state.
- Prevent cross-tenant key collision or data exposure.
- Redact SUPI/GPSI and other subscriber identifiers in logs by default.
3.2 Performance
- Keep packet forwarding off the remote store path.
- Support 100,000+ session updates/second per NF replica for local in-memory or batched backend profiles.
- Keep hot read p99 below 1 ms for local-cluster operations where the selected backend can meet it.
- Provide bounded allocation and zero-copy or low-copy decode for common session reads.
- Support batching, pipelining, and async replication without sacrificing fencing correctness.
3.3 Maintainability
- Separate storage API, lease API, serialization, encryption, and replication.
- Require backend capability declarations so NF code does not assume semantics a backend cannot provide.
- Use typed session records instead of arbitrary blobs at module boundaries.
- Provide a deterministic testkit for split-brain, failover, and handover races.
3.4 Functionality
- Support create, get, update, delete, compare-and-set, TTL refresh, lease, renew, release, snapshot, and replication.
- Support session handover prepare/activate/abort flows.
- Support backend implementations for in-memory, Redis, Aerospike, and optional strongly consistent stores.
- Support region-aware replication and recovery.
4. State Classes
The SDK distinguishes state by consistency need:
| Class | Examples | Consistency Requirement |
|---|---|---|
| `authoritative-session` | PDU session owner, AMF/SMF ownership, handover phase | Single writer with fencing |
| `dataplane-lookup` | TEID to session mapping, FAR/QER/PDR snapshots | Local atomic snapshot, rebuildable |
| `replicated-dr` | Warm standby copy of session records | Async, ordered by generation |
| `telemetry-derived` | Counters, rates, last seen timestamps | Mergeable or lossy |
| `ephemeral-procedure` | Temporary handover transaction state | TTL, fenced owner |
Only telemetry-derived state may use last-writer-wins based on timestamps.
Authoritative session state MUST NOT use wall-clock LWW.
5. Session Identity
Session keys MUST be tenant-scoped and type-scoped:
```rust
pub struct SessionKey {
    pub tenant: TenantId,
    pub nf_kind: NetworkFunctionKind,
    pub key_type: SessionKeyType,
    pub stable_id: bytes::Bytes,
}
```
Examples:
- SUPI-derived subscriber context key.
- PDU session ID plus SUPI hash.
- TEID mapping key.
- PFCP session SEID key.
- Handover transaction key.
Raw SUPI/GPSI MUST NOT be used directly as a backend key in production. The SDK SHOULD derive stable keys with a tenant-specific keyed hash.
6. Backend Capability Model
The initial get/set/delete trait is too weak. Backends MUST declare
capabilities:
```rust
pub struct BackendCapabilities {
    pub atomic_compare_and_set: bool,
    pub monotonic_fencing_token: bool,
    pub per_key_ttl: bool,
    pub server_side_lease_expiry: bool,
    pub ordered_replication_log: bool,
    pub batch_write: bool,
    pub watch: bool,
    pub max_value_bytes: usize,
}
```
Carrier profiles MUST reject a backend for authoritative-session state unless
it supports atomic compare-and-set and monotonic fencing tokens or an adapter
can provide equivalent semantics.
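The profile rule above can be sketched as a small admission check. This is only an illustration: `StateClass`, `ProfileError`, and `admit_backend` are hypothetical names not defined by this RFC, and the adapter path is assumed to be modelled by the adapter reporting the capability flags itself.

```rust
/// Capability flags, mirroring the `BackendCapabilities` struct in this section.
#[derive(Debug, Clone, Copy)]
pub struct BackendCapabilities {
    pub atomic_compare_and_set: bool,
    pub monotonic_fencing_token: bool,
    pub per_key_ttl: bool,
    pub server_side_lease_expiry: bool,
    pub ordered_replication_log: bool,
    pub batch_write: bool,
    pub watch: bool,
    pub max_value_bytes: usize,
}

/// State classes from section 4 (the enum itself is an assumption of this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StateClass {
    AuthoritativeSession,
    DataplaneLookup,
    ReplicatedDr,
    TelemetryDerived,
    EphemeralProcedure,
}

/// Hypothetical rejection reasons.
#[derive(Debug, PartialEq, Eq)]
pub enum ProfileError {
    MissingCompareAndSet,
    MissingFencingToken,
}

/// Rejects a backend for authoritative-session state unless it declares both
/// atomic compare-and-set and monotonic fencing tokens. Other state classes
/// are admitted here; a real profile would likely apply further checks.
pub fn admit_backend(
    class: StateClass,
    caps: &BackendCapabilities,
) -> Result<(), ProfileError> {
    if class != StateClass::AuthoritativeSession {
        return Ok(());
    }
    if !caps.atomic_compare_and_set {
        return Err(ProfileError::MissingCompareAndSet);
    }
    if !caps.monotonic_fencing_token {
        return Err(ProfileError::MissingFencingToken);
    }
    Ok(())
}
```

Running the check at profile load time, rather than at first write, keeps a misconfigured backend from ever accepting authoritative state.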
7. Storage API
```rust
#[async_trait::async_trait]
pub trait SessionBackend: Send + Sync {
    async fn capabilities(&self) -> BackendCapabilities;
    async fn get(&self, key: &SessionKey) -> Result<Option<StoredSessionRecord>, StoreError>;
    async fn compare_and_set(&self, op: CompareAndSet) -> Result<CompareAndSetResult, StoreError>;
    async fn delete_fenced(&self, key: &SessionKey, fence: FenceToken) -> Result<(), StoreError>;
    async fn refresh_ttl(&self, key: &SessionKey, fence: FenceToken, ttl: Duration) -> Result<(), StoreError>;
    async fn batch(&self, ops: Vec<SessionOp>) -> Result<Vec<SessionOpResult>, StoreError>;
}
```
An unfenced `set` is allowed only for state classes that explicitly do not
require authoritative ownership.
8. Record Format
```rust
pub struct StoredSessionRecord {
    pub key: SessionKey,
    pub generation: Generation,
    pub owner: OwnerId,
    pub fence: FenceToken,
    pub state_class: StateClass,
    pub state_type: StateType,
    pub expires_at: Option<Timestamp>,
    pub payload: EncryptedSessionPayload,
}
```
generation is a monotonic per-session version. Every authoritative update
MUST increment it atomically.
9. Lease and Fencing
9.1 Lease API
```rust
#[async_trait::async_trait]
pub trait SessionLeaseManager: Send + Sync {
    async fn acquire(&self, key: &SessionKey, owner: OwnerId, ttl: Duration) -> Result<LeaseGuard, LeaseError>;
    async fn renew(&self, lease: &LeaseGuard, ttl: Duration) -> Result<LeaseGuard, LeaseError>;
    async fn release(&self, lease: LeaseGuard) -> Result<(), LeaseError>;
}

pub struct LeaseGuard {
    pub key: SessionKey,
    pub owner: OwnerId,
    pub fence: FenceToken,
    pub acquired_at: Timestamp,
    pub expires_at: Timestamp,
}
```
9.2 Fencing Rules
Every successful lease acquisition MUST produce a monotonic fencing token for that session key. Backends MUST reject any write with a token lower than the current recorded token.
This prevents an old owner whose lease expired during a pause or partition from overwriting a newer owner after it resumes.
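A minimal in-memory sketch of the fencing rule, combined with the generation check from section 8, might look like the following. `FencedStore`, `Record`, and `WriteError` are illustrative names, the string key stands in for `SessionKey`, and treating generation `0` as "record absent" is an assumption of this sketch rather than something the RFC specifies.

```rust
use std::collections::HashMap;

/// Monotonic fencing token; a plain u64 is an assumption of this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct FenceToken(pub u64);

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Record {
    pub generation: u64,
    pub fence: FenceToken,
    pub payload: Vec<u8>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum WriteError {
    /// The writer's token is lower than the recorded token.
    StaleFence { current: FenceToken },
    /// The writer's expected generation does not match the stored one.
    GenerationConflict { current: u64 },
    /// The generation counter cannot be incremented further.
    GenerationExhausted,
}

#[derive(Default)]
pub struct FencedStore {
    records: HashMap<String, Record>,
}

impl FencedStore {
    /// Fenced compare-and-set. `expected_generation` is 0 when the caller
    /// expects the record to be absent. Returns the new generation.
    pub fn compare_and_set(
        &mut self,
        key: &str,
        fence: FenceToken,
        expected_generation: u64,
        payload: Vec<u8>,
    ) -> Result<u64, WriteError> {
        let current_generation = match self.records.get(key) {
            // Fencing check first: a stale owner is rejected regardless of generation.
            Some(cur) if fence < cur.fence => {
                return Err(WriteError::StaleFence { current: cur.fence });
            }
            Some(cur) => cur.generation,
            None => 0,
        };
        if current_generation != expected_generation {
            return Err(WriteError::GenerationConflict { current: current_generation });
        }
        let next = expected_generation
            .checked_add(1)
            .ok_or(WriteError::GenerationExhausted)?;
        self.records
            .insert(key.to_string(), Record { generation: next, fence, payload });
        Ok(next)
    }
}
```

In this sketch an owner holding token 1 that resumes after an owner with token 2 has written receives `StaleFence`, which is the pause-and-resume case described above.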
9.3 Lease Expiry
Lease expiry alone is not correctness. It is only a liveness mechanism. Safety comes from fencing.
Rules:
- Lease TTL MUST be longer than worst-case expected procedure pause plus backend failover detection time.
- Renewals MUST happen before 50 percent of TTL elapsed by default.
- A failed renewal MUST stop authoritative writes immediately.
- Owners MUST treat unknown lease state as lost.
- Stale writes MUST fail with a distinct `StaleFence` error.
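The timing rules above reduce to a few small checks. The sketch below is illustrative only: the function and type names are hypothetical, and the 50 percent renewal point is hard-coded as the default rather than read from a profile.

```rust
use std::time::Duration;

/// What the owner may assume about its lease after a renewal attempt.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LeaseState {
    Held,
    Lost,
}

/// Result of a renewal call as observed by the owner (hypothetical type).
#[derive(Debug, Clone, Copy)]
pub enum RenewOutcome {
    Renewed,
    Rejected,
    /// Timeout or transport error: the backend's view is unknown.
    Unknown,
}

/// Time after acquisition by which renewal should have happened
/// (default: 50 percent of the TTL).
pub fn renew_deadline(ttl: Duration) -> Duration {
    ttl / 2
}

/// The TTL must be strictly longer than the worst-case procedure pause
/// plus the backend failover detection time.
pub fn ttl_is_sufficient(
    ttl: Duration,
    worst_case_pause: Duration,
    failover_detection: Duration,
) -> bool {
    match worst_case_pause.checked_add(failover_detection) {
        Some(minimum) => ttl > minimum,
        None => false, // an overflowing bound can never be satisfied
    }
}

/// A failed renewal and an unknown outcome both count as a lost lease,
/// so authoritative writes stop immediately.
pub fn after_renew(outcome: RenewOutcome) -> LeaseState {
    match outcome {
        RenewOutcome::Renewed => LeaseState::Held,
        RenewOutcome::Rejected | RenewOutcome::Unknown => LeaseState::Lost,
    }
}
```

Even with these checks in place, safety still comes from the backend rejecting stale fences; the timing rules only reduce how often that rejection is needed.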
9.4 Backend Notes
- Redis implementations MUST use atomic Lua scripts or equivalent server-side transactions for acquire, renew, and fenced CAS. Redis deployments that can lose acknowledged writes during failover MUST NOT be used for strict authoritative state without an external consensus/fencing source.
- Aerospike implementations SHOULD use generation checks and record UDF or transaction mechanisms where available.
- In-memory backend is for single-process tests or single-replica development unless paired with a consensus lease manager.
- Strongly consistent stores may be used for leases even when bulk state is in a faster backend.
10. 3GPP Session Continuity and Handover
10.1 Storage Guarantees Needed by Handover
5G handover procedures require avoiding duplicate authoritative writers while preserving continuity of PDU session and bearer/QoS state. The store must support:
- Idempotent procedure steps.
- Prepared-but-not-active state.
- Activation with a fencing token.
- Abort/rollback of prepared handover.
- Recovery after source or target NF restart.
- Detection of stale source updates after target activation.
A lease mechanism without fencing is not sufficient.
10.2 Handover State Machine
The SDK provides generic storage states:
```rust
pub enum HandoverPhase {
    Stable,
    Preparing { tx: HandoverTxId, target: OwnerId },
    Prepared { tx: HandoverTxId, target: OwnerId },
    Activating { tx: HandoverTxId, target: OwnerId },
    Active { owner: OwnerId },
    Aborting { tx: HandoverTxId },
}
```
NF-specific AMF/SMF/UPF logic maps 3GPP procedure messages to these states.
10.3 Procedure Rules
The session store MUST support these generic steps:
- Source owner holds a valid lease.
- Source creates `Preparing` record with current generation.
- Target acquires or is assigned a higher fence for activation.
- Target writes `Prepared` with expected generation.
- Activation performs a fenced CAS to `Active { owner: target }`.
- Source updates with old fence are rejected.
- Abort performs a fenced CAS back to `Stable` if activation did not complete.
All steps MUST be idempotent by HandoverTxId.
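The steps above can be sketched as a single fenced transition function over an in-memory record head. This is a simplified illustration, not the `opc-handover` API: `SessionHead`, `Step`, `StepError`, and `apply` are hypothetical, the `last_tx` field is an assumption added so that replays of a completed activation or abort can be recognised, and the intermediate `Activating` and `Aborting` phases are declared but not exercised.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct HandoverTxId(pub u64);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct OwnerId(pub u32);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HandoverPhase {
    Stable,
    Preparing { tx: HandoverTxId, target: OwnerId },
    Prepared { tx: HandoverTxId, target: OwnerId },
    Activating { tx: HandoverTxId, target: OwnerId },
    Active { owner: OwnerId },
    Aborting { tx: HandoverTxId },
}

/// Record head guarded by fence and generation (hypothetical layout).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SessionHead {
    pub phase: HandoverPhase,
    pub generation: u64,
    pub fence: u64,
    /// Last transaction that changed this record; used to detect replays.
    pub last_tx: Option<HandoverTxId>,
}

#[derive(Debug, Clone, Copy)]
pub enum Step {
    Prepare { tx: HandoverTxId, target: OwnerId },
    MarkPrepared { tx: HandoverTxId },
    Activate { tx: HandoverTxId },
    Abort { tx: HandoverTxId },
}

#[derive(Debug, PartialEq, Eq)]
pub enum StepError {
    StaleFence,
    GenerationConflict,
    InvalidTransition,
}

/// Applies one step as a fenced CAS. Returns `Ok(false)` when the step was
/// already applied for the same transaction (idempotent replay).
pub fn apply(
    head: &mut SessionHead,
    fence: u64,
    expected_generation: u64,
    step: Step,
) -> Result<bool, StepError> {
    use HandoverPhase::*;
    if fence < head.fence {
        return Err(StepError::StaleFence);
    }
    let done = head.last_tx;
    let (next, tx) = match (head.phase, step) {
        // Replays of a step that already took effect.
        (Preparing { tx: cur, .. }, Step::Prepare { tx, .. }) if cur == tx => return Ok(false),
        (Prepared { tx: cur, .. }, Step::MarkPrepared { tx }) if cur == tx => return Ok(false),
        (Active { .. }, Step::Activate { tx }) if done == Some(tx) => return Ok(false),
        (Stable, Step::Abort { tx }) if done == Some(tx) => return Ok(false),
        // Forward transitions.
        (Stable, Step::Prepare { tx, target }) if done != Some(tx) => (Preparing { tx, target }, tx),
        (Preparing { tx: cur, target }, Step::MarkPrepared { tx }) if cur == tx => (Prepared { tx, target }, tx),
        (Prepared { tx: cur, target }, Step::Activate { tx }) if cur == tx => (Active { owner: target }, tx),
        // Abort is only possible before activation completes.
        (Preparing { tx: cur, .. }, Step::Abort { tx }) if cur == tx => (Stable, tx),
        (Prepared { tx: cur, .. }, Step::Abort { tx }) if cur == tx => (Stable, tx),
        _ => return Err(StepError::InvalidTransition),
    };
    if expected_generation != head.generation {
        return Err(StepError::GenerationConflict);
    }
    let next_generation = head
        .generation
        .checked_add(1)
        .ok_or(StepError::GenerationConflict)?;
    *head = SessionHead { phase: next, generation: next_generation, fence, last_tx: Some(tx) };
    Ok(true)
}
```

Once the target has written with its higher fence, any later write from the source with the old fence fails the first check, which corresponds to the "source updates with old fence are rejected" step.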
10.4 Packet Continuity
The session store does not itself guarantee zero packet loss. It provides the state consistency needed by NFs to implement make-before-break, buffering, or tunnel switching. NF-specific procedures MUST state their packet continuity behavior and evidence in RFC 006 reports.
11. Geo-Redundancy
11.1 Corrected Consistency Model
Asynchronous geo-replication is suitable for disaster recovery and warm standby. It is not sufficient for strict active/active mutation of the same authoritative session unless a higher-level single-owner protocol is used.
Authoritative state MUST use one of:
- Home-region ownership per session.
- Explicit ownership transfer with fencing.
- A strongly consistent multi-region backend, if the deployment accepts the latency cost.
Wall-clock last-writer-wins is forbidden for authoritative session state.
11.2 Replication Log
Backends SHOULD expose an ordered replication log:
```rust
pub struct ReplicationEvent {
    pub key: SessionKey,
    pub generation: Generation,
    pub fence: FenceToken,
    pub state_class: StateClass,
    pub payload_digest: Sha256Digest,
    pub encrypted_payload: EncryptedSessionPayload,
}
```
Replicas MUST apply events only if generation and fence are newer according
to the state class rules.
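The RFC leaves the exact per-class comparison to the state class rules. As one possible reading for authoritative and `replicated-dr` records, the sketch below applies an event only when the generation strictly increases and the fence does not go backwards. `Version`, `ApplyDecision`, and `decide` are hypothetical names, and the rule itself is an assumption.

```rust
/// Version metadata carried by a stored record or a replication event.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Version {
    pub generation: u64,
    pub fence: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ApplyDecision {
    Apply,
    /// The event is identical to what the replica already holds.
    SkipDuplicate,
    /// The event is older than the replica's copy or carries an older fence.
    RejectStale,
}

/// Decides whether a replica should apply an incoming event.
/// Wall-clock time is deliberately not an input.
pub fn decide(current: Option<Version>, incoming: Version) -> ApplyDecision {
    match current {
        None => ApplyDecision::Apply,
        Some(cur) if incoming == cur => ApplyDecision::SkipDuplicate,
        Some(cur) if incoming.fence < cur.fence || incoming.generation <= cur.generation => {
            ApplyDecision::RejectStale
        }
        Some(_) => ApplyDecision::Apply,
    }
}
```

Distinguishing duplicates from stale events keeps redelivery from an at-least-once log out of the stale-event metrics.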
11.3 RPO and RTO
Every deployment profile MUST publish:
- Recovery point objective for session state.
- Recovery time objective for session service.
- Maximum tolerated replication lag.
- Which state classes are replicated.
- Which state classes are rebuildable.
12. Serialization
Rust has no garbage collector, so the goal is allocation, CPU, and cache efficiency rather than "GC pressure" reduction.
12.1 Formats
Allowed formats:
- FlatBuffers for read-mostly zero-copy records.
- Prost/Protobuf for compatibility, with careful allocation profiling.
- Postcard or bincode-like formats only for internal state with stable version policy.
Each state type MUST define:
- schema version
- compatibility policy
- max encoded size
- fuzz target
- migration path
12.2 Decode Rules
Decoders MUST:
- Validate length prefixes and offsets.
- Reject trailing garbage unless explicitly allowed.
- Avoid borrowing data beyond the lifetime of the source buffer.
- Avoid panics on corrupt data.
- Support partial decode for lookup keys where useful.
13. Local Cache
The SDK SHOULD provide a two-level model:
- Local in-process cache for hot reads.
- Distributed backend for ownership, recovery, and replication.
Cache entries MUST include generation and fence. Stale cache entries MUST NOT be used for authoritative writes. Data-plane lookup snapshots SHOULD be updated through atomic swap or RCU-like mechanisms.
Cache invalidation options:
- backend watch stream
- polling by generation
- explicit publish from owner
- TTL expiry
NF owners must choose a cache mode per state class.
14. Security
14.1 Encryption
Session payloads MUST be encrypted before storage unless the profile explicitly marks the backend as inside the same cryptographic boundary.
AAD MUST include:
- tenant
- NF kind
- session key digest
- state type
- generation
- fence
- backend namespace
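One way to bind these fields is to serialize them into a single unambiguous byte string that is passed to the AEAD as associated data. The encoding below (a 32-bit big-endian length prefix per variable-length field, fixed-width integers) is an assumption of this sketch, as are `AadFields`, `build_aad`, and the 32-byte digest width; the RFC does not prescribe a wire layout.

```rust
use std::convert::TryFrom;

/// The AAD inputs listed above (hypothetical field types).
#[derive(Debug, Clone, Copy)]
pub struct AadFields<'a> {
    pub tenant: &'a str,
    pub nf_kind: &'a str,
    pub session_key_digest: [u8; 32],
    pub state_type: &'a str,
    pub generation: u64,
    pub fence: u64,
    pub backend_namespace: &'a str,
}

#[derive(Debug, PartialEq, Eq)]
pub struct FieldTooLong;

/// Appends a length-prefixed field so adjacent fields cannot run together.
fn put_field(out: &mut Vec<u8>, bytes: &[u8]) -> Result<(), FieldTooLong> {
    let len = u32::try_from(bytes.len()).map_err(|_| FieldTooLong)?;
    out.extend_from_slice(&len.to_be_bytes());
    out.extend_from_slice(bytes);
    Ok(())
}

/// Builds the associated data in a fixed field order.
pub fn build_aad(f: &AadFields<'_>) -> Result<Vec<u8>, FieldTooLong> {
    let mut out = Vec::new();
    put_field(&mut out, f.tenant.as_bytes())?;
    put_field(&mut out, f.nf_kind.as_bytes())?;
    put_field(&mut out, &f.session_key_digest)?;
    put_field(&mut out, f.state_type.as_bytes())?;
    out.extend_from_slice(&f.generation.to_be_bytes());
    out.extend_from_slice(&f.fence.to_be_bytes());
    put_field(&mut out, f.backend_namespace.as_bytes())?;
    Ok(out)
}
```

The length prefixes matter: without them, tenant `"ab"` with NF kind `"c"` would produce the same bytes as tenant `"a"` with NF kind `"bc"`, and a record could be replayed across that boundary.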
14.2 Integrity
AEAD integrity is required. Additional MAC fields MAY be used for backends that need independent integrity checks, but they do not replace AEAD.
14.3 Privacy
Logs and metrics MUST NOT expose raw subscriber identifiers. The SDK SHOULD use stable keyed digests for correlation when needed.
15. Observability
Required metrics:
- `opc_session_store_ops_total{op,state_class,outcome}`
- `opc_session_store_latency_seconds{op,state_class}`
- `opc_session_store_cas_conflicts_total{state_class}`
- `opc_session_store_stale_fence_total{state_class}`
- `opc_session_lease_acquire_total{outcome}`
- `opc_session_lease_renew_total{outcome}`
- `opc_session_lease_lost_total{reason}`
- `opc_session_replication_lag_seconds{region}`
- `opc_session_cache_hit_ratio{state_class}`
- `opc_session_record_bytes{state_type}`
Required logs for state transitions:
- `session_key_digest`
- `tenant`
- `state_class`
- `generation`
- `fence`
- `owner`
- `handover_tx_id`, when applicable
- `outcome`
Raw subscriber identifiers MUST be redacted.
16. Module Ownership
| Module | Responsibility |
|---|---|
| `opc-session-model` | Keys, record headers, generations, state classes |
| `opc-session-backend` | Backend trait and capability model |
| `opc-session-lease` | Lease manager and fencing rules |
| `opc-session-cache` | Local cache and snapshot publication |
| `opc-session-codec` | Session serialization and migrations |
| `opc-session-crypto` | Payload envelope integration with RFC 003 |
| `opc-session-replication` | Region log and apply rules |
| `opc-handover` | Generic handover storage state machine |
| `opc-session-testkit` | Fake backend, split-brain tests, stale fence tests |
Agents implementing backends must not modify NF-specific handover logic. Agents implementing handover logic must use the public lease/CAS APIs and not bypass fencing.
17. Testing Requirements
17.1 Unit Tests
- Session key tenant separation.
- CAS success and conflict.
- Stale fence rejection.
- Lease acquire/renew/release.
- TTL refresh with valid and stale fences.
- Serialization corrupt input rejection.
- AEAD AAD mismatch rejection.
- Cache generation checks.
17.2 Integration Tests
- Two owners racing for the same session.
- Owner pause beyond TTL, new owner writes, old owner resumes and is rejected.
- Handover prepare/activate/abort idempotency.
- Backend restart with leases recovered or invalidated according to profile.
- Geo-replication applies newer generation and rejects older generation.
- Cache invalidation after remote update.
17.3 Fault Injection
- Backend timeout.
- Partial batch failure.
- Redis/Aerospike failover.
- Clock skew.
- Network partition between owners and backend.
- Replication lag spike.
- Corrupt encrypted payload.
- Missing session key decryption key.
17.4 Performance Gates
Profiles must state which backend they apply to. Minimum SDK reference gates:
- Local cache read p99 under 50 microseconds.
- In-memory fenced CAS p99 under 100 microseconds.
- Backend adapter exposes measured p50/p99 for get, CAS, lease acquire, and renew.
- 100,000 updates/second per replica for in-memory or batched local profile.
- No packet fast-path benchmark depends on remote backend availability.
18. Acceptance Criteria
This RFC is implemented when:
- Authoritative session writes require monotonic fencing and CAS.
- Stale owners cannot overwrite newer session state after lease expiry.
- Handover state transitions are idempotent and recoverable.
- Geo-replication does not use wall-clock LWW for authoritative state.
- Backend capabilities are declared and enforced by profile.
- Session payloads are encrypted and tenant-bound.
- Local cache supports fast reads without compromising write correctness.
- Fault injection covers split-brain, failover, replication lag, and stale fences.
OPC-SDK-RFC-005: Zero-Copy Protocol Framework
Status: Draft for Implementation
Version: 2.0.0
Date: 2026-05-19
Audience: SDK implementers, protocol crate authors, fuzzing engineers, NF teams
1. Abstract
This RFC defines the protocol codec framework for OpenPacketCore. It covers zero-copy parsing, encoding, lifetime discipline, allocation budgets, parser security, fuzzing, conformance tags, and implementation layout for 3GPP and IETF protocol crates.
The initial draft correctly required nom, bytes, fuzzing, and exact spec
citations. It was incomplete in two areas: the codec trait did not express
borrowed lifetimes safely, and the round-trip property was too simplistic for
protocols with canonical encodings, unknown fields, padding, or lossy
normalization. This version corrects those issues.
2. Scope
2.1 In Scope
- Binary protocol parsing and encoding.
- Borrowed zero-copy PDU views.
- Owned conversion for async and cross-thread use.
- Length, bounds, recursion, and integer safety.
- Fuzzing, property tests, and corpus management.
- Spec traceability for RFC 006.
- Protocol crate layout and module boundaries.
2.2 Out of Scope
- Management config projection. See RFC 002.
- Session persistence. See RFC 004.
- Full NF procedure state machines.
- Kernel bypass packet I/O frameworks, except for buffer ownership contracts.
3. Design Goals
3.1 Security
- No out-of-bounds reads or writes.
- No panics on untrusted input.
- No unbounded recursion, loops, allocation, or CPU use from hostile packets.
- Constant-time comparison for secrets, MACs, authentication tags, and keys.
- Strict validation of length fields, IE cardinality, duplicate handling, and unknown critical elements.
3.2 Performance
- Parse common fast-path headers without heap allocation.
- Avoid copying payloads where a borrowed view is sufficient.
- Encode into caller-provided buffers with exact or bounded capacity planning.
- Support partial decode when only routing keys are needed.
- Provide per-protocol allocation and latency budgets.
3.3 Maintainability
- Each protocol crate uses the same module layout.
- Every message and field cites the exact spec section/table.
- Parser errors are structured and stable.
- Unsafe code is forbidden by default.
- Generated tables are separated from hand-written parser logic.
3.4 Functionality
- Support borrowed and owned message representations.
- Support streaming/incomplete input where protocols require reassembly.
- Support extension headers and unknown IE preservation when required.
- Support canonical encoding and raw-preserving encoding modes.
4. Parsing Model
4.1 Borrowed Views
Protocol decoders SHOULD return borrowed views over the input buffer:
```rust
pub struct GtpHeader<'a> {
    pub flags: u8,
    pub msg_type: u8,
    pub length: u16,
    pub teid: u32,
    pub payload: &'a [u8],
}
```
Borrowed views MUST NOT outlive the input buffer. They MUST NOT store pointers into mutable buffers that can be changed while the view exists.
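A minimal sketch of such a borrowed decode for the 8-octet mandatory GTP-U header, using only the standard library, is shown below. `parse_gtp_header` and `HeaderError` are hypothetical names; the sketch ignores the optional sequence number, N-PDU number, and extension header fields, so `payload` here is simply the `length` octets that follow the mandatory part.

```rust
use std::convert::TryFrom;

#[derive(Debug, PartialEq, Eq)]
pub struct GtpHeader<'a> {
    pub flags: u8,
    pub msg_type: u8,
    pub length: u16,
    pub teid: u32,
    /// Borrowed from the input buffer; no bytes are copied.
    pub payload: &'a [u8],
}

#[derive(Debug, PartialEq, Eq)]
pub enum HeaderError {
    Truncated { needed: usize, available: usize },
    LengthOverflow,
}

/// Size of the mandatory part of the GTP-U header.
const FIXED_HEADER_LEN: usize = 8;

/// Returns the remaining input and a view that borrows from `input`.
/// The shared lifetime `'a` prevents the view from outliving the buffer.
pub fn parse_gtp_header<'a>(
    input: &'a [u8],
) -> Result<(&'a [u8], GtpHeader<'a>), HeaderError> {
    let available = input.len();
    let fixed = input
        .get(..FIXED_HEADER_LEN)
        .and_then(|s| <[u8; FIXED_HEADER_LEN]>::try_from(s).ok())
        .ok_or(HeaderError::Truncated { needed: FIXED_HEADER_LEN, available })?;
    let [flags, msg_type, l0, l1, t0, t1, t2, t3] = fixed;
    let length = u16::from_be_bytes([l0, l1]);
    let teid = u32::from_be_bytes([t0, t1, t2, t3]);

    // The length field counts the octets after the mandatory header.
    let end = FIXED_HEADER_LEN
        .checked_add(usize::from(length))
        .ok_or(HeaderError::LengthOverflow)?;
    let payload = input
        .get(FIXED_HEADER_LEN..end)
        .ok_or(HeaderError::Truncated { needed: end, available })?;
    let rest = input
        .get(end..)
        .ok_or(HeaderError::Truncated { needed: end, available })?;
    Ok((rest, GtpHeader { flags, msg_type, length, teid, payload }))
}
```

Because the function takes a shared `&[u8]`, the borrow checker also prevents the caller from mutating the buffer while the view is alive.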
4.2 Owned Messages
Every borrowed PDU that may cross an async boundary, thread boundary, queue, or long-lived store MUST provide an owned conversion:
```rust
pub trait ToOwnedPdu {
    type Owned;
    fn to_owned_pdu(&self) -> Self::Owned;
}
```
Owned PDUs MAY use bytes::Bytes to retain cheap shared ownership of the
original packet.
4.3 No Self-Referential Types
Generated or hand-written protocol structs MUST NOT be self-referential. If a message needs both raw bytes and parsed fields, use either:
- borrowed view tied to external input lifetime, or
- owned `Bytes` plus offsets validated at construction.
5. Codec Traits
The SDK defines separate traits for borrowed decode, owned decode, and encode.
```rust
pub type DecodeResult<'a, T> = Result<(&'a [u8], T), DecodeError>;

pub trait BorrowDecode<'a>: Sized {
    fn decode(input: &'a [u8], ctx: DecodeContext) -> DecodeResult<'a, Self>;
}

pub trait OwnedDecode: Sized {
    fn decode_owned(input: bytes::Bytes, ctx: DecodeContext) -> Result<Self, DecodeError>;
}

pub trait Encode {
    fn encode(&self, dst: &mut bytes::BytesMut, ctx: EncodeContext) -> Result<(), EncodeError>;
    fn wire_len(&self, ctx: EncodeContext) -> Result<usize, EncodeError>;
}
```
This avoids pretending that a borrowed PDU can be represented by a lifetime-free `Self`.
5.1 Decode Context
```rust
pub struct DecodeContext {
    pub protocol_version: ProtocolVersion,
    pub max_depth: usize,
    pub max_ies: usize,
    pub max_message_len: usize,
    pub unknown_ie_policy: UnknownIePolicy,
    pub duplicate_ie_policy: DuplicateIePolicy,
    pub validation_level: ValidationLevel,
}
```
Protocol crates MUST define safe defaults.
5.2 Error Model
```rust
pub struct DecodeError {
    pub code: DecodeErrorCode,
    pub offset: usize,
    pub spec_ref: Option<SpecRef>,
}
```
Errors MUST be safe to expose in logs. They MUST NOT include raw packet payload unless debug packet capture is explicitly enabled.
6. nom Usage
nom is the default parser combinator framework for binary TLV, bitfield, and
header-oriented protocols.
Rules:
- Use `nom::number::complete` or `nom::number::streaming` deliberately.
- Map `nom::Err::Incomplete` to a structured incomplete-input error.
- Do not discard remaining input unless the message definition allows trailing padding.
- Wrap `nom` errors at module boundaries; do not expose combinator internals in public API.
- Prefer small named parser functions over deeply nested combinator expressions.
Protocols based on ASN.1 PER, JSON, HTTP/2, or other specialized encodings MAY
use proven dedicated parsers instead of nom, but they must implement the same
SDK codec, error, fuzzing, and evidence contracts.
7. Buffer Management
Encoders MUST use bytes::BytesMut or bytes::BufMut.
Encoding rules:
- `wire_len` MUST use checked arithmetic.
- `encode` MUST fail before writing if required capacity exceeds configured maximum.
- Encoders SHOULD reserve exact capacity when cheap to compute.
- Encoders MUST produce canonical output unless raw-preserving mode is selected.
- Partial writes on error SHOULD be avoided. If unavoidable, document the behavior and do not reuse the buffer without caller awareness.
8. Allocation Budgets
Each protocol crate MUST define an allocation profile:
```rust
pub struct AllocationBudget {
    pub decode_heap_allocations_fast_path: usize,
    pub decode_max_temporary_bytes: usize,
    pub encode_max_temporary_bytes: usize,
}
```
Default fast-path target:
- Fixed header decode: 0 heap allocations.
- Routing-key partial decode: 0 heap allocations.
- Full message decode: protocol-specific, bounded.
Variable IE lists SHOULD use:
- iterators over borrowed IE views,
- `smallvec` for small bounded lists,
- caller-provided scratch buffers, or
- validated owned vectors when required.
9. Security Invariants
9.1 Length and Offset Safety
All length calculations MUST use checked arithmetic. Parsers MUST verify:
- field length is within remaining input,
- nested IE length does not exceed parent length,
- padding length is valid,
- extension header chains terminate,
- total parsed elements do not exceed `max_ies`,
- recursion or nesting does not exceed `max_depth`.
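As an illustration of these checks, the sketch below walks a flat list of information elements with a one-octet type and a two-octet big-endian length. That TLV layout is hypothetical and is not taken from any specific protocol in this SDK; `Ie`, `IeError`, and `parse_ies` are likewise illustrative names. Nesting and `max_depth` are not covered here.

```rust
/// Borrowed view of one information element.
#[derive(Debug, PartialEq, Eq)]
pub struct Ie<'a> {
    pub ie_type: u8,
    pub value: &'a [u8],
}

#[derive(Debug, PartialEq, Eq)]
pub enum IeError {
    /// The IE starting at `offset` extends past the end of the input.
    Truncated { offset: usize },
    /// More than `limit` elements were present.
    TooManyIes { limit: usize },
    /// An offset calculation overflowed.
    LengthOverflow { offset: usize },
}

/// Parses IEs until the input is exhausted, enforcing `max_ies`.
pub fn parse_ies<'a>(input: &'a [u8], max_ies: usize) -> Result<Vec<Ie<'a>>, IeError> {
    let mut ies = Vec::new();
    let mut offset = 0usize;
    // Each iteration advances `offset` by at least 3, so the loop is bounded.
    while offset < input.len() {
        if ies.len() >= max_ies {
            return Err(IeError::TooManyIes { limit: max_ies });
        }
        let value_start = offset
            .checked_add(3)
            .ok_or(IeError::LengthOverflow { offset })?;
        let header = input
            .get(offset..value_start)
            .ok_or(IeError::Truncated { offset })?;
        let (ie_type, len) = match header {
            [t, hi, lo] => (*t, u16::from_be_bytes([*hi, *lo])),
            _ => return Err(IeError::Truncated { offset }),
        };
        let value_end = value_start
            .checked_add(usize::from(len))
            .ok_or(IeError::LengthOverflow { offset })?;
        // `get` returns None when the declared length exceeds the remaining input.
        let value = input
            .get(value_start..value_end)
            .ok_or(IeError::Truncated { offset })?;
        ies.push(Ie { ie_type, value });
        offset = value_end;
    }
    Ok(ies)
}
```

The sketch uses `get` with ranges rather than indexing, so malformed lengths surface as structured errors instead of panics.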
9.2 Integer Safety
All offset, length, and capacity calculations MUST use:
- `checked_add`
- `checked_sub`
- `checked_mul`
- `usize::try_from`

Integer truncation with `as` is forbidden in parser and encoder length paths.
9.3 Constant-Time Operations
Constant-time comparison is REQUIRED for:
- MACs
- authentication tags
- keys
- nonces when secrecy or oracle behavior matters
- authentication tokens
Checksums over public packet data do not require constant-time comparison, but checksum parsing must still be bounds-safe and panic-free.
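The usual shape of a constant-time comparison is to accumulate the XOR of every byte pair and test the accumulator once at the end, instead of returning at the first mismatch. The function below is a teaching sketch under that assumption; the compiler gives no formal timing guarantee for it, so production code would normally rely on a vetted constant-time primitive rather than a hand-written loop.

```rust
/// Compares two byte strings without an early exit on the first
/// differing byte. The length check is not secret-dependent for
/// fixed-size MACs and tags, whose length is public.
pub fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        // Accumulate differences; the loop always runs over every byte.
        diff |= x ^ y;
    }
    diff == 0
}
```

By contrast, `a == b` on slices may stop at the first differing byte, which is the behaviour this section forbids for secrets.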
9.4 Denial of Service Controls
Every decoder MUST enforce:
- maximum message length,
- maximum IE count,
- maximum nesting depth,
- maximum extension chain length,
- maximum decompressed length if compression exists,
- maximum parse time indirectly through bounded loops.
Protocol crates MUST expose these limits through profile configuration.
10. Validation Levels
The decoder supports levels:
```rust
pub enum ValidationLevel {
    HeaderOnly,
    Structural,
    Strict,
    ProcedureAware,
}
```
- `HeaderOnly`: parse enough for routing.
- `Structural`: verify lengths and container structure.
- `Strict`: enforce field cardinality, enum ranges, and critical IE rules.
- `ProcedureAware`: call NF-specific semantic validators.
Data-plane fast paths SHOULD use the minimum level needed for safe routing and leave expensive semantic validation to control-plane paths where appropriate.
11. Unknown and Duplicate Elements
Protocol crates MUST define:
- Unknown IE behavior.
- Duplicate IE behavior.
- Critical/mandatory IE behavior.
- Extension preservation behavior.
If a protocol requires preserving unknown elements for forwarding or round-trip, the borrowed view MUST expose raw slices and owned conversion MUST retain them.
12. Round-Trip Properties
The simplistic property encode(decode(input)) == input is not universally
valid. The SDK requires three properties:
12.1 Canonical Round Trip
For generated valid model values:
decode(encode(model)) == model
12.2 Raw-Preserving Round Trip
For accepted inputs where unknown/padding preservation is enabled:
encode_raw_preserving(decode_raw_preserving(input)) == input
12.3 Reject Stability
For rejected inputs, the decoder returns a structured error and never panics, hangs, or allocates beyond budget.
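To make the distinction concrete, here is a toy codec in which the wire form is `[value, pad_len, padding...]` and the canonical encoding always emits zero padding. The format and all names (`Model`, `RawPreserved`, `CodecError`, and the four functions) are invented for this sketch and do not correspond to a real protocol crate.

```rust
use std::convert::TryFrom;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Model {
    pub value: u8,
}

/// Decoded model plus the padding bytes needed to reproduce the input.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RawPreserved {
    pub model: Model,
    pub padding: Vec<u8>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum CodecError {
    Truncated,
    TrailingBytes,
    PaddingTooLong,
}

pub fn decode_raw_preserving(input: &[u8]) -> Result<RawPreserved, CodecError> {
    let (value, pad_len, rest) = match input {
        [v, p, rest @ ..] => (*v, usize::from(*p), rest),
        _ => return Err(CodecError::Truncated),
    };
    if rest.len() < pad_len {
        return Err(CodecError::Truncated);
    }
    if rest.len() > pad_len {
        return Err(CodecError::TrailingBytes);
    }
    Ok(RawPreserved { model: Model { value }, padding: rest.to_vec() })
}

/// Normalizing decode: padding is validated and then discarded.
pub fn decode(input: &[u8]) -> Result<Model, CodecError> {
    decode_raw_preserving(input).map(|raw| raw.model)
}

/// Canonical encode: never emits padding.
pub fn encode(model: &Model) -> Vec<u8> {
    vec![model.value, 0]
}

/// Raw-preserving encode: reproduces the original padding.
pub fn encode_raw_preserving(raw: &RawPreserved) -> Result<Vec<u8>, CodecError> {
    let pad_len = u8::try_from(raw.padding.len()).map_err(|_| CodecError::PaddingTooLong)?;
    let mut out = vec![raw.model.value, pad_len];
    out.extend_from_slice(&raw.padding);
    Ok(out)
}
```

For a padded input such as `[7, 2, 0xFF, 0xEE]`, the raw-preserving pair reproduces the input byte for byte, while `encode(decode(input))` yields `[7, 0]`; that is why the naive property cannot be a universal test.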
13. Fuzzing
Every protocol crate MUST include fuzz targets for:
- full decode,
- header-only decode,
- encode after generated model mutation,
- round-trip properties,
- length and extension chains,
- security fields where applicable.
Fuzz gates SHOULD be time and coverage based, not only iteration-count based. Minimum admission gate:
- 30 minutes sanitizer-enabled fuzzing per new parser target in CI or nightly.
- 1,000,000 generated cases for property tests where practical.
- All crashes minimized and committed as regression tests.
Required sanitizers where supported:
- AddressSanitizer for native dependencies.
- UndefinedBehaviorSanitizer for C/C++ parser dependencies.
- Miri for unsafe Rust, if any unsafe exception is approved.
14. Spec Traceability
Every public PDU, IE, field enum, and procedure-relevant constant MUST cite:
- standards body,
- document number,
- release or revision where applicable,
- section,
- table or figure where applicable,
- conformance status.
Example:
```rust
/// @3gpp TS 29.281 Release 18, Section 5.1, Table 5.1-1
/// @conformance full
pub struct Gtpv1uHeader<'a> { ... }
```
These tags feed RFC 006 evidence extraction.
15. Protocol Crate Layout
Each protocol crate MUST use:
crates/opc-proto-<name>/
src/
lib.rs
error.rs
context.rs
header.rs
ie.rs
message.rs
parser.rs
encode.rs
validate.rs
spec.rs
generated/
tables.rs
tests/
corpus.rs
roundtrip.rs
conformance.rs
fuzz/
fuzz_targets/
decode.rs
header.rs
roundtrip.rs
For protocols without IEs, ie.rs may be omitted. Generated tables MUST live
under generated/ and be reproducible.
16. Implementation Contracts
Contributors implementing protocol crates MUST follow these rules:
- Start from `spec.rs` constants and conformance tags.
- Implement `error.rs` and `context.rs` before parser logic.
- Implement header parsing before full message parsing.
- Add a fuzz target with the first parser.
- Do not add `unsafe`.
- Do not use `unwrap`, `expect`, or indexing on untrusted input.
- Keep parser functions small and named after spec structures.
- Add one regression test per newly handled malformed input class.
Agents may work independently on:
- header parser,
- IE parser,
- encoder,
- validation,
- fuzz/test corpus,
- generated spec tables.
17. Testing Requirements
17.1 Unit Tests
- Minimum and maximum length messages.
- Truncated input at every byte position for fixed headers.
- Invalid enum values.
- Duplicate IE policies.
- Unknown IE policies.
- Extension header chain termination.
- Checked arithmetic overflow cases.
17.2 Integration Tests
- Decode real capture fixtures.
- Encode/decode canonical known-good messages.
- Partial decode for routing keys.
- Owned conversion across async boundary.
- Protocol-specific strict validation.
17.3 Performance Tests
Each protocol crate MUST benchmark:
- header-only decode,
- full structural decode,
- strict validation,
- encode,
- owned conversion.
Benchmarks MUST report:
- p50/p99 latency,
- heap allocations,
- bytes copied,
- throughput in messages/second.
17.4 Negative Corpus
Every parser MUST maintain a negative corpus:
- truncated,
- overlong,
- nested too deep,
- duplicate mandatory fields,
- unknown critical fields,
- invalid length,
- invalid padding,
- integer overflow candidate.
18. Acceptance Criteria
This RFC is implemented when:
- Borrowed decoders express lifetimes safely and owned conversion is available.
- Fast-path header decode is allocation-free for supported protocols.
- All length and offset math is checked.
- Decoders reject hostile input without panic, hang, or unbounded allocation.
- Round-trip tests distinguish canonical and raw-preserving modes.
- Fuzz targets and regression corpora exist for every protocol crate.
- Spec traceability tags feed RFC 006 evidence.
- Protocol modules follow the standard layout for parallel implementation.
OPC-SDK-RFC-006: Conformance and Evidence Pipeline
Status: Draft for Implementation
Version: 2.0.0
Date: 2026-05-19
Audience: release engineers, security engineers, standards reviewers, SDK implementers, NF teams
1. Abstract
This RFC defines the OpenPacketCore evidence pipeline: standards conformance mapping, test evidence, SBOM generation, VEX, provenance, artifact signing, performance baselines, known-gap management, and release gates.
The purpose is not to create marketing compliance claims. The purpose is to produce machine-readable, signed evidence that states exactly what is implemented, tested, partially implemented, not implemented, or intentionally out of scope.
The initial draft correctly required conformance tags, SBOMs, signed bundles, and performance baselines. This version expands those into a full evidence system suitable for high-integrity carrier CNFs and parallel implementation.
2. Scope
2.1 In Scope
- Standards requirement inventory.
- Code-to-spec and test-to-spec mapping.
- Conformance status extraction.
- Known-gap registry.
- SBOM and VEX generation.
- Build provenance and artifact signing.
- Performance baseline capture.
- Evidence bundle format.
- Release and PR gates.
2.2 Out of Scope
- Legal certification by standards bodies.
- Operator-specific acceptance testing.
- Live-network certification.
- Runtime audit storage. See RFC 003.
3. Design Goals
3.1 Security
- Evidence must be tamper-evident and tied to artifact digests.
- Supply-chain metadata must include source, dependencies, build environment, container base images, and vulnerability status.
- Claims must be traceable to tests, source, and reviewed gaps.
- Signing keys or identities must be auditable.
3.2 Performance
- Evidence generation must be incremental for PR workflows.
- Full release evidence may be more expensive but must be reproducible.
- Performance baselines must record environment details so regressions are meaningful.
3.3 Maintainability
- Conformance tags must use a strict schema.
- Known gaps must be first-class records, not prose-only notes.
- Evidence tools must fail closed when claims are ambiguous.
- Output formats must be stable for downstream automation.
3.4 Functionality
- Produce human-readable and machine-readable reports.
- Support partial, full, not-implemented, not-applicable, and gap statuses.
- Attach tests and benchmark results to claims.
- Sign artifacts and attestations.
- Support release promotion gates.
4. Evidence Model
4.1 Claim Types
The evidence pipeline recognizes:
| Claim | Meaning |
|---|---|
| `implemented` | Code exists for the requirement |
| `tested` | Automated tests exercise the requirement |
| `partial` | Some required behavior is missing |
| `not-implemented` | No implementation exists |
| `not-applicable` | Requirement does not apply to this SDK/NF/profile |
| `gap` | Known missing behavior with owner and mitigation |
| `waived` | Temporary exception approved by policy |

No release may claim full conformance for a requirement unless it has both
`implemented` and `tested` evidence, plus no open blocking gap.
4.2 Requirement IDs
Every tracked requirement receives a stable ID:
REQ-<source>-<document>-<release>-<section>-<ordinal>
Example:
REQ-3GPP-TS29281-R18-5.1-001
Requirement IDs are stored in a versioned inventory file. Comments in code may reference IDs, but comments do not define the inventory.
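A small parser and formatter for this ID shape could look like the sketch below. It assumes that no component contains a hyphen and that the ordinal is a decimal number rendered with at least three digits, both of which hold for the example above but are not stated as rules by this RFC. `RequirementId` is a hypothetical type.

```rust
use std::fmt;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RequirementId {
    pub source: String,
    pub document: String,
    pub release: String,
    pub section: String,
    pub ordinal: u32,
}

impl RequirementId {
    /// Parses `REQ-<source>-<document>-<release>-<section>-<ordinal>`.
    /// Returns `None` for any deviation, so ambiguous IDs fail closed.
    pub fn parse(s: &str) -> Option<Self> {
        let parts: Vec<&str> = s.split('-').collect();
        let (source, document, release, section, ordinal) = match parts.as_slice() {
            ["REQ", a, b, c, d, e] => (*a, *b, *c, *d, *e),
            _ => return None,
        };
        if [source, document, release, section, ordinal]
            .iter()
            .any(|p| p.is_empty())
        {
            return None;
        }
        if !ordinal.bytes().all(|b| b.is_ascii_digit()) {
            return None;
        }
        Some(RequirementId {
            source: source.to_string(),
            document: document.to_string(),
            release: release.to_string(),
            section: section.to_string(),
            ordinal: ordinal.parse().ok()?,
        })
    }
}

impl fmt::Display for RequirementId {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "REQ-{}-{}-{}-{}-{:03}",
            self.source, self.document, self.release, self.section, self.ordinal
        )
    }
}
```

Parsing into a structured value and formatting it back gives the inventory tooling a cheap check that IDs in code comments match the canonical form.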
4.3 Evidence Records
{
"requirement_id": "REQ-3GPP-TS29281-R18-5.1-001",
"status": "partial",
"source_refs": ["crates/opc-proto-gtp/src/header.rs:Gtpv1uHeader"],
"test_refs": ["crates/opc-proto-gtp/tests/roundtrip.rs:test_gtpu_header"],
"gap_refs": ["GAP-000123"],
"artifact_digests": ["sha256:..."],
"reviewed_by": ["standards-reviewer"],
"last_updated": "2026-05-19T00:00:00Z"
}
The pipeline MUST validate evidence records against a JSON schema.
5. Conformance Tracking
5.1 Inventory
The repository MUST maintain:
evidence/
  requirements/
    3gpp-ts-29.281-r18.yaml
    ietf-rfc-7951.yaml
  mappings/
    code-map.yaml
    test-map.yaml
  gaps/
    known-gaps.yaml
Requirement inventories SHOULD be generated from structured sources when available. When manual extraction is required, each requirement must include source document, release/revision, section, and reviewer.
5.2 Code Tags
Code tags use strict syntax:
/// @spec 3GPP TS 29.281 R18 5.1 Table 5.1-1
/// @req REQ-3GPP-TS29281-R18-5.1-001
/// @conformance partial
/// @gap GAP-000123
pub struct Gtpv1uHeader<'a> { ... }
Allowed tag keys:
- @spec
- @req
- @conformance
- @gap
- @security
- @performance
- @test
Unknown tags MUST fail evidence extraction in release mode.
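The strict-tag rule can be sketched as a small extractor over `///` doc-comment lines. This is an assumption-laden illustration: `extract_tags` and `ALLOWED_TAGS` are hypothetical names, and the real extraction tool may parse source differently.

```rust
/// Tag keys allowed by this section.
const ALLOWED_TAGS: [&str; 7] = [
    "@spec", "@req", "@conformance", "@gap", "@security", "@performance", "@test",
];

/// Collects `(key, value)` pairs from `///` doc-comment lines.
/// Returns an error on the first unknown `@tag` (release-mode behavior).
pub fn extract_tags(source: &str) -> Result<Vec<(String, String)>, String> {
    let mut tags = Vec::new();
    for line in source.lines() {
        // Only `///` doc comments are considered in this sketch.
        let comment = match line.trim_start().strip_prefix("///") {
            Some(rest) => rest.trim(),
            None => continue,
        };
        if !comment.starts_with('@') {
            continue; // ordinary documentation text, not a tag
        }
        let (key, value) = match comment.split_once(char::is_whitespace) {
            Some((k, v)) => (k, v.trim()),
            None => (comment, ""),
        };
        if !ALLOWED_TAGS.iter().any(|allowed| *allowed == key) {
            return Err(format!("unknown tag: {}", key));
        }
        tags.push((key.to_string(), value.to_string()));
    }
    Ok(tags)
}
```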
5.3 Test Tags
Tests SHOULD reference requirement IDs:
#[test]
#[req("REQ-3GPP-TS29281-R18-5.1-001")]
fn gtpu_header_roundtrip() { ... }
The extraction tool MUST support Rust test attributes or a sidecar test mapping
file. A requirement with code but no test is reported as implemented-untested (see 5.4), never full.
5.4 Status Rules
Status calculation:
| Inputs | Result |
|---|---|
| code + passing tests + no blocking gaps | full |
| code + some tests + open nonblocking gaps | partial |
| code + no tests | implemented-untested |
| gap with no code | not-implemented |
| reviewed N/A record | not-applicable |
| approved waiver | waived |
The machine-readable report MUST include both raw evidence and calculated status.
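The status table can be read as a pure function over raw evidence. The sketch below is one possible reading, not the SDK's calculator: the `Evidence` shape is hypothetical, N/A and waiver records are assumed to take precedence, `full` is assumed to require no open gaps at all, and input combinations the table does not cover return an error so the tool fails closed.

```rust
/// Calculated status, named after the Result column of the table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Full,
    Partial,
    ImplementedUntested,
    NotImplemented,
    NotApplicable,
    Waived,
}

/// Raw evidence for one requirement (hypothetical shape).
#[derive(Debug, Clone, Copy)]
pub struct Evidence {
    pub has_code: bool,
    pub has_tests: bool,
    pub tests_pass: bool,
    pub open_gaps: u32,
    pub reviewed_not_applicable: bool,
    pub approved_waiver: bool,
}

/// Maps raw evidence to a status; ambiguous inputs are errors.
pub fn calculate_status(e: &Evidence) -> Result<Status, &'static str> {
    // Assumption: reviewed N/A and approved waivers win over other inputs.
    if e.reviewed_not_applicable {
        return Ok(Status::NotApplicable);
    }
    if e.approved_waiver {
        return Ok(Status::Waived);
    }
    match (e.has_code, e.has_tests) {
        (false, _) if e.open_gaps > 0 => Ok(Status::NotImplemented),
        (false, _) => Err("no code, gap, waiver, or N/A record"),
        (true, false) => Ok(Status::ImplementedUntested),
        (true, true) if e.tests_pass && e.open_gaps == 0 => Ok(Status::Full),
        (true, true) if e.open_gaps > 0 => Ok(Status::Partial),
        (true, true) => Err("failing tests without a recorded gap"),
    }
}
```

Returning `Err` for uncovered combinations, rather than guessing a status, follows the 3.3 requirement that evidence tools fail closed when claims are ambiguous.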
6. Known Gaps
6.1 Gap Record
Known gaps MUST be structured:
id: GAP-000123
title: GTP-U extension headers not fully decoded
status: open
severity: medium
applies_to:
- REQ-3GPP-TS29281-R18-5.2-004
owner: opc-proto-gtp
created: 2026-05-19
target_release: 0.3.0
mitigation: Reject unsupported extension headers in strict mode.
security_impact: Low if strict mode is enabled.
performance_impact: None.
6.2 Gap Gates
Release mode MUST fail when:
- A partial or not-implemented status has no gap.
- A gap has no owner.
- A gap has no mitigation or explicit "no mitigation" rationale.
- A gap target release is overdue.
- A security-critical gap lacks security approval.
The root known-gaps.md MAY be generated from known-gaps.yaml, but the YAML
is the source of truth.
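A minimal sketch of the gap gates follows. The `Gap` and `RequirementStatus` shapes are hypothetical simplifications of the YAML records, and the overdue check is assumed to be computed elsewhere because the standard library has no date parsing.

```rust
/// Simplified gap record (hypothetical shape).
pub struct Gap {
    pub id: String,
    pub owner: Option<String>,
    /// Mitigation text or an explicit "no mitigation" rationale.
    pub mitigation: Option<String>,
    /// Assumption: the caller compares target_release against the release.
    pub target_overdue: bool,
    pub security_critical: bool,
    pub security_approved: bool,
}

/// Calculated status of one requirement plus its gap references.
pub struct RequirementStatus {
    pub id: String,
    pub status: String,
    pub gap_refs: Vec<String>,
}

/// Returns every gate violation; an empty result means the gates pass.
pub fn release_gate_violations(reqs: &[RequirementStatus], gaps: &[Gap]) -> Vec<String> {
    let mut out = Vec::new();
    for r in reqs {
        let needs_gap = r.status == "partial" || r.status == "not-implemented";
        if needs_gap && r.gap_refs.is_empty() {
            out.push(format!("{}: {} status has no gap", r.id, r.status));
        }
    }
    // Missing and whitespace-only fields are treated the same way.
    let blank = |f: &Option<String>| f.as_deref().map_or(true, |s| s.trim().is_empty());
    for g in gaps {
        if blank(&g.owner) {
            out.push(format!("{}: no owner", g.id));
        }
        if blank(&g.mitigation) {
            out.push(format!("{}: no mitigation or rationale", g.id));
        }
        if g.target_overdue {
            out.push(format!("{}: target release is overdue", g.id));
        }
        if g.security_critical && !g.security_approved {
            out.push(format!("{}: security-critical without approval", g.id));
        }
    }
    out
}
```

Collecting all violations instead of stopping at the first one lets a release report list everything that must be fixed in a single run.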
7. SBOM and VEX
7.1 SBOM Requirements
Every release MUST include CycloneDX JSON SBOMs for:
- Rust workspace dependencies.
- Container images.
- Helm charts and embedded images.
- Generated artifacts where dependencies differ.
- Native libraries linked into binaries.
SBOMs MUST include:
- direct and transitive dependencies,
- package URLs where available,
- license data,
- hashes,
- supplier/source repository where available,
- build target,
- feature flags,
- container base image digests.
7.2 VEX Requirements
VEX records MUST state vulnerability applicability:
- affected,
- not affected,
- fixed,
- under investigation.
Each VEX decision MUST include:
- CVE or advisory ID,
- package and version,
- scanner database timestamp,
- justification,
- reviewer or automated policy source,
- expiry for temporary decisions.
Release mode MUST fail on unresolved critical vulnerabilities unless an approved VEX record exists.
8. Provenance and Signing
8.1 Artifact Digests
Every artifact must be addressed by digest:
- binaries,
- container images,
- Helm charts,
- SBOMs,
- evidence bundles,
- performance reports,
- conformance reports.
Tags are not sufficient.
8.2 Provenance
Release builds MUST produce SLSA-style provenance, preferably in in-toto/DSSE format, including:
- source repository URL,
- commit SHA,
- dirty tree status,
- builder identity,
- build workflow reference,
- build inputs,
- dependency lockfiles,
- environment image digest,
- output artifact digests.
8.3 Signing
Release artifacts and attestations MUST be signed with Sigstore/Cosign or an approved offline carrier signing profile.
Keyless profile:
- OIDC issuer and subject must be policy-allowed.
- Transparency log entry must be verifiable.
- Certificate identity must match release workflow.
Offline profile:
- Public key must be published through an approved channel.
- Signing key custody and rotation must be documented.
- Transparency log use SHOULD be retained where possible.
8.4 Bundle Signing
Signing only evidence-bundle.tar.gz is not enough. The bundle MUST include a
manifest of file digests, and the manifest or DSSE envelope MUST be signed.
Individual high-value artifacts SHOULD also carry their own attestations.
9. Performance Evidence
9.1 Benchmark Classes
Performance evidence MUST cover:
- RFC 001 config commit phases.
- RFC 002 generated validation and patch application.
- RFC 004 session store operations.
- RFC 005 protocol decode/encode.
- Security operations from RFC 003 where relevant.
9.2 Environment Capture
performance-baseline.json MUST include:
- CPU model and count,
- memory size and speed where available,
- kernel version,
- container runtime,
- Kubernetes version when applicable,
- storage class for persistence tests,
- network plugin for distributed tests,
- compiler version,
- cargo profile,
- feature flags,
- git commit,
- date/time,
- benchmark tool version.
9.3 Regression Policy
Each benchmark defines:
- metric,
- baseline,
- allowed regression threshold,
- required sample count,
- noise handling,
- owner.
Data-plane PRs MUST fail when they exceed regression thresholds unless a performance waiver is approved.
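The regression check can be sketched as a comparison of a noise-reduced sample against the baseline. All names here are hypothetical; the sketch assumes the median is the noise-handling strategy and that the threshold is a fraction of the baseline, neither of which this RFC prescribes.

```rust
/// Whether a larger measured value is worse (latency) or better (throughput).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Direction {
    LowerIsBetter,
    HigherIsBetter,
}

/// Per-benchmark policy (hypothetical shape).
pub struct BenchmarkPolicy {
    pub baseline: f64,
    /// Allowed regression as a fraction of the baseline, e.g. 0.05 for 5%.
    pub allowed_regression: f64,
    pub required_samples: usize,
    pub direction: Direction,
}

#[derive(Debug, PartialEq)]
pub enum Verdict {
    Pass,
    Regression { change: f64 },
    InsufficientSamples,
}

/// Compares the (upper) median of `samples` against the baseline.
pub fn evaluate(policy: &BenchmarkPolicy, samples: &[f64]) -> Verdict {
    if samples.is_empty() || samples.len() < policy.required_samples {
        return Verdict::InsufficientSamples;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let median = sorted[sorted.len() / 2];
    // Positive `change` always means "worse than baseline".
    let change = match policy.direction {
        Direction::LowerIsBetter => (median - policy.baseline) / policy.baseline,
        Direction::HigherIsBetter => (policy.baseline - median) / policy.baseline,
    };
    if change > policy.allowed_regression {
        Verdict::Regression { change }
    } else {
        Verdict::Pass
    }
}
```

A gate built on this would treat `InsufficientSamples` as a failure too, so that a benchmark which did not run cannot pass by omission.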
10. Evidence Bundle
10.1 Files
The release evidence bundle MUST contain:
evidence-bundle/
  manifest.json
  conformance-report.json
  conformance-report.md
  known-gaps.json
  sbom/
    workspace.cdx.json
    containers.cdx.json
  vex/
    vex.json
  provenance/
    build.intoto.jsonl
  signatures/
    cosign.bundle
  performance/
    performance-baseline.json
    raw/
  tests/
    test-summary.json
    junit/
  security/
    vulnerability-report.json
    policy-results.json
10.2 Manifest
manifest.json MUST include:
- evidence schema version,
- SDK version,
- git commit,
- artifact digests,
- file digests,
- signing identity,
- generation tool version,
- generation timestamp,
- known incomplete sections.
10.3 Packet-core evidence packs
A release evidence bundle MAY include one or more packet-core evidence packs for protocol fixtures, attach procedure results, and kernel dataplane/XFRM proof. These packs are intended to make smoke artifacts and test evidence from different network functions comparable, not to create product-specific certification claims.
Each pack is a JSON object conforming to packet-core-evidence-pack.schema.json
and contains:
- protocol_evidence: protocol fixture evidence records.
- attach_evidence: attach and session-establishment procedure results.
- kernel_dataplane_evidence: kernel dataplane, XFRM, routing, and firewall state summaries.
Packet-core evidence schemas are versioned independently within RFC 006 and
are currently experimental. A pack MUST declare experimental: true until
the schema graduates. Every pack MUST pass redaction validation before it is
included in a bundle; validation fails closed if any string field contains a
raw IMSI, MSISDN, IMEI, NAI, Session-Id, LI identifier, or key material.
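One part of redaction validation can be sketched as a digit-run heuristic: IMSIs, IMEIs, and MSISDNs are long decimal strings (an IMSI has at most 15 digits), so a string field with a long run of digits is rejected. The threshold `MIN_IDENTIFIER_DIGITS` and the function names are hypothetical, and this sketch does not attempt to detect NAIs, Session-Ids, LI identifiers, or key material.

```rust
/// Hypothetical threshold: digit runs this long are treated as identifiers.
const MIN_IDENTIFIER_DIGITS: usize = 10;

/// Length of the longest run of consecutive ASCII digits in `s`.
fn longest_digit_run(s: &str) -> usize {
    let mut best = 0;
    let mut current = 0;
    for b in s.bytes() {
        if b.is_ascii_digit() {
            current += 1;
            best = best.max(current);
        } else {
            current = 0;
        }
    }
    best
}

/// Checks `(field name, value)` pairs and fails closed on the first
/// value that looks like a raw subscriber or equipment identifier.
pub fn validate_redaction<'a, I>(fields: I) -> Result<(), String>
where
    I: IntoIterator<Item = (&'a str, &'a str)>,
{
    for (name, value) in fields {
        if longest_digit_run(value) >= MIN_IDENTIFIER_DIGITS {
            // The offending value is deliberately left out of the error.
            return Err(format!("field `{}` may contain a raw identifier", name));
        }
    }
    Ok(())
}
```

The heuristic will produce false positives on long numeric values such as raw counters; under a fail-closed policy that is the safer direction to err in.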
Downstream products (for example, ePDG smoke artifacts) MAY map their own evidence into this SDK format. Doing so documents how the product evidence corresponds to SDK schema fields; it does not imply the SDK has certified the product.
11. PR and Release Gates
11.1 PR Gates
Required for every PR:
- Build.
- Unit tests.
- Formatting and lint checks.
- Incremental evidence extraction.
- New public protocol/config items include spec or explicit non-spec tags.
- New gaps are structured and owned.
- Security-sensitive changes run targeted tests.
11.2 Release Gates
Required for every release:
- Full test suite.
- Fuzzing gate for changed protocol crates.
- SBOM generation.
- VEX evaluation.
- Vulnerability scan.
- Provenance generation.
- Artifact signing.
- Conformance report.
- Known-gap validation.
- Performance baseline.
- Evidence bundle signing.
Release MUST fail closed if evidence generation fails.
12. Implementation Evidence Requirements
Generated code is allowed only when evidence remains strict.
Rules:
- Every new protocol struct must include spec tags.
- Every new generated config item must include YANG path metadata.
- Every new security behavior must include a threat/test note.
- Every generated test must map to a requirement or state it is purely internal.
- Contributors must not mark conformance full; only the evidence calculator may calculate final status.
- Ambiguous or unsupported spec behavior must create a gap record.
The evidence pipeline is the guardrail that prevents plausible-looking code from silently becoming unsupported compliance claims.
13. Tooling Architecture
crates/opc-evidence/
  src/
    inventory.rs
    extract.rs
    conformance.rs
    sbom.rs
    vex.rs
    provenance.rs
    performance.rs
    bundle.rs
    policy.rs
    report.rs
Tool responsibilities:
- inventory: load and validate requirement inventories.
- extract: scan source and test tags.
- conformance: calculate status.
- sbom: invoke or parse SBOM generators.
- vex: correlate vulnerabilities and VEX decisions.
- provenance: collect build attestation metadata.
- performance: normalize benchmark output.
- bundle: create manifest and bundle.
- policy: enforce PR/release gates.
- report: emit Markdown and JSON.
14. Schemas
The repository MUST version JSON schemas for:
- requirement inventory,
- evidence record,
- conformance report,
- gap record,
- performance baseline,
- bundle manifest,
- VEX policy result,
- packet-core protocol evidence,
- packet-core attach evidence,
- packet-core kernel dataplane evidence,
- packet-core evidence pack.
Schema changes MUST be backward compatible within a major SDK release or include a migration tool.
15. Testing Requirements
15.1 Unit Tests
- Tag parser accepts valid tags and rejects invalid tags.
- Requirement inventory schema validation.
- Gap gate logic.
- Status calculation matrix.
- Manifest digest calculation.
- VEX decision expiry.
15.2 Integration Tests
- End-to-end evidence generation on fixture crate.
- Release gate fails on undocumented partial conformance.
- Release gate fails on unsigned artifact.
- Release gate fails on unresolved critical CVE.
- Performance regression gate fails on threshold breach.
- Known-gaps Markdown generation from YAML.
15.3 Tamper Tests
- Modify artifact after manifest generation.
- Remove test evidence for full claim.
- Change SBOM after signing.
- Use disallowed signing identity.
- Replay old VEX with expired decision.
16. Acceptance Criteria
This RFC is implemented when:
- Conformance claims are calculated from requirement inventory, code tags, tests, and gaps.
- A requirement cannot silently remain partial without a structured known gap.
- SBOM and VEX are generated and release-gated.
- Provenance ties artifacts to source commit, builder, inputs, and digests.
- Evidence bundles include signed manifests and verifiable artifact digests.
- Performance baselines include environment details and regression thresholds.
- PR and release gates fail closed on missing or inconsistent evidence.
- Generated code must supply traceable tags and tests before it can support conformance claims.
OPC-SDK-RFC-007: SBI Service Framework
Status: Draft for Implementation
Version: 1.0.0
Date: 2026-05-19
Audience: SBI NF implementers, security engineers, operator authors, test authors
1. Abstract
This RFC defines the OpenPacketCore Service Based Interface (SBI) framework for 5G control-plane CNFs. It standardizes HTTP/2 transport behavior, 3GPP ProblemDetails, OAuth2/JWT-SVID authentication, NRF discovery, service registration, retry/backoff, overload control, circuit breaking, idempotency, callback delivery, OpenAPI/model generation, observability, and conformance tests.
Without this RFC, every SBI-producing NF would independently implement common TS 29.500/29.501 behavior. That would create incompatible error semantics, token validation, discovery caching, and overload behavior across AMF, SMF, PCF, NRF, UDM, AUSF, NSSF, NEF, NWDAF, BSF, CHF, SCP, and SEPP.
2. Scope
2.1 In Scope
- SBI HTTP/2 server and client substrate.
- TS 29.500 common headers and ProblemDetails behavior.
- TS 29.510 NRF registration, heartbeat, discovery, and access token client helpers.
- OAuth2 bearer token validation and client-credentials acquisition.
- SPIFFE JWT-SVID client authentication to NRF where configured.
- Retry, timeout, backoff, idempotency, and callback delivery.
- Per-peer, per-slice, and per-service overload controls.
- Circuit breakers and outlier detection.
- OpenAPI-driven model generation and compatibility.
- Metrics, tracing, audit, and evidence hooks.
2.2 Out of Scope
- NF-specific SBI resource semantics. Those live in per-NF crates.
- Management-plane gNMI/NETCONF. See RFC 001 and RFC 003.
- Protocol codecs below HTTP/2. See RFC 005.
- Session persistence. See RFC 004.
3. Design Goals
3.1 Security
- Authenticate every SBI peer with mTLS and, where applicable, OAuth2 access tokens.
- Bind peer identity, NF type, NF instance ID, PLMN, tenant, slice, and token scopes into authorization decisions.
- Prevent topology scraping, token replay, confused-deputy calls, callback spoofing, and cross-slice data exposure.
- Avoid logging raw SUPI/GPSI, bearer tokens, assertion JWTs, or subscriber payloads.
3.2 Performance
- Use HTTP/2 connection pooling and bounded concurrency per peer.
- Avoid per-request DNS/NRF discovery.
- Make token verification hot-path cacheable.
- Provide low-latency fast paths for common ProblemDetails and header parsing.
- Enforce backpressure before request queues grow unbounded.
3.3 Maintainability
- Keep TS 29.500 common behavior in opc-sbi, not in every NF.
- Generate typed models from version-pinned OpenAPI definitions where possible.
- Keep retry and overload policy declarative through YANG.
- Provide one shared testkit for SBI peers and NRF behavior.
3.4 Functionality
- Support SBI producer and consumer roles.
- Support NRF registration, heartbeat, discovery, subscriptions, and token acquisition.
- Support service-version negotiation.
- Support callbacks with retry and dead-letter behavior.
- Support direct NF-to-NF routing and SCP-mediated routing.
4. Standards Baseline
The initial target is 3GPP Release 17 with explicit support for selected Release 18 behavior when per-NF specs require it.
Required references:
- TS 29.500: Common API framework, HTTP behavior, headers, ProblemDetails.
- TS 29.501: Principles and guidelines for services definition.
- TS 29.510: NRF NFManagement, NFDiscovery, AccessToken.
- TS 33.501: SBI security and OAuth2 usage.
- RFC 6749: OAuth2.
- RFC 6750: Bearer token usage.
- RFC 7515/7517/7519: JWS, JWK, JWT.
- RFC 7662: Token introspection, if enabled by profile.
- RFC 9110/RFC 9113: HTTP semantics and HTTP/2.
The exact release and supported service APIs are captured in RFC 006 evidence.
5. Crate Model
The shared crate is opc-sbi.
crates/opc-sbi/
  src/
    lib.rs
    error.rs
    problem.rs
    headers.rs
    identity.rs
    oauth.rs
    nrf/
      mod.rs
      registration.rs
      discovery.rs
      heartbeat.rs
      access_token.rs
      cache.rs
    client/
      mod.rs
      pool.rs
      retry.rs
      circuit_breaker.rs
      overload.rs
    server/
      mod.rs
      auth.rs
      extractors.rs
      middleware.rs
    callback/
      mod.rs
      dispatcher.rs
      dead_letter.rs
    models/
      generated/
    observability.rs
    testkit/
NF crates MUST use opc-sbi for common SBI behavior. They MUST NOT duplicate
ProblemDetails encoding, bearer-token parsing, NRF discovery caching, or retry
policy.
6. Transport Contract
6.1 HTTP/2
SBI uses HTTP/2 by default. The framework MUST:
- Use TLS 1.3 by default.
- Verify peer certificate identity through RFC 003.
- Support direct NF endpoints and SCP endpoints.
- Enforce max header list size, max frame size, max body size, stream concurrency, and idle timeouts.
- Reject HTTP/1.1 in production profiles unless a per-NF compatibility profile explicitly permits it.
6.2 Connection Pooling
The client pool key MUST include:
- target NF instance or service set,
- transport mode: direct or SCP,
- trust domain,
- tenant,
- service name,
- API version,
- TLS profile,
- OAuth2 audience/scope set.
Pools MUST enforce:
- maximum connections per peer,
- maximum concurrent streams per connection,
- idle connection eviction,
- connection max age,
- backpressure when all streams are saturated.
6.3 Deadlines
Every outbound SBI request MUST carry a deadline from the caller. The framework MUST enforce request timeout locally and SHOULD propagate timeout hints through headers where 3GPP permits.
7. ProblemDetails
7.1 Error Type
opc-sbi owns the canonical ProblemDetails type:
pub struct ProblemDetails {
    pub status: http::StatusCode,
    pub cause: Option<CauseCode>,
    pub title: Option<String>,
    pub detail: Option<String>,
    pub instance: Option<String>,
    pub invalid_params: Vec<InvalidParam>,
    pub supported_features: Option<String>,
}
NF code returns domain errors; the framework maps them to ProblemDetails.
7.2 Mapping Rules
ProblemDetails mapping MUST be:
- deterministic,
- spec-cited,
- test-covered,
- safe for logs and clients,
- evidence-linked through RFC 006.
No domain handler may return ad hoc JSON error bodies on SBI routes.
8. Common Headers
The framework MUST parse and render configured TS 29.500 headers, including:
- 3gpp-Sbi-Message-Priority
- 3gpp-Sbi-Correlation-Info
- 3gpp-Sbi-Binding
- 3gpp-Sbi-Routing-Binding
- 3gpp-Sbi-Target-apiRoot
- Retry-After
- Location
- Authorization
Header parsing MUST reject malformed values with structured errors. Sensitive headers MUST be redacted.
9. Identity and Authorization
9.1 Peer Identity
The server middleware extracts:
pub struct SbiPeer {
    pub spiffe: Option<SpiffeId>,
    pub nf_instance_id: Option<NfInstanceId>,
    pub nf_type: Option<NfType>,
    pub tenant: TenantId,
    pub plmn: Option<PlmnId>,
    pub snssai: Option<Snssai>,
}
Identity MAY come from mTLS SPIFFE, NRF-issued token claims, or a legacy certificate mapping profile. Unsigned metadata headers MUST NOT establish identity.
9.2 OAuth2 Validation
SBI producers that require OAuth2 MUST validate:
- issuer,
- audience,
- expiry and not-before,
- signature and key ID,
- scope,
- NF type and instance binding,
- tenant and slice binding where configured,
- replay-sensitive claims when configured.
Token validation results MAY be cached until the earlier of token expiry or policy version change.
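The caching rule can be sketched as a lookup that treats an entry as valid only while the token is unexpired and the policy version is unchanged. The types are hypothetical; the sketch assumes the cache is keyed by a token digest (so raw bearer tokens are never stored) and that the caller supplies the current time as Unix seconds.

```rust
use std::collections::HashMap;

/// Cached outcome of validating one token (hypothetical shape).
pub struct CachedValidation {
    pub token_expiry_unix: u64,
    pub policy_version: u64,
    pub allowed: bool,
}

/// Validation cache keyed by token digest, never by the raw token.
pub struct ValidationCache {
    entries: HashMap<String, CachedValidation>,
}

impl ValidationCache {
    pub fn new() -> Self {
        ValidationCache { entries: HashMap::new() }
    }

    pub fn insert(&mut self, token_digest: String, entry: CachedValidation) {
        self.entries.insert(token_digest, entry);
    }

    /// Returns the cached decision, or `None` when the entry is missing,
    /// the token has expired, or the policy version has changed.
    /// Stale entries are evicted on lookup.
    pub fn lookup(&mut self, token_digest: &str, now_unix: u64, policy_version: u64) -> Option<bool> {
        let fresh = match self.entries.get(token_digest) {
            Some(e) => now_unix < e.token_expiry_unix && e.policy_version == policy_version,
            None => return None,
        };
        if fresh {
            self.entries.get(token_digest).map(|e| e.allowed)
        } else {
            self.entries.remove(token_digest);
            None
        }
    }
}
```

A `None` result means the caller must run full validation again; the cache never extends a decision past either bound.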
9.3 OAuth2 Client Credentials
SBI consumers MUST acquire tokens through NRF or configured authorization server. Client authentication methods:
- SPIFFE JWT-SVID, preferred.
- mTLS-bound client authentication.
- Private key JWT.
- Kubernetes Secret client secret only in explicit compatibility profile.
Long-lived shared client secrets are forbidden in production carrier profiles unless an RFC 006 waiver exists.
10. NRF Integration
10.1 Registration
opc-sbi MUST provide helpers for NF registration, update, deregistration, and
heartbeat.
NF profiles MUST be generated from typed NF metadata and canonical YANG. Raw free-form JSON construction is forbidden outside test fixtures.
10.2 Heartbeats
The heartbeat driver MUST:
- derive interval from NRF response where present,
- jitter heartbeat timing,
- mark the NF degraded on repeated heartbeat failure,
- keep serving existing local traffic according to per-NF policy,
- deregister gracefully on shutdown when possible.
10.3 Discovery
The discovery client MUST provide:
- query construction with typed filters,
- response validation,
- cache with TTL and stale-if-error policy,
- negative caching,
- per-service-set load balancing,
- SCP preference where configured,
- tenant and slice filter enforcement.
Discovery cache entries MUST be invalidated on canonical config changes that affect peers, PLMN, slice, trust anchors, or routing mode.
10.4 Subscriptions
NRF subscription handling MUST support retry, backoff, and dead-letter behavior for failed notifications. Subscription callbacks MUST be authenticated and authorized like any other SBI request.
11. Routing Modes
Supported modes:
| Mode | Behavior |
|---|---|
| direct | Consumer dials producer discovered from NRF or static peer config |
| scp | Consumer sends through SCP with routing headers |
| sepp | Inter-PLMN traffic goes through SEPP policy |
| static | Explicit peer list from YANG, for lab or interop |
The mode is selected per service, tenant, PLMN, and slice. Inter-PLMN traffic MUST NOT bypass SEPP when policy requires SEPP.
12. Retry, Idempotency, and Callback Delivery
12.1 Retry Policy
Retry policy MUST be declarative:
pub struct RetryPolicy {
    pub max_attempts: u8,
    pub base_delay: Duration,
    pub max_delay: Duration,
    pub jitter: Jitter,
    pub retry_on_status: Vec<StatusCode>,
    pub retry_on_transport_error: bool,
}
The framework MUST NOT retry non-idempotent requests unless the request carries an idempotency key or the operation is explicitly marked idempotent by the service definition.
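A minimal sketch of how such a policy could drive retry decisions follows. It uses a simplified local `RetryPolicy` (plain `u16` status codes, no `Jitter` field) so that it stays self-contained; `should_retry`, `backoff_delay`, and the `replay_safe` flag are hypothetical names. Jitter is omitted because the standard library has no random number generator.

```rust
use std::time::Duration;

/// Simplified stand-in for the policy struct in this section.
pub struct RetryPolicy {
    pub max_attempts: u8,
    pub base_delay: Duration,
    pub max_delay: Duration,
    pub retry_on_status: Vec<u16>,
    pub retry_on_transport_error: bool,
}

/// Result of one attempt as seen by the client.
pub enum Outcome {
    Status(u16),
    TransportError,
}

/// `replay_safe` is true when the operation is marked idempotent or the
/// request carries an idempotency key; otherwise retries are refused.
pub fn should_retry(p: &RetryPolicy, attempts_made: u8, outcome: &Outcome, replay_safe: bool) -> bool {
    if !replay_safe || attempts_made >= p.max_attempts {
        return false;
    }
    match outcome {
        Outcome::Status(code) => p.retry_on_status.contains(code),
        Outcome::TransportError => p.retry_on_transport_error,
    }
}

/// Exponential backoff: base_delay * 2^(attempts_made - 1), capped at max_delay.
pub fn backoff_delay(p: &RetryPolicy, attempts_made: u8) -> Duration {
    let exponent = u32::from(attempts_made.saturating_sub(1)).min(20);
    let factor = 1u32 << exponent;
    p.base_delay
        .checked_mul(factor)
        .unwrap_or(p.max_delay)
        .min(p.max_delay)
}
```

Checking `replay_safe` before anything else makes the non-idempotent case impossible to retry by misconfiguring status lists.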
12.2 Idempotency
For operations that can be retried, the framework SHOULD provide:
- idempotency key generation,
- inbound idempotency cache,
- replay-safe response caching,
- expiry and memory bounds.
12.3 Callback Delivery
Callback dispatchers MUST support:
- bounded queues,
- retry budget,
- backoff,
- callback authentication,
- dead-letter sink,
- observability,
- cancellation on subscription deletion.
Callback storms MUST be rate-limited per callback target.
13. Overload Control
13.1 Admission
The framework MUST provide admission control before request bodies are fully read when possible.
Admission keys:
- peer identity,
- NF type,
- tenant,
- slice,
- service,
- operation,
- priority.
13.2 Response Semantics
Overload responses MUST use:
- HTTP 429 for rate limiting,
- HTTP 503 for temporary service overload,
- Retry-After where retry is appropriate,
- ProblemDetails with a stable cause code.
13.3 Priority
Requests with emergency, lawful, registration, paging, or charging criticality MAY receive higher priority only when the per-NF spec and 3GPP behavior justify it. Priority policy MUST be explicit, audited, and tested.
13.4 Circuit Breakers
Outbound circuit breakers MUST track:
- consecutive failures,
- error-rate window,
- latency outliers,
- half-open probes,
- per-peer and per-service state.
Circuit breaker state MUST be visible in metrics and debug endpoints without exposing secrets or topology beyond authorized users.
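The core closed, open, and half-open transitions can be sketched as below. This is a deliberately reduced illustration: it tracks only consecutive failures and a single half-open probe, leaves out the error-rate window and latency outliers, and takes a caller-supplied monotonic clock in milliseconds. All names are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CircuitState {
    Closed,
    Open,
    HalfOpen,
}

/// Per-peer, per-service breaker (hypothetical shape).
pub struct CircuitBreaker {
    state: CircuitState,
    consecutive_failures: u32,
    failure_threshold: u32,
    opened_at_ms: u64,
    open_for_ms: u64,
}

impl CircuitBreaker {
    pub fn new(failure_threshold: u32, open_for_ms: u64) -> Self {
        CircuitBreaker {
            state: CircuitState::Closed,
            consecutive_failures: 0,
            failure_threshold,
            opened_at_ms: 0,
            open_for_ms,
        }
    }

    pub fn state(&self) -> CircuitState {
        self.state
    }

    /// Returns true when a request may be sent at `now_ms`.
    pub fn allow(&mut self, now_ms: u64) -> bool {
        match self.state {
            CircuitState::Closed => true,
            // A probe is already in flight; hold further traffic.
            CircuitState::HalfOpen => false,
            CircuitState::Open => {
                if now_ms.saturating_sub(self.opened_at_ms) >= self.open_for_ms {
                    self.state = CircuitState::HalfOpen;
                    true // this request is the single probe
                } else {
                    false
                }
            }
        }
    }

    pub fn record_success(&mut self) {
        self.consecutive_failures = 0;
        self.state = CircuitState::Closed;
    }

    pub fn record_failure(&mut self, now_ms: u64) {
        self.consecutive_failures = self.consecutive_failures.saturating_add(1);
        let probe_failed = self.state == CircuitState::HalfOpen;
        if probe_failed || self.consecutive_failures >= self.failure_threshold {
            self.state = CircuitState::Open;
            self.opened_at_ms = now_ms;
        }
    }
}
```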
14. Generated Models and OpenAPI
opc-sbi SHOULD generate models from version-pinned OpenAPI sources where
available. Generated code MUST:
- be reproducible,
- preserve unknown extension fields only when configured,
- avoid ad hoc stringly typed JSON in NF handlers,
- include spec tags for RFC 006,
- pass serialization round trips.
OpenAPI mismatches with normative 3GPP text MUST create RFC 006 known gaps or generator overrides with citations.
15. Configuration Model
Each SBI NF YANG SHOULD expose:
- sbi/listeners
- sbi/clients
- sbi/nrf
- sbi/oauth2
- sbi/retry-policy
- sbi/overload
- sbi/circuit-breakers
- sbi/callbacks
These may be embedded under the shared listeners, peers, rate-limits,
and policy containers defined by the cloud-native pattern.
16. Observability
Required metrics:
- opc_sbi_requests_total{nf,service,operation,outcome}
- opc_sbi_request_duration_seconds{service,operation}
- opc_sbi_problem_details_total{service,cause,status}
- opc_sbi_oauth_validation_total{outcome,reason}
- opc_sbi_nrf_discovery_total{outcome}
- opc_sbi_nrf_cache_entries{service}
- opc_sbi_nrf_heartbeat_total{outcome}
- opc_sbi_circuit_state{peer,service,state}
- opc_sbi_overload_rejections_total{service,reason}
- opc_sbi_callback_delivery_total{target,outcome}
Tracing MUST propagate W3C traceparent and 3GPP correlation headers when
present.
17. Module Ownership
| Module | Responsibility |
|---|---|
| opc-sbi-problem | ProblemDetails model and mappings |
| opc-sbi-headers | 3GPP header parse/render/redaction |
| opc-sbi-auth | OAuth2/JWT-SVID validation and token acquisition |
| opc-sbi-nrf | NRF registration, heartbeat, discovery, cache |
| opc-sbi-client | HTTP/2 pool, deadlines, retries, circuit breakers |
| opc-sbi-server | Axum/tower middleware, extractors, admission |
| opc-sbi-callback | Callback queues, retry, dead-letter |
| opc-sbi-codegen | OpenAPI/model generation |
| opc-sbi-testkit | Mock NRF, mock producer, token fixtures |
Agents must not implement NF-specific business logic in opc-sbi.
18. Testing Requirements
18.1 Unit Tests
- ProblemDetails mappings.
- Header parsing and redaction.
- Token validation matrix.
- Retry idempotency policy.
- Circuit breaker transitions.
- NRF cache expiry and invalidation.
18.2 Integration Tests
- Mock NRF registration, heartbeat, discovery, and token issuance.
- Producer validates mTLS and OAuth2 together.
- Consumer refreshes token before expiry.
- SCP routing header generation.
- Callback retry and dead-letter.
- Overload rejection with Retry-After.
18.3 Fault Injection
- NRF unavailable.
- Expired token.
- Bad JWK key ID.
- Peer certificate rotation.
- DNS failure.
- HTTP/2 stream reset.
- Slow callback target.
- Discovery cache stale while NRF down.
18.4 Performance Gates
- Hot token validation cache p99 under 25 microseconds.
- ProblemDetails mapping allocation-free for common static errors.
- Discovery cache lookup p99 under 10 microseconds.
- Client pool does not allocate per request beyond body/model needs.
- Overload admission rejects before full body read for oversized bodies.
19. Acceptance Criteria
This RFC is implemented when:
- All SBI NFs use shared ProblemDetails, header, auth, retry, and NRF code.
- OAuth2 validation and client-credential acquisition are test-covered.
- NRF registration, heartbeat, discovery, and cache behavior are shared.
- Retry behavior is idempotency-aware.
- Overload control returns consistent 429/503/Retry-After semantics.
- Circuit breaker state is observable and bounded.
- Generated models are reproducible and evidence-tagged.
- A shared SBI testkit can exercise producer and consumer behavior for every SBI NF.
OPC-SDK-RFC-008: CNF Runtime Chassis and Resource Governance
Status: Draft for Implementation
Version: 1.0.0
Date: 2026-05-19
Audience: NF implementers, platform engineers, SREs, security reviewers
1. Abstract
This RFC defines the common Rust runtime chassis used by every OpenPacketCore CNF. It standardizes process startup, task supervision, shutdown, health probes, admin endpoints, runtime pools, resource budgets, panic policy, configuration bootstrap, signal handling, telemetry initialization, memory behavior, and operational debug surfaces.
The goal is that AMF, SMF, UPF, NRF, PCF, SEPP, SMSC, and all other CNFs share one predictable runtime skeleton instead of each inventing its own Tokio setup, shutdown behavior, health semantics, and task lifecycle.
2. Scope
2.1 In Scope
- Runtime initialization.
- Tokio worker and blocking pool configuration.
- Task supervision and cancellation.
- Startup and readiness phases.
- Graceful shutdown and drain.
- Health and admin HTTP endpoints.
- Runtime resource budgets and backpressure hooks.
- Panic and fatal-error policy.
- Metrics, logging, and tracing bootstrap.
- Memory, allocator, and OOM behavior.
- Common CLI/env/bootstrap contract.
2.2 Out of Scope
- NF-specific protocol logic.
- Kubernetes controller behavior. See RFC 009.
- Node/NIC scheduling and SR-IOV contracts. See RFC 011.
- Config commit semantics. See RFC 001.
3. Design Goals
3.1 Security
- Fail closed when required bootstrap security material is unavailable.
- Keep debug endpoints disabled or authorization-gated in production.
- Ensure panic output and fatal-error reports are redacted.
- Make shutdown safe: no partial config writes, key leaks, or unaudited emergency exits.
3.2 Performance
- Avoid runtime-pool contention between management, control, crypto, and data-plane work.
- Bound queues, tasks, memory, and blocking work.
- Make health checks cheap and non-blocking.
- Provide predictable drain behavior under load.
3.3 Maintainability
- Provide one reusable opc-runtime crate.
- Make lifecycle phases explicit and testable.
- Provide standard task naming and metrics.
- Keep per-NF custom code in callbacks, not in process scaffolding.
3.4 Functionality
- Support control-plane, data-plane, and library-like CNF profiles.
- Support local developer mode and production mode.
- Support graceful restart, termination, and Kubernetes probe integration.
- Support runtime introspection without exposing secrets.
4. Runtime Crate
The shared crate is opc-runtime.
crates/opc-runtime/
  src/
    lib.rs
    bootstrap.rs
    profile.rs
    supervisor.rs
    task.rs
    shutdown.rs
    health.rs
    admin.rs
    resources.rs
    panic.rs
    telemetry.rs
    memory.rs
    signals.rs
    testkit.rs
Every NF binary SHOULD be a thin wrapper around opc_runtime::run.
5. Runtime Profile
pub struct RuntimeProfile {
    pub mode: RuntimeMode,
    pub nf_kind: NetworkFunctionKind,
    pub instance_id: InstanceId,
    pub async_workers: WorkerCount,
    pub blocking_threads: ThreadLimit,
    pub crypto_threads: ThreadLimit,
    pub management_threads: ThreadLimit,
    pub max_tasks: usize,
    pub max_queued_bytes: usize,
    pub shutdown_grace: Duration,
    pub drain_timeout: Duration,
}
Profiles:
- dev: permissive, local files, debug endpoints enabled on loopback.
- lab: production-like, explicit waivers allowed.
- production: fail closed, debug gated, strict resource limits.
- conformance: deterministic test profile.
- perf: optimized benchmark profile with fixed CPU/resource assumptions.
6. Startup State Machine
Every CNF starts through:
| Phase | Purpose |
|---|---|
| ProcessInit | parse CLI/env, install panic hook, initialize logging |
| TelemetryInit | metrics/tracing/logging exporters |
| SecurityInit | identity, trust bundles, key providers |
| ConfigBootstrap | load initial config through RFC 001 |
| ResourcePreflight | verify CPU, memory, filesystem, devices |
| ServiceBind | bind listeners but do not report ready |
| PeerWarmup | optional NRF registration, discovery, backend connection |
| Ready | readiness probe returns success |
| Draining | termination accepted, new work limited |
| Stopped | all supervised tasks exited |
Startup MUST fail closed in production if any required phase fails.
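The startup sequence can be sketched as an ordered walk over the pre-ready phases that stops at the first failure. The `run_startup` driver and its closure-based shape are hypothetical; the sketch treats every phase as required and leaves the optional nature of PeerWarmup, as well as Draining and Stopped, to the caller.

```rust
/// Startup phases from the table, up to and including Ready.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Phase {
    ProcessInit,
    TelemetryInit,
    SecurityInit,
    ConfigBootstrap,
    ResourcePreflight,
    ServiceBind,
    PeerWarmup,
    Ready,
}

/// Phases that must complete, in order, before readiness is reported.
pub const PRE_READY: [Phase; 7] = [
    Phase::ProcessInit,
    Phase::TelemetryInit,
    Phase::SecurityInit,
    Phase::ConfigBootstrap,
    Phase::ResourcePreflight,
    Phase::ServiceBind,
    Phase::PeerWarmup,
];

/// Runs each phase through `run_phase`; the first error aborts startup
/// and is returned together with the phase that failed.
pub fn run_startup<F>(mut run_phase: F) -> Result<Phase, (Phase, String)>
where
    F: FnMut(Phase) -> Result<(), String>,
{
    for phase in PRE_READY.iter().copied() {
        run_phase(phase).map_err(|e| (phase, e))?;
    }
    Ok(Phase::Ready)
}
```

Because `Ready` is only ever returned after the loop completes, no code path can report readiness with a skipped or failed phase.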
7. Task Supervision
7.1 Task Model
All long-lived tasks MUST be registered with the supervisor:
pub struct TaskSpec {
    pub name: TaskName,
    pub kind: TaskKind,
    pub criticality: Criticality,
    pub restart: RestartPolicy,
    pub shutdown: ShutdownPolicy,
}
Task kinds:
- listener
- protocol-worker
- session-worker
- management-worker
- background-sync
- metrics-exporter
- watcher
- timer
7.2 Criticality
| Criticality | Behavior on Failure |
|---|---|
| fatal | Transition CNF to fatal shutdown |
| degrade | Mark degraded and optionally restart |
| best-effort | Log/metric and continue |
Critical task failures MUST be visible through readiness and alarm state.
7.3 Restart Policy
Restart policy MUST include:
- max restarts per window,
- backoff,
- jitter,
- failure classification,
- whether restart is allowed after config changes.
Unbounded task restart loops are forbidden.
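The "max restarts per window" rule can be sketched as a sliding-window budget. `RestartBudget` is a hypothetical type; backoff, jitter, and failure classification are left out, and the caller supplies a monotonic clock in milliseconds.

```rust
use std::collections::VecDeque;

/// Sliding-window restart budget for one supervised task.
pub struct RestartBudget {
    max_restarts: usize,
    window_ms: u64,
    recent: VecDeque<u64>,
}

impl RestartBudget {
    pub fn new(max_restarts: usize, window_ms: u64) -> Self {
        RestartBudget { max_restarts, window_ms, recent: VecDeque::new() }
    }

    /// Records and allows a restart at `now_ms`, or returns false when
    /// the budget for the current window is exhausted.
    pub fn try_restart(&mut self, now_ms: u64) -> bool {
        // Forget restarts that have aged out of the window.
        while let Some(&oldest) = self.recent.front() {
            if now_ms.saturating_sub(oldest) >= self.window_ms {
                self.recent.pop_front();
            } else {
                break;
            }
        }
        if self.recent.len() >= self.max_restarts {
            return false;
        }
        self.recent.push_back(now_ms);
        true
    }
}
```

When `try_restart` returns false, the supervisor would apply the task's criticality instead of restarting again, which is what rules out an unbounded restart loop.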
8. Runtime Pool Isolation
The runtime MUST expose separate execution domains:
- async I/O workers,
- blocking/CPU pool,
- crypto pool,
- management pool,
- data-plane workers where applicable.
Data-plane CNFs SHOULD integrate with RFC 011 CPU pinning and IRQ affinity. Management-plane work MUST NOT execute on data-plane pinned workers.
9. Resource Governance
9.1 Budgets
Each CNF declares:
pub struct ResourceBudget {
    pub max_heap_bytes: Option<usize>,
    pub max_tasks: usize,
    pub max_channels: usize,
    pub max_queue_bytes: usize,
    pub max_request_body_bytes: usize,
    pub max_open_files: usize,
    pub max_backend_connections: usize,
}
Budgets MUST be profile-configurable and observable.
9.2 Backpressure
The runtime provides shared primitives:
- bounded mpsc channels,
- byte-accounted queues,
- weighted semaphores,
- admission guards,
- deadline propagation,
- cancellation tokens.
Unbounded channels are forbidden in production runtime code unless an RFC 006 waiver exists.
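As an illustration of a byte-accounted queue, the sketch below bounds both item count and total payload bytes and rejects work at admission instead of growing. `ByteQueue` and `AdmitError` are hypothetical names; a real runtime primitive would also need to be thread-safe and async-aware, which this single-threaded sketch is not.

```rust
use std::collections::VecDeque;

/// Hypothetical queue bounded by item count and by total payload bytes.
pub struct ByteQueue {
    items: VecDeque<Vec<u8>>,
    max_items: usize,
    max_bytes: usize,
    used_bytes: usize,
}

#[derive(Debug, PartialEq, Eq)]
pub enum AdmitError {
    QueueFull,
    ByteBudgetExceeded,
}

impl ByteQueue {
    pub fn new(max_items: usize, max_bytes: usize) -> Self {
        Self { items: VecDeque::new(), max_items, max_bytes, used_bytes: 0 }
    }

    /// Admit a message only if both budgets still have room.
    pub fn try_push(&mut self, msg: Vec<u8>) -> Result<(), AdmitError> {
        if self.items.len() >= self.max_items {
            return Err(AdmitError::QueueFull);
        }
        // Invariant: used_bytes <= max_bytes, so this subtraction cannot underflow.
        if msg.len() > self.max_bytes - self.used_bytes {
            return Err(AdmitError::ByteBudgetExceeded);
        }
        self.used_bytes += msg.len();
        self.items.push_back(msg);
        Ok(())
    }

    /// Remove the oldest message and release its bytes back to the budget.
    pub fn pop(&mut self) -> Option<Vec<u8>> {
        let msg = self.items.pop_front()?;
        self.used_bytes -= msg.len();
        Some(msg)
    }

    pub fn used_bytes(&self) -> usize {
        self.used_bytes
    }
}
```

The two counters correspond to the kind of values that gauges such as `opc_runtime_queue_depth` and `opc_runtime_queue_bytes` would report.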
9.3 Memory Behavior
The runtime SHOULD:
- expose allocator metrics where available,
- support an optional hardened allocator profile,
- fail fast on configured memory-budget breach,
- avoid memory-heavy debug dumps in production,
- support heap profile endpoints only under explicit authorization.
10. Shutdown and Drain
10.1 Signals
The runtime MUST handle:
- `SIGTERM`: graceful drain.
- `SIGINT`: graceful drain in dev, configurable in production.
- fatal internal errors: controlled shutdown path when possible.
10.2 Drain Sequence
Drain order:
- Stop accepting new external work.
- Mark readiness false.
- Notify NRF/deregister where applicable.
- Stop management writes except emergency recovery.
- Drain protocol workers up to timeout.
- Flush audit and evidence breadcrumbs.
- Checkpoint local state where applicable.
- Shut down listeners and background tasks.
Each NF can add steps but MUST preserve safety ordering.
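One way to check that an NF-specific drain plan preserves the safety ordering is to give the shared steps a total order and verify that the plan never visits them out of order. The sketch below assumes hypothetical names (`DrainStep`, `PlanStep`, `preserves_safety_order`) and treats steps marked "where applicable" as optional.

```rust
/// Shared drain steps; the derived `Ord` follows declaration order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum DrainStep {
    StopAcceptingWork,
    MarkNotReady,
    DeregisterFromNrf,
    StopManagementWrites,
    DrainProtocolWorkers,
    FlushAudit,
    CheckpointState,
    ShutdownListeners,
}

/// A plan entry is either a shared step or an NF-specific addition.
#[derive(Debug, Clone, Copy)]
pub enum PlanStep {
    Base(DrainStep),
    Custom(&'static str),
}

/// Returns true if the shared steps in `plan` appear in strictly increasing order.
/// Custom steps may be interleaved anywhere; shared steps may be omitted.
pub fn preserves_safety_order(plan: &[PlanStep]) -> bool {
    let mut last: Option<DrainStep> = None;
    for step in plan {
        if let PlanStep::Base(s) = step {
            if let Some(prev) = last {
                if *s <= prev {
                    return false;
                }
            }
            last = Some(*s);
        }
    }
    true
}
```

A check like this could run in unit tests for each NF's drain plan, so that an added step cannot silently move listener shutdown ahead of the readiness flip.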
10.3 Kubernetes Integration
`terminationGracePeriodSeconds` MUST be at least `shutdown_grace` plus a
probe-latency margin. PreStop hooks MAY call admin drain but MUST NOT be the
only drain mechanism.
11. Health and Admin Surface
11.1 Endpoints
Default admin listener:
- `/livez`
- `/readyz`
- `/startupz`
- `/metrics`
- `/debug/runtime` (gated)
- `/debug/tasks` (gated)
- `/debug/config-version` (gated)
Production debug endpoints MUST require authorization or be disabled.
11.2 Health Semantics
/livez means the process event loop is alive. It MUST NOT depend on external
peers.
/readyz means the CNF can serve its intended role. It SHOULD include:
- config applied,
- critical tasks healthy,
- required listeners bound,
- required security material valid,
- required backends reachable according to NF policy.
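A minimal sketch of how the `/readyz` inputs listed above might be aggregated. The struct, field names, and reason strings are illustrative assumptions; the only behavior taken from the text is that every listed input must hold and that backend reachability applies only when NF policy requires it.

```rust
/// Hypothetical snapshot of the inputs that feed `/readyz`.
#[derive(Debug, Clone, Copy)]
pub struct ReadinessInputs {
    pub config_applied: bool,
    pub critical_tasks_healthy: bool,
    pub listeners_bound: bool,
    pub security_material_valid: bool,
    /// `None` means NF policy does not require backend reachability.
    pub backends_reachable: Option<bool>,
}

/// Collect every reason the CNF is not ready, so the probe body can explain itself.
pub fn not_ready_reasons(i: &ReadinessInputs) -> Vec<&'static str> {
    let mut reasons = Vec::new();
    if !i.config_applied {
        reasons.push("config-not-applied");
    }
    if !i.critical_tasks_healthy {
        reasons.push("critical-task-unhealthy");
    }
    if !i.listeners_bound {
        reasons.push("listener-not-bound");
    }
    if !i.security_material_valid {
        reasons.push("security-material-invalid");
    }
    if i.backends_reachable == Some(false) {
        reasons.push("backend-unreachable");
    }
    reasons
}

pub fn is_ready(i: &ReadinessInputs) -> bool {
    not_ready_reasons(i).is_empty()
}
```

`/livez` deliberately has no counterpart here: it must not consult peers or backends, so it would not read any of these inputs.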
12. Panic and Fatal Error Policy
12.1 Panics
Production builds MUST install a panic hook that:
- redacts secrets,
- records task name,
- increments fatal metrics,
- emits a structured fatal log,
- triggers supervisor policy.
Panics in parser or protocol handlers are bugs and MUST be covered by RFC 005 fuzzing regression tests.
12.2 unwrap and expect
Runtime and NF code MUST avoid `unwrap` and `expect` outside tests, build
scripts, and explicitly justified invariants. Justifications MUST be grep-able
and evidence-linked.
13. Bootstrap Contract
CLI/env values are limited to bootstrap concerns:
- config bootstrap source,
- management bind address,
- admin bind address,
- production/dev mode,
- identity socket path,
- tracing exporter endpoint,
- initial log level,
- feature gates for explicit waivers.
Dense protocol behavior MUST come from canonical config, not env vars.
14. Telemetry Initialization
The runtime initializes:
- structured JSON logging,
- OpenTelemetry tracing,
- Prometheus metrics,
- build info,
- runtime profile info,
- panic/fatal counters.
Required metrics:
- `opc_runtime_build_info{nf,version,git_sha}`
- `opc_runtime_tasks{nf,kind,state}`
- `opc_runtime_task_restarts_total{nf,task}`
- `opc_runtime_queue_depth{nf,queue}`
- `opc_runtime_queue_bytes{nf,queue}`
- `opc_runtime_shutdown_total{nf,reason}`
- `opc_runtime_panic_total{nf,task}`
- `opc_runtime_memory_bytes{nf,kind}`
- `opc_runtime_ready{nf}`
15. Time and Clocks
The runtime MUST provide a clock abstraction for tests:
    pub trait Clock: Send + Sync {
        fn now(&self) -> Timestamp;
        fn monotonic(&self) -> Instant;
    }
Security expiry and audit timestamps use wall clock plus monotonic sequencing where required. Timers use monotonic time.
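A test double for this abstraction might look like the sketch below. `Timestamp` is not defined in this section, so it is stood in for by a hypothetical newtype over milliseconds since the Unix epoch; `FakeClock`, `advance`, and `set_wall` are likewise illustrative names. The point of the sketch is that wall-clock jumps do not move monotonic time.

```rust
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the SDK's `Timestamp`: milliseconds since the Unix epoch.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Timestamp(pub u64);

pub trait Clock: Send + Sync {
    fn now(&self) -> Timestamp;
    fn monotonic(&self) -> Instant;
}

/// Test clock whose wall and monotonic readings only move when the test says so.
pub struct FakeClock {
    base: Instant,
    /// (wall-clock milliseconds, monotonic offset from `base`)
    state: Mutex<(u64, Duration)>,
}

impl FakeClock {
    pub fn new(wall_ms: u64) -> Self {
        Self { base: Instant::now(), state: Mutex::new((wall_ms, Duration::ZERO)) }
    }

    /// Advance both clocks, as real elapsed time would.
    pub fn advance(&self, d: Duration) {
        // `unwrap` is acceptable here because this is test-only code.
        let mut s = self.state.lock().unwrap();
        s.0 += d.as_millis() as u64;
        s.1 += d;
    }

    /// Simulate a wall-clock step (for example an NTP correction); monotonic time is unaffected.
    pub fn set_wall(&self, wall_ms: u64) {
        self.state.lock().unwrap().0 = wall_ms;
    }
}

impl Clock for FakeClock {
    fn now(&self) -> Timestamp {
        Timestamp(self.state.lock().unwrap().0)
    }

    fn monotonic(&self) -> Instant {
        self.base + self.state.lock().unwrap().1
    }
}
```

Timer tests can then call `advance` to trigger expiry deterministically, while security-expiry tests can call `set_wall` to check behavior under a backwards wall-clock step.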
16. Module Ownership
| Module | Responsibility |
|---|---|
| `opc-runtime-bootstrap` | CLI/env/profile loading |
| `opc-runtime-supervisor` | task registry, restart, failure policy |
| `opc-runtime-shutdown` | signal handling and drain orchestration |
| `opc-runtime-health` | health model and probe endpoints |
| `opc-runtime-admin` | gated debug/admin routes |
| `opc-runtime-resources` | budgets, queues, semaphores |
| `opc-runtime-telemetry` | logging, metrics, tracing init |
| `opc-runtime-testkit` | fake clock, fake tasks, shutdown tests |
Agents implementing NF business logic should consume opc-runtime; they should
not fork startup/shutdown code.
17. Testing Requirements
17.1 Unit Tests
- Startup state transitions.
- Task restart/backoff.
- Fatal vs degraded task failure.
- Bounded queue byte accounting.
- Panic hook redaction.
- Health state aggregation.
- Clock abstraction.
17.2 Integration Tests
- SIGTERM drains in order.
- Readiness flips false before listeners stop.
- NRF deregistration hook is called during drain.
- Background task failure degrades readiness.
- Debug endpoints are disabled or authorized in production.
17.3 Fault Injection
- Hung task on shutdown.
- Task restart loop.
- Telemetry exporter unavailable.
- Missing identity socket.
- Memory budget breach.
- Queue saturation.
- Panic in a worker task.
17.4 Performance Gates
- `/livez` p99 under 1 millisecond in a healthy process.
- Supervisor task spawn overhead negligible relative to direct spawn in NF startup tests.
- Runtime metrics collection does not allocate on every scrape for static metric sets.
- Queue admission overhead p99 under 10 microseconds.
18. Acceptance Criteria
This RFC is implemented when:
- Every NF binary uses `opc-runtime` for startup, supervision, health, and shutdown.
- Long-lived tasks are supervised and named.
- Readiness semantics are consistent across CNFs.
- Shutdown drains safely and predictably.
- Production debug endpoints are gated or disabled.
- Runtime pools and queues are bounded.
- Panic and fatal-error handling is redacted and observable.
- Runtime behavior is covered by shared testkit and fault injection tests.
OPC-SDK-RFC-009: Operator Lifecycle, Upgrade, Migration, and Rollback
Status: Draft for Implementation
Version: 1.0.0
Date: 2026-05-19
Audience: operator authors, NF owners, release engineers, SREs
1. Abstract
This RFC defines the lifecycle contract between the OpenPacketCore Kubernetes operator, lifecycle CRDs, canonical YANG configuration, NF pods, persistent state, and release artifacts. It specifies reconciliation phases, version skew, CRD conversion, YANG schema migration, state migration, rollout strategies, rollback, drain, status conditions, and release gates.
This RFC turns the thin-CRD/fat-YANG pattern into an upgrade-safe product contract across all CNFs.
2. Scope
2.1 In Scope
- Lifecycle CRD reconciliation.
- Operator/NF version compatibility.
- CRD versioning and conversion webhooks.
- Canonical config revision and schema migration.
- NF image rollout strategies.
- Session-aware drain and handover coordination.
- Rollback and downgrade policy.
- Status, events, and GitOps health gates.
- Multi-cluster rollout topology.
2.2 Out of Scope
- Runtime process shutdown internals. See RFC 008.
- Session storage primitives. See RFC 004.
- Evidence bundle generation. See RFC 006.
- Node resource scheduling. See RFC 011.
3. Design Goals
3.1 Security
- Prevent unsigned, unverified, or policy-disallowed images from rolling out.
- Prevent config downgrades that bypass validation or reintroduce forbidden settings.
- Ensure rollback preserves audit and does not silently lose regulated data.
- Keep break-glass upgrade paths narrow, explicit, and auditable.
3.2 Performance
- Rollouts must avoid unnecessary full-cluster disruption.
- Stateful NFs must drain or transfer ownership before termination.
- Operator reconciliation must avoid hot loops and unbounded API traffic.
- Large config migrations must be staged and observable.
3.3 Maintainability
- Every lifecycle phase has stable names, conditions, and event reasons.
- Compatibility matrices are machine-readable.
- Migration functions are versioned, deterministic, and tested.
- Per-NF deviations are explicit.
3.4 Functionality
- Support install, update, scale, config change, restart, drain, rollback, restore, and delete.
- Support CRD conversion webhooks.
- Support canary and partitioned rollouts.
- Support GitOps promotion gates.
4. Version Model
4.1 Versions
The operator tracks:
- operator version,
- CRD API version,
- lifecycle contract version,
- NF image version and digest,
- NF binary SDK version,
- YANG schema digest,
- canonical config revision,
- session state schema version,
- evidence bundle digest.
4.2 Compatibility Matrix
Every release MUST publish:
    operator: 0.4.0
    supports:
      crd_versions: ["v1alpha1", "v1alpha2"]
      lifecycle_contracts: ["v1alpha1"]
      nf_images:
        opc-amf: ">=0.3.0 <0.5.0"
        opc-smf: ">=0.3.0 <0.5.0"
      yang_schema_digests:
        opc-amf:
          - "sha256:..."
The operator MUST reject unsupported combinations unless an explicit waiver is present and policy allows it.
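The `nf_images` ranges in the matrix can be evaluated with a small comparator parser. The sketch below is an assumption about how such an evaluator might look, not the `operator-compat` implementation: it handles only `major.minor.patch` versions and space-separated comparators, ignores pre-release tags, and returns `None` for anything it cannot parse so that the caller can fail closed.

```rust
/// Plain `major.minor.patch` version; derived `Ord` compares fields left to right.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Version(pub u64, pub u64, pub u64);

pub fn parse_version(s: &str) -> Option<Version> {
    let mut parts = s.trim().split('.');
    let major = parts.next()?.parse().ok()?;
    let minor = parts.next()?.parse().ok()?;
    let patch = parts.next()?.parse().ok()?;
    if parts.next().is_some() {
        return None; // more than three components
    }
    Some(Version(major, minor, patch))
}

/// Evaluate a conjunction of comparators such as ">=0.3.0 <0.5.0".
/// Returns `None` when the range is empty or unparsable.
pub fn satisfies(range: &str, version: Version) -> Option<bool> {
    let mut seen_any = false;
    let mut ok = true;
    for cmp in range.split_whitespace() {
        // Two-character operators must be tried before their one-character prefixes.
        let (op, rest) = if let Some(r) = cmp.strip_prefix(">=") {
            (">=", r)
        } else if let Some(r) = cmp.strip_prefix("<=") {
            ("<=", r)
        } else if let Some(r) = cmp.strip_prefix('>') {
            (">", r)
        } else if let Some(r) = cmp.strip_prefix('<') {
            ("<", r)
        } else if let Some(r) = cmp.strip_prefix('=') {
            ("=", r)
        } else {
            return None;
        };
        let bound = parse_version(rest)?;
        ok &= match op {
            ">=" => version >= bound,
            "<=" => version <= bound,
            ">" => version > bound,
            "<" => version < bound,
            _ => version == bound,
        };
        seen_any = true;
    }
    if seen_any { Some(ok) } else { None }
}
```

Treating `None` the same as `Some(false)` at the call site keeps an unreadable matrix entry from being interpreted as "supported".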
5. Lifecycle State Machine
Every reconcile moves through:
| Phase | Purpose |
|---|---|
| `Admitted` | CR accepted by admission policy |
| `Resolved` | image, config, secrets, devices, and dependencies resolved |
| `Provisioning` | workload resources created/updated |
| `Bootstrapping` | pod reachable and management plane alive |
| `Configuring` | canonical config applied |
| `Verifying` | drift, health, and readiness checked |
| `Ready` | service is available |
| `Draining` | rollout/delete drain in progress |
| `Migrating` | schema/state migration in progress |
| `Degraded` | service impaired but not terminal |
| `Failed` | reconciliation cannot proceed without operator action |
| `Terminating` | deletion finalizers active |
Phase names are public API and MUST be stable.
6. Conditions and Events
Required conditions:
- `Admitted`
- `Resolved`
- `Provisioned`
- `Bootstrapped`
- `ConfigResolved`
- `AppConfigApplied`
- `Drift`
- `MigrationReady`
- `MigrationApplied`
- `DrainReady`
- `RollbackAvailable`
- `Ready`
Each condition MUST include:
- status,
- reason,
- message,
- observed generation,
- last transition time.
Event reasons MUST be stable and documented. Events MUST NOT contain secrets or raw config payloads.
7. Admission and Policy
Admission MUST verify:
- image digest present,
- image signature valid,
- evidence bundle available where required,
- CRD field validation,
- canonical config reference exists,
- manual/break-glass authority policy,
- required secrets and service accounts,
- pod security exceptions,
- per-NF node resource references.
Admission should reject invalid requests early rather than letting a reconcile fail late, after workload resources have already been touched.
8. Canonical Config Lifecycle
8.1 Revision
canonicalConfigRevision is opaque but immutable for a given config artifact.
Changing config content MUST change the revision or digest.
8.2 Apply
The operator applies config through RFC 001 management APIs. It MUST:
- verify schema digest,
- run validate-only before commit where supported,
- use idempotency keys for retries,
- record applied revision and tx ID,
- read back running config for drift detection.
8.3 Drift
Drift states:
- `InSync`
- `DriftDetected`
- `BreakGlassActive`
- `ResyncRequired`
- `Unknown`
Runtime state such as counters and sessions MUST be filtered out of drift comparison.
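The filtering rule can be illustrated with a flattened path-to-value view of the config. In this sketch the map representation, the `runtime_prefixes` argument, and the function name are assumptions; only two of the drift states are modeled, since the others depend on operator context that a pure comparison does not have.

```rust
use std::collections::BTreeMap;

#[derive(Debug, PartialEq, Eq)]
pub enum DriftState {
    InSync,
    DriftDetected,
}

/// Compare desired and running config after dropping runtime-state paths
/// (for example counters and session tables) from both sides.
pub fn compare_config(
    desired: &BTreeMap<String, String>,
    running: &BTreeMap<String, String>,
    runtime_prefixes: &[&str],
) -> DriftState {
    // A path counts as config unless it starts with one of the runtime prefixes.
    let is_config = |path: &str| !runtime_prefixes.iter().any(|p| path.starts_with(*p));
    let d: BTreeMap<&String, &String> =
        desired.iter().filter(|(k, _)| is_config(k.as_str())).collect();
    let r: BTreeMap<&String, &String> =
        running.iter().filter(|(k, _)| is_config(k.as_str())).collect();
    if d == r {
        DriftState::InSync
    } else {
        DriftState::DriftDetected
    }
}
```

In a real operator the set of runtime paths would presumably come from schema metadata rather than a hand-written prefix list.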
9. CRD Versioning and Conversion
Public lifecycle CRDs MUST use hub-and-spoke conversion once a second served version exists.
Rules:
- one storage version at a time,
- conversion webhooks are deterministic,
- lossy conversion is forbidden unless the target version has an explicit status condition and known gap,
- deprecated fields retain read compatibility for at least one minor release,
- removed fields require migration notes and evidence.
Conversion tests MUST include round trips for every CRD version pair.
10. YANG Schema Migration
YANG migration follows RFC 002. Operator responsibilities:
- detect persisted schema digest,
- select migration chain,
- run validate-only against target NF before commit,
- back up previous config envelope before migration,
- record migration tx ID,
- fail closed if migration chain is missing.
Per-NF migrations MUST be deterministic and golden-tested.
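Chain selection with fail-closed behavior can be sketched as follows. The sketch assumes a linear history in which each schema digest has at most one registered successor; `MigrationStep` and `select_chain` are hypothetical names, and the digests are shortened to placeholder strings.

```rust
/// One registered migration from a source schema digest to a target digest.
#[derive(Debug)]
pub struct MigrationStep {
    pub from: &'static str,
    pub to: &'static str,
}

/// Walk from the persisted digest to the target digest.
/// Returns an error (fail closed) if any link is missing or the steps loop.
pub fn select_chain<'a>(
    steps: &'a [MigrationStep],
    persisted: &str,
    target: &str,
) -> Result<Vec<&'a MigrationStep>, String> {
    let mut chain: Vec<&'a MigrationStep> = Vec::new();
    let mut current: &str = persisted;
    while current != target {
        // A valid chain never needs more hops than there are registered steps.
        if chain.len() >= steps.len() {
            return Err("migration chain does not terminate".to_string());
        }
        let next = steps
            .iter()
            .find(|s| s.from == current)
            .ok_or_else(|| format!("no migration registered from {}", current))?;
        chain.push(next);
        current = next.to;
    }
    Ok(chain)
}
```

An empty chain means the persisted schema already matches the target, so no migration transaction is needed.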
11. State Migration
Session and durable state migrations are separate from config migrations.
State migration plans MUST define:
- source version,
- target version,
- online/offline mode,
- rollback support,
- validation query,
- maximum expected duration,
- data-loss risk,
- RPO/RTO impact.
Authoritative session migrations MUST preserve RFC 004 generation and fencing semantics.
12. Rollout Strategies
Supported strategies:
| Strategy | Use |
|---|---|
| `rolling` | stateless or safely drainable NFs |
| `partitioned` | stateful sets and ordered migrations |
| `canary` | high-risk release or config change |
| `blue-green` | major upgrades or incompatible config/state changes |
| `manual` | operator-approved special cases |
Each NF declares allowed strategies.
13. Drain and Handover
Before terminating or replacing a pod, the operator MUST invoke or observe NF drain where the NF is stateful.
Drain contract:
    pub enum DrainMode {
        RejectNewWork,
        TransferOwnership,
        FlushAndStop,
        ImmediateEmergency,
    }
Drain MUST:
- mark readiness false before removing work,
- stop new session ownership,
- transfer or release leases where possible,
- flush audit and local state,
- respect timeout,
- expose progress in status.
UPF, AMF, SMF, ePDG, N3IWF, SMSC, and IMS NFs MUST define NF-specific drain behavior.
14. Rollback and Downgrade
14.1 Rollback
Rollback is allowed when:
- previous image digest is still policy-allowed,
- previous config schema is compatible or migration back exists,
- state schema supports downgrade or state can be rebuilt,
- evidence permits rollback.
14.2 Downgrade
Downgrade is forbidden by default for stateful NFs unless explicitly supported. If downgrade is unsupported, the operator MUST fail before changing workload resources.
14.3 Failed Rollout
On failed rollout:
- Stop further pod replacement.
- Preserve logs/events/evidence references.
- Mark `Degraded` or `Failed`.
- Attempt rollback only if policy says automatic rollback is safe.
- Require manual approval for destructive recovery.
15. Backup and Restore
Before high-risk migration, the operator MUST ensure backups exist for:
- canonical config,
- shadow-security material where policy allows,
- session state if durable and required,
- audit state,
- CR status needed for recovery.
Restore MUST be tested per NF and recorded in RFC 006 evidence.
16. Multi-Cluster Lifecycle
In multi-cluster deployments:
- management cluster owns desired lifecycle state,
- workload clusters own local pod status,
- status aggregation is explicit,
- cluster identity is part of every condition source,
- rollout waves are region-aware,
- rollback can be per-cluster or global.
The operator MUST avoid applying incompatible migrations to only part of a fenced session ownership domain.
17. Observability
Required metrics:
- `opc_operator_reconcile_total{kind,outcome}`
- `opc_operator_reconcile_duration_seconds{kind,phase}`
- `opc_operator_rollout_total{kind,strategy,outcome}`
- `opc_operator_migration_total{kind,type,outcome}`
- `opc_operator_drain_total{kind,outcome}`
- `opc_operator_drift_observations_total{kind,state}`
- `opc_operator_rollback_total{kind,outcome}`
- `opc_operator_version_skew{kind}`
Required status fields:
- current image digest,
- desired image digest,
- applied config revision,
- applied config hash,
- running schema digest,
- last successful tx ID,
- evidence bundle digest,
- migration state.
18. Module Ownership
| Module | Responsibility |
|---|---|
| `operator-lifecycle` | shared phase/condition composition |
| `operator-compat` | compatibility matrix parser/evaluator |
| `operator-config-apply` | validate-only, commit, readback |
| `operator-conversion` | CRD conversion webhook helpers |
| `operator-migration` | config/state migration orchestration |
| `operator-rollout` | rolling/canary/blue-green strategies |
| `operator-drain` | NF drain API clients and progress |
| `operator-backup` | backup/restore orchestration |
| `operator-testkit` | fake NF, fake config bus, fake session store |
Agents must keep NF-specific reconcile logic behind interfaces and avoid duplicating phase/condition code.
19. Testing Requirements
19.1 Unit Tests
- Compatibility matrix evaluation.
- Phase transition reducer.
- Condition reason stability.
- CRD conversion round trips.
- Migration chain selection.
- Rollback eligibility.
19.2 Integration Tests
- Install fresh NF.
- Config-only update.
- Image-only update.
- Image plus config update.
- Failed validate-only blocks rollout.
- Drift detection and resync.
- Canary success and failure.
- Rollback with compatible config.
19.3 Fault Injection
- Operator restart mid-rollout.
- NF pod deleted during migration.
- gNMI commit timeout.
- Conversion webhook unavailable.
- Backup failure.
- Session drain timeout.
- Partial multi-cluster rollout failure.
19.4 Performance Gates
- Reconcile avoids hot loops under persistent failure.
- 1,000 lifecycle CRs do not exceed configured API QPS.
- Drift compare for large config stays within budget.
- Status update rate is bounded.
20. Acceptance Criteria
This RFC is implemented when:
- Operator/NF/version compatibility is machine-readable and enforced.
- Lifecycle phases and conditions are stable across all CNFs.
- Config apply uses RFC 001 validate/commit/readback behavior.
- CRD conversions are deterministic and tested.
- YANG and state migrations are explicit and evidence-linked.
- Stateful rollouts drain or transfer ownership before termination.
- Rollback eligibility is evaluated before workload mutation.
- Multi-cluster rollout status is explicit and safe.
OPC-SDK-RFC-010: Data Governance, Privacy, and Regulated Records
Status: Draft for Implementation
Version: 1.0.0
Date: 2026-05-19
Audience: security engineers, privacy reviewers, NF owners, LI/charging implementers, SREs
1. Abstract
This RFC defines the data governance substrate for OpenPacketCore CNFs. It standardizes classification, handling, redaction, retention, encryption, backup, export, audit, and evidence rules for subscriber identifiers, session records, charging data, lawful-intercept material, analytics, security logs, and management configuration.
The purpose is to ensure that every CNF treats sensitive telecom data consistently and that privacy behavior is implemented as an auditable platform contract, not as scattered per-NF convention.
2. Scope
2.1 In Scope
- Data classification taxonomy.
- SUPI/GPSI/MSISDN/IP address handling.
- Charging, audit, lawful-intercept data classification, analytics, and session state records.
- Redaction and pseudonymization.
- Retention and deletion.
- Backup and restore handling.
- Export and external sink policy.
- Tenant/slice/PLMN data boundaries.
- Evidence and test requirements.
2.2 Out of Scope
- Cryptographic key management internals. See RFC 003.
- Session store consistency. See RFC 004.
- Evidence bundle mechanics. See RFC 006.
- Product lawful-intercept mediation, collection workflows, and target-specific LI policy engines. The SDK classifies and protects LI material; it does not implement an LI product subsystem.
- Jurisdiction-specific legal interpretation.
3. Design Goals
3.1 Security
- Minimize sensitive data exposure by default.
- Encrypt regulated data at rest and in transit.
- Prevent cross-tenant, cross-slice, and cross-PLMN data leakage.
- Make audit and regulated exports tamper-evident.
- Ensure backup and debug workflows preserve classification.
3.2 Performance
- Redaction and classification must be cheap enough for hot-path logging.
- High-volume telemetry must avoid high-cardinality raw identifiers.
- Bulk retention jobs must be bounded and schedulable.
- Analytics minimization must be profile-driven and measurable.
3.3 Maintainability
- One classification vocabulary across all CNFs.
- Generated redaction metadata from RFC 002 drives code behavior.
- Retention policies are declarative through YANG.
- Exceptions are structured known gaps or waivers.
3.4 Functionality
- Support operational debugging without leaking raw subscriber data.
- Support charging and audit records with correct retention.
- Classify lawful-intercept material and keep it separated from ordinary telemetry, analytics, support bundles, and exports.
- Support analytics minimization and privacy-preserving export.
4. Data Classification
4.1 Classes
| Class | Examples | Default Handling |
|---|---|---|
| `public` | build version, static feature flags | log/export allowed |
| `operational` | readiness, queue depth, non-sensitive counters | log/export allowed with cardinality controls |
| `network-sensitive` | topology, NF instance IDs, peer FQDNs | restricted logs, auth-gated debug |
| `subscriber-id` | SUPI, IMSI, GPSI, MSISDN, PEI | redacted or keyed digest |
| `subscriber-session` | PDU session, TEID, SEID, IP address, QoS state | encrypted, access-controlled |
| `security-secret` | keys, tokens, credentials, OP/OPc/K | never logged, secret types |
| `charging-record` | CDR, usage, rating inputs | retained/exported by charging policy |
| `lawful-intercept` | warrant, target selectors, X2/X3 products | LI plane only |
| `analytics-sensitive` | NWDAF source events, location, behavior traces | minimized before export |
| `audit-regulated` | admin actions, break-glass, security events | tamper-evident retention |
Each data field in generated models and hand-written domain types MUST be classified.
4.2 Classification Metadata
    pub enum DataClass {
        Public,
        Operational,
        NetworkSensitive,
        SubscriberId,
        SubscriberSession,
        SecuritySecret,
        ChargingRecord,
        LawfulIntercept,
        AnalyticsSensitive,
        AuditRegulated,
    }
Generated YANG metadata and Rust annotations MUST feed the same classification registry.
5. Identity and Pseudonymization
Raw SUPI/GPSI/MSISDN/PEI MUST NOT appear in:
- metric labels,
- info/warn/error logs,
- ordinary traces,
- backend keys,
- Kubernetes Events,
- unauthenticated debug output.
The default correlation form is a tenant-scoped keyed digest:
digest = HMAC(tenant_privacy_key, data_class || identifier_type || raw_value)
Digest keys MUST be purpose-separated from encryption keys. Rotating digest keys changes correlation IDs; this must be documented in operational runbooks.
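One practical detail of the digest formula is how the three fields are joined before they are passed to the MAC. Plain concatenation is ambiguous (`"supi" || "12"` and `"supi1" || "2"` produce the same bytes), so an implementation needs some unambiguous framing. The sketch below shows length-prefixed framing as one option; the framing choice and the function name are assumptions rather than something this RFC specifies, and the HMAC call itself is left to whichever vetted crypto library the SDK uses.

```rust
/// Build the MAC input for the correlation digest.
/// Each field is preceded by its length as a 4-byte big-endian integer,
/// so distinct (class, type, value) triples never encode to the same bytes.
pub fn digest_input(data_class: &str, identifier_type: &str, raw_value: &str) -> Vec<u8> {
    let mut out = Vec::new();
    for field in [data_class, identifier_type, raw_value] {
        // Hypothetical framing choice; fields longer than u32::MAX bytes are not expected.
        out.extend_from_slice(&(field.len() as u32).to_be_bytes());
        out.extend_from_slice(field.as_bytes());
    }
    out
}
```

The resulting bytes would be the message argument of `HMAC(tenant_privacy_key, ...)`; the key never appears in the message.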
6. Redaction
Redaction levels:
| Level | Behavior |
|---|---|
| `drop` | omit the field entirely |
| `mask` | show fixed placeholder |
| `class` | show class and presence only |
| `length-class` | show approximate length bucket |
| `digest` | show keyed digest |
| `cleartext` | allowed only by explicit policy |
`cleartext` is forbidden for `security-secret` and restricted for
`lawful-intercept`.
Redaction MUST apply to:
- logs,
- traces,
- metrics,
- audit views,
- admin/debug endpoints,
- panic hooks,
- error messages,
- test snapshots committed to git.
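The redaction levels can be sketched as a single rendering function. This is an illustrative assumption about shape, not the `opc-redaction` API: only three data classes are modeled, the length buckets and placeholder strings are invented for the example, and the keyed digest is injected as a callback so the sketch stays independent of any crypto library.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DataClass {
    Operational,
    SubscriberId,
    SecuritySecret,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RedactionLevel {
    Drop,
    Mask,
    Class,
    LengthClass,
    Digest,
    Cleartext,
}

/// Render `value` for output. `None` means the field is omitted entirely.
pub fn render(
    class: DataClass,
    level: RedactionLevel,
    value: &str,
    digest: &dyn Fn(&str) -> String,
) -> Option<String> {
    match level {
        RedactionLevel::Drop => None,
        RedactionLevel::Mask => Some("***".to_string()),
        RedactionLevel::Class => Some(format!("<{:?}>", class)),
        RedactionLevel::LengthClass => {
            // Hypothetical buckets: reveal only a coarse length range.
            let bucket = match value.chars().count() {
                0 => "0",
                1..=8 => "1-8",
                9..=16 => "9-16",
                _ => "17+",
            };
            Some(format!("<len:{}>", bucket))
        }
        RedactionLevel::Digest => Some(digest(value)),
        // Cleartext is forbidden for secrets: fall back to the mask.
        RedactionLevel::Cleartext if class == DataClass::SecuritySecret => Some("***".to_string()),
        RedactionLevel::Cleartext => Some(value.to_string()),
    }
}
```

Routing logs, traces, panic hooks, and debug endpoints through one function like this keeps the "never cleartext for secrets" rule in a single place.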
7. Retention Policy
Each data class has a retention policy:
    pub struct RetentionPolicy {
        pub class: DataClass,
        pub min_duration: Option<Duration>,
        pub max_duration: Option<Duration>,
        pub deletion_mode: DeletionMode,
        pub legal_hold_supported: bool,
        pub export_allowed: bool,
    }
Retention MUST be configured through canonical YANG and surfaced in evidence.
Default posture:
- operational telemetry: short retention,
- audit-regulated: longer tamper-evident retention,
- charging-record: charging policy retention,
- lawful-intercept: legal/LI policy retention,
- security-secret: no export, rotate/delete per key policy.
8. Legal Hold and Deletion
Legal hold prevents deletion of matching regulated records. It MUST:
- be authenticated and authorized,
- be audited,
- include scope and expiry,
- be visible to retention jobs,
- not expose target selectors outside authorized LI/audit roles.
Deletion jobs MUST be idempotent and evidence-producing. They MUST avoid deleting records under legal hold.
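The interaction between retention windows and legal hold can be sketched as a pure decision function. The semantics here are an assumption, since this RFC does not define them precisely: a record under hold is always kept, a record at or past `max_duration` must be deleted, a record younger than `min_duration` must be kept, and anything in between may be deleted. All names are hypothetical.

```rust
use std::time::Duration;

#[derive(Debug, PartialEq, Eq)]
pub enum RetentionDecision {
    Keep,
    MayDelete,
    MustDelete,
}

/// The two duration bounds of a retention policy.
pub struct RetentionWindow {
    pub min_duration: Option<Duration>,
    pub max_duration: Option<Duration>,
}

/// Decide what a retention job may do with a record of the given age.
pub fn decide(age: Duration, under_legal_hold: bool, window: &RetentionWindow) -> RetentionDecision {
    // Legal hold overrides every retention bound.
    if under_legal_hold {
        return RetentionDecision::Keep;
    }
    if window.max_duration.map_or(false, |max| age >= max) {
        return RetentionDecision::MustDelete;
    }
    if window.min_duration.map_or(false, |min| age < min) {
        return RetentionDecision::Keep;
    }
    RetentionDecision::MayDelete
}
```

Because the decision depends only on the record's current state, re-running an interrupted deletion job over the same records yields the same outcome, which is one way to get the idempotency required above.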
9. Data Boundaries
The platform enforces boundaries by:
- tenant,
- slice/S-NSSAI,
- PLMN,
- region,
- NF instance,
- data class.
Every storage key, audit query, export job, and backup manifest MUST include boundary metadata. Cross-boundary export is denied by default.
10. Backups and Restore
Backups MUST preserve:
- classification metadata,
- encryption envelope metadata,
- tenant and slice boundary,
- retention policy,
- legal hold flags,
- manifest digests.
Restore MUST verify that destination tenant/slice/PLMN policy allows the data. Restoring LI or security-secret material into a different environment is denied unless an explicit recovery policy allows it.
11. Charging Records
Charging records are regulated operational records. CNFs that produce charging data MUST:
- classify records as `charging-record`,
- avoid raw identifiers in logs,
- use durable, auditable write path,
- support duplicate detection/idempotency,
- expose export status,
- test retention and replay behavior.
Charging exports MUST be signed or transmitted over authenticated channels.
12. Lawful Intercept Data
LI data is a special class with strict separation:
- X1 management/control material,
- X2 intercept-related information,
- X3 content/user-plane products.
LI records MUST NOT share ordinary audit, telemetry, or debug paths unless the path is explicitly LI-authorized. LI selectors and products MUST be encrypted, audited, and retained according to LI policy.
CNFs that are not LI functions MUST NOT adopt LI vocabulary for ordinary analytics or operational telemetry.
13. Analytics and Privacy
Analytics-producing CNFs, especially NWDAF, MUST implement minimization before export.
Minimization methods:
- field drop,
- coarsening,
- keyed hash,
- aggregation threshold,
- k-anonymity threshold,
- differential privacy noise where policy requires it.
The active minimization policy version MUST be recorded with each analytics export.
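As an example of the aggregation-threshold method, the sketch below counts events per already-coarsened cell (for instance a region or time bucket) and suppresses any cell with fewer than `k` events before export. The function name and the string cell representation are assumptions; choosing `k` and the coarsening itself are policy decisions outside this sketch.

```rust
use std::collections::BTreeMap;

/// Count events per cell and drop every cell whose count is below `k`.
pub fn aggregate_with_threshold(cells: &[&str], k: usize) -> BTreeMap<String, usize> {
    let mut counts: BTreeMap<String, usize> = BTreeMap::new();
    for cell in cells {
        *counts.entry((*cell).to_string()).or_insert(0) += 1;
    }
    // Small groups are suppressed entirely rather than exported with low counts.
    counts.retain(|_, n| *n >= k);
    counts
}
```

An export built this way would carry only the surviving aggregates, together with the minimization policy version that supplied `k`.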
14. Debug and Support Bundles
Support bundles MUST:
- exclude secrets by default,
- redact subscriber identifiers,
- include manifest and classification summary,
- require authorization,
- be time-bounded,
- be audited,
- be signed or checksummed.
Debug packet captures are disabled by default and require explicit policy.
15. Configuration Model
Shared YANG groupings SHOULD include:
- `data-governance/classification-overrides`
- `data-governance/retention`
- `data-governance/export-policy`
- `data-governance/legal-hold`
- `data-governance/redaction`
- `data-governance/support-bundle`
NF-specific YANG can refine but not bypass the baseline.
16. Observability
Required metrics:
- `opc_data_records_total{class,operation,outcome}`
- `opc_data_redactions_total{class,level}`
- `opc_data_retention_deletions_total{class,outcome}`
- `opc_data_legal_holds{class,state}`
- `opc_data_exports_total{class,outcome}`
- `opc_data_policy_version_info{class,version}`
- `opc_data_privacy_minimization_total{method,outcome}`
Metrics MUST NOT use raw subscriber identifiers as labels.
17. Evidence Requirements
RFC 006 evidence MUST include:
- classification registry,
- retention policy report,
- redaction test report,
- export policy report,
- legal hold test report,
- privacy minimization report for analytics NFs,
- known gaps for any class not fully handled.
18. Module Ownership
| Module | Responsibility |
|---|---|
| `opc-data-governance` | class registry, retention policy, legal-hold policy, and annotations |
| `opc-redaction` | redaction renderers and generated metadata adapter |
| `opc-privacy` | digesting, minimization, support bundle policy |
| `opc-export` | signed/exported data handling |
| `opc-evidence` | data-governance evidence reports and release gates |
| `opc-sdk-integration` | integration tests covering redaction, retention, export, and analytics policy |
Agents implementing NF features must classify new fields before exposing logs, metrics, storage, or exports.
19. Testing Requirements
19.1 Unit Tests
- Classification coverage.
- Redaction levels.
- Keyed digest stability.
- Retention eligibility.
- Legal hold blocks deletion.
- Support bundle manifest redaction.
19.2 Integration Tests
- NF logs contain no raw SUPI/GPSI/MSISDN.
- Metrics reject high-cardinality raw labels.
- Backup/restore preserves classification.
- Export denied across tenant boundary.
- Analytics minimization records policy version.
19.3 Fault Injection
- Missing privacy digest key.
- Retention job interrupted.
- Export sink unavailable.
- Backup manifest tampered.
- Legal hold expiry during deletion.
19.4 Performance Gates
- Hot-path redaction p99 under 5 microseconds for scalar identifiers.
- Digest generation p99 under 25 microseconds.
- Retention jobs respect configured I/O budget.
- Metrics classification checks do not allocate on common paths.
20. Acceptance Criteria
This RFC is implemented when:
- Every generated and hand-written sensitive field has a data class.
- Raw subscriber identifiers do not appear in logs, metrics, traces, events, backend keys, or support bundles by default.
- Retention and legal hold policies are declarative and tested.
- Backups, restores, and exports preserve classification metadata.
- LI data is separated from ordinary telemetry and analytics.
- Analytics exports record minimization policy.
- RFC 006 evidence reports classification, redaction, retention, and privacy behavior.
OPC-SDK-RFC-011: Node and Data-Plane Resource Contract
Status: Draft for Implementation
Version: 1.0.0
Date: 2026-05-19
Audience: UPF/data-plane engineers, platform engineers, Kubernetes operators, security reviewers
1. Abstract
This RFC defines the node, kernel, NIC, CNI, CPU, memory, and pod-security contract required by OpenPacketCore data-plane and signaling-heavy CNFs. It standardizes how CNFs request and verify SR-IOV, Multus, AF_XDP, XDP/eBPF, hugepages, NUMA alignment, CPU pinning, IRQ affinity, device plugins, kernel features, and pod security exceptions.
The goal is to make data-plane performance and privilege requirements explicit, admissible, testable, and portable across carrier Kubernetes environments.
2. Scope
2.1 In Scope
- Node capability discovery.
- Kubernetes scheduling/resource requests.
- Multus and SR-IOV attachment contracts.
- AF_XDP/XDP/eBPF requirements.
- CPU pinning, NUMA, hugepages, and IRQ affinity.
- Pod security exceptions and capability minimization.
- Data-plane preflight and readiness.
- Metrics and conformance tests for platform resources.
2.2 Out of Scope
- Packet parser behavior. See RFC 005.
- Session state consistency. See RFC 004.
- Runtime task supervision. See RFC 008.
- Vendor-specific NIC tuning beyond declared capability adapters.
3. Design Goals
3.1 Security
- Grant only the minimum Linux capabilities needed by each CNF.
- Bind privileged data-plane pods to explicitly labeled nodes.
- Prevent untrusted workloads from using OpenPacketCore data-plane device resources.
- Make kernel/eBPF program loading auditable.
3.2 Performance
- Preserve CPU, cache, NUMA, NIC queue, and IRQ locality.
- Avoid noisy-neighbor interference on data-plane cores.
- Provide deterministic preflight before declaring readiness.
- Expose packet drop and queue pressure metrics.
3.3 Maintainability
- One shared contract for platform assumptions.
- Per-NF specs declare deviations through structured resource profiles.
- Device and kernel feature detection is reusable.
- CI can verify chart/resource generation without real NICs.
3.4 Functionality
- Support UPF AF_XDP fast path.
- Support ePDG/N3IWF IPsec and tunnel workloads.
- Support L4 UDP fan-in proxy.
- Support SCTP-heavy AMF/SMS/IMS workloads.
- Support lab mode without hardware acceleration.
4. Resource Profiles
Each CNF declares a resource profile:
    pub enum DataPlaneProfile {
        ControlPlaneOnly,
        SignalingHeavy,
        KernelNetworking,
        AfXdpFastPath,
        SriovFastPath,
        IpsecGateway,
    }
Profiles determine required node labels, capabilities, CNI attachments, and preflight checks.
IpsecGateway is a resource and admission profile only in the current SDK. It
does not imply that this repository ships IKEv2, ESP, xfrm orchestration, or
N3IWF/NWu procedure implementations. Those protocol crates are required for a
selected ePDG/N3IWF/untrusted-access product target, but are not a blocker for
the current AMF-lite/N2/N1 first-NF profile.
5. Node Capability Discovery
The platform MUST provide a node capability report:
    node:
      kernel: "6.8.0"
      bpf:
        cap_bpf: true
        xdp_supported: true
        btf_available: true
      cpu:
        manager_policy: static
        isolated_cores: "2-15"
        numa_nodes: 2
      memory:
        hugepages_2Mi: 4096
        hugepages_1Gi: 8
      nics:
        - name: ens5f0
          driver: ice
          sriov_vfs: 16
          xdp_modes: ["native", "skb"]
          queues: 32
The operator or node agent MUST publish this through labels, annotations, or a custom resource.
6. Scheduling Contract
Data-plane CNFs MUST use:
- node selectors for required hardware,
- tolerations for dedicated nodes,
- pod anti-affinity where replicas need failure-domain separation,
- topology spread constraints,
- resource requests/limits matching CPU Manager static policy,
- hugepage requests where required,
- device plugin resource requests for SR-IOV or specialized devices.
The operator MUST reject a lifecycle CR if no eligible node can satisfy the declared profile, unless lab mode allows software fallback.
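The rejection rule above can be sketched as a small admission function. This is a minimal illustration, not the SDK's admission code: `NodeReport`, `Requirements`, `Admission`, and `admit` are hypothetical names, and the three fields checked are only a subset of the capability report in section 5.

```rust
/// Subset of the node capability report (field names are illustrative).
#[derive(Debug, Clone)]
pub struct NodeReport {
    pub name: String,
    pub xdp_supported: bool,
    pub hugepages_2mi: u32,
    pub sriov_vfs: u32,
}

/// Hypothetical requirements derived from a declared resource profile.
#[derive(Debug, Clone, Copy)]
pub struct Requirements {
    pub needs_xdp: bool,
    pub min_hugepages_2mi: u32,
    pub min_sriov_vfs: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Admission {
    /// At least one node satisfies the profile; carries the eligible node names.
    Eligible(Vec<String>),
    /// No node qualifies, but lab mode allows software fallback.
    LabFallback,
    /// No node qualifies and fallback is not allowed.
    Rejected,
}

fn satisfies(node: &NodeReport, req: &Requirements) -> bool {
    (!req.needs_xdp || node.xdp_supported)
        && node.hugepages_2mi >= req.min_hugepages_2mi
        && node.sriov_vfs >= req.min_sriov_vfs
}

/// Rejects the CR when no node is eligible, unless lab mode is enabled.
pub fn admit(nodes: &[NodeReport], req: &Requirements, lab_mode: bool) -> Admission {
    let eligible: Vec<String> = nodes
        .iter()
        .filter(|n| satisfies(n, req))
        .map(|n| n.name.clone())
        .collect();
    if !eligible.is_empty() {
        Admission::Eligible(eligible)
    } else if lab_mode {
        Admission::LabFallback
    } else {
        Admission::Rejected
    }
}
```

Returning `LabFallback` as its own variant, rather than folding it into `Eligible`, keeps the fallback visible to whatever writes CR status.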
7. CPU and NUMA
7.1 CPU Pinning
Data-plane workers SHOULD run on exclusive CPUs. Management and async control tasks MUST NOT run on those same pinned data-plane CPUs.
The runtime receives an explicit CPU allocation:
```rust
pub struct CpuLayout {
    pub data_plane_cores: Vec<CpuId>,
    pub control_plane_cores: Vec<CpuId>,
    pub management_cores: Vec<CpuId>,
    pub numa_node: Option<NumaNodeId>,
}
```
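One way to enforce the exclusivity rule in 7.1 is to check the layout for overlap before applying it. The sketch below assumes `CpuId` and `NumaNodeId` are plain integers, which the RFC does not state; `LayoutError` and `validate` are hypothetical names.

```rust
use std::collections::HashSet;

// Assumption: the real SDK types may be newtypes rather than aliases.
pub type CpuId = u32;
pub type NumaNodeId = u32;

pub struct CpuLayout {
    pub data_plane_cores: Vec<CpuId>,
    pub control_plane_cores: Vec<CpuId>,
    pub management_cores: Vec<CpuId>,
    pub numa_node: Option<NumaNodeId>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum LayoutError {
    /// A control-plane or management core is also pinned for the data plane.
    SharedCore(CpuId),
}

/// Returns an error for the first control or management core that overlaps
/// the pinned data-plane set.
pub fn validate(layout: &CpuLayout) -> Result<(), LayoutError> {
    let pinned: HashSet<CpuId> = layout.data_plane_cores.iter().copied().collect();
    layout
        .control_plane_cores
        .iter()
        .chain(layout.management_cores.iter())
        .find(|c| pinned.contains(*c))
        .map_or(Ok(()), |c| Err(LayoutError::SharedCore(*c)))
}
```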
7.2 NUMA Locality
NIC queues, AF_XDP UMEM, hugepages, and worker threads SHOULD be NUMA-local. Preflight MUST warn or fail according to profile when locality is broken.
7.3 IRQ Affinity
The platform SHOULD pin NIC IRQs to the correct NUMA-local cores. The CNF MUST report IRQ affinity mismatches when detectable.
8. Memory and Hugepages
CNFs using DPDK-like or AF_XDP memory pools MUST declare:
- hugepage size,
- hugepage count,
- per-queue buffer count,
- max packet size,
- headroom,
- NUMA node.
The pod MUST request hugepages explicitly. Overcommitting data-plane memory is forbidden in production profiles.
9. Network Attachments
9.1 Multus
Each data-plane interface is a named attachment:
```yaml
multus:
  n3:
    networkAttachmentDefinition: upf-n3
    interfaceName: n3
  n4:
    networkAttachmentDefinition: upf-n4
    interfaceName: n4
  n6:
    networkAttachmentDefinition: upf-n6
    interfaceName: n6
```
Canonical YANG defines interface roles; lifecycle CR values reference attachment objects only.
9.2 SR-IOV
SR-IOV profiles MUST define:
- resource name,
- VF trust/spoof-check settings,
- VLAN policy,
- link state policy,
- allowed device drivers,
- whether IPAM is static or dynamic.
The operator MUST validate that referenced SR-IOV resources are allowlisted for the NF kind.
10. AF_XDP and XDP/eBPF
AfXdpFastPath is a resource and admission profile only in the current SDK. It
does not imply that this repository ships AF_XDP sockets, UMEM management, RX/TX
rings, or packet I/O runtime support. Those crates are required for a selected
UPF or other accelerated data-plane product target, but are not a blocker for
the current AMF-lite/N2/N1 first-NF profile.
10.1 Kernel Requirements
AF_XDP fast-path profiles MUST declare:
- minimum kernel version,
- required BPF features,
- required XDP mode,
- required capabilities,
- required maps and pin paths,
- whether generic XDP fallback is allowed.
10.2 Capabilities
Allowed capabilities for AF_XDP profile:
- CAP_BPF
- CAP_NET_ADMIN
- CAP_NET_RAW
CAP_SYS_ADMIN is forbidden in production profiles. If a kernel requires
CAP_SYS_ADMIN, the node is not eligible.
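A capability check along these lines could back the allowlist. It is a sketch only: `check_af_xdp_capabilities` and `CapabilityError` are hypothetical names, and rejecting non-allowlisted capabilities outside production is an assumption, since the RFC only states the production rule.

```rust
/// Capabilities that section 10.2 allows for the AF_XDP profile.
const AF_XDP_ALLOWED: [&str; 3] = ["CAP_BPF", "CAP_NET_ADMIN", "CAP_NET_RAW"];

#[derive(Debug, PartialEq, Eq)]
pub enum CapabilityError {
    /// CAP_SYS_ADMIN requested for a production profile.
    ForbiddenInProduction(String),
    /// Capability is not on the AF_XDP allowlist (assumed to apply in all modes).
    NotAllowlisted(String),
}

/// Validates the capabilities requested by a pod against the AF_XDP allowlist.
pub fn check_af_xdp_capabilities(
    requested: &[&str],
    production: bool,
) -> Result<(), CapabilityError> {
    for cap in requested {
        if production && *cap == "CAP_SYS_ADMIN" {
            return Err(CapabilityError::ForbiddenInProduction(cap.to_string()));
        }
        if !AF_XDP_ALLOWED.contains(cap) {
            return Err(CapabilityError::NotAllowlisted(cap.to_string()));
        }
    }
    Ok(())
}
```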
10.3 eBPF Program Governance
eBPF programs MUST be:
- built from source in release pipeline,
- included in SBOM/evidence,
- signed or digest-pinned,
- loaded only from approved paths,
- audited on load/unload,
- pinned under controlled bpffs path.
11. Pod Security Exceptions
Baseline pod security remains:
- run as non-root,
- read-only root filesystem,
- no privilege escalation,
- drop all capabilities except explicit allowlist,
- seccomp profile enabled,
- AppArmor/SELinux profile where supported.
Every exception MUST be declared in:
- per-NF spec,
- Helm values,
- operator admission policy,
- RFC 006 evidence.
12. Data-Plane Preflight
Before readiness, data-plane CNFs MUST verify:
- required interfaces exist,
- link state is up where required,
- MTU matches config,
- NIC driver and queues match profile,
- XDP attach succeeded,
- BPF maps created,
- hugepages allocated,
- CPU layout applied,
- session table initialized,
- drop counters accessible.
Failures mark readiness false and emit alarms.
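The preflight gate can be sketched as an aggregation over individual check outcomes. The types below are hypothetical and cover only some of the listed checks. The sketch collects every failure instead of stopping at the first, on the assumption that each failure should be able to emit its own alarm.

```rust
/// A few of the preflight checks listed above (illustrative subset).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Check {
    InterfacesExist,
    LinkUp,
    MtuMatches,
    XdpAttached,
    HugepagesAllocated,
    CpuLayoutApplied,
}

pub struct PreflightReport {
    /// Every failed check with a human-readable reason.
    pub failures: Vec<(Check, String)>,
}

impl PreflightReport {
    /// Readiness is true only when no check failed.
    pub fn ready(&self) -> bool {
        self.failures.is_empty()
    }
}

/// Folds individual check outcomes into one report, keeping all failures.
pub fn evaluate(results: Vec<(Check, Result<(), String>)>) -> PreflightReport {
    let failures = results
        .into_iter()
        .filter_map(|(check, outcome)| outcome.err().map(|reason| (check, reason)))
        .collect();
    PreflightReport { failures }
}
```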
13. Lab and Fallback Modes
Lab mode MAY use:
- veth instead of SR-IOV,
- generic XDP instead of native XDP,
- software packet path,
- relaxed CPU pinning,
- no hugepages.
Lab fallback MUST be visible in status and MUST NOT be silently used in production.
14. Observability
Required metrics:
- opc_node_capability_info{node,kernel,profile}
- opc_dataplane_interface_up{nf,interface}
- opc_dataplane_rx_packets_total{nf,interface}
- opc_dataplane_tx_packets_total{nf,interface}
- opc_dataplane_drops_total{nf,interface,reason}
- opc_dataplane_queue_fill_ratio{nf,interface,queue}
- opc_dataplane_xdp_attach_total{nf,outcome}
- opc_dataplane_bpf_map_entries{nf,map}
- opc_dataplane_numa_mismatch{nf}
- opc_dataplane_irq_affinity_mismatch{nf}
15. Configuration Model
Shared YANG groupings SHOULD include:
- resources/cpu
- resources/numa
- resources/hugepages
- resources/interfaces
- resources/xdp
- resources/sriov
- resources/preflight
Lifecycle CRDs reference Kubernetes resource names; dense tuning lives in YANG.
16. Module Ownership
| Module | Responsibility |
|---|---|
| opc-node-capabilities | node feature report parser/model |
| opc-resource-admission | operator resource validation |
| opc-cpu-layout | CPU/NUMA layout helpers |
| opc-net-attach | Multus/SR-IOV model helpers |
| opc-af-xdp-platform | AF_XDP preflight and map metadata |
| opc-bpf-governance | BPF artifact digest/load audit |
| opc-resource-testkit | fake node capabilities and chart tests |
Agents implementing UPF or similar CNFs must consume these modules rather than hard-coding node assumptions.
17. Testing Requirements
17.1 Unit Tests
- Node capability parsing.
- Resource profile validation.
- CPU layout validation.
- SR-IOV allowlist policy.
- Capability exception rendering.
- Lab fallback status.
17.2 Integration Tests
- Helm renders correct resource requests.
- Operator rejects unsatisfied node profile.
- AF_XDP preflight succeeds with fake capabilities.
- Production profile rejects CAP_SYS_ADMIN.
- Readiness false when required interface is missing.
17.3 Fault Injection
- XDP attach failure.
- Hugepage allocation failure.
- NIC link down.
- NUMA mismatch.
- IRQ affinity mismatch.
- Device plugin resource unavailable.
17.4 Performance Gates
- Preflight completes within configured startup budget.
- Data-plane metrics scrape does not stall packet workers.
- Resource admission for 1,000 CNF CRs stays within operator API budget.
18. Acceptance Criteria
This RFC is implemented when:
- Data-plane CNFs declare structured resource profiles.
- Operator admission rejects unsatisfied production resource requirements.
- CPU, NUMA, hugepage, NIC, and CNI assumptions are explicit.
- AF_XDP/eBPF programs are governed by signed/digest-pinned artifacts.
- Pod security exceptions are minimal and evidence-linked.
- Readiness depends on data-plane preflight.
- Lab fallback cannot silently enter production.
OPC-SDK-RFC-012: Common Testbed, Simulator, and Scenario Framework
Status: Draft for Implementation
Version: 1.0.0
Date: 2026-05-19
Audience: test engineers, NF implementers, conformance owners, SREs
1. Abstract
This RFC defines the shared OpenPacketCore testbed and simulator framework. It standardizes reusable peer simulators, virtual time, traffic scenarios, protocol fixtures, conformance packs, chaos hooks, load profiles, and evidence output.
The purpose is to prevent every CNF from building isolated mocks that cannot compose into end-to-end 5G scenarios. The framework lets multiple contributors implement NFs independently while verifying them against the same scenario language and peer behavior.
2. Scope
2.1 In Scope
- Peer simulators for UE, gNB, AMF, SMF, UPF, NRF, AUSF, UDM, PCF, NSSF, SCP, SEPP, SMSC, and other core peers.
- Protocol fixture management and PCAP replay.
- Virtual time and deterministic timers.
- Scenario DSL.
- Conformance scenario packs.
- Load and soak profiles.
- Chaos and fault injection hooks.
- Evidence output for RFC 006.
2.2 Out of Scope
- Production NF logic.
- Standards certification by external bodies.
- Full radio access network simulation beyond interfaces required for core testing.
3. Design Goals
3.1 Security
- Test secrets must be synthetic and clearly marked.
- Fixtures containing real subscriber data are forbidden.
- Negative tests must cover malformed and hostile peer behavior.
- Testbed artifacts must not weaken production code paths.
3.2 Performance
- Simulators must support both deterministic unit-scale tests and high-rate load tests.
- Virtual time should make timer-heavy procedures fast and deterministic.
- Load profiles must be reproducible.
3.3 Maintainability
- One scenario DSL across all CNFs.
- Reusable protocol fixtures and peer simulators.
- Test evidence links back to RFC 006 requirement IDs.
- Each simulator has a documented fidelity level.
3.4 Functionality
- Support component, integration, end-to-end, conformance, chaos, and performance testing.
- Support both in-process and Kubernetes-deployed test modes.
- Support golden traces and expected state assertions.
4. Crate and Tooling Layout
```text
crates/opc-testbed/
  src/
    lib.rs
    scenario.rs
    virtual_time.rs
    assertions.rs
    fixtures.rs
    pcap.rs
    load.rs
    evidence.rs
    chaos.rs
    simulators/
      nrf.rs
      amf.rs
      smf.rs
      upf.rs
      epc.rs
      gnb.rs
      ue.rs
      ausf.rs
      udm.rs
      pcf.rs
      nssf.rs
      scp.rs
      sepp.rs
```
Each NF MAY also provide opc-<nf>-testkit, but NF testkits SHOULD build on
opc-testbed.
5. Scenario DSL
Scenarios are declarative:
```yaml
id: AMF-REG-001
title: UE registration success
requirements:
  - REQ-3GPP-TS23502-R17-4.2.2-001
topology:
  nfs:
    amf: { image: opc-amf:test }
    nrf: { simulator: nrf-basic }
    ausf: { simulator: ausf-5g-aka }
    udm: { simulator: udm-auth-sdm }
steps:
  - send_ngap:
      from: gnb-1
      to: amf
      message: InitialUEMessage.registration_request
  - expect_sbi:
      from: amf
      to: ausf
      operation: Nausf_UEAuthentication.Authenticate
  - expect_ngap:
      from: amf
      to: gnb-1
      message: InitialContextSetupRequest
assertions:
  - amf.ue_context.state == REGISTERED
```
The DSL MUST be versioned and schema-validated.
6. Simulator Fidelity Levels
| Level | Meaning |
|---|---|
| stub | fixed responses only |
| stateful-mock | protocol-aware state machine, simplified |
| procedure-faithful | follows normative procedure enough for conformance |
| load-model | optimized for traffic generation |
| adversarial | emits malformed, delayed, duplicated, or hostile behavior |
Every simulator MUST declare its fidelity level per interface.
7. Virtual Time
The testbed MUST provide a virtual clock compatible with RFC 008 runtime clocks.
Use cases:
- NAS timers,
- PFCP heartbeat,
- NRF heartbeat,
- retry/backoff,
- session lease expiry,
- SMS retry/expiry,
- retention jobs.
Tests MUST NOT sleep real time for long protocol timers when virtual time can advance deterministically.
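A virtual clock of the kind described here can be quite small. The sketch below is an illustration under assumptions, not the `opc-testbed` implementation: `VirtualClock`, `schedule`, and `advance` are hypothetical names, and the timer names and durations in the usage are arbitrary.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;
use std::time::Duration;

/// Virtual clock: time moves only when the test calls `advance`.
#[derive(Default)]
pub struct VirtualClock {
    now: Duration,
    // Min-heap of (deadline, timer name); `Reverse` flips the max-heap order.
    timers: BinaryHeap<Reverse<(Duration, String)>>,
}

impl VirtualClock {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn now(&self) -> Duration {
        self.now
    }

    /// Registers a named timer that expires `after` from the current virtual time.
    pub fn schedule(&mut self, after: Duration, name: &str) {
        self.timers.push(Reverse((self.now + after, name.to_string())));
    }

    /// Advances virtual time and returns the timers that expired, in deadline order.
    pub fn advance(&mut self, by: Duration) -> Vec<String> {
        self.now += by;
        let mut fired = Vec::new();
        while let Some(Reverse((deadline, _))) = self.timers.peek() {
            if *deadline > self.now {
                break;
            }
            let Reverse((_, name)) = self.timers.pop().expect("peeked entry exists");
            fired.push(name);
        }
        fired
    }
}
```

With this shape, a test that covers an hour-long expiry calls `advance(Duration::from_secs(3600))` and inspects the returned timer names, with no real sleep involved.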
8. Protocol Fixtures and PCAP
Fixtures MUST include:
- source standard reference,
- release/version,
- generation tool or capture provenance,
- whether synthetic or captured,
- sanitization status,
- expected decode result,
- linked requirement IDs.
Real customer/subscriber captures are forbidden in the public repository.
PCAP replay MUST support:
- timestamp-preserving mode,
- accelerated mode,
- deterministic mode,
- packet mutation for fuzz-style tests.
9. Peer Simulators
Minimum simulator set:
- UE/NAS procedure driver.
- gNB/NGAP over SCTP driver.
- NRF SBI simulator.
- AUSF/UDM auth and subscription simulators.
- SMF/UPF/PFCP simulator pair.
- EPC and untrusted-access peer skeletons such as PGW S2b and Diameter metadata peers. These must consume SDK protocol-crate decoded views and must not introduce local product parsers.
- PCF policy simulator.
- NSSF slice selection simulator.
- SCP routing simulator.
- SEPP partner simulator.
- SMSC/SMSF/SMPP simulators.
Simulators MUST expose deterministic state assertions.
10. Test Modes
| Mode | Purpose |
|---|---|
| in-process | fast component integration |
| multi-process | local network behavior |
| kind | Kubernetes operator/chart validation |
| hardware-lab | SR-IOV/AF_XDP/real NIC validation |
| chaos | failure injection |
| soak | long-running reliability |
The same scenario SHOULD run in multiple modes where practical.
11. Fault Injection
Faults:
- packet loss,
- reordering,
- duplication,
- malformed protocol messages,
- delayed responses,
- peer restart,
- NRF outage,
- token expiry,
- backend timeout,
- clock skew,
- node drain,
- network partition.
Faults MUST be declarative in scenarios and evidence-linked.
12. Load Profiles
Load profiles define:
- arrival distribution,
- subscriber population,
- slice distribution,
- DNN distribution,
- session duration,
- mobility/handover rate,
- message mix,
- target throughput,
- duration,
- pass/fail SLOs.
Profiles MUST be reproducible from seeds.
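Seed reproducibility can be illustrated with a small deterministic generator that drives an arrival process. The sketch assumes exponential inter-arrival times (a Poisson arrival model), which the RFC does not mandate; `SplitMix64` is a well-known public-domain generator used here only because it is short, and `arrival_times` is a hypothetical helper.

```rust
/// SplitMix64 generator: tiny, deterministic, and dependency-free.
pub struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    pub fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    pub fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E3779B97F4A7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58476D1CE4E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D049BB133111EB);
        z ^ (z >> 31)
    }

    /// Uniform float in [0, 1) built from the top 53 bits.
    pub fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Cumulative arrival times in seconds for `count` events at `rate_per_sec`.
/// Assumption: exponential inter-arrival gaps; other distributions plug in here.
pub fn arrival_times(seed: u64, rate_per_sec: f64, count: usize) -> Vec<f64> {
    let mut rng = SplitMix64::new(seed);
    let mut t = 0.0;
    (0..count)
        .map(|_| {
            let u = rng.next_f64();
            // Inverse-CDF sampling; 1 - u lies in (0, 1], so ln is defined.
            t += -(1.0 - u).ln() / rate_per_sec;
            t
        })
        .collect()
}
```

Recording the seed in the evidence record then lets a failed load run be replayed with the same arrival schedule.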
13. Assertions
Assertions may target:
- protocol messages,
- SBI calls,
- config state,
- session store records,
- metrics,
- logs,
- traces,
- alarms,
- Kubernetes status,
- evidence output.
Assertions MUST avoid depending on nondeterministic ordering unless explicitly marked.
14. Evidence Output
Each scenario run emits:
```json
{
  "scenario_id": "AMF-REG-001",
  "requirements": ["REQ-..."],
  "mode": "kind",
  "seed": 1234,
  "artifacts": ["trace.json", "metrics.prom", "events.json"],
  "outcome": "pass"
}
```
RFC 006 consumes these records for conformance reports.
15. Security and Privacy Rules
The testbed MUST:
- generate synthetic subscriber identities,
- mark all test keys as non-production,
- reject fixture import without sanitization metadata,
- prevent real bearer tokens from being stored in artifacts,
- redact logs and traces through RFC 010 redaction.
16. Module Ownership
| Module | Responsibility |
|---|---|
| opc-testbed-scenario | DSL schema, parser, executor |
| opc-testbed-time | virtual clock and timer control |
| opc-testbed-fixtures | fixture registry and provenance |
| opc-testbed-pcap | PCAP replay and mutation |
| opc-testbed-sim-nrf | NRF simulator |
| opc-testbed-sim-ran | UE/gNB/NAS/NGAP drivers |
| opc-testbed-sim-sbi | generic SBI producer/consumer mock |
| opc-testbed-chaos | failure injection |
| opc-testbed-evidence | RFC 006 result emission |
Agents implementing a new NF must add scenarios before declaring conformance.
17. Testing Requirements
17.1 Unit Tests
- DSL schema validation.
- Virtual time advancement.
- Fixture provenance validation.
- Deterministic seed behavior.
- Assertion engine.
17.2 Integration Tests
- Scenario runs against fake NF.
- Mock NRF discovery and token flow.
- PCAP replay into protocol parser.
- Kind-mode lifecycle install and readiness.
- Evidence JSON emitted and validated.
17.3 Fault Injection Tests
- Peer timeout.
- Malformed message.
- Duplicate message.
- Clock skew.
- Node drain in kind.
- Backend outage.
17.4 Performance Gates
- In-process scenarios start under 100 milliseconds.
- Virtual-time timer tests avoid long real sleeps.
- Load generator reports achieved TPS and latency.
- Scenario artifacts remain within configured size budgets.
18. Acceptance Criteria
This RFC is implemented when:
- A versioned scenario DSL exists.
- Shared peer simulators cover core 5G procedures.
- Virtual time is integrated with runtime/test clocks.
- Fixtures carry provenance and sanitization metadata.
- Scenarios emit RFC 006 evidence records.
- NF testkits build on the shared framework.
- Conformance and chaos scenarios are reusable across local and Kubernetes modes.
OPC-SDK-RFC-013: Fault Management and Alarm Substrate
Status: Draft for Implementation
Version: 1.0.0
Date: 2026-05-19
Audience: SREs, NF implementers, operator authors, observability engineers
1. Abstract
This RFC defines the OpenPacketCore fault management and alarm substrate. It standardizes alarm identity, severity, probable cause, affected object, raise/update/clear semantics, deduplication, suppression, correlation, Kubernetes condition mapping, gNMI/NETCONF notification projection, external fault-management sink integration, and evidence requirements.
Metrics, logs, and traces describe behavior. Alarms describe actionable service faults. Carrier CNFs need both.
2. Scope
2.1 In Scope
- Alarm model and lifecycle.
- Severity and probable-cause taxonomy.
- Affected-object naming.
- Raise, update, clear, acknowledge, suppress.
- Alarm correlation and deduplication.
- Mapping to Kubernetes conditions and events.
- Mapping to gNMI/NETCONF notifications.
- External FM sink integration.
- Alarm metrics, audit, and tests.
2.2 Out of Scope
- Full OSS/BSS ticketing implementation.
- Vendor-specific FM protocols unless implemented as adapters.
- Raw log aggregation.
- Performance SLO alerting rules outside CNF-generated alarms.
3. Design Goals
3.1 Security
- Alarms must not leak secrets or raw subscriber identifiers.
- Alarm administration must be authorized.
- Suppression and acknowledgement are audited.
- LI/security alarms must preserve regulated handling boundaries.
3.2 Performance
- Raising an alarm must be cheap and non-blocking.
- Alarm storms must be deduplicated and rate-limited.
- External sink outages must not block packet or request handling.
3.3 Maintainability
- One alarm vocabulary across all CNFs.
- Stable alarm IDs and probable causes.
- Generated YANG notification projection.
- Shared testkit for alarm lifecycle.
3.4 Functionality
- Support active and historical alarms.
- Support severity changes.
- Support clear conditions.
- Support suppression windows.
- Support external sinks and local query.
4. Alarm Model
```rust
pub struct Alarm {
    pub alarm_id: AlarmId,
    pub alarm_type: AlarmType,
    pub severity: Severity,
    pub probable_cause: ProbableCause,
    pub affected_object: AffectedObject,
    pub tenant: Option<TenantId>,
    pub slice: Option<Snssai>,
    pub region: Option<RegionId>,
    pub text: RedactedText,
    pub details: AlarmDetails,
    pub raised_at: Timestamp,
    pub updated_at: Timestamp,
    pub cleared_at: Option<Timestamp>,
    pub correlation_id: Option<CorrelationId>,
}
```
AlarmId MUST be stable for the same active fault instance.
5. Severity
Severity levels:
| Severity | Meaning |
|---|---|
| critical | service outage, data loss, security boundary failure |
| major | serious degradation or redundancy loss |
| minor | limited impairment with workaround |
| warning | approaching fault or policy exception |
| indeterminate | fault detected but impact unknown |
| cleared | fault no longer active |
Severity mapping MUST be consistent across CNFs.
6. Probable Cause Taxonomy
The SDK maintains a versioned taxonomy:
- config-apply-failed
- config-drift-detected
- certificate-expiring
- certificate-expired
- identity-unavailable
- authorization-policy-invalid
- session-store-unavailable
- lease-lost
- backend-timeout
- nrf-unreachable
- sbi-overload
- peer-unreachable
- packet-drop-threshold
- dataplane-preflight-failed
- storage-corruption
- audit-chain-invalid
- key-unavailable
- li-delivery-failed
- charging-export-failed
- privacy-policy-violation
Per-NF causes may be added but MUST be namespaced.
7. Affected Object
Affected objects use structured names:
```rust
pub enum AffectedObject {
    NfInstance { kind: NfKind, instance: InstanceId },
    Interface { nf: InstanceId, name: String },
    Peer { nf: InstanceId, peer_id: String },
    SessionStore { nf: InstanceId, shard: Option<String> },
    Slice { snssai: Snssai },
    Tenant { tenant: TenantId },
    Certificate { key_id: KeyId },
    DataPlaneQueue { nf: InstanceId, interface: String, queue: u16 },
}
```
Raw subscriber identifiers MUST NOT be affected-object names.
8. Alarm Lifecycle
States:
- raised
- updated
- acknowledged
- suppressed
- cleared
- expired
Lifecycle rules:
- A repeated raise with same dedup key updates the active alarm.
- Clear requires a matching active alarm; a clear with no match is a no-op that only increments a metric.
- Acknowledgement does not clear.
- Suppression does not delete history.
- Severity downgrade is an update, not clear plus raise.
9. Deduplication and Correlation
Dedup key:
alarm_type || probable_cause || affected_object || tenant || slice
Correlation groups related alarms, such as:
- NRF unavailable causing SBI discovery failures.
- certificate expiry causing mTLS failures.
- session store outage causing lease lost alarms.
Correlation MUST NOT hide critical alarms; it only helps presentation.
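The lifecycle rules in section 8 and the dedup key above can be sketched with a map keyed on the five dedup fields. Everything here is illustrative: the field types are simplified to strings, `AlarmManager` and its methods are hypothetical, and suppression, history, and correlation are left out.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity {
    Warning,
    Minor,
    Major,
    Critical,
}

/// Dedup key fields from section 9, simplified to strings for the sketch.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct DedupKey {
    pub alarm_type: String,
    pub probable_cause: String,
    pub affected_object: String,
    pub tenant: Option<String>,
    pub slice: Option<String>,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ActiveAlarm {
    pub severity: Severity,
    pub raise_count: u64,
    pub acknowledged: bool,
}

#[derive(Default)]
pub struct AlarmManager {
    active: HashMap<DedupKey, ActiveAlarm>,
    /// Counts clears that had no matching active alarm.
    pub clear_without_active: u64,
}

impl AlarmManager {
    /// A repeated raise with the same key updates the active alarm in place,
    /// including severity changes in either direction.
    pub fn raise(&mut self, key: DedupKey, severity: Severity) -> &ActiveAlarm {
        let alarm = self.active.entry(key).or_insert(ActiveAlarm {
            severity,
            raise_count: 0,
            acknowledged: false,
        });
        alarm.severity = severity;
        alarm.raise_count += 1;
        alarm
    }

    /// Acknowledgement marks the alarm but leaves it active.
    pub fn acknowledge(&mut self, key: &DedupKey) -> bool {
        match self.active.get_mut(key) {
            Some(alarm) => {
                alarm.acknowledged = true;
                true
            }
            None => false,
        }
    }

    /// Clearing without a matching active alarm only bumps a counter.
    pub fn clear(&mut self, key: &DedupKey) -> bool {
        if self.active.remove(key).is_some() {
            true
        } else {
            self.clear_without_active += 1;
            false
        }
    }

    pub fn active_count(&self) -> usize {
        self.active.len()
    }
}
```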
10. Suppression
Suppression may be:
- maintenance window,
- known outage,
- test mode,
- dependency alarm correlation.
Suppression requires authorization and audit. Security-critical alarms SHOULD NOT be suppressible unless carrier policy explicitly allows it.
11. Storage
The alarm store MUST support:
- active alarm query,
- historical alarm query,
- append-only lifecycle events,
- bounded retention,
- tenant/slice filtering,
- tamper-evident audit for admin actions.
Local storage may use RFC 001 persistence for management alarms. High-volume alarm history SHOULD be exported to an external FM system.
12. Projection to Kubernetes
Alarms map to Kubernetes Conditions and Events:
- critical/major active alarms can drive Ready=False or Degraded=True according to NF policy,
- warning alarms usually do not change readiness,
- clear events update conditions when no other active alarm holds the state.
Condition reason strings MUST be stable.
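As an illustration of the projection, the function below derives two conditions from the set of currently active severities. The policy it encodes (critical drives Ready=False, critical or major drives Degraded=True) is an assumed example of one NF policy, not an SDK rule, and `Conditions` and `project` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity {
    Critical,
    Major,
    Minor,
    Warning,
    Indeterminate,
}

#[derive(Debug, PartialEq, Eq)]
pub struct Conditions {
    pub ready: bool,
    pub degraded: bool,
}

/// Recomputes conditions from all active alarms, so clearing one alarm only
/// changes a condition when no other active alarm still holds it.
pub fn project(active: &[Severity]) -> Conditions {
    Conditions {
        // Assumed policy: only critical alarms take the NF out of Ready.
        ready: !active.contains(&Severity::Critical),
        // Assumed policy: critical or major alarms mark the NF Degraded.
        degraded: active
            .iter()
            .any(|s| matches!(s, Severity::Critical | Severity::Major)),
    }
}
```

Deriving conditions from the full active set on every change, instead of toggling them per event, is what makes the "no other active alarm holds the state" rule fall out naturally.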
13. Projection to gNMI/NETCONF
The alarm subsystem MUST expose:
- active alarms operational tree,
- alarm history operational tree,
- notifications for raise/update/clear,
- authorized acknowledge/suppress operations.
YANG notification generation SHOULD use RFC 002 metadata and RFC 006 evidence tags.
14. External FM Sinks
Sink adapters:
- webhook,
- Kafka/NATS,
- OpenTelemetry events,
- SNMP/NETCONF adapter where needed,
- carrier OSS adapter.
External sink failure MUST:
- raise a sink alarm,
- buffer within limits if policy allows,
- never block fast paths,
- expose drop counters.
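The sink-failure requirements above suggest a bounded, non-blocking buffer between the alarm manager and each adapter. The following is a single-threaded sketch with hypothetical names (`SinkBuffer`, `offer`, `flush`); a real adapter would need a concurrent queue and would also raise the sink alarm.

```rust
use std::collections::VecDeque;

/// Bounded buffer in front of an external sink.
pub struct SinkBuffer {
    queue: VecDeque<String>,
    capacity: usize,
    /// Number of events dropped because the buffer was full.
    pub dropped: u64,
}

impl SinkBuffer {
    pub fn new(capacity: usize) -> Self {
        Self { queue: VecDeque::new(), capacity, dropped: 0 }
    }

    /// Never blocks: when the buffer is full the event is dropped and counted.
    pub fn offer(&mut self, event: String) -> bool {
        if self.queue.len() >= self.capacity {
            self.dropped += 1;
            false
        } else {
            self.queue.push_back(event);
            true
        }
    }

    /// Delivers buffered events in order, stopping at the first failure so
    /// undelivered events stay queued for the next attempt.
    pub fn flush<F: FnMut(&str) -> bool>(&mut self, mut deliver: F) -> usize {
        let mut sent = 0;
        while let Some(event) = self.queue.front() {
            if !deliver(event) {
                break;
            }
            self.queue.pop_front();
            sent += 1;
        }
        sent
    }

    /// Current queue depth, suitable for a queue-depth gauge.
    pub fn depth(&self) -> usize {
        self.queue.len()
    }
}
```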
15. Alarm Sources
Common sources:
- RFC 001 config commit failures,
- RFC 003 identity/key/cert failures,
- RFC 004 session store and lease failures,
- RFC 007 SBI overload/discovery failures,
- RFC 008 runtime task failures,
- RFC 009 lifecycle migration failures,
- RFC 011 data-plane preflight and drop thresholds,
- RFC 010 privacy/legal-hold/export failures.
16. Observability
Required metrics:
- opc_alarm_active{severity,cause}
- opc_alarm_events_total{event,severity,cause}
- opc_alarm_suppressed_total{cause}
- opc_alarm_sink_delivery_total{sink,outcome}
- opc_alarm_sink_queue_depth{sink}
- opc_alarm_clear_without_active_total{cause}
Alarm text MUST be redacted through RFC 010.
17. Configuration Model
Shared YANG groupings SHOULD include:
- alarms/severity-policy
- alarms/suppression
- alarms/sinks
- alarms/retention
- alarms/readiness-impact
- alarms/correlation
Per-NF YANG may add alarm thresholds, such as packet drop ratio or peer outage duration.
18. Module Ownership
| Module | Responsibility |
|---|---|
| opc-alarm-model | alarm structs, severity, causes |
| opc-alarm-store | active/history store |
| opc-alarm-manager | raise/update/clear/dedup |
| opc-alarm-policy | suppression and readiness impact |
| opc-alarm-k8s | condition/event mapping |
| opc-alarm-yang | gNMI/NETCONF operational projection |
| opc-alarm-sink | external sink adapters |
| opc-alarm-testkit | alarm lifecycle fixtures |
Agents adding new alarms must add taxonomy entries, tests, and evidence tags.
19. Testing Requirements
19.1 Unit Tests
- Dedup key stability.
- Severity transition.
- Clear behavior.
- Suppression authorization.
- Redaction.
- Readiness impact policy.
19.2 Integration Tests
- Runtime task failure raises alarm.
- Alarm maps to Kubernetes condition.
- Alarm notification appears on gNMI subscription.
- External sink receives raise/update/clear.
- Sink outage buffers or drops according to policy.
19.3 Fault Injection
- Alarm storm.
- Sink outage.
- Store unavailable.
- Unauthorized suppression attempt.
- Duplicate raise from many tasks.
19.4 Performance Gates
- Alarm raise common path does not block longer than 100 microseconds.
- Alarm storm of 10,000 duplicate events deduplicates without unbounded memory.
- External sink outage does not impact protocol request p99.
20. Acceptance Criteria
This RFC is implemented when:
- Every CNF uses shared alarm model and manager.
- Alarm severity and probable cause taxonomy are stable and versioned.
- Raise/update/clear semantics are deterministic.
- Kubernetes conditions and events are derived consistently.
- gNMI/NETCONF alarm operational state and notifications are available.
- Suppression and acknowledgement are authorized and audited.
- External sink failures do not block service paths.
- Alarm behavior is covered by shared testkit and evidence.
Architecture Decision Records
This directory contains accepted and proposed architecture decisions for the OpenPacketCore SDK hardening and management-plane work.
ADRs are the durable record of architectural intent. The audit completion reports and implementation status matrix record what was validated; these ADRs record why the shape of the SDK is what it is. Proposed ADRs are included here when they gate in-progress work, but they do not authorize implementation until accepted.
Index
| ADR | Decision |
|---|---|
| 0001 | Config management is secure by default, commit-confirmed, audited, and explicitly authorized. |
| 0002 | Config persistence HA uses ConsensusConfigStore with Raft-style quorum safety, authenticated transport, durable membership, and snapshot integrity. |
| 0003 | Authoritative session state uses quorum ordered-log replication with majority-supported repair, not standalone SQLite HA. |
| 0004 | Production identity, TLS, keys, and audit integrity are explicit SDK substrates with fail-closed adapters. |
| 0005 | Runtime health, admin/probe routes, metrics, and alarms are shared SDK surfaces with production authorization and redaction. |
| 0006 | Storage, security, runtime, HA, and release evidence are validated through fail-closed fault injection. |
| 0007 | Operator lifecycle policy logic lives in Rust SDK crates as reusable policy engines. |
| 0008 | Kubernetes operator integration is demonstrated by a Go reference harness without becoming a product CNF operator. |
| 0009 | Production data-plane claims require explicit node-resource, BPF, pod-security, and fallback validation. |
| 0010 | RFC 006 evidence, SBOM/VEX, provenance, bundle verification, performance baselines, and gates are first-class release inputs. |
| 0011 | opc-amf-lite is the SDK vertical integration proof, not a product NF. |
| 0012 | Diagnostics safety and privacy governance boundaries are structured, fail-closed, and compile-gated. |
| 0013 | NGAP requires generated ASN.1 APER code; hand-written and FFI codecs are rejected. |
| 0014 | rustls/tokio-only dependency policy, no gRPC stack in SDK crates, and a measured (not aspirational) MSRV. |
| 0015 | Protocol codecs are proven against spec-authored byte fixtures, never only their own encoder output. |
| 0016 | (proposed) tonic/prost are permitted only for opc-gnmi-server as the ADR 0014 §3 exception; core SDK crates stay gRPC-free. |
| 0017 | Explicitly allowlisted Linux kernel UAPI sys crates, including opc-libsctp-sys and opc-linux-xfrm-sys, hold all unsafe UAPI FFI; this OS-transport exception to ADR 0014 §8 does not reopen ADR 0013's rejection of foreign C codec FFI. |
| 0018 | EPC and untrusted-access additions are limited to SDK-owned reusable mechanisms; product policy, deployment defaults, ePDG orchestration, and carrier-readiness claims remain product-owned. |
ADR 0001: Secure Config Management
Status
Accepted
Date
2026-06-08
Context
The SDK exposes shared configuration management primitives that downstream CNFs will use for production configuration changes. Early helper APIs made it too easy to wire allow-all authorization or treat commit-confirmed behavior as a test-only convention.
For carrier deployments, configuration writes must be explicit, authorized, recoverable, and auditable. Pending configuration must either be confirmed before its deadline or roll back to a confirmed point without silently accepting unsafe state.
Decision
Configuration management is secure by default:
- Production-facing ConfigBus constructors require an explicit ConfigAuthorizer.
- Allow-all construction is limited to clearly named dev/test helpers.
- Commit-confirmed state is persisted durably with deadline metadata.
- Expired pending commits roll back to a previous confirmed configuration.
- Failed rollback or failed confirmation fences the bus into recovery-required state instead of allowing further writes.
- Configuration audit records are persisted after redaction and protected by a hash chain/HMAC.
Consequences
Downstream CNFs must provide an authorization adapter rather than relying on SDK defaults. Tests can still use dev-only allow-all constructors, but production call sites are visibly different.
Rollback and recovery behavior is now part of the SDK contract. Operators can recover from failed commits, but they cannot pretend a pending or failed commit is a confirmed production state.
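The commit-confirmed contract can be pictured as a three-state machine. The sketch below is not the `opc-config-bus` API: `Bus`, `BusState`, and the method names are hypothetical, configurations are reduced to strings, and persistence, authorization, and audit are omitted.

```rust
use std::time::Duration;

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum BusState {
    Confirmed { config: String },
    Pending { candidate: String, previous: String, deadline: Duration },
    /// Fenced: no further writes until an operator recovers the bus.
    RecoveryRequired,
}

pub struct Bus {
    pub state: BusState,
}

impl Bus {
    /// Starts a commit-confirmed window; only allowed from a confirmed state.
    pub fn commit_confirmed(
        &mut self,
        candidate: &str,
        now: Duration,
        window: Duration,
    ) -> Result<(), &'static str> {
        let previous = match &self.state {
            BusState::Confirmed { config } => config.clone(),
            BusState::Pending { .. } => return Err("a commit is already pending"),
            BusState::RecoveryRequired => return Err("bus is fenced: recovery required"),
        };
        self.state = BusState::Pending {
            candidate: candidate.to_string(),
            previous,
            deadline: now + window,
        };
        Ok(())
    }

    /// Confirms the pending candidate if the deadline has not passed.
    pub fn confirm(&mut self, now: Duration) -> Result<(), &'static str> {
        match std::mem::replace(&mut self.state, BusState::RecoveryRequired) {
            BusState::Pending { candidate, deadline, .. } if now <= deadline => {
                self.state = BusState::Confirmed { config: candidate };
                Ok(())
            }
            other => {
                self.state = other;
                Err("nothing to confirm, or deadline passed")
            }
        }
    }

    /// Handles deadline expiry: roll back, or fence the bus if rollback failed.
    pub fn expire(&mut self, now: Duration, rollback_succeeded: bool) {
        match std::mem::replace(&mut self.state, BusState::RecoveryRequired) {
            BusState::Pending { previous, deadline, .. } if now > deadline => {
                if rollback_succeeded {
                    self.state = BusState::Confirmed { config: previous };
                }
                // Otherwise the bus stays in RecoveryRequired.
            }
            other => self.state = other,
        }
    }
}
```

Using `RecoveryRequired` as the placeholder during each transition means any path that forgets to restore the state fails closed instead of leaving a pending commit looking confirmed.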
Evidence
- crates/opc-config-bus/src/lib.rs
- crates/opc-persist/src/backend.rs
- crates/opc-persist/tests/persist.rs
- docs/implementation-status.md
ADR 0002: Config Store Consensus HA
Status
Accepted
Date
2026-06-08
Context
Single-node SQLite persistence is not acceptable for carrier HA configuration claims. The SDK needed a production HA config persistence path with leader fencing, majority commit behavior, restart recovery, and authenticated transport. It also needed to make clear that standalone SQLite remains a development, lab, conformance, or explicitly accepted edge/single-replica profile.
Decision
High-availability configuration persistence is provided by
ConsensusConfigStore.
The consensus backend uses:
- Durable cluster membership and node identity checks.
- Leader election, current-term no-op gating, and majority write commitment.
- Linearizable read verification instead of follower-local reads.
- Authenticated mTLS/SPIFFE transport using shared identity/TLS substrates.
- Controlled TCP server lifecycle with bounded concurrency, read timeouts, and explicit shutdown.
- Snapshot persistence and HMAC verification.
- Non-voter catch-up and promotion guards for membership changes.
- Metrics and chaos/failover tests for partitions, restart, rejoin, and stale leader behavior.
Consequences
Config HA is a quorum system, not a property of SQLite. Any production claim must use the consensus backend or an equivalent adapter that satisfies the same contract.
The SDK accepts additional operational complexity so correctness is explicit: membership, certificates, node identity, quorum availability, and recovery state all become deployment responsibilities.
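The quorum arithmetic behind "majority write commitment" is small enough to state directly. These helpers are illustrative and not taken from `ConsensusConfigStore`; they assume `acks` counts the leader's own durable write and that non-voters are excluded from `voters`.

```rust
/// Smallest number of voters that forms a majority.
pub fn majority(voters: usize) -> usize {
    voters / 2 + 1
}

/// A write is committed once a majority of voters has acknowledged it.
/// Assumption: `acks` includes the leader and excludes non-voting members.
pub fn is_committed(acks: usize, voters: usize) -> bool {
    voters > 0 && acks >= majority(voters)
}

/// Number of voter failures the cluster tolerates while still committing.
pub fn tolerated_failures(voters: usize) -> usize {
    voters.saturating_sub(majority(voters))
}
```

A four-voter cluster needs three acknowledgements and tolerates one failure, the same as a three-voter cluster, which is why odd voter counts are the usual choice.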
Evidence
- crates/opc-persist/src/consensus.rs
- crates/opc-persist/tests/consensus_tests.rs
- crates/opc-persist/tests/tcp_consensus_tests.rs
- docs/ha-design.md
- docs/consensus-operator-runbook.md
ADR 0003: Session Store Quorum Replication
Status
Accepted
Date
2026-06-08
Context
Authoritative telecom session state cannot rely on single-node storage, wall-clock last-writer-wins, or best-effort replica repair. Session records need monotonic fencing, compare-and-set semantics, TTL handling, watch resume support, and stale replica recovery.
Decision
Authoritative session HA is implemented as quorum ordered-log replication in
QuorumSessionStore.
The session store contract includes:
- Monotonic fences and CAS for authoritative writes.
- Durable ordered replication logs for lease acquire, renew, release, CAS, delete, TTL refresh, and batch operations.
- Idempotent replay using log position, generation, fence, and transaction ID.
- Majority-supported committed-prefix repair for stale or divergent replicas.
- Watch/change-stream resume cursors.
- Partial-quorum write rollback to prevent failed writes from resurrecting during later catch-up.
- Truthful capability reporting so standalone SQLite does not claim replicated behavior.
Consequences
Standalone SqliteSessionBackend remains useful as a durable local backend,
but it is not HA. Production CNFs that need authoritative session HA must use
QuorumSessionStore or an equivalent replicated profile.
The SDK favors fail-closed reads over returning divergent session state when a majority cannot agree.
Evidence
- crates/opc-session-store/src/quorum.rs
- crates/opc-session-store/src/sqlite.rs
- crates/opc-session-testkit/
- docs/ha-design.md
- docs/operator-readiness.md
ADR 0004: Security Identity, Keying, And Audit Integrity
Status
Accepted
Date
2026-06-08
Context
The SDK needs reusable production security substrates rather than bespoke per-CNF wiring. Identity, mTLS transport, key retrieval, audit redaction, and tamper evidence must be consistent across config, session, persistence, alarm, and operator-facing paths.
Decision
Production security uses explicit shared adapters:
- opc-identity watches SPIFFE SVIDs and trust bundles.
- opc-tls builds reloadable mTLS client/server configurations from identity material.
- opc-key provides durable KmsKeyProvider adapters over authenticated KMS transports or local Unix-socket agents.
- Memory key providers remain deterministic test/conformance adapters.
- Persistence audit records redact sensitive values before storage and before hash-chain/HMAC material is calculated.
- Alarm administration uses NACM-backed authorization and durable audit sinks.
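The ordering in the audit item (redact first, then compute chain material) can be sketched as follows. This is an illustration only: the redaction rule, the AuditEntry type, and the function names are hypothetical, and std's DefaultHasher stands in for the real hash-chain/HMAC primitive, which it is not a substitute for in production.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stored audit entry: only the redacted text and the chain digest are kept.
pub struct AuditEntry {
    pub redacted: String,
    pub chain: u64,
}

/// Hypothetical redaction rule: mask the value of `key=value` tokens with a sensitive key.
pub fn redact(line: &str) -> String {
    const SENSITIVE: [&str; 3] = ["supi", "token", "key"];
    line.split(' ')
        .map(|tok| match tok.split_once('=') {
            Some((k, _)) if SENSITIVE.contains(&k) => format!("{}=[REDACTED]", k),
            _ => tok.to_string(),
        })
        .collect::<Vec<_>>()
        .join(" ")
}

/// Digest over (previous digest, redacted text). DefaultHasher is a NON-cryptographic stand-in.
fn link(prev: u64, redacted: &str) -> u64 {
    let mut h = DefaultHasher::new();
    prev.hash(&mut h);
    redacted.hash(&mut h);
    h.finish()
}

/// Redact before anything is stored or hashed, so raw values never enter the chain.
pub fn append(log: &mut Vec<AuditEntry>, raw: &str) {
    let redacted = redact(raw);
    let prev = log.last().map_or(0, |e| e.chain);
    let chain = link(prev, &redacted);
    log.push(AuditEntry { redacted, chain });
}

/// Recompute the chain from the stored (redacted) text and compare digests.
pub fn verify(log: &[AuditEntry]) -> bool {
    let mut prev = 0u64;
    for entry in log {
        if link(prev, &entry.redacted) != entry.chain {
            return false;
        }
        prev = entry.chain;
    }
    true
}
```

Because the digest is computed over the redacted text, the chain can be re-verified later from storage alone, without access to the original sensitive values.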
Consequences
Production deployments must supply real identity and KMS infrastructure. Unauthenticated TCP KMS and in-memory keys are not production key sources.
Security failures should fail closed and surface sanitized errors rather than leaking paths, SQL details, PEM material, keys, subscriber identifiers, or network addresses.
Evidence
- crates/opc-identity/
- crates/opc-tls/
- crates/opc-key/
- crates/opc-persist/src/backend.rs
- crates/opc-alarm/src/nacm_adapter.rs
- crates/opc-alarm/src/persist_adapter.rs
ADR 0005: Runtime Observability And Admin Probes
Status
Accepted
Date
2026-06-08
Context
Production CNFs need consistent runtime health, readiness, metrics, alarm visibility, and debug/admin routes. These surfaces must be shared and redaction-safe, not reimplemented by each NF.
Decision
Runtime observability is a shared SDK surface:
- opc-runtime owns liveness, readiness, startup, debug, and admin route semantics.
- Production and lab admin/probe/debug endpoints require bearer token authorization.
- /metrics exports Prometheus text through a shared SdkMetrics registry.
- Metrics use low-cardinality, redaction-safe labels.
- Runtime, ConfigBus, persistence, session store, NACM, and alarms report counters/gauges/histograms through the shared metrics surface.
- Runtime failures and drain failures raise SDK-managed alarms.
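One way to keep labels low-cardinality and redaction-safe is to draw label values from a closed enum instead of free-form strings. The sketch below renders such a counter in the Prometheus text exposition format. OutcomeCounter, Outcome, and the metric name used in the usage note are hypothetical and are not the SdkMetrics API.

```rust
/// Hypothetical closed set of label values; no caller-supplied strings reach the label.
#[derive(Clone, Copy)]
pub enum Outcome {
    Ok = 0,
    Denied = 1,
    Error = 2,
}

const OUTCOME_LABELS: [&str; 3] = ["ok", "denied", "error"];

/// Hypothetical counter with exactly one time series per enum variant.
pub struct OutcomeCounter {
    name: &'static str,
    counts: [u64; 3],
}

impl OutcomeCounter {
    pub fn new(name: &'static str) -> Self {
        Self { name, counts: [0; 3] }
    }

    pub fn inc(&mut self, outcome: Outcome) {
        self.counts[outcome as usize] += 1;
    }

    /// Render in Prometheus text format: a TYPE line, then one sample per label value.
    pub fn render(&self) -> String {
        let mut out = format!("# TYPE {} counter\n", self.name);
        for (label, count) in OUTCOME_LABELS.iter().zip(self.counts.iter()) {
            out.push_str(&format!("{}{{outcome=\"{}\"}} {}\n", self.name, label, count));
        }
        out
    }
}
```

With a counter named opc_config_commits_total, the series count is fixed at three regardless of traffic, and no subscriber or path data can appear in a label.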
Consequences
Downstream CNFs should wire the SDK runtime and metrics instead of creating incompatible health/admin conventions.
Debug endpoints are production-controlled operational surfaces. They must never expose raw configs, tokens, SQL, file paths, certificate material, subscriber IDs, or other sensitive data.
Evidence
- crates/opc-runtime/src/admin.rs
- crates/opc-runtime/src/health.rs
- crates/opc-redaction/src/metrics.rs
- crates/opc-sdk-integration/tests/observability.rs
- docs/operator-readiness.md
ADR 0006: Fail-Closed Fault Injection Validation
Status
Accepted
Date
2026-06-08
Context
Happy-path tests are insufficient for SDK stability claims. Storage, KMS, SPIFFE, consensus, session replication, runtime, and evidence release gates all have failure modes where unsafe behavior can look like success unless tested directly.
Decision
The SDK validates production safety with explicit fault injection and chaos tests:
- Persistence can simulate disk full, fsync/write failure, corrupt database, corrupt WAL, failed rollback target load, failed rollback point creation, and audit-chain corruption.
- Config and session HA are tested under partitions, crashes, stale leaders, stale fences, rejoin/catch-up, split-brain healing, and partial writes.
- SPIFFE and KMS are tested under expiry, rotation, bundle removal, timeout, and unavailability.
- Runtime and admin routes are tested for authentication, malformed requests, timeouts, and redaction.
- Release gates are tested for missing evidence, malformed JSON, dirty provenance, missing signatures, tampered bundles, and unsafe evidence values.
Consequences
Test-only fault hooks are acceptable when explicitly gated and named as dangerous test hooks. Production APIs should not expose fault injection knobs.
Regression tests must prefer fail-closed assertions: no publish, no partial commit, no stale promotion, no sensitive error leak, and no unsafe readiness claim.
Evidence
- crates/opc-sdk-integration/tests/fault_injection.rs
- crates/opc-security-testkit/
- crates/opc-session-testkit/
- crates/opc-evidence/tests/evidence_pipeline.rs
ADR 0007: Operator Lifecycle Rust Policy Core
Status
Accepted
Date
2026-06-08
Context
The SDK is not a product operator, but downstream CNF operators need common policy decisions for compatibility, admission, configuration apply, migration, drain, rollback, and fleet status. Those policy decisions should be reusable from Rust SDK code and Go Kubernetes operators.
Decision
Operator lifecycle policy lives in Rust SDK crates:
- operator-lifecycle owns lifecycle phases, admission checks, compatibility matrix policy, config-apply decisions, and rollback constraints.
- operator-controller owns deterministic conversion helpers, migration plan execution, drain client orchestration, and multi-cluster status aggregation.
- Policy functions use structured inputs/outputs and fail closed on unknown, malformed, stale, or unsupported state.
- Error messages are sanitized before crossing operator or webhook boundaries.
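A policy function in this style might look like the sketch below: structured input, a structured decision, and a deny result for anything unknown, stale, or unsupported. The types, phase names, and reason strings are hypothetical and are not taken from the operator-lifecycle crate.

```rust
#[derive(Debug, PartialEq)]
pub enum Decision {
    Allow,
    /// Reasons are fixed strings, so no input data leaks across the boundary.
    Deny(&'static str),
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Phase {
    Pending,
    Running,
    Draining,
}

/// Hypothetical phase parser: anything outside the known set is `None`.
pub fn parse_phase(raw: &str) -> Option<Phase> {
    match raw {
        "Pending" => Some(Phase::Pending),
        "Running" => Some(Phase::Running),
        "Draining" => Some(Phase::Draining),
        _ => None,
    }
}

/// Hypothetical structured input for a config-apply decision.
pub struct ApplyRequest<'a> {
    pub phase: &'a str,
    pub observed_generation: u64,
    pub desired_generation: u64,
    pub target_version: &'a str,
}

pub fn decide_config_apply(req: &ApplyRequest<'_>, supported_versions: &[&str]) -> Decision {
    // Unknown or malformed phase: deny instead of guessing.
    let phase = match parse_phase(req.phase) {
        Some(phase) => phase,
        None => return Decision::Deny("unknown lifecycle phase"),
    };
    // Stale status: the observed state lags the desired generation.
    if req.observed_generation < req.desired_generation {
        return Decision::Deny("stale status");
    }
    // Unsupported version: not present in the compatibility list.
    if !supported_versions.contains(&req.target_version) {
        return Decision::Deny("unsupported target version");
    }
    match phase {
        Phase::Running => Decision::Allow,
        Phase::Pending | Phase::Draining => Decision::Deny("phase does not accept config apply"),
    }
}
```

Allow is reachable only after every check has passed, so adding a new phase or input field without updating the function results in a deny by default.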
Consequences
The SDK can expose consistent policy decisions to multiple operator implementations without forcing all Kubernetes code into Rust.
Rust lifecycle crates do not deploy workloads by themselves. Product CNF operators still own reconciliation of Deployments, StatefulSets, Services, protocol-specific CRDs, and live cluster behavior.
Evidence
- crates/operator-lifecycle/
- crates/operator-controller/
- crates/operator-lifecycle-cli/
- docs/operator-readiness.md
ADR 0008: Go Reference Operator Boundary
Status
Accepted
Date
2026-06-08
Context
The original repository direction is polyglot: SDK core behavior is Rust, while
Kubernetes operator integration should use Go controller-runtime, which is the
first-class Kubernetes operator ecosystem. At the same time, this repository is
an SDK, not an AMF/SMF/UPF product operator.
Decision
The repository includes a Go reference operator harness under
operators/sdk-reference-operator.
The Go harness demonstrates:
- CRD API versions and conversion wiring.
- Validating webhook integration.
- Controller reconciliation shape and status updates.
- Kustomize/RBAC/cert-manager/manager manifests.
- A Go-to-Rust JSON CLI bridge to operator-lifecycle-cli.
The harness is explicitly not a production CNF operator and does not encode product-specific reconciliation.
Consequences
Downstream CNF teams get a concrete Go integration pattern without importing product behavior into the SDK repository.
Reference tests use Go unit tests, fake-client controller/webhook tests, rendered Kustomize manifests, and Rust CLI contract tests. Product CNF operators must add envtest, kind, and real-cluster end-to-end tests around their own reconciliation logic.
Manager images must package both the Go manager binary and the Rust
operator-lifecycle-cli, or set OPERATOR_LIFECYCLE_CLI_PATH to a valid CLI
location.
Evidence
- operators/sdk-reference-operator/
- crates/operator-lifecycle-cli/
- docs/operator-readiness.md
- docs/implementation-status.md
ADR 0009: Platform Preflight Resource Contract
Status
Accepted
Date
2026-06-08
Context
Carrier CNFs often depend on CPU isolation, NUMA locality, hugepages, NIC capabilities, SR-IOV, AF_XDP/eBPF, CNI behavior, and pod-security exceptions. These assumptions cannot remain tribal knowledge or comments in deployment manifests.
Decision
Production data-plane readiness is an explicit SDK contract:
- opc-node-resources models resource profiles and node capability reports.
- CPU manager, topology manager, isolated/reserved CPU sets, NUMA mappings, hugepage pools, NIC capabilities, and data-plane interfaces are validated.
- AF_XDP/eBPF artifacts require digest pinning, signer/evidence identity, program type, attach point, and allowed capability checks.
- Pod-security exceptions must be minimal and evidence-linked.
- Lab/dev fallback paths fail closed in production.
- Operator admission and config-apply paths consume the preflight report.
Consequences
Production manifests must provide explicit resource profiles and node capability evidence. If evidence is absent, stale, or incompatible, the SDK policy blocks rollout instead of silently downgrading to lab behavior.
The Go reference operator projects this contract into CRD fields but does not replace product-specific operator resource management.
Evidence
- crates/opc-node-resources/src/lib.rs
- crates/operator-lifecycle/src/admission.rs
- crates/operator-lifecycle/src/config_apply.rs
- operators/sdk-reference-operator/api/
ADR 0010: Release Assurance Evidence Pipeline
Status
Accepted
Date
2026-06-08
Context
The SDK needs release evidence that is machine-readable and fail-closed. Manual claims like "tests passed" are insufficient for conformance, supply-chain assurance, and auditability.
Decision
opc-evidence is the RFC 006 release-assurance pipeline.
It provides:
- Source extraction for RFC 006 tags such as @spec, @req, @conformance, @gap, @security, @performance, and @test.
- Deterministic CycloneDX SBOM generation from local Cargo manifests and lock data.
- VEX policy result and record validation.
- SLSA/in-toto-style provenance tied to commit, builder, input materials, output digests, and dirty/clean worktree state.
- Bundle assembly and verification with canonical manifest signing bytes.
- Signer/verifier traits and deterministic in-process test signing.
- Performance baseline schema with redaction-safe environment metadata and regression status.
- PR/release gate policy that fails closed on missing evidence, missing signatures, tampering, mismatched commits, dirty release provenance, malformed JSON, or unsafe evidence content.
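The "canonical manifest signing bytes" item depends on the signer and the verifier deriving identical bytes from the same set of artifacts, whatever order they were collected in. The sketch below shows one possible canonicalization: sort by artifact name and emit one name/digest pair per line. The line format and the function name are assumptions for illustration; the actual opc-evidence encoding is not described in this ADR.

```rust
use std::collections::BTreeMap;

/// Hypothetical canonical form: entries sorted by name, one `name<TAB>digest<LF>` line each.
/// Fails closed on duplicate names and on fields that would make the encoding ambiguous.
pub fn canonical_manifest_bytes(artifacts: &[(&str, &str)]) -> Result<Vec<u8>, &'static str> {
    let mut sorted: BTreeMap<&str, &str> = BTreeMap::new();
    for (name, digest) in artifacts {
        let ambiguous = |s: &str| s.is_empty() || s.contains('\t') || s.contains('\n');
        if ambiguous(name) || ambiguous(digest) {
            return Err("manifest field is empty or contains a separator");
        }
        if sorted.insert(*name, *digest).is_some() {
            return Err("duplicate artifact name");
        }
    }
    // BTreeMap iterates in key order, which makes the output independent of input order.
    let mut out = Vec::new();
    for (name, digest) in &sorted {
        out.extend_from_slice(name.as_bytes());
        out.push(b'\t');
        out.extend_from_slice(digest.as_bytes());
        out.push(b'\n');
    }
    Ok(out)
}
```

Signing these bytes rather than a serializer's output avoids verification failures caused by key ordering or whitespace differences between producers.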
Consequences
Release pipelines must treat evidence artifacts as required inputs, not as optional reports.
Real Sigstore/Cosign keyless signing remains an external signer adapter boundary. The SDK owns the signing/verifier interface and test verifier, not a hard dependency on one hosted signing provider.
Evidence
- crates/opc-evidence/src/extract.rs
- crates/opc-evidence/src/sbom.rs
- crates/opc-evidence/src/vex.rs
- crates/opc-evidence/src/provenance.rs
- crates/opc-evidence/src/bundle.rs
- crates/opc-evidence/src/performance.rs
- crates/opc-evidence/src/policy.rs
- crates/opc-evidence/tests/evidence_pipeline.rs
ADR 0011: First NF Vertical Proof
Status
Accepted
Date
2026-06-08
Context
The SDK needed proof that its seams compose in a real NF-shaped control-plane slice. Toy examples can validate local APIs, but they do not prove that runtime, config, session, identity, KMS, NACM, alarms, metrics, and HA recovery work together.
Decision
opc-amf-lite is the first NF vertical integration proof.
It demonstrates:
- Runtime startup and supervised workers.
- Secure ConfigBus integration.
- Consensus-backed configuration persistence.
- Quorum session storage with read-repair behavior.
- KMS-backed encryption paths.
- NACM authorization and audit.
- Alarm and metrics integration.
- HA recovery and failure validation.
opc-amf-lite is not a product AMF. It is a reusable SDK proof slice that
downstream CNFs can study when wiring their own production crates.
Consequences
The SDK can claim that its core seams compose into an NF-shaped control-plane vertical. It cannot claim complete AMF/SMF/UPF protocol coverage from this slice.
Future NF crates should follow the integration pattern but own their procedure-specific logic, protocol fidelity, and product tests.
Evidence
- crates/opc-amf-lite/
- crates/opc-amf-lite/README.md
- docs/implementation-status.md
- docs/operator-readiness.md
ADR 0012: Diagnostics Safety and Privacy Governance
Status
Accepted
Date
2026-06-08
Context
Diagnostics, support bundles, exports, and evidence files pose a high risk of leaking sensitive subscriber identifiers (SUPI, IMSI, MSISDN), secrets, cryptographic credentials, database internals, and local filesystem paths. The SDK required a structured, fail-closed diagnostics and privacy boundary to satisfy RFC 010.
Decision
Establish a clear, multi-crate boundary for diagnostics safety and privacy governance:
- Structured, Redacted Support Bundles:
  - Diagnostic data is collected as structured DiagnosticEntry variants.
  - Support bundles are redacted prior to serialization using redact_support_bundle.
  - The engine cleans sensitive subscriber identifiers, IPs, SPIFFE IDs, JWTs, paths, database errors, and secrets, producing a RedactionSummary.
  - Unknown or unsafe attachments fail closed in Production mode.
- Declarative Retention & Legal Holds:
  - RetentionPolicy schema in opc-data-governance dictates retention duration, data class, and disposal action.
  - Policies validate durational boundaries and block deletion/disposal decisions when a legal hold flag is active.
- Classification-Preserving Exports:
  - ExportedItem in opc-export encapsulates the payload and ExportMetadata.
  - Production validation rejects raw sensitive payloads unless they are encrypted.
- Analytics Minimization:
  - MinimizationPolicy in opc-privacy enforces k-anonymity cohort sizing thresholds, binning, and subscriber ID digest hashing.
  - Cohorts below the threshold or direct identifiers are rejected.
- Data-Governance Evidence Gating:
  - Release gates require DataGovernanceEvidenceReport validation.
  - The evaluator parses the report and scans it to ensure no absolute paths, credentials, or raw IPs are present.
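The analytics minimization rule (binning plus a k-anonymity cohort threshold) can be sketched as a small histogram function. This is a minimal illustration under assumed semantics, with hypothetical names; it is not the MinimizationPolicy API and it omits the subscriber ID digest step.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Bin each sample, count DISTINCT subscribers per bin, and publish only bins whose
/// cohort has at least `k` members. Returns bin lower bound -> cohort size.
/// A zero bin width is treated as an invalid policy and yields no output (fail closed).
pub fn k_anonymous_histogram(
    samples: &[(&str, u32)],
    bin_width: u32,
    k: usize,
) -> BTreeMap<u32, usize> {
    if bin_width == 0 {
        return BTreeMap::new();
    }
    let mut cohorts: BTreeMap<u32, BTreeSet<&str>> = BTreeMap::new();
    for (subscriber, value) in samples {
        // Integer binning: 12, 15 and 19 all fall into bin 10 when bin_width is 10.
        let bin = (*value / bin_width) * bin_width;
        cohorts.entry(bin).or_default().insert(*subscriber);
    }
    cohorts
        .into_iter()
        // Suppress cohorts below the threshold; identifiers never leave this function.
        .filter(|(_, members)| members.len() >= k)
        .map(|(bin, members)| (bin, members.len()))
        .collect()
}
```

Counting distinct subscribers rather than raw samples matters: a single subscriber with many samples in one bin must not be able to satisfy the threshold alone.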
Consequences
- Diagnostic attachments and support bundles cannot silently leak raw sensitive identifiers or secrets in Production mode.
- Downstream CNFs can safely collect support bundles and perform analytics exports without violating privacy regulations.
- Data-governance compliance is automatically checked and enforced at release compile/gate time.
Evidence
- crates/opc-redaction/src/support_bundle.rs
- crates/opc-data-governance/src/retention.rs
- crates/opc-export/src/lib.rs
- crates/opc-privacy/src/lib.rs
- crates/opc-evidence/src/data_governance.rs
- crates/opc-sdk-integration/tests/privacy_governance.rs
ADR 0013: NGAP ASN.1 Strategy
Status
Accepted — amended 2026-06 with first implementation experience
Date
2026-06-11
Context
NGAP (NG Application Protocol, 3GPP TS 38.413) is required for gNodeB↔AMF signaling over the N2 interface. Unlike GTP-U (fixed binary headers) or PFCP (TLV IEs), NGAP is specified in ASN.1 and encoded with APER (Aligned Packed Encoding Rules). Hand-writing an APER codec is error-prone, high-maintenance, and incompatible with the SDK's goal of spec-traceable, fuzz-safe protocol code.
The SDK currently has:
- opc-protocol: zero-copy codec framework with BorrowDecode/Encode
- opc-proto-gtpu: GTP-U codec following the above framework
- opc-proto-pfcp: PFCP codec (planned, TS 29.244)
NGAP is the next mandatory codec after PFCP, but its ASN.1 nature makes it structurally different from the existing binary codecs.
Decision
We will not hand-write NGAP APER parsing or code-generation.
Instead, we will evaluate and adopt a maintained Rust ASN.1 / APER toolchain that can consume the 3GPP ASN.1 modules directly. The evaluation criteria are:
- MSRV 1.81 compatibility: must compile on the SDK's declared MSRV.
- License compatibility: Apache-2.0 or MIT, no copyleft dependencies.
- #![forbid(unsafe_code)]: generated and runtime code must be pure safe Rust.
- Fuzzability: the generated codec must integrate with cargo-fuzz and tolerate hostile inputs without panics.
- Maintenance risk: actively maintained, responsive to security issues, ideally with existing 3GPP or telecom user base.
Options Evaluated
Option A: hampi / rasn ecosystem
- hampi (GitHub: repnop/hampi): ASN.1 compiler generating Rust structs with APER/UPER/OER support.
- rasn (GitHub: XAMPPRocky/rasn): runtime ASN.1 codec library with derive macros.
Pros: Pure Rust, no_std capable, active development, Apache-2.0.
Cons: hampi's APER support is partial (v0.x); no proven 3GPP NGAP corpus
yet; smaller community than protobuf alternatives.
Verdict: Leading candidate. Requires a spike to compile 3GPP R18 NGAP ASN.1
modules and validate against known-good PCAPs.
Option B: Generated code from asn1-codecs (ERI framework)
The asn1-codecs family (used by some telecom OSS projects) generates Rust
from ASN.1 via an intermediate representation.
Pros: Explicitly designed for telecom ASN.1 modules.
Cons: Mixed maintenance status; some forks carry unsafe code; licensing unclear on some forks; heavy dependency tree.
Verdict: Fallback if Option A fails the spike. Requires legal review of upstream license before adoption.
Option C: FFI to srsRAN / OAI C NGAP codec
Reuse the established C NGAP implementations from srsRAN or OpenAirInterface.
Pros: Battle-tested against live networks; spec-complete.
Cons: FFI requires unsafe blocks, violating the SDK's #![forbid(unsafe_code)]
invariant. Cross-compilation for musl/target environments adds complexity.
Memory-safety bugs in C code become SDK security issues.
Verdict: Rejected. The forbid(unsafe_code) constraint is architectural and
non-negotiable for a carrier-grade CNF security substrate.
Option D: Hand-written subset
Implement only NGSetupRequest/Response and InitialUEMessage by hand and omit the rest.
Pros: Zero new dependencies; full control over decode limits and fuzzing.
Cons: Maintenance nightmare on every 3GPP release; no spec-traceability to ASN.1 modules; high bug rate.
Verdict: Rejected. The SDK explicitly rejected hand-written ASN.1 for NGAP at the architecture level.
Recommendation
Proceed with Option A (hampi/rasn).
Phased plan:
- Spike (v0.2.x follow-up): Compile 3GPP R18 NGAP ASN.1 modules with hampi/rasn, generate structs, and validate against a small corpus of known-good NGAP PDUs (extracted from 3GPP test specifications or opc-testbed fixtures).
- Subset crate (v0.3.0): Create opc-proto-ngap wrapping only NGSetupRequest/Response and InitialUEMessage to prove the integration pattern with opc-protocol's decode-context limits.
- Full message surface (v0.4.0+): Expand to the full NGAP message and IE surface required by the AMF-lite reference implementation.
Consequences
- The SDK gains a maintainable, spec-traceable NGAP codec path.
- Downstream NF operators must accept a generated-code dependency (acceptable given the alternative of FFI or hand-written bugs).
- If hampi/rasn fails the spike, we fall back to Option B with a license review gate.
Implementation experience (2026-06)
The first opc-proto-ngap attempt followed the phased plan and stalled at
step 1 on toolchain compatibility, not on the codec approach itself:
- rasn (0.22 and 0.25) failed the then-declared MSRV of 1.81. Its derive implementation transitively requires uuid ^1.11, which resolves to a getrandom release whose manifest uses edition2024, which Cargo 1.81 cannot parse. No pinning escape existed within rasn's requirements.
- Investigating the failure exposed that the workspace's own dependency graph had already drifted past MSRV 1.81 through the same getrandom release (reached via uuid, tempfile, and quickcheck), i.e. the MSRV declaration no longer reflected reality independent of NGAP.
- hampi was not pursued: it had no meaningful release since 2021 and its APER encoder was still marked work-in-progress then, an unacceptable abandonment risk for a protocol codec.
Consequences acted on:
- The workspace MSRV was raised to 1.88, the actual floor of the resolved dependency graph (set by time; edition2024 support needs ≥ 1.85, the icu stack ≥ 1.86). This repairs the MSRV gate and removes the blocker on Option A. See ADR 0014 for the toolchain/dependency policy.
- The Option A spike should be re-run against rasn on the raised MSRV before any consideration of Option B (asn1-codecs, which still carries its license-review gate per the comparison above).
Evidence
- Gap register updated: GAP-PROTO-003 now records the partially closed codec boundary.
- docs/implementation-status.md linked.
ADR 0014: Dependency and Toolchain Policy
Status
Accepted (amended 2026-06-12: crypto-provider scope and JWT backend, point 9)
Date
2026-06-11
Context
The SDK is the foundation for downstream CNFs with carrier security and audit requirements. Every dependency the workspace takes is inherited by every downstream NF, and several incidents during development showed that implicit policy does not survive contact with routine maintenance:
- The declared MSRV silently drifted out of truth: routine lockfile updates pulled a getrandom release whose manifest requires edition2024, which the Cargo version the workspace claimed to support cannot parse. The breakage reached the graph through three independent parents (uuid, tempfile, quickcheck), one of them in the production graph.
- An HTTP adapter was nearly built on a second client stack when the workspace already standardized on one.
- A license gate failure appeared days after the dependency that caused it, because the gate's evidence had been captured before the dependency landed.
Decision
1. TLS: rustls only. No openssl/native-tls anywhere in the graph, including transitively via feature defaults (disable default-features where needed). Rationale: a single auditable TLS stack and reproducible cross-compilation, with no coupling to a system OpenSSL/native-tls library (dynamic linking, version skew). This rule targets system/dynamic crypto; vendored crypto built statically from source as part of the graph (e.g. ring, aws-lc-sys) is permitted (see point 9).
2. Async runtime: tokio only. No second runtime, no runtime-agnostic abstraction layers.
3. No gRPC stack (tonic/prost) in SDK crates. Internal transports (e.g. session replication) use hand-specified framing over the existing tokio/rustls stack; external 3GPP interfaces are HTTP/2 (hyper) or raw protocol codecs. A future exception requires an ADR, not a Cargo.toml edit. (An ASN.1 codec dependency for NGAP per ADR 0013 is the kind of exception that warrants that process.)
4. HTTP clients: hyper is the workspace HTTP stack. reqwest (rustls-backed, built on hyper) is tolerated in leaf adapter crates (currently opc-key-vault) but must not spread into core crates.
5. MSRV is the measured floor of the resolved graph, not an aspiration. Currently 1.88 (set by time). The CI msrv job compiles the whole workspace (--all-targets --all-features) on exactly the declared version; a lockfile update that raises the floor must raise rust-version, this ADR's record, and the contributor docs in the same change. Raising MSRV is acceptable for a pre-1.0 SDK; lying about it is not.
6. Licenses: Apache-2.0/MIT/BSD-family only, enforced by cargo deny with a curated allow-list; uncommon-but-permissive licenses are admitted as per-crate exceptions in deny.toml, never as global allows.
7. Every new dependency is justified in the PR description (what it replaces, why the existing stack cannot serve, license, MSRV impact).
8. unsafe_code = "forbid" is workspace-wide and non-negotiable, which also rules out FFI-based protocol libraries (see ADR 0013).
9. Cryptographic providers. rustls uses the ring provider for TLS; opc-sbi's jsonwebtoken uses the aws_lc_rs backend for JWT-SVID signature verification. Both are vendored, statically-built crypto (no system OpenSSL), consistent with point 1. aws_lc_rs is chosen over jsonwebtoken's pure-Rust rust_crypto backend because the latter pulls the rsa crate, which carries RUSTSEC-2023-0071 (the "Marvin" timing sidechannel) with no fixed release available upstream. That advisory is unreachable for our verify-only (public-key) usage, since the SDK never holds or decrypts with an RSA private key, but aws_lc_rs is constant-time and keeps both security gates (cargo audit, cargo deny) green without a standing advisory exception, which matters for a security SDK whose advisory surface is inherited by every downstream consumer. Future goal: migrate JWT verification to the pure-Rust rust_crypto backend once the rsa crate ships a constant-time release (its in-progress crypto-bigint migration), dropping the aws-lc-sys/cmake build step and fully satisfying the pure-Rust ideal.
Consequences
- Some integrations cost more to build (hand-rolled framing instead of tonic; hyper plumbing instead of convenience clients) in exchange for a dependency graph that downstream carriers can audit once and trust.
- MSRV moves forward with the ecosystem rather than pinning old dependency lines; downstream consumers should track a recent stable toolchain.
- scripts/publish-order.py --check and cargo deny check are the mechanical halves of this policy; this ADR is the rationale they enforce.
ADR 0015: Protocol Codec Conformance Policy
Status
Accepted
Date
2026-06-11
Context
The SDK ships wire codecs for 3GPP protocols (GTP-U, PFCP, NAS-5GS, with NGAP planned). Codec bugs are uniquely dangerous: an encoder and decoder written by the same hand are internally consistent, so round-trip tests pass perfectly while every byte on the wire is wrong for a real peer. This failure mode occurred twice during development — a scrambled PFCP header flag layout and a byte-swapped Outer Header Creation description field — and in both cases the existing test suite was green because the fixtures had been derived from the codec's own output.
Decision
Every protocol codec crate (opc-proto-*) MUST satisfy all of the
following before it is merged, and CONFORMANCE.md must claim nothing the
tests do not prove:
- Spec-authored fixtures. Conformance tests include byte fixtures hand-authored from the 3GPP specification (or captured from an independent implementation), with octet-level comments citing the spec section. Fixtures derived from this codec's own encoder do not count as conformance evidence — they detect regressions, not wire-format errors.
- Byte-exact round-trips. decode → encode must reproduce the input bytes exactly for every fixture, including unknown/vendor-extension elements, which must be preserved raw.
- Declared canonicalization. Where a typed view legitimately normalizes (zeroing spare bits, dropping forward-compatibility trailing octets that the spec requires receivers to ignore), CONFORMANCE.md must say so explicitly, and a raw byte-preserving layer must remain available for forwarding paths.
- Hostile-input safety. No panics on any input: checked arithmetic on all length/offset math, enforced decode limits (message length, element count, recursion depth), and negative tests for truncation, overflow, and depth bombs.
- Fuzzing. A fuzz target over the decode surface with a seed corpus of spec-valid messages, registered in the fuzz CI workflow. The fuzz crate must compile in CI even when fuzzing is not executed.
- Framework fit. Codecs implement the opc-protocol traits (BorrowDecode/OwnedDecode/Encode) and carry @spec/@req traceability tags so RFC 006 evidence tooling can index them.
- CONFORMANCE.md enumerates exactly which messages, elements, and fields are covered, at which 3GPP release, and what belongs outside the codec boundary.
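The hostile-input and byte-exact requirements can be illustrated with a toy TLV format. The sketch below is an assumption-laden example, not an opc-protocol codec: the wire layout (1-byte type, 2-byte big-endian length, value), the MAX_ELEMENTS limit, and all names are hypothetical. It shows bounds-checked slicing instead of indexing, checked length arithmetic, an element-count limit, and unknown element types carried through as raw bytes.

```rust
#[derive(Debug, PartialEq)]
pub enum DecodeError {
    /// Input ended before a header or value was complete.
    Truncated,
    /// More elements than the decode limit allows.
    TooManyElements,
}

/// Borrowed view of one element; unknown `kind` values are kept as raw bytes.
#[derive(Debug, PartialEq)]
pub struct Element<'a> {
    pub kind: u8,
    pub value: &'a [u8],
}

/// Hypothetical decode limit on the number of elements per message.
pub const MAX_ELEMENTS: usize = 64;

pub fn decode(mut input: &[u8]) -> Result<Vec<Element<'_>>, DecodeError> {
    let mut out = Vec::new();
    while !input.is_empty() {
        if out.len() == MAX_ELEMENTS {
            return Err(DecodeError::TooManyElements);
        }
        // `get` returns None instead of panicking when the range is out of bounds.
        let header = input.get(..3).ok_or(DecodeError::Truncated)?;
        let len = usize::from(u16::from_be_bytes([header[1], header[2]]));
        // Checked arithmetic on offsets, as a matter of habit even where overflow is unlikely.
        let end = 3usize.checked_add(len).ok_or(DecodeError::Truncated)?;
        let value = input.get(3..end).ok_or(DecodeError::Truncated)?;
        out.push(Element { kind: header[0], value });
        input = &input[end..];
    }
    Ok(out)
}

/// Re-encode elements; returns None if a value does not fit the 16-bit length field.
pub fn encode(elements: &[Element<'_>]) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    for e in elements {
        if e.value.len() > usize::from(u16::MAX) {
            return None;
        }
        let len = e.value.len() as u16;
        out.push(e.kind);
        out.extend_from_slice(&len.to_be_bytes());
        out.extend_from_slice(e.value);
    }
    Some(out)
}
```

A conformance test in the spirit of the policy would feed decode a fixture written by hand from the specification and then assert that encode reproduces the same bytes. A fixture produced by encode itself would only detect regressions.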
Consequences
- Writing a codec costs more up front: authoring fixtures from the spec is slower than round-tripping the encoder. That cost is the point — it is the only test construction that catches self-consistent wire errors.
- Reviews of codec changes start from the fixtures: a reviewer verifies bytes against the cited spec section before reading the implementation.
- opc-proto-gtpu, opc-proto-pfcp, and opc-proto-nas conform today and serve as the templates; future codecs (NGAP per ADR 0013) inherit the same bar.
ADR 0016: Northbound gRPC Stack Exception (gNMI)
Status
Accepted
Date
2026-06-13
Context
ADR 0014 §3 states: "No gRPC stack (tonic/prost) in SDK crates. … A future
exception requires an ADR, not a Cargo.toml edit." That rule keeps the core
SDK dependency graph lean and auditable: internal transports use hand-specified
framing over tokio/rustls, and external 3GPP interfaces are HTTP/2 (hyper) or
raw protocol codecs.
The management-plane work introduces opc-gnmi-server (see
docs/design/opc-gnmi-server-spec.md).
gNMI (OpenConfig) is a gRPC service:
its contract is a protobuf service over HTTP/2. There is no rustls/hyper-only
or hand-framed path to a conformant gNMI server — a client (gnmic, gNMIc,
OpenConfig collectors) speaks gRPC and nothing else. So opc-gnmi-server cannot
exist without a gRPC stack, and per ADR 0014 §3 that requires this ADR.
gNMI is a distinct dependency category from the cases ADR 0014 §3 was written for. It is a northbound management interface embedded by a CNF that chooses to expose gNMI — not an internal SDK transport and not a 3GPP data-plane codec.
Decision
Permit tonic, prost, and prost-types only for the northbound gNMI server
crate, opc-gnmi-server. prost-types is included because the vendored
OpenConfig gNMI proto uses standard Google protobuf types such as
google.protobuf.Any. tonic-build is permitted only as that crate's
build-time proto-generation dependency if the Phase-0 spike chooses build-time
generation. Any future gRPC-based management crate requires an explicit ADR
amendment and an update to the mechanical allow-list; this exception is not a
blanket "management crates may use gRPC" policy. Specifically:
1. Scope boundary. tonic/prost/prost-types MUST NOT appear in any core SDK crate (opc-config-bus, opc-config-model, opc-persist, opc-runtime, opc-identity, opc-tls, opc-nacm, opc-yanggen, the opc-proto-* codecs, opc-sbi, the opc-mgmt-* foundation crates, etc.). They live only in opc-gnmi-server unless this ADR is amended. ADR 0014 §3 remains in force everywhere else. Inside this SDK workspace, no other crate may depend on or re-export opc-gnmi-server; downstream CNFs outside the workspace opt in to gNMI by depending on the server crate directly.
2. Boundary is enforced mechanically. scripts/check-management-plane-policy.py --check asserts that no crate outside the explicit allowed set directly or transitively depends on tonic/prost/prost-types/tonic-build, or on opc-gnmi-server itself. The CI job runs this gate. The initial allowed set is exactly opc-gnmi-server.
3. One TLS stack only (ADR 0014 §1 preserved). opc-gnmi-server serves tonic over the rustls::ServerConfig produced by opc-mgmt-transport (ring provider), not tonic's own/native TLS. No openssl/native-tls enters the graph (verify tonic/hyper features with default-features = false, rustls only).
4. Dependency hygiene (ADR 0014 §6/§7). tonic/prost/prost-types are MIT/Apache, compatible with the license gate. The PR adding them justifies them per §7 and passes cargo deny. The pinned tonic version MUST compile on the workspace MSRV (currently 1.88, ADR 0014 §5); the Phase-0 spike validates this before the version is pinned, and any MSRV bump follows the §5 process.
5. Proto pin and generation mode. The gNMI proto is vendored at an exact tag under crates/opc-gnmi-server/proto/; the vendored files carry the upstream tag/commit in their header, and the advertised gNMI version string derives from this pin. The Phase-0 spike must choose and document exactly one generation mode:
   - build-time generation with tonic-build, which adds an explicit protoc build prerequisite and a CI check that generated output is reproducible; or
   - checked-in generated Rust, which avoids protoc in downstream builds but requires a regeneration script and a CI drift check.

   In either mode, generated service code is treated as part of the opc-gnmi-server boundary and does not become a shared SDK dependency.
6. This exception does not generalize. It authorizes a gRPC server for a northbound management protocol that is gRPC by definition. It is not license to adopt gRPC for internal transports or to relax ADR 0014 §3 for core crates.
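The mechanical boundary check amounts to a reachability query over the dependency graph. The real gate is the Python script named above; the sketch below only illustrates the same idea in Rust, under the assumption that the graph is available as a crate-to-dependencies map. The function names and the crates opc-bad and opc-helper in the usage note are hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// True if `start` depends, directly or transitively, on any crate in `forbidden`.
fn reaches_forbidden<'a>(
    start: &'a str,
    deps: &BTreeMap<&'a str, Vec<&'a str>>,
    forbidden: &[&'a str],
) -> bool {
    let mut seen = BTreeSet::new();
    let mut stack = vec![start];
    while let Some(node) = stack.pop() {
        // Skip nodes already visited, so dependency cycles cannot loop forever.
        if !seen.insert(node) {
            continue;
        }
        for dep in deps.get(node).into_iter().flatten() {
            if forbidden.contains(dep) {
                return true;
            }
            stack.push(*dep);
        }
    }
    false
}

/// Workspace crates outside `allowed` that reach a forbidden crate, in workspace order.
pub fn boundary_violations<'a>(
    deps: &BTreeMap<&'a str, Vec<&'a str>>,
    workspace: &[&'a str],
    forbidden: &[&'a str],
    allowed: &[&'a str],
) -> Vec<&'a str> {
    workspace
        .iter()
        .copied()
        .filter(|krate| !allowed.contains(krate))
        .filter(|krate| reaches_forbidden(*krate, deps, forbidden))
        .collect()
}
```

With forbidden set to tonic, prost, prost-types, tonic-build, and opc-gnmi-server, and allowed set to opc-gnmi-server alone, a crate opc-bad that depends on opc-helper, which in turn depends on opc-gnmi-server, is reported together with opc-helper. A gate built this way passes only when the returned list is empty.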
Consequences
- A downstream CNF outside this workspace that embeds opc-gnmi-server inherits tonic/prost/prost-types. That is an explicit opt-in to gNMI; CNFs that do not expose gNMI never pull the stack.
- The core SDK graph stays gRPC-free and auditable, exactly as ADR 0014 §3 intends; only the optional northbound server adds gRPC.
- The mechanical gate from point 2 exists and runs in CI, so this exception's scope cannot silently erode — the same "implicit policy does not survive maintenance" lesson that motivated ADR 0014.
- NETCONF (
opc-netconf-server) is unaffected: it is XML over SSH/TLS and needs no gRPC stack.
ADR 0017: SCTP Transport Strategy and Unsafe-FFI Sys-Crate Boundary
Status
Accepted
Date
2026-06-13
Context
ADR 0014 §8 states unsafe_code = "forbid" is workspace-wide and
"non-negotiable, which also rules out FFI-based protocol libraries (see ADR
0013)." ADR 0013 rejected Option C — FFI to the srsRAN/OAI C NGAP codec —
because foreign C code parsing attacker-controlled bytes turns memory-safety bugs
into SDK security issues.
opc-sctp is required for CNFs that terminate
N2/NGAP or other SCTP interfaces. Unlike NGAP, SCTP is not a codec — it is an
OS transport. Linux implements SCTP in the kernel (lksctp); a userspace
program reaches it through SCTP sockets:
socket(AF_INET, SOCK_STREAM or SOCK_SEQPACKET, IPPROTO_SCTP), SCTP setsockopt
options, sendmsg/recvmsg with SCTP control messages, and, where necessary,
thin libsctp helper calls such as bind/send/receive variants over the same
kernel SCTP UAPI. Rust's std and tokio expose no SCTP socket API, so
reaching kernel SCTP requires libc/UAPI FFI, which is unsafe. ADR 0014 §8 was
written for protocol codec libraries and did not anticipate an OS-transport
syscall surface.
The distinction is decisive:
- ADR 0013's rejected FFI links a large foreign C parser (thousands of lines) that consumes attacker-controlled wire bytes. The attack surface is the C code itself.
- SCTP FFI is a thin wrapper over kernel socket UAPI and optional libsctp helper functions that themselves configure or call the kernel SCTP stack. The SCTP protocol implementation is the kernel, already trusted, exactly as for TCP/UDP. This is the same category of unsafe that tokio/mio already use internally for socket I/O in the workspace. The "foreign C parsing attacker bytes" risk ADR 0013 guarded against simply is not present.
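To make the "thin wrapper over kernel socket UAPI" point concrete, here is a minimal sketch of the kind of call that would live inside a sys crate. It is not the opc-libsctp-sys API. The function name open_sctp_seqpacket is hypothetical, the numeric constants are the Linux values and are assumptions for other platforms, and the socket symbol is declared directly against the platform C library rather than through any OpenPacketCore crate.

```rust
use std::io;
use std::os::fd::{FromRawFd, OwnedFd};
use std::os::raw::c_int;

// Linux constant values (assumption: Linux target; other platforms may differ).
const AF_INET: c_int = 2;
const SOCK_SEQPACKET: c_int = 5;
const IPPROTO_SCTP: c_int = 132;

unsafe extern "C" {
    // POSIX socket(2), resolved from the platform C library.
    fn socket(domain: c_int, ty: c_int, protocol: c_int) -> c_int;
}

/// Hypothetical helper: opens a one-to-many (SOCK_SEQPACKET) SCTP socket.
/// Returns an error if the kernel has no SCTP support available.
pub fn open_sctp_seqpacket() -> io::Result<OwnedFd> {
    // SAFETY: socket(2) takes only plain integers and has no pointer arguments.
    let fd = unsafe { socket(AF_INET, SOCK_SEQPACKET, IPPROTO_SCTP) };
    if fd < 0 {
        return Err(io::Error::last_os_error());
    }
    // SAFETY: `fd` is a freshly created descriptor that nothing else owns,
    // so transferring ownership to OwnedFd cannot cause a double close.
    Ok(unsafe { OwnedFd::from_raw_fd(fd) })
}
```

The unsafe here covers a syscall entry point and a descriptor hand-off, with no parsing of wire bytes in foreign code; the caller only ever sees an owned descriptor or an io::Error.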
Options
- A. Kernel SCTP behind a narrow opc-libsctp-sys sys crate. Thin libc/SCTP-UAPI FFI in one crate, including libsctp helpers only where the Linux SCTP API requires them; a safe opc-sctp wrapper above it. Linux-only.
- B. Userspace SCTP stack (pure Rust). Reimplement the SCTP transport protocol with no FFI. Rejected: a from-scratch transport-protocol implementation is large and security-sensitive (association state machine, retransmission, multihoming, chunk bundling) and is more likely to harbor exploitable bugs than thin syscall FFI over the hardened kernel stack; no maintained pure-Rust SCTP stack exists to adopt.
- C. Omit SCTP from the SDK. Ship no SCTP transport. Acceptable only if the first production CNF does not terminate N2/NGAP or any SCTP interface; it blocks N2-terminating CNFs.
Decision
Amend ADR 0014 §8 to permit a narrow, explicitly allowlisted unsafe exception pattern for Linux kernel UAPI sys crates, and adopt Option A when an SCTP-terminating CNF is in scope:
- opc-libsctp-sys provides thin FFI over Linux SCTP socket UAPI and minimal libsctp helpers where required. It is the only SCTP workspace crate permitted to contain unsafe; follow-on Linux kernel UAPI exceptions such as opc-linux-xfrm-sys must be separately and explicitly allowlisted by the same mechanical gate. It does not inherit [workspace.lints] (so the workspace-wide unsafe_code = "forbid" stays in force for every other crate); it sets its own local crate policy (unsafe_code = "allow" plus unsafe_op_in_unsafe_fn = "deny", or equivalent crate attributes) that allows unsafe only there, with a // SAFETY: comment required on every allowed unsafe token (unsafe block, unsafe fn, unsafe impl, unsafe trait, or unsafe extern block).
- opc-sctp (the public crate) is #![forbid(unsafe_code)] and exposes only safe async abstractions (associations, messages, events) over the sys crate, integrated with tokio::io::unix::AsyncFd (the spec's async model). Its manifest must declare the tokio features it relies on, including net, instead of relying on feature unification from unrelated workspace crates.
- Boundary is enforced mechanically. scripts/check-management-plane-policy.py --check token-scans OpenPacketCore workspace crate sources and asserts unsafe appears only in explicitly allowlisted Linux UAPI sys crates (opc-libsctp-sys and later, reviewed kernel-UAPI boundaries such as opc-linux-xfrm-sys); the same gate also rejects each allowed sys crate if it inherits [workspace.lints], rejects it if it lacks the required local unsafe lint policy, and requires each allowed unsafe token in that sys crate to be documented by an adjacent SAFETY: comment. The CI job runs this gate, so the exception cannot silently spread or become undocumented.
- ABI safety. Every C struct crossing the boundary has a struct-layout (size/alignment/offset) test; the sys crate builds on Linux in CI and compiles to a clean "unsupported platform" stub elsewhere.
- This exception pattern does not reopen ADR 0013. It authorizes FFI only to explicitly reviewed trusted Linux kernel UAPI boundaries such as SCTP socket/XFRM netlink calls and minimal helper calls that wrap those UAPIs. FFI that links a foreign C protocol codec (parsing attacker-controlled bytes — NGAP/NAS/etc.) remains rejected; those stay pure-Rust per ADR 0013/0015.
- SCTP is implemented per Option A behind this boundary, never as scattered unsafe and never as a userspace reimplementation without revisiting this ADR.
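The ABI-safety requirement in the list above (a size/alignment/offset test for each C struct crossing the boundary) might look like the following sketch. The SctpSndInfo mirror follows the struct sctp_sndinfo field order given in RFC 6458 §5.3.4 and assumes sctp_assoc_t is a 32-bit signed integer, as on Linux; the type and function names are hypothetical, and the expected numbers should be confirmed against the kernel header before being relied on.

```rust
use std::mem::{align_of, offset_of, size_of};

/// Rust mirror of `struct sctp_sndinfo` (field order per RFC 6458 §5.3.4).
/// Assumption: `sctp_assoc_t` is a 32-bit signed integer.
#[repr(C)]
#[derive(Debug, Clone, Copy, Default)]
pub struct SctpSndInfo {
    pub snd_sid: u16,
    pub snd_flags: u16,
    pub snd_ppid: u32,
    pub snd_context: u32,
    pub snd_assoc_id: i32,
}

/// Size, alignment, and per-field byte offsets of a struct.
#[derive(Debug, PartialEq, Eq)]
pub struct Layout {
    pub size: usize,
    pub align: usize,
    pub offsets: [usize; 5],
}

/// Layout the C side is expected to have under the assumptions above.
pub const EXPECTED_SNDINFO_LAYOUT: Layout = Layout {
    size: 16,
    align: 4,
    offsets: [0, 2, 4, 8, 12],
};

/// Layout the Rust compiler actually produced for the mirror type.
pub fn sndinfo_layout() -> Layout {
    Layout {
        size: size_of::<SctpSndInfo>(),
        align: align_of::<SctpSndInfo>(),
        offsets: [
            offset_of!(SctpSndInfo, snd_sid),
            offset_of!(SctpSndInfo, snd_flags),
            offset_of!(SctpSndInfo, snd_ppid),
            offset_of!(SctpSndInfo, snd_context),
            offset_of!(SctpSndInfo, snd_assoc_id),
        ],
    }
}

/// Returns an error describing the mismatch if the mirror drifts from the expected layout.
pub fn check_sndinfo_layout() -> Result<(), String> {
    let actual = sndinfo_layout();
    if actual == EXPECTED_SNDINFO_LAYOUT {
        Ok(())
    } else {
        Err(format!("layout drift: expected {EXPECTED_SNDINFO_LAYOUT:?}, got {actual:?}"))
    }
}
```

A test of this shape fails at build time in CI if a field is reordered, retyped, or padded differently, which is cheaper than discovering the mismatch as a corrupted control message at runtime.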
Consequences
- The workspace gains small, auditable OpenPacketCore Linux UAPI sys crates containing unsafe; downstream carrier auditors review those explicitly allowlisted sys crates rather than a diffuse unsafe surface, and unsafe_code = "forbid" remains true everywhere else.
- The CI gate from point 3 exists, mirroring the "policy must be mechanically enforced" lesson of ADR 0014.
- opc-sctp uses the non-inheritance mechanism and AsyncFd model described by this ADR. Its README and tests record the current capability profile.
- NGAP-over-SCTP wiring (PPID 60) is separate integration work and is not authorized to use FFI for the NGAP codec itself.
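The unsafe token scan that this ADR's CI gate performs can be approximated as follows. This is a deliberately simplified, line-based sketch in Rust, not the actual Python gate: it ignores block comments and string literals, treats "adjacent" as "same line or the comment line directly above", and the Finding type and scan function are hypothetical names.

```rust
/// True when `line` contains `unsafe` as a standalone identifier outside a `//` comment.
/// Identifiers such as `unsafe_code` do not match.
fn has_unsafe_token(line: &str) -> bool {
    let code = line.split("//").next().unwrap_or("");
    code.split(|c: char| !(c.is_alphanumeric() || c == '_'))
        .any(|tok| tok == "unsafe")
}

#[derive(Debug, PartialEq, Eq)]
pub enum Finding {
    /// `unsafe` appeared in a crate that is not on the allowlist.
    UnsafeOutsideAllowlist { krate: String, line: usize },
    /// `unsafe` appeared in an allowlisted crate without an adjacent SAFETY: comment.
    MissingSafetyComment { krate: String, line: usize },
}

/// Scans one crate's source text and reports policy findings (1-based line numbers).
pub fn scan(krate: &str, source: &str, allowlist: &[&str]) -> Vec<Finding> {
    let allowed = allowlist.contains(&krate);
    let lines: Vec<&str> = source.lines().collect();
    let mut findings = Vec::new();
    for (idx, line) in lines.iter().enumerate() {
        if !has_unsafe_token(line) {
            continue;
        }
        let line_no = idx + 1;
        if !allowed {
            findings.push(Finding::UnsafeOutsideAllowlist { krate: krate.to_string(), line: line_no });
            continue;
        }
        let on_same_line = line.contains("SAFETY:");
        let on_prev_line = idx > 0
            && lines[idx - 1].trim_start().starts_with("//")
            && lines[idx - 1].contains("SAFETY:");
        if !(on_same_line || on_prev_line) {
            findings.push(Finding::MissingSafetyComment { krate: krate.to_string(), line: line_no });
        }
    }
    findings
}
```

Scanning tokens rather than relying only on compiler lints means the check still runs when a crate has accidentally opted out of the workspace lint configuration, which is the failure mode the ADR is guarding against.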
OPC gNMI Server Design Spec
Status
Implemented foundation, owned by opc-gnmi-server.
Scope
opc-gnmi-server is the optional northbound gNMI server for CNFs that choose to
expose OpenConfig management. It is outside the core SDK dependency graph and is
the only workspace crate allowed to depend on tonic, prost, prost-types,
or tonic-build.
The crate owns:
- vendored gNMI protobuf bindings and the tonic service wrapper;
- authenticated gNMI-over-TLS listener integration;
- Capabilities, Get, Set, and Subscribe handling;
- OpenPacketCore commit-confirmed registered extension semantics;
- gNMI master-arbitration enforcement;
- schema-backed path, value, audit, metrics, and config-bus integration.
Security Contract
Production embeddings must construct GnmiServer with an explicit audit sink
through new, new_with_audit, new_with_arbitration, or
new_with_audit_and_arbitration. The tracing audit sink is available only
through *_dev_only constructors for tests, conformance fixtures, and local
development.
GnmiService::new requires an authenticated transport principal on every RPC.
The unauthenticated service wrapper is crate-private and compiled only for
tests. Runtime listeners must derive principals from the mTLS transport and
attach them to requests before dispatch.
Set commits submit complete candidates to opc-config-bus with the running
snapshot version they were built from. opc-config-bus enforces that base
version for candidate-bearing requests, so a stale gNMI Set cannot overwrite an
intervening commit.
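The base-version rule for Set commits is an optimistic-concurrency check. The sketch below illustrates the idea with a toy in-memory store; it is not the opc-config-bus API, and ConfigStore, Candidate, and CommitError are hypothetical names. Configuration is reduced to a string for brevity.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum CommitError {
    /// The candidate was built from a snapshot that is no longer the running one.
    StaleBase { expected: u64, found: u64 },
}

/// A complete candidate configuration plus the running version it was derived from.
pub struct Candidate {
    pub base_version: u64,
    pub config: String,
}

/// Toy stand-in for the component that owns the running configuration.
pub struct ConfigStore {
    version: u64,
    running: String,
}

impl ConfigStore {
    pub fn new(initial: &str) -> Self {
        Self { version: 0, running: initial.to_string() }
    }

    pub fn version(&self) -> u64 {
        self.version
    }

    pub fn running(&self) -> &str {
        &self.running
    }

    /// Accepts the candidate only if it was built from the current running version.
    /// Returns the new version on success.
    pub fn commit(&mut self, candidate: Candidate) -> Result<u64, CommitError> {
        if candidate.base_version != self.version {
            return Err(CommitError::StaleBase {
                expected: self.version,
                found: candidate.base_version,
            });
        }
        self.running = candidate.config;
        self.version += 1;
        Ok(self.version)
    }
}
```

If two writers both build candidates from version 0, the first commit succeeds and advances the version; the second is rejected as stale and must rebuild its candidate from the new running snapshot instead of overwriting the first writer's change.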
Extension Semantics
The OpenPacketCore commit-confirmed extension uses the experimental registered
extension ID documented in opc-gnmi-server. It is advertised only when the
extension registry enables it and master arbitration is also configured.
Every commit-confirmed Begin, Confirm, or Cancel Set must carry a valid master-arbitration extension. This binds control actions to the gNMI election fence for the tenant and role, preventing a different writer from confirming or cancelling another writer's pending commit unless it wins arbitration first.
Servers with arbitration disabled reject commit-confirmed registration at construction time.
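The fencing rule for commit-confirmed control actions can be sketched as a small state machine. This is an illustrative model under simplifying assumptions, not the opc-gnmi-server implementation: one fence instance covers a single tenant and role, election IDs are plain u128 values where a higher or equal ID wins, and the names CommitFence, Arbitration, Action, and Reject are hypothetical.

```rust
/// Master-arbitration data carried on a Set request (simplified).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Arbitration {
    pub tenant: String,
    pub role: String,
    pub election_id: u128,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    Begin,
    Confirm,
    Cancel,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Reject {
    MissingArbitration,
    WrongScope,
    NotMaster,
    NoPendingCommit,
    CommitAlreadyPending,
}

/// Fence for one tenant and role: tracks the highest election ID seen
/// and whether a confirmed-commit is pending.
pub struct CommitFence {
    tenant: String,
    role: String,
    highest: Option<u128>,
    pending: bool,
}

impl CommitFence {
    pub fn new(tenant: &str, role: &str) -> Self {
        Self { tenant: tenant.to_string(), role: role.to_string(), highest: None, pending: false }
    }

    /// Applies a commit-confirmed control action if the caller holds (or wins) mastership.
    pub fn handle(&mut self, action: Action, arb: Option<&Arbitration>) -> Result<(), Reject> {
        // Every control action must carry arbitration data.
        let arb = arb.ok_or(Reject::MissingArbitration)?;
        if arb.tenant != self.tenant || arb.role != self.role {
            return Err(Reject::WrongScope);
        }
        // A writer with a lower election ID than the current master is fenced out.
        if self.highest.is_some_and(|h| arb.election_id < h) {
            return Err(Reject::NotMaster);
        }
        self.highest = Some(arb.election_id);
        match (action, self.pending) {
            (Action::Begin, false) => {
                self.pending = true;
                Ok(())
            }
            (Action::Begin, true) => Err(Reject::CommitAlreadyPending),
            (Action::Confirm | Action::Cancel, true) => {
                self.pending = false;
                Ok(())
            }
            (Action::Confirm | Action::Cancel, false) => Err(Reject::NoPendingCommit),
        }
    }
}
```

In this model a second writer cannot confirm or cancel a pending commit while its election ID is lower than the master's; it has to present a winning ID first, which matches the intent described above.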
Dependency Boundary
ADR 0016 permits the gRPC stack only in opc-gnmi-server. The CI policy script
must continue to enforce that:
- no other workspace crate depends on tonic, prost, prost-types, or tonic-build;
- no other workspace crate depends on or re-exports opc-gnmi-server;
- all gNMI TLS serving uses the shared rustls configuration built by the OPC management transport stack.
Verification
The gNMI foundation is covered by crate tests for:
- authenticated Capabilities, Get, Set, and Subscribe behavior;
- Set stale-candidate rejection after intervening commits;
- commit-confirmed timeout, confirm, cancel, malformed payload, and missing arbitration cases;
- master-arbitration election, tenant, and role fencing;
- listener mTLS principal derivation and max-session bounds;
- extension payload redaction in status, metrics, and audit paths.