Introduction

The OpenPacketCore SDK is a toolkit for building 5G Core Network Functions (CNFs) that run on Kubernetes. It combines Rust-based policy engines with Go-based Kubernetes orchestration to give operators both safety and flexibility.

What the SDK Provides

Rust crates for protocol codecs (GTP-U, PFCP, NAS-5GS, NGAP v0, experimental Diameter base/application dictionaries, the experimental opc-proto-gtpv2c S2b subset, and the experimental opc-proto-ikev2 header/payload-chain scaffold), session management, configuration consensus, alarms, and runtime chassis.
Go packages (operator-sdk-go) for Kubernetes operators: conditions, bridge to Rust policy, drain orchestration, workload synthesis, runtime-gate helpers, Multus/SR-IOV attachment helpers, and metrics. Newly added packet-core helper surfaces are experimental mechanism helpers, not product CRDs or production controller claims.
Reference operator (sdk-reference-operator) demonstrating end-to-end reconciliation of a network function custom resource.

Getting Started

See Quickstart for environment setup and your first SdkManagedNetworkFunction deployment.

Architecture

The SDK is documented through RFCs (high-level design) and ADRs (decision records). Start with:

Architecture

Layered view of the SDK. Arrows point in the dependency direction (inward).

flowchart TB
  subgraph L1["Layer 1 — pure codecs & types (no async, no I/O)"]
    types[opc-types]
    codecs["opc-proto-* (pfcp, gtpu, gtpv2c, ngap, nas, diameter, ikev2)"]
    protocol[opc-protocol]
  end
  subgraph L2["Layer 2 — models & ports"]
    cfgmodel[opc-config-model]
    ports["opc-mgmt-* ports (schema, path, errors, principal, limits, audit, authz, opstate, transport)"]
    nacm[opc-nacm]
  end
  subgraph L3["Layer 3 — app orchestrator"]
    bus["opc-config-bus (validate → authorize → persist → publish; commit-confirmed expiry rollback; recovery fence)"]
  end
  subgraph L4["Layer 4 — adapters (async)"]
    netconf["opc-netconf-server (SSH/russh)"]
    gnmi["opc-gnmi-server (tonic, mTLS)"]
    persist[opc-persist]
    tls["opc-tls / opc-identity (SPIFFE)"]
  end
  subgraph L5["Layer 5 — runtime & operators"]
    runtime[opc-runtime]
    oplc["operator-lifecycle / operator-controller (Rust)"]
    gosdk["operators/operator-sdk-go + sdk-reference-operator (Go)"]
  end
  facade["opc-sdk (facade / prelude)"]

  netconf --> bus
  gnmi --> bus
  bus --> ports
  bus --> cfgmodel
  persist --> ports
  ports --> types
  cfgmodel --> types
  codecs --> protocol
  nacm --> ports
  tls --> ports
  runtime --> ports
  oplc --> runtime
  gosdk -. bridge CLI contract .-> oplc
  facade --> netconf
  facade --> gnmi
  facade --> bus
  facade --> codecs
  facade --> runtime

Legend: solid arrows are Cargo dependencies (direction = "depends on"); the dashed edge is the Go↔Rust policy-CLI process boundary (JSON contract, versioned by scripts/check-downstream-import.sh on the Go side).

OpenPacketCore SDK RFC Index

This directory contains the foundational RFCs for the OpenPacketCore SDK and CNF architecture. These documents are intended to be implementation inputs for engineers.

Foundation Set

RFC	Title	Primary Scope
001	Transactional Management Substrate	Config commits, persistence, recovery, NACM boundary
002	YANG-to-Rust Projection	Codegen, RFC 7951, validation, memory layout
003	Security Substrate	SPIFFE, gNSI, tenant identity, keys, audit
004	High-Performance Session Store	Session state, leases, fencing, handover, geo-redundancy
005	Zero-Copy Protocol Framework	Parsers, codecs, lifetimes, fuzzing, spec tags
006	Conformance and Evidence Pipeline	SBOM, VEX, provenance, signing, known gaps
007	SBI Service Framework	TS 29.500/29.510, NRF, OAuth2, overload, retries
008	CNF Runtime Chassis	Startup, supervision, shutdown, health, resource budgets
009	Operator Lifecycle and Upgrade	CRDs, rollout, migration, drain, rollback
010	Data Governance and Privacy	Data classes, redaction, retention, LI, regulated records
011	Node and Data-Plane Resource Contract	SR-IOV, Multus, AF_XDP, CPU, NUMA, pod security
012	Testbed and Simulator Framework	Scenario DSL, simulators, fixtures, virtual time
013	Fault Management and Alarm Substrate	Alarms, severity, probable cause, FM sinks

OPC-SDK-RFC-001: Transactional Management Substrate

Status: Draft for Implementation
Version: 2.0.0
Date: 2026-05-19
Audience: SDK implementers, NF owners, security reviewers, test authors

1. Abstract

This RFC defines the transactional management substrate for OpenPacketCore network functions. It specifies the configuration commit state machine, the isolation boundary between the management plane and data plane, the reference persistent store, recovery behavior, authorization hooks, observability, and implementation acceptance criteria.

The core invariant is:

An NF's running configuration is a deterministic, validated, authorized, and durable projection of its YANG-defined configuration.

This RFC corrects the initial draft in four important ways:

The commit pipeline is a single-writer state machine, not a long-held async mutex.
The management plane is explicitly resource-isolated from the data plane.
SQLite WAL is allowed only as a reference management-plane store with container storage preflight checks.
Persistence, encryption, audit, rollback, and recovery are made explicit enough for independent implementation by multiple contributors.

2. Scope

2.1 In Scope

gNMI, NETCONF, and local operator configuration commits.
Candidate, running, startup, rollback, and shadow-security configuration stores.
Authorization of configuration mutations.
Durable commit history and audit trail.
Deterministic change notification to NF subsystems.
Reference SQLite persistence backend.
Interfaces that allow other persistence backends later.

2.2 Out of Scope

User-plane packet forwarding.
High-rate session state. See RFC 004.
Protocol parsing. See RFC 005.
Full supply-chain evidence generation. See RFC 006.
Cluster-wide consensus. This RFC covers per-replica local persistence and commit sequencing. Cluster-level orchestration must be layered above it.

3. Design Goals

3.1 Security

Default-deny authorization for all write operations.
Fail-closed behavior for corrupt storage, invalid identity, failed decryption, failed validation, and incomplete recovery.
No unredacted secret material in audit logs, telemetry, traces, or error messages.
Cryptographic binding between config payload, schema version, transaction metadata, and principal identity.
Tamper-evident audit history.

3.2 Performance

Configuration commits must not starve data-plane workers.
Data-plane readers must see configuration through wait-free or bounded-time snapshot access.
Commit admission must provide bounded memory growth and clear backpressure.
Heavy validation, serialization, compression, encryption, and fsync must not run on the async I/O worker set.

3.3 Maintainability

The state machine must be explicit and testable.
Generated and hand-written validation must use the same error model.
Storage backends must implement a narrow trait with deterministic semantics.
Each phase must have owner modules, metrics, logs, and fault injection tests.

3.4 Functionality

Support create, update, replace, delete, validate-only, commit-confirmed, rollback, and startup restore.
Support path-level audit and change notifications.
Support rollback points and schema migrations.
Support shadow-security configuration that is not exposed through ordinary gNMI Get.

4. Core Concepts

4.1 Stores

The SDK defines the following logical stores:

Store	Purpose	Durable	Exposed By gNMI Get
`candidate`	Transaction-local mutable config	No	No
`running`	Active immutable config	Yes	Yes, after NACM filtering
`startup`	Optional boot config alias or snapshot	Yes	Operator controlled
`rollback`	Explicit rollback points	Yes	Metadata only
`shadow-security`	gNSI/certificate/authz material	Yes	No

The data plane MUST consume only immutable snapshots of running plus any explicitly subscribed derived state. It MUST NOT read from candidate, startup, or the raw persistence backend.

4.2 Config Snapshot

Generated root configs MUST implement:

pub trait OpcConfig: Clone + Send + Sync + 'static {
    type Delta: Send + Sync + core::fmt::Debug + 'static;

    fn schema_digest(&self) -> SchemaDigest;
    fn diff(&self, previous: &Self) -> Result<Vec<Self::Delta>, ConfigError>;
    fn apply_delta(&mut self, delta: Self::Delta) -> Result<(), ConfigError>;
    fn validate_syntax(&self) -> Result<(), ValidationError>;
    fn validate_semantics(&self, ctx: &ValidationContext) -> Result<(), ValidationError>;
}

Clone is required for the reference implementation, but large generated configs SHOULD use structural sharing internally so candidate creation does not copy every leaf for small patches.

4.3 Runtime Snapshot Access

The running config MUST be published through an atomic snapshot mechanism such as arc-swap or an equivalent SDK type:

pub trait ConfigSnapshot<C>: Send + Sync {
    fn load(&self) -> std::sync::Arc<C>;
    fn version(&self) -> ConfigVersion;
}

Data-plane reads MUST NOT acquire the commit lock, await I/O, allocate large buffers, or call validation hooks.
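The publish-and-load pattern can be illustrated with a minimal sketch. It is a standard-library stand-in only: `SnapshotCell` and its `u64` version are hypothetical names, and `RwLock` is not wait-free, so a real data-plane path would use `arc-swap` or an equivalent SDK type as required above.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in publisher. `RwLock` keeps the sketch inside the
/// standard library; it does NOT meet the wait-free requirement.
pub struct SnapshotCell<C> {
    inner: RwLock<(u64, Arc<C>)>,
}

impl<C> SnapshotCell<C> {
    pub fn new(initial: C) -> Self {
        Self { inner: RwLock::new((1, Arc::new(initial))) }
    }

    /// Readers clone the `Arc`; the config itself is never copied.
    pub fn load(&self) -> Arc<C> {
        Arc::clone(&self.inner.read().unwrap().1)
    }

    pub fn version(&self) -> u64 {
        self.inner.read().unwrap().0
    }

    /// Intended to be called only by the commit worker. Returns the new version.
    pub fn publish(&self, next: C) -> u64 {
        let mut guard = self.inner.write().unwrap();
        guard.0 += 1;
        guard.1 = Arc::new(next);
        guard.0
    }
}
```

A reader that loaded a snapshot before `publish` keeps seeing the old immutable value, which is the property the data plane relies on.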

5. Commit State Machine

5.1 States

Each commit moves through the following states:

State	Description	May Fail	Durable Side Effect
`Admitted`	Request accepted into bounded queue	Yes	No
`Authenticated`	Peer identity verified	Yes	No
`Authorized`	NACM/path policy passed	Yes	Audit denial
`Staged`	Candidate built from running snapshot	Yes	No
`SyntaxValidated`	YANG constraints passed	Yes	No
`SemanticallyValidated`	NF validation passed	Yes	No
`Prepared`	Serialized, encrypted, and ready to write	Yes	No
`Persisted`	Commit record and audit record fsynced	Yes	Yes
`Published`	Running pointer atomically swapped	No in normal operation	Yes
`Notified`	Subscribers informed	Best effort per subscriber	Metrics/audit only

No state is allowed to panic as part of ordinary error handling. A panic in the commit worker is a process bug and MUST be treated as StateMachineFault.
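One way to keep the state table explicit and testable is an enum with a single forward transition function. The sketch below assumes the states advance strictly in table order; `CommitState`, `next`, and `is_durable` are illustrative names, not SDK API.

```rust
/// Commit states from the table above, in pipeline order.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CommitState {
    Admitted,
    Authenticated,
    Authorized,
    Staged,
    SyntaxValidated,
    SemanticallyValidated,
    Prepared,
    Persisted,
    Published,
    Notified,
}

impl CommitState {
    /// Returns the only legal successor, or `None` for the terminal state.
    pub fn next(self) -> Option<CommitState> {
        use CommitState::*;
        Some(match self {
            Admitted => Authenticated,
            Authenticated => Authorized,
            Authorized => Staged,
            Staged => SyntaxValidated,
            SyntaxValidated => SemanticallyValidated,
            SemanticallyValidated => Prepared,
            Prepared => Persisted,
            Persisted => Published,
            Published => Notified,
            Notified => return None,
        })
    }

    /// True once the commit record has been fsynced (Persisted or later).
    pub fn is_durable(self) -> bool {
        matches!(
            self,
            CommitState::Persisted | CommitState::Published | CommitState::Notified
        )
    }
}
```

Failures are not modeled here; a fuller version would return a `Result` from each phase and record the state at which the commit stopped.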

5.2 Corrected Phase Ordering

The commit worker MUST serialize commits, but it MUST NOT hold a tokio::sync::Mutex across .await, blocking validation, encryption, serialization, or database I/O. The recommended structure is:

Northbound handlers push CommitRequest into a bounded mpsc queue.
A single commit worker owns sequencing and transaction IDs.
CPU-heavy validation runs through a bounded blocking/CPU pool.
Crypto and serialization run through a bounded crypto pool.
Persistence runs through a single writer backend handle.
Publication is an atomic pointer swap.

This keeps ordering deterministic without turning the async runtime lock into a global bottleneck.
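The queue-plus-single-worker structure can be sketched with the standard library's bounded channel. This is a simplified illustration: `Request`, `spawn_worker`, and `admit` are hypothetical names, and a thread stands in for the async worker task the SDK would use.

```rust
use std::sync::mpsc::{Receiver, SyncSender, TrySendError};
use std::thread;

/// Hypothetical, simplified request carrying only an opaque label.
pub struct Request {
    pub label: String,
}

/// Outcome reported by the worker: sequential transaction id plus label.
pub type Outcome = (u64, String);

/// Spawns the single commit worker. It alone assigns transaction ids,
/// so ordering is deterministic without any shared lock.
pub fn spawn_worker(rx: Receiver<Request>) -> thread::JoinHandle<Vec<Outcome>> {
    thread::spawn(move || {
        let mut next_tx_id = 1u64;
        let mut done = Vec::new();
        // The loop ends once every sender has been dropped.
        for req in rx {
            done.push((next_tx_id, req.label));
            next_tx_id += 1;
        }
        done
    })
}

/// Northbound admission: never blocks; a full queue is reported to the caller.
pub fn admit(tx: &SyncSender<Request>, label: &str) -> Result<(), &'static str> {
    match tx.try_send(Request { label: label.to_string() }) {
        Ok(()) => Ok(()),
        Err(TrySendError::Full(_)) => Err("queue full"),
        Err(TrySendError::Disconnected(_)) => Err("worker stopped"),
    }
}
```

Creating the channel with `std::sync::mpsc::sync_channel(n)` bounds the queue at `n` entries, so a handler that sees "queue full" can map it to the UNAVAILABLE rejection.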

5.3 Commit Request

pub struct CommitRequest<C: OpcConfig> {
    pub request_id: RequestId,
    pub principal: TrustedPrincipal,
    pub transport: TransportType,
    pub source: RequestSource,
    pub operation: ConfigOperation,
    pub mode: CommitMode,
    pub deadline: std::time::Instant,
    pub idempotency_key: Option<IdempotencyKey>,
    pub base_version: ConfigVersion,
    pub candidate: Option<C>,
    pub changed_paths: Vec<YangPath>,
}

pub enum CommitMode {
    Commit,
    ValidateOnly,
    CommitConfirmed { timeout: std::time::Duration },
    Rollback { target: RollbackTarget },
}

idempotency_key SHOULD be supported for northbound clients that retry after UNAVAILABLE.

Candidate-bearing requests MUST carry the running config base_version used to build the candidate. The ConfigBus worker MUST reject the request before validation or publication when that value no longer matches the current running version, so stale full-candidate writers cannot overwrite an intervening commit.
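The stale-writer rejection is an optimistic-concurrency check. A minimal sketch, assuming `ConfigVersion` wraps an integer (the real type is not shown in this RFC) and using a hypothetical `AdmissionError`:

```rust
/// Hypothetical stand-in for the SDK's `ConfigVersion`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ConfigVersion(pub u64);

#[derive(Debug, PartialEq, Eq)]
pub enum AdmissionError {
    /// The candidate was built from a snapshot that is no longer running.
    StaleBase { base: ConfigVersion, running: ConfigVersion },
}

/// Run by the commit worker before validation or publication.
pub fn check_base_version(
    base: ConfigVersion,
    running: ConfigVersion,
) -> Result<(), AdmissionError> {
    if base == running {
        Ok(())
    } else {
        Err(AdmissionError::StaleBase { base, running })
    }
}
```

Because only the single commit worker advances the running version, the comparison needs no extra locking.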

5.4 Commit Result

pub struct CommitResult {
    pub tx_id: TxId,
    pub base_version: ConfigVersion,
    pub new_version: Option<ConfigVersion>,
    pub status: CommitStatus,
    pub changed_paths: Vec<YangPath>,
    pub apply_plan: Option<ApplyPlan>,
}

Failed commits MUST include stable machine-readable error codes. Error strings MUST NOT contain secrets or raw config fragments.

Candidate-bearing commit, commit-confirmed, and validate-only requests SHOULD return an ApplyPlan that classifies the operational impact of the SDK-derived changed paths after validation and before durable append. The default classifier returns hot plans so existing products remain compatible; products MAY install a ConfigImpactClassifier for domain-specific warm, drain-required, restart-required, or forbidden-live behavior. forbidden-live and apply-plan hard errors MUST fail closed before durable append/publication and attach the rejected plan to CommitError.apply_plan.

6. Management Thread Boundary

6.1 Required Execution Domains

The initial "Three-Pool" model is directionally correct but underspecified. The SDK MUST implement the following boundaries:

Domain	Work	Requirement
Async I/O	gNMI, NETCONF, gNSI, health, metrics	Never perform CPU-heavy work or fsync
Commit worker	Sequencing, state machine ownership	Single logical writer, bounded queue
Validation pool	Generated and NF semantic validation	Bounded threads and timeout
Crypto/serialization pool	RFC 7951 serialization, compression, AEAD	Bounded threads and memory
Persistence writer	SQLite or backend write transaction	Single writer per local store
Data-plane workers	Packet/session fast path	No dependency on management pools

Implementations MAY combine validation and crypto pools for small deployments, but the default carrier profile MUST expose independent limits for both.

6.2 Starvation Protection

The SDK MUST provide:

Separate semaphores for validation, crypto, and persistence work.
Configurable max queued commits, default 32.
Configurable max pending bytes across staged candidates, default 64 MiB.
Per-request deadline propagation.
Admission rejection with gRPC UNAVAILABLE and retry metadata when queues are full.
A hard rule that data-plane threads never run management-plane blocking work.

Carrier CNF deployments SHOULD pin data-plane workers and management workers to different CPU sets using Kubernetes CPU Manager or an equivalent runtime mechanism. The SDK MUST work without CPU pinning, but the documented production profile MUST include it.
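The queue-depth and pending-bytes limits can be combined into one admission gate. The sketch below uses the defaults listed above; `AdmissionGate`, `AdmissionLimits`, and `Rejection` are hypothetical names, and the gate is assumed to be owned by a single task so no synchronization is shown.

```rust
/// Admission limits; the defaults mirror the values listed above.
pub struct AdmissionLimits {
    pub max_queued_commits: usize,
    pub max_pending_bytes: usize,
}

impl Default for AdmissionLimits {
    fn default() -> Self {
        Self { max_queued_commits: 32, max_pending_bytes: 64 * 1024 * 1024 }
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum Rejection {
    QueueFull,
    MemoryLimit,
}

#[derive(Default)]
pub struct AdmissionGate {
    limits: AdmissionLimits,
    queued: usize,
    pending_bytes: usize,
}

impl AdmissionGate {
    /// Reserves a queue slot and staging memory, or rejects without side effects.
    pub fn try_admit(&mut self, candidate_bytes: usize) -> Result<(), Rejection> {
        if self.queued >= self.limits.max_queued_commits {
            return Err(Rejection::QueueFull);
        }
        if self.pending_bytes.saturating_add(candidate_bytes) > self.limits.max_pending_bytes {
            return Err(Rejection::MemoryLimit);
        }
        self.queued += 1;
        self.pending_bytes += candidate_bytes;
        Ok(())
    }

    /// Returns the reservation once the commit finishes or fails.
    pub fn release(&mut self, candidate_bytes: usize) {
        self.queued = self.queued.saturating_sub(1);
        self.pending_bytes = self.pending_bytes.saturating_sub(candidate_bytes);
    }
}
```

Either rejection would map to gRPC UNAVAILABLE with retry metadata at the northbound handler.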

6.3 Time Budgets

Default phase budgets:

Phase	Default Budget
Admission wait	2 seconds
Syntax validation	5 seconds
Semantic validation	30 seconds
Serialization/encryption	10 seconds
Persistence	10 seconds
Notification fanout	2 seconds per subscriber batch

Budgets MUST be configurable per NF. Expired commits MUST fail before publication. Persistence timeouts after partial backend work MUST be resolved by backend recovery logic before the next commit is accepted.
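Deadline propagation can be expressed as taking the earlier of the phase budget and the request deadline. A minimal sketch using the default budgets from the table; `Phase`, `default_budget`, and `phase_deadline` are illustrative names, and per-NF configuration is omitted:

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Phase {
    AdmissionWait,
    SyntaxValidation,
    SemanticValidation,
    SerializationEncryption,
    Persistence,
}

/// Default budgets from the table above.
pub fn default_budget(phase: Phase) -> Duration {
    Duration::from_secs(match phase {
        Phase::AdmissionWait => 2,
        Phase::SyntaxValidation => 5,
        Phase::SemanticValidation => 30,
        Phase::SerializationEncryption => 10,
        Phase::Persistence => 10,
    })
}

/// A phase may run until its own budget or the request deadline, whichever is earlier.
pub fn phase_deadline(phase: Phase, phase_start: Instant, request_deadline: Instant) -> Instant {
    (phase_start + default_budget(phase)).min(request_deadline)
}

/// Checked before publication so expired commits fail first.
pub fn is_expired(now: Instant, deadline: Instant) -> bool {
    now >= deadline
}
```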

7. Persistence Abstraction

7.1 Trait

#[async_trait::async_trait]
pub trait ConfigStore: Send + Sync {
    async fn load_latest(&self) -> Result<Option<StoredConfig>, PersistError>;
    async fn load_rollback(&self, target: RollbackTarget) -> Result<StoredConfig, PersistError>;
    async fn append_commit(&self, record: CommitRecord, audit: Vec<AuditRecord>)
        -> Result<(), PersistError>;
    async fn mark_confirmed(&self, tx_id: TxId) -> Result<(), PersistError>;
    async fn create_rollback_point(&self, tx_id: TxId, label: Option<String>)
        -> Result<(), PersistError>;
    async fn preflight(&self) -> Result<PersistCapabilities, PersistError>;
}

append_commit MUST be atomic: either the commit record and its audit records are durable together, or neither is visible during recovery.

7.2 Commit Record

pub struct CommitRecord {
    pub tx_id: TxId,
    pub parent_tx_id: Option<TxId>,
    pub version: ConfigVersion,
    pub committed_at: Timestamp,
    pub principal: TrustedPrincipal,
    pub source: RequestSource,
    pub schema_digest: SchemaDigest,
    pub plaintext_digest: Sha256Digest,
    pub encrypted_blob: EncryptedBlob,
    pub rollback_point: bool,
    pub confirmed_deadline: Option<Timestamp>,
}

The plaintext digest is verified only after successful AEAD decryption. It is not a substitute for AEAD integrity.

8. SQLite Reference Backend

8.1 Positioning

SQLite WAL is a sound reference backend for a single NF replica's management configuration and audit history because commits are low-rate, read access is local, recovery is simple, and the operational footprint is small.

SQLite MUST NOT be treated as a distributed consensus system. It MUST NOT be used for high-rate session state or cross-replica active/active configuration coordination.

8.2 Mandatory Container Storage Preflight

Before accepting writes, the SQLite backend MUST verify and report:

Database path is on a persistent volume when persistence is required.
Filesystem supports POSIX byte-range locking compatible with SQLite.
WAL, SHM, and database files are on the same filesystem.
The volume is not a known-unsafe network filesystem unless explicitly overridden by an operator with an evidence waiver.
fsync is not disabled by mount options or runtime configuration.
The database directory is writable only by the NF service account UID/GID.
Free space is above configured threshold.
Startup can create, checkpoint, close, and reopen a test WAL transaction.

If preflight fails, the NF MUST fail closed unless configured for an explicit ephemeral development mode.

8.3 PRAGMA Profile

The reference backend MUST apply and verify:

PRAGMA journal_mode = WAL;
PRAGMA synchronous = EXTRA;
PRAGMA foreign_keys = ON;
PRAGMA locking_mode = NORMAL;
PRAGMA busy_timeout = 5000;
PRAGMA temp_store = MEMORY;

locking_mode = EXCLUSIVE SHOULD NOT be the default in containers because it can break sidecar backup, online inspection, and some recovery workflows. The backend MAY offer exclusive mode for sealed appliances, but the default is NORMAL with a single SDK writer and no external writers.

synchronous = EXTRA is acceptable as a conservative default, but the backend MUST document that durability still depends on the underlying filesystem and storage class. Production deployments MUST store durable config on tested PVC storage classes, not on container overlay filesystem layers.

8.4 Schema

CREATE TABLE schema_version (
    id INTEGER PRIMARY KEY CHECK (id = 1),
    schema_digest BLOB NOT NULL,
    sdk_version TEXT NOT NULL,
    created_at TEXT NOT NULL
);

CREATE TABLE config_history (
    tx_id BLOB PRIMARY KEY,
    parent_tx_id BLOB NULL REFERENCES config_history(tx_id),
    version INTEGER NOT NULL UNIQUE,
    committed_at TEXT NOT NULL,
    principal TEXT NOT NULL,
    source TEXT NOT NULL,
    schema_digest BLOB NOT NULL,
    plaintext_digest BLOB NOT NULL,
    encrypted_blob BLOB NOT NULL,
    rollback_point INTEGER NOT NULL DEFAULT 0,
    confirmed_deadline TEXT NULL,
    confirmed_at TEXT NULL
);

CREATE TABLE audit_trail (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    tx_id BLOB NOT NULL REFERENCES config_history(tx_id) ON DELETE RESTRICT,
    sequence INTEGER NOT NULL,
    yang_path TEXT NOT NULL,
    op_type TEXT NOT NULL CHECK(op_type IN ('CREATE', 'UPDATE', 'REPLACE', 'DELETE')),
    previous_value TEXT NULL,
    new_value TEXT NULL,
    redaction_applied INTEGER NOT NULL DEFAULT 0,
    previous_hash BLOB NOT NULL,
    entry_hmac BLOB NOT NULL,
    UNIQUE(tx_id, sequence)
);

CREATE INDEX audit_trail_tx_id_idx ON audit_trail(tx_id);
CREATE INDEX config_history_rollback_idx ON config_history(version, rollback_point);

8.5 WAL Maintenance

The backend MUST:

Set a bounded WAL autocheckpoint threshold.
Run explicit checkpoints during graceful shutdown and after large commits.
Export metrics for WAL size and checkpoint failures.
Refuse startup when WAL recovery fails.
Avoid deleting WAL or SHM files manually.

9. Encryption at Rest

This section specifies configuration encryption at the envelope level. Key management is governed by RFC 003.

9.1 Algorithm

Default AEAD: AES-256-GCM-SIV.
Alternative for non-AES-accelerated targets: XChaCha20-Poly1305, if allowed by the deployment security profile.
Random nonce generation is still REQUIRED even when using a nonce-misuse-resistant AEAD.

9.2 Envelope

struct ConfigEnvelopeV1 {
    magic: [u8; 4] = "OPCE";
    version: u16 = 1;
    alg_id: u16;
    key_id_len: u16;
    nonce_len: u16;
    aad_len: u32;
    key_id: [u8; key_id_len];
    nonce: [u8; nonce_len];
    aad: [u8; aad_len];
    ciphertext_and_tag: [u8; remaining];
}

AAD MUST include:

tx_id
parent_tx_id
version
committed_at
principal
schema_digest
store_kind
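The envelope layout can be parsed with bounds-checked slice splitting. The sketch below is illustrative only: it assumes big-endian integer fields, which this RFC does not state, and the `Envelope`, `EnvelopeError`, and `parse_envelope` names are hypothetical. It performs no decryption.

```rust
/// Borrowed view of a parsed envelope; nothing is copied.
#[derive(Debug, PartialEq, Eq)]
pub struct Envelope<'a> {
    pub alg_id: u16,
    pub key_id: &'a [u8],
    pub nonce: &'a [u8],
    pub aad: &'a [u8],
    pub ciphertext_and_tag: &'a [u8],
}

#[derive(Debug, PartialEq, Eq)]
pub enum EnvelopeError {
    Truncated,
    BadMagic,
    UnsupportedVersion(u16),
}

/// Splits `n` bytes off the front of `buf`, failing instead of panicking.
fn take<'a>(buf: &mut &'a [u8], n: usize) -> Result<&'a [u8], EnvelopeError> {
    let current: &'a [u8] = *buf;
    if current.len() < n {
        return Err(EnvelopeError::Truncated);
    }
    let (head, tail) = current.split_at(n);
    *buf = tail;
    Ok(head)
}

// ASSUMPTION: integers are big-endian; the RFC text does not specify byte order.
fn take_u16(buf: &mut &[u8]) -> Result<u16, EnvelopeError> {
    let b = take(buf, 2)?;
    Ok(u16::from_be_bytes([b[0], b[1]]))
}

fn take_u32(buf: &mut &[u8]) -> Result<u32, EnvelopeError> {
    let b = take(buf, 4)?;
    Ok(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
}

pub fn parse_envelope(mut buf: &[u8]) -> Result<Envelope<'_>, EnvelopeError> {
    if take(&mut buf, 4)? != &b"OPCE"[..] {
        return Err(EnvelopeError::BadMagic);
    }
    let version = take_u16(&mut buf)?;
    if version != 1 {
        return Err(EnvelopeError::UnsupportedVersion(version));
    }
    let alg_id = take_u16(&mut buf)?;
    let key_id_len = take_u16(&mut buf)? as usize;
    let nonce_len = take_u16(&mut buf)? as usize;
    let aad_len = take_u32(&mut buf)? as usize;
    let key_id = take(&mut buf, key_id_len)?;
    let nonce = take(&mut buf, nonce_len)?;
    let aad = take(&mut buf, aad_len)?;
    // Whatever remains is the ciphertext followed by the AEAD tag.
    Ok(Envelope { alg_id, key_id, nonce, aad, ciphertext_and_tag: buf })
}
```

Every parse failure returns an error value, which fits the fail-closed rule for failed decryption and corrupt storage.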

9.3 Key Derivation

When using a master secret, per-commit keys MUST be derived with HKDF-SHA256:

salt = tx_id || schema_digest
info = "openpacketcore/config/v1" || store_kind || key_id
key = HKDF(master_secret, salt, info, 32)

The backend MUST support key rotation by retaining enough key metadata to read old commits until the operator performs re-encryption or retention expiry.

10. Authorization Boundary

10.1 Auth Context

pub struct AuthContext {
    pub principal: TrustedPrincipal,
    pub spiffe_id: Option<SpiffeId>,
    pub transport: TransportType,
    pub source_ip: std::net::IpAddr,
    pub tenant: TenantId,
    pub authenticated_at: Timestamp,
}

10.2 NACM Requirements

The NACM engine MUST:

Normalize YANG paths before policy evaluation.
Reject ambiguous module prefixes.
Treat missing policy as deny.
Authorize every changed path, not just the top-level request path.
Authorize read, create, update, replace, delete, exec, and subscribe actions separately.
Enforce policy before candidate mutation and again before publication if the policy changed during a long-running commit.

Trie evaluation is acceptable for performance, but wildcard, subtree, module, and default-deny semantics MUST be tested against RFC 8341 behavior.
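Default-deny and per-path authorization can be sketched with an ordered rule list where the first matching rule wins. This is a deliberately simplified model: it assumes already-normalized paths, omits groups and module rules, and the `Rule`, `authorize_path`, and `authorize_all` names and example paths are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action { Read, Create, Update, Replace, Delete, Exec, Subscribe }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision { Permit, Deny }

pub struct Rule {
    pub subtree: &'static str,
    pub action: Action,
    pub decision: Decision,
}

/// True when `path` equals `subtree` or lies beneath it on a `/` boundary.
fn covers(subtree: &str, path: &str) -> bool {
    path == subtree
        || path.strip_prefix(subtree).map_or(false, |rest| rest.starts_with('/'))
}

/// First matching rule wins; a missing rule means deny.
pub fn authorize_path(rules: &[Rule], action: Action, path: &str) -> Decision {
    rules
        .iter()
        .find(|r| r.action == action && covers(r.subtree, path))
        .map_or(Decision::Deny, |r| r.decision)
}

/// Every changed path must be permitted; an empty change set is denied.
pub fn authorize_all(rules: &[Rule], action: Action, changed_paths: &[&str]) -> Decision {
    let all_permitted = !changed_paths.is_empty()
        && changed_paths
            .iter()
            .all(|p| authorize_path(rules, action, p) == Decision::Permit);
    if all_permitted { Decision::Permit } else { Decision::Deny }
}
```

The `/` boundary check in `covers` prevents a rule for `/amf` from accidentally matching a sibling such as `/amfx`.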

11. Notifications

After publication, the ConfigBus MUST notify subscribers with:

pub struct ConfigChange<C: OpcConfig> {
    pub tx_id: TxId,
    pub version: ConfigVersion,
    pub previous: std::sync::Arc<C>,
    pub current: std::sync::Arc<C>,
    pub deltas: Vec<C::Delta>,
    pub changed_paths: Vec<YangPath>,
}

Subscriber channels MUST be bounded. Slow subscribers MUST be isolated so they cannot block publication of future commits. Each subscriber MUST choose one of the following lag policies:

drop_oldest
drop_newest
disconnect_on_lag
force_resync

Critical NF subsystems that cannot tolerate missed notifications MUST expose a resync method and compare local applied version against ConfigBus::version().
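The four lag policies can be modeled as a bounded per-subscriber queue that never blocks the publisher. The behavior assigned to each policy below is one plausible reading, not something this RFC defines in detail; `SubscriberQueue` and `Delivery` are hypothetical names, and queued items are reduced to version numbers.

```rust
use std::collections::VecDeque;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LagPolicy { DropOldest, DropNewest, DisconnectOnLag, ForceResync }

#[derive(Debug, PartialEq, Eq)]
pub enum Delivery { Queued, Dropped, Disconnected, ResyncRequired }

pub struct SubscriberQueue {
    capacity: usize,
    policy: LagPolicy,
    pending: VecDeque<u64>, // config versions awaiting delivery
}

impl SubscriberQueue {
    pub fn new(capacity: usize, policy: LagPolicy) -> Self {
        // A zero-capacity queue could never deliver anything; clamp to 1.
        Self { capacity: capacity.max(1), policy, pending: VecDeque::new() }
    }

    /// Called by the publisher; returns immediately in every case.
    pub fn offer(&mut self, version: u64) -> Delivery {
        if self.pending.len() < self.capacity {
            self.pending.push_back(version);
            return Delivery::Queued;
        }
        match self.policy {
            LagPolicy::DropOldest => {
                self.pending.pop_front();
                self.pending.push_back(version);
                Delivery::Dropped
            }
            LagPolicy::DropNewest => Delivery::Dropped,
            LagPolicy::DisconnectOnLag => {
                self.pending.clear();
                Delivery::Disconnected
            }
            LagPolicy::ForceResync => {
                // The subscriber is expected to reload the running snapshot.
                self.pending.clear();
                Delivery::ResyncRequired
            }
        }
    }

    pub fn pending(&self) -> Vec<u64> {
        self.pending.iter().copied().collect()
    }
}
```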

12. Recovery

12.1 Startup

Startup MUST:

Run storage preflight.
Recover or checkpoint WAL if required.
Load highest confirmed config version.
Decrypt and authenticate envelope.
Verify plaintext digest.
Verify schema compatibility or run migration.
Run syntax validation.
Run semantic validation in startup mode.
Publish running snapshot.
Start northbound write admission only after running is published.

12.2 Rollback

If the latest config fails startup semantic validation, the NF MAY try rollback points in descending version order. It MUST audit the rollback decision on the next successful write-capable startup. If no rollback point validates, the NF MUST fail closed and expose a read-only recovery endpoint only if explicitly enabled.
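The rollback search can be sketched as a scan over rollback points in descending version order. In this simplified model, `Stored.validates` stands in for the result of startup semantic validation, confirmation status is ignored, and all names are hypothetical.

```rust
/// Hypothetical summary of one stored config.
pub struct Stored {
    pub version: u64,
    pub rollback_point: bool,
    /// Stand-in for the outcome of startup semantic validation.
    pub validates: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub enum StartupChoice {
    Latest(u64),
    RolledBack { from: u64, to: u64 },
    FailClosed,
}

pub fn choose_startup_config(history: &[Stored]) -> StartupChoice {
    let latest = match history.iter().max_by_key(|s| s.version) {
        Some(latest) => latest,
        None => return StartupChoice::FailClosed,
    };
    if latest.validates {
        return StartupChoice::Latest(latest.version);
    }
    // Only explicit rollback points older than the failed config are candidates.
    let mut points: Vec<&Stored> = history
        .iter()
        .filter(|s| s.rollback_point && s.version < latest.version)
        .collect();
    points.sort_by(|a, b| b.version.cmp(&a.version));
    match points.into_iter().find(|s| s.validates) {
        Some(p) => StartupChoice::RolledBack { from: latest.version, to: p.version },
        None => StartupChoice::FailClosed,
    }
}
```

The `RolledBack` value carries both versions so the decision can be audited on the next write-capable startup.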

12.3 Commit-Confirmed

commit-confirmed MUST:

Persist the tentative config with a deadline.
Publish it as running.
Require explicit confirmation before deadline.
Automatically roll back to the parent config if not confirmed.
Emit warning telemetry before rollback.

The rollback timer MUST survive process restart by reading persisted confirmed_deadline.
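Restart handling reduces to a pure decision over the persisted deadline. A minimal sketch, assuming the deadline and confirmation status have already been read from the commit record; `RestartAction` and `resume_commit_confirmed` are hypothetical names:

```rust
use std::time::{Duration, SystemTime};

#[derive(Debug, PartialEq, Eq)]
pub enum RestartAction {
    /// Ordinary commit, or commit-confirmed that was already confirmed.
    KeepRunning,
    /// Deadline still in the future: re-arm the timer for the remainder.
    ArmTimer { remaining: Duration },
    /// Deadline passed while the process was down.
    RollBackToParent,
}

pub fn resume_commit_confirmed(
    confirmed_deadline: Option<SystemTime>,
    confirmed: bool,
    now: SystemTime,
) -> RestartAction {
    match confirmed_deadline {
        None => RestartAction::KeepRunning,
        Some(_) if confirmed => RestartAction::KeepRunning,
        Some(deadline) => match deadline.duration_since(now) {
            Ok(remaining) if !remaining.is_zero() => RestartAction::ArmTimer { remaining },
            // `Err` means `now` is already past the deadline.
            _ => RestartAction::RollBackToParent,
        },
    }
}
```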

13. Observability

Required metrics:

opc_config_commits_total{outcome,reason,transport}
opc_config_commit_duration_seconds{phase}
opc_config_commit_queue_depth
opc_config_commit_queue_rejections_total{reason}
opc_config_running_version
opc_config_subscriber_lag{subscriber}
opc_persist_wal_bytes
opc_persist_checkpoint_total{outcome}
opc_persist_fsync_duration_seconds
opc_nacm_decisions_total{action,outcome}

Required structured log fields:

request_id
tx_id
version
principal
tenant
transport
phase
outcome
error_code

Logs MUST NOT contain secret values or raw config blobs.

14. Testing Requirements

14.1 Unit Tests

State transition table.
NACM path normalization and default deny.
Candidate patch behavior.
Encryption envelope parse/decrypt failures.
Audit hash chain validation.
Subscriber lag policies.

14.2 Integration Tests

Concurrent commits serialize deterministically.
Validation timeout does not block health/read endpoints.
Persistence crash before commit is invisible after restart.
Persistence crash after commit is visible after restart.
WAL checkpoint and recovery on restart.
Commit-confirmed rollback after process restart.
Rollback point selection when latest config fails validation.

14.3 Fault Injection

Disk full.
fsync failure.
Corrupt WAL.
Corrupt encrypted blob.
Missing key.
Expired SPIFFE identity.
NACM policy change during long commit.
Slow or disconnected subscriber.

14.4 Performance Tests

Minimum carrier profile gates:

Data-plane config snapshot load p99 under 1 microsecond in-process.
Northbound read path remains available during 30 second semantic validation.
Commit queue rejects rather than exceeding configured memory limit.
10,000 path-level audit records commit without unbounded memory growth.
SQLite backend sustains 10 commits/second for 60 seconds on reference PVC.

15. Module Ownership

Contributors should implement these modules independently with the listed ownership:

Module	Responsibility
`opc-config-bus`	Commit worker, snapshot publication, subscriber fanout
`opc-config-model`	Shared IDs, errors, request/result types
`opc-nacm`	Path normalization and authorization decisions
`opc-persist`	`ConfigStore` trait and SQLite backend
`opc-crypto`	Envelope encryption/decryption and key lookup adapter
`opc-audit`	Audit records, redaction markers, hash chain
`opc-config-testkit`	Fault injection, mock store, mock NACM

Each module MUST expose a narrow public API, avoid cyclic dependencies, and include doc examples for the primary workflow.

16. Acceptance Criteria

This RFC is implemented when:

A commit cannot publish unless authorization, validation, encryption, and durable append all succeed.
Data-plane snapshot access is independent of commit queue and persistence health.
SQLite preflight rejects unsafe durable deployments.
Recovery handles clean restart, crash restart, rollback, and commit-confirmed expiry.
Audit logs are tamper-evident and redacted.
Metrics expose queue, phase latency, persistence, and authorization health.
Fault injection tests cover all failures listed in Section 14.3.

OPC-SDK-RFC-002: YANG-to-Rust Projection and Codegen Engine

Status: Draft for Implementation
Version: 2.0.0
Date: 2026-05-19
Audience: SDK implementers, YANG model authors, NF teams, operator authors

1. Abstract

This RFC defines how OpenPacketCore projects YANG models into Rust data structures, validators, serializers, patch applicators, metadata tables, and operator-facing schemas. The generated code must preserve YANG semantics, support RFC 7951 JSON encoding, avoid stack blowups on large configurations, and provide deterministic APIs for the management substrate in RFC 001.

The key correction from the initial draft is that code generation MUST NOT rely on ad hoc recursive traversal or direct translation of arbitrary XPath strings into Rust closures. The SDK must compile YANG into a typed intermediate representation with bounded validation behavior, stable metadata, and differential tests against a reference YANG engine.

2. Scope

2.1 In Scope

YANG 1.1 module loading and schema resolution.
RFC 7951 JSON serialization and deserialization.
Rust type generation for config and state trees.
Validation for type constraints, must, when, leafref, unique, min-elements, max-elements, mandatory, and defaults.
gNMI/NETCONF patch application metadata.
Secret/redaction metadata for RFC 001 and RFC 003.
Runtime schema metadata consumed by gNMI, NETCONF, NACM, audit, and operator policy helpers.
Conformance tags for RFC 006.

2.2 Out of Scope

Runtime session state schema. See RFC 004.
Protocol wire codecs. See RFC 005.
UI form generation.
Go/Kubernetes CRD generation. Product operators own their API shape and may consume the generated Rust schema/policy metadata through RFC 009 helpers.
Support for proprietary YANG extensions unless explicitly registered in the extension registry defined here.

3. Design Goals

3.1 Security

Generated deserializers must reject unknown, ambiguous, duplicate, or malformed fields unless the relevant protocol explicitly allows them.
Secret leaves must use secret-aware generated types and redaction metadata.
Generated validators must not panic on hostile input.
Generated code must avoid unsafe unless an RFC-specific exception is approved and fuzzed.

3.2 Performance

Validation must be linear or near-linear in the size of the config for common cases.
Large lists must validate through generated indices, not repeated global depth-first searches.
Generated root structs must keep stack footprint bounded.
Patch application must avoid full-tree clone when structural sharing is enabled.

3.3 Maintainability

Code generation must be deterministic for identical inputs.
Generated files must have stable names, stable item order, and stable formatting.
Constraint lowering must go through a typed IR that can be inspected, tested, and rendered.
Generated APIs must be boring and consistent across all NFs.

3.4 Functionality

Support canonical YANG schema features required by 3GPP and IETF models.
Preserve presence, default, namespace, ordering, and key semantics.
Emit enough metadata for NACM, audit, gNMI paths, and conformance mapping.
Support schema migrations between SDK releases.

4. Inputs and Outputs

4.1 Inputs

The code generator consumes:

YANG module files.
A module lockfile containing exact module names, revisions, and checksums.
A generation profile.
Optional extension registry.

4.2 Outputs

For each generation unit, the tool emits:

Rust structs, enums, newtypes, validators, serializers, and patch applicators.
Static schema metadata tables.
Path constants and path parser helpers.
Redaction and NACM metadata.
Property test fixtures.
schema-digest.json for runtime compatibility checks.
conformance-tags.json for RFC 006.

Generated output MUST be reproducible from the lockfile and profile.

5. Schema Resolution Pipeline

5.1 Frontend

The frontend MUST parse YANG 1.1 and preserve:

Module and submodule identity.
Revision.
Namespace and prefix.
Imports and includes.
Extension statements.
Source locations for diagnostics.

The implementation MAY use libyang2 through a safe wrapper or a native Rust parser. In either case, the SDK MUST include differential tests against at least one reference YANG implementation for supported constructs.

5.2 Middle-End

The middle-end MUST produce a flattened schema IR by resolving:

typedef
grouping and uses
augment
deviation
refine
feature and if-feature
identity inheritance
module prefixes and namespaces

The flattened model MUST retain enough source mapping to produce diagnostics that point back to the original YANG module and line.

5.3 Backend

The backend emits Rust and schema metadata. It MUST:

Sort emitted items deterministically.
Use stable generated filenames.
Run generated Rust through rustfmt.
Fail generation if generated code does not compile.
Emit compile-time size checks.

6. Rust Type Mapping

6.1 Scalar Leaves

YANG Type	Rust Representation	RFC 7951 JSON Notes
`int8`, `int16`, `int32`	`i8`, `i16`, `i32`	JSON number
`uint8`, `uint16`, `uint32`	`u8`, `u16`, `u32`	JSON number
`int64`, `uint64`	`i64`, `u64`	JSON string to avoid precision loss
`decimal64`	generated fixed-scale newtype or `rust_decimal::Decimal`	JSON string
`string`	`String` or generated constrained newtype	JSON string
`boolean`	`bool`	JSON boolean
`empty`	generated unit marker	RFC 7951 `[null]`
`enumeration`	generated Rust enum	JSON string; renamed variants preserve YANG names
`bits`	generated bitflags/newtype	space-separated string
`binary`	`bytes::Bytes` or `Vec<u8>`	base64 string
`identityref`	generated enum or `IdentityRef` newtype	namespace-qualified string when needed
`instance-identifier`	`YangInstanceIdentifier`	namespace-aware path string
`leafref`	generated newtype over target type	encoded like target leaf
`union`	generated ordered enum	parse order follows YANG union member order

Generated constrained newtypes MUST enforce range, length, and pattern constraints during deserialization and validation.
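
A constrained newtype of this kind might look like the sketch below. It assumes a hypothetical leaf declared as `type uint16 { range "576..9216"; }`; the names `Mtu` and `RangeError` are illustrative and are not part of the generator's confirmed output.

```rust
use std::convert::TryFrom;
use std::fmt;

/// Hypothetical newtype for a leaf with `range "576..9216"`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Mtu(u16);

/// Structured error carrying the rejected value and the allowed bounds.
#[derive(Debug, PartialEq, Eq)]
pub struct RangeError {
    pub value: u16,
    pub min: u16,
    pub max: u16,
}

impl fmt::Display for RangeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "value {} outside range {}..{}", self.value, self.min, self.max)
    }
}

impl Mtu {
    pub const MIN: u16 = 576;
    pub const MAX: u16 = 9216;

    /// Returns the validated inner value.
    pub fn get(self) -> u16 {
        self.0
    }
}

impl TryFrom<u16> for Mtu {
    type Error = RangeError;

    /// The only way to construct an `Mtu`, so the range check cannot be skipped.
    fn try_from(value: u16) -> Result<Self, Self::Error> {
        if (Self::MIN..=Self::MAX).contains(&value) {
            Ok(Mtu(value))
        } else {
            Err(RangeError { value, min: Self::MIN, max: Self::MAX })
        }
    }
}
```

Because the inner field is private, deserialization and patch code would have to go through `try_from`, which keeps the constraint check in one place.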

6.2 Containers

YANG containers map to Rust structs. The generator MUST distinguish:

Presence containers.
Non-presence containers.
Optional generated fields.
Mandatory generated fields.

Large or optional containers SHOULD be boxed. The generator MUST box a field if embedding it would make the parent exceed the configured stack budget.

Default stack budget:

max_size_of_root = 4096 bytes
max_size_of_any_struct = 1024 bytes

Budgets are profile-configurable. Generated code MUST include compile-time assertions for these limits.
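
One way to express such compile-time assertions is a `const` item whose initializer panics when a budget is exceeded. The sketch below uses stand-in types (`QosProfile`, `UpfConfig`) and constant names that are assumptions for illustration only.

```rust
use std::mem::size_of;

/// Hypothetical profile budgets mirroring the defaults listed above.
pub const MAX_SIZE_OF_ROOT: usize = 4096;
pub const MAX_SIZE_OF_ANY_STRUCT: usize = 1024;

/// Stand-in for a large generated container (900 bytes).
pub struct QosProfile {
    pub table: [u8; 900],
}

/// Stand-in for a generated parent. Embedding two `QosProfile` values inline
/// would exceed the per-struct budget, so both fields are boxed.
pub struct UpfConfig {
    pub mtu: u16,
    pub uplink: Box<QosProfile>,
    pub downlink: Box<QosProfile>,
}

// Evaluated at compile time: the build fails if any budget is exceeded.
const _: () = assert!(size_of::<QosProfile>() <= MAX_SIZE_OF_ANY_STRUCT);
const _: () = assert!(size_of::<UpfConfig>() <= MAX_SIZE_OF_ANY_STRUCT);
const _: () = assert!(size_of::<UpfConfig>() <= MAX_SIZE_OF_ROOT);
```

Changing a field back from `Box<QosProfile>` to `QosProfile` would turn the second assertion into a compile error, which is the behaviour the budget rule asks for.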

6.3 Lists

The Rust projection of a YANG list depends on its keys and ordering:

YANG List Kind	Rust Representation
keyed, `ordered-by system`	`BTreeMap<Key, Value>`
keyed, `ordered-by user`	`Vec<Value>` plus generated key index
unkeyed config list	`Vec<Value>` with min/max validation
`config false` operational list	`Vec<Value>` or backend-specific iterator

The key type MUST be a generated struct when there are multiple key leaves. Duplicate keys MUST be rejected during deserialization and patch application.
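
As a minimal sketch of duplicate-key rejection for a keyed, system-ordered list, the function below builds the `BTreeMap` through the entry API so a repeated key is an error rather than a silent overwrite. `SliceKey`, `DuplicateKey`, and `collect_keyed` are hypothetical names.

```rust
use std::collections::btree_map::{BTreeMap, Entry};

/// Hypothetical generated key struct for a list with two key leaves.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct SliceKey {
    pub sst: u8,
    pub sd: u32,
}

/// Error returned when the same key appears twice in the input.
#[derive(Debug, PartialEq, Eq)]
pub struct DuplicateKey(pub SliceKey);

/// Builds the keyed-list map, failing on the first repeated key instead of
/// overwriting the earlier entry.
pub fn collect_keyed<V>(
    entries: Vec<(SliceKey, V)>,
) -> Result<BTreeMap<SliceKey, V>, DuplicateKey> {
    let mut map = BTreeMap::new();
    for (key, value) in entries {
        match map.entry(key) {
            Entry::Vacant(slot) => {
                slot.insert(value);
            }
            Entry::Occupied(slot) => return Err(DuplicateKey(slot.key().clone())),
        }
    }
    Ok(map)
}
```

The same check could be reused by patch application when a create operation targets an existing key.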

6.4 Leaf-Lists

Leaf-lists map to Vec<T> plus generated validation for:

min-elements
max-elements
uniqueness, when required by YANG semantics
user ordering
default values

Generated code SHOULD build a temporary set for uniqueness checks rather than performing O(n^2) comparisons.
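
A set-based uniqueness check could be as small as the following sketch. The helper name `first_duplicate` is an assumption; the technique is simply inserting borrowed values into a temporary `HashSet`.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Returns the index of the first repeated value, or `None` if all values
/// are distinct. Expected O(n) time, using a temporary set of references.
pub fn first_duplicate<T: Eq + Hash>(values: &[T]) -> Option<usize> {
    let mut seen = HashSet::with_capacity(values.len());
    for (index, value) in values.iter().enumerate() {
        // `insert` returns false when the value was already present.
        if !seen.insert(value) {
            return Some(index);
        }
    }
    None
}
```

Returning the index rather than a boolean lets the caller build a validation error that points at the offending leaf-list entry.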

6.5 Choices and Cases

choice maps to a generated enum. The generator MUST preserve:

default case
mandatory choice behavior
when conditions on cases
removal of sibling case data when a different case is selected

Patch application MUST enforce case exclusivity.

7. Presence and Defaults

YANG requires distinguishing absent, defaulted, and explicitly set values. The generator MUST NOT collapse these states into plain Option<T> when protocol semantics require the distinction.

Generated fields SHOULD use a profile-selected representation such as:

pub enum LeafPresence<T> {
    Absent,
    Defaulted(T),
    Explicit(T),
}

For ergonomic NF logic, generated structs MAY expose helper accessors:

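impl UpfInterface {
    // Signatures only; the generator supplies the bodies.

    /// Effective value of the `mtu` leaf.
    pub fn mtu(&self) -> u16;

    /// Presence state of the `mtu` leaf (absent, defaulted, or explicit).
    pub fn mtu_presence(&self) -> LeafPresence<&u16>;
}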

RFC 7951 serialization MUST follow the selected output mode:

ExplicitOnly: omit defaults unless explicitly set.
WithDefaults: include effective defaults.
Operational: include state and effective values.
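
The interaction between presence and output mode can be sketched as a single decision function. `OutputMode` and `value_to_emit` are hypothetical names, and `LeafPresence` is repeated here only to keep the example self-contained.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LeafPresence<T> {
    Absent,
    Defaulted(T),
    Explicit(T),
}

/// Hypothetical enum naming the three serialization modes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OutputMode {
    ExplicitOnly,
    WithDefaults,
    Operational,
}

/// Decides whether a leaf is serialized, and with which value.
pub fn value_to_emit<T>(presence: LeafPresence<T>, mode: OutputMode) -> Option<T> {
    match (presence, mode) {
        // Explicitly set values are emitted in every mode.
        (LeafPresence::Explicit(v), _) => Some(v),
        // Defaults are omitted only in ExplicitOnly mode.
        (LeafPresence::Defaulted(_), OutputMode::ExplicitOnly) => None,
        (LeafPresence::Defaulted(v), _) => Some(v),
        // Absent leaves have nothing to emit.
        (LeafPresence::Absent, _) => None,
    }
}
```

Collapsing `Defaulted` and `Explicit` into a plain `Option<T>` would make the first two arms indistinguishable, which is why the three-state representation is needed.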

8. RFC 7951 Encoding Requirements

The serializer/deserializer MUST handle:

Namespace-qualified member names where required.
64-bit integers as strings.
decimal64 as strings.
empty as [null].
Base64 for binary.
Identity names with module prefixes when the identity is not in the parent namespace.
Instance identifiers with namespace-aware path segments.
Duplicate JSON object member rejection.
Unknown field handling according to protocol profile.

Round-trip tests MUST cover all scalar mappings.

9. Constraint IR and Validation

9.1 Constraint IR

The generator MUST lower must, when, range, length, pattern, and other constraints into a typed IR:

pub enum ConstraintExpr {
    Path(PathExpr),
    Literal(Literal),
    Function(FunctionCall),
    Compare { op: CompareOp, left: Box<ConstraintExpr>, right: Box<ConstraintExpr> },
    Boolean { op: BooleanOp, terms: Vec<ConstraintExpr> },
}

Direct string-to-Rust closure generation is forbidden because it is difficult to audit, hard to fuzz, and prone to semantic drift.

9.2 Supported XPath Profile

The initial SDK profile MUST support the XPath subset required by OpenPacketCore YANG models and selected IETF/3GPP dependencies. Unsupported expressions MUST fail generation with a clear diagnostic, not become runtime warnings.

The supported function list MUST be versioned. Each function implementation MUST have:

Unit tests.
Source-location diagnostics.
Differential tests against the reference YANG engine.

9.3 Validation Engine

Generated validation MUST be split into separate passes:

validate_types
validate_cardinality
validate_choices
validate_when
validate_must
validate_leafrefs
validate_unique
validate_semantics hook for NF-owned logic

Validators MUST return structured errors:

pub struct ValidationError {
    pub path: YangPath,
    pub code: ValidationCode,
    pub message: String,
    pub source: Option<YangSourceLocation>,
}

Messages MUST be safe for northbound clients and MUST NOT expose secrets.

10. Leafref and Indexing

The initial draft required a depth-first search for each leafref. That is not acceptable for large configs.

The generator MUST create validation indices for referenced lists and leaves:

pub struct ValidationIndices<'a> {
    pub interfaces_by_name: BTreeMap<&'a str, &'a Interface>,
    pub slices_by_s_nssai: BTreeMap<SNssaiKeyRef<'a>, &'a Slice>,
}

Validation flow:

Build indices in deterministic order.
Reject duplicate keys.
Validate all leafref constraints using the indices.
Drop indices before publication.

Index building MUST be iterative and bounded by the configured validation memory budget.
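
The validation flow above can be sketched with a single borrowed index. The `Interface` and `Route` shapes and the `validate_leafrefs` function are hypothetical; memory-budget enforcement is omitted from this sketch.

```rust
use std::collections::BTreeMap;

/// Stand-in for a generated list entry keyed by `name`.
pub struct Interface {
    pub name: String,
}

/// Stand-in for an entry whose `out_interface` is a leafref to `Interface.name`.
pub struct Route {
    pub prefix: String,
    pub out_interface: String,
}

/// Builds a borrowed index once, then resolves every leafref with an
/// O(log n) lookup instead of searching the tree per reference.
pub fn validate_leafrefs(interfaces: &[Interface], routes: &[Route]) -> Result<(), Vec<String>> {
    let mut by_name: BTreeMap<&str, &Interface> = BTreeMap::new();
    let mut errors = Vec::new();

    // Steps 1 and 2: build the index in input order and reject duplicate keys.
    for iface in interfaces {
        if by_name.insert(iface.name.as_str(), iface).is_some() {
            errors.push(format!("duplicate interface key '{}'", iface.name));
        }
    }

    // Step 3: validate each leafref against the index.
    for route in routes {
        if !by_name.contains_key(route.out_interface.as_str()) {
            errors.push(format!(
                "route '{}' references unknown interface '{}'",
                route.prefix, route.out_interface
            ));
        }
    }

    // Step 4: `by_name` is dropped here, before anything is published.
    if errors.is_empty() { Ok(()) } else { Err(errors) }
}
```

With n interfaces and m routes, the total cost is O((n + m) log n), which is consistent with the O(n log n) performance gate stated later in this RFC.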

11. Memory Safety and Stack Discipline

Generated code MUST be safe Rust by default.

11.1 Stack Budget

The generator MUST calculate size_of::<T>() for generated root and nested types through compile-time tests. Any type exceeding its budget MUST be boxed, interned, or represented through a collection.

11.2 Traversal

Generated validation and serialization MUST avoid unbounded recursive traversal. Implementations SHOULD use explicit stacks:

let mut work = Vec::with_capacity(initial_capacity);
work.push(NodeRef::Root(root));
while let Some(node) = work.pop() {
    // validate node and push children
}

The SDK MUST define a maximum schema depth and maximum instance depth. Exceeding either MUST fail parsing or validation with a structured error.
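
A depth-limited iterative traversal might be sketched as follows. The arena-style `InstanceTree` (child ids stored in a flat vector) and the `DepthExceeded` error are assumptions chosen so the example itself contains no recursive type; the sketch also assumes child ids are valid and form a tree.

```rust
/// Hypothetical arena: `children[i]` lists the child ids of node `i`,
/// and node 0 is the root.
pub struct InstanceTree {
    pub children: Vec<Vec<usize>>,
}

/// Structured error returned when the instance is deeper than allowed.
#[derive(Debug, PartialEq, Eq)]
pub struct DepthExceeded {
    pub limit: usize,
}

/// Visits every node with an explicit work stack and returns the node count.
/// The call stack does not grow with tree depth.
pub fn count_nodes(tree: &InstanceTree, max_depth: usize) -> Result<usize, DepthExceeded> {
    if tree.children.is_empty() {
        return Ok(0);
    }
    // Each work item is (node id, depth), with the root at depth 1.
    let mut work = vec![(0usize, 1usize)];
    let mut visited = 0usize;
    while let Some((id, depth)) = work.pop() {
        if depth > max_depth {
            return Err(DepthExceeded { limit: max_depth });
        }
        visited += 1;
        for &child in &tree.children[id] {
            work.push((child, depth + 1));
        }
    }
    Ok(visited)
}
```

A real validator would run per-node checks where this sketch increments `visited`.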

11.3 Drop Behavior

Generated models MUST NOT create recursive self-referential types. If future extensions introduce recursive structures, the generator MUST provide iterative drop or arena ownership to avoid stack overflow.

11.4 Large Configs

The generator MUST support configs with:

100,000 list entries in a single keyed list.
1,000,000 scalar leaves across the tree in stress tests.
Deep but valid schemas up to the configured maximum depth.

Stress tests MUST verify that no stack overflow occurs and that peak memory stays bounded.

12. Patch Application

Generated patch applicators MUST support:

gNMI Update
gNMI Replace
gNMI Delete
NETCONF merge
NETCONF replace
NETCONF create
NETCONF delete
NETCONF remove

Patch behavior MUST be generated from schema metadata, not hand-written per NF.

Patch application MUST:

Validate path existence and key predicates.
Preserve YANG default semantics.
Enforce list key immutability.
Enforce choice/case exclusivity.
Track changed paths for NACM and audit.
Never mutate the running datastore; only the candidate datastore may be modified.

13. Secret and Redaction Metadata

The generator MUST mark fields as secret when indicated by:

opc:secret
tailf:display-hint "password"
configured extension registry entries
explicit projection profile overrides

Generated secret fields SHOULD use a secret-aware type:

pub struct SecretLeaf<T> {
    inner: secrecy::SecretBox<T>,
}

Generated Debug, audit, telemetry, and error rendering MUST redact these values. Serialization for persistence may include encrypted secret values only through the RFC 001/RFC 003 envelope.
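
The redaction requirement for `Debug` can be illustrated without any external crate. The `Redacted<T>` wrapper and `PeerAuth` struct below are hypothetical stand-ins, not the `secrecy`-based `SecretLeaf` type, and the sketch does not zeroize memory.

```rust
use std::fmt;

/// Hypothetical wrapper whose `Debug` output never contains the inner value.
pub struct Redacted<T>(T);

impl<T> Redacted<T> {
    pub fn new(value: T) -> Self {
        Redacted(value)
    }

    /// Explicit, greppable access point for code that needs the secret.
    pub fn expose(&self) -> &T {
        &self.0
    }
}

impl<T> fmt::Debug for Redacted<T> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("[REDACTED]")
    }
}

/// Stand-in for a generated struct with one secret leaf. The derived `Debug`
/// delegates to `Redacted`, so the secret cannot leak through `{:?}`.
#[derive(Debug)]
pub struct PeerAuth {
    pub peer: String,
    pub shared_secret: Redacted<String>,
}
```

Because `Redacted` deliberately does not implement `Display` or `Clone`, accidental formatting or copying of the secret fails to compile instead of leaking at runtime.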

14. Operator Schema Boundary

The generator MUST expose enough Rust schema metadata for operator policy code to validate compatibility, migrations, admission, and config-apply decisions without hand-maintained side schemas.

Generated schema metadata MUST include:

canonical YANG paths and module identity;
config/state classification;
list-key ordering;
NACM action mapping;
redaction data classes;
schema digest data for compatibility checks.

The SDK does not generate Go structs or Kubernetes CRD fragments from opc-yanggen. Product operators own their Kubernetes API shape and may use the Rust operator-lifecycle, operator-controller, and operator-lifecycle-cli contracts to bridge those APIs into the SDK policy surface. Large NF configs are therefore split, referenced, or summarized by the product operator rather than by the YANG generator.

15. Schema Migration

Generated code MUST include schema digest metadata. On startup, RFC 001 uses the digest to determine whether persisted config can be loaded directly or requires migration.

Migration support MUST provide:

pub trait ConfigMigration {
    fn from_schema(&self) -> SchemaDigest;
    fn to_schema(&self) -> SchemaDigest;
    fn migrate(&self, input: serde_json::Value) -> Result<serde_json::Value, MigrationError>;
}

Migrations MUST be deterministic and tested with golden inputs.

16. Implementation Contracts

To keep the generated system modular and reviewable, every generated module MUST follow this layout:

generated/<module_name>/
  mod.rs
  types.rs
  paths.rs
  serde.rs
  validate.rs
  patch.rs
  metadata.rs
  redaction.rs
  tests/
    roundtrip.rs
    validation.rs
    patch.rs

Rules:

Hand-written code MUST NOT edit generated files.
Generated files MUST contain a header with generator version and schema digest.
Public generated APIs MUST be documented with YANG path and source module.
Each generated validation function MUST be small enough for review and have a stable name derived from the YANG path.
Conformance tags for RFC 006 MUST be emitted near the generated item that implements the requirement.

17. Testing Requirements

17.1 Generator Tests

Deterministic output for identical inputs.
Stable schema digest.
Unsupported YANG feature fails generation.
Differential validation against reference YANG engine.
Source-location diagnostics.

17.2 Generated Code Tests

RFC 7951 round trips for every scalar type.
Presence/default serialization modes.
Leafref validation with large lists.
must and when validation.
Choice/case exclusivity.
Patch operation matrix.
Secret redaction.
Stack size compile-time checks.

17.3 Fuzzing

Fuzz targets MUST include:

RFC 7951 JSON deserialization.
Path parsing.
Patch application.
Constraint evaluator.

Fuzz failures MUST be minimized and committed as regression tests.

17.4 Performance Gates

Minimum gates for a generated carrier profile:

Deserialize 10 MiB RFC 7951 config without stack overflow.
Validate 100,000 keyed list entries with leafrefs in O(n log n) or better.
Patch a single leaf in a large config without full serialization.
Generated root size_of below configured budget.
No unbounded recursion in validation or serialization paths.

18. Extension Registry

The SDK MUST maintain a versioned extension registry:

[[extension]]
name = "opc:secret"
behavior = "secret"

[[extension]]
name = "tailf:display-hint"
value = "password"
behavior = "secret"

Unknown extensions default to ignore-with-warning only if the generation profile allows it. Carrier profiles SHOULD fail generation for unknown extensions that affect config, security, or validation behavior.

19. Acceptance Criteria

This RFC is implemented when:

Generated Rust preserves YANG presence, defaults, ordering, keys, and namespace semantics.
RFC 7951 round trips pass for all supported types.
Large config validation is bounded and does not use unbounded recursive DFS.
Unsupported XPath/YANG constructs fail generation with diagnostics.
Generated patch applicators support gNMI and NETCONF operation semantics.
Secret metadata integrates with audit redaction and persistence.
Operator policy helpers can consume generated schema metadata without a hand-maintained side schema or generated Go/Kubernetes projection.
Output is deterministic and suitable for parallel implementation.

OPC-SDK-RFC-003: Security Substrate

Status: Draft for Implementation
Version: 2.0.0
Date: 2026-05-19
Audience: SDK implementers, security engineers, operator authors, NF teams

1. Abstract

This RFC defines the OpenPacketCore security substrate: workload identity, transport security, authorization, key management, secret handling, audit integrity, and runtime security administration. It integrates SPIFFE/SPIRE, gNSI, NACM, AEAD envelope encryption, and tenant-aware policy into a coherent boundary suitable for carrier-grade cloud-native network functions.

The initial draft correctly selected SPIFFE and gNSI, but it did not define a strong enough multi-tenant carrier boundary, key lifecycle, replay controls, or break-glass governance. This version makes those contracts explicit.

2. Security Objectives

2.1 Security

Authenticate every workload and operator action with cryptographic identity.
Authorize every operation by tenant, role, transport, method, and YANG path.
Encrypt all sensitive persistent configuration and session state.
Keep secret material out of logs, telemetry, panic messages, and ordinary gNMI reads.
Provide tamper-evident audit and durable security event trails.
Fail closed on invalid identity, unknown issuer, expired certificate, failed authorization, key lookup failure, or audit integrity failure.

2.2 Performance

TLS rotation must not drop established data-plane sessions unless policy requires it.
Authorization decisions must be cacheable and bounded.
Crypto operations must use the RFC 001 crypto pool or equivalent offload so they do not starve async or data-plane workers.
Security checks on high-rate paths must avoid heap allocation in the common case.

2.3 Maintainability

Identity parsing, authorization, key lookup, and redaction must be separate modules with narrow APIs.
Policy documents must be versioned, validated, and testable offline.
Security defaults must live in one profile file, not scattered constants.
The same security metadata must drive NACM, audit, and evidence generation.

2.4 Functionality

Support SPIFFE X.509-SVID identity.
Support trust domain federation.
Support gNSI certificate and authorization services.
Support break-glass with strict governance.
Support tenant-aware policy.
Support key rotation and historical decryption.

3. Threat Model

The SDK assumes attackers may:

Control an unprivileged pod in the same Kubernetes cluster.
Control another tenant namespace.
Replay old management-plane requests.
Attempt confused-deputy attacks through the operator.
Read persistent volumes or backend snapshots offline.
Corrupt local database files.
Delay, drop, or reorder network packets.
Trigger malformed gNMI, NETCONF, gNSI, or protocol inputs.
Observe timing, status codes, and logs.
Compromise a single NF replica.

The SDK does not claim to survive:

Total compromise of the root trust domain signing keys.
Compromise of the active KMS/HSM root keys without detection.
Kernel-level compromise of the node running the NF.
Malicious code compiled into the NF binary.

These residual risks MUST be documented in RFC 006 known gaps.

4. Identity Model

4.1 SPIFFE Workload Identity

Every NF replica MUST obtain an X.509-SVID from the local SPIRE Workload API.

Default SPIFFE ID format:

spiffe://<trust-domain>/tenant/<tenant-id>/ns/<namespace>/sa/<service-account>/nf/<nf-kind>/instance/<instance-id>

The original namespace/service-account pattern is insufficient for multi-tenant carrier isolation because namespaces are often operational boundaries, not contractual tenant boundaries. tenant-id MUST be explicit unless the deployment uses one trust domain per tenant.

4.2 Identity Claims

The SDK MUST parse the SVID into:

pub struct WorkloadIdentity {
    pub trust_domain: TrustDomain,
    pub tenant: TenantId,
    pub namespace: Namespace,
    pub service_account: ServiceAccount,
    pub nf_kind: NetworkFunctionKind,
    pub instance: InstanceId,
    pub spiffe_id: SpiffeId,
    pub expires_at: Timestamp,
}

Identity parsing MUST reject:

Unknown path formats.
Missing tenant.
Invalid NF kind.
Expired SVID.
SVIDs with trust domains not present in the active bundle set.
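
The path-format checks can be sketched as a strict parser for the default SPIFFE ID layout from section 4.1. The `ParsedSpiffeId` struct, the `IdentityError` variants, and the `KNOWN_NF_KINDS` allowlist are assumptions for illustration; expiry and trust-bundle checks need certificate data and are outside this sketch.

```rust
#[derive(Debug, PartialEq, Eq)]
pub struct ParsedSpiffeId {
    pub trust_domain: String,
    pub tenant: String,
    pub namespace: String,
    pub service_account: String,
    pub nf_kind: String,
    pub instance: String,
}

#[derive(Debug, PartialEq, Eq)]
pub enum IdentityError {
    BadScheme,
    UnknownPathFormat,
    EmptySegment,
    InvalidNfKind,
}

// Hypothetical allowlist; a real SDK would derive it from NetworkFunctionKind.
const KNOWN_NF_KINDS: [&str; 3] = ["amf", "smf", "upf"];

/// Parses `spiffe://<td>/tenant/<t>/ns/<n>/sa/<s>/nf/<k>/instance/<i>` and
/// rejects anything that deviates from that exact shape.
pub fn parse_spiffe_id(uri: &str) -> Result<ParsedSpiffeId, IdentityError> {
    let rest = uri.strip_prefix("spiffe://").ok_or(IdentityError::BadScheme)?;
    let parts: Vec<&str> = rest.split('/').collect();
    // Trust domain plus five (label, value) pairs.
    if parts.len() != 11 {
        return Err(IdentityError::UnknownPathFormat);
    }
    let labels = [parts[1], parts[3], parts[5], parts[7], parts[9]];
    if labels != ["tenant", "ns", "sa", "nf", "instance"] {
        return Err(IdentityError::UnknownPathFormat);
    }
    // Covers a missing tenant value such as `/tenant//ns/...`.
    if parts.iter().any(|part| part.is_empty()) {
        return Err(IdentityError::EmptySegment);
    }
    if !KNOWN_NF_KINDS.iter().any(|kind| *kind == parts[8]) {
        return Err(IdentityError::InvalidNfKind);
    }
    Ok(ParsedSpiffeId {
        trust_domain: parts[0].to_string(),
        tenant: parts[2].to_string(),
        namespace: parts[4].to_string(),
        service_account: parts[6].to_string(),
        nf_kind: parts[8].to_string(),
        instance: parts[10].to_string(),
    })
}
```

Matching the full segment list, rather than searching for individual labels, means an ID in the older namespace/service-account pattern is rejected as an unknown format instead of being partially accepted.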

4.3 Workload Attestation

SPIRE registration entries MUST bind identity to Kubernetes selectors such as:

namespace
service account
pod label set
node attestation policy
image digest, when available through the attestor

The SDK MUST document the required SPIRE registration pattern. Relying only on service account name is not sufficient for production carrier profiles.

4.4 Trust Domain Federation

Federation MUST be explicit. The SDK MUST load and validate trust bundles for:

local workload trust domain
management/operator trust domain
optional peer-region trust domains

Federation policy MUST define which remote trust domains may perform which actions. Accepting a federated bundle MUST NOT automatically grant management privileges.

Example:

[[federation]]
trust_domain = "operator.openpacketcore.example"
allowed_tenants = ["tenant-a"]
allowed_roles = ["config-admin", "security-admin"]
allowed_transports = ["gnmi", "gnsi"]

4.5 Rotation

The SDK MUST watch SVID and bundle updates and hot-reload TLS acceptors and clients without process restart.

Rotation requirements:

New connections use the latest identity immediately after reload.
Existing connections are reauthenticated on stream boundaries or at a configurable maximum connection age.
Expired identities are not accepted.
Bundle removal revokes future handshakes.
Rotation failures emit critical telemetry.

5. Transport Security

5.1 gRPC Transports

gNMI, gNSI, and internal gRPC APIs MUST use mTLS with SPIFFE identity verification.

Requirements:

TLS 1.3 required by default.
TLS 1.2 disabled by default and only allowed by explicit compatibility profile.
Peer certificate SAN MUST contain a valid SPIFFE URI.
Common Name MUST NOT be used for authorization.
ALPN and service/method authorization MUST be enforced.
Certificates MUST be validated against active SPIFFE bundles, not system web PKI.

5.2 Cipher Suites

Default modern profile:

TLS_AES_256_GCM_SHA384
TLS_CHACHA20_POLY1305_SHA256

FIPS profile:

MUST use a FIPS 140-3 validated module and only approved algorithms.
MUST document any difference from the modern profile.
MUST disable algorithms not available through the validated boundary.

The SDK MUST expose the selected security profile in metrics and evidence.

5.3 NETCONF over SSH

If NETCONF/SSH is enabled:

SSH host keys MUST be generated or provisioned through the security substrate.
Client identity MUST map to a TrustedPrincipal.
Password authentication MUST be disabled by default.
SSH certificate authorities SHOULD be used when SPIFFE-native SSH identity is unavailable.
SSH authorization MUST flow through the same NACM engine as gNMI.

6. Authorization

6.1 Principal Model

pub struct TrustedPrincipal {
    pub identity: WorkloadIdentity,
    pub tenant: TenantId,
    pub roles: Vec<Role>,
    pub groups: Vec<Group>,
    pub auth_strength: AuthStrength,
}

Roles and groups MUST come from signed policy or trusted identity attributes. They MUST NOT be accepted from unsigned client metadata.

6.2 Policy Layers

Authorization is evaluated in this order:

Transport and peer authentication.
Trust domain allowlist.
Tenant boundary check.
gRPC service/method authorization.
NACM/YANG path authorization.
Operation-specific guardrails, such as break-glass or key export denial.

Any deny at any layer is final unless a governed break-glass flow applies.

6.3 NACM Requirements

NACM MUST authorize:

read
create
update
replace
delete
exec
subscribe
security-admin

The engine MUST evaluate all changed paths after patch expansion. It is not enough to authorize the request's root path.

Authorization decisions SHOULD be cached by:

principal digest
tenant
policy version
normalized path
action

Cache entries MUST be invalidated on policy updates and SVID rotation.
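
A decision cache keyed on those five fields might be sketched as below. All type names (`DecisionKey`, `DecisionCache`, `Action`, `Decision`) are hypothetical, the `Action` enum lists only a subset of the NACM actions, and size bounding and eviction are omitted.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Action {
    Read,
    Update,
    Delete,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision {
    Permit,
    Deny,
}

/// Cache key built from the five fields listed above.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct DecisionKey {
    pub principal_digest: [u8; 32],
    pub tenant: String,
    pub policy_version: u64,
    pub normalized_path: String,
    pub action: Action,
}

pub struct DecisionCache {
    policy_version: u64,
    entries: HashMap<DecisionKey, Decision>,
}

impl DecisionCache {
    pub fn new(policy_version: u64) -> Self {
        DecisionCache { policy_version, entries: HashMap::new() }
    }

    /// Returns a cached decision only if it was made under the active policy.
    pub fn get(&self, key: &DecisionKey) -> Option<Decision> {
        if key.policy_version != self.policy_version {
            return None;
        }
        self.entries.get(key).copied()
    }

    /// Stores a decision; keys for a non-active policy version are ignored.
    pub fn insert(&mut self, key: DecisionKey, decision: Decision) {
        if key.policy_version == self.policy_version {
            self.entries.insert(key, decision);
        }
    }

    /// Policy update: drop every entry and switch to the new version.
    pub fn on_policy_update(&mut self, new_version: u64) {
        self.policy_version = new_version;
        self.entries.clear();
    }

    /// SVID rotation: drop entries belonging to the rotated principal.
    pub fn on_svid_rotation(&mut self, principal_digest: &[u8; 32]) {
        self.entries.retain(|key, _| &key.principal_digest != principal_digest);
    }
}
```

Including the policy version in the key gives a second line of defence: even if `clear` were skipped, a lookup under the new version could not match an entry created under the old one.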

6.4 Multi-Tenant Boundary

Cross-tenant access is denied by default. A principal from tenant A MUST NOT read or mutate tenant B config, session state, keys, or audit records unless a federated policy explicitly grants a scoped operation.

The tenant boundary MUST be enforced in:

identity parsing
authorization
persistence key namespace
session key namespace
audit query filters
telemetry labels, with cardinality controls
operator reconciliation

7. gNSI Services

The SDK MUST provide server-side support for:

Service	Purpose	SDK Component
`gnsi.certz.v1`	Certificate and trust material distribution	`opc-gnsi-server`
`gnsi.pathz.v1`	Path authorization policy	`opc-nacm`
`gnsi.authz.v1`	gRPC service/method authorization	`opc-nacm`

gNSI endpoints are security-critical. Access MUST require security-admin or a more specific role. gNSI mutations MUST be audited and persisted through the shadow-security store from RFC 001.

7.1 Shadow Security Store

Security material pushed through gNSI is stored in shadow-security.

Rules:

Not visible through ordinary gNMI Get.
Exportable only through explicitly authorized security APIs.
Encrypted at rest with a distinct key purpose from normal config.
Included in backup only when backup policy allows secret material.
Redacted in audit and telemetry.

7.2 Policy Staging

Authorization policy updates MUST support validate-only and staged apply. A policy that would lock out all security administrators MUST be rejected unless a break-glass recovery policy exists.

8. Break-Glass

Break-glass is dangerous and MUST be treated as an exceptional workflow, not a convenience override.

Requirements:

Disabled by default in production profiles; enabling it requires explicit configuration.
Requires a high-assurance principal.
Requires reason, ticket/reference, requested scope, and duration.
Maximum default duration: 15 minutes.
Requires dual authorization or an externally signed emergency token in carrier profiles.
Cannot bypass cryptographic verification, tenant boundary, or audit logging.
Cannot export raw key material unless a separate key recovery policy allows it.
Emits critical audit events at start, use, and expiry.
Emits high-priority telemetry.

Break-glass MUST grant the narrowest possible action set and path set.

9. Key Management

9.1 Key Hierarchy

The SDK uses purpose-separated keys:

Purpose	Example Use
`config`	RFC 001 encrypted config blobs
`shadow-security`	gNSI security material
`session`	RFC 004 session store data
`audit`	HMAC hash chains
`backup`	encrypted export bundles

Keys MUST be separated by KMS key ID or HKDF info labels. Reusing one raw key for multiple purposes is forbidden.

9.2 Key Sources

Production profiles MUST obtain root or wrapping keys from one of:

KMS plugin.
HSM plugin.
Kubernetes Secret encrypted by a cluster KMS provider, only for lower assurance profiles.
SPIRE/SVID-authenticated key service.

Environment variables are forbidden for production key material.

9.3 Key Lookup API

#[async_trait::async_trait]
pub trait KeyProvider: Send + Sync {
    async fn get_active_key(&self, purpose: KeyPurpose, tenant: &TenantId)
        -> Result<KeyHandle, KeyError>;
    async fn get_key_by_id(&self, key_id: &KeyId)
        -> Result<KeyHandle, KeyError>;
    async fn rotate_key(&self, purpose: KeyPurpose, tenant: &TenantId)
        -> Result<KeyId, KeyError>;
}

KeyHandle MUST avoid exposing raw bytes unless required by the crypto module. If raw bytes are materialized, they MUST be zeroized after use where the crypto backend permits.

9.4 Rotation

Key rotation MUST support:

New writes using the active key.
Old reads using key ID from the envelope.
Optional background re-encryption.
Retention windows.
Emergency key revocation.

If a key is unavailable, the SDK MUST fail closed for writes and for reads that require the missing key.

10. AEAD Envelope Encryption

10.1 Default Profile

Default persistent encryption uses AES-256-GCM-SIV for misuse resistance. Nonce reuse is still a bug and MUST be monitored.

10.2 FIPS Profile

Some FIPS-validated modules may not expose AES-GCM-SIV. A FIPS profile MAY use AES-256-GCM only when:

Nonces are generated by a validated DRBG or deterministic counter scheme.
Nonce uniqueness is guaranteed per key.
The uniqueness state is crash-safe.
Tests prove duplicate nonce detection.

The active AEAD algorithm MUST be recorded in each envelope and in RFC 006 evidence.

10.3 Associated Data

AAD MUST bind ciphertext to:

tenant
purpose
tx/session identifier
schema digest or state type
key ID
version
principal, for config commits

AAD mismatch MUST produce a generic integrity error without exposing which field failed.
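
One possible way to assemble the AAD is a length-prefixed concatenation of the bound fields, so that field boundaries cannot shift between adjacent values. The `EnvelopeContext` struct and the byte layout below are assumptions for illustration; this RFC does not define a wire format here.

```rust
/// Hypothetical view of the metadata that the AAD binds to a ciphertext.
pub struct EnvelopeContext<'a> {
    pub tenant: &'a str,
    pub purpose: &'a str,
    pub tx_id: &'a str,
    pub schema_digest: &'a [u8],
    pub key_id: &'a str,
    pub version: u64,
    /// Present for config commits only.
    pub principal: Option<&'a str>,
}

/// Appends a 4-byte big-endian length followed by the field bytes.
/// Fields are assumed to be shorter than 4 GiB.
fn push_field(out: &mut Vec<u8>, field: &[u8]) {
    out.extend_from_slice(&(field.len() as u32).to_be_bytes());
    out.extend_from_slice(field);
}

/// Builds the canonical AAD byte string in a fixed field order.
pub fn build_aad(ctx: &EnvelopeContext<'_>) -> Vec<u8> {
    let mut aad = Vec::new();
    push_field(&mut aad, ctx.tenant.as_bytes());
    push_field(&mut aad, ctx.purpose.as_bytes());
    push_field(&mut aad, ctx.tx_id.as_bytes());
    push_field(&mut aad, ctx.schema_digest);
    push_field(&mut aad, ctx.key_id.as_bytes());
    push_field(&mut aad, &ctx.version.to_be_bytes());
    // A presence byte keeps "no principal" distinct from an empty principal.
    match ctx.principal {
        Some(principal) => {
            aad.push(1);
            push_field(&mut aad, principal.as_bytes());
        }
        None => aad.push(0),
    }
    aad
}
```

Without the length prefixes, the pairs `("ab", "c")` and `("a", "bc")` for tenant and purpose would serialize to the same bytes, which would weaken the tenant binding.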

10.4 Replay and Rollback

Encryption alone does not prevent replay of an old valid blob. The management store MUST enforce monotonic config versions as specified in RFC 001. Session store backends MUST use generation numbers or lease fencing as specified in RFC 004.

11. Audit Security

11.1 Hash Chain

Audit records MUST include:

entry_hmac = HMAC(audit_key, tenant || sequence || canonical_entry || previous_hash)

The hash chain MUST be tenant-scoped and purpose-separated. Startup MUST verify the local audit chain unless the operator explicitly configures degraded recovery mode.

11.2 External Audit Sink

Carrier profiles SHOULD stream audit events to an external append-only system. Local SQLite audit is necessary for recovery and debugging but is not sufficient against host-level compromise.

11.3 Time

Audit timestamps MUST use UTC. The SDK SHOULD record both wall-clock timestamp and monotonic sequence number. Security decisions MUST NOT rely only on wall clock when monotonic ordering is required.

12. Redaction

The redaction subsystem consumes metadata generated by RFC 002.

Redaction MUST apply to:

Debug
structured logs
audit records
metrics labels
error messages
traces
panic hooks where possible
gNMI read responses after NACM filtering

Redaction MUST preserve enough information for debugging, such as value presence, length class, or stable digest when explicitly allowed by policy.

13. Observability

Required metrics:

opc_security_authn_total{outcome,reason,transport}
opc_security_authz_total{outcome,reason,action}
opc_security_svid_expires_seconds
opc_security_bundle_version
opc_security_rotation_total{kind,outcome}
opc_security_key_lookup_total{purpose,outcome}
opc_security_breakglass_active
opc_security_breakglass_total{outcome}
opc_security_audit_chain_verify_total{outcome}
opc_security_redactions_total{source}

Metrics MUST control label cardinality. Raw SPIFFE IDs SHOULD be exposed through logs, not high-cardinality metrics, unless explicitly enabled.

14. Module Ownership

Module	Responsibility
`opc-identity`	SPIFFE ID parsing, SVID watch, trust bundle watch
`opc-tls`	TLS acceptor/client reload and peer extraction
`opc-authz`	Principal, roles, method policy, decision cache
`opc-nacm`	YANG path authorization and RFC 8341 semantics
`opc-gnsi-server`	gNSI service handlers and staged policy apply
`opc-key`	KeyProvider trait and KMS/HSM adapters
`opc-crypto`	AEAD envelopes and key derivation
`opc-redaction`	Secret metadata and safe rendering
`opc-audit`	HMAC chain, external sink adapter
`opc-security-testkit`	fake SPIRE, fake KMS, policy fixtures

Agents must not mix transport identity parsing with NACM path logic. Each module should have deterministic test fixtures and no hidden global state.

15. Testing Requirements

15.1 Unit Tests

SPIFFE ID parser accepts valid pattern and rejects malformed identities.
Federation allowlist denies unknown trust domains.
Authorization cache invalidates on policy version change.
NACM denies missing rules.
Redaction covers generated secret fields.
AEAD envelope rejects wrong AAD, wrong key, corrupted tag, and wrong tenant.
Break-glass scope and TTL enforcement.

15.2 Integration Tests

SVID rotation without process restart.
Trust bundle rotation revokes removed trust domain.
gNSI policy staging and rollback.
Management commit rejected after NACM policy update removes permission.
Shadow-security store not visible through ordinary gNMI Get.
Key rotation reads old commits and writes new commits.
External audit sink outage does not drop local audit.

15.3 Fault Injection

SPIRE socket unavailable.
Expired SVID.
Corrupt trust bundle.
KMS timeout.
Missing historical key.
Duplicate AEAD nonce detector trigger, when applicable.
Audit HMAC mismatch.
Break-glass token replay.

15.4 Performance Gates

Authorization decision cache p99 under 50 microseconds for hot entries.
TLS reload completes without blocking new accepts longer than 100 milliseconds on reference hardware.
Key lookup cache hit p99 under 25 microseconds.
Redaction of a 10 MiB config audit diff completes within configured commit budget.

16. Acceptance Criteria

This RFC is implemented when:

Every management connection is authenticated with SPIFFE-aware mTLS or an explicitly configured SSH identity profile.
Tenant identity is explicit and enforced across authz, persistence, audit, and telemetry.
gNSI services can stage, validate, apply, audit, and roll back security policy.
Config, shadow-security, session, and audit keys are purpose-separated and rotatable.
AEAD envelopes bind ciphertext to tenant, purpose, version, and schema/state metadata.
Break-glass is scoped, time-limited, audited, and disabled by default in production unless carrier policy enables it.
Security failure modes fail closed and are covered by fault injection tests.

OPC-SDK-RFC-004: High-Performance Session Store

Status: Draft for Implementation
Version: 2.0.0
Date: 2026-05-19
Audience: SDK implementers, NF owners, data-plane engineers, reliability engineers

1. Abstract

This RFC defines opc-session-store, the SDK substrate for high-rate network function state such as PDU sessions, PFCP associations, TEID mappings, QoS flow state, handover coordination metadata, and data-plane derived counters that need controlled persistence.

The initial draft correctly identified the need for partitioning, local-first operation, and distributed leases. It was not strict enough for 5G continuity: last-writer-wins based on synchronized clocks is not safe for authoritative session state. This version requires monotonic fencing tokens, compare-and-set updates, owner epochs, explicit handover state transitions, and a documented consistency model per data class.

2. Scope

2.1 In Scope

Per-session control-plane state needed by AMF, SMF, UPF, and related NFs.
Data-plane lookup state that can be safely snapshotted or reconstructed.
Lease and fencing mechanisms for single-owner session mutation.
Local cache and distributed backend abstraction.
Geo-redundant replication for disaster recovery and warm standby.
Serialization, encryption, integrity, TTL, metrics, and fault injection.

2.2 Out of Scope

Configuration management. See RFC 001.
Packet parsing and protocol codecs. See RFC 005.
Full 3GPP procedure implementation. This RFC provides storage primitives and state-machine support used by NF-specific procedure logic.
Hard real-time packet forwarding in the remote store. Packet fast paths must use local data-plane structures.

3. Design Goals

3.1 Security

Encrypt session state before it leaves process memory unless the backend is explicitly trusted by profile.
Bind encrypted records to tenant, NF kind, session key, generation, and state type through AEAD AAD.
Prevent stale owners from overwriting newer session state.
Prevent cross-tenant key collision or data exposure.
Redact SUPI/GPSI and other subscriber identifiers in logs by default.

3.2 Performance

Keep packet forwarding off the remote store path.
Support 100,000+ session updates/second per NF replica for local in-memory or batched backend profiles.
Keep hot read p99 below 1 ms for local-cluster operations where the selected backend can meet it.
Provide bounded allocation and zero-copy or low-copy decode for common session reads.
Support batching, pipelining, and async replication without sacrificing fencing correctness.

3.3 Maintainability

Separate storage API, lease API, serialization, encryption, and replication.
Require backend capability declarations so NF code does not assume semantics a backend cannot provide.
Use typed session records instead of arbitrary blobs at module boundaries.
Provide a deterministic testkit for split-brain, failover, and handover races.

3.4 Functionality

Support create, get, update, delete, compare-and-set, TTL refresh, lease, renew, release, snapshot, and replication.
Support session handover prepare/activate/abort flows.
Support backend implementations for in-memory, Redis, Aerospike, and optional strongly consistent stores.
Support region-aware replication and recovery.

4. State Classes

The SDK distinguishes state by consistency need:

Class	Examples	Consistency Requirement
`authoritative-session`	PDU session owner, AMF/SMF ownership, handover phase	Single writer with fencing
`dataplane-lookup`	TEID to session mapping, FAR/QER/PDR snapshots	Local atomic snapshot, rebuildable
`replicated-dr`	Warm standby copy of session records	Async, ordered by generation
`telemetry-derived`	Counters, rates, last seen timestamps	Mergeable or lossy
`ephemeral-procedure`	Temporary handover transaction state	TTL, fenced owner

Only telemetry-derived state may use last-writer-wins based on timestamps. Authoritative session state MUST NOT use wall-clock LWW.

5. Session Identity

Session keys MUST be tenant-scoped and type-scoped:

/// Tenant-scoped, type-scoped identity of one session record.
pub struct SessionKey {
    pub tenant: TenantId,
    pub nf_kind: NetworkFunctionKind,
    pub key_type: SessionKeyType,
    pub stable_id: bytes::Bytes,
}

Examples:

SUPI-derived subscriber context key.
PDU session ID plus SUPI hash.
TEID mapping key.
PFCP session SEID key.
Handover transaction key.

Raw SUPI/GPSI MUST NOT be used directly as a backend key in production. The SDK SHOULD derive stable keys with a tenant-specific keyed hash.

6. Backend Capability Model

The initial get/set/delete trait is too weak. Backends MUST declare capabilities:

pub struct BackendCapabilities {
    pub atomic_compare_and_set: bool,
    pub monotonic_fencing_token: bool,
    pub per_key_ttl: bool,
    pub server_side_lease_expiry: bool,
    pub ordered_replication_log: bool,
    pub batch_write: bool,
    pub watch: bool,
    pub max_value_bytes: usize,
}

Carrier profiles MUST reject a backend for authoritative-session state unless it supports atomic compare-and-set and monotonic fencing tokens or an adapter can provide equivalent semantics.
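The admission rule above could be enforced with a check along these lines. This is a minimal sketch: `admit_backend`, `AdmissionError`, and the reduced `BackendCapabilities` shown here are illustrative names, not SDK API, and the "adapter provides equivalent semantics" escape hatch is not modeled.

```rust
/// Subset of the capability flags from this section (illustrative).
#[derive(Debug, Clone, Copy)]
pub struct BackendCapabilities {
    pub atomic_compare_and_set: bool,
    pub monotonic_fencing_token: bool,
}

/// State classes from section 4.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StateClass {
    AuthoritativeSession,
    DataplaneLookup,
    ReplicatedDr,
    TelemetryDerived,
    EphemeralProcedure,
}

/// Hypothetical error type naming the missing capability.
#[derive(Debug, PartialEq, Eq)]
pub enum AdmissionError {
    MissingCompareAndSet,
    MissingFencingToken,
}

/// Rejects a backend for authoritative-session state unless it declares
/// both atomic CAS and monotonic fencing tokens.
pub fn admit_backend(
    caps: BackendCapabilities,
    class: StateClass,
) -> Result<(), AdmissionError> {
    if class != StateClass::AuthoritativeSession {
        // Other classes have their own rules, which this sketch does not model.
        return Ok(());
    }
    if !caps.atomic_compare_and_set {
        return Err(AdmissionError::MissingCompareAndSet);
    }
    if !caps.monotonic_fencing_token {
        return Err(AdmissionError::MissingFencingToken);
    }
    Ok(())
}
```

Returning a specific error rather than a boolean lets a profile validator report which declared capability was missing.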

7. Storage API

use std::time::Duration;

#[async_trait::async_trait]
pub trait SessionBackend: Send + Sync {
    async fn capabilities(&self) -> BackendCapabilities;

    async fn get(&self, key: &SessionKey)
        -> Result<Option<StoredSessionRecord>, StoreError>;

    async fn compare_and_set(&self, op: CompareAndSet)
        -> Result<CompareAndSetResult, StoreError>;

    async fn delete_fenced(&self, key: &SessionKey, fence: FenceToken)
        -> Result<(), StoreError>;

    async fn refresh_ttl(&self, key: &SessionKey, fence: FenceToken, ttl: Duration)
        -> Result<(), StoreError>;

    async fn batch(&self, ops: Vec<SessionOp>)
        -> Result<Vec<SessionOpResult>, StoreError>;
}

set without fencing is allowed only for state classes that explicitly do not require authoritative ownership.

8. Record Format

pub struct StoredSessionRecord {
    pub key: SessionKey,
    pub generation: Generation,
    pub owner: OwnerId,
    pub fence: FenceToken,
    pub state_class: StateClass,
    pub state_type: StateType,
    pub expires_at: Option<Timestamp>,
    pub payload: EncryptedSessionPayload,
}

generation is a monotonic per-session version. Every authoritative update MUST increment it atomically.

9. Lease and Fencing

9.1 Lease API

use std::time::Duration;

#[async_trait::async_trait]
pub trait SessionLeaseManager: Send + Sync {
    async fn acquire(&self, key: &SessionKey, owner: OwnerId, ttl: Duration)
        -> Result<LeaseGuard, LeaseError>;

    async fn renew(&self, lease: &LeaseGuard, ttl: Duration)
        -> Result<LeaseGuard, LeaseError>;

    async fn release(&self, lease: LeaseGuard)
        -> Result<(), LeaseError>;
}

pub struct LeaseGuard {
    pub key: SessionKey,
    pub owner: OwnerId,
    pub fence: FenceToken,
    pub acquired_at: Timestamp,
    pub expires_at: Timestamp,
}

9.2 Fencing Rules

Every successful lease acquisition MUST produce a monotonic fencing token for that session key. Backends MUST reject any write with a token lower than the current recorded token.

This prevents an old owner whose lease expired during a pause or partition from overwriting a newer owner after it resumes.
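To make the rejection rule concrete, here is a minimal single-process sketch of a fenced compare-and-set. `MemStore`, `Record`, and `CasError` are hypothetical names, keys are plain strings, and tokens and generations are modeled as `u64`; a real backend would perform the same checks atomically on the server side.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Record {
    pub generation: u64,
    pub fence: u64,
    pub payload: Vec<u8>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum CasError {
    StaleFence { current: u64, presented: u64 },
    GenerationConflict { current: u64, expected: u64 },
    GenerationExhausted,
}

#[derive(Default)]
pub struct MemStore {
    records: HashMap<String, Record>,
}

impl MemStore {
    /// Writes `payload` only if `fence` is not lower than the recorded fence
    /// and `expected_generation` matches the stored generation.
    /// A missing key is treated as generation 0 and fence 0.
    pub fn compare_and_set(
        &mut self,
        key: &str,
        fence: u64,
        expected_generation: u64,
        payload: Vec<u8>,
    ) -> Result<u64, CasError> {
        let (cur_gen, cur_fence) = match self.records.get(key) {
            Some(r) => (r.generation, r.fence),
            None => (0, 0),
        };
        // Fencing check first: a stale owner is rejected regardless of generation.
        if fence < cur_fence {
            return Err(CasError::StaleFence { current: cur_fence, presented: fence });
        }
        if expected_generation != cur_gen {
            return Err(CasError::GenerationConflict {
                current: cur_gen,
                expected: expected_generation,
            });
        }
        let next = cur_gen.checked_add(1).ok_or(CasError::GenerationExhausted)?;
        self.records
            .insert(key.to_string(), Record { generation: next, fence, payload });
        Ok(next)
    }
}
```

An owner that paused through a lease expiry still holds its old token, so its next write fails with `StaleFence` even if it guesses the current generation.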

9.3 Lease Expiry

Lease expiry alone does not provide correctness; it is only a liveness mechanism. Safety comes from fencing.

Rules:

Lease TTL MUST be longer than worst-case expected procedure pause plus backend failover detection time.
By default, renewals MUST happen before 50 percent of the TTL has elapsed.
A failed renewal MUST stop authoritative writes immediately.
Owners MUST treat unknown lease state as lost.
Stale writes MUST fail with a distinct StaleFence error.
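The owner-side rules above can be sketched as a small decision function. The names `RenewStatus`, `LeaseAction`, and `next_action` are hypothetical, and starting renewal at one third of the TTL is an assumption chosen only to leave margin before the 50 percent deadline. `elapsed` is assumed to come from a monotonic clock.

```rust
use std::time::Duration;

/// Outcome of the most recent acquire or renew attempt (illustrative).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RenewStatus {
    Succeeded,
    Failed,
    Unknown,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LeaseAction {
    Continue,
    RenewNow,
    StopWrites,
}

/// Decides what the owner should do, given time elapsed since the lease
/// was last granted or renewed.
pub fn next_action(elapsed: Duration, ttl: Duration, last_renew: RenewStatus) -> LeaseAction {
    // A failed renewal or an unknown lease state is treated as a lost lease.
    if last_renew != RenewStatus::Succeeded || elapsed >= ttl {
        return LeaseAction::StopWrites;
    }
    // Assumed policy: begin renewing at ttl/3, ahead of the 50 percent deadline.
    if elapsed >= ttl / 3 {
        LeaseAction::RenewNow
    } else {
        LeaseAction::Continue
    }
}
```

Even when this function returns `Continue`, the backend still enforces fencing; the local check only limits how long a stale owner keeps trying.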

9.4 Backend Notes

Redis implementations MUST use atomic Lua scripts or equivalent server-side transactions for acquire, renew, and fenced CAS. Redis deployments that can lose acknowledged writes during failover MUST NOT be used for strict authoritative state without an external consensus/fencing source.
Aerospike implementations SHOULD use generation checks and record UDF or transaction mechanisms where available.
The in-memory backend is for single-process tests or single-replica development unless paired with a consensus lease manager.
Strongly consistent stores may be used for leases even when bulk state is in a faster backend.

10. 3GPP Session Continuity and Handover

10.1 Storage Guarantees Needed by Handover

5G handover procedures must avoid duplicate authoritative writers while preserving continuity of PDU session and bearer/QoS state. The store must support:

Idempotent procedure steps.
Prepared-but-not-active state.
Activation with a fencing token.
Abort/rollback of prepared handover.
Recovery after source or target NF restart.
Detection of stale source updates after target activation.

A lease mechanism without fencing is not sufficient.

10.2 Handover State Machine

The SDK provides generic storage states:

pub enum HandoverPhase {
    Stable,
    Preparing { tx: HandoverTxId, target: OwnerId },
    Prepared { tx: HandoverTxId, target: OwnerId },
    Activating { tx: HandoverTxId, target: OwnerId },
    Active { owner: OwnerId },
    Aborting { tx: HandoverTxId },
}

NF-specific AMF/SMF/UPF logic maps 3GPP procedure messages to these states.

10.3 Procedure Rules

The session store MUST support these generic steps:

Source owner holds a valid lease.
Source creates Preparing record with current generation.
Target acquires or is assigned a higher fence for activation.
Target writes Prepared with expected generation.
Activation performs a fenced CAS to Active { owner: target }.
Source updates with old fence are rejected.
Abort performs a fenced CAS back to Stable if activation did not complete.

All steps MUST be idempotent by HandoverTxId.
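The sketch below shows one way idempotency by transaction ID could look as an in-memory state machine. It is an illustration under several assumptions: IDs are plain integers, the intermediate `Activating` and `Aborting` phases are collapsed, the completed transaction ID is remembered in a separate field so that a replayed activation can be recognized, and `Step`, `Handover`, and `InvalidTransition` are hypothetical names. A real implementation would apply each transition through a fenced CAS.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum HandoverPhase {
    Stable,
    Preparing { tx: u64, target: u32 },
    Prepared { tx: u64, target: u32 },
    Activating { tx: u64, target: u32 },
    Active { owner: u32 },
    Aborting { tx: u64 },
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Step {
    Prepare { tx: u64, target: u32 },
    MarkPrepared { tx: u64 },
    Activate { tx: u64 },
    Abort { tx: u64 },
}

#[derive(Debug, PartialEq, Eq)]
pub struct InvalidTransition;

pub struct Handover {
    pub phase: HandoverPhase,
    /// Transaction that produced the current `Active` phase, if any.
    pub completed_tx: Option<u64>,
}

impl Handover {
    pub fn new() -> Self {
        Self { phase: HandoverPhase::Stable, completed_tx: None }
    }

    /// Applies one step; replaying a step for the same transaction is a no-op.
    pub fn apply(&mut self, step: Step) -> Result<(), InvalidTransition> {
        use HandoverPhase::*;
        let next = match (&self.phase, step) {
            (Stable, Step::Prepare { tx, target }) => Preparing { tx, target },
            (Preparing { tx, target }, Step::Prepare { tx: t, target: g })
                if *tx == t && *target == g => return Ok(()),
            (Preparing { tx, target }, Step::MarkPrepared { tx: t }) if *tx == t => {
                Prepared { tx: t, target: *target }
            }
            (Prepared { tx, .. }, Step::MarkPrepared { tx: t }) if *tx == t => return Ok(()),
            (Prepared { tx, target }, Step::Activate { tx: t }) if *tx == t => {
                self.completed_tx = Some(t);
                Active { owner: *target }
            }
            (Active { .. }, Step::Activate { tx: t }) if self.completed_tx == Some(t) => {
                return Ok(())
            }
            (Preparing { tx, .. }, Step::Abort { tx: t })
            | (Prepared { tx, .. }, Step::Abort { tx: t })
                if *tx == t => Stable,
            // Nothing is in flight, so a replayed abort has nothing left to undo.
            (Stable, Step::Abort { .. }) => return Ok(()),
            _ => return Err(InvalidTransition),
        };
        self.phase = next;
        Ok(())
    }
}
```

Any step carrying a transaction ID that does not match the one in flight falls through to `InvalidTransition`, which is how a late message from an abandoned handover is refused.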

10.4 Packet Continuity

The session store does not itself guarantee zero packet loss. It provides the state consistency needed by NFs to implement make-before-break, buffering, or tunnel switching. NF-specific procedures MUST state their packet continuity behavior and evidence in RFC 006 reports.

11. Geo-Redundancy

11.1 Corrected Consistency Model

Asynchronous geo-replication is suitable for disaster recovery and warm standby. It is not sufficient for strict active/active mutation of the same authoritative session unless a higher-level single-owner protocol is used.

Authoritative state MUST use one of:

Home-region ownership per session.
Explicit ownership transfer with fencing.
A strongly consistent multi-region backend, if the deployment accepts the latency cost.

Wall-clock last-writer-wins is forbidden for authoritative session state.

11.2 Replication Log

Backends SHOULD expose an ordered replication log:

pub struct ReplicationEvent {
    pub key: SessionKey,
    pub generation: Generation,
    pub fence: FenceToken,
    pub state_class: StateClass,
    pub payload_digest: Sha256Digest,
    pub encrypted_payload: EncryptedSessionPayload,
}

Replicas MUST apply events only if generation and fence are newer according to the state class rules.
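For authoritative state, one plausible reading of "newer" is sketched below: the fence must not go backwards and the generation must strictly increase. That interpretation is an assumption; the per-class rules are defined by the state class, and `Version` and `should_apply` are illustrative names.

```rust
/// Ordering metadata carried by a stored record or a replication event.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Version {
    pub generation: u64,
    pub fence: u64,
}

/// Returns true if a replica holding `current` should apply `incoming`.
/// No wall-clock timestamp is consulted.
pub fn should_apply(current: Option<Version>, incoming: Version) -> bool {
    match current {
        // Nothing stored yet: the first event is always applied.
        None => true,
        Some(cur) => incoming.fence >= cur.fence && incoming.generation > cur.generation,
    }
}
```

Because the comparison uses only generation and fence, replaying or reordering events on the replica cannot move a record backwards.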

11.3 RPO and RTO

Every deployment profile MUST publish:

Recovery point objective for session state.
Recovery time objective for session service.
Maximum tolerated replication lag.
Which state classes are replicated.
Which state classes are rebuildable.

12. Serialization

Rust has no garbage collector, so the goal is allocation, CPU, and cache efficiency rather than "GC pressure" reduction.

12.1 Formats

Allowed formats:

FlatBuffers for read-mostly zero-copy records.
Prost/Protobuf for compatibility, with careful allocation profiling.
Postcard or bincode-like formats only for internal state with stable version policy.

Each state type MUST define:

schema version
compatibility policy
max encoded size
fuzz target
migration path

12.2 Decode Rules

Decoders MUST:

Validate length prefixes and offsets.
Reject trailing garbage unless explicitly allowed.
Avoid borrowing data beyond the lifetime of the source buffer.
Avoid panics on corrupt data.
Support partial decode for lookup keys where useful.

13. Local Cache

The SDK SHOULD provide a two-level model:

Local in-process cache for hot reads.
Distributed backend for ownership, recovery, and replication.

Cache entries MUST include generation and fence. Stale cache entries MUST NOT be used for authoritative writes. Data-plane lookup snapshots SHOULD be updated through atomic swap or RCU-like mechanisms.

Cache invalidation options:

backend watch stream
polling by generation
explicit publish from owner
TTL expiry

NF owners must choose a cache mode per state class.

14. Security

14.1 Encryption

Session payloads MUST be encrypted before storage unless the profile explicitly marks the backend as inside the same cryptographic boundary.

AAD MUST include:

tenant
NF kind
session key digest
state type
generation
fence
backend namespace
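A minimal sketch of how these fields might be serialized into AAD bytes. The length-prefixed layout, field order, and the names `AadFields` and `build_aad` are assumptions for illustration; the actual envelope format belongs to the crypto module. Length prefixes are used so that adjacent variable-length fields cannot be shifted into one another.

```rust
use std::convert::TryFrom;

/// Inputs bound into the AEAD associated data (illustrative types).
#[derive(Debug, Clone, Copy)]
pub struct AadFields<'a> {
    pub tenant: &'a str,
    pub nf_kind: &'a str,
    pub session_key_digest: &'a [u8],
    pub state_type: &'a str,
    pub generation: u64,
    pub fence: u64,
    pub backend_namespace: &'a str,
}

/// Appends a 4-byte big-endian length followed by the field bytes.
fn put_field(out: &mut Vec<u8>, field: &[u8]) -> Option<()> {
    let len = u32::try_from(field.len()).ok()?;
    out.extend_from_slice(&len.to_be_bytes());
    out.extend_from_slice(field);
    Some(())
}

/// Builds the AAD; returns None if a field is too long to length-prefix.
pub fn build_aad(f: &AadFields<'_>) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    put_field(&mut out, f.tenant.as_bytes())?;
    put_field(&mut out, f.nf_kind.as_bytes())?;
    put_field(&mut out, f.session_key_digest)?;
    put_field(&mut out, f.state_type.as_bytes())?;
    // Fixed-width integers need no length prefix.
    out.extend_from_slice(&f.generation.to_be_bytes());
    out.extend_from_slice(&f.fence.to_be_bytes());
    put_field(&mut out, f.backend_namespace.as_bytes())?;
    Some(out)
}
```

Since generation and fence are part of the AAD, a ciphertext copied onto a record with a different version fails authentication on decrypt.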

14.2 Integrity

AEAD integrity is required. Additional MAC fields MAY be used for backends that need independent integrity checks, but they do not replace AEAD.

14.3 Privacy

Logs and metrics MUST NOT expose raw subscriber identifiers. The SDK SHOULD use stable keyed digests for correlation when needed.

15. Observability

Required metrics:

opc_session_store_ops_total{op,state_class,outcome}
opc_session_store_latency_seconds{op,state_class}
opc_session_store_cas_conflicts_total{state_class}
opc_session_store_stale_fence_total{state_class}
opc_session_lease_acquire_total{outcome}
opc_session_lease_renew_total{outcome}
opc_session_lease_lost_total{reason}
opc_session_replication_lag_seconds{region}
opc_session_cache_hit_ratio{state_class}
opc_session_record_bytes{state_type}

Required logs for state transitions:

session_key_digest
tenant
state_class
generation
fence
owner
handover_tx_id, when applicable
outcome

Raw subscriber identifiers MUST be redacted.

16. Module Ownership

Module	Responsibility
`opc-session-model`	Keys, record headers, generations, state classes
`opc-session-backend`	Backend trait and capability model
`opc-session-lease`	Lease manager and fencing rules
`opc-session-cache`	Local cache and snapshot publication
`opc-session-codec`	Session serialization and migrations
`opc-session-crypto`	Payload envelope integration with RFC 003
`opc-session-replication`	Region log and apply rules
`opc-handover`	Generic handover storage state machine
`opc-session-testkit`	Fake backend, split-brain tests, stale fence tests

Agents implementing backends must not modify NF-specific handover logic. Agents implementing handover logic must use the public lease/CAS APIs and not bypass fencing.

17. Testing Requirements

17.1 Unit Tests

Session key tenant separation.
CAS success and conflict.
Stale fence rejection.
Lease acquire/renew/release.
TTL refresh with valid and stale fences.
Serialization corrupt input rejection.
AEAD AAD mismatch rejection.
Cache generation checks.

17.2 Integration Tests

Two owners racing for the same session.
Owner pause beyond TTL, new owner writes, old owner resumes and is rejected.
Handover prepare/activate/abort idempotency.
Backend restart with leases recovered or invalidated according to profile.
Geo-replication applies newer generation and rejects older generation.
Cache invalidation after remote update.

17.3 Fault Injection

Backend timeout.
Partial batch failure.
Redis/Aerospike failover.
Clock skew.
Network partition between owners and backend.
Replication lag spike.
Corrupt encrypted payload.
Missing session key decryption key.

17.4 Performance Gates

Profiles must state which backend they apply to. Minimum SDK reference gates:

Local cache read p99 under 50 microseconds.
In-memory fenced CAS p99 under 100 microseconds.
Backend adapter exposes measured p50/p99 for get, CAS, lease acquire, and renew.
100,000 updates/second per replica for in-memory or batched local profile.
No packet fast-path benchmark depends on remote backend availability.

18. Acceptance Criteria

This RFC is implemented when:

Authoritative session writes require monotonic fencing and CAS.
Stale owners cannot overwrite newer session state after lease expiry.
Handover state transitions are idempotent and recoverable.
Geo-replication does not use wall-clock LWW for authoritative state.
Backend capabilities are declared and enforced by profile.
Session payloads are encrypted and tenant-bound.
Local cache supports fast reads without compromising write correctness.
Fault injection covers split-brain, failover, replication lag, and stale fences.

OPC-SDK-RFC-005: Zero-Copy Protocol Framework

Status: Draft for Implementation
Version: 2.0.0
Date: 2026-05-19
Audience: SDK implementers, protocol crate authors, fuzzing engineers, NF teams

1. Abstract

This RFC defines the protocol codec framework for OpenPacketCore. It covers zero-copy parsing, encoding, lifetime discipline, allocation budgets, parser security, fuzzing, conformance tags, and implementation layout for 3GPP and IETF protocol crates.

The initial draft correctly required nom, bytes, fuzzing, and exact spec citations. It was incomplete in two areas: the codec trait did not express borrowed lifetimes safely, and the round-trip property was too simplistic for protocols with canonical encodings, unknown fields, padding, or lossy normalization. This version corrects those issues.

2. Scope

2.1 In Scope

Binary protocol parsing and encoding.
Borrowed zero-copy PDU views.
Owned conversion for async and cross-thread use.
Length, bounds, recursion, and integer safety.
Fuzzing, property tests, and corpus management.
Spec traceability for RFC 006.
Protocol crate layout and module boundaries.

2.2 Out of Scope

Management config projection. See RFC 002.
Session persistence. See RFC 004.
Full NF procedure state machines.
Kernel bypass packet I/O frameworks, except for buffer ownership contracts.

3. Design Goals

3.1 Security

No out-of-bounds reads or writes.
No panics on untrusted input.
No unbounded recursion, loops, allocation, or CPU use from hostile packets.
Constant-time comparison for secrets, MACs, authentication tags, and keys.
Strict validation of length fields, IE cardinality, duplicate handling, and unknown critical elements.

3.2 Performance

Parse common fast-path headers without heap allocation.
Avoid copying payloads where a borrowed view is sufficient.
Encode into caller-provided buffers with exact or bounded capacity planning.
Support partial decode when only routing keys are needed.
Provide per-protocol allocation and latency budgets.

3.3 Maintainability

Each protocol crate uses the same module layout.
Every message and field cites the exact spec section/table.
Parser errors are structured and stable.
Unsafe code is forbidden by default.
Generated tables are separated from hand-written parser logic.

3.4 Functionality

Support borrowed and owned message representations.
Support streaming/incomplete input where protocols require reassembly.
Support extension headers and unknown IE preservation when required.
Support canonical encoding and raw-preserving encoding modes.

4. Parsing Model

4.1 Borrowed Views

Protocol decoders SHOULD return borrowed views over the input buffer:

pub struct GtpHeader<'a> {
    pub flags: u8,
    pub msg_type: u8,
    pub length: u16,
    pub teid: u32,
    pub payload: &'a [u8],
}

Borrowed views MUST NOT outlive the input buffer. They MUST NOT store pointers into mutable buffers that can be changed while the view exists.
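A minimal sketch of a borrowed decode for a view of this shape, assuming the 8-octet mandatory GTPv1-U header and treating the length field as the number of octets that follow it. Optional sequence-number, N-PDU, and extension-header fields are not handled, and `decode_header` and this reduced `DecodeError` are illustrative names rather than SDK API.

```rust
#[derive(Debug, PartialEq, Eq)]
pub struct GtpHeader<'a> {
    pub flags: u8,
    pub msg_type: u8,
    pub length: u16,
    pub teid: u32,
    pub payload: &'a [u8],
}

#[derive(Debug, PartialEq, Eq)]
pub enum DecodeError {
    Truncated { needed: usize, available: usize },
}

const FIXED_LEN: usize = 8;

/// Parses the fixed header and returns (remaining input, borrowed view).
/// No heap allocation, no indexing, and no panics on short input.
pub fn decode_header(input: &[u8]) -> Result<(&[u8], GtpHeader<'_>), DecodeError> {
    // The slice pattern both checks the minimum length and splits off the body.
    let (flags, msg_type, length, teid, body) = match input {
        [flags, msg_type, l0, l1, t0, t1, t2, t3, body @ ..] => (
            *flags,
            *msg_type,
            u16::from_be_bytes([*l0, *l1]),
            u32::from_be_bytes([*t0, *t1, *t2, *t3]),
            body,
        ),
        _ => {
            return Err(DecodeError::Truncated { needed: FIXED_LEN, available: input.len() })
        }
    };
    let len = usize::from(length);
    if body.len() < len {
        return Err(DecodeError::Truncated {
            needed: FIXED_LEN + len,
            available: input.len(),
        });
    }
    // Safe: `len <= body.len()` was checked above.
    let (payload, remaining) = body.split_at(len);
    Ok((remaining, GtpHeader { flags, msg_type, length, teid, payload }))
}
```

The returned view borrows from `input`, so the compiler rejects any attempt to keep it alive after the packet buffer is released or mutated.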

4.2 Owned Messages

Every borrowed PDU that may cross an async boundary, thread boundary, queue, or long-lived store MUST provide an owned conversion:

pub trait ToOwnedPdu {
    type Owned;
    fn to_owned_pdu(&self) -> Self::Owned;
}

Owned PDUs MAY use bytes::Bytes to retain cheap shared ownership of the original packet.

4.3 No Self-Referential Types

Generated or hand-written protocol structs MUST NOT be self-referential. If a message needs both raw bytes and parsed fields, use either:

borrowed view tied to external input lifetime, or
owned Bytes plus offsets validated at construction.

5. Codec Traits

The SDK defines separate traits for borrowed decode, owned decode, and encode.

pub type DecodeResult<'a, T> = Result<(&'a [u8], T), DecodeError>;

pub trait BorrowDecode<'a>: Sized {
    fn decode(input: &'a [u8], ctx: DecodeContext) -> DecodeResult<'a, Self>;
}

pub trait OwnedDecode: Sized {
    fn decode_owned(input: bytes::Bytes, ctx: DecodeContext) -> Result<Self, DecodeError>;
}

pub trait Encode {
    fn encode(&self, dst: &mut bytes::BytesMut, ctx: EncodeContext) -> Result<(), EncodeError>;
    fn wire_len(&self, ctx: EncodeContext) -> Result<usize, EncodeError>;
}

This avoids pretending that a borrowed PDU can be represented by a lifetime-free Self.

5.1 Decode Context

pub struct DecodeContext {
    pub protocol_version: ProtocolVersion,
    pub max_depth: usize,
    pub max_ies: usize,
    pub max_message_len: usize,
    pub unknown_ie_policy: UnknownIePolicy,
    pub duplicate_ie_policy: DuplicateIePolicy,
    pub validation_level: ValidationLevel,
}

Protocol crates MUST define safe defaults.

5.2 Error Model

pub struct DecodeError {
    pub code: DecodeErrorCode,
    pub offset: usize,
    pub spec_ref: Option<SpecRef>,
}

Errors MUST be safe to expose in logs. They MUST NOT include raw packet payload unless debug packet capture is explicitly enabled.

6. `nom` Usage

nom is the default parser combinator framework for binary TLV, bitfield, and header-oriented protocols.

Rules:

Use nom::number::complete or nom::number::streaming deliberately.
Map nom::Err::Incomplete to a structured incomplete-input error.
Do not discard remaining input unless the message definition allows trailing padding.
Wrap nom errors at module boundaries; do not expose combinator internals in public API.
Prefer small named parser functions over deeply nested combinator expressions.

Protocols based on ASN.1 PER, JSON, HTTP/2, or other specialized encodings MAY use proven dedicated parsers instead of nom, but they must implement the same SDK codec, error, fuzzing, and evidence contracts.

7. Buffer Management

Encoders MUST use bytes::BytesMut or bytes::BufMut.

Encoding rules:

wire_len MUST use checked arithmetic.
encode MUST fail before writing if required capacity exceeds configured maximum.
Encoders SHOULD reserve exact capacity when cheap to compute.
Encoders MUST produce canonical output unless raw-preserving mode is selected.
Partial writes on error SHOULD be avoided. If unavoidable, document the behavior and do not reuse the buffer without caller awareness.
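The "fail before writing" rule can be illustrated with a toy TLV element. This sketch writes into a `Vec<u8>` rather than `bytes::BytesMut` so that it needs only the standard library; the `Tlv` layout (1-byte type, 2-byte big-endian length) and the error names are assumptions made for the example.

```rust
use std::convert::TryFrom;

#[derive(Debug, PartialEq, Eq)]
pub enum EncodeError {
    LengthOverflow,
    ExceedsMax { needed: usize, max: usize },
}

/// Toy element: 1-byte type, 2-byte big-endian length, then the value.
pub struct Tlv<'a> {
    pub ie_type: u8,
    pub value: &'a [u8],
}

impl Tlv<'_> {
    /// Encoded size, computed with checked arithmetic.
    pub fn wire_len(&self) -> Result<usize, EncodeError> {
        3usize
            .checked_add(self.value.len())
            .ok_or(EncodeError::LengthOverflow)
    }

    /// Validates everything first, then writes; `dst` is untouched on error.
    pub fn encode(&self, dst: &mut Vec<u8>, max_len: usize) -> Result<(), EncodeError> {
        let needed = self.wire_len()?;
        let value_len =
            u16::try_from(self.value.len()).map_err(|_| EncodeError::LengthOverflow)?;
        if needed > max_len {
            return Err(EncodeError::ExceedsMax { needed, max: max_len });
        }
        dst.reserve(needed);
        dst.push(self.ie_type);
        dst.extend_from_slice(&value_len.to_be_bytes());
        dst.extend_from_slice(self.value);
        Ok(())
    }
}
```

Doing all fallible checks before the first `push` is what keeps the buffer reusable by the caller after an error.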

8. Allocation Budgets

Each protocol crate MUST define an allocation profile:

pub struct AllocationBudget {
    pub decode_heap_allocations_fast_path: usize,
    pub decode_max_temporary_bytes: usize,
    pub encode_max_temporary_bytes: usize,
}

Default fast-path target:

Fixed header decode: 0 heap allocations.
Routing-key partial decode: 0 heap allocations.
Full message decode: protocol-specific, bounded.

Variable IE lists SHOULD use:

iterators over borrowed IE views,
smallvec for small bounded lists,
caller-provided scratch buffers, or
validated owned vectors when required.

9. Security Invariants

9.1 Length and Offset Safety

All length calculations MUST use checked arithmetic. Parsers MUST verify:

field length is within remaining input,
nested IE length does not exceed parent length,
padding length is valid,
extension header chains terminate,
total parsed elements do not exceed max_ies,
recursion or nesting does not exceed max_depth.

9.2 Integer Safety

All offset, length, and capacity calculations MUST use:

checked_add
checked_sub
checked_mul
usize::try_from

Integer truncation with `as` casts is forbidden in parser and encoder length paths.
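A short sketch of what these rules look like in practice. The helper names `field_end` and `array_bytes` are hypothetical; the point is that every conversion and addition can fail visibly instead of wrapping or truncating.

```rust
use std::convert::TryFrom;

/// Returns the exclusive end offset of a field, or None if the declared
/// length does not fit in `usize`, overflows, or runs past the input.
pub fn field_end(offset: usize, declared_len: u32, input_len: usize) -> Option<usize> {
    let len = usize::try_from(declared_len).ok()?;
    let end = offset.checked_add(len)?;
    if end <= input_len {
        Some(end)
    } else {
        None
    }
}

/// Returns the byte size of `count` fixed-size elements, or None on overflow.
pub fn array_bytes(count: u16, element_size: usize) -> Option<usize> {
    usize::from(count).checked_mul(element_size)
}
```

Returning `Option` pushes the failure to the caller, where it can be mapped to a structured decode error carrying the offset.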

9.3 Constant-Time Operations

Constant-time comparison is REQUIRED for:

MACs
authentication tags
keys
nonces when secrecy or oracle behavior matters
authentication tokens

Checksums over public packet data do not require constant-time comparison, but checksum parsing must still be bounds-safe and panic-free.
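For illustration, a comparison that avoids early exit on the first differing byte can be written as below. This is a sketch of the technique only: the function name `ct_eq` is hypothetical, the length is treated as public, and an optimizing compiler is free to transform such code, so production code would normally rely on a vetted constant-time crate (for example `subtle`) rather than a hand-rolled loop.

```rust
/// Compares two byte strings without branching on their contents.
/// The lengths are treated as public information.
pub fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    // Accumulate differences instead of returning at the first mismatch.
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y;
    }
    diff == 0
}
```

The loop always visits every byte, so the running time does not reveal how long a matching prefix an attacker has guessed.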

9.4 Denial of Service Controls

Every decoder MUST enforce:

maximum message length,
maximum IE count,
maximum nesting depth,
maximum extension chain length,
maximum decompressed length if compression exists,
maximum parse time, enforced indirectly through bounded loops.

Protocol crates MUST expose these limits through profile configuration.
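The sketch below walks a toy TLV stream while enforcing a message-length cap and an IE-count cap. The wire layout (1-byte type, 2-byte big-endian length) and the names `Limits`, `WalkError`, and `count_ies` are assumptions for the example, not a real protocol or SDK API.

```rust
pub struct Limits {
    pub max_message_len: usize,
    pub max_ies: usize,
}

#[derive(Debug, PartialEq, Eq)]
pub enum WalkError {
    MessageTooLong,
    TooManyIes,
    Truncated { offset: usize },
}

/// Counts top-level IEs, rejecting input that exceeds the configured limits.
pub fn count_ies(input: &[u8], limits: &Limits) -> Result<usize, WalkError> {
    if input.len() > limits.max_message_len {
        return Err(WalkError::MessageTooLong);
    }
    let mut rest = input;
    let mut count = 0usize;
    // Each iteration consumes at least 3 bytes, so the loop is bounded.
    while !rest.is_empty() {
        if count >= limits.max_ies {
            return Err(WalkError::TooManyIes);
        }
        // `rest` is always a suffix of `input`, so this cannot underflow.
        let offset = input.len() - rest.len();
        rest = match rest {
            [_ie_type, l0, l1, tail @ ..] => {
                let len = usize::from(u16::from_be_bytes([*l0, *l1]));
                tail.get(len..).ok_or(WalkError::Truncated { offset })?
            }
            _ => return Err(WalkError::Truncated { offset }),
        };
        count += 1;
    }
    Ok(count)
}
```

The count check runs before each element is parsed, so a packet stuffed with zero-length IEs is rejected after at most `max_ies` iterations.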

10. Validation Levels

The decoder supports four validation levels:

pub enum ValidationLevel {
    HeaderOnly,
    Structural,
    Strict,
    ProcedureAware,
}

HeaderOnly: parse enough for routing.
Structural: verify lengths and container structure.
Strict: enforce field cardinality, enum ranges, and critical IE rules.
ProcedureAware: call NF-specific semantic validators.

Data-plane fast paths SHOULD use the minimum level needed for safe routing and leave expensive semantic validation to control-plane paths where appropriate.

11. Unknown and Duplicate Elements

Protocol crates MUST define:

Unknown IE behavior.
Duplicate IE behavior.
Critical/mandatory IE behavior.
Extension preservation behavior.

If a protocol requires preserving unknown elements for forwarding or round-trip, the borrowed view MUST expose raw slices and owned conversion MUST retain them.

12. Round-Trip Properties

The simplistic property encode(decode(input)) == input is not universally valid. The SDK requires three properties:

12.1 Canonical Round Trip

For generated valid model values:

decode(encode(model)) == model

12.2 Raw-Preserving Round Trip

For accepted inputs where unknown/padding preservation is enabled:

encode_raw_preserving(decode_raw_preserving(input)) == input

12.3 Reject Stability

For rejected inputs, the decoder returns a structured error and never panics, hangs, or allocates beyond budget.
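As an illustration of the canonical round-trip and reject-stability properties, here is a toy fixed-size codec with a deterministic case generator. Everything in it (`Model`, `encode`, `decode`, `canonical_round_trip_holds`, and the xorshift generator) is a stand-in for a real protocol crate and a real property-testing framework.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Model {
    pub ty: u8,
    pub value: u16,
}

/// Canonical encoding: 1-byte type followed by a big-endian u16.
pub fn encode(m: Model) -> [u8; 3] {
    let [hi, lo] = m.value.to_be_bytes();
    [m.ty, hi, lo]
}

/// Accepts exactly three bytes; anything else is a structured rejection.
pub fn decode(input: &[u8]) -> Result<Model, &'static str> {
    match input {
        [ty, hi, lo] => Ok(Model { ty: *ty, value: u16::from_be_bytes([*hi, *lo]) }),
        _ => Err("expected exactly 3 bytes"),
    }
}

/// Checks decode(encode(model)) == model over `cases` generated models.
/// A fixed seed keeps any failure reproducible.
pub fn canonical_round_trip_holds(cases: u32, seed: u32) -> bool {
    let mut state = seed | 1; // xorshift state must be non-zero
    (0..cases).all(|_| {
        state ^= state << 13;
        state ^= state >> 17;
        state ^= state << 5;
        let [a, b, c, _] = state.to_be_bytes();
        let model = Model { ty: a, value: u16::from_be_bytes([b, c]) };
        decode(&encode(model)) == Ok(model)
    })
}
```

This toy format has no padding or unknown fields, so canonical and raw-preserving round trips coincide; the two properties only diverge for protocols where the decoder normalizes something.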

13. Fuzzing

Every protocol crate MUST include fuzz targets for:

full decode,
header-only decode,
encode after generated model mutation,
round-trip properties,
length and extension chains,
security fields where applicable.

Fuzz gates SHOULD be time and coverage based, not only iteration-count based. Minimum admission gate:

30 minutes sanitizer-enabled fuzzing per new parser target in CI or nightly.
1,000,000 generated cases for property tests where practical.
All crashes minimized and committed as regression tests.

Required sanitizers where supported:

AddressSanitizer for native dependencies.
UndefinedBehaviorSanitizer for C/C++ parser dependencies.
Miri for unsafe Rust, if any unsafe exception is approved.

14. Spec Traceability

Every public PDU, IE, field enum, and procedure-relevant constant MUST cite:

standards body,
document number,
release or revision where applicable,
section,
table or figure where applicable,
conformance status.

Example:

/// @3gpp TS 29.281 Release 18, Section 5.1, Table 5.1-1
/// @conformance full
pub struct Gtpv1uHeader<'a> { ... }

These tags feed RFC 006 evidence extraction.

15. Protocol Crate Layout

Each protocol crate MUST use:

crates/opc-proto-<name>/
  src/
    lib.rs
    error.rs
    context.rs
    header.rs
    ie.rs
    message.rs
    parser.rs
    encode.rs
    validate.rs
    spec.rs
    generated/
      tables.rs
  tests/
    corpus.rs
    roundtrip.rs
    conformance.rs
  fuzz/
    fuzz_targets/
      decode.rs
      header.rs
      roundtrip.rs

For protocols without IEs, ie.rs may be omitted. Generated tables MUST live under generated/ and be reproducible.

16. Implementation Contracts

Contributors implementing protocol crates MUST follow these rules:

Start from spec.rs constants and conformance tags.
Implement error.rs and context.rs before parser logic.
Implement header parsing before full message parsing.
Add fuzz target with the first parser.
Do not add unsafe.
Do not use unwrap, expect, or indexing on untrusted input.
Keep parser functions small and named after spec structures.
Add one regression test per newly handled malformed input class.

Agents may work independently on:

header parser,
IE parser,
encoder,
validation,
fuzz/test corpus,
generated spec tables.

17. Testing Requirements

17.1 Unit Tests

Minimum and maximum length messages.
Truncated input at every byte position for fixed headers.
Invalid enum values.
Duplicate IE policies.
Unknown IE policies.
Extension header chain termination.
Checked arithmetic overflow cases.

17.2 Integration Tests

Decode real capture fixtures.
Encode/decode canonical known-good messages.
Partial decode for routing keys.
Owned conversion across async boundary.
Protocol-specific strict validation.

17.3 Performance Tests

Each protocol crate MUST benchmark:

header-only decode,
full structural decode,
strict validation,
encode,
owned conversion.

Benchmarks MUST report:

p50/p99 latency,
heap allocations,
bytes copied,
throughput in messages/second.

17.4 Negative Corpus

Every parser MUST maintain a negative corpus:

truncated,
overlong,
nested too deep,
duplicate mandatory fields,
unknown critical fields,
invalid length,
invalid padding,
integer overflow candidate.

18. Acceptance Criteria

This RFC is implemented when:

Borrowed decoders express lifetimes safely and owned conversion is available.
Fast-path header decode is allocation-free for supported protocols.
All length and offset math is checked.
Decoders reject hostile input without panic, hang, or unbounded allocation.
Round-trip tests distinguish canonical and raw-preserving modes.
Fuzz targets and regression corpora exist for every protocol crate.
Spec traceability tags feed RFC 006 evidence.
Protocol modules follow the standard layout for parallel implementation.

OPC-SDK-RFC-006: Conformance and Evidence Pipeline

Status: Draft for Implementation
Version: 2.0.0
Date: 2026-05-19
Audience: release engineers, security engineers, standards reviewers, SDK implementers, NF teams

1. Abstract

This RFC defines the OpenPacketCore evidence pipeline: standards conformance mapping, test evidence, SBOM generation, VEX, provenance, artifact signing, performance baselines, known-gap management, and release gates.

The purpose is not to create marketing compliance claims. The purpose is to produce machine-readable, signed evidence that states exactly what is implemented, tested, partially implemented, not implemented, or intentionally out of scope.

The initial draft correctly required conformance tags, SBOMs, signed bundles, and performance baselines. This version expands those into a full evidence system suitable for high-integrity carrier CNFs and parallel implementation.

2. Scope

2.1 In Scope

Standards requirement inventory.
Code-to-spec and test-to-spec mapping.
Conformance status extraction.
Known-gap registry.
SBOM and VEX generation.
Build provenance and artifact signing.
Performance baseline capture.
Evidence bundle format.
Release and PR gates.

2.2 Out of Scope

Legal certification by standards bodies.
Operator-specific acceptance testing.
Live-network certification.
Runtime audit storage. See RFC 003.

3. Design Goals

3.1 Security

Evidence must be tamper-evident and tied to artifact digests.
Supply-chain metadata must include source, dependencies, build environment, container base images, and vulnerability status.
Claims must be traceable to tests, source, and reviewed gaps.
Signing keys or identities must be auditable.

3.2 Performance

Evidence generation must be incremental for PR workflows.
Full release evidence may be more expensive but must be reproducible.
Performance baselines must record environment details so regressions are meaningful.

3.3 Maintainability

Conformance tags must use a strict schema.
Known gaps must be first-class records, not prose-only notes.
Evidence tools must fail closed when claims are ambiguous.
Output formats must be stable for downstream automation.

3.4 Functionality

Produce human-readable and machine-readable reports.
Support partial, full, not-implemented, not-applicable, and gap statuses.
Attach tests and benchmark results to claims.
Sign artifacts and attestations.
Support release promotion gates.

4. Evidence Model

4.1 Claim Types

The evidence pipeline recognizes:

Claim	Meaning
`implemented`	Code exists for the requirement
`tested`	Automated tests exercise the requirement
`partial`	Some required behavior is missing
`not-implemented`	No implementation exists
`not-applicable`	Requirement does not apply to this SDK/NF/profile
`gap`	Known missing behavior with owner and mitigation
`waived`	Temporary exception approved by policy

No release may claim full conformance for a requirement unless it has both implemented and tested evidence, plus no open blocking gap.

4.2 Requirement IDs

Every tracked requirement receives a stable ID:

REQ-<source>-<document>-<release>-<section>-<ordinal>

Example:

REQ-3GPP-TS29281-R18-5.1-001

Requirement IDs are stored in a versioned inventory file. Comments in code may reference IDs, but comments do not define the inventory.
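A minimal sketch of validating IDs against this template, assuming that no component contains a hyphen and that the ordinal is purely numeric. `RequirementId` and `parse_requirement_id` are illustrative names, not part of the evidence tooling.

```rust
#[derive(Debug, PartialEq, Eq)]
pub struct RequirementId<'a> {
    pub source: &'a str,
    pub document: &'a str,
    pub release: &'a str,
    pub section: &'a str,
    pub ordinal: u32,
}

/// Parses `REQ-<source>-<document>-<release>-<section>-<ordinal>`.
/// Returns None for any deviation so callers can fail closed.
pub fn parse_requirement_id(id: &str) -> Option<RequirementId<'_>> {
    let mut parts = id.split('-');
    if parts.next()? != "REQ" {
        return None;
    }
    let source = parts.next()?;
    let document = parts.next()?;
    let release = parts.next()?;
    let section = parts.next()?;
    let ordinal_text = parts.next()?;
    // Reject trailing components and empty fields.
    if parts.next().is_some() {
        return None;
    }
    if [source, document, release, section].iter().any(|f| f.is_empty()) {
        return None;
    }
    // Digits only: `str::parse` alone would also accept a leading '+'.
    if ordinal_text.is_empty() || !ordinal_text.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let ordinal = ordinal_text.parse().ok()?;
    Some(RequirementId { source, document, release, section, ordinal })
}
```

Applied to the example above, this yields source `3GPP`, document `TS29281`, release `R18`, section `5.1`, and ordinal 1.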

4.3 Evidence Records

{
  "requirement_id": "REQ-3GPP-TS29281-R18-5.1-001",
  "status": "partial",
  "source_refs": ["crates/opc-proto-gtp/src/header.rs:Gtpv1uHeader"],
  "test_refs": ["crates/opc-proto-gtp/tests/roundtrip.rs:test_gtpu_header"],
  "gap_refs": ["GAP-000123"],
  "artifact_digests": ["sha256:..."],
  "reviewed_by": ["standards-reviewer"],
  "last_updated": "2026-05-19T00:00:00Z"
}

The pipeline MUST validate evidence records against a JSON schema.

5. Conformance Tracking

5.1 Inventory

The repository MUST maintain:

evidence/
  requirements/
    3gpp-ts-29.281-r18.yaml
    ietf-rfc-7951.yaml
  mappings/
    code-map.yaml
    test-map.yaml
  gaps/
    known-gaps.yaml

Requirement inventories SHOULD be generated from structured sources when available. When manual extraction is required, each requirement must include source document, release/revision, section, and reviewer.

5.2 Code Tags

Code tags use strict syntax:

/// @spec 3GPP TS 29.281 R18 5.1 Table 5.1-1
/// @req REQ-3GPP-TS29281-R18-5.1-001
/// @conformance partial
/// @gap GAP-000123
pub struct Gtpv1uHeader<'a> { ... }

Allowed tag keys:

@spec
@req
@conformance
@gap
@security
@performance
@test

Unknown tags MUST fail evidence extraction in release mode.
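
The strict tag rule can be sketched as a small line scanner. This is an illustrative sketch only: `Tag`, `ALLOWED_KEYS`, and `extract_tags` are hypothetical names, and the behaviour outside release mode (silently skipping unknown keys) is an assumption, since the text only defines release-mode behaviour.

```rust
/// Tag keys allowed by the strict schema in this section.
const ALLOWED_KEYS: [&str; 7] = [
    "spec", "req", "conformance", "gap", "security", "performance", "test",
];

#[derive(Debug, PartialEq, Eq)]
pub struct Tag {
    pub key: String,
    pub value: String,
}

/// Extracts `@key value` tags from `///` doc-comment lines.
/// In release mode an unknown key is an error; otherwise (assumption) it is skipped.
pub fn extract_tags(source: &str, release_mode: bool) -> Result<Vec<Tag>, String> {
    let mut tags = Vec::new();
    for (idx, line) in source.lines().enumerate() {
        // Only doc comments that start with `@` are considered tags.
        let Some(doc) = line.trim_start().strip_prefix("///") else { continue };
        let Some(body) = doc.trim_start().strip_prefix('@') else { continue };
        let (key, value) = match body.split_once(char::is_whitespace) {
            Some((k, v)) => (k, v.trim()),
            None => (body.trim(), ""),
        };
        if !ALLOWED_KEYS.contains(&key) {
            if release_mode {
                return Err(format!("line {}: unknown tag @{}", idx + 1, key));
            }
            continue;
        }
        if value.is_empty() {
            return Err(format!("line {}: tag @{} has no value", idx + 1, key));
        }
        tags.push(Tag { key: key.to_string(), value: value.to_string() });
    }
    Ok(tags)
}
```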

5.3 Test Tags

Tests SHOULD reference requirement IDs:

#[test]
#[req("REQ-3GPP-TS29281-R18-5.1-001")]
fn gtpu_header_roundtrip() { ... }

The extraction tool MUST support Rust test attributes or a sidecar test mapping file. A requirement with code but no test is reported as implemented-untested, not full (see 5.4).

5.4 Status Rules

Status calculation:

Inputs	Result
code + passing tests + no blocking gaps	`full`
code + some tests + open nonblocking gaps	`partial`
code + no tests	`implemented-untested`
gap with no code	`not-implemented`
reviewed N/A record	`not-applicable`
approved waiver	`waived`

The machine-readable report MUST include both raw evidence and calculated status.
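
The table above can be sketched as a pure function over raw evidence. This is a sketch under stated assumptions, not the calculator itself: the `Evidence` fields are hypothetical, waivers and reviewed N/A records are assumed to take precedence, any open nonblocking gap is assumed to yield `partial`, and input combinations the table does not cover return an error so the tool fails closed.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum Status {
    Full,
    Partial,
    ImplementedUntested,
    NotImplemented,
    NotApplicable,
    Waived,
}

/// Hypothetical raw-evidence summary for one requirement.
#[derive(Debug, Default, Clone, Copy)]
pub struct Evidence {
    pub has_code: bool,
    pub passing_tests: u32,
    pub failing_tests: u32,
    pub open_blocking_gaps: u32,
    pub open_nonblocking_gaps: u32,
    pub reviewed_not_applicable: bool,
    pub approved_waiver: bool,
}

/// Returns `Err` for combinations the status table does not cover.
pub fn calculate_status(e: &Evidence) -> Result<Status, &'static str> {
    // Assumption: waiver and N/A records override code/test evidence.
    if e.approved_waiver {
        return Ok(Status::Waived);
    }
    if e.reviewed_not_applicable {
        return Ok(Status::NotApplicable);
    }
    let any_gap = e.open_blocking_gaps > 0 || e.open_nonblocking_gaps > 0;
    if !e.has_code {
        return if any_gap {
            Ok(Status::NotImplemented)
        } else {
            Err("no code and no gap record")
        };
    }
    if e.open_blocking_gaps > 0 {
        return Err("code with an open blocking gap is not covered by the table");
    }
    if e.failing_tests > 0 {
        return Err("failing tests are not covered by the table");
    }
    if e.passing_tests == 0 {
        return Ok(Status::ImplementedUntested);
    }
    if e.open_nonblocking_gaps > 0 {
        Ok(Status::Partial)
    } else {
        Ok(Status::Full)
    }
}
```

Keeping the function free of I/O makes the status matrix easy to cover exhaustively in unit tests.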

6. Known Gaps

6.1 Gap Record

Known gaps MUST be structured:

id: GAP-000123
title: GTP-U extension headers not fully decoded
status: open
severity: medium
applies_to:
  - REQ-3GPP-TS29281-R18-5.2-004
owner: opc-proto-gtp
created: 2026-05-19
target_release: 0.3.0
mitigation: Reject unsupported extension headers in strict mode.
security_impact: Low if strict mode is enabled.
performance_impact: None.

6.2 Gap Gates

Release mode MUST fail when:

A partial or not-implemented status has no gap.
A gap has no owner.
A gap has no mitigation or explicit "no mitigation" rationale.
A gap target release is overdue.
A security-critical gap lacks security approval.

The root known-gaps.md MAY be generated from known-gaps.yaml, but the YAML is the source of truth.
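
The per-gap rules above could be checked along the following lines. This is a sketch with hypothetical names (`GapRecord`, `gap_gate_violations`). It assumes releases are compared as (major, minor, patch) tuples and that "overdue" means the target release is strictly earlier than the release being cut. The rule that a partial or not-implemented status must have a gap needs status input and is not shown.

```rust
/// Hypothetical in-memory view of one entry in known-gaps.yaml.
#[derive(Debug, Clone)]
pub struct GapRecord {
    pub id: String,
    pub owner: Option<String>,
    /// Mitigation text, or an explicit "no mitigation" rationale.
    pub mitigation: Option<String>,
    /// (major, minor, patch) parsed from `target_release`.
    pub target_release: (u32, u32, u32),
    pub security_critical: bool,
    pub security_approved: bool,
}

fn is_blank(field: &Option<String>) -> bool {
    field.as_deref().map_or(true, |s| s.trim().is_empty())
}

/// Returns one message per violated gate; an empty list means the gaps pass.
pub fn gap_gate_violations(gaps: &[GapRecord], releasing: (u32, u32, u32)) -> Vec<String> {
    let mut out = Vec::new();
    for gap in gaps {
        if is_blank(&gap.owner) {
            out.push(format!("{}: no owner", gap.id));
        }
        if is_blank(&gap.mitigation) {
            out.push(format!("{}: no mitigation or rationale", gap.id));
        }
        // Assumption: overdue means the target is strictly before this release.
        if gap.target_release < releasing {
            out.push(format!("{}: target release is overdue", gap.id));
        }
        if gap.security_critical && !gap.security_approved {
            out.push(format!("{}: security-critical gap lacks approval", gap.id));
        }
    }
    out
}
```

Returning every violation, instead of stopping at the first, lets a release report list all gaps that need attention in one pass.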

7. SBOM and VEX

7.1 SBOM Requirements

Every release MUST include CycloneDX JSON SBOMs for:

Rust workspace dependencies.
Container images.
Helm charts and embedded images.
Generated artifacts where dependencies differ.
Native libraries linked into binaries.

SBOMs MUST include:

direct and transitive dependencies,
package URLs where available,
license data,
hashes,
supplier/source repository where available,
build target,
feature flags,
container base image digests.

7.2 VEX Requirements

VEX records MUST state vulnerability applicability:

affected,
not affected,
fixed,
under investigation.

Each VEX decision MUST include:

CVE or advisory ID,
package and version,
scanner database timestamp,
justification,
reviewer or automated policy source,
expiry for temporary decisions.

Release mode MUST fail on unresolved critical vulnerabilities unless an approved VEX record exists.

8. Provenance and Signing

8.1 Artifact Digests

Every artifact must be addressed by digest:

binaries,
container images,
Helm charts,
SBOMs,
evidence bundles,
performance reports,
conformance reports.

Tags are not sufficient.

8.2 Provenance

Release builds MUST produce SLSA-style provenance, preferably in in-toto/DSSE format, including:

source repository URL,
commit SHA,
dirty tree status,
builder identity,
build workflow reference,
build inputs,
dependency lockfiles,
environment image digest,
output artifact digests.

8.3 Signing

Release artifacts and attestations MUST be signed with Sigstore/Cosign or an approved offline carrier signing profile.

Keyless profile:

OIDC issuer and subject must be policy-allowed.
Transparency log entry must be verifiable.
Certificate identity must match release workflow.

Offline profile:

Public key must be published through an approved channel.
Signing key custody and rotation must be documented.
Transparency log use SHOULD be retained where possible.

8.4 Bundle Signing

Signing only evidence-bundle.tar.gz is not enough. The bundle MUST include a manifest of file digests, and the manifest or DSSE envelope MUST be signed. Individual high-value artifacts SHOULD also carry their own attestations.

9. Performance Evidence

9.1 Benchmark Classes

Performance evidence MUST cover:

RFC 001 config commit phases.
RFC 002 generated validation and patch application.
RFC 004 session store operations.
RFC 005 protocol decode/encode.
Security operations from RFC 003 where relevant.

9.2 Environment Capture

performance-baseline.json MUST include:

CPU model and count,
memory size and speed where available,
kernel version,
container runtime,
Kubernetes version when applicable,
storage class for persistence tests,
network plugin for distributed tests,
compiler version,
cargo profile,
feature flags,
git commit,
date/time,
benchmark tool version.

9.3 Regression Policy

Each benchmark defines:

metric,
baseline,
allowed regression threshold,
required sample count,
noise handling,
owner.

Data-plane PRs MUST fail when they exceed regression thresholds unless a performance waiver is approved.
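
A regression gate built from these fields might look like the sketch below. The type and function names are hypothetical. It assumes noise handling by taking the median of the samples, a relative threshold expressed as a fraction of the baseline, and a positive baseline.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Direction {
    LowerIsBetter,  // e.g. latency
    HigherIsBetter, // e.g. throughput
}

#[derive(Debug, Clone, Copy)]
pub struct BenchmarkPolicy {
    pub baseline: f64,
    /// Allowed regression as a fraction of the baseline (0.05 = 5%).
    pub allowed_regression: f64,
    pub required_samples: usize,
    pub direction: Direction,
}

#[derive(Debug, PartialEq)]
pub enum Verdict {
    Pass,
    Regressed { relative_change: f64 },
    InsufficientSamples,
}

/// Median of a non-empty slice; used here as a simple noise filter.
fn median(samples: &[f64]) -> f64 {
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    if sorted.len() % 2 == 0 {
        (sorted[mid - 1] + sorted[mid]) / 2.0
    } else {
        sorted[mid]
    }
}

pub fn evaluate(policy: &BenchmarkPolicy, samples: &[f64]) -> Verdict {
    if samples.is_empty() || samples.len() < policy.required_samples {
        return Verdict::InsufficientSamples;
    }
    let observed = median(samples);
    // Positive `relative_change` always means "worse than baseline".
    let relative_change = match policy.direction {
        Direction::LowerIsBetter => (observed - policy.baseline) / policy.baseline,
        Direction::HigherIsBetter => (policy.baseline - observed) / policy.baseline,
    };
    if relative_change > policy.allowed_regression {
        Verdict::Regressed { relative_change }
    } else {
        Verdict::Pass
    }
}
```

Treating too few samples as its own verdict lets the gate fail closed without reporting a misleading regression figure.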

10. Evidence Bundle

10.1 Files

The release evidence bundle MUST contain:

evidence-bundle/
  manifest.json
  conformance-report.json
  conformance-report.md
  known-gaps.json
  sbom/
    workspace.cdx.json
    containers.cdx.json
  vex/
    vex.json
  provenance/
    build.intoto.jsonl
  signatures/
    cosign.bundle
  performance/
    performance-baseline.json
    raw/
  tests/
    test-summary.json
    junit/
  security/
    vulnerability-report.json
    policy-results.json

10.2 Manifest

manifest.json MUST include:

evidence schema version,
SDK version,
git commit,
artifact digests,
file digests,
signing identity,
generation tool version,
generation timestamp,
known incomplete sections.

10.3 Packet-core evidence packs

A release evidence bundle MAY include one or more packet-core evidence packs for protocol fixtures, attach procedure results, and kernel dataplane/XFRM proof. These packs are intended to make smoke artifacts and test evidence from different network functions comparable, not to create product-specific certification claims.

Each pack is a JSON object conforming to packet-core-evidence-pack.schema.json and contains:

protocol_evidence: protocol fixture evidence records.
attach_evidence: attach and session-establishment procedure results.
kernel_dataplane_evidence: kernel dataplane, XFRM, routing, and firewall state summaries.

Packet-core evidence schemas are versioned independently within RFC 006 and are currently experimental. A pack MUST declare experimental: true until the schema graduates. Every pack MUST pass redaction validation before it is included in a bundle; validation fails closed if any string field contains a raw IMSI, MSISDN, IMEI, NAI, Session-Id, LI identifier, or key material.

Downstream products (for example, ePDG smoke artifacts) MAY map their own evidence into this SDK format. Doing so documents how the product evidence corresponds to SDK schema fields; it does not imply the SDK has certified the product.

11. PR and Release Gates

11.1 PR Gates

Required for every PR:

Build.
Unit tests.
Formatting and lint checks.
Incremental evidence extraction.
New public protocol/config items include spec or explicit non-spec tags.
New gaps are structured and owned.
Security-sensitive changes run targeted tests.

11.2 Release Gates

Required for every release:

Full test suite.
Fuzzing gate for changed protocol crates.
SBOM generation.
VEX evaluation.
Vulnerability scan.
Provenance generation.
Artifact signing.
Conformance report.
Known-gap validation.
Performance baseline.
Evidence bundle signing.

Release MUST fail closed if evidence generation fails.

12. Implementation Evidence Requirements

Generated code is allowed only when evidence remains strict.

Rules:

Every new protocol struct must include spec tags.
Every new generated config item must include YANG path metadata.
Every new security behavior must include a threat/test note.
Every generated test must map to a requirement or state it is purely internal.
Contributors must not mark conformance full; only the evidence calculator may calculate final status.
Ambiguous or unsupported spec behavior must create a gap record.

The evidence pipeline is the guardrail that prevents plausible-looking code from silently becoming unsupported compliance claims.

13. Tooling Architecture

crates/opc-evidence/
  src/
    inventory.rs
    extract.rs
    conformance.rs
    sbom.rs
    vex.rs
    provenance.rs
    performance.rs
    bundle.rs
    policy.rs
    report.rs

Tool responsibilities:

inventory: load and validate requirement inventories.
extract: scan source and test tags.
conformance: calculate status.
sbom: invoke or parse SBOM generators.
vex: correlate vulnerabilities and VEX decisions.
provenance: collect build attestation metadata.
performance: normalize benchmark output.
bundle: create manifest and bundle.
policy: enforce PR/release gates.
report: emit Markdown and JSON.

14. Schemas

The repository MUST version JSON schemas for:

requirement inventory,
evidence record,
conformance report,
gap record,
performance baseline,
bundle manifest,
VEX policy result,
packet-core protocol evidence,
packet-core attach evidence,
packet-core kernel dataplane evidence,
packet-core evidence pack.

Schema changes MUST be backward compatible within a major SDK release or include a migration tool.

15. Testing Requirements

15.1 Unit Tests

Tag parser accepts valid tags and rejects invalid tags.
Requirement inventory schema validation.
Gap gate logic.
Status calculation matrix.
Manifest digest calculation.
VEX decision expiry.

15.2 Integration Tests

End-to-end evidence generation on fixture crate.
Release gate fails on undocumented partial conformance.
Release gate fails on unsigned artifact.
Release gate fails on unresolved critical CVE.
Performance regression gate fails on threshold breach.
Known-gaps Markdown generation from YAML.

15.3 Tamper Tests

Modify artifact after manifest generation.
Remove test evidence for full claim.
Change SBOM after signing.
Use disallowed signing identity.
Replay old VEX with expired decision.

16. Acceptance Criteria

This RFC is implemented when:

Conformance claims are calculated from requirement inventory, code tags, tests, and gaps.
A requirement cannot silently remain partial without a structured known gap.
SBOM and VEX are generated and release-gated.
Provenance ties artifacts to source commit, builder, inputs, and digests.
Evidence bundles include signed manifests and verifiable artifact digests.
Performance baselines include environment details and regression thresholds.
PR and release gates fail closed on missing or inconsistent evidence.
Generated code must supply traceable tags and tests before it can support conformance claims.

OPC-SDK-RFC-007: SBI Service Framework

Status: Draft for Implementation
Version: 1.0.0
Date: 2026-05-19
Audience: SBI NF implementers, security engineers, operator authors, test authors

1. Abstract

This RFC defines the OpenPacketCore Service Based Interface (SBI) framework for 5G control-plane CNFs. It standardizes HTTP/2 transport behavior, 3GPP ProblemDetails, OAuth2/JWT-SVID authentication, NRF discovery, service registration, retry/backoff, overload control, circuit breaking, idempotency, callback delivery, OpenAPI/model generation, observability, and conformance tests.

Without this RFC, every SBI-producing NF would independently implement common TS 29.500/29.501 behavior. That would create incompatible error semantics, token validation, discovery caching, and overload behavior across AMF, SMF, PCF, NRF, UDM, AUSF, NSSF, NEF, NWDAF, BSF, CHF, SCP, and SEPP.

2. Scope

2.1 In Scope

SBI HTTP/2 server and client substrate.
TS 29.500 common headers and ProblemDetails behavior.
TS 29.510 NRF registration, heartbeat, discovery, and access token client helpers.
OAuth2 bearer token validation and client-credentials acquisition.
SPIFFE JWT-SVID client authentication to NRF where configured.
Retry, timeout, backoff, idempotency, and callback delivery.
Per-peer, per-slice, and per-service overload controls.
Circuit breakers and outlier detection.
OpenAPI-driven model generation and compatibility.
Metrics, tracing, audit, and evidence hooks.

2.2 Out of Scope

NF-specific SBI resource semantics. Those live in per-NF crates.
Management-plane gNMI/NETCONF. See RFC 001 and RFC 003.
Protocol codecs below HTTP/2. See RFC 005.
Session persistence. See RFC 004.

3. Design Goals

3.1 Security

Authenticate every SBI peer with mTLS and, where applicable, OAuth2 access tokens.
Bind peer identity, NF type, NF instance ID, PLMN, tenant, slice, and token scopes into authorization decisions.
Prevent topology scraping, token replay, confused-deputy calls, callback spoofing, and cross-slice data exposure.
Avoid logging raw SUPI/GPSI, bearer tokens, assertion JWTs, or subscriber payloads.

3.2 Performance

Use HTTP/2 connection pooling and bounded concurrency per peer.
Avoid per-request DNS/NRF discovery.
Make token verification hot-path cacheable.
Provide low-latency fast paths for common ProblemDetails and header parsing.
Enforce backpressure before request queues grow unbounded.

3.3 Maintainability

Keep TS 29.500 common behavior in opc-sbi, not in every NF.
Generate typed models from version-pinned OpenAPI definitions where possible.
Keep retry and overload policy declarative through YANG.
Provide one shared testkit for SBI peers and NRF behavior.

3.4 Functionality

Support SBI producer and consumer roles.
Support NRF registration, heartbeat, discovery, subscriptions, and token acquisition.
Support service-version negotiation.
Support callbacks with retry and dead-letter behavior.
Support direct NF-to-NF routing and SCP-mediated routing.

4. Standards Baseline

The initial target is 3GPP Release 17 with explicit support for selected Release 18 behavior when per-NF specs require it.

Required references:

TS 29.500: Common API framework, HTTP behavior, headers, ProblemDetails.
TS 29.501: Principles and guidelines for services definition.
TS 29.510: NRF NFManagement, NFDiscovery, AccessToken.
TS 33.501: SBI security and OAuth2 usage.
RFC 6749: OAuth2.
RFC 6750: Bearer token usage.
RFC 7515/7517/7519: JWS, JWK, JWT.
RFC 7662: Token introspection, if enabled by profile.
RFC 9110/RFC 9113: HTTP semantics and HTTP/2.

The exact release and supported service APIs are captured in RFC 006 evidence.

5. Crate Model

The shared crate is opc-sbi.

crates/opc-sbi/
  src/
    lib.rs
    error.rs
    problem.rs
    headers.rs
    identity.rs
    oauth.rs
    nrf/
      mod.rs
      registration.rs
      discovery.rs
      heartbeat.rs
      access_token.rs
      cache.rs
    client/
      mod.rs
      pool.rs
      retry.rs
      circuit_breaker.rs
      overload.rs
    server/
      mod.rs
      auth.rs
      extractors.rs
      middleware.rs
    callback/
      mod.rs
      dispatcher.rs
      dead_letter.rs
    models/
      generated/
    observability.rs
    testkit/

NF crates MUST use opc-sbi for common SBI behavior. They MUST NOT duplicate ProblemDetails encoding, bearer-token parsing, NRF discovery caching, or retry policy.

6. Transport Contract

6.1 HTTP/2

SBI uses HTTP/2 by default. The framework MUST:

Use TLS 1.3 by default.
Verify peer certificate identity through RFC 003.
Support direct NF endpoints and SCP endpoints.
Enforce max header list size, max frame size, max body size, stream concurrency, and idle timeouts.
Reject HTTP/1.1 in production profiles unless a per-NF compatibility profile explicitly permits it.

6.2 Connection Pooling

The client pool key MUST include:

target NF instance or service set,
transport mode: direct or SCP,
trust domain,
tenant,
service name,
API version,
TLS profile,
OAuth2 audience/scope set.

Pools MUST enforce:

maximum connections per peer,
maximum concurrent streams per connection,
idle connection eviction,
connection max age,
backpressure when all streams are saturated.
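
One way to picture the pool key is as a hashable struct used to index per-pool state. The sketch below is illustrative: `PoolKey`, `Pools`, and the plain `String` fields are assumptions, and the stream accounting is deliberately simplified to a single counter per key rather than per-connection tracking.

```rust
use std::collections::{BTreeSet, HashMap};

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum TransportMode {
    Direct,
    Scp,
}

/// Hypothetical pool key covering the fields listed above.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct PoolKey {
    pub target: String, // NF instance ID or service-set ID
    pub mode: TransportMode,
    pub trust_domain: String,
    pub tenant: String,
    pub service_name: String,
    pub api_version: String,
    pub tls_profile: String,
    pub oauth_audience: String,
    pub oauth_scopes: BTreeSet<String>, // ordered set, so scope order cannot split pools
}

pub struct PoolLimits {
    pub max_connections: usize,
    pub max_streams_per_connection: usize,
}

#[derive(Debug, PartialEq, Eq)]
pub struct Saturated;

pub struct Pools {
    limits: PoolLimits,
    in_flight: HashMap<PoolKey, usize>,
}

impl Pools {
    pub fn new(limits: PoolLimits) -> Self {
        Self { limits, in_flight: HashMap::new() }
    }

    /// Reserves one stream, or signals backpressure when the pool is saturated.
    pub fn try_acquire_stream(&mut self, key: &PoolKey) -> Result<(), Saturated> {
        let capacity = self.limits.max_connections * self.limits.max_streams_per_connection;
        let used = self.in_flight.entry(key.clone()).or_insert(0);
        if *used >= capacity {
            return Err(Saturated);
        }
        *used += 1;
        Ok(())
    }

    pub fn release_stream(&mut self, key: &PoolKey) {
        if let Some(used) = self.in_flight.get_mut(key) {
            *used = used.saturating_sub(1);
        }
    }
}
```

Because tenant and scope set are part of the key, two tenants talking to the same producer never share a connection or its saturation state.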

6.3 Deadlines

Every outbound SBI request MUST carry a deadline from the caller. The framework MUST enforce request timeout locally and SHOULD propagate timeout hints through headers where 3GPP permits.

7. ProblemDetails

7.1 Error Type

opc-sbi owns the canonical ProblemDetails type:

pub struct ProblemDetails {
    pub status: http::StatusCode,
    pub cause: Option<CauseCode>,
    pub title: Option<String>,
    pub detail: Option<String>,
    pub instance: Option<String>,
    pub invalid_params: Vec<InvalidParam>,
    pub supported_features: Option<String>,
}

NF code returns domain errors; the framework maps them to ProblemDetails.

7.2 Mapping Rules

ProblemDetails mapping MUST be:

deterministic,
spec-cited,
test-covered,
safe for logs and clients,
evidence-linked through RFC 006.

No domain handler may return ad hoc JSON error bodies on SBI routes.
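
A deterministic mapping from domain errors can be sketched as a single exhaustive `match`. Everything below is illustrative: `DomainError` and the simplified `Problem` type are hypothetical stand-ins for NF error types and the canonical `ProblemDetails`, and the cause strings and status pairings are examples that should be confirmed against TS 29.500 before use.

```rust
/// Hypothetical NF-level domain error.
#[derive(Debug, PartialEq, Eq)]
pub enum DomainError {
    MissingMandatoryField { name: &'static str },
    SubscriptionNotFound,
    Overloaded,
}

/// Simplified stand-in for the canonical ProblemDetails type.
#[derive(Debug, PartialEq, Eq)]
pub struct Problem {
    pub status: u16,
    pub cause: &'static str,
    pub detail: Option<String>,
}

/// Same input always yields the same output; no payload values are echoed,
/// only the field name, so the result is safe for logs and clients.
pub fn to_problem(err: &DomainError) -> Problem {
    match err {
        DomainError::MissingMandatoryField { name } => Problem {
            status: 400,
            cause: "MANDATORY_IE_MISSING", // illustrative; verify against TS 29.500
            detail: Some(format!("missing mandatory attribute: {name}")),
        },
        DomainError::SubscriptionNotFound => Problem {
            status: 404,
            cause: "SUBSCRIPTION_NOT_FOUND", // illustrative
            detail: None,
        },
        DomainError::Overloaded => Problem {
            status: 503,
            cause: "NF_CONGESTION", // illustrative
            detail: None,
        },
    }
}
```

An exhaustive `match` means adding a new domain error variant fails compilation until its mapping is written, which keeps the mapping table complete.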

8. Common Headers

The framework MUST parse and render configured TS 29.500 headers, including:

3gpp-Sbi-Message-Priority
3gpp-Sbi-Correlation-Info
3gpp-Sbi-Binding
3gpp-Sbi-Routing-Binding
3gpp-Sbi-Target-apiRoot
Retry-After
Location
Authorization

Header parsing MUST reject malformed values with structured errors. Sensitive headers MUST be redacted.
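
As a small illustration of strict parsing and redaction, the sketch below handles one header and one redaction rule. The function names are hypothetical, the 0 to 31 range for 3gpp-Sbi-Message-Priority is stated here as an assumption to be checked against TS 29.500, and treating only Authorization as sensitive is a simplification.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum HeaderError {
    Malformed,
    OutOfRange,
}

/// Parses a 3gpp-Sbi-Message-Priority value.
/// Assumption: valid values are the integers 0..=31.
pub fn parse_message_priority(raw: &str) -> Result<u8, HeaderError> {
    let value = raw.trim();
    // Reject signs, spaces, and any non-digit so "+5" or "5 " inside text never passes.
    if value.is_empty() || !value.bytes().all(|b| b.is_ascii_digit()) {
        return Err(HeaderError::Malformed);
    }
    match value.parse::<u8>() {
        Ok(priority) if priority <= 31 => Ok(priority),
        _ => Err(HeaderError::OutOfRange),
    }
}

/// Renders a header for logging, hiding the value of sensitive headers.
pub fn render_for_log(name: &str, value: &str) -> String {
    if name.eq_ignore_ascii_case("authorization") {
        format!("{name}: <redacted>")
    } else {
        format!("{name}: {value}")
    }
}
```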

9. Identity and Authorization

9.1 Peer Identity

The server middleware extracts:

pub struct SbiPeer {
    pub spiffe: Option<SpiffeId>,
    pub nf_instance_id: Option<NfInstanceId>,
    pub nf_type: Option<NfType>,
    pub tenant: TenantId,
    pub plmn: Option<PlmnId>,
    pub snssai: Option<Snssai>,
}

Identity MAY come from mTLS SPIFFE, NRF-issued token claims, or a legacy certificate mapping profile. Unsigned metadata headers MUST NOT establish identity.

9.2 OAuth2 Validation

SBI producers that require OAuth2 MUST validate:

issuer,
audience,
expiry and not-before,
signature and key ID,
scope,
NF type and instance binding,
tenant and slice binding where configured,
replay-sensitive claims when configured.

Token validation results MAY be cached until the earlier of token expiry or policy version change.
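
The claim checks in the list above can be sketched as a pure function over already-decoded claims. This sketch deliberately omits signature and key-ID verification, NF binding, and replay handling, which would run before or alongside it. The `Claims` and `Expected` types are hypothetical, and the clock-skew leeway is an assumption rather than something this RFC defines.

```rust
/// Hypothetical decoded token claims (signature already verified elsewhere).
pub struct Claims {
    pub iss: String,
    pub aud: Vec<String>,
    pub exp: u64,          // seconds since the Unix epoch
    pub nbf: Option<u64>,  // seconds since the Unix epoch
    pub scope: String,     // space-delimited, as in RFC 6749
}

pub struct Expected<'a> {
    pub issuer: &'a str,
    pub audience: &'a str,
    pub required_scope: &'a str,
    pub leeway_secs: u64, // assumed clock-skew allowance
}

#[derive(Debug, PartialEq, Eq)]
pub enum Reject {
    Issuer,
    Audience,
    Expired,
    NotYetValid,
    Scope,
}

pub fn validate_claims(c: &Claims, e: &Expected, now: u64) -> Result<(), Reject> {
    if c.iss != e.issuer {
        return Err(Reject::Issuer);
    }
    if !c.aud.iter().any(|a| a == e.audience) {
        return Err(Reject::Audience);
    }
    if now >= c.exp.saturating_add(e.leeway_secs) {
        return Err(Reject::Expired);
    }
    if let Some(nbf) = c.nbf {
        if now.saturating_add(e.leeway_secs) < nbf {
            return Err(Reject::NotYetValid);
        }
    }
    if !c.scope.split(' ').any(|s| s == e.required_scope) {
        return Err(Reject::Scope);
    }
    Ok(())
}
```

Passing `now` in explicitly keeps the function deterministic, which makes the token validation matrix straightforward to unit test.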

9.3 OAuth2 Client Credentials

SBI consumers MUST acquire tokens through the NRF or a configured authorization server. Client authentication methods:

SPIFFE JWT-SVID, preferred.
mTLS-bound client authentication.
Private key JWT.
Client secret stored in a Kubernetes Secret, only in an explicit compatibility profile.

Long-lived shared client secrets are forbidden in production carrier profiles unless an RFC 006 waiver exists.

10. NRF Integration

10.1 Registration

opc-sbi MUST provide helpers for NF registration, update, deregistration, and heartbeat.

NF profiles MUST be generated from typed NF metadata and canonical YANG. Raw free-form JSON construction is forbidden outside test fixtures.

10.2 Heartbeats

The heartbeat driver MUST:

derive interval from NRF response where present,
jitter heartbeat timing,
mark the NF degraded on repeated heartbeat failure,
keep serving existing local traffic according to per-NF policy,
deregister gracefully on shutdown when possible.

10.3 Discovery

The discovery client MUST provide:

query construction with typed filters,
response validation,
cache with TTL and stale-if-error policy,
negative caching,
per-service-set load balancing,
SCP preference where configured,
tenant and slice filter enforcement.

Discovery cache entries MUST be invalidated on canonical config changes that affect peers, PLMN, slice, trust anchors, or routing mode.
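
The TTL and stale-if-error behaviour can be sketched with a small in-memory cache. This is an illustrative sketch: `DiscoveryCache` and `Lookup` are hypothetical names, endpoints are plain strings, negative caching and load balancing are left out, and invalidation is shown as a blunt clear-all rather than a targeted purge.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq, Eq)]
pub enum Lookup<'a> {
    Fresh(&'a [String]),
    /// Entry is past its TTL but served because the NRF is unreachable.
    Stale(&'a [String]),
    Miss,
}

struct Entry {
    endpoints: Vec<String>,
    fetched_at: Instant,
}

pub struct DiscoveryCache {
    ttl: Duration,
    stale_if_error: Duration,
    entries: HashMap<String, Entry>,
}

impl DiscoveryCache {
    pub fn new(ttl: Duration, stale_if_error: Duration) -> Self {
        Self { ttl, stale_if_error, entries: HashMap::new() }
    }

    pub fn insert(&mut self, key: &str, endpoints: Vec<String>, now: Instant) {
        self.entries.insert(key.to_string(), Entry { endpoints, fetched_at: now });
    }

    pub fn lookup(&self, key: &str, now: Instant, nrf_reachable: bool) -> Lookup<'_> {
        let Some(entry) = self.entries.get(key) else { return Lookup::Miss };
        let age = now.saturating_duration_since(entry.fetched_at);
        if age <= self.ttl {
            Lookup::Fresh(&entry.endpoints)
        } else if !nrf_reachable && age <= self.ttl + self.stale_if_error {
            Lookup::Stale(&entry.endpoints)
        } else {
            Lookup::Miss
        }
    }

    /// Called on config changes that affect peers, PLMN, slice, trust anchors, or routing.
    pub fn invalidate_all(&mut self) {
        self.entries.clear();
    }
}
```

Returning `Stale` as a distinct variant lets callers record a metric or degrade priority when they are running on expired discovery data.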

10.4 Subscriptions

NRF subscription handling MUST support retry, backoff, and dead-letter behavior for failed notifications. Subscription callbacks MUST be authenticated and authorized like any other SBI request.

11. Routing Modes

Supported modes:

Mode	Behavior
`direct`	Consumer dials producer discovered from NRF or static peer config
`scp`	Consumer sends through SCP with routing headers
`sepp`	Inter-PLMN traffic goes through SEPP policy
`static`	Explicit peer list from YANG, for lab or interop

The mode is selected per service, tenant, PLMN, and slice. Inter-PLMN traffic MUST NOT bypass SEPP when policy requires SEPP.

12. Retry, Idempotency, and Callback Delivery

12.1 Retry Policy

Retry policy MUST be declarative:

pub struct RetryPolicy {
    pub max_attempts: u8,
    pub base_delay: Duration,
    pub max_delay: Duration,
    pub jitter: Jitter,
    pub retry_on_status: Vec<StatusCode>,
    pub retry_on_transport_error: bool,
}

The framework MUST NOT retry non-idempotent requests unless the request carries an idempotency key or the operation is explicitly marked idempotent by the service definition.
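
A sketch of how such a policy might drive retry decisions follows. It uses a reduced, hypothetical `RetryPolicy` with only the timing fields, assumes capped exponential backoff, and models jitter as a caller-supplied per-mille factor (0 to 1000) so the example stays deterministic. The `Jitter` type in the real struct is not specified here.

```rust
use std::time::Duration;

/// Reduced, hypothetical version of the policy above (timing fields only).
pub struct RetryPolicy {
    pub max_attempts: u8,
    pub base_delay: Duration,
    pub max_delay: Duration,
}

/// A retry is allowed only while attempts remain and the request is safe to
/// repeat: either idempotent by service definition or carrying an idempotency key.
pub fn may_retry(
    policy: &RetryPolicy,
    attempts_made: u8,
    idempotent_operation: bool,
    has_idempotency_key: bool,
) -> bool {
    attempts_made < policy.max_attempts && (idempotent_operation || has_idempotency_key)
}

/// Delay before the next attempt: base * 2^(attempts_made - 1), capped at
/// `max_delay`, then scaled by `jitter_permille / 1000`.
pub fn backoff_delay(policy: &RetryPolicy, attempts_made: u8, jitter_permille: u32) -> Duration {
    let shift = u32::from(attempts_made.saturating_sub(1)).min(20);
    let exponential = policy.base_delay.saturating_mul(1u32 << shift);
    let capped = exponential.min(policy.max_delay);
    capped.saturating_mul(jitter_permille.min(1000)) / 1000
}
```

In practice the per-mille factor would come from a random source; passing it in keeps the backoff arithmetic testable on its own.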

12.2 Idempotency

For operations that can be retried, the framework SHOULD provide:

idempotency key generation,
inbound idempotency cache,
replay-safe response caching,
expiry and memory bounds.

12.3 Callback Delivery

Callback dispatchers MUST support:

bounded queues,
retry budget,
backoff,
callback authentication,
dead-letter sink,
observability,
cancellation on subscription deletion.

Callback storms MUST be rate-limited per callback target.

13. Overload Control

13.1 Admission

The framework MUST provide admission control before request bodies are fully read when possible.

Admission keys:

peer identity,
NF type,
tenant,
slice,
service,
operation,
priority.

13.2 Response Semantics

Overload responses MUST use:

HTTP 429 for rate limiting,
HTTP 503 for temporary service overload,
Retry-After where retry is appropriate,
ProblemDetails with a stable cause code.

13.3 Priority

Requests with emergency, lawful-intercept, registration, paging, or charging criticality MAY receive higher priority only when the per-NF spec and 3GPP behavior justify it. Priority policy MUST be explicit, audited, and tested.

13.4 Circuit Breakers

Outbound circuit breakers MUST track:

consecutive failures,
error-rate window,
latency outliers,
half-open probes,
per-peer and per-service state.

Circuit breaker state MUST be visible in metrics and debug endpoints without exposing secrets or topology beyond authorized users.
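
The closed, open, and half-open transitions can be sketched as a small state machine. This sketch tracks only consecutive failures and a single half-open probe. The error-rate window and latency-outlier tracking listed above are not modelled, and `CircuitBreaker` and its fields are hypothetical names.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Closed,
    Open,
    HalfOpen,
}

pub struct CircuitBreaker {
    failure_threshold: u32,
    open_for: Duration,
    consecutive_failures: u32,
    state: State,
    opened_at: Option<Instant>,
}

impl CircuitBreaker {
    pub fn new(failure_threshold: u32, open_for: Duration) -> Self {
        Self { failure_threshold, open_for, consecutive_failures: 0, state: State::Closed, opened_at: None }
    }

    pub fn state(&self) -> State {
        self.state
    }

    /// Returns true if a request may be sent now.
    pub fn allow(&mut self, now: Instant) -> bool {
        match self.state {
            State::Closed => true,
            State::HalfOpen => false, // one probe is already in flight
            State::Open => {
                let elapsed = self
                    .opened_at
                    .map_or(Duration::ZERO, |t| now.saturating_duration_since(t));
                if elapsed >= self.open_for {
                    self.state = State::HalfOpen; // let exactly one probe through
                    true
                } else {
                    false
                }
            }
        }
    }

    pub fn on_success(&mut self) {
        self.state = State::Closed;
        self.consecutive_failures = 0;
    }

    pub fn on_failure(&mut self, now: Instant) {
        match self.state {
            State::HalfOpen => self.trip(now), // failed probe reopens the circuit
            State::Closed => {
                self.consecutive_failures += 1;
                if self.consecutive_failures >= self.failure_threshold {
                    self.trip(now);
                }
            }
            State::Open => {}
        }
    }

    fn trip(&mut self, now: Instant) {
        self.state = State::Open;
        self.opened_at = Some(now);
    }
}
```

One breaker instance would be held per peer and per service, and its `state()` exported through the `opc_sbi_circuit_state` metric.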

14. Generated Models and OpenAPI

opc-sbi SHOULD generate models from version-pinned OpenAPI sources where available. Generated code MUST:

be reproducible,
preserve unknown extension fields only when configured,
avoid ad hoc stringly typed JSON in NF handlers,
include spec tags for RFC 006,
pass serialization round trips.

OpenAPI mismatches with normative 3GPP text MUST create RFC 006 known gaps or generator overrides with citations.

15. Configuration Model

Each SBI NF YANG SHOULD expose:

sbi/listeners
sbi/clients
sbi/nrf
sbi/oauth2
sbi/retry-policy
sbi/overload
sbi/circuit-breakers
sbi/callbacks

These may be embedded under the shared listeners, peers, rate-limits, and policy containers defined by the cloud-native pattern.

16. Observability

Required metrics:

opc_sbi_requests_total{nf,service,operation,outcome}
opc_sbi_request_duration_seconds{service,operation}
opc_sbi_problem_details_total{service,cause,status}
opc_sbi_oauth_validation_total{outcome,reason}
opc_sbi_nrf_discovery_total{outcome}
opc_sbi_nrf_cache_entries{service}
opc_sbi_nrf_heartbeat_total{outcome}
opc_sbi_circuit_state{peer,service,state}
opc_sbi_overload_rejections_total{service,reason}
opc_sbi_callback_delivery_total{target,outcome}

Tracing MUST propagate W3C traceparent and 3GPP correlation headers when present.

17. Module Ownership

Module	Responsibility
`opc-sbi-problem`	ProblemDetails model and mappings
`opc-sbi-headers`	3GPP header parse/render/redaction
`opc-sbi-auth`	OAuth2/JWT-SVID validation and token acquisition
`opc-sbi-nrf`	NRF registration, heartbeat, discovery, cache
`opc-sbi-client`	HTTP/2 pool, deadlines, retries, circuit breakers
`opc-sbi-server`	Axum/tower middleware, extractors, admission
`opc-sbi-callback`	Callback queues, retry, dead-letter
`opc-sbi-codegen`	OpenAPI/model generation
`opc-sbi-testkit`	Mock NRF, mock producer, token fixtures

Agents must not implement NF-specific business logic in opc-sbi.

18. Testing Requirements

18.1 Unit Tests

ProblemDetails mappings.
Header parsing and redaction.
Token validation matrix.
Retry idempotency policy.
Circuit breaker transitions.
NRF cache expiry and invalidation.

18.2 Integration Tests

Mock NRF registration, heartbeat, discovery, and token issuance.
Producer validates mTLS and OAuth2 together.
Consumer refreshes token before expiry.
SCP routing header generation.
Callback retry and dead-letter.
Overload rejection with Retry-After.

18.3 Fault Injection

NRF unavailable.
Expired token.
Bad JWK key ID.
Peer certificate rotation.
DNS failure.
HTTP/2 stream reset.
Slow callback target.
Discovery cache stale while NRF down.

18.4 Performance Gates

Hot token validation cache p99 under 25 microseconds.
ProblemDetails mapping allocation-free for common static errors.
Discovery cache lookup p99 under 10 microseconds.
Client pool does not allocate per request beyond body/model needs.
Overload admission rejects before full body read for oversized bodies.

19. Acceptance Criteria

This RFC is implemented when:

All SBI NFs use shared ProblemDetails, header, auth, retry, and NRF code.
OAuth2 validation and client-credential acquisition are test-covered.
NRF registration, heartbeat, discovery, and cache behavior are shared.
Retry behavior is idempotency-aware.
Overload control returns consistent 429/503/Retry-After semantics.
Circuit breaker state is observable and bounded.
Generated models are reproducible and evidence-tagged.
A shared SBI testkit can exercise producer and consumer behavior for every SBI NF.

OPC-SDK-RFC-008: CNF Runtime Chassis and Resource Governance

Status: Draft for Implementation
Version: 1.0.0
Date: 2026-05-19
Audience: NF implementers, platform engineers, SREs, security reviewers

1. Abstract

This RFC defines the common Rust runtime chassis used by every OpenPacketCore CNF. It standardizes process startup, task supervision, shutdown, health probes, admin endpoints, runtime pools, resource budgets, panic policy, configuration bootstrap, signal handling, telemetry initialization, memory behavior, and operational debug surfaces.

The goal is that AMF, SMF, UPF, NRF, PCF, SEPP, SMSC, and all other CNFs share one predictable runtime skeleton instead of each inventing its own Tokio setup, shutdown behavior, health semantics, and task lifecycle.

2. Scope

2.1 In Scope

Runtime initialization.
Tokio worker and blocking pool configuration.
Task supervision and cancellation.
Startup and readiness phases.
Graceful shutdown and drain.
Health and admin HTTP endpoints.
Runtime resource budgets and backpressure hooks.
Panic and fatal-error policy.
Metrics, logging, and tracing bootstrap.
Memory, allocator, and OOM behavior.
Common CLI/env/bootstrap contract.

2.2 Out of Scope

NF-specific protocol logic.
Kubernetes controller behavior. See RFC 009.
Node/NIC scheduling and SR-IOV contracts. See RFC 011.
Config commit semantics. See RFC 001.

3. Design Goals

3.1 Security

Fail closed when required bootstrap security material is unavailable.
Keep debug endpoints disabled or authorization-gated in production.
Ensure panic output and fatal-error reports are redacted.
Make shutdown safe: no partial config writes, key leaks, or unaudited emergency exits.

3.2 Performance

Avoid runtime-pool contention between management, control, crypto, and data-plane work.
Bound queues, tasks, memory, and blocking work.
Make health checks cheap and non-blocking.
Provide predictable drain behavior under load.

3.3 Maintainability

Provide one reusable opc-runtime crate.
Make lifecycle phases explicit and testable.
Provide standard task naming and metrics.
Keep per-NF custom code in callbacks, not in process scaffolding.

3.4 Functionality

Support control-plane, data-plane, and library-like CNF profiles.
Support local developer mode and production mode.
Support graceful restart, termination, and Kubernetes probe integration.
Support runtime introspection without exposing secrets.

4. Runtime Crate

The shared crate is opc-runtime.

crates/opc-runtime/
  src/
    lib.rs
    bootstrap.rs
    profile.rs
    supervisor.rs
    task.rs
    shutdown.rs
    health.rs
    admin.rs
    resources.rs
    panic.rs
    telemetry.rs
    memory.rs
    signals.rs
    testkit.rs

Every NF binary SHOULD be a thin wrapper around opc_runtime::run.

5. Runtime Profile

pub struct RuntimeProfile {
    pub mode: RuntimeMode,
    pub nf_kind: NetworkFunctionKind,
    pub instance_id: InstanceId,
    pub async_workers: WorkerCount,
    pub blocking_threads: ThreadLimit,
    pub crypto_threads: ThreadLimit,
    pub management_threads: ThreadLimit,
    pub max_tasks: usize,
    pub max_queued_bytes: usize,
    pub shutdown_grace: Duration,
    pub drain_timeout: Duration,
}

Profiles:

dev: permissive, local files, debug endpoints enabled on loopback.
lab: production-like, explicit waivers allowed.
production: fail closed, debug gated, strict resource limits.
conformance: deterministic test profile.
perf: optimized benchmark profile with fixed CPU/resource assumptions.

6. Startup State Machine

Every CNF starts through:

Phase	Purpose
`ProcessInit`	parse CLI/env, install panic hook, initialize logging
`TelemetryInit`	metrics/tracing/logging exporters
`SecurityInit`	identity, trust bundles, key providers
`ConfigBootstrap`	load initial config through RFC 001
`ResourcePreflight`	verify CPU, memory, filesystem, devices
`ServiceBind`	bind listeners but do not report ready
`PeerWarmup`	optional NRF registration, discovery, backend connection
`Ready`	readiness probe returns success
`Draining`	termination accepted, new work limited
`Stopped`	all supervised tasks exited

Startup MUST fail closed in production if any required phase fails.
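
The phase table can be sketched as an enum with a fixed successor function and a driver that stops at the first failure. The `Phase` variants mirror the table above. `run_startup` and the callback shape are hypothetical, and the sketch treats every phase as required, which is the production behaviour described here.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Phase {
    ProcessInit,
    TelemetryInit,
    SecurityInit,
    ConfigBootstrap,
    ResourcePreflight,
    ServiceBind,
    PeerWarmup,
    Ready,
    Draining,
    Stopped,
}

impl Phase {
    /// Fixed successor in the lifecycle; `Stopped` is terminal.
    pub fn next(self) -> Option<Phase> {
        use Phase::*;
        Some(match self {
            ProcessInit => TelemetryInit,
            TelemetryInit => SecurityInit,
            SecurityInit => ConfigBootstrap,
            ConfigBootstrap => ResourcePreflight,
            ResourcePreflight => ServiceBind,
            ServiceBind => PeerWarmup,
            PeerWarmup => Ready,
            Ready => Draining,
            Draining => Stopped,
            Stopped => return None,
        })
    }
}

/// Runs each startup phase in order. On the first error, returns the failing
/// phase and never reaches `Ready` (fail closed).
pub fn run_startup<F>(mut step: F) -> Result<Phase, (Phase, String)>
where
    F: FnMut(Phase) -> Result<(), String>,
{
    let mut phase = Phase::ProcessInit;
    while phase != Phase::Ready {
        step(phase).map_err(|e| (phase, e))?;
        phase = match phase.next() {
            Some(p) => p,
            None => break,
        };
    }
    Ok(phase)
}
```

Because readiness is only reachable by passing through every earlier phase, the readiness probe cannot report success while a listener is bound but security material is still missing.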

7. Task Supervision

7.1 Task Model

All long-lived tasks MUST be registered with the supervisor:

pub struct TaskSpec {
    pub name: TaskName,
    pub kind: TaskKind,
    pub criticality: Criticality,
    pub restart: RestartPolicy,
    pub shutdown: ShutdownPolicy,
}

Task kinds:

listener
protocol-worker
session-worker
management-worker
background-sync
metrics-exporter
watcher
timer

7.2 Criticality

Criticality	Behavior on Failure
`fatal`	Transition CNF to fatal shutdown
`degrade`	Mark degraded and optionally restart
`best-effort`	Log/metric and continue

Critical task failures MUST be visible through readiness and alarm state.

7.3 Restart Policy

Restart policy MUST include:

max restarts per window,
backoff,
jitter,
failure classification,
whether restart is allowed after config changes.

Unbounded task restart loops are forbidden.
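
The "max restarts per window" rule can be sketched as a sliding-window budget. `RestartBudget` is a hypothetical name, and backoff, jitter, and failure classification are left out to keep the sketch focused on bounding restart loops.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Allows at most `max_restarts` restarts within any sliding `window`.
pub struct RestartBudget {
    max_restarts: usize,
    window: Duration,
    recent: VecDeque<Instant>,
}

impl RestartBudget {
    pub fn new(max_restarts: usize, window: Duration) -> Self {
        Self { max_restarts, window, recent: VecDeque::new() }
    }

    /// Records and permits a restart if the budget allows it; otherwise
    /// returns false so the supervisor can escalate instead of looping.
    pub fn try_restart(&mut self, now: Instant) -> bool {
        // Forget restarts that have aged out of the window.
        while let Some(&oldest) = self.recent.front() {
            if now.saturating_duration_since(oldest) > self.window {
                self.recent.pop_front();
            } else {
                break;
            }
        }
        if self.recent.len() >= self.max_restarts {
            return false;
        }
        self.recent.push_back(now);
        true
    }
}
```

When `try_restart` returns false, a supervisor would apply the task's criticality policy, for example marking the CNF degraded or starting a fatal shutdown.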

8. Runtime Pool Isolation

The runtime MUST expose separate execution domains:

async I/O workers,
blocking/CPU pool,
crypto pool,
management pool,
data-plane workers where applicable.

Data-plane CNFs SHOULD integrate with RFC 011 CPU pinning and IRQ affinity. Management-plane work MUST NOT execute on data-plane pinned workers.

9. Resource Governance

9.1 Budgets

Each CNF declares:

pub struct ResourceBudget {
    pub max_heap_bytes: Option<usize>,
    pub max_tasks: usize,
    pub max_channels: usize,
    pub max_queue_bytes: usize,
    pub max_request_body_bytes: usize,
    pub max_open_files: usize,
    pub max_backend_connections: usize,
}

Budgets MUST be profile-configurable and observable.

9.2 Backpressure

The runtime provides shared primitives:

bounded mpsc channels,
byte-accounted queues,
weighted semaphores,
admission guards,
deadline propagation,
cancellation tokens.

Unbounded channels are forbidden in production runtime code unless an RFC 006 waiver exists.
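
A byte-accounted queue, one of the primitives listed above, can be sketched as follows. This is a single-threaded illustration with a hypothetical `ByteQueue` type. A real runtime primitive would add synchronization and async wakeups, which are omitted here.

```rust
use std::collections::VecDeque;

/// FIFO queue bounded by total payload bytes rather than item count.
pub struct ByteQueue {
    max_bytes: usize,
    queued_bytes: usize,
    items: VecDeque<Vec<u8>>,
}

impl ByteQueue {
    pub fn new(max_bytes: usize) -> Self {
        Self { max_bytes, queued_bytes: 0, items: VecDeque::new() }
    }

    /// Enqueues `item`, or hands it back when it would exceed the byte budget.
    /// Returning the item lets the caller apply backpressure or drop explicitly.
    pub fn try_push(&mut self, item: Vec<u8>) -> Result<(), Vec<u8>> {
        match self.queued_bytes.checked_add(item.len()) {
            Some(total) if total <= self.max_bytes => {
                self.queued_bytes = total;
                self.items.push_back(item);
                Ok(())
            }
            _ => Err(item),
        }
    }

    pub fn pop(&mut self) -> Option<Vec<u8>> {
        let item = self.items.pop_front()?;
        self.queued_bytes -= item.len();
        Some(item)
    }

    pub fn queued_bytes(&self) -> usize {
        self.queued_bytes
    }
}
```

Bounding by bytes instead of item count keeps memory use predictable when message sizes vary widely, which is the point of the `max_queue_bytes` budget.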

9.3 Memory Behavior

The runtime SHOULD:

expose allocator metrics where available,
support an optional hardened allocator profile,
fail fast on configured memory-budget breach,
avoid memory-heavy debug dumps in production,
support heap profile endpoints only under explicit authorization.

10. Shutdown and Drain

10.1 Signals

The runtime MUST handle:

SIGTERM: graceful drain.
SIGINT: graceful drain in dev, configurable in production.
fatal internal errors: controlled shutdown path when possible.

10.2 Drain Sequence

Drain order:

Stop accepting new external work.
Mark readiness false.
Notify NRF/deregister where applicable.
Stop management writes except emergency recovery.
Drain protocol workers up to timeout.
Flush audit and evidence breadcrumbs.
Checkpoint local state where applicable.
Shut down listeners and background tasks.

Each NF can add steps but MUST preserve safety ordering.
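One way to check that an NF-specific drain plan preserves the safety ordering is to give the shared steps a total order and verify that they appear strictly in that order, whatever custom steps sit between them. The enum and function names below are assumptions for this sketch; steps marked "where applicable" may be omitted from a plan.

```rust
/// Shared drain steps, declared in the required safety order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum DrainStep {
    StopAcceptingWork,
    MarkNotReady,
    DeregisterFromNrf,
    StopManagementWrites,
    DrainProtocolWorkers,
    FlushAudit,
    CheckpointState,
    StopListenersAndTasks,
}

/// A plan entry is either a shared step or an NF-specific addition.
pub enum PlanStep {
    Core(DrainStep),
    Custom(&'static str),
}

/// True when the shared steps appear in strictly increasing order.
/// Custom steps may be interleaved anywhere; duplicates are rejected.
pub fn preserves_safety_order(plan: &[PlanStep]) -> bool {
    let core: Vec<DrainStep> = plan
        .iter()
        .filter_map(|step| match step {
            PlanStep::Core(s) => Some(*s),
            PlanStep::Custom(_) => None,
        })
        .collect();
    core.windows(2).all(|pair| pair[0] < pair[1])
}
```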

10.3 Kubernetes Integration

terminationGracePeriodSeconds MUST be at least shutdown_grace plus a probe latency margin. PreStop hooks MAY call admin drain but MUST NOT be the only drain mechanism.

11. Health and Admin Surface

11.1 Endpoints

Default admin listener:

/livez
/readyz
/startupz
/metrics
/debug/runtime (gated)
/debug/tasks (gated)
/debug/config-version (gated)

Production debug endpoints MUST require authorization or be disabled.

11.2 Health Semantics

/livez means the process event loop is alive. It MUST NOT depend on external peers.

/readyz means the CNF can serve its intended role. It SHOULD include:

config applied,
critical tasks healthy,
required listeners bound,
required security material valid,
required backends reachable according to NF policy.
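A readiness aggregate along these lines could be sketched as follows. The struct and check names are hypothetical; the point is that /readyz reports which inputs are failing, while /livez would consult none of them.

```rust
/// Hypothetical inputs to the /readyz decision.
#[derive(Debug, Clone, Copy)]
pub struct ReadinessInputs {
    pub config_applied: bool,
    pub critical_tasks_healthy: bool,
    pub listeners_bound: bool,
    pub security_material_valid: bool,
    pub backends_reachable: bool,
}

impl ReadinessInputs {
    /// Names of the checks that currently fail, in a fixed order.
    pub fn failing_checks(&self) -> Vec<&'static str> {
        let checks = [
            ("config-applied", self.config_applied),
            ("critical-tasks-healthy", self.critical_tasks_healthy),
            ("listeners-bound", self.listeners_bound),
            ("security-material-valid", self.security_material_valid),
            ("backends-reachable", self.backends_reachable),
        ];
        checks
            .iter()
            .filter(|(_, ok)| !*ok)
            .map(|(name, _)| *name)
            .collect()
    }

    /// Ready only when every input check passes.
    pub fn is_ready(&self) -> bool {
        self.failing_checks().is_empty()
    }
}
```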

12. Panic and Fatal Error Policy

12.1 Panics

Production builds MUST install a panic hook that:

redacts secrets,
records task name,
increments fatal metrics,
emits a structured fatal log,
triggers supervisor policy.

Panics in parser or protocol handlers are bugs and MUST be covered by RFC 005 fuzzing regression tests.

12.2 `unwrap` and `expect`

Runtime and NF code MUST avoid unwrap and expect outside tests, build scripts, and explicitly justified invariants. Justifications MUST be grep-able and evidence-linked.

13. Bootstrap Contract

CLI/env values are limited to bootstrap concerns:

config bootstrap source,
management bind address,
admin bind address,
production/dev mode,
identity socket path,
tracing exporter endpoint,
initial log level,
feature gates for explicit waivers.

Dense protocol behavior MUST come from canonical config, not env vars.

14. Telemetry Initialization

The runtime initializes:

structured JSON logging,
OpenTelemetry tracing,
Prometheus metrics,
build info,
runtime profile info,
panic/fatal counters.

Required metrics:

opc_runtime_build_info{nf,version,git_sha}
opc_runtime_tasks{nf,kind,state}
opc_runtime_task_restarts_total{nf,task}
opc_runtime_queue_depth{nf,queue}
opc_runtime_queue_bytes{nf,queue}
opc_runtime_shutdown_total{nf,reason}
opc_runtime_panic_total{nf,task}
opc_runtime_memory_bytes{nf,kind}
opc_runtime_ready{nf}

15. Time and Clocks

The runtime MUST provide a clock abstraction for tests:

pub trait Clock: Send + Sync {
    /// Wall-clock time, used for security expiry and audit timestamps.
    fn now(&self) -> Timestamp;
    /// Monotonic time, used for timers.
    fn monotonic(&self) -> Instant;
}

Security expiry and audit timestamps use wall clock plus monotonic sequencing where required. Timers use monotonic time.
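A test double for this trait might look like the sketch below. `Timestamp` is not defined in this section, so the example assumes a simple milliseconds-since-epoch newtype; `FakeClock` and its methods are likewise hypothetical. The monotonic reading is a fixed base `Instant` plus an offset that only ever grows, while the wall clock can be set independently to simulate clock jumps.

```rust
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Assumed wall-clock representation: milliseconds since the Unix epoch.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Timestamp(pub u64);

pub trait Clock: Send + Sync {
    fn now(&self) -> Timestamp;
    fn monotonic(&self) -> Instant;
}

/// Hypothetical fake clock that tests advance by hand.
pub struct FakeClock {
    base: Instant,
    // (wall-clock milliseconds, monotonic offset from `base`)
    state: Mutex<(u64, Duration)>,
}

impl FakeClock {
    pub fn new(wall_ms: u64) -> Self {
        Self { base: Instant::now(), state: Mutex::new((wall_ms, Duration::ZERO)) }
    }

    /// Moves both clocks forward by the same amount.
    pub fn advance(&self, by: Duration) {
        let mut s = self.state.lock().unwrap_or_else(|e| e.into_inner());
        s.0 = s.0.saturating_add(by.as_millis() as u64);
        s.1 = s.1.saturating_add(by);
    }

    /// Simulates a wall-clock jump; monotonic time is unaffected.
    pub fn set_wall(&self, wall_ms: u64) {
        let mut s = self.state.lock().unwrap_or_else(|e| e.into_inner());
        s.0 = wall_ms;
    }
}

impl Clock for FakeClock {
    fn now(&self) -> Timestamp {
        Timestamp(self.state.lock().unwrap_or_else(|e| e.into_inner()).0)
    }

    fn monotonic(&self) -> Instant {
        self.base + self.state.lock().unwrap_or_else(|e| e.into_inner()).1
    }
}
```

Keeping the two readings separate lets a test assert that timers keep firing correctly even when the wall clock steps backwards.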

16. Module Ownership

Module	Responsibility
`opc-runtime-bootstrap`	CLI/env/profile loading
`opc-runtime-supervisor`	task registry, restart, failure policy
`opc-runtime-shutdown`	signal handling and drain orchestration
`opc-runtime-health`	health model and probe endpoints
`opc-runtime-admin`	gated debug/admin routes
`opc-runtime-resources`	budgets, queues, semaphores
`opc-runtime-telemetry`	logging, metrics, tracing init
`opc-runtime-testkit`	fake clock, fake tasks, shutdown tests

Agents implementing NF business logic should consume opc-runtime; they should not fork startup/shutdown code.

17. Testing Requirements

17.1 Unit Tests

Startup state transitions.
Task restart/backoff.
Fatal vs degraded task failure.
Bounded queue byte accounting.
Panic hook redaction.
Health state aggregation.
Clock abstraction.

17.2 Integration Tests

SIGTERM drains in order.
Readiness flips false before listeners stop.
NRF deregistration hook is called during drain.
Background task failure degrades readiness.
Debug endpoints are disabled or authorized in production.

17.3 Fault Injection

Hung task on shutdown.
Task restart loop.
Telemetry exporter unavailable.
Missing identity socket.
Memory budget breach.
Queue saturation.
Panic in a worker task.

17.4 Performance Gates

/livez p99 under 1 millisecond in a healthy process.
Supervisor task spawn overhead negligible relative to direct spawn in NF startup tests.
Runtime metrics collection does not allocate on every scrape for static metric sets.
Queue admission overhead p99 under 10 microseconds.

18. Acceptance Criteria

This RFC is implemented when:

Every NF binary uses opc-runtime for startup, supervision, health, and shutdown.
Long-lived tasks are supervised and named.
Readiness semantics are consistent across CNFs.
Shutdown drains safely and predictably.
Production debug endpoints are gated or disabled.
Runtime pools and queues are bounded.
Panic and fatal-error handling is redacted and observable.
Runtime behavior is covered by shared testkit and fault injection tests.

OPC-SDK-RFC-009: Operator Lifecycle, Upgrade, Migration, and Rollback

Status: Draft for Implementation
Version: 1.0.0
Date: 2026-05-19
Audience: operator authors, NF owners, release engineers, SREs

1. Abstract

This RFC defines the lifecycle contract between the OpenPacketCore Kubernetes operator, lifecycle CRDs, canonical YANG configuration, NF pods, persistent state, and release artifacts. It specifies reconciliation phases, version skew, CRD conversion, YANG schema migration, state migration, rollout strategies, rollback, drain, status conditions, and release gates.

This RFC turns the thin-CRD/fat-YANG pattern into an upgrade-safe product contract across all CNFs.

2. Scope

2.1 In Scope

Lifecycle CRD reconciliation.
Operator/NF version compatibility.
CRD versioning and conversion webhooks.
Canonical config revision and schema migration.
NF image rollout strategies.
Session-aware drain and handover coordination.
Rollback and downgrade policy.
Status, events, and GitOps health gates.
Multi-cluster rollout topology.

2.2 Out of Scope

Runtime process shutdown internals. See RFC 008.
Session storage primitives. See RFC 004.
Evidence bundle generation. See RFC 006.
Node resource scheduling. See RFC 011.

3. Design Goals

3.1 Security

Prevent unsigned, unverified, or policy-disallowed images from rolling out.
Prevent config downgrades that bypass validation or reintroduce forbidden settings.
Ensure rollback preserves audit records and does not silently lose regulated data.
Keep break-glass upgrade paths narrow, explicit, and auditable.

3.2 Performance

Rollouts must avoid unnecessary full-cluster disruption.
Stateful NFs must drain or transfer ownership before termination.
Operator reconciliation must avoid hot loops and unbounded API traffic.
Large config migrations must be staged and observable.

3.3 Maintainability

Every lifecycle phase has stable names, conditions, and event reasons.
Compatibility matrices are machine-readable.
Migration functions are versioned, deterministic, and tested.
Per-NF deviations are explicit.

3.4 Functionality

Support install, update, scale, config change, restart, drain, rollback, restore, and delete.
Support CRD conversion webhooks.
Support canary and partitioned rollouts.
Support GitOps promotion gates.

4. Version Model

4.1 Versions

The operator tracks:

operator version,
CRD API version,
lifecycle contract version,
NF image version and digest,
NF binary SDK version,
YANG schema digest,
canonical config revision,
session state schema version,
evidence bundle digest.

4.2 Compatibility Matrix

Every release MUST publish:

operator: 0.4.0
supports:
  crd_versions: ["v1alpha1", "v1alpha2"]
  lifecycle_contracts: ["v1alpha1"]
  nf_images:
    opc-amf: ">=0.3.0 <0.5.0"
    opc-smf: ">=0.3.0 <0.5.0"
  yang_schema_digests:
    opc-amf:
      - "sha256:..."

The operator MUST reject unsupported combinations unless an explicit waiver is present and policy allows it.
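Evaluating an nf_images range such as ">=0.3.0 <0.5.0" can be sketched with a small comparator. This is an assumption about how the range syntax is meant to be read (whitespace-separated terms, all of which must hold); it handles plain MAJOR.MINOR.PATCH versions only and fails closed on anything it cannot parse.

```rust
/// Plain MAJOR.MINOR.PATCH version; pre-release tags are not handled here.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Version(pub u64, pub u64, pub u64);

pub fn parse_version(s: &str) -> Option<Version> {
    let mut parts = s.trim().split('.');
    let major = parts.next()?.parse().ok()?;
    let minor = parts.next()?.parse().ok()?;
    let patch = parts.next()?.parse().ok()?;
    if parts.next().is_some() {
        return None;
    }
    Some(Version(major, minor, patch))
}

/// True when `version` satisfies every term of `range`.
/// Empty ranges, unknown operators, and parse errors all reject.
pub fn in_range(range: &str, version: &str) -> bool {
    let v = match parse_version(version) {
        Some(v) => v,
        None => return false,
    };
    let mut saw_term = false;
    for term in range.split_whitespace() {
        let (op, rest) = if let Some(r) = term.strip_prefix(">=") {
            (">=", r)
        } else if let Some(r) = term.strip_prefix("<=") {
            ("<=", r)
        } else if let Some(r) = term.strip_prefix('<') {
            ("<", r)
        } else if let Some(r) = term.strip_prefix('>') {
            (">", r)
        } else {
            return false;
        };
        let bound = match parse_version(rest) {
            Some(b) => b,
            None => return false,
        };
        let ok = match op {
            ">=" => v >= bound,
            "<=" => v <= bound,
            "<" => v < bound,
            _ => v > bound,
        };
        if !ok {
            return false;
        }
        saw_term = true;
    }
    saw_term
}
```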

5. Lifecycle State Machine

Every reconcile moves through:

Phase	Purpose
`Admitted`	CR accepted by admission policy
`Resolved`	image, config, secrets, devices, and dependencies resolved
`Provisioning`	workload resources created/updated
`Bootstrapping`	pod reachable and management plane alive
`Configuring`	canonical config applied
`Verifying`	drift, health, and readiness checked
`Ready`	service is available
`Draining`	rollout/delete drain in progress
`Migrating`	schema/state migration in progress
`Degraded`	service impaired but not terminal
`Failed`	reconciliation cannot proceed without operator action
`Terminating`	deletion finalizers active

Phase names are public API and MUST be stable.

6. Conditions and Events

Required conditions:

Admitted
Resolved
Provisioned
Bootstrapped
ConfigResolved
AppConfigApplied
Drift
MigrationReady
MigrationApplied
DrainReady
RollbackAvailable
Ready

Each condition MUST include:

status,
reason,
message,
observed generation,
last transition time.
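The condition fields listed above can be modeled as below. The update rule shown, where the last transition time changes only when the status value changes, follows the common Kubernetes convention and is an assumption here rather than something this RFC states; the type and function names are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    True,
    False,
    Unknown,
}

/// Hypothetical in-memory form of one status condition.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Condition {
    pub kind: &'static str,
    pub status: Status,
    pub reason: &'static str,
    pub message: String,
    pub observed_generation: u64,
    pub last_transition_unix: u64,
}

/// Inserts or updates a condition by kind.
/// The transition time is preserved unless the status actually flips.
pub fn set_condition(conds: &mut Vec<Condition>, mut next: Condition) {
    match conds.iter_mut().find(|c| c.kind == next.kind) {
        Some(existing) => {
            if existing.status == next.status {
                next.last_transition_unix = existing.last_transition_unix;
            }
            *existing = next;
        }
        None => conds.push(next),
    }
}
```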

Event reasons MUST be stable and documented. Events MUST NOT contain secrets or raw config payloads.

7. Admission and Policy

Admission MUST verify:

image digest present,
image signature valid,
evidence bundle available where required,
CRD field validation,
canonical config reference exists,
manual/break-glass authority policy,
required secrets and service accounts,
pod security exceptions,
per-NF node resource references.

Admission should reject invalid requests early rather than letting a reconcile fail late, after workload changes have begun.

8. Canonical Config Lifecycle

8.1 Revision

canonicalConfigRevision is opaque but immutable for a given config artifact. Changing config content MUST change the revision or digest.

8.2 Apply

The operator applies config through RFC 001 management APIs. It MUST:

verify schema digest,
run validate-only before commit where supported,
use idempotency keys for retries,
record applied revision and tx ID,
read back running config for drift detection.

8.3 Drift

Drift states:

InSync
DriftDetected
BreakGlassActive
ResyncRequired
Unknown

Runtime state such as counters and sessions MUST be filtered out of drift comparison.
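Filtering runtime state out of the drift comparison might look like this sketch, which compares two flattened path-to-value maps and skips any path under a runtime-state prefix. The flattened representation, the function name, and the example paths in the usage are all assumptions for illustration.

```rust
use std::collections::BTreeMap;

/// Assumed flattened config form: instance path -> leaf value.
pub type FlatConfig = BTreeMap<String, String>;

/// Returns the sorted paths whose values differ between desired and
/// running config, ignoring paths under any runtime-state prefix.
pub fn drift_paths(
    desired: &FlatConfig,
    running: &FlatConfig,
    runtime_prefixes: &[&str],
) -> Vec<String> {
    let is_runtime = |path: &str| runtime_prefixes.iter().any(|p| path.starts_with(p));
    let mut drifted: Vec<String> = desired
        .keys()
        .chain(running.keys())
        .filter(|path| !is_runtime(path.as_str()))
        .filter(|path| desired.get(*path) != running.get(*path))
        .cloned()
        .collect();
    // A path present on both sides is visited twice; keep one copy.
    drifted.sort();
    drifted.dedup();
    drifted
}
```

An empty result would map to InSync; a non-empty one to DriftDetected or BreakGlassActive, depending on policy.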

9. CRD Versioning and Conversion

Public lifecycle CRDs MUST use hub-and-spoke conversion once a second served version exists.

Rules:

one storage version at a time,
conversion webhooks are deterministic,
lossy conversion is forbidden unless the target version has an explicit status condition and known gap,
deprecated fields retain read compatibility for at least one minor release,
removed fields require migration notes and evidence.

Conversion tests MUST include round trips for every CRD version pair.

10. YANG Schema Migration

YANG migration follows RFC 002. Operator responsibilities:

detect persisted schema digest,
select migration chain,
run validate-only against target NF before commit,
back up previous config envelope before migration,
record migration tx ID,
fail closed if migration chain is missing.

Per-NF migrations MUST be deterministic and golden-tested.
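Selecting a migration chain and failing closed when a link is missing can be sketched as a walk over registered steps. The sketch assumes a linear registry with at most one migration per source digest; `Migration` and `select_chain` are hypothetical names, and the digests in the usage are placeholders.

```rust
/// One registered migration between two schema digests.
pub struct Migration {
    pub from: &'static str,
    pub to: &'static str,
}

/// Walks the registry from the persisted digest to the target digest.
/// Returns an error (fail closed) when a step is missing or the
/// registry loops without reaching the target.
pub fn select_chain<'a>(
    registry: &'a [Migration],
    persisted: &str,
    target: &str,
) -> Result<Vec<&'a Migration>, String> {
    let mut chain = Vec::new();
    let mut current = persisted;
    while current != target {
        match registry.iter().find(|m| m.from == current) {
            Some(step) => {
                chain.push(step);
                current = step.to;
                // More steps than registered migrations means a cycle.
                if chain.len() > registry.len() {
                    return Err("migration registry contains a cycle".to_string());
                }
            }
            None => return Err(format!("no migration registered from {}", current)),
        }
    }
    Ok(chain)
}
```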

11. State Migration

Session and durable state migrations are separate from config migrations.

State migration plans MUST define:

source version,
target version,
online/offline mode,
rollback support,
validation query,
maximum expected duration,
data-loss risk,
RPO/RTO impact.

Authoritative session migrations MUST preserve RFC 004 generation and fencing semantics.

12. Rollout Strategies

Supported strategies:

Strategy	Use
`rolling`	stateless or safely drainable NFs
`partitioned`	stateful sets and ordered migrations
`canary`	high-risk release or config change
`blue-green`	major upgrades or incompatible config/state changes
`manual`	operator-approved special cases

Each NF declares allowed strategies.

13. Drain and Handover

Before terminating or replacing a pod, the operator MUST invoke or observe NF drain where the NF is stateful.

Drain contract:

/// Drain modes an NF can be asked to perform.
pub enum DrainMode {
    RejectNewWork,
    TransferOwnership,
    FlushAndStop,
    ImmediateEmergency,
}

Drain MUST:

mark readiness false before removing work,
stop new session ownership,
transfer or release leases where possible,
flush audit and local state,
respect timeout,
expose progress in status.

UPF, AMF, SMF, ePDG, N3IWF, SMSC, and IMS NFs MUST define NF-specific drain behavior.

14. Rollback and Downgrade

14.1 Rollback

Rollback is allowed when:

previous image digest is still policy-allowed,
previous config schema is compatible or migration back exists,
state schema supports downgrade or state can be rebuilt,
evidence permits rollback.
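The four rollback conditions combine as a conjunction, with two of them each satisfiable in one of two ways. A sketch of that evaluation, using hypothetical field and function names, returns the list of blockers so that the operator can report why rollback is unavailable:

```rust
/// Hypothetical pre-computed facts about the rollback target.
#[derive(Debug, Clone, Copy)]
pub struct RollbackInputs {
    pub previous_image_allowed: bool,
    pub config_schema_compatible: bool,
    pub reverse_config_migration_exists: bool,
    pub state_downgrade_supported: bool,
    pub state_rebuildable: bool,
    pub evidence_permits: bool,
}

/// Returns every reason rollback is not allowed; empty means eligible.
pub fn rollback_blockers(i: &RollbackInputs) -> Vec<&'static str> {
    let mut blockers = Vec::new();
    if !i.previous_image_allowed {
        blockers.push("previous image digest is not policy-allowed");
    }
    if !(i.config_schema_compatible || i.reverse_config_migration_exists) {
        blockers.push("config schema incompatible and no reverse migration");
    }
    if !(i.state_downgrade_supported || i.state_rebuildable) {
        blockers.push("state schema cannot be downgraded or rebuilt");
    }
    if !i.evidence_permits {
        blockers.push("evidence does not permit rollback");
    }
    blockers
}
```

Evaluating this before touching workload resources is one way to populate the RollbackAvailable condition.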

14.2 Downgrade

Downgrade is forbidden by default for stateful NFs unless explicitly supported. If downgrade is unsupported, the operator MUST fail before changing workload resources.

14.3 Failed Rollout

On failed rollout:

Stop further pod replacement.
Preserve logs/events/evidence references.
Mark Degraded or Failed.
Attempt rollback only if policy says automatic rollback is safe.
Require manual approval for destructive recovery.

15. Backup and Restore

Before high-risk migration, the operator MUST ensure backups exist for:

canonical config,
shadow-security material where policy allows,
session state if durable and required,
audit state,
CR status needed for recovery.

Restore MUST be tested per NF and recorded in RFC 006 evidence.

16. Multi-Cluster Lifecycle

In multi-cluster deployments:

management cluster owns desired lifecycle state,
workload clusters own local pod status,
status aggregation is explicit,
cluster identity is part of every condition source,
rollout waves are region-aware,
rollback can be per-cluster or global.

The operator MUST avoid applying incompatible migrations to only part of a fenced session ownership domain.

17. Observability

Required metrics:

opc_operator_reconcile_total{kind,outcome}
opc_operator_reconcile_duration_seconds{kind,phase}
opc_operator_rollout_total{kind,strategy,outcome}
opc_operator_migration_total{kind,type,outcome}
opc_operator_drain_total{kind,outcome}
opc_operator_drift_observations_total{kind,state}
opc_operator_rollback_total{kind,outcome}
opc_operator_version_skew{kind}

Required status fields:

current image digest,
desired image digest,
applied config revision,
applied config hash,
running schema digest,
last successful tx ID,
evidence bundle digest,
migration state.

18. Module Ownership

Module	Responsibility
`operator-lifecycle`	shared phase/condition composition
`operator-compat`	compatibility matrix parser/evaluator
`operator-config-apply`	validate-only, commit, readback
`operator-conversion`	CRD conversion webhook helpers
`operator-migration`	config/state migration orchestration
`operator-rollout`	rolling/canary/blue-green strategies
`operator-drain`	NF drain API clients and progress
`operator-backup`	backup/restore orchestration
`operator-testkit`	fake NF, fake config bus, fake session store

Agents must keep NF-specific reconcile logic behind interfaces and avoid duplicating phase/condition code.

19. Testing Requirements

19.1 Unit Tests

Compatibility matrix evaluation.
Phase transition reducer.
Condition reason stability.
CRD conversion round trips.
Migration chain selection.
Rollback eligibility.

19.2 Integration Tests

Install fresh NF.
Config-only update.
Image-only update.
Image plus config update.
Failed validate-only blocks rollout.
Drift detection and resync.
Canary success and failure.
Rollback with compatible config.

19.3 Fault Injection

Operator restart mid-rollout.
NF pod deleted during migration.
gNMI commit timeout.
Conversion webhook unavailable.
Backup failure.
Session drain timeout.
Partial multi-cluster rollout failure.

19.4 Performance Gates

Reconcile avoids hot loops under persistent failure.
1,000 lifecycle CRs do not exceed configured API QPS.
Drift compare for large config stays within budget.
Status update rate is bounded.

20. Acceptance Criteria

This RFC is implemented when:

Operator/NF version compatibility is machine-readable and enforced.
Lifecycle phases and conditions are stable across all CNFs.
Config apply uses RFC 001 validate/commit/readback behavior.
CRD conversions are deterministic and tested.
YANG and state migrations are explicit and evidence-linked.
Stateful rollouts drain or transfer ownership before termination.
Rollback eligibility is evaluated before workload mutation.
Multi-cluster rollout status is explicit and safe.

OPC-SDK-RFC-010: Data Governance, Privacy, and Regulated Records

Status: Draft for Implementation
Version: 1.0.0
Date: 2026-05-19
Audience: security engineers, privacy reviewers, NF owners, LI/charging implementers, SREs

1. Abstract

This RFC defines the data governance substrate for OpenPacketCore CNFs. It standardizes classification, handling, redaction, retention, encryption, backup, export, audit, and evidence rules for subscriber identifiers, session records, charging data, lawful-intercept material, analytics, security logs, and management configuration.

The purpose is to ensure that every CNF treats sensitive telecom data consistently and that privacy behavior is implemented as an auditable platform contract, not as scattered per-NF convention.

2. Scope

2.1 In Scope

Data classification taxonomy.
SUPI/GPSI/MSISDN/IP address handling.
Charging, audit, analytics, and session state records, plus classification of lawful-intercept data.
Redaction and pseudonymization.
Retention and deletion.
Backup and restore handling.
Export and external sink policy.
Tenant/slice/PLMN data boundaries.
Evidence and test requirements.

2.2 Out of Scope

Cryptographic key management internals. See RFC 003.
Session store consistency. See RFC 004.
Evidence bundle mechanics. See RFC 006.
Product lawful-intercept mediation, collection workflows, and target-specific LI policy engines. The SDK classifies and protects LI material; it does not implement an LI product subsystem.
Jurisdiction-specific legal interpretation.

3. Design Goals

3.1 Security

Minimize sensitive data exposure by default.
Encrypt regulated data at rest and in transit.
Prevent cross-tenant, cross-slice, and cross-PLMN data leakage.
Make audit and regulated exports tamper-evident.
Ensure backup and debug workflows preserve classification.

3.2 Performance

Redaction and classification must be cheap enough for hot-path logging.
High-volume telemetry must avoid high-cardinality raw identifiers.
Bulk retention jobs must be bounded and schedulable.
Analytics minimization must be profile-driven and measurable.

3.3 Maintainability

One classification vocabulary across all CNFs.
Generated redaction metadata from RFC 002 drives code behavior.
Retention policies are declarative through YANG.
Exceptions are structured known gaps or waivers.

3.4 Functionality

Support operational debugging without leaking raw subscriber data.
Support charging and audit records with correct retention.
Classify lawful-intercept material and keep it separated from ordinary telemetry, analytics, support bundles, and exports.
Support analytics minimization and privacy-preserving export.

4. Data Classification

4.1 Classes

Class	Examples	Default Handling
`public`	build version, static feature flags	log/export allowed
`operational`	readiness, queue depth, non-sensitive counters	log/export allowed with cardinality controls
`network-sensitive`	topology, NF instance IDs, peer FQDNs	restricted logs, auth-gated debug
`subscriber-id`	SUPI, IMSI, GPSI, MSISDN, PEI	redacted or keyed digest
`subscriber-session`	PDU session, TEID, SEID, IP address, QoS state	encrypted, access-controlled
`security-secret`	keys, tokens, credentials, OP/OPc/K	never logged, secret types
`charging-record`	CDR, usage, rating inputs	retained/exported by charging policy
`lawful-intercept`	warrant, target selectors, X2/X3 products	LI plane only
`analytics-sensitive`	NWDAF source events, location, behavior traces	minimized before export
`audit-regulated`	admin actions, break-glass, security events	tamper-evident retention

Each data field in generated models and hand-written domain types MUST be classified.

4.2 Classification Metadata

/// One variant per class in the table above.
pub enum DataClass {
    Public,
    Operational,
    NetworkSensitive,
    SubscriberId,
    SubscriberSession,
    SecuritySecret,
    ChargingRecord,
    LawfulIntercept,
    AnalyticsSensitive,
    AuditRegulated,
}

Generated YANG metadata and Rust annotations MUST feed the same classification registry.

5. Identity and Pseudonymization

Raw SUPI/GPSI/MSISDN/PEI MUST NOT appear in:

metric labels,
info/warn/error logs,
ordinary traces,
backend keys,
Kubernetes Events,
unauthenticated debug output.

The default correlation form is a tenant-scoped keyed digest:

digest = HMAC(tenant_privacy_key, data_class || identifier_type || raw_value)

Digest keys MUST be purpose-separated from encryption keys. Rotating digest keys changes correlation IDs; this must be documented in operational runbooks.
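The formula above does not say how the three fields are framed before they are passed to HMAC. Plain concatenation is ambiguous, because ("ab", "c") and ("a", "bc") produce the same bytes, so the sketch below assumes a length-prefixed encoding. The framing choice and the function name are assumptions; the HMAC itself should come from a vetted cryptographic library and is not shown.

```rust
use std::convert::TryFrom;

/// Builds the message for `data_class || identifier_type || raw_value`.
/// Each field is prefixed with its byte length as a big-endian u32
/// (an assumed framing) so that field boundaries are unambiguous.
/// Returns None if a field is too long to be length-prefixed.
pub fn digest_input(data_class: &str, identifier_type: &str, raw_value: &str) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    for field in [data_class, identifier_type, raw_value] {
        let len = u32::try_from(field.len()).ok()?;
        out.extend_from_slice(&len.to_be_bytes());
        out.extend_from_slice(field.as_bytes());
    }
    Some(out)
}
```

The resulting bytes would then be fed to HMAC keyed with the tenant privacy key, so that equal identifiers in different tenants or classes never share a correlation ID.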

6. Redaction

Redaction levels:

Level	Behavior
`drop`	omit the field entirely
`mask`	show fixed placeholder
`class`	show class and presence only
`length-class`	show approximate length bucket
`digest`	show keyed digest
`cleartext`	allowed only by explicit policy

cleartext is forbidden for security-secret and restricted for lawful-intercept.
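A renderer for these levels could be sketched as follows. The placeholder strings, length buckets, and the decision to fall back to a mask when cleartext is requested for a secret are assumptions for the example; only a subset of the data classes is modeled, and the additional restriction on lawful-intercept cleartext is left to policy and not shown.

```rust
/// Subset of the data classes, enough for the example.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DataClass {
    Operational,
    SubscriberId,
    SecuritySecret,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RedactionLevel {
    Drop,
    Mask,
    Class,
    LengthClass,
    Digest,
    Cleartext,
}

/// Renders one field for output; None means the field is omitted.
/// `keyed_digest` is computed elsewhere by the privacy layer.
pub fn render(
    class: DataClass,
    level: RedactionLevel,
    raw: &str,
    keyed_digest: &str,
) -> Option<String> {
    match level {
        RedactionLevel::Drop => None,
        RedactionLevel::Mask => Some("***".to_string()),
        RedactionLevel::Class => Some(format!("<{:?}>", class)),
        RedactionLevel::LengthClass => {
            // Hypothetical buckets; only the approximate size is revealed.
            let bucket = match raw.len() {
                0 => "0",
                1..=8 => "1-8",
                9..=16 => "9-16",
                _ => "17+",
            };
            Some(format!("<len:{}>", bucket))
        }
        RedactionLevel::Digest => Some(keyed_digest.to_string()),
        // Cleartext is never allowed for secrets: degrade to a mask.
        RedactionLevel::Cleartext if class == DataClass::SecuritySecret => {
            Some("***".to_string())
        }
        RedactionLevel::Cleartext => Some(raw.to_string()),
    }
}
```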

Redaction MUST apply to:

logs,
traces,
metrics,
audit views,
admin/debug endpoints,
panic hooks,
error messages,
test snapshots committed to git.

7. Retention Policy

Each data class has a retention policy:

/// Retention rules for one data class.
pub struct RetentionPolicy {
    pub class: DataClass,
    pub min_duration: Option<Duration>,
    pub max_duration: Option<Duration>,
    pub deletion_mode: DeletionMode,
    pub legal_hold_supported: bool,
    pub export_allowed: bool,
}

Retention MUST be configured through canonical YANG and surfaced in evidence.

Default posture:

operational telemetry: short retention,
audit-regulated: longer tamper-evident retention,
charging-record: charging policy retention,
lawful-intercept: legal/LI policy retention,
security-secret: no export, rotate/delete per key policy.

8. Legal Hold and Deletion

Legal hold prevents deletion of matching regulated records. It MUST:

be authenticated and authorized,
be audited,
include scope and expiry,
be visible to retention jobs,
not expose target selectors outside authorized LI/audit roles.

Deletion jobs MUST be idempotent and evidence-producing. They MUST avoid deleting records under legal hold.
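How a retention job might combine minimum and maximum retention with legal hold is sketched below. The four-way outcome and its names are assumptions; in particular, treating the span between the minimum and maximum as "optional" deletion is one possible reading, not something this RFC defines.

```rust
use std::time::Duration;

/// The two duration bounds of a retention policy.
pub struct RetentionWindow {
    pub min_duration: Option<Duration>,
    pub max_duration: Option<Duration>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum DeletionDecision {
    /// A legal hold matches: never delete, whatever the age.
    HeldByLegalHold,
    /// Younger than the minimum retention: must be kept.
    Retain,
    /// Past the minimum but not the maximum: deletion is permitted.
    Optional,
    /// At or past the maximum retention: must be deleted.
    Expired,
}

/// Legal hold is checked first so it always wins over expiry.
pub fn decide(window: &RetentionWindow, age: Duration, under_legal_hold: bool) -> DeletionDecision {
    if under_legal_hold {
        return DeletionDecision::HeldByLegalHold;
    }
    if let Some(min) = window.min_duration {
        if age < min {
            return DeletionDecision::Retain;
        }
    }
    match window.max_duration {
        Some(max) if age >= max => DeletionDecision::Expired,
        _ => DeletionDecision::Optional,
    }
}
```

Because the decision is a pure function of policy, age, and hold state, re-running an interrupted deletion job yields the same answer for each record, which helps keep the job idempotent.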

9. Data Boundaries

The platform enforces boundaries by:

tenant,
slice/S-NSSAI,
PLMN,
region,
NF instance,
data class.

Every storage key, audit query, export job, and backup manifest MUST include boundary metadata. Cross-boundary export is denied by default.
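Deny-by-default cross-boundary export can be sketched as an equality check plus an explicit allowlist. The `Boundary` struct below carries only four of the six dimensions listed above (NF instance and data class are omitted for brevity), and all names are hypothetical.

```rust
/// Boundary metadata attached to a record or to an export sink.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Boundary {
    pub tenant: String,
    pub slice: String,
    pub plmn: String,
    pub region: String,
}

impl Boundary {
    pub fn new(tenant: &str, slice: &str, plmn: &str, region: &str) -> Self {
        Self {
            tenant: tenant.to_string(),
            slice: slice.to_string(),
            plmn: plmn.to_string(),
            region: region.to_string(),
        }
    }
}

/// Export is allowed inside one boundary, or across boundaries only
/// when an explicit (source, sink) pair has been allowlisted.
pub fn export_allowed(
    record: &Boundary,
    sink: &Boundary,
    allowlist: &[(Boundary, Boundary)],
) -> bool {
    record == sink
        || allowlist
            .iter()
            .any(|(from, to)| from == record && to == sink)
}
```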

10. Backups and Restore

Backups MUST preserve:

classification metadata,
encryption envelope metadata,
tenant and slice boundary,
retention policy,
legal hold flags,
manifest digests.

Restore MUST verify that destination tenant/slice/PLMN policy allows the data. Restoring LI or security-secret material into a different environment is denied unless an explicit recovery policy allows it.

11. Charging Records

Charging records are regulated operational records. CNFs that produce charging data MUST:

classify records as charging-record,
avoid raw identifiers in logs,
use a durable, auditable write path,
support duplicate detection/idempotency,
expose export status,
test retention and replay behavior.

Charging exports MUST be signed or transmitted over authenticated channels.

12. Lawful Intercept Data

LI data is a special class with strict separation:

X1 management/control material,
X2 intercept-related information,
X3 content/user-plane products.

LI records MUST NOT share ordinary audit, telemetry, or debug paths unless the path is explicitly LI-authorized. LI selectors and products MUST be encrypted, audited, and retained according to LI policy.

CNFs that are not LI functions MUST NOT adopt LI vocabulary for ordinary analytics or operational telemetry.

13. Analytics and Privacy

Analytics-producing CNFs, especially NWDAF, MUST implement minimization before export.

Minimization methods:

field drop,
coarsening,
keyed hash,
aggregation threshold,
k-anonymity threshold,
differential privacy noise where policy requires it.

The active minimization policy version MUST be recorded with each analytics export.
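As a small illustration of the aggregation-threshold method, the sketch below counts events per coarse bucket and suppresses any bucket with fewer than `k` events before export. The function name and the bucket labels in the usage are hypothetical, and a real policy would also record its version alongside the output.

```rust
use std::collections::BTreeMap;

/// Counts events per bucket and drops buckets with fewer than `k`
/// events, so that small groups are never exported.
pub fn aggregate_with_threshold<'a>(events: &[&'a str], k: usize) -> BTreeMap<&'a str, usize> {
    let mut counts: BTreeMap<&'a str, usize> = BTreeMap::new();
    for bucket in events {
        *counts.entry(*bucket).or_insert(0) += 1;
    }
    counts.retain(|_, n| *n >= k);
    counts
}
```

Suppression trades completeness for privacy: totals computed from the exported buckets will undercount, which consumers of the export need to know.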

14. Debug and Support Bundles

Support bundles MUST:

exclude secrets by default,
redact subscriber identifiers,
include manifest and classification summary,
require authorization,
be time-bounded,
be audited,
be signed or checksummed.

Debug packet captures are disabled by default and require explicit policy.

15. Configuration Model

Shared YANG groupings SHOULD include:

data-governance/classification-overrides
data-governance/retention
data-governance/export-policy
data-governance/legal-hold
data-governance/redaction
data-governance/support-bundle

NF-specific YANG can refine but not bypass the baseline.

16. Observability

Required metrics:

opc_data_records_total{class,operation,outcome}
opc_data_redactions_total{class,level}
opc_data_retention_deletions_total{class,outcome}
opc_data_legal_holds{class,state}
opc_data_exports_total{class,outcome}
opc_data_policy_version_info{class,version}
opc_data_privacy_minimization_total{method,outcome}

Metrics MUST NOT use raw subscriber identifiers as labels.

17. Evidence Requirements

RFC 006 evidence MUST include:

classification registry,
retention policy report,
redaction test report,
export policy report,
legal hold test report,
privacy minimization report for analytics NFs,
known gaps for any class not fully handled.

18. Module Ownership

Module	Responsibility
`opc-data-governance`	class registry, retention policy, legal-hold policy, and annotations
`opc-redaction`	redaction renderers and generated metadata adapter
`opc-privacy`	digesting, minimization, support bundle policy
`opc-export`	signed/exported data handling
`opc-evidence`	data-governance evidence reports and release gates
`opc-sdk-integration`	integration tests covering redaction, retention, export, and analytics policy

Agents implementing NF features must classify new fields before exposing logs, metrics, storage, or exports.

19. Testing Requirements

19.1 Unit Tests

Classification coverage.
Redaction levels.
Keyed digest stability.
Retention eligibility.
Legal hold blocks deletion.
Support bundle manifest redaction.

19.2 Integration Tests

NF logs contain no raw SUPI/GPSI/MSISDN.
Metrics reject high-cardinality raw labels.
Backup/restore preserves classification.
Export denied across tenant boundary.
Analytics minimization records policy version.

19.3 Fault Injection

Missing privacy digest key.
Retention job interrupted.
Export sink unavailable.
Backup manifest tampered.
Legal hold expiry during deletion.

19.4 Performance Gates

Hot-path redaction p99 under 5 microseconds for scalar identifiers.
Digest generation p99 under 25 microseconds.
Retention jobs respect configured I/O budget.
Metrics classification checks do not allocate on common paths.

20. Acceptance Criteria

This RFC is implemented when:

Every generated and hand-written sensitive field has a data class.
Raw subscriber identifiers do not appear in logs, metrics, traces, events, backend keys, or support bundles by default.
Retention and legal hold policies are declarative and tested.
Backups, restores, and exports preserve classification metadata.
LI data is separated from ordinary telemetry and analytics.
Analytics exports record minimization policy.
RFC 006 evidence reports classification, redaction, retention, and privacy behavior.

OPC-SDK-RFC-011: Node and Data-Plane Resource Contract

Status: Draft for Implementation
Version: 1.0.0
Date: 2026-05-19
Audience: UPF/data-plane engineers, platform engineers, Kubernetes operators, security reviewers

1. Abstract

This RFC defines the node, kernel, NIC, CNI, CPU, memory, and pod-security contract required by OpenPacketCore data-plane and signaling-heavy CNFs. It standardizes how CNFs request and verify SR-IOV, Multus, AF_XDP, XDP/eBPF, hugepages, NUMA alignment, CPU pinning, IRQ affinity, device plugins, kernel features, and pod security exceptions.

The goal is to make data-plane performance and privilege requirements explicit, admissible, testable, and portable across carrier Kubernetes environments.

2. Scope

2.1 In Scope

Node capability discovery.
Kubernetes scheduling/resource requests.
Multus and SR-IOV attachment contracts.
AF_XDP/XDP/eBPF requirements.
CPU pinning, NUMA, hugepages, and IRQ affinity.
Pod security exceptions and capability minimization.
Data-plane preflight and readiness.
Metrics and conformance tests for platform resources.

2.2 Out of Scope

Packet parser behavior. See RFC 005.
Session state consistency. See RFC 004.
Runtime task supervision. See RFC 008.
Vendor-specific NIC tuning beyond declared capability adapters.

3. Design Goals

3.1 Security

Grant only the minimum Linux capabilities needed by each CNF.
Bind privileged data-plane pods to explicitly labeled nodes.
Prevent untrusted workloads from using OpenPacketCore data-plane device resources.
Make kernel/eBPF program loading auditable.

3.2 Performance

Preserve CPU, cache, NUMA, NIC queue, and IRQ locality.
Avoid noisy-neighbor interference on data-plane cores.
Provide deterministic preflight before declaring readiness.
Expose packet drop and queue pressure metrics.

3.3 Maintainability

One shared contract for platform assumptions.
Per-NF specs declare deviations through structured resource profiles.
Device and kernel feature detection is reusable.
CI can verify chart/resource generation without real NICs.

3.4 Functionality

Support UPF AF_XDP fast path.
Support ePDG/N3IWF IPsec and tunnel workloads.
Support L4 UDP fan-in proxy.
Support SCTP-heavy AMF/SMS/IMS workloads.
Support lab mode without hardware acceleration.

4. Resource Profiles

Each CNF declares a resource profile:

/// Resource and admission profile declared by a CNF.
pub enum DataPlaneProfile {
    ControlPlaneOnly,
    SignalingHeavy,
    KernelNetworking,
    AfXdpFastPath,
    SriovFastPath,
    IpsecGateway,
}

Profiles determine required node labels, capabilities, CNI attachments, and preflight checks.

IpsecGateway is a resource and admission profile only in the current SDK. It does not imply that this repository ships IKEv2, ESP, xfrm orchestration, or N3IWF/NWu procedure implementations. Those protocol crates are required for a selected ePDG/N3IWF/untrusted-access product target, but are not a blocker for the current AMF-lite/N2/N1 first-NF profile.

5. Node Capability Discovery

The platform MUST provide a node capability report:

node:
  kernel: "6.8.0"
  bpf:
    cap_bpf: true
    xdp_supported: true
    btf_available: true
  cpu:
    manager_policy: static
    isolated_cores: "2-15"
    numa_nodes: 2
  memory:
    hugepages_2Mi: 4096
    hugepages_1Gi: 8
  nics:
    - name: ens5f0
      driver: ice
      sriov_vfs: 16
      xdp_modes: ["native", "skb"]
      queues: 32

The operator or node agent MUST publish this through labels, annotations, or a custom resource.

6. Scheduling Contract

Data-plane CNFs MUST use:

node selectors for required hardware,
tolerations for dedicated nodes,
pod anti-affinity where replicas need failure-domain separation,
topology spread constraints,
resource requests/limits matching CPU Manager static policy,
hugepage requests where required,
device plugin resource requests for SR-IOV or specialized devices.

The operator MUST reject a lifecycle CR if no eligible node can satisfy the declared profile, unless lab mode allows software fallback.

7. CPU and NUMA

7.1 CPU Pinning

Data-plane workers SHOULD run on exclusive CPUs. Management and async control tasks MUST NOT run on those same pinned data-plane CPUs.

The runtime receives an explicit CPU allocation:

pub struct CpuLayout {
    pub data_plane_cores: Vec<CpuId>,
    pub control_plane_cores: Vec<CpuId>,
    pub management_cores: Vec<CpuId>,
    pub numa_node: Option<NumaNodeId>,
}
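A minimal sketch of the separation rule from 7.1 follows. It assumes CpuId and NumaNodeId are plain integers, which this document does not specify, and it treats control_plane_cores as the cores that run async control tasks. LayoutError and validate are hypothetical names.

```rust
use std::collections::HashSet;

// Assumption: the real CpuId and NumaNodeId types are not shown in this RFC.
pub type CpuId = u32;
pub type NumaNodeId = u32;

pub struct CpuLayout {
    pub data_plane_cores: Vec<CpuId>,
    pub control_plane_cores: Vec<CpuId>,
    pub management_cores: Vec<CpuId>,
    pub numa_node: Option<NumaNodeId>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum LayoutError {
    /// The same core is listed twice as a data-plane core.
    DuplicateDataPlaneCore(CpuId),
    /// A control or management core overlaps a pinned data-plane core.
    SharedWithDataPlane(CpuId),
}

/// Check that data-plane cores are exclusive of control and management cores.
pub fn validate(layout: &CpuLayout) -> Result<(), LayoutError> {
    let mut data_plane = HashSet::new();
    for &core in &layout.data_plane_cores {
        if !data_plane.insert(core) {
            return Err(LayoutError::DuplicateDataPlaneCore(core));
        }
    }
    let others = layout
        .control_plane_cores
        .iter()
        .chain(&layout.management_cores);
    for &core in others {
        if data_plane.contains(&core) {
            return Err(LayoutError::SharedWithDataPlane(core));
        }
    }
    Ok(())
}
```

Control-plane and management cores are allowed to overlap each other in this sketch, since the rule above only protects the pinned data-plane set.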

7.2 NUMA Locality

NIC queues, AF_XDP UMEM, hugepages, and worker threads SHOULD be NUMA-local. Preflight MUST warn or fail according to profile when locality is broken.

7.3 IRQ Affinity

The platform SHOULD pin NIC IRQs to the correct NUMA-local cores. The CNF MUST report IRQ affinity mismatches when detectable.

8. Memory and Hugepages

CNFs using DPDK-like or AF_XDP memory pools MUST declare:

hugepage size,
hugepage count,
per-queue buffer count,
max packet size,
headroom,
NUMA node.

The pod MUST request hugepages explicitly. Overcommitting data-plane memory is forbidden in production profiles.
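As a worked example of why these values are declared together, the sketch below estimates how many hugepages a buffer pool needs and compares that with the declared count. MemoryPoolDecl and both functions are hypothetical; the queue count is an added assumption (the list above only names the per-queue buffer count), and real pools may round frame sizes up, which this simplified arithmetic ignores.

```rust
/// Hypothetical declaration mirroring the fields listed above.
pub struct MemoryPoolDecl {
    pub hugepage_size_bytes: u64,
    pub hugepage_count: u64,
    /// Assumption: number of queues sharing the pool.
    pub queues: u64,
    pub buffers_per_queue: u64,
    pub max_packet_size: u64,
    pub headroom: u64,
}

/// Pages needed to back every buffer; None on overflow or a zero page size.
pub fn required_hugepages(decl: &MemoryPoolDecl) -> Option<u64> {
    if decl.hugepage_size_bytes == 0 {
        return None;
    }
    // Each buffer holds the headroom plus the largest packet.
    let frame = decl.headroom.checked_add(decl.max_packet_size)?;
    let total = decl
        .queues
        .checked_mul(decl.buffers_per_queue)?
        .checked_mul(frame)?;
    // Round up to whole pages.
    let full = total / decl.hugepage_size_bytes;
    let partial = u64::from(total % decl.hugepage_size_bytes != 0);
    Some(full + partial)
}

/// True when the declared hugepage count covers the pool without overcommit.
pub fn declaration_is_sufficient(decl: &MemoryPoolDecl) -> bool {
    match required_hugepages(decl) {
        Some(needed) => decl.hugepage_count >= needed,
        None => false,
    }
}
```

For example, 4 queues of 4096 buffers with 256 bytes of headroom and a 1792-byte maximum packet size need 4 × 4096 × 2048 bytes = 32 MiB, which is 16 hugepages of 2 MiB.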

9. Network Attachments

9.1 Multus

Each data-plane interface is a named attachment:

multus:
  n3:
    networkAttachmentDefinition: upf-n3
    interfaceName: n3
  n4:
    networkAttachmentDefinition: upf-n4
    interfaceName: n4
  n6:
    networkAttachmentDefinition: upf-n6
    interfaceName: n6

Canonical YANG defines interface roles; lifecycle CR values reference attachment objects only.

9.2 SR-IOV

SR-IOV profiles MUST define:

resource name,
VF trust/spoof-check settings,
VLAN policy,
link state policy,
allowed device drivers,
whether IPAM is static or dynamic.

The operator MUST validate that referenced SR-IOV resources are allowlisted for the NF kind.

10. AF_XDP and XDP/eBPF

AfXdpFastPath is a resource and admission profile only in the current SDK. It does not imply that this repository ships AF_XDP sockets, UMEM management, RX/TX rings, or packet I/O runtime support. Those crates are required for a selected UPF or other accelerated data-plane product target, but are not a blocker for the current AMF-lite/N2/N1 first-NF profile.

10.1 Kernel Requirements

AF_XDP fast-path profiles MUST declare:

minimum kernel version,
required BPF features,
required XDP mode,
required capabilities,
required maps and pin paths,
whether generic XDP fallback is allowed.

10.2 Capabilities

Allowed capabilities for AF_XDP profile:

CAP_BPF
CAP_NET_ADMIN
CAP_NET_RAW

CAP_SYS_ADMIN is forbidden in production profiles. If a kernel requires CAP_SYS_ADMIN, the node is not eligible.
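The allowlist can be sketched as a simple check over the requested capability names. CapError and check_capabilities are hypothetical names. The sketch applies the allowlist uniformly and only reports CAP_SYS_ADMIN with a distinct error; how lab profiles treat it is not specified here.

```rust
/// Capabilities allowed for the AF_XDP profile, as listed above.
pub const AF_XDP_ALLOWED: [&str; 3] = ["CAP_BPF", "CAP_NET_ADMIN", "CAP_NET_RAW"];

#[derive(Debug, PartialEq, Eq)]
pub enum CapError {
    /// CAP_SYS_ADMIN was requested; the node or pod spec is not eligible.
    SysAdminForbidden,
    /// A capability outside the allowlist was requested.
    NotAllowlisted(String),
}

/// Accept the request only if every capability is on the allowlist.
pub fn check_capabilities(requested: &[&str]) -> Result<(), CapError> {
    for cap in requested {
        if *cap == "CAP_SYS_ADMIN" {
            return Err(CapError::SysAdminForbidden);
        }
        if !AF_XDP_ALLOWED.contains(cap) {
            return Err(CapError::NotAllowlisted(cap.to_string()));
        }
    }
    Ok(())
}
```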

10.3 eBPF Program Governance

eBPF programs MUST be:

built from source in release pipeline,
included in SBOM/evidence,
signed or digest-pinned,
loaded only from approved paths,
audited on load/unload,
pinned under controlled bpffs path.

11. Pod Security Exceptions

Baseline pod security remains:

run as non-root,
read-only root filesystem,
no privilege escalation,
drop all capabilities except explicit allowlist,
seccomp profile enabled,
AppArmor/SELinux profile where supported.

Every exception MUST be declared in:

per-NF spec,
Helm values,
operator admission policy,
RFC 006 evidence.

12. Data-Plane Preflight

Before readiness, data-plane CNFs MUST verify:

required interfaces exist,
link state is up where required,
MTU matches config,
NIC driver and queues match profile,
XDP attach succeeded,
BPF maps created,
hugepages allocated,
CPU layout applied,
session table initialized,
drop counters accessible.

Failures mark readiness false and emit alarms.
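One way to picture the link between preflight, readiness, and alarms is a reducer over individual check results. CheckResult, PreflightReport, and evaluate are hypothetical names; the alarm text reuses the dataplane-preflight-failed probable cause from the RFC 013 taxonomy.

```rust
/// Hypothetical result of one preflight check; None means the check passed.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CheckResult {
    pub name: &'static str,
    pub failure: Option<String>,
}

#[derive(Debug, PartialEq, Eq)]
pub struct PreflightReport {
    pub ready: bool,
    /// One alarm line per failed check.
    pub alarms: Vec<String>,
}

/// Readiness is true only when every check passed; each failure yields an alarm.
pub fn evaluate(results: &[CheckResult]) -> PreflightReport {
    let alarms: Vec<String> = results
        .iter()
        .filter_map(|r| {
            r.failure
                .as_ref()
                .map(|why| format!("dataplane-preflight-failed: {}: {}", r.name, why))
        })
        .collect();
    PreflightReport {
        ready: alarms.is_empty(),
        alarms,
    }
}
```

Collecting all failures, instead of stopping at the first, lets one preflight pass report every broken assumption at once.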

13. Lab and Fallback Modes

Lab mode MAY use:

veth instead of SR-IOV,
generic XDP instead of native XDP,
software packet path,
relaxed CPU pinning,
no hugepages.

Lab fallback MUST be visible in status and MUST NOT be silently used in production.

14. Observability

Required metrics:

opc_node_capability_info{node,kernel,profile}
opc_dataplane_interface_up{nf,interface}
opc_dataplane_rx_packets_total{nf,interface}
opc_dataplane_tx_packets_total{nf,interface}
opc_dataplane_drops_total{nf,interface,reason}
opc_dataplane_queue_fill_ratio{nf,interface,queue}
opc_dataplane_xdp_attach_total{nf,outcome}
opc_dataplane_bpf_map_entries{nf,map}
opc_dataplane_numa_mismatch{nf}
opc_dataplane_irq_affinity_mismatch{nf}

15. Configuration Model

Shared YANG groupings SHOULD include:

resources/cpu
resources/numa
resources/hugepages
resources/interfaces
resources/xdp
resources/sriov
resources/preflight

Lifecycle CRDs reference Kubernetes resource names; dense tuning lives in YANG.

16. Module Ownership

Module	Responsibility
`opc-node-capabilities`	node feature report parser/model
`opc-resource-admission`	operator resource validation
`opc-cpu-layout`	CPU/NUMA layout helpers
`opc-net-attach`	Multus/SR-IOV model helpers
`opc-af-xdp-platform`	AF_XDP preflight and map metadata
`opc-bpf-governance`	BPF artifact digest/load audit
`opc-resource-testkit`	fake node capabilities and chart tests

Agents implementing UPF or similar CNFs must consume these modules rather than hard-coding node assumptions.

17. Testing Requirements

17.1 Unit Tests

Node capability parsing.
Resource profile validation.
CPU layout validation.
SR-IOV allowlist policy.
Capability exception rendering.
Lab fallback status.

17.2 Integration Tests

Helm renders correct resource requests.
Operator rejects unsatisfied node profile.
AF_XDP preflight succeeds with fake capabilities.
Production profile rejects CAP_SYS_ADMIN.
Readiness false when required interface is missing.

17.3 Fault Injection

XDP attach failure.
Hugepage allocation failure.
NIC link down.
NUMA mismatch.
IRQ affinity mismatch.
Device plugin resource unavailable.

17.4 Performance Gates

Preflight completes within configured startup budget.
Data-plane metrics scrape does not stall packet workers.
Resource admission for 1,000 CNF CRs stays within operator API budget.

18. Acceptance Criteria

This RFC is implemented when:

Data-plane CNFs declare structured resource profiles.
Operator admission rejects unsatisfied production resource requirements.
CPU, NUMA, hugepage, NIC, and CNI assumptions are explicit.
AF_XDP/eBPF programs are governed by signed/digest-pinned artifacts.
Pod security exceptions are minimal and evidence-linked.
Readiness depends on data-plane preflight.
Lab fallback cannot silently enter production.

OPC-SDK-RFC-012: Common Testbed, Simulator, and Scenario Framework

Status: Draft for Implementation
Version: 1.0.0
Date: 2026-05-19
Audience: test engineers, NF implementers, conformance owners, SREs

1. Abstract

This RFC defines the shared OpenPacketCore testbed and simulator framework. It standardizes reusable peer simulators, virtual time, traffic scenarios, protocol fixtures, conformance packs, chaos hooks, load profiles, and evidence output.

The purpose is to prevent every CNF from building isolated mocks that cannot compose into end-to-end 5G scenarios. The framework lets multiple contributors implement NFs independently while verifying them against the same scenario language and peer behavior.

2. Scope

2.1 In Scope

Peer simulators for UE, gNB, AMF, SMF, UPF, NRF, AUSF, UDM, PCF, NSSF, SCP, SEPP, SMSC, and other core peers.
Protocol fixture management and PCAP replay.
Virtual time and deterministic timers.
Scenario DSL.
Conformance scenario packs.
Load and soak profiles.
Chaos and fault injection hooks.
Evidence output for RFC 006.

2.2 Out of Scope

Production NF logic.
Standards certification by external bodies.
Full radio access network simulation beyond interfaces required for core testing.

3. Design Goals

3.1 Security

Test secrets must be synthetic and clearly marked.
Fixtures containing real subscriber data are forbidden.
Negative tests must cover malformed and hostile peer behavior.
Testbed artifacts must not weaken production code paths.

3.2 Performance

Simulators must support both deterministic unit-scale tests and high-rate load tests.
Virtual time should make timer-heavy procedures fast and deterministic.
Load profiles must be reproducible.

3.3 Maintainability

One scenario DSL across all CNFs.
Reusable protocol fixtures and peer simulators.
Test evidence links back to RFC 006 requirement IDs.
Each simulator has a documented fidelity level.

3.4 Functionality

Support component, integration, end-to-end, conformance, chaos, and performance testing.
Support both in-process and Kubernetes-deployed test modes.
Support golden traces and expected state assertions.

4. Crate and Tooling Layout

crates/opc-testbed/
  src/
    lib.rs
    scenario.rs
    virtual_time.rs
    assertions.rs
    fixtures.rs
    pcap.rs
    load.rs
    evidence.rs
    chaos.rs
    simulators/
      nrf.rs
      amf.rs
      smf.rs
      upf.rs
      epc.rs
      gnb.rs
      ue.rs
      ausf.rs
      udm.rs
      pcf.rs
      nssf.rs
      scp.rs
      sepp.rs

Each NF MAY also provide opc-<nf>-testkit, but NF testkits SHOULD build on opc-testbed.

5. Scenario DSL

Scenarios are declarative:

id: AMF-REG-001
title: UE registration success
requirements:
  - REQ-3GPP-TS23502-R17-4.2.2-001
topology:
  nfs:
    amf: { image: opc-amf:test }
    nrf: { simulator: nrf-basic }
    ausf: { simulator: ausf-5g-aka }
    udm: { simulator: udm-auth-sdm }
steps:
  - send_ngap:
      from: gnb-1
      to: amf
      message: InitialUEMessage.registration_request
  - expect_sbi:
      from: amf
      to: ausf
      operation: Nausf_UEAuthentication.Authenticate
  - expect_ngap:
      from: amf
      to: gnb-1
      message: InitialContextSetupRequest
assertions:
  - amf.ue_context.state == REGISTERED

The DSL MUST be versioned and schema-validated.

6. Simulator Fidelity Levels

Level	Meaning
`stub`	fixed responses only
`stateful-mock`	protocol-aware state machine, simplified
`procedure-faithful`	follows normative procedure enough for conformance
`load-model`	optimized for traffic generation
`adversarial`	emits malformed, delayed, duplicated, or hostile behavior

Every simulator MUST declare its fidelity level per interface.

7. Virtual Time

The testbed MUST provide a virtual clock compatible with RFC 008 runtime clocks.

Use cases:

NAS timers,
PFCP heartbeat,
NRF heartbeat,
retry/backoff,
session lease expiry,
SMS retry/expiry,
retention jobs.

Tests MUST NOT sleep real time for long protocol timers when virtual time can advance deterministically.
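A minimal sketch of such a clock, assuming a single-threaded test and named timers, is shown below. VirtualClock and its methods are hypothetical and do not represent the RFC 008 clock interface; the timer names in the usage note are placeholders.

```rust
use std::collections::BTreeMap;
use std::time::Duration;

/// Hypothetical virtual clock: time moves only when the test advances it.
pub struct VirtualClock {
    now: Duration,
    /// Timers keyed by (deadline, insertion sequence) so firing order is deterministic.
    timers: BTreeMap<(Duration, u64), String>,
    next_seq: u64,
}

impl VirtualClock {
    pub fn new() -> Self {
        Self { now: Duration::ZERO, timers: BTreeMap::new(), next_seq: 0 }
    }

    pub fn now(&self) -> Duration {
        self.now
    }

    /// Register a named timer that fires `after` from the current virtual time.
    pub fn schedule(&mut self, after: Duration, name: &str) {
        let key = (self.now + after, self.next_seq);
        self.next_seq += 1;
        self.timers.insert(key, name.to_string());
    }

    /// Advance virtual time and return the timers that fired, in deadline order.
    pub fn advance(&mut self, by: Duration) -> Vec<String> {
        let target = self.now + by;
        let mut fired = Vec::new();
        loop {
            // Peek at the earliest timer without holding a borrow across the removal.
            let next = self.timers.keys().next().copied();
            match next {
                Some(key) if key.0 <= target => {
                    if let Some(name) = self.timers.remove(&key) {
                        self.now = key.0;
                        fired.push(name);
                    }
                }
                _ => break,
            }
        }
        self.now = target;
        fired
    }
}
```

A test that schedules a 15-second and a 60-second timer can call advance twice with 30 seconds each and observe one timer per call, without any real sleep.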

8. Protocol Fixtures and PCAP

Fixtures MUST include:

source standard reference,
release/version,
generation tool or capture provenance,
whether synthetic or captured,
sanitization status,
expected decode result,
linked requirement IDs.

Real customer/subscriber captures are forbidden in the public repository.

PCAP replay MUST support:

timestamp-preserving mode,
accelerated mode,
deterministic mode,
packet mutation for fuzz-style tests.

9. Peer Simulators

Minimum simulator set:

UE/NAS procedure driver.
gNB/NGAP over SCTP driver.
NRF SBI simulator.
AUSF/UDM auth and subscription simulators.
SMF/UPF/PFCP simulator pair.
EPC and untrusted-access peer skeletons such as PGW S2b and Diameter metadata peers. These must consume SDK protocol-crate decoded views and must not introduce local product parsers.
PCF policy simulator.
NSSF slice selection simulator.
SCP routing simulator.
SEPP partner simulator.
SMSC/SMSF/SMPP simulators.

Simulators MUST expose deterministic state assertions.

10. Test Modes

Mode	Purpose
`in-process`	fast component integration
`multi-process`	local network behavior
`kind`	Kubernetes operator/chart validation
`hardware-lab`	SR-IOV/AF_XDP/real NIC validation
`chaos`	failure injection
`soak`	long-running reliability

The same scenario SHOULD run in multiple modes where practical.

11. Fault Injection

Faults:

packet loss,
reordering,
duplication,
malformed protocol messages,
delayed responses,
peer restart,
NRF outage,
token expiry,
backend timeout,
clock skew,
node drain,
network partition.

Faults MUST be declarative in scenarios and evidence-linked.

12. Load Profiles

Load profiles define:

arrival distribution,
subscriber population,
slice distribution,
DNN distribution,
session duration,
mobility/handover rate,
message mix,
target throughput,
duration,
pass/fail SLOs.

Profiles MUST be reproducible from seeds.
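To illustrate seed reproducibility, the sketch below derives Poisson-style arrival offsets from a seed with a small SplitMix64 generator, so the same seed always yields the same schedule on a given platform. The generator choice, the exponential inter-arrival model, and the arrival_offsets function are assumptions for this example, not the framework's load engine.

```rust
/// SplitMix64: a small deterministic generator, used here instead of an external crate.
pub struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    pub fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    pub fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform value in [0, 1) built from the top 53 bits.
    pub fn next_unit(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Arrival times in seconds for `count` events at an average `rate_per_sec`.
pub fn arrival_offsets(seed: u64, rate_per_sec: f64, count: usize) -> Vec<f64> {
    let mut rng = SplitMix64::new(seed);
    let mut t = 0.0_f64;
    (0..count)
        .map(|_| {
            // Exponential inter-arrival gap; 1 - u lies in (0, 1], so ln is finite.
            t += -(1.0 - rng.next_unit()).ln() / rate_per_sec;
            t
        })
        .collect()
}
```

Recording the seed in the evidence record is what lets a failed load run be replayed with the same arrival schedule.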

13. Assertions

Assertions may target:

protocol messages,
SBI calls,
config state,
session store records,
metrics,
logs,
traces,
alarms,
Kubernetes status,
evidence output.

Assertions MUST avoid depending on nondeterministic ordering unless explicitly marked.

14. Evidence Output

Each scenario run emits:

{
  "scenario_id": "AMF-REG-001",
  "requirements": ["REQ-..."],
  "mode": "kind",
  "seed": 1234,
  "artifacts": ["trace.json", "metrics.prom", "events.json"],
  "outcome": "pass"
}

RFC 006 consumes these records for conformance reports.

15. Security and Privacy Rules

The testbed MUST:

generate synthetic subscriber identities,
mark all test keys as non-production,
reject fixture import without sanitization metadata,
prevent real bearer tokens from being stored in artifacts,
redact logs and traces through RFC 010 redaction.

16. Module Ownership

Module	Responsibility
`opc-testbed-scenario`	DSL schema, parser, executor
`opc-testbed-time`	virtual clock and timer control
`opc-testbed-fixtures`	fixture registry and provenance
`opc-testbed-pcap`	PCAP replay and mutation
`opc-testbed-sim-nrf`	NRF simulator
`opc-testbed-sim-ran`	UE/gNB/NAS/NGAP drivers
`opc-testbed-sim-sbi`	generic SBI producer/consumer mock
`opc-testbed-chaos`	failure injection
`opc-testbed-evidence`	RFC 006 result emission

Agents implementing a new NF must add scenarios before declaring conformance.

17. Testing Requirements

17.1 Unit Tests

DSL schema validation.
Virtual time advancement.
Fixture provenance validation.
Deterministic seed behavior.
Assertion engine.

17.2 Integration Tests

Scenario runs against fake NF.
Mock NRF discovery and token flow.
PCAP replay into protocol parser.
Kind-mode lifecycle install and readiness.
Evidence JSON emitted and validated.

17.3 Fault Injection Tests

Peer timeout.
Malformed message.
Duplicate message.
Clock skew.
Node drain in kind.
Backend outage.

17.4 Performance Gates

In-process scenarios start in under 100 milliseconds.
Virtual-time timer tests avoid long real sleeps.
Load generator reports achieved TPS and latency.
Scenario artifacts remain within configured size budgets.

18. Acceptance Criteria

This RFC is implemented when:

A versioned scenario DSL exists.
Shared peer simulators cover core 5G procedures.
Virtual time is integrated with runtime/test clocks.
Fixtures carry provenance and sanitization metadata.
Scenarios emit RFC 006 evidence records.
NF testkits build on the shared framework.
Conformance and chaos scenarios are reusable across local and Kubernetes modes.

OPC-SDK-RFC-013: Fault Management and Alarm Substrate

Status: Draft for Implementation
Version: 1.0.0
Date: 2026-05-19
Audience: SREs, NF implementers, operator authors, observability engineers

1. Abstract

This RFC defines the OpenPacketCore fault management and alarm substrate. It standardizes alarm identity, severity, probable cause, affected object, raise/update/clear semantics, deduplication, suppression, correlation, Kubernetes condition mapping, gNMI/NETCONF notification projection, external fault-management sink integration, and evidence requirements.

Metrics, logs, and traces describe behavior. Alarms describe actionable service faults. Carrier CNFs need both.

2. Scope

2.1 In Scope

Alarm model and lifecycle.
Severity and probable-cause taxonomy.
Affected-object naming.
Raise, update, clear, acknowledge, suppress.
Alarm correlation and deduplication.
Mapping to Kubernetes conditions and events.
Mapping to gNMI/NETCONF notifications.
External FM sink integration.
Alarm metrics, audit, and tests.

2.2 Out of Scope

Full OSS/BSS ticketing implementation.
Vendor-specific FM protocols unless implemented as adapters.
Raw log aggregation.
Performance SLO alerting rules outside CNF-generated alarms.

3. Design Goals

3.1 Security

Alarms must not leak secrets or raw subscriber identifiers.
Alarm administration must be authorized.
Suppression and acknowledgement are audited.
LI/security alarms must preserve regulated handling boundaries.

3.2 Performance

Raising an alarm must be cheap and non-blocking.
Alarm storms must be deduplicated and rate-limited.
External sink outages must not block packet or request handling.

3.3 Maintainability

One alarm vocabulary across all CNFs.
Stable alarm IDs and probable causes.
Generated YANG notification projection.
Shared testkit for alarm lifecycle.

3.4 Functionality

Support active and historical alarms.
Support severity changes.
Support clear conditions.
Support suppression windows.
Support external sinks and local query.

4. Alarm Model

pub struct Alarm {
    pub alarm_id: AlarmId,
    pub alarm_type: AlarmType,
    pub severity: Severity,
    pub probable_cause: ProbableCause,
    pub affected_object: AffectedObject,
    pub tenant: Option<TenantId>,
    pub slice: Option<Snssai>,
    pub region: Option<RegionId>,
    pub text: RedactedText,
    pub details: AlarmDetails,
    pub raised_at: Timestamp,
    pub updated_at: Timestamp,
    pub cleared_at: Option<Timestamp>,
    pub correlation_id: Option<CorrelationId>,
}

AlarmId MUST be stable for the same active fault instance.

5. Severity

Severity levels:

Severity	Meaning
`critical`	service outage, data loss, security boundary failure
`major`	serious degradation or redundancy loss
`minor`	limited impairment with workaround
`warning`	approaching fault or policy exception
`indeterminate`	fault detected but impact unknown
`cleared`	fault no longer active

Severity mapping MUST be consistent across CNFs.

6. Probable Cause Taxonomy

The SDK maintains a versioned taxonomy:

config-apply-failed
config-drift-detected
certificate-expiring
certificate-expired
identity-unavailable
authorization-policy-invalid
session-store-unavailable
lease-lost
backend-timeout
nrf-unreachable
sbi-overload
peer-unreachable
packet-drop-threshold
dataplane-preflight-failed
storage-corruption
audit-chain-invalid
key-unavailable
li-delivery-failed
charging-export-failed
privacy-policy-violation

Per-NF causes may be added but MUST be namespaced.

7. Affected Object

Affected objects use structured names:

pub enum AffectedObject {
    NfInstance { kind: NfKind, instance: InstanceId },
    Interface { nf: InstanceId, name: String },
    Peer { nf: InstanceId, peer_id: String },
    SessionStore { nf: InstanceId, shard: Option<String> },
    Slice { snssai: Snssai },
    Tenant { tenant: TenantId },
    Certificate { key_id: KeyId },
    DataPlaneQueue { nf: InstanceId, interface: String, queue: u16 },
}

Raw subscriber identifiers MUST NOT be affected-object names.

8. Alarm Lifecycle

States:

raised
updated
acknowledged
suppressed
cleared
expired

Lifecycle rules:

A repeated raise with same dedup key updates the active alarm.
Clear requires a matching active alarm; a clear with no matching active alarm is a no-op that only increments a metric.
Acknowledgement does not clear.
Suppression does not delete history.
Severity downgrade is an update, not clear plus raise.

9. Deduplication and Correlation

Dedup key:

alarm_type || probable_cause || affected_object || tenant || slice
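The dedup key and the lifecycle rules in section 8 can be sketched together as a small in-memory manager. All types below are simplified stand-ins (strings instead of the real AlarmType, ProbableCause, and AffectedObject types), and AlarmManager is a hypothetical name, not the opc-alarm-manager API.

```rust
use std::collections::HashMap;

/// Simplified dedup key with the five components listed above.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct DedupKey {
    pub alarm_type: String,
    pub probable_cause: String,
    pub affected_object: String,
    pub tenant: Option<String>,
    pub slice: Option<String>,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity { Critical, Major, Minor, Warning, Indeterminate }

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ActiveAlarm {
    pub alarm_id: u64,
    pub severity: Severity,
    pub raise_count: u64,
    pub acknowledged: bool,
}

#[derive(Default)]
pub struct AlarmManager {
    active: HashMap<DedupKey, ActiveAlarm>,
    next_id: u64,
    /// Counts clears that had no matching active alarm.
    pub clear_without_active_total: u64,
}

impl AlarmManager {
    /// A repeated raise with the same key updates the active alarm and keeps its id.
    pub fn raise(&mut self, key: DedupKey, severity: Severity) -> u64 {
        if let Some(alarm) = self.active.get_mut(&key) {
            alarm.severity = severity; // severity change is an update, not clear plus raise
            alarm.raise_count += 1;
            return alarm.alarm_id;
        }
        self.next_id += 1;
        let alarm_id = self.next_id;
        self.active.insert(
            key,
            ActiveAlarm { alarm_id, severity, raise_count: 1, acknowledged: false },
        );
        alarm_id
    }

    /// Acknowledgement marks the alarm but leaves it active.
    pub fn acknowledge(&mut self, key: &DedupKey) -> bool {
        match self.active.get_mut(key) {
            Some(alarm) => { alarm.acknowledged = true; true }
            None => false,
        }
    }

    /// Clear removes a matching active alarm; otherwise it only bumps a counter.
    pub fn clear(&mut self, key: &DedupKey) -> bool {
        if self.active.remove(key).is_some() {
            true
        } else {
            self.clear_without_active_total += 1;
            false
        }
    }

    pub fn get(&self, key: &DedupKey) -> Option<&ActiveAlarm> {
        self.active.get(key)
    }
}
```

Using the structured key as the map key, rather than a concatenated string, avoids accidental collisions when a component contains the separator.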

Correlation groups related alarms, such as:

NRF unavailable causing SBI discovery failures.
certificate expiry causing mTLS failures.
session store outage causing lease-lost alarms.

Correlation MUST NOT hide critical alarms; it only helps presentation.

10. Suppression

Suppression may be:

maintenance window,
known outage,
test mode,
dependency alarm correlation.

Suppression requires authorization and audit. Security-critical alarms SHOULD NOT be suppressible unless carrier policy explicitly allows it.

11. Storage

The alarm store MUST support:

active alarm query,
historical alarm query,
append-only lifecycle events,
bounded retention,
tenant/slice filtering,
tamper-evident audit for admin actions.

Local storage may use RFC 001 persistence for management alarms. High-volume alarm history SHOULD be exported to an external FM system.

12. Projection to Kubernetes

Alarms map to Kubernetes Conditions and Events:

critical/major active alarms can drive Ready=False or Degraded=True according to NF policy,
warning alarms usually do not change readiness,
clear events update conditions when no other active alarm holds the state.
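One possible policy is sketched below: conditions are recomputed from the full set of active alarm severities, so clearing one alarm only changes a condition when no other active alarm still holds it. The specific mapping (critical drives Ready=False, major drives Degraded=True) and the reason strings are assumptions; the text above leaves the mapping to NF policy.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity { Critical, Major, Minor, Warning, Indeterminate }

/// Hypothetical projection result with a stable reason string.
#[derive(Debug, PartialEq, Eq)]
pub struct Conditions {
    pub ready: bool,
    pub degraded: bool,
    pub reason: &'static str,
}

/// Derive conditions from all currently active alarm severities.
pub fn project(active: &[Severity]) -> Conditions {
    let critical = active.contains(&Severity::Critical);
    let major = active.contains(&Severity::Major);
    if critical {
        Conditions { ready: false, degraded: true, reason: "CriticalAlarmActive" }
    } else if major {
        Conditions { ready: true, degraded: true, reason: "MajorAlarmActive" }
    } else {
        // Minor, warning, and indeterminate alarms do not change readiness here.
        Conditions { ready: true, degraded: false, reason: "NoBlockingAlarm" }
    }
}
```

Because the function is stateless, the same inputs always produce the same reason string, which helps keep condition reasons stable across reconciles.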

Condition reason strings MUST be stable.

13. Projection to gNMI/NETCONF

The alarm subsystem MUST expose:

active alarms operational tree,
alarm history operational tree,
notifications for raise/update/clear,
authorized acknowledge/suppress operations.

YANG notification generation SHOULD use RFC 002 metadata and RFC 006 evidence tags.

14. External FM Sinks

Sink adapters:

webhook,
Kafka/NATS,
OpenTelemetry events,
SNMP/NETCONF adapter where needed,
carrier OSS adapter.

External sink failure MUST:

raise a sink alarm,
buffer within limits if policy allows,
never block fast paths,
expose drop counters.
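These rules can be sketched with a bounded standard-library channel and a non-blocking send. SinkQueue and its fields are hypothetical; a real adapter would raise the sink alarm through the alarm manager and export the counter as a metric, both of which are reduced to plain fields here.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Hypothetical bounded buffer in front of an external FM sink.
pub struct SinkQueue {
    tx: SyncSender<String>,
    /// Events dropped because the buffer was full or the sink was gone.
    pub dropped_total: u64,
    /// Set once delivery has failed at least once.
    pub sink_alarm_raised: bool,
}

impl SinkQueue {
    /// Create a queue holding at most `capacity` undelivered events (capacity >= 1 assumed).
    pub fn bounded(capacity: usize) -> (Self, Receiver<String>) {
        let (tx, rx) = sync_channel(capacity);
        (Self { tx, dropped_total: 0, sink_alarm_raised: false }, rx)
    }

    /// Offer an event without blocking the caller; returns false if it was dropped.
    pub fn offer(&mut self, event: String) -> bool {
        match self.tx.try_send(event) {
            Ok(()) => true,
            Err(TrySendError::Full(_)) | Err(TrySendError::Disconnected(_)) => {
                self.dropped_total += 1;
                self.sink_alarm_raised = true;
                false
            }
        }
    }
}
```

The fast path only ever calls offer, which returns immediately whether or not the sink is draining the receiver.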

15. Alarm Sources

Common sources:

RFC 001 config commit failures,
RFC 003 identity/key/cert failures,
RFC 004 session store and lease failures,
RFC 007 SBI overload/discovery failures,
RFC 008 runtime task failures,
RFC 009 lifecycle migration failures,
RFC 010 privacy/legal-hold/export failures,
RFC 011 data-plane preflight and drop thresholds.

16. Observability

Required metrics:

opc_alarm_active{severity,cause}
opc_alarm_events_total{event,severity,cause}
opc_alarm_suppressed_total{cause}
opc_alarm_sink_delivery_total{sink,outcome}
opc_alarm_sink_queue_depth{sink}
opc_alarm_clear_without_active_total{cause}

Alarm text MUST be redacted through RFC 010.

17. Configuration Model

Shared YANG groupings SHOULD include:

alarms/severity-policy
alarms/suppression
alarms/sinks
alarms/retention
alarms/readiness-impact
alarms/correlation

Per-NF YANG may add alarm thresholds, such as packet drop ratio or peer outage duration.

18. Module Ownership

Module	Responsibility
`opc-alarm-model`	alarm structs, severity, causes
`opc-alarm-store`	active/history store
`opc-alarm-manager`	raise/update/clear/dedup
`opc-alarm-policy`	suppression and readiness impact
`opc-alarm-k8s`	condition/event mapping
`opc-alarm-yang`	gNMI/NETCONF operational projection
`opc-alarm-sink`	external sink adapters
`opc-alarm-testkit`	alarm lifecycle fixtures

Agents adding new alarms must add taxonomy entries, tests, and evidence tags.

19. Testing Requirements

19.1 Unit Tests

Dedup key stability.
Severity transition.
Clear behavior.
Suppression authorization.
Redaction.
Readiness impact policy.

19.2 Integration Tests

Runtime task failure raises alarm.
Alarm maps to Kubernetes condition.
Alarm notification appears on gNMI subscription.
External sink receives raise/update/clear.
Sink outage buffers or drops according to policy.

19.3 Fault Injection

Alarm storm.
Sink outage.
Store unavailable.
Unauthorized suppression attempt.
Duplicate raise from many tasks.

19.4 Performance Gates

The common alarm-raise path does not block for longer than 100 microseconds.
Alarm storm of 10,000 duplicate events deduplicates without unbounded memory.
External sink outage does not impact protocol request p99.

20. Acceptance Criteria

This RFC is implemented when:

Every CNF uses shared alarm model and manager.
Alarm severity and probable cause taxonomy are stable and versioned.
Raise/update/clear semantics are deterministic.
Kubernetes conditions and events are derived consistently.
gNMI/NETCONF alarm operational state and notifications are available.
Suppression and acknowledgement are authorized and audited.
External sink failures do not block service paths.
Alarm behavior is covered by shared testkit and evidence.

Architecture Decision Records

This directory contains accepted and proposed architecture decisions for the OpenPacketCore SDK hardening and management-plane work.

ADRs are the durable record of architectural intent. The audit completion reports and implementation status matrix record what was validated; these ADRs record why the shape of the SDK is what it is. Proposed ADRs are included here when they gate in-progress work, but they do not authorize implementation until accepted.

Index

ADR	Decision
0001	Config management is secure by default, commit-confirmed, audited, and explicitly authorized.
0002	Config persistence HA uses `ConsensusConfigStore` with Raft-style quorum safety, authenticated transport, durable membership, and snapshot integrity.
0003	Authoritative session state uses quorum ordered-log replication with majority-supported repair, not standalone SQLite HA.
0004	Production identity, TLS, keys, and audit integrity are explicit SDK substrates with fail-closed adapters.
0005	Runtime health, admin/probe routes, metrics, and alarms are shared SDK surfaces with production authorization and redaction.
0006	Storage, security, runtime, HA, and release evidence are validated through fail-closed fault injection.
0007	Operator lifecycle policy logic lives in Rust SDK crates as reusable policy engines.
0008	Kubernetes operator integration is demonstrated by a Go reference harness without becoming a product CNF operator.
0009	Production data-plane claims require explicit node-resource, BPF, pod-security, and fallback validation.
0010	RFC 006 evidence, SBOM/VEX, provenance, bundle verification, performance baselines, and gates are first-class release inputs.
0011	`opc-amf-lite` is the SDK vertical integration proof, not a product NF.
0012	Diagnostics safety and privacy governance boundaries are structured, fail-closed, and compile-gated.
0013	NGAP requires generated ASN.1 APER code; hand-written and FFI codecs are rejected.
0014	SDK crates follow a rustls/tokio-only dependency policy, carry no gRPC stack, and declare a measured (not aspirational) MSRV.
0015	Protocol codecs are proven against spec-authored byte fixtures, never only their own encoder output.
0016	(proposed) `tonic`/`prost` are permitted only for `opc-gnmi-server` as the ADR 0014 §3 exception; core SDK crates stay gRPC-free.
0017	Explicitly allowlisted Linux kernel UAPI sys crates, including `opc-libsctp-sys` and `opc-linux-xfrm-sys`, hold all `unsafe` UAPI FFI; this OS-transport exception to ADR 0014 §8 does not reopen ADR 0013's rejection of foreign C codec FFI.
0018	EPC and untrusted-access additions are limited to SDK-owned reusable mechanisms; product policy, deployment defaults, ePDG orchestration, and carrier-readiness claims remain product-owned.

ADR 0001: Secure Config Management

Status

Accepted

Date

2026-06-08

Context

The SDK exposes shared configuration management primitives that downstream CNFs will use for production configuration changes. Early helper APIs made it too easy to wire allow-all authorization or treat commit-confirmed behavior as a test-only convention.

For carrier deployments, configuration writes must be explicit, authorized, recoverable, and auditable. Pending configuration must either be confirmed before its deadline or roll back to a confirmed point without silently accepting unsafe state.

Decision

Configuration management is secure by default:

Production-facing ConfigBus constructors require an explicit ConfigAuthorizer.
Allow-all construction is limited to clearly named dev/test helpers.
Commit-confirmed state is persisted durably with deadline metadata.
Expired pending commits roll back to a previous confirmed configuration.
Failed rollback or failed confirmation fences the bus into recovery-required state instead of allowing further writes.
Configuration audit records are persisted after redaction and protected by a hash chain/HMAC.
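The commit-confirmed and fencing rules can be pictured as a three-state machine. The sketch below is an in-memory illustration only: ConfigBusSketch, BusState, and BusError are hypothetical names, time is a plain integer, and the persist_ok and rollback_ok flags stand in for the outcome of durable storage operations that the real ConfigBus would perform.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum BusState {
    Confirmed { config: String },
    Pending { confirmed: String, candidate: String, deadline: u64 },
    /// Fenced: no further writes until an operator recovers the bus.
    RecoveryRequired,
}

#[derive(Debug, PartialEq, Eq)]
pub enum BusError { RecoveryRequired, CommitAlreadyPending, NothingPending }

pub struct ConfigBusSketch { state: BusState }

impl ConfigBusSketch {
    pub fn new(initial: &str) -> Self {
        Self { state: BusState::Confirmed { config: initial.to_string() } }
    }

    pub fn state(&self) -> &BusState { &self.state }

    /// Start a pending commit that must be confirmed before `now + timeout`.
    pub fn commit_confirmed(&mut self, candidate: &str, now: u64, timeout: u64) -> Result<(), BusError> {
        let next = match &self.state {
            BusState::RecoveryRequired => return Err(BusError::RecoveryRequired),
            BusState::Pending { .. } => return Err(BusError::CommitAlreadyPending),
            BusState::Confirmed { config } => BusState::Pending {
                confirmed: config.clone(),
                candidate: candidate.to_string(),
                deadline: now.saturating_add(timeout),
            },
        };
        self.state = next;
        Ok(())
    }

    /// Confirm the pending commit; a failed confirmation fences the bus.
    pub fn confirm(&mut self, persist_ok: bool) -> Result<(), BusError> {
        let next = match &self.state {
            BusState::RecoveryRequired => return Err(BusError::RecoveryRequired),
            BusState::Confirmed { .. } => return Err(BusError::NothingPending),
            BusState::Pending { candidate, .. } if persist_ok => {
                BusState::Confirmed { config: candidate.clone() }
            }
            BusState::Pending { .. } => BusState::RecoveryRequired,
        };
        self.state = next;
        if self.state == BusState::RecoveryRequired { Err(BusError::RecoveryRequired) } else { Ok(()) }
    }

    /// Roll back an expired pending commit; a failed rollback fences the bus.
    pub fn expire(&mut self, now: u64, rollback_ok: bool) {
        let next = match &self.state {
            BusState::Pending { confirmed, deadline, .. } if now >= *deadline => {
                if rollback_ok {
                    BusState::Confirmed { config: confirmed.clone() }
                } else {
                    BusState::RecoveryRequired
                }
            }
            _ => return,
        };
        self.state = next;
    }
}
```

Once the state is RecoveryRequired, every write path returns an error, which matches the intent that a failed commit cannot be treated as confirmed production state.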

Consequences

Downstream CNFs must provide an authorization adapter rather than relying on SDK defaults. Tests can still use dev-only allow-all constructors, but production call sites are visibly different.

Rollback and recovery behavior is now part of the SDK contract. Operators can recover from failed commits, but they cannot pretend a pending or failed commit is a confirmed production state.

Evidence

crates/opc-config-bus/src/lib.rs
crates/opc-persist/src/backend.rs
crates/opc-persist/tests/persist.rs
docs/implementation-status.md

ADR 0002: Config Store Consensus HA

Status

Accepted

Date

2026-06-08

Context

Single-node SQLite persistence is not acceptable for carrier HA configuration claims. The SDK needed a production HA config persistence path with leader fencing, majority commit behavior, restart recovery, and authenticated transport. It also needed to make clear that standalone SQLite remains a development, lab, conformance, or explicitly accepted edge/single-replica profile.

Decision

High-availability configuration persistence is provided by ConsensusConfigStore.

The consensus backend uses:

Durable cluster membership and node identity checks.
Leader election, current-term no-op gating, and majority write commitment.
Linearizable read verification instead of follower-local reads.
Authenticated mTLS/SPIFFE transport using shared identity/TLS substrates.
Controlled TCP server lifecycle with bounded concurrency, read timeouts, and explicit shutdown.
Snapshot persistence and HMAC verification.
Non-voter catch-up and promotion guards for membership changes.
Metrics and chaos/failover tests for partitions, restart, rejoin, and stale leader behavior.

Consequences

Config HA is a quorum system, not a property of SQLite. Any production claim must use the consensus backend or an equivalent adapter that satisfies the same contract.

The SDK accepts additional operational complexity so correctness is explicit: membership, certificates, node identity, quorum availability, and recovery state all become deployment responsibilities.

Evidence

crates/opc-persist/src/consensus.rs
crates/opc-persist/tests/consensus_tests.rs
crates/opc-persist/tests/tcp_consensus_tests.rs
docs/ha-design.md
docs/consensus-operator-runbook.md

ADR 0003: Session Store Quorum Replication

Status

Accepted

Date

2026-06-08

Context

Authoritative telecom session state cannot rely on single-node storage, wall-clock last-writer-wins, or best-effort replica repair. Session records need monotonic fencing, compare-and-set semantics, TTL handling, watch resume support, and stale replica recovery.

Decision

Authoritative session HA is implemented as quorum ordered-log replication in QuorumSessionStore.

The session store contract includes:

Monotonic fences and CAS for authoritative writes.
Durable ordered replication logs for lease acquire, renew, release, CAS, delete, TTL refresh, and batch operations.
Idempotent replay using log position, generation, fence, and transaction ID.
Majority-supported committed-prefix repair for stale or divergent replicas.
Watch/change-stream resume cursors.
Partial-quorum write rollback to prevent failed writes from resurrecting during later catch-up.
Truthful capability reporting so standalone SQLite does not claim replicated behavior.
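The first item, monotonic fences combined with CAS, can be illustrated on a single in-memory table. SessionTable, Record, and WriteError are hypothetical names, an absent key is treated as version 0 by assumption, and nothing here models replication or the ordered log used by QuorumSessionStore.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Record {
    pub value: String,
    pub version: u64,
    /// Highest fence that has written this record.
    pub fence: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum WriteError {
    /// The writer holds an older fence than the record has already seen.
    StaleFence { current: u64 },
    /// The writer's expected version does not match the stored version.
    VersionMismatch { current: u64 },
}

#[derive(Default)]
pub struct SessionTable {
    records: HashMap<String, Record>,
}

impl SessionTable {
    /// Write only if the fence is not stale and the expected version matches.
    pub fn compare_and_set(
        &mut self,
        key: &str,
        expected_version: u64,
        fence: u64,
        value: &str,
    ) -> Result<u64, WriteError> {
        // Assumption: a missing record behaves like version 0 with fence 0.
        let (current_version, current_fence) = self
            .records
            .get(key)
            .map_or((0, 0), |r| (r.version, r.fence));
        if fence < current_fence {
            return Err(WriteError::StaleFence { current: current_fence });
        }
        if expected_version != current_version {
            return Err(WriteError::VersionMismatch { current: current_version });
        }
        let version = current_version + 1;
        self.records.insert(
            key.to_string(),
            Record { value: value.to_string(), version, fence },
        );
        Ok(version)
    }

    pub fn get(&self, key: &str) -> Option<&Record> {
        self.records.get(key)
    }
}
```

Checking the fence before the version means a former owner is rejected as stale even when it happens to guess the current version correctly.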

Consequences

Standalone SqliteSessionBackend remains useful as a durable local backend, but it is not HA. Production CNFs that need authoritative session HA must use QuorumSessionStore or an equivalent replicated profile.

The SDK favors fail-closed reads over returning divergent session state when a majority cannot agree.

Evidence

crates/opc-session-store/src/quorum.rs
crates/opc-session-store/src/sqlite.rs
crates/opc-session-testkit/
docs/ha-design.md
docs/operator-readiness.md

ADR 0004: Security Identity, Keying, And Audit Integrity

Status

Accepted

Date

2026-06-08

Context

The SDK needs reusable production security substrates rather than bespoke per-CNF wiring. Identity, mTLS transport, key retrieval, audit redaction, and tamper evidence must be consistent across config, session, persistence, alarm, and operator-facing paths.

Decision

Production security uses explicit shared adapters:

opc-identity watches SPIFFE SVIDs and trust bundles.
opc-tls builds reloadable mTLS client/server configurations from identity material.
opc-key provides durable KmsKeyProvider adapters over authenticated KMS transports or local Unix-socket agents.
Memory key providers remain deterministic test/conformance adapters.
Persistence audit records redact sensitive values before storage and before hash-chain/HMAC material is calculated.
Alarm administration uses NACM-backed authorization and durable audit sinks.
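
The ordering "redact first, then compute chain material" can be sketched as follows. Everything here is illustrative: the field names, the SENSITIVE_KEYS list, and the function names are assumptions, and std's DefaultHasher is a non-cryptographic stand-in for the keyed HMAC a production audit chain would use.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical list of field names whose values must never be stored.
const SENSITIVE_KEYS: [&str; 3] = ["supi", "password", "private_key"];

/// Replace sensitive values before the record is stored or hashed.
pub fn redact(fields: &[(&str, &str)]) -> Vec<(String, String)> {
    fields
        .iter()
        .map(|(k, v)| {
            let shown = if SENSITIVE_KEYS.contains(k) { "[REDACTED]" } else { *v };
            (k.to_string(), shown.to_string())
        })
        .collect()
}

/// Chain link over the previous link and the already-redacted record.
/// DefaultHasher is NOT tamper-resistant; it only illustrates the chaining order.
pub fn chain_link(prev: u64, redacted: &[(String, String)]) -> u64 {
    let mut hasher = DefaultHasher::new();
    prev.hash(&mut hasher);
    redacted.hash(&mut hasher);
    hasher.finish()
}

/// Recompute every link from the stored records and compare with the stored links.
pub fn verify_chain(records: &[Vec<(String, String)>], links: &[u64]) -> bool {
    if records.len() != links.len() {
        return false;
    }
    let mut prev = 0u64;
    for (record, link) in records.iter().zip(links) {
        let expected = chain_link(prev, record);
        if expected != *link {
            return false;
        }
        prev = expected;
    }
    true
}
```

Because the chain is computed over redacted records, the raw sensitive value never influences stored integrity material, and the chain can be verified later from the redacted records alone.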

Consequences

Production deployments must supply real identity and KMS infrastructure. Unauthenticated TCP KMS and in-memory keys are not production key sources.

Security failures should fail closed and surface sanitized errors rather than leaking paths, SQL details, PEM material, keys, subscriber identifiers, or network addresses.

Evidence

crates/opc-identity/
crates/opc-tls/
crates/opc-key/
crates/opc-persist/src/backend.rs
crates/opc-alarm/src/nacm_adapter.rs
crates/opc-alarm/src/persist_adapter.rs

ADR 0005: Runtime Observability And Admin Probes

Status

Accepted

Date

2026-06-08

Context

Production CNFs need consistent runtime health, readiness, metrics, alarm visibility, and debug/admin routes. These surfaces must be shared and redaction-safe, not reimplemented by each NF.

Decision

Runtime observability is a shared SDK surface:

opc-runtime owns liveness, readiness, startup, debug, and admin route semantics.
Production and lab admin/probe/debug endpoints require bearer token authorization.
/metrics exports Prometheus text through a shared SdkMetrics registry.
Metrics use low-cardinality, redaction-safe labels.
Runtime, ConfigBus, persistence, session store, NACM, and alarms report counters/gauges/histograms through the shared metrics surface.
Runtime failures and drain failures raise SDK-managed alarms.
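
As an illustration of the bearer-token requirement, the sketch below shows one way an admin route could evaluate an Authorization header value. It is not the opc-runtime implementation: authorize, AuthDecision, and the header handling are assumptions, and the comparison is a best-effort constant-time loop rather than a vetted cryptographic primitive.

```rust
#[derive(Debug, PartialEq)]
pub enum AuthDecision {
    Allow,
    Deny,
}

/// Compare two byte strings without returning early on the first difference.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b) {
        diff |= x ^ y;
    }
    diff == 0
}

/// `header` is the raw Authorization header value, if the request carried one.
/// Missing headers, other schemes, empty tokens, and an unset expected token all deny.
pub fn authorize(header: Option<&str>, expected_token: &str) -> AuthDecision {
    let presented = match header.and_then(|h| h.strip_prefix("Bearer ")) {
        Some(token) if !token.is_empty() => token,
        _ => return AuthDecision::Deny,
    };
    if !expected_token.is_empty()
        && constant_time_eq(presented.as_bytes(), expected_token.as_bytes())
    {
        AuthDecision::Allow
    } else {
        AuthDecision::Deny
    }
}
```

Note that the decision carries no detail about why a request was denied, which keeps the response redaction-safe by construction.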

Consequences

Downstream CNFs should wire the SDK runtime and metrics instead of creating incompatible health/admin conventions.

Debug endpoints are production-controlled operational surfaces. They must never expose raw configs, tokens, SQL, file paths, certificate material, subscriber IDs, or other sensitive data.

Evidence

crates/opc-runtime/src/admin.rs
crates/opc-runtime/src/health.rs
crates/opc-redaction/src/metrics.rs
crates/opc-sdk-integration/tests/observability.rs
docs/operator-readiness.md

ADR 0006: Fail-Closed Fault Injection Validation

Status

Accepted

Date

2026-06-08

Context

Happy-path tests are insufficient for SDK stability claims. Storage, KMS, SPIFFE, consensus, session replication, runtime, and evidence release gates all have failure modes where unsafe behavior can look like success unless tested directly.

Decision

The SDK validates production safety with explicit fault injection and chaos tests:

Persistence can simulate disk full, fsync/write failure, corrupt database, corrupt WAL, failed rollback target load, failed rollback point creation, and audit-chain corruption.
Config and session HA are tested under partitions, crashes, stale leaders, stale fences, rejoin/catch-up, split-brain healing, and partial writes.
SPIFFE and KMS are tested under expiry, rotation, bundle removal, timeout, and unavailability.
Runtime and admin routes are tested for authentication, malformed requests, timeouts, and redaction.
Release gates are tested for missing evidence, malformed JSON, dirty provenance, missing signatures, tampered bundles, and unsafe evidence values.

Consequences

Test-only fault hooks are acceptable when explicitly gated and named as dangerous test hooks. Production APIs should not expose fault injection knobs.

Regression tests must prefer fail-closed assertions: no publish, no partial commit, no stale promotion, no sensitive error leak, and no unsafe readiness claim.
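
A fail-closed assertion of this kind might look like the sketch below. ConfigPipeline, Fault, and dangerous_test_inject_fault are hypothetical names, not SDK APIs; in a real crate the hook would sit behind cfg(test) or a test-only feature rather than in the public surface.

```rust
/// Faults a test can inject. Illustrative only.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Fault {
    None,
    DiskFull,
    FsyncFailure,
}

/// Sanitized error: a fixed message with no paths or storage details.
#[derive(Debug, PartialEq)]
pub struct CommitError(pub &'static str);

pub struct ConfigPipeline {
    persisted: Vec<String>,
    published: Vec<String>,
    fault: Fault,
}

impl ConfigPipeline {
    pub fn new() -> Self {
        ConfigPipeline { persisted: Vec::new(), published: Vec::new(), fault: Fault::None }
    }

    /// Dangerous test hook: named so that it cannot be mistaken for a production knob.
    pub fn dangerous_test_inject_fault(&mut self, fault: Fault) {
        self.fault = fault;
    }

    /// Persist, then publish. Any storage failure leaves both lists unchanged.
    pub fn commit(&mut self, config: &str) -> Result<(), CommitError> {
        if self.fault == Fault::DiskFull {
            return Err(CommitError("storage write failed"));
        }
        let checkpoint = self.persisted.len();
        self.persisted.push(config.to_string());
        if self.fault == Fault::FsyncFailure {
            // Roll back the unsynced write so it cannot resurface later.
            self.persisted.truncate(checkpoint);
            return Err(CommitError("storage sync failed"));
        }
        self.published.push(config.to_string());
        Ok(())
    }

    pub fn persisted(&self) -> &[String] {
        &self.persisted
    }

    pub fn published(&self) -> &[String] {
        &self.published
    }
}
```

A regression test then asserts on what did not happen: after an injected failure, neither the persisted nor the published state contains the rejected configuration.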

Evidence

crates/opc-sdk-integration/tests/fault_injection.rs
crates/opc-security-testkit/
crates/opc-session-testkit/
crates/opc-evidence/tests/evidence_pipeline.rs

ADR 0007: Operator Lifecycle Rust Policy Core

Status

Accepted

Date

2026-06-08

Context

The SDK is not a product operator, but downstream CNF operators need common policy decisions for compatibility, admission, configuration apply, migration, drain, rollback, and fleet status. Those policy decisions should be reusable from Rust SDK code and Go Kubernetes operators.

Decision

Operator lifecycle policy lives in Rust SDK crates:

operator-lifecycle owns lifecycle phases, admission checks, compatibility matrix policy, config-apply decisions, and rollback constraints.
operator-controller owns deterministic conversion helpers, migration plan execution, drain client orchestration, and multi-cluster status aggregation.
Policy functions use structured inputs/outputs and fail closed on unknown, malformed, stale, or unsupported state.
Error messages are sanitized before crossing operator or webhook boundaries.
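
The "structured inputs/outputs, fail closed" shape can be sketched as a pure function. The ApplyRequest fields, the phase names, and the supported schema versions below are invented for illustration and do not describe the actual operator-lifecycle types.

```rust
#[derive(Debug, PartialEq)]
pub enum Decision {
    Allow,
    /// Fixed, sanitized reason; never echoes caller-supplied input.
    Deny(&'static str),
}

/// Hypothetical structured input for a config-apply decision.
pub struct ApplyRequest<'a> {
    pub phase: &'a str,
    pub observed_generation: u64,
    pub desired_generation: u64,
    pub schema_version: u32,
}

/// Assumed set of schema versions this policy build understands.
const SUPPORTED_SCHEMA_VERSIONS: [u32; 2] = [1, 2];

pub fn decide_config_apply(req: &ApplyRequest) -> Decision {
    if !SUPPORTED_SCHEMA_VERSIONS.contains(&req.schema_version) {
        return Decision::Deny("unsupported schema version");
    }
    if req.observed_generation < req.desired_generation {
        return Decision::Deny("observed state is stale");
    }
    match req.phase {
        "Ready" => Decision::Allow,
        "Draining" | "RollingBack" => Decision::Deny("phase does not permit config apply"),
        // Anything unrecognized is denied rather than treated as a default.
        _ => Decision::Deny("unknown lifecycle phase"),
    }
}
```

Because the function is pure and its reasons are static strings, the same decision can be returned across a CLI or webhook boundary without further sanitization.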

Consequences

The SDK can expose consistent policy decisions to multiple operator implementations without forcing all Kubernetes code into Rust.

Rust lifecycle crates do not deploy workloads by themselves. Product CNF operators still own reconciliation of Deployments, StatefulSets, Services, protocol-specific CRDs, and live cluster behavior.

Evidence

crates/operator-lifecycle/
crates/operator-controller/
crates/operator-lifecycle-cli/
docs/operator-readiness.md

ADR 0008: Go Reference Operator Boundary

Status

Accepted

Date

2026-06-08

Context

The repository direction has been polyglot from the start: SDK core behavior is Rust, while Kubernetes operator integration should use Go controller-runtime, the most widely adopted framework for building Kubernetes operators. At the same time, this repository is an SDK, not an AMF/SMF/UPF product operator.

Decision

The repository includes a Go reference operator harness under operators/sdk-reference-operator.

The Go harness demonstrates:

CRD API versions and conversion wiring.
Validating webhook integration.
Controller reconciliation shape and status updates.
Kustomize/RBAC/cert-manager/manager manifests.
A Go-to-Rust JSON CLI bridge to operator-lifecycle-cli.

The harness is explicitly not a production CNF operator and does not encode product-specific reconciliation.

Consequences

Downstream CNF teams get a concrete Go integration pattern without importing product behavior into the SDK repository.

Reference tests use Go unit tests, fake-client controller/webhook tests, rendered Kustomize manifests, and Rust CLI contract tests. Product CNF operators must add envtest, kind, and real-cluster end-to-end tests around their own reconciliation logic.

Manager images must package both the Go manager binary and the Rust operator-lifecycle-cli, or set OPERATOR_LIFECYCLE_CLI_PATH to a valid CLI location.

Evidence

operators/sdk-reference-operator/
crates/operator-lifecycle-cli/
docs/operator-readiness.md
docs/implementation-status.md

ADR 0009: Platform Preflight Resource Contract

Status

Accepted

Date

2026-06-08

Context

Carrier CNFs often depend on CPU isolation, NUMA locality, hugepages, NIC capabilities, SR-IOV, AF_XDP/eBPF, CNI behavior, and pod-security exceptions. These assumptions cannot remain tribal knowledge or comments in deployment manifests.

Decision

Production data-plane readiness is an explicit SDK contract:

opc-node-resources models resource profiles and node capability reports.
CPU manager, topology manager, isolated/reserved CPU sets, NUMA mappings, hugepage pools, NIC capabilities, and data-plane interfaces are validated.
AF_XDP/eBPF artifacts require digest pinning, signer/evidence identity, program type, attach point, and allowed capability checks.
Pod-security exceptions must be minimal and evidence-linked.
Lab/dev fallback paths fail closed in production.
Operator admission and config-apply paths consume the preflight report.

Consequences

Production manifests must provide explicit resource profiles and node capability evidence. If evidence is absent, stale, or incompatible, the SDK policy blocks rollout instead of silently downgrading to lab behavior.
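
The absent/stale/incompatible rule can be sketched as a single evaluation function. The struct fields, the 300-second freshness window, and the reason strings are assumptions for illustration; they are not the opc-node-resources schema.

```rust
/// Hypothetical resource profile declared by a production manifest.
pub struct ResourceProfile {
    pub hugepages_2m: u32,
    pub isolated_cpus: u32,
    pub requires_sriov: bool,
}

/// Hypothetical node capability report, with its age at evaluation time.
pub struct NodeCapabilityReport {
    pub hugepages_2m: u32,
    pub isolated_cpus: u32,
    pub sriov_capable: bool,
    pub age_secs: u64,
}

#[derive(Debug, PartialEq)]
pub enum PreflightOutcome {
    Pass,
    Blocked(Vec<&'static str>),
}

/// Assumed freshness window for capability evidence.
pub const MAX_EVIDENCE_AGE_SECS: u64 = 300;

pub fn preflight(
    profile: &ResourceProfile,
    report: Option<&NodeCapabilityReport>,
) -> PreflightOutcome {
    // Absent evidence blocks outright; there is no lab fallback here.
    let report = match report {
        Some(r) => r,
        None => return PreflightOutcome::Blocked(vec!["node capability evidence absent"]),
    };
    let mut reasons = Vec::new();
    if report.age_secs > MAX_EVIDENCE_AGE_SECS {
        reasons.push("node capability evidence stale");
    }
    if report.hugepages_2m < profile.hugepages_2m {
        reasons.push("insufficient hugepages");
    }
    if report.isolated_cpus < profile.isolated_cpus {
        reasons.push("insufficient isolated CPUs");
    }
    if profile.requires_sriov && !report.sriov_capable {
        reasons.push("SR-IOV required but not available");
    }
    if reasons.is_empty() {
        PreflightOutcome::Pass
    } else {
        PreflightOutcome::Blocked(reasons)
    }
}
```

Collecting every reason, instead of stopping at the first, gives an admission path a complete report to surface in one pass.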

The Go reference operator projects this contract into CRD fields but does not replace product-specific operator resource management.

Evidence

crates/opc-node-resources/src/lib.rs
crates/operator-lifecycle/src/admission.rs
crates/operator-lifecycle/src/config_apply.rs
operators/sdk-reference-operator/api/

ADR 0010: Release Assurance Evidence Pipeline

Status

Accepted

Date

2026-06-08

Context

The SDK needs release evidence that is machine-readable and fail-closed. Manual claims like "tests passed" are insufficient for conformance, supply-chain assurance, and auditability.

Decision

opc-evidence is the RFC 006 release-assurance pipeline.

It provides:

Source extraction for RFC 006 tags such as @spec, @req, @conformance, @gap, @security, @performance, and @test.
Deterministic CycloneDX SBOM generation from local Cargo manifests and lock data.
VEX policy result and record validation.
SLSA/in-toto-style provenance tied to commit, builder, input materials, output digests, and dirty/clean worktree state.
Bundle assembly and verification with canonical manifest signing bytes.
Signer/verifier traits and deterministic in-process test signing.
Performance baseline schema with redaction-safe environment metadata and regression status.
PR/release gate policy that fails closed on missing evidence, missing signatures, tampering, mismatched commits, dirty release provenance, malformed JSON, or unsafe evidence content.
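
To make the source-extraction step concrete, here is a minimal sketch of scanning source text for the RFC 006 tags. The tag names come from the list above; the assumption that a tag opens a // or /// comment and is followed by a free-form value, along with the TagHit type and the sample identifiers in the usage note, are illustrative and may not match the actual extractor.

```rust
/// Tag names taken from the list above.
const KNOWN_TAGS: [&str; 7] = [
    "@spec", "@req", "@conformance", "@gap", "@security", "@performance", "@test",
];

#[derive(Debug, PartialEq)]
pub struct TagHit {
    /// 1-based source line number.
    pub line: usize,
    pub tag: String,
    pub value: String,
}

/// Scan line comments for known tags. Unknown tags and non-comment lines are ignored.
pub fn extract_tags(source: &str) -> Vec<TagHit> {
    let mut hits = Vec::new();
    for (idx, line) in source.lines().enumerate() {
        // Only consider `//` and `///` comments (an assumption of this sketch).
        let comment = match line.trim_start().strip_prefix("//") {
            Some(rest) => rest.trim_start_matches('/').trim(),
            None => continue,
        };
        let mut parts = comment.splitn(2, char::is_whitespace);
        let tag = parts.next().unwrap_or("");
        if KNOWN_TAGS.contains(&tag) {
            hits.push(TagHit {
                line: idx + 1,
                tag: tag.to_string(),
                value: parts.next().unwrap_or("").trim().to_string(),
            });
        }
    }
    hits
}
```

For example, a file containing the comment lines `/// @spec TS29.244 7.2.2` and `// @req REQ-PFCP-001` (both made-up identifiers) would yield two hits, while a `// @todo` comment would be skipped.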

Consequences

Release pipelines must treat evidence artifacts as required inputs, not as optional reports.

Real Sigstore/Cosign keyless signing remains an external signer adapter boundary. The SDK owns the signing/verifier interface and test verifier, not a hard dependency on one hosted signing provider.

Evidence

crates/opc-evidence/src/extract.rs
crates/opc-evidence/src/sbom.rs
crates/opc-evidence/src/vex.rs
crates/opc-evidence/src/provenance.rs
crates/opc-evidence/src/bundle.rs
crates/opc-evidence/src/performance.rs
crates/opc-evidence/src/policy.rs
crates/opc-evidence/tests/evidence_pipeline.rs

ADR 0011: First NF Vertical Proof

Status

Accepted

Date

2026-06-08

Context

The SDK needed proof that its seams compose in a real NF-shaped control-plane slice. Toy examples can validate local APIs, but they do not prove that runtime, config, session, identity, KMS, NACM, alarms, metrics, and HA recovery work together.

Decision

opc-amf-lite is the first NF vertical integration proof.

It demonstrates:

Runtime startup and supervised workers.
Secure ConfigBus integration.
Consensus-backed configuration persistence.
Quorum session storage with read-repair behavior.
KMS-backed encryption paths.
NACM authorization and audit.
Alarm and metrics integration.
HA recovery and failure validation.

opc-amf-lite is not a product AMF. It is a reusable SDK proof slice that downstream CNFs can study when wiring their own production crates.

Consequences

The SDK can claim that its core seams compose into an NF-shaped control-plane vertical. It cannot claim complete AMF/SMF/UPF protocol coverage from this slice.

Future NF crates should follow the integration pattern but own their procedure-specific logic, protocol fidelity, and product tests.

Evidence

crates/opc-amf-lite/
crates/opc-amf-lite/README.md
docs/implementation-status.md
docs/operator-readiness.md

ADR 0012: Diagnostics Safety and Privacy Governance

Status

Accepted

Date

2026-06-08

Context

Diagnostics, support bundles, exports, and evidence files pose a high risk of leaking sensitive subscriber identifiers (SUPI, IMSI, MSISDN), secrets, cryptographic credentials, database internals, and local filesystem paths. The SDK required a structured, fail-closed diagnostics and privacy boundary to satisfy RFC 010.

Decision

Establish a clear, multi-crate boundary for diagnostics safety and privacy governance:

Structured, Redacted Support Bundles:
- Diagnostic data is collected as structured DiagnosticEntry variants.
- Support bundles are redacted prior to serialization using redact_support_bundle.
- The engine redacts sensitive subscriber identifiers, IPs, SPIFFE IDs, JWTs, paths, database errors, and secrets, producing a RedactionSummary.
- Unknown or unsafe attachments fail closed in Production mode.
Declarative Retention & Legal Holds:
- RetentionPolicy schema in opc-data-governance dictates retention duration, data class, and disposal action.
- Policies validate retention-duration bounds and block deletion/disposal decisions while a legal hold flag is active.
Classification-Preserving Exports:
- ExportedItem in opc-export encapsulates the payload and ExportMetadata.
- Production validation rejects raw sensitive payloads unless they are encrypted.
Analytics Minimization:
- MinimizationPolicy in opc-privacy enforces k-anonymity cohort-size thresholds, binning, and digest hashing of subscriber IDs.
- Cohorts below the threshold or direct identifiers are rejected.
Data-Governance Evidence Gating:
- Release gates require DataGovernanceEvidenceReport validation.
- The evaluator parses the report and scans it to ensure no absolute paths, credentials, or raw IPs are present.
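
The k-anonymity cohort rule under Analytics Minimization can be sketched as below. Only the name MinimizationPolicy comes from the text; its fields, the cohort_histogram method, and the choice to reject the whole export when any cohort is too small are assumptions of this sketch.

```rust
use std::collections::BTreeMap;

/// Illustrative policy: minimum cohort size and bin width for a numeric attribute.
pub struct MinimizationPolicy {
    pub k_threshold: usize,
    pub bin_width: u32,
}

#[derive(Debug, PartialEq)]
pub enum ExportError {
    CohortBelowThreshold { bin: u32, size: usize },
}

impl MinimizationPolicy {
    /// Bin per-subscriber samples and return cohort sizes keyed by bin start.
    /// If any cohort is smaller than k, the whole export is rejected.
    pub fn cohort_histogram(&self, samples: &[u32]) -> Result<BTreeMap<u32, usize>, ExportError> {
        // Guard against a zero bin width, which would divide by zero.
        let width = self.bin_width.max(1);
        let mut bins: BTreeMap<u32, usize> = BTreeMap::new();
        for sample in samples {
            *bins.entry(sample / width * width).or_insert(0) += 1;
        }
        for (bin, size) in &bins {
            if *size < self.k_threshold {
                return Err(ExportError::CohortBelowThreshold { bin: *bin, size: *size });
            }
        }
        Ok(bins)
    }
}
```

The output contains only bin starts and counts, so no per-subscriber value or identifier survives the export.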

Consequences

Diagnostic attachments and support bundles cannot silently leak raw sensitive identifiers or secrets in Production mode.
Downstream CNFs can safely collect support bundles and perform analytics exports without violating privacy regulations.
Data-governance compliance is checked and enforced automatically by the release gates.

Evidence

crates/opc-redaction/src/support_bundle.rs
crates/opc-data-governance/src/retention.rs
crates/opc-export/src/lib.rs
crates/opc-privacy/src/lib.rs
crates/opc-evidence/src/data_governance.rs
crates/opc-sdk-integration/tests/privacy_governance.rs

ADR 0013: NGAP ASN.1 Strategy

Status

Accepted — amended 2026-06 with first implementation experience

Date

2026-06-11

Context

NGAP (NG Application Protocol, 3GPP TS 38.413) is required for gNodeB↔AMF signaling on the N2 interface. Unlike GTP-U (fixed binary headers) or PFCP (TLV IEs), NGAP is defined in ASN.1 using APER (Aligned Packed Encoding Rules). Hand-writing an APER codec is error-prone, high-maintenance, and incompatible with the SDK's goal of spec-traceable, fuzz-safe protocol code.

The SDK currently has:

opc-protocol — zero-copy codec framework with BorrowDecode/Encode
opc-proto-gtpu — GTP-U codec following the above framework
opc-proto-pfcp — PFCP codec (planned, TS 29.244)

NGAP is the next mandatory codec after PFCP, but its ASN.1 nature makes it structurally different from the existing binary codecs.

Decision

We will not hand-write NGAP APER parsing or code-generation.

Instead, we will evaluate and adopt a maintained Rust ASN.1 / APER toolchain that can consume the 3GPP ASN.1 modules directly. The evaluation criteria are:

MSRV 1.81 compatibility — must compile on the SDK's declared MSRV.
License compatibility — Apache-2.0 or MIT, no copyleft dependencies.
#![forbid(unsafe_code)] — generated and runtime code must be pure safe Rust.
Fuzzability — the generated codec must integrate with cargo-fuzz and tolerate hostile inputs without panics.
Maintenance risk — actively maintained, responsive to security issues, ideally with existing 3GPP or telecom user base.

Options Evaluated

Option A: `hampi` / `rasn` ecosystem

hampi (GitHub: ystero-dev/hampi): ASN.1 compiler generating Rust structs with APER/UPER/OER support.
rasn (GitHub: XAMPPRocky/rasn): runtime ASN.1 codec library with derive macros.

Pros: Pure Rust, no_std capable, active development, Apache-2.0.
Cons: hampi's APER support is partial (v0.x); no proven 3GPP NGAP corpus yet; smaller community than protobuf alternatives.
Verdict: Leading candidate. Requires a spike to compile 3GPP R18 NGAP ASN.1 modules and validate against known-good PCAPs.

Option B: Generated code from `asn1-codecs` (ERI framework)

The asn1-codecs family (used by some telecom OSS projects) generates Rust from ASN.1 via an intermediate representation.

Pros: Explicitly designed for telecom ASN.1 modules.
Cons: Mixed maintenance status; some forks carry unsafe code; licensing unclear on some forks; heavy dependency tree.
Verdict: Fallback if Option A fails the spike. Requires legal review of upstream license before adoption.

Option C: FFI to `srsRAN` / `OAI` C NGAP codec

Reuse the established C NGAP implementations from srsRAN or OpenAirInterface.

Pros: Battle-tested against live networks; spec-complete.
Cons: FFI requires unsafe blocks, violating the SDK's #![forbid(unsafe_code)] invariant. Cross-compilation for musl and other target environments adds complexity. Memory-safety bugs in C code become SDK security issues.
Verdict: Rejected. The forbid(unsafe_code) constraint is architectural and non-negotiable for a carrier-grade CNF security substrate.

Option D: Hand-written subset

Implement only NGSetupRequest/Response and InitialUEMessage by hand and omit the rest.

Pros: Zero new dependencies; full control over decode limits and fuzzing.
Cons: Maintenance nightmare on every 3GPP release; no spec-traceability to ASN.1 modules; high bug rate.
Verdict: Rejected. The SDK explicitly rejected hand-written ASN.1 for NGAP at the architecture level.

Recommendation

Proceed with Option A (hampi/rasn).

Phased plan:

Spike (v0.2.x follow-up): Compile 3GPP R18 NGAP ASN.1 modules with hampi/rasn, generate structs, and validate against a small corpus of known-good NGAP PDUs (extracted from 3GPP test specifications or opc-testbed fixtures).
Subset crate (v0.3.0): Create opc-proto-ngap wrapping only NGSetupRequest/Response and InitialUEMessage to prove the integration pattern with opc-protocol's decode-context limits.
Full message surface (v0.4.0+): Expand to the full NGAP message and IE surface required by the AMF-lite reference implementation.

Consequences

The SDK gains a maintainable, spec-traceable NGAP codec path.
Downstream NF operators must accept a generated-code dependency (acceptable given the alternative of FFI or hand-written bugs).
If hampi/rasn fails the spike, we fall back to Option B with a license review gate.

Implementation experience (2026-06)

The first opc-proto-ngap attempt followed the phased plan and stalled at step 1 on toolchain compatibility, not on the codec approach itself:

rasn (0.22 and 0.25) failed the then-declared MSRV of 1.81. Its derive implementation transitively requires uuid ^1.11, which resolves to a getrandom release whose manifest uses edition2024 — unparseable by Cargo 1.81. No pinning escape existed within rasn's requirements.
Investigating the failure exposed that the workspace's own dependency graph had already drifted past MSRV 1.81 through the same getrandom release (reached via uuid, tempfile, and quickcheck), i.e. the MSRV declaration no longer reflected reality independent of NGAP.
hampi was not pursued: no meaningful release since 2021 and its APER encoder was still marked work-in-progress then — unacceptable abandonment risk for a protocol codec.

Consequences acted on:

The workspace MSRV was raised to 1.88, the actual floor of the resolved dependency graph (set by the time crate; edition2024 support needs ≥ 1.85, the icu stack ≥ 1.86). This repairs the MSRV gate and removes the blocker on Option A. See ADR 0014 for the toolchain/dependency policy.
The Option A spike should be re-run against rasn on the raised MSRV before any consideration of Option B (asn1-codecs, which still carries its license-review gate per the comparison above).

Evidence

Gap register updated: GAP-PROTO-003 now records the partially closed codec boundary.
docs/implementation-status.md linked.

ADR 0014: Dependency and Toolchain Policy

Status

Accepted (amended 2026-06-12: crypto-provider scope and JWT backend, point 9)

Date

2026-06-11

Context

The SDK is the foundation for downstream CNFs with carrier security and audit requirements. Every dependency the workspace takes is inherited by every downstream NF, and several incidents during development showed that implicit policy does not survive contact with routine maintenance:

The declared MSRV silently drifted out of truth: routine lockfile updates pulled a getrandom release whose manifest requires edition2024, unparseable by the Cargo version the workspace claimed to support — and the breakage reached the graph through three independent parents (uuid, tempfile, quickcheck), one of them in the production graph.
An HTTP adapter was nearly built on a second client stack when the workspace already standardized on one.
A license gate failure appeared days after the dependency that caused it, because the gate's evidence had been captured before the dependency landed.

Decision

TLS: rustls only. No openssl/native-tls anywhere in the graph, including transitively via feature defaults (disable default-features where needed). Rationale: a single auditable TLS stack and reproducible cross-compilation, with no coupling to a system OpenSSL/native-tls library (dynamic linking, version skew). This rule targets system/dynamic crypto; vendored crypto built statically from source as part of the graph (e.g. ring, aws-lc-sys) is permitted — see point 9.
Async runtime: tokio only. No second runtime, no runtime-agnostic abstraction layers.
No gRPC stack (tonic/prost) in SDK crates. Internal transports (e.g. session replication) use hand-specified framing over the existing tokio/rustls stack; external 3GPP interfaces are HTTP/2 (hyper) or raw protocol codecs. A future exception requires an ADR, not a Cargo.toml edit. (An ASN.1 codec dependency for NGAP per ADR 0013 is the kind of exception that warrants that process.)
HTTP clients: hyper is the workspace HTTP stack. reqwest (rustls-backed, built on hyper) is tolerated in leaf adapter crates (currently opc-key-vault) but must not spread into core crates.
MSRV is the measured floor of the resolved graph, not an aspiration. Currently 1.88 (set by the time crate). The CI msrv job compiles the whole workspace (--all-targets --all-features) on exactly the declared version; a lockfile update that raises the floor must raise rust-version, this ADR's record, and the contributor docs in the same change. Raising MSRV is acceptable for a pre-1.0 SDK; lying about it is not.
Licenses: Apache-2.0/MIT/BSD-family only, enforced by cargo deny with a curated allow-list; uncommon-but-permissive licenses are admitted as per-crate exceptions in deny.toml, never as global allows.
Every new dependency is justified in the PR description (what it replaces, why the existing stack cannot serve, license, MSRV impact).
unsafe_code = "forbid" is workspace-wide and non-negotiable, which also rules out FFI-based protocol libraries (see ADR 0013).
Cryptographic providers. rustls uses the ring provider for TLS; opc-sbi's jsonwebtoken uses the aws_lc_rs backend for JWT-SVID signature verification. Both are vendored, statically-built crypto (no system OpenSSL), consistent with point 1. aws_lc_rs is chosen over jsonwebtoken's pure-Rust rust_crypto backend because the latter pulls the rsa crate, which carries RUSTSEC-2023-0071 (the "Marvin" timing sidechannel) with no fixed release available upstream. That advisory is unreachable for our verify-only (public-key) usage — the SDK never holds or decrypts with an RSA private key — but aws_lc_rs is constant-time and keeps both security gates (cargo audit, cargo deny) green without a standing advisory exception, which matters for a security SDK whose advisory surface is inherited by every downstream consumer. Future goal: migrate JWT verification to the pure-Rust rust_crypto backend once the rsa crate ships a constant-time release (its in-progress crypto-bigint migration), dropping the aws-lc-sys/cmake build step and fully satisfying the pure-Rust ideal.

Consequences

Some integrations cost more to build (hand-rolled framing instead of tonic; hyper plumbing instead of convenience clients) in exchange for a dependency graph that downstream carriers can audit once and trust.
MSRV moves forward with the ecosystem rather than pinning old dependency lines; downstream consumers should track a recent stable toolchain.
scripts/publish-order.py --check and cargo deny check are the mechanical halves of this policy; this ADR is the rationale they enforce.

ADR 0015: Protocol Codec Conformance Policy

Status

Accepted

Date

2026-06-11

Context

The SDK ships wire codecs for 3GPP protocols (GTP-U, PFCP, NAS-5GS, with NGAP planned). Codec bugs are uniquely dangerous: an encoder and decoder written by the same hand are internally consistent, so round-trip tests pass perfectly while every byte on the wire is wrong for a real peer. This failure mode occurred twice during development — a scrambled PFCP header flag layout and a byte-swapped Outer Header Creation description field — and in both cases the existing test suite was green because the fixtures had been derived from the codec's own output.

Decision

Every protocol codec crate (opc-proto-*) MUST satisfy all of the following before it is merged, and CONFORMANCE.md must claim nothing the tests do not prove:

Spec-authored fixtures. Conformance tests include byte fixtures hand-authored from the 3GPP specification (or captured from an independent implementation), with octet-level comments citing the spec section. Fixtures derived from this codec's own encoder do not count as conformance evidence — they detect regressions, not wire-format errors.
Byte-exact round-trips. decode → encode must reproduce the input bytes exactly for every fixture, including unknown/vendor-extension elements, which must be preserved raw.
Declared canonicalization. Where a typed view legitimately normalizes (zeroing spare bits, dropping forward-compatibility trailing octets that the spec requires receivers to ignore), CONFORMANCE.md must say so explicitly, and a raw byte-preserving layer must remain available for forwarding paths.
Hostile-input safety. No panics on any input: checked arithmetic on all length/offset math, enforced decode limits (message length, element count, recursion depth), and negative tests for truncation, overflow, and depth bombs.
Fuzzing. A fuzz target over the decode surface with a seed corpus of spec-valid messages, registered in the fuzz CI workflow. The fuzz crate must compile in CI even when fuzzing is not executed.
Framework fit. Codecs implement the opc-protocol traits (BorrowDecode/OwnedDecode/Encode) and carry @spec/@req traceability tags so RFC 006 evidence tooling can index them.
CONFORMANCE.md enumerates exactly which messages, elements, and fields are covered, at which 3GPP release, and what belongs outside the codec boundary.
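
Several of these requirements (checked length math, an element-count limit, raw preservation of unknown elements, byte-exact round-trips) can be shown on a toy TLV format. The format below (1-byte type, 2-byte big-endian length, value) is hypothetical and is not PFCP, GTP-U, or any 3GPP encoding; the function and type names are likewise illustrative and do not use the opc-protocol traits.

```rust
use std::convert::TryFrom;

#[derive(Debug, PartialEq)]
pub enum CodecError {
    Truncated,
    TooManyElements,
    LengthOverflow,
}

/// Raw element: the type is kept as-is and the value bytes are preserved verbatim,
/// so unknown or vendor-specific elements survive a decode/encode cycle.
#[derive(Debug, PartialEq, Clone)]
pub struct RawElement {
    pub ie_type: u8,
    pub value: Vec<u8>,
}

/// Enforced decode limit on the number of elements per message.
pub const MAX_ELEMENTS: usize = 64;

pub fn decode(mut input: &[u8]) -> Result<Vec<RawElement>, CodecError> {
    let mut out = Vec::new();
    while !input.is_empty() {
        if out.len() >= MAX_ELEMENTS {
            return Err(CodecError::TooManyElements);
        }
        if input.len() < 3 {
            return Err(CodecError::Truncated);
        }
        let len = usize::from(u16::from_be_bytes([input[1], input[2]]));
        // Checked arithmetic on the offset, then a bounds-checked slice.
        let end = 3usize.checked_add(len).ok_or(CodecError::LengthOverflow)?;
        let value = input.get(3..end).ok_or(CodecError::Truncated)?;
        out.push(RawElement { ie_type: input[0], value: value.to_vec() });
        input = &input[end..];
    }
    Ok(out)
}

pub fn encode(elements: &[RawElement]) -> Result<Vec<u8>, CodecError> {
    let mut out = Vec::new();
    for element in elements {
        let len = u16::try_from(element.value.len()).map_err(|_| CodecError::LengthOverflow)?;
        out.push(element.ie_type);
        out.extend_from_slice(&len.to_be_bytes());
        out.extend_from_slice(&element.value);
    }
    Ok(out)
}
```

A conformance test for a real codec would feed this kind of decoder fixtures written by hand from the specification, assert that encode(decode(bytes)) reproduces the input exactly, and add negative cases for truncation and limit violations.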

Consequences

Writing a codec costs more up front: authoring fixtures from the spec is slower than round-tripping the encoder. That cost is the point — it is the only test construction that catches self-consistent wire errors.
Reviews of codec changes start from the fixtures: a reviewer verifies bytes against the cited spec section before reading the implementation.
opc-proto-gtpu, opc-proto-pfcp, and opc-proto-nas conform today and serve as the templates; future codecs (NGAP per ADR 0013) inherit the same bar.

ADR 0016: Northbound gRPC Stack Exception (gNMI)

Status

Accepted

Date

2026-06-13

Context

ADR 0014 §3 states: "No gRPC stack (tonic/prost) in SDK crates. … A future exception requires an ADR, not a Cargo.toml edit." That rule keeps the core SDK dependency graph lean and auditable: internal transports use hand-specified framing over tokio/rustls, and external 3GPP interfaces are HTTP/2 (hyper) or raw protocol codecs.

The management-plane work introduces opc-gnmi-server (see docs/design/opc-gnmi-server-spec.md). gNMI (OpenConfig) is a gRPC service: its contract is a protobuf service over HTTP/2. There is no rustls/hyper-only or hand-framed path to a conformant gNMI server: gNMI clients (gnmic, OpenConfig collectors) speak gRPC and nothing else. So opc-gnmi-server cannot exist without a gRPC stack, and per ADR 0014 §3 that requires this ADR.

gNMI is a distinct dependency category from the cases ADR 0014 §3 was written for. It is a northbound management interface embedded by a CNF that chooses to expose gNMI — not an internal SDK transport and not a 3GPP data-plane codec.

Decision

Permit tonic, prost, and prost-types only for the northbound gNMI server crate, opc-gnmi-server. prost-types is included because the vendored OpenConfig gNMI proto uses standard Google protobuf types such as google.protobuf.Any. tonic-build is permitted only as that crate's build-time proto-generation dependency if the Phase-0 spike chooses build-time generation. Any future gRPC-based management crate requires an explicit ADR amendment and an update to the mechanical allow-list; this exception is not a blanket "management crates may use gRPC" policy. Specifically:

Scope boundary. tonic/prost/prost-types MUST NOT appear in any core SDK crate (opc-config-bus, opc-config-model, opc-persist, opc-runtime, opc-identity, opc-tls, opc-nacm, opc-yanggen, the opc-proto-* codecs, opc-sbi, the opc-mgmt-* foundation crates, etc.). They live only in opc-gnmi-server unless this ADR is amended. ADR 0014 §3 remains in force everywhere else. Inside this SDK workspace, no other crate may depend on or re-export opc-gnmi-server; downstream CNFs outside the workspace opt in to gNMI by depending on the server crate directly.
Boundary is enforced mechanically. scripts/check-management-plane-policy.py --check asserts that no crate outside the explicit allowed set directly or transitively depends on tonic/prost/prost-types/tonic-build, or on opc-gnmi-server itself. The CI job runs this gate. The initial allowed set is exactly opc-gnmi-server.
One TLS stack only (ADR 0014 §1 preserved). opc-gnmi-server serves tonic over the rustls::ServerConfig produced by opc-mgmt-transport (ring provider), not tonic's own/native TLS. No openssl/native-tls enters the graph (verify tonic/hyper features with default-features = false, rustls only).
Dependency hygiene (ADR 0014 §6/§7). tonic/prost/prost-types are MIT/Apache — compatible with the license gate. The PR adding them justifies them per §7 and passes cargo deny. The pinned tonic version MUST compile on the workspace MSRV (currently 1.88, ADR 0014 §5); the Phase-0 spike validates this before the version is pinned, and any MSRV bump follows the §5 process.
Proto pin and generation mode. The gNMI proto is vendored at an exact tag under crates/opc-gnmi-server/proto/; the vendored files carry the upstream tag/commit in their header, and the advertised gNMI version string derives from this pin. The Phase-0 spike must choose and document exactly one generation mode:
- build-time generation with tonic-build, which adds an explicit protoc build prerequisite and a CI check that generated output is reproducible; or
- checked-in generated Rust, which avoids protoc in downstream builds but requires a regeneration script and a CI drift check. In either mode, generated service code is treated as part of the opc-gnmi-server boundary and does not become a shared SDK dependency.
This exception does not generalize. It authorizes a gRPC server for a northbound management protocol that is gRPC by definition. It is not license to adopt gRPC for internal transports or to relax ADR 0014 §3 for core crates.
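
The mechanical boundary check amounts to a reachability test over the crate dependency graph. The real gate is the Python script named above; the Rust sketch below only illustrates the idea, and the violations function, its graph representation, and the toy dependency edges in the usage note are assumptions.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Report every crate outside `allowed` that reaches a `forbidden` crate,
/// directly or transitively. `graph` maps a crate name to its direct dependencies.
pub fn violations(
    graph: &BTreeMap<&str, Vec<&str>>,
    forbidden: &[&str],
    allowed: &[&str],
) -> Vec<String> {
    let mut found = Vec::new();
    for root in graph.keys() {
        // Allowed crates may use the stack; forbidden crates are not checked as roots.
        if allowed.contains(root) || forbidden.contains(root) {
            continue;
        }
        let mut stack = vec![*root];
        let mut seen = BTreeSet::new();
        while let Some(node) = stack.pop() {
            if !seen.insert(node) {
                continue;
            }
            if node != *root && forbidden.contains(&node) {
                found.push(format!("{} -> {}", root, node));
                continue;
            }
            if let Some(deps) = graph.get(node) {
                stack.extend(deps.iter().copied());
            }
        }
    }
    found
}
```

With forbidden set to tonic, prost, prost-types, tonic-build, and opc-gnmi-server, and allowed set to exactly opc-gnmi-server, a graph in which only opc-gnmi-server depends on tonic produces no violations, while a core crate that pulls prost through an intermediate helper crate is reported along with that helper.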

Consequences

A downstream CNF outside this workspace that embeds opc-gnmi-server inherits tonic/prost/prost-types. That is an explicit opt-in to gNMI; CNFs that do not expose gNMI never pull the stack.
The core SDK graph stays gRPC-free and auditable, exactly as ADR 0014 §3 intends; only the optional northbound server adds gRPC.
The mechanical gate from point 2 exists and runs in CI, so this exception's scope cannot silently erode — the same "implicit policy does not survive maintenance" lesson that motivated ADR 0014.
NETCONF (opc-netconf-server) is unaffected: it is XML over SSH/TLS and needs no gRPC stack.

ADR 0017: SCTP Transport Strategy and Unsafe-FFI Sys-Crate Boundary

Status

Accepted

Date

2026-06-13

Context

ADR 0014 §8 states unsafe_code = "forbid" is workspace-wide and "non-negotiable, which also rules out FFI-based protocol libraries (see ADR 0013)." ADR 0013 rejected Option C — FFI to the srsRAN/OAI C NGAP codec — because foreign C code parsing attacker-controlled bytes turns memory-safety bugs into SDK security issues.

opc-sctp is required for CNFs that terminate N2/NGAP or other SCTP interfaces. Unlike NGAP, SCTP is not a codec; it is an OS transport. Linux implements SCTP in the kernel (lksctp); a userspace program reaches it through SCTP sockets: socket(AF_INET, type, IPPROTO_SCTP) with type either SOCK_STREAM (one-to-one) or SOCK_SEQPACKET (one-to-many), SCTP setsockopt options, sendmsg/recvmsg with SCTP control messages, and, where necessary, thin libsctp helper calls such as bind/send/receive variants over the same kernel SCTP UAPI. Rust's std and tokio expose no SCTP socket API, so reaching kernel SCTP requires libc/UAPI FFI, which is unsafe. ADR 0014 §8 was written for protocol codec libraries and did not anticipate an OS-transport syscall surface.

The distinction is decisive:

ADR 0013's rejected FFI links a large foreign C parser (thousands of lines) that consumes attacker-controlled wire bytes. The attack surface is the C code itself.
SCTP FFI is a thin wrapper over kernel socket UAPI and optional libsctp helper functions that themselves configure or call the kernel SCTP stack. The SCTP protocol implementation is the kernel — already trusted, exactly as for TCP/UDP. This is the same category of unsafe that tokio/mio already use internally for socket I/O in the workspace. The "foreign C parsing attacker bytes" risk ADR 0013 guarded against simply is not present.
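To make the "thin wrapper" category concrete, here is a minimal sketch, assuming Linux and a Rust toolchain recent enough to accept unsafe extern blocks. It declares socket(2) by hand instead of using the libc crate so that it stays self-contained. The module name sys, both function names, and the hard-coded constants are illustrative assumptions, not the opc-libsctp-sys or opc-sctp API.

```rust
use std::io;
use std::os::fd::OwnedFd;

/// Hypothetical stand-in for a sys crate: the only place `unsafe` appears.
#[allow(unsafe_code)]
#[deny(unsafe_op_in_unsafe_fn)]
mod sys {
    use std::ffi::c_int;
    use std::io;
    use std::os::fd::{FromRawFd, OwnedFd};

    // Linux values for the socket(2) arguments named in the text.
    const AF_INET: c_int = 2;
    const SOCK_SEQPACKET: c_int = 5;
    const IPPROTO_SCTP: c_int = 132;

    // SAFETY: the declaration matches the C prototype
    // `int socket(int domain, int type, int protocol)`.
    unsafe extern "C" {
        fn socket(domain: c_int, ty: c_int, protocol: c_int) -> c_int;
    }

    /// Opens a one-to-many (SOCK_SEQPACKET) SCTP socket via the kernel UAPI.
    pub fn open_seqpacket() -> io::Result<OwnedFd> {
        // SAFETY: socket(2) takes plain integers and has no memory-safety
        // preconditions; failure is reported through the return value.
        let fd = unsafe { socket(AF_INET, SOCK_SEQPACKET, IPPROTO_SCTP) };
        if fd < 0 {
            return Err(io::Error::last_os_error());
        }
        // SAFETY: `fd` is a freshly created descriptor that nothing else owns.
        Ok(unsafe { OwnedFd::from_raw_fd(fd) })
    }
}

/// Safe wrapper layer: no `unsafe` is permitted in this function.
#[forbid(unsafe_code)]
pub fn open_sctp_socket() -> io::Result<OwnedFd> {
    sys::open_seqpacket()
}
```

On a host without kernel SCTP support the call returns an OS error rather than a descriptor. Either way, no foreign parser runs in the process: the unsafe code only passes integers to the kernel.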

Options

A. Kernel SCTP behind a narrow opc-libsctp-sys sys crate. Thin libc/SCTP-UAPI FFI in one crate, including libsctp helpers only where the Linux SCTP API requires them; a safe opc-sctp wrapper above it. Linux-only.
B. Userspace SCTP stack (pure Rust). Reimplement the SCTP transport protocol with no FFI. Rejected: a from-scratch transport-protocol implementation is large and security-sensitive (association state machine, retransmission, multihoming, chunk bundling) and is more likely to harbor exploitable bugs than thin syscall FFI over the hardened kernel stack; no maintained pure-Rust SCTP stack exists to adopt.
C. Omit SCTP from the SDK. Ship no SCTP transport. Acceptable only if the first production CNF terminates neither N2/NGAP nor any other SCTP interface, because this option blocks every N2-terminating CNF.

Decision

Amend ADR 0014 §8 to permit a narrow, explicitly allowlisted unsafe exception pattern for Linux kernel UAPI sys crates, and adopt Option A when an SCTP-terminating CNF is in scope:

opc-libsctp-sys provides thin FFI over the Linux SCTP socket UAPI and minimal libsctp helpers where required. It is the only SCTP workspace crate permitted to contain unsafe; follow-on Linux kernel UAPI exceptions such as opc-linux-xfrm-sys must be separately and explicitly allowlisted by the same mechanical gate. The crate does not inherit [workspace.lints], so the workspace-wide unsafe_code = "forbid" stays in force for every other crate. Instead it sets its own local crate policy (unsafe_code = "allow" plus unsafe_op_in_unsafe_fn = "deny", or equivalent crate attributes) that allows unsafe only there. Every allowed unsafe token (unsafe block, unsafe fn, unsafe impl, unsafe trait, or unsafe extern block) requires a // SAFETY: comment.
opc-sctp (the public crate) is #![forbid(unsafe_code)] and exposes only safe async abstractions (associations, messages, events) over the sys crate, integrated with tokio::io::unix::AsyncFd (the spec's async model). Its manifest must declare the tokio features it relies on, including net, instead of relying on feature unification from unrelated workspace crates.
Boundary is enforced mechanically. Running scripts/check-management-plane-policy.py --check token-scans the OpenPacketCore workspace crate sources and asserts that unsafe appears only in explicitly allowlisted Linux UAPI sys crates (opc-libsctp-sys and later, reviewed kernel-UAPI boundaries such as opc-linux-xfrm-sys). The same gate rejects an allowed sys crate that inherits [workspace.lints] or that lacks the required local unsafe lint policy, and it requires each allowed unsafe token in that sys crate to be documented by an adjacent SAFETY: comment. The CI job runs this gate, so the exception cannot silently spread or become undocumented.
ABI safety. Every C struct crossing the boundary has a struct-layout (size/alignment/offset) test; the sys crate builds on Linux in CI and compiles to a clean "unsupported platform" stub elsewhere.
This exception pattern does not reopen ADR 0013. It authorizes FFI only to explicitly reviewed trusted Linux kernel UAPI boundaries such as SCTP socket/XFRM netlink calls and minimal helper calls that wrap those UAPIs. FFI that links a foreign C protocol codec (parsing attacker-controlled bytes — NGAP/NAS/etc.) remains rejected; those stay pure-Rust per ADR 0013/0015.
SCTP is implemented per Option A behind this boundary, never as scattered unsafe and never as a userspace reimplementation without revisiting this ADR.
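The struct-layout requirement under "ABI safety" can be illustrated with a small check. This is a sketch under stated assumptions: SctpSndInfo is a hypothetical Rust mirror of the Linux struct sctp_sndinfo (two 16-bit fields, two 32-bit fields, and a signed 32-bit association ID), and the helper names are invented for illustration. The sys crate's real bindings and tests may be organised differently.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical Rust mirror of Linux `struct sctp_sndinfo`.
#[repr(C)]
#[derive(Default)]
pub struct SctpSndInfo {
    pub snd_sid: u16,
    pub snd_flags: u16,
    pub snd_ppid: u32,
    pub snd_context: u32,
    pub snd_assoc_id: i32, // sctp_assoc_t is a signed 32-bit integer on Linux
}

/// Byte offset of `field` inside `base`, computed from addresses only.
fn offset_of_field<T, F>(base: &T, field: &F) -> usize {
    (field as *const F as usize) - (base as *const T as usize)
}

/// Returns (size, alignment, field offsets in declaration order).
pub fn sndinfo_layout() -> (usize, usize, [usize; 5]) {
    let v = SctpSndInfo::default();
    let offsets = [
        offset_of_field(&v, &v.snd_sid),
        offset_of_field(&v, &v.snd_flags),
        offset_of_field(&v, &v.snd_ppid),
        offset_of_field(&v, &v.snd_context),
        offset_of_field(&v, &v.snd_assoc_id),
    ];
    (size_of::<SctpSndInfo>(), align_of::<SctpSndInfo>(), offsets)
}
```

A layout test would compare the returned tuple against the values implied by the C definition: 16 bytes, 4-byte alignment, and offsets 0, 2, 4, 8, 12. A field reordered or widened by mistake would then fail in CI instead of corrupting a control message at runtime.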

Consequences

The workspace gains small, auditable OpenPacketCore Linux UAPI sys crates containing unsafe; downstream carrier auditors review those explicitly allowlisted sys crates rather than a diffuse unsafe surface, and unsafe_code = "forbid" remains true everywhere else.
The CI gate from point 3 exists, mirroring the "policy must be mechanically enforced" lesson of ADR 0014.
opc-sctp uses the non-inheritance mechanism and AsyncFd model described by this ADR. Its README and tests record the current capability profile.
NGAP-over-SCTP wiring (PPID 60) is separate integration work and is not authorized to use FFI for the NGAP codec itself.
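The SAFETY-comment rule that the CI gate enforces can be pictured as a line scan. The real gate is the Python policy script named in the Decision; the Rust function below is a hypothetical, deliberately simplified stand-in. It treats "adjacent" as "on the same line or on the comment line directly above", and it ignores block comments and string literals, which a real scanner would need to handle.

```rust
/// Returns the 1-based line numbers where an `unsafe` token has no
/// `SAFETY:` comment on the same line or on the comment line just above.
pub fn undocumented_unsafe_lines(source: &str) -> Vec<usize> {
    let lines: Vec<&str> = source.lines().collect();
    let mut missing = Vec::new();
    for (idx, line) in lines.iter().enumerate() {
        // Only the code portion counts; text after `//` is a comment.
        let code = line.split("//").next().unwrap_or("");
        // Split into identifier-like tokens so `unsafe_code` does not match.
        let has_unsafe = code
            .split(|c: char| !(c.is_alphanumeric() || c == '_'))
            .any(|tok| tok == "unsafe");
        if !has_unsafe {
            continue;
        }
        let same_line = line.contains("SAFETY:");
        let line_above = idx > 0
            && lines[idx - 1].trim_start().starts_with("//")
            && lines[idx - 1].contains("SAFETY:");
        if !(same_line || line_above) {
            missing.push(idx + 1);
        }
    }
    missing
}
```

A gate built this way fails the build when the returned list is non-empty for an allowlisted sys crate, and fails on any unsafe token at all in every other crate.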

OPC gNMI Server Design Spec

Status

Implemented foundation, owned by opc-gnmi-server.

Scope

opc-gnmi-server is the optional northbound gNMI server for CNFs that choose to expose OpenConfig management. It is outside the core SDK dependency graph and is the only workspace crate allowed to depend on tonic, prost, prost-types, or tonic-build.

The crate owns:

vendored gNMI protobuf bindings and the tonic service wrapper;
authenticated gNMI-over-TLS listener integration;
Capabilities, Get, Set, and Subscribe handling;
OpenPacketCore commit-confirmed registered extension semantics;
gNMI master-arbitration enforcement;
schema-backed path, value, audit, metrics, and config-bus integration.

Security Contract

Production embeddings must construct GnmiServer with an explicit audit sink through new, new_with_audit, new_with_arbitration, or new_with_audit_and_arbitration. The tracing audit sink is available only through *_dev_only constructors for tests, conformance fixtures, and local development.

GnmiService::new requires an authenticated transport principal on every RPC. The unauthenticated service wrapper is crate-private and compiled only for tests. Runtime listeners must derive principals from the mTLS transport and attach them to requests before dispatch.

Set commits submit complete candidates to opc-config-bus with the running snapshot version they were built from. opc-config-bus enforces that base version for candidate-bearing requests, so a stale gNMI Set cannot overwrite an intervening commit.
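The base-version rule is a form of optimistic concurrency control. The sketch below models it with a hypothetical RunningConfig type and CommitError enum; neither name is taken from opc-config-bus, and the real bus also validates, authorizes, persists, and publishes, none of which is shown here.

```rust
#[derive(Debug, PartialEq)]
pub enum CommitError {
    /// The candidate was built from a snapshot that is no longer running.
    StaleBase { base: u64, running: u64 },
}

/// Hypothetical running datastore with a monotonically increasing version.
pub struct RunningConfig {
    version: u64,
    data: String,
}

impl RunningConfig {
    pub fn new(data: &str) -> Self {
        Self { version: 1, data: data.to_string() }
    }

    pub fn version(&self) -> u64 {
        self.version
    }

    pub fn data(&self) -> &str {
        &self.data
    }

    /// Applies a complete candidate only if it was built from the
    /// currently running version; otherwise leaves the config untouched.
    pub fn commit(&mut self, base_version: u64, candidate: &str) -> Result<u64, CommitError> {
        if base_version != self.version {
            return Err(CommitError::StaleBase {
                base: base_version,
                running: self.version,
            });
        }
        self.data = candidate.to_string();
        self.version += 1;
        Ok(self.version)
    }
}
```

If two writers both read version 1, the first commit succeeds and moves the running config to version 2. The second writer's commit, still based on version 1, is rejected and must rebuild its candidate from the new snapshot.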

Extension Semantics

The OpenPacketCore commit-confirmed extension uses the experimental registered extension ID documented in opc-gnmi-server. It is advertised only when the extension registry enables it and master arbitration is also configured.

Every commit-confirmed Begin, Confirm, or Cancel Set must carry a valid master-arbitration extension. This binds control actions to the gNMI election fence for the tenant and role, preventing a different writer from confirming or cancelling another writer's pending commit unless it wins arbitration first.

Servers with arbitration disabled reject commit-confirmed registration at construction time.
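The fencing rules in this section can be sketched as follows. The model is a simplification under stated assumptions: the Fence type, FenceError enum, and enable_commit_confirmed function are hypothetical names, the election ID is modelled as a single u128, and writer identity is not tracked. Only the "highest election ID per tenant and role wins" behaviour is shown.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum FenceError {
    /// A commit-confirmed Set arrived without a master-arbitration extension.
    MissingArbitration,
    /// The caller's election ID is lower than the current master's.
    NotMaster,
    /// Commit-confirmed was requested on a server without arbitration.
    ArbitrationDisabled,
}

/// Hypothetical election fence keyed by (tenant, role).
#[derive(Default)]
pub struct Fence {
    highest: HashMap<(String, String), u128>,
}

impl Fence {
    /// Admits a Begin, Confirm, or Cancel only from the current election winner.
    pub fn admit(
        &mut self,
        tenant: &str,
        role: &str,
        election_id: Option<u128>,
    ) -> Result<(), FenceError> {
        let id = election_id.ok_or(FenceError::MissingArbitration)?;
        let key = (tenant.to_string(), role.to_string());
        let current = self.highest.entry(key).or_insert(id);
        if id < *current {
            return Err(FenceError::NotMaster);
        }
        *current = id; // a higher ID wins arbitration and becomes the fence
        Ok(())
    }
}

/// Construction-time check: commit-confirmed requires arbitration.
pub fn enable_commit_confirmed(arbitration_enabled: bool) -> Result<(), FenceError> {
    if arbitration_enabled {
        Ok(())
    } else {
        Err(FenceError::ArbitrationDisabled)
    }
}
```

In this model a writer holding election ID 4 cannot confirm or cancel a commit begun under ID 5. It must first present a higher ID, which is the "wins arbitration first" condition described above.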

Dependency Boundary

ADR 0016 permits the gRPC stack only in opc-gnmi-server. The CI policy script must continue to enforce that:

no other workspace crate depends on tonic, prost, prost-types, or tonic-build;
no other workspace crate depends on or re-exports opc-gnmi-server;
all gNMI TLS serving uses the shared rustls configuration built by the OPC management transport stack.
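The first two rules reduce to a membership test over each crate's declared dependencies. The function below is a hypothetical sketch of that test, not the CI policy script: it assumes the dependency names have already been extracted from a manifest, and it does not cover re-exports or the shared rustls rule.

```rust
/// Crates that make up the gRPC stack permitted only in opc-gnmi-server.
const GRPC_STACK: [&str; 4] = ["tonic", "prost", "prost-types", "tonic-build"];
const GNMI_CRATE: &str = "opc-gnmi-server";

/// Returns the dependencies of `crate_name` that violate the boundary.
pub fn boundary_violations<'a>(crate_name: &str, deps: &[&'a str]) -> Vec<&'a str> {
    if crate_name == GNMI_CRATE {
        // The gNMI server itself is the one allowed home of the gRPC stack.
        return Vec::new();
    }
    deps.iter()
        .copied()
        .filter(|d| GRPC_STACK.iter().any(|g| *g == *d) || *d == GNMI_CRATE)
        .collect()
}
```

For example, a core crate that lists tokio, tonic, and opc-gnmi-server would be reported for the last two, while the same list declared by opc-gnmi-server itself produces no violations.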

Verification

The gNMI foundation is covered by crate tests for:

authenticated Capabilities, Get, Set, and Subscribe behavior;
Set stale-candidate rejection after intervening commits;
commit-confirmed timeout, confirm, cancel, malformed payload, and missing arbitration cases;
master-arbitration election, tenant, and role fencing;
listener mTLS principal derivation and max-session bounds;
extension payload redaction in status, metrics, and audit paths.

OpenPacketCore SDK Documentation