WDP Part 5: Compact IDs Specification

Abstract

This document specifies the Compact ID generation mechanism for the Waddling Diagnostic Protocol (WDP). A Compact ID constitutes the fifth component of the complete WDP error code structure. This specification defines the transformation of the four-part structured code (SEVERITY.COMPONENT.PRIMARY.SEQUENCE) into a deterministic, hash-based identifier optimized for efficient logging, network transmission, and catalog lookup. The Compact ID is derived exclusively from the current structured code; modifications to the structured code produce a correspondingly different Compact ID.

Specification Navigation: Refer to STRUCTURE.md for an overview of all WDP specification documents.

Status of This Memo

This document specifies a standards track protocol for the WDP community and requests discussion and suggestions for improvements. Distribution of this memo is unlimited.

Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

1. Introduction

1.1 Purpose

WDP Compact IDs are the fifth part of the complete WDP error code structure:

Complete WDP Structure:

Severity.Component.Primary.Sequence -> CompactID
    ^         ^        ^        ^          ^
  Part 1    Part 2   Part 3   Part 4    Part 5

This Document Focus:

Severity.Component.Primary.Sequence -> CompactID
                                       ^^^^^^^^^
                                       Part 5: Short hash-based identifier (THIS DOCUMENT)

The first four parts constitute the structured code; the Compact ID is derived from this structured code.

Compact IDs address the requirement for efficient error information transmission across networks while preserving full context and traceability.

Full structured codes (e.g., E.AUTH.TOKEN.001) exhibit the following characteristics:

Human-readable format
Self-documenting structure
Searchable content

However, full structured codes present certain limitations:

Verbosity (16 characters in the example above)
Increased bandwidth consumption in high-volume systems

Compact IDs (e.g., V6a0B) provide the following properties:

Reduced size (5 characters, representing approximately 70% reduction)
Deterministic output (identical input produces identical output)
Collision resistance (916,132,832 possible combinations)
Safe for use in URLs, JSON documents, and filenames
Efficient generation and comparison
Dynamic derivation (modifications to the structured code produce different identifiers)

1.2 Use Cases

Constrained Network Devices:

Sensor transmits: {"h":"V6a0B","t":45.2} (22 bytes)
Gateway performs catalog expansion to obtain full error context

Mobile Applications:

1. Application downloads catalog once (approximately 50KB, cacheable)
2. API returns compact error representations (approximately 40 bytes each)
3. Application performs local expansion (offline-capable)

Log Aggregation:

Microservice logs: [V6a0B] Request timeout
Centralized system performs cross-instance search by Compact ID
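
The cross-instance search can be sketched as follows. This is a minimal illustration, assuming log lines carry the Compact ID in square brackets as in the example above; the function name `group_by_compact_id` and the sample log lines are hypothetical.

```python
import re
from collections import defaultdict

# Five Base62 characters in square brackets, e.g. "[V6a0B]".
COMPACT_ID_RE = re.compile(r"\[([0-9A-Za-z]{5})\]")


def group_by_compact_id(lines):
    """Group log lines by the first bracketed Compact ID found in each line."""
    groups = defaultdict(list)
    for line in lines:
        match = COMPACT_ID_RE.search(line)
        if match:
            groups[match.group(1)].append(line)
    return dict(groups)


# Hypothetical log lines from two service instances.
logs = [
    "svc-a [V6a0B] Request timeout",
    "svc-b [V6a0B] Request timeout",
    "svc-a [35Jkp] Token rejected",
]
grouped = group_by_compact_id(logs)
```

Note that the pattern also matches any other bracketed five-character alphanumeric token (for example a log level), so a real aggregator would likely confirm each candidate against the catalog.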

Heterogeneous Language Environments:

Rust backend transmits: {"h":"V6a0B"}
TypeScript frontend receives and expands via shared catalog
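
Catalog expansion on the receiving side might look like the sketch below. The catalog layout, the field names `code` and `message`, and the function `expand` are assumptions made for illustration; the actual catalog format is defined in other WDP documents.

```python
import json

# Hypothetical catalog: Compact ID -> structured code and message.
# The real catalog format is defined elsewhere in the WDP specifications.
CATALOG = {
    "V6a0B": {"code": "E.AUTH.TOKEN.001", "message": "example message"},
}


def expand(payload: str, catalog: dict) -> dict:
    """Merge the catalog entry for the compact "h" field into the payload."""
    data = json.loads(payload)
    entry = catalog.get(data.get("h"))
    if entry is None:
        return data  # unknown ID: pass the payload through unchanged
    return {**data, **entry}


expanded = expand('{"h":"V6a0B"}', CATALOG)
```

Because the lookup is a plain dictionary access, expansion works offline once the catalog has been cached.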

1.3 Design Goals

The Compact ID mechanism is designed to satisfy the following requirements:

Determinism: Identical error codes MUST produce identical Compact IDs
Dynamic Derivation: Compact IDs MUST be computed from current structured codes
Collision Resistance: The probability of hash conflicts MUST be minimized
Cross-Language Consistency: Hash computation MUST produce identical results across all implementations
Computational Efficiency: Generation time MUST be sub-microsecond
Fixed Length: Output MUST be exactly 5 characters
Character Safety: Output MUST be safe for URLs, JSON, and filenames without escaping

1.4 Conformance Levels

An implementation is considered conformant at Level 1 (Standard) if it correctly implements all requirements marked with MUST, REQUIRED, or SHALL in this specification, in addition to achieving Level 0 (Error Codes) conformance as defined in 1-SEVERITY.md.

2. Overview

2.1 Process Flow

The Compact ID is derived from the four-part structured code:

┌─────────────────────────────────────────┐
│  Structured Code (Parts 1-4)            │
│  "E.AUTH.TOKEN.001"                     │
└──────────────┬──────────────────────────┘
               │
               ▼
┌─────────────────────────────────────────┐
│  Step 1: Input Normalization            │
│  - Validate format                      │
│  - Encode as UTF-8 bytes                │
└──────────────┬──────────────────────────┘
               │
               ▼
┌─────────────────────────────────────────┐
│  Step 2: Hash Generation                │
│  - Apply xxHash3 algorithm              │
│  - Use seed: 0x000031762D706477         │
│  - Output: 64-bit unsigned integer      │
└──────────────┬──────────────────────────┘
               │
               ▼
┌─────────────────────────────────────────┐
│  Step 3: Bit Extraction                 │
│  - Extract bytes 3-7 (40 bits)          │
│  - Output: 40-bit unsigned integer      │
└──────────────┬──────────────────────────┘
               │
               ▼
┌─────────────────────────────────────────┐
│  Step 4: Base62 Encoding                │
│  - Convert 40-bit value to base62       │
│  - Pad to exactly 5 characters          │
└──────────────┬──────────────────────────┘
               │
               ▼
┌─────────────────────────────────────────┐
│  Compact ID (Part 5)                    │
│  "V6a0B"                                │
└─────────────────────────────────────────┘

Complete WDP Error Code:
[E.AUTH.TOKEN.001] -> V6a0B

2.2 Example

Input:  "E.AUTH.TOKEN.001" (structured code, parts 1-4)
        | Normalize (UTF-8 bytes)
        v
Bytes:  [69, 46, 65, 85, 84, 72, 46, 84, 79, 75, 69, 78, 46, 48, 48, 49]
        | Hash (xxHash3, seed=0x000031762D706477)
        v
Hash:   [64-bit value]
        | Extract bytes 3-7 (40 bits)
        v
Value:  [40-bit value]
        | Encode (Base62, pad to 5 chars)
        v
Output: "V6a0B" (Compact ID, part 5)

Complete: [E.AUTH.TOKEN.001] -> V6a0B

Note that modifications to the structured code produce different Compact IDs:

Original: [E.AUTH.TOKEN.001] -> V6a0B
Modified: [E.AUTHENTICATION.TOKEN.001] -> Bz93k

The Compact ID is derived dynamically from the current structured code.

3. Input Normalization

3.1 Purpose

Input normalization ensures that all implementations produce identical byte sequences from the same structured error code (parts 1-4), regardless of platform, language, or locale. This guarantees that identical structured codes produce identical Compact IDs.

3.2 Input Validation

Prior to normalization, implementations MUST validate that the input conforms to a valid WDP structured code (parts 1-4) as specified in the Error Codes specification (1-ERROR-CODES.md).

Validation Requirements:

Input MUST match the format: SEVERITY.COMPONENT.PRIMARY.SEQUENCE
The severity field MUST be one of: E, W, C, B, S, K, I, T, H
The component and primary fields MUST conform to PascalCase conventions (or any case if normalization is applied)
The sequence field MUST be numeric (3 digits) or named (SCREAMING_SNAKE_CASE)

Invalid inputs MUST be rejected prior to hashing.
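
The validation requirements above can be approximated with a regular expression. This is a sketch only: the authoritative grammar lives in the Error Codes specification, and the assumptions here are that component and primary fields are ASCII alphanumerics starting with a letter, and that the severity letter is accepted in either case because it is uppercased later.

```python
import re

SEVERITIES = frozenset("EWCBSKITH")

# Assumed shape: SEVERITY.COMPONENT.PRIMARY.SEQUENCE, ASCII only.
_CODE_RE = re.compile(
    r"(?P<severity>[A-Za-z])"
    r"\.(?P<component>[A-Za-z][A-Za-z0-9]*)"
    r"\.(?P<primary>[A-Za-z][A-Za-z0-9]*)"
    r"\.(?P<sequence>[0-9]{3}|[A-Z][A-Z0-9_]*)"  # 3 digits or SCREAMING_SNAKE_CASE
)


def is_valid_wdp_code(code: str) -> bool:
    """Return True if the string has the shape of a WDP structured code."""
    match = _CODE_RE.fullmatch(code)
    return match is not None and match["severity"].upper() in SEVERITIES
```

Under these assumptions, `is_valid_wdp_code("E.Auth.Token.001")` is accepted, while an empty string, an unknown severity such as `X`, or a non-ASCII component is rejected.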

3.3 String Preparation

Step 1: Whitespace Handling

Implementations MUST remove leading and trailing whitespace from the input string prior to processing.

Input:  "  E.Auth.Token.001  "
Result: "E.Auth.Token.001"

Step 2: Named Sequence Resolution

If the sequence field contains a named sequence (e.g., MISSING), implementations MUST resolve it to its numeric canonical form prior to hashing.

Input:    E.Auth.Token.MISSING
Resolved: E.Auth.Token.001

Step 3: Uppercase Normalization

The entire structured code MUST be converted to uppercase prior to hashing.

Input:      E.Auth.Token.001
Normalized: E.AUTH.TOKEN.001

3.4 Normalization Example

The following demonstrates the complete normalization process:

Step 0: Input (as received)
  "  E.Auth.Token.MISSING  "

Step 1: Remove whitespace
  "E.Auth.Token.MISSING"

Step 2: Resolve named sequence (if applicable)
  "E.Auth.Token.001"

Step 3: Convert to uppercase
  "E.AUTH.TOKEN.001"

Step 4: UTF-8 encode
  [0x45, 0x2E, 0x41, 0x55, 0x54, 0x48, 0x2E, 0x54, 0x4F, 0x4B, 0x45, 0x4E, 0x2E, 0x30, 0x30, 0x31]

Result: Prepared for hashing

The following inputs produce identical hash values:

E.Auth.Token.001           -> E.AUTH.TOKEN.001 -> identical hash
E.AUTH.TOKEN.001           -> E.AUTH.TOKEN.001 -> identical hash
e.auth.token.001           -> E.AUTH.TOKEN.001 -> identical hash
E.auth.Token.001           -> E.AUTH.TOKEN.001 -> identical hash
E.Auth.Token.MISSING       -> E.AUTH.TOKEN.001 -> identical hash (if MISSING maps to 001)
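
The normalization steps can be sketched in Python as follows. The `named_sequences` mapping is a hypothetical stand-in for however an implementation resolves names such as MISSING; this document does not define that lookup. The input is assumed to have passed validation already.

```python
def normalize(code: str, named_sequences: dict[str, str]) -> bytes:
    """Trim, resolve a named sequence, uppercase, and UTF-8 encode a code."""
    trimmed = code.strip()                                # Step 1
    head, _, sequence = trimmed.rpartition(".")
    # Step 2: hypothetical lookup; unknown names and numeric sequences pass through.
    sequence = named_sequences.get(sequence.upper(), sequence)
    normalized = f"{head}.{sequence}".upper()             # Step 3
    return normalized.encode("utf-8")                     # Step 4


prepared = normalize("  E.Auth.Token.MISSING  ", {"MISSING": "001"})
```

With the mapping shown, `prepared` equals `b"E.AUTH.TOKEN.001"`, the same 16 bytes listed in Step 4 above.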

3.5 UTF-8 Encoding

The validated, resolved, and normalized input string MUST be encoded as UTF-8 bytes.

Character Encoding: UTF-8 as specified in [RFC3629]
Byte Order: Not applicable. UTF-8 is a byte-oriented encoding, so the encoded byte sequence is identical on every platform.

3.6 Edge Cases

Empty String:

Input: ""
Behavior: Implementations MUST reject this input (invalid WDP code)

Non-ASCII Characters:

Input:  "E.テスト.TOKEN.001"
Result: Implementations MUST reject this input (invalid component format)

4. Hash Algorithm

4.1 Required Algorithm: xxHash3

Implementations MUST use xxHash3 (xxh3_64) as the hashing algorithm for Compact ID generation.

Seed: 0x000031762D706477 (ASCII "wdp-v1\0\0" interpreted as little-endian)

4.2 Algorithm Selection Rationale

Criterion                  xxHash3 Characteristics
Throughput                 Approximately 30 GB/s (2x faster than xxHash64)
Small Input Performance    Optimized for 1-128 byte inputs
Determinism                Identical input produces identical output
Cross-Language Support     Available in all major programming languages
Distribution Quality       Excellent distribution for short inputs
Specification Stability    Frozen specification since 2021
Security Classification    Non-cryptographic (appropriate for identification)

4.3 Rationale for Single Algorithm Mandate

Universal Interoperability:

System A (xxHash3): E.Auth.Token.001 -> V6a0B
System B (xxHash3): E.Auth.Token.001 -> V6a0B
System C (xxHash3): E.Auth.Token.001 -> V6a0B

4.4 Security Considerations

Compact IDs are designed for IDENTIFICATION purposes, not AUTHENTICATION. xxHash3 is a non-cryptographic hash function.

Layered Security Architecture:

Security Requirement    Recommended Solution                 Architectural Layer
Transport security      HTTPS/TLS                            Transport
Message integrity       JWT/JWS signing                      Application
Catalog integrity       Ed25519 signature on catalog file    Distribution
Authentication          OAuth/API keys                       Application

4.5 Hash Parameters

Input:  UTF-8 encoded byte array
Seed:   0x000031762D706477 (u64) - ASCII "wdp-v1\0\0" interpreted as little-endian
Output: 64-bit unsigned integer (u64)

4.5.1 Seed Conversion for u64-Based APIs (NORMATIVE)

Seed String:        "wdp-v1"
Seed Bytes (6):     [0x77, 0x64, 0x70, 0x2D, 0x76, 0x31]
Zero-Padded (8):    [0x77, 0x64, 0x70, 0x2D, 0x76, 0x31, 0x00, 0x00]
Interpretation:     Little-Endian
u64 Value (hex):    0x000031762D706477
u64 Value (dec):    54383638242423
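
The conversion can be checked mechanically. The following Python sketch pads the seed string to eight bytes and reads it as a little-endian integer; the helper name `seed_to_u64` is illustrative and not part of the specification.

```python
SEED_STRING = "wdp-v1"


def seed_to_u64(seed: str) -> int:
    """Zero-pad an ASCII seed string to 8 bytes and read it as little-endian."""
    raw = seed.encode("ascii")
    if len(raw) > 8:
        raise ValueError("seed string must fit in 8 bytes")
    padded = raw.ljust(8, b"\x00")
    return int.from_bytes(padded, byteorder="little")


WDP_SEED = seed_to_u64(SEED_STRING)
assert WDP_SEED == 0x000031762D706477
```

Hard-coding the hexadecimal constant, as the snippets in Section 4.5.2 do, avoids repeating this conversion at run time.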

4.5.2 Cross-Language Seed Snippets (NORMATIVE)

Rust:

const WDP_SEED: u64 = 0x000031762D706477;
let hash = xxh3_64_with_seed(input_bytes, WDP_SEED);

Python:

import xxhash  # python-xxhash package

WDP_SEED_U64 = 0x000031762D706477
hash_value = xxhash.xxh3_64_intdigest(input_bytes, seed=WDP_SEED_U64)  # returns the hash as an int

TypeScript:

const WDP_SEED = 0x000031762D706477n;
const hash = xxh3.h64(inputBytes, WDP_SEED);

5. Base62 Encoding

5.1 Purpose

Base62 encoding converts the 40-bit value extracted from the hash (Section 5.3) into a compact, human-readable string using exclusively alphanumeric characters.

5.2 Base62 Alphabet

0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz

Character Set Composition: digits 0-9 encode values 0-9, uppercase A-Z encode values 10-35, and lowercase a-z encode values 36-61.

5.3 Bit Extraction

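The 64-bit hash output from xxHash3 MUST be truncated to 40 bits prior to Base62 encoding. The extracted bits are bytes 3-7 of the hash, counting from the least significant byte as byte 0 (that is, the upper 40 bits).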

function extract_40_bits(hash: u64) -> u64:
    value = (hash >> 24) & 0xFF_FFFF_FFFF
    return value
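
A Python rendering of the same extraction is shown below as a minimal sketch. Python integers are unbounded, so the range check stands in for the fixed-width u64 type of the pseudocode.

```python
MASK_40 = (1 << 40) - 1  # 0xFF_FFFF_FFFF


def extract_40_bits(hash64: int) -> int:
    """Drop the low 24 bits of a 64-bit hash and keep the remaining 40."""
    if not 0 <= hash64 < (1 << 64):
        raise ValueError("hash must be an unsigned 64-bit integer")
    return (hash64 >> 24) & MASK_40


example = extract_40_bits(0x0123456789ABCDEF)  # 0x0123456789
```

For a true 64-bit input the mask is redundant after the shift, but keeping it mirrors the pseudocode and guards against wider inputs in languages with larger integer types.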

5.4 Encoding Algorithm

function to_base62(value: u64) -> String:
    alphabet = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
    base = 62
    // ... encoding logic ...
    // Pad to exactly 5 characters
    return result.join()

Worked Example:

Input:  15379485234 (u64)
Result: "FkRWm"

5.5 Output Format

Length: Exactly 5 characters.
Padding: Left-pad with '0'.
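
The alphabet and padding rules can be illustrated with a small fixed-width encoder. The encoding logic in Section 5.4 is elided in this document, including how a 40-bit value is reduced to five digits, so this sketch makes no claim about that step: it only encodes values below 62^5 and rejects anything larger.

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def to_base62(value: int, width: int = 5) -> str:
    """Encode a non-negative integer as exactly `width` Base62 characters."""
    if not 0 <= value < 62 ** width:
        # Assumption of this sketch: out-of-range values are an error.
        raise ValueError(f"value does not fit in {width} Base62 characters")
    digits = []
    for _ in range(width):
        value, remainder = divmod(value, 62)
        digits.append(ALPHABET[remainder])
    # Digits were produced least significant first; unused positions are '0'.
    return "".join(reversed(digits))
```

Running the loop a fixed number of times yields the left padding automatically: `to_base62(61)` gives `"0000z"` and `to_base62(62)` gives `"00010"`.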

Length     Combinations    Collision Probability (2,000 codes)
5 chars    916M            0.22%
6 chars    56B             0.004%

6. Collision Handling

6.1 Collision Probability

With 5-character Base62 encoding, collision probability follows the birthday paradox.

Number of Error Codes    Collision Probability    Recommendation
100                      0.0005%                  No action required
2,000                    0.22%                    Suitable for most projects
5,000                    1.36%                    Implement collision monitoring
10,000                   5.5%                     Implement collision detection
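
The figures in the table are consistent with the usual small-n birthday approximation p ≈ n² / (2N), where N = 62^5 is the number of possible Compact IDs. The sketch below computes that approximation alongside the slightly tighter exponential form; both function names are illustrative.

```python
import math

N_IDS = 62 ** 5  # 916,132,832 possible 5-character Base62 strings


def collision_probability(n_codes: int, n_ids: int = N_IDS) -> float:
    """Small-n birthday approximation: n^2 / (2N)."""
    return n_codes ** 2 / (2 * n_ids)


def collision_probability_exp(n_codes: int, n_ids: int = N_IDS) -> float:
    """Tighter estimate: 1 - exp(-n(n-1) / (2N))."""
    return -math.expm1(-n_codes * (n_codes - 1) / (2 * n_ids))


for n in (100, 2_000, 5_000, 10_000):
    print(f"{n:>6} codes: {collision_probability(n) * 100:.4f}%")
```

The two estimates agree closely for small catalogs; at 10,000 codes the exponential form gives a somewhat lower figure (about 5.3%).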

6.3 Collision Resolution Strategies

Strategy 1: Modify Sequence Number (Recommended)

E.Auth.Token.001 -> V6a0B (collision)
E.Auth.Token.002 -> Ab3Cd (no collision)
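
Before a sequence number can be changed, the collision has to be found. One way to check a catalog at build time is sketched below; `find_collisions` is a hypothetical helper, and the `compact_id` argument stands for whatever function the implementation uses to compute Compact IDs.

```python
from collections import defaultdict
from typing import Callable, Iterable


def find_collisions(
    codes: Iterable[str], compact_id: Callable[[str], str]
) -> dict[str, list[str]]:
    """Return each Compact ID shared by more than one distinct structured code."""
    by_id = defaultdict(list)
    for code in dict.fromkeys(codes):  # de-duplicate while keeping order
        by_id[compact_id(code)].append(code)
    return {cid: group for cid, group in by_id.items() if len(group) > 1}
```

An empty result means every structured code in the catalog maps to a unique Compact ID; any non-empty entry lists the codes that need one of the resolution strategies in this section.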

Strategy 2: Document in Catalog

7. Implementation Guide

7.1 Core Function

compute_compact_id(input: String) -> String

7.2 Reference Implementation (Rust)

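use xxhash_rust::xxh3::xxh3_64_with_seed; // assumes the xxhash-rust crate ("xxh3" feature)

const WDP_SEED: u64 = 0x000031762D706477; // ASCII "wdp-v1\0\0", little-endian

// `Error`, `is_valid_wdp_code`, and `to_base62` are assumed to be defined elsewhere.
// Named sequences are not resolved here; callers pass the numeric form.
pub fn compute_compact_id(input: &str) -> Result<String, Error> {
    if !is_valid_wdp_code(input) { return Err(Error::InvalidCode); }
    // Trim surrounding whitespace and uppercase the whole code.
    let normalized = input.trim().to_uppercase();
    let hash = xxh3_64_with_seed(normalized.as_bytes(), WDP_SEED);
    // Keep bytes 3-7 (the upper 40 bits) of the 64-bit hash.
    let extracted = (hash >> 24) & 0xFF_FFFF_FFFF;
    Ok(to_base62(extracted))
}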

7.3 Python Implementation

def compute_compact_id(input_str: str) -> str:
    normalized = input_str.strip().upper()
    # ... hash and encode ...

8. Test Vectors

8.1 Purpose

Test vectors ensure that all implementations produce identical Compact IDs for identical input. Implementations MUST pass all test vectors to claim conformance.

8.2 Basic Test Vectors

Input (Structured Code)      Expected Compact ID    Description
E.AUTH.TOKEN.001             V6a0B                  Basic error code
E.AUTH.TOKEN.MISSING         V6a0B                  Named sequence (resolves to 001)
W.DATABASE.CONNECTION.027    KF52S                  Warning with extended components
E.A.B.001                    l3i4I                  Minimal valid code
T.PROFILER.TIMER.999         fnOQk                  Trace with high sequence number

Note: See test-vectors/data/compact-ids.json for authoritative test vectors verified across multiple language implementations.

Modifications to the input code require hash regeneration:

E.AUTH.TOKEN.001 -> V6a0B (original)
E.AUTH.TOKEN.002 -> 35Jkp (modified code produces different hash)

8.3 Edge Case Test Vectors

Input                        Expected Compact ID    Description
E.AUTH.TOKEN.001             V6a0B                  Standard case
E.AUTH.TOKEN.002             35Jkp                  Sequential codes (different hashes)
I.HTTP2SERVER.REQUEST.001    Unzd9                  Digits in component

Note: See test-vectors/data/compact-ids.json for additional test cases.

9. Cross-Language Compatibility

9.1 Requirements

Algorithm: xxHash3 (64-bit)
Seed: 0x000031762D706477
String Encoding: UTF-8
Base62 Alphabet: 0-9, A-Z, a-z, in that order (see Section 5.2)

10. Specification Summary

10.1 Core Requirements

Implementations MUST use xxHash3 with the specified seed and produce exactly 5 Base62 characters.

10.2 Key Design Decisions

xxHash3 was chosen for its performance on small inputs and universal availability.

11. Versioning and Future Compatibility

11.1 Hash Algorithm Immutability

WDP hash parameters are IMMUTABLE and MUST NOT be modified.

11.2 Future Formats

Future versions (v2, v3) SHALL use prefixes (e.g., 2.V6a0B).
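
A consumer that wants to be ready for prefixed identifiers could separate the version from the identifier as sketched below. This assumes, beyond what the text states, that an unprefixed identifier is treated as version 1 and that the prefix is a decimal number followed by a dot; `split_versioned_id` is a hypothetical helper.

```python
def split_versioned_id(text: str) -> tuple[int, str]:
    """Split "2.V6a0B" into (2, "V6a0B"); unprefixed IDs are assumed to be v1."""
    prefix, separator, compact = text.rpartition(".")
    if not separator:
        return 1, text  # assumption: no prefix means the current format
    if not (prefix.isascii() and prefix.isdigit()):
        raise ValueError(f"invalid version prefix: {prefix!r}")
    return int(prefix), compact
```

Since the dot is not part of the Base62 alphabet, the split is unambiguous for any identifier produced under this specification.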