Document: WDP-5
Title: Compact IDs Specification
Version: 0.1.0-draft
Status: Draft
Category: Standards Track (Core)
Created: 2025-11-30
Updated: 2025-12-25

WDP Part 5: Compact IDs Specification

This document specifies the Compact ID generation mechanism for the Waddling Diagnostic Protocol (WDP). A Compact ID constitutes the fifth component of the complete WDP error code structure.

Abstract

This document specifies the Compact ID generation mechanism for the Waddling Diagnostic Protocol (WDP). A Compact ID constitutes the fifth component of the complete WDP error code structure. This specification defines the transformation of the four-part structured code (SEVERITY.COMPONENT.PRIMARY.SEQUENCE) into a deterministic, hash-based identifier optimized for efficient logging, network transmission, and catalog lookup. The Compact ID is derived exclusively from the current structured code; modifications to the structured code produce a correspondingly different Compact ID.

Specification Navigation: Refer to STRUCTURE.md for an overview of all WDP specification documents.

Status of This Memo

This document specifies a standards track protocol for the WDP community and requests discussion and suggestions for improvements. Distribution of this memo is unlimited.

Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

1. Introduction

1.1 Purpose

WDP Compact IDs are the fifth part of the complete WDP error code structure:

Complete WDP Structure:

Severity.Component.Primary.Sequence -> CompactID
    ^         ^        ^        ^          ^
  Part 1    Part 2   Part 3   Part 4    Part 5

This Document Focus:

Severity.Component.Primary.Sequence -> CompactID
                                       ^^^^^^^^^
                                       Part 5: Short hash-based identifier (THIS DOCUMENT)

The first four parts constitute the structured code; the Compact ID is derived from this structured code.

Compact IDs address the requirement for efficient error information transmission across networks while preserving full context and traceability.

Full structured codes (e.g., E.AUTH.TOKEN.001) exhibit the following characteristics:

  • Human-readable format
  • Self-documenting structure
  • Searchable content

However, full structured codes present certain limitations:

  • Verbosity (16 characters for the example above)
  • Increased bandwidth consumption in high-volume systems

Compact IDs (e.g., V6a0B) provide the following properties:

  • Reduced size (5 characters, representing approximately 70% reduction)
  • Deterministic output (identical input produces identical output)
  • Collision resistance (916,132,832 possible combinations)
  • Safe for use in URLs, JSON documents, and filenames
  • Efficient generation and comparison
  • Dynamic derivation (modifications to source code produce different identifiers)

1.2 Use Cases

Constrained Network Devices:

Sensor transmits: {"h":"V6a0B","t":45.2} (22 bytes)
Gateway performs catalog expansion to obtain full error context

Mobile Applications:

1. Application downloads catalog once (approximately 50KB, cacheable)
2. API returns compact error representations (approximately 40 bytes each)
3. Application performs local expansion (offline-capable)

Log Aggregation:

Microservice logs: [V6a0B] Request timeout
Centralized system performs cross-instance search by Compact ID

Heterogeneous Language Environments:

Rust backend transmits: {"h":"V6a0B"}
TypeScript frontend receives and expands via shared catalog
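
The expansion step shared by these use cases can be sketched as a dictionary lookup. This document does not define the catalog format, so the layout below (a mapping from Compact ID to a record with `code` and `message` fields), the message text, and the `expand` helper are all hypothetical.

```python
import json

# Hypothetical catalog: Compact ID -> full error context.
# Field names and message text are illustrative assumptions.
CATALOG = {
    "V6a0B": {"code": "E.AUTH.TOKEN.001", "message": "Example message text"},
}


def expand(payload: str, catalog: dict) -> dict:
    """Replace the compact "h" field of a JSON payload with its catalog entry."""
    data = json.loads(payload)
    compact_id = data.pop("h")
    entry = catalog.get(compact_id)
    if entry is None:
        # Unknown ID: keep it so the receiver can still log or search on it.
        return {"compact_id": compact_id, "known": False, **data}
    return {"compact_id": compact_id, "known": True, **entry, **data}


if __name__ == "__main__":
    print(expand('{"h":"V6a0B","t":45.2}', CATALOG))
```

Because the lookup needs only the locally cached catalog, the same function serves the offline mobile case and the gateway case.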

1.3 Design Goals

The Compact ID mechanism is designed to satisfy the following requirements:

  • Determinism: Identical error codes MUST produce identical Compact IDs
  • Dynamic Derivation: Compact IDs MUST be computed from current structured codes
  • Collision Resistance: The probability of hash conflicts MUST be minimized
  • Cross-Language Consistency: Hash computation MUST produce identical results across all implementations
  • Computational Efficiency: Generation time MUST be sub-microsecond
  • Fixed Length: Output MUST be exactly 5 characters
  • Character Safety: Output MUST be safe for URLs, JSON, and filenames without escaping

1.4 Conformance Levels

An implementation is considered conformant at Level 1 (Standard) if it correctly implements all requirements marked with MUST, REQUIRED, or SHALL in this specification, in addition to achieving Level 0 (Error Codes) conformance as defined in 1-SEVERITY.md.

2. Overview

2.1 Process Flow

The compact ID is derived from the four-part structured code:

┌─────────────────────────────────────────┐
│  Structured Code (Parts 1-4)            │
│  "E.AUTH.TOKEN.001"                     │
└──────────────┬──────────────────────────┘
               │
               ▼
┌─────────────────────────────────────────┐
│  Step 1: Input Normalization            │
│  - Validate format                      │
│  - Encode as UTF-8 bytes                │
└──────────────┬──────────────────────────┘
               │
               ▼
┌─────────────────────────────────────────┐
│  Step 2: Hash Generation                │
│  - Apply xxHash3 algorithm              │
│  - Use seed: 0x000031762D706477         │
│  - Output: 64-bit unsigned integer      │
└──────────────┬──────────────────────────┘
               │
               ▼
┌─────────────────────────────────────────┐
│  Step 3: Bit Extraction                 │
│  - Extract bytes 3-7 (40 bits)          │
│  - Output: 40-bit unsigned integer      │
└──────────────┬──────────────────────────┘
               │
               ▼
┌─────────────────────────────────────────┐
│  Step 4: Base62 Encoding                │
│  - Convert 40-bit value to base62       │
│  - Pad to exactly 5 characters          │
└──────────────┬──────────────────────────┘
               │
               ▼
┌─────────────────────────────────────────┐
│  Compact ID (Part 5)                    │
│  "V6a0B"                                │
└─────────────────────────────────────────┘

Complete WDP Error Code:
[E.AUTH.TOKEN.001] -> V6a0B

2.2 Example

Input:  "E.AUTH.TOKEN.001" (structured code, parts 1-4)
        | Normalize (UTF-8 bytes)
        v
Bytes:  [69, 46, 65, 85, 84, 72, 46, 84, 79, 75, 69, 78, 46, 48, 48, 49]
        | Hash (xxHash3, seed=0x000031762D706477)
        v
Hash:   [64-bit value]
        | Extract bytes 3-7 (40 bits)
        v
Value:  [40-bit value]
        | Encode (Base62, pad to 5 chars)
        v
Output: "V6a0B" (Compact ID, part 5)

Complete: [E.AUTH.TOKEN.001] -> V6a0B

Note that modifications to the structured code produce different Compact IDs:

Original: [E.AUTH.TOKEN.001] -> V6a0B
Modified: [E.AUTHENTICATION.TOKEN.001] -> Bz93k

The Compact ID is derived dynamically from the current structured code.

3. Input Normalization

3.1 Purpose

Input normalization ensures that all implementations produce identical byte sequences from the same structured error code (parts 1-4), regardless of platform, language, or locale. This guarantees that identical structured codes produce identical Compact IDs.

3.2 Input Validation

Prior to normalization, implementations MUST validate that the input conforms to a valid WDP structured code (parts 1-4) as specified in the Error Codes specification (1-ERROR-CODES.md).

Validation Requirements:

  • Input MUST match the format: SEVERITY.COMPONENT.PRIMARY.SEQUENCE
  • The severity field MUST be one of: E, W, C, B, S, K, I, T, H
  • The component and primary fields MUST conform to PascalCase conventions (or any case if normalization is applied)
  • The sequence field MUST be numeric (3 digits) or named (SCREAMING_SNAKE_CASE)

Invalid inputs MUST be rejected prior to hashing.
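
The validation rules above can be approximated with a single regular expression. The following is a minimal sketch, not the authoritative grammar: it assumes component and primary fields are ASCII alphanumerics beginning with a letter, and that the severity letter is accepted in either case (Section 3.4 lists `e.auth.token.001` as an accepted input). The name `is_valid_wdp_code` mirrors the helper used in Section 7.2; its body here is illustrative.

```python
import re

# Severity letters listed in Section 3.2, accepted in either case (assumption).
_SEVERITY = r"[EWCBSKITHewcbskith]"
# Assumption: component and primary are ASCII alphanumerics starting with a
# letter; the authoritative grammar lives in the Error Codes specification.
_NAME = r"[A-Za-z][A-Za-z0-9]*"
# Sequence: exactly three digits, or a SCREAMING_SNAKE_CASE name.
_SEQUENCE = r"(?:[0-9]{3}|[A-Z][A-Z0-9]*(?:_[A-Z0-9]+)*)"

_CODE_RE = re.compile(rf"{_SEVERITY}\.{_NAME}\.{_NAME}\.{_SEQUENCE}")


def is_valid_wdp_code(text: str) -> bool:
    """Return True if text looks like SEVERITY.COMPONENT.PRIMARY.SEQUENCE."""
    # Surrounding whitespace is ignored here because Section 3.3 strips it.
    return _CODE_RE.fullmatch(text.strip()) is not None


if __name__ == "__main__":
    for candidate in ("E.Auth.Token.001", "E.Auth.Token.MISSING", "", "X.Auth.Token.001"):
        print(repr(candidate), is_valid_wdp_code(candidate))
```

Using `fullmatch` rather than `match` ensures that trailing garbage after the sequence field causes rejection.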

3.3 String Preparation

Step 1: Whitespace Handling

Implementations MUST remove leading and trailing whitespace from the input string prior to processing.

Input:  "  E.Auth.Token.001  "
Result: "E.Auth.Token.001"

Step 2: Named Sequence Resolution

If the sequence field contains a named sequence (e.g., MISSING), implementations MUST resolve it to its numeric canonical form prior to hashing.

Input:    E.Auth.Token.MISSING
Resolved: E.Auth.Token.001

Step 3: Uppercase Normalization

The entire structured code MUST be converted to uppercase prior to hashing.

Input:      E.Auth.Token.001
Normalized: E.AUTH.TOKEN.001

3.4 Normalization Example

The following demonstrates the complete normalization process:

Step 0: Input (as received)
  "  E.Auth.Token.MISSING  "

Step 1: Remove whitespace
  "E.Auth.Token.MISSING"

Step 2: Resolve named sequence (if applicable)
  "E.Auth.Token.001"

Step 3: Convert to uppercase
  "E.AUTH.TOKEN.001"

Step 4: UTF-8 encode
  [0x45, 0x2E, 0x41, 0x55, 0x54, 0x48, 0x2E, 0x54, 0x4F, 0x4B, 0x45, 0x4E, 0x2E, 0x30, 0x30, 0x31]

Result: Prepared for hashing

The following inputs produce identical hash values:

E.Auth.Token.001           -> E.AUTH.TOKEN.001 -> identical hash
E.AUTH.TOKEN.001           -> E.AUTH.TOKEN.001 -> identical hash
e.auth.token.001           -> E.AUTH.TOKEN.001 -> identical hash
E.auth.Token.001           -> E.AUTH.TOKEN.001 -> identical hash
E.Auth.Token.MISSING       -> E.AUTH.TOKEN.001 -> identical hash (if MISSING maps to 001)
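
The four preparation steps can be sketched as one function. The `NAMED_SEQUENCES` registry is hypothetical; how named sequences are registered and scoped is outside this document. The sketch assumes the input has already passed validation.

```python
from typing import Mapping

# Hypothetical registry of named sequences; the real mapping comes from the
# project's error definitions, not from this specification.
NAMED_SEQUENCES = {"MISSING": "001"}


def normalize(code: str, named: Mapping[str, str] = NAMED_SEQUENCES) -> bytes:
    """Apply Steps 1-4 of Section 3.4 and return the bytes to be hashed."""
    # Step 1: remove leading and trailing whitespace.
    text = code.strip()
    # Raises ValueError if the code does not have exactly four fields.
    severity, component, primary, sequence = text.split(".")
    # Step 2: resolve a named sequence to its numeric form, if registered.
    sequence = named.get(sequence, sequence)
    # Step 3: uppercase the whole structured code.
    text = ".".join((severity, component, primary, sequence)).upper()
    # Step 4: encode as UTF-8.
    return text.encode("utf-8")


if __name__ == "__main__":
    print(normalize("  E.Auth.Token.MISSING  "))
```

Every input variant listed above yields the same byte sequence, which is what guarantees an identical hash downstream.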

3.5 UTF-8 Encoding

The validated, resolved, and normalized input string MUST be encoded as UTF-8 bytes.

Character Encoding: UTF-8 as specified in [RFC 3629]
Byte Order: Not applicable (UTF-8 is a byte-oriented encoding and has no byte-order variants)

3.6 Edge Cases

Empty String:

  • Input: ""
  • Behavior: Implementations MUST reject this input (invalid WDP code)

Non-ASCII Characters:

Input:  "E.テスト.TOKEN.001"
Result: Implementations MUST reject this input (invalid component format)

4. Hash Algorithm

4.1 Required Algorithm: xxHash3

Implementations MUST use xxHash3 (xxh3_64) as the hashing algorithm for Compact ID generation.
Seed: 0x000031762D706477 (ASCII "wdp-v1\0\0" interpreted as a little-endian u64)

4.2 Algorithm Selection Rationale

| Criterion | xxHash3 Characteristics |
|---|---|
| Throughput | Approximately 30 GB/s (2x faster than xxHash64) |
| Small Input Performance | Optimized for 1-128 byte inputs |
| Determinism | Identical input produces identical output |
| Cross-Language Support | Available in all major programming languages |
| Distribution Quality | Excellent distribution for short inputs |
| Specification Stability | Frozen specification since 2021 |
| Security Classification | Non-cryptographic (appropriate for identification) |

4.3 Rationale for Single Algorithm Mandate

Universal Interoperability:

System A (xxHash3): E.Auth.Token.001 -> V6a0B
System B (xxHash3): E.Auth.Token.001 -> V6a0B
System C (xxHash3): E.Auth.Token.001 -> V6a0B

4.4 Security Considerations

Compact IDs are designed for IDENTIFICATION purposes, not AUTHENTICATION. xxHash3 is a non-cryptographic hash function.

Layered Security Architecture:

| Security Requirement | Recommended Solution | Architectural Layer |
|---|---|---|
| Transport security | HTTPS/TLS | Transport |
| Message integrity | JWT/JWS signing | Application |
| Catalog integrity | Ed25519 signature on catalog file | Distribution |
| Authentication | OAuth/API keys | Application |

4.5 Hash Parameters

Input:  UTF-8 encoded byte array
Seed:   0x000031762D706477 (u64) - ASCII "wdp-v1\0\0" interpreted as little-endian
Output: 64-bit unsigned integer (u64)

4.5.1 Seed Conversion for u64-Based APIs (NORMATIVE)

Seed String:        "wdp-v1"
Seed Bytes (6):     [0x77, 0x64, 0x70, 0x2D, 0x76, 0x31]
Zero-Padded (8):    [0x77, 0x64, 0x70, 0x2D, 0x76, 0x31, 0x00, 0x00]
Interpretation:     Little-Endian
u64 Value (hex):    0x000031762D706477
u64 Value (dec):    54383638242423
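
The conversion can be reproduced with the Python standard library. This is a small sketch for checking the constant; the variable names are illustrative.

```python
SEED_STRING = "wdp-v1"

# Encode as ASCII and zero-pad on the right to 8 bytes.
seed_bytes = SEED_STRING.encode("ascii").ljust(8, b"\x00")

# Interpret the 8 bytes as a little-endian unsigned 64-bit integer.
WDP_SEED = int.from_bytes(seed_bytes, byteorder="little")

print(seed_bytes.hex(" "))   # 77 64 70 2d 76 31 00 00
print(f"0x{WDP_SEED:016X}")  # 0x000031762D706477
print(WDP_SEED)              # decimal form of the same value
```

Deriving the constant this way in a unit test guards against transcription errors when the hex literal is copied between implementations.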

4.5.2 Cross-Language Seed Snippets (NORMATIVE)

Rust:

use xxhash_rust::xxh3::xxh3_64_with_seed; // assumption: the xxhash-rust crate

const WDP_SEED: u64 = 0x000031762D706477;
let hash = xxh3_64_with_seed(input_bytes, WDP_SEED);

Python:

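import xxhash  # python-xxhash package
WDP_SEED_U64 = 0x000031762D706477
# xxh3_64() returns a hasher object; xxh3_64_intdigest() returns the u64 directly.
hash_value = xxhash.xxh3_64_intdigest(input_bytes, seed=WDP_SEED_U64)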

TypeScript:

const WDP_SEED = 0x000031762D706477n;
const hash = xxh3.h64(inputBytes, WDP_SEED);

5. Base62 Encoding

5.1 Purpose

Base62 encoding converts the 40-bit value extracted from the 64-bit hash (Section 5.3) into a compact, human-readable string using exclusively alphanumeric characters.

5.2 Base62 Alphabet

0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz

Character Set Composition: digits 0-9 (values 0-9), uppercase A-Z (values 10-35), lowercase a-z (values 36-61).

5.3 Bit Extraction

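The 64-bit hash output from xxHash3 MUST be truncated to 40 bits prior to Base62 encoding. The retained bits are bytes 3-7 of the hash, counting the least-significant byte as byte 0; this is equivalent to discarding the low 24 bits.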

function extract_40_bits(hash: u64) -> u64:
    value = (hash >> 24) & 0xFF_FFFF_FFFF
    return value
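
The extraction can be expressed either as a shift and mask or as a byte slice. The sketch below shows both and checks that they agree, assuming "bytes 3-7" counts from the least-significant byte as byte 0, which is what the shift by 24 implies. The sample value is arbitrary and is not a real hash output.

```python
MASK_40 = 0xFF_FFFF_FFFF  # forty 1-bits


def extract_40_bits(hash_value: int) -> int:
    """Drop the three least-significant bytes and keep the next five."""
    return (hash_value >> 24) & MASK_40


def extract_via_bytes(hash_value: int) -> int:
    """Same result, written as a slice of the little-endian byte layout."""
    raw = hash_value.to_bytes(8, byteorder="little")
    return int.from_bytes(raw[3:8], byteorder="little")


sample = 0x0123_4567_89AB_CDEF  # arbitrary 64-bit value, not a real hash
assert extract_40_bits(sample) == 0x01_2345_6789
assert extract_via_bytes(sample) == extract_40_bits(sample)
```

For a 64-bit input the mask is redundant after the shift, but it keeps the result bounded in languages with arbitrary-precision integers.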

5.4 Encoding Algorithm

function to_base62(value: u64) -> String:
    alphabet = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
    base = 62
    // ... encoding logic ...
    // Pad to exactly 5 characters
    return result.join()

Worked Example:

Input:  15379485234 (u64)
Result: "FkRWm"
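
The loop body is elided in the pseudocode above, so the following is only one possible reading. It assumes the digits are emitted most significant first and that only the five least-significant Base62 digits are kept, which means values of 62^5 or more wrap around. Neither choice is stated in this document; treat the test vectors in Section 8 as the authority on both.

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
BASE = len(ALPHABET)  # 62
WIDTH = 5


def to_base62(value: int) -> str:
    """Encode value as exactly WIDTH Base62 characters, most significant first.

    Assumption: only the WIDTH least-significant Base62 digits are kept, so
    inputs of 62**5 or more wrap around. The source does not spell this out.
    """
    if value < 0:
        raise ValueError("value must be non-negative")
    digits = []
    for _ in range(WIDTH):
        value, remainder = divmod(value, BASE)
        digits.append(ALPHABET[remainder])
    # Digits were produced least significant first; reverse for display.
    return "".join(reversed(digits))


if __name__ == "__main__":
    print(to_base62(0), to_base62(61), to_base62(62))
```

Running the loop a fixed number of times yields the left-padding with '0' for small values without a separate padding step.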

5.5 Output Format

Length: Exactly 5 characters.
Padding: Left-pad with '0'.

| Length | Combinations | Collision Probability (2,000 codes) |
|---|---|---|
| 5 chars | 916M | 0.22% |
| 6 chars | 56B | 0.004% |

6. Collision Handling

6.1 Collision Probability

With 5-character Base62 encoding, collision probability follows the birthday paradox.

| Number of Error Codes | Collision Probability | Recommendation |
|---|---|---|
| 100 | 0.0005% | No action required |
| 2,000 | 0.22% | Suitable for most projects |
| 5,000 | 1.36% | Implement collision monitoring |
| 10,000 | 5.5% | Implement collision detection |
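
The figures follow from the standard birthday-bound approximation p ≈ 1 - exp(-n(n-1) / (2N)), with N = 62^5. The sketch below evaluates it for the catalog sizes listed; the function name is illustrative, and results may differ from the table in the last digit depending on which approximation is used.

```python
import math

ID_SPACE = 62 ** 5  # 916,132,832 possible 5-character Compact IDs


def collision_probability(n: int, space: int = ID_SPACE) -> float:
    """Birthday-bound estimate of at least one collision among n codes."""
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-n * (n - 1) / (2 * space))


if __name__ == "__main__":
    for n in (100, 2_000, 5_000, 10_000):
        print(f"{n:>6} codes: {collision_probability(n):.4%}")
```

The same function, called with `space=62 ** 6`, gives the corresponding estimate for a 6-character identifier.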

6.3 Collision Resolution Strategies

Strategy 1: Modify Sequence Number (Recommended)

E.Auth.Token.001 -> V6a0B (collision)
E.Auth.Token.002 -> Ab3Cd (no collision)

Strategy 2: Document in Catalog
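
Either strategy presupposes that collisions are detected when the catalog is built. A minimal detection sketch follows; `find_collisions` is a hypothetical helper, and the Compact ID function is passed in as a parameter so the check does not depend on a particular hashing library. The `FAKE_IDS` table contains made-up identifiers with a forced clash and does not represent real hash output.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def find_collisions(
    codes: Iterable[str], compact_id: Callable[[str], str]
) -> Dict[str, List[str]]:
    """Group structured codes by Compact ID and return only groups that clash."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for code in codes:
        groups[compact_id(code)].append(code)
    return {cid: members for cid, members in groups.items() if len(members) > 1}


# Stand-in ID table with a forced clash, for demonstration only.
# These are NOT real hash outputs.
FAKE_IDS = {
    "E.AUTH.TOKEN.001": "AAAAA",
    "W.CACHE.ENTRY.004": "AAAAA",
    "E.AUTH.TOKEN.002": "BBBBB",
}

if __name__ == "__main__":
    print(find_collisions(FAKE_IDS, FAKE_IDS.__getitem__))
```

Running such a check in the build that generates the catalog lets a clash be resolved by renumbering before any Compact ID is published.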

7. Implementation Guide

7.1 Core Function

compute_compact_id(input: String) -> String

7.2 Reference Implementation (Rust)

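pub fn compute_compact_id(input: &str) -> Result<String, Error> {
    // Reject anything that is not a valid structured code (Section 3.2).
    if !is_valid_wdp_code(input) { return Err(Error::InvalidCode); }
    // Trim and uppercase (Section 3.3); named-sequence resolution is not shown here.
    let normalized = input.trim().to_uppercase();
    // Hash the UTF-8 bytes with the WDP seed (Section 4.5).
    let hash = xxh3_64_with_seed(normalized.as_bytes(), WDP_SEED);
    // Keep bytes 3-7 of the hash as a 40-bit value (Section 5.3).
    let extracted = (hash >> 24) & 0xFF_FFFF_FFFF;
    // Encode as exactly 5 Base62 characters (Section 5.4).
    Ok(to_base62(extracted))
}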

7.3 Python Implementation

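import xxhash  # python-xxhash package; WDP_SEED_U64 as defined in Section 4.5.2

def compute_compact_id(input_str: str) -> str:
    normalized = input_str.strip().upper()
    hash_value = xxhash.xxh3_64_intdigest(normalized.encode("utf-8"), seed=WDP_SEED_U64)
    extracted = (hash_value >> 24) & 0xFF_FFFF_FFFF  # bytes 3-7, as in the Rust reference
    return to_base62(extracted)  # Base62 encoder from Section 5.4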

8. Test Vectors

8.1 Purpose

Test vectors ensure that all implementations produce identical compact IDs for identical input. Implementations MUST pass all test vectors to claim conformance.

8.2 Basic Test Vectors

| Input (Structured Code) | Expected Compact ID | Description |
|---|---|---|
| E.AUTH.TOKEN.001 | V6a0B | Basic error code |
| E.AUTH.TOKEN.MISSING | V6a0B | Named sequence (resolves to 001) |
| W.DATABASE.CONNECTION.027 | KF52S | Warning with extended components |
| E.A.B.001 | l3i4I | Minimal valid code |
| T.PROFILER.TIMER.999 | fnOQk | Trace with high sequence number |

Note: See test-vectors/data/compact-ids.json for authoritative test vectors verified across multiple language implementations.

Modifications to the input code require hash regeneration:

E.AUTH.TOKEN.001 -> V6a0B (original)
E.AUTH.TOKEN.002 -> 35Jkp (modified code produces different hash)

8.3 Edge Case Test Vectors

| Input | Expected Compact ID | Description |
|---|---|---|
| E.AUTH.TOKEN.001 | V6a0B | Standard case |
| E.AUTH.TOKEN.002 | 35Jkp | Sequential codes (different hashes) |
| I.HTTP2SERVER.REQUEST.001 | Unzd9 | Digits in component |

Note: See test-vectors/data/compact-ids.json for additional test cases.
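
A conformance check can be sketched as a loop over (input, expected) pairs. The vectors below are copied from Sections 8.2 and 8.3. The `check_vectors` helper is hypothetical, and the layout of test-vectors/data/compact-ids.json is not described in this document, so no attempt is made to parse it here.

```python
from typing import Callable, List, Sequence, Tuple

# Vectors copied from Sections 8.2 and 8.3 of this document.
BASIC_VECTORS: List[Tuple[str, str]] = [
    ("E.AUTH.TOKEN.001", "V6a0B"),
    ("W.DATABASE.CONNECTION.027", "KF52S"),
    ("E.A.B.001", "l3i4I"),
    ("T.PROFILER.TIMER.999", "fnOQk"),
    ("E.AUTH.TOKEN.002", "35Jkp"),
    ("I.HTTP2SERVER.REQUEST.001", "Unzd9"),
]


def check_vectors(
    compute: Callable[[str], str],
    vectors: Sequence[Tuple[str, str]] = BASIC_VECTORS,
) -> List[str]:
    """Return human-readable failures; an empty list means every vector passed."""
    failures = []
    for code, expected in vectors:
        actual = compute(code)
        if actual != expected:
            failures.append(f"{code}: expected {expected}, got {actual}")
    return failures
```

An implementation under test would pass its own `compute_compact_id` as `compute` and fail its build if the returned list is non-empty.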

9. Cross-Language Compatibility

9.1 Requirements

  • Algorithm: xxHash3 (64-bit)
  • Seed: 0x000031762D706477
  • String Encoding: UTF-8
  • Base62 Alphabet: Standard alphanumeric

10. Specification Summary

10.1 Core Requirements

Implementations MUST use xxHash3 with the specified seed and produce exactly 5 Base62 characters.

10.2 Key Design Decisions

xxHash3 was chosen for its performance on small inputs and universal availability.

11. Versioning and Future Compatibility

11.1 Hash Algorithm Immutability

WDP hash parameters are IMMUTABLE and MUST NOT be modified.

11.2 Future Formats

Future versions (v2, v3) SHALL use prefixes (e.g., 2.V6a0B).
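
A receiver that must accept both current and future identifiers could separate the prefix as sketched below. Because the Base62 alphabet contains no ".", the first "." unambiguously ends the version prefix. Treating an unprefixed identifier as version 1 is an assumption made here for illustration, as is the helper name `split_versioned_id`.

```python
from typing import Tuple


def split_versioned_id(text: str) -> Tuple[int, str]:
    """Split "2.V6a0B" into (2, "V6a0B").

    Assumption: an identifier without a prefix belongs to version 1.
    """
    prefix, separator, rest = text.partition(".")
    if not separator:
        return 1, text
    if not (prefix.isascii() and prefix.isdigit()):
        raise ValueError(f"invalid version prefix: {prefix!r}")
    return int(prefix), rest


if __name__ == "__main__":
    print(split_versioned_id("V6a0B"))
    print(split_versioned_id("2.V6a0B"))
```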