WDP Part 5: Compact IDs Specification
This document specifies the Compact ID generation mechanism for the Waddling Diagnostic Protocol (WDP). A Compact ID constitutes the fifth component of the complete WDP error code structure.
Abstract#
This document specifies the Compact ID generation mechanism for the Waddling Diagnostic Protocol (WDP). A Compact ID constitutes the fifth component of the complete WDP error code structure. This specification defines the transformation of the four-part structured code (SEVERITY.COMPONENT.PRIMARY.SEQUENCE) into a deterministic, hash-based identifier optimized for efficient logging, network transmission, and catalog lookup. The Compact ID is derived exclusively from the current structured code; modifications to the structured code produce a correspondingly different Compact ID.
Specification Navigation: Refer to STRUCTURE.md for an overview of all WDP specification documents.
Status of This Memo
This document specifies a standards track protocol for the WDP community and requests discussion and suggestions for improvements. Distribution of this memo is unlimited.
Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.
1. Introduction#
1.1 Purpose
WDP Compact IDs are the fifth part of the complete WDP error code structure:
Complete WDP Structure:
Severity.Component.Primary.Sequence -> CompactID
^ ^ ^ ^ ^
Part 1 Part 2 Part 3 Part 4 Part 5This Document Focus:
Severity.Component.Primary.Sequence -> CompactID
^^^^^^^^^
Part 5: Short hash-based identifier (THIS DOCUMENT)The first four parts constitute the structured code; the Compact ID is derived from this structured code.
Compact IDs address the requirement for efficient error information transmission across networks while preserving full context and traceability.
Full structured codes (e.g., E.AUTH.TOKEN.001) exhibit the following characteristics:
- Human-readable format
- Self-documenting structure
- Searchable content
However, full structured codes present certain limitations:
- Verbosity (17 characters minimum)
- Increased bandwidth consumption in high-volume systems
Compact IDs (e.g., V6a0B) provide the following properties:
- Reduced size (5 characters, representing approximately 70% reduction)
- Deterministic output (identical input produces identical output)
- Collision resistance (916,132,832 possible combinations)
- Safe for use in URLs, JSON documents, and filenames
- Efficient generation and comparison
- Dynamic derivation (modifications to source code produce different identifiers)
1.2 Use Cases
Constrained Network Devices:
Sensor transmits: {"h":"V6a0B","t":45.2} (12 bytes)
Gateway performs catalog expansion to obtain full error contextMobile Applications:
1. Application downloads catalog once (approximately 50KB, cacheable)
2. API returns compact error representations (approximately 40 bytes each)
3. Application performs local expansion (offline-capable)Log Aggregation:
Microservice logs: [V6a0B] Request timeout
Centralized system performs cross-instance search by Compact IDHeterogeneous Language Environments:
Rust backend transmits: {"h":"V6a0B"}
TypeScript frontend receives and expands via shared catalog1.3 Design Goals
The Compact ID mechanism is designed to satisfy the following requirements:
- Determinism: Identical error codes MUST produce identical Compact IDs
- Dynamic Derivation: Compact IDs MUST be computed from current structured codes
- Collision Resistance: The probability of hash conflicts MUST be minimized
- Cross-Language Consistency: Hash computation MUST produce identical results across all implementations
- Computational Efficiency: Generation time MUST be sub-microsecond
- Fixed Length: Output MUST be exactly 5 characters
- Character Safety: Output MUST be safe for URLs, JSON, and filenames without escaping
1.4 Conformance Levels
An implementation is considered conformant at Level 1 (Standard) if it correctly implements all requirements marked with MUST, REQUIRED, or SHALL in this specification, in addition to achieving Level 0 (Error Codes) conformance as defined in 1-SEVERITY.md.
2. Overview#
2.1 Process Flow
The compact ID is derived from the four-part structured code:
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Structured Code (Parts 1-4) โ
โ "E.AUTH.TOKEN.001" โ
โโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ
โผ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Step 1: Input Normalization โ
โ - Validate format โ
โ - Encode as UTF-8 bytes โ
โโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ
โผ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Step 2: Hash Generation โ
โ - Apply xxHash3 algorithm โ
โ - Use seed: 0x000031762D706477 โ
โ - Output: 64-bit unsigned integer โ
โโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ
โผ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Step 3: Bit Extraction โ
โ - Extract bytes 3-7 (40 bits) โ
โ - Output: 40-bit unsigned integer โ
โโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ
โผ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Step 4: Base62 Encoding โ
โ - Convert 40-bit value to base62 โ
โ - Pad to exactly 5 characters โ
โโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ
โผ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Compact ID (Part 5) โ
โ "V6a0B" โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Complete WDP Error Code:
[E.AUTH.TOKEN.001] -> V6a0B2.2 Example
Input: "E.AUTH.TOKEN.001" (structured code, parts 1-4)
| Normalize (UTF-8 bytes)
v
Bytes: [69, 46, 65, 85, 84, 72, 46, 84, 79, 75, 69, 78, 46, 48, 48, 49]
| Hash (xxHash3, seed=0x000031762D706477)
v
Hash: [64-bit value]
| Extract bytes 3-7 (40 bits)
v
Value: [40-bit value]
| Encode (Base62, pad to 5 chars)
v
Output: "V6a0B" (Compact ID, part 5)
Complete: [E.AUTH.TOKEN.001] -> V6a0BNote that modifications to the structured code produce different Compact IDs:
Original: [E.AUTH.TOKEN.001] -> V6a0B
Modified: [E.AUTHENTICATION.TOKEN.001] -> Bz93kThe Compact ID is derived dynamically from the current structured code.
3. Input Normalization#
3.1 Purpose
Input normalization ensures that all implementations produce identical byte sequences from the same structured error code (parts 1-4), regardless of platform, language, or locale. This guarantees that identical structured codes produce identical Compact IDs.
3.2 Input Validation
Prior to normalization, implementations MUST validate that the input conforms to a valid WDP structured code (parts 1-4) as specified in the Error Codes specification (1-ERROR-CODES.md).
Validation Requirements:
- Input MUST match the format:
SEVERITY.COMPONENT.PRIMARY.SEQUENCE - The severity field MUST be one of:
E, W, C, B, S, K, I, T, H - The component and primary fields MUST conform to PascalCase conventions (or any case if normalization is applied)
- The sequence field MUST be numeric (3 digits) or named (SCREAMING_SNAKE_CASE)
Invalid inputs MUST be rejected prior to hashing.
3.3 String Preparation
Step 1: Whitespace Handling
Implementations MUST remove leading and trailing whitespace from the input string prior to processing.
Input: " E.Auth.Token.001 "
Result: "E.Auth.Token.001"Step 2: Named Sequence Resolution
If the sequence field contains a named sequence (e.g., MISSING), implementations MUST resolve it to its numeric canonical form prior to hashing.
Input: E.Auth.Token.MISSING
Resolved: E.Auth.Token.001Step 3: Uppercase Normalization
The entire structured code MUST be converted to uppercase prior to hashing.
Input: E.Auth.Token.001
Normalized: E.AUTH.TOKEN.0013.4 Normalization Example
The following demonstrates the complete normalization process:
Step 0: Input (as received)
" E.Auth.Token.MISSING "
Step 1: Remove whitespace
"E.Auth.Token.MISSING"
Step 2: Resolve named sequence (if applicable)
"E.Auth.Token.001"
Step 3: Convert to uppercase
"E.AUTH.TOKEN.001"
Step 4: UTF-8 encode
[0x45, 0x2E, 0x41, 0x55, 0x54, 0x48, 0x2E, 0x54, 0x4F, 0x4B, 0x45, 0x4E, 0x2E, 0x30, 0x30, 0x31]
Result: Prepared for hashingThe following inputs produce identical hash values:
E.Auth.Token.001 -> E.AUTH.TOKEN.001 -> identical hash
E.AUTH.TOKEN.001 -> E.AUTH.TOKEN.001 -> identical hash
e.auth.token.001 -> E.AUTH.TOKEN.001 -> identical hash
E.auth.Token.001 -> E.AUTH.TOKEN.001 -> identical hash
E.Auth.Token.MISSING -> E.AUTH.TOKEN.001 -> identical hash (if MISSING maps to 001)3.5 UTF-8 Encoding
The validated, resolved, and normalized input string MUST be encoded as UTF-8 bytes.
Character Encoding: UTF-8 as specified in [RFC 3629]
Byte Order: Network byte order (big-endian)
3.6 Edge Cases
Empty String:
- Input:
"" - Behavior: Implementations MUST reject this input (invalid WDP code)
Non-ASCII Characters:
Input: "E.ใในใ.TOKEN.001"
Result: Implementations MUST reject this input (invalid component format)4. Hash Algorithm#
4.1 Required Algorithm: xxHash3
Implementations MUST use xxHash3 (xxh3_64) as the hashing algorithm for Compact ID generation.
Seed: 0x000031762D706477 (ASCIII "wdp-v1\0\0" little-endian)
4.2 Algorithm Selection Rationale
| Criterion | xxHash3 Characteristics |
|---|---|
| Throughput | Approximately 30 GB/s (2x faster than xxHash64) |
| Small Input Performance | Optimized for 1-128 byte inputs |
| Determinism | Identical input produces identical output |
| Cross-Language Support | Available in all major programming languages |
| Distribution Quality | Excellent distribution for short inputs |
| Specification Stability | Frozen specification since 2021 |
| Security Classification | Non-cryptographic (appropriate for identification) |
4.3 Rationale for Single Algorithm Mandate
Universal Interoperability:
System A (xxHash3): E.Auth.Token.001 -> V6a0B
System B (xxHash3): E.Auth.Token.001 -> V6a0B
System C (xxHash3): E.Auth.Token.001 -> V6a0B4.4 Security Considerations
Compact IDs are designed for IDENTIFICATION purposes, not AUTHENTICATION. xxHash3 is a non-cryptographic hash function.
Layered Security Architecture:
| Security Requirement | Recommended Solution | Architectural Layer |
|---|---|---|
| Transport security | HTTPS/TLS | Transport |
| Message integrity | JWT/JWS signing | Application |
| Catalog integrity | Ed25519 signature on catalog file | Distribution |
| Authentication | OAuth/API keys | Application |
4.5 Hash Parameters
Input: UTF-8 encoded byte array
Seed: 0x000031762D706477 (u64) - ASCII "wdp-v1\0\0" interpreted as little-endian
Output: 64-bit unsigned integer (u64)4.5.1 Seed Conversion for u64-Based APIs (NORMATIVE)
Seed String: "wdp-v1"
Seed Bytes (6): [0x77, 0x64, 0x70, 0x2D, 0x76, 0x31]
Zero-Padded (8): [0x77, 0x64, 0x70, 0x2D, 0x76, 0x31, 0x00, 0x00]
Interpretation: Little-Endian
u64 Value (hex): 0x000031762D706477
u64 Value (dec): 138912562197674.5.2 Cross-Language Seed Snippets (NORMATIVE)
Rust:
const WDP_SEED: u64 = 0x000031762D706477;
let hash = xxh3_64_with_seed(input_bytes, WDP_SEED);Python:
WDP_SEED_U64 = 0x000031762D706477
hash_value = xxhash.xxh3_64(input_bytes, seed=WDP_SEED_U64)TypeScript:
const WDP_SEED = 0x000031762D706477n;
const hash = xxh3.h64(inputBytes, WDP_SEED);5. Base62 Encoding#
5.1 Purpose
Base62 encoding converts the 64-bit hash value into a compact, human-readable string using exclusively alphanumeric characters.
5.2 Base62 Alphabet
0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzCharacter Set Composition: Digits (0-9), Uppercase (10-35), Lowercase (36-61).
5.3 Bit Extraction
The 64-bit hash output from xxHash3 MUST be truncated to 40 bits prior to Base62 encoding (bytes 3-7).
function extract_40_bits(hash: u64) -> u64:
value = (hash >> 24) & 0xFF_FFFF_FFFF
return value5.4 Encoding Algorithm
function to_base62(value: u64) -> String:
alphabet = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
base = 62
// ... encoding logic ...
// Pad to exactly 5 characters
return result.join()Worked Example:
Input: 15379485234 (u64)
Result: "FkRWm"5.5 Output Format
Length: Exactly 5 characters.
Padding: Left-pad with '0'.
| Length | Combinations | Collision Prob (2K codes) |
|---|---|---|
| 5 chars | 916M | 0.22% |
| 6 chars | 56B | 0.004% |
6. Collision Handling#
6.1 Collision Probability
With 5-character Base62 encoding, collision probability follows the birthday paradox.
| Number of Error Codes | Collision Probability | Recommendation |
|---|---|---|
| 100 | 0.0005% | No action required |
| 2,000 | 0.22% | Suitable for most projects |
| 5,000 | 1.36% | Implement collision monitoring |
| 10,000 | 5.5% | Implement collision detection |
6.3 Collision Resolution Strategies
Strategy 1: Modify Sequence Number (Recommended)
E.Auth.Token.001 -> V6a0B (collision)
E.Auth.Token.002 -> Ab3Cd (no collision)Strategy 2: Document in Catalog
7. Implementation Guide#
7.1 Core Function
compute_compact_id(input: String) -> String7.2 Reference Implementation (Rust)
pub fn compute_compact_id(input: &str) -> Result<String, Error> {
if !is_valid_wdp_code(input) { return Err(Error::InvalidCode); }
let normalized = input.trim().to_uppercase();
let hash = xxh3_64_with_seed(normalized.as_bytes(), WDP_SEED);
let extracted = (hash >> 24) & 0xFF_FFFF_FFFF;
Ok(to_base62(extracted))
}7.3 Python Implementation
def compute_compact_id(input_str: str) -> str:
normalized = input_str.strip().upper()
# ... hash and encode ...8. Test Vectors#
8.1 Purpose
Test vectors ensure that all implementations produce identical compact IDs for identical input. Implementations MUST pass all test vectors to claim conformance.
8.2 Basic Test Vectors
| Input (Structured Code) | Expected Compact ID | Description |
|---|---|---|
| E.AUTH.TOKEN.001 | V6a0B | Basic error code |
| E.AUTH.TOKEN.MISSING | V6a0B | Named sequence (resolves to 001) |
| W.DATABASE.CONNECTION.027 | KF52S | Warning with extended components |
| E.A.B.001 | l3i4I | Minimal valid code |
| T.PROFILER.TIMER.999 | fnOQk | Trace with high sequence number |
Note: See test-vectors/data/compact-ids.json for authoritative test vectors verified across multiple language implementations.
Modifications to the input code require hash regeneration:
E.AUTH.TOKEN.001 -> V6a0B (original)
E.AUTH.TOKEN.002 -> 35Jkp (modified code produces different hash)8.3 Edge Case Test Vectors
| Input | Expected Compact ID | Description |
|---|---|---|
| E.AUTH.TOKEN.001 | V6a0B | Standard case |
| E.AUTH.TOKEN.002 | 35Jkp | Sequential codes (different hashes) |
| I.HTTP2SERVER.REQUEST.001 | Unzd9 | Digits in component |
Note: See test-vectors/data/compact-ids.json for additional test cases.
9. Cross-Language Compatibility#
9.1 Requirements
- Algorithm: xxHash3 (64-bit)
- Seed: 0x000031762D706477
- String Encoding: UTF-8
- Base62 Alphabet: Standard alphanumeric
10. Specification Summary#
10.1 Core Requirements
Implementations MUST use xxHash3 with the specified seed and produce exactly 5 Base62 characters.
10.2 Key Design Decisions
xxHash3 was chosen for its performance on small inputs and universal availability.
11. Versioning and Future Compatibility#
11.1 Hash Algorithm Immutability
WDP hash parameters are IMMUTABLE and MUST NOT be modified.
11.2 Future Formats
Future versions (v2, v3) SHALL use prefixes (e.g., 2.V6a0B).