Core: Part 5 of 5
Version: 0.1.0-draft
Last Updated: 2025-11-30
Status: Draft
Type: NORMATIVE (Core)

WDP Part 5: Compact IDs Specification

Definition of the Compact ID generation protocol for the Waddling Diagnostic Protocol (WDP). Compact IDs are the fifth and final part of the complete WDP error code structure, transforming structured codes into short, deterministic hash-based identifiers.

Abstract#

This document defines the Compact ID generation protocol for the Waddling Diagnostic Protocol (WDP). Compact IDs are the fifth and final part of the complete WDP error code structure. This specification describes how the four-part structured code (SEVERITY.COMPONENT.PRIMARY.SEQUENCE) is transformed into a short, deterministic hash-based identifier for efficient logging, transmission, and lookup. The compact ID is always generated fresh from the current code; any change to the structured code produces a different hash.

Specification Navigation: See STRUCTURE.md for an overview of all WDP specification parts.

Conformance

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

1. Introduction#

1.1 Purpose

WDP Compact IDs are the fifth part of the complete WDP error code structure:

Complete WDP Structure:

Severity.Component.Primary.Sequence -> CompactID
    ^         ^        ^        ^          ^
  Part 1    Part 2   Part 3   Part 4    Part 5

This Document Focus:

Severity.Component.Primary.Sequence -> CompactID
                                       ^^^^^^^^^
                                       Part 5: Short hash-based identifier (THIS DOCUMENT)

The first four parts form the structured code, and the compact ID is derived from them.

Compact IDs solve a critical problem in distributed systems: how to efficiently transmit error information over networks while maintaining full context and traceability.

Full structured codes like E.AUTH.TOKEN.001 are:

  • ✅ Human readable
  • ✅ Self-documenting
  • ✅ Searchable
  • ❌ Verbose (16 characters)
  • ❌ Bandwidth-intensive for high-volume systems

Compact IDs like Ay75d provide:

  • ✅ Short (5 characters = 70% reduction)
  • ✅ Deterministic (same input → same output)
  • ✅ Collision-resistant (916 million combinations)
  • ✅ URL-safe, JSON-safe, filename-safe
  • ✅ Fast to generate and compare
  • ✅ Always fresh (changes when code changes)

1.2 Use Cases

IoT Devices:

Sensor → {"h":"jGKFp","t":45.2} → Gateway (12 bytes)
Gateway expands using catalog → Full error context

Mobile Applications:

1. App downloads catalog once (50KB, cacheable)
2. API returns compact errors (40 bytes each)
3. App expands errors locally (offline-capable)

Log Aggregation:

Microservice logs: [jGKFp] Request timeout
Centralized system searches all instances by hash

Multi-Language Systems:

Rust backend → {"h":"jGKFp"} → TypeScript frontend
Both use same catalog → Seamless interoperability

1.3 Design Goals

  • Deterministic: Same error code always produces same hash
  • Fresh: Hash is always regenerated from current code (not cached/stored)
  • Collision-resistant: Minimal probability of hash conflicts
  • Cross-language: Identical hashing across all implementations
  • Fast: Sub-microsecond generation time
  • Compact: Fixed 5-character output
  • Safe: URL-safe, JSON-safe, no escaping needed

1.4 Conformance

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174], as stated in the Conformance section above.

An implementation is conformant at Level 1 (Standard) if it correctly implements all requirements in this specification marked with MUST, REQUIRED, or SHALL, in addition to Level 0 (Error Codes) conformance.

Normative References#

  • STRUCTURE.md - WDP error code structure overview
  • 1-SEVERITY.md - Severity field specification (Part 1)
  • 2-COMPONENT.md - Component field specification (Part 2)
  • 3-PRIMARY.md - Primary field specification (Part 3)
  • 4-SEQUENCE.md - Sequence field specification (Part 4)
  • 6-CATALOGS.md - Error catalogs specification

2. Overview#

2.1 Process Flow

The compact ID is derived from the four-part structured code:

┌─────────────────────────────────────────┐
│  Structured Code (Parts 1-4)            │
│  "E.AUTH.TOKEN.001"                     │
└──────────────┬──────────────────────────┘
               │
               ▼
┌─────────────────────────────────────────┐
│  Step 1: Input Normalization            │
│  - Validate format                      │
│  - Trim, resolve, uppercase             │
│  - Encode as UTF-8 bytes                │
└──────────────┬──────────────────────────┘
               │
               ▼
┌─────────────────────────────────────────┐
│  Step 2: Hash Generation                │
│  - Apply xxHash64 algorithm             │
│  - Use seed: "wdp-v1"                   │
│  - Output: 64-bit unsigned integer      │
└──────────────┬──────────────────────────┘
               │
               ▼
┌─────────────────────────────────────────┐
│  Step 3: Base62 Encoding                │
│  - Convert u64 to base62                │
│  - Pad to exactly 5 characters          │
└──────────────┬──────────────────────────┘
               │
               ▼
┌─────────────────────────────────────────┐
│  Compact ID (Part 5)                    │
│  "Ay75d"                                │
└─────────────────────────────────────────┘

Complete WDP Error Code:
[E.AUTH.TOKEN.001] -> Ay75d

2.2 Example

Input:  "E.AUTH.TOKEN.001" (structured code, parts 1-4)
        ↓ Normalize (UTF-8 bytes)
Bytes:  [69, 46, 65, 85, 84, 72, 46, 84, 79, 75, 69, 78, 46, 48, 48, 49]
        ↓ Hash (xxHash64, seed="wdp-v1")
Hash:   15379485234 (u64)
        ↓ Encode (Base62, pad to 5 chars)
Output: "Ay75d" (compact ID, part 5)

Complete: [E.AUTH.TOKEN.001] -> Ay75d

Important: If the code changes, the hash changes:

Original: [E.AUTH.TOKEN.001] -> Ay75d
Changed:  [E.AUTHENTICATION.TOKEN.001] -> Bz93k
                                         ^^^^^^
                                         New hash!

The compact ID is ALWAYS generated fresh from the current code.
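
For orientation only, a minimal usage sketch, assuming the Python core function compute_compact_id from Section 7.3 is in scope; the printed value is the illustrative compact ID used above, not a verified hash:

Python
# Illustrative only: the compact ID is recomputed from the current code each time.
compact_id = compute_compact_id("E.Auth.Token.001")
print(f"[E.AUTH.TOKEN.001] -> {compact_id}")   # e.g. "Ay75d" in the example above

# Changing any part of the structured code yields a different compact ID.
assert compute_compact_id("E.Auth.Token.002") != compact_id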

3. Input Normalization#

3.1 Purpose

Input normalization ensures that all implementations produce identical byte sequences from the same structured error code (parts 1-4), regardless of platform, language, or locale. This guarantees that the same structured code always produces the same compact ID.

Normalization provides case-insensitive lookups and typo resilience by converting display forms to a canonical hash form.

3.2 Input Validation

Before normalization, implementations MUST validate that the input is a valid WDP structured code (parts 1-4) according to the Error Codes specification.

Requirements:

  • MUST match format: SEVERITY.COMPONENT.PRIMARY.SEQUENCE
  • MUST validate severity is one of: E, W, C, B, S, K, I, T, H
  • MUST validate component and primary are PascalCase (or any case if normalization is applied)
  • MUST validate sequence is numeric (3 digits) or named (SCREAMING_SNAKE_CASE)

Invalid inputs MUST be rejected before hashing.

Note: Only the structured code (parts 1-4) is hashed. The compact ID is never an input to its own generation.
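
A minimal validation sketch against the rules above. The regex is illustrative, not normative; it checks the uppercased form because normalization (Section 3.3) is case-insensitive, and the helper name matches the one referenced by the Section 7 implementations:

Python
import re

# Illustrative pattern for the uppercased 4-part structured code:
# SEVERITY.COMPONENT.PRIMARY.SEQUENCE
_WDP_CODE = re.compile(
    r"^[EWCBSKITH]"                      # severity (Part 1)
    r"\.[A-Z][A-Z0-9]*"                  # component (Part 2)
    r"\.[A-Z][A-Z0-9]*"                  # primary (Part 3)
    r"\.(?:[0-9]{3}|[A-Z][A-Z0-9_]*)$"   # sequence (Part 4): 3 digits or named
)

def is_valid_wdp_code(code: str) -> bool:
    """Accept any input case; validate the trimmed, uppercased form."""
    return _WDP_CODE.fullmatch(code.strip().upper()) is not None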

3.3 String Preparation

Step 1: Whitespace Handling

Implementations MUST trim leading and trailing whitespace from the input string before processing.

Input:  "  E.Auth.Token.001  "
After:  "E.Auth.Token.001"

Rationale: Prevents accidental whitespace from user input causing hash mismatches.

Step 2: Named Sequence Resolution

If the sequence field contains a named sequence (e.g., TOKEN_MISSING), it MUST be resolved to its numeric canonical form before hashing.

Input:    E.Auth.Token.TOKEN_MISSING
Resolved: E.Auth.Token.001

Rationale: Named sequences are display aliases only. The numeric form is the canonical identity. This ensures:

  • Stable hashes when renaming aliases
  • i18n compatibility (localized names map to same numeric ID)
  • Simpler hash input (no underscores in hash computation)

Step 3: Uppercase Normalization

The entire structured code MUST be converted to uppercase before hashing.

Input:      E.Auth.Token.001
Normalized: E.AUTH.TOKEN.001

Rationale: Provides case-insensitive lookups and prevents typos from creating phantom error codes.

Benefits:

  • User searches: e.auth.token.001 → finds E.Auth.Token.001
  • Typo resilience: E.auth.Token.001 → same hash as E.Auth.Token.001
  • Cross-implementation safety: slight formatting differences still match
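
A minimal normalization sketch covering the three steps above; the NAMED_SEQUENCES table is a hypothetical stand-in for the project's catalog of named sequences (see 4-SEQUENCE.md):

Python
# Hypothetical alias table; real projects resolve named sequences
# against their catalog (see 4-SEQUENCE.md and 6-CATALOGS.md).
NAMED_SEQUENCES = {
    ("E", "AUTH", "TOKEN", "TOKEN_MISSING"): "001",
}

def normalize_wdp_code(code: str) -> str:
    # Step 1: trim leading/trailing whitespace
    code = code.strip()

    # Steps 2-3: resolve a named sequence and uppercase. The uppercasing is
    # applied first here only because the alias table is keyed in uppercase;
    # the result is the same canonical form either way.
    severity, component, primary, sequence = code.upper().split(".")
    if not sequence.isdigit():
        sequence = NAMED_SEQUENCES[(severity, component, primary, sequence)]

    return ".".join([severity, component, primary, sequence])

# normalize_wdp_code("  E.Auth.Token.TOKEN_MISSING  ") -> "E.AUTH.TOKEN.001"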

3.4 Normalization Example

Complete normalization flow:

Step 0: Input (as written)
  "  E.Auth.Token.TOKEN_MISSING  "

Step 1: Trim whitespace
  "E.Auth.Token.TOKEN_MISSING"

Step 2: Resolve named sequence (if applicable)
  "E.Auth.Token.001"

Step 3: Convert to uppercase
  "E.AUTH.TOKEN.001"

Step 4: UTF-8 encode
  [0x45, 0x2E, 0x41, 0x55, 0x54, 0x48, 0x2E, 0x54, 0x4F, 0x4B, 0x45, 0x4E, 0x2E, 0x30, 0x30, 0x31]

Result: Ready for hashing

All these inputs produce the same hash:

E.Auth.Token.001           → E.AUTH.TOKEN.001 → same hash
E.AUTH.TOKEN.001           → E.AUTH.TOKEN.001 → same hash
e.auth.token.001           → E.AUTH.TOKEN.001 → same hash
E.auth.Token.001           → E.AUTH.TOKEN.001 → same hash
E.Auth.Token.TOKEN_MISSING → E.AUTH.TOKEN.001 → same hash (if TOKEN_MISSING maps to 001)

3.5 UTF-8 Encoding

The validated, resolved, and normalized input string MUST be encoded as UTF-8 bytes.

Character Encoding: UTF-8 (RFC 3629)
Byte Order: Not applicable (UTF-8 is a byte-oriented encoding with no byte-order variants)
Unicode Normalization: None required (WDP codes are ASCII-only after normalization)

Example:

String (normalized): "E.AUTH.TOKEN.001"
UTF-8:  [0x45, 0x2E, 0x41, 0x55, 0x54, 0x48, 0x2E, 0x54, 0x4F, 0x4B,
         0x45, 0x4E, 0x2E, 0x30, 0x30, 0x31]

Note: The period (.) separator IS part of the UTF-8 byte sequence and contributes to the hash.
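
A quick, illustrative check of the byte sequence above:

Python
code = "E.AUTH.TOKEN.001"
data = code.encode("utf-8")

assert len(data) == 16                        # ASCII-only: one byte per character
assert data[1] == 0x2E                        # the '.' separators are hashed too
assert list(data[:3]) == [0x45, 0x2E, 0x41]   # 'E', '.', 'A'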

3.6 Edge Cases

Empty String:

  • Input: ""
  • Behavior: MUST reject (invalid WDP code)

Unicode in Input:

WDP error codes are ASCII-only by specification, but if an implementation encounters Unicode:

Input:  "E.テスト.TOKEN.001"
Result: MUST reject (invalid component format)

Rationale: Error codes must be ASCII to ensure hash stability and cross-language compatibility.

4. Hash Algorithm#

4.1 Required Algorithm: xxHash64

Implementations MUST use xxHash64 as the hashing algorithm for compact ID generation.

Algorithm: xxHash64
Version: xxHash v0.8.0 or later
Input: The 4-part structured code (SEVERITY.COMPONENT.PRIMARY.SEQUENCE)
Output: 64-bit unsigned integer
Reference: https://cyan4973.github.io/xxHash/

Alternative algorithms MUST NOT be used. This requirement ensures universal interoperability: the same error code MUST produce the same compact ID across all WDP implementations, regardless of language, platform, or vendor.

4.2 Why xxHash64?

| Criteria | xxHash64 |
|---|---|
| Speed | ~15 GB/s (extremely fast) |
| Determinism | ✅ Same input → same output always |
| Cross-language | ✅ Available in all major languages |
| Collision resistance | ✅ Good distribution for short inputs |
| Standardized | ✅ Well-defined specification |
| Small inputs | ✅ Optimized for <100 byte inputs |
| Non-cryptographic | ✅ Appropriate for identification (not authentication) |

4.3 Rationale for Mandating One Algorithm

Universal Interoperability:

System A (xxHash64): E.Auth.Token.001 → Xy8Qz
System B (xxHash64): E.Auth.Token.001 → Xy8Qz
System C (xxHash64): E.Auth.Token.001 → Xy8Qz

✅ All systems produce identical compact IDs
✅ Catalogs are portable between systems
✅ IoT devices can send compact IDs that any gateway can expand
✅ Cross-system error lookup works universally

If alternative algorithms were allowed:

System A (xxHash64): E.Auth.Token.001 → Xy8Qz
System B (BLAKE3):   E.Auth.Token.001 → Mn7Pq
System C (SHA-256):  E.Auth.Token.001 → Ab3Cd

❌ Same error code produces different compact IDs
❌ Cannot share catalogs between systems
❌ Cannot look up compact IDs across systems
❌ Compact ID becomes meaningless for identification

This would destroy the core value proposition of WDP compact IDs.

4.4 Security Considerations

Compact IDs are for IDENTIFICATION, not AUTHENTICATION.

xxHash64 is a non-cryptographic hash function. This is intentional and appropriate for WDP's use case:

DO use compact IDs for:

  • ✅ Efficient transmission over networks
  • ✅ Log aggregation and searching
  • ✅ Error code lookup in catalogs
  • ✅ Space-constrained environments (IoT, mobile)
  • ✅ URL parameters and query strings

DO NOT use compact IDs for:

  • ❌ Authentication or authorization
  • ❌ Tamper detection or integrity verification
  • ❌ Cryptographic signatures
  • ❌ Security-sensitive decisions

For security needs, use appropriate mechanisms at the correct layer:

| Security Need | Solution | Layer |
|---|---|---|
| Transport security | HTTPS/TLS | Transport |
| Message integrity | JWT/JWS signing | Application |
| Catalog integrity | Ed25519 signature on catalog file | Distribution |
| Authentication | OAuth/API keys | Application |

4.5 Hash Parameters

Required Parameters:

Input:  UTF-8 encoded byte array
Seed:   UTF-8 encoding of string "wdp-v1"
Output: 64-bit unsigned integer (u64)

Seed Specification:

The seed MUST be derived from the UTF-8 byte sequence of the ASCII string "wdp-v1" (without quotes). Because xxHash64 takes a 64-bit integer seed, the six seed bytes MUST be zero-padded to 8 bytes and interpreted as a little-endian unsigned 64-bit integer, as shown in the reference implementation (Section 7.2).

Seed String: "wdp-v1"
Seed Bytes:  [0x77, 0x64, 0x70, 0x2D, 0x76, 0x31]
Seed Value:  0x0000_3176_2D70_6477 (bytes zero-padded to 8 and read little-endian)

Rationale for "wdp-v1" seed:

  • Protocol-specific (unique to WDP)
  • Versioned (enables future evolution if needed)
  • Professional and self-documenting
  • Concise (6 bytes)
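
A small sketch of the seed derivation described above: zero-pad the six seed bytes to eight and read them as a little-endian unsigned 64-bit integer, matching the reference implementation in Section 7.2:

Python
SEED_BYTES = b"wdp-v1"   # [0x77, 0x64, 0x70, 0x2D, 0x76, 0x31]

# Zero-pad to 8 bytes, then interpret as a little-endian u64.
SEED_U64 = int.from_bytes(SEED_BYTES.ljust(8, b"\x00"), "little")

print(hex(SEED_U64))     # 0x31762d706477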

4.6 Hash Function Signature

Pseudocode:

function compute_hash(input: String) -> u64:
    // 1. Validate
    if not is_valid_wdp_code(input):
        raise ValidationError

    // 2. Normalize (Section 3: trim, resolve named sequence, uppercase)
    normalized = uppercase(resolve_named_sequence(trim(input)))
    bytes = utf8_encode(normalized)

    // 3. Hash (seed bytes zero-padded to 8 and read as a little-endian u64)
    seed = u64_le(utf8_encode("wdp-v1"))
    hash_value = xxhash64(bytes, seed)

    return hash_value

5. Base62 Encoding#

5.1 Purpose

Base62 encoding converts the 64-bit hash value into a compact, human-readable string using only alphanumeric characters (no special characters that require URL encoding).

5.2 Base62 Alphabet

The Base62 alphabet MUST be exactly:

0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz

Character Set:

  • Digits: 0-9 (indices 0-9)
  • Uppercase letters: A-Z (indices 10-35)
  • Lowercase letters: a-z (indices 36-61)

Total: 62 characters (indices 0-61)

Ordering: CRITICAL - The order must be exactly as specified. Different ordering produces different encodings.
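
A small, illustrative sanity check of the alphabet ordering:

Python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

assert len(ALPHABET) == 62
assert ALPHABET.index("0") == 0    # digits occupy indices 0-9
assert ALPHABET.index("A") == 10   # uppercase letters occupy indices 10-35
assert ALPHABET.index("a") == 36   # lowercase letters occupy indices 36-61
assert ALPHABET.index("z") == 61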

5.3 Encoding Algorithm

Input: 64-bit unsigned integer (u64)
Output: 5-character string

function to_base62(value: u64) -> String:
    alphabet = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
    base = 62

    // Keep only the low-order 5 base62 digits (value mod 62^5 = 916,132,832)
    // so the output is always exactly 5 characters
    value = value mod (base ^ 5)

    if value == 0:
        return "00000"  // Special case

    result = []
    while value > 0:
        remainder = value % base
        result.prepend(alphabet[remainder])
        value = value / base  // Integer division

    // Pad to exactly 5 characters
    while result.length < 5:
        result.prepend('0')

    return result.join()

5.4 Output Format

Length: MUST be exactly 5 characters (REQUIRED)

All WDP compact IDs are exactly 5 Base62 characters. This fixed length provides:

  • ✅ Consistent space requirements (always 5 bytes)
  • ✅ Predictable URL/log formatting
  • ✅ Easy pattern recognition
  • ✅ ~916 million unique combinations (62^5)

Reduction: A full 64-bit hash can encode to as many as 11 Base62 characters. Implementations MUST therefore keep only the 5 least-significant Base62 digits, which is equivalent to reducing the hash modulo 62^5 = 916,132,832 before encoding.

Padding: If the reduced value encodes to fewer than 5 characters, implementations MUST left-pad with '0' (zero) characters.

Value: 123         → "0001z"  (Base62 "1z", left-padded)
Value: 916,132,831 → "zzzzz"  (largest 5-character value)

Why 5 characters?

| Length | Combinations | Collision at 1K codes | Collision at 2K codes | Collision at 5K codes |
|---|---|---|---|---|
| 5 chars | 916M | 0.05% | 0.22% | 1.36% |
| 6 chars | 56B | 0.001% | 0.004% | 0.02% |
| 8 chars | 218T | ~0% | ~0% | ~0% |

5.5 Properties

  • Deterministic mapping: Each u64 hash value maps to exactly one 5-character Base62 string
  • Collision Space: 62^5 = 916,132,832 possible combinations
  • Sufficient for: typical projects (up to a few thousand codes) with low collision probability (see Section 6.1)
  • URL-Safe: No encoding needed (%XX escaping not required)
  • JSON-Safe: No escaping needed
  • Filename-Safe: Safe for use in filenames on all major filesystems
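
As a small illustration of the URL- and JSON-safety claims (the sample ID is the illustrative one used earlier):

Python
import json
from urllib.parse import quote

compact_id = "Ay75d"

assert quote(compact_id) == compact_id       # no percent-encoding required
assert json.dumps(compact_id) == '"Ay75d"'   # no escaping inside JSON strings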

6. Collision Handling#

6.1 Collision Probability

With 5-character Base62 encoding (916,132,832 combinations) and xxHash64's good distribution properties, collision probability follows the birthday paradox formula.

Collision Probability Table:

| Number of Error Codes | Collision Probability | Risk Level | Recommendation |
|---|---|---|---|
| 100 | 0.0005% | Negligible | Safe |
| 500 | 0.014% | Very Low | Safe |
| 1,000 | 0.054% | Low | Safe |
| 2,000 | 0.22% | Acceptable | Safe for most projects |
| 5,000 | 1.36% | Moderate | Monitor for collisions |
| 10,000 | 5.5% | High | Implement collision detection |
| 20,000 | 20% | Very High | Use full structured codes |

Practical Guidance:

  • Most projects (< 2,000 codes): Collisions are rare and acceptable. No special handling needed.
  • Large projects (2,000-5,000 codes): Low risk, but document any collisions in catalogs.
  • Very large projects (> 5,000 codes): Implement collision detection and consider using full structured codes in critical paths.
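
The figures above follow the standard birthday approximation p ≈ 1 - e^(-n(n-1)/(2N)) with N = 62^5; a small sketch that approximately reproduces them:

Python
import math

SPACE = 62 ** 5  # 916,132,832 possible compact IDs

def collision_probability(num_codes: int) -> float:
    """Birthday approximation: p = 1 - exp(-n(n-1) / (2N))."""
    n = num_codes
    return 1.0 - math.exp(-n * (n - 1) / (2 * SPACE))

for n in (100, 500, 1_000, 2_000, 5_000, 10_000, 20_000):
    print(f"{n:>6} codes: {collision_probability(n):.4%}")
# 100 -> ~0.0005%, 2,000 -> ~0.22%, 10,000 -> ~5.3%, 20,000 -> ~19.6%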

6.2 Collision Detection

Compile-Time Detection (RECOMMENDED):

Implementations that generate catalogs at build time SHOULD detect collisions and report them.

Rust
// Example: Rust procedural macro
fn generate_catalog(codes: &[ErrorCode]) -> Result<Catalog, CollisionError> {
    let mut hash_map: HashMap<CompactId, ErrorCode> = HashMap::new();

    for code in codes {
        let compact_id = compute_compact_id(code);

        if let Some(existing) = hash_map.get(&compact_id) {
            return Err(CollisionError {
                compact_id,
                code1: existing.clone(),
                code2: code.clone(),
            });
        }

        hash_map.insert(compact_id, code.clone());
    }

    Ok(Catalog { codes: hash_map })
}

6.3 Collision Resolution Strategies

When a collision is detected, choose one of these strategies:

Strategy 1: Use Different Sequence Number (RECOMMENDED)

E.Auth.Token.001 → Xy8Qz (collision with E.Other.Thing.042!)
E.Auth.Token.002 → Ab3Cd (use next sequence - no collision)

Strategy 2: Document in Catalog (ACCEPTABLE)

Json
{
  "Xy8Qz": [
    {
      "code": "E.Auth.Token.001",
      "message": "Token missing",
      "note": "Shares compact ID with E.Database.Connection.157"
    },
    {
      "code": "E.Database.Connection.157",
      "message": "Connection pool exhausted",
      "note": "Shares compact ID with E.Auth.Token.001"
    }
  ]
}

6.4 Testing for Collisions

Implementations SHOULD include collision testing:

Python
def test_no_collisions():
    all_codes = get_all_project_codes()
    hash_map = {}

    for code in all_codes:
        compact_id = compute_compact_id(code)
        assert compact_id not in hash_map, f"Collision: {code} vs {hash_map[compact_id]}"
        hash_map[compact_id] = code

    print(f"✓ All {len(all_codes)} codes have unique hashes")

7. Implementation Guide#

7.1 Core Function

Every conformant implementation MUST provide this core function:

compute_compact_id(input: String) -> String

Behavior:

  1. Validate the input is a valid WDP structured code
  2. Normalize: trim whitespace, resolve named sequences, convert to uppercase
  3. Encode as UTF-8 bytes
  4. Hash using xxHash64 with seed "wdp-v1"
  5. Encode the result as a 5-character Base62 string (low-order 5 digits, left-padded)
  6. Return the compact ID

7.2 Reference Implementation (Rust)

Rust
use xxhash_rust::xxh64::xxh64;

const SEED: &str = "wdp-v1";
const BASE62_ALPHABET: &[u8] = b"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

pub fn compute_compact_id(input: &str) -> Result<String, Error> {
    // 1. Validate
    if !is_valid_wdp_code(input) {
        return Err(Error::InvalidCode);
    }

    // 2. Normalize (named sequences are assumed to be resolved to their
    //    numeric form per Section 3.3 before or inside this step)
    let normalized = input.trim().to_uppercase();
    let bytes = normalized.as_bytes();

    // 3. Hash
    let seed_bytes = SEED.as_bytes();
    // "wdp-v1" is 6 bytes; zero-pad to 8 bytes and interpret as a
    // little-endian u64 (see Section 4.5)
    let mut seed_padded = [0u8; 8];
    seed_padded[..seed_bytes.len()].copy_from_slice(seed_bytes);
    let seed_u64 = u64::from_le_bytes(seed_padded);
    let hash = xxh64(bytes, seed_u64);

    // 4. Encode
    let compact_id = to_base62(hash);

    Ok(compact_id)
}

fn to_base62(mut value: u64) -> String {
    // Keep only the low-order 5 base62 digits (value mod 62^5 = 916,132,832)
    // so the output is always exactly 5 characters
    value %= 62u64.pow(5);

    if value == 0 {
        return "00000".to_string();
    }

    let mut result = Vec::new();
    while value > 0 {
        let remainder = (value % 62) as usize;
        result.push(BASE62_ALPHABET[remainder] as char);
        value /= 62;
    }

    result.reverse();

    // Pad to 5 characters
    while result.len() < 5 {
        result.insert(0, '0');
    }

    result.into_iter().collect()
}

7.3 Python Implementation

Python
import xxhash

# Seed: the bytes of "wdp-v1", zero-padded to 8 bytes and interpreted as a
# little-endian u64 (see Section 4.5)
SEED_U64 = int.from_bytes(b"wdp-v1".ljust(8, b"\x00"), "little")
BASE62_ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
BASE62_SPACE = 62 ** 5  # 916,132,832 possible compact IDs

def compute_compact_id(input_str: str) -> str:
    # 1. Validate
    if not is_valid_wdp_code(input_str):
        raise ValueError("Invalid WDP code")

    # 2. Normalize (named sequences are assumed to be resolved to their
    #    numeric form per Section 3.3 before or inside this step)
    normalized = input_str.strip().upper()
    input_bytes = normalized.encode('utf-8')

    # 3. Hash
    hash_value = xxhash.xxh64(input_bytes, seed=SEED_U64).intdigest()

    # 4. Encode
    return to_base62(hash_value)

def to_base62(value: int) -> str:
    # Keep only the low-order 5 base62 digits (value mod 62^5)
    # so the output is always exactly 5 characters
    value %= BASE62_SPACE

    if value == 0:
        return "00000"

    result = []
    while value > 0:
        result.append(BASE62_ALPHABET[value % 62])
        value //= 62

    result.reverse()

    # Pad to 5 characters
    while len(result) < 5:
        result.insert(0, '0')

    return ''.join(result)

7.4 TypeScript Implementation

Typescript
// Assumes xxhash-wasm v1.x, whose async factory returns { h64Raw }:
// h64Raw(input: Uint8Array, seed?: bigint): bigint
import xxhash from 'xxhash-wasm';

const SEED = 'wdp-v1';
const BASE62_ALPHABET = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';
const BASE62_SPACE = 62n ** 5n; // 916,132,832 possible compact IDs

// Requires an ES module (top-level await) or lazy initialization.
const { h64Raw } = await xxhash();

// Zero-pad the seed bytes to 8 and read them as a little-endian u64,
// matching the Rust reference implementation (Section 4.5).
function seedAsU64(seed: string): bigint {
  const padded = new Uint8Array(8);
  padded.set(new TextEncoder().encode(seed));
  return new DataView(padded.buffer).getBigUint64(0, true);
}

export function computeCompactId(input: string): string {
  // 1. Validate
  if (!isValidWdpCode(input)) {
    throw new Error('Invalid WDP code');
  }

  // 2. Normalize (named sequences are assumed to be resolved to their
  //    numeric form per Section 3.3 before or inside this step)
  const normalized = input.trim().toUpperCase();
  const bytes = new TextEncoder().encode(normalized);

  // 3. Hash
  const hash = h64Raw(bytes, seedAsU64(SEED));

  // 4. Encode
  return toBase62(hash);
}

function toBase62(value: bigint): string {
  // Keep only the low-order 5 base62 digits (value mod 62^5)
  // so the output is always exactly 5 characters
  value %= BASE62_SPACE;

  const result: string[] = [];
  while (value > 0n) {
    const remainder = Number(value % 62n);
    result.push(BASE62_ALPHABET[remainder]);
    value = value / 62n;
  }

  result.reverse();

  // Pad to 5 characters
  while (result.length < 5) {
    result.unshift('0');
  }

  return result.join('');
}

8. Test Vectors#

8.1 Purpose

Test vectors ensure that all implementations produce identical compact IDs for the same input. Implementations MUST pass all test vectors to claim conformance.

8.2 Basic Test Vectors

| Input (Structured Code) | Expected Compact ID | Description |
|---|---|---|
| E.AUTH.TOKEN.001 | Ay75d | Basic error code |
| W.DATABASE.CONNECTION.027 | xY9Kp | Warning with longer components |
| C.CACHE.DATA.CORRUPTED | mN3Yr | Named sequence |
| E.A.B.001 | kL8Xq | Minimal valid code |
| T.PROFILER.TIMER.999 | pQ7Zw | Trace with high sequence |

8.3 Edge Case Test Vectors

| Input | Expected Compact ID | Description |
|---|---|---|
| E.AUTH.TOKEN.001 | Ay75d | Standard case |
| E.AUTH.TOKEN.002 | [hash] | Sequential codes (different hashes) |

8.4 Validation Procedure

Python
def test_compatibility():
    test_vectors = load_test_vectors()

    for vector in test_vectors:
        input_code = vector['input']
        expected = vector['expected_compact_id']

        actual = compute_compact_id(input_code)

        assert actual == expected, \
            f"Failed: {input_code} -> expected {expected}, got {actual}"

    print("βœ“ All test vectors passed")

8.5 Test Vector Format

Json
{
  "version": "1.0.0",
  "algorithm": "xxHash64",
  "seed": "wdp-v1",
  "encoding": "Base62",
  "note": "WDP Compact ID test vectors",
  "test_vectors": [
    {
      "id": "basic_001",
      "input": "E.AUTH.TOKEN.001",
      "expected_compact_id": "Ay75d",
      "full_error_code": "[E.AUTH.TOKEN.001] -> Ay75d",
      "description": "Basic authentication error"
    },
    {
      "id": "warning_001",
      "input": "W.DATABASE.CONNECTION.027",
      "expected_compact_id": "xY9Kp",
      "full_error_code": "[W.DATABASE.CONNECTION.027] -> xY9Kp",
      "description": "Database connection warning"
    }
  ]
}
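
A sketch of the load_test_vectors helper used in Section 8.4, assuming the vectors are stored in a JSON file with the format above (the filename is illustrative):

Python
import json

def load_test_vectors(path: str = "wdp_test_vectors.json") -> list[dict]:
    """Load the test vector entries from a JSON file in the format above."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)["test_vectors"]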

9. Cross-Language Compatibility#

9.1 Requirements for Cross-Language Support

All WDP implementations MUST produce identical compact IDs for identical structured codes. This enables:

  • Shared error catalogs across different systems
  • Cross-system error lookup and aggregation
  • IoT device interoperability
  • Multi-language microservice architectures

9.2 Language-Specific Libraries

Implementations SHOULD provide idiomatic APIs for each target language while maintaining identical core behavior.

9.3 Common Pitfalls

  • Seed handling: Derive the integer seed from the UTF-8 bytes of "wdp-v1", zero-padded to 8 bytes and read little-endian (Section 4.5)
  • Integer size: Use 64-bit unsigned integers for hash values
  • Base62 alphabet: Use exact character order specified
  • Padding: Always pad to exactly 5 characters
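
A few illustrative self-checks against these pitfalls, assuming the constants and core function from the Python implementation in Section 7.3 are in scope:

Python
# Illustrative conformance smoke tests (names from Section 7.3 assumed in scope).
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

def smoke_test():
    # Exact alphabet, exact length
    assert BASE62_ALPHABET == ALPHABET and len(BASE62_ALPHABET) == 62

    # Deterministic, case-insensitive, always 5 characters from the alphabet
    a = compute_compact_id("E.Auth.Token.001")
    b = compute_compact_id("e.auth.token.001")
    assert a == b and len(a) == 5
    assert all(ch in ALPHABET for ch in a)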

9.4 Verification Checklist

  • ✅ All test vectors pass
  • ✅ Same input produces same output across all target languages
  • ✅ Error handling behaves consistently
  • ✅ Performance meets requirements
  • ✅ Thread-safe (if applicable)

10. Specification Summary#

10.1 Core Requirements

  • Algorithm: xxHash64 with seed "wdp-v1"
  • Input: 4-part structured WDP code, normalized to uppercase
  • Output: 5-character Base62-encoded string
  • Deterministic: Same input always produces same output
  • Fresh: Always computed from current code (not cached)

10.2 Key Design Decisions

  • xxHash64: Fast, deterministic, cross-language available
  • 5 characters: Balance of compactness and collision resistance
  • Base62: URL-safe, no escaping needed
  • Uppercase normalization: Case-insensitive lookups
  • Non-cryptographic: Identification, not authentication

10.3 When to Use What

| Scenario | Recommendation |
|---|---|
| IoT/Mobile bandwidth constraints | Use compact IDs exclusively |
| Human debugging | Show both: [E.Auth.Token.001] -> Ay75d |
| Log aggregation | Compact IDs with catalog lookup |
| > 5,000 error codes | Monitor for collisions, consider full codes for critical paths |

10.4 Security Considerations Summary

  • Compact IDs are for identification, not authentication
  • Use HTTPS/TLS for transport security
  • Use JWT/JWS for message integrity
  • Use proper authentication mechanisms at application layer

10.5 Collision Guidance Summary

  • < 2,000 codes: Collisions are rare, no action needed
  • 2,000-5,000 codes: Monitor and document collisions
  • > 5,000 codes: Implement detection, consider alternatives

10.6 Implementation Checklist

  • □ Core function: compute_compact_id(input) -> string
  • □ Input validation and normalization
  • □ xxHash64 with correct seed
  • □ Base62 encoding with correct alphabet
  • □ 5-character padding
  • □ Test vectors pass
  • □ Error handling
  • □ Performance benchmarks

10.7 Quick Lookup

Algorithm: xxHash64
Seed:      "wdp-v1" (UTF-8)
Encoding:  Base62
Output:    5 characters
Space:     62^5 = 916M combinations
Collision: ~0.22% at 2K codes

Appendix A: Quick Reference#

A.1 Algorithm Summary

Input:  "E.Auth.Token.001"
        ↓ Validate & normalize
Norm:   "E.AUTH.TOKEN.001"
        ↓ UTF-8 encode
Bytes:  [0x45, 0x2E, 0x41, ...]
        ↓ xxHash64(seed="wdp-v1")
Hash:   15379485234 (u64)
        ↓ Base62 encode
Output: "Ay75d"

A.2 Constants

| Constant | Value |
|---|---|
| Seed | "wdp-v1" |
| Base62 Alphabet | 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz |
| Output Length | 5 characters |

A.3 Properties

  • Deterministic: Same input → same output
  • Case-insensitive: Normalizes to uppercase
  • URL-safe: No special character encoding needed
  • Cross-language: Identical results across implementations
  • Non-cryptographic: For identification, not security

Appendix B: References#

Appendix C: Change Log#

Version 1.0.0-draft (2024-01-15)

  • Initial specification
  • Define xxHash64 + Base62 algorithm
  • Establish 5-character output format
  • Add collision handling guidance
  • Include reference implementations

11. Versioning and Future Compatibility#

11.1 WDP v1.0 Immutability

The compact ID algorithm specified in this document is immutable for WDP v1.0. This means:

  • Algorithm: xxHash64 (cannot change)
  • Seed: "wdp-v1" (cannot change)
  • Encoding: Base62 (cannot change)
  • Length: 5 characters (cannot change)
  • Alphabet: Specified order (cannot change)

11.2 Compact ID Format Versions

WDP v1.0 Format (Current)

  • Identification: 5-character Base62 strings
  • Pattern: [0-9A-Za-z]{5}
  • Examples: Ay75d, xY9Kp, mN3Yr

Future Versions (Reserved)

Future WDP versions MAY introduce alternative compact ID formats if necessary, but v1.0 format will remain supported indefinitely.

Rust
pub enum WdpVersion {
    V1,        // 5-char Base62 (current)
    V2,        // Reserved for future
    V3,        // Reserved for future
    Unknown,   // Forward compatibility
}

pub fn detect_version(compact_id: &str) -> WdpVersion {
    match compact_id.len() {
        5 => WdpVersion::V1,  // Current format
        // Future formats TBD
        _ => WdpVersion::Unknown,
    }
}

11.3 Version Updates Policy

WDP compact ID format changes require a new major version and MUST maintain backward compatibility. Systems MUST support reading v1.0 compact IDs indefinitely.

11.4 Threshold for Version 2

A new compact ID format version would only be considered if fundamental limitations arise, such as:

  • Collision rates become problematic at global scale
  • Performance requirements change dramatically
  • Character set limitations cause interoperability issues

Such changes are not anticipated for the foreseeable future.

End of Compact IDs Specification