Distributed ID Formats Are Architectural Commitments, Not Just Data Types

35 points by mnahkies 4 days ago

donavanm an hour ago

Generation and structure are important, but IME IDs aren't complete without considering representation: encoding and opacity.

* User-facing IDs must be opaque. If users can infer any structure or ordering from your ID they _will_ use it and they _will_ create awkward dependencies on "your" implementation detail. My favorite example is the multi-year, many-dev-year effort that went into extending EC2 instance IDs. They were already assumed/intended to be opaque until clever users inferred the structure! The simplest answer, something like a block cipher, is so cheap as to be free (and can be accounted for as part of versioning).

* Encoding should be tailored for the primary UX. E.g. the base32 variants are reasonably efficient and accommodating of text selection & input. Dictionary schemes (à la S/KEY rfc2289 or BIP39) may be more appropriate for voice communication.

* Following ID structure -> opacity -> encoding, you should probably account for the block size and encoding efficiency to minimize padding or excess characters.
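A rough sketch of that structure -> opacity -> encoding pipeline, under some loud assumptions: the internal ID is 64 bits, and the "block cipher" here is a toy 4-round Feistel permutation built on HMAC-SHA256, used only because it needs nothing outside the standard library. It is a stand-in for a vetted cipher, not a recommendation, and every name below is hypothetical.

```python
import hashlib
import hmac

# Crockford base32 alphabet: omits I, L, O, U to reduce transcription errors.
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"
MASK32 = 0xFFFFFFFF


def _round(key: bytes, round_no: int, half: int) -> int:
    """Round function: HMAC-SHA256 over (round number, half), cut to 32 bits."""
    msg = bytes([round_no]) + half.to_bytes(4, "big")
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest()[:4], "big")


def scramble(key: bytes, value: int, rounds: int = 4) -> int:
    """Keyed permutation of a 64-bit internal ID into a 64-bit opaque ID."""
    left, right = value >> 32, value & MASK32
    for r in range(rounds):
        left, right = right, left ^ _round(key, r, right)
    return (left << 32) | right


def unscramble(key: bytes, value: int, rounds: int = 4) -> int:
    """Inverse of scramble: runs the Feistel rounds backwards."""
    left, right = value >> 32, value & MASK32
    for r in reversed(range(rounds)):
        left, right = right ^ _round(key, r, left), left
    return (left << 32) | right


def encode(value: int, bits: int = 64) -> str:
    """Fixed-width base32 text: ceil(bits / 5) characters, no padding marks."""
    n_chars = -(-bits // 5)
    return "".join(ALPHABET[(value >> (5 * i)) & 31] for i in reversed(range(n_chars)))


def decode(text: str) -> int:
    """Parse base32 text back into an integer (case-insensitive)."""
    value = 0
    for ch in text.upper():
        value = value * 32 + ALPHABET.index(ch)
    return value
```

Usage would look like `encode(scramble(key, internal_id))` on the way out and `unscramble(key, decode(text))` on the way in. On the efficiency point: base32 carries 5 bits per character, so a 64-bit block needs 13 characters with 1 bit wasted, a 128-bit block needs 26 with 2 bits wasted, and a 160-bit value fits exactly in 32.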

caust1c an hour ago

> it prevents entire classes of bugs where IDs get mixed up across services.

Does this really happen for people? I haven't ever seen this class of bug, and shudder to think of how it happens in code. Sure support tickets are nicer with the prefix, but how would a bug manifest in the code itself?
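One hedged illustration of how it could show up: a handler that takes bare strings will happily accept a user ID where an order ID was meant, and the lookup either misses or, with integer IDs, hits the wrong row. With a type prefix the mistake fails at the boundary instead. The `usr`/`ord` prefixes and function names here are made up for the example.

```python
def parse_id(raw: str, expected_prefix: str) -> str:
    """Split 'prefix_body' and reject IDs that belong to another entity type."""
    prefix, sep, body = raw.partition("_")
    if not sep or prefix != expected_prefix or not body:
        raise ValueError(f"expected a {expected_prefix}_ ID, got {raw!r}")
    return body


def cancel_order(order_id: str) -> str:
    """Hypothetical handler: only proceeds when given an order ID."""
    body = parse_id(order_id, "ord")
    return f"cancelled order {body}"
```

Calling `cancel_order("usr_42")` raises `ValueError` rather than quietly acting on whatever order happens to share the number 42.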

Also, KSUID has been around since before UUIDv7 and seems to meet all of the author's same requirements and has many client libraries already. Guess people doing research on it still aren't able to find it, or just want to do their own anyway which is cool too.

orefalo 3 hours ago

I wrote an article comparing different GUID implementations and also prepared a clear spreadsheet with side‑by‑side implementation comparisons.

https://medium.com/@orefalo_66733/globally-unique-identifier...

Looking at your implementation, I like the clean split between shard, tenant, and sequence.

However, this results in a 160‑bit format, which does not fit natively in most databases, as they usually use the UUID type. I also find 60 bits of randomness to be low (for comparison, ULID uses 80).

Last point: a GUID is not only for sharding. It also protects against predictability, which, beyond the GUID structure, requires an approved cryptographically secure random generator.
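In Python, for instance, that means drawing the random part from the `secrets` module (backed by the OS CSPRNG) rather than from `random`, whose Mersenne Twister output can be predicted from enough samples. A minimal sketch, with the function name being an assumption:

```python
import secrets


def random_component(n_bits: int) -> int:
    """Return n_bits of randomness from the operating system's CSPRNG."""
    return secrets.randbits(n_bits)
```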

CGamesPlay 4 hours ago

The checksum idea is interesting, but why make it a tack-on at the end? Taking 20 random bits to use for a mandatory checksum seems like an interesting trade-off.
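To make the trade-off concrete, here is a sketch of the tack-on variant, assuming a 20-bit check value taken from the low bits of CRC-32. That truncation is an illustrative choice, not necessarily what the article does, and the function names are hypothetical.

```python
import zlib

CHECK_BITS = 20
CHECK_MASK = (1 << CHECK_BITS) - 1


def checksum(payload: int, payload_bits: int) -> int:
    """Low 20 bits of CRC-32 over the payload's big-endian bytes."""
    data = payload.to_bytes((payload_bits + 7) // 8, "big")
    return zlib.crc32(data) & CHECK_MASK


def attach(payload: int, payload_bits: int) -> int:
    """Append the check value below the payload bits."""
    return (payload << CHECK_BITS) | checksum(payload, payload_bits)


def verify(value: int, payload_bits: int) -> bool:
    """Recompute the check value and compare it with the stored one."""
    payload = value >> CHECK_BITS
    return (value & CHECK_MASK) == checksum(payload, payload_bits)
```

Appended this way, the checksum costs four extra base32 characters (20 / 5) and no entropy. Carving the same 20 bits out of the random field keeps the length fixed but leaves 20 fewer random bits.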

theoli 6 hours ago

Epoch shift with 48-bit timestamp that has >12,000 years of range to get another 50 years of range is an amusing choice.
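A back-of-envelope check, assuming the timestamp counts milliseconds (the thread doesn't state the tick size, so treat that as an assumption):

```python
MS_PER_YEAR = 1000 * 60 * 60 * 24 * 365.2425  # mean Gregorian year


def range_years(bits: int, tick_ms: float = 1.0) -> float:
    """Years representable by an unsigned counter of the given width."""
    return (2 ** bits) * tick_ms / MS_PER_YEAR


def shift_gain(bits: int, shift_years: float, tick_ms: float = 1.0) -> float:
    """Fraction of the total range gained by moving the epoch forward."""
    return shift_years / range_years(bits, tick_ms)
```

With millisecond ticks, `range_years(48)` comes out near 8,900 years rather than 12,000, so the exact figure depends on the resolution. Either way a 50-year epoch shift adds well under 1% to the range.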

mrkeen 3 hours ago

> The old auto-increment IDs were totally fine—until suddenly they weren’t, because multiple shards couldn’t share the same global counter anymore.

> Their workaround was simple and surprisingly effective: they offset new IDs by a huge constant—roughly a billion. Old IDs stayed below the threshold, new IDs lived above it, and nothing collided. It worked surprisingly well, but it also taught me something.

So what was the fix? The new numbers are bigger? I need a little more detail.

> If your system is running on a single database with moderate traffic, auto-increment is still probably the best answer. Don’t overthink it.

If autoincrement is the simplest way to do things, but breaks if you evolve the system in any conceivable way, maybe autoincrement isn't the simplest way to do things.

Isn't that the point of the article?

frutiger 6 hours ago

> ID formats aren’t just formats. They’re commitments.

Reading direct LLM output is highly cringeworthy.