Tinyurl-style unique code: potential algorithm to prevent collisions

狂风中的少年 提交于 2019-12-18 03:38:23

问题


I have a system that requires a unique 6-digit code to represent an object, and I'm trying to think of a good algorithm for generating them. Here are the pre-reqs:

  • I'm using a base-20 system (no caps, numbers, vowels, or l to prevent confusion and naughty words)
    • The base-20 allows 64 million combinations
  • I'll be inserting potentially 5-10 thousand entries at once, so in theory I'd use bulk inserts, which means using a unique key probably won't be efficient or pretty (especially if there starts being lots of collisions)
  • It's not out of the question to fill up 10% of the combinations so there's a high potential for lots of collisions
  • I want to make sure the codes are non-consecutive

I had an idea that sounded like it would work, but I'm not good enough at math to figure out how to implement it: if I start at 0 and increment by N, then convert to base-20, it seems like there should be some value for N that lets me count each value from 0-63,999,999 before repeating any.

For example, going from 0 through 9 using N=3 (so 10 mod 3): 0, 3, 6, 9, 2, 5, 8, 1, 4, 7.

Is there some magic math method for figuring out values of N for some larger number that is able to count through the whole range without repeating? Ideally, the number I choose would sort of jump around the set such that it wasn't obvious that there was a pattern, but I'm not sure how possible that is.

Alternatively, a hashing algorithm that guaranteed uniqueness for values 0-64 million would work, but I'm way too dumb to know if that's possible.


回答1:


All you need is a number that shares no factors with your key space. Easiest value is to use a prime number. You can google for large primes, or use http://primes.utm.edu/lists/small/10000.txt




回答2:


Any prime number which is not a factor of the length of the sequence should be able to span the sequence without repeating. For 64000000, that means you shouldn't use 2 or 5. Of course, if you don't want them to be generated consecutively, generating them 2 or 5 apart is probably also not very good. I personally like the number 73973!




回答3:


There is another method to get a similar result (jumping over the entire set of the values without repeating, nonconsequtively), without using the primes - by using maximum length sequences, which you can generate using specially constructed shift registers.




回答4:


My math is a bit rusty, but I think you just need to ensure that the GCF of N and 64 million is 1. I'd go with a prime number (that doesn't divide evenly into 64 million) just in case though.




回答5:


@Nick Lewis:

Well, only if the prime number doesn't divide 64 million. So, for the questioner's purposes, numbers like 2 or 5 would probably not be advisable.




回答6:


Don't reinvent the wheel: http://en.wikipedia.org/wiki/Universally_Unique_Identifier



来源:https://stackoverflow.com/questions/1257825/tinyurl-style-unique-code-potential-algorithm-to-prevent-collisions

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!