问题
I have IDs from a database, and I want them to be short and easily differentiatable by eye (i.e., two close numbers look different).
Like this:
13892359163211 -> ALO2WE7
13992351216421 -> 52NBEK3
or similar, algorithmically. So kind of like a hash, except it needs to be reversible? An encryption algorithm like AES is almost ideal, except that its outputs are way too long. (and overkill).
I'm using Python (3), although I don't think that should really matter
回答1:
New answer with 'close' numbers looking different
You could use RSA to encrypt (and later decrypt) your numbers. This is definitely overkill - but ... here is the example:
Install https://github.com/sybrenstuvel/python-rsa
(pip install rsa
)
import rsa
import rsa.core
# (pubkey, privkey) = rsa.newkeys(64) # Generate key pair
pubkey = rsa.PublicKey(n=9645943279888986023, e=65537)
privkey = rsa.PrivateKey(n=9645943279888986023, e=65537, d=7507666207464026273, p=9255782423, q=1042153201)
print("1st", rsa.core.encrypt_int(13892359163211, pubkey.e, pubkey.n))
print("2nd", rsa.core.encrypt_int(13992351216421, pubkey.e, pubkey.n))
print("1st", hex(rsa.core.encrypt_int(13892359163211, pubkey.e, pubkey.n))[2:])
print("2nd", hex(rsa.core.encrypt_int(13992351216421, pubkey.e, pubkey.n))[2:])
# If you want to compare a couple of numbers that are similar
for i in range (13892359163211, 13892359163251):
encrypted = rsa.core.encrypt_int(i, pubkey.e, pubkey.n)
# decrypted = rsa.core.decrypt_int(encrypted, privkey.d, privkey.n)
print (i, hex(encrypted)[2:], encrypted)
Please not that you cannot encrypt numbers bigger than pubkey.n
. This is a RSA related limitation. By generating a different keypair with a higher n
you can circumvent this issue. If you would like all generated numbers to have the same length, prefix them with leading zeroes. You could also consider making them uppercase for better readability. To make the displayed strings shorter consider using the base62 encoding mentioned in my old answer below.
output
1st 5427392181794576250
2nd 7543432434424555966
1st 4b51f86f0c99177a
2nd 68afa7d5110929be
input hex(encrypted) encrypted
13892359163211 4b51f86f0c99177a 5427392181794576250
13892359163212 2039f9a3f5cf5d46 2322161565485194566
13892359163213 173997b57918a6c3 1673535542221383363
13892359163214 36644663653bbb4 244958435527080884
13892359163215 c2eeec0c054e633 877901489011746355
...
Old answer related to displaying the numbers a bit shorter, not being aware that they should look substantially different
You want to change the base of your number from 10 to something bigger to use less characters. See https://stackoverflow.com/a/1119769 for an example with base 62 (a-zA-Z0-9
).
Or quick and dirty for base 16, (0-9A-F
, hexadecimal).
hex(13892359163211)[2:] # -> 'ca291220d4b'
回答2:
The problem is easier to state than it is to solve. One solution is to borrow some ideas from format-preserving encryption, but simplifying because security is not a goal. Using the Feistel cipher framework a very short and reversible "mixing" function can be written, followed by a short encoding function, to achieve something that appears to be what you want.
import hashlib
import string
mask = (1 << 22) - 1
alphabet = string.ascii_uppercase + string.digits
def func(x: int):
return int.from_bytes(hashlib.sha256(x.to_bytes(3, 'big')).digest(), 'big') & mask
def mix(id_in: int):
L, R = id_in >> 22, id_in & mask
L ^= func(R)
R ^= func(L)
return (L << 22) | R
def unmix(mixed: int):
L, R = mixed >> 22, mixed & mask
R ^= func(L)
L ^= func(R)
return (L << 22) | R
def base_n_encode(value: int):
digits = []
for i in range(9):
value, rem = divmod(value, len(alphabet))
digits.insert(0, rem)
return ''.join(alphabet[digit] for digit in digits)
def base_n_decode(encoded: str):
digits = [alphabet.index(ch) for ch in encoded]
result = 0
for digit in digits:
result = result * len(alphabet) + digit
return result
def encode(id_in: int):
return base_n_encode(mix(id_in))
def decode(encoded: str):
return unmix(base_n_decode(encoded))
if __name__ == '__main__':
e1 = encode(13892359163211)
e2 = encode(13992351216421)
print('13892359163211 -> ' + e1)
print('13992351216421 -> ' + e2)
print(e1 + ' -> ' + str(decode(e1)))
print(e2 + ' -> ' + str(decode(e2)))
Output is:
13892359163211 -> BC33VXN8A
13992351216421 -> D1UOW6SLL
BC33VXN8A -> 13892359163211
D1UOW6SLL -> 13992351216421
Note the use of sha256. This is slow and most definitely overkill, but it has the advantage of being built-in to python and thus a one-liner. Unless you are converting millions of IDs speed shouldn't be an issue, but if it is you can replace func
with something much, much faster, maybe Murmur3.
The code is written with hard-coded constants to make it a little easier to see what's going on, but it can be generalized to work with arbitrary length (in bits) IDs and arbitrary alphabets.
A more general version of this example is available on github.
回答3:
How about finding crc32
for the input and showing the result in hex?
>>> n = 13892359163211
>>>
>>> import binascii
>>> hex(binascii.crc32(str(n).encode()))[2:]
'240a831a'
回答4:
Convert the numeric ID's to binary form (3) and use an encoder (4, 5).
In [1]: import struct, base64
In [2]: i = 13892359163211
Out[2]: 13892359163211
In [3]: struct.pack('L', i)
Out[3]: b'K\r"\x91\xa2\x0c\x00\x00'
In [4]: base64.b85encode(struct.pack('L', i)).decode('ascii')
Out[4]: 'OAR8Cq6`24'
In [5]: base64.b64encode(struct.pack('L', i)).decode('ascii')[:-1]
Out[5]: 'Sw0ikaIMAAA'
Which encoder to use depends on which characters you want to allow.
回答5:
You can use CrypII idea to convert from integer to base64. This will be the shortest
- 13892359163211 is
4LWL
and - 13992351216421 is
64yl
来源:https://stackoverflow.com/questions/57624017/algorithm-to-turn-numeric-ids-in-to-short-different-alphanumeric-codes