Encoding a numeric string into a shortened alphanumeric string, and back again

Submitted by 与世无争的帅哥 on 2019-12-09 17:17:51

Question


Quick question. I'm trying to find or write an encoder in Python to shorten a string of numbers by using upper and lower case letters. The numeric strings look something like this:

20120425161608678259146181504021022591461815040210220120425161608667

The length is always the same.

My initial thought was to write some simple encoder to utilize upper and lower case letters and numbers to shorten this string into something that looks more like this:

a26Dkd38JK

That was completely arbitrary, just trying to be as clear as possible. I'm certain that there is a really slick way to do this, probably already built in. Maybe this is an embarrassing question to even be asking.

Also, I need to be able to take the shortened string and convert it back to the longer numeric value. Should I write something and post the code, or is this a one line built in function of Python that I should already know about?

Thanks!


Answer 1:


This gives pretty good compression (the code is Python 2; str.decode('hex') does not exist in Python 3):

import base64

def num_to_alpha(num):
    num = hex(num)[2:].rstrip("L")

    if len(num) % 2:
        num = "0" + num

    return base64.b64encode(num.decode('hex'))

It first turns the integer into a bytestring and then base64 encodes it. Here's the decoder:

def alpha_to_num(alpha):
    num_bytes = base64.b64decode(alpha)
    return int(num_bytes.encode('hex'), 16)

Example:

>>> num_to_alpha(20120425161608678259146181504021022591461815040210220120425161608667)
'vw4LUVm4Ea3fMnoTkHzNOlP6Z7eUAkHNdZjN2w=='
>>> alpha_to_num('vw4LUVm4Ea3fMnoTkHzNOlP6Z7eUAkHNdZjN2w==')
20120425161608678259146181504021022591461815040210220120425161608667
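
For anyone on Python 3, here is a rough sketch of the same idea (integer to bytes, then base64). It is not the answerer's code: it swaps the hex round trip for int.to_bytes / int.from_bytes, and it assumes the input is a non-negative int.

```python
import base64


def num_to_alpha(num: int) -> str:
    """Pack a non-negative int into bytes, then base64-encode the bytes."""
    # Smallest byte count that holds the value (at least 1 so that 0 works).
    length = max(1, (num.bit_length() + 7) // 8)
    raw = num.to_bytes(length, "big")
    return base64.b64encode(raw).decode("ascii")


def alpha_to_num(alpha: str) -> int:
    """Reverse num_to_alpha: base64-decode, then read the bytes as an int."""
    return int.from_bytes(base64.b64decode(alpha), "big")


n = 20120425161608678259146181504021022591461815040210220120425161608667
short = num_to_alpha(n)
assert alpha_to_num(short) == n  # the round trip is lossless
print(len(str(n)), "digits ->", len(short), "characters")
```

If the result has to go into a URL, base64.urlsafe_b64encode and base64.urlsafe_b64decode can probably be dropped in instead, since they avoid '+' and '/'. Note that going through int loses any leading zeros in the original string.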



Answer 2:


Here are two custom functions (not based on base64) that produce shorter output:

chrs = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = len(chrs)

def int_to_cust(i):
    result = ''
    while i:
        result = chrs[i % l] + result
        i = i // l
    if not result:
        result = chrs[0]
    return result

def cust_to_int(s):
    result = 0
    for char in s:
        result = result * l + chrs.find(char)
    return result

And the results are:

>>> int_to_cust(20120425161608678259146181504021022591461815040210220120425161608667)
'9F9mFGkji7k6QFRACqLwuonnoj9SqPrs3G3fRx'
>>> cust_to_int('9F9mFGkji7k6QFRACqLwuonnoj9SqPrs3G3fRx')
20120425161608678259146181504021022591461815040210220120425161608667L

You can shorten the generated string further by adding more characters to the chrs variable.
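
To illustrate the effect of a bigger alphabet, here is a minimal Python 3 sketch. The names BASE62, BASE94, encode, decode and the width parameter are made up for this example. BASE94 is just one possible extension (the 62 characters plus string.punctuation), and width relies on the question's statement that the length is always the same, so leading zeros can be restored on the way back.

```python
import string

BASE62 = string.digits + string.ascii_lowercase + string.ascii_uppercase
# Hypothetical larger alphabet: the 62 characters above plus 32 punctuation marks.
BASE94 = BASE62 + string.punctuation


def encode(digits: str, alphabet: str = BASE62) -> str:
    """Encode a string of decimal digits using the given alphabet."""
    base = len(alphabet)
    n = int(digits)
    out = []
    while n:
        n, rem = divmod(n, base)
        out.append(alphabet[rem])
    # Digits were collected least significant first, so reverse them.
    return "".join(reversed(out)) or alphabet[0]


def decode(text: str, width: int, alphabet: str = BASE62) -> str:
    """Decode back to a decimal string, zero-padded to the known width."""
    base = len(alphabet)
    index = {ch: i for i, ch in enumerate(alphabet)}
    n = 0
    for ch in text:
        n = n * base + index[ch]  # raises KeyError for characters outside the alphabet
    return str(n).zfill(width)


s = "20120425161608678259146181504021022591461815040210220120425161608667"
for name, alphabet in (("base 62", BASE62), ("base 94", BASE94)):
    short = encode(s, alphabet)
    assert decode(short, len(s), alphabet) == s
    print(name, len(short), "characters")
```

The gain is modest because the length only shrinks with the logarithm of the alphabet size, and the punctuation characters make the result awkward in URLs or filenames.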




Answer 3:


>>> s="20120425161608678259146181504021022591461815040210220120425161608667"
>>> import base64, zlib
>>> base64.b64encode(zlib.compress(s))
'eJxly8ENACAMA7GVclGblv0X4434WrKFVW5CtJl1HyosrZKRf3hL5gLVZA2b'
>>> zlib.decompress(base64.b64decode(_))
'20120425161608678259146181504021022591461815040210220120425161608667'

so zlib isn't real smart at compressing strings of digits :(
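
A rough Python 3 comparison, as a sketch rather than a benchmark: each decimal digit carries only about 3.3 bits, so packing the number straight into bytes should beat a general-purpose compressor on input this short. The variable names here are invented for the example, and the exact zlib output length may vary between zlib versions.

```python
import base64
import zlib

s = "20120425161608678259146181504021022591461815040210220120425161608667"

# Route 1: compress the ASCII digits with zlib, then base64 the result.
compressed = base64.b64encode(zlib.compress(s.encode("ascii")))

# Route 2: treat the digits as one big integer and base64 its raw bytes.
n = int(s)
packed = base64.b64encode(n.to_bytes((n.bit_length() + 7) // 8, "big"))

print("original:     ", len(s))
print("zlib + base64:", len(compressed))
print("int + base64: ", len(packed))

# Both routes are reversible.
assert zlib.decompress(base64.b64decode(compressed)).decode("ascii") == s
assert int.from_bytes(base64.b64decode(packed), "big") == n
```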




Answer 4:


The same idea wrapped in a class:

VALID_CHRS = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
BASE = len(VALID_CHRS)
# Map each character to its position in VALID_CHRS.
MAP_CHRS = {char: index for index, char in enumerate(VALID_CHRS)}


class TinyNum:
    """Compact number representation in alphanumeric characters."""

    def __init__(self, n):
        result = ''
        while n:
            result = VALID_CHRS[n % BASE] + result
            n //= BASE
        if not result:
            result = VALID_CHRS[0]
        self.num = result

    def to_int(self):
        """Return the number as an int."""
        result = 0
        for char in self.num:
            result = result * BASE + MAP_CHRS[char]
        return result

Sample usage:

>>> n = 4590823745
>>> tn = TinyNum(n)
>>> print(n)
4590823745
>>> print(tn.num)
50GCYh
>>> print(tn.to_int())
4590823745

(Based on Tadeck's answer.)



Source: https://stackoverflow.com/questions/10326118/encoding-a-numeric-string-into-a-shortened-alphanumeric-string-and-back-again
