I\'m working on making a URL shortener for my site, and my current plan (I\'m open to suggestions) is to use a node ID to generate the shortened URL. So, in theory, node 26
What about BASE58 encoding the URL? Like for example flickr does.
# note the missing lowercase L and the zero etc.
BASE58 = '123456789abcdefghijkmnopqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ'
url = ''
while node_id >= 58:
div, mod = divmod(node_id, 58)
url = BASE58[mod] + url
node_id = int(div)
return 'http://short.com/%s' % BASE58[node_id] + url
Turning that back into a number isn't a big deal either.
Use hex(id)[2:]
and int(urlpart, 16)
. There are other options. base32 encoding your id could work as well, but I don't know that there's any library that does base32 encoding built into Python.
Apparently a base32 encoder was introduced in Python 2.4 with the base64 module. You might try using b32encode
and b32decode
. You should give True
for both the casefold
and map01
options to b32decode
in case people write down your shortened URLs.
Actually, I take that back. I still think base32 encoding is a good idea, but that module is not useful for the case of URL shortening. You could look at the implementation in the module and make your own for this specific case. :-)
If multiple characters are bound inside a single integer/long, as was my issue:
s = '0123456789'
nchars = len(s)
# string to int or long. Type depends on nchars
x = sum(ord(s[byte])<<8*(nchars-byte-1) for byte in range(nchars))
# int or long to string
''.join(chr((x>>8*(nchars-byte-1))&0xFF) for byte in range(nchars))
Yields '0123456789'
and x = 227581098929683594426425L
>>> ord("a")
97
>>> chr(97)
'a'
ASCII to int:
ord('a')
gives 97
And back to a string:
str(unichr(97))
chr(97)
gives 'a'