Are there any standalone-ish solutions for normalizing international Unicode text to safe IDs and filenames in Python?
E.g. turn My International Text: åäö into my-international-text-aao.
What you want to do is also known as "slugifying" a string. Here's a possible solution:
import re
from unicodedata import normalize

_punct_re = re.compile(r'[\t !"#$%&\'()*\-/<=>?@\[\\\]^_`{|},.:]+')

def slugify(text, delim=u'-'):
    """Generates an ASCII-only slug."""
    result = []
    for word in _punct_re.split(text.lower()):
        # Decompose accented characters (NFKD), then drop the
        # non-ASCII combining marks when encoding to ASCII.
        word = normalize('NFKD', word).encode('ascii', 'ignore')
        if word:
            result.append(word)
    return unicode(delim.join(result))
Usage:
>>> slugify(u'My International Text: åäö')
u'my-international-text-aao'
You can also change the delimiter:
>>> slugify(u'My International Text: åäö', delim='_')
u'my_international_text_aao'
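The key trick is the NFKD normalization: it decomposes each accented character into a base letter plus combining marks, and encoding to ASCII with errors='ignore' then discards the marks. A minimal demonstration:

```python
from unicodedata import normalize

# NFKD splits 'å' into 'a' plus a combining ring above; the combining
# marks are non-ASCII, so encoding with 'ignore' drops them.
decomposed = normalize('NFKD', u'åäö')
ascii_bytes = decomposed.encode('ascii', 'ignore')
print(ascii_bytes)  # b'aao' on Python 3
```

Note this only works for characters that decompose into a Latin base letter; scripts such as Cyrillic or CJK are simply stripped.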
Source: Generating Slugs
For Python 3: pastebin.com/ft7Yb3KS (thanks @MrPoxipol).
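For reference, a Python 3 adaptation is straightforward: the only changes needed are decoding the bytes that encode() returns back to str, and dropping the u prefixes and the unicode() call. A sketch along those lines:

```python
import re
from unicodedata import normalize

_punct_re = re.compile(r'[\t !"#$%&\'()*\-/<=>?@\[\\\]^_`{|},.:]+')

def slugify(text, delim='-'):
    """Generates an ASCII-only slug (Python 3)."""
    result = []
    for word in _punct_re.split(text.lower()):
        # encode() yields bytes on Python 3, so decode back to str.
        word = normalize('NFKD', word).encode('ascii', 'ignore').decode('ascii')
        if word:
            result.append(word)
    return delim.join(result)

print(slugify('My International Text: åäö'))  # my-international-text-aao
```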