Are there any standalone-ish solutions for normalizing international Unicode text to safe IDs and filenames in Python?
E.g. turn My International Text: åäö into my-international-text-aao.
What you want to do is also known as "slugifying" a string. Here's a possible solution:
import re
from unicodedata import normalize

_punct_re = re.compile(r'[\t !"#$%&\'()*\-/<=>?@\[\\\]^_`{|},.:]+')

def slugify(text, delim=u'-'):
    """Generates an ASCII-only slug."""
    result = []
    for word in _punct_re.split(text.lower()):
        # Decompose accented characters (NFKD), then drop the
        # non-ASCII combining marks when encoding to ASCII.
        word = normalize('NFKD', word).encode('ascii', 'ignore')
        if word:
            result.append(word)
    return unicode(delim.join(result))
Usage:
>>> slugify(u'My International Text: åäö')
u'my-international-text-aao'
You can also change the delimiter:
>>> slugify(u'My International Text: åäö', delim='_')
u'my_international_text_aao'
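The key trick is the NFKD normalization: it decomposes each accented character into a base letter plus combining marks, and encoding to ASCII with errors='ignore' then discards the marks. A minimal demonstration:

```python
from unicodedata import normalize

# NFKD splits 'å' into 'a' plus a combining ring above; the combining
# marks are non-ASCII, so encoding with 'ignore' drops them.
decomposed = normalize('NFKD', u'åäö')
ascii_bytes = decomposed.encode('ascii', 'ignore')
print(ascii_bytes)  # b'aao' on Python 3
```

Note this only works for characters that decompose into a Latin base letter; scripts such as Cyrillic or CJK are simply stripped.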
Source: Generating Slugs
For Python 3: pastebin.com/ft7Yb3KS (thanks @MrPoxipol).
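For reference, a Python 3 adaptation is straightforward: the only changes needed are decoding the bytes that encode() returns back to str, and dropping the u prefixes and the unicode() call. A sketch along those lines:

```python
import re
from unicodedata import normalize

_punct_re = re.compile(r'[\t !"#$%&\'()*\-/<=>?@\[\\\]^_`{|},.:]+')

def slugify(text, delim='-'):
    """Generates an ASCII-only slug (Python 3)."""
    result = []
    for word in _punct_re.split(text.lower()):
        # encode() yields bytes on Python 3, so decode back to str.
        word = normalize('NFKD', word).encode('ascii', 'ignore').decode('ascii')
        if word:
            result.append(word)
    return delim.join(result)

print(slugify('My International Text: åäö'))  # my-international-text-aao
```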