Normalizing unicode text to filenames, etc. in Python

情书的邮戳 2021-02-01 05:42

Are there any standalone-ish solutions for normalizing international unicode text to safe ids and filenames in Python?

E.g. turn My International Text: åäö into something like my-international-text-aao

5 answers
  •  长发绾君心
    2021-02-01 06:26

    What you want to do is also known as "slugifying" a string. Here's a possible solution:

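    # Python 2 code: relies on the built-in `unicode` type.
    import re
    from unicodedata import normalize
    
    # Runs of whitespace and punctuation that separate words.
    _punct_re = re.compile(r'[\t !"#$%&\'()*\-/<=>?@\[\\\]^_`{|},.:]+')
    
    def slugify(text, delim=u'-'):
        """Generates a slightly worse ASCII-only slug."""
        result = []
        for word in _punct_re.split(text.lower()):
            # NFKD decomposes accented letters; encoding to ASCII with
            # 'ignore' then drops the combining marks.
            word = normalize('NFKD', word).encode('ascii', 'ignore')
            if word:
                result.append(word)
        return unicode(delim.join(result))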
    

    Usage:

    >>> slugify(u'My International Text: åäö')
    u'my-international-text-aao'
    

    You can also change the delimiter:

    >>> slugify(u'My International Text: åäö', delim='_')
    u'my_international_text_aao'
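
    To see why åäö comes out as aao, it can help to look at the NFKD step on its own. The following is a minimal sketch written for Python 3; ascii_fold is a hypothetical helper name used only for illustration. It also shows a limitation worth keeping in mind: letters that have no decomposition into an ASCII base letter (such as ø) are simply dropped.

```python
from unicodedata import normalize

def ascii_fold(text):
    """Decompose with NFKD, then keep only the ASCII characters."""
    return normalize('NFKD', text).encode('ascii', 'ignore').decode('ascii')

# 'å' decomposes into 'a' followed by U+030A (combining ring above).
decomposed = normalize('NFKD', 'å')
print([hex(ord(ch)) for ch in decomposed])  # ['0x61', '0x30a']

# The combining marks are not ASCII, so 'ignore' discards them.
print(ascii_fold('åäö'))  # aao

# 'ø' has no decomposition, so the whole character is lost.
print(ascii_fold('Smørrebrød'))  # Smrrebrd
```

    If losing characters like this is a problem for your ids or filenames, a transliteration table or library may be a better fit than plain normalization.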
    

    Source: Generating Slugs

    For Python 3: pastebin.com/ft7Yb3KS (thanks @MrPoxipol).
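
    In case the link is unavailable, here is a rough Python 3 adaptation of the function above. It is a sketch, not necessarily what the linked paste contains. The only changes assumed to be needed are decoding the encoded bytes back to str and dropping the call to unicode, which does not exist in Python 3.

```python
import re
from unicodedata import normalize

# Same punctuation/whitespace pattern as the Python 2 version.
_punct_re = re.compile(r'[\t !"#$%&\'()*\-/<=>?@\[\\\]^_`{|},.:]+')

def slugify(text, delim='-'):
    """Return an ASCII-only slug for text (Python 3 sketch)."""
    result = []
    for word in _punct_re.split(text.lower()):
        # encode() returns bytes in Python 3, so decode back to str.
        word = normalize('NFKD', word).encode('ascii', 'ignore').decode('ascii')
        if word:
            result.append(word)
    return delim.join(result)

print(slugify('My International Text: åäö'))             # my-international-text-aao
print(slugify('My International Text: åäö', delim='_'))  # my_international_text_aao
```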
