How to remove accents in Python 3.5 and get a string with unicodedata or other solutions?

清酒与你 2021-01-11 16:36

I am trying to get a string to use in the Google Geocoding API. I've checked a lot of threads, but I am still facing a problem and I don't understand how to solve it.

I nee

4 answers
  •  走了就别回头了
    2021-01-11 16:58

    Generally, there are two approaches: (1) regular expressions and (2) str.translate.

    1) regular expressions

    Decompose the string, then remove the characters from the Unicode block Combining Diacritical Marks (\u0300-\u036f):

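    import re
    import unicodedata

    word = "pingüino"  # sample input
    # NFD splits each accented letter into a base letter plus combining marks
    word = unicodedata.normalize("NFD", word)
    # Delete every character in the Combining Diacritical Marks block
    word = re.sub("[\u0300-\u036f]", "", word)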
    

    This removes acute and grave accents, circumflexes, diaereses, and so on:

    pingüino > pinguino
    εἴκοσι εἶσι > εικοσι εισι
    

    For some scripts, the characters to remove lie in a different range, such as [\u0559-\u055f] for Armenian script.
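    Where the marks are true combining characters, one variant is to test each character with unicodedata.combining() instead of hard-coding a regex range. This is only a minimal sketch and not part of the approach above; the helper name strip_marks is made up for illustration.

```python
import unicodedata

def strip_marks(text: str) -> str:
    """Hypothetical helper: drop characters with a non-zero combining class."""
    # NFD splits precomposed letters into a base letter plus combining marks
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() returns 0 for base characters and a
    # non-zero canonical combining class for most combining marks
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_marks("pingüino"))  # pinguino
```

    Because the check relies on the combining class rather than a block, it also catches marks outside \u0300-\u036f, but it leaves non-combining characters (punctuation, modifier letters) untouched.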

    2) str.translate

    First, create a replacement table (it is case-sensitive), and then apply it.

    repl = str.maketrans(
        "áéúíó",
        "aeuio"
    )
    word = word.translate(repl)  # strings are immutable, so keep the result
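
    Since the table is case-sensitive, uppercase letters need their own entries. A small sketch, assuming the same five vowels as above, derives the uppercase pairs from the lowercase ones (the names src and dst are illustrative):

```python
# Lowercase pairs from the example above
src = "áéúíó"
dst = "aeuio"

# str.maketrans requires both strings to have the same length
repl = str.maketrans(src + src.upper(), dst + dst.upper())

# ñ is not in the table, so it passes through unchanged
print("Álvaro Núñez".translate(repl))
```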
    

    Multi-character replacements are made as follows:

    repl = {
        ord("æ"): "ae",
        ord("œ"): "oe",
    }
    word = word.translate(repl)  # strings are immutable, so keep the result
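
    The two approaches can also be combined. Letters such as æ, œ, ø, ł and ß have no canonical decomposition, so NFD leaves them as they are and only a translation table can replace them. The following is a rough sketch under that assumption; the function name remove_accents and the choice of table entries are illustrative, not a complete list.

```python
import re
import unicodedata

# Illustrative, case-sensitive table of letters that NFD does not decompose
SPECIAL = str.maketrans({
    "æ": "ae",
    "œ": "oe",
    "ø": "o",
    "ł": "l",
    "ß": "ss",
})

def remove_accents(text: str) -> str:
    """Hypothetical helper: table lookup first, then strip combining marks."""
    text = text.translate(SPECIAL)
    text = unicodedata.normalize("NFD", text)
    return re.sub("[\u0300-\u036f]", "", text)

print(remove_accents("smørrebrød"))  # smorrebrod
```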
    
