string.translate() with unicode data in python

情书的邮戳 2020-12-03 05:05

I have 3 APIs that return JSON data into 3 dictionary variables. I am taking some of the values from the dictionaries to process them. I read the specific values that I want t

5 answers
  • 2020-12-03 05:21

    The translate method works differently on Unicode objects than on byte-string objects:

    >>> help(unicode.translate)
    
    S.translate(table) -> unicode
    
    Return a copy of the string S, where all characters have been mapped
    through the given translation table, which must be a mapping of
    Unicode ordinals to Unicode ordinals, Unicode strings or None.
    Unmapped characters are left untouched. Characters mapped to None
    are deleted.
    

    So your example would become:

    import string

    # translate() on Unicode strings expects ordinals as keys; None deletes the character.
    remove_punctuation_map = {ord(char): None for char in string.punctuation}
    word_list = [s.translate(remove_punctuation_map) for s in value_list]
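    On Python 3, where every str is Unicode, the same table can also be built with str.maketrans(), whose optional third argument lists characters to map to None. A sketch; the value_list contents below are made-up sample data standing in for the question's values:

```python
import string

# str.maketrans("", "", chars) maps ord(c) to None for every c in chars,
# so translate() deletes those characters.
remove_punctuation_map = str.maketrans("", "", string.punctuation)

# Hypothetical sample data standing in for the question's value_list.
value_list = ["hello, world!", "(unicode) text."]
word_list = [s.translate(remove_punctuation_map) for s in value_list]
print(word_list)  # ['hello world', 'unicode text']
```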
    

    Note, however, that string.punctuation only contains ASCII punctuation. Unicode defines many more punctuation characters (for example « and »); whether that matters depends on your use case.
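    If you do need all Unicode punctuation, one option is to build the table with the unicodedata module: every character whose general category starts with "P" (Pc, Pd, Ps, Pe, Pi, Pf, Po) is punctuation. A Python 3 sketch; UNICODE_PUNCTUATION_MAP and remove_unicode_punctuation are made-up names. Unlike string.punctuation, this does not cover symbols such as "$" or "+", which belong to the S* categories:

```python
import sys
import unicodedata

# Map every code point whose general category is punctuation to None.
# Built once at import time; scanning all ~1.1 million code points takes a moment.
UNICODE_PUNCTUATION_MAP = {
    cp: None
    for cp in range(sys.maxunicode + 1)
    if unicodedata.category(chr(cp)).startswith("P")
}


def remove_unicode_punctuation(text):
    """Hypothetical helper: delete every Unicode punctuation character from text."""
    return text.translate(UNICODE_PUNCTUATION_MAP)


print(remove_unicode_punctuation("«Привет», мир!"))  # Привет мир
```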

  • 2020-12-03 05:27

    I noticed that the module-level function string.translate() is deprecated (the str.translate() method is not). Since you are removing punctuation rather than actually translating characters, you can use the re.sub function instead.

        >>> import re
        >>> s1 = "this.is a.string, with; (punctuation)."
        >>> s1
        'this.is a.string, with; (punctuation).'
        >>> re.sub(r"[.\t,:;()]", "", s1)
        'thisis astring with punctuation'
        >>>
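    Instead of listing the characters by hand, the character class can be built from string.punctuation; re.escape() makes characters such as ], \ and - literal inside [...]. A Python 3 sketch (PUNCTUATION_RE is an illustrative name):

```python
import re
import string

# Escape every punctuation character so "]", "\\", "-" and "^" are treated
# literally inside the character class.
PUNCTUATION_RE = re.compile("[" + re.escape(string.punctuation) + "]")

s1 = "this.is a.string, with; (punctuation)."
print(PUNCTUATION_RE.sub("", s1))  # thisis astring with punctuation
```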
    
  • 2020-12-03 05:34

    This version lets you map one set of letters to another:

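    def trans(to_translate):
        # Characters to replace and their replacements, paired by position.
        tabin = u'привет'
        tabout = u'тевирп'
        # translate() expects ordinals as keys; the values may be strings.
        translate_table = {ord(char_in): char_out for char_in, char_out in zip(tabin, tabout)}
        return to_translate.translate(translate_table)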
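    On Python 3, str.maketrans() builds an equivalent table from two strings of equal length, pairing characters by position. A sketch; TRANSLATE_TABLE and trans_maketrans are hypothetical names:

```python
# str.maketrans(x, y) maps ord(x[i]) to ord(y[i]); both strings must have
# the same length.
TRANSLATE_TABLE = str.maketrans("привет", "тевирп")


def trans_maketrans(to_translate):
    """Hypothetical helper: swap letters according to TRANSLATE_TABLE."""
    return to_translate.translate(TRANSLATE_TABLE)


print(trans_maketrans("привет"))  # тевирп
```

    Characters that are not in the table, such as Latin letters, pass through unchanged.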
    
  • 2020-12-03 05:36

    Since I ran into the same problem and Simon's answer was the one that helped me solve it, here is a simpler example for clarification.

    Say you'd like to remove the '@' and '\r' characters. Build a dict whose keys are the ordinals of those characters (translate() looks each character up by its code point, so plain character keys would be silently ignored) and map each one to None:

    remove_chars_map = {ord('@'): None, ord('\r'): None}

    new_string = old_string.translate(remove_chars_map)
    
    

    An example:

    old_string = "word1@\r word2@\r word3@\r"
    new_string = old_string.translate(remove_chars_map)
    # new_string == "word1 word2 word3": the '@' and '\r' characters are removed
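    str.translate() looks each character up by its ordinal, so a table keyed by the characters themselves never matches anything. On Python 3, str.maketrans() accepts a dict keyed by single characters and converts the keys to ordinals for you. A sketch:

```python
old_string = "word1@\r word2@\r word3@\r"

# Character keys are never matched, because translate() looks up ord(char).
char_keyed = {"@": None, "\r": None}
assert old_string.translate(char_keyed) == old_string

# str.maketrans() converts single-character keys to ordinals.
remove_chars_map = str.maketrans({"@": None, "\r": None})
new_string = old_string.translate(remove_chars_map)
print(repr(new_string))  # 'word1 word2 word3'
```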

  • 2020-12-03 05:48

    Python's re module lets you pass a function as the replacement argument; the function receives a match object and returns the replacement string. We can use this to build a custom character translation function:

    import re
    
    def mk_replacer(oldchars, newchars):
        """A function to build a replacement function"""
        mapping = dict(zip(oldchars, newchars))
        def replacer(match):
            """A replacement function to pass to re.sub()"""
            return mapping.get(match.group(0), "")
        return replacer
    

    An example. Match all lower-case letters ([a-z]), translate 'h' and 'i' to 'H' and 'I' respectively, delete other matches:

    >>> re.sub("[a-z]", mk_replacer("hi", "HI"), "hail")
    'HI'
    

    As you can see, it may be used with short (incomplete) replacement sets, and it may be used to delete some characters.
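    If you would rather leave unmapped characters alone instead of deleting them, one variation (a sketch, not part of the original answer; translate_chars is a made-up name) is to build the pattern from the mapping's keys so that only mapped characters ever match:

```python
import re


def translate_chars(text, oldchars, newchars):
    """Replace each char of oldchars with the char at the same position in newchars."""
    mapping = dict(zip(oldchars, newchars))
    if not mapping:
        # "[]" is not a valid regex, so there is nothing to do.
        return text
    # Character class containing only the mapped characters, escaped so that
    # regex metacharacters such as "]" or "-" are matched literally.
    pattern = "[" + "".join(re.escape(ch) for ch in mapping) + "]"
    return re.sub(pattern, lambda match: mapping[match.group(0)], text)


print(translate_chars("hail", "hi", "HI"))  # HaIl
```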

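    A Unicode example. The pattern matches lower-case Cyrillic letters by code-point range; a pattern like [\W] only catches them on Python 2 without re.UNICODE, because in Python 3 Cyrillic letters count as word characters:

    >>> re.sub(u"[\u0430-\u044f]", mk_replacer(u'\u0435\u0438\u043f\u0440\u0442\u0432', u"EIPRTV"), u'\u043f\u0440\u0438\u0432\u0435\u0442')
    'PRIVET'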
    