Check that a string contains only ASCII characters?

执笔经年 2020-12-29 09:27

How do I check that a string only contains ASCII characters in Python? Something like Ruby's ascii_only?

I want to be able to tell whether string speci

4 Answers
  • 2020-12-29 10:18

    If you have Unicode strings, you can call the "encode" method and catch the exception:

    try:
        mynewstring = mystring.encode('ascii')
    except UnicodeEncodeError:
        print("there are non-ascii characters in there")
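If it helps to know where the first non-ASCII character sits, the exception object carries that information. A minimal sketch, assuming Python 3; the helper name `first_non_ascii` is hypothetical and not part of the answer above:

```python
def first_non_ascii(text):
    """Return (index, character) of the first non-ASCII character, or None."""
    try:
        text.encode('ascii')
    except UnicodeEncodeError as exc:
        # exc.start is the index of the first character that could not be encoded
        return exc.start, text[exc.start]
    return None


print(first_non_ascii('plain text'))  # None
print(first_non_ascii('na\xefve'))    # (2, 'ï')
```

Returning the position rather than printing a message lets the caller decide how to report or strip the offending character.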
    

    If you have bytes, you can use the third-party chardet module to detect the encoding:

    import chardet
    
    # Get the encoding
    enc = chardet.detect(mystring)['encoding']
    
  • 2020-12-29 10:19

    You can also use a regex to check for ASCII-only strings. [\x00-\x7F] matches a single ASCII character:

    >>> import re
    >>> OnlyAscii = lambda s: re.match(r'^[\x00-\x7F]+$', s) is not None
    >>> OnlyAscii('string')
    True
    >>> OnlyAscii('Tannhäuser')
    False
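One caveat with the `+` quantifier: an empty string does not match, so it is reported as not ASCII-only. A variant sketch, assuming Python 3 (`re.fullmatch` needs 3.4 or later) and assuming an empty string should count as ASCII-only; `only_ascii` is an illustrative name:

```python
import re

# Precompiled pattern: zero or more characters in the ASCII range.
_ASCII_RE = re.compile(r'[\x00-\x7F]*')


def only_ascii(s):
    """Return True if every character of s is in U+0000..U+007F."""
    return _ASCII_RE.fullmatch(s) is not None


print(only_ascii('string'))         # True
print(only_ascii(''))               # True
print(only_ascii('Tannh\xe4user'))  # False
```

`fullmatch` anchors the pattern at both ends, so the explicit `^` and `$` are no longer needed.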
    
  • 2020-12-29 10:21

    Python 3.7 added methods that do what you want:

    str, bytes, and bytearray gained support for the new isascii() method, which can be used to test if a string or bytes contain only the ASCII characters.
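A short illustration of the method on each of the three types, assuming Python 3.7 or later; the sample values are made up for the demonstration:

```python
samples = ['string', 'строка', b'bytes', bytearray(b'\xff'), '']

for value in samples:
    # isascii() is available on str, bytes, and bytearray (Python 3.7+)
    print(repr(value), value.isascii())
```

Note that an empty string or empty bytes object is treated as ASCII-only, so `''.isascii()` returns `True`.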


    Otherwise:

    >>> all(ord(char) < 128 for char in 'string')
    True

    >>> all(ord(char) < 128 for char in 'строка')
    False
    

    Another version, for Python 2 (where unicode and the byte str are separate types):

    >>> def is_ascii(text):
    ...     if isinstance(text, unicode):
    ...         try:
    ...             text.encode('ascii')
    ...         except UnicodeEncodeError:
    ...             return False
    ...     else:
    ...         try:
    ...             text.decode('ascii')
    ...         except UnicodeDecodeError:
    ...             return False
    ...     return True
    ...

    >>> is_ascii('text')
    True

    >>> is_ascii(u'text')
    True

    >>> is_ascii(u'text-строка')
    False

    >>> is_ascii('text-строка')
    False

    >>> is_ascii(u'text-строка'.encode('utf-8'))
    False
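The name `unicode` does not exist in Python 3. A rough Python 3 port of the same idea, with `str` taking the place of `unicode` and `bytes` or `bytearray` taking the place of the Python 2 byte string; this is a sketch that assumes the input is one of those three types:

```python
def is_ascii(text):
    """Return True if text (str, bytes, or bytearray) contains only ASCII."""
    try:
        if isinstance(text, str):
            text.encode('ascii')   # str -> bytes; fails on code points >= 128
        else:
            text.decode('ascii')   # bytes -> str; fails on byte values >= 128
    except UnicodeError:
        # UnicodeError is the base class of both UnicodeEncodeError
        # and UnicodeDecodeError, so one handler covers both branches.
        return False
    return True


print(is_ascii('text'))                         # True
print(is_ascii('text-строка'))                  # False
print(is_ascii(b'text'))                        # True
print(is_ascii('text-строка'.encode('utf-8')))  # False
```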
    
  • 2020-12-29 10:23

    A workaround in Python 2 is to call encode on a byte string: Python first decodes it implicitly with the ASCII codec, which fails on any non-ASCII byte.

    For example:

    'H€llø'.encode('utf-8')
    

    This raises the following error:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1: ordinal not in range(128)
    

    You can then catch "UnicodeDecodeError" to detect that the string contains non-ASCII characters. In Python 3 the same call succeeds, so encode to 'ascii' and catch "UnicodeEncodeError" there instead.

    try:
        'H€llø'.encode('utf-8')
    except UnicodeDecodeError:
        print 'This string contains more than just the ASCII characters.'
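In Python 3, encoding a `str` to UTF-8 does not raise for ordinary text, so the exception above never fires. A different technique that still relies on UTF-8 encoding is to compare lengths: every non-ASCII code point takes at least two bytes in UTF-8, so the byte count equals the character count only for ASCII-only text. A minimal sketch, assuming Python 3; the function name `is_ascii_by_length` is hypothetical:

```python
def is_ascii_by_length(text):
    """Return True if the UTF-8 encoding of text is one byte per character."""
    try:
        encoded = text.encode('utf-8')
    except UnicodeEncodeError:
        # Lone surrogates cannot be encoded as UTF-8; they are not ASCII either.
        return False
    return len(encoded) == len(text)


sample = 'H\u20acll\xf8'  # the 'H€llø' example, written with escapes
print(len(sample), len(sample.encode('utf-8')))  # 5 8
print(is_ascii_by_length(sample))                # False
print(is_ascii_by_length('Hello'))               # True
```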
    