How do I do a case-insensitive string comparison?

无人及你 2020-11-21 07:46

How can I do case insensitive string comparison in Python?

I would like to encapsulate the comparison of regular strings to a repository string in a very simple...

9 answers
  •  广开言路
    2020-11-21 08:24

    Using Python 2, calling .lower() on each string or Unicode object...

    string1.lower() == string2.lower()
    

    ...will work most of the time, but indeed doesn't work in the situations @tchrist has described.

    Assume we have a file called unicode.txt containing the two strings Σίσυφος and ΣΊΣΥΦΟΣ. With Python 2:

    >>> utf8_bytes = open("unicode.txt", 'r').read()
    >>> print repr(utf8_bytes)
    '\xce\xa3\xce\xaf\xcf\x83\xcf\x85\xcf\x86\xce\xbf\xcf\x82\n\xce\xa3\xce\x8a\xce\xa3\xce\xa5\xce\xa6\xce\x9f\xce\xa3\n'
    >>> u = utf8_bytes.decode('utf8')
    >>> print u
    Σίσυφος
    ΣΊΣΥΦΟΣ
    
    >>> first, second = u.splitlines()
    >>> print first.lower()
    σίσυφος
    >>> print second.lower()
    σίσυφοσ
    >>> first.lower() == second.lower()
    False
    >>> first.upper() == second.upper()
    True
    

    The Σ character has two lowercase forms, ς and σ, and .lower() won't help compare them case-insensitively.
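To make the three sigma forms concrete, here is a small Python 3 sketch (not part of the original answer) that prints each character's code point, its Unicode name, and its uppercase mapping. It uses only the standard-library `unicodedata` module; the constant name `SIGMAS` is illustrative.

```python
import unicodedata

# The capital sigma and its two lowercase forms (regular and word-final).
SIGMAS = "Σσς"

for ch in SIGMAS:
    # Show the code point, the official Unicode name, and the uppercase mapping.
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}  upper() -> {ch.upper()}")

# Both lowercase forms share one uppercase form...
print("σ".upper() == "ς".upper())
# ...but they are distinct code points, so a plain comparison treats them as different.
print("σ" == "ς")
```

This is why comparing the `.upper()` results succeeds in the session above while comparing the `.lower()` results fails.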

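    However, in Python 3 (3.3 is shown below), str.lower() applies the Unicode final-sigma rule: a Σ at the end of a word lowercases to ς, and a Σ anywhere else lowercases to σ. Both strings therefore lowercase to the same text, and comparing the .lower() results works for this example: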

    >>> s = open('unicode.txt', encoding='utf8').read()
    >>> print(s)
    Σίσυφος
    ΣΊΣΥΦΟΣ
    
    >>> first, second = s.splitlines()
    >>> print(first.lower())
    σίσυφος
    >>> print(second.lower())
    σίσυφος
    >>> first.lower() == second.lower()
    True
    >>> first.upper() == second.upper()
    True
    

    So if you care about edge cases like the three sigmas in Greek, use Python 3.

    (For reference, Python 2.7.3 and Python 3.3.0b1 are shown in the interpreter printouts above.)
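As a hedged addition for Python 3.3 and later: `str.casefold()` is intended for caseless matching and maps both σ and ς to σ regardless of position in the word, so it does not depend on the final-sigma rule. The sketch below pairs it with NFKD normalization so that precomposed and decomposed accents also compare equal. The helper names `normalize_caseless` and `caseless_equal` are illustrative and do not come from the original answer.

```python
import unicodedata


def normalize_caseless(text: str) -> str:
    """Return a form of text suited to caseless comparison."""
    # Decompose first, casefold, then decompose again, because case folding
    # can itself produce characters that are not in normalized form.
    decomposed = unicodedata.normalize("NFKD", text)
    return unicodedata.normalize("NFKD", decomposed.casefold())


def caseless_equal(left: str, right: str) -> bool:
    """Compare two strings while ignoring case differences."""
    return normalize_caseless(left) == normalize_caseless(right)


print(caseless_equal("Σίσυφος", "ΣΊΣΥΦΟΣ"))
print(caseless_equal("straße", "STRASSE"))
print(caseless_equal("abc", "abd"))
```

Under these assumptions the first two comparisons are equal and the third is not. Whether the extra normalization step is needed depends on where the strings come from; for plain ASCII input, comparing the `.casefold()` results directly should be enough.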
