How do I do a case-insensitive string comparison?

前端未结

关注

 9  2194

无人及你

How can I do case insensitive string comparison in Python?

I would like to encapsulate comparison of a regular strings to a repository string using in a very simple

相关标签:

9条回答

花落未央

2020-11-21 08:21
Section 3.13 of the Unicode standard defines algorithms for caseless matching.

X.casefold() == Y.casefold() in Python 3 implements the "default caseless matching" (D144).

Casefolding does not preserve the normalization of strings in all instances and therefore the normalization needs to be done ('å' vs. 'å'). D145 introduces "canonical caseless matching":
```
import unicodedata

def NFD(text):
    return unicodedata.normalize('NFD', text)

def canonical_caseless(text):
    return NFD(NFD(text).casefold())
```
NFD() is called twice for very infrequent edge cases involving U+0345 character.

Example:
```
>>> 'å'.casefold() == 'å'.casefold()
False
>>> canonical_caseless('å') == canonical_caseless('å')
True
```
There are also compatibility caseless matching (D146) for cases such as '㎒' (U+3392) and "identifier caseless matching" to simplify and optimize caseless matching of identifiers.
0 讨论(0)
发布评论:

提交评论
- 加载中...

面向向阳花

2020-11-21 08:22

def insenStringCompare(s1, s2):
    """ Method that takes two strings and returns True or False, based
        on if they are equal, regardless of case."""
    try:
        return s1.lower() == s2.lower()
    except AttributeError:
        print "Please only pass strings into this method."
        print "You passed a %s and %s" % (s1.__class__, s2.__class__)

0 讨论(0)

广开言路

2020-11-21 08:24

Using Python 2, calling .lower() on each string or Unicode object...

string1.lower() == string2.lower()

...will work most of the time, but indeed doesn't work in the situations @tchrist has described.

Assume we have a file called unicode.txt containing the two strings Σίσυφος and ΣΊΣΥΦΟΣ. With Python 2:

>>> utf8_bytes = open("unicode.txt", 'r').read()
>>> print repr(utf8_bytes)
'\xce\xa3\xce\xaf\xcf\x83\xcf\x85\xcf\x86\xce\xbf\xcf\x82\n\xce\xa3\xce\x8a\xce\xa3\xce\xa5\xce\xa6\xce\x9f\xce\xa3\n'
>>> u = utf8_bytes.decode('utf8')
>>> print u
Σίσυφος
ΣΊΣΥΦΟΣ

>>> first, second = u.splitlines()
>>> print first.lower()
σίσυφος
>>> print second.lower()
σίσυφοσ
>>> first.lower() == second.lower()
False
>>> first.upper() == second.upper()
True

The Σ character has two lowercase forms, ς and σ, and .lower() won't help compare them case-insensitively.

However, as of Python 3, all three forms will resolve to ς, and calling lower() on both strings will work correctly:

>>> s = open('unicode.txt', encoding='utf8').read()
>>> print(s)
Σίσυφος
ΣΊΣΥΦΟΣ

>>> first, second = s.splitlines()
>>> print(first.lower())
σίσυφος
>>> print(second.lower())
σίσυφος
>>> first.lower() == second.lower()
True
>>> first.upper() == second.upper()
True

So if you care about edge-cases like the three sigmas in Greek, use Python 3.

(For reference, Python 2.7.3 and Python 3.3.0b1 are shown in the interpreter printouts above.)

0 讨论(0)

终归单人心

2020-11-21 08:24

How about converting to lowercase first? you can use string.lower().

0 讨论(0)
发布评论:

提交评论
- 加载中...
醉梦人生

2020-11-21 08:30
This is another regex which I have learned to love/hate over the last week so usually import as (in this case yes) something that reflects how im feeling! make a normal function.... ask for input, then use ....something = re.compile(r'foo*|spam*', yes.I)...... re.I (yes.I below) is the same as IGNORECASE but you cant make as many mistakes writing it!

You then search your message using regex's but honestly that should be a few pages in its own , but the point is that foo or spam are piped together and case is ignored. Then if either are found then lost_n_found would display one of them. if neither then lost_n_found is equal to None. If its not equal to none return the user_input in lower case using "return lost_n_found.lower()"

This allows you to much more easily match up anything thats going to be case sensitive. Lastly (NCS) stands for "no one cares seriously...!" or not case sensitive....whichever

if anyone has any questions get me on this..
```
    import re as yes

    def bar_or_spam():

        message = raw_input("\nEnter FoO for BaR or SpaM for EgGs (NCS): ") 

        message_in_coconut = yes.compile(r'foo*|spam*',  yes.I)

        lost_n_found = message_in_coconut.search(message).group()

        if lost_n_found != None:
            return lost_n_found.lower()
        else:
            print ("Make tea not love")
            return

    whatz_for_breakfast = bar_or_spam()

    if whatz_for_breakfast == foo:
        print ("BaR")

    elif whatz_for_breakfast == spam:
        print ("EgGs")
```
0 讨论(0)
发布评论:

提交评论
- 加载中...
失恋的感觉

2020-11-21 08:33
The usual approach is to uppercase the strings or lower case them for the lookups and comparisons. For example:
```
>>> "hello".upper() == "HELLO".upper()
True
>>> 
```
0 讨论(0)
发布评论:

提交评论
- 加载中...

1 2 下一页