MongoDB: Is it possible to make a case-insensitive query?

Asked by 谎友^ on 2020-11-22 04:44

Example:

> db.stuff.save({"foo":"bar"});

> db.stuff.find({"foo":"bar"}).count();
1
> db.stuff.find({"foo":"BAR"}).count();
0

24 answers
  • 2020-11-22 05:11

    Starting with MongoDB 3.4, the recommended way to perform fast case-insensitive searches is to use a Case Insensitive Index.

    I personally emailed one of the founders asking him to get this working, and he made it happen! It had been an open issue on JIRA since 2009, and many people had requested the feature. Here's how it works:

    A case-insensitive index is made by specifying a collation with a strength of either 1 or 2. You can create a case-insensitive index like this:

    db.cities.createIndex(
      { city: 1 },
      { 
        collation: {
          locale: 'en',
          strength: 2
        }
      }
    );
    

    You can also specify a default collation per collection when you create them:

    db.createCollection('cities', { collation: { locale: 'en', strength: 2 } } );
    

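    In either case, to use the case-insensitive index, the query must run with the same collation that was used when creating the index or the collection. For an index-level collation, specify it in the find operation (queries on a collection with a default collation inherit it automatically, but stating it explicitly does no harm):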

    db.cities.find(
      { city: 'new york' }
    ).collation(
      { locale: 'en', strength: 2 }
    );
    

    This will return "New York", "new york", "New york" etc.
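
    The difference between strength 1 and strength 2 can be tried without a server. This is a hedged illustration, assuming Node.js or a modern browser: MongoDB collations are based on ICU, and the built-in Intl.Collator "sensitivity" levels roughly correspond to ICU strengths ('base' to primary, 'accent' to secondary). Intl.Collator is not MongoDB's API, and the mapping is an approximation.

    ```javascript
    // Illustration only: MongoDB collations use ICU, and Intl.Collator exposes
    // similar comparison levels, so the effect of `strength` can be tried locally.
    //   strength 1 (primary)   ~ sensitivity 'base':   ignores case and accents
    //   strength 2 (secondary) ~ sensitivity 'accent': ignores case, keeps accents
    const strength1 = new Intl.Collator('en', { sensitivity: 'base' });
    const strength2 = new Intl.Collator('en', { sensitivity: 'accent' });

    // True when the collator considers the two strings equal.
    function sameUnder(collator, a, b) {
      return collator.compare(a, b) === 0;
    }

    console.log(sameUnder(strength2, 'New York', 'new york')); // true: case is ignored
    console.log(sameUnder(strength2, 'resume', 'Résumé'));     // false: accents still count
    console.log(sameUnder(strength1, 'resume', 'Résumé'));     // true: accents ignored as well
    ```

    In other words, strength 2 is the usual choice for "case-insensitive but accent-sensitive", while strength 1 also folds accents together.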

    Other notes

    • Answers suggesting full-text search are wrong in this case (and potentially dangerous). The question is about a case-insensitive query, e.g. username: 'bill' matching BILL or Bill, not a full-text search, which would also match stemmed forms of bill such as Bills, billed, etc.

    • Answers suggesting regular expressions are slow. Even with indexes, the documentation states:

      "Case insensitive regular expression queries generally cannot use indexes effectively. The $regex implementation is not collation-aware and is unable to utilize case-insensitive indexes."

      $regex answers also risk injection when user input is inserted into the pattern without escaping.
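
    If you do use $regex anyway, escaping the input addresses the injection risk, and anchoring the pattern avoids accidental substring matches. A minimal sketch in plain JavaScript, assuming the resulting object is passed as a filter to a driver's find/findOne; escapeRegex and caseInsensitiveEquals are hypothetical helper names. This does not solve the index problem quoted above.

    ```javascript
    // Escape characters that are special in regular expressions so that user
    // input is matched literally rather than interpreted as a pattern.
    function escapeRegex(input) {
      return input.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    }

    // Hypothetical helper: builds a $regex filter value that matches the whole
    // field case-insensitively. The ^ and $ anchors stop 'bill' from matching 'billing'.
    function caseInsensitiveEquals(value) {
      return { $regex: '^' + escapeRegex(value) + '$', $options: 'i' };
    }

    const filter = { username: caseInsensitiveEquals('bill.smith') };
    console.log(filter.username.$regex); // ^bill\.smith$
    ```

    Without the escaping, the "." in "bill.smith" would match any character, and input such as ".*" would match every document.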

  • 2020-11-22 05:11

    I faced a similar issue, and this is what worked for me:

      const flavorExists = await Flavors.findOne({
        'flavor.name': { $regex: flavorName, $options: 'i' },
      });
    
  • 2020-11-22 05:14
    db.company_profile.find({ "companyName" : { "$regex" : "Nilesh" , "$options" : "i"}});
    
  • 2020-11-22 05:14

    Mongo (current version 2.0.0) can't use indexes for case-insensitive searches on indexed fields; see their documentation. For non-indexed fields, the regexes listed in the other answers should be fine.

  • 2020-11-22 05:16

    Using a filter works for me in C#.

    // Requires the MongoDB .NET driver: using MongoDB.Driver;
    string s = "searchTerm";
    // Match documents whose Title contains s, ignoring case
    var filter = Builders<Model>.Filter.Where(p => p.Title.ToLower().Contains(s.ToLower()));
    var list = collection.Find(filter).ToList();
    

    It may even use the index, because I believe the methods are evaluated after the results are returned, but I haven't tested this yet.

    This also avoids a problem with the following filter:

    var filter = Builders<Model>.Filter.Eq(p => p.Title.ToLower(), s.ToLower());


    where the MongoDB driver treats p.Title.ToLower() as a property name and can't map it properly.

  • 2020-11-22 05:18

    UPDATE:

    The original answer is now obsolete: MongoDB now supports advanced full-text search with many features.

    ORIGINAL ANSWER:

    Note that a case-insensitive regex search (/i) means MongoDB cannot use an index effectively, so queries against large datasets can take a long time.

    Even with small datasets, it's not very efficient. You take a far bigger CPU hit than the query warrants, which could become an issue if you are trying to scale.

    As an alternative, you can store an uppercase copy and search against that. For instance, I have a User table whose username is mixed case, while the id is an uppercase copy of the username. This makes case-variant duplicates impossible (having both "Foo" and "foo" is not allowed), and I can search by id = username.toUpperCase() to get a case-insensitive search on username.
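
    The uppercase-copy approach can be sketched in plain JavaScript. This is a hypothetical, in-memory sketch: a Map stands in for the collection and for a unique constraint on id (as the answer's design implies), so the duplicate check MongoDB would enforce is done by hand here. The names insertUser and findByUsername are illustrative, not part of any driver.

    ```javascript
    // Hypothetical sketch of the "uppercase copy" approach. A Map stands in for
    // a collection with a unique index on `id`; field names follow the answer.
    const users = new Map();

    // Store the original spelling in `username` and a normalized key in `id`.
    function insertUser(username) {
      const doc = { id: username.toUpperCase(), username };
      if (users.has(doc.id)) {
        throw new Error(`duplicate key: ${doc.id}`);
      }
      users.set(doc.id, doc);
      return doc;
    }

    // Normalize the search term the same way, then do an exact lookup on `id`.
    function findByUsername(username) {
      return users.get(username.toUpperCase()) ?? null;
    }

    insertUser('Foo');
    console.log(findByUsername('fOO').username); // Foo
    ```

    Because the lookup is an exact match on a normalized value, a regular index on id can serve it, which is the point of the approach.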

    If your field is large, such as a message body, duplicating data is probably not a good option. I believe using an external indexer like Apache Lucene is the best option in that case.
