Case insensitive std::set of strings

深忆病人 2021-01-01 12:44

How do you do a case-insensitive insertion or search of a string in std::set?

For example:

std::set<std::string> s;
s.insert("Hello");
s.in         


        
4 Answers
  • 2021-01-01 13:14

    std::set lets you supply your own comparator as its second template argument (as do the other ordered associative containers, such as std::map). The comparator can then perform any kind of comparison you like, including a case-insensitive one.
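
    As a minimal sketch of such a comparator, assuming plain ASCII text and the default "C" locale: the names ci_less, ci_string_set and make_example_set below are made up for illustration and are not part of any library. The comparator lowercases each character with std::tolower and lets std::lexicographical_compare do the ordering, so both insertion and lookup ignore case.

```cpp
#include <algorithm>
#include <cctype>
#include <set>
#include <string>

// Hypothetical name: orders strings by comparing characters after lowercasing them.
struct ci_less
{
    bool operator()(const std::string& a, const std::string& b) const
    {
        return std::lexicographical_compare(
            a.begin(), a.end(), b.begin(), b.end(),
            [](unsigned char x, unsigned char y) {
                // Taking the characters as unsigned char keeps std::tolower
                // well defined for bytes outside the 7-bit ASCII range.
                return std::tolower(x) < std::tolower(y);
            });
    }
};

// Hypothetical alias: a set in which "Hello" and "HELLO" are the same key.
using ci_string_set = std::set<std::string, ci_less>;

// Builds a small set to show that insertion ignores case.
inline ci_string_set make_example_set()
{
    ci_string_set s;
    s.insert("Hello");
    s.insert("HELLO");  // not inserted: equivalent to "Hello" under ci_less
    s.insert("world");
    return s;
}
```

    With this comparator, make_example_set() holds two elements, and s.find("hELLO") returns an iterator to the stored "Hello". The set keeps whichever spelling was inserted first.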

  • 2021-01-01 13:26

    This is a generic solution that also works with string types other than std::string (tested with std::wstring, std::string_view and char const*). Basically, anything that defines a range of characters should work.

    The key point is boost::as_literal, which lets the comparator treat null-terminated character arrays, character pointers and ranges uniformly.

    Generic code ("iset.h"):

    #pragma once
    #include <set>
    #include <algorithm>
    #include <boost/algorithm/string.hpp>
    #include <boost/range/as_literal.hpp>
    
    // Case-insensitive generic string comparator.
    struct range_iless
    {
        template< typename InputRange1, typename InputRange2 >
        bool operator()( InputRange1 const& r1, InputRange2 const& r2 ) const 
        {
            // Bring std::begin() and std::end() into scope so that ADL can also find custom overloads.
            using std::begin; using std::end;  
    
            // Treat null-terminated character arrays, character pointers and ranges uniformly.
            // This just creates cheap iterator ranges (it doesn't copy container arguments)!
            auto ir1 = boost::as_literal( r1 );
            auto ir2 = boost::as_literal( r2 );
    
            // Compare case-insensitively.
            return std::lexicographical_compare( 
                begin( ir1 ), end( ir1 ), 
                begin( ir2 ), end( ir2 ), 
                boost::is_iless{} );
        }
    };
    
    // Case-insensitive set for any Key that consists of a range of characters.
    template< class Key, class Allocator = std::allocator<Key> >
    using iset = std::set< Key, range_iless, Allocator >;
    

    Usage example ("main.cpp"):

    #include "iset.h"  // above header file
    #include <iostream>
    #include <string>
    #include <string_view>
    
    // Output range to stream.
    template< typename InputRange, typename Stream, typename CharT >
    void write_to( Stream& s, InputRange const& r, CharT const* sep )
    {
        for( auto const& elem : r )
            s << elem << sep;
        s << std::endl;
    }
    
    int main()
    {
        iset< std::string  >     s1{  "Hello",  "HELLO",  "world" };
        iset< std::wstring >     s2{ L"Hello", L"HELLO", L"world" };
        iset< char const*  >     s3{  "Hello",  "HELLO",  "world" };
        iset< std::string_view > s4{  "Hello",  "HELLO",  "world" };
    
        write_to( std::cout,  s1,  " " );    
        write_to( std::wcout, s2, L" " );    
        write_to( std::cout,  s3,  " " );    
        write_to( std::cout,  s4,  " " );    
    }
    

  • 2021-01-01 13:26

    From what I have read, rolling your own comparator is more portable than stricmp(), because stricmp() is not part of the standard library; it is only provided by most compiler vendors. My solution below therefore does the comparison by hand.

    #include <string>
    #include <cctype>
    #include <iostream>
    #include <set>
    
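    struct caseInsensitiveLess
    {
      // const, so the comparator can be called through a const std::set
      bool operator()(const std::string& x, const std::string& y) const
      {
        const std::size_t xs = x.size();
        const std::size_t ys = y.size();
        const std::size_t bound = xs < ys ? xs : ys;
    
        for (std::size_t i = 0; i < bound; ++i)
        {
          // cast to unsigned char: tolower is undefined for negative char values
          const int cx = std::tolower(static_cast<unsigned char>(x[i]));
          const int cy = std::tolower(static_cast<unsigned char>(y[i]));
    
          if (cx < cy)
            return true;
    
          if (cy < cx)
            return false;
        }
        // the common prefix is equal, so the shorter string sorts first;
        // returning false here would make "abc" and "abcd" equivalent
        return xs < ys;
      }
    };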
    
    int main()
    {
      std::set<std::string, caseInsensitiveLess> ss1;
      std::set<std::string> ss2;
    
      ss1.insert("This is the first string");
      ss1.insert("THIS IS THE FIRST STRING");
      ss1.insert("THIS IS THE SECOND STRING");
      ss1.insert("This IS THE SECOND STRING");
      ss1.insert("This IS THE Third");
    
      ss2.insert("this is the first string");
      ss2.insert("this is the first string");
      ss2.insert("this is the second string");
      ss2.insert("this is the second string");
      ss2.insert("this is the third");
    
      for ( auto& i: ss1 )
       std::cout << i << std::endl;
    
      std::cout << std::endl;
    
      for ( auto& i: ss2 )
       std::cout << i << std::endl;
    
    }
    

    Output with case insensitive set and regular set showing the same ordering:

    This is the first string
    THIS IS THE SECOND STRING
    This IS THE Third
    
    this is the first string
    this is the second string
    this is the third
    
  • 2021-01-01 13:32

    You need to define a custom comparator:

    #include <strings.h>  // strcasecmp (POSIX)
    #include <set>
    #include <string>
    
    struct InsensitiveCompare { 
        bool operator() (const std::string& a, const std::string& b) const {
            return strcasecmp(a.c_str(), b.c_str()) < 0;
        }
    };
    
    std::set<std::string, InsensitiveCompare> s;
    

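    If strcasecmp (a POSIX function declared in <strings.h>) is not available, you may try stricmp (spelled _stricmp on MSVC). strcoll is another option, but it compares according to the current locale's collation order, which is not necessarily case-insensitive.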
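
    One way to handle the "not available" case is to pick the function at compile time. The following is only a sketch under stated assumptions: compare_ignore_case is a hypothetical helper name, MSVC is assumed to provide _stricmp, and every other toolchain is assumed to be POSIX-like and provide strcasecmp.

```cpp
#include <set>
#include <string>

#if defined(_MSC_VER)
  #include <string.h>   // _stricmp (Microsoft C runtime)
#else
  #include <strings.h>  // strcasecmp (POSIX)
#endif

// Hypothetical helper: forwards to whichever case-insensitive
// C-string comparison the platform is assumed to provide.
inline int compare_ignore_case(const char* a, const char* b)
{
#if defined(_MSC_VER)
    return _stricmp(a, b);
#else
    return strcasecmp(a, b);
#endif
}

// Same comparator as above, written against the helper.
struct InsensitiveCompare
{
    bool operator()(const std::string& a, const std::string& b) const
    {
        return compare_ignore_case(a.c_str(), b.c_str()) < 0;
    }
};
```

    Because the comparison goes through c_str(), it stops at the first embedded null character, so strings that differ only after a '\0' are treated as equal.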
