How to sort file names with numbers and alphabets in order in C?

青春惊慌失措 2020-12-09 11:47

I have used the following code to sort files in alphabetical order and it sorts the files as shown in the figure:

for(int i = 0;i < maxcnt;i++) 
{
    fo         


        
6 Answers
  • 2020-12-09 12:06

    Since this question has a C++ tag, you could build on @Joseph Quinsey's answer and write a natural_less function to pass to std::sort from the standard library.

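    #include <algorithm> // std::sort
    #include <string>
    #include <vector>
    
    // @Joseph Quinsey's function, compiled as C: its implicit conversion
    // from const void * to const char * is not valid C++
    extern "C" int strcasecmp_withNumbers(const void *void_a, const void *void_b);
    
    bool natural_less(const std::string& lhs, const std::string& rhs)
    {
        return strcasecmp_withNumbers(lhs.c_str(), rhs.c_str()) < 0;
    }
    
    void example(std::vector<std::string>& data)
    {
        std::sort(data.begin(), data.end(), natural_less);
    }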
    

    As an exercise, I wrote some working code: https://github.com/kennethlaskoski/natural_less

  • 2020-12-09 12:06

    Modifying this answer:

    std::string toUpper(std::string s); // defined below
    
    bool compareNat(const std::string& a, const std::string& b){
        // Check b first so that compareNat("", "") is false: std::sort
        // needs a strict weak ordering (x < x must never be true)
        if (b.empty())
            return false;
        if (a.empty())
            return true;
        // isdigit/toupper are undefined for negative char values
        unsigned char ca = a[0], cb = b[0];
        if (std::isdigit(ca) && !std::isdigit(cb))
            return true;
        if (!std::isdigit(ca) && std::isdigit(cb))
            return false;
        if (!std::isdigit(ca) && !std::isdigit(cb))
        {
            // Same character ignoring case --> compare the rest, so that
            // "a10" vs "A9" is still decided by the numbers
            if (std::toupper(ca) == std::toupper(cb))
                return compareNat(a.substr(1), b.substr(1));
            return (toUpper(a) < toUpper(b));
            //toUpper() is a function to convert a std::string to uppercase.
        }
    
        // Both strings begin with digit --> parse both numbers
        std::istringstream issa(a);
        std::istringstream issb(b);
        int ia, ib;
        issa >> ia;
        issb >> ib;
        if (ia != ib)
            return ia < ib;
    
        // Numbers are the same --> remove numbers and recurse
        std::string anew, bnew;
        std::getline(issa, anew);
        std::getline(issb, bnew);
        return (compareNat(anew, bnew));
    }
    

    toUpper() function:

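    std::string toUpper(std::string s){
        // Cast to unsigned char: std::toupper() is undefined for negative char values
        for (std::string::size_type i = 0; i < s.length(); i++)
            s[i] = static_cast<char>(std::toupper(static_cast<unsigned char>(s[i])));
        return s;
    }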
    

    Usage:

    #include <iostream> // std::cout
    #include <string>
    #include <algorithm> // std::sort, std::copy
    #include <iterator> // std::ostream_iterator
    #include <sstream> // std::istringstream
    #include <vector>
    #include <cctype> // std::isdigit
    
    int main()
    {
        std::vector<std::string> str;
        str.push_back("20.txt");
        str.push_back("10.txt");
        str.push_back("1.txt");
        str.push_back("z2.txt");
        str.push_back("z10.txt");
        str.push_back("z100.txt");
        str.push_back("1_t.txt");
        str.push_back("abc.txt");
        str.push_back("Abc.txt");
        str.push_back("bcd.txt");
    
        std::sort(str.begin(), str.end(), compareNat);
        std::copy(str.begin(), str.end(),
                  std::ostream_iterator<std::string>(std::cout, "\n"));
    }
    
  • 2020-12-09 12:14

    Your problem is that you attach a meaning to parts of the file name.

    In lexicographical order, Slide1 is before Slide10 which is before Slide5.

    You expect Slide5 before Slide10 because you interpret the substrings 5 and 10 as integers.
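    A minimal C sketch of the plain strcmp() ordering described above; sort_lexicographic and cmp_lexicographic are names chosen for illustration. Sorting {"Slide5", "Slide10", "Slide1"} with it gives Slide1, Slide10, Slide5, because strcmp() compares character by character and '1' < '5'.

```c
#include <stddef.h> /* size_t */
#include <stdlib.h> /* qsort */
#include <string.h> /* strcmp */

/* qsort() passes pointers to the array elements (const char ** here),
   so the comparator dereferences them before calling strcmp() */
static int cmp_lexicographic(const void *pa, const void *pb)
{
    const char *a = *(const char *const *)pa;
    const char *b = *(const char *const *)pb;
    return strcmp(a, b);
}

/* Sorts an array of C strings in plain byte-wise (lexicographic) order */
void sort_lexicographic(const char **names, size_t count)
{
    qsort(names, count, sizeof names[0], cmp_lexicographic);
}
```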

    You would run into more problems if the file names contained month names and you expected them to be ordered by date (i.e. January before August). You would need to adapt your sorting to that interpretation; the "natural" order depends on how you interpret the names, and there is no generic solution.
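    As an illustration of adapting the comparison to one particular interpretation, here is a hedged C sketch that orders full English month names by their position in the calendar. It assumes the month name has already been extracted from the file name (that step depends on your naming scheme and is not shown); month_index and compare_months are hypothetical helpers.

```c
#include <ctype.h>  /* tolower */
#include <stddef.h> /* size_t */

static const char *const month_names[12] = {
    "january", "february", "march", "april", "may", "june",
    "july", "august", "september", "october", "november", "december"
};

/* Case-insensitive equality of two strings */
static int equals_ignore_case(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b; /* both strings must end at the same point */
}

/* Returns 1..12 for a full English month name, or 0 if it is not one */
int month_index(const char *name)
{
    for (size_t i = 0; i < 12; i++)
        if (equals_ignore_case(name, month_names[i]))
            return (int)i + 1;
    return 0;
}

/* Orders month names by calendar position; unknown names sort first */
int compare_months(const char *a, const char *b)
{
    int ia = month_index(a);
    int ib = month_index(b);
    return (ia > ib) - (ia < ib);
}
```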

    Another approach is to format the file names so that your intended order and the lexicographical order agree. In your case, you would give the number a fixed width with leading zeros: Slide1 becomes Slide01, and sorting lexicographically then yields the result you want.
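    A short C sketch of generating such names, assuming the files are produced by your own code; padded_name is a hypothetical helper built on snprintf()'s "%0*d" conversion, which pads the number with zeros to the requested width.

```c
#include <stdio.h> /* snprintf, size_t */

/* Writes prefix followed by number, zero-padded to width digits,
   e.g. ("Slide", 5, 2) -> "Slide05". Returns 0 on success, or -1 if
   formatting failed or the result did not fit into buf. */
int padded_name(char *buf, size_t size, const char *prefix, int number, int width)
{
    int n = snprintf(buf, size, "%s%0*d", prefix, width, number);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

    Choose the width for the largest number you expect: snprintf() never drops digits, so 100 with width 2 still becomes Slide100 and breaks the ordering again.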

    However, often you cannot influence the output of an application, and thus cannot enforce your format directly.

    What I do in those cases is write a small script or function that renames the files into a suitable format, and then sort them with standard sorting algorithms. The advantage is that you do not need to adapt your sorting and can use existing software for it. The downside is that this is not feasible when the file names must stay fixed.
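    A sketch of the renaming step in C, under the assumption that padding every run of digits to a fixed width is the format you want; pad_digit_runs is a hypothetical helper. It only builds the new name; the actual rename could then be done with the standard rename() function from <stdio.h>, which is not shown here.

```c
#include <ctype.h>  /* isdigit */
#include <stddef.h> /* size_t */
#include <string.h> /* strspn, memset, memcpy */

/* Hypothetical helper: copies name into out, left-padding every run of
   digits with zeros to at least width digits. Longer runs are copied
   unchanged. Returns 0 on success, -1 if out (out_size bytes) is too small. */
int pad_digit_runs(const char *name, char *out, size_t out_size, size_t width)
{
    size_t used = 0;
    if (out_size == 0)
        return -1;
    while (*name) {
        if (isdigit((unsigned char)*name)) {
            size_t len = strspn(name, "0123456789");
            size_t pad = len < width ? width - len : 0;
            if (used + pad + len >= out_size) /* keep room for '\0' */
                return -1;
            memset(out + used, '0', pad);
            memcpy(out + used + pad, name, len);
            used += pad + len;
            name += len;
        } else {
            if (used + 1 >= out_size)
                return -1;
            out[used++] = *name++;
        }
    }
    out[used] = '\0';
    return 0;
}
```

    With width 3, Slide5.png becomes Slide005.png and Slide10.png becomes Slide010.png, which strcmp() already orders correctly.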

  • 2020-12-09 12:19

    Natural sorting is what you need here. I have working code for my scenario; you can probably use it after adapting it to your needs:

        // --- natural_compare.h ---
        #ifndef JSW_NATURAL_COMPARE
        #define JSW_NATURAL_COMPARE
        #include <string>
        int natural_compare(const char *a, const char *b);
        int natural_compare(const std::string& a, const std::string& b);
        #endif
        // --- natural_compare.cpp ---
        #include <cctype>
        namespace {
          // Note: This is a convenience for the natural_compare 
          // function, it is *not* designed for general use
          class int_span {
            int _ws;
            int _zeros;
            const char *_value;
            const char *_end;
          public:
            int_span(const char *src)
            {
              const char *start = src;
              // Save and skip leading whitespace
              while (std::isspace(*(unsigned char*)src)) ++src;
              _ws = src - start;
              // Save and skip leading zeros
              start = src;
              while (*src == '0') ++src;
              _zeros = src - start;
              // Save the edges of the value
              _value = src;
              while (std::isdigit(*(unsigned char*)src)) ++src;
              _end = src;
            }
            bool is_int() const { return _value != _end; }
            const char *value() const { return _value; }
            int whitespace() const { return _ws; }
            int zeros() const { return _zeros; }
            int digits() const { return _end - _value; }
            int non_value() const { return whitespace() + zeros(); }
          };
          inline int safe_compare(int a, int b)
          {
            return a < b ? -1 : a > b;
          }
        }
        int natural_compare(const char *a, const char *b)
        {
          int cmp = 0;
          while (cmp == 0 && *a != '\0' && *b != '\0') {
            int_span lhs(a), rhs(b);
            if (lhs.is_int() && rhs.is_int()) {
              if (lhs.digits() != rhs.digits()) {
                // For differing widths (excluding leading characters),
                // the value with fewer digits takes priority
                cmp = safe_compare(lhs.digits(), rhs.digits());
              }
              else {
                int digits = lhs.digits();
                a = lhs.value();
                b = rhs.value();
                // For matching widths (excluding leading characters),
                // search from MSD to LSD for the larger value
                while (--digits >= 0 && cmp == 0)
                  cmp = safe_compare(*a++, *b++);
              }
              if (cmp == 0) {
                // If the values are equal, we need a tie   
                // breaker using leading whitespace and zeros
                if (lhs.non_value() != rhs.non_value()) {
                  // For differing widths of combined whitespace and 
                  // leading zeros, the smaller width takes priority
                  cmp = safe_compare(lhs.non_value(), rhs.non_value());
                }
                else {
                  // For matching widths of combined whitespace 
                  // and leading zeros, more whitespace takes priority
                  cmp = safe_compare(rhs.whitespace(), lhs.whitespace());
                }
              }
            }
            else {
              // No special logic unless both spans are integers
              cmp = safe_compare(*a++, *b++);
            }
          }
          // All else being equal so far, the shorter string takes priority
          return cmp == 0 ? safe_compare(*a, *b) : cmp;
        }
        #include <string>
        int natural_compare(const std::string& a, const std::string& b)
        {
          return natural_compare(a.c_str(), b.c_str());
        }
    
  • 2020-12-09 12:22

    What you want is a "natural sort". Here is a blog post about it that, I believe, explains an implementation in Python, and here is a Perl module that does it. There is also a similar question: How to implement a natural sort algorithm in c++?
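    For reference, a minimal natural sort in C might look like the sketch below. It assumes single-byte (ASCII) file names, and natural_cmp and natural_sort_names are hypothetical names, not taken from the blog post or the Perl module. Instead of converting digit runs to integers, it compares them by their number of significant digits and then digit by digit, so numbers of any length work. It also includes the qsort() wrapper that an array of char * needs, since qsort() passes pointers to the elements.

```c
#include <ctype.h>  /* isdigit, tolower */
#include <stddef.h> /* size_t */
#include <stdlib.h> /* qsort */

/* Natural-order comparison sketch: digit runs are compared by numeric
   value without converting them to integers; other characters are
   compared case-insensitively. Returns <0, 0 or >0 like strcmp().
   Leading zeros are ignored, so "a007" and "a7" compare equal. */
int natural_cmp(const char *a, const char *b)
{
    while (*a && *b) {
        if (isdigit((unsigned char)*a) && isdigit((unsigned char)*b)) {
            /* Skip leading zeros */
            while (*a == '0') a++;
            while (*b == '0') b++;
            /* Count the significant digits of each run */
            size_t la = 0, lb = 0;
            while (isdigit((unsigned char)a[la])) la++;
            while (isdigit((unsigned char)b[lb])) lb++;
            /* More significant digits means a larger number */
            if (la != lb)
                return la < lb ? -1 : 1;
            /* Same length: the first differing digit decides */
            for (size_t i = 0; i < la; i++)
                if (a[i] != b[i])
                    return a[i] < b[i] ? -1 : 1;
            a += la;
            b += lb;
        } else {
            int ca = tolower((unsigned char)*a);
            int cb = tolower((unsigned char)*b);
            if (ca != cb)
                return ca < cb ? -1 : 1;
            a++;
            b++;
        }
    }
    /* The string with characters left over sorts last */
    return (*a != '\0') - (*b != '\0');
}

/* qsort() passes pointers to the elements (const char ** here) */
static int natural_cmp_qsort(const void *pa, const void *pb)
{
    return natural_cmp(*(const char *const *)pa, *(const char *const *)pb);
}

void natural_sort_names(const char **names, size_t count)
{
    qsort(names, count, sizeof names[0], natural_cmp_qsort);
}
```

    Sorting {"z10.txt", "Slide10", "z2.txt", "Slide5", "Slide1"} with natural_sort_names() gives Slide1, Slide5, Slide10, z2.txt, z10.txt.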

  • 2020-12-09 12:27

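    For a C answer, the following is a replacement for strcasecmp(). It recurses to handle strings that alternate between numeric and non-numeric substrings. You can pass it to qsort() directly when sorting a two-dimensional char array (e.g. char names[N][64]), because qsort() hands the comparator pointers to the elements; for an array of char * you need a small wrapper that dereferences those pointers first.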

    #include <ctype.h>  // isdigit, tolower
    #include <stdlib.h> // strtol
    
    int strcasecmp_withNumbers(const void *void_a, const void *void_b) {
       const char *a = void_a;
       const char *b = void_b;
    
       if (!a || !b) { // if one doesn't exist, other wins by default
          return a ? 1 : b ? -1 : 0;
       }
       // ctype functions need an unsigned char value, hence the casts
       if (isdigit((unsigned char)*a) && isdigit((unsigned char)*b)) { // if both start with numbers
          char *remainderA;
          char *remainderB;
          long valA = strtol(a, &remainderA, 10);
          long valB = strtol(b, &remainderB, 10);
          if (valA != valB)
             return valA < valB ? -1 : 1; // valA - valB could overflow an int
          // if you wish 7 == 007, comment out the next two lines
          else if (remainderB - b != remainderA - a) // equal with diff lengths
             return (remainderB - b) - (remainderA - a); // set 007 before 7
          else // if numerical parts equal, recurse
             return strcasecmp_withNumbers(remainderA, remainderB);
       }
       if (isdigit((unsigned char)*a) || isdigit((unsigned char)*b)) { // if just one is a number
          return isdigit((unsigned char)*a) ? -1 : 1; // numbers always come first
       }
       while (*a && *b) { // non-numeric characters
          if (isdigit((unsigned char)*a) || isdigit((unsigned char)*b))
             return strcasecmp_withNumbers(a, b); // recurse
          if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
             return tolower((unsigned char)*a) - tolower((unsigned char)*b);
          a++;
          b++;
       }
       return *a ? 1 : *b ? -1 : 0;
    }
    

    Notes:

    • Windows needs stricmp() rather than the Unix equivalent strcasecmp().
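    • The code above gives incorrect results for numbers that do not fit in a long, because strtol() clamps them to LONG_MAX.
    • Leading zeros are ignored when comparing the numeric values (UAL0123 and UAL123 both contain 123); the two marked lines only use them as a tie-breaker, so UAL0123 sorts just before UAL123, and commenting those lines out makes the two compare equal. In my area, ignoring leading zeros is a feature, not a bug: we usually want UAL0123 to match UAL123. But this may or may not be what you require.
    • See also Sort on a string that may contain a number and How to implement a natural sort algorithm in c++?, although the answers there, or in their links, are long and rambling compared with the code above, by a factor of at least four.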