How to sort file names with numbers and alphabets in order in C?

后端未结

关注

 6  1534

I have used the following code to sort files in alphabetical order and it sorts the files as shown in the figure:

for(int i = 0;i < maxcnt;i++) 
{
    fo


                      
              相关标签:


      
      
        
          6条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  清歌不尽        
                
              
                            
                2020-12-09 12:06
              
            
            
                                                                       
Taking into account that this has a c++ tag, you could elaborate on @Joseph Quinsey's answer and create a natural_less function to be passed to the standard library.

using namespace std;

bool natural_less(const string& lhs, const string& rhs)
{
    return strcasecmp_withNumbers(lhs.c_str(), rhs.c_str()) < 0;
}

void example(vector<string>& data)
{
    std::sort(data.begin(), data.end(), natural_less);
}


I took the time to write some working code as an exercise
https://github.com/kennethlaskoski/natural_less
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  终归单人心        
                
              
                            
                2020-12-09 12:06
              
            
            
                                                                       
Modifying this answer:

bool compareNat(const std::string& a, const std::string& b){
    if (a.empty())
        return true;
    if (b.empty())
        return false;
    if (std::isdigit(a[0]) && !std::isdigit(b[0]))
        return true;
    if (!std::isdigit(a[0]) && std::isdigit(b[0]))
        return false;
    if (!std::isdigit(a[0]) && !std::isdigit(b[0]))
    {
        if (a[0] == b[0])
            return compareNat(a.substr(1), b.substr(1));
        return (toUpper(a) < toUpper(b));
        //toUpper() is a function to convert a std::string to uppercase.
    }

    // Both strings begin with digit --> parse both numbers
    std::istringstream issa(a);
    std::istringstream issb(b);
    int ia, ib;
    issa >> ia;
    issb >> ib;
    if (ia != ib)
        return ia < ib;

    // Numbers are the same --> remove numbers and recurse
    std::string anew, bnew;
    std::getline(issa, anew);
    std::getline(issb, bnew);
    return (compareNat(anew, bnew));
}


toUpper() function:

std::string toUpper(std::string s){
    for(int i=0;i<(int)s.length();i++){s[i]=toupper(s[i]);}
    return s;
    }


Usage:

#include <iostream> // std::cout
#include <string>
#include <algorithm> // std::sort, std::copy
#include <iterator> // std::ostream_iterator
#include <sstream> // std::istringstream
#include <vector>
#include <cctype> // std::isdigit

int main()
{
    std::vector<std::string> str;
    str.push_back("20.txt");
    str.push_back("10.txt");
    str.push_back("1.txt");
    str.push_back("z2.txt");
    str.push_back("z10.txt");
    str.push_back("z100.txt");
    str.push_back("1_t.txt");
    str.push_back("abc.txt");
    str.push_back("Abc.txt");
    str.push_back("bcd.txt");

    std::sort(str.begin(), str.end(), compareNat);
    std::copy(str.begin(), str.end(),
              std::ostream_iterator<std::string>(std::cout, "\n"));
}

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  南笙        
                
              
                            
                2020-12-09 12:14
              
            
            
                                                                       
Your problem is that you have an interpretation behind parts of the file name.

In lexicographical order, Slide1 is before Slide10 which is before Slide5. 

You expect Slide5 before Slide10 as you have an interpretation of the substrings 5 and 10 (as integers). 

You will run into more problems, if you had the
name of the month in the filename, and would expect them to be ordered by date (i.e. January comes before August). You will need to adjust your sorting to this interpretation (and the "natural" order will depend on your interpretation, there is no generic solution).

Another approach is to format the filenames in a way that your sorting and the lexicographical order agree. In your case, you would use leading zeroes and a fixed length for the number. So Slide1 becomes Slide01, and then you will see that sorting them lexicographically will yield the result you would like to have.

However, often you cannot influence the output of an application, and thus cannot enforce your format directly. 

What I do in those cases: write a little script/function that renames the file to a proper format, and then use standard sorting algorithms to sort them. The advantage of this is that you do not need to adapt your sorting, and can use existing software for the sorting.
On the downside, there are situations where this is not feasible (as filenames need to be fixed).
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  终归单人心        
                
              
                            
                2020-12-09 12:19
              
            
            
                                                                       
Natural sorting is the way that you must take here . I have a working code for my scenario. You probably can make use of it by altering it according to your needs :

    #ifndef JSW_NATURAL_COMPARE
    #define JSW_NATURAL_COMPARE
    #include <string>
    int natural_compare(const char *a, const char *b);
    int natural_compare(const std::string& a, const std::string& b);
    #endif
    #include <cctype>
    namespace {
      // Note: This is a convenience for the natural_compare 
      // function, it is *not* designed for general use
      class int_span {
        int _ws;
        int _zeros;
        const char *_value;
        const char *_end;
      public:
        int_span(const char *src)
        {
          const char *start = src;
          // Save and skip leading whitespace
          while (std::isspace(*(unsigned char*)src)) ++src;
          _ws = src - start;
          // Save and skip leading zeros
          start = src;
          while (*src == '0') ++src;
          _zeros = src - start;
          // Save the edges of the value
          _value = src;
          while (std::isdigit(*(unsigned char*)src)) ++src;
          _end = src;
        }
        bool is_int() const { return _value != _end; }
        const char *value() const { return _value; }
        int whitespace() const { return _ws; }
        int zeros() const { return _zeros; }
        int digits() const { return _end - _value; }
        int non_value() const { return whitespace() + zeros(); }
      };
      inline int safe_compare(int a, int b)
      {
        return a < b ? -1 : a > b;
      }
    }
    int natural_compare(const char *a, const char *b)
    {
      int cmp = 0;
      while (cmp == 0 && *a != '\0' && *b != '\0') {
        int_span lhs(a), rhs(b);
        if (lhs.is_int() && rhs.is_int()) {
          if (lhs.digits() != rhs.digits()) {
            // For differing widths (excluding leading characters),
            // the value with fewer digits takes priority
            cmp = safe_compare(lhs.digits(), rhs.digits());
          }
          else {
            int digits = lhs.digits();
            a = lhs.value();
            b = rhs.value();
            // For matching widths (excluding leading characters),
            // search from MSD to LSD for the larger value
            while (--digits >= 0 && cmp == 0)
              cmp = safe_compare(*a++, *b++);
          }
          if (cmp == 0) {
            // If the values are equal, we need a tie   
            // breaker using leading whitespace and zeros
            if (lhs.non_value() != rhs.non_value()) {
              // For differing widths of combined whitespace and 
              // leading zeros, the smaller width takes priority
              cmp = safe_compare(lhs.non_value(), rhs.non_value());
            }
            else {
              // For matching widths of combined whitespace 
              // and leading zeros, more whitespace takes priority
              cmp = safe_compare(rhs.whitespace(), lhs.whitespace());
            }
          }
        }
        else {
          // No special logic unless both spans are integers
          cmp = safe_compare(*a++, *b++);
        }
      }
      // All else being equal so far, the shorter string takes priority
      return cmp == 0 ? safe_compare(*a, *b) : cmp;
    }
    #include <string>
    int natural_compare(const std::string& a, const std::string& b)
    {
      return natural_compare(a.c_str(), b.c_str());
    }

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  礼貌的吻别        
                
              
                            
                2020-12-09 12:22
              
            
            
                                                                       
What you want to do is perform "Natural Sort". Here is a blog post about it, explaining implementation in python I believe. Here is a perl module that accomplishes it. There also seems to be a similar question at How to implement a natural sort algorithm in c++?
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  走了就别回头了        
                
              
                            
                2020-12-09 12:27
              
            
            
                                                                       
For a C answer, the following is a replacement for strcasecmp().  This function recurses to handle strings that contain alternating numeric and non-numeric substrings. You can use it with qsort():

int strcasecmp_withNumbers(const void *void_a, const void *void_b) {
   const char *a = void_a;
   const char *b = void_b;

   if (!a || !b) { // if one doesn't exist, other wins by default
      return a ? 1 : b ? -1 : 0;
   }
   if (isdigit(*a) && isdigit(*b)) { // if both start with numbers
      char *remainderA;
      char *remainderB;
      long valA = strtol(a, &remainderA, 10);
      long valB = strtol(b, &remainderB, 10);
      if (valA != valB)
         return valA - valB;
      // if you wish 7 == 007, comment out the next two lines
      else if (remainderB - b != remainderA - a) // equal with diff lengths
         return (remainderB - b) - (remainderA - a); // set 007 before 7
      else // if numerical parts equal, recurse
         return strcasecmp_withNumbers(remainderA, remainderB);
   }
   if (isdigit(*a) || isdigit(*b)) { // if just one is a number
      return isdigit(*a) ? -1 : 1; // numbers always come first
   }
   while (*a && *b) { // non-numeric characters
      if (isdigit(*a) || isdigit(*b))
         return strcasecmp_withNumbers(a, b); // recurse
      if (tolower(*a) != tolower(*b))
         return tolower(*a) - tolower(*b);
      a++;
      b++;
   }
   return *a ? 1 : *b ? -1 : 0;
}


Notes:


Windows needs stricmp() rather than the Unix equivalent strcasecmp(). 
The above code will (obviously) give incorrect results if the numbers are really big.
Leading zeros are ignored here. In my area, this is a feature, not a bug: we usually want UAL0123 to match UAL123. But this may or may not be what you require.
See also Sort on a string that may contain a number and How to implement a natural sort algorithm in c++?, although the answers there, or in their links, are certainly long and rambling compared with the above code, by about a factor of at least four. 

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复