I have used the following code to sort files in alphabetical order and it sorts the files as shown in the figure:
for(int i = 0;i < maxcnt;i++)
{
fo
Taking into account that this has a c++
tag, you could elaborate on @Joseph Quinsey's answer and create a natural_less
function to be passed to the standard library.
using namespace std;
bool natural_less(const string& lhs, const string& rhs)
{
return strcasecmp_withNumbers(lhs.c_str(), rhs.c_str()) < 0;
}
void example(vector<string>& data)
{
std::sort(data.begin(), data.end(), natural_less);
}
I took the time to write some working code as an exercise https://github.com/kennethlaskoski/natural_less
Modifying this answer:
bool compareNat(const std::string& a, const std::string& b){
if (a.empty())
return true;
if (b.empty())
return false;
if (std::isdigit(a[0]) && !std::isdigit(b[0]))
return true;
if (!std::isdigit(a[0]) && std::isdigit(b[0]))
return false;
if (!std::isdigit(a[0]) && !std::isdigit(b[0]))
{
if (a[0] == b[0])
return compareNat(a.substr(1), b.substr(1));
return (toUpper(a) < toUpper(b));
//toUpper() is a function to convert a std::string to uppercase.
}
// Both strings begin with digit --> parse both numbers
std::istringstream issa(a);
std::istringstream issb(b);
int ia, ib;
issa >> ia;
issb >> ib;
if (ia != ib)
return ia < ib;
// Numbers are the same --> remove numbers and recurse
std::string anew, bnew;
std::getline(issa, anew);
std::getline(issb, bnew);
return (compareNat(anew, bnew));
}
toUpper()
function:
std::string toUpper(std::string s){
for(int i=0;i<(int)s.length();i++){s[i]=toupper(s[i]);}
return s;
}
Usage:
#include <iostream> // std::cout
#include <string>
#include <algorithm> // std::sort, std::copy
#include <iterator> // std::ostream_iterator
#include <sstream> // std::istringstream
#include <vector>
#include <cctype> // std::isdigit
int main()
{
std::vector<std::string> str;
str.push_back("20.txt");
str.push_back("10.txt");
str.push_back("1.txt");
str.push_back("z2.txt");
str.push_back("z10.txt");
str.push_back("z100.txt");
str.push_back("1_t.txt");
str.push_back("abc.txt");
str.push_back("Abc.txt");
str.push_back("bcd.txt");
std::sort(str.begin(), str.end(), compareNat);
std::copy(str.begin(), str.end(),
std::ostream_iterator<std::string>(std::cout, "\n"));
}
Your problem is that you have an interpretation behind parts of the file name.
In lexicographical order, Slide1
is before Slide10
which is before Slide5
.
You expect Slide5
before Slide10
as you have an interpretation of the substrings 5
and 10
(as integers).
You will run into more problems, if you had the name of the month in the filename, and would expect them to be ordered by date (i.e. January comes before August). You will need to adjust your sorting to this interpretation (and the "natural" order will depend on your interpretation, there is no generic solution).
Another approach is to format the filenames in a way that your sorting and the lexicographical order agree. In your case, you would use leading zeroes and a fixed length for the number. So Slide1
becomes Slide01
, and then you will see that sorting them lexicographically will yield the result you would like to have.
However, often you cannot influence the output of an application, and thus cannot enforce your format directly.
What I do in those cases: write a little script/function that renames the file to a proper format, and then use standard sorting algorithms to sort them. The advantage of this is that you do not need to adapt your sorting, and can use existing software for the sorting. On the downside, there are situations where this is not feasible (as filenames need to be fixed).
Natural sorting is the way that you must take here . I have a working code for my scenario. You probably can make use of it by altering it according to your needs :
#ifndef JSW_NATURAL_COMPARE
#define JSW_NATURAL_COMPARE
#include <string>
int natural_compare(const char *a, const char *b);
int natural_compare(const std::string& a, const std::string& b);
#endif
#include <cctype>
namespace {
// Note: This is a convenience for the natural_compare
// function, it is *not* designed for general use
class int_span {
int _ws;
int _zeros;
const char *_value;
const char *_end;
public:
int_span(const char *src)
{
const char *start = src;
// Save and skip leading whitespace
while (std::isspace(*(unsigned char*)src)) ++src;
_ws = src - start;
// Save and skip leading zeros
start = src;
while (*src == '0') ++src;
_zeros = src - start;
// Save the edges of the value
_value = src;
while (std::isdigit(*(unsigned char*)src)) ++src;
_end = src;
}
bool is_int() const { return _value != _end; }
const char *value() const { return _value; }
int whitespace() const { return _ws; }
int zeros() const { return _zeros; }
int digits() const { return _end - _value; }
int non_value() const { return whitespace() + zeros(); }
};
inline int safe_compare(int a, int b)
{
return a < b ? -1 : a > b;
}
}
int natural_compare(const char *a, const char *b)
{
int cmp = 0;
while (cmp == 0 && *a != '\0' && *b != '\0') {
int_span lhs(a), rhs(b);
if (lhs.is_int() && rhs.is_int()) {
if (lhs.digits() != rhs.digits()) {
// For differing widths (excluding leading characters),
// the value with fewer digits takes priority
cmp = safe_compare(lhs.digits(), rhs.digits());
}
else {
int digits = lhs.digits();
a = lhs.value();
b = rhs.value();
// For matching widths (excluding leading characters),
// search from MSD to LSD for the larger value
while (--digits >= 0 && cmp == 0)
cmp = safe_compare(*a++, *b++);
}
if (cmp == 0) {
// If the values are equal, we need a tie
// breaker using leading whitespace and zeros
if (lhs.non_value() != rhs.non_value()) {
// For differing widths of combined whitespace and
// leading zeros, the smaller width takes priority
cmp = safe_compare(lhs.non_value(), rhs.non_value());
}
else {
// For matching widths of combined whitespace
// and leading zeros, more whitespace takes priority
cmp = safe_compare(rhs.whitespace(), lhs.whitespace());
}
}
}
else {
// No special logic unless both spans are integers
cmp = safe_compare(*a++, *b++);
}
}
// All else being equal so far, the shorter string takes priority
return cmp == 0 ? safe_compare(*a, *b) : cmp;
}
#include <string>
int natural_compare(const std::string& a, const std::string& b)
{
return natural_compare(a.c_str(), b.c_str());
}
What you want to do is perform "Natural Sort". Here is a blog post about it, explaining implementation in python I believe. Here is a perl module that accomplishes it. There also seems to be a similar question at How to implement a natural sort algorithm in c++?
For a C
answer, the following is a replacement for strcasecmp()
. This function recurses to handle strings that contain alternating numeric and non-numeric substrings. You can use it with qsort()
:
int strcasecmp_withNumbers(const void *void_a, const void *void_b) {
const char *a = void_a;
const char *b = void_b;
if (!a || !b) { // if one doesn't exist, other wins by default
return a ? 1 : b ? -1 : 0;
}
if (isdigit(*a) && isdigit(*b)) { // if both start with numbers
char *remainderA;
char *remainderB;
long valA = strtol(a, &remainderA, 10);
long valB = strtol(b, &remainderB, 10);
if (valA != valB)
return valA - valB;
// if you wish 7 == 007, comment out the next two lines
else if (remainderB - b != remainderA - a) // equal with diff lengths
return (remainderB - b) - (remainderA - a); // set 007 before 7
else // if numerical parts equal, recurse
return strcasecmp_withNumbers(remainderA, remainderB);
}
if (isdigit(*a) || isdigit(*b)) { // if just one is a number
return isdigit(*a) ? -1 : 1; // numbers always come first
}
while (*a && *b) { // non-numeric characters
if (isdigit(*a) || isdigit(*b))
return strcasecmp_withNumbers(a, b); // recurse
if (tolower(*a) != tolower(*b))
return tolower(*a) - tolower(*b);
a++;
b++;
}
return *a ? 1 : *b ? -1 : 0;
}
Notes:
stricmp()
rather than the Unix equivalent strcasecmp()
.