Radix Sort on an Array of Strings?

闹比i 2021-01-03 14:13

I've been researching around, and while I've figured out the general idea of using Radix Sort to alphabetize an array of strings, I know I'm going in the wrong direction.

2 Answers
  •  囚心锁ツ
    2021-01-03 14:43

    The slides you've found are great! But where did those queues come from in your code?

    Anyway, here you are (live example):

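    #include <algorithm>
    #include <array>
    #include <cstddef>
    #include <numeric>
    #include <vector>

    // Bin of element `elem` at position `digit`: 0 if the element is too
    // short to have that position, otherwise the character code plus one.
    template <typename E>
    size_t bin(const E& elem, size_t digit)
    {
        return elem.size() > digit ? size_t(elem[digit]) + 1 : 0;
    }
    
    // One stable counting pass on position `digit`; reorders `pos` in place.
    template <size_t R, typename C, typename P>
    void radix_sort(P& pos, const C& data, size_t digit)
    {
        using A = std::array<size_t, R + 1>;
        A count = {};
        P prev(pos);
    
        for (auto i : prev)
            ++count[bin(data[i], digit)];
    
        // offset[b] is the number of elements falling into bins below b
        A done = {}, offset = {{0}};
        std::partial_sum(count.begin(), count.end() - 1, offset.begin() + 1);
    
        for (auto i : prev)
        {
            size_t b = bin(data[i], digit);
            pos[offset[b] + done[b]++] = i;
        }
    }
    
    struct shorter
    {
        template <typename A>
        bool operator()(const A& a, const A& b) const { return a.size() < b.size(); }
    };
    
    template <size_t R, typename C>
    std::vector<size_t> radix_sort(const C& data)
    {
        std::vector<size_t> pos(data.size());
        std::iota(pos.begin(), pos.end(), 0);
    
        // max_element() returns end() on empty input, so stop before dereferencing it
        if (data.empty())
            return pos;
    
        size_t width = std::max_element(data.begin(), data.end(), shorter())->size();
    
        // Least significant position first: process strings from right to left
        for (long digit = long(width) - 1; digit >= 0; --digit)
            radix_sort<R>(pos, data, size_t(digit));
    
        return pos;
    }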
    

    which you can use like this:

    #include <iostream>
    #include <string>

    int main()
    {
        std::vector<std::string> data = generate();
        std::vector<size_t> pos = radix_sort<128>(data);
        for (auto i : pos)
            std::cout << data[i] << std::endl;
    }
    

    where generate() is included in the live example and generates the strings given in your question.
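
    Since the live example is not reproduced here, the following is a minimal sketch of what the missing pieces could look like. The body of generate() below is a hypothetical stand-in (the actual strings from the question are not shown), and sorted_copy() is an assumed helper name, not part of the original code; it only illustrates how the returned positions can be turned into a sorted copy if one is wanted.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for generate(): the real one lives in the live
// example and returns the strings from the question, which are not shown here.
std::vector<std::string> generate()
{
    return {"pear", "fig", "apple", "figs", "kiwi"};
}

// Assumed helper (not in the original answer): builds a sorted copy by
// reading the input in the order given by pos, where pos[k] is the index
// in data of the k-th element in sorted order.
template <typename C>
std::vector<typename C::value_type>
sorted_copy(const C& data, const std::vector<std::size_t>& pos)
{
    std::vector<typename C::value_type> out;
    out.reserve(pos.size());
    for (std::size_t i : pos)
        out.push_back(data[i]);
    return out;
}
```

    With these in place, sorted_copy(data, radix_sort<128>(data)) would give the strings in sorted order while leaving data untouched.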

    I am not trying to explain how this works here; I assume you can figure it out, since you are working on the problem. But a few comments are in order.

    • We are neither sorting the input sequence in-place, nor returning a sorted copy; we are just returning a sequence of positions: element k of the result is the index in the input of the k-th string in sorted order.

    • We are processing strings from right to left.

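    • The complexity is O(lw), where l is the input length (number of input strings) and w is the maximum input width (max. length of all input strings): there are w passes, and each pass does O(l + R) work, with R treated as a constant. Every pass touches every string, however short, so this algorithm makes sense if the string width does not vary too much.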

    • The first template parameter R of radix_sort() is the number of possible values for each digit (letter) in the input. E.g. R = 128 means that possible values are 0..127. This should be fine for your input. I haven't tried to do anything clever with respect to ASCII codes, but you can customize function bin() for that.

    • In the output of bin(), value 0 is reserved to mean "we are past the end of this string". Such strings are placed before others that are still continuing.

    • I have tried to give self-explanatory names to variables and functions, and use standard library calls for common tasks where possible.

    • The code is generic, e.g. it can sort any random access container containing random access containers, not just vectors of strings.

    • I am using C++11 features here and there for convenience, but none of them is essential: one could easily do the same with just C++03.
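
    To make the comments about R and bin() concrete, here is a compact, self-contained sketch of the same idea with a customized bin function. The names letter_bin and radix_positions are hypothetical, not from the code above, and the sketch assumes the input contains letters only: it folds case and maps 'a'..'z' to bins 1..26, so R = 26 is enough, while bin 0 still means "past the end of this string".

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical replacement for bin(): ignores case and maps 'a'..'z' to
// 1..26. Bin 0 is reserved for strings that have no character at `digit`.
// Assumption: every character is an ASCII letter; anything else would
// index out of range.
inline std::size_t letter_bin(const std::string& s, std::size_t digit)
{
    if (digit >= s.size())
        return 0;
    int lower = std::tolower(static_cast<unsigned char>(s[digit]));
    return std::size_t(lower - 'a') + 1;
}

// Compact LSD radix sort over bins 0..R; returns positions into `data`
// in sorted order rather than moving the strings themselves.
template <std::size_t R, typename Bin>
std::vector<std::size_t> radix_positions(const std::vector<std::string>& data, Bin bin)
{
    std::vector<std::size_t> pos(data.size());
    std::iota(pos.begin(), pos.end(), std::size_t(0));

    std::size_t width = 0;
    for (const auto& s : data)
        width = std::max(width, s.size());

    // Rightmost position first, so earlier characters dominate at the end.
    for (std::size_t digit = width; digit-- > 0; )
    {
        // Count into slot b + 1, so that after the prefix sum
        // offset[b] is the number of elements in bins below b.
        std::vector<std::size_t> offset(R + 2, 0);
        for (std::size_t i : pos)
            ++offset[bin(data[i], digit) + 1];
        std::partial_sum(offset.begin(), offset.end(), offset.begin());

        // Stable scatter: elements in the same bin keep their relative order.
        std::vector<std::size_t> next(pos.size());
        for (std::size_t i : pos)
            next[offset[bin(data[i], digit)]++] = i;
        pos.swap(next);
    }
    return pos;
}
```

    For example, radix_positions<26>(data, letter_bin) on the input {"b", "Ab", "a", "B", "abc"} yields the positions 2, 1, 4, 0, 3, i.e. "a", "Ab", "abc", "b", "B": shorter strings come before their extensions because of bin 0, and "b" stays ahead of "B" because each pass is stable.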
