Fastest way to sort a list of numbers and their indexes

萌比男神i 2021-02-14 21:15

I have a question that could seem very basic, but it is in a context where "every CPU tick counts" (this is part of a larger algorithm that will be used on supercomputers).

8 answers
  • 2021-02-14 21:32
    #include <algorithm>
    #include <cstddef>
    #include <vector>

    struct SomeValue
    {
        unsigned long long val;
        std::size_t index;
        // Order by value only; the index simply travels with it.
        bool operator<(const SomeValue& rhs) const
        {
            return val < rhs.val;
        }
    };

    std::vector<SomeValue> somevec;
    // fill it...
    std::sort(somevec.begin(), somevec.end());
    
  • 2021-02-14 21:33

    It might be worth separating numbers and indexes and then just sorting indexes, like this:

    #include <vector>
    #include <algorithm>
    #include <cstdlib>   // EXIT_SUCCESS
    #include <iostream>
    
    void PrintElements(const std::vector<unsigned long long>& numbers, const std::vector<size_t>& indexes) {
    
        std::cout << "\tNumbers:";
        for (auto i = indexes.begin(); i != indexes.end(); ++i)
            std::cout << '\t' << numbers[*i];
        std::cout << std::endl;
    
        std::cout << "\tIndexes:";
        for (auto i = indexes.begin(); i != indexes.end(); ++i)
            std::cout << '\t' << *i;
        std::cout << std::endl;
    
    }
    
    int main() {
    
        std::vector<unsigned long long> numbers;
        std::vector<size_t> indexes;
    
        numbers.reserve(4); // An overkill for this few elements, but important for billions.
        numbers.push_back(32);
        numbers.push_back(91);
        numbers.push_back(11);
        numbers.push_back(72);
    
        indexes.reserve(numbers.capacity());
        indexes.push_back(0);
        indexes.push_back(1);
        indexes.push_back(2);
        indexes.push_back(3);
    
        std::cout << "BEFORE:" << std::endl;
        PrintElements(numbers, indexes);
    
        std::sort(
            indexes.begin(),
            indexes.end(),
            [&numbers](size_t i1, size_t i2) {
                return numbers[i1] < numbers[i2];
            }
        );
    
        std::cout << "AFTER:" << std::endl;
        PrintElements(numbers, indexes);
    
        return EXIT_SUCCESS;
    
    }
    

    This prints:

    BEFORE:
            Numbers:        32      91      11      72
            Indexes:        0       1       2       3
    AFTER:
            Numbers:        11      32      72      91
            Indexes:        2       0       3       1
    

    The idea is that the elements being sorted are small and thus fast to move around during the sort. On modern CPUs however, the effects of indirect access to numbers on caching could spoil these gains, so I recommend benchmarking on realistic amounts of data before making a final decision to use it.
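
    For inputs too large to fill by hand, the same index-sorting idea can be sketched with std::iota building the initial index sequence. The helper name sorted_indexes is hypothetical and not part of the answer above; this is a minimal sketch, assuming the numbers already sit in a std::vector.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not from the answer above): returns the order in
// which the elements of `numbers` must be visited to see them ascending.
std::vector<std::size_t> sorted_indexes(const std::vector<unsigned long long>& numbers) {
    std::vector<std::size_t> indexes(numbers.size());
    // Fill with 0, 1, 2, ... so every element starts at its own position.
    std::iota(indexes.begin(), indexes.end(), std::size_t{0});
    // Sort the indexes only; `numbers` itself is never moved.
    std::sort(indexes.begin(), indexes.end(),
              [&numbers](std::size_t i1, std::size_t i2) {
                  return numbers[i1] < numbers[i2];
              });
    return indexes;
}
```

    Called on the four numbers used above (32, 91, 11, 72), this should yield the same index order as the printed output: 2, 0, 3, 1.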

  • 2021-02-14 21:36

    The obvious starting point would be a structure, with a comparison functor defined for it:

    struct data { 
        unsigned long long int number;
        size_t index;
    };
    
    struct by_number { 
        bool operator()(data const &left, data const &right) const { 
            return left.number < right.number;
        }
    };
    

    ...and an std::vector to hold the data:

     std::vector<data> items;
    

    and std::sort to do the sorting:

     std::sort(items.begin(), items.end(), by_number());
    

    The simple fact is that the normal containers (and such) are sufficiently efficient that using them doesn't make your code substantially less efficient. You might be able to do better by writing some part in a different way, but you might just as easily do worse. Start from solid and readable code, and test; don't (attempt to) optimize prematurely.

    Edit: of course in C++11, you can use a lambda expression instead:

    std::sort(items.begin(), items.end(), 
              [](data const &a, data const &b) { return a.number < b.number; });
    

    This is generally a little more convenient to write. Readability depends: for something simple like this, I'd say sort ... by_number is pretty readable, but that depends (heavily) on the name you give to the comparison functor. The lambda makes the actual sorting criterion easier to find, so you don't need to choose a name carefully for the code to be readable.
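
    The advice earlier in this answer to test before optimizing can be sketched as a small timing harness. Everything below other than the data struct and the lambda is an assumption made for illustration: the helper names make_items and time_sort_seconds, the fixed seed, and the choice of std::chrono::steady_clock are not from the answer.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <random>
#include <vector>

struct data {
    unsigned long long number;
    std::size_t index;
};

// Hypothetical helper: fills a vector with `count` pseudo-random numbers
// (fixed seed, so runs are repeatable) and records each original position.
std::vector<data> make_items(std::size_t count) {
    std::mt19937_64 rng(12345);
    std::vector<data> items(count);
    for (std::size_t i = 0; i != count; ++i) {
        items[i].number = rng();
        items[i].index = i;
    }
    return items;
}

// Hypothetical helper: returns the wall-clock seconds std::sort needs
// to order `items` by number.
double time_sort_seconds(std::vector<data>& items) {
    auto start = std::chrono::steady_clock::now();
    std::sort(items.begin(), items.end(),
              [](data const &a, data const &b) { return a.number < b.number; });
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

    A single run on a small input says little; timing several runs on data of realistic size and distribution, with optimizations enabled, gives a fairer comparison between candidate layouts.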

  • 2021-02-14 21:39

    std::pair and std::sort fit your requirements ideally: if you put the value into the pair.first and the index in pair.second, you can simply call a sort on a vector of pairs, like this:

    // Requires <algorithm>, <iostream>, <utility> and <vector>, plus `using namespace std;`.
    // This is your original data. It does not need to be in a vector
    vector<long> orig;
    orig.push_back(10);
    orig.push_back(3);
    orig.push_back(6);
    orig.push_back(11);
    orig.push_back(2);
    orig.push_back(19);
    orig.push_back(7);
    // This is a vector of {value,index} pairs
    vector<pair<long,size_t> > vp;
    vp.reserve(orig.size());
    for (size_t i = 0 ; i != orig.size() ; i++) {
        vp.push_back(make_pair(orig[i], i));
    }
    // Sorting will put lower values ahead of larger ones,
    // resolving ties using the original index
    sort(vp.begin(), vp.end());
    for (size_t i = 0 ; i != vp.size() ; i++) {
        cout << vp[i].first << " " << vp[i].second << endl;
    }
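
    The tie-breaking remark in the comments above can be made concrete with duplicate values. The following is a minimal sketch; the helper name sort_with_index is hypothetical, and it relies only on the fact that std::pair compares first, then second.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: pairs each value with its original position,
// then sorts the pairs.
std::vector<std::pair<long, std::size_t>> sort_with_index(const std::vector<long>& orig) {
    std::vector<std::pair<long, std::size_t>> vp;
    vp.reserve(orig.size());
    for (std::size_t i = 0; i != orig.size(); ++i) {
        vp.emplace_back(orig[i], i);
    }
    // std::pair's operator< compares .first and falls back to .second
    // when the values are equal, so ties keep ascending index order.
    std::sort(vp.begin(), vp.end());
    return vp;
}
```

    For the input 5, 3, 5, 3 this gives the pairs (3,1), (3,3), (5,0), (5,2): equal values stay in order of their original index.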
    
  • 2021-02-14 21:39

    Use std::vector and std::sort. That should provide the fastest sort method. To find the original index, create a struct.

    struct A {
        int num;
        int index;
    };
    

    Then write your own comparison predicate for sort that compares the num member of the struct.

    struct Predicate {
        // Take the arguments by const reference to avoid copying them.
        bool operator()(const A& first, const A& second) const {
            return first.num < second.num;
        }
    };

    std::sort(vec.begin(), vec.end(), Predicate());

  • 2021-02-14 21:42

    You might find this to be an interesting read. I would start with the STL's sort and only then try to improve on it if I could. I'm not sure if you have access to a C++11 compiler (like GCC 4.7) on this supercomputer, but I would suggest that std::sort with std::future and std::thread would get you quite a bit of the way there with regard to parallelizing the problem in a maintainable way.

    Here is another question that compares std::sort with qsort.

    Finally, there is this article in Dr. Dobb's that compares the performance of parallel algorithms.
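
    The suggestion above to combine std::sort with std::future can be sketched as a two-way split: one half is sorted in a task started with std::async, the other on the calling thread, and std::inplace_merge joins them. This is only a sketch under stated assumptions: the Item struct and the function name two_way_parallel_sort are hypothetical, a fixed split in two is chosen for brevity, and on some toolchains the program must be linked with thread support (for example -pthread).

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical record type, mirroring the value/index structs in the
// other answers.
struct Item {
    unsigned long long number;
    std::size_t index;
};

// Sorts `items` by number using one extra task: each half is sorted
// independently, then the two sorted halves are merged in place.
void two_way_parallel_sort(std::vector<Item>& items) {
    auto by_number = [](const Item& a, const Item& b) { return a.number < b.number; };
    auto first = items.begin();
    auto last = items.end();
    auto mid = first + static_cast<std::ptrdiff_t>(items.size() / 2);

    // Sort the first half in a separate task...
    std::future<void> first_half = std::async(std::launch::async, [first, mid, by_number] {
        std::sort(first, mid, by_number);
    });
    // ...while this thread sorts the second half.
    std::sort(mid, last, by_number);
    first_half.get(); // wait for the other half before merging

    std::inplace_merge(first, mid, last, by_number);
}
```

    Whether this beats a single std::sort depends on the input size and the machine, so it is worth measuring on realistic data before adopting it.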
