sort array of integers lexicographically C++

前端 未结 12 1493
野的像风
野的像风 2021-02-02 13:53

I want to sort a large array of integers (say 1 millon elements) lexicographically.

Example:

input [] = { 100, 21 , 22 , 99 , 1  , 927 }
sorted[] = { 1           


        
12条回答
  •  不思量自难忘°
    2021-02-02 14:34

    Here's a community wiki to compare the solutions. I took nim's code and made it easily extensible. Feel free to add your solutions and outputs.

    Sample runs an old slow computer (3 GB RAM, Core2Duo U9400) with g++4.9 @ -O3 -march=native:

    number of elements: 1e+03
    size of integer type: 4
    
    reference solution: Lightness Races in Orbit
    
    solution "dyp":
        duration: 0 ms and 301 microseconds
        comparison to reference solution: exact match
    solution "Nim":
        duration: 2 ms and 160 microseconds
        comparison to reference solution: exact match
    solution "nyarlathotep":
        duration: 8 ms and 126 microseconds
        comparison to reference solution: exact match
    solution "notbad":
        duration: 1 ms and 102 microseconds
        comparison to reference solution: exact match
    solution "Eric Postpischil":
        duration: 2 ms and 550 microseconds
        comparison to reference solution: exact match
    solution "Lightness Races in Orbit":
        duration: 17 ms and 469 microseconds
        comparison to reference solution: exact match
    solution "pts":
        duration: 1 ms and 92 microseconds
        comparison to reference solution: exact match
    
    ==========================================================
    
    number of elements: 1e+04
    size of integer type: 4
    
    reference solution: Lightness Races in Orbit
    
    solution "nyarlathotep":
        duration: 109 ms and 712 microseconds
        comparison to reference solution: exact match
    solution "Lightness Races in Orbit":
        duration: 272 ms and 819 microseconds
        comparison to reference solution: exact match
    solution "dyp":
        duration: 1 ms and 748 microseconds
        comparison to reference solution: exact match
    solution "notbad":
        duration: 16 ms and 115 microseconds
        comparison to reference solution: exact match
    solution "pts":
        duration: 15 ms and 10 microseconds
        comparison to reference solution: exact match
    solution "Eric Postpischil":
        duration: 33 ms and 301 microseconds
        comparison to reference solution: exact match
    solution "Nim":
        duration: 17 ms and 83 microseconds
        comparison to reference solution: exact match
    
    ==========================================================
    
    number of elements: 1e+05
    size of integer type: 4
    
    reference solution: Lightness Races in Orbit
    
    solution "Nim":
        duration: 217 ms and 4 microseconds
        comparison to reference solution: exact match
    solution "pts":
        duration: 199 ms and 505 microseconds
        comparison to reference solution: exact match
    solution "dyp":
        duration: 20 ms and 330 microseconds
        comparison to reference solution: exact match
    solution "Eric Postpischil":
        duration: 415 ms and 477 microseconds
        comparison to reference solution: exact match
    solution "Lightness Races in Orbit":
        duration: 3955 ms and 58 microseconds
        comparison to reference solution: exact match
    solution "notbad":
        duration: 215 ms and 259 microseconds
        comparison to reference solution: exact match
    solution "nyarlathotep":
        duration: 1341 ms and 46 microseconds
        comparison to reference solution: mismatch found
    
    ==========================================================
    
    number of elements: 1e+06
    size of integer type: 4
    
    reference solution: Lightness Races in Orbit
    
    solution "Lightness Races in Orbit":
        duration: 52861 ms and 314 microseconds
        comparison to reference solution: exact match
    solution "Eric Postpischil":
        duration: 4757 ms and 608 microseconds
        comparison to reference solution: exact match
    solution "nyarlathotep":
        duration: 15654 ms and 195 microseconds
        comparison to reference solution: mismatch found
    solution "dyp":
        duration: 233 ms and 779 microseconds
        comparison to reference solution: exact match
    solution "pts":
        duration: 2181 ms and 634 microseconds
        comparison to reference solution: exact match
    solution "Nim":
        duration: 2539 ms and 9 microseconds
        comparison to reference solution: exact match
    solution "notbad":
        duration: 2675 ms and 362 microseconds
        comparison to reference solution: exact match
    
    ==========================================================
    
    number of elements: 1e+07
    size of integer type: 4
    
    reference solution: Lightness Races in Orbit
    
    solution "notbad":
        duration: 33425 ms and 423 microseconds
        comparison to reference solution: exact match
    solution "pts":
        duration: 26000 ms and 398 microseconds
        comparison to reference solution: exact match
    solution "Eric Postpischil":
        duration: 56206 ms and 359 microseconds
        comparison to reference solution: exact match
    solution "Lightness Races in Orbit":
        duration: 658540 ms and 342 microseconds
        comparison to reference solution: exact match
    solution "nyarlathotep":
        duration: 187064 ms and 518 microseconds
        comparison to reference solution: mismatch found
    solution "Nim":
        duration: 30519 ms and 227 microseconds
        comparison to reference solution: exact match
    solution "dyp":
        duration: 2624 ms and 644 microseconds
        comparison to reference solution: exact match
    

    The algorithms have to be structs with function-call operator templates that support the interface:

    template operator()(RaIt begin, RaIt end);
    

    A copy of the input data is provided as a parameter, the algorithm is expected to provide the result in the same range (e.g. in-place sort).

    #include 
    #include 
    #include 
    #include 
    #include 
    #include 
    #include 
    #include 
    #include 
    #include 
    #include 
    #include 
    #include 
    #include 
    #include 
    
    using duration_t = decltype(  std::chrono::high_resolution_clock::now()
                                - std::chrono::high_resolution_clock::now());
    
    template
    struct result_t
    {
        std::vector numbers;
        duration_t duration;
        char const* name;
    };
    
    template
    result_t::value_type>
    apply_algorithm(RaIt p_beg, RaIt p_end, F f, char const* name)
    {
        using value_type = typename std::iterator_traits::value_type;
    
        std::vector inplace(p_beg, p_end);
    
        auto start = std::chrono::high_resolution_clock::now();
    
        f(begin(inplace), end(inplace));
    
        auto end = std::chrono::high_resolution_clock::now();
        auto duration = end - start;
    
        return {std::move(inplace), duration, name};
    }
    
    // non-optimized version
    int count_digits(int p) // returns `0` for `p == 0`
    {
            int res = 0;
            for(; p != 0; ++res)
            {
                p /= 10;
            }
            return res;
    }
    
    // non-optimized version
    int my_pow10(unsigned exp)
    {
            int res = 1;
            for(; exp != 0; --exp)
            {
                res *= 10;
            }
            return res;
    }
    
    
    // !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    // paste algorithms here
    // !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    
    
    int main(int argc, char** argv)
    {
        using integer_t = int;
        constexpr integer_t dist_min = 0;
        constexpr integer_t dist_max = std::numeric_limits::max()/10;
        constexpr std::size_t default_number_of_elements = 1E6;
    
        const std::size_t number_of_elements = argc>1 ? std::atoll(argv[1]) :
                                               default_number_of_elements;
        std::cout << "number of elements: ";
        std::cout << std::scientific << std::setprecision(0);
        std::cout << (double)number_of_elements << "\n";
        std::cout << /*std::defaultfloat <<*/ std::setprecision(6);
        std::cout.unsetf(std::ios_base::floatfield);
    
        std::cout << "size of integer type: " << sizeof(integer_t) << "\n\n";
    
        std::vector input;
        {
            input.reserve(number_of_elements);
    
            std::random_device rd;
            std::mt19937 gen( rd() );
            std::uniform_int_distribution<> dist(dist_min, dist_max);
    
            for(std::size_t i = 0; i < number_of_elements; ++i)
                input.push_back( dist(gen) );
        }
    
        auto b = begin(input);
        auto e = end(input);
    
        using res_t = result_t;
        std::vector< std::function > algorithms;
    
        #define MAKE_BINDER(B, E, ALGO, NAME) \
            std::bind( &apply_algorithm, \
                       B,E,ALGO,NAME )
        constexpr auto lightness_name = "Lightness Races in Orbit";
        algorithms.push_back( MAKE_BINDER(b, e, lightness(), lightness_name) );
        algorithms.push_back( MAKE_BINDER(b, e, dyp(), "dyp") );
        algorithms.push_back( MAKE_BINDER(b, e, nim(), "Nim") );
        algorithms.push_back( MAKE_BINDER(b, e, pts(), "pts") );
        algorithms.push_back( MAKE_BINDER(b, e, epost(), "Eric Postpischil") );
        algorithms.push_back( MAKE_BINDER(b, e, nyar(), "nyarlathotep") );
        algorithms.push_back( MAKE_BINDER(b, e, notbad(), "notbad") );
    
        {
            std::srand( std::random_device()() );
            std::random_shuffle(begin(algorithms), end(algorithms));
        }
    
        std::vector< result_t > res;
        for(auto& algo : algorithms)
            res.push_back( algo() );
    
        auto reference_solution
            = *std::find_if(begin(res), end(res),
                            [](result_t const& p)
                            { return 0 == std::strcmp(lightness_name, p.name); });
        std::cout << "reference solution: "<(e.duration);
            std::cout << "\tduration: "<

    Current algorithms; note I replaced the digit counters and pow-of-10 with the global function, so we all benefit if someone optimizes.

    struct lightness
    {
        template void operator()(RaIt b, RaIt e)
        {
            using T = typename std::iterator_traits::value_type;
    
            /**
             * Sorts the array lexicographically.
             *
             * The trick is that we have to compare digits left-to-right
             * (considering typical Latin decimal notation) and that each of
             * two numbers to compare may have a different number of digits.
             *
             * This is very efficient in storage space, but inefficient in
             * execution time; an approach that pre-visits each element and
             * stores a translated representation will at least double your
             * storage requirements (possibly a problem with large inputs)
             * but require only a single translation of each element.
             */
            std::sort(
                b,
                e,
                [](T lhs, T rhs) -> bool {
                    // Returns true if lhs < rhs
                    // Returns false otherwise
                    const auto BASE      = 10;
                    const bool LHS_FIRST = true;
                    const bool RHS_FIRST = false;
                    const bool EQUAL     = false;
    
    
                    // There's no point in doing anything at all
                    // if both inputs are the same; strict-weak
                    // ordering requires that we return `false`
                    // in this case.
                    if (lhs == rhs) {
                        return EQUAL;
                    }
    
                    // Compensate for sign
                    if (lhs < 0 && rhs < 0) {
                        // When both are negative, sign on its own yields
                        // no clear ordering between the two arguments.
                        //
                        // Remove the sign and continue as for positive
                        // numbers.
                        lhs *= -1;
                        rhs *= -1;
                    }
                    else if (lhs < 0) {
                        // When the LHS is negative but the RHS is not,
                        // consider the LHS "first" always as we wish to
                        // prioritise the leading '-'.
                        return LHS_FIRST;
                    }
                    else if (rhs < 0) {
                        // When the RHS is negative but the LHS is not,
                        // consider the RHS "first" always as we wish to
                        // prioritise the leading '-'.
                        return RHS_FIRST;
                    }
    
                    // Counting the number of digits in both the LHS and RHS
                    // arguments is *almost* trivial.
                    const auto lhs_digits = (
                        lhs == 0
                        ? 1
                        : std::ceil(std::log(lhs+1)/std::log(BASE))
                    );
    
                    const auto rhs_digits = (
                        rhs == 0
                        ? 1
                        : std::ceil(std::log(rhs+1)/std::log(BASE))
                    );
    
                    // Now we loop through the positions, left-to-right,
                    // calculating the digit at these positions for each
                    // input, and comparing them numerically. The
                    // lexicographic nature of the sorting comes from the
                    // fact that we are doing this per-digit comparison
                    // rather than considering the input value as a whole.
                    const auto max_pos = std::max(lhs_digits, rhs_digits);
                    for (auto pos = 0; pos < max_pos; pos++) {
                        if (lhs_digits - pos == 0) {
                            // Ran out of digits on the LHS;
                            // prioritise the shorter input
                            return LHS_FIRST;
                        }
                        else if (rhs_digits - pos == 0) {
                            // Ran out of digits on the RHS;
                            // prioritise the shorter input
                            return RHS_FIRST;
                        }
                        else {
                            const auto lhs_x = (lhs / static_cast(std::pow(BASE, lhs_digits - 1 - pos))) % BASE;
                            const auto rhs_x = (rhs / static_cast(std::pow(BASE, rhs_digits - 1 - pos))) % BASE;
    
                            if (lhs_x < rhs_x)
                                return LHS_FIRST;
                            else if (rhs_x < lhs_x)
                                return RHS_FIRST;
                        }
                    }
    
                    // If we reached the end and everything still
                    // matches up, then something probably went wrong
                    // as I'd have expected to catch this in the tests
                    // for equality.
                    assert("Unknown case encountered");
    
                    // dyp: suppress warning and throw
                    throw "up";
                }
            );
        }
    };
    
    namespace ndyp
    {
        // helper to provide integers with the same number of digits
        template
        std::pair lexicographic_pair_helper(T const p, U const maxDigits)
        {
            auto const digits = count_digits(p);
            // append zeros so that `l` has `maxDigits` digits
            auto const l = static_cast( p  * my_pow10(maxDigits-digits) );
            return {l, p};
        }
    
        template
        using pair_vec
            = std::vector::value_type,
                                    typename std::iterator_traits::value_type>>;
    
        template
        pair_vec lexicographic_sort(RaIt p_beg, RaIt p_end)
        {
            if(p_beg == p_end) return pair_vec{};
    
            auto max = *std::max_element(p_beg, p_end);
            auto maxDigits = count_digits(max);
    
            pair_vec result;
            result.reserve( std::distance(p_beg, p_end) );
    
            for(auto i = p_beg; i != p_end; ++i)
                result.push_back( lexicographic_pair_helper(*i, maxDigits) );
    
            using value_type = typename pair_vec::value_type;
    
            std::sort(begin(result), end(result),
                      [](value_type const& l, value_type const& r)
                      {
                          if(l.first < r.first) return true;
                          if(l.first > r.first) return false;
                          return l.second < r.second; }
                     );
    
            return result;
        }
    }
    
    struct dyp
    {
        template void operator()(RaIt b, RaIt e)
        {
            auto pairvec = ndyp::lexicographic_sort(b, e);
            std::transform(begin(pairvec), end(pairvec), b,
                           [](typename decltype(pairvec)::value_type const& e) { return e.second; });
        }
    };
    
    
    namespace nnim
    {
        bool comp(int l, int r)
        {
          int lv[10] = {}; // probably possible to get this from numeric_limits
          int rv[10] = {};
    
          int lc = 10; // ditto
          int rc = 10;
          while (l || r)
          {
            if (l)
            {
              auto t = l / 10;
              lv[--lc] = l - (t * 10);
              l = t;
            }
            if (r)
            {
              auto t = r / 10;
              rv[--rc] = r - (t * 10);
              r = t;
            }
          }
          while (lc < 10 && rc < 10)
          {
            if (lv[lc] == rv[rc])
            {
              lc++;
              rc++;
            }
            else
              return lv[lc] < rv[rc];
          }
          return lc > rc;
        }
    }
    
    struct nim
    {
        template void operator()(RaIt b, RaIt e)
        {
            std::sort(b, e, nnim::comp);
        }
    };
    
    struct pts
    {
            template static bool lex_less(T a, T b) {
              unsigned la = 1, lb = 1;
              for (T t = a; t > 9; t /= 10) ++la;
              for (T t = b; t > 9; t /= 10) ++lb;
              const bool ll = la < lb;
              while (la > lb) { b *= 10; ++lb; }
              while (lb > la) { a *= 10; ++la; }
              return a == b ? ll : a < b;
            }
    
            template void operator()(RaIt b, RaIt e)
        {
            std::sort(b, e, lex_less::value_type>);
        }
    };
    
    struct epost
    {
            static bool compare(int x, int y)
            {
                    static const double limit = .5 * (log(INT_MAX) - log(INT_MAX-1));
    
                    double lx = log10(x);
                    double ly = log10(y);
                    double fx = lx - floor(lx);  // Get the mantissa of lx.
                    double fy = ly - floor(ly);  // Get the mantissa of ly.
                    return fabs(fx - fy) < limit ? lx < ly : fx < fy;
            }
    
            template void operator()(RaIt b, RaIt e)
        {
            std::sort(b, e, compare);
        }
    };
    
    struct nyar
    {
            static bool lexiSmaller(int i1, int i2)
            {
                int digits1 = count_digits(i1);
                int digits2 = count_digits(i2);
    
                double val1 = i1/pow(10.0, digits1-1);
                double val2 = i2/pow(10.0, digits2-1);
    
                while (digits1 > 0 && digits2 > 0 && (int)val1 == (int)val2)
                {
                    digits1--;
                    digits2--;
                    val1 = (val1 - (int)val1)*10;
                    val2 = (val2 - (int)val2)*10;
                }
                if (digits1 > 0 && digits2 > 0)
                {
                    return (int)val1 < (int)val2;
                }
                return (digits2 > 0);
            }
    
            template void operator()(RaIt b, RaIt e)
        {
            std::sort(b, e, lexiSmaller);
        }
    };
    
    struct notbad
    {
            static int up_10pow(int n) {
              int ans = 1;
              while (ans < n) ans *= 10;
              return ans;
            }
    
            static bool compare(int v1, int v2) {
               int ceil1 = up_10pow(v1), ceil2 = up_10pow(v2);
               while ( ceil1 != 0 && ceil2 != 0) {
                  if (v1 / ceil1  < v2 / ceil2) return true;
                  else if (v1 / ceil1 > v2 / ceil2) return false;
                  ceil1 /= 10;
                  ceil2 /= 10;
               }
               if (v1 < v2) return true;
               return false;
            }
    
            template void operator()(RaIt b, RaIt e)
        {
            std::sort(b, e, compare);
        }
    };
    

提交回复
热议问题