select 30 random rows where sum amount = x

感情败类 2020-12-10 02:39

I have a table

items
id int unsigned auto_increment primary key,
name varchar(255)
price DECIMAL(6,2)

I want to get at least 30 random ite

7 answers
  • 2020-12-10 03:35

    I'm surprised that nobody suggested, for the record, the brute force solution:

    SELECT 
        i1.id, 
        i2.id, 
        ..., 
        i30.id, 
        i1.price + i2.price + ... + i30.price
    FROM items i1 
    INNER JOIN items i2 ON i2.id NOT IN (i1.id)
    ...
    INNER JOIN items i30 ON i30.id NOT IN (i1.id, i2.id, ..., i29.id)
    ORDER BY ABS(x - (i1.price + i2.price + ... + i30.price))
    

    Such a query can be generated by a program to avoid mistakes. It is almost a joke, because the running time is O(n^30) (the general subset sum problem, https://en.wikipedia.org/wiki/Subset_sum_problem, is NP-complete, but with a fixed subset size it is polynomial). Still, it is possible and may make sense for precomputation: when the set of prices does not change, precompute a set of prices that sums to x, then pick random items that have those prices.
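As an illustration of generating that query by program, here is a minimal Python sketch. The function name `build_brute_force_query` and the `:x` named parameter are assumptions made for this example, not part of the original answer; the table and column names come from the question.

```python
def build_brute_force_query(n_items: int, table: str = "items") -> str:
    """Build the brute-force self-join query for subsets of n_items rows.

    The target sum is left as the named parameter :x (an assumption of
    this sketch), so the caller binds it when executing the query.
    """
    aliases = [f"i{k}" for k in range(1, n_items + 1)]
    id_columns = ", ".join(f"{alias}.id" for alias in aliases)
    total = " + ".join(f"{alias}.price" for alias in aliases)

    lines = [f"SELECT {id_columns}, {total} AS total", f"FROM {table} i1"]
    for k in range(1, n_items):
        # Each new alias must differ from every alias joined before it.
        previous_ids = ", ".join(f"{alias}.id" for alias in aliases[:k])
        lines.append(
            f"INNER JOIN {table} {aliases[k]} "
            f"ON {aliases[k]}.id NOT IN ({previous_ids})"
        )
    # Rows whose total is closest to the target come first.
    lines.append(f"ORDER BY ABS(:x - ({total}))")
    return "\n".join(lines)
```

Calling `build_brute_force_query(30)` produces the 30-way join shown above; running it on anything but a tiny table is impractical, as noted.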

    There is a dynamic programming solution (see Wikipedia), but it may take too long for your needs. There is also a polynomial-time approximation algorithm, but a naive implementation would need O(n) queries (I didn't look for another implementation).
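To give an idea of what a dynamic programming approach can look like, here is a minimal Python sketch of one possible variant for a fixed subset size. It is not necessarily the formulation described on Wikipedia. It assumes prices are non-negative integers (for example cents), and the function name `exact_sum_subset` is hypothetical.

```python
def exact_sum_subset(prices, size, target):
    """Return indices of `size` items whose prices sum to `target`, or None.

    Assumes non-negative integer prices (e.g. cents). Time and memory grow
    with the number of distinct reachable sums, so this is pseudo-polynomial.
    """
    # reach[k] maps a reachable sum to one tuple of k indices producing it.
    reach = [dict() for _ in range(size + 1)]
    reach[0][0] = ()
    for index, price in enumerate(prices):
        # Go through k in descending order so each item is used at most once.
        for k in range(min(index, size - 1), -1, -1):
            for subtotal, chosen in list(reach[k].items()):
                new_total = subtotal + price
                if new_total <= target and new_total not in reach[k + 1]:
                    reach[k + 1][new_total] = chosen + (index,)
    return reach[size].get(target)
```

With 30 items and prices stored as `DECIMAL(6,2)`, the number of reachable sums can be large, which is consistent with the warning that this may take too long.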

    I propose another possibility, without the assumptions made by Jannes Botis. The principle is a greedy "hill climbing", with some retreats because the greedy method does not work in every situation.

    First, a summary: take the total of the 30 cheapest items, then move toward x as fast as possible (be greedy) by replacing cheap items with expensive ones; if you would overshoot x, take a maximal step back and resume the climb, unless you are done or tired.

    And now, the details (this needs PHP + MySQL, not MySQL alone):

    Let N = 30

    Step 1: initialization

    Sort items by ascending price and select the first N ones

    • If the total price is x, you are done.
    • If the total price is more than x, give up: you can't produce a total equal to x.
    • Else continue with the N cheapest items.

    With a B-tree index on prices, this should be fast.
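Step 1 can be sketched as follows, using Python and SQLite rather than PHP and MySQL. The function name `initialize`, the status strings, and the use of integer prices (cents) to avoid rounding issues are all assumptions of this sketch.

```python
import sqlite3


def initialize(conn, x, n_selected=30):
    """Step 1: pick the n_selected cheapest items and compare their total to x.

    Returns (status, selected, total), where status is "done", "impossible"
    or "climb", and selected is a list of (id, price) pairs.
    Prices are assumed to be integers (e.g. cents).
    """
    selected = conn.execute(
        "SELECT id, price FROM items ORDER BY price ASC, id ASC LIMIT ?",
        (n_selected,),
    ).fetchall()
    total = sum(price for _, price in selected)
    if len(selected) < n_selected or total > x:
        # Not enough items, or even the cheapest ones already exceed x.
        return "impossible", selected, total
    if total == x:
        return "done", selected, total
    return "climb", selected, total
```

An index such as `CREATE INDEX idx_items_price ON items (price)` (the index name is hypothetical) lets the database read the cheapest rows without sorting the whole table.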

    Step 2: climb

    At this point x - total > 0, and we want to bring that difference as close to 0 as possible.

    Select every pair of items (with a join) where:

    1. the first item i1 is in the N selected items
    2. the second item i2 is not in the N selected items,
    3. the price of i2 is more than the price of i1: p2 - p1 > 0.
    4. (x - total) - (p2 - p1) >= 0

    Order the result by ascending (x - total) - (p2 - p1).

    • If there is no matching row, there are two cases (so maybe use two queries if you allow N to grow):

      1. no pair with p2 - p1 > 0: increase N and add the item with the lowest price. If N == n (n being the total number of items), you can't reach x, else go to step 2.
      2. no pair with (x - total) - (p2 - p1) >= 0: any swap would overshoot the limit x. Go to step 3.
    • Else take the first row (the closest to the peak) and replace i1 by i2 in the items: the new total is total - p1 + p2, so x - total is still >= 0 and closer to 0.

      • If it is zero, then we are done.
      • Else loop to step 2.

    *The join will produce O(n) rows: N items i1 * [(n-N) items i2 minus the ones with p2 <= p1]*
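The climb query of step 2 can be sketched like this in Python with SQLite. The sketch reads a swap as replacing a selected item i1 by a more expensive unselected item i2, so the total rises by p2 - p1, in line with "the new total is total - p1 + p2". The function name `best_swap`, the `gap` argument (x - total) and the integer prices are assumptions.

```python
import sqlite3


def best_swap(conn, selected_ids, gap):
    """Step 2: find the swap that brings the total closest to x from below.

    gap is x - total (> 0). Returns (i1_id, i2_id, remaining) where
    remaining = gap - (p2 - p1) >= 0, or None if no swap qualifies.
    """
    ids = list(selected_ids)
    marks = ", ".join("?" for _ in ids)
    query = f"""
        SELECT i1.id, i2.id, ? - (i2.price - i1.price) AS remaining
        FROM items i1
        INNER JOIN items i2 ON i2.price > i1.price   -- the swap must climb
        WHERE i1.id IN ({marks})                     -- i1 is selected
          AND i2.id NOT IN ({marks})                 -- i2 is not selected
          AND i2.price - i1.price <= ?               -- do not overshoot x
        ORDER BY remaining ASC, i1.id ASC, i2.id ASC
        LIMIT 1
    """
    return conn.execute(query, [gap] + ids + ids + [gap]).fetchone()
```

A `None` result does not distinguish the two "no matching row" cases; a second query without the overshoot condition would be one way to tell them apart.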

    Step 3: retreat

    There are many ways to retreat. Here's one.

    • If you have just retreated, give up: you're stuck.
    • If you have already retreated more than n times or if you are close enough to 0, you may give up. This avoids endless loops.
    • Else: remove the most expensive item from the list and replace it with the cheapest item that is not in the list (max and min to ensure you go down enough). Then update total and go back to step 2.

    With a B-tree index on prices, this should be fast.
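The retreat move described above can be sketched as two small queries, again in Python with SQLite. The function name `retreat` and the returned tuple layout are assumptions; the bookkeeping for "just retreated" and the retreat counter is left to the caller.

```python
import sqlite3


def retreat(conn, selected_ids):
    """Step 3: swap the most expensive selected item for the cheapest unselected one.

    Returns (removed_id, added_id, delta) where delta is the change applied
    to the total (added price minus removed price), or None when no swap
    is possible.
    """
    ids = list(selected_ids)
    marks = ", ".join("?" for _ in ids)
    removed = conn.execute(
        f"SELECT id, price FROM items WHERE id IN ({marks}) "
        "ORDER BY price DESC, id ASC LIMIT 1",
        ids,
    ).fetchone()
    added = conn.execute(
        f"SELECT id, price FROM items WHERE id NOT IN ({marks}) "
        "ORDER BY price ASC, id ASC LIMIT 1",
        ids,
    ).fetchone()
    if removed is None or added is None:
        return None
    return removed[0], added[0], added[1] - removed[1]
```

The caller would then apply `total += delta`, update its list of selected ids, and go back to step 2.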

    I hope this is clear. You can tune it to decide when you have done enough and use a precomputed set of 30 items with a total price of x. I believe the time complexity is O(n) in the average case. I did some tests (python + sqlite) with 200 items, random prices between 0 and 1000 and no retreat. On 1000 tests, 22 failures to reach 5000 (0.44%), 708 successes in 3 tries, 139 successes in 4 tries, 126 successes in 3 tries, 4 successes in 5 tries and 1 success in 1 try (a "try" is the try of a set of items different from the 30 cheapest items: k tries means k executions of the step 2 query). This will depend on the number of items, the prices, ...

    You can also make variations, e.g. start with a random set of items and try to close in on x, oscillate around x instead of retreating, ...
