How to add “weights” to a MySQL table and select random values according to these?

粉色の甜心 2021-01-03 10:22

I want to create a table in which each row carries some sort of weight. Then I want to select random rows with probability equal to (weight of that row)/(weight of all rows).

6 answers
  • 2021-01-03 11:02

    The easiest (and perhaps safest) way to do this is to insert each row into the table as many times as its weight. Say I want "Tree" to be found twice as often as "Dog": I insert "Tree" twice and "Dog" once, then select rows uniformly at random, one at a time. This only works for non-negative integer weights, and the table grows with the sum of the weights.

    If the rows are complex or large, it is better to create a separate table (weighted_Elements or something) that holds only foreign keys to the real rows, inserted as many times as the weights dictate.
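
The duplication idea can be sketched outside the database as well. The snippet below is a minimal illustration in Python, not MySQL; `build_lookup`, `pick`, and the sample data are hypothetical names chosen for this example. It repeats each row id once per unit of weight and then draws uniformly from the expanded list.

```python
import random

def build_lookup(weights):
    """Repeat each row id `weight` times, like one foreign key per unit of weight."""
    lookup = []
    for row_id, weight in weights.items():
        lookup.extend([row_id] * weight)
    return lookup

def pick(lookup, rng=random):
    # A uniform choice over the expanded list is a weighted choice over the ids.
    return rng.choice(lookup)

weights = {"Tree": 2, "Dog": 1}   # hypothetical example data
lookup = build_lookup(weights)
print(lookup)                     # ['Tree', 'Tree', 'Dog']
print(pick(lookup))
```

The expanded list has as many entries as the sum of the weights, which is why this approach suits small integer weights best.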

  • 2021-01-03 11:03

    I found this nice little algorithm in Quod Libet. You could probably translate it to some procedural SQL.

    function WeightedShuffle(list of items with weights):
      max_score ← the sum of every item’s weight
      choice ← random number in the range [0, max_score)
      current ← 0
      for each item (i, weight) in items:
        current ← current + weight
        if current > choice or i is the last item:
          return item i
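
A possible Python rendering of this pseudocode is sketched below. The function name `weighted_choice` and the injectable `rand` parameter are assumptions made for the example, not part of Quod Libet. The sketch uses a strict comparison so that an item with weight 0 is never returned.

```python
import random

def weighted_choice(items, rand=random.random):
    """Return one item from [(item, weight), ...] with probability weight / total."""
    total = sum(weight for _, weight in items)
    choice = rand() * total          # uniform in [0, total)
    current = 0.0
    for item, weight in items:
        current += weight
        if current > choice:         # strict, so zero-weight items are never picked
            return item
    return items[-1][0]              # fallback against floating-point round-off

print(weighted_choice([("Tree", 2), ("Dog", 1)]))
```

Passing a fixed `rand` (for example `lambda: 0.0`) makes the walk over the running total easy to check by hand.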
    
  • 2021-01-03 11:11

    The best approach (if I understand your question properly) is to set up your table as you normally would and then add two INT columns.

    • Column 1: Weight - This column holds the weight, ranging from -X to +X, where X is the highest weight you want to allow (e.g. X=100 gives -100 to 100). It gives the row an actual weight and raises or lowers the probability of it coming up.

    • Column 2: Count - This column holds how many times the row has come up; it is needed only if you want fair weighting. Fair weighting prevents one row from always showing up (e.g. with one row weighted at 100 and another at 2, the row with 100 would always show up; this column makes the weight-2 row more 'valuable' as you get more weight-100 results). Increment it by 1 each time the row is returned; later you can make the logic more advanced, e.g. so that it adds the weight.

    • Logic: - It is really simple now. Your query requests all rows as it normally would, then adds an extra select expression that takes the weight minus the count, and orders by that column (you can change this logic to whatever you want).

    The end result is a table in which higher-weighted rows appear more often up to a certain point, after which the system evens itself out. If you leave out column 2, you get a system that always returns the same weighted order unless you offset the base of the query (e.g. LIMIT [RANDOM NUMBER], [NUMBER OF ROWS TO RETURN]).
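
To see how the weight-minus-count ordering behaves, here is a small in-memory sketch in Python. The row layout and the `next_row` helper are hypothetical, and ties are broken by row order, which the answer does not specify. Note that the scheme as described is deterministic: it rotates rows rather than sampling with probability proportional to weight.

```python
def next_row(rows):
    """Pick the row with the largest (weight - count) and record the hit."""
    best = max(rows, key=lambda r: r["weight"] - r["count"])  # first row wins ties
    best["count"] += 1
    return best["name"]

rows = [
    {"name": "A", "weight": 3, "count": 0},
    {"name": "B", "weight": 1, "count": 0},
]
picks = [next_row(rows) for _ in range(6)]
print(picks)  # ['A', 'A', 'A', 'B', 'A', 'B']
```

The heavier row dominates at first, then the two rows alternate once their scores meet, which matches the "evens itself out" behaviour described above.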

  • 2021-01-03 11:11

    This is weighted random sampling, which can be solved with Reservoir Sampling (https://en.wikipedia.org/wiki/Reservoir_sampling).

    The A-Res algorithm is easy to implement in SQL:

    -- A-Res: order by u^(1/weight), largest first; weight must be > 0
    SELECT *
    FROM `table`
    ORDER BY POW(RAND(), 1 / weight) DESC
    LIMIT 10;
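
The same key computation can be tried in Python to check the behaviour before relying on the query. This is a minimal sketch: `a_res_sample` and the sample rows are hypothetical, and rows with a non-positive weight are skipped as an assumption, since 1 / weight is undefined for them.

```python
import heapq
import random

def a_res_sample(rows, k, rand=random.random):
    """Weighted sample of k rows without replacement, using A-Res keys."""
    keyed = []
    for row, weight in rows:
        if weight <= 0:
            continue                          # assumption: such rows are never selected
        key = rand() ** (1.0 / weight)        # same key as POW(RAND(), 1 / weight)
        keyed.append((key, row))
    # Keep the k largest keys, like ORDER BY ... DESC LIMIT k.
    return [row for _, row in heapq.nlargest(k, keyed, key=lambda kr: kr[0])]

rows = [("id1", 5), ("id2", 1)]               # hypothetical example data
print(a_res_sample(rows, 1))
```

With `k = 1`, the row with weight 5 should come out in roughly 5 of every 6 draws.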
    
  • 2021-01-03 11:16

    I'm not an expert in probability theory, but assuming you have a column called WEIGHT, how about

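    select FIELD_1, ... FIELD_N, (rand() * WEIGHT) as SCORE
      from YOURTABLE
     order by SCORE desc  -- highest score first, so heavier rows tend to win
     limit 0, 10
    

    This gives you 10 records, but you can change the limit clause, of course. Note that this ordering only favors heavier rows; the selection probabilities are not exactly proportional to WEIGHT.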

  • 2021-01-03 11:24

    I came looking for the answer to the same question and came up with this:

    id      weight
    1       5
    2       1
    
    SELECT * FROM `table` ORDER BY RAND()/weight
    

    It's not exact, but since it uses random numbers I wouldn't expect it to be. I ran it 70 times and got id 2 ten times. I would have expected 1/6 but got 1/7, which I'd say is pretty close. I'd have to run a script a few thousand times to get a really good idea of whether it's working.
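
The "run it a few thousand times" check can be sketched in Python rather than against the database. The simulation below mirrors the `RAND()/weight` key for the two-row table above and, for comparison, an exponential key `-ln(u)/weight`. The second key is not from this answer; it is a swapped-in technique for which the first row is known to follow weight / total. All function names are hypothetical.

```python
import math
import random

def first_row(weights, key_fn, rng):
    """Return the id whose key sorts first (smallest), as ORDER BY ... ASC would."""
    return min(weights, key=lambda row_id: key_fn(rng, weights[row_id]))

def rand_over_weight(rng, weight):
    return rng.random() / weight                    # mirrors RAND()/weight

def exp_over_weight(rng, weight):
    return -math.log(1.0 - rng.random()) / weight   # 1 - u avoids log(0)

def frequency(weights, key_fn, target, trials=100_000, seed=1):
    rng = random.Random(seed)
    hits = sum(first_row(weights, key_fn, rng) == target for _ in range(trials))
    return hits / trials

weights = {1: 5, 2: 1}                              # id -> weight, as in the table above
print(frequency(weights, rand_over_weight, target=2))
print(frequency(weights, exp_over_weight, target=2))
```

For these two weights, `RAND()/weight` should put id 2 first about 1 time in 10, whereas the exponential key gives about 1 in 6. So `RAND()/weight` favors heavier rows, but by more than their share of the total weight.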
