Redis `SCAN`: how to balance newly arriving keys that might match against getting a complete result in a reasonable time?

情歌与酒 2020-12-04 02:42

I am not that familiar with Redis. At the moment I am designing a real-time service and I'd like to rely on it. I expect ~10000-50000 keys per minute to be <

1 answer
  • 2020-12-04 03:18

    First some context, solution at the end:

    From SCAN command > Guarantee of termination

    The SCAN algorithm is guaranteed to terminate only if the size of the iterated collection remains bounded to a given maximum size, otherwise iterating a collection that always grows may result into SCAN to never terminate a full iteration.

    This is easy to see intuitively: if the collection grows there is more and more work to do in order to visit all the possible elements, and the ability to terminate the iteration depends on the number of calls to SCAN and its COUNT option value compared with the rate at which the collection grows.

    But in The COUNT option it says:

    Important: there is no need to use the same COUNT value for every iteration. The caller is free to change the count from one iteration to the other as required, as long as the cursor passed in the next call is the one obtained in the previous call to the command.

    Important to keep in mind, from Scan guarantees:

    • A given element may be returned multiple times. It is up to the application to handle the case of duplicated elements, for example only using the returned elements in order to perform operations that are safe when re-applied multiple times.
    • Elements that were not constantly present in the collection during a full iteration, may be returned or not: it is undefined.
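
    One way to make the duplicate guarantee harmless is to collect the returned keys into a set. The sketch below assumes a `scan` callable shaped like redis-py's `Redis.scan`, which returns a `(cursor, keys)` pair; `scan_unique` is an illustrative name, not part of any library.

```python
from typing import Callable, Iterable, Optional, Set, Tuple

# Assumed shape: scan(cursor=..., match=..., count=...) -> (next_cursor, keys)
ScanFn = Callable[..., Tuple[int, Iterable[bytes]]]


def scan_unique(scan: ScanFn, match: Optional[str] = None, count: int = 100) -> Set[bytes]:
    """Run one full SCAN iteration and return each key once."""
    seen: Set[bytes] = set()
    cursor = 0
    while True:
        cursor, keys = scan(cursor=cursor, match=match, count=count)
        seen.update(keys)  # a set absorbs keys that SCAN reports more than once
        if cursor == 0:  # a returned cursor of 0 marks the end of the iteration
            return seen
```

    With redis-py this would presumably be called as `scan_unique(redis.Redis().scan, match="prefix:*")`. Note that the set holds every matching key in memory, so it only suits result sets of moderate size.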

    The key to a solution is in the cursor itself. See Making sense of Redis’ SCAN cursor. It is possible to deduce the percentage of progress of your scan because the cursor is really the bit-reversed index into the hash table, whose size is a power of two.
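
    To make the bit-reversal idea concrete, here is a minimal sketch. It assumes a single hash table whose size is a known power of two (no rehash in progress); `reverse_bits` and `scan_progress` are illustrative names.

```python
def reverse_bits(value: int, width: int) -> int:
    """Reverse the lowest `width` bits of `value`."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def scan_progress(cursor: int, table_size: int) -> float:
    """Estimate the fraction of buckets already visited.

    Assumes `table_size` is a power of two. A cursor of 0 is ambiguous:
    it is both the starting cursor and the "iteration finished" marker.
    """
    width = table_size.bit_length() - 1  # 2**width == table_size
    masked = cursor & (table_size - 1)   # ignore bits above the table mask
    return reverse_bits(masked, width) / table_size
```

    For an 8-bucket table, a cursor of 4 (binary 100) reverses to 1 (binary 001), so `scan_progress(4, 8)` is 1/8: one bucket out of eight has been visited.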

    Using the DBSIZE or INFO keyspace commands, you can get how many keys you have at any time:

    > DBSIZE
    (integer) 200032
    > info keyspace
    # Keyspace
    db0:keys=200032,expires=0,avg_ttl=0
    

    Another source of information is the undocumented DEBUG htstats index, just to get a feeling:

    > DEBUG htstats 0
    [Dictionary HT]
    Hash table 0 stats (main hash table):
     table size: 262144
     number of elements: 200032
     different slots: 139805
     max chain length: 8
     avg chain length (counted): 1.43
     avg chain length (computed): 1.43
     Chain length distribution:
       0: 122339 (46.67%)
       1: 93163 (35.54%)
       2: 35502 (13.54%)
       3: 9071 (3.46%)
       4: 1754 (0.67%)
       5: 264 (0.10%)
       6: 43 (0.02%)
       7: 6 (0.00%)
       8: 2 (0.00%)
    [Expires HT]
    No stats available for empty dictionaries
    

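    The table size is the smallest power of 2 that holds your number of keys: Keys: 200032 => Table size: 262144. This holds while the keyspace is growing; after many deletions the table can stay larger for a while.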
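
    The table-size estimate can be computed without a server round trip. This is a small sketch; `next_power_of_two` is an illustrative helper, and treating an exact power of two as its own table size is an assumption about when Redis grows the table.

```python
def next_power_of_two(n: int) -> int:
    """Smallest power of two that is >= n (returns 1 for n <= 1)."""
    if n <= 1:
        return 1
    return 1 << (n - 1).bit_length()


# The two key counts used in this answer:
small = next_power_of_two(200032)   # 262144 == 2**18
large = next_power_of_two(281569)   # 524288 == 2**19
```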

    The solution:

    We will calculate a desired COUNT argument for every scan.

    Say you will be calling SCAN with a frequency (F in Hz) of 10 Hz (every 100 ms) and you want it done in 5 seconds (T in s). So you want this finished in N = F*T calls, N = 50 in this example.

    Before your first scan, you know your current progress is 0, so your remaining percent is RP = 1 (100%).

    Before every SCAN call, call DBSIZE to get the number of keys K. To save the round-trip time (RTT) of a DBSIZE call, you can instead refresh K and adjust COUNT only every few calls.

    You will use COUNT = K*RP/N

    For the first call, this is COUNT = 200032*1/50 ≈ 4000.

    For any other call, you need to calculate RP = 1 - ReversedCursor/NextPowerOfTwo(K).

    For example, let's say you have done 20 calls already, so now N = 30 (remaining number of calls). You called DBSIZE and got K = 281569. This means NextPowerOfTwo(K) = 524288, which is 2^19.

    Your next cursor is 29018 in decimal = 0000111000101011010 in binary. As the table size is 2^19, we represent it with 19 bits.

    You reverse the bits and get 0101101010001110000 in binary = 185456 in decimal. This means we have covered 185456 out of 524288. And:

    RP = 1 - ReversedCursor/NextPowerOfTwo(K) = 1 - 185456 / 524288 = 0.65 or 65%
    

    So you have to adjust:

    COUNT = K*RP/N = 281569 * 0.65 / 30 = 6100
    

    So in your next SCAN call you use 6100. It makes sense that it increased, because:

    • The number of keys has increased from 200032 to 281569.
    • Although 60% of our planned calls remain, progress is behind, since 65% of the keyspace is still to be scanned.
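
    The arithmetic above can be sketched as a few small functions. The names (`planned_calls`, `remaining_fraction`, `scan_count`) are illustrative, the reversed cursor is passed in directly, and truncating COUNT to an integer of at least 1 is an assumption of this sketch.

```python
def next_power_of_two(n: int) -> int:
    """Smallest power of two >= n; stands in for the hash table size."""
    return 1 if n <= 1 else 1 << (n - 1).bit_length()


def planned_calls(frequency_hz: float, duration_s: float) -> int:
    """N = F * T, with at least one call."""
    return max(1, round(frequency_hz * duration_s))


def remaining_fraction(reversed_cursor: int, keys: int) -> float:
    """RP = 1 - ReversedCursor / NextPowerOfTwo(K)."""
    return 1.0 - reversed_cursor / next_power_of_two(keys)


def scan_count(keys: int, rp: float, calls_left: int) -> int:
    """COUNT = K * RP / N, truncated and never below 1."""
    return max(1, int(keys * rp / calls_left))


n = planned_calls(10, 5)                      # 50 calls planned
first_count = scan_count(200032, 1.0, n)      # first call, RP = 1
rp = remaining_fraction(185456, 281569)       # after 20 calls
later_count = scan_count(281569, rp, n - 20)  # 30 calls left
```

    With unrounded intermediates this gives RP ≈ 0.646 and a COUNT of 6065; the 6100 in the text comes from rounding RP to 0.65 first.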

    All this assumes you are fetching all keys. If you're pattern-matching, you need to use the results so far to estimate how many matching keys remain. We add a factor PM (percent of matches) to the COUNT calculation.

    COUNT = PM * K*RP/N
    
    PM = keysFound / ( K * ReversedCursor/NextPowerOfTwo(K))
    

    If after 20 calls, you have found only keysFound = 2000 keys, then:

    PM = 2000 / ( 281569 * 185456 / 524288) = 0.02
    

    This means only 2% of the keys are matching our pattern so far, so

    COUNT = PM * K*RP/N = 0.02 * 6100 = 122
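
    The PM adjustment can be sketched the same way. `match_fraction` and `matched_scan_count` are illustrative names; falling back to PM = 1 before anything has been scanned, and capping PM at 1, are assumptions of this sketch rather than part of the formula above.

```python
def match_fraction(keys_found: int, keys: int, covered_fraction: float) -> float:
    """PM = keysFound / (K * ReversedCursor / NextPowerOfTwo(K))."""
    scanned_estimate = keys * covered_fraction
    if scanned_estimate <= 0:
        return 1.0  # nothing scanned yet: assume every key matches
    return min(1.0, keys_found / scanned_estimate)


def matched_scan_count(keys: int, rp: float, calls_left: int, pm: float) -> int:
    """COUNT = PM * K * RP / N, truncated and never below 1."""
    return max(1, int(pm * keys * rp / calls_left))


covered = 185456 / 524288                     # fraction of the table covered
pm = match_fraction(2000, 281569, covered)    # about 0.02
count = matched_scan_count(281569, 1.0 - covered, 30, pm)
```

    Without rounding the intermediates, this yields a COUNT of 121 rather than the 122 obtained above from the rounded values.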
    

    This algorithm can probably be improved, but you get the idea.
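
    Putting the pieces together, the following is a rough sketch of the whole loop, not a tuned implementation. It assumes a `client` object exposing `dbsize()` and `scan(cursor=..., match=..., count=...)` as redis-py's `Redis` class does, estimates the table size from DBSIZE (an assumption; the floor of 4 buckets is also assumed), and leaves out the PM factor for brevity. `adaptive_scan` is an illustrative name.

```python
import time
from typing import Optional, Set


def reverse_bits(value: int, width: int) -> int:
    """Reverse the lowest `width` bits of `value`."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def adaptive_scan(client, total_calls: int, interval_s: float = 0.0,
                  match: Optional[str] = None) -> Set[bytes]:
    """Scan the keyspace, re-deriving COUNT = K * RP / N before each call."""
    found: Set[bytes] = set()
    cursor = 0
    calls_left = max(1, total_calls)
    remaining = 1.0  # RP: fraction of the table still to visit
    while True:
        keys = client.dbsize()
        count = max(1, int(keys * remaining / calls_left))
        cursor, batch = client.scan(cursor=cursor, match=match, count=count)
        found.update(batch)  # the set absorbs duplicate keys
        if cursor == 0:      # full iteration finished
            return found
        # Assumed table size: next power of two >= K, at least 4 buckets.
        width = max(2, (keys - 1).bit_length())
        table_size = 1 << width
        covered = reverse_bits(cursor & (table_size - 1), width) / table_size
        remaining = 1.0 - covered
        calls_left = max(1, calls_left - 1)  # never divide by zero
        if interval_s > 0:
            time.sleep(interval_s)  # pace calls at roughly 1 / F
```

    The loop may take more calls than planned when the table is sparsely filled, because COUNT is derived from the number of keys while the cursor advances through buckets.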

    Make sure to benchmark the COUNT value you plan to start with and measure how many milliseconds each SCAN call takes. You may need to moderate your expectations about how many calls (N) it takes to finish in a reasonable time without blocking the server, and adjust F and T accordingly.
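
    A quick way to take that measurement from the client side is sketched below. It assumes a `scan` callable with redis-py's keyword arguments; `median_scan_ms` is an illustrative name. The timing includes the network round trip, and every sample starts from cursor 0, so it only approximates the server-side cost.

```python
import statistics
import time


def median_scan_ms(scan, count: int, samples: int = 5) -> float:
    """Median wall-clock time, in milliseconds, of one SCAN call with this COUNT."""
    durations = []
    for _ in range(samples):
        start = time.perf_counter()
        scan(cursor=0, count=count)  # result discarded; only the timing matters
        durations.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(durations)
```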
