Optimizing queries for the next and previous element

耶瑟儿~ 2021-01-30 11:33

I am looking for the best way to retrieve the next and previous records of a record without running a full query. I have a fully implemented solution in place, and would like to

11 Answers
  • 2021-01-30 12:03

    I've had nightmares with this one as well. Your current approach seems to be the best solution even for lists of 10k items: cache the IDs of the list view in the HTTP session, then use that list to display the previous/next links (personalized to the current user). This works especially well when there are many ways to filter and sort the initial list of items, rather than just 3.
    Also, by storing the whole ID list you get to display a usability-enhancing "you are at X out of Y" text.

    By the way, this is what JIRA does as well.

    To directly answer your questions:

    • Yes, it's good practice, because it scales without any added code complexity when your filters, sorts and item types grow more complex. I'm using it in a production system with 250k articles and "infinite" filter/sort variations. Trimming the cached IDs to 1000 is also a possibility, since the user will most probably never click on prev or next more than 500 times (they'll most probably go back and refine the search or paginate).
    • I don't know of a better way. But if the sorts were limited and this was a public site (with no HTTP session), then I'd most probably denormalize.
    • Dunno.
    • Yes, sorting cache sounds good. In my project I call it "previous/next on search results" or "navigation on search results".
    • Dunno.
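
    The session-cached ID list described in this answer can be sketched roughly as follows. This is a minimal illustration, not the answerer's production code: the `session` dict stands in for the HTTP session, and `neighbors`, `trim_window` and the sample IDs are hypothetical names and data.

```python
from typing import Optional, Sequence


def neighbors(ids: Sequence[int], current_id: int) -> dict:
    """Return previous/next IDs and the 1-based position of current_id.

    `ids` stands in for the ordered ID list cached in the HTTP session.
    """
    pos = ids.index(current_id)  # raises ValueError if the ID is not cached
    prev_id: Optional[int] = ids[pos - 1] if pos > 0 else None
    next_id: Optional[int] = ids[pos + 1] if pos + 1 < len(ids) else None
    return {"prev": prev_id, "next": next_id,
            "position": pos + 1, "total": len(ids)}


def trim_window(ids: Sequence[int], current_id: int, limit: int = 1000) -> list:
    """Keep at most `limit` IDs, roughly centered on current_id."""
    pos = ids.index(current_id)
    start = max(0, min(pos - limit // 2, len(ids) - limit))
    return list(ids[start:start + limit])


# Hypothetical session contents: IDs in the order the list view showed them.
session = {"result_ids": [42, 7, 19, 3, 88]}
nav = neighbors(session["result_ids"], 19)
print(f"You are at {nav['position']} out of {nav['total']}")
```

    If the list is trimmed, the full result count would have to be stored separately to keep the "X out of Y" text accurate.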
  • 2021-01-30 12:04

    I have an idea somewhat similar to Jessica's. However, instead of storing links to the next and previous sort items, you store the sort order for each sort type. To find the previous or next record, just get the row with SortX = currentSort - 1 or SortX = currentSort + 1.

    Example:

    Type     Class Price Sort1  Sort2 Sort3
    Lettuce  2     0.89  0      4     0
    Tomatoes 1     1.50  1      0     4
    Apples   1     1.10  2      2     2
    Apples   2     0.95  3      3     1
    Pears    1     1.25  4      1     3
    

    This solution would yield very short query times, and would take up less disk space than Jessica's idea. However, as I'm sure you realize, the cost of updating one row of data is notably higher, since you have to recalculate and store all sort orders. But still, depending on your situation, if data updates are rare and especially if they always happen in bulk, then this solution might be the best.

    i.e.

    once_per_day
      add/delete/update all records
      recalculate sort orders
    
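    As a rough sketch of the stored-sort-order idea, assuming SQLite and a single hypothetical `sort_price` column (it plays the role of Sort3 in the table above; the table and function names are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, type TEXT, class INTEGER,"
    " price REAL, sort_price INTEGER)"
)
# An index keeps the neighbor lookup cheap on large tables.
conn.execute("CREATE INDEX items_sort_price ON items (sort_price)")
conn.executemany(
    "INSERT INTO items (type, class, price) VALUES (?, ?, ?)",
    [("Lettuce", 2, 0.89), ("Tomatoes", 1, 1.50), ("Apples", 1, 1.10),
     ("Apples", 2, 0.95), ("Pears", 1, 1.25)],
)


def recalculate_sort_price(conn):
    """Bulk step: renumber sort_price as 0..n-1 in ascending price order."""
    rows = conn.execute("SELECT id FROM items ORDER BY price, id").fetchall()
    conn.executemany(
        "UPDATE items SET sort_price = ? WHERE id = ?",
        [(rank, row[0]) for rank, row in enumerate(rows)],
    )


def neighbor(conn, item_id, step):
    """Return (type, class) of the row `step` ranks away: +1 next, -1 previous."""
    return conn.execute(
        "SELECT type, class FROM items WHERE sort_price ="
        " (SELECT sort_price FROM items WHERE id = ?) + ?",
        (item_id, step),
    ).fetchone()


recalculate_sort_price(conn)
# Item 3 is "Apples, class 1"; its neighbors by price:
print(neighbor(conn, 3, -1), neighbor(conn, 3, +1))
```

    The lookup is a single indexed equality match, while every data change requires the renumbering step, which is the trade-off the answer describes.
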

    Hope this is useful.

  • 2021-01-30 12:06

    Apologies if I have misunderstood, but I think you want to retain the ordered list between user accesses to the server. If so, your answer may well lie in your caching strategy and technologies rather than in database query/schema optimization.

    My approach would be to serialize() the array once it's first retrieved, and then cache it in a separate storage area (memcached, APC, hard drive, MongoDB, etc.), retaining its cache location for each user individually in their session data. The actual storage backend would naturally depend on the size of the array, which you don't go into much detail about, but memcached scales well over multiple servers, and MongoDB even further at a slightly greater latency cost.

    You also don't indicate how many sort permutations there are in the real world; e.g. do you need to cache separate lists per user, or can you cache globally per sort permutation and then filter out what you don't need via PHP? In the example you give, I'd simply cache both permutations and store in the session data which of the two I need to unserialize().

    When the user returns to the site, check the Time To Live value of the cached data and re-use it if still valid. I'd also have a trigger running on INSERT/UPDATE/DELETE for the special offers that simply sets a timestamp field in a separate table. Reading that one field would indicate, at a very low query cost, whether the cache is stale and the query needs to be re-run. The great thing about only using the trigger to set a single field is that there's no need to worry about pruning old or redundant values out of that table.
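
    The trigger-based staleness check could look roughly like this. It is a sketch under several assumptions: SQLite instead of the asker's database, Python's `pickle` in place of PHP's serialize()/unserialize(), a plain dict in place of memcached, and a version counter swapped in for the timestamp field so that two changes in the same clock tick are still detected. The table, trigger and function names are hypothetical, and the TTL check is left out.

```python
import pickle
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE offers (id INTEGER PRIMARY KEY, name TEXT, price REAL);
    CREATE TABLE offers_stamp (version INTEGER NOT NULL);
    INSERT INTO offers_stamp VALUES (0);
    CREATE TRIGGER offers_ins AFTER INSERT ON offers
        BEGIN UPDATE offers_stamp SET version = version + 1; END;
    CREATE TRIGGER offers_upd AFTER UPDATE ON offers
        BEGIN UPDATE offers_stamp SET version = version + 1; END;
    CREATE TRIGGER offers_del AFTER DELETE ON offers
        BEGIN UPDATE offers_stamp SET version = version + 1; END;
""")

cache = {}  # stand-in for memcached/APC: sort key -> (version, pickled ID list)


def ordered_ids(conn, sort_column):
    """Return the ordered ID list, re-running the query only if the stamp moved."""
    if sort_column not in ("name", "price"):  # whitelist before interpolating
        raise ValueError(sort_column)
    (version,) = conn.execute("SELECT version FROM offers_stamp").fetchone()
    hit = cache.get(sort_column)
    if hit is not None and hit[0] == version:
        return pickle.loads(hit[1])  # cache is still fresh
    ids = [row[0] for row in conn.execute(
        f"SELECT id FROM offers ORDER BY {sort_column}, id")]
    cache[sort_column] = (version, pickle.dumps(ids))
    return ids


conn.execute("INSERT INTO offers (name, price) VALUES ('Tomatoes', 1.50)")
conn.execute("INSERT INTO offers (name, price) VALUES ('Lettuce', 0.89)")
print(ordered_ids(conn, "price"))
```

    Here the cache is global per sort permutation; a per-user variant would only need the user or session ID added to the cache key.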

    Whether this is suitable would depend upon the size of the data being returned, how frequently it was modified, and what caching technologies are available on your server.

  • 2021-01-30 12:07

    Basic assumptions:

    • Specials are weekly
    • We can expect the site to change infrequently... probably daily?
    • We can control updates to the database through either an API or triggers

    If the site changes on a daily basis, I suggest that all the pages are statically generated overnight. One query for each sort-order iterates through and makes all the related pages. Even if there are dynamic elements, odds are that you can address them by including the static page elements. This would provide optimal page service and no database load. In fact, you could possibly generate separate pages and prev / next elements that are included into the pages. This may be crazier with 200 ways to sort, but with 3 I'm a big fan of it.

    ?sort=price
    include(/sorts/$sort/tomatoes_class_1)
    /*tomatoes_class_1 is probably a numeric id; sanitize your sort key... use numerics?*/
    
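    A possible shape for the overnight generation step is sketched below. The sort names, item data, URL scheme and the `build_fragments` helper are all assumptions for illustration, and the fragments are collected in a dict keyed by path rather than written to disk.

```python
from typing import Callable, Dict, List

items: List[dict] = [
    {"id": 1, "type": "Lettuce", "class": 2, "price": 0.89},
    {"id": 2, "type": "Tomatoes", "class": 1, "price": 1.50},
    {"id": 3, "type": "Apples", "class": 1, "price": 1.10},
    {"id": 4, "type": "Apples", "class": 2, "price": 0.95},
    {"id": 5, "type": "Pears", "class": 1, "price": 1.25},
]

# Hypothetical sort orders; the site's real ones are not given in this answer.
SORTS: Dict[str, Callable[[dict], tuple]] = {
    "price": lambda it: (it["price"], it["id"]),
    "type": lambda it: (it["type"], it["class"]),
}


def build_fragments(items, sorts):
    """Return {"sorts/<sort>/<id>": html}: one prev/next fragment per item and sort."""
    fragments = {}
    for name, key in sorts.items():
        ordered = sorted(items, key=key)  # one pass per sort order
        for i, item in enumerate(ordered):
            links = []
            if i > 0:
                links.append(
                    f'<a href="/item/{ordered[i - 1]["id"]}?sort={name}">prev</a>')
            if i + 1 < len(ordered):
                links.append(
                    f'<a href="/item/{ordered[i + 1]["id"]}?sort={name}">next</a>')
            fragments[f"sorts/{name}/{item['id']}"] = " | ".join(links)
    return fragments


fragments = build_fragments(items, SORTS)
print(fragments["sorts/price/3"])
```

    Using numeric IDs in the fragment path, as the comment in the snippet above suggests, also makes the include path easy to validate.
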

    If for some reason this isn't feasible, I'd resort to memoization. Memcached is popular for this sort of thing (pun!). When something is pushed to the database, you can issue a trigger to update your cache with the correct values. Do this the same way you would if your updated item existed in 3 linked lists: relink as appropriate (this.next.prev = this.prev, etc.). From that, as long as your cache doesn't overfill, you'll be pulling simple values from memory in a primary-key fashion.

    This method will take some extra coding on the select and update / insert methods, but it should be fairly minimal. In the end, you'll be looking up [id of tomatoes class 1].price.next. If that key is in your cache, golden. If not, insert into cache and display.
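
    The linked-list style of cache maintenance might be sketched like this. A plain dict stands in for memcached, keys follow the "[id].[sort].next" shape mentioned above, and the helper names `link`, `unlink` and `insert_after` are hypothetical.

```python
cache = {}  # stand-in for memcached: flat string keys -> neighbor ID (or None)


def link(sort, ordered_ids):
    """Populate <id>.<sort>.prev and <id>.<sort>.next for one sort order."""
    last = len(ordered_ids) - 1
    for i, item_id in enumerate(ordered_ids):
        cache[f"{item_id}.{sort}.prev"] = ordered_ids[i - 1] if i > 0 else None
        cache[f"{item_id}.{sort}.next"] = ordered_ids[i + 1] if i < last else None


def unlink(sort, item_id):
    """Remove one item and relink its neighbors, as in a doubly linked list."""
    prev_id = cache.pop(f"{item_id}.{sort}.prev")
    next_id = cache.pop(f"{item_id}.{sort}.next")
    if prev_id is not None:
        cache[f"{prev_id}.{sort}.next"] = next_id
    if next_id is not None:
        cache[f"{next_id}.{sort}.prev"] = prev_id


def insert_after(sort, prev_id, item_id):
    """Splice item_id into the list right after prev_id."""
    next_id = cache[f"{prev_id}.{sort}.next"]
    cache[f"{item_id}.{sort}.prev"] = prev_id
    cache[f"{item_id}.{sort}.next"] = next_id
    cache[f"{prev_id}.{sort}.next"] = item_id
    if next_id is not None:
        cache[f"{next_id}.{sort}.prev"] = item_id


link("price", [1, 4, 3, 5, 2])  # IDs in ascending price order
unlink("price", 3)              # item 3 was deleted from the database
print(cache["4.price.next"])    # neighbor lookup is a single key fetch
```

    An item that changes its price would be handled as an unlink followed by an insert at its new position, once per affected sort order.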

    • Do you think this is a good practice to find out the neighboring records for varying query orders? Yes. It is wise to perform look-aheads on expected upcoming requests.
    • Do you know better practices in terms of performance and simplicity? Do you know something that makes this completely obsolete? Hopefully the above.
    • In programming theory, is there a name for this problem? Optimization?
    • Is the name "Sorting cache" appropriate and understandable for this technique? I'm not sure of a specific appropriate name. It is caching, and it is a cache of sorts, but I'm not sure that telling me you have a "sorting cache" would convey instant understanding.
    • Are there any recognized, common patterns to solve this problem? What are they called? Caching?

    Sorry my trailing answers are kind of useless, but I think my narrative solutions should be quite useful.

  • 2021-01-30 12:07

    You could define one view per sort order that exposes each item's row number in that ordering. The previous and next items are then the rows numbered (current_rownum - 1) and (current_rownum + 1).
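
    A minimal sketch of the row-number view, assuming SQLite 3.25 or later (for the ROW_NUMBER() window function) and hypothetical table, view and function names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, type TEXT, price REAL);
    INSERT INTO items (type, price) VALUES
        ('Lettuce', 0.89), ('Tomatoes', 1.50), ('Apples', 1.10), ('Pears', 1.25);
    CREATE VIEW items_by_price AS
        SELECT id, type,
               ROW_NUMBER() OVER (ORDER BY price, id) AS rownum
        FROM items;
""")


def prev_and_next(conn, item_id):
    """Return the types at rownum - 1 and rownum + 1 (None where absent)."""
    rows = conn.execute(
        """
        SELECT n.rownum - c.rownum, n.type
        FROM items_by_price AS c
        JOIN items_by_price AS n
          ON n.rownum IN (c.rownum - 1, c.rownum + 1)
        WHERE c.id = ?
        """,
        (item_id,),
    ).fetchall()
    found = dict(rows)  # offset (-1 or +1) -> type
    return found.get(-1), found.get(1)


print(prev_and_next(conn, 3))  # neighbors of 'Apples' in price order
```

    SQLite re-evaluates a plain view on every query, so this shows the shape of the lookup rather than a guaranteed speed-up; a database with materialized views, or a table refreshed from the same SELECT, would be needed to actually store the row numbers.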
