Optimizing queries for the next and previous element

耶瑟儿~ 2021-01-30 11:33

I am looking for the best way to retrieve the next and previous records of a record without running a full query. I have a fully implemented solution in place, and would like to

11 Answers
  • 2021-01-30 11:41

    The problem / data structure is called a bi-directional graph, or you could say you have several linked lists.

    If you think of it as a linked list, you could just add prev / next key fields to the items table for every sort order. But your DBA will kill you for that; it's like GOTO.

    If you think of it as a (bi-)directional graph, you go with Jessica's answer. The main problem there is that order updates are expensive operations.

     Item Next Prev
       A   B     -
       B   C     A
       C   D     B
       ...
    

    If you change one item's position to get the new order A, C, B, D, you will have to update 4 rows.
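
    As a rough sketch of what that reorder costs, assuming a doubly linked items table with next_id / prev_id columns (all names here are assumptions):

        <?php
        // Sketch only: moving C between A and B (A, B, C, D -> A, C, B, D).
        // All four rows change, so the updates belong in one transaction.
        $pdo = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass');
        $pdo->beginTransaction();

        $stmt = $pdo->prepare(
            'UPDATE items SET next_id = :next, prev_id = :prev WHERE id = :id'
        );

        // id => [next, prev] after the move
        $newLinks = [
            'A' => ['C', null],  // A now points forward to C
            'C' => ['B', 'A'],   // C sits between A and B
            'B' => ['D', 'C'],   // B follows C
            'D' => [null, 'B'],  // D's prev changes from C to B
        ];

        foreach ($newLinks as $id => [$next, $prev]) {
            $stmt->execute([':next' => $next, ':prev' => $prev, ':id' => $id]);
        }

        $pdo->commit();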

  • 2021-01-30 11:42

    So you have two tasks:

    1. build sorted list of items (SELECTs with different ORDER BY)
    2. show details about each item (SELECT details from database with possible caching).

    What is the problem?

    PS: if the ordered list might be too big, you just need pager functionality. There are different possible implementations; e.g. you may wish to add "LIMIT 5" to the query and provide a "Show next 5" button. When this button is pressed, a condition like "WHERE price < 0.89 LIMIT 5" is added.
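
    A minimal sketch of that "Show next 5" idea with PDO; the offers table and its columns are assumptions:

        <?php
        // Keyset-style pager: instead of an OFFSET, remember the price of the
        // last row the user has already seen and continue from there.
        $pdo = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass');

        $lastPrice = 0.89;  // price of the last offer on the previous page

        $stmt = $pdo->prepare(
            'SELECT id, title, price
               FROM offers
              WHERE price < :last_price
              ORDER BY price DESC
              LIMIT 5'
        );
        $stmt->execute([':last_price' => $lastPrice]);
        $nextPage = $stmt->fetchAll(PDO::FETCH_ASSOC);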

  • 2021-01-30 11:44

    In general, I denormalize the data from the indexes. They may be stored in the same rows, but I almost always retrieve my result IDs, then make a separate trip for the data. This makes caching the data very simple. It's not so important in PHP where the latency is low and the bandwidth high, but such a strategy is very useful when you have a high latency, low bandwidth application, such as an AJAX website where much of the site is rendered in JavaScript.

    I always cache the lists of results, and the results themselves separately. If anything affects the results of a list query, the cache of the list results is refreshed. If anything affects the results themselves, those particular results are refreshed. This allows me to update either one without having to regenerate everything, resulting in effective caching.

    Since my lists of results rarely change, I generate all the lists at the same time. This may make the initial response slightly slower, but it simplifies cache refreshing (all the lists get stored in a single cache entry).

    Because I have the entire list cached, it's trivial to find neighbouring items without revisiting the database. With luck, the data for those items will also be cached. This is especially handy when sorting data in JavaScript. If I already have a copy cached on the client, I can re-sort instantly.
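
    A minimal sketch of that two-level cache, assuming memcached and table/key names of my own choosing (an offers table, "offer_list:..." and "offer:..." keys):

        <?php
        // Lists of IDs and the items themselves are cached under separate keys,
        // so either can be refreshed without regenerating the other.
        $cache = new Memcached();
        $cache->addServer('127.0.0.1', 11211);

        function getSortedIds(Memcached $cache, PDO $pdo, string $sort): array {
            // $sort must come from a fixed whitelist, never from user input.
            $key = "offer_list:$sort";
            $ids = $cache->get($key);
            if ($ids === false) {
                // The list query returns only IDs, never full rows.
                $ids = $pdo->query("SELECT id FROM offers ORDER BY $sort")
                           ->fetchAll(PDO::FETCH_COLUMN);
                $cache->set($key, $ids, 300);
            }
            return $ids;
        }

        function getOffer(Memcached $cache, PDO $pdo, int $id) {
            $key = "offer:$id";
            $row = $cache->get($key);
            if ($row === false) {
                $stmt = $pdo->prepare('SELECT * FROM offers WHERE id = ?');
                $stmt->execute([$id]);
                $row = $stmt->fetch(PDO::FETCH_ASSOC);
                $cache->set($key, $row, 300);
            }
            return $row;
        }

        // Neighbours come straight from the cached ID list:
        // $pos    = array_search($currentId, $ids);
        // $prevId = $ids[$pos - 1] ?? null;
        // $nextId = $ids[$pos + 1] ?? null;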

    To answer your questions specifically:

    • Yes, it's a fantastic idea to find out the neighbours ahead of time, or whatever information the client is likely to access next, especially if the cost is low now and the cost to recalculate is high. Then it's simply a trade off of extra pre-calculation and storage versus speed.
    • In terms of performance and simplicity, avoid tying together things that are logically different. Indexes and data are different things, are likely to change at different times (e.g. adding a new datum will affect the indexes, but not the existing data), and thus should be accessed separately. This may be slightly less efficient from a single-threaded standpoint, but every time you tie things together you lose caching effectiveness and asynchronicity (and asynchronicity is the key to scaling).
    • The term for getting data ahead of time is pre-fetching. Pre-fetching can happen at the time of access or in the background, but before the pre-fetched data is actually needed. Likewise with pre-calculation. It's a trade-off of cost now, storage cost, and cost to get when needed.
    • "Sorting cache" is an apt name.
    • I don't know.

    Also, when you cache things, cache them at the most generic level possible. Some things might be user-specific (such as results for a search query), while others might be user-agnostic, such as browsing a catalog. Both can benefit from caching. The catalog query might be frequent and save a little each time, while the search query may be expensive and save a lot a few times.

  • 2021-01-30 11:47

    I'm not sure whether I understood right, so if not, just tell me ;)

    Let's say that the givens are the query for the sorted list and the current offset in that list, i.e. we have a $query and an $n.

    A very obvious solution to minimize the queries would be to fetch all the data at once:

    list($prev, $current, $next) = DB::q($query . ' LIMIT ?i, 3', $n - 1)->fetchAll(PDO::FETCH_NUM);
    

    That statement fetches the previous, the current and the next elements from the database in the current sorting order and puts the associated information into the corresponding variables.
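
    For reference, the same trick in plain PDO (DB::q above is a custom DB wrapper) might look roughly like this; the offers table and price ordering are assumptions, and $n = 0 needs a small guard because a negative offset is invalid:

        <?php
        // Plain-PDO sketch of "fetch previous, current and next in one query".
        $pdo = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass');

        $n      = 5;               // zero-based position of the current item
        $offset = max(0, $n - 1);  // the first item has no predecessor

        $stmt = $pdo->prepare('SELECT * FROM offers ORDER BY price LIMIT :offset, 3');
        $stmt->bindValue(':offset', $offset, PDO::PARAM_INT);
        $stmt->execute();
        $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);

        // For $n > 0: $rows[0] is the previous, $rows[1] the current, $rows[2] the next.
        // For $n == 0 there is no previous row, so the current item is $rows[0].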

    But as this solution is too simple, I assume I misunderstood something.

  • 2021-01-30 11:53

    There are as many ways to do this as to skin the proverbial cat. So here are a couple of mine.

    If your original query is expensive, which you say it is, then create another table, possibly a MEMORY table, and populate it with the results of your expensive and seldom-run main query.

    This second table could then be queried on every page view, and sorting is as simple as setting the appropriate ORDER BY.

    Repopulate the second table from the first as required, keeping the data fresh while minimising use of the expensive query.
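
    A rough sketch of that idea for MySQL, with made-up table and column names (an offers source table feeding an offer_list MEMORY table); the rebuild would run from cron or whenever the source data changes:

        <?php
        // Rebuild the cheap summary table from the expensive main query.
        // Every page view then reads only offer_list.
        $pdo = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass');

        $pdo->exec('DROP TABLE IF EXISTS offer_list');
        $pdo->exec(
            'CREATE TABLE offer_list (
                 id    INT PRIMARY KEY,
                 title VARCHAR(255),
                 class VARCHAR(50),
                 price DECIMAL(10,2),
                 INDEX (price),
                 INDEX (class, price)
             ) ENGINE=MEMORY'
        );

        // The expensive, seldom-run main query feeds the cheap table.
        $pdo->exec(
            'INSERT INTO offer_list (id, title, class, price)
             SELECT o.id, o.title, o.class, o.price
               FROM offers o
               /* ... the expensive joins and filters go here ... */'
        );

        // Page views then simply do:
        // SELECT * FROM offer_list ORDER BY price LIMIT ...;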

    Alternatively, if you want to avoid even connecting to the DB, you could store all the data in a PHP array and keep it in memcached. This would be very fast and, provided your lists aren't too huge, resource efficient; the array can also be sorted easily.

    DC

  • 2021-01-30 11:58

    Here is an idea. You could offload the expensive operations to an update when the grocer inserts/updates new offers rather than when the end user selects the data to view. This may seem like a non-dynamic way to handle the sort data, but it may increase speed. And, as we know, there is always a trade off between performance and other coding factors.

    Create a table to hold next and previous for each offer and each sort option. (Alternatively, you could store this in the offer table if you will always have three sort options -- query speed is a good reason to denormalize your database)

    So you would have these columns:

    • Sort Type (Unsorted, Price, Class and Price Desc)
    • Offer ID
    • Prev ID
    • Next ID

    When the detail information for the offer detail page is queried from the database, the NextID and PrevID would be part of the results. So you would only need one query for each detail page.

    Each time an offer is inserted, updated or deleted, you would need to run a process which validates the integrity/accuracy of the sorttype table.
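
    A hypothetical shape for that table and the single detail-page query; all names here are assumptions based on the columns listed above, and the rebuild process on insert/update/delete is left out:

        <?php
        // Precomputed neighbours: one row per (sort type, offer).
        $pdo = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass');

        $pdo->exec(
            "CREATE TABLE IF NOT EXISTS offer_neighbours (
                 sort_type ENUM('unsorted', 'price', 'class_price_desc') NOT NULL,
                 offer_id  INT NOT NULL,
                 prev_id   INT NULL,
                 next_id   INT NULL,
                 PRIMARY KEY (sort_type, offer_id)
             )"
        );

        // One query per detail page: the offer plus its neighbours for the
        // sort order the visitor is currently browsing.
        $stmt = $pdo->prepare(
            'SELECT o.*, n.prev_id, n.next_id
               FROM offers o
               JOIN offer_neighbours n
                 ON n.offer_id = o.id AND n.sort_type = :sort
              WHERE o.id = :id'
        );
        $stmt->execute([':sort' => 'price', ':id' => 42]);
        $detail = $stmt->fetch(PDO::FETCH_ASSOC);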
