API pagination best practices

执念已碎 2020-11-28 17:12

I'd love some help handling a strange edge case with a paginated API I'm building.

Like many APIs, this one paginates large results. If you query /foos, you'…

11 answers
  • 2020-11-28 17:45

    I think your API is currently responding the way it should: page 1 returns the first 100 records in the overall order of objects you are maintaining. From your explanation, you are using some kind of ordering ID to define the order of your objects for pagination.

    Now, if you want page 2 to always start at 101 and end at 200, you must make the number of entries per page variable, since records are subject to deletion.

    You could do something like the following pseudocode:

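    PAGE_MAX = 100
    
    def get_page_results(page_no):
        # Page N always covers ordering IDs (N-1)*PAGE_MAX + 1 .. N*PAGE_MAX.
        # Deleted IDs leave gaps, so a page can hold fewer than PAGE_MAX records.
        start = (page_no - 1) * PAGE_MAX + 1
        end = page_no * PAGE_MAX
        # fetch_results_by_id_between is your own data-access function,
        # e.g. SELECT ... WHERE id BETWEEN start AND end ORDER BY id
        return fetch_results_by_id_between(start, end)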
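
    A self-contained version of the same idea, for illustration only: it assumes records live in an in-memory dict keyed by a sequential ordering ID, and `records` and `fetch_results_by_id_between` are hypothetical stand-ins for your table and data-access call. Page 2 always covers IDs 101 to 200, so a deletion shrinks one page instead of shifting every later page.

```python
from typing import Dict, List

PAGE_MAX = 100

# Hypothetical in-memory store: ordering ID -> record (stands in for a DB table).
records: Dict[int, dict] = {i: {"id": i, "name": f"foo{i}"} for i in range(1, 301)}


def fetch_results_by_id_between(start: int, end: int) -> List[dict]:
    """Return the records that still exist with start <= ID <= end, in ID order."""
    return [records[i] for i in range(start, end + 1) if i in records]


def get_page_results(page_no: int) -> List[dict]:
    """Page N always covers the fixed ID window (N-1)*PAGE_MAX + 1 .. N*PAGE_MAX."""
    if page_no < 1:
        raise ValueError("page_no must be >= 1")
    start = (page_no - 1) * PAGE_MAX + 1
    end = page_no * PAGE_MAX
    return fetch_results_by_id_between(start, end)


if __name__ == "__main__":
    del records[42]                       # simulate a deletion on page 1
    print(len(get_page_results(1)))       # 99: page 1 shrinks
    print(get_page_results(2)[0]["id"])   # 101: page 2 starts where it always did
```

    The trade-off is that pages can come back short (or even empty after heavy deletion), so clients should rely on the page number rather than on a fixed item count.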
    
  • 2020-11-28 17:48

    There are two possible approaches, depending on your server-side logic.

    Approach 1: When the server is not smart enough to track object state.

    You could send the unique IDs of all cached records to the server, for example ["id1","id2","id3","id4","id5","id6","id7","id8","id9","id10"], along with a boolean parameter indicating whether you are requesting new records (pull to refresh) or old records (load more).

    Your server should be responsible for returning the requested records (older records for load more, or newer records for pull to refresh) as well as the IDs of any records from ["id1","id2","id3","id4","id5","id6","id7","id8","id9","id10"] that have been deleted.

    Example: if you are requesting load more, your request could look something like this:

    {
            "isRefresh" : false,
            "cached" : ["id1","id2","id3","id4","id5","id6","id7","id8","id9","id10"]
    }
    

    Now suppose you are requesting old records (load more), someone has updated the "id2" record, and the "id5" and "id8" records have been deleted from the server. Your server response could then look something like this:

    {
        "records" : [
            {"id" : "id2", "more_key" : "updated_value"},
            {"id" : "id11", "more_key" : "more_value"},
            {"id" : "id12", "more_key" : "more_value"},
            {"id" : "id13", "more_key" : "more_value"},
            {"id" : "id14", "more_key" : "more_value"},
            {"id" : "id15", "more_key" : "more_value"},
            {"id" : "id16", "more_key" : "more_value"},
            {"id" : "id17", "more_key" : "more_value"},
            {"id" : "id18", "more_key" : "more_value"},
            {"id" : "id19", "more_key" : "more_value"},
            {"id" : "id20", "more_key" : "more_value"}
        ],
        "deleted" : ["id5", "id8"]
    }
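
    A minimal server-side sketch of Approach 1, for illustration only. `handle_sync` and `PAGE_SIZE` are hypothetical, the `store` dict's iteration order is assumed to be the feed order, and the request/response keys follow the JSON above. Because the client sends only IDs, the server cannot tell which cached copies are stale, so this sketch reports deletions and returns the next page but does not detect updates like the "id2" change; that needs a per-record timestamp, which is what Approach 2 adds.

```python
from typing import Dict, List

PAGE_SIZE = 10  # hypothetical page size, matching the 10-record example above


def handle_sync(request: dict, store: Dict[str, dict]) -> dict:
    """Approach 1: the client sends every cached ID and the server diffs them.

    `store` maps record ID -> record; its iteration order is the feed order
    (an assumption of this sketch). Updates are NOT detected.
    """
    cached: List[str] = request["cached"]
    order = list(store)
    pos = {rid: i for i, rid in enumerate(order)}

    deleted = [cid for cid in cached if cid not in pos]
    surviving = [pos[cid] for cid in cached if cid in pos]

    if not surviving:
        page = order[:PAGE_SIZE]                       # nothing to anchor on: first page
    elif request["isRefresh"]:
        first = min(surviving)                         # pull to refresh: records above the cache
        page = order[max(0, first - PAGE_SIZE):first]
    else:
        last = max(surviving)                          # load more: records below the cache
        page = order[last + 1:last + 1 + PAGE_SIZE]

    return {"records": [store[rid] for rid in page], "deleted": deleted}


if __name__ == "__main__":
    store = {f"id{i}": {"id": f"id{i}", "more_key": "more_value"} for i in range(1, 31)}
    del store["id5"], store["id8"]                     # deleted on the server
    resp = handle_sync({"isRefresh": False, "cached": [f"id{i}" for i in range(1, 11)]}, store)
    print(resp["records"][0]["id"], resp["records"][-1]["id"])  # id11 id20
    print(resp["deleted"])                                      # ['id5', 'id8']
```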
    

    But if you have a lot of locally cached records, say 500, your request will become too long:

    {
            "isRefresh" : false,
            "cached" : ["id1","id2","id3","id4","id5","id6","id7","id8","id9","id10",………,"id500"]//Too long request
    }
    

    Approach 2: When the server is smart enough to track object state by date.

    You could send the IDs of the first and last cached records along with the epoch time of the previous request. This way your request stays small even if you have a large number of cached records.

    Example: if you are requesting load more, your request could look something like this:

    {
            "isRefresh" : false,
            "firstId" : "id1",
            "lastId" : "id10",
            "last_request_time" : 1421748005
    }
    

    Your server is then responsible for returning the IDs of records deleted after last_request_time and the records between "id1" and "id10" that were updated after last_request_time, along with the next page of records (here "id11" to "id20"):

    {
        "records" : [
            {"id" : "id2", "more_key" : "updated_value"},
            {"id" : "id11", "more_key" : "more_value"},
            {"id" : "id12", "more_key" : "more_value"},
            {"id" : "id13", "more_key" : "more_value"},
            {"id" : "id14", "more_key" : "more_value"},
            {"id" : "id15", "more_key" : "more_value"},
            {"id" : "id16", "more_key" : "more_value"},
            {"id" : "id17", "more_key" : "more_value"},
            {"id" : "id18", "more_key" : "more_value"},
            {"id" : "id19", "more_key" : "more_value"},
            {"id" : "id20", "more_key" : "more_value"}
        ],
        "deleted" : ["id5", "id8"]
    }
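
    A sketch of the Approach 2 server logic under stated assumptions: each record carries a numeric ordering key `seq` and an `updated_at` epoch timestamp, and deletions are kept as tombstones with a `deleted_at` time (without some record of deletions, the server cannot report them after the fact). `Record`, `Tombstone`, and `handle_sync_v2` are hypothetical names; the request keys match the JSON above.

```python
from dataclasses import dataclass
from typing import Dict

PAGE_SIZE = 10  # hypothetical page size, matching the 10-record example


@dataclass
class Record:
    id: str
    seq: int                      # ordering key: lower seq is listed first
    updated_at: int               # epoch seconds of the last change
    more_key: str = "more_value"


@dataclass
class Tombstone:
    seq: int                      # ordering key the record had before deletion
    deleted_at: int               # epoch seconds of the deletion


def handle_sync_v2(request: dict, live: Dict[str, Record], tombstones: Dict[str, Tombstone]) -> dict:
    """Approach 2: the client sends firstId, lastId and last_request_time only."""
    since = request["last_request_time"]

    def seq_of(rid: str) -> int:
        # The anchors may themselves have been deleted, so fall back to tombstones.
        return live[rid].seq if rid in live else tombstones[rid].seq

    lo, hi = seq_of(request["firstId"]), seq_of(request["lastId"])
    ordered = sorted(live.values(), key=lambda r: r.seq)

    # Records inside the client's cached window that changed after its last request.
    changed = [r for r in ordered if lo <= r.seq <= hi and r.updated_at > since]
    # Records removed from that window after its last request, in feed order.
    gone = [(t.seq, rid) for rid, t in tombstones.items()
            if lo <= t.seq <= hi and t.deleted_at > since]
    deleted = [rid for _, rid in sorted(gone)]

    if request["isRefresh"]:
        extra = [r for r in ordered if r.seq < lo][-PAGE_SIZE:]  # pull to refresh: records above the window
    else:
        extra = [r for r in ordered if r.seq > hi][:PAGE_SIZE]   # load more: the next page below it

    return {"records": [{"id": r.id, "more_key": r.more_key} for r in changed + extra],
            "deleted": deleted}


if __name__ == "__main__":
    live = {f"id{i}": Record(f"id{i}", i, 1421748000) for i in range(1, 31)}
    live["id2"] = Record("id2", 2, 1421748100, "updated_value")   # edited after the last request
    tombstones = {}
    for rid in ("id5", "id8"):                                     # deleted after the last request
        tombstones[rid] = Tombstone(live.pop(rid).seq, 1421748050)
    request = {"isRefresh": False, "firstId": "id1", "lastId": "id10",
               "last_request_time": 1421748005}
    response = handle_sync_v2(request, live, tombstones)
    print([r["id"] for r in response["records"]])
    print(response["deleted"])  # ['id5', 'id8']
```

    With the sample data in the `__main__` block, the response has the same shape as the example above: "id2" followed by "id11" to "id20" in records, and ["id5", "id8"] in deleted.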
    

    [Figure: Pull to refresh]

    [Figure: Load more]

  • 2020-11-28 17:56

    I've thought long and hard about this and finally ended up with the solution described below. It's a big step up in complexity, but if you make this step, you'll end up with what you are really after: deterministic results for future requests.

    Your example of an item being deleted is only the tip of the iceberg. What if you are filtering by color=blue but someone changes item colors between requests? Reliably fetching all items page by page is impossible... unless... we implement revision history.

    I've implemented it and it's actually less difficult than I expected. Here's what I did:

    • I created a single table, changelogs, with an auto-increment ID column.
    • My entities have an id field, but it is not the primary key.
    • The entities have a changeId field, which is both the primary key and a foreign key to changelogs.
    • Whenever a user creates, updates, or deletes a record, the system inserts a new row into changelogs, grabs its id, and assigns it to a new version of the entity, which it then inserts into the DB.
    • My queries select the maximum changeId (grouped by id) and self-join on it to get the most recent version of every record.
    • Filters are applied to the most recent records.
    • A state field keeps track of whether an item is deleted.
    • The max changeId is returned to the client and added as a query parameter in subsequent requests.
    • Because changes only ever create new rows, every changeId represents a unique snapshot of the underlying data at the moment the change was created.
    • This means you can cache the results of requests that include a changeId parameter forever. The results never expire because they never change.
    • This also opens up exciting features such as rollback/revert, syncing client caches, and anything else that benefits from change history.
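
    The scheme in the list above can be sketched with SQLite from Python's standard library. The table and column names (`changelogs`, `items`, `changeId`, `color`, `state`) and the helpers `connect`, `write`, and `query` are assumptions modeled on the bullets, not the author's actual schema. Passing `as_of` pins a query to one snapshot, so repeated page requests with the same value see the same rows even after later edits.

```python
import sqlite3
from typing import List, Optional, Tuple

SCHEMA = """
CREATE TABLE changelogs (id INTEGER PRIMARY KEY AUTOINCREMENT);
CREATE TABLE items (
    changeId INTEGER PRIMARY KEY REFERENCES changelogs(id),  -- one row per version
    id       INTEGER NOT NULL,                                -- entity ID shared by all versions
    color    TEXT,
    state    TEXT NOT NULL                                    -- 'active' or 'deleted'
);
"""


def connect() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def write(db: sqlite3.Connection, item_id: int, color: Optional[str], state: str = "active") -> int:
    """Create, update or delete an item by appending a new version; rows are never modified."""
    change_id = db.execute("INSERT INTO changelogs DEFAULT VALUES").lastrowid
    db.execute("INSERT INTO items (changeId, id, color, state) VALUES (?, ?, ?, ?)",
               (change_id, item_id, color, state))
    return change_id


def query(db: sqlite3.Connection, color: str, as_of: Optional[int] = None,
          limit: int = 100, offset: int = 0) -> Tuple[int, List[Tuple[int, str]]]:
    """Page through the latest version of each item as of snapshot `as_of`."""
    if as_of is None:  # first request: pin the newest snapshot and hand it back
        as_of = db.execute("SELECT COALESCE(MAX(id), 0) FROM changelogs").fetchone()[0]
    rows = db.execute("""
        SELECT i.id, i.color
        FROM items AS i
        JOIN (SELECT id, MAX(changeId) AS maxChange
              FROM items WHERE changeId <= ? GROUP BY id) AS latest
          ON i.id = latest.id AND i.changeId = latest.maxChange
        WHERE i.state = 'active' AND i.color = ?   -- filter the most recent versions only
        ORDER BY i.id
        LIMIT ? OFFSET ?
    """, (as_of, color, limit, offset)).fetchall()
    return as_of, rows


if __name__ == "__main__":
    db = connect()
    for item_id in (1, 2, 3):
        write(db, item_id, "blue")
    snapshot, page1 = query(db, "blue", limit=2)
    print(snapshot, page1)                  # 3 [(1, 'blue'), (2, 'blue')]
    write(db, 2, "red")                     # someone recolors item 2 between requests
    print(query(db, "blue", as_of=snapshot, limit=2, offset=2)[1])  # [(3, 'blue')]
    print(query(db, "blue", limit=2)[1])    # [(1, 'blue'), (3, 'blue')]
```

    In the demo, a live query for page 2 (offset 2) after the recolor would return nothing, so item 3 would be skipped; pinning `as_of` avoids that. Since rows with `changeId <= as_of` are never modified, the result for a given filter, `as_of`, and offset stays fixed, which is what makes the cache-forever point in the list hold.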
  • 2020-11-28 17:58

    Pagination is generally a "user" operation: to avoid overloading both computers and the human brain, you return a subset. However, rather than worrying that we don't get the whole list, it may be better to ask: does it matter?

    If an accurate live scrolling view is needed, REST APIs, which are request/response in nature, are not well suited to the task. Instead, consider WebSockets or HTML5 Server-Sent Events to notify your front end of changes.
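
    If you go the Server-Sent Events route, each notification is a small text frame of `field: value` lines terminated by a blank line. The helper below is a hypothetical sketch that formats one change notification; the `change` event name and the payload shape are assumptions, and serving the frames with `Content-Type: text/event-stream` is left to your web framework.

```python
import json
from typing import Optional


def sse_frame(data: dict, event: Optional[str] = None, event_id: Optional[str] = None) -> str:
    """Serialize one Server-Sent Events message."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")      # lets a reconnecting client resume via Last-Event-ID
    if event is not None:
        lines.append(f"event: {event}")      # dispatched to addEventListener(event, ...) in the browser
    # json.dumps escapes newlines inside strings, so a single data: line is enough.
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"         # a blank line terminates the message


if __name__ == "__main__":
    print(repr(sse_frame({"id": "id5", "op": "deleted"}, event="change", event_id="42")))
```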

    Now, if you need a snapshot of the data, I would just provide an API call that returns all the data in one request with no pagination. Mind you, if you have a large data set, you would need to stream the output rather than load it all into memory first.
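
    A minimal sketch of streaming the whole result set: a generator pulls rows from a database cursor one at a time and yields JSON text chunks, so the full array is never built in memory. The in-memory SQLite table `foos` and the function name `stream_all_as_json` are hypothetical stand-ins. Frameworks such as Flask (`Response(generator)`) and Django (`StreamingHttpResponse`) can send a generator like this as a streaming response body.

```python
import json
import sqlite3
from typing import Iterator


def stream_all_as_json(db: sqlite3.Connection) -> Iterator[str]:
    """Yield a JSON array of every row without materializing the full result."""
    cursor = db.execute("SELECT id, name FROM foos ORDER BY id")
    yield "["
    first = True
    for row_id, name in cursor:              # the cursor steps through rows incrementally
        if not first:
            yield ","
        yield json.dumps({"id": row_id, "name": name})
        first = False
    yield "]"


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE foos (id INTEGER PRIMARY KEY, name TEXT)")
    db.executemany("INSERT INTO foos VALUES (?, ?)", [(i, f"foo{i}") for i in range(1, 4)])
    print("".join(stream_all_as_json(db)))
```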

    In my case, I designate certain API calls as allowed to return the complete data (primarily reference table data). You can also secure these APIs so they won't harm your system.

  • 2020-11-28 17:58

    Another option for pagination in RESTful APIs is to use the Link header, defined in RFC 8288 (originally RFC 5988). For example, GitHub uses it as follows:

    Link: <https://api.github.com/user/repos?page=3&per_page=100>; rel="next",
      <https://api.github.com/user/repos?page=50&per_page=100>; rel="last"
    

    The possible values for rel are first, last, next, and prev. However, the Link header alone cannot convey total_count (the total number of elements).
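
    A sketch of generating such a header for page-number pagination. `page_url` and `build_link_header` are hypothetical helpers; they assume the server knows the total count (needed to compute the `last` page) and reuse the `page`/`per_page` parameters and `rel` names from the GitHub example.

```python
from math import ceil
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def page_url(base_url: str, page: int, per_page: int) -> str:
    """Return base_url with page/per_page set, keeping any other query parameters."""
    parts = urlsplit(base_url)
    query = dict(parse_qsl(parts.query))
    query.update(page=str(page), per_page=str(per_page))
    return urlunsplit(parts._replace(query=urlencode(query)))


def build_link_header(base_url: str, page: int, per_page: int, total: int) -> str:
    """Build a Link header value with first/prev/next/last relations where they apply."""
    last = max(1, ceil(total / per_page))
    links = []
    if page > 1:
        links.append((1, "first"))
        links.append((page - 1, "prev"))
    if page < last:
        links.append((page + 1, "next"))
        links.append((last, "last"))
    return ", ".join(f'<{page_url(base_url, p, per_page)}>; rel="{rel}"' for p, rel in links)


if __name__ == "__main__":
    print(build_link_header("https://api.github.com/user/repos", 2, 100, 5000))
```

    On the first and last pages the header simply omits the relations that don't apply, which is how clients can detect the ends of the collection without a total count.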
