How to fetch more than 1000?

Asked by 囚心锁ツ on 2020-11-28 04:13

How can I fetch more than 1,000 records from the datastore and put them all in a single list to pass to Django?

16 Answers
  • 2020-11-28 04:21
    entities = []
    for entity in Entity.all():  # iterating fetches results in successive small batches
        entities.append(entity)
    

    Simple as that. Note that an RPC is issued for each small batch of entities, which is much slower than fetching everything in one chunk. So if you're concerned about performance, do the following:

    If you have fewer than 1M items:

    entities = Entity.all().fetch(999999)
    

    Otherwise, use a cursor, as sketched below.

    It should also be noted that:

    Entity.all().fetch(Entity.all().count())
    

    returns at most 1,000 results (count() itself was capped at 1,000) and should not be used.
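
    Since cursors do the heavy lifting for larger datasets, here is a minimal sketch of cursor-based paging with the old google.appengine.ext.db API (Entity stands in for your model class; the batch size is arbitrary):

    from google.appengine.ext import db

    def fetch_all(query, batch_size=1000):
        # Page through the query with cursors instead of one huge fetch.
        results = []
        batch = query.fetch(batch_size)
        while batch:
            results.extend(batch)
            # query.cursor() marks the position just past the last result;
            # with_cursor() restarts the same query from that position.
            batch = query.with_cursor(query.cursor()).fetch(batch_size)
        return results

    entities = fetch_all(Entity.all())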

  • You can't.

    Part of the FAQ states that there is no way to access beyond row 1000 of a query; increasing the OFFSET just shortens the result set,

    i.e.: OFFSET 999 --> 1 result comes back.

    From Wikipedia:

    App Engine limits the maximum rows returned from an entity get to 1000 rows per Datastore call. Most web database applications use paging and caching, and hence do not require this much data at once, so this is a non-issue in most scenarios.[citation needed] If an application needs more than 1,000 records per operation, it can use its own client-side software or an Ajax page to perform an operation on an unlimited number of rows.

    From http://code.google.com/appengine/docs/whatisgoogleappengine.html

    Another example of a service limit is the number of results returned by a query. A query can return at most 1,000 results. Queries that would return more results only return the maximum. In this case, a request that performs such a query isn't likely to return a response before the timeout, but the limit is in place to conserve resources on the datastore.

    From http://code.google.com/appengine/docs/datastore/gqlreference.html

    Note: A LIMIT clause has a maximum of 1000. If a limit larger than the maximum is specified, the maximum is used. This same maximum applies to the fetch() method of the GqlQuery class.

    Note: Like the offset parameter for the fetch() method, an OFFSET in a GQL query string does not reduce the number of entities fetched from the datastore. It only affects which results are returned by the fetch() method. A query with an offset has performance characteristics that correspond linearly with the offset size.

    From http://code.google.com/appengine/docs/datastore/queryclass.html

    The limit and offset arguments control how many results are fetched from the datastore, and how many are returned by the fetch() method:

    • The datastore fetches offset + limit results to the application. The first offset results are not skipped by the datastore itself.

    • The fetch() method skips the first offset results, then returns the rest (limit results).

    • The query has performance characteristics that correspond linearly with the offset amount plus the limit.

    What this means is:

    If you have a single query, there is no way to request anything outside the range 0-1000.

    Increasing the offset just raises the lower bound of that window, so

    LIMIT 1000  OFFSET 0    
    

    Will return 1000 rows,

    and

    LIMIT 1000 OFFSET 1000 
    

    Will return 0 rows, thus making it impossible to fetch 2000 results with a single query, either manually or using the API.

    The only plausible exception is to create a numeric index on the table, i.e.:

     SELECT * FROM Foo  WHERE ID > 0 AND ID < 1000 
    
     SELECT * FROM Foo WHERE ID >= 1000 AND ID < 2000
    

    If your data or query can't have this hard-coded 'ID' identifier, then you are out of luck. A sketch of this approach follows.
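
    As a rough sketch of that approach, assuming each Foo entity carries a densely assigned, indexed numeric property (here called seq, a hypothetical name):

    from google.appengine.ext import db

    def fetch_range(lo, hi):
        # Each range query stays within the old 1000-result window.
        q = db.GqlQuery('SELECT * FROM Foo WHERE seq >= :1 AND seq < :2', lo, hi)
        return q.fetch(hi - lo)

    first_thousand = fetch_range(0, 1000)
    second_thousand = fetch_range(1000, 2000)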

  • 2020-11-28 04:23

    Every time this comes up as a limitation, I always wonder: why do you need more than 1,000 results? Did you know that Google itself doesn't serve up more than 1,000 results? Try this search: http://www.google.ca/search?hl=en&client=firefox-a&rls=org.mozilla:en-US:official&hs=qhu&q=1000+results&start=1000&sa=N I didn't know that until recently, because I'd never taken the time to click into the 100th page of search results for a query.

    If you're actually returning more than 1,000 results back to the user, then I think there's a bigger problem at hand than the fact that the data store won't let you do it.

    One possible (legitimate) reason to need that many results is if you are doing a large operation on the data and presenting a summary (for example, the average of all the data). The solution to this problem (which is discussed in the Google I/O talk) is to calculate the summary data on the fly, as it comes in, and save it, as sketched below.
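
    A minimal sketch of that pattern, with hypothetical names (StatSummary, record_value), updating the aggregate transactionally as each value arrives:

    from google.appengine.ext import db

    class StatSummary(db.Model):
        count = db.IntegerProperty(default=0)
        total = db.FloatProperty(default=0.0)

    def record_value(value):
        # Update the running aggregate inside a transaction on each write,
        # so no query ever needs to touch more than one summary entity.
        def txn():
            s = StatSummary.get_by_key_name('global') or StatSummary(key_name='global')
            s.count += 1
            s.total += value
            s.put()
        db.run_in_transaction(txn)

    def average():
        s = StatSummary.get_by_key_name('global')
        return s.total / s.count if s and s.count else 0.0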

  • 2020-11-28 04:24

    This is close to the solution provided by Gabriel, but doesn't fetch the results; it just counts them:

    import logging

    count = 0
    q = YourEntityClass.all().filter('myval =', 2)
    countBatch = q.count()  # count() returns at most 1000 per call
    while countBatch > 0:
        count += countBatch
        # resume counting from where the previous batch left off
        countBatch = q.with_cursor(q.cursor()).count()

    logging.info('Count=%d' % count)
    

    Works perfectly for my queries, and fast too (1.1 seconds to count 67,000 entities).

    Note that the query must not use an inequality filter or a set membership test, or the cursor will not work and you'll get this exception:

    AssertionError: No cursor available for a MultiQuery (queries using "IN" or "!=" operators)

  • 2020-11-28 04:25

    The 1000-record limit is a hard limit in Google App Engine.

    This presentation http://sites.google.com/site/io/building-scalable-web-applications-with-google-app-engine explains how to efficiently page through data using App Engine.

    (Basically by using a numeric id as key and specifying a WHERE clause on the id; a sketch of that follows.)
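
    A minimal sketch of that technique, paging on __key__ rather than a custom numeric id (the generator name iterate_all is hypothetical):

    from google.appengine.ext import db

    def iterate_all(model_class, batch_size=1000):
        # Order by key and restart each batch just past the last key seen,
        # so every individual query stays under the 1000-result cap.
        q = model_class.all().order('__key__')
        batch = q.fetch(batch_size)
        while batch:
            for entity in batch:
                yield entity
            last_key = batch[-1].key()
            q = model_class.all().filter('__key__ >', last_key).order('__key__')
            batch = q.fetch(batch_size)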

  • 2020-11-28 04:29

    Just for the record: the fetch limit of 1,000 entries is now gone:

    http://googleappengine.blogspot.com/2010/02/app-engine-sdk-131-including-major.html

    Quotation:

    No more 1000 result limit - That's right: with addition of Cursors and the culmination of many smaller Datastore stability and performance improvements over the last few months, we're now confident enough to remove the maximum result limit altogether. Whether you're doing a fetch, iterating, or using a Cursor, there's no limits on the number of results.
