efficient searching using appengine datastore ancestor paths

爷,独闯天下 提交于 2019-12-13 15:00:42

问题


We have a mulch-tenancy and I need to search and fetch in a huge appengine datastore based on an indexed attribute range and client id. Does usage of Ancestor Paths make it efficient ? Alternatively Same can be done using an additional filter

e.g. to get the top 100 salaries via objectify

Key<Clients> clientIdKey = Key.create(Clients.class, 500)
ofy().load().type(Salaries.class).ancestor(clientIdKey).order("-salary").limit(100).list()

Alternatively just

ofy().load().type(Salaries.class).filter("clientId = ", 500 ).order("-salary").limit(100).list()

My assumption is that in the first case all entities belonging to any other Client will be ignored but in later case it will be full scan which will more expensive. Is this assumption right ?

Also does the index "salary" stored globally or it is partitioned according to the ancestor so that the index update happens only within same ancestor ? This will reduce time taken update the index and will be a good solution when we are never going to query across different clients.


回答1:


The first thing I need to point out is that the datastore does not do table scans. With a couple exceptions (most notably zig-zag merges), GAE queries only follow indexes - so usually these kinds of questions boil down to "which index is more efficient to maintain?"

Let's start by talking about the second case (note that I've singularized Salary, which I assume is your intention):

ofy().load().type(Salary.class).filter("clientId = ", 500 ).order("-salary").limit(100).list()

This requires a multi-property index on Salary { clientId, salary } DESC. GAE will navigate the index to the start of Salary/clientId/500 and then read off each index record one at a time. It will do this on the index table in an arbitrary datacenter - and since these index tables are replicated asynchronously, you get an eventually consistent result.

In order for an entity to participate in a multi-property index, each of the individual single properties must be indexed themselves. If Salary had no other indexed properties, writing a single Salary would cost:

  • 1 write for the entity operation
  • 2 writes for the clientId index (asc and desc)
  • 2 writes for the salary index (asc and desc)
  • 1 write for the multi-property index { clientId, salary } DESC

Now let's look at the first case:

ofy().load().type(Salary.class).ancestor(clientIdKey).order("-salary").limit(100).list()

This requires a different multi-property index in your datastore-indexes.xml. This time you need an index on Salary { ancestor, salary } DESC. In addition, the default behavior of GAE is to read from a quorum of datacenters to make this a strongly consistent operation. This should be somewhat slower (although no more expensive) than the other method, however, you can explicitly specify eventual consistency to get the same "any datacenter" behavior: ofy().consistency(Consistency.EVENTUAL).load()... The nice thing here is that you have the option of strong consistency.

Another bonus of the ancestor approach is that you don't need to maintain a single-property index on clientId. Here's what happens when you write this Salary (assuming no other indexed fields):

  • 1 write for the entity operation
  • 2 writes for the salary index (asc and desc)
  • 1 write for the multi-property index { ancestor, salary } DESC

This can make your system considerably cheaper. The biggest cost of multi-property indexes is often the cost of all the (otherwise irrelevant) bidirectional single-property indexes you must maintain simply as a flag to GAE.


Regarding your last question, it might help to explain what GAE index tables look like. There are three BigTable tables for indexes, shared across all applications. The first two are single-property index tables (one for ascending, one for descending). Their contents look something very roughly like this:

{appId}/{entityKind}/{propertyName}/{propertyValue}/{entityKey}

By doing range scans (one of the primitive operations of BigTable), you can determine which entities match your query. This is also why keys-only queries are fast/cheap; you can return the key immediately without doing the subsequent lookup.

The multi-property index table looks (again, this is not exact) like this:

{appId}/{entityKind}/{prop1name}/{prop1value}/{prop2name}/{prop2value}/.../{entityKey}\

It might be easier to visualize with some values for a multi-property index on Salary { clientId, salary } DESC:

yourapp/Salary/clientId/500/salary/99000/aghzfnZvb3N0MHILCxIFRXZlbnQYAQw
yourapp/Salary/clientId/500/salary/98000/aghttydiisgAJJ3JGS0ij44JFAjasdw

Again, you can see how by performing range scans, GAE can find the entities which match your queries.

I hope this helps clear things up.



来源:https://stackoverflow.com/questions/26338283/efficient-searching-using-appengine-datastore-ancestor-paths

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!