问题
Consider the classic example of blog data modelling, where we have a Blog
entity with many properties, and we want to list the latest blogs in a page.
It makes sense to denormalize the BlogPost
entity into a BlogPostSummary
entity which will be shown in the list view, avoiding fetching and deserializing many unwanted properties.
class BlogPost(db.Model):
title = db.StringProperty()
content = db.TextProperty()
created = db.DateProperty()
...
class BlogPostSummary(db.Model):
title = db.StringProperty()
content_excerpt = db.TextProperty()
The question is: which entity should hold the indexed properties? There are 3 options:
1. Index properties in both
- Pros:
- Easy query on both entities.
- Cons:
- Maintaining denormalized indexes is expensive.
2. Index properties in main entity only
- Pros:
- Indexing properties in the main entity is more safe, as the denormalized entity is treated as redundancy.
- Cons:
- Querying the list view will need a double roundtrip to datastore: One to key-only query for
BlogPost
entities, followed by a batch get forBlogPostSummary
.
- Querying the list view will need a double roundtrip to datastore: One to key-only query for
3. Index in denormalized entity only
- Pros:
- The list view can be easily built by a single query.
- Cons:
- The main entity cannot be queried by those properties anymore.
- The indexes occupy more space when the denormalized entity is a child of the main entity.
Which option would work better? Are there other options?
Would the double round trip to datastore in option 2 be a problem?
回答1:
This is an abstract question that does not have a "correct" answer. The choice of a data model depends on specific requirements of a project, including:
- usage patterns (how often you need access to different data)
- update patterns (e.g. separating frequently updated properties from stable properties in order to reduce write costs)
- average performance and extreme-case performance requirements (e.g an average blog may have 10 posts, a very popular blog may have 10,000 posts)
- ability to use memcache to reduce datastore trips and improve performance
- data complexity (i.e. how many different kids of entities depend on this particular entity kind)
- transaction requirements
- security and access roles considerations (e.g. not exposing private data by mistake)
By the way, there is another way to model data in the Datastore - using child entities. For example, blog posts may be child entities of a blog entity. This way you can retrieve all blog posts with a single query by providing a parent key - without storing post IDs or keys in the blog entity or blog ID/key in the post entities.
回答2:
If you want to avoid fetching and deserializing unwanted properties, then put all your entities into a single model and use Projection queries:
class BlogPost(db.Model):
title = db.StringProperty(indexed=True)
content = db.TextProperty(indexed=False)
content_excerpt = db.TextProperty(indexed=False)
created = db.DateProperty(indexed=True)
...
posts = BlogPost.query().fetch(projection=["title", "created"])
(Note that the projection fields must be indexed.)
来源:https://stackoverflow.com/questions/30928110/app-engine-datastore-denormalization-index-properties-in-the-main-entity-or-the