How to think in data stores instead of databases?

滥情空心 2020-11-28 17:32

As an example, Google App Engine uses Google Datastore, not a standard database, to store data. Does anybody have any tips for using Google Datastore instead of databases?

8 answers
  • 2020-11-28 17:48

    If you're used to thinking in terms of ORM-mapped entities, that's basically how an entity-based datastore like Google App Engine's works. For something like joins, you can look at reference properties. You don't really need to be concerned about whether it uses Bigtable or something else as the backend, since the backend is abstracted away by GQL and the Datastore API.

  • 2020-11-28 17:50

    I always chuckle when people come out with "it's not relational". I've written cellectr in Django, and a snippet of my model is below. As you'll see, I have leagues that are managed or coached by users. From a league I can get all the managers, and from a given user I can get the leagues she coaches or manages.

    Just because there's no specific foreign key support doesn't mean you can't have a database model with relationships.

    My two pence.


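    from google.appengine.ext import db

    # BaseModel is not shown here; presumably a db.Model subclass defined elsewhere in the project.
    class League(BaseModel):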
        name = db.StringProperty()    
        managers = db.ListProperty(db.Key) #all the users who can view/edit this league
        coaches = db.ListProperty(db.Key) #all the users who are able to view this league
    
        def get_managers(self):
            # Returns the UserPrefs entities themselves, not just the keys stored in managers
            return UserPrefs.get(self.managers)
    
        def get_coaches(self):
            # Returns the UserPrefs entities themselves, not just the keys stored in coaches
            return UserPrefs.get(self.coaches)
    
        def __str__(self):
            return self.name
    
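        # Need to delete all the associated games, teams and players.
        # leagues_players, leagues_games and leagues_teams are presumably
        # back-reference collections from models not shown in this snippet.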
        def delete(self):
            for player in self.leagues_players:
                player.delete()
            for game in self.leagues_games:
                game.delete()
            for team in self.leagues_teams:
                team.delete()            
            super(League, self).delete()
    
    class UserPrefs(db.Model):
        user = db.UserProperty()
        league_ref = db.ReferenceProperty(reference_class=League,
                                collection_name='users') #league the users are managing
    
        def __str__(self):
            return self.user.nickname()
    
        # many-to-many relationship, a user can coach many leagues, a league can be
        # coached by many users
        @property
        def managing(self):
            return League.gql('WHERE managers = :1', self.key())
    
        @property
        def coaching(self):
            return League.gql('WHERE coaches = :1', self.key())
    
        # remove all references to me when I'm deleted
        def delete(self):
            for league in self.managing:
                league.managers.remove(self.key())
                league.put()
            for league in self.coaching:
                league.coaches.remove(self.key())
                league.put()
            super(UserPrefs, self).delete()
    
  • 2020-11-28 17:53

    Coming from the database world, a data store to me is a giant table (hence the name "Bigtable"). Bigtable is a bad example, though, because it does a lot of things that a typical database does not, and yet it is still a database. Unless you know you need to build something like Google's Bigtable, you will probably be fine with a standard database. Google needs it because they handle enormous amounts of data across many systems, and no commercially available system can do the job exactly the way they need it done.

    (bigtable reference: http://en.wikipedia.org/wiki/BigTable)

  • 2020-11-28 17:56

    There are two main things to get used to about the App Engine datastore when compared to 'traditional' relational databases:

    • The datastore makes no distinction between inserts and updates. When you call put() on an entity, that entity gets stored to the datastore with its unique key, and anything that has that key gets overwritten. Basically, each entity kind in the datastore acts like an enormous map or sorted list.
    • Querying, as you alluded to, is much more limited. No joins, for a start.

    The key thing to realise - and the reason behind both these differences - is that Bigtable basically acts like an enormous ordered dictionary. Thus, a put operation just sets the value for a given key - regardless of any previous value for that key, and fetch operations are limited to fetching single keys or contiguous ranges of keys. More sophisticated queries are made possible with indexes, which are basically just tables of their own, allowing you to implement more complex queries as scans on contiguous ranges.
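    To make the "enormous ordered dictionary" picture concrete, here is a minimal in-memory sketch in plain Python. It is not the Datastore API: the `OrderedStore` class, its method names, and the key formats are all hypothetical, chosen only to illustrate overwrite-on-put, contiguous range scans, and an index stored as a second ordered table.

```python
import bisect


class OrderedStore:
    """Toy in-memory stand-in for an ordered key-value table (not the Datastore API)."""

    def __init__(self):
        self._keys = []    # keys kept in sorted order
        self._values = {}  # key -> stored value

    def put(self, key, value):
        # No insert/update distinction: an existing key is simply overwritten.
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        return self._values.get(key)

    def scan(self, start, end):
        # Return (key, value) pairs for the contiguous range start <= key < end.
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


entities = OrderedStore()
entities.put("user:1", {"name": "ann", "age": 31})
entities.put("user:2", {"name": "bob", "age": 25})
entities.put("user:1", {"name": "ann", "age": 32})  # overwrites, no separate UPDATE

# An index is just another ordered table whose keys embed the indexed value.
# ";" is the character after ":", so this range covers every "user:" key.
age_index = OrderedStore()
for key, entity in entities.scan("user:", "user;"):
    age_index.put((entity["age"], key), key)

# "age >= 30 AND age < 40" becomes a scan over one contiguous key range.
# (30,) sorts before any (30, key) tuple, so it works as a range boundary.
matches = [entity_key for _, entity_key in age_index.scan((30,), (40,))]
```

    In this sketch the inequality filter never touches the entity table until it has the matching keys, which is roughly why queries that cannot be expressed as one contiguous index scan are not offered at all.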

    Once you've absorbed that, you have the basic knowledge needed to understand the capabilities and limitations of the datastore. Restrictions that may have seemed arbitrary probably make more sense.

    The key thing here is that although these are restrictions over what you can do in a relational database, these same restrictions are what make it practical to scale up to the sort of magnitude that Bigtable is designed to handle. You simply can't execute the sort of query that looks good on paper but is atrociously slow in an SQL database.

    In terms of how to change how you represent data, the most important thing is precalculation. Instead of doing joins at query time, precalculate data and store it in the datastore wherever possible. If you want to pick a random record, generate a random number and store it with each record. There's a whole cookbook of this sort of tips and tricks here.
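    The random-record trick can be sketched as follows. This is plain Python with a hypothetical `RecordStore` class standing in for the datastore and its index on the stored random number; none of these names come from App Engine.

```python
import bisect
import random


class RecordStore:
    """Toy in-memory store; each record carries a precalculated random number."""

    def __init__(self, rng=None):
        self._rng = rng or random.Random()
        self._index = []   # sorted list of (rand, record_id): the "index" on rand
        self._records = {}

    def put(self, record_id, data):
        # Precalculate at write time: store a random number alongside the record.
        # Assumes each id is written once; an overwrite would also need to
        # remove the old index entry.
        rand = self._rng.random()
        self._records[record_id] = dict(data, rand=rand)
        bisect.insort(self._index, (rand, record_id))

    def pick_random(self):
        # Query time: find the first record with rand >= probe, wrapping
        # around to the smallest rand if nothing lies above the probe.
        if not self._index:
            return None
        probe = self._rng.random()
        pos = bisect.bisect_left(self._index, (probe,))
        _, record_id = self._index[pos % len(self._index)]
        return self._records[record_id]


store = RecordStore(random.Random(0))
for i in range(5):
    store.put("rec%d" % i, {"n": i})
picked = store.pick_random()
```

    The pick is only approximately uniform: a record whose stored number follows a large gap is chosen more often than one that follows a small gap. The point is that the work moves to write time, and the read becomes a single ordered lookup.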

  • 2020-11-28 17:57

    The way I look at the datastore, a kind identifies a table, so to speak, and an entity is an individual row within that table. If Google were to take away kinds, then it would be just one big table with no structure, and you could dump whatever you want into an entity. In other words, if entities were not tied to a kind, an entity could have pretty much any structure and everything would be stored in one location (rather like a big file with no overall structure, where each line has a structure of its own).
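    As a rough illustration of that view, the sketch below models one flat table whose keys carry the kind. It is plain Python, not the Datastore API; `table`, `put` and `entities_of_kind` are hypothetical names, and the kind names are borrowed from the model in the earlier answer.

```python
# Toy model: one flat table keyed by (kind, id); entities are free-form dicts.
table = {}


def put(kind, entity_id, **properties):
    # The kind is only a label in the key; it does not enforce any schema.
    table[(kind, entity_id)] = properties


def entities_of_kind(kind):
    # "Selecting from a table" is just filtering the flat table by kind.
    return {eid: props for (k, eid), props in table.items() if k == kind}


put("League", 1, name="Sunday League")
put("League", 2, name="Juniors", city="Leeds")  # same kind, different properties
put("UserPrefs", 1, nickname="ann")             # same id, different kind

leagues = entities_of_kind("League")
```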

    Now, back to the original comment: Google Datastore and Bigtable are two different things, so do not confuse Google Datastore with a data store in the general data-storage sense. Bigtable is more expensive than BigQuery (the primary reason we didn't go with it). BigQuery does have proper joins and an RDBMS-like SQL language, and it's cheaper, so why not use BigQuery? That being said, BigQuery does have some limitations; depending on the size of your data, you might or might not encounter them.

    Also, as for thinking in terms of data stores, I think the proper statement would have been "thinking in terms of NoSQL databases". There are many of them available these days, but when it comes to Google products, everything except Google Cloud SQL (which is MySQL) is NoSQL.

  • 2020-11-28 18:02

    Take a look at the Objectify documentation. The first comment at the bottom of the page says:

    "Nice, although you wrote this to describe Objectify, it is also one of the most concise explanation of appengine datastore itself I've ever read. Thank you."

    https://github.com/objectify/objectify/wiki/Concepts
