TL;DR: I need to find a real solution to download my data from the production datastore and load it into my local development environment.
The detailed problem:
I nee
Even if you get this to work, the localhost datastore still behaves differently from the actual datastore.
If you want to truly simulate your production environment, then I would recommend setting up a clone of your App Engine project as a remote sandbox. You could deploy your app to a new GAE project id (appcfg.py update . -A sandbox-id), use Datastore Admin to create a backup of production in Google Cloud Storage, and then use Datastore Admin in your sandbox to restore that backup into your sandbox.
I do prime my localhost datastore with some production data, but this is not a complete clone, just the core required objects and a few test users.
To do this I wrote a Google Dataflow job that exports selected models and saves them in Google Cloud Storage in JSONL format. Then on my localhost I have an endpoint called /init/ which launches a taskqueue job to download these exports and import them. For this I reuse my JSON REST handler code, which is able to convert any model to JSON and vice versa.
In theory you could do this for your entire datastore.
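Here is a minimal sketch of what such an /init/ flow could look like (Python 2, GAE standard environment, using the GoogleAppEngineCloudStorageClient library). The bucket path, the MODELS registry, and the handler names are illustrative placeholders, not my actual code; set_from_dto is the generic conversion method shown in the EDIT below.

import json

import cloudstorage  # GoogleAppEngineCloudStorageClient
import webapp2
from google.appengine.api import taskqueue

# Illustrative registry mapping kind names to BaseModel subclasses.
MODELS = {'User': User}

class InitHandler(webapp2.RequestHandler):
    def get(self):
        # Fan out one taskqueue task per exported model.
        for kind in MODELS:
            taskqueue.add(url='/init/import', params={'kind': kind})

class InitImportHandler(webapp2.RequestHandler):
    def post(self):
        kind = self.request.get('kind')
        # Each line of the export file is one entity as JSON (JSONL).
        # Reading the whole file is fine for modest exports.
        with cloudstorage.open('/my-export-bucket/%s.jsonl' % kind) as fh:
            for line in fh.read().splitlines():
                obj = MODELS[kind]()
                obj.set_from_dto(json.loads(line))
                obj.put()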
EDIT - This is what my to-json/from-json code looks like:
All of my ndb.Model subclasses extend my BaseModel, which has generic conversion code:
get_dto_typemap = {
    ndb.DateTimeProperty: dt_to_timestamp,
    ndb.KeyProperty: key_to_dto,
    ndb.StringProperty: str_to_dto,
    ndb.EnumProperty: str,
}
set_from_dto_typemap = {
    ndb.DateTimeProperty: timestamp_to_dt,
    ndb.KeyProperty: dto_to_key,
    ndb.FloatProperty: float_from_dto,
    ndb.StringProperty: strip,
    ndb.BlobProperty: str,
    ndb.IntegerProperty: int,
}
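The small converter helpers referenced in these typemaps are not shown above. Assuming datetimes travel as epoch seconds and keys as urlsafe strings, they could look roughly like this (str_to_dto, strip, and float_from_dto would be similar one-line coercions):

import calendar
import datetime

def dt_to_timestamp(dt):
    # naive UTC datetime -> integer epoch seconds
    return calendar.timegm(dt.timetuple()) if dt else None

def timestamp_to_dt(ts):
    # integer epoch seconds -> naive UTC datetime
    return datetime.datetime.utcfromtimestamp(ts) if ts is not None else None

def key_to_dto(key):
    # ndb.Key -> opaque urlsafe string
    return key.urlsafe() if key else None

def dto_to_key(value):
    # urlsafe string -> ndb.Key
    return ndb.Key(urlsafe=value) if value else None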
class BaseModel(ndb.Model):
    def to_dto(self):
        # Serialize this entity to a plain dict, converting property types
        # that are not directly JSON-serializable.
        dto = {'key': key_to_dto(self.key)}
        for name, obj in self._properties.iteritems():
            key = obj._name
            value = getattr(self, obj._name)
            if obj.__class__ in get_dto_typemap:
                if obj._repeated:
                    value = [get_dto_typemap[obj.__class__](v) for v in value]
                else:
                    value = get_dto_typemap[obj.__class__](value)
            dto[key] = value
        return dto

    def set_from_dto(self, dto):
        # Populate this entity from a plain dict, skipping computed
        # properties (they cannot be assigned).
        for name, obj in self._properties.iteritems():
            if isinstance(obj, ndb.ComputedProperty):
                continue
            key = obj._name
            if key in dto:
                value = dto[key]
                if not obj._repeated and obj.__class__ in set_from_dto_typemap:
                    try:
                        value = set_from_dto_typemap[obj.__class__](value)
                    except Exception as e:
                        raise Exception("Error converting %s.%s from '%s': %s"
                                        % (self.__class__.__name__, key, value, e.message))
                try:
                    setattr(self, obj._name, value)
                except Exception as e:
                    raise Exception("Error setting %s.%s to '%s': %s"
                                    % (self.__class__.__name__, key, value, e.message))
class User(BaseModel):
    # user fields, etc
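A quick round trip through these methods, assuming User has e.g. a name StringProperty:

import json

user = User(name='Alice')
user.put()
payload = json.dumps(user.to_dto())      # entity -> JSON string
clone = User()
clone.set_from_dto(json.loads(payload))  # JSON dict -> entity
# Note: set_from_dto ignores the 'key' entry (it is not a property),
# so clone.put() would store a brand new entity.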
My request handlers then use set_from_dto and to_dto like this (BaseHandler also provides some convenience methods for converting JSON payloads to Python dicts and whatnot; a sketch of those conveniences follows the handlers below):
from google.appengine.datastore.datastore_query import Cursor

class RestHandler(BaseHandler):
    MODEL = None

    def put(self, resource_id=None):
        if resource_id:
            # The urlsafe string already encodes the kind, so no kind
            # argument is needed here.
            obj = ndb.Key(urlsafe=resource_id).get()
            if obj:
                obj.set_from_dto(self.json_body)
                obj.put()
                return obj.to_dto()
            else:
                self.abort(422, "Unknown id")
        else:
            self.abort(405)

    def post(self, resource_id=None):
        if resource_id:
            self.abort(405)
        else:
            obj = self.MODEL()
            obj.set_from_dto(self.json_body)
            obj.put()
            return obj.to_dto()

    def get(self, resource_id=None):
        if resource_id:
            obj = ndb.Key(urlsafe=resource_id).get()
            if obj:
                return obj.to_dto()
            else:
                self.abort(422, "Unknown id")
        else:
            cursor_key = self.request.GET.pop('$cursor', None)
            cursor = Cursor(urlsafe=cursor_key) if cursor_key else None
            # $limit arrives as a string; clamp it to [10, 200].
            limit = max(min(200, int(self.request.GET.pop('$limit', 200))), 10)
            qs = self.MODEL.query()
            # ... other code that handles query params
            results, next_cursor, more = qs.fetch_page(limit, start_cursor=cursor)
            return {
                '$cursor': next_cursor.urlsafe() if more else None,
                'results': [result.to_dto() for result in results],
            }
class UserHandler(RestHandler):
    MODEL = User
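For completeness, here is a hedged sketch of the BaseHandler conveniences and the routing. The json_body name matches the handlers above, but the dispatch override (which lets handler methods return plain dicts) and the routes are illustrative assumptions, not my actual code:

import json

import webapp2

class BaseHandler(webapp2.RequestHandler):
    @property
    def json_body(self):
        # Convert the JSON request payload into a Python dict.
        return json.loads(self.request.body)

    def dispatch(self):
        # webapp2's dispatch returns whatever the handler method returned;
        # serialize dict results as the JSON response body.
        result = super(BaseHandler, self).dispatch()
        if result is not None:
            self.response.content_type = 'application/json'
            self.response.write(json.dumps(result))

app = webapp2.WSGIApplication([
    webapp2.Route(r'/api/users', UserHandler),
    webapp2.Route(r'/api/users/<resource_id>', UserHandler),
])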