Find duplicates in App Engine datastore

Submitted by 一曲冷凌霜 on 2021-02-07 06:44:06

Question


I have some duplicated entities in my App Engine datastore (not the whole row, but most of its fields).

What's the best way to find them?

Both integer and string fields are duplicated (in case comparing one type is faster than the other).

Thanks!


Answer 1:


A crude but quick approach would be to take the fields you care about, concatenate them into one long string, and use that string as the key of a DB_Unique entity that references the original entity. Each time you call DB_Unique.get_or_insert(), verify that the reference points to the entity you are processing; if it does not, you have a duplicate. This should probably be done in a map reduce.

Something like:

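from google.appengine.ext import db

class DB_Unique(db.Model):
  # The key name is the concatenated fields; r references the first entity seen with them.
  r = db.ReferenceProperty()

class DB_Obj(db.Model):
  a = db.IntegerProperty()
  b = db.StringProperty()
  c = db.StringProperty()

# executed for each DB_Obj...
def mapreduce(entity):
  key = '%s_%s_%s' % (entity.a, entity.b, entity.c)
  # Returns the existing DB_Unique for this key, or creates one pointing at entity.
  res = DB_Unique.get_or_insert(key, r=entity)
  # Compare the stored key directly, without fetching the referenced entity.
  if DB_Unique.r.get_value_for_datastore(res) != entity.key():
    # we have a possible collision, verify and delete?
    # our two entities are res.r (the first one seen) and entity
    pass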

A couple of edge cases might crop up. For example, two entities whose b and c are ('a_b', '') and ('a', 'b_') respectively both concatenate to 'a_b_'. To avoid this, use a separator character you know is not in your strings instead of '_', or make DB_Unique.r a list of references and compare all of them.
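One way to sidestep the separator problem is to serialize the fields unambiguously before using them as a key name. The following is a minimal sketch, not part of the original answer: `naive_key_name` and `unique_key_name` are hypothetical helper names, and the choice of JSON plus SHA-256 is an assumption made for illustration.

```python
import hashlib
import json


def naive_key_name(a, b, c):
    # The concatenation from the answer: ambiguous when a field contains '_'.
    return '%s_%s_%s' % (a, b, c)


def unique_key_name(a, b, c):
    # JSON quotes and escapes each string, so field boundaries stay unambiguous,
    # and it also keeps the integer 1 distinct from the string '1'.
    encoded = json.dumps([a, b, c], separators=(',', ':'))
    # Hashing is optional; it only gives a fixed-length key name.
    return hashlib.sha256(encoded.encode('utf-8')).hexdigest()
```

With this, the two entities from the edge case above get different key names, while genuinely identical field values still map to the same one.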




Answer 2:


If this is a one-time or rare task, you might want to dump the whole datastore to your local machine (see "uploading and downloading data" in the App Engine documentation), load the data into a sqlite3 database, and find the duplicate keys with it.

Trying to do this programmatically on the GAE side might turn out to be quite tedious. It is entirely doable with tasks, but not particularly easy.
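Once the data is on the local machine, the sqlite3 step could look roughly like the sketch below. It assumes the export has already been parsed into tuples of (entity key, a, b, c), matching the DB_Obj fields used in the first answer; the table name `entities`, the column `entity_key`, and the function `find_duplicates` are made up for this example.

```python
import sqlite3


def find_duplicates(rows):
    """Return {(a, b, c): [entity keys]} for every field combination seen more than once."""
    conn = sqlite3.connect(':memory:')
    conn.execute(
        'CREATE TABLE entities (entity_key TEXT PRIMARY KEY, a INTEGER, b TEXT, c TEXT)')
    conn.executemany('INSERT INTO entities VALUES (?, ?, ?, ?)', rows)
    # The subquery finds duplicated (a, b, c) groups; the join lists their members.
    # IS is used instead of = so that NULL fields compare equal to each other.
    cursor = conn.execute(
        'SELECT e.entity_key, e.a, e.b, e.c FROM entities e '
        'JOIN (SELECT a, b, c FROM entities GROUP BY a, b, c HAVING COUNT(*) > 1) d '
        'ON e.a IS d.a AND e.b IS d.b AND e.c IS d.c '
        'ORDER BY e.entity_key')
    duplicates = {}
    for entity_key, a, b, c in cursor:
        duplicates.setdefault((a, b, c), []).append(entity_key)
    conn.close()
    return duplicates
```

For example, `find_duplicates([('k1', 1, 'x', 'y'), ('k2', 1, 'x', 'y'), ('k3', 2, 'x', 'y')])` groups k1 and k2 together and leaves k3 out. Because the fields are compared column by column, the separator ambiguity from the first answer does not arise here.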



Source: https://stackoverflow.com/questions/4798858/find-duplicates-in-app-engine-datastore
