How to get filtered data from Bigtable using Python?

Submitted by 不羁岁月 on 2019-12-06 13:21:27
David

Fixing several issues with your code will get you to what you want:

  1. Bigtable sorts row keys lexicographically as arbitrary bytes, so the sort order is 1, 10, 2, 3, and so on. This is why 10 is included in your result set. You could fix this by left-padding your numbers so they are stored as 000000001, 000000002, and so on. (You can reduce the space overhead of padding by storing the numbers in hex or even as fixed-width binary.)

  2. Because you only print row.cells[columnFamilyid1]["arc_record_id".encode('utf-8')] you are only outputting arc_record_id.

  3. Because the column you want to filter on is the row key, it is both easier and more efficient to tell read_rows directly which range to read: read_rows(start_key="RecordArchive1".encode('utf-8'), end_key="RecordArchive3".encode('utf-8')). Note that end_key is excluded from the range unless you also pass end_inclusive=True.
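To see the ordering problem from point 1 without connecting to Bigtable, the sketch below sorts a few row keys as raw bytes, which is how Bigtable compares them. The key prefix comes from the question; the `padded_key` and `in_range` helpers, the 9-digit width, and the sample IDs are assumptions made for illustration, and `in_range` only imitates a start-inclusive, end-exclusive key range in memory.

```python
# Row keys as plain decimal strings, encoded to bytes as in the question.
plain_keys = ["RecordArchive{}".format(i).encode("utf-8") for i in (1, 2, 3, 10)]

# Python compares bytes lexicographically, byte by byte.
sorted_plain = sorted(plain_keys)

def padded_key(i, width=9):
    """Hypothetical helper: zero-pad the ID so byte order matches numeric order."""
    return "RecordArchive{:0{w}d}".format(i, w=width).encode("utf-8")

sorted_padded = sorted(padded_key(i) for i in (10, 3, 2, 1))

def in_range(key, start, end):
    """Hypothetical in-memory stand-in for a start-inclusive, end-exclusive range."""
    return start <= key < end

# With plain keys, the range [RecordArchive1, RecordArchive3) also catches ID 10.
plain_hits = [k for k in sorted_plain
              if in_range(k, b"RecordArchive1", b"RecordArchive3")]
# With padded keys, the same logical range holds only IDs 1 and 2.
padded_hits = [k for k in sorted_padded
               if in_range(k, padded_key(1), padded_key(3))]

print(sorted_plain)
print(plain_hits)
print(padded_hits)
```

The first list shows RecordArchive10 sorting between RecordArchive1 and RecordArchive2, which is why it ends up in the unpadded range.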

All in all, try code like:

import struct

KEY_PREFIX = "RecordArchive".encode('utf-8')
ARC_RECORD_ID_COL = "arc_record_id".encode('utf-8')
RECORD_ID_COL = "record_id".encode('utf-8')
BATCH_ID_COL = "batch_id".encode('utf-8')

# Helpers to store/retrieve integer values as 4-byte big-endian signed ints.
# Supports non-negative IDs up to 2**31 - 1.
def pack_int(i):
    return struct.pack('>l', i)

def unpack_int(b):
    return struct.unpack('>l', b)[0]

# Row key of the record with the given arc_record_id
def rowkey(id):
    return KEY_PREFIX + pack_int(id)

# `table` and `columnFamilyid1` are the objects already defined in your code.
# This assumes the cell values were written with pack_int.
results = table.read_rows(start_key=rowkey(1), end_key=rowkey(2), end_inclusive=True)
print("arc_record_id,record_id,batch_id")
for row in results:
    cells = row.cells[columnFamilyid1]
    print("{},{},{}".format(
        unpack_int(cells[ARC_RECORD_ID_COL][0].value),
        unpack_int(cells[RECORD_ID_COL][0].value),
        unpack_int(cells[BATCH_ID_COL][0].value)))
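
As a quick sanity check of the key scheme above, the following sketch (no Bigtable connection needed) suggests that big-endian packed keys sort in numeric order for non-negative IDs and round-trip through unpack_int. It redefines the helpers so it runs on its own; the sample IDs are made up, and the parameter name `record_id` is only an illustrative choice.

```python
import struct

KEY_PREFIX = "RecordArchive".encode("utf-8")

def pack_int(i):
    # Big-endian signed 32-bit integer, always 4 bytes.
    return struct.pack(">l", i)

def unpack_int(b):
    return struct.unpack(">l", b)[0]

def rowkey(record_id):
    return KEY_PREFIX + pack_int(record_id)

ids = [10, 2, 300, 1, 70000]  # made-up sample IDs
keys_in_byte_order = sorted(rowkey(i) for i in ids)

# Strip the prefix and decode each key back to its integer ID.
decoded = [unpack_int(k[len(KEY_PREFIX):]) for k in keys_in_byte_order]
print(decoded)

# Negative IDs break the ordering: the sign bit makes them sort after positives.
negative_sorts_last = rowkey(-1) > rowkey(1)
print(negative_sorts_last)
```

Because every packed ID has the same width, byte order and numeric order agree, which is the property the zero-padding suggestion relies on as well.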