Question
I am using the Bigtable emulator and have successfully added a table to it. Now I need to read filtered data from that table.
The table is as follows:
arc_record_id | record_id | batch_id
1             | 624       | 86
2             | 625       | 86
3             | 626       | 86
and so on, up to arc_record_id 10.
I tried the following Python code:
from google.cloud.bigtable.row_filters import (
    ColumnQualifierRegexFilter, RowFilterChain, ValueRangeFilter)

visit_dt_filter = ValueRangeFilter(start_value="1".encode('utf-8'),
                                   end_value="2".encode('utf-8'))
col1_filter = ColumnQualifierRegexFilter(b'arc_record_id')
chain1 = RowFilterChain(filters=[col1_filter, visit_dt_filter])
partial_rows = testTable.read_rows(filter_=chain1)
for row in partial_rows:
    cell = row.cells[columnFamilyid1]["arc_record_id".encode('utf-8')][0]
    print(cell.value.decode('utf-8'))
The row key is built as follows:
prim_key = row_value[0]  # which is arc_record_id
row_key = "RecordArchive{}".format(prim_key).encode('utf-8')
I get the output as
1
10
2
3
I expect the output to be
arc_record_id | record_id | batch_id
1             | 624       | 86
2             | 625       | 86
Answer 1:
There are several issues with your code. Fixing them will get you the result you want:
Bigtable uses a lexicographic sort over arbitrary bytes, so the sort order is 1, 10, 2, 3 and so on. This is why 10 is included in your result set. You could fix this by left-padding your numbers so they are stored as 000000001, 000000002, and so on. (You can make this more compact by storing the IDs in hex or even binary.)
Because you only print
row.cells[columnFamilyid1]["arc_record_id".encode('utf-8')]
you are only outputting arc_record_id.
Because the column you want to filter on is also the row key, it is both easier and more efficient to tell read_rows directly which key range to read:
read_rows(start_key="RecordArchive1".encode('utf-8'), end_key="RecordArchive3".encode('utf-8'))
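To make the lexicographic ordering point concrete, here is a minimal sketch in plain Python that needs no Bigtable connection. It builds row keys the way the question does and compares them with a zero-padded variant. The helper names `plain_key` and `padded_key` and the padding width of 9 are assumptions chosen for illustration, not part of the original code.

```python
# Row keys as the question builds them: prefix + decimal ID, no padding.
def plain_key(record_id):
    return "RecordArchive{}".format(record_id).encode("utf-8")

# Hypothetical alternative: left-pad the ID with zeros to a fixed width.
def padded_key(record_id, width=9):
    return "RecordArchive{:0{}d}".format(record_id, width).encode("utf-8")

ids = list(range(1, 11))

# Byte-wise (lexicographic) ordering of the keys, as a sorted key space would see them.
plain_order = sorted(ids, key=plain_key)
padded_order = sorted(ids, key=padded_key)
print(plain_order)   # [1, 10, 2, 3, 4, 5, 6, 7, 8, 9]
print(padded_order)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# A half-open key range [key(1), key(3)) selects different IDs in each scheme.
plain_range = [i for i in plain_order if plain_key(1) <= plain_key(i) < plain_key(3)]
padded_range = [i for i in padded_order if padded_key(1) <= padded_key(i) < padded_key(3)]
print(plain_range)   # [1, 10, 2]
print(padded_range)  # [1, 2]
```

With unpadded keys, a range scan from RecordArchive1 up to RecordArchive3 still picks up RecordArchive10, so the range read only returns the expected rows once the keys sort numerically.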
Putting it all together, try code like:
import struct

KEY_PREFIX = "RecordArchive".encode('utf-8')
ARC_RECORD_ID_COL = "arc_record_id".encode('utf-8')
RECORD_ID_COL = "record_id".encode('utf-8')
BATCH_ID_COL = "batch_id".encode('utf-8')

# Functions used to store/retrieve integer values as big-endian signed
# 32-bit integers. Supports non-negative IDs up to 2**31 - 1.
def pack_int(i):
    return struct.pack('>l', i)

def unpack_int(b):
    return struct.unpack('>l', b)[0]

# Row key of the record with the given arc_record_id
def rowkey(id):
    return KEY_PREFIX + pack_int(id)

# Assumes the row keys and cell values were written using pack_int
results = table.read_rows(start_key=rowkey(1), end_key=rowkey(2), end_inclusive=True)
print("arc_record_id,record_id,batch_id")
for row in results:
    print("{},{},{}".format(
        unpack_int(row.cells[columnFamilyid1][ARC_RECORD_ID_COL][0].value),
        unpack_int(row.cells[columnFamilyid1][RECORD_ID_COL][0].value),
        unpack_int(row.cells[columnFamilyid1][BATCH_ID_COL][0].value)))
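As a rough check of why the binary encoding works for range reads, the sketch below verifies in plain Python (no Bigtable connection) that big-endian packed keys sort in the same order as the integers they encode. The sample IDs are made up for illustration, and the helpers are redefined here so the snippet runs on its own.

```python
import struct

KEY_PREFIX = b"RecordArchive"

def pack_int(i):
    # Big-endian signed 32-bit integer: most significant byte first.
    return struct.pack(">l", i)

def rowkey(record_id):
    return KEY_PREFIX + pack_int(record_id)

# Illustrative IDs, including values that cross byte boundaries.
ids = [1, 2, 3, 10, 255, 256, 70000]
keys = [rowkey(i) for i in ids]

# Byte-wise ordering of the keys matches numeric ordering of the IDs.
keys_are_sorted = keys == sorted(keys)
print(keys_are_sorted)  # True

# An inclusive key range from rowkey(1) to rowkey(2) selects exactly IDs 1 and 2.
selected = [i for i in ids if rowkey(1) <= rowkey(i) <= rowkey(2)]
print(selected)  # [1, 2]

# Caveat: negative IDs break the ordering, because the sign bit makes
# their first byte larger than that of any non-negative ID.
negative_sorts_last = rowkey(-1) > rowkey(70000)
print(negative_sorts_last)  # True
```

This only holds for non-negative IDs. If negative IDs were possible, a different encoding would be needed to keep the keys in numeric order.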
Source: https://stackoverflow.com/questions/55792924/how-to-get-filtered-data-from-bigtable-using-python