问题
In eventhub, I have both "sender" and "receiver" scripts for communication between those two.
The issue that I am facing is that it seems that I am receiving a dataset that I sent yesterday plus the one that I just sent together. I am trying to control the data amount by either time period or the number of events.
The basic code for sender.py is following:
CONSUMER_GROUP = "$default"
OFFSET = Offset("-1")
PARTITION = "0"
total = 0
last_sn = -1
last_offset = "-1"
client = EventHubClient(ADDRESS, debug=False, username=USER, password=KEY)
try:
receiver = client.add_receiver(
CONSUMER_GROUP, PARTITION, prefetch=0, offset=OFFSET)
client.run()
start_time = time.time()
batch = receiver.receive(timeout=100)
for event_data in batch[-10:]:
print("Received: {}".format(event_data.body_as_str(encoding='UTF-8')))
total += 1
end_time = time.time()
client.stop()
run_time = end_time - start_time
print("Received {} messages in {} seconds".format(total, run_time))
except KeyboardInterrupt:
pass
finally:
client.stop()
回答1:
I just found a solution which uses the offset to control the read process of event data.
What we need to do first is that get the offset of the event data.
the code like below:
logger = logging.getLogger("azure")
ADDRESS = "amqps://xxx.servicebus.windows.net/xxx"
USER = "RootManageSharedAccessKey"
KEY = "xxx"
CONSUMER_GROUP = "$default"
#first, set offset to -1 to read all the event data
OFFSET = Offset("-1")
PARTITION = "0"
total = 0
last_sn = -1
last_offset = "-1"
client = EventHubClient(ADDRESS, debug=False, username=USER, password=KEY)
try:
receiver = client.add_receiver(
CONSUMER_GROUP, PARTITION, prefetch=5000, offset=OFFSET)
client.run()
start_time = time.time()
print("**begin receive**")
for event_data in receiver.receive(timeout=100):
last_offset = event_data.offset.value
last_sn = event_data.sequence_number
#here, we print out the offset of each event data
print("Received: {}, last_offset: {}, last_sn: {}".format(event_data.body_as_str(encoding='UTF-8'),last_offset,last_sn))
total += 1
end_time = time.time()
client.stop()
run_time = end_time - start_time
print("Received {} messages in {} seconds".format(total, run_time))
except KeyboardInterrupt:
pass
finally:
client.stop()
after executing, you can see all the offset of each data, screenshot like below:
then, you know the offset of each event data. And if you want to get the data from number 40 to number 53. The offset for number 40 is 237080, so in your code, change the offset to a value just less than 237080, set it to 237079 in this line of code OFFSET = Offset("237079")
.
The code like below:
logger = logging.getLogger("azure")
ADDRESS = "amqps://xxx.servicebus.windows.net/xx"
USER = "RootManageSharedAccessKey"
KEY = "xxx"
CONSUMER_GROUP = "$default"
#set the offset
OFFSET = Offset("237079")
PARTITION = "0"
total = 0
last_sn = -1
last_offset = "-1"
client = EventHubClient(ADDRESS, debug=False, username=USER, password=KEY)
try:
receiver = client.add_receiver(
CONSUMER_GROUP, PARTITION, prefetch=5000, offset=OFFSET)
client.run()
start_time = time.time()
print("**begin receive**")
for event_data in receiver.receive(timeout=100):
last_offset = event_data.offset.value
last_sn = event_data.sequence_number
print("Received: {}, last_offset: {}, last_sn: {}".format(event_data.body_as_str(encoding='UTF-8'),last_offset,last_sn))
total += 1
end_time = time.time()
client.stop()
run_time = end_time - start_time
print("Received {} messages in {} seconds".format(total, run_time))
except KeyboardInterrupt:
pass
finally:
client.stop()
after execute the code, only the event data from the specified offset are returned. Screenshot as below:
来源:https://stackoverflow.com/questions/58497804/how-to-receive-the-recent-data-only-in-event-hub