Question
I am using AWS Kinesis Firehose to ingest data into S3 and then consume it with Athena.
I want to analyze events from different games. To keep Athena from scanning too much data, I would like to partition the S3 data by an identifier for each game, but so far I have not found a solution, since a single Firehose stream receives data from all the games.
Does anyone know how to do it?
Thank you, Javi.
Answer 1:
You could possibly use Amazon Kinesis Analytics to split incoming Firehose streams into separate output streams based upon some logic, such as Game ID.
It can accept a KinesisFirehoseInput and send data to a KinesisFirehoseOutput.
However, the limits documentation seems to suggest that there can only be 3 output destinations per application, so this would not be sufficient if you have more than three games.
Answer 2:
You could send all of your traffic to the main Firehose stream, then use a Lambda function to split the data across multiple Firehose streams, one per game, each of which saves its data to a separate folder/bucket. See the sketch below.
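A minimal sketch of that fan-out Lambda, assuming the events arrive as JSON with a `game_id` field via a Kinesis Data Stream trigger (one possible wiring), and that a delivery stream named `game-events-<game_id>` already exists for each game; the field name and stream-naming convention are hypothetical, not from the original answer:

```python
import base64
import json
from collections import defaultdict

import boto3

firehose = boto3.client("firehose")

# Hypothetical naming convention: one delivery stream per game,
# e.g. "game-events-<game_id>", each writing to its own S3 prefix/bucket.
DELIVERY_STREAM_PREFIX = "game-events-"


def handler(event, context):
    """Fan incoming records out to per-game Firehose delivery streams."""
    batches = defaultdict(list)

    for record in event["Records"]:
        # Kinesis delivers the payload base64-encoded.
        payload = base64.b64decode(record["kinesis"]["data"])
        game_id = json.loads(payload).get("game_id", "unknown")
        batches[game_id].append({"Data": payload})

    for game_id, records in batches.items():
        # PutRecordBatch accepts at most 500 records per call.
        for i in range(0, len(records), 500):
            firehose.put_record_batch(
                DeliveryStreamName=f"{DELIVERY_STREAM_PREFIX}{game_id}",
                Records=records[i : i + 500],
            )
```

With each game writing to its own S3 location, you can register one Athena table per game (or one partitioned table) so queries only scan the data for the game in question.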
Source: https://stackoverflow.com/questions/45432265/partitioning-aws-kinesis-firehose-data-to-s3-by-payload