amazon-kinesis-firehose

Does Amazon Kinesis Firehose support Data Transformations programmatically?

Submitted by 巧了我就是萌 on 2020-06-03 09:51:21
Question: I have a use case in which I have to verify that the payloads sent to Kinesis Firehose are indeed being delivered. In order to do that, I came up with the chain Firehose -> Firehose data transformation (using Lambda) -> DynamoDB -> check for the payload in DynamoDB (the payload is the hash key in the DynamoDB table). I have to define this entire chain in one shot, programmatically. The data transformation is the same as http://docs.aws.amazon.com/firehose/latest/dev/data-transformation.html. I am doing all this since I cannot
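
To the title question: yes, the transformation step can be configured entirely through the Firehose API. Below is a minimal boto3 sketch (not the poster's actual code) that creates a delivery stream with a Lambda processor attached; the stream name, role/bucket ARNs, and Lambda ARN are placeholders, and the referenced Lambda would be the one that writes the payload hash key to DynamoDB.

```python
# Minimal sketch: wiring up the Firehose data-transformation step programmatically
# with boto3. All names and ARNs below are placeholders, not values from the question.
import boto3

firehose = boto3.client("firehose", region_name="us-east-1")

firehose.create_delivery_stream(
    DeliveryStreamName="payload-verification-stream",  # hypothetical name
    DeliveryStreamType="DirectPut",
    ExtendedS3DestinationConfiguration={
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-delivery-role",  # placeholder
        "BucketARN": "arn:aws:s3:::my-firehose-bucket",                      # placeholder
        # This block enables the data-transformation step on the stream:
        "ProcessingConfiguration": {
            "Enabled": True,
            "Processors": [
                {
                    "Type": "Lambda",
                    "Parameters": [
                        {
                            "ParameterName": "LambdaArn",
                            # placeholder: the transform Lambda that also writes to DynamoDB
                            "ParameterValue": "arn:aws:lambda:us-east-1:123456789012:function:transform-and-write-to-ddb",
                        }
                    ],
                }
            ],
        },
    },
)
```

The same ProcessingConfiguration block is also exposed through CloudFormation and Terraform, so the whole Firehose -> Lambda -> DynamoDB chain can be stood up in a single deployment.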

CLI to put data into AWS Firehose

Submitted by 血红的双手。 on 2020-05-14 19:09:30
Question: AWS Firehose was released today. I'm playing around with it and trying to figure out how to put data into the stream using the AWS CLI. I have a simple JSON payload and a corresponding Redshift table with columns that map to the JSON attributes. I've tried various combinations, but I can't seem to pass in the JSON payload via the CLI. What I've tried: aws firehose put-record --delivery-stream-name test-delivery-stream --record '{ "attribute": 1 }' aws firehose put-record --delivery-stream-name
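
For reference, the shape Firehose expects is easiest to see in boto3: the record parameter is a structure whose Data field holds the payload as a blob, rather than the record being the raw JSON itself. A minimal sketch, using the stream name from the question; the trailing newline is optional but keeps the delivered records line-delimited for Redshift/S3.

```python
# Minimal sketch of the equivalent PutRecord call via boto3, showing that the
# JSON payload goes under a "Data" key as bytes rather than being passed bare.
import json
import boto3

firehose = boto3.client("firehose")

payload = {"attribute": 1}
firehose.put_record(
    DeliveryStreamName="test-delivery-stream",
    Record={"Data": (json.dumps(payload) + "\n").encode("utf-8")},
)
```

The CLI's --record argument takes this same structure, so the JSON needs to end up under a Data key rather than being supplied as the record itself.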

Repartitioning parquet-mr generated parquets with pyarrow/parquet-cpp increases file size by x30?

Submitted by 北城以北 on 2020-04-30 16:38:22
Question: Using AWS Firehose I am converting incoming records to Parquet. In one example, I have 150k identical records enter Firehose, and a single 30 KB Parquet file gets written to S3. Because of how Firehose partitions data, we have a secondary process (a Lambda triggered by an S3 put event) read in the Parquet file and repartition it based on the date within the event itself. After this repartitioning process, the 30 KB file size jumps to 900 KB. Inspecting both Parquet files: the metadata doesn't change; the data
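
A minimal pyarrow sketch of the rewrite step, with the writer options that most often explain this kind of size jump for highly repetitive data made explicit (the file paths are placeholders, and exact defaults depend on the pyarrow/parquet-cpp version in use):

```python
# Minimal sketch, not the poster's pipeline: re-writing a Firehose-produced
# Parquet file with pyarrow while pinning the writer options that most affect
# output size for repetitive data.
import pyarrow.parquet as pq

table = pq.read_table("firehose-output.parquet")   # placeholder path

pq.write_table(
    table,
    "repartitioned.parquet",                       # placeholder path
    compression="snappy",     # Firehose's parquet-mr output is typically Snappy-compressed
    use_dictionary=True,      # identical values collapse into a small dictionary page
    row_group_size=150_000,   # avoid many tiny row groups, which bloat the file
)
```

With 150k identical records, dictionary encoding collapses each column to a tiny dictionary plus references, so losing it (or losing compression) during the rewrite can easily account for a 30x difference.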

Write to a specific folder in S3 bucket using AWS Kinesis Firehose

Submitted by 安稳与你 on 2020-01-13 09:05:52
Question: I would like to be able to route data sent to Kinesis Firehose based on the content inside the data. For example, if I sent this JSON data: { "name": "John", "id": 345 } I would like to filter the data based on id and send it to a subfolder of my S3 bucket, like: S3://myS3Bucket/345_2018_03_05. Is this at all possible with Kinesis Firehose or AWS Lambda? The only way I can think of right now is to resort to creating a Kinesis stream for every single one of my possible IDs and point them to the
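
One commonly suggested workaround is to let Firehose deliver to a staging prefix and have an S3-triggered Lambda re-copy each record under an id-based prefix. A minimal sketch under the assumption that the delivered objects are newline-delimited JSON; the prefix format follows the 345_2018_03_05 example from the question:

```python
# Minimal sketch of an S3-triggered Lambda that re-routes Firehose output into
# id-based prefixes. Bucket and key come from the S3 event; everything else is
# illustrative, not an official Firehose feature at the time of the question.
import json
from datetime import datetime, timezone

import boto3

s3 = boto3.client("s3")

def handler(event, context):
    for rec in event["Records"]:
        bucket = rec["s3"]["bucket"]["name"]
        key = rec["s3"]["object"]["key"]
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")

        # Assumes Firehose delivered newline-delimited JSON records.
        for line in filter(None, body.splitlines()):
            item = json.loads(line)
            date_part = datetime.now(timezone.utc).strftime("%Y_%m_%d")
            new_key = f"{item['id']}_{date_part}/{key.rsplit('/', 1)[-1]}"
            s3.put_object(Bucket=bucket, Key=new_key, Body=line.encode("utf-8"))
```

Firehose has since added dynamic partitioning, which can route records to prefixes based on a JSON key without a separate Lambda, but it was not available when this question was asked.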

Reading the data written to s3 by Amazon Kinesis Firehose stream

Submitted by 拜拜、爱过 on 2020-01-03 06:57:39
Question: I am writing records to a Kinesis Firehose stream that are eventually written to an S3 file by Amazon Kinesis Firehose. My record object looks like ItemPurchase { String personId, String itemId } The data written to S3 looks like: {"personId":"p-111","itemId":"i-111"}{"personId":"p-222","itemId":"i-222"}{"personId":"p-333","itemId":"i-333"} There is no comma separation, no opening bracket as in a JSON array [, and no closing bracket as in a JSON array ]. I want to read this data and get a list of ItemPurchase
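
The usual trick is to lean on a JSON decoder that reports where each object ends, so back-to-back objects with no separators can still be split. A minimal sketch of that approach (Python rather than the question's Java, purely to illustrate the idea):

```python
# Minimal sketch: splitting Firehose's concatenated-JSON output by repeatedly
# calling raw_decode, which parses one object and returns the index where it stopped.
import json

def parse_concatenated_json(blob: str):
    decoder = json.JSONDecoder()
    items, pos = [], 0
    while pos < len(blob):
        # Skip any whitespace/newlines before or between objects.
        while pos < len(blob) and blob[pos].isspace():
            pos += 1
        if pos >= len(blob):
            break
        obj, pos = decoder.raw_decode(blob, pos)
        items.append(obj)
    return items

records = parse_concatenated_json(
    '{"personId":"p-111","itemId":"i-111"}{"personId":"p-222","itemId":"i-222"}'
)
# -> [{'personId': 'p-111', 'itemId': 'i-111'}, {'personId': 'p-222', 'itemId': 'i-222'}]
```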

AWS DynamoDB Stream into Redshift

Submitted by 末鹿安然 on 2019-12-29 08:09:11
Question: We would like to move data from DynamoDB (NoSQL) into a Redshift database continuously as a stream. I am having a hard time understanding all the new terms/technologies in AWS. There are 1) DynamoDB Streams, 2) AWS Lambda, and 3) AWS Kinesis Firehose. Can someone provide a brief summary of each? What are DynamoDB Streams? How do they differ from Amazon Kinesis? After reading all the resources, this is my tentative understanding; please verify below. (a) I assume DynamoDB Streams create the streaming data of
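
For the pipeline being described, the piece that ties the three services together is a Lambda subscribed to the table's stream that forwards change records to Firehose, which then stages to S3 and issues the COPY into Redshift. A minimal sketch, with a placeholder delivery-stream name; note that NewImage arrives in DynamoDB's typed-attribute JSON format:

```python
# Minimal sketch of the glue Lambda in a DynamoDB Streams -> Lambda -> Firehose
# -> Redshift chain: forward new/changed items to a Firehose delivery stream.
import json
import boto3

firehose = boto3.client("firehose")

def handler(event, context):
    records = []
    for rec in event["Records"]:
        if rec["eventName"] in ("INSERT", "MODIFY"):
            # NewImage is in DynamoDB's typed JSON ({"attr": {"S": "value"}, ...}).
            new_image = rec["dynamodb"].get("NewImage", {})
            records.append({"Data": (json.dumps(new_image) + "\n").encode("utf-8")})

    if records:
        # PutRecordBatch accepts up to 500 records per call.
        firehose.put_record_batch(
            DeliveryStreamName="ddb-to-redshift-stream",  # placeholder
            Records=records,
        )
```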