amazon-kinesis-firehose

Does Amazon Kinesis Firehose support Data Transformations programmatically?

Submitted by 巧了我就是萌 on 2020-06-03 09:51:21
Question: I have a use case in which I have to verify that the payloads sent to Kinesis Firehose are indeed being delivered. In order to do that, I came up with the chain Firehose -> Firehose data transformation (using Lambda) -> DynamoDB -> check for the payload in DynamoDB (the payload is the hash key in the DynamoDB table). I have to define this entire chain in one shot, programmatically. The data transformation is the same as http://docs.aws.amazon.com/firehose/latest/dev/data-transformation.html. I am doing all this since I cannot
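
To the title question: yes, the transformation step can be configured entirely through the Firehose API. Below is a minimal boto3 sketch (not the poster's actual code) that creates a delivery stream with a Lambda processor attached; the stream name, role/bucket ARNs, and Lambda ARN are placeholders, and the referenced Lambda would be the one that writes the payload hash key to DynamoDB.

```python
# Minimal sketch: wiring up the Firehose data-transformation step programmatically
# with boto3. All names and ARNs below are placeholders, not values from the question.
import boto3

firehose = boto3.client("firehose", region_name="us-east-1")

firehose.create_delivery_stream(
    DeliveryStreamName="payload-verification-stream",  # hypothetical name
    DeliveryStreamType="DirectPut",
    ExtendedS3DestinationConfiguration={
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-delivery-role",  # placeholder
        "BucketARN": "arn:aws:s3:::my-firehose-bucket",                      # placeholder
        # This block enables the data-transformation step on the stream:
        "ProcessingConfiguration": {
            "Enabled": True,
            "Processors": [
                {
                    "Type": "Lambda",
                    "Parameters": [
                        {
                            "ParameterName": "LambdaArn",
                            # placeholder: the transform Lambda that also writes to DynamoDB
                            "ParameterValue": "arn:aws:lambda:us-east-1:123456789012:function:transform-and-write-to-ddb",
                        }
                    ],
                }
            ],
        },
    },
)
```

The same ProcessingConfiguration block is also exposed through CloudFormation and Terraform, so the whole Firehose -> Lambda -> DynamoDB chain can be stood up in a single deployment.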

CLI to put data into AWS Firehose

Submitted by 血红的双手。 on 2020-05-14 19:09:30
Question: AWS Firehose was released today. I'm playing around with it and trying to figure out how to put data into the stream using the AWS CLI. I have a simple JSON payload and a corresponding Redshift table with columns that map to the JSON attributes. I've tried various combinations, but I can't seem to pass in the JSON payload via the CLI. What I've tried: aws firehose put-record --delivery-stream-name test-delivery-stream --record '{ "attribute": 1 }' aws firehose put-record --delivery-stream-name
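
For reference, the shape Firehose expects is easiest to see in boto3: the record parameter is a structure whose Data field holds the payload as a blob, rather than the record being the raw JSON itself. A minimal sketch, using the stream name from the question; the trailing newline is optional but keeps the delivered records line-delimited for Redshift/S3.

```python
# Minimal sketch of the equivalent PutRecord call via boto3, showing that the
# JSON payload goes under a "Data" key as bytes rather than being passed bare.
import json
import boto3

firehose = boto3.client("firehose")

payload = {"attribute": 1}
firehose.put_record(
    DeliveryStreamName="test-delivery-stream",
    Record={"Data": (json.dumps(payload) + "\n").encode("utf-8")},
)
```

The CLI's --record argument takes this same structure, so the JSON needs to end up under a Data key rather than being supplied as the record itself.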

Repartitioning parquet-mr generated parquets with pyarrow/parquet-cpp increases file size by x30?

Submitted by 北城以北 on 2020-04-30 16:38:22
Question: Using AWS Firehose I am converting incoming records to Parquet. In one example, I have 150k identical records enter Firehose, and a single 30 KB Parquet file gets written to S3. Because of how Firehose partitions data, we have a secondary process (a Lambda triggered by an S3 put event) read in the Parquet file and repartition it based on the date within the event itself. After this repartitioning process, the 30 KB file size jumps to 900 KB. Inspecting both Parquet files: the metadata doesn't change; the data
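
A minimal pyarrow sketch of the rewrite step, with the writer options that most often explain this kind of size jump for highly repetitive data made explicit (the file paths are placeholders, and exact defaults depend on the pyarrow/parquet-cpp version in use):

```python
# Minimal sketch, not the poster's pipeline: re-writing a Firehose-produced
# Parquet file with pyarrow while pinning the writer options that most affect
# output size for repetitive data.
import pyarrow.parquet as pq

table = pq.read_table("firehose-output.parquet")   # placeholder path

pq.write_table(
    table,
    "repartitioned.parquet",                       # placeholder path
    compression="snappy",     # Firehose's parquet-mr output is typically Snappy-compressed
    use_dictionary=True,      # identical values collapse into a small dictionary page
    row_group_size=150_000,   # avoid many tiny row groups, which bloat the file
)
```

With 150k identical records, dictionary encoding collapses each column to a tiny dictionary plus references, so losing it (or losing compression) during the rewrite can easily account for a 30x difference.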

Write to a specific folder in S3 bucket using AWS Kinesis Firehose

Submitted by 安稳与你 on 2020-01-13 09:05:52
Question: I would like to be able to route data sent to Kinesis Firehose based on the content inside the data. For example, if I sent this JSON data: { "name": "John", "id": 345 } I would like to filter the data based on id and send it to a subfolder of my S3 bucket, like: S3://myS3Bucket/345_2018_03_05. Is this at all possible with Kinesis Firehose or AWS Lambda? The only way I can think of right now is to resort to creating a Kinesis stream for every single one of my possible IDs and point them to the
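
One commonly suggested workaround is to let Firehose deliver to a staging prefix and have an S3-triggered Lambda re-copy each record under an id-based prefix. A minimal sketch under the assumption that the delivered objects are newline-delimited JSON; the prefix format follows the 345_2018_03_05 example from the question:

```python
# Minimal sketch of an S3-triggered Lambda that re-routes Firehose output into
# id-based prefixes. Bucket and key come from the S3 event; everything else is
# illustrative, not an official Firehose feature at the time of the question.
import json
from datetime import datetime, timezone

import boto3

s3 = boto3.client("s3")

def handler(event, context):
    for rec in event["Records"]:
        bucket = rec["s3"]["bucket"]["name"]
        key = rec["s3"]["object"]["key"]
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")

        # Assumes Firehose delivered newline-delimited JSON records.
        for line in filter(None, body.splitlines()):
            item = json.loads(line)
            date_part = datetime.now(timezone.utc).strftime("%Y_%m_%d")
            new_key = f"{item['id']}_{date_part}/{key.rsplit('/', 1)[-1]}"
            s3.put_object(Bucket=bucket, Key=new_key, Body=line.encode("utf-8"))
```

Firehose has since added dynamic partitioning, which can route records to prefixes based on a JSON key without a separate Lambda, but it was not available when this question was asked.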

Reading the data written to s3 by Amazon Kinesis Firehose stream

Submitted by 拜拜、爱过 on 2020-01-03 06:57:39
Question: I am writing records to a Kinesis Firehose stream that are eventually written to an S3 file by Amazon Kinesis Firehose. My record object looks like ItemPurchase { String personId, String itemId } The data written to S3 looks like: {"personId":"p-111","itemId":"i-111"}{"personId":"p-222","itemId":"i-222"}{"personId":"p-333","itemId":"i-333"} There is no comma separation, no opening bracket as in a JSON array [, and no closing bracket as in a JSON array ]. I want to read this data and get a list of ItemPurchase
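
The usual trick is to lean on a JSON decoder that reports where each object ends, so back-to-back objects with no separators can still be split. A minimal sketch of that approach (Python rather than the question's Java, purely to illustrate the idea):

```python
# Minimal sketch: splitting Firehose's concatenated-JSON output by repeatedly
# calling raw_decode, which parses one object and returns the index where it stopped.
import json

def parse_concatenated_json(blob: str):
    decoder = json.JSONDecoder()
    items, pos = [], 0
    while pos < len(blob):
        # Skip any whitespace/newlines before or between objects.
        while pos < len(blob) and blob[pos].isspace():
            pos += 1
        if pos >= len(blob):
            break
        obj, pos = decoder.raw_decode(blob, pos)
        items.append(obj)
    return items

records = parse_concatenated_json(
    '{"personId":"p-111","itemId":"i-111"}{"personId":"p-222","itemId":"i-222"}'
)
# -> [{'personId': 'p-111', 'itemId': 'i-111'}, {'personId': 'p-222', 'itemId': 'i-222'}]
```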

AWS DynamoDB Stream into Redshift

Submitted by 末鹿安然 on 2019-12-29 08:09:11
Question: We would like to move data from DynamoDB (NoSQL) into a Redshift database continuously as a stream. I am having a hard time understanding all the new terms/technologies in AWS. There are 1) DynamoDB Streams, 2) AWS Lambda, and 3) AWS Kinesis Firehose. Can someone provide a brief summary of each? What are DynamoDB Streams? How do they differ from Amazon Kinesis? After reading all the resources, this is my tentative understanding; please verify below. (a) I assume DynamoDB Streams create the streaming data of
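
For the pipeline being described, the piece that ties the three services together is a Lambda subscribed to the table's stream that forwards change records to Firehose, which then stages to S3 and issues the COPY into Redshift. A minimal sketch, with a placeholder delivery-stream name; note that NewImage arrives in DynamoDB's typed-attribute JSON format:

```python
# Minimal sketch of the glue Lambda in a DynamoDB Streams -> Lambda -> Firehose
# -> Redshift chain: forward new/changed items to a Firehose delivery stream.
import json
import boto3

firehose = boto3.client("firehose")

def handler(event, context):
    records = []
    for rec in event["Records"]:
        if rec["eventName"] in ("INSERT", "MODIFY"):
            # NewImage is in DynamoDB's typed JSON ({"attr": {"S": "value"}, ...}).
            new_image = rec["dynamodb"].get("NewImage", {})
            records.append({"Data": (json.dumps(new_image) + "\n").encode("utf-8")})

    if records:
        # PutRecordBatch accepts up to 500 records per call.
        firehose.put_record_batch(
            DeliveryStreamName="ddb-to-redshift-stream",  # placeholder
            Records=records,
        )
```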