amazon-kinesis-firehose

Sync data from Amazon Aurora to Redshift

寵の児 submitted on 2019-12-23 17:31:57
Question: I am trying to set up a sync between AWS Aurora and Redshift. What is the best way to achieve this sync? Possible ways to sync can be: - Query the table to find changes (since I am only doing inserts, updates don't matter), export these changes to a flat file in an S3 bucket, and use the Redshift COPY command to insert into Redshift. Use a Python publisher and Boto3 to publish changes into a Kinesis stream and then consume this stream in Firehose, from where I can copy directly into Redshift. Use
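
A minimal sketch of the Boto3 publisher option described above, assuming a Kinesis stream (here called aurora-changes) already exists and that the changed rows have been fetched elsewhere; the stream name, region, and record shape are illustrative, not from the question:

```python
import json
import boto3

# Hypothetical stream name and region -- adjust to your own setup.
kinesis = boto3.client("kinesis", region_name="us-east-1")

def publish_changes(rows):
    """Publish newly inserted Aurora rows to a Kinesis stream.

    Each row is sent as one newline-terminated JSON record; Firehose
    (or another consumer) can then COPY the batches into Redshift.
    """
    for row in rows:
        kinesis.put_record(
            StreamName="aurora-changes",           # assumed stream name
            Data=json.dumps(row) + "\n",           # newline-delimited JSON
            PartitionKey=str(row.get("id", "0")),  # spread records across shards
        )

# Example usage with made-up rows:
# publish_changes([{"id": 1, "name": "foo"}, {"id": 2, "name": "bar"}])
```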

AWS Firehose cross region/account policy

放肆的年华 submitted on 2019-12-23 04:00:38
Question: I am trying to create Firehose streams that can receive data from different regions in Account A, through AWS Lambda, and output into a Redshift table in Account B. To do this I created an IAM role on Account A: { "Version": "2012-10-17", "Statement": [ { "Sid": "", "Effect": "Allow", "Principal": { "Service": "firehose.amazonaws.com" }, "Action": "sts:AssumeRole" } ] } I gave it the following permissions: { "Version": "2012-10-17", "Statement": [ { "Sid": "", "Effect": "Allow", "Action": [ "s3:AbortMultipartUpload", "s3:GetBucketLocation", "s3:GetObject", "s3:ListBucket", "s3
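
For illustration, the trust policy quoted above could be created with Boto3 roughly as in the sketch below; the role name is a placeholder, and the destination-side permissions policy would still have to be attached separately:

```python
import json
import boto3

iam = boto3.client("iam")

# Trust policy from the question: allow the Firehose service to assume the role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "",
            "Effect": "Allow",
            "Principal": {"Service": "firehose.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# Role name is illustrative; the role is created in Account A as in the question.
iam.create_role(
    RoleName="firehose-cross-account-delivery",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)
```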

Running AWS Firehose in lambda.js gives an undefined error

試著忘記壹切 submitted on 2019-12-22 09:25:06
Question: var AWS = require('aws-sdk'); var firehose = new AWS.Firehose(); Running the above code in Lambda with the proper roles configured, AWS returns the error message "undefined is not a function". Does anyone have an idea how I can get Firehose to load in aws-sdk? Answer 1: I opened a ticket with Amazon and they verified Firehose isn't working with Lambda yet, but is working on EC2. They have escalated the issue to a service team in order to support Firehose. Answer 2: If you update the aws-sdk module to version 2.3.19 you should not

Can I customize partitioning in Kinesis Firehose before delivering to S3?

微笑、不失礼 submitted on 2019-12-22 06:52:24
Question: I have a Firehose stream that is intended to ingest millions of events from different sources and of different event types. The stream should deliver all data to one S3 bucket as a store of raw/unaltered data. I was thinking of partitioning this data in S3 based on metadata embedded within the event message, like event-source, event-type and event-date. However, Firehose follows its default partitioning based on record arrival time. Is it possible to customize this partitioning behavior to fit
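
At the time of the question, Firehose only supported arrival-time prefixes, so a common workaround is post-processing: an S3-triggered Lambda that reads each delivered object and rewrites its records under metadata-based prefixes. A rough sketch, in which the target bucket name and the event field names (event_source, event_type, event_date) are assumptions about the payload:

```python
import json
from collections import defaultdict

import boto3

s3 = boto3.client("s3")

PARTITIONED_BUCKET = "my-partitioned-bucket"  # assumed target bucket

def handler(event, context):
    """S3-triggered Lambda: regroup newline-delimited JSON records by the
    metadata embedded in each event and write them back under
    event-source/event-type/event-date prefixes."""
    for rec in event["Records"]:
        bucket = rec["s3"]["bucket"]["name"]
        key = rec["s3"]["object"]["key"]
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")

        groups = defaultdict(list)
        for line in body.splitlines():
            if not line.strip():
                continue
            evt = json.loads(line)
            # Field names below are assumptions about the event message.
            prefix = "{}/{}/{}".format(
                evt["event_source"], evt["event_type"], evt["event_date"]
            )
            groups[prefix].append(line)

        for prefix, lines in groups.items():
            s3.put_object(
                Bucket=PARTITIONED_BUCKET,
                Key="{}/{}".format(prefix, key.split("/")[-1]),
                Body="\n".join(lines).encode("utf-8"),
            )
```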

Write Parquet from AWS Kinesis Firehose to AWS S3

≯℡__Kan透↙ submitted on 2019-12-17 22:35:39
Question: I would like to ingest data into S3 from Kinesis Firehose formatted as Parquet. So far I have only found a solution that involves creating an EMR cluster, but I am looking for something cheaper and faster, such as storing the received JSON as Parquet directly from Firehose or using a Lambda function. Thank you very much, Javi. Answer 1: Good news, this feature was released today! Amazon Kinesis Data Firehose can convert the format of your input data from JSON to Apache Parquet or Apache ORC before storing the data
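
A hedged sketch of enabling that record-format conversion when creating the delivery stream with Boto3; all ARNs, bucket names, and the Glue database/table that supplies the schema are placeholders:

```python
import boto3

firehose = boto3.client("firehose", region_name="us-east-1")

# All ARNs and names below are placeholders for your own resources.
firehose.create_delivery_stream(
    DeliveryStreamName="json-to-parquet-stream",
    DeliveryStreamType="DirectPut",
    ExtendedS3DestinationConfiguration={
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-delivery-role",
        "BucketARN": "arn:aws:s3:::my-parquet-bucket",
        "Prefix": "events/",
        # Format conversion requires a buffer size of at least 64 MB.
        "BufferingHints": {"SizeInMBs": 64, "IntervalInSeconds": 300},
        "CompressionFormat": "UNCOMPRESSED",
        "DataFormatConversionConfiguration": {
            "Enabled": True,
            # The schema comes from a Glue table describing the JSON records.
            "SchemaConfiguration": {
                "RoleARN": "arn:aws:iam::123456789012:role/firehose-delivery-role",
                "DatabaseName": "my_glue_database",
                "TableName": "my_events_table",
                "Region": "us-east-1",
            },
            "InputFormatConfiguration": {"Deserializer": {"OpenXJsonSerDe": {}}},
            "OutputFormatConfiguration": {"Serializer": {"ParquetSerDe": {}}},
        },
    },
)
```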

Kinesis agent not parsing the file

我的梦境 submitted on 2019-12-13 15:30:28
Question: I have the following in the agent.json: { "cloudwatch.emitMetrics": true, "kinesis.endpoint": "", "firehose.endpoint": "", "flows": [ { "filePattern": "/home/ec2-user/ETLdata/contracts/Delta.csv", "kinesisStream": "ETL-rawdata-stream", "partitionKeyOption": "RANDOM", "dataProcessingOptions": [ { "optionName": "CSVTOJSON", "customFieldNames": [ "field1", "field2"], "delimiter": "," } ] } ] } When I add the specified file to the folder, literally nothing happens. I only see the below in the logs

How can I set the destination filename for AWS Firehose on S3?

寵の児 submitted on 2019-12-11 01:44:00
Question: I'm processing an XML file added to S3 and writing the results to a Firehose stream, storing the results in the same S3 bucket, but the destination filename has to be in a specific format. I've examined the documentation and I can't see any way to set the format of the filename. The closest I can find is in the Firehose FAQ: Q: What is the naming pattern of the Amazon S3 objects delivered by Amazon Kinesis Data Firehose? The Amazon S3 object name follows the pattern DeliveryStreamName
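
Since Firehose only exposes a configurable prefix, not the object name itself, one workaround is an S3-triggered Lambda that copies each delivered object to a key in the required format and deletes the original. A minimal sketch, with the bucket name and target naming scheme as assumptions:

```python
import boto3

s3 = boto3.client("s3")

BUCKET = "my-delivery-bucket"  # assumed bucket name

def handler(event, context):
    """Copy each Firehose-delivered object to a key in the required format,
    then delete the original. Scope the S3 trigger to the Firehose delivery
    prefix so the copy below does not re-trigger this function."""
    for rec in event["Records"]:
        src_key = rec["s3"]["object"]["key"]
        # Derive whatever filename format you need from the source key;
        # this scheme is purely illustrative.
        new_key = "processed/results-" + src_key.split("/")[-1] + ".xml"
        s3.copy_object(
            Bucket=BUCKET,
            Key=new_key,
            CopySource={"Bucket": BUCKET, "Key": src_key},
        )
        s3.delete_object(Bucket=BUCKET, Key=src_key)
```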

Kinesis Firehose putting JSON objects in S3 without separator comma

安稳与你 submitted on 2019-12-07 08:12:58
Question: Before sending the data I am applying JSON.stringify to it, and it looks like this: {"data": [{"key1": value1, "key2": value2}, {"key1": value1, "key2": value2}]} But once it passes through AWS API Gateway and Kinesis Firehose puts it into S3, it looks like this: { "key1": value1, "key2": value2 }{ "key1": value1, "key2": value2 } The separator comma between the JSON objects is gone, but I need it to process the data properly. Template in the API Gateway: #set($root = $input.path('$')) {
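
One way to cope downstream is to parse the back-to-back objects directly (the alternative is to append a newline or comma to each record before putting it to the stream). A small sketch using only the Python standard library; the function name is illustrative:

```python
import json

def parse_concatenated_json(blob):
    """Yield JSON objects from a string like '{...}{...}{...}',
    the shape Firehose writes when records carry no delimiter."""
    decoder = json.JSONDecoder()
    idx = 0
    length = len(blob)
    while idx < length:
        # Skip any whitespace between objects.
        while idx < length and blob[idx].isspace():
            idx += 1
        if idx >= length:
            break
        obj, end = decoder.raw_decode(blob, idx)
        idx = end
        yield obj

# Example with the shape shown above (numeric values stand in for value1/value2):
# list(parse_concatenated_json('{"key1": 1, "key2": 2}{"key1": 3, "key2": 4}'))
```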

How do I use requestShutdown and shutdown to do a graceful shutdown with the KCL Java library for AWS Kinesis

孤街浪徒 submitted on 2019-12-06 08:59:46
Question: I am trying to use the new feature of the KCL library in Java for AWS Kinesis to do a graceful shutdown by registering a shutdown hook to stop all the record processors and then the worker gracefully. The new library provides a new interface which record processors need to implement. But how does it get invoked? I tried invoking worker.requestShutdown() first and then worker.shutdown(), and it works. But is this the intended way to use it? What is the point of using both, and what is the benefit?