amazon-kinesis

AWS Firehose newline Character

Submitted by 一笑奈何 on 2020-07-08 11:46:53
Question: I've read a lot of similar questions about adding newline characters to Firehose, but they are all about adding the newline character at the source. The problem is that I don't have access to the source: a third party is piping data to our Kinesis instance, and I cannot add the '\n' at the source. I've tried a Firehose data transformation using the following code: 'use strict'; console.log('Loading function'); exports.handler = (event, context, callback) => { /* Process the list of
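
A common fix on the transformation side (not visible in the truncated excerpt above) is to append the newline inside the Lambda before re-encoding each record. The following is a minimal sketch in Python rather than Node.js, assuming the standard Firehose transformation event shape; the handler name and charset are illustrative:

    import base64

    def handler(event, context):
        # Firehose hands records to the Lambda base64-encoded:
        # decode, append '\n', then re-encode each payload.
        output = []
        for record in event['records']:
            payload = base64.b64decode(record['data']).decode('utf-8')
            output.append({
                'recordId': record['recordId'],
                'result': 'Ok',
                'data': base64.b64encode((payload + '\n').encode('utf-8')).decode('utf-8'),
            })
        return {'records': output}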

What is partition key in AWS Kinesis all about?

Submitted by 时光毁灭记忆、已成空白 on 2020-07-04 07:23:28
Question: I was reading about AWS Kinesis. In the following program, I write data into the stream named TestStream. I ran this piece of code 10 times, inserting 10 records into the stream. var params = { Data: 'More Sample data into the test stream ...', PartitionKey: 'TestKey_1', StreamName: 'TestStream' }; kinesis.putRecord(params, function(err, data) { if (err) console.log(err, err.stack); // an error occurred else console.log(data); // successful response }); All the records were inserted
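
For context, the behaviour hinges on the partition key: records with the same key hash to the same shard, so a fixed 'TestKey_1' routes all ten records to one shard. A hedged boto3 sketch (Python instead of the Node.js SDK above; the region is an assumption) of spreading writes by varying the key:

    import uuid
    import boto3

    kinesis = boto3.client('kinesis', region_name='us-east-1')  # region is an assumption

    for i in range(10):
        kinesis.put_record(
            StreamName='TestStream',
            Data=b'More Sample data into the test stream ...',
            # A random key distributes records across shards; a constant key
            # such as 'TestKey_1' would send every record to the same shard.
            PartitionKey=str(uuid.uuid4()),
        )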

Cross-account lambda trigger by kinesis

Submitted by 萝らか妹 on 2020-06-27 08:06:51
Question: I'm trying to trigger a Lambda in account 'B' from a Kinesis stream in account 'A'. This is similar to what's described here, except that the example uses S3 instead of Kinesis. To do this, I'm trying to set up the right permissions but am running into difficulties. First I add this permission: aws lambda add-permission \ --function-name "$function_name" \ --statement-id 'Id-123' \ --action "lambda:InvokeFunction" \ --principal $source_account \ --source-arn "$stream_arn" \ --source-account $source
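
For reference, a hedged boto3 equivalent of that add-permission call (placeholder names and account IDs; it simply mirrors the CLI flags above and is not the full cross-account setup):

    import boto3

    lambda_client = boto3.client('lambda', region_name='us-east-1')  # region is an assumption

    function_name = 'my-function'                                    # placeholder, account 'B'
    source_account = '111111111111'                                  # placeholder, account 'A'
    stream_arn = 'arn:aws:kinesis:us-east-1:111111111111:stream/my-stream'  # placeholder

    # Grant the cross-account principal permission to invoke the function,
    # scoped to the Kinesis stream ARN, as in the CLI command above.
    lambda_client.add_permission(
        FunctionName=function_name,
        StatementId='Id-123',
        Action='lambda:InvokeFunction',
        Principal=source_account,
        SourceArn=stream_arn,
        SourceAccount=source_account,
    )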

Spark Streaming Kinesis partition key and sequence number log in java

Submitted by 房东的猫 on 2020-06-09 04:54:07
Question: We are using Spark 2.4.3 in Java. We would like to log the partition key and the sequence number of every event. The overloaded createStream function of KinesisUtils always throws a compilation error. Function<Record,Record> printSeq = s -> s; KinesisUtils.createStream( jssc, appName, streamName, endPointUrl, regionName, InitialPositionInStream.TRIM_HORIZON, kinesisCheckpointInterval, StorageLevel.MEMORY_AND_DISK_SER(), printSeq, Record.class); The exception is as follows: no suitable

spark streaming checkpoint recovery is very very slow

Submitted by 此生再无相见时 on 2020-05-10 07:23:07
Question: Goal: read from Kinesis and store the data into S3 in Parquet format via Spark Streaming. Situation: the application runs fine initially, running one-hour batches with an average processing time of under 30 minutes. Suppose the application crashes for some reason and we try to restart from the checkpoint. The processing now takes forever and does not move forward. We tried testing the same thing at a batch interval of 1 minute; processing runs fine and takes 1.2 minutes for a batch to
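
A rough PySpark sketch of the setup being described, assuming the DStream-based Kinesis connector (spark-streaming-kinesis-asl) and illustrative names; on restart, StreamingContext.getOrCreate rebuilds the job from the checkpoint directory, which is the recovery path that is slow here:

    from pyspark import SparkContext, StorageLevel
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kinesis import KinesisUtils, InitialPositionInStream

    CHECKPOINT_DIR = 's3a://my-bucket/checkpoints/'   # placeholder

    def create_context():
        sc = SparkContext(appName='kinesis-to-parquet')
        ssc = StreamingContext(sc, 60)                 # 1-minute batches for the test run
        ssc.checkpoint(CHECKPOINT_DIR)
        stream = KinesisUtils.createStream(
            ssc, 'kinesis-to-parquet', 'my-stream',
            'https://kinesis.us-east-1.amazonaws.com', 'us-east-1',
            InitialPositionInStream.TRIM_HORIZON, 60,
            StorageLevel.MEMORY_AND_DISK_2)
        stream.pprint()  # the real job converts each batch to a DataFrame and writes Parquet to S3
        return ssc

    # The first run builds a fresh context; after a crash, this replays the checkpoint instead.
    ssc = StreamingContext.getOrCreate(CHECKPOINT_DIR, create_context)
    ssc.start()
    ssc.awaitTermination()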

Kinesis Streams and Flink

Submitted by 浪尽此生 on 2020-03-05 00:25:26
Question: I have a question regarding sharding data in a Kinesis stream. I would like to use a random partition key when sending user data to my Kinesis stream so that the data in the shards is evenly distributed. To keep this question simple, I would then like to aggregate the user data by keying off a userId in my Flink application. My question is this: if the shards are randomly partitioned, so that data for one userId is spread across multiple Kinesis shards, can Flink handle
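
For the producer side described in the question, a hedged boto3 sketch (Python, names illustrative) of writing with a random partition key so that records spread evenly across shards, while the userId stays in the payload for the Flink job to key on later:

    import json
    import uuid
    import boto3

    kinesis = boto3.client('kinesis', region_name='us-east-1')  # region is an assumption

    def send_user_event(user_id, payload):
        kinesis.put_record(
            StreamName='user-events',   # placeholder stream name
            Data=json.dumps({'userId': user_id, **payload}).encode('utf-8'),
            # Random key -> even shard distribution; the Flink job re-keys by userId downstream.
            PartitionKey=str(uuid.uuid4()),
        )

    send_user_event('user-42', {'action': 'click'})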

Read a Bytes image from Amazon Kinesis output in python

Submitted by ≡放荡痞女 on 2020-02-20 07:40:05
Question: I used imageio.get_reader(BytesIO(a), 'ffmpeg') to load a bytes image and save it as a normal image, but the error below is thrown when I read the image with imageio.get_reader(BytesIO(a), 'ffmpeg'): Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/home/tango/anaconda3/lib/python3.6/site-packages/imageio/core/functions.py", line 186, in get_reader return format.get_reader(request) File "/home/tango/anaconda3/lib/python3.6/site-packages/imageio/core/format.py", line 164
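
One hedged workaround sketch, assuming the payload a is a complete video fragment (for example an MKV/H.264 chunk retrieved from Kinesis Video Streams): spill the bytes to a temporary file with an explicit extension and hand the path to the ffmpeg reader, rather than passing a BytesIO object directly:

    import tempfile
    import imageio

    def save_frames(a, prefix='frame'):
        # Write the raw bytes to a temp file; the '.mkv' extension is an assumption
        # about the fragment format coming out of the stream.
        with tempfile.NamedTemporaryFile(suffix='.mkv', delete=False) as tmp:
            tmp.write(a)
            path = tmp.name
        reader = imageio.get_reader(path, 'ffmpeg')
        for i, frame in enumerate(reader):
            imageio.imwrite('{}_{:04d}.png'.format(prefix, i), frame)
        reader.close()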

Flink Kinesis Consumer not storing last successfully processed sequence nos

Submitted by *爱你&永不变心* on 2020-02-03 16:45:11
Question: We are using the Flink Kinesis Consumer to consume data from a Kinesis stream into our Flink application. The KCL library uses a DynamoDB table to store the last successfully processed Kinesis stream sequence numbers, so that the next time the application starts it resumes from where it left off. But it seems that the Flink Kinesis Consumer does not maintain any such sequence numbers in a persistent store. As a result, we need to rely on the ShardIteratorType (TRIM_HORIZON, LATEST, etc.) to decide where to resume Flink