flink-streaming

Apache Flink on Kubernetes - Resume job if JobManager crashes

Posted by 蹲街弑〆低调 on 2019-12-30 02:33:05
Question: I want to run a Flink job on Kubernetes, using a (persistent) state backend. Crashing TaskManagers seem to be no issue, as they can ask the JobManager which checkpoint they need to recover from, if I understand correctly. A crashing JobManager seems to be a bit more difficult. On the FLIP-6 page I read that ZooKeeper is needed so that the JobManager knows which checkpoint to recover from, and for leader election. Seeing as Kubernetes will restart the JobManager whenever it crashes
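
The usual way to let a restarted JobManager find the latest checkpoint is ZooKeeper-based HA plus durable storage for checkpoints and HA metadata, as the FLIP-6 page describes. A hedged sketch of the relevant flink-conf.yaml keys (hostnames, bucket, and paths below are placeholders):

```yaml
# flink-conf.yaml (sketch, placeholder values)
high-availability: zookeeper
high-availability.zookeeper.quorum: zk-0.zk:2181,zk-1.zk:2181,zk-2.zk:2181
high-availability.cluster-id: /my-flink-job            # isolates this job's metadata in ZooKeeper
high-availability.storageDir: s3://my-bucket/flink/ha  # must survive JobManager pod restarts

state.backend: rocksdb
state.checkpoints.dir: s3://my-bucket/flink/checkpoints
```

With this in place, Kubernetes restarting the JobManager pod is enough for it to re-acquire leadership and resume from the latest completed checkpoint.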

Flink 1.2 does not start in HA Cluster mode

Posted by 守給你的承諾、 on 2019-12-25 04:59:05
Question: I've installed Flink 1.2 locally in HA cluster mode (2 JobManagers, 1 TaskManager), and it kept refusing to actually start in this mode, showing the "Starting cluster." message instead of "Starting HA cluster with 2 masters and 1 peers in ZooKeeper quorum." Apparently bin/config.sh reads the configuration like this: # High availability if [ -z "${HIGH_AVAILABILITY}" ]; then HIGH_AVAILABILITY=$(readFromConfig ${KEY_HIGH_AVAILABILITY} "" "${YAML_CONF}") if [ -z "${HIGH_AVAILABILITY}" ]; then # Try
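
For reference, the KEY_HIGH_AVAILABILITY that the script reads corresponds to the high-availability key in conf/flink-conf.yaml, and the "Starting HA cluster …" banner also depends on the masters file. A hedged sketch of a local Flink 1.2 HA configuration (all values are placeholders):

```yaml
# conf/flink-conf.yaml (sketch)
high-availability: zookeeper
high-availability.zookeeper.quorum: localhost:2181
high-availability.zookeeper.path.root: /flink
high-availability.storageDir: file:///tmp/flink/ha

# conf/masters lists one JobManager per line, e.g.:
#   localhost:8081
#   localhost:8082
```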

An exponentially decaying moving average over a hopping window in Flink SQL: Casting time

Posted by 大兔子大兔子 on 2019-12-24 19:40:38
Question: Now that we have SQL with fancy windowing in Flink, I'm trying to implement the decaying moving average referred to as "what will be possible in future Flink releases for both the Table API and SQL" in their SQL roadmap/preview post from 2017-03: table .window(Slide over 1.hour every 1.second as 'w) .groupBy('productId, 'w) .select( 'w.end, 'productId, ('unitPrice * ('rowtime - 'w.start).exp() / 1.hour).sum / (('rowtime - 'w.start).exp() / 1.hour).sum) Here is my attempt (inspired as well by the calcite
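
As a point of comparison, the same weighting can be computed outside the Table API. Below is a hedged DataStream sketch rather than SQL; the record type, field names, and the one-hour decay constant are assumptions, and a one-hour window sliding every second is expensive in practice:

```scala
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.streaming.api.windowing.windows.TimeWindow
import org.apache.flink.util.Collector

object DecayingAverage {
  case class Price(productId: String, unitPrice: Double, rowtime: Long) // rowtime in millis

  def decayingAverage(prices: DataStream[Price]): DataStream[(String, Double)] =
    prices
      .keyBy(_.productId)
      .timeWindow(Time.hours(1), Time.seconds(1))
      .apply { (productId: String, window: TimeWindow, events: Iterable[Price],
                out: Collector[(String, Double)]) =>
        // weight = exp((t - windowStart) / 1h), so newer prices count more
        val weighted = events.map { p =>
          val w = math.exp((p.rowtime - window.getStart).toDouble / (60 * 60 * 1000))
          (p.unitPrice * w, w)
        }
        val den = weighted.map(_._2).sum
        if (den > 0) out.collect((productId, weighted.map(_._1).sum / den))
      }
}
```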

Sorting union of streams to identify user sessions in Apache Flink

Posted by 人盡茶涼 on 2019-12-24 09:35:10
Question: I have two streams of events: L = (l1, l3, l8, ...), which is sparser and represents user logins to an IP, and E = (e2, e4, e5, e9, ...), which is a stream of logs from that particular IP; a lower index represents an earlier timestamp... If we joined the two streams together and sorted them by time, we would get: l1, e2, l3, e4, e5, l8, e9, ... Would it be possible to implement custom Window / Trigger functions to group the events into sessions (the time between logins of different users): l1 - l3 : e2 l3 - l8 : e4, e5 l8 -
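
One hedged alternative to a custom Window/Trigger (all type and field names below are assumptions, and it presumes events arrive per IP in timestamp order, so the "sorted by time" part would still need event-time handling upstream): key the unioned stream by IP and let a KeyedProcessFunction buffer log events until the next login closes the session.

```scala
import scala.collection.JavaConverters._
import org.apache.flink.api.common.state.{ListState, ListStateDescriptor, ValueState, ValueStateDescriptor}
import org.apache.flink.streaming.api.functions.KeyedProcessFunction
import org.apache.flink.streaming.api.scala._
import org.apache.flink.util.Collector

object Sessions {
  sealed trait Ev { def ip: String; def ts: Long }
  case class Login(ip: String, ts: Long, user: String) extends Ev
  case class LogLine(ip: String, ts: Long, line: String) extends Ev

  def sessions(logins: DataStream[Ev], logs: DataStream[Ev]): DataStream[(Login, List[LogLine])] =
    logins.union(logs)
      .keyBy(_.ip)
      .process(new KeyedProcessFunction[String, Ev, (Login, List[LogLine])] {
        lazy val openLogin: ValueState[Login] = getRuntimeContext.getState(
          new ValueStateDescriptor("open-login", classOf[Login]))
        lazy val buffered: ListState[LogLine] = getRuntimeContext.getListState(
          new ListStateDescriptor("buffered-logs", classOf[LogLine]))

        override def processElement(e: Ev,
            ctx: KeyedProcessFunction[String, Ev, (Login, List[LogLine])]#Context,
            out: Collector[(Login, List[LogLine])]): Unit = e match {
          case l: Login =>
            // a new login closes the previous session for this IP
            Option(openLogin.value()).foreach(prev => out.collect((prev, buffered.get().asScala.toList)))
            buffered.clear()
            openLogin.update(l)
          case line: LogLine =>
            buffered.add(line)
        }
      })
}
```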

How to read and write to HBase in a Flink streaming job

Posted by 强颜欢笑 on 2019-12-24 09:04:36
Question: If we have to read from and write to HBase in a streaming application, how could we do that? For writes we open a connection via the open method; how could we open a connection for reads? object test { if (args.length != 11) { //print args System.exit(1) } val Array() = args println("Parameters Passed " + ...); val env = StreamExecutionEnvironment.getExecutionEnvironment val properties = new Properties() properties.setProperty("bootstrap.servers", metadataBrokerList) properties.setProperty("zookeeper
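
A hedged sketch of the read side, mirroring what a sink's open method does for writes: open the HBase connection once per parallel task in RichFlatMapFunction.open() and look rows up per element. The table, column family, and qualifier names are placeholders.

```scala
import org.apache.flink.api.common.functions.RichFlatMapFunction
import org.apache.flink.configuration.Configuration
import org.apache.flink.util.Collector
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{Connection, ConnectionFactory, Get, Table}
import org.apache.hadoop.hbase.util.Bytes

class HBaseLookup extends RichFlatMapFunction[String, (String, String)] {
  @transient private var connection: Connection = _
  @transient private var table: Table = _

  override def open(parameters: Configuration): Unit = {
    // one connection per parallel task, created on the worker,
    // so nothing HBase-related has to be serialized with the job graph
    connection = ConnectionFactory.createConnection(HBaseConfiguration.create())
    table = connection.getTable(TableName.valueOf("my_table"))
  }

  override def flatMap(rowKey: String, out: Collector[(String, String)]): Unit = {
    val result = table.get(new Get(Bytes.toBytes(rowKey)))
    if (!result.isEmpty)
      out.collect((rowKey, Bytes.toString(result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("col")))))
  }

  override def close(): Unit = {
    if (table != null) table.close()
    if (connection != null) connection.close()
  }
}
```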

Enrich fast stream keyed by (X,Y) with a slowly changing stream keyed by (X) in Flink

Posted by 故事扮演 on 2019-12-24 05:15:09
Question: I need to enrich my fast-changing streamA, keyed by (userId, startTripTimestamp), with a slowly changing streamB keyed by (userId). I use Flink 1.8 with the DataStream API. I am considering 2 approaches: Broadcast streamB and join the streams by userId and the most recent timestamp. Would it be the equivalent of a DynamicTable from the Table API? I can see some downsides of this solution: streamB needs to fit into the RAM of each worker node, which increases RAM utilization as the whole streamB needs to be stored in the RAM of each
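
A hedged sketch of the broadcast approach being considered (all record types and field names are assumptions): streamB is broadcast with a MapStateDescriptor keyed by userId, and a KeyedBroadcastProcessFunction looks it up for each trip event.

```scala
import org.apache.flink.api.common.state.MapStateDescriptor
import org.apache.flink.streaming.api.functions.co.KeyedBroadcastProcessFunction
import org.apache.flink.streaming.api.scala._
import org.apache.flink.util.Collector

object Enrichment {
  case class TripEvent(userId: String, startTripTimestamp: Long, distanceKm: Double)
  case class UserInfo(userId: String, tier: String)
  case class EnrichedTrip(trip: TripEvent, info: Option[UserInfo])

  val userInfoDescriptor =
    new MapStateDescriptor[String, UserInfo]("userInfo", classOf[String], classOf[UserInfo])

  def enrich(streamA: DataStream[TripEvent], streamB: DataStream[UserInfo]): DataStream[EnrichedTrip] =
    streamA
      .keyBy(a => (a.userId, a.startTripTimestamp))
      .connect(streamB.broadcast(userInfoDescriptor))
      .process(new KeyedBroadcastProcessFunction[(String, Long), TripEvent, UserInfo, EnrichedTrip] {

        override def processElement(trip: TripEvent,
            ctx: KeyedBroadcastProcessFunction[(String, Long), TripEvent, UserInfo, EnrichedTrip]#ReadOnlyContext,
            out: Collector[EnrichedTrip]): Unit = {
          // read-only lookup of the broadcast state on the keyed (fast) side
          val info = ctx.getBroadcastState(userInfoDescriptor).get(trip.userId)
          out.collect(EnrichedTrip(trip, Option(info)))
        }

        override def processBroadcastElement(info: UserInfo,
            ctx: KeyedBroadcastProcessFunction[(String, Long), TripEvent, UserInfo, EnrichedTrip]#Context,
            out: Collector[EnrichedTrip]): Unit =
          // every parallel instance keeps the full, latest copy of streamB (the RAM cost noted above)
          ctx.getBroadcastState(userInfoDescriptor).put(info.userId, info)
      })
}
```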

Is reuse of a stream a copy of the stream or not?

Posted by 江枫思渺然 on 2019-12-24 02:52:42
Question: For example, there is a keyed stream: val keyedStream: KeyedStream[event, Key] = env .addSource(...) .keyBy(...) // several transformations on the same stream keyedStream.map(....) keyedStream.window(....) keyedStream.split(....) keyedStream...(....) I think this is reuse of the same stream in Flink. What I found is that when I reuse it, the content of the stream is not affected by the other transformations, so I think it is a copy of the same stream. But I don't know whether that is right or not. If yes,
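
A minimal, self-contained sketch of that fan-out behaviour (toy data, not the job from the question): each transformation applied to the same stream receives every element, and neither branch affects the other.

```scala
import org.apache.flink.streaming.api.scala._

object FanOutDemo extends App {
  val env = StreamExecutionEnvironment.getExecutionEnvironment

  val keyed = env.fromElements(1, 2, 3, 4).keyBy(_ % 2)

  // both branches consume the full stream; applying one transformation
  // does not "use up" or mutate the stream for the other
  keyed.map(_ * 2).print()      // emits 2, 4, 6, 8
  keyed.map(x => x * x).print() // emits 1, 4, 9, 16

  env.execute("fan-out demo")
}
```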

Adding patterns dynamically in Apache Flink without restarting the job

Posted by 人走茶凉 on 2019-12-24 01:17:22
Question: My use case is that I want to apply different CEP patterns to the same datastream. The CEP patterns arrive dynamically, and I want them to be added to Flink without having to restart the job. While all conditions can be handled via custom classes that implement IterativeCondition, my main problem is that the temporal condition accepts only a fixed TimeWindow, which cannot be handled that way. Is there some way that the value passed to .within() can be set based on the input elements? Something similar was asked here:
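
Flink CEP's .within() takes a fixed Time, so one hedged workaround sketched below sidesteps CEP entirely: a KeyedProcessFunction registers a per-element event-time timer whose duration is read from the element itself. The event fields and the single-pending-event logic are assumptions.

```scala
import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
import org.apache.flink.streaming.api.functions.KeyedProcessFunction
import org.apache.flink.util.Collector

case class Event(key: String, timestamp: Long, timeoutMillis: Long)
case class TimedOut(key: String, startedAt: Long)

class DynamicTimeout extends KeyedProcessFunction[String, Event, TimedOut] {
  lazy val pending: ValueState[Event] = getRuntimeContext.getState(
    new ValueStateDescriptor("pending", classOf[Event]))

  override def processElement(e: Event,
      ctx: KeyedProcessFunction[String, Event, TimedOut]#Context,
      out: Collector[TimedOut]): Unit = {
    pending.update(e)
    // the "within" duration comes from the element, not from a compile-time constant
    ctx.timerService().registerEventTimeTimer(e.timestamp + e.timeoutMillis)
  }

  override def onTimer(timerTs: Long,
      ctx: KeyedProcessFunction[String, Event, TimedOut]#OnTimerContext,
      out: Collector[TimedOut]): Unit = {
    val e = pending.value()
    if (e != null && e.timestamp + e.timeoutMillis == timerTs) {
      out.collect(TimedOut(e.key, e.timestamp)) // nothing cleared the pending event in time
      pending.clear()
    }
  }
}
```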

WindowFunction cannot be applied using the WindowedStream.apply() function

Posted by 不打扰是莪最后的温柔 on 2019-12-24 00:54:45
Question: I'm relatively new to using Apache Flink and Scala, and am just getting to grips with some of the basic functionality. I've hit a wall trying to implement a custom WindowFunction. The problem is that when I try to implement a custom WindowFunction, the IDE gives an error on the ".apply()" call: Cannot resolve symbol apply Unspecified value parameters: foldFunction: (NotInferedR, Data.Fingerprint) => NotInferedR, windowFunction: (Tuple, TimeWindow, Iterable[NotInferedR], Collector
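
For comparison, a hedged sketch of a custom WindowFunction that compiles against the Scala API. The error above suggests the elements are Data.Fingerprint values keyed by a field position, so those types are stand-ins; the type parameter order is IN, OUT, KEY, W.

```scala
import org.apache.flink.api.java.tuple.Tuple
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.scala.function.WindowFunction
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.streaming.api.windowing.windows.TimeWindow
import org.apache.flink.util.Collector

object FingerprintWindows {
  case class Fingerprint(deviceId: String, value: Double) // stand-in for Data.Fingerprint

  // the key type is a Java Tuple because keyBy below uses a field position
  class FingerprintWindowFunction extends WindowFunction[Fingerprint, String, Tuple, TimeWindow] {
    override def apply(key: Tuple, window: TimeWindow,
                       input: Iterable[Fingerprint], out: Collector[String]): Unit =
      out.collect(s"$key: ${input.size} fingerprints in [${window.getStart}, ${window.getEnd})")
  }

  def windowed(stream: DataStream[Fingerprint]): DataStream[String] =
    stream
      .keyBy(0)                             // positional key => KeyedStream[Fingerprint, Tuple]
      .timeWindow(Time.minutes(1))
      .apply(new FingerprintWindowFunction) // resolves once all four type parameters line up
}
```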

How to debug a serialization exception in Flink?

Posted by 拈花ヽ惹草 on 2019-12-23 20:06:19
Question: I've encountered several serialization exceptions, and I did some searching on the internet and in Flink's docs; there are some well-known solutions like marking fields transient, extending Serializable, etc. Usually the origin of the exception is very clear, but in my case I am unable to find where exactly something is not serializable. Q: How should I debug this kind of exception? A.scala: class executor(val sink: SinkFunction[List[String]]) { def exe(): Unit = { xxx.....addSink(sinks) } } B.scala: class Main extends App { def
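
One hedged way to narrow such failures down (a generic sketch, not a Flink-specific API): serialize the suspect object yourself with plain Java serialization so the NotSerializableException names the offending class, and run the JVM with -Dsun.io.serialization.extendedDebugInfo=true to get the full field path in the stack trace.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

object SerializationCheck {
  // Throws NotSerializableException if obj, or anything it references, cannot be serialized.
  def ensureSerializable(obj: AnyRef): Unit = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try out.writeObject(obj)
    finally out.close()
  }
}

// e.g. in a quick test before submitting the job (mySink is a placeholder):
//   SerializationCheck.ensureSerializable(new executor(mySink))
```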