flink-cep

Flink program behaves differently with parallelism

こ雲淡風輕ζ submitted on 2021-02-19 08:55:07
Question: I am using Flink 1.4.1 with CEP. I have to calculate the lifetime order amount for the same user on each order. I send the orders Order A -> amount: 500, Order B -> amount: 200, Order C -> amount: 300, key the stream by user, and compute the total using state. For Order B, the result is sometimes 700 and sometimes 200, meaning that Order A is sometimes added into B and sometimes not. I am running the code with a parallelism of 6. Is this a parallelism issue or a distributed state issue? When I run the whole program with
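
The truncated question does not confirm the cause, so the following is only one possible reading. With a parallelism above 1, records that pass through several parallel upstream instances before the keyBy may reach the keyed operator in a different order than they were sent. A per-user running total then depends on arrival order. The plain-Java sketch below models this without any Flink dependency; the `Order` class, its field names, and the `HashMap` standing in for keyed state are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LifetimeAmountSketch {

    /** Hypothetical order record; the field names are assumptions. */
    static final class Order {
        final String id;
        final String user;
        final long amount;

        Order(String id, String user, long amount) {
            this.id = id;
            this.user = user;
            this.amount = amount;
        }
    }

    /** Returns "orderId=runningTotal" for each order, in the order it was processed. */
    static List<String> runningTotals(List<Order> arrivals) {
        // Stands in for per-key state: one running total per user.
        Map<String, Long> totalByUser = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (Order o : arrivals) {
            long total = totalByUser.merge(o.user, o.amount, Long::sum);
            out.add(o.id + "=" + total);
        }
        return out;
    }

    public static void main(String[] args) {
        Order a = new Order("A", "u1", 500);
        Order b = new Order("B", "u1", 200);
        Order c = new Order("C", "u1", 300);

        // Sent order A, B, C: B sees A's amount.
        System.out.println(runningTotals(List.of(a, b, c))); // [A=500, B=700, C=1000]

        // B overtakes A: B sees only itself, A then reports 700.
        System.out.println(runningTotals(List.of(b, a, c))); // [B=200, A=700, C=1000]
    }
}
```

In both runs the final total is 1000; only the intermediate values differ, which matches the 700-versus-200 symptom described for Order B.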

Flink count distinct issue

怎甘沉沦 submitted on 2021-02-11 14:26:39
Question: We currently use a tumbling window to count distinct values. The issue is that if we extend the tumbling window from a day to a month, we can no longer get the distinct count as of now. That is, with a one-month tumbling window, the number we get is only produced on the 1st of each month. How can I get the current distinct count as of now (today is Mar 9)? package flink.trigger; import org.apache.flink.api.common.state.ReducingState; import org.apache.flink.api.common.state.ReducingStateDescriptor; import org
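
The general idea behind getting an "as of now" number from a long window is to emit the intermediate result before the window closes, which in Flink is the job of a window trigger that fires early. The plain-Java sketch below only models that idea and uses no Flink API; the class name `RunningDistinctSketch` and its methods are hypothetical. It keeps the set of values seen in the current window, reports the running distinct count on every element, and resets at the window boundary.

```java
import java.util.HashSet;
import java.util.Set;

public class RunningDistinctSketch {

    // Values seen so far in the current (for example, monthly) window.
    private final Set<String> seen = new HashSet<>();

    /** Adds one value and returns the distinct count so far, i.e. an early result. */
    int add(String value) {
        seen.add(value);
        return seen.size();
    }

    /** Called at the window boundary: returns the final count and clears the state. */
    int closeWindow() {
        int finalCount = seen.size();
        seen.clear();
        return finalCount;
    }

    public static void main(String[] args) {
        RunningDistinctSketch window = new RunningDistinctSketch();
        System.out.println(window.add("user-1")); // 1
        System.out.println(window.add("user-2")); // 2
        System.out.println(window.add("user-1")); // 2, a duplicate does not change the count
        System.out.println(window.closeWindow()); // 2, final result for the window
        System.out.println(window.add("user-3")); // 1, first value of the next window
    }
}
```

Emitting on every element is the simplest variant; firing on a fixed interval instead would reduce output volume at the cost of slightly staler numbers.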

Apache Flink CEP how to detect if event did not occur within x seconds?

扶醉桌前 submitted on 2021-02-07 09:39:40
Question: For example, A should be followed by B within 10 seconds. I know how to track whether this DID occur (.next, .within), but I want to send an alert if B never happens within the window. public static void main(String[] args) throws Exception { final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); // checkpointing is required for exactly-once or at-least-once guarantees // env.enableCheckpointing(1000); final RMQConnectionConfig connectionConfig = new
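
Detecting that B did not arrive comes down to a deadline per pending A: remember when each A was seen, cancel the deadline when B shows up, and alert once time moves past the deadline. This is the same idea as registering a timer in a keyed process function. The sketch below is plain Java with no Flink dependency; the class `AbsenceTimeoutSketch`, its method names, and the alert text are hypothetical, and it assumes timestamps arrive in non-decreasing order.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AbsenceTimeoutSketch {

    static final long TIMEOUT_MS = 10_000;

    // One pending deadline per key for which an A was seen but no B yet.
    private final Map<String, Long> deadlineByKey = new LinkedHashMap<>();
    private final List<String> alerts = new ArrayList<>();

    /** An A event starts (or restarts) the 10-second wait for this key. */
    void onA(String key, long timestampMs) {
        advanceTo(timestampMs);
        deadlineByKey.put(key, timestampMs + TIMEOUT_MS);
    }

    /** A B event cancels the pending wait, unless it has already expired. */
    void onB(String key, long timestampMs) {
        advanceTo(timestampMs);
        deadlineByKey.remove(key);
    }

    /** Moves time forward and alerts for every deadline that has been reached. */
    void advanceTo(long nowMs) {
        Iterator<Map.Entry<String, Long>> it = deadlineByKey.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> pending = it.next();
            // Assumption: a B arriving exactly at the deadline counts as too late.
            if (pending.getValue() <= nowMs) {
                alerts.add("no B for " + pending.getKey() + " by " + pending.getValue());
                it.remove();
            }
        }
    }

    List<String> alerts() {
        return alerts;
    }

    public static void main(String[] args) {
        AbsenceTimeoutSketch detector = new AbsenceTimeoutSketch();
        detector.onA("k1", 0);
        detector.onB("k1", 4_000);      // B arrived in time, no alert
        detector.onA("k1", 21_000);
        detector.advanceTo(31_000);     // 10 seconds passed without B
        System.out.println(detector.alerts()); // [no B for k1 by 31000]
    }
}
```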

Using Python user defined function in a Java Flink Job

纵然是瞬间 submitted on 2021-02-05 07:19:24
Question: Is there any way to use a Python user-defined function within a Java Flink job, or any way to pass, for example, the result of a transformation done by Flink in Java to a Python user-defined function in order to apply some machine learning? I know that from PyFlink you can do something like this: table_env.register_java_function("hash_code", "my.java.function.HashCode") But I need something similar in the other direction, adding the Python function from Java, or how can I pass the result of a java

Causes of increased CPU load on the machine when Flink runs for more than 12 hours

对着背影说爱祢 submitted on 2021-01-29 20:41:43
Question: I have a Flink job with parallelism set to 6 and a few simple transformations. The issue is that after Flink has been running for more than 12 hours, for example, the load on the machine starts to increase. I thought that was because of the traffic into Flink during some hours of the day, but when the traffic goes down, the load on the machine stays elevated: lower than at its peak, but still higher than at the start. Use cases: DataStream<Event> from_source = rabbitConsumer .flatMap(new

How to check whether a DataStream in Flink is empty or has data

╄→гoц情女王★ submitted on 2021-01-29 10:33:01
Question: I am new to Apache Flink. I have a DataStream that goes through a process function: if certain conditions are met, the record is valid, and if not, I write it to a side output. I am able to print the DataStream. Is it possible to check whether the DataStream is empty or null? I tried the datastream.equals(null) method, but it does not work. Please suggest how to tell whether a DataStream is empty.
Answer 1: By "empty", I assume you mean that no data is flowing. What are

About StateTtlConfig

假装没事ソ submitted on 2021-01-29 10:02:24
Question: I'm configuring a StateTtlConfig for MapState. My goal is for the objects in the state to live for, say, 3 hours and then disappear from the state and be handed to the GC, releasing some memory; I think the checkpoints should shrink as well. I had this configuration before, and it did not seem to work because the checkpoints kept growing: private final StateTtlConfig ttlConfig = StateTtlConfig.newBuilder(org.apache.flink.api
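
Since the configuration in the question is cut off, the following is only a general observation, not a diagnosis. In a TTL scheme, an entry that has logically expired is not necessarily removed from storage: if expired entries are dropped only when they are read, keys that are never read again keep occupying space (and snapshot size) until some cleanup pass visits them. The plain-Java sketch below models that difference; `TtlMapSketch` and its methods are hypothetical and do not reproduce Flink's implementation.

```java
import java.util.HashMap;
import java.util.Map;

public class TtlMapSketch<K, V> {

    private static final class Entry<V> {
        final V value;
        final long writtenAtMs;

        Entry(V value, long writtenAtMs) {
            this.value = value;
            this.writtenAtMs = writtenAtMs;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final long ttlMs;

    TtlMapSketch(long ttlMs) {
        this.ttlMs = ttlMs;
    }

    void put(K key, V value, long nowMs) {
        entries.put(key, new Entry<>(value, nowMs));
    }

    /** Lazy expiry: an expired entry is hidden and removed only when it is read. */
    V get(K key, long nowMs) {
        Entry<V> e = entries.get(key);
        if (e == null) {
            return null;
        }
        if (nowMs - e.writtenAtMs >= ttlMs) {
            entries.remove(key);
            return null;
        }
        return e.value;
    }

    /** Explicit cleanup pass: removes every expired entry and returns how many were removed. */
    int cleanup(long nowMs) {
        int before = entries.size();
        entries.values().removeIf(e -> nowMs - e.writtenAtMs >= ttlMs);
        return before - entries.size();
    }

    /** Number of entries physically stored, expired or not. */
    int storedEntries() {
        return entries.size();
    }

    public static void main(String[] args) {
        long threeHoursMs = 3L * 60 * 60 * 1000;
        TtlMapSketch<String, String> state = new TtlMapSketch<>(threeHoursMs);
        state.put("a", "x", 0);
        state.put("b", "y", 0);

        System.out.println(state.storedEntries());        // 2, both expired but still stored
        System.out.println(state.get("a", threeHoursMs)); // null, and "a" is removed on read
        System.out.println(state.storedEntries());        // 1, "b" was never read
        System.out.println(state.cleanup(threeHoursMs));  // 1, the sweep removes "b"
        System.out.println(state.storedEntries());        // 0
    }
}
```

StateTtlConfig offers cleanup-strategy settings for this purpose; which ones are available and enabled by default depends on the Flink version and state backend, so the documentation for the version in use is the place to check.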

Detect absence of a certain event

萝らか妹 submitted on 2021-01-28 17:50:06
Question: In the FlinkCEP documentation, I found that I can enforce that a particular event does not occur between two other events using notFollowedBy or notNext. However, I was wondering whether I could detect the absence of a certain event after a time X. For example, if an event A is not followed by another event A within 10 seconds, fire an alert or do something. Is it possible to define a FlinkCEP pattern that captures this situation? Thanks in advance, Humberto
Answer 1: Although Flink CEP does not