Question
I am looking for the simplest possible example of a hello-world experience with Apache Flink.
Assume I have just installed Flink on a clean box: what is the bare minimum I would need to do to 'make it do something'? I realize this is quite vague; here are some examples.
Three python examples from the terminal:
python -c "print('hello world')"
python hello_world.py
python -c "print(1+1)"
Of course a streaming application is a bit more complicated, but here is something similar that I did for spark streaming earlier:
https://spark.apache.org/docs/latest/streaming-programming-guide.html#a-quick-example
As you can see, these examples have some nice properties:
- They are minimal
- There are minimal dependencies on other tools/resources
- The logic can be trivially adjusted (e.g. a different number or a different separator)
So my question:
What is the simplest hello world example for Flink?
What I found so far are examples with 50 lines of code that you need to compile.
If this cannot be avoided due to point 3, then something that satisfies points 1 and 2 and uses (only) jars that are shipped by default, or easily available from a reputable source, would also be fine.
Answer 1:
In most big data and related frameworks, a Word Count program serves as the Hello World example. Below is the word count code in Flink:
import java.util.Arrays;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
// Note the escaped dot: String.split takes a regex, so "\\. " splits on a literal ". "
DataSet<String> text = env.fromCollection(Arrays.asList("This is line one. This is my line number 2. Third line is here".split("\\. ")));
DataSet<Tuple2<String, Integer>> wordCounts = text
    .flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
        @Override
        public void flatMap(String line, Collector<Tuple2<String, Integer>> out) throws Exception {
            for (String word : line.split(" ")) {
                out.collect(new Tuple2<>(word, 1));
            }
        }
    })
    .groupBy(0)
    .sum(1);
wordCounts.print();
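In the DataSet API, wordCounts.print() itself triggers execution and writes the result to stdout, so this example does not need a separate env.execute() call.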
Reading from a file
final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
env.setParallelism(1);
//The path of the file, as a URI
//(e.g., "file:///some/local/file" or "hdfs://host:port/file/path").
DataSet<String> text = env.readTextFile("/path/to/file");
DataSet<Tuple2<String, Integer>> wordCounts = text
.flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
@Override
public void flatMap(String line, Collector<Tuple2<String, Integer>> out) throws Exception {
for (String word : line.split(" ")) {
out.collect(new Tuple2<String, Integer>(word, 1));
}
}
})
.groupBy(0)
.sum(1);
wordCounts.print();
Do not handle the exception thrown by wordCounts.print() with a try/catch; instead add throws Exception to the enclosing method's signature.
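For example, a minimal class skeleton could look like the sketch below (the class name WordCountJob is just a placeholder, not something from the answer above):

// Sketch only: wrap the snippet above in a main method that declares "throws Exception".
public class WordCountJob {
    public static void main(String[] args) throws Exception {
        // ... paste the word count code from above here ...
    }
}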
Add the following dependency to the pom.xml.
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-java</artifactId>
<version>1.8.0</version>
</dependency>
Read about flatMap, groupBy, sum, and other Flink operations here: https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/
Flink Streaming documentation and examples: https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/datastream_api.html
Answer 2:
Ok, how about this:
// Needs: import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
public static void main(String[] args) throws Exception {
    final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.fromElements(1, 2, 3, 4, 5)
        .map(i -> 2 * i)
        .print();
    env.execute();
}
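Note that this snippet uses the DataStream API rather than the DataSet API from Answer 1, so the project presumably also needs the streaming artifact on its classpath; the Scala suffix and version below are assumptions chosen to match the pom entry above:

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java_2.11</artifactId>
    <version>1.8.0</version>
</dependency>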
Answer 3:
Minimal steps with standard resources
I am not sure if this will be the ultimate answer, but I have found that Flink typically ships with examples that allow for some easy interaction with minimal effort.
Here is a possible hello world example with the standard resources that come with Flink 1.9.1, based on the default word count example:
Make sure your Flink cluster is started and that you have three terminals open in the Flink directory.
- In terminal 1, start netcat listening on port 9000
nc -l 9000
- In the same terminal on the next line type some text and hit enter
Hello World
- In terminal 2 initiate the standard wordcount logic
./bin/flink run examples/streaming/SocketWindowWordCount.jar --port 9000
- In terminal 3 check the result of the count
tail -f log/flink-*-taskexecutor-*.out
You should now see:
Hello : 1
World : 1
That's it. From here you can type more into terminal 1, and when you check the logs again you will see the updated word count.
If you already did this once before and want to start fresh, you could clear the logs (assuming a sandbox environment) with rm log/flink-*-taskexecutor-*.out
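For reference, the core logic inside the bundled SocketWindowWordCount example looks roughly like the sketch below; the host, port, and 5-second window are assumptions for illustration, and the shipped source differs in its details:

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

// Sketch only: SocketWordCountSketch is a placeholder name, not the shipped class.
public class SocketWordCountSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Read lines from the same socket that `nc -l 9000` is serving.
        DataStream<String> text = env.socketTextStream("localhost", 9000, "\n");

        text.flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
                @Override
                public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
                    for (String word : line.split("\\s+")) {
                        out.collect(new Tuple2<>(word, 1));
                    }
                }
            })
            .keyBy(0)                      // group by the word
            .timeWindow(Time.seconds(5))   // count per 5-second window
            .sum(1)                        // sum the counts
            .print();

        env.execute("Socket word count sketch");
    }
}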
Source: https://stackoverflow.com/questions/59347209/simple-hello-world-example-for-flink