Question
I am looking for the simplest possible example of a hello-world experience with Apache Flink.
Assume I have just installed Flink on a clean box: what is the bare minimum I would need to do to 'make it do something'? I realize this is quite vague; here are some examples.
Three python examples from the terminal:
python -c "print('hello world')"
python hello_world.py
python -c "print(1+1)"
Of course a streaming application is a bit more complicated, but here is something similar that I did for spark streaming earlier:
https://spark.apache.org/docs/latest/streaming-programming-guide.html#a-quick-example
As you can see, these examples have some nice properties:
- They are minimal
- There are minimal dependencies on other tools/resources
- The logic can be trivially adjusted (e.g. a different number or a different separator)
So my question:
What is the simplest hello world example for Flink?
What I found so far are examples with 50 lines of code that you need to compile.
If this cannot be avoided due to point 3, then something that satisfies points 1 and 2 and uses (only) jars that are shipped by default, or easily available from a reputable source, would also be fine.
Answer 1:
In most big data and related frameworks, a Word Count program serves as the Hello World example. Below is the word count code in Flink:
import java.util.Arrays;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
// Note the escaped dot: String.split takes a regex, so "\\. " splits on a literal ". "
DataSet<String> text = env.fromCollection(Arrays.asList("This is line one. This is my line number 2. Third line is here".split("\\. ")));
DataSet<Tuple2<String, Integer>> wordCounts = text
    .flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
        @Override
        public void flatMap(String line, Collector<Tuple2<String, Integer>> out) throws Exception {
            for (String word : line.split(" ")) {
                out.collect(new Tuple2<>(word, 1));
            }
        }
    })
    .groupBy(0)
    .sum(1);
wordCounts.print();
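In the DataSet API, wordCounts.print() itself triggers execution and writes the result to stdout, so this example does not need a separate env.execute() call.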
Reading from a file
final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
env.setParallelism(1);
//The path of the file, as a URI
//(e.g., "file:///some/local/file" or "hdfs://host:port/file/path").
DataSet<String> text = env.readTextFile("/path/to/file");
DataSet<Tuple2<String, Integer>> wordCounts = text
.flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
@Override
public void flatMap(String line, Collector<Tuple2<String, Integer>> out) throws Exception {
for (String word : line.split(" ")) {
out.collect(new Tuple2<String, Integer>(word, 1));
}
}
})
.groupBy(0)
.sum(1);
wordCounts.print();
Do not handle the exception thrown by wordCounts.print() with a try/catch; instead add throws Exception to the enclosing method's signature.
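For example, a minimal class skeleton could look like the sketch below (the class name WordCountJob is just a placeholder, not something from the answer above):

// Sketch only: wrap the snippet above in a main method that declares "throws Exception".
public class WordCountJob {
    public static void main(String[] args) throws Exception {
        // ... paste the word count code from above here ...
    }
}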
Add the following dependency to the pom.xml.
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-java</artifactId>
<version>1.8.0</version>
</dependency>
Read about flatMap, groupBy, sum, and other Flink operations here: https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/
Flink Streaming documentation and examples: https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/datastream_api.html
Answer 2:
Ok, how about this:
// Needs: import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
public static void main(String[] args) throws Exception {
    final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.fromElements(1, 2, 3, 4, 5)
        .map(i -> 2 * i)
        .print();
    env.execute();
}
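Note that this snippet uses the DataStream API rather than the DataSet API from Answer 1, so the project presumably also needs the streaming artifact on its classpath; the Scala suffix and version below are assumptions chosen to match the pom entry above:

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java_2.11</artifactId>
    <version>1.8.0</version>
</dependency>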
Answer 3:
Minimal steps with standard resources
I am not sure if this will be the ultimate answer, but I have found that Flink typically ships with examples that allow for some easy interaction with minimal effort.
Here is a possible hello world example with the standard resources that come with Flink 1.9.1, based on the default word count example:
Make sure your Flink cluster is started and that you have three terminals open in the Flink directory.
- In terminal 1, start netcat listening on port 9000
nc -l 9000
- In the same terminal on the next line type some text and hit enter
Hello World
- In terminal 2 initiate the standard wordcount logic
./bin/flink run examples/streaming/SocketWindowWordCount.jar --port 9000
- In terminal 3 check the result of the count
tail -f log/flink-*-taskexecutor-*.out
You should now see:
Hello : 1
World : 1
That's it. From here you can type more into terminal 1, and when you check the logs again you will see the updated word count.
If you already did this once before and want to start fresh, you could clear the logs (assuming a sandbox environment) with rm log/flink-*-taskexecutor-*.out
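For reference, the core logic inside the bundled SocketWindowWordCount example looks roughly like the sketch below; the host, port, and 5-second window are assumptions for illustration, and the shipped source differs in its details:

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

// Sketch only: SocketWordCountSketch is a placeholder name, not the shipped class.
public class SocketWordCountSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Read lines from the same socket that `nc -l 9000` is serving.
        DataStream<String> text = env.socketTextStream("localhost", 9000, "\n");

        text.flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
                @Override
                public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
                    for (String word : line.split("\\s+")) {
                        out.collect(new Tuple2<>(word, 1));
                    }
                }
            })
            .keyBy(0)                      // group by the word
            .timeWindow(Time.seconds(5))   // count per 5-second window
            .sum(1)                        // sum the counts
            .print();

        env.execute("Socket word count sketch");
    }
}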
Source: https://stackoverflow.com/questions/59347209/simple-hello-world-example-for-flink