Apache Flink basics

Apache Flink is a platform for distributed stream processing. It is an efficient tool for working with large amounts of data.

Flink originated at the Technical University of Berlin ("flink" means "quick" or "nimble" in German). It is a primary competitor to Apache Spark. The key difference is that Apache Flink is a pure streaming engine, while Spark Streaming performs fast micro-batch processing. At the same time, Flink supports batch mode as a special case of streaming.

Apache Flink provides APIs for Java and Scala. At its core are the DataSet and DataStream APIs for working with batches and streams, backed by a distributed runtime for dataflow programs.

Apache Flink libraries

Apache Flink ships a stack of high-level libraries for machine learning (FlinkML), graph processing (Gelly) and relational data (the Table API). At the time of writing, these libraries are in beta status.

Flink can be integrated with a number of well-known Big Data tools: it runs on YARN, works on top of the Hadoop Distributed File System (HDFS), and can stream data from Kafka, HBase and many other data sources.

Apache Flink Basics – sample code

Now let’s look at the code for the simplest task that illustrates Flink basics with the DataSet API. We will count the number of “Flink” occurrences in a sample text.

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.util.Collector;

final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

// Build a small in-memory DataSet of text lines.
DataSet<String> lines = env.fromElements("This is a test data for Apache Flink",
        "In the end Flink will print the number",
        "of Flink occurrences in this text.",
        "Good luck!");

DataSet<Integer> flinkCounts = lines
        .flatMap(new FlatMapFunction<String, Integer>() {
            @Override
            public void flatMap(String line, Collector<Integer> collector) throws Exception {
                // Emit a 1 for each line that mentions "Flink"
                // (each sample line contains it at most once).
                if (line.contains("Flink")) collector.collect(1);
            }
        }).reduce(new ReduceFunction<Integer>() {
            @Override
            public Integer reduce(Integer first, Integer second) throws Exception {
                // Sum the emitted 1s into a single count.
                return first + second;
            }
        });

flinkCounts.print();
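As a sanity check, the same count can be reproduced in plain Java with no Flink runtime: the filter stands in for the flatMap (one hit per matching line) and the count stands in for the summing reduce. The class and method names here are illustrative, not part of the Flink example:

```java
import java.util.List;

public class FlinkCountSketch {
    // Mirrors the pipeline above: flatMap emits a 1 per line
    // containing "Flink", reduce sums the 1s into one total.
    static int countFlinkLines(List<String> lines) {
        return (int) lines.stream()
                .filter(line -> line.contains("Flink"))
                .count();
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "This is a test data for Apache Flink",
                "In the end Flink will print the number",
                "of Flink occurrences in this text.",
                "Good luck!");
        System.out.println(countFlinkLines(lines)); // prints 3
    }
}
```

Running this prints 3, which is exactly what the Flink job above computes for the same four lines.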

We have a working example of the Word Count task with Flink; you can see it here.

There’s also an article comparing Flink and Spark.
