Category: BIG DATA

FIRST SPARK JAVA EXAMPLE

As we said before, Apache Spark provides APIs for computation in several programming languages. Today we will create your first Spark Java example program. PREPARATION First, you need to have Maven installed. It is a build tool that we will use for dependency management. You also have to download and unzip the Spark package. You can […]

thaiphan 

INSTALL APACHE SPARK

To install Apache Spark on your local machine, you simply need to download a package from the official site. For this example, choose version 1.5.1 and the package type for Hadoop 2.6. Then click on the download link (see the screenshot). We will not use HDFS, so the installation process will be very easy and should take about a […]

thaiphan 

APACHE FLINK BASICS

Apache Flink is a platform for distributed stream data processing. It is an efficient tool for manipulating large amounts of data. Flink was created at the Technical University of Berlin (“Flink” means “quick” in German). It is a primary competitor to Apache Spark. The point is that Apache Flink works as a pure streaming engine, when […]

thaiphan 

FIXING FLINK’S JAVA.LANG.RUNTIMEEXCEPTION: NO NEW DATA SINKS HAVE BEEN DEFINED SINCE THE LAST EXECUTION.

Here is a typical error for Apache Flink rookies:
Exception in thread "main" java.lang.RuntimeException: No new data sinks have been defined since the last execution. The last execution refers to the latest call to 'execute()', 'count()', 'collect()', or 'print()'.
    at org.apache.flink.api.java.ExecutionEnvironment.createProgramPlan(ExecutionEnvironment.java:910)
    at org.apache.flink.api.java.ExecutionEnvironment.createProgramPlan(ExecutionEnvironment.java:893)
    at org.apache.flink.api.java.LocalEnvironment.execute(LocalEnvironment.java:50)
    at org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:789)
    …
It is caused by a single line in […]

thaiphan 

SPARK SHELL

Let’s take a closer look at the Apache Spark shell. Its main purpose is to showcase the Spark API and its capabilities. People do not usually turn to the Spark shell for Big Data tasks, but it is still a powerful tool for data processing. To install Spark, use our guide. As we mentioned in the previous lesson, you can start […]

thaiphan 

WHAT IS APACHE SPARK?

The Apache Spark project keeps pushing past its limits. It quickly became a trending buzzword in the Big Data field. So, what is Apache Spark? First of all, it is a very fast and efficient framework for big data processing. ABOUT APACHE SPARK Since 2014, when it was licensed under Apache 2.0, Spark has kept evolving. Now it consists of […]

thaiphan