First Spark Java Example


As mentioned earlier, Apache Spark provides APIs for several programming languages. Today we will create your first Spark program in Java.

Preparation

First, you need to have Maven installed. It is a build tool that we will use for dependency management. You also need to download and unzip the Spark package. You can use your favorite Java IDE.

Create a new Maven project and call it “spark01”. In this example we will work with the same README.md file as in the previous lesson. Copy it from the Spark directory into the project directory. Then create a new class, “FirstSparkJavaApp”.

Your project structure should look like this:

spark01
---src
------main
---------java
------------FirstSparkJavaApp.java
---pom.xml
---README.md

Maven file listing (pom.xml):

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>spark01</groupId>
    <artifactId>spark01</artifactId>
    <version>1.0</version>

    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.10</artifactId>
            <version>1.5.1</version>
        </dependency>
    </dependencies>
    
    <build>
        <pluginManagement>
            <plugins>
                <plugin>
                    <artifactId>maven-compiler-plugin</artifactId>
                    <version>2.3.2</version>
                    <configuration>
                        <source>1.6</source>
                        <target>1.6</target>
                        <compilerArgument></compilerArgument>
                    </configuration>
                </plugin>
            </plugins>
        </pluginManagement>
    </build>

</project>

In the main method of the FirstSparkJavaApp class we will perform a simple computation. Let’s count the lines of the README file that contain “Spark” and print the result:

import org.apache.spark.api.java.*;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.Function;

public class FirstSparkJavaApp {
    public static void main(String[] args) {
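        // The path is relative to the working directory, so run from the project root.
        String readmeFile = "README.md";
        SparkConf configuration = new SparkConf()
                .setAppName("First Spark Java App")
                .setMaster("local"); // run Spark on the local machine, not on a cluster
        JavaSparkContext sc = new JavaSparkContext(configuration);
        JavaRDD<String> readmeRDD = sc.textFile(readmeFile).cache();

        // Keep only the lines that contain "Spark", then count them.
        long sparkCount = readmeRDD.filter(new Function<String, Boolean>() {
            public Boolean call(String s) { return s.contains("Spark"); }
        }).count();

        System.out.println("\nNumber of 'Spark' occurrences: " + sparkCount + "\n");

        // Release Spark resources once the job is done.
        sc.stop();
    }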
}

Note: when we created the SparkConf instance, we set the master to “local”. This runs Spark on the local machine rather than on a cluster. Upcoming articles will describe the difference between local and cluster modes.

For the computation we use the filter method, which accepts a function that returns a boolean value. If the value is true, the entry is included in the resulting RDD.
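To make the filter idea concrete without starting Spark, here is a minimal plain-Java sketch of the same pattern. The `LineFunction` interface and the `filter` helper are hypothetical stand-ins written for this illustration. They are not Spark classes, and the sample lines are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FilterSketch {
    // Hypothetical stand-in for Spark's Function<T, Boolean>; not the real Spark interface.
    public interface LineFunction<T> {
        Boolean call(T value);
    }

    // Builds a new list with only the entries for which the function returns true.
    public static <T> List<T> filter(List<T> input, LineFunction<T> f) {
        List<T> result = new ArrayList<T>();
        for (T item : input) {
            if (f.call(item)) {
                result.add(item);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample lines, standing in for the contents of README.md.
        List<String> lines = Arrays.asList("# Apache Spark", "", "Building Spark", "See the docs");

        List<String> kept = filter(lines, new LineFunction<String>() {
            public Boolean call(String s) { return s.contains("Spark"); }
        });

        System.out.println(kept.size()); // prints 2
    }
}
```

The anonymous class has the same shape as the one passed to the RDD's filter method in the program above: a single call method that decides, entry by entry, what stays in the result.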

Now run the main method and you will see the result in the console. Among the Spark logs there will be a line containing “Number of ‘Spark’ occurrences: 18”.
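One detail worth knowing: because filter tests each line once, the printed number is the count of lines that contain “Spark”, not the count of every appearance of the word. The plain-Java sketch below shows the difference on made-up sample lines. The class and method names are hypothetical and chosen only for this illustration.

```java
import java.util.Arrays;
import java.util.List;

public class CountSketch {
    // Counts lines that contain the word at least once (what filter + count does).
    public static long countLinesContaining(List<String> lines, String word) {
        long n = 0;
        for (String line : lines) {
            if (line.contains(word)) {
                n++;
            }
        }
        return n;
    }

    // Counts every non-overlapping appearance of the word across all lines.
    public static long countOccurrences(List<String> lines, String word) {
        if (word.isEmpty()) {
            return 0; // an empty word would otherwise match forever
        }
        long n = 0;
        for (String line : lines) {
            int from = line.indexOf(word);
            while (from >= 0) {
                n++;
                from = line.indexOf(word, from + word.length());
            }
        }
        return n;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Spark is fast. Spark is general.",
                "No match here",
                "Run Spark locally");

        System.out.println(countLinesContaining(lines, "Spark")); // prints 2
        System.out.println(countOccurrences(lines, "Spark"));     // prints 3
    }
}
```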


There is another way to run a Spark application. Switch to the command line, go to your project directory, and run:

> mvn package

This creates a .jar file of your project in the target folder.

Then you can run the Spark example with this command:

<REPLACE_WITH_SPARK_HOME>/bin/spark-submit --class "FirstSparkJavaApp" --master local target/spark01-1.0.jar

Replace <REPLACE_WITH_SPARK_HOME> with the location of the Spark installation directory on your system. This does the same thing as before, but from the command line, and you will see similar output.

That’s all! This was the first Java Spark example in action. This is where the beginner's tutorial ends and the advanced guide starts.
