This is an Apache Spark tutorial in Java. We start with the easiest parts of Spark and work up to more complicated topics.
Apache Spark is a powerful tool for processing large amounts of data. It operates on RDDs (Resilient Distributed Datasets). An RDD is an abstraction over a distributed collection. You can interact with it in two ways: transformations, which define a new RDD from an existing one, and actions, which compute a result and return it. We discuss both in this tutorial.
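The split between transformations and actions is easier to see in code. The sketch below does not use Spark: it is an analogy built on plain java.util.stream, so that it runs with only the JDK. The class name LazyPipelineSketch and the method evenSquares are made up for illustration. The intermediate stream operations (filter, map) stand in for RDD transformations such as JavaRDD.filter and JavaRDD.map, and the terminal operation (collect) stands in for an action such as JavaRDD.collect. In both cases the pipeline is only described at first, and work happens when a result is requested.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Analogy only: plain Java streams, not the Spark API.
public class LazyPipelineSketch {

    // Records every element the pipeline actually touches.
    public static final List<Integer> visited = new ArrayList<>();

    // "Transformations": describe the pipeline, compute nothing yet.
    public static Stream<Integer> evenSquares(List<Integer> input) {
        return input.stream()
                .peek(visited::add)        // note each element as it is processed
                .filter(n -> n % 2 == 0)   // keep even numbers
                .map(n -> n * n);          // square them
    }

    public static void main(String[] args) {
        Stream<Integer> pipeline = evenSquares(Arrays.asList(1, 2, 3, 4, 5));

        // Nothing has been processed so far: the pipeline is lazy.
        System.out.println("Visited before collect: " + visited.size());

        // "Action": asking for a result triggers the computation.
        List<Integer> result = pipeline.collect(Collectors.toList());

        System.out.println("Result: " + result);
        System.out.println("Visited after collect: " + visited.size());
    }
}
```

The analogy has limits: an RDD is partitioned across a cluster and can be reused by several actions, while a Java stream lives in one JVM and can be consumed only once.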
1. Apache Spark basics
In this tutorial we go through the Spark essentials. By the end you will be familiar with the Spark API.
Here we list typical Spark errors and exceptions.