Apache YARN basics


Apache YARN was introduced as a module of Hadoop 2.0. It is responsible for cluster resource management and job scheduling.

Previously, all of this was handled by MapReduce’s JobTracker. In YARN, these responsibilities are split between the ResourceManager and the ApplicationMaster. YARN can run MapReduce applications as well as any distributed application that implements the corresponding API (Apache Spark, Open MPI, Apache Hama, Apache Giraph, etc.). One of the major benefits of Apache YARN is the ability to run several isolated applications on a single cluster.

ResourceManager (RM) distributes the available resources among applications based on each application's resource requirements. There is one ResourceManager per cluster. It consists of two sub-components:

  • Scheduler that allocates resources;
  • ApplicationsManager that is responsible for ApplicationMaster instances.
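
As a rough illustration of what the Scheduler does, the toy model below grants container requests first-fit against the free memory of each node. This is not the YARN API: `Node`, `ToyScheduler`, and `allocate` are hypothetical names chosen for this sketch, and the real YARN schedulers use queues and much richer policies.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A worker machine with a fixed amount of free memory (hypothetical model)."""
    name: str
    free_mb: int


@dataclass
class ToyScheduler:
    """First-fit allocator; a stand-in for the RM Scheduler, not real YARN code."""
    nodes: list
    grants: list = field(default_factory=list)

    def allocate(self, app_id: str, container_mb: int, count: int) -> int:
        """Try to grant `count` containers of `container_mb` each; return how many fit."""
        granted = 0
        for node in self.nodes:
            while granted < count and node.free_mb >= container_mb:
                node.free_mb -= container_mb
                self.grants.append((app_id, node.name, container_mb))
                granted += 1
        return granted


scheduler = ToyScheduler([Node("node1", 4096), Node("node2", 2048)])
spark_granted = scheduler.allocate("spark-app", 1024, 3)   # all three fit on node1
mapreduce_granted = scheduler.allocate("mr-app", 2048, 2)  # only one fits, on node2
```

Two applications share the same cluster here, and the second one receives fewer containers than it asked for because the remaining capacity is too small. A real application would wait or retry in that situation.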

NodeManager (NM) is a component that runs on every worker (slave) machine and launches application containers. It monitors their resource usage and reports it to the ResourceManager.
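
The NodeManager's role can be sketched with a small model that launches containers within a memory limit and builds a usage report. Everything here is an assumption made for illustration: `ToyNodeManager`, `launch`, and `heartbeat` are hypothetical names, and the report format is invented rather than taken from YARN.

```python
class ToyNodeManager:
    """Per-node agent: launches containers and reports usage (not real YARN code)."""

    def __init__(self, node_name: str, total_mb: int):
        self.node_name = node_name
        self.total_mb = total_mb
        self.containers = {}  # container_id -> memory in MB

    def used_mb(self) -> int:
        """Memory currently held by running containers."""
        return sum(self.containers.values())

    def launch(self, container_id: str, memory_mb: int) -> bool:
        """Start a container only if it fits in the remaining memory."""
        if self.used_mb() + memory_mb > self.total_mb:
            return False
        self.containers[container_id] = memory_mb
        return True

    def heartbeat(self) -> dict:
        """Build the usage report that would be sent to the ResourceManager."""
        used = self.used_mb()
        return {
            "node": self.node_name,
            "used_mb": used,
            "free_mb": self.total_mb - used,
            "containers": sorted(self.containers),
        }


nm = ToyNodeManager("node1", total_mb=4096)
nm.launch("container_01", 1024)
nm.launch("container_02", 2048)
rejected = not nm.launch("container_03", 2048)  # would exceed the 4096 MB limit
report = nm.heartbeat()
```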

ApplicationMaster (AM) is a class implemented by the developer of the distributed application. It communicates with the RM and the NMs: it negotiates resources from the ResourceManager and tracks the status and progress of its containers. It also launches tasks on NodeManagers and monitors their execution.
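
A minimal sketch of the ApplicationMaster's bookkeeping is shown below, assuming a callback that stands in for the ResourceManager. `ToyApplicationMaster`, `fake_rm`, and the `Status` enum are hypothetical names for this example. An actual ApplicationMaster would use the Hadoop client libraries rather than anything shown here.

```python
from enum import Enum


class Status(Enum):
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"


class ToyApplicationMaster:
    """Negotiates containers and tracks task state (hypothetical, not the YARN API)."""

    def __init__(self, request_containers):
        # request_containers(count) -> list of container ids; stands in for the RM.
        self.request_containers = request_containers
        self.tasks = {}  # container_id -> Status

    def run(self, num_tasks: int) -> None:
        """Ask for one container per task and mark each granted one as running."""
        for container_id in self.request_containers(num_tasks):
            self.tasks[container_id] = Status.RUNNING

    def on_container_finished(self, container_id: str, ok: bool) -> None:
        """Record a completion event for a container."""
        self.tasks[container_id] = Status.DONE if ok else Status.FAILED

    def progress(self) -> float:
        """Fraction of tasks that completed successfully."""
        if not self.tasks:
            return 0.0
        done = sum(1 for status in self.tasks.values() if status is Status.DONE)
        return done / len(self.tasks)


def fake_rm(count):
    # Stand-in for the ResourceManager: grants every requested container.
    return [f"container_{i:02d}" for i in range(1, count + 1)]


am = ToyApplicationMaster(fake_rm)
am.run(4)
am.on_container_finished("container_01", ok=True)
am.on_container_finished("container_02", ok=False)
```

After these calls, one of four tasks has succeeded, one has failed, and two are still running. A real ApplicationMaster would typically request a replacement container for the failed task.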

[Figure: YARN basics scheme]

Apache YARN was created by engineers at Yahoo as “the next generation of MapReduce” and became available in 2012. It scales much better than MapReduce 1.0 because its responsibilities are separated into different components. It is designed to run on 10,000+ nodes, whereas standard MapReduce was limited to about 4,000.
