What is a container in YARN? Is it the same as the child JVM in which tasks run on the NodeManager, or is it different?
A Container is a resource allocation: the successful result of the ResourceManager granting a specific ResourceRequest. A Container grants an application the right to use a specific amount of resources (memory, CPU, etc.) on a specific host.
In Hadoop 2.x, a container is the place where a unit of work runs. For instance, each MapReduce task (not the entire job) runs in one container.
An application/job runs in one or more containers.
A set of system resources is allocated to each container; currently CPU cores and RAM are supported. Each node in a Hadoop cluster can run several containers.
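To make the "several containers per node" point concrete, here is a rough sketch of how many equally sized containers could fit on one node when memory and CPU cores are both limited. This is a toy model, not the YARN API: the class `ContainerPacking`, the `Resources` type, and the capacity figures are all assumptions made up for illustration.

```java
// Hypothetical model, not the YARN API: estimates how many equally sized
// containers fit on one node when both memory and vcores are limited.
final class ContainerPacking {

    /** Resources of a node or of one container (names are illustrative). */
    static final class Resources {
        final int memoryMb;
        final int vcores;

        Resources(int memoryMb, int vcores) {
            if (memoryMb <= 0 || vcores <= 0) {
                throw new IllegalArgumentException("resources must be positive");
            }
            this.memoryMb = memoryMb;
            this.vcores = vcores;
        }
    }

    /** The scarcer of the two resources decides how many containers fit. */
    static int maxContainers(Resources node, Resources perContainer) {
        int byMemory = node.memoryMb / perContainer.memoryMb;
        int byCores = node.vcores / perContainer.vcores;
        return Math.min(byMemory, byCores);
    }

    public static void main(String[] args) {
        Resources node = new Resources(8192, 4);       // assumed node capacity
        Resources container = new Resources(1024, 1);  // assumed per-task request
        System.out.println(maxContainers(node, container)); // prints 4
    }
}
```

In this example memory alone would allow 8 containers, but the 4 cores cap it at 4. A real scheduler applies further rules (allocation rounding, queues, which resources it actually counts), so treat this only as a rough estimate.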
In Hadoop 1.x, the JobTracker allocates a slot to run each MapReduce task. The TaskTracker then spawns a separate JVM for each task (unless JVM reuse is enabled).
Container:
The term is used interchangeably for the logical lease on resources and for the actual process spawned on the node. It is the same process in which a task (or the AM) runs. To start a container, we provide the Container object and a CLC (ContainerLaunchContext), in which we set the list of commands that run the task (or the AM).
nmClient.startContainer(container, clcObj)
ContainerLaunchContext code snippet:
<code>
.
.
.
/**
* Add the list of <em>commands</em> for launching the container. All
* pre-existing List entries are cleared before adding the new List
* @param commands the list of <em>commands</em> for launching the container
*/
@Public
@Stable
public abstract void setCommands(List<String> commands);
</code>
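Since `setCommands` takes a plain `List<String>`, a minimal sketch of building such a list may help. The class `LaunchCommands`, the method `javaCommand`, and the main class name `com.example.MyTask` are hypothetical. The literal `$JAVA_HOME` and `<LOG_DIR>` placeholders are written on the assumption that they are expanded on the node when the container is launched. The sketch deliberately uses no Hadoop classes, so it runs on its own; the Hadoop calls appear only in comments.

```java
import java.util.Collections;
import java.util.List;

// Sketch only: builds the command list that would be handed to
// ContainerLaunchContext.setCommands(...). Class and method names here
// are hypothetical; no Hadoop classes are needed to run it.
final class LaunchCommands {

    /**
     * Builds a single shell command that starts a JVM inside the container.
     * "$JAVA_HOME" and "<LOG_DIR>" are kept literal, assuming they are
     * expanded on the node at launch time.
     */
    static List<String> javaCommand(String mainClass, int heapMb) {
        String command = "$JAVA_HOME/bin/java"
                + " -Xmx" + heapMb + "m"
                + " " + mainClass
                + " 1><LOG_DIR>/stdout"
                + " 2><LOG_DIR>/stderr";
        return Collections.singletonList(command);
    }

    public static void main(String[] args) {
        List<String> commands = javaCommand("com.example.MyTask", 512);
        System.out.println(commands.get(0));
        // With the Hadoop client libraries on the classpath, the list would be used as:
        //   clcObj.setCommands(commands);
        //   nmClient.startContainer(container, clcObj);
    }
}
```

The process started by this command is what actually runs inside the container on the node, which is why the answer above treats the resource lease and the spawned process as the same thing.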