Cloudera Enterprise 6.0.x

Work Preserving Recovery for YARN Components

Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)

With work preserving recovery enabled, in-flight work is not lost when a ResourceManager or NodeManager restarts. You can configure work preserving recovery separately for the ResourceManager and for the NodeManager, and you can enable it whether or not you use ResourceManager High Availability.
  Note: YARN does not support high availability for the JobHistory Server (JHS). If the JHS goes down, Cloudera Manager will restart it automatically.
  Note: After you move the JobHistory Server to a new host, the URLs listed for the JobHistory Server on the ResourceManager web UI still point to the old JobHistory Server. This affects existing jobs only; new jobs started after the move are not affected. For existing jobs that have the incorrect JobHistory Server URL, the only option is to let the jobs roll off the history over time. For new jobs, make sure that all clients have the updated mapred-site.xml that references the correct JobHistory Server.

Configuring Work Preserving Recovery Using Cloudera Manager

Enabling Work Preserving Recovery on ResourceManager with Cloudera Manager

If you use Cloudera Manager and you enable YARN (MRv2) ResourceManager High Availability, work preserving recovery is enabled by default for the ResourceManager.

Disabling Work Preserving Recovery on ResourceManager Using Cloudera Manager

To disable work preserving recovery for the ResourceManager:

  1. Go to the YARN service.
  2. Click the Configuration tab.
  3. Search for Enable ResourceManager Recovery.
  4. In the Enable ResourceManager Recovery field, clear the ResourceManager Default Group checkbox.
  5. Click Save Changes.
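
For reference, the steps above can be pictured as a yarn-site.xml setting. This is a minimal sketch only: it assumes that the Enable ResourceManager Recovery checkbox corresponds to the Hadoop property yarn.resourcemanager.recovery.enabled, a mapping this page does not state, so verify it against your Cloudera Manager release before relying on it.

```xml
<!-- Assumption: the Cloudera Manager "Enable ResourceManager Recovery"
     checkbox maps to this Hadoop property. Not confirmed by this page. -->
<property>
  <name>yarn.resourcemanager.recovery.enabled</name>
  <value>false</value>
  <description>Disables recovery of ResourceManager state after a restart.</description>
</property>
```

In a cluster managed by Cloudera Manager, prefer the checkbox over editing the file by hand, because Cloudera Manager regenerates yarn-site.xml when it deploys configuration.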

Enabling Work Preserving Recovery on NodeManager with Cloudera Manager

Work preserving recovery is enabled by default for NodeManagers in clusters managed by Cloudera Manager. The default recovery directory is /var/lib/hadoop-yarn/yarn-nm-recovery.

To enable work preserving recovery for a given NodeManager, if needed:
  1. Edit the advanced configuration snippet for yarn-site.xml on that NodeManager, and set the value of yarn.nodemanager.recovery.enabled to true.
  2. Configure the directory on the local filesystem where state information is stored when work preserving recovery is enabled.
    1. Go to the YARN service.
    2. Click the Configuration tab.
    3. Search for NodeManager Recovery Directory.
    4. Enter the directory path in the NodeManager Recovery Directory field (for example, /var/lib/hadoop-yarn/yarn-nm-recovery).
    5. Click Save Changes.
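
As a rough illustration of the two steps above, the resulting NodeManager settings in yarn-site.xml might look like the following sketch. It uses the default recovery directory named on this page. Only the first property is entered through the advanced configuration snippet in step 1; the second is shown as the assumed yarn-site.xml equivalent of the NodeManager Recovery Directory field set in step 2.

```xml
<!-- Step 1: entered in the NodeManager advanced configuration snippet
     for yarn-site.xml. -->
<property>
  <name>yarn.nodemanager.recovery.enabled</name>
  <value>true</value>
</property>

<!-- Step 2: assumed equivalent of the NodeManager Recovery Directory
     field. The value is the default directory given on this page. -->
<property>
  <name>yarn.nodemanager.recovery.dir</name>
  <value>/var/lib/hadoop-yarn/yarn-nm-recovery</value>
</property>
```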

Example Configuration for Work Preserving Recovery

The following example configuration can be used with a Cloudera Manager advanced configuration snippet. Adjust the configuration to suit your environment.
<property>
  <name>yarn.resourcemanager.work-preserving-recovery.enabled</name>
  <value>true</value>
  <description>Whether to enable work preserving recovery for the ResourceManager.</description>
</property>
<property>
  <name>yarn.nodemanager.recovery.enabled</name>
  <value>true</value>
  <description>Whether to enable work preserving recovery for the NodeManager.</description>
</property>
<property>
  <name>yarn.nodemanager.recovery.dir</name>
  <value>/home/cloudera/recovery</value>
  <description>The local directory where the NodeManager stores state when work preserving recovery is enabled.</description>
</property>
<property>
  <name>yarn.nodemanager.address</name>
  <value>0.0.0.0:45454</value>
  <description>The NodeManager address. Use a fixed port rather than port 0 (ephemeral) so that the NodeManager uses the same port before and after a restart.</description>
</property>
Page generated July 25, 2018.