Cloudera Enterprise 6.0.x

Monitoring the Performance of HDFS Replications

Note: This page contains references to CDH 5 components or features that have been removed from CDH 6. These references are only applicable if you are managing a CDH 5 cluster with Cloudera Manager 6. For more information, see Deprecated Items.

You can monitor the progress of an HDFS replication schedule using performance data that you download as a CSV file from the Cloudera Manager Admin console. This file contains information about the files being replicated, the average throughput, and other details that can help diagnose performance issues during HDFS replications. You can view this performance data for running HDFS replication jobs and for completed jobs.

To view the performance data for a running HDFS replication schedule:

Go to Backup > Replication Schedules.
Locate the schedule.
Click Performance Report and select one of the following options:
- HDFS Performance Summary – Download a summary report of the performance of the running replication job. An HDFS Performance Summary Report includes the last performance sample for each mapper that is working on the replication job.
- HDFS Performance Full – Download a full report of the performance of the running replication job. An HDFS Performance Full report includes all samples taken for all mappers during the full execution of the replication job.
To view the data, import the file into a spreadsheet program such as Microsoft Excel.

To view the performance data for a completed HDFS replication schedule:

Go to Backup > Replication Schedules.
Locate the schedule and click Actions > Show History.
The Replication History page for the replication schedule displays.
Click to expand the display for this schedule.
Click the Download CSV link and select one of the following options:
- Listing – a list of files and directories copied during the replication job.
- Status – full status report of files where the status of the replication is one of the following:
  - ERROR – An error occurred and the file was not copied.
  - DELETED – A deleted file.
  - SKIPPED – A file where the replication was skipped because it was up-to-date.
- Error Status Only – full status report, filtered to show files with errors only.
- Deleted Status Only – full status report, filtered to show deleted files only.
- Skipped Status Only – full status report, filtered to show skipped files only.
- Performance – summary performance report.
- Full Performance – full performance report.
See HDFS Performance Report Columns for a description of the data in the performance reports.
To view the data, import the file into a spreadsheet program such as Microsoft Excel.
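
As an alternative to a spreadsheet program, the downloaded file can be read with a short script. The following is a minimal sketch, assuming the report is a comma-delimited file whose first row holds the column names. The function name `load_performance_report` and the sample values are hypothetical and used for illustration only.

```python
import csv
import io


def load_performance_report(stream):
    """Read a report from a text stream into a list of dicts keyed by column name."""
    reader = csv.DictReader(stream)
    return [row for row in reader]


# In-memory sample with made-up values. A downloaded report would be
# opened with open("report.csv", newline="") instead.
sample = io.StringIO(
    "Timestamp,Host,CurrThroughput\n"
    "t1,host-1.example.com,1048576\n"
)
rows = load_performance_report(sample)
print(rows[0]["Host"])
```

All values are returned as strings, so numeric columns need an explicit conversion before any arithmetic.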

The performance data is collected every two minutes. Therefore, no data is available during the initial execution of a replication job because not enough samples are available to estimate throughput and other reported data.
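
The effect of the two-minute interval can be illustrated with simple arithmetic. This sketch assumes one sample is taken at the end of each full interval. The exact timing of the first sample is not documented here, so treat the result as a rough estimate only. The name `expected_samples` is hypothetical.

```python
SAMPLE_INTERVAL_SECONDS = 120  # "every two minutes", as stated above


def expected_samples(job_duration_seconds):
    """Estimate samples per mapper, assuming one sample per full interval."""
    if job_duration_seconds < 0:
        raise ValueError("duration must not be negative")
    return int(job_duration_seconds // SAMPLE_INTERVAL_SECONDS)


print(expected_samples(90))   # 0: the job ends before the first interval elapses
print(expected_samples(600))  # 5
```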

The data returned by the CSV files downloaded from the Cloudera Manager Admin console has the following columns:

Table 1. HDFS Performance Report Columns
Performance Data Columns	Description
Timestamp	Time when the performance data was collected.
Host	Name of the host where the YARN or MapReduce job was running.
SrcFile	Name of the source file being copied by the MapReduce job.
TgtFile	Name of the file to which the source file was being copied on the target.
BytesCopiedPerFile	Number of bytes copied for the file currently being copied.
TimeElapsedPerFile	Total time elapsed for this copy operation of the file currently being copied.
CurrThroughput	Current throughput in bytes per second.
AvgFileThroughput	Average throughput in bytes per second since the start of the file currently being copied.
TotalSleepTime	Number of seconds the transfer was stalled due to throughput throttling. This is expected to be zero unless the throughput was throttled using the Maximum Bandwidth parameter for the replication schedule. (You configure this parameter on the Advanced tab when creating or editing a replication schedule.)
AvgMapperThroughput	Average throughput for current mapper. This can include samples of throughput taken for various files copied by this mapper.
BytesCopiedPerMapper	Total bytes copied by this MapReduce job. This can include multiple files.
TimeElapsedPerMapper	Total time elapsed since this MapReduce job started copying files.
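
The columns above can be summarized with a short script, for example to compare throughput across hosts and to spot throttling. This is a minimal sketch under stated assumptions: the CSV header names match the column names in Table 1, the numeric columns contain plain numbers, and rows are grouped by Host as a stand-in for individual mappers (several mappers can run on one host). The function name `summarize_by_host` and the sample values are hypothetical.

```python
import csv
import io
from collections import defaultdict


def summarize_by_host(rows):
    """Group samples by Host and report throughput and throttling per host."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["Host"]].append(row)

    summary = {}
    for host, host_rows in grouped.items():
        # CurrThroughput is in bytes per second, per Table 1.
        throughputs = [float(r["CurrThroughput"]) for r in host_rows]
        # TotalSleepTime is in seconds and is expected to be zero
        # unless Maximum Bandwidth throttling was in effect.
        sleeps = [float(r["TotalSleepTime"]) for r in host_rows]
        summary[host] = {
            "samples": len(host_rows),
            "mean_curr_throughput": sum(throughputs) / len(throughputs),
            "max_sleep_seconds": max(sleeps),
            "throttled": max(sleeps) > 0,
        }
    return summary


# Made-up sample data that uses only three of the columns.
SAMPLE = (
    "Host,CurrThroughput,TotalSleepTime\n"
    "host-1.example.com,1000000,0\n"
    "host-1.example.com,3000000,0\n"
    "host-2.example.com,500000,12\n"
)
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for host, stats in sorted(summarize_by_host(rows).items()):
    print(host, stats)
```

A host with a nonzero TotalSleepTime was stalled by throttling at some point, so low throughput there may reflect the Maximum Bandwidth setting and not a network or disk problem.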

[Figure: a sample CSV file, as presented in Excel]

Note the following limitations and known issues:

If you click the CSV download too soon after the replication job starts, Cloudera Manager returns an empty file or a CSV file that has column headers only, along with a message to try again later when performance data has been collected.
If you employ a proxy user with the form user@domain, performance data is not available through the links.
If the replication job only replicates small files that can be transferred in less than a few minutes, no performance statistics are collected.
For replication schedules that specify the Dynamic Replication Strategy, statistics regarding the last file transferred by a MapReduce job hide previous transfers performed by that MapReduce job.
Only the last trace per MapReduce job is reported in the CSV file.
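
A script that processes downloaded reports can guard against the first limitation by checking whether the file contains any data rows. The following is a minimal sketch, assuming the "try later" message is not itself written as a row in the file. The function name `has_performance_data` is hypothetical.

```python
import csv
import io


def has_performance_data(csv_text):
    """Return True if the text has at least one non-empty row after the header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:
        return False  # completely empty file
    # Ignore blank lines and rows that contain only whitespace.
    return any(any(cell.strip() for cell in row) for row in reader)


print(has_performance_data(""))                         # False
print(has_performance_data("Timestamp,Host\n"))         # False
print(has_performance_data("Timestamp,Host\nt1,h1\n"))  # True
```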

Page generated July 25, 2018.