Cloudera Enterprise 6.0.x | Other versions

LIVE_SUMMARY Query Option (CDH 5.5 or higher only)

For queries submitted through the impala-shell command, displays the same output as the SUMMARY command, with the measurements updated in real time as the query progresses. When the query finishes, the final SUMMARY output remains visible in the impala-shell console output.

Type: Boolean; recognized values are 1 and 0, or true and false; any other value interpreted as false

Default: false (shown as 0 in output of SET statement)

Command-line equivalent:

You can enable this query option within impala-shell by starting the shell with the --live_summary command-line option. You can still turn this setting off and on again within the shell through the SET command.

Usage notes:

The live summary output can be useful for evaluating long-running queries, to evaluate which phase of execution takes up the most time, or if some hosts take much longer than others for certain operations, dragging overall performance down. By making the information available in real time, this feature lets you decide what action to take even before you cancel a query that is taking much longer than normal.

For example, you might see the HDFS scan phase taking a long time, and therefore revisit performance-related aspects of your schema design such as constructing a partitioned table, switching to the Parquet file format, running the COMPUTE STATS statement for the table, and so on. Or you might see a wide variation between the average and maximum times for all hosts to perform some phase of the query, and therefore investigate if one particular host needed more memory or was experiencing a network problem.

The output from this query option is printed to standard error. The output is only displayed in interactive mode, that is, not when the -q or -f options are used.

For a simple and concise way of tracking the progress of an interactive query, see LIVE_PROGRESS Query Option (CDH 5.5 or higher only).

Restrictions:

The LIVE_PROGRESS and LIVE_SUMMARY query options currently do not produce any output during COMPUTE STATS operations.

Because the LIVE_PROGRESS and LIVE_SUMMARY query options are available only within the impala-shell interpreter:
  • You cannot change these query options through the SQL SET statement using the JDBC or ODBC interfaces. The SET command in impala-shell recognizes these names as shell-only options.

  • Be careful when using impala-shell on a pre-CDH 5.5 system to connect to Impala running on a CDH 5.5 or higher system. The older impala-shell does not recognize these query option names. Upgrade impala-shell on the systems where you intend to use these query options.

  • Likewise, the impala-shell command relies on some information only available in CDH 5.5 / Impala 2.3 and higher to prepare live progress reports and query summaries. The LIVE_PROGRESS and LIVE_SUMMARY query options have no effect when impala-shell connects to a cluster running an older version of Impala.

Added in: CDH 5.5.0 / Impala 2.3.0

Examples:

The following example shows a series of LIVE_SUMMARY reports that are displayed during the course of a query, showing how the numbers increase to show the progress of different phases of the distributed query. When you do the same in impala-shell, only a single report is displayed at any one time, with each update overwriting the previous numbers.

[localhost:21000] > set live_summary=true;
LIVE_SUMMARY set to true
[localhost:21000] > select count(*) from customer t1 cross join customer t2;
+---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+
| Operator            | #Hosts | Avg Time | Max Time | #Rows   | Est. #Rows | Peak Mem | Est. Peak Mem | Detail                |
+---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+
| 06:AGGREGATE        | 0      | 0ns      | 0ns      | 0       | 1          | 0 B      | -1 B          | FINALIZE              |
| 05:EXCHANGE         | 0      | 0ns      | 0ns      | 0       | 1          | 0 B      | -1 B          | UNPARTITIONED         |
| 03:AGGREGATE        | 0      | 0ns      | 0ns      | 0       | 1          | 0 B      | 10.00 MB      |                       |
| 02:NESTED LOOP JOIN | 0      | 0ns      | 0ns      | 0       | 22.50B     | 0 B      | 0 B           | CROSS JOIN, BROADCAST |
| |--04:EXCHANGE      | 0      | 0ns      | 0ns      | 0       | 150.00K    | 0 B      | 0 B           | BROADCAST             |
| |  01:SCAN HDFS     | 1      | 503.57ms | 503.57ms | 150.00K | 150.00K    | 24.09 MB | 64.00 MB      | tpch.customer t2      |
| 00:SCAN HDFS        | 0      | 0ns      | 0ns      | 0       | 150.00K    | 0 B      | 64.00 MB      | tpch.customer t1      |
+---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+

+---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+
| Operator            | #Hosts | Avg Time | Max Time | #Rows   | Est. #Rows | Peak Mem | Est. Peak Mem | Detail                |
+---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+
| 06:AGGREGATE        | 0      | 0ns      | 0ns      | 0       | 1          | 0 B      | -1 B          | FINALIZE              |
| 05:EXCHANGE         | 0      | 0ns      | 0ns      | 0       | 1          | 0 B      | -1 B          | UNPARTITIONED         |
| 03:AGGREGATE        | 1      | 0ns      | 0ns      | 0       | 1          | 20.00 KB | 10.00 MB      |                       |
| 02:NESTED LOOP JOIN | 1      | 17.62s   | 17.62s   | 81.14M  | 22.50B     | 3.23 MB  | 0 B           | CROSS JOIN, BROADCAST |
| |--04:EXCHANGE      | 1      | 26.29ms  | 26.29ms  | 150.00K | 150.00K    | 0 B      | 0 B           | BROADCAST             |
| |  01:SCAN HDFS     | 1      | 503.57ms | 503.57ms | 150.00K | 150.00K    | 24.09 MB | 64.00 MB      | tpch.customer t2      |
| 00:SCAN HDFS        | 1      | 247.53ms | 247.53ms | 1.02K   | 150.00K    | 24.39 MB | 64.00 MB      | tpch.customer t1      |
+---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+

+---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+
| Operator            | #Hosts | Avg Time | Max Time | #Rows   | Est. #Rows | Peak Mem | Est. Peak Mem | Detail                |
+---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+
| 06:AGGREGATE        | 0      | 0ns      | 0ns      | 0       | 1          | 0 B      | -1 B          | FINALIZE              |
| 05:EXCHANGE         | 0      | 0ns      | 0ns      | 0       | 1          | 0 B      | -1 B          | UNPARTITIONED         |
| 03:AGGREGATE        | 1      | 0ns      | 0ns      | 0       | 1          | 20.00 KB | 10.00 MB      |                       |
| 02:NESTED LOOP JOIN | 1      | 61.85s   | 61.85s   | 283.43M | 22.50B     | 3.23 MB  | 0 B           | CROSS JOIN, BROADCAST |
| |--04:EXCHANGE      | 1      | 26.29ms  | 26.29ms  | 150.00K | 150.00K    | 0 B      | 0 B           | BROADCAST             |
| |  01:SCAN HDFS     | 1      | 503.57ms | 503.57ms | 150.00K | 150.00K    | 24.09 MB | 64.00 MB      | tpch.customer t2      |
| 00:SCAN HDFS        | 1      | 247.59ms | 247.59ms | 2.05K   | 150.00K    | 24.39 MB | 64.00 MB      | tpch.customer t1      |
+---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+

To see how the LIVE_PROGRESS and LIVE_SUMMARY query options work in real time, see this animated demo.

Page generated July 25, 2018.