Cloudera Enterprise 6.0.x | Other versions

Modifying Impala Startup Options

The configuration options for the Impala-related daemons let you choose which hosts and ports to use for the services that run on a single host, specify directories for logging, control resource usage and security, and specify other aspects of the Impala software.

Continue reading:

Configuring Impala Startup Options through Cloudera Manager

If you manage your cluster through Cloudera Manager, configure the settings for all the Impala-related daemons by navigating to this page: Clusters > Impala > Configuration > View and Edit. See the Cloudera Manager documentation for instructions about how to configure Impala through Cloudera Manager.

If the Cloudera Manager interface does not yet have a form field for a newly added option, or if you need to use special options for debugging and troubleshooting, the Advanced option page for each daemon includes one or more fields where you can enter option names directly. In Cloudera Manager 4, these fields are labelled Safety Valve; in Cloudera Manager 5, they are called Advanced Configuration Snippet. There is also a free-form field for query options, on the top-level Impala Daemon options page.

Impala Startup Options

Some common settings to change include:

  • Memory limits. You can limit the amount of memory available to Impala.

    You can specify the memory limit using absolute notation such as 500m or 2G, or as a percentage of physical memory such as 60%.

      Note: Queries that exceed the specified memory limit are aborted. Percentage limits are based on the physical memory of the machine and do not consider cgroups.
  • Core dump enablement. Enable the Enable Core Dump setting for the Impala service.

      Note:
    • The location of core dump files may vary according to your operating system configuration.

    • Other security settings may prevent Impala from writing core dumps even when this option is enabled.

    • The default location for core dumps is on a temporary filesystem, which can lead to out-of-space issues if the core dumps are large, frequent, or not removed promptly. To specify an alternative location for the core dumps, filter the Impala configuration settings to find the core_dump_dir option. This option lets you specify a different directory for core dumps for each of the Impala-related daemons.

  • Authorization using the open source Sentry plugin. See Enabling Sentry for Impala in Cloudera Manager for details.

  • Auditing for successful or blocked Impala queries, another aspect of security. Specify the -audit_event_log_dir=directory_path option and optionally the -max_audit_event_log_file_size=number_of_queries and -abort_on_failed_audit_event options as part of the IMPALA_SERVER_ARGS settings, for each Impala node, to enable and customize auditing. See Auditing Impala Operations for details.

  • Password protection for the Impala web UI, which listens on port 25000 by default. This feature involves adding some or all of the --webserver_password_file, --webserver_authentication_domain, and --webserver_certificate_file options to the IMPALA_SERVER_ARGS and IMPALA_STATE_STORE_ARGS settings. See Security Guidelines for Impala for details.

  • Another setting you might add to IMPALA_SERVER_ARGS is a comma-separated list of query options and values:
    -default_query_options='option=value,option=value,...'
    
    These options control the behavior of queries performed by this impalad instance. The option values you specify here override the default values for Impala query options, as shown by the SET statement in impala-shell.
  • During troubleshooting, Cloudera Support might direct you to change other values, particularly for IMPALA_SERVER_ARGS, to work around issues or gather debugging information.

  Note:

These startup options for the impalad daemon are different from the command-line options for the impala-shell command. For the impala-shell options, see impala-shell Configuration Options.

Checking the Values of Impala Configuration Options

You can check the current runtime value of all these settings through the Impala web interface, available by default at http://impala_hostname:25000/varz for the impalad daemon, http://impala_hostname:25010/varz for the statestored daemon, or http://impala_hostname:25020/varz for the catalogd daemon. In the Cloudera Manager interface, you can see the link to the appropriate service_name Web UI page when you look at the status page for a specific daemon on a specific host.

Startup Options for catalogd Daemon

The catalogd daemon implements the Impala catalog service, which broadcasts metadata changes to all the Impala nodes when Impala creates a table, inserts data, or performs other kinds of DDL and DML operations.

Use --load_catalog_in_background option to control when the metadata of a table is loaded.
  • If set to false, the metadata of a table is loaded when it is referenced for the first time. This means that the first run of a particular query can be slower than subsequent runs. Starting in Impala 2.2, the default for load_catalog_in_background is false.
  • If set to true, the catalog service attempts to load metadata for a table even if no query needed that metadata. So metadata will possibly be already loaded when the first query that would need it is run. However, for the following reasons, we recommend not to set the option to true.
    • Background load can interfere with query-specific metadata loading. This can happen on startup or after invalidating metadata, with a duration depending on the amount of metadata, and can lead to a seemingly random long running queries that are difficult to diagnose.
    • Impala may load metadata for tables that are possibly never used, potentially increasing catalog size and consequently memory usage for both catalog service and Impala Daemon.
Page generated July 25, 2018.