Apache Impala - Interactive SQL
The Apache Impala project provides high-performance, low-latency SQL queries on data stored in popular Apache Hadoop file formats. The fast response for queries enables interactive exploration and fine-tuning of analytic queries, rather than the long batch jobs traditionally associated with SQL-on-Hadoop technologies. (You will often see the term "interactive" applied to these kinds of fast queries with human-scale response times.)
Impala integrates with the Apache Hive metastore database, so the two components share the same databases and tables. This high level of integration with Hive, together with compatibility with the HiveQL syntax, lets you use either Impala or Hive to create tables, issue queries, load data, and so on.
- Impala integrates with the existing CDH ecosystem, meaning data can be stored, shared, and accessed using the various solutions included with CDH. This also avoids data silos and minimizes expensive data movement.
- Impala provides access to data stored in CDH without the Java skills that MapReduce jobs require. Impala can access data directly from the HDFS file system. Impala also provides a SQL front end for accessing data in the HBase database system or in the Amazon Simple Storage Service (S3).
- Impala typically returns results within seconds or a few minutes, rather than the many minutes or hours that Hive queries often take to complete.
- Impala is pioneering the use of the Parquet file format, a columnar storage layout that is optimized for large-scale queries typical in data warehouse scenarios.
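To make the idea of a columnar storage layout more concrete, the following is a minimal sketch in Python that contrasts a row-oriented layout with a column-oriented one. It is a toy illustration only and does not reflect Parquet's actual on-disk format or any Impala API. The sample table, the column names (`id`, `region`, `amount`), and the helper functions are all hypothetical.

```python
# Illustrative only: a toy comparison of row-oriented and column-oriented layouts.
# This is NOT Parquet's real file format; the table and column names are made up.

rows = [
    {"id": 1, "region": "east", "amount": 10.0},
    {"id": 2, "region": "west", "amount": 25.5},
    {"id": 3, "region": "east", "amount": 4.5},
]


def to_columnar(records):
    """Pivot a list of row dictionaries into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_from_rows(records, column):
    """Row layout: each whole record is visited just to read one field."""
    return sum(record[column] for record in records)


def sum_from_columns(columns, column):
    """Column layout: only the values of the requested column are read."""
    return sum(columns[column])


columns = to_columnar(rows)
total_rows = sum_from_rows(rows, "amount")
total_cols = sum_from_columns(columns, "amount")
```

Both functions compute the same total, but the columnar version touches only the `amount` values. An aggregate over one column of a wide table is the kind of data warehouse query that tends to benefit from a columnar layout.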
Continue reading:
- Impala Concepts and Architecture
- Planning for Impala Deployment
- Impala Tutorials
- Impala Administration
- Impala SQL Language Reference
- Using the Impala Shell (impala-shell Command)
- Tuning Impala for Performance
- Scalability Considerations for Impala
- Partitioning for Impala Tables
- How Impala Works with Hadoop File Formats
- Using Impala to Query Kudu Tables
- Using Impala to Query HBase Tables
- Using Impala with the Amazon S3 Filesystem
- Using Impala with the Azure Data Lake Store (ADLS)
- Using Impala with Isilon Storage
- Using Impala Logging
- Troubleshooting Impala
- Ports Used by Impala
- Impala Reserved Words
- Impala Frequently Asked Questions
- Using Apache Parquet Data Files with CDH
- Installing Impala
- Using CDH with Isilon Storage
- Dynamic Resource Pools
- Using the Lineage View
- Managing Hive and Impala Lineage Properties
- How to Enable Sensitive Data Redaction
- Hive SQL Syntax for Use with Sentry
- Fixed Issues in Apache Impala
- Apache Impala Incompatible Changes
- Known Issues and Limitations in CDH 6.0.0
- New Features in CDH 6.0.0
- Apache Kudu Administration
- Apache Kudu Background Maintenance Tasks
- Apache Kudu Crash Reporting with Breakpad
- Apache Kudu Configuration
- Developing Applications With Apache Kudu
- Using Apache Impala with Kudu
- Installing Kudu
- Installing and Upgrading Apache Kudu
- Cloudera Manager Metrics for Kudu
- More Resources for Apache Kudu
- Apache Kudu Schema Design
- Kudu Security Overview
- Apache Kudu Transaction Semantics
- Troubleshooting Apache Kudu
- Upgrading Kudu
Related information throughout the CDH 5 and CDH 6 libraries:
In CDH 5 and CDH 6, the Impala documentation for Release Notes, Installation, Upgrading, and Security has been integrated alongside the corresponding information for other Hadoop components.
You can download the full content of this guide in PDF format. Choose the PDF that corresponds to the same CDH version as this online library, or the PDF for the most current version of CDH (Impala PDF for Latest Version of CDH).