Best Practices for Using Apache Hive in CDH
Hive data warehouse software enables reading, writing, and managing large datasets in distributed storage. Queries are written in the Hive query language (HiveQL), which is very similar to SQL, and Hive converts them into a series of jobs that execute on a Hadoop cluster through MapReduce or Apache Spark.
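As a minimal sketch of that flow, the HiveQL below selects an execution engine for the session and asks Hive to show the stages it would run for a simple aggregation. The table `web_logs` and its `page` column are hypothetical names used only for illustration, and the example assumes Spark is available as an engine on the cluster.

```sql
-- Hypothetical table and column names, for illustration only.
-- Pick the execution engine for this session: mr (MapReduce) or spark.
SET hive.execution.engine=spark;

-- Show the plan of stages Hive would generate for the query,
-- without actually running it.
EXPLAIN
SELECT page, COUNT(*) AS views
FROM web_logs
GROUP BY page;
```

Running the same `SELECT` without `EXPLAIN` would submit the generated jobs to the cluster.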
Users can run batch processing workloads with Hive while also analyzing the same data for interactive SQL or machine-learning workloads using tools like Apache Impala or Apache Spark, all within a single platform.
As part of CDH, Hive also benefits from:
- Unified resource management provided by YARN
- Simplified deployment and administration provided by Cloudera Manager
- Shared security and governance to meet compliance requirements provided by Apache Sentry and Cloudera Navigator
Continue reading:
- Apache Hive Changes in CDH 6.0
- Overview of Apache Hive Installation and Upgrade in CDH
- Configuring Apache Hive in CDH
- Using & Managing Apache Hive in CDH
- Tuning Apache Hive in CDH
- Overview of Apache Hive Data Replication in CDH
- Overview of Apache Hive Security in CDH
- Troubleshooting Apache Hive in CDH
Page generated July 25, 2018.