Cloudera Enterprise 6.0.x

Using Azure Data Lake Store with HBase

CDH supports using Azure Data Lake Store (ADLS) as a storage layer for HBase.

There are two scenarios in which ADLS can be used with HBase:

  • ADLS-only: In this scenario, both HFiles, which contain user data, and write-ahead logs (WALs) are written to ADLS. This configuration is not recommended for use cases that demand high performance.
  • ADLS + HDFS: In this scenario, HFiles are written to ADLS, but WALs are written to HDFS. This configuration provides higher write throughput and lower write latency than the ADLS-only configuration.

Configuring HBase to Use ADLS as a Storage Layer

  1. Set up credentials to enable communication between HBase and ADLS. See Configuring ADLS Connectivity and use one of the configuration methods listed there that HBase supports.
  2. In the Cloudera Manager Admin Console, select the HBase service, click the Configuration tab, and locate the HBase Service Advanced Configuration Snippet (Safety Valve) for hbase-site.xml.
  3. Depending on which scenario you plan to use, add the following values for the Name and Value fields:

    • ADLS-only:

      • Name: hbase.rootdir

        Value: adl://<adls_account_name>.azuredatalakestore.net/<hbase_directory>

    • ADLS + HDFS:

      • Name: hbase.rootdir

        Value: adl://<adls_account_name>.azuredatalakestore.net/<hbase_directory>

      • Name: hbase.wal.dir

        Value: hdfs://<name_node>:8020/<hbase_wal_directory>

  4. Still on the Configuration page for the HBase service, locate the HBase Service Advanced Configuration Snippet (Safety Valve) for core-site.xml and add the following Name and Value pairs for both configuration scenarios (ADLS-only and ADLS + HDFS):

    • Name: fs.defaultFS

      Value: adl://<adls_account_name>.azuredatalakestore.net/

    • Name: adl.debug.override.localuserasfileowner

      Value: true

      Note: All files and folders in ADLS are owned by the same account owner. When HDFS checks for a file owner, it sees the Azure Active Directory (AD) owner, which does not match the HBase user making the request, so the Access Control List (ACL) check fails. The adl.debug.override.localuserasfileowner setting works around this issue by instructing the HDFS client to assume that the current user owns all files when requesting data stored in ADLS.
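
For reference, the Name and Value pairs from steps 3 and 4 can also be written as XML property elements, which may be convenient if you enter the safety valve contents in XML form. The following is a sketch for the ADLS + HDFS scenario, not a definitive configuration. The account name myadlsaccount, the directories /hbase and /hbase-wal, and the host namenode.example.com are hypothetical placeholders; substitute your own values.

```xml
<!-- hbase-site.xml safety valve (sketch, ADLS + HDFS scenario).
     All account, host, and directory names below are hypothetical. -->
<property>
  <name>hbase.rootdir</name>
  <value>adl://myadlsaccount.azuredatalakestore.net/hbase</value>
</property>
<!-- Omit hbase.wal.dir for the ADLS-only scenario. -->
<property>
  <name>hbase.wal.dir</name>
  <value>hdfs://namenode.example.com:8020/hbase-wal</value>
</property>
```

```xml
<!-- core-site.xml safety valve (sketch, same values for both scenarios).
     The account name below is hypothetical. -->
<property>
  <name>fs.defaultFS</name>
  <value>adl://myadlsaccount.azuredatalakestore.net/</value>
</property>
<!-- Makes the HDFS client treat the current user as the owner of all files in ADLS. -->
<property>
  <name>adl.debug.override.localuserasfileowner</name>
  <value>true</value>
</property>
```

In the ADLS-only scenario, the hbase-site.xml snippet contains only the hbase.rootdir property; the core-site.xml snippet is unchanged.
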
Page generated July 25, 2018.