HBase Read Replicas
Without read replicas, only one RegionServer services a read request from a client, regardless of whether RegionServers are colocated with DataNodes that have local access to the same block. This ensures consistency of the data being read. However, reads can become a bottleneck if that RegionServer is underperforming, if there are network problems, or for other reasons that cause slow reads.
With read replicas enabled, the HMaster distributes read-only copies of regions (replicas) to different RegionServers in the cluster. One RegionServer services the default or primary replica, which is the only replica that can service write requests. If the RegionServer servicing the primary replica is down, writes will fail.
Other RegionServers serve the secondary replicas, which follow the primary RegionServer and see only committed updates. The secondary replicas are read-only and cannot service write requests. They can be kept up to date either by reading the primary replica's HFiles at a set interval or by replication. With the first approach, the secondary replicas may not reflect the most recent updates when the primary RegionServer has not yet flushed the memstore to HDFS. If the client receives the read response from a secondary replica, the read is marked as "stale". Clients can detect whether or not a read result is stale and react accordingly.
Replicas are placed on different RegionServers, and on different racks when possible. This provides a measure of high availability (HA), as far as reads are concerned. If a RegionServer becomes unavailable, the regions it was serving can still be accessed by clients even before the region is taken over by a different RegionServer, using one of the secondary replicas. The reads may be stale until the entire WAL is processed by the new RegionServer for a given region.
For any given read request, a client can request a faster result even if it comes from a secondary replica, or if consistency is more important than speed, it can ensure that its request is serviced by the primary RegionServer. This allows you to decide the relative importance of consistency and availability, in terms of the CAP Theorem, in the context of your application, or individual aspects of your application, using Timeline Consistency semantics.
Timeline Consistency
Timeline Consistency is a consistency model which allows for a more flexible standard of consistency than the default HBase model of strong consistency. A client can indicate the level of consistency it requires for a given read (Get or Scan) operation. The default consistency level is STRONG, meaning that the read request is only sent to the RegionServer servicing the region. This is the same behavior as when read replicas are not used. The other possibility, TIMELINE, sends the request to all RegionServers with replicas, including the primary. The client accepts the first response, which includes whether it came from the primary or a secondary RegionServer. If it came from a secondary, the client can choose to verify the read later or not to treat it as definitive.
Keeping Replicas Current
- Using a Timer
- In this mode, replicas are refreshed at a time interval controlled by the configuration option hbase.regionserver.storefile.refresh.period.
- Using Replication
- In this mode, replicas are kept current between a source and sink cluster using HBase replication. This can potentially allow for faster synchronization than using a timer. Each time a flush occurs on the source cluster, a notification is pushed to the sink clusters for the table. To use replication to keep replicas current, you must first set the column family attribute REGION_MEMSTORE_REPLICATION to false, then set the HBase configuration property hbase.region.replica.replication.enabled to true (see the example following the Important note below).
Important: Read-replica updates using replication are not supported for the hbase:meta table. Columns of hbase:meta must always have their REGION_MEMSTORE_REPLICATION attribute set to false.
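For example, assuming a hypothetical table named myTable with column family myCF, the attribute can be set from HBase Shell before enabling the configuration property. This is a sketch that mirrors the alter idiom used later in this document; verify the syntax against your HBase version:

hbase> disable 'myTable'
hbase> alter 'myTable', 'myCF', {REGION_MEMSTORE_REPLICATION => 'false'}
hbase> enable 'myTable'

Then enable replication-based refresh in hbase-site.xml:

<property>
  <name>hbase.region.replica.replication.enabled</name>
  <value>true</value>
</property>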
Enabling Read Replica Support
Before you enable read-replica support, make sure to account for the increased heap memory requirements. Although no additional copies of HFile data are created, read-only replica regions have the same memory footprint as normal regions and must be considered when calculating the required heap memory. For example, if your table requires 8 GB of heap memory and you configure three replicas, you need about 24 GB of heap memory.
Property Name | Default Value | Description |
---|---|---|
hbase.region.replica.replication.enabled | false | The mechanism for refreshing the secondary replicas. If set to false, secondary replicas are not guaranteed to be consistent at the row level. Secondary replicas are refreshed at intervals controlled by a timer (hbase.regionserver.storefile.refresh.period), and so are guaranteed to be at most that interval of milliseconds behind the primary RegionServer. Secondary replicas read from the HFile in HDFS, and have no access to writes that have not been flushed to the HFile by the primary RegionServer. If set to true, replicas are kept up to date using replication, and the column family must have the attribute REGION_MEMSTORE_REPLICATION set to false. Using replication for read replicas of hbase:meta is not supported; REGION_MEMSTORE_REPLICATION must always be set to false on the column family. |
hbase.regionserver.storefile.refresh.period | 0 (disabled) | The period, in milliseconds, for refreshing the store files for the secondary replicas. The default value of 0 disables the feature. Secondary replicas update their store files from the primary RegionServer at this interval. If refreshes occur too often, this creates a burden for the NameNode. If refreshes occur too infrequently, secondary replicas will be less consistent with the primary RegionServer. |
hbase.master.loadbalancer.class | org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer | The Java class used for balancing the load of all HBase clients. The default implementation, StochasticLoadBalancer, is the only load balancer that supports reading data from secondary RegionServers. |
hbase.ipc.client.allowsInterrupt | true | Whether or not to enable interruption of RPC threads at the client. The default value of true enables primary RegionServers to access data from other regions' secondary replicas. |
hbase.client.primaryCallTimeout.get | 10 | The timeout period, in milliseconds, that an HBase client waits for a response before the read is submitted to a secondary replica, if the read request allows timeline consistency. Lower values increase the number of remote procedure calls while lowering latency. |
hbase.client.primaryCallTimeout.multiget | 10 | The timeout period, in milliseconds, before an HBase client's multi-get request, such as HTable.get(List<Get>), is submitted to a secondary replica, if the multi-get request allows timeline consistency. Lower values increase the number of remote procedure calls while lowering latency. |
Configure Read Replicas Using Cloudera Manager
- Before you can use replication to keep replicas current, you must set the column family attribute REGION_MEMSTORE_REPLICATION to false for the HBase table, using HBase Shell or the client API. See Activating Read Replicas On a Table.
- Select .
- Click the Configuration tab.
- Select .
- Select .
- Locate the HBase Service Advanced Configuration Snippet (Safety Valve) for hbase-site.xml property or search for it by typing its name in the Search box.
- Using the table above, create a configuration and paste it into the text field. The following example configuration demonstrates the syntax:
<property>
  <name>hbase.regionserver.storefile.refresh.period</name>
  <value>0</value>
</property>
<property>
  <name>hbase.ipc.client.allowsInterrupt</name>
  <value>true</value>
  <description>Whether to enable interruption of RPC threads at the client. The default value of true is required to enable Primary RegionServers to access other RegionServers in secondary mode.</description>
</property>
<property>
  <name>hbase.client.primaryCallTimeout.get</name>
  <value>10</value>
</property>
<property>
  <name>hbase.client.primaryCallTimeout.multiget</name>
  <value>10</value>
</property>
- Click Save Changes to commit the changes.
- Restart the HBase service.
Configuring Rack Awareness for Read Replicas
Rack awareness for read replicas is modeled after the mechanism used for rack awareness in Hadoop. Its purpose is to ensure that some replicas are on a different rack than the RegionServer servicing the table. The default implementation, which you can override by setting hbase.util.ip.to.rack.determiner to a custom implementation, is ScriptBasedMapping, which uses a topology map and a topology script to enforce distribution of the replicas across racks. To use the default topology map and script for CDH, setting hbase.util.ip.to.rack.determiner to ScriptBasedMapping is sufficient. Add the following property to the HBase Service Advanced Configuration Snippet (Safety Valve) for hbase-site.xml if you use Cloudera Manager, or to hbase-site.xml otherwise.
<property>
  <name>hbase.util.ip.to.rack.determiner</name>
  <value>ScriptBasedMapping</value>
</property>
Creating a Topology Map
The topology map assigns hosts to racks. It is read by the topology script. A rack is a logical grouping, and does not necessarily correspond to physical hardware or location. Racks can be nested. If a host is not in the topology map, it is assumed to be a member of the default rack. The following map uses a nested structure, with two data centers which each have two racks. All services on a host that are rack-aware will be affected by the rack settings for the host.
<topology> <node name="host1.example.com" rack="/dc1/r1"/> <node name="host2.example.com" rack="/dc1/r1"/> <node name="host3.example.com" rack="/dc1/r2"/> <node name="host4.example.com" rack="/dc1/r2"/> <node name="host5.example.com" rack="/dc2/r1"/> <node name="host6.example.com" rack="/dc2/r1"/> <node name="host7.example.com" rack="/dc2/r2"/> <node name="host8.example.com" rack="/dc2/r2"/> </topology>
Creating a Topology Script
The topology script determines rack topology using the topology map. By default, CDH uses /etc/hadoop/conf.cloudera.YARN-1/topology.py. To use a different script, set net.topology.script.file.name to the absolute path of the topology script, as in the example below.
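For example, assuming a custom script at the hypothetical path /opt/scripts/topology.py, the property would look like this:

<property>
  <name>net.topology.script.file.name</name>
  <value>/opt/scripts/topology.py</value>
</property>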
Activating Read Replicas On a Table
After enabling read replica support on your RegionServers, configure the tables for which you want read replicas to be created. Keep in mind that each replica increases the amount of storage used by HBase in HDFS.
At Table Creation
hbase> create 'myTable', 'myCF', {REGION_REPLICATION => '3'}
By Altering an Existing Table
hbase> disable 'myTable'
hbase> alter 'myTable', 'myCF', {REGION_REPLICATION => '3'}
hbase> enable 'myTable'
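Region replication can also be configured through the Java client API when creating a table. The following is a minimal sketch using the HBase 1.x HTableDescriptor.setRegionReplication() method; the table and column family names are placeholders:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class CreateReplicatedTable {
  public static void main(String[] args) throws Exception {
    try (Connection connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Admin admin = connection.getAdmin()) {
      // One primary plus two secondary replicas for each region of the table
      HTableDescriptor descriptor = new HTableDescriptor(TableName.valueOf("myTable"));
      descriptor.setRegionReplication(3);
      descriptor.addFamily(new HColumnDescriptor("myCF"));
      admin.createTable(descriptor);
    }
  }
}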
Requesting a Timeline-Consistent Read
To request a timeline-consistent read in your application, call setConsistency(Consistency.TIMELINE) on the Get or Scan object before performing the operation.
To check whether the result is stale (comes from a secondary replica), use the isStale() method of the result object. Use the following examples for reference.
Get Request
Get get = new Get(key);
get.setConsistency(Consistency.TIMELINE);
Result result = table.get(get);
Scan Request
Scan scan = new Scan();
scan.setConsistency(Consistency.TIMELINE);
ResultScanner scanner = table.getScanner(scan);
Result result = scanner.next();
Scan Request to a Specific Replica
This example overrides the normal behavior of sending the read request to all known replicas, and only sends it to the replica specified by ID.
Scan scan = new Scan();
scan.setConsistency(Consistency.TIMELINE);
scan.setReplicaId(2);
ResultScanner scanner = table.getScanner(scan);
Result result = scanner.next();
Detecting a Stale Result
Result result = table.get(get);
if (result.isStale()) {
  ...
}
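If a definitive answer is required, a client can fall back to a STRONG read whenever the timeline read returns stale data. A minimal sketch, assuming an existing table reference and row key:

Get get = new Get(key);
get.setConsistency(Consistency.TIMELINE);
Result result = table.get(get);
if (result.isStale()) {
  // The response came from a secondary replica; re-read from the primary
  get.setConsistency(Consistency.STRONG);
  result = table.get(get);
}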
Getting and Scanning Using HBase Shell
You can also request timeline consistency using HBase Shell, allowing the result to come from a secondary replica.
hbase> get 'myTable', 'myRow', {CONSISTENCY => "TIMELINE"}
hbase> scan 'myTable', {CONSISTENCY => 'TIMELINE'}