Cloudera Enterprise 6.0.x

Kafka Setup

Hardware Requirements

Kafka can function on a fairly small amount of resources, especially with some configuration tuning. Out-of-the-box configurations can run on as little as 1 core and 1 GB of memory, with storage scaled to your data retention requirements. These are the defaults for both the broker and Mirror Maker in Cloudera Manager version 6.x.

Brokers

Kafka brokers tend to have hardware profiles similar to HDFS DataNodes. How you build them depends on what is important for your Kafka use cases. The high-level breakdown is:

  • Message Retention = Disk Size
  • Client Throughput (producer or consumer) = Network Capacity
  • Producer Throughput = Disk I/O
  • Consumer Throughput = Memory

CPU is rarely a bottleneck since Kafka is I/O heavy, but a moderately sized CPU with enough threads is still important to handle concurrent connections and background tasks.
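The retention-to-disk relationship above can be turned into a rough sizing estimate: required disk is roughly the average write rate times the retention window times the replication factor. A back-of-the-envelope sketch, where all workload figures are hypothetical placeholders:

```shell
#!/bin/sh
# Rough cluster disk sizing: write rate x retention window x replication factor.
# All figures below are hypothetical; substitute your own workload numbers.
WRITE_MB_PER_SEC=50      # average producer throughput into the cluster
RETENTION_DAYS=7         # how long messages are kept
REPLICATION_FACTOR=3     # each byte is stored this many times

RETENTION_SEC=$((RETENTION_DAYS * 24 * 3600))
DISK_MB=$((WRITE_MB_PER_SEC * RETENTION_SEC * REPLICATION_FACTOR))
echo "Approximate cluster-wide disk needed: $((DISK_MB / 1024 / 1024)) TB"
```

Leave additional headroom on top of this figure for partition rebalancing, log segment overhead, and growth.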

A common choice for a Kafka node is as follows:

  • RAM: 64 GB (4 GB heap and 60 GB free for the page cache)

    • 1 GB heap minimum

  • CPU: 12 to 24 cores (Prioritize cores over speed)

  • Disks:

    • 1 OS HDD

    • 1 ZooKeeper data/log directory HDD (some choose SSD)

    • 10 Kafka data HDDs (in RAID10)

  • Network: 1 GbE or 10 GbE

    • Avoid clusters that span multiple data centers
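The heap and page-cache split above can be made explicit when launching a broker with the stock Kafka scripts, which read the KAFKA_HEAP_OPTS environment variable (in Cloudera Manager, the heap is set through the broker's Java heap configuration property instead). A sketch for the 64 GB host described above:

```shell
# Illustrative JVM sizing for a 64 GB broker host: a fixed 4 GB heap
# leaves roughly 60 GB for the OS page cache, which Kafka relies on
# to serve consumers directly from memory.
export KAFKA_HEAP_OPTS="-Xms4g -Xmx4g"
echo "Broker heap options: $KAFKA_HEAP_OPTS"
```

Setting -Xms equal to -Xmx avoids heap resizing pauses; going much larger than a few GB rarely helps, because Kafka's performance depends on the page cache rather than the JVM heap.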

ZooKeeper

It is common to run ZooKeeper on three broker nodes that are dedicated to Kafka. It is still optimal to run ZooKeeper on separate (possibly small) machines for complete isolation, but in more recent versions of Kafka this is less compelling.

Kafka Performance Considerations

The simplest recommendation for running Kafka with maximum performance is to use dedicated hosts for the Kafka brokers and a dedicated ZooKeeper cluster for the Kafka cluster. If that is not an option, consider these additional guidelines for resource sharing with the Kafka cluster:

Do not run in VMs
It is common practice in modern data centers to run processes in virtual machines. This generally allows for better sharing of resources. Kafka is sufficiently sensitive to I/O throughput that VMs interfere with the regular operation of brokers. For this reason, it is highly recommended to not use VMs for Kafka; if you are running Kafka in a virtual environment you'll need to rely on your VM vendor for help optimizing Kafka performance.
Do not run other processes with Brokers or ZooKeeper
Due to I/O contention, it is generally recommended to avoid running other processes on the same hosts as Kafka brokers or ZooKeeper nodes.
Keep the Kafka-ZooKeeper Connection Stable
Kafka relies heavily on having a stable ZooKeeper connection. An unreliable network between Kafka and ZooKeeper makes ZooKeeper appear offline to Kafka. To keep the connection stable:
  • Do not put Kafka/ZooKeeper nodes on separate networks
  • Do not put Kafka/ZooKeeper nodes on a network shared with other high network loads
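How quickly an unstable connection takes a broker down is governed by the broker's ZooKeeper settings. A minimal sketch of the relevant entries in the broker configuration (hostnames and values are illustrative; in CDH these are managed through Cloudera Manager rather than edited by hand):

```properties
# server.properties (illustrative values)
zookeeper.connect=zk1:2181,zk2:2181,zk3:2181
# How long ZooKeeper waits without a heartbeat before declaring the broker dead.
zookeeper.session.timeout.ms=6000
# How long the broker waits when first establishing the ZooKeeper connection.
zookeeper.connection.timeout.ms=6000
```

Raising the session timeout can mask brief network hiccups, at the cost of slower failure detection.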

Operating System Requirements

SUSE Linux Enterprise Server (SLES)

Unlike CentOS, SLES limits virtual memory by default. To change this default, add the following entries to the /etc/security/limits.conf file:

* hard as unlimited
* soft as unlimited
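After applying the change and starting a new login session, the effective limits can be checked with the shell's ulimit builtin:

```shell
# Check the virtual-memory (address space) limits for the current user.
# Once the limits.conf entries above take effect, both should report
# "unlimited".
ulimit -H -v   # hard limit
ulimit -S -v   # soft limit
```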
Page generated July 25, 2018.