Cloudera Enterprise 6.0.x | Other versions

Install and Configure PostgreSQL for Cloudera Software

  Note: The following instructions are for a dedicated PostgreSQL database for use in production environments, and are unrelated to the embedded PostgreSQL database provided by Cloudera for non-production installations.

To use a PostgreSQL database, follow these procedures. For information on compatible versions of the PostgreSQL database, see Database Requirements.

Installing PostgreSQL Server

  Note:
  • If you already have a PostgreSQL database set up, you can skip to the section Configuring and Starting the PostgreSQL Server to verify that your PostgreSQL configurations meet the requirements for Cloudera Manager.
  • Make sure that the data directory, which by default is /var/lib/postgresql/data/, is on a partition that has sufficient free space.
  • Cloudera Manager supports the use of a custom schema name for the Cloudera Manager Server database, but not the CDH component databases (such as Hive, Hue, Sentry, and so on). For more information, see https://www.postgresql.org/docs/current/static/ddl-schemas.html.

Install the PostgreSQL packages as follows:

RHEL:

sudo yum install postgresql-server

SLES:

sudo zypper install --no-recommends postgresql96-server
  Note: This command installs PostgreSQL 9.6. If you want to install a different version, you can use zypper search postgresql to search for an available supported version. See Database Requirements.

Ubuntu:

sudo apt-get install postgresql

Installing psycopg2 Python Package

Hue relies on the psycopg2 Python package for connecting to a PostgreSQL database. Install the psycopg2 package as follows:

RHEL compatible:

  1. Install the python-pip package:
    sudo yum install python-pip
  2. Install psycopg2 using pip:
    sudo pip install psycopg2
  3. Link the psycopg2 directory to the Hue Python environment:
    sudo ln -s /usr/lib64/python2.7/site-packages/psycopg2 /opt/cloudera/parcels/CDH/lib/hue/build/env/lib/python2.7/site-packages/psycopg2

SLES:

  1. Install the python-psycopg2 package:
    sudo zypper install python-psycopg2
  2. Link the psycopg2 directory to the Hue Python environment:
    sudo ln -s /usr/lib64/python2.7/site-packages/psycopg2 /opt/cloudera/parcels/CDH/lib/hue/build/env/lib/python2.7/site-packages/psycopg2

Ubuntu:

  1. Install the python-pip package:
    sudo apt-get install python-pip
  2. Install psycopg2 using pip:
    sudo pip install psycopg2
  3. Link the psycopg2 directory to the Hue Python environment:
    sudo ln -s /usr/lib64/python2.7/site-packages/psycopg2 /opt/cloudera/parcels/CDH/lib/hue/build/env/lib/python2.7/site-packages/psycopg2

Configuring and Starting the PostgreSQL Server

  Note: If you are making changes to an existing database, make sure to stop any services that use the database before continuing.

By default, PostgreSQL only accepts connections on the loopback interface. You must reconfigure PostgreSQL to accept connections from the fully qualified domain names (FQDN) of the hosts hosting the services for which you are configuring databases. If you do not make these changes, the services cannot connect to and use the database on which they depend.

  1. Make sure that LC_ALL is set to en_US.UTF-8 and initialize the database as follows:
    • RHEL 7:
      echo 'LC_ALL="en_US.UTF-8"' >> /etc/locale.conf
      sudo su -l postgres -c "postgresql-setup initdb"
    • RHEL 6:
      echo 'LC_ALL="en_US.UTF-8"' >> /etc/default/locale
      sudo service postgresql initdb
    • SLES 12:
      sudo su -l postgres -c "initdb --pgdata=/var/lib/pgsql/data --encoding=UTF-8"
    • Ubuntu:
      sudo service postgresql start
  2. Enable MD5 authentication. Edit pg_hba.conf, which is usually found in /var/lib/pgsql/data or /etc/postgresql/<version>/main. Add the following line:
    host all all 127.0.0.1/32 md5
    If the default pg_hba.conf file contains the following line:
    host all all 127.0.0.1/32 ident
    then the host line specifying md5 authentication shown above must be inserted before this ident line. Failure to do so may cause an authentication error when running the scm_prepare_database.sh script. You can modify the contents of the md5 line shown above to support different configurations. For example, if you want to access PostgreSQL from a different host, replace 127.0.0.1 with your IP address and update postgresql.conf, which is typically found in the same place as pg_hba.conf, to include:
    listen_addresses = '*'
  3. Configure settings to ensure your system performs as expected. Update these settings in the /var/lib/pgsql/data/postgresql.conf or /var/lib/postgresql/data/postgresql.conf file. Settings vary based on cluster size and resources as follows:
    • Small to mid-sized clusters - Consider the following settings as starting points. If resources are limited, consider reducing the buffer sizes and checkpoint segments further. Ongoing tuning may be required based on each host's resource utilization. For example, if the Cloudera Manager Server is running on the same host as other roles, the following values may be acceptable:
      • shared_buffers - 256MB
      • wal_buffers - 8MB
      • checkpoint_segments - 16
      • checkpoint_completion_target - 0.9
    • Large clusters - Can contain up to 1000 hosts. Consider the following settings as starting points.
      • max_connection - For large clusters, each database is typically hosted on a different host. In general, allow each database on a host 100 maximum connections and then add 50 extra connections. You may have to increase the system resources available to PostgreSQL, as described at Connection Settings.
      • shared_buffers - 1024 MB. This requires that the operating system can allocate sufficient shared memory. See PostgreSQL information on Managing Kernel Resources for more information on setting kernel resources.
      • wal_buffers - 16 MB. This value is derived from the shared_buffers value. Setting wal_buffers to be approximately 3% of shared_buffers up to a maximum of approximately 16 MB is sufficient in most cases.
      • checkpoint_segments - 128. The PostgreSQL Tuning Guide recommends values between 32 and 256 for write-intensive systems, such as this one.
      • checkpoint_completion_target - 0.9.
  4. Configure the PostgreSQL server to start at boot.
    OS Command
    RHEL 7 compatible
    sudo systemctl enable postgresql
    RHEL 6 compatible
    sudo chkconfig postgresql on
    SLES
    sudo chkconfig --add postgresql
    Ubuntu
    sudo chkconfig postgresql on
      Note: chkconfig may not be available on recent Ubuntu releases. You may need to use Upstart to configure PostgreSQL to start automatically when the system boots. For more information, see the Ubuntu documentation or the Upstart Cookbook.
  5. Restart the PostgreSQL database:
    • RHEL 7 Compatible:
      sudo systemctl restart postgresql
    • All Others:
      sudo service postgresql restart

Creating Databases for Cloudera Software

Create databases and service accounts for components that require databases:
  • Cloudera Manager Server
  • Cloudera Management Service roles:
    • Activity Monitor (if using the MapReduce service in a CDH 5 cluster)
    • Reports Manager
  • Each Hive metastore
  • Sentry Server
  • Cloudera Navigator Audit Server
  • Cloudera Navigator Metadata Server

The databases must be configured to support the PostgreSQL UTF8 character set encoding.

Record the values you enter for database names, usernames, and passwords. The Cloudera Manager installation wizard requires this information to correctly connect to these databases.

  1. Connect to PostgreSQL:
    sudo -u postgres psql
  2. Create databases for each service you are using from the below table:
    CREATE ROLE <user> LOGIN PASSWORD '<password>';
    CREATE DATABASE <database> OWNER <user> ENCODING 'UTF8';
    You can use any value you want for <database>, <user>, and <password>. The following examples are the default names provided in the Cloudera Manager configuration settings, but you are not required to use them:
    Table 1. Databases for Cloudera Software
    Service Database User
    Cloudera Manager Server scm scm
    Activity Monitor amon amon
    Reports Manager rman rman
    Hue hue hue
    Hive Metastore Server metastore hive
    Sentry Server sentry sentry
    Cloudera Navigator Audit Server nav nav
    Cloudera Navigator Metadata Server navms navms
    Oozie oozie oozie

    Record the databases, usernames, and passwords chosen because you will need them later.

  3. For PostgreSQL 8.4 and higher, set standard_conforming_strings=off for the Hive Metastore and Oozie databases:
    ALTER DATABASE <database> SET standard_conforming_strings=off;

Setting Up the Cloudera Manager Database

After completing the above instructions to install and configure PostgreSQL databases for Cloudera software, continue to Step 5: Set up the Cloudera Manager Database to configure a database for Cloudera Manager.

Page generated July 25, 2018.