Deploying a 2-Node Apache Hadoop Cluster using Apache Ambari
There are already many blogs in the SCN forum that discuss Big Data, Hadoop, Hive, etc. in detail. Some of them are very helpful, such as:
Even though Apache Hadoop is open source, there are basically two main distributions available, one from Hortonworks and the other from Cloudera. Both provide preconfigured virtual machines that you can download and run on your laptop with VMware or Oracle VirtualBox. In this blog I show how to actually deploy an Apache Hadoop solution, with components such as Hive, HBase, ZooKeeper, YARN/MapReduce and some others, using Apache Ambari. Apache Ambari is a tool that automates building a Hadoop cluster on multiple machines. Unlike SAP HANA, Hadoop does not require expensive certified hardware, although you should still use production-grade hardware to minimize the chance of outages caused by hardware failures, especially in a production environment.
Using Ambari to deploy a Hadoop cluster involves three steps: first, build and deploy a server that hosts the repositories; second, deploy Ambari; and third, use Ambari and the repositories to deploy Hadoop to a multi-node cluster. We also need to check the compatibility of Ambari with Hadoop. The following table specifies which Ambari versions support which Hadoop versions.
Here we look at preparing the server repositories for Ambari and Hadoop: preparing the host server, downloading the repository files, deploying an httpd web server and publishing the repository files on it. We then use these repositories to deploy Ambari, and use Ambari to deploy a Hadoop cluster in a 2-node environment.
Check the document “Hortonworks Data Platform: Automated Install with Ambari” for the complete list of prerequisites.
1. There is no single hardware requirement set for installing Hadoop. Hadoop can be installed on the following operating systems:
- Red Hat Enterprise Linux (RHEL) v6.x
- Red Hat Enterprise Linux (RHEL) v5.x (deprecated)
- CentOS v6.x
- CentOS v5.x (deprecated)
- Oracle Linux v6.x
- Oracle Linux v5.x (deprecated)
- SUSE Linux Enterprise Server (SLES) v11, SP1 and SP3
- Ubuntu Precise v12.04
2. Ensure that each node has a fully qualified domain name (FQDN). Run the command “hostname -f” to check.
3. Ensure that the following Linux software is installed:
- yum and rpm (RHEL/CentOS/Oracle Linux)
- zypper and php_curl (SLES)
- apt (Ubuntu)
- scp, curl, unzip, tar, and wget
- OpenSSL (v1.01, build 16 or later)
- python v2.6
4. Ensure that Java is installed, for example by running “yum install java-1.7.0-openjdk”.
5. Database & Memory requirements – Ambari requires a relational database to store information about the cluster configuration and topology. If you install HDP Stack with Hive or Oozie, they also require a relational database.
Ambari: By default, Ambari installs an instance of PostgreSQL on the Ambari Server host.
Hive: By default (on RHEL/CentOS/Oracle Linux 6), Ambari installs an instance of MySQL on the Hive Metastore host.
Oozie: By default, Ambari installs an instance of Derby on the Oozie Server host.
You can also use an existing instance of PostgreSQL, MySQL or Oracle. For the Ambari database, if you use an existing Oracle database, make sure the Oracle listener runs on a port other than 8080 to avoid a conflict with the default Ambari port.
Also ensure that each host has at least 8 GB of RAM, even though Apache only recommends having at least 1 GB of RAM available.
6. Set up password-less SSH. To have the Ambari Server automatically install Ambari Agents on all your cluster hosts, you must set up password-less SSH connections between the Ambari Server host and all other hosts in the cluster. The Ambari Server host uses SSH public key authentication to remotely access and install the Ambari Agent.
- Generate public and private SSH keys on the Ambari Server host (command “ssh-keygen”).
- Copy the SSH Public Key (id_rsa.pub) to the root account on your target hosts.
[root@adh-node-003 /]# scp root@adh-node-001:/root/.ssh/id_rsa.pub /root/.ssh/
- Add the SSH Public Key to the authorized_keys file on your target hosts.
cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
- Depending on your version of SSH, you may need to set permissions on the .ssh directory (to 700) and the authorized_keys file in that directory (to 600) on the target hosts.
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
7. Each HDP service requires a service user account. The Ambari Install wizard creates new service user accounts and preserves any existing ones.
9. Configuring iptables: For Ambari to communicate during setup with the hosts it deploys to and manages, certain ports must be open and available. The easiest way to do this is to temporarily disable iptables, as follows:
chkconfig iptables off
/etc/init.d/iptables stop
You can restart iptables after setup is complete. If the security protocols in your environment prevent disabling iptables, you can proceed with iptables enabled, if all required ports are open and available.
10. Disable SELinux and PackageKit, and check the umask value: To permanently disable SELinux, set SELINUX=disabled in /etc/selinux/config. This ensures that SELinux does not turn itself back on after you reboot the machine. To set the umask value to 022, run the following command as root on all hosts.
Then, append the following line: umask 022
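Several of the prerequisites above (FQDN, memory, SSH permissions, SELinux, umask) can be spot-checked on each host before running the wizard. The following is a minimal Python 3 sketch, not part of Ambari and separate from the Python 2.6 that Ambari itself needs; all function names are hypothetical. The 7.5 GB threshold is an assumption that leaves some slack, because MemTotal in /proc/meminfo reports slightly less than the installed RAM. The SELinux check only reads the config file, not the currently running mode.

```python
#!/usr/bin/env python3
"""Hypothetical preflight check for the Hadoop prerequisites (not part of Ambari)."""
import os
import re
import socket
import stat

# Assumption: 8 GB target minus slack, because MemTotal is below the installed RAM.
MIN_RAM_KB = int(7.5 * 1024 * 1024)


def has_fqdn(name: str) -> bool:
    """True if the host name looks fully qualified (contains a dot)."""
    return "." in name.strip(".")


def mem_total_kb(meminfo_text: str) -> int:
    """Extract MemTotal (in kB) from the contents of /proc/meminfo."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal not found")
    return int(match.group(1))


def selinux_mode(config_text: str) -> str:
    """Return the SELINUX= value from /etc/selinux/config contents ('' if absent)."""
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("SELINUX="):
            return line.split("=", 1)[1].strip()
    return ""


def ssh_permissions_ok(ssh_dir: str) -> bool:
    """Check mode 700 on the .ssh directory and 600 on authorized_keys."""
    keys = os.path.join(ssh_dir, "authorized_keys")
    if not os.path.isfile(keys):
        return False
    dir_mode = stat.S_IMODE(os.stat(ssh_dir).st_mode)
    key_mode = stat.S_IMODE(os.stat(keys).st_mode)
    return dir_mode == 0o700 and key_mode == 0o600


def current_umask() -> int:
    """Read the process umask (os.umask can only be read by setting it)."""
    old = os.umask(0o022)
    os.umask(old)
    return old


if __name__ == "__main__":
    fqdn = socket.getfqdn()
    print("FQDN", fqdn, "OK" if has_fqdn(fqdn) else "NOT fully qualified")
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as f:
            print("RAM >= ~8 GB:", mem_total_kb(f.read()) >= MIN_RAM_KB)
    if os.path.exists("/etc/selinux/config"):
        with open("/etc/selinux/config") as f:
            print("SELinux mode in config:", selinux_mode(f.read()) or "unset")
    print("umask is 022:", current_umask() == 0o022)
    print("SSH permissions OK:", ssh_permissions_ok(os.path.expanduser("~/.ssh")))
```

Run it as root on every node; the umask it reports is the one inherited from the login shell that started it.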
Setting up a local repository
If your cluster is behind a firewall that prevents or limits Internet access, you can install Ambari and a Stack using local repositories; otherwise you can access the Hortonworks repositories over the Internet.
1. Obtaining the Repositories
Ambari Repositories : If you do not have Internet access for setting up the Ambari repository, use the link appropriate for your OS family to download a tarball that contains the software.
Ambari 2.0.1 Tarball Links: RHEL/CentOS/Oracle Linux 6
HDP Stack Repositories : If you do not have Internet access to set up the Stack repositories, use the link appropriate for your OS family to download a tarball that contains the HDP Stack version you plan to install.
HDP 2.2 Tarball links: RHEL/CentOS/Oracle Linux 6
2. To get started setting up your local repository, complete the following prerequisites:
- Select an existing server, in or accessible to the cluster, that runs a supported operating system.
- Enable network access from all hosts in your cluster to the mirror server.
- Ensure the mirror server has a package manager installed, such as yum (RHEL/CentOS/Oracle Linux), zypper (SLES), or apt-get (Ubuntu).
- Optional: If your repository has temporary Internet access, and you are using RHEL/CentOS/Oracle Linux as your OS, install yum utilities:
yum install yum-utils createrepo
- Create an HTTP server: on the mirror server, install an HTTP server (such as Apache httpd).
- Download the httpd-2.4.12.tar.gz source archive from the Apache HTTP Server project.
- Extract $ gzip -d httpd-2.4.12.tar.gz
$ tar xvf httpd-2.4.12.tar
$ cd httpd-2.4.12
- Configure $ ./configure --prefix=/apacheHttpServer/
- Compile $ make
- Install $ make install
- Customize $ vi /apacheHttpServer/conf/httpd.conf
- Start $ /apacheHttpServer/bin/apachectl -k start
You might see errors such as “configure: error: APR not found. Please read the documentation.” or “configure: error: pcre-config for libpcre not found. PCRE is required and available from http://pcre.org/”. In that case, download the missing libraries (APR, PCRE), build them with make, install them with make install, and then run configure again.
- Activate this web server.
Customize: $ vi /apacheHttpServer/conf/httpd.conf
Test: $ /apacheHttpServer/bin/apachectl -k start
$ /apacheHttpServer/bin/apachectl -k stop
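- Ensure that any firewall settings allow inbound HTTP access from your cluster nodes to your mirror server. Now, on your mirror server, create a directory for your web server and point DocumentRoot in httpd.conf to it (a source build with --prefix=/apacheHttpServer serves /apacheHttpServer/htdocs by default).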
mkdir -p /var/www/html/
Copy the repository tarballs to the web server directory and untar.
For the Ambari repository: untar under <web.server.directory>.
For the HDP Stack repositories: create a directory and untar under <web.server.directory>/hdp.
- Restart the httpd server and confirm that you can browse to the newly created local repositories:
Ambari Base URL
HDP Base URL http://<web.server>/hdp/HDP/<OS>/2.x/updates/<latest.version>
HDP-UTILS Base URL http://<web.server>/hdp/HDP-UTILS-<version>/repos/<OS>
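The base URLs follow a fixed pattern, so they can be built and checked from a script instead of a browser. This is an illustrative Python 3 sketch with hypothetical function names. It assumes the Ambari base URL is the directory that holds ambari.repo on the mirror (the same layout as the mirror-server ambari.repo URL in the next section), and it relies only on the fact that every yum repository publishes its index at repodata/repomd.xml.

```python
import urllib.error
import urllib.request


def ambari_base_url(web_server: str, os_name: str, version: str = "2.0.1") -> str:
    """Directory that holds ambari.repo on the mirror (assumed Ambari base URL)."""
    return f"http://{web_server}/ambari/{os_name}/2.x/updates/{version}"


def hdp_base_url(web_server: str, os_name: str, latest_version: str) -> str:
    """http://<web.server>/hdp/HDP/<OS>/2.x/updates/<latest.version>"""
    return f"http://{web_server}/hdp/HDP/{os_name}/2.x/updates/{latest_version}"


def hdp_utils_base_url(web_server: str, utils_version: str, os_name: str) -> str:
    """http://<web.server>/hdp/HDP-UTILS-<version>/repos/<OS>"""
    return f"http://{web_server}/hdp/HDP-UTILS-{utils_version}/repos/{os_name}"


def repo_is_browsable(base_url: str, timeout: float = 5.0) -> bool:
    """True if <base_url>/repodata/repomd.xml (the yum repository index) is served."""
    url = base_url.rstrip("/") + "/repodata/repomd.xml"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

For example, repo_is_browsable(ambari_base_url("adh-node-001.hadoop.com", "centos6")) should return True once the Ambari tarball is untarred on that (hypothetical) mirror host; the version arguments for HDP and HDP-UTILS must match the tarballs you actually downloaded.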
Now that we have built our repository server for Ambari & Hadoop we will use it to deploy Ambari and then use Ambari to deploy Hadoop cluster.
Preparing The Ambari Repository Configuration File
Download the ambari.repo file from the mirror server you created in the preceding sections or from the public repository.
• From your mirror server: http://<web.server>/ambari/<OS>/2.x/updates/2.0.1/ambari.repo
• From the public repository: http://public-repo-1.hortonworks.com/ambari/<OS>/2.x/updates/2.0.1/ambari.repo
where <web.server> = FQDN of the web server host, and <OS> is CENTOS6, SLES11, or UBUNTU12.
Edit the ambari.repo file using the Ambari repository base URL obtained when setting up your local repository: update the baseurl and gpgkey entries in the ambari.repo file. If this is an Ambari updates release, disable the GA repository definition.
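Instead of hand-editing, the repository stanza can also be generated. This is a Python 3 sketch using the standard configparser module, which writes the INI-style format yum reads. The section name, the name= text and the priority=1 entry (which only has an effect when the yum priorities plugin is enabled) are assumptions for illustration; baseurl and gpgkey must be the values from your own mirror.

```python
import configparser
import io


def render_ambari_repo(section: str, baseurl: str, gpgkey: str) -> str:
    """Render a yum .repo stanza whose baseurl/gpgkey point at the local mirror."""
    cfg = configparser.ConfigParser(interpolation=None)  # URLs may contain '%'
    cfg[section] = {
        "name": f"Ambari {section} (local mirror)",
        "baseurl": baseurl,
        "gpgcheck": "1",
        "gpgkey": gpgkey,
        "enabled": "1",
        "priority": "1",  # honored only when the yum priorities plugin is enabled
    }
    out = io.StringIO()
    cfg.write(out, space_around_delimiters=False)  # yum style: key=value
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical values; substitute your mirror's base URL and GPG key location.
    print(render_ambari_repo(
        "Updates-ambari-2.0.1",
        "http://adh-node-001.hadoop.com/ambari/centos6/2.x/updates/2.0.1",
        "http://adh-node-001.hadoop.com/path/to/RPM-GPG-KEY",
    ), end="")
```

Generating the file keeps every node's repository definition identical, which matters when the same file is copied to several hosts.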
Place the ambari.repo file on the machine you plan to use for the Ambari Server, i.e. for RHEL/CentOS/Oracle Linux:
Also edit the /etc/yum/pluginconf.d/priorities.conf file to add the following:
To install Ambari server on a single host in your cluster, complete the following steps:
1. Download the Ambari repository
2. Install and set up the Ambari Server:
yum install ambari-server
ambari-server setup
Respond to the following prompts:
i. If you have not temporarily disabled SELinux, you may get a warning. Accept the default (y), and continue.
ii. By default, Ambari Server runs under root. Accept the default (n) at the Customize user account for ambari-server daemon prompt, to proceed as root. If you want to create a different user to run the Ambari Server, or to assign a previously created user, select y at the Customize user account for ambari-server daemon prompt, then provide a user name
iii. If you have not temporarily disabled iptables, you may get a warning. Enter y to continue.
iv. Select a JDK version to download. Enter 1 to download Oracle JDK 1.7. By default, Ambari Server setup downloads and installs Oracle JDK 1.7 and the accompanying Java Cryptography Extension (JCE) Policy Files. If you plan to use a different version of the JDK, see Setup Options for more information.
Accept the Oracle JDK license when prompted. You must accept this license to download the necessary JDK from Oracle. The JDK is installed during the deploy phase.
v. Select n at Enter advanced database configuration to use the default, embedded PostgreSQL database for Ambari. The default PostgreSQL database name is ambari. The default user name and password are ambari/bigdata. Otherwise, to use an existing PostgreSQL, MySQL or Oracle database with Ambari, select y.
vi. Setup Completes
3. Start the Ambari Server
Run the following command on the Ambari Server host: ambari-server start
To stop the Ambari Server: ambari-server stop
To check its status: ambari-server status
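Before opening the browser, it can help to confirm that the server is actually listening on port 8080, ideally from another cluster host, which also shows whether a firewall is in the way. A minimal Python 3 sketch; wait_for_port is a hypothetical helper that uses only the standard library:

```python
import socket
import time


def wait_for_port(host: str, port: int = 8080, timeout: float = 120.0,
                  interval: float = 2.0) -> bool:
    """Poll host:port until a TCP connection succeeds or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A completed TCP handshake means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, wait_for_port("adh-node-001.hadoop.com") returns True once Ambari Web accepts connections, or False after two minutes. It only proves the port is open, not that the login page renders.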
Installing, Configuring, and Deploying an HDP Cluster
We will use the Ambari Install Wizard running in your browser to install, configure, and deploy our cluster.
1. Log In to Apache Ambari
After starting the Ambari Server, open Ambari Web by pointing your browser to http://<your.ambari.server>:8080, where <your.ambari.server> is the name of your Ambari Server host, e.g. http://adh-node-001.hadoop.com:8080/#/login
Log in to the Ambari Server using the default user name/password: admin/admin. You can change these credentials later.
2. From the Ambari Welcome page, choose Launch Install Wizard
3. In Name your cluster, type a name for the cluster you want to create. Use no white spaces or special characters in the name.
4. The Service Stack (the Stack) is a coordinated and tested set of HDP components. Use a radio button to select the Stack version you want to install. To install an HDP 2.x stack, select the HDP 2.2, HDP 2.1, or HDP 2.0 radio button.
5. Select the correct OS and update the repository URL
6. In order to build up the cluster, the install wizard prompts you for general information about how you want to set it up. You need to supply the FQDN of each of your hosts. The wizard also needs to access the private key file you created earlier. Using the host names and key file information, the wizard can locate, access, and interact securely with all hosts in the cluster. If you want to let Ambari automatically install the Ambari Agent on all your hosts using SSH, select Provide your SSH Private Key and either use the Choose File button in the Host Registration Information section to find the private key file that matches the public key you installed earlier on all your hosts or cut and paste the key into the text box manually.
7. Confirm Hosts
8. Choose Services you want to deploy
9. The Ambari install wizard assigns the master components for selected services to appropriate hosts in your cluster and displays the assignments in Assign Masters
10. The Ambari installation wizard assigns the slave components (DataNodes, NodeManagers, and RegionServers) to appropriate hosts in your cluster. It also attempts to select hosts for installing the appropriate set of clients.
11. Customize Services: provide usernames and passwords in the required fields.
12. Review before starting the install
13. Start the install
14. Complete the installation: The Summary page provides you a summary list of the accomplished tasks. Choose Complete.
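Besides the web UI, Ambari exposes a REST API, so the result can also be verified from a script. This Python 3 sketch assumes the GET /api/v1/clusters endpoint with HTTP basic authentication (the default admin/admin from the login step, if unchanged), and it assumes each returned item carries its name under Clusters.cluster_name; cluster_names() and list_clusters() are hypothetical helpers.

```python
import base64
import json
import urllib.request


def cluster_names(body: dict) -> list:
    """Extract cluster names from a parsed /api/v1/clusters response."""
    # Assumed shape: {"items": [{"Clusters": {"cluster_name": "...", ...}}, ...]}
    return [item["Clusters"]["cluster_name"] for item in body.get("items", [])]


def list_clusters(server: str, user: str = "admin", password: str = "admin",
                  port: int = 8080, timeout: float = 10.0) -> list:
    """GET http://<server>:<port>/api/v1/clusters with basic auth; return the names."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        f"http://{server}:{port}/api/v1/clusters",
        headers={"Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return cluster_names(json.load(resp))
```

After the wizard completes, list_clusters("adh-node-001.hadoop.com") should include the name you entered in Name your cluster. Pass the new credentials if you have changed the admin password.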
This completes our Hadoop cluster deployment. In upcoming blogs/documents we will look at manually deploying Ambari Agents to other hosts so that they can be included in the cluster, at administering Hadoop, Hive and HBase, and at integrating SAP and HANA with Hadoop.