Add Nodes to a Pseudo Hadoop Cluster
In this blog, I assume a single-node Hadoop cluster has already been set up and we want to add more slave nodes to it. For more information about setting up a single-node cluster, please refer to “Access to Hive from HANA – Section 1 Hadoop Installation“.
Setting up a Hadoop slave is similar to setting up the pseudo-distributed Hadoop node we already have, so please follow “Access to Hive from HANA – Section 1 Hadoop Installation“ to set up Hadoop on your slave machines. Then we need to adjust the configuration, both in Hadoop and in the hostnames, to establish the connection between the master node and the slave nodes.
On the master node,
edit the IPv4 addresses in /etc/hosts as follows:
127.0.0.1 localhost
xxx.xxx.xxx.xxx Full-domain-name master
xxx.xxx.xxx.xxx Full-domain-name slave1
xxx.xxx.xxx.xxx Full-domain-name slave2
…….
List every slave you have, each with its IP address, fully qualified domain name, and short name. You can find a machine's IP address by running ifconfig as root.
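To double-check the entries, a quick sanity test like the one below helps; the host name slave1 is just a placeholder for whatever you put in /etc/hosts:
ip addr show          # or ifconfig as root, to confirm the machine's IPv4 address
ping -c 3 slave1      # the short names should resolve from the master (and vice versa)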
Still on the master node, edit the core-site.xml and yarn-site.xml files to tell Hadoop where the master node is.
In core-site.xml, modify the fs.default.name property as follows:
<property>
<name>fs.default.name</name>
<value>hdfs://master:8020</value>
</property>
In yarn-site.xml, modify the yarn.resourcemanager.hostname property as follows:
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
</property>
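After saving both files, a quick check on the master confirms the values were picked up; hdfs getconf reads the core-site.xml we just edited, so it should echo the new address:
hdfs getconf -confKey fs.defaultFS
grep -A1 'yarn.resourcemanager.hostname' $HADOOP_HOME/etc/hadoop/yarn-site.xml
The first command should print hdfs://master:8020, and the grep should show master as the value.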
Then edit the slaves file in the same directory ($HADOOP_HOME/etc/hadoop):
master
slave1
slave2
….
List the short names of all the slave nodes as you defined them in /etc/hosts. (NOTICE: the master node can act as both master and slave in Hadoop, which is why it appears in the list.)
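As a side note, once the slaves file is filled in, the helper scripts under $HADOOP_HOME/sbin can start the whole cluster from the master in one go; this assumes passwordless SSH from the master to every listed host is already configured:
$HADOOP_HOME/sbin/start-dfs.sh     # starts the NameNode here and a DataNode on every host listed in slaves
$HADOOP_HOME/sbin/start-yarn.sh    # starts the ResourceManager here and a NodeManager on every slave
We will start the daemons on each slave manually later in this blog, but both approaches work.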
On all the slave nodes,
copy the configuration files from the master node:
scp user@master:$HADOOP_HOME/etc/hadoop/<file> $HADOOP_HOME/etc/hadoop/<file>
(NOTICE: For a test setup we only need to copy core-site.xml, hdfs-site.xml, and yarn-site.xml from the master to the slaves. The aim of copying the configuration files is to let the slaves know who their boss is!)
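If you only need those three files, a small loop run on each slave keeps them in sync; the user name hduser is a placeholder, and Hadoop is assumed to be installed under the same $HADOOP_HOME path on every node:
for f in core-site.xml hdfs-site.xml yarn-site.xml; do
  scp hduser@master:$HADOOP_HOME/etc/hadoop/$f $HADOOP_HOME/etc/hadoop/$f
done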
You may want to modify dfs.datanode.data.dir in hdfs-site.xml to define where to store data on each slave.
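For example, to keep block data under a dedicated directory on a slave (the path below is purely illustrative), the property in hdfs-site.xml would look like:
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///data/hadoop/datanode</value>
</property>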
Next, make sure the datanode storage on each slave belongs to the same HDFS cluster as the master. If a slave was already formatted as its own single-node cluster, its datanode will refuse to join; either clear the contents of dfs.datanode.data.dir on that slave before starting it, or make the clusterID in dfs.datanode.data.dir/current/VERSION on the slave nodes match the one on the master node.
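A quick way to verify the IDs line up is to compare the VERSION files directly; the directories below stand in for whatever you configured as the name and data directories:
grep clusterID /data/hadoop/namenode/current/VERSION    # on the master (namenode side)
grep clusterID /data/hadoop/datanode/current/VERSION    # on each slave (datanode side)
The two clusterID values must be identical, otherwise the datanode will not register with the namenode.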
The last step is to start the slaves. We just need to start the DataNode and NodeManager daemons on each slave node:
hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start datanode
yarn-daemon.sh --config $HADOOP_CONF_DIR start nodemanager
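Once the daemons are up, a short check from both sides confirms the new nodes joined the cluster:
jps                      # on each slave, should now list DataNode and NodeManager
hdfs dfsadmin -report    # on the master, the new datanodes should appear as live nodes
yarn node -list          # on the master, the new NodeManagers should be listed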