First, a Warm Welcome to All of You for Visiting my Blog

I will share my thoughts on the word “ SECURITY “ and how it has shaped the way humans run their lives, and then give an overview of the security features in Hadoop.


The word “ Security “ is derived from the Latin word “ SECURUS “, meaning [ FREE FROM CARE ]. We humans do not feel safe until we and our belongings are kept secure. You may have noticed that even a child protects its favourite toys by placing them in a cupboard or in boxes. Even in the animal kingdom, an alpha monkey stays behind to watch over the young ones and the females and protect them from threats. This is a special characteristic that mother nature has given to all living beings as a gift.

It started a long time ago, when early man protected himself and his family from natural disasters and wild animals.

As civilisation evolved, the words “ Money, Profit, Threats “ started playing an important role in human life. Humans started creating their own world, and they wanted to secure themselves and everything they had created.

To make my point clearer, here are a few of our daily activities:

  •       When he leaves his home, he makes sure that security is posted at the gate.
  •       When he enters traffic on the road, he is guided by the traffic police.
  •       When he enters his office, he is greeted by the security guard at the entrance.
  •       When he goes to an ATM, he is watched over by a security guard and CCTV.
  •       When he logs in to his laptop, he is asked for a username & password.

In one way or another, we are always protected or safeguarded by some means. These are only a few examples that came to my mind at once.

When he starts to earn Money by selling his products, or whatever else he has created, to achieve Profit, he faces new Threats on a daily basis.

So he tightens up his Security measures to protect what he has created.


Now I will come to the main topic of this blog: Data Security in Hadoop.

How Secure is Hadoop?

When Hadoop was created, it dealt only with public data. The product was developed around clusters of trusted machines, used by trusted users in a trusted environment. In later stages, threats started flowing towards it like waves, so the Hadoop community tightened the product’s security by strengthening its Authentication and Authorization features. I will call the picture below the “ Rings of Defense “.

[ Image: Rings of Defense ]

Perimeter Level Security:

Apache Knox acts as the firewall here.

To make it simpler, we can take the use case of “ Mr. X’s entry to his office “.

When Mr. X enters his company, he is asked to show his ID card to the security guard at the entrance; only after that can he enter the company and start his work. Something similar happens here ( not exactly the same, but quite similar ): KNOX acts as a single point of Authentication and access for Apache Hadoop services in a cluster. The goal is to simplify Hadoop security for both users (i.e. those who access the cluster data and execute jobs) and operators (i.e. those who control access and manage the cluster). Knox aims to:


  • Provide perimeter security for Hadoop REST APIs, to make Hadoop security easier to set up and use
  • Provide authentication and token verification at the perimeter
  • Enable authentication integration with enterprise and cloud identity management systems
  • Provide service-level authorization at the perimeter
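To get a feel for what “a single point of access” means in practice, here is a small sketch of how a client might address WebHDFS through a Knox gateway rather than talking to a NameNode directly. The hostname, port, topology name, and credentials below are all illustrative assumptions, not values from any real cluster; `/gateway/<topology>/webhdfs/v1` is the URL pattern Knox uses to proxy WebHDFS.

```python
import base64

# Hypothetical Knox gateway coordinates -- adjust for your own cluster.
KNOX_HOST = "knox.example.com"
KNOX_PORT = 8443
TOPOLOGY = "default"

def knox_webhdfs_url(path, op):
    """Build the URL for a WebHDFS call routed through the Knox gateway.

    Knox proxies WebHDFS under /gateway/<topology>/webhdfs/v1, so clients
    talk to one perimeter endpoint instead of individual cluster hosts.
    """
    return (f"https://{KNOX_HOST}:{KNOX_PORT}"
            f"/gateway/{TOPOLOGY}/webhdfs/v1{path}?op={op}")

def basic_auth_header(user, password):
    """Knox deployments commonly front the cluster with HTTP Basic auth
    checked against an enterprise identity store such as LDAP."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return {"Authorization": f"Basic {token}"}

# List the HDFS root directory through the gateway:
url = knox_webhdfs_url("/", "LISTSTATUS")
headers = basic_auth_header("mr_x", "secret")
```

The point of the design is that only the gateway needs to be reachable from outside the cluster; the internal topology stays hidden behind it.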

Authentication { the process of determining whether someone is who they claim to be }

To create secure communication among its various components, HDP uses Kerberos.

Kerberos is a third-party authentication mechanism, in which users and the services they wish to access rely on a third party ( the Kerberos server ) to authenticate each to the other.

Ex: Once Mr. X passes the main-gate security, he has to use his RFID card to enter his office building, where the card is scanned and checked against the database, and the Authentication process is executed.

Authorization { the function of specifying access rights to resources }

File Permissions, ACL [ Access Control List ] Permissions, and Auditing using HDP Advanced Security.

File Permissions:

  • Authorization tells us what any given user can or cannot do within a Hadoop cluster, after the user has been successfully authenticated.
  • In HDFS this is primarily governed by file permissions.

Example: if we run ” ls ” in a directory, we would see records like this:

drwxr-xr-x 2 sam hadoop 4096 2015-03-01 11:20 foo

-rw-r--r-- 1 sam hadoop   87 2015-02-01 12:50 bar

On the far left, there is a string of letters. The first letter determines whether a file is a directory or not, and then there are three sets of three letters each. Those sets denote owner, group, and other user permissions, and the “rwx” are read, write, and execute permissions, respectively. The “sam hadoop” portion says that the files are owned by sam, and belong to the group hadoop.
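To make the layout of that permission string concrete, here is a small helper (my own illustrative sketch, not part of Hadoop) that splits an ls-style mode string into the directory flag and the three rwx triads described above:

```python
def parse_mode(mode):
    """Split an ls-style mode string (e.g. 'drwxr-xr-x') into its parts.

    The first character flags a directory ('d') or a plain file ('-');
    the next nine characters are three rwx triads covering the owner,
    the group, and all other users, in that order.
    """
    assert len(mode) == 10, "expected a 10-character mode string"
    return {
        "is_dir": mode[0] == "d",
        "owner": mode[1:4],   # read/write/execute for the file's owner
        "group": mode[4:7],   # ... for members of the file's group
        "other": mode[7:10],  # ... for everyone else
    }

# 'foo' from the listing above: a directory; owner rwx, group and other r-x.
print(parse_mode("drwxr-xr-x"))
```

Running it on the two entries from the listing shows that `foo` is a directory everyone can list but only `sam` can write to, while `bar` is a plain file that is world-readable but writable only by its owner.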

ACL Permissions:

Securing any system requires you to implement layers of protection.

Applying ACLs at every layer of data access is critical to securing a system. “ Every layer “ means that ACLs can be applied from the point of the user’s login to the point of getting the report.

For each file or directory, permissions are managed for a set of 3 distinct user classes: owner, group, and others.  There are 3 different permissions controlled for each user class: read, write, and execute.

  For example, consider a sales department that wants a single user, the department manager, to control all modifications to sales data.  Other members of the department need to view the data, but must not be able to modify it.  Everyone else in the company outside the sales department must not be able to view the data.
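The sales-department requirement above boils down to a simple decision rule, which this toy model (my own illustration, not the real HDFS permission checker) makes explicit: the owner check wins first, then the group check, then the “ others “ entry. The user and group names are hypothetical.

```python
# Toy model of the sales-data requirement: the manager (owner) may read
# and write, members of the 'sales' group may only read, everyone else
# gets no access at all.
SALES_DATA_ACL = {
    "owner": ("manager", {"read", "write"}),
    "group": ("sales", {"read"}),
    "other": set(),  # no permissions for anyone else
}

def can_access(user, groups, action, acl=SALES_DATA_ACL):
    """Evaluate owner, then group, then other -- first matching class wins."""
    owner_name, owner_perms = acl["owner"]
    if user == owner_name:
        return action in owner_perms
    group_name, group_perms = acl["group"]
    if group_name in groups:
        return action in group_perms
    return action in acl["other"]
```

Note that the classes are checked in order and only one applies: a user in the `sales` group is judged by the group entry alone, so giving the group read-only access really does block modification.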

Commands for applying ACLs:

  • setfacl
  • getfacl

Set the ACL:

     hdfs dfs -setfacl -m group:execs:r-- /sales-data

     [ This sets an ACL that grants read access on /sales-data to members of the execs group. ]

Check result by running getfacl:

     hdfs dfs -getfacl /sales-data

     [ When an ACL has been set on a file or directory, the permission string of that file or directory is appended with a “ + “. ]

Inheritance of ACLs:

A default ACL may be applied only to a directory, not a file.  Default ACLs have no direct effect on permission checks and instead define the ACL that newly created child files and directories receive automatically.

When a default ACL is set on a parent directory “ monthly-sales-data “, sub-directories “ Jan “ & “ Feb “ created under it will inherit the ACL from the parent directory “ monthly-sales-data “.

Set default ACL on parent directory:

     hdfs dfs -setfacl -m default:group:execs:r-x /monthly-sales-data

Create Sub-directories:

     hdfs dfs -mkdir /monthly-sales-data/Jan

     hdfs dfs -mkdir /monthly-sales-data/Feb

So these sub-directories “ Jan “ & “ Feb “ automatically inherit the ACLs from their parent directory.
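The inheritance rule can be summarised in a few lines of code. This is a simplified sketch of the behaviour (it ignores how the creation mode masks the inherited entries, and it is not the real HDFS implementation): a child copies the parent’s default ACL into its own access ACL, and a child *directory* also keeps it as its own default, so grandchildren inherit too.

```python
# Illustrative model of HDFS default-ACL inheritance.
def make_subdir(parent):
    """Create a child directory that inherits the parent's default ACL."""
    return {
        "access_acl": list(parent["default_acl"]),   # applied at creation
        "default_acl": list(parent["default_acl"]),  # passed on to children
    }

monthly = {"access_acl": [], "default_acl": ["group:execs:r-x"]}
jan = make_subdir(monthly)  # /monthly-sales-data/Jan
feb = make_subdir(monthly)  # /monthly-sales-data/Feb
```

Changing the parent’s default ACL later does not rewrite existing children; only directories created after the change pick up the new entries, which matches how HDFS applies defaults at creation time.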

//* * * *  In upcoming blogs, we will look at KNOX, Kerberos & HDP Advanced Security in more detail  * * * * //



These are a few of the main features through which data is kept highly secure in Hadoop, helping it gain popularity, build confidence, and attract customers towards it.
