
Today, enterprise and social data are growing exponentially, and the world is moving rapidly towards Big Data solutions. As organizations rely more on Big Data, their analytical requirements grow as well. Analytical tools must be able to connect to, cleanse, visualize, analyze and share data from Big Data sources quickly. The challenge for these tools is to work on Big Data sources without relying heavily on IT.

Lumira, as a self-service analytical tool, provides easy connectivity to Big Data sources such as Hadoop/Hive in several ways. Business users can perform the required data manipulation and cleansing and analyze large business data sets without depending on IT or report developers. This is done in two stages.

Stage 1: Because Big Data sets are very large, Lumira samples the data during acquisition, and the user specifies the sampling percentage. Once the sampled data is acquired, the user applies the required data manipulations, creates the required visualizations, and checks that the resulting report meets the business needs.
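The post does not say how Lumira draws its sample. As an illustration only, the sketch below shows one common approach, Bernoulli sampling, in which each row is kept independently with a probability equal to the sampling percentage. The function name `sample_rows` and its `seed` parameter are hypothetical and not part of Lumira.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def sample_rows(rows: Iterable[T], percent: float, seed: Optional[int] = None) -> List[T]:
    """Hypothetical helper: keep each row independently with probability percent/100.

    This is Bernoulli sampling, used only to illustrate a sampling percentage;
    it is not Lumira's actual sampling algorithm.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range (0, 100]")
    rng = random.Random(seed)  # a seeded generator makes the sample reproducible
    p = percent / 100.0
    # random() returns a float in [0.0, 1.0), so percent=100 keeps every row
    return [row for row in rows if rng.random() < p]


if __name__ == "__main__":
    rows = [{"order_id": i, "amount": i * 10} for i in range(10_000)]
    sample = sample_rows(rows, percent=5, seed=42)
    # On average 5% of 10,000 rows, i.e. about 500
    print(len(sample))
```

With a fixed seed the same rows are selected on every run, which makes it easier to compare results while the document is being designed on the sample.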

Stage 2: Because the document was built on a sampled data set, the user then submits it back to Hadoop to generate the complete data set, either as Hive tables or as Lumira documents. This requires the Oozie settings of the Hadoop cluster.

After these steps of sampling, acquisition, manipulation and job submission, Hadoop creates the complete data set, which can then be visualized in Lumira and shared as the business requires.
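The two stages rely on the manipulations designed on the sample being replayed on the full data. The sketch below models only that idea: transformation steps are recorded once as a list of functions and then applied unchanged to both the sample and the complete data set. The names `Pipeline`, `add_step`, `run` and the example steps are hypothetical; in Lumira the replay on the full data is performed by the job submitted to Hadoop, not by code like this.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Step = Callable[[List[Row]], List[Row]]


class Pipeline:
    """Hypothetical record of data manipulations that can be replayed on any data set."""

    def __init__(self) -> None:
        self.steps: List[Step] = []

    def add_step(self, step: Step) -> "Pipeline":
        self.steps.append(step)
        return self  # returning self allows chaining

    def run(self, rows: List[Row]) -> List[Row]:
        # Apply every recorded step in order
        for step in self.steps:
            rows = step(rows)
        return rows


def drop_missing_region(rows: List[Row]) -> List[Row]:
    # Cleansing step: remove rows that have no region
    return [r for r in rows if r.get("region")]


def upper_region(rows: List[Row]) -> List[Row]:
    # Manipulation step: normalize region names to upper case
    return [{**r, "region": str(r["region"]).upper()} for r in rows]


pipeline = Pipeline().add_step(drop_missing_region).add_step(upper_region)

full_data: List[Row] = [
    {"region": "emea", "amount": 10},
    {"region": None, "amount": 5},
    {"region": "apj", "amount": 7},
    {"region": "na", "amount": 3},
]
sample = full_data[:2]  # Stage 1: design and check the steps on a sample

if __name__ == "__main__":
    print(pipeline.run(sample))     # steps verified on the sample
    print(pipeline.run(full_data))  # Stage 2: the same steps on the complete data
```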

To support the different ways data can reside in Hadoop, Lumira offers two ways to connect:

1. Connect to Hadoop:

  • This option connects to Hadoop clusters and loads data stored as files in the cluster into Lumira.

  • In the next step, the user enters the server, port and credentials to log on to Hadoop/Hive.

NOTE: To connect to a secured Hadoop cluster, Kerberos must be configured for Lumira Desktop, as explained at the end of this post.

  • In the next step, the user selects the file to acquire and specifies the number of lines to sample, as shown below:

  • Next, the user selects the delimiter option.
  • The user then acquires the document. Once it is acquired, Lumira displays the content as shown below, indicating that it holds sampled data. The user can now perform the required data manipulations and create visualizations.

  • The user can click SAMPLE (shown above) to see how many lines were sampled. Clicking "Generate Full Data set" schedules a job on Hadoop that applies the data transformations and creates a Lumira document or a Hive table, as shown below:

  • If the user has the complete Oozie details required by the Hadoop cluster, the user provides them in the next step, together with the location and name of the document to create. A new Lumira document or Hive table is then created that includes the data manipulations the business user performed in the previous steps.
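The file-based option asks for a number of lines to sample and a delimiter. Assuming the sample consists of the first N lines of the file (the post does not say which lines Lumira picks), the sketch below reads a header plus N data lines of a delimited text file with Python's `csv` module. `read_sample` is a hypothetical name, and reading directly from HDFS is out of scope, so the example uses an in-memory file.

```python
import csv
import io
from itertools import islice
from typing import List, TextIO, Tuple


def read_sample(stream: TextIO, n_lines: int, delimiter: str = ",") -> Tuple[List[str], List[List[str]]]:
    """Hypothetical helper: return the header row and the first n_lines data rows."""
    if n_lines < 1:
        raise ValueError("n_lines must be at least 1")
    reader = csv.reader(stream, delimiter=delimiter)
    header = next(reader, [])              # the first line holds the column names
    rows = list(islice(reader, n_lines))   # stop after n_lines; later lines are not parsed
    return header, rows


if __name__ == "__main__":
    data = io.StringIO("region|amount\nEMEA|10\nAPJ|7\nNA|3\n")
    header, rows = read_sample(data, n_lines=2, delimiter="|")
    print(header, rows)
```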

2. SQL on Hadoop:
With this option, the user connects to Hadoop clusters and selects tables in the Hive schema, as detailed below:

  • Option to connect

  • Lumira provides the drivers required to connect to Hive tables. Depending on your needs, select the appropriate driver.


NOTE: To connect to secured Hadoop systems, Kerberos must be configured for Lumira Desktop, as explained at the end of this post.

  • The user can select the required columns from the tables and provide the sampling percentage, as shown below:

  • After this, creating the full data set and submitting the job back to Oozie works the same way as with the Connect to Hadoop option.
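The post does not show the query Lumira sends to Hive. For readers who want to reproduce a similar column selection and percentage sample directly in Hive, Hive's block sampling clause `TABLESAMPLE(n PERCENT)` is one option; it samples by input data size at the HDFS block level, not by row count, so the share of rows returned is approximate. The Python helper below only builds such a query string; `build_sample_query` and the schema, table and column names are hypothetical.

```python
from typing import Sequence


def build_sample_query(schema: str, table: str, columns: Sequence[str], percent: float) -> str:
    """Hypothetical helper: build a HiveQL query that reads selected columns from a block sample."""
    if not columns:
        raise ValueError("select at least one column")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range (0, 100]")
    # Keep quoting simple: reject names that would break the backtick quoting
    if any("`" in name for name in (schema, table, *columns)):
        raise ValueError("identifiers must not contain backticks")
    cols = ", ".join(f"`{c}`" for c in columns)
    # TABLESAMPLE(n PERCENT) is Hive block sampling; the trailing "s" is a table alias
    return f"SELECT {cols} FROM `{schema}`.`{table}` TABLESAMPLE({percent:g} PERCENT) s"


if __name__ == "__main__":
    print(build_sample_query("sales", "orders", ["region", "amount"], 10))
```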

Once the job is submitted to Hadoop, it completes in seconds to minutes, depending on the complexity and size of the data set. The user can check the job status on the Lumira Home page, as shown below:

NOTE: The Hadoop system used above can be configured in the Lumira preferences.

Once the job succeeds, users can open the Lumira documents from here to check the results and share them with others.
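Lumira shows the job status on its Home page. If you also want to follow a job outside Lumira, a simple polling loop is enough. The sketch below assumes a caller-supplied `get_status` function (hypothetical; it could, for example, wrap a lookup against Oozie) that returns an Oozie workflow status string. `SUCCEEDED`, `KILLED` and `FAILED` are Oozie's terminal workflow states; `wait_for_job` and its parameters are illustrative names.

```python
import time
from typing import Callable

# Oozie workflow states after which a job no longer changes
TERMINAL_STATES = {"SUCCEEDED", "KILLED", "FAILED"}


def wait_for_job(get_status: Callable[[], str], poll_seconds: float = 10.0,
                 timeout_seconds: float = 3600.0) -> str:
    """Hypothetical helper: poll get_status() until a terminal state or the timeout."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status} after {timeout_seconds} s")
        time.sleep(poll_seconds)


if __name__ == "__main__":
    # Stand-in for a real status lookup: replays a fixed sequence of states
    fake_states = iter(["PREP", "RUNNING", "RUNNING", "SUCCEEDED"])
    print(wait_for_job(lambda: next(fake_states), poll_seconds=0.1))
```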

Using Kerberos to connect to Hadoop:

At a high level, these are the configurations you may need to connect to secured Hadoop clusters:

Place your java.login.config file in a location such as C:/Windows. The java.login.config file looks like this:

com.sap.bo.lumira.bdata {
com.sun.security.auth.module.Krb5LoginModule required
debug=true
doNotPrompt=false
useTicketCache=false;
};
Client {
com.sun.security.auth.module.Krb5LoginModule required
debug=true
doNotPrompt=false
useTicketCache=true;
};
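A small mistake in java.login.config, such as a misspelled entry name, is enough to break the Kerberos logon. As a hypothetical pre-flight check (not a Lumira tool), the sketch below lists the entry names and login modules in a JAAS file and reports which of the two entries shown above are missing or use a different module. It relies on a simple regular expression that handles the layout shown here, not the full JAAS grammar; comments, for instance, are not supported.

```python
import re
from pathlib import Path
from typing import Dict, List

# Matches "<EntryName> {" followed by "<LoginModuleClass> <flag>"
ENTRY_RE = re.compile(
    r"([\w.]+)\s*\{\s*([\w.]+)\s+(required|requisite|sufficient|optional)\b",
    re.IGNORECASE,
)
REQUIRED_ENTRIES = ("com.sap.bo.lumira.bdata", "Client")  # entry names used in this post
KRB5_MODULE = "com.sun.security.auth.module.Krb5LoginModule"


def jaas_entries(text: str) -> Dict[str, str]:
    """Map each JAAS entry name to the first login module declared in it."""
    return {name: module for name, module, _flag in ENTRY_RE.findall(text)}


def check_jaas(text: str) -> List[str]:
    """Return a list of problems; an empty list means both entries use Krb5LoginModule."""
    entries = jaas_entries(text)
    problems = []
    for name in REQUIRED_ENTRIES:
        if name not in entries:
            problems.append(f"missing entry {name}")
        elif entries[name] != KRB5_MODULE:
            problems.append(f"entry {name} uses {entries[name]}, expected {KRB5_MODULE}")
    return problems


if __name__ == "__main__":
    # Path taken from the example location in this post
    print(check_jaas(Path("C:/Windows/java.login.config").read_text()))
```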

Place your krb5.ini file in a location such as C:/Windows. The krb5.ini file looks like this:

[logging]
default = FILE:/var/log/krb5libs.log
kdc = FILE:/var/log/krb5kdc.log
admin_server = FILE:/var/log/kadmind.log

[libdefaults]
default_realm = GLOBAL.CORP.SAP
dns_lookup_realm = false
dns_lookup_kdc = false
ticket_lifetime = 24h
renew_lifetime = 7d
forwardable = true
default_tkt_enctypes = RC4-HMAC
default_tgs_enctypes = RC4-HMAC
udp_preference_limit = 1

[realms]
GLOBAL.CORP.SAP = {
kdc = DS1VAN0000.global.corp.sap
admin_server = DS1VAN0000.global.corp.sap
kpasswd_server = DS1VAN0000.global.corp.sap
}

[domain_realm]
global.corp.sap = GLOBAL.CORP.SAP
.global.corp.sap = GLOBAL.CORP.SAP
pgdev.sap.corp = GLOBAL.CORP.SAP
.pgdev.sap.corp = GLOBAL.CORP.SAP

Append the following two lines to the SAPLumira.ini file in <Lumira_installation_location>\Desktop (for example, C:\Program Files\SAP Lumira\Desktop):

-Djava.security.auth.login.config=C:/Windows/java.login.config

-Djava.security.krb5.conf=C:/Windows/krb5.ini
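Editing SAPLumira.ini by hand makes it easy to add an option twice or miss one. The sketch below is a hypothetical helper, not part of Lumira, that appends only the options above that are not already present. It assumes the file paths used in this post, a plain-text SAPLumira.ini, and write access to the installation folder (under C:\Program Files this usually means running as administrator). The options are JVM startup settings, so restart Lumira Desktop afterwards.

```python
from pathlib import Path
from typing import Iterable, List

# The two JVM options from this post; adjust the paths if your files live elsewhere
REQUIRED_OPTIONS = (
    "-Djava.security.auth.login.config=C:/Windows/java.login.config",
    "-Djava.security.krb5.conf=C:/Windows/krb5.ini",
)


def missing_options(ini_text: str, required: Iterable[str] = REQUIRED_OPTIONS) -> List[str]:
    """Return the required options that do not appear as a line of their own."""
    present = {line.strip() for line in ini_text.splitlines()}
    return [opt for opt in required if opt not in present]


def append_missing(ini_path: str, required: Iterable[str] = REQUIRED_OPTIONS) -> List[str]:
    """Hypothetical helper: append missing options to the file and return what was added."""
    path = Path(ini_path)
    text = path.read_text(encoding="utf-8")
    missing = missing_options(text, required)
    if missing:
        # Start on a new line if the file does not already end with one
        sep = "\n" if text and not text.endswith("\n") else ""
        path.write_text(text + sep + "\n".join(missing) + "\n", encoding="utf-8")
    return missing


if __name__ == "__main__":
    print(append_missing(r"C:\Program Files\SAP Lumira\Desktop\SAPLumira.ini"))
```

Running it a second time adds nothing, because every option is then already present.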

NOTE: If you are using a high-strength encryption algorithm (for example AES-256), replace local_policy.jar and US_export_policy.jar in SAPJVM_location/lib/security with the JCE Unlimited Strength Jurisdiction Policy jars, which support stronger encryption.

With these changes, you should be able to connect to secured Hadoop clusters.

With this, I conclude this blog on Lumira with Big Data.

Thank you!!


1 Comment


  1. JinChong Tsai

    Is there a “Sampling Percentage” option available for data sources other than “SQL on Hadoop”, such as “Query with SQL” against an RDBMS or a Netezza database?

    I know that Lumira Desktop acquires a maximum of 1 million rows from Hadoop, and that we can change this limit by adding or changing the following property in the SAPLumira.ini file. Is there a similar option for other RDBMSs?

    -Dhilo.hivemaxsamplingsize=<VALUE>

    The reason I ask is that Lumira Desktop downloads the entire chosen table before you can click Create, which is time-consuming when the chosen table is huge.

    Regards,
    Jin-Chong

