BIG Data: Integrating SAP BusinessObjects with Hadoop
Enterprises today are increasingly keen on harnessing the power of BIG Data for decision-making. While a few have made strong inroads into BIG Data and reaped its benefits, realizing its true value has remained elusive for most. To benefit from BIG Data, it is essential to understand the business and to identify a specific problem that needs solving.
Once that problem is identified, one needs to look at the sources of information – structured and unstructured – that can help. A mechanism to capture, analyze and interpret the data has to be designed at this stage, and an understanding of the technology, tools and innovations in the BIG Data space is key to this step. Finally, the data may have to be integrated with the structured information already available in the enterprise to support specific decisions.
We will focus on this last step: integrating SAP BusinessObjects with Hadoop.
Keep in mind that, despite their robust features, platforms such as Hadoop depend on batch processing, which makes them unsuitable for real-time analytics and reporting. Still, the powerful combination of SAP and Hadoop can bring together the agility, scalability, flexibility and affordability needed to fully tap the potential of BIG Data for effective decision making.
We used the following approach to integrate SAP BusinessObjects with Hadoop. Click here to download the step-by-step instructions.
Step 1. Install and configure a multi-node Hadoop cluster
The Hadoop cluster was installed in stages to keep configuration and maintenance simple. The initial installation was done on a single machine, with all services running on a single node (the master node). It was then extended to two other nodes (slave nodes). Once the cluster became operational, files of all the different types were loaded into the system, and the Hadoop cluster replicated them across all available data nodes. The cluster was then tested with a conventional word count program, which verified communication between the various Hadoop nodes and validated their ability to perform distributed operations.
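For reference, a minimal word count job of the kind used for this test is sketched below. It is essentially the standard Hadoop MapReduce example rather than our exact program, and the HDFS input and output directories are passed in on the command line.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emits (word, 1) for every token in every input line.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer (also used as combiner): sums the counts for each word.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output directory (must not exist yet)
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Packaged into a jar and submitted with hadoop jar from the master node, a successful run confirms that map and reduce tasks are being scheduled across the data nodes and that HDFS replication is in place.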
Step 2. Install and configure the Hive data warehouse
In contrast to other data warehousing tools that extract, transform and store/load data into target systems, Hive extracts and transforms data at run time without having to load or store it in a target system. Hive converts and processes user tasks (defined using HiveQL statements) using the MapReduce distributed framework. MapReduce uses the job tracker to split a single job into multiple subtasks and return the results to the master node.
We then performed text analysis by running a word count on a text file containing a speech delivered by Richard Stallman on free software. A Hive schema was defined on the file, which was then queried with HiveQL statements to obtain the respective word counts.
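To illustrate, the sketch below shows that kind of word count expressed in HiveQL and driven through the Hive JDBC interface. The table name, HDFS location, HiveServer2 host and credentials are placeholders rather than the exact ones we used; the query relies on Hive's standard split and explode functions.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveWordCount {
  public static void main(String[] args) throws Exception {
    // Standard Apache Hive JDBC driver; host, database and HDFS path below are placeholders.
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    try (Connection con = DriverManager.getConnection(
            "jdbc:hive2://master-node:10000/default", "hive", "");
         Statement stmt = con.createStatement()) {

      // Define a schema over the raw text file already sitting in HDFS;
      // the external table maps each line of the file to a single STRING column.
      stmt.execute("CREATE EXTERNAL TABLE IF NOT EXISTS speech (line STRING) "
          + "LOCATION '/data/speech'");

      // Word count in HiveQL: split each line into words, then group and count.
      // Hive compiles this query into MapReduce jobs behind the scenes.
      ResultSet rs = stmt.executeQuery(
          "SELECT word, COUNT(*) AS cnt "
        + "FROM (SELECT explode(split(line, '\\\\s+')) AS word FROM speech) words "
        + "GROUP BY word ORDER BY cnt DESC LIMIT 20");

      while (rs.next()) {
        System.out.println(rs.getString("word") + "\t" + rs.getLong("cnt"));
      }
    }
  }
}

When it runs, Hive turns the query into MapReduce jobs, so the word counts are computed in a distributed fashion without the file ever being loaded into a separate target system.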
Step 3. Integrate SAP BusinessObjects with Hadoop
To run reports on Hadoop, SAP BusinessObjects was set up on top of the Hadoop Hive JDBC connector, and the Information Design Tool (IDT) was used to configure the connection. Once this is complete, an array of BO front-end tools can report on Hadoop, including SAP Dashboards, SAP Web Intelligence, SAP Crystal Reports, SAP BO Explorer, Visual Intelligence and more.
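Before defining the relational connection in IDT, it is worth confirming that HiveServer2 is reachable with the same driver class and URL the connection will use. The sketch below assumes the standard Apache Hive JDBC driver and a HiveServer2 instance on the default port 10000 of a host called master-node (both placeholders); it simply connects and lists the tables that a universe built on this connection would expose to the BO front-end tools.

import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class HiveJdbcSmokeTest {
  public static void main(String[] args) throws Exception {
    // Same driver class and URL pattern that the IDT relational connection would use;
    // "master-node" and the credentials are placeholders for your environment.
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    try (Connection con = DriverManager.getConnection(
        "jdbc:hive2://master-node:10000/default", "hive", "")) {
      DatabaseMetaData meta = con.getMetaData();
      System.out.println("Connected to " + meta.getDatabaseProductName()
          + " " + meta.getDatabaseProductVersion());

      // List the Hive tables a universe built on this connection would see.
      try (ResultSet tables = meta.getTables(null, "default", "%", null)) {
        while (tables.next()) {
          System.out.println(tables.getString("TABLE_NAME"));
        }
      }
    }
  }
}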
Hi Poovin,
May I know the hardware requirements for Apache Hadoop and Apache Hive?
Both are open source, right? Should they be installed on separate servers? Appreciate your reply. Thanks.
Hi Harshil,
There are no written standards for Hadoop/Hive hardware requirements. Initially, Hadoop/Hive was lightweight and used to run even on commodity hardware (e.g. a 2.0 GHz processor, 2 GB of RAM and a 20 GB HDD). However, with the increased functionality, they demand more now. You can still run Hadoop/Hive on your desktop if it has a reasonably recent configuration.
Yes, both Hadoop and Hive are open source. You can also get them from vendors like Cloudera and Hortonworks, who offer both a community edition (free) and an enterprise edition; these installations will demand more resources due to the added functionality.
Hadoop and Hive can sit together on the same server. You can also scale out the installation across multiple nodes based on your needs.
Thanks,
Poovin.
Thanks, Poovin.
Hi Poovin,
I am on a project with a similar situation, but the link provided above is not giving me the document; the page may be broken. Can you please send the document to my ID?
Also, how do we connect a universe to Impala/Hive tables? Do we use ODBC or JDBC?
Many Thanks,
sumanT
Hi Suman,
I have shared the document with you through e-mail. I will fix the SCN link as well.
Regarding Impala: we can use both ODBC and JDBC to connect to Hive/Impala. However, I would prefer to go with the JDBC drivers provided by the respective Hadoop vendors. BO 4.1 SP05 has improved support for both Hive and Impala.
I will share the information very soon with step-by-step guides.
Hi Poovin,
I am pursuing my master's and want to start my career in SAP BI or Hadoop, but I don't know which one to opt for. Recently, I saw a job with the title SAP BI Architect with Hadoop Hive. Can you please tell me about both technologies so that I can decide which one to take up?
Thanks,
Shravani
Hi Shravani,
SAP BI is a large domain with a wide set of tools and features. SAP provides solutions and platforms for business needs, including business warehousing, databases, ERP, reporting tools and many more.
Hadoop is an open-source product promoted by Apache. It is also a huge ecosystem with many allied tools such as Hive, Impala and HBase, most of which cater to data warehousing and analytics.
A job role covering both SAP BI and Hadoop would be great and challenging.
Thanks,
Poovin
Thanks, Poovin...
Hi Poovin,
How about the performance of BO reports? Is there any document on performance?
Hi Venu,
Do you mean the performance of reports on top of Big Data, or in general?
The latest information on how to connect the BI environment to Hadoop Hive can be found here on the SCN Semantic Team page.
http://wiki.scn.sap.com/wiki/x/FA-OFw
Jacqueline
Hi Poovin,
Great article, keep it up!
Hello Poovin,
We are currently on SAP BusinessObjects 4.1 SP8 and are trying to establish a JDBC connection to Cloudera Impala and Hive through the Simba drivers, but we have not been successful because Cloudera is enabled with Kerberos authentication, so we are getting the error below.
[Simba][ImpalaJDBCDriver](500164) Error initialized or created transport for authentication: CONN_KERBEROS_AUTHENTICATION_ERROR_GET_TICKETCACHE.
Do you have any idea about this error, or any documents related to it?
Cloudera version 5.7
Impala version 2.5
Hive version 1.1
BOBJ on Red Hat Linux 6.x
Thanks,
Gana
Hi Gana,
I'm seeing the same error. Did you ever get past this error?
Hi Brian,
We resolved this issue by enabling LDAP authentication on Impala. Please let me know if you need more details.
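Roughly, the connectivity check looks like the sketch below. Note that the driver class, host, port and credentials shown are placeholders rather than our actual values, and that AuthMech=3 is the Cloudera/Simba driver property that selects username/password (LDAP) authentication instead of Kerberos, so please verify the exact settings against the documentation shipped with your driver version.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ImpalaLdapTest {
  public static void main(String[] args) throws Exception {
    // Placeholder driver class and URL for the Cloudera (Simba) Impala JDBC 4.1 driver.
    // AuthMech=3 selects username/password (LDAP) authentication;
    // host, port, UID and PWD below are examples only.
    Class.forName("com.cloudera.impala.jdbc41.Driver");
    String url = "jdbc:impala://impala-host:21050;AuthMech=3;UID=bo_service;PWD=secret";
    try (Connection con = DriverManager.getConnection(url);
         Statement stmt = con.createStatement();
         ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
      while (rs.next()) {
        System.out.println(rs.getString(1));
      }
    }
  }
}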
Thanks,
Gana
Impala queries are much faster than Hive.
Do you see any performance issues in WebI reports when they try to get data from Hadoop using Hive?
Regards,
vinesh
Hi Poovin,
Thanks for the document. It's a bit old now, but still useful.
Thanks,
Prateek