Enterprises today are increasingly keen on harnessing the power of BIG Data for decision-making. While a few have made strong inroads into BIG Data and reaped benefits, realizing its true value has been elusive for most. For one, in order to benefit from BIG Data, it is essential to understand the business, and have a specific problem identified that needs solving.
Once the problem is identified, one needs to look at the sources of information – structured and unstructured – that could help solve it. A mechanism to capture, analyze and interpret data has to be designed at this stage. An understanding of the technology, tools and various innovations in BIG Data is key to this step. Finally, the data may have to be integrated with structured information already available in the enterprise to make specific decisions.
We will focus on the last step mentioned above by integrating SAP BusinessObjects with Hadoop.
Keep in mind that despite having robust features, platforms such as Hadoop are dependent on batch processing, which makes them unsuitable for real-time analytics and reporting. The powerful combination of SAP and Hadoop can bring together the agility, scalability, flexibility and affordability needed to fully tap the potential of BIG Data for effective decision-making.
We used the following approach to achieve SAP BusinessObjects integration with Hadoop.
Step 1. Install and configure multi-node Hadoop Cluster
Hadoop cluster installation was done in stages to keep configuration and maintenance easy. The initial installation was done on a single machine, with all services running on one node (the master node). It was then extended to two other nodes (slave nodes). Once the cluster became operational, files of all the different types were loaded into the system, and the Hadoop cluster replicated them across all available data nodes. The cluster was then tested with a conventional word count program, which verified communication between the Hadoop nodes and validated their ability to perform distributed operations.
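The word count test follows the map, shuffle and reduce pattern. The sketch below simulates that pattern locally in plain Python, with each input string standing in for a split processed on a separate node. It is an illustration of the idea only, not the program that was run on the cluster, and the function names are our own.

```python
from collections import defaultdict


def map_phase(chunk):
    """Mapper: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(grouped):
    """Reducer: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(chunks):
    # Each chunk stands in for an input split handled by a different node.
    pairs = [pair for chunk in chunks for pair in map_phase(chunk)]
    return reduce_phase(shuffle(pairs))


if __name__ == "__main__":
    sample = ["free software free", "software is freedom"]
    print(word_count(sample))
```

On a real cluster the mappers and reducers run on different data nodes, so a correct result confirms that the nodes can exchange intermediate data.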
Step 2. Install and configure Hive Data Warehouse
Unlike other data warehousing tools that extract, transform and load data into a target system, Hive extracts and transforms data at run time without having to load or store it in a target system. Hive converts user tasks (defined using HiveQL statements) into jobs for the Map-Reduce distributed framework. Map-Reduce uses the job tracker to split a single job into multiple subtasks and return the results to the master node.
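The idea of transforming data at run time, rather than loading it into a target system first, can be sketched in a few lines of Python. The raw text, column names and types below are hypothetical; the point is only that the schema is applied when the data is read, which is roughly what Hive does when a query runs against files in place.

```python
import csv
import io

# Raw delimited text, stored as-is with no load step (hypothetical sample data).
RAW_FILE = "1,alice,42.5\n2,bob,17.0\n"

# Hypothetical schema: column names and the type each value is cast to.
SCHEMA = [("id", int), ("name", str), ("amount", float)]


def read_with_schema(raw_text, schema):
    """Apply column names and types to raw delimited text at query time."""
    for row in csv.reader(io.StringIO(raw_text)):
        yield {name: cast(value) for (name, cast), value in zip(schema, row)}


def total_amount(raw_text):
    # Roughly analogous to a query such as: SELECT SUM(amount) FROM some_table
    return sum(record["amount"] for record in read_with_schema(raw_text, SCHEMA))


if __name__ == "__main__":
    print(total_amount(RAW_FILE))
```

Changing the schema here changes how the same raw text is interpreted, without rewriting the stored data.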
We then performed text analysis by running a word count on a text file containing a speech on Free Software delivered by Richard Stallman. A Hive schema was defined on the file, which was then queried with HiveQL statements to obtain the respective word counts.
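The exact statements used are not reproduced here, but a word count of this kind is commonly written in HiveQL with `split`, `explode` and `LATERAL VIEW`. The Python snippet below simply holds one possible pair of statements as strings; the table name `speech`, the column name `line` and the HDFS directory are assumptions made for illustration.

```python
# Hypothetical names: adjust the table name and HDFS directory to your setup.
TABLE = "speech"
HDFS_DIR = "/user/hive/speech"

# Define a schema over the text file in place: one STRING column per line.
CREATE_TABLE = f"""
CREATE EXTERNAL TABLE IF NOT EXISTS {TABLE} (line STRING)
LOCATION '{HDFS_DIR}'
"""

# Split each line into words, turn the words into rows, then count per word.
WORD_COUNT = f"""
SELECT word, COUNT(*) AS occurrences
FROM {TABLE}
LATERAL VIEW explode(split(line, ' ')) words AS word
GROUP BY word
ORDER BY occurrences DESC
"""

if __name__ == "__main__":
    print(CREATE_TABLE.strip())
    print(WORD_COUNT.strip())
```

The statements can be pasted into the Hive command line; Hive compiles the `SELECT` into a Map-Reduce job similar to the word count test in Step 1.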
Step 3. Integrating SAP BusinessObjects with Hadoop
In order to run reports on Hadoop, SAP BusinessObjects was set up on top of the Hadoop Hive JDBC connectors. The Information Design Tool (IDT) was used to configure the setup. Once you complete this, you will be able to use an array of BO front-end tools to report on Hadoop, including SAP Dashboards, SAP Web Intelligence, SAP Crystal Reports, SAP BO Explorer, Visual Intelligence and more.
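A Hive JDBC connection is defined by a driver class and a connection URL. The helper below assembles both for the two Hive server generations; it is a sketch, not the configuration used in this setup. The host name in the example is hypothetical, port 10000 and the `default` database are Hive's usual defaults, and which variant applies depends on the Hive server version in your cluster.

```python
def hive_jdbc_settings(host, port=10000, database="default", hiveserver2=True):
    """Return the JDBC driver class and URL for a Hive server.

    hiveserver2=True  -> HiveServer2 driver and the jdbc:hive2:// scheme
    hiveserver2=False -> original HiveServer driver and the jdbc:hive:// scheme
    """
    if hiveserver2:
        scheme, driver = "hive2", "org.apache.hive.jdbc.HiveDriver"
    else:
        scheme, driver = "hive", "org.apache.hadoop.hive.jdbc.HiveDriver"
    return {
        "driver_class": driver,
        "url": f"jdbc:{scheme}://{host}:{port}/{database}",
    }


if __name__ == "__main__":
    # "hadoop-master" is a placeholder for the node running the Hive server.
    print(hive_jdbc_settings("hadoop-master"))
```

Testing the URL with a plain JDBC client before entering it in the reporting layer helps separate Hive problems from BusinessObjects configuration problems.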