Hadoop and predictive analytics are among the most exciting technologies for businesses today, but both are often seen as having a steep learning curve. While both are complex, getting started is simple: the Hortonworks Sandbox provides the database, and SAP InfiniteInsight makes predictive analytics intuitive for both data scientists and business users. In just 3 easy steps, you can set up your own single-node Hadoop cluster and tackle real predictive use cases!
First you’ll need to install 3 components:
1. VirtualBox for the virtualization environment: https://www.virtualbox.org/wiki/Downloads
2. Hortonworks Sandbox with HDP 2.2 image: http://hortonworks.com/products/hortonworks-sandbox/ [Go to the ‘Download & Install’ tab and select either Mac or Windows for VirtualBox]
3. SAP InfiniteInsight 7.0: http://bit.ly/1t77brW [Trial]
Once you’ve installed VirtualBox, open the Hortonworks Sandbox .ova file and VirtualBox will automatically import it as a new virtual machine. Hit ‘Start’ and you now have a fully functional Hadoop environment!
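If you prefer the command line to the VirtualBox interface, the same import and start can be done with VirtualBox’s VBoxManage tool. Here is a minimal Python sketch that only builds and prints the two commands. The file name Hortonworks_Sandbox.ova and the VM name hdp-sandbox are placeholders I made up, so substitute your own.

```python
import shlex


def sandbox_commands(ova_path, vm_name):
    """Build the VBoxManage commands that import the appliance and boot it."""
    return [
        # Import the .ova as a new VM, overriding the name it registers under.
        ["VBoxManage", "import", ova_path, "--vsys", "0", "--vmname", vm_name],
        # Boot with a window so the startup screen (and its IP address) is visible.
        ["VBoxManage", "startvm", vm_name, "--type", "gui"],
    ]


if __name__ == "__main__":
    # Placeholder file and VM names; replace them with your own.
    for command in sandbox_commands("Hortonworks_Sandbox.ova", "hdp-sandbox"):
        print(shlex.join(command))
```

Paste the printed commands into a terminal, or pass each list to subprocess.run if you want Python to execute them.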
Next, we connect SAP InfiniteInsight to Hadoop over ODBC. Download and install the driver here: http://hortonworks.com/hdp/addons/
After installation, open your ODBC Administrator. Under the System DSN tab, “Sample Hortonworks Hive DSN” is now available.
Configure it with the IP address from the startup screen of your Hadoop environment, with the remaining fields shown below.
Test the connection, and you have successfully added Hadoop as a data source for InfiniteInsight.
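You can also sanity-check the DSN outside of the ODBC Administrator. The sketch below assumes you have the third-party pyodbc package installed (pip install pyodbc) and that the DSN kept its default name; neither is part of the setup above. It simply asks Hive for its list of tables.

```python
def connection_string(dsn="Sample Hortonworks Hive DSN"):
    """Return an ODBC connection string that points at the configured DSN."""
    return f"DSN={dsn}"


def list_hive_tables(dsn="Sample Hortonworks Hive DSN"):
    """Connect through the DSN and return the table names Hive reports."""
    # Imported here so the helper above works even without pyodbc installed.
    import pyodbc

    # Hive does not support transactions, so autocommit is switched on.
    connection = pyodbc.connect(connection_string(dsn), autocommit=True)
    try:
        cursor = connection.cursor()
        cursor.execute("SHOW TABLES")
        return [row[0] for row in cursor.fetchall()]
    finally:
        connection.close()
```

If the DSN is configured correctly, calling list_hive_tables() should return the sandbox’s sample tables. If it raises an error, recheck the IP address in the DSN.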
TIP: <ip address>:8888 is your Hadoop homepage in the browser, where you can access Hive, HDFS, and more.
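If the homepage does not load, it helps to check whether the sandbox is reachable at all before blaming the ODBC setup. This is a small sketch using only the Python standard library. The IP address 192.168.56.101 is a made-up example, so use the one from your own startup screen.

```python
import socket


def sandbox_url(ip_address, port=8888):
    """Return the browser URL for the sandbox homepage."""
    return f"http://{ip_address}:{port}"


def port_is_open(ip_address, port=8888, timeout=3.0):
    """Return True if a TCP connection to the given port succeeds."""
    try:
        with socket.create_connection((ip_address, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    ip = "192.168.56.101"  # example only; use the IP from your startup screen
    print(sandbox_url(ip), "reachable:", port_is_open(ip))
```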
Now that everything is set up, you’re ready to do predictive analytics! Open InfiniteInsight and we’ll ‘Create a Clustering Model’ based on the sample tables in Hadoop. Select the ‘Data Type’ as ‘Database’ and select the “default”.sample_07 table, which lists various job titles with their total number of employees and salaries.
TIP: Check out this great tutorial for uploading your own datasets into Hadoop: http://hortonworks.com/hadoop-tutorial/loading-data-into-the-hortonworks-sandbox/
On the next screen, hit the ‘Analyze’ icon, then continue with ‘Next’ and ‘Generate’, leaving the default settings, and voila, we’ve done it!
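If you’re curious what a clustering model actually does with rows like these, here is a rough illustration. This is not InfiniteInsight’s algorithm, which I’m not describing here. It is plain k-means, a common textbook technique swapped in for illustration. The six (total employees, salary) rows are numbers I made up, not values from sample_07.

```python
def min_max_scale(rows):
    """Rescale every column to the 0..1 range so no column dominates."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, k):
    """Pick k starting centroids: the first point, then the farthest ones."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iterations=50):
    """Return one cluster label per point using basic k-means."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iterations):
        # Assign each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we are done
        labels = new_labels
        # Move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(d) / len(members) for d in zip(*members))
    return labels


# Made-up (total employees, salary) rows, for illustration only.
rows = [(1000, 30000), (1200, 32000), (900, 31000),
        (50000, 90000), (52000, 95000), (48000, 92000)]
labels = kmeans(min_max_scale(rows), k=2)

if __name__ == "__main__":
    for row, label in zip(rows, labels):
        print(row, "-> cluster", label)
```

The scaling step matters because employee counts and salaries sit on very different ranges. Without it, whichever column has the larger numbers would decide the clusters on its own.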
We’ve set up our Hadoop environment and performed a clustering analysis on the fly with SAP InfiniteInsight in 3 easy steps. Give it a spin and please leave any feedback below.