Cloudera/SAP Data Intelligence replaces Apache Ambari/SAP Data Hub

Three and a half years ago, I started to send my Raspberry Pi sensor data to SAP Vora via Apache Kafka managed by the SAP Data Hub, leveraging Apache Ambari:

This was the precursor to restoring my NEO Internet of Things Service scenarios with Apache Kafka and the SAP Data Hub, almost exactly two years ago.

About a year ago, I dipped into building my first SAP Data Intelligence ML Scenario with TensorFlow.

All these scenarios to connect, discover, enrich, and orchestrate disjointed data assets into actionable business insights at enterprise scale rely on an Enterprise Data Hub.

Therefore, in this blog, I will describe how to set up a CDP Private Cloud Base Trial Installation:

Three hosts are the minimum for a healthy cluster with varying roles:

Preparing the nodes is straightforward:

sudo locale-gen en_US.utf8
sudo update-locale LANG=en_US.utf8
sudo bash -c "echo 'vm.swappiness = 10' >> /etc/sysctl.conf"
sudo sysctl -p
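
To confirm on each host that the kernel setting took effect, a small check along the following lines could be used. This is only a sketch: the helper names parse_swappiness and matches_target are made up for illustration, and it assumes a Linux host that exposes the value under /proc/sys/vm/swappiness.

```python
from pathlib import Path

# Standard Linux location of the current vm.swappiness value
SWAPPINESS_PATH = Path("/proc/sys/vm/swappiness")


def parse_swappiness(text: str) -> int:
    """Convert the raw file content (e.g. '10\n') into an integer."""
    return int(text.strip())


def matches_target(value: int, target: int = 10) -> bool:
    """Return True if the value equals the one written to /etc/sysctl.conf above."""
    return value == target


if __name__ == "__main__":
    if SWAPPINESS_PATH.exists():
        current = parse_swappiness(SWAPPINESS_PATH.read_text())
        state = "as configured" if matches_target(current) else "not yet applied"
        print(f"vm.swappiness = {current} ({state})")
    else:
        print("No /proc/sys/vm/swappiness found; this does not look like a Linux host")
```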

The installation is equally straightforward:

sudo su
wget https://archive.cloudera.com/cm7/7.1.4/cloudera-manager-installer.bin
chmod u+x cloudera-manager-installer.bin
./cloudera-manager-installer.bin

To start with, the Cloudera Manager Server gets installed:

Next, I specify my hosts:

And select my JDK:

That gets installed accordingly:

Followed by the packages:

After successfully inspecting my cluster:

I select my services:

Which then get installed:

From there I can, for example, determine my Kafka TCP port:

To adjust my Python code and get my Kafka messages flowing again:

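#!/usr/bin/env python
import datetime
import logging
from json import dumps

import bme680
from kafka import KafkaProducer
from kafka.errors import KafkaError

log = logging.getLogger(__name__)

sensor = bme680.BME680()
# One reading, wrapped in a list, with the current timestamp as string
json_body = [
	{
		"millis": str(datetime.datetime.now()),
		"temperature": sensor.data.temperature,
		"pressure": sensor.data.pressure,
		"humidity": sensor.data.humidity
	}
]
# Kafka broker host and TCP port; adjust both to the values of your cluster
producer = KafkaProducer(bootstrap_servers=['ambari:9092'], value_serializer=lambda x: dumps(x).encode('utf-8'))
future = producer.send('bme680', json_body)
try:
	# Block for up to 10 seconds until the broker acknowledges the message
	record_metadata = future.get(timeout=10)
except KafkaError:
	log.exception("Sending the BME680 reading to Kafka failed")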
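
To check that the messages really arrive, a matching consumer could look roughly like the sketch below. It assumes the same kafka-python library and the bme680 topic used above; the function names decode_reading and consume are only illustrative, and the broker address has to be replaced with the host and port of your own cluster.

```python
import json


def decode_reading(raw: bytes) -> list:
    """Reverse the producer's value_serializer: UTF-8 bytes back to a Python list."""
    return json.loads(raw.decode("utf-8"))


def consume(bootstrap_servers, topic="bme680"):
    """Print every message on the topic, stopping after 10 seconds of silence."""
    # Imported here so that decode_reading can be tried without kafka-python installed
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        topic,
        bootstrap_servers=bootstrap_servers,
        auto_offset_reset="earliest",    # start from the oldest retained message
        value_deserializer=decode_reading,
        consumer_timeout_ms=10000,       # end the loop when no message arrives in time
    )
    for message in consumer:
        print(message.offset, message.value)
    consumer.close()
```

Calling consume(['ambari:9092']) on any machine that can reach the broker should then list the readings sent by the Raspberry Pi.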

I hope this gives you a first impression of how to move from Apache Ambari/SAP Data Hub to Cloudera/SAP Data Intelligence.