Connect your SAP Data Hub to SAP Vora and Hadoop
With the SAP Data Hub enabled on my SAP HANA, express edition, and connected to BW and HANA, I want to connect it to SAP Vora and Hadoop next:
Originally I had intended to use the SAP Vora Developer Edition, but that is currently based on SAP Vora 1.4, so I go for SAP Vora 2.0 on Kubernetes with Minikube in GCP instead. The installation on the Google Cloud Platform is straightforward and well supported by a series of SAP HANA Academy YouTube videos:
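For reference, the core of my setup looked something like this (the instance name matches my host below, but the machine type and image are just my illustrative choices; the none driver runs Kubernetes directly on the VM):
gcloud compute instances create vora2 --machine-type n1-standard-8 --image-family ubuntu-1604-lts --image-project ubuntu-os-cloud
sudo minikube start --vm-driver=none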
As a result, I retrieve the Vora ports needed for the SAP Vora Spark Extensions installation on my Hadoop cluster later:
root@vora2:~/SAPVora-DistributedRuntime$ ./install.sh -s
############ Ports for external connectivity ############
vora-tx-coordinator/tc port: 31213
vora-tx-coordinator/hana-wire port: 31950
vora-catalog/catalog port: 32635
vora-tools/tools port: 31304
#########################################################
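To double-check that these NodePorts are actually exposed, the Kubernetes services can also be listed directly (standard kubectl, no Vora specifics assumed):
kubectl get services --all-namespaces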
Next, I install the Hortonworks Data Platform 2.6 on SUSE Linux Enterprise Server for SAP Applications, because unfortunately the respective VMware image does not work with the latest SAP HANA Data Provisioning Agent. Again, the installation with Ambari is straightforward, with two slight deviations from the installation manual:
- Set Up Password-less SSH
ssh-keygen
ssh-copy-id -i ~/.ssh/id_rsa.pub HOSTNAME_OR_IP
ssh HOSTNAME_OR_IP
- Edit the /etc/ambari-server/conf/ambari.properties file and add the following line to the end of the file:
security.server.disabled.ciphers=TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA384|TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384|TLS_ECDH_ECDSA_WITH_AES_256_CBC_SHA384|TLS_ECDH_RSA_WITH_AES_256_CBC_SHA384|TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA|TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA|TLS_ECDH_ECDSA_WITH_AES_256_CBC_SHA|TLS_ECDH_RSA_WITH_AES_256_CBC_SHA|TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256|TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256|TLS_ECDH_ECDSA_WITH_AES_128_CBC_SHA256|TLS_ECDH_RSA_WITH_AES_128_CBC_SHA256|TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA|TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA|TLS_ECDH_ECDSA_WITH_AES_128_CBC_SHA|TLS_ECDH_RSA_WITH_AES_128_CBC_SHA|TLS_ECDHE_ECDSA_WITH_3DES_EDE_CBC_SHA|TLS_ECDHE_RSA_WITH_3DES_EDE_CBC_SHA|TLS_ECDH_ECDSA_WITH_3DES_EDE_CBC_SHA|TLS_ECDH_RSA_WITH_3DES_EDE_CBC_SHA|TLS_ECDH_anon_WITH_AES_256_CBC_SHA|TLS_ECDH_anon_WITH_AES_128_CBC_SHA|TLS_ECDH_anon_WITH_3DES_EDE_CBC_SHA|TLS_ECDHE_ECDSA_WITH_NULL_SHA|TLS_ECDHE_RSA_WITH_NULL_SHA|TLS_ECDH_ECDSA_WITH_NULL_SHA|TLS_ECDH_RSA_WITH_NULL_SHA|TLS_ECDH_anon_WITH_NULL_SHA
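After saving the file, the Ambari server has to be restarted to pick up the change:
ambari-server restart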
Subsequently the installation finishes smoothly with all the services that I need:
So, I continue with Installing SAP Vora on the Hadoop Cluster:
linux-p2i7:/home/frank/SAPVora-SparkIntegration # ./install.sh
SAP Vora 2.0 Spark Integration Installer: INFO SAP Vora 2.0 Spark Integration Installer
Cluster Manager (MapR, Cloudera, Ambari, Bare, FusionInsight) [ambari]:
Install support for Spark 1.6.x? [Y/n]
Install support for Spark 2.x? [Y/n]
SAP Vora 2.0 Spark Integration folder, must start with /opt/ [/opt/vora-spark]:
HDFS upload folder for SAP Vora 2.0 Spark Integration [/user/vora/lib]:
OS user for HDFS access [hdfs]:
Path to folder with datanucleus jars [/usr/hdp/current/spark-client/lib]:
SAP Vora 2.0 Spark Integration Installer: INFO Installing SAP Vora 2.0 Spark Integration
Attention: File Access
For the install process to run correctly, the files /tmp/vora-spark/lib/spark-sap-datasources-spark1.6.jar
and /tmp/vora-spark/lib/spark-sap-datasources-spark2.jar must be accessible to the hdfs user.
Please make sure that it can access the configuration.
Do you think the user has access? [y/N] y
Do you want specify the connection parameters for the SAP Vora Kubernetes cluster? [Y/n]
Transaction coordinator host: 35.189.104.251
Transaction coordinator port: 31213
Catalog host: 35.189.104.251
Catalog port: 32635
Catalog timeout in seconds [6]:
Do you want to configure authentication to the Vora cluster? [Y/n]
Vora authentication username: vora
Vora authentication password: Pr0file!
Path to folder where v2auth.conf is stored: /opt/vora-spark
Owner of the v2auth.conf file [vora]: root
Group of the v2auth.conf file [root]: root
Ambari User ID: admin
Ambari password:
Ambari cluster name: Sandbox
Ambari cluster address [http://localhost:8080]:
SAP Vora 2.0 Spark Integration Installer: INFO Running: /usr/bin/hdp-select versions
Hortonworks version [2.6.3.0-235]:
Path to host file [/home/frank/SAPVora-SparkIntegration/lib/../config/hosts.txt]:
SAP Vora 2.0 Spark Integration Installer: INFO Reading host file at /home/frank/SAPVora-SparkIntegration/lib/../config/hosts.txt
SAP Vora 2.0 Spark Integration Installer: INFO Parsed 1 hostnames from hostfile
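As a quick smoke test, a Spark shell can then be launched with the Vora data sources jar on the classpath. I am assuming here that the installer places the jar under the /opt/vora-spark folder chosen above; the jar name matches the files listed during the installation:
spark-shell --jars /opt/vora-spark/lib/spark-sap-datasources-spark2.jar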
Followed by Install SAP HANA Data Provisioning Agent:
./hdbinst --batch --path /usr/sap/dataprovagent --user_id=<dpagent>
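Note that the --user_id parameter expects an existing OS user that will own the agent installation. If it does not exist yet, it can be created beforehand with standard tooling (the user name below is just the placeholder from the command above):
sudo useradd -m dpagent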
And then Install SAP Data Hub Adapter:
./hdbinst --batch --path /usr/sap/dataprovagent/bdh --hadoopConfDir /etc/hadoop/conf --voraHome=/opt/vora-spark --sparkConfDir /usr/hdp/current/spark-client/conf
However, this adapter needs upgrading by replacing the respective jar file with the latest patch:
linux-p2i7:/home/frank # cp adapter-core/package/lib/adapters/com.sap.bdh.adapter.adapter-core-1.1.20.jar /usr/sap/dataprovagent/adapters/com.sap.bdh.adapter.adapter-core.jar
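A simple sanity check that the patch is actually in place is to compare the checksums of the source and target jars (standard tooling, paths taken from the copy command above):
md5sum adapter-core/package/lib/adapters/com.sap.bdh.adapter.adapter-core-1.1.20.jar /usr/sap/dataprovagent/adapters/com.sap.bdh.adapter.adapter-core.jar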
As a result, I can register my New System:
And two Connections, namely HDFS:
And VORA Catalog:
Finally, I discover the content of these connections:
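Independently of the Data Hub UI, the HDFS side of this can be verified by listing the upload folder from the Spark integration installation (path as chosen during the installation above):
sudo -u hdfs hdfs dfs -ls /user/vora/lib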
In my next blog, I will leverage this configuration to Define a Data Pipeline.
Nice blog!
It will surely help me finish the setup we are working on.
Hi Frank,
I am currently done with the minimal landscape setup, but I can't connect to any system. I got the following error: "Unable to create an agent with a zone. BDH Adapter xx.xxx.xxx.xxx:xxxx is not reachable." Do you have an idea what is going on there? Where should I get my agent host from? Maybe I used the wrong address?
Nice Blog by the way!
Hi Frank,
Great blog for Vora and Hadoop connectivity with Data Hub. I have a query: is it possible to consume the Hadoop services provided by Google Cloud Dataproc in place of Hortonworks Data Platform 2.6 in SAP Data Hub?
Thanks in advance!!!
Regards,
Aamrin Firdoush
Hello
I have not tried Cloud Dataproc yet, but I do not see why it should not work.
Best regards
Frank
Aamrin and Frank,
I recently had an SDH 1.4 trial deployed on GCP and connected it to a small 3-node Dataproc cluster that I created on the same network; it worked fine with SDH.
I had to use the "manual" HDFS connection configuration (not the default). To use the SDH-provided example HDFS graphs/pipelines, I had to swap out the HDFS operators for webHDFS operators: when trying the "regular" HDFS operator in the HDFS example pipeline, it couldn't mkdir a new directory in HDFS, so I just switched to webHDFS operators and voila, it worked.
When the SDH 2.3 trial is available soon, I plan to spin up a small Dataproc cluster for HDFS and add other connections we're interested in, like BW/4HANA, S3 and GCS, to prove out the SDH pipeline and governance/catalog/profiling capabilities. I noted in SDH 2.3 last week at TechEd that there's a new native GCP DataProc connection type that may make it even easier for SDH to connect to Dataproc.
Hope that helps!
Doug