NOTE: It is assumed that the reader has experience with setup, configuration, and administration of The SAP® Adaptive Server Enterprise® (ASE), The SAP® Replication Server® (SRS), and HDP®.


Introduction

The SAP® Adaptive Server Enterprise® (ASE) is market leading database management system for Online Transaction Processing – it is a major part of the SAP® Data Management Portfolio for end-to-end, real-time data management (as shown in Figure 1 below).


ASE-SAP-Portfolio.jpg

Figure 1

The SAP® Replication Server® (SRS) enables business continuity with active insights, accelerating data recovery and access, with real-time replication, across the extended enterprise. SRS is an important part of the SAP® Enterprise Information Management® Portfolio as shown in Figure 2, while also being a standalone solution.

eim.JPG

Figure 2

HDP is based on the open source Apache Hadoop Data Platform, architected for the enterprise. According to Hortonworks®:

Hortonworks® Data Platform enables the deployment of Open Enterprise Hadoop – leveraging 100% open source components, driving enterprise readiness requirements and empowering the adoption of brand new innovations that comes out of the Apache Software Foundation and key Apache projects.

This comprehensive set of capabilities is aligned to the following functional areas: Data Management, Data Access, Data Governance and Integration, Security, and Operations.


NOTE: Replication to Hadoop Hive is supported on Linux X86_64 platform only.

Steps to Install HortonWorks HDP Sandbox

  1. Download the image HDP 2.2 on Sandbox for VirtualBox http://hortonassets.s3.amazonaws.com/2.2/Sandbox_HDP_2.2.4_virtualbox.ova
  2. Start the image via VirtualBox, please configure the base memory to at least 4GB and configure the network as Bridged Adapter which will make the VM (Virtual Machine) has own IP address for access outside.
  3. Access Hontonwork Start Page on: http://<VM IP address>:8888
  4. Explore HDP sandbox via webpage on: http://<VM IP address>:8000
  5. Log in the HDP VM via ssh root@< VM IP address > with password: Hadoop

Steps to Setup SAP Adaptive Server Enterprise with HortonWorks

  • HDP sandbox is a dedicated VM for a series services for Hadoop, so we will setup the Replication Server and ASE Server in our replinux machine, and the communication between them via IP network. (NOTE: replinux is an assumed host name for this example, replace this appropriately for your environment)
  • Our replication Hive connector is designed to generate Hive temporary text type file and load this type file to HDFS via Hive, so we need a NFS share folder. (NOTE: If RepServer and HDP are hosted on same machine, then NFS share folder is not required. Just make sure proper file access permission are established for both processes have appropriate directory/file access)

/wp-content/uploads/2015/12/srs_hdp_857002.png

Figure 3

Figure 3 shows a simple replication environment architectures with a source SAP Adaptive Server Enterprise (ASE) as the primary source and a SAP Replication Server (SRS) with minimum requirement is version 15.7.1 SP204, co-existing on the same node and HortonWorks HDP sandbox on a second Linux node, with a shared folder for staging Changed Data Capture files for a load into Hadoop Hive.

To setup the replication environment the user has to perform the following key configurations:

  1. Mount the share folder to HDP Virtual Machine, so that the SRS and HDP share a common folder. This folder is used by SRS to stage CDC files for upload into Hadoop Hive and HDP will use this folder to upload the staged files (invoked from the SRS Hive Connector).

       # mkdir /remote/repeng10                  /* On the SRS host */

       # mount -t nfs -o rw,nolock 10.173.20.12:/repeng10 /remote/repeng10 /* On the HDP host */

  1. Set the environment variable RS_HIVE_TEMP_FILE_PATH on SRS host to insure that they both see the staging files.
    1. Because Hive connector generates the hive temporary files (staged in the directory pointed to by the variable RS_HIVE_TEMP_FILE_PATH)  without read/write/exec authority for other group user, create a sybase group and repserver running user (gzhong) in HDP Virtual Machine and add all HDP users to the group (sybase), then reboot the system (example shown below):

# vi  /etc/group

gzhong:x:1020:

sybase:x:200:root,hue,gzhong,hive,flume,hdfs,storm,spark,mapred,hbase,zookeeper,kafka,falcon,sqoop,yarn,hcat,ams,knox

NOTE: Create sybase group and add id gzhong to HDP running machine – this is required because in this example SRS is running as id gzhong and its belongs to sybase group, to insure NFS access with permissions. More generally, this is to insure that HDP users like hue in this example have access to NFS mount files (CDC staged filed) in 640 operation permission, for upload into Hive.  

  1. Setup the interfaces file

#

# — These entries added by Sybase InstallShield Installer —

# — for the sample Replication Server —

# In this interfaces example, ZPDS represents the Primary Data Server, REP_hive represents the Hive Server and PIDS the Replication Server

#

ZPDS

        master tcp ether replinuxb25 6002

        query tcp ether replinuxb25 6002

REP_hive

        query tcp ether 10.172.219.254 10000

        master tcp ether 10.172.219.254 10000

PIDS

        query tcp ether replinuxb25 13515

        master tcp ether replinuxb25 13515

  1. Configure the ASE to Hive Replication Environment

#### On Source ASE, assuming that primary/source database pidb exists

% isql -Usa -PSybase123 -Sreplinuxb25.sybase.com:6002

> use pidb

> go

> create table dl_mat_tb (

col1 bigint,

col2 int null,

col3 smallint null,

col4 unsigned bigint null,

col5 unsigned int null,

col6 unsigned smallint null,

col7 tinyint null,

col8 numeric null,

col9 decimal null,

col10 numeric (10,4) null,

col11 decimal (10,4) null,

col12 float null,

col13 real null,

col14 double precision null,

col15 money null,

col16 smallmoney null,

col17 datetime null,

col18 smalldatetime null,

col19 bigdatetime null,

col20 date null,

col21 time null,

col22 char (20) null,

col23 varchar (20) null,

col24 unichar (40) null,

col25 univarchar (40) null,

col26 nchar (40) null,

col27 nvarchar (40) null,

col28 binary (2) null,

col29 varbinary (2) null,

col30 bit,

col31 sysname null,

col32 longsysname null,

col33 bigtime null)

> go

> sp_setreptable dl_mat_tb, true

> go

> grant all on dl_mat_tb to public

> go

> insert into pidb..dl_mat_tb values (0, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,  ‘2012-10-01 10:30:01.123’ ,  ‘2012-10-02 12:31:11.123’ ,  ‘2012-10-03 14:32:21.123’ ,  ‘2012-10-04′ ,  ’16:34:31.123’ , NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, 0, NULL, NULL, ‘2012-10-05 20:35:31.123456’ )

> go

> insert into pidb..dl_mat_tb values (1, 2, 3, 4, 5, 6, 7, 8, 9, 10.1234, 11.2345, 12.456, 13.567, 14.678, $15.5808, $16.5808, ‘2012-10-01 10:30:01.123’ ,  ‘2012-10-02 12:31:11.123’ ,  ‘2012-10-03 14:32:21.123’ ,  ‘2012-10-04′ ,  ’16:34:31.123’ , ‘col22 char’, ‘col23 varchar’, ‘col24 unichar’, ‘col25 univarchar’, ‘col26 nchar’, ‘col27 nvarchar’, 0x2828, 0x0029, 0, ‘col31 sysname’, ‘col32 longsysname’,  ‘2012-10-05 20:35:31.123456’ )

> go

> insert into pidb..dl_mat_tb values (2, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,  ‘2012-10-01 10:30:01.123’ ,  ‘2012-10-02 12:31:11.123’ ,  ‘2012-10-03 14:32:21.123’ ,  ‘2012-10-04′ ,  ’16:34:31.123’ , NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, 0, NULL, NULL, ‘2012-10-05 20:35:31.123456’ )

> go

> insert into pidb..dl_mat_tb values (3, 2, 3, 4, 5, 6, 7, 8, 9, 10.1234, 11.2345, 12.456, 13.567, 14.678, $15.5808, $16.5808, ‘2012-10-01 10:30:01.123’ ,  ‘2012-10-02 12:31:11.123’ ,  ‘2012-10-03 14:32:21.123’ ,  ‘2012-10-04′ ,  ’16:34:31.123’ , ”, ”, ”, ”, ”, ”, NULL, NULL, 1, ”, ”,  ‘2012-10-05 20:35:31.123456’ )

> go

> insert into pidb..dl_mat_tb values (4, 2, 3, 4, 5, 6, 7, 8, 9, 10.1234, 11.2345, 12.456, 13.567, 14.678, $15.5808, $16.5808, ‘2012-10-01 10:30:01.123’ ,  ‘2012-10-02 12:31:11.123’ ,  ‘2012-10-03 14:32:21.123’ ,  ‘2012-10-04′ ,  ’16:34:31.123’ , ‘col22 char’, ‘col23 varchar’, ‘col24 unichar’, ‘col25 univarchar’, ‘col26 nchar’, ‘col27 nvarchar’, 0x2828, 0x0029, 0, ‘col31 sysname’, ‘col32 longsysname’,  ‘2012-10-05 20:35:31.123456’)

> go

# Validate the inserts in the primary database

> select count(*) from pidb..dl_mat_tb

> go

  1. Key step – Create the replicate table in Hive Service on HDP.

#### On HDP/Hive, assuming that target database garydb exists with user hue and password sybase (if not please create this database garydb with a user hue with the passwd sybase). Note the current implementation to Hive replicate will treat it as a append only repository (supporting only ‘IUD’ operations). That is all source Inserts, Updates, and Deletes will be treated as Inserts on the target.

####  OP_TYPE represents the CDC operation type (Insert, Update, Delete), and OP_TS the source operation timestamp.

% beeline -u jdbc:hive2://10.172.219.254:10000/garydb -n hue -p sybase

> create table dl_mat_tb (col1 bigint, col2 int, col3 smallint, col4 varchar (20), col5 bigint, col6 int, col7 smallint, col8 decimal, col9 decimal, col10 decimal (10,4), col11 decimal (10,4), col12 double, col13 float, col14 double, col15 decimal (20,2), col16 decimal (20,2), col17 timestamp, col18 timestamp, col19 timestamp, col20 date, col21 timestamp, col22 char (20), col23 varchar (20), col24 varchar (40), col25 varchar (40), col26 varchar (40), col27 varchar (40), col28 binary, col29 binary, col30 tinyint, col31 varchar(30), col32 varchar(255), col33 timestamp, OP_TYPE int, OP_TS timestamp);

##### Create connection to primary DB on ZPDS.pidb using rs_init utility.

##### On SAP Replication Server, set up the connection profile to the replicate Hive database garydb, with user name access hue and passwd sybase

% isql -Usa -PSybase123 -Sreplinuxb25.sybase.com:13515

> create connection to REP_hive.garydb

> using profile rs_ase_to_hive;standard

> set username “hue”

> set password “sybase”

> go

> admin who_is_down

> go

> create replication definition dl_mat_tb_repdef  with primary at ZPDS.pidb with all tables named

    dl_mat_tb (

    col1  bigint,

    col2  integer,

    col3  smallint,

    col4  unsigned bigint,

    col5  unsigned int,

    col6  unsigned smallint,

    col7  tinyint,

    col8  numeric,

    col9  decimal,

    col10 decimal,

    col11 decimal,

    col12 float,

    col13 real,

    col14 float,

    col15 money,

    col16 smallmoney,

    col17 datetime,

    col18 smalldatetime,

    col19 bigdatetime,

    col20 date,

    col21 time,

    col22 char (20),

    col23 varchar (20),

    col24 unichar (40),

    col25 univarchar (40),

    col26 char (40),

    col27 varchar (40),

    col28 binary (2),

    col29 varbinary (2),

    col30 tinyint,

    col31 varchar (30),

    col32 varchar (255),

    col33 bigtime) primary key (col1)

> go

> create subscription sub_dl_mat_tb for dl_mat_tb_repdef with replicate at REP_hive.garydb without holdlock direct_load

> go

> admin who_is_down

> go

#### On the Hive server, run a select to verify the initial load (data synchronization)

% beeline -u jdbc:hive2://10.172.219.254:10000/garydb -n hue -p Sybase

> select * from dl_mat_tb;

Continue replication with delete and update operations

#### On the ASE server, simulate the CDC operations

% isql -Usa -PSybase123 -Sreplinuxb25.sybase.com:6002

> insert into pidb..dl_mat_tb values (5, 2, 3, 4, 5, 6, 7, 8, 9, 10.1234, 11.2345, 12.456, 13.567, 14.678, $15.5808, $16.5808, ‘2012-10-01 10:30:01.123’ ,  ‘2012-10-02 12:31:11.123’ ,  ‘2012-10-03 14:32:21.123’ ,  ‘2012-10-04′ ,  ’16:34:31.123’ , ‘col22 char’, ‘col23 varchar’, ‘col24 unichar’, ‘col25 univarchar’, ‘col26 nchar’, ‘col27 nvarchar’, 0x2828, 0x0029, 0, ‘col31 sysname’, ‘col32 longsysname’,  ‘2012-10-05 20:35:31.123456’ )

> go

> update  pidb..dl_mat_tb set col2=20 where col1=1

> go

> delete from   pidb..dl_mat_tb where col1=2

> go

##### RepServer

> isql -Usa -PSybase123 -Sreplinuxb25.sybase.com:13515

> admin who, sqm

> go

#### On the Hive Server, validate the changes

% beeline -u jdbc:hive2://10.172.219.254:10000/garydb -n hue -p Sybase

> select * from (select *, max(OP_TS) over (partition by col1) last_ts from dl_mat_tb) newtable where newtable.OP_TS = last_ts and newtable.OP_TYPE != 3;


Resources

The SAP Adaptive Server Enterprise:

http://www.sap.com/pc/tech/database/software/adaptive-server-enterprise/index.html

The SAP Replication Server:

http://www.sap.com/pc/tech/database/software/sybase-data-replication-software/index.html

Hortonworks:

http://hortonworks.com/

Acknowledgement

I would like to sincerely thank Ye, Guo Gang (guo.gang.ye@sap.com) and Zhong, Gary (gary.zhong@sap.com) for their contributions and review.

To report this post you need to login first.

2 Comments

You must be Logged on to comment or reply to a post.

Leave a Reply