Introduction and background


Amazon recently introduced an interesting big data product called Athena. In this post I will walk through how to connect a BusinessObjects universe to it.

What is Amazon Athena?


Athena is a query service that makes it possible to access data stored on Amazon S3 using standard SQL. (https://aws.amazon.com/athena/details/)

Probably the most distinctive feature of Athena is that it is serverless. In practice this means:

  • no need to configure and start virtual machines,

  • no cost for allocated CPU and memory


Amazon manages all of this and charges only for the queries you actually run:

  • $5 per TB of data scanned


To put this price in perspective and to simulate a real-world example, I generated call detail records (well, sort of) for an imaginary telecom company with about 5 million active users, each initiating about 5-10 calls per day. I wanted to analyze the last 90 days without pre-aggregation, so I ended up with more than 3 billion records, or about 12-15 GB compressed.

With only $5 you can probably run all the meaningful analytical queries on the full dataset (60-80 queries). But if you partition and query your data wisely, you can get much more for your money: if the WHERE condition narrows the “data to scan” down to a single partition, you only pay for the size of that partition.
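
For example, once the partitioned Parquet table described later in this post is in place, a query shaped like the sketch below only scans the partition it filters on (the date literal is purely illustrative of how the START_DATE partition is keyed):
SELECT CALL_TYPE, COUNT(*) AS CALLS, SUM(CALL_DURATION_SECS) AS TOTAL_SECS
FROM CDR_PARQUET
WHERE START_DATE = 20170101
GROUP BY CALL_TYPE;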

In case you would rather stick with your on-premise hardware and software: Athena is based on Presto (https://prestodb.io/), which you can also run yourself.

Step-by-step


This step-by-step guide is going to be a bit long for a blog post, but detailed in return, so bear with me. The steps are:

  • Loading and defining the database

  • Setting up the JDBC driver

  • Defining the Athena JDBC in BusinessObjects

  • Creating a relational connection in IDT

  • One more thing…


Loading and defining the database


First you need to create an AWS account (https://console.aws.amazon.com/console/home) and an S3 bucket (https://aws.amazon.com/s3). Note that Athena is not yet available in all regions, only in US East (Northern Virginia), US East (Ohio), US West (Oregon) and Europe (Ireland). It is best to create the S3 bucket in one of these regions as well.

Once you have a bucket, you can upload your uncompressed CSV data with simple drag and drop, or with SAP BO Data Services.

Once your data is on S3, you can define your database schema in Athena and immediately start querying the dataset (https://aws.amazon.com/athena).

Athena uses Hive DDL. Here's an example table definition:
CREATE EXTERNAL TABLE CDR (
  CALL_TYPE String,
  CALL_RESULT String
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://[bucket name]/cdr';
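
Once the table is defined, you can try it immediately; for instance, a quick sanity check on the CDR table above (the query itself is just an illustration):
SELECT CALL_TYPE, COUNT(*) AS CALLS
FROM CDR
GROUP BY CALL_TYPE;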

To achieve better performance and a lower price, I recommend converting the plain CSV to a compressed, column-based format, for example Parquet.

For this part you need to step outside of Athena and launch an EMR cluster with Hive. I will not go into details, but briefly, the steps are:

  • in Hive, create an external table referencing the plain CSV data,

  • create a new, empty table stored as Parquet,

  • load the data into the Parquet table with an INSERT statement.


(See: http://docs.aws.amazon.com/athena/latest/ug/athena-ug.pdf)

The DDL of the Parquet table:
CREATE EXTERNAL TABLE CDR_PARQUET (
  CALLING_PARTY int,
  RECEIVING_PARTY int,
  CALL_TYPE String,
  CALL_RESULT String,
  CALL_DURATION_SECS int
)
PARTITIONED BY (START_DATE int)
STORED AS PARQUET
LOCATION 's3://[bucket name]/parquet/cdr';
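
To make the conversion step concrete, here is a minimal HiveQL sketch. It assumes the raw CSV table exposes all of the source columns (the CDR definition above is abbreviated to two columns) and uses Hive's dynamic partitioning:
-- let Hive derive the START_DATE partitions from the data itself
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
-- load the raw data into the Parquet table; the partition column comes last
INSERT OVERWRITE TABLE CDR_PARQUET PARTITION (START_DATE)
SELECT CALLING_PARTY, RECEIVING_PARTY, CALL_TYPE, CALL_RESULT, CALL_DURATION_SECS, START_DATE
FROM CDR;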

Setting up the JDBC driver


Download the JDBC driver from Amazon.

Before going further, as a baseline you may try to access your Athena database with SQL Workbench; the Athena documentation describes this in detail.

In order to use the JDBC driver with Information Design Tool, you also need to add the Amazon SSL certificate to the Java runtime's keystore with keytool. (SQL Workbench doesn't require this, but IDT throws an error if it can't find the certificate in the keystore.)

Get the Amazon SSL certificate and save it in Base64-encoded X.509 format (the script below assumes the file name amazon_base64_x509.cer).

Next you need to find the Java Runtime Environment (JRE) paths. All of them: SAP and non-SAP, 32 and 64 bit. For example: C:\Program Files (x86)\Java\jre1.8.0_121\bin

Once you have located them, create a \certs directory under each \bin and copy the Amazon certificate there.

To import the certificates, use keytool from the command prompt. An example script:
set drive=C:
set rootdir=C:\Program Files (x86)\SAP BusinessObjects\SAP BusinessObjects Enterprise XI 4.0
set JREdir=C:\Program Files (x86)\Java\jre1.8.0_121
set SAPJVM32dir=%rootdir%\win32_x86\sapjvm\jre
set SAPJVM64dir=%rootdir%\win64_x64\sapjvm\jre
set SAPJVM64_7dir=%rootdir%\win64_x64\sapjvm_7\jre
set CERTdir=%JREdir%\bin\certs
set CERT=amazon_base64_x509.cer
set ALIAS=Amazon
%drive%
REM Import the certificate into each JRE's default keystore (default password: changeit)
CD %JREdir%\bin
keytool -importcert -keystore "%JREdir%\lib\security\cacerts" -storepass changeit -alias %ALIAS% -file "%CERTdir%\%CERT%" -noprompt
CD %SAPJVM32dir%\bin
keytool -importcert -keystore "%SAPJVM32dir%\lib\security\cacerts" -storepass changeit -alias %ALIAS% -file "%CERTdir%\%CERT%" -noprompt
CD %SAPJVM64dir%\bin
keytool -importcert -keystore "%SAPJVM64dir%\lib\security\cacerts" -storepass changeit -alias %ALIAS% -file "%CERTdir%\%CERT%" -noprompt
CD %SAPJVM64_7dir%\bin
keytool -importcert -keystore "%SAPJVM64_7dir%\lib\security\cacerts" -storepass changeit -alias %ALIAS% -file "%CERTdir%\%CERT%" -noprompt
pause

Note that you should see "Certificate was added to keystore" four times! Unfortunately, keytool won't throw a loud error message if an import fails; you can double-check each keystore with keytool -list -keystore "<JRE path>\lib\security\cacerts" -storepass changeit -alias Amazon.

Defining the Athena JDBC in BusinessObjects


Copy the Athena JDBC driver to: C:\Program Files (x86)\SAP BusinessObjects\SAP BusinessObjects Enterprise XI 4.0\dataAccess\connectionServer\drivers\java

Locate the amazon.sbo file at: C:\Program Files (x86)\SAP BusinessObjects\SAP BusinessObjects Enterprise XI 4.0\dataAccess\connectionServer\jdbc

Add the following definition to the Databases node, similar to the existing Amazon Redshift entry.
<DataBase Active="Yes" Name="Amazon Athena JDBC">
  <JDBCDriver>
    <ClassPath>
      <Path>$ROOT$/drivers/java/AthenaJDBC41-1.1.0.jar</Path>
    </ClassPath>
    <Parameter Name="JDBC Class">com.amazonaws.athena.jdbc.AthenaDriver</Parameter>
    <Parameter Name="URL Format">jdbc:awsathena://$DATASOURCE$</Parameter>
  </JDBCDriver>
  <Parameter Name="Array Fetch Size">10</Parameter>
  <Parameter Name="Family">Amazon</Parameter>
  <Parameter Name="Driver Capabilities">Query</Parameter>
  <Parameter Name="s3_staging_dir">s3://[your bucket name]/</Parameter>
</DataBase>

This needs to be done both on the server and on the client machine running IDT.

Creating a relational connection in IDT


Create a new relational connection using the Amazon Athena JDBC driver defined above. The driver expects your AWS access key ID as the user name and your secret access key as the password; the remaining parameters are:

Server (host:port): athena.eu-west-1.amazonaws.com:443

JDBC driver properties: s3_staging_dir=s3://[bucket]/athena_query_results

If all went well, we have a successful connection!

One more thing...


Not the good kind, though; there's an issue: inserting a table into the data foundation with IDT fails.

I could not solve this directly, but found the following workaround:

  • create the same database schema in another database (tested on SQL Server),

  • create a connection and the data foundation,

  • change the connection to the previously created Athena one,

  • right-click the imported tables and select Change Qualifier/Owner,

  • set the Qualifier to your Athena database name and leave the Owner blank.


Conclusion


This setup connects SAP BusinessObjects' leading enterprise BI capabilities with one of the best and most affordable big data services.