As you may know, SAP Cloud for Analytics (C4A) supports remote data sources. This allows a hybrid approach for cases where you are not yet ready to move completely to the cloud, but want to take advantage of the fast and flexible deployment of a cloud solution.
The purpose of this blog is to explain what data is sent, where it is sent, and how it is secured.
The setup details are available in the online guides here:
In this remote data access through reverse-proxy scenario, your data is NOT sent to, or stored in, C4A. The communication is actually between the browser and your on-premise data source.
The job of the reverse proxy is to make the two systems, C4A and your on-premise data source, appear as one system to the browser.
This is necessary because of a browser security measure called the same-origin policy (cross-origin resource sharing, or CORS, is the mechanism for relaxing it). In short, your browser will by default block a page from interacting with a domain other than the one you’re visiting.
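To make the "one system" idea concrete, here is a minimal sketch of the comparison a browser makes when deciding whether two URLs share an origin (scheme, host and port). The function names and the example.com hostnames are assumptions for illustration only, not part of C4A or any browser API.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Ports implied when a URL does not state one explicitly.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> Tuple[str, str, Optional[int]]:
    """Return the (scheme, host, port) triple that identifies an origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return scheme, host, port


def same_origin(url_a: str, url_b: str) -> bool:
    """True when both URLs share scheme, host and port."""
    return origin(url_a) == origin(url_b)


# Behind the reverse proxy, both paths live on one origin.
print(same_origin("https://mydomain.com/c4a/render",
                  "https://mydomain.com/mysystem/fetchdata"))

# Without the proxy, the two systems sit on different hosts (hypothetical names).
print(same_origin("https://c4a.example.com/render",
                  "https://hana.example.com/fetchdata"))
```

With the reverse proxy in front, the first comparison holds, which is why the browser treats requests to both systems as same-origin.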
Below is what the communication looks like at a high level. “Remote data” in this case will actually be your local on-premise data like HANA or BW.
How it all works:
As previously mentioned, the actual data is not sent to, or stored in, C4A.
C4A provides the business logic and builds the queries required to see your data, then hands them to your browser. Your browser in turn sends those queries, through the reverse proxy, down to the on-premise database. The results of those queries are returned to the browser, where any charts etc. are rendered. If your query was a list of profits per customer, none of that information would actually return to C4A.
Throughout the whole process, the browser is actually interacting with the reverse proxy, which in turn sends out the requests to C4A or the remote data source depending on the path of each request.
http://mydomain.com/mysystem/fetchdata might go to your remote data source, while http://mydomain.com/c4a/render might go to C4A to get the business logic of how to display the data. To the client, it is all transparent and looks like they’re interacting with one site.
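The path-based routing described above can be sketched as a simple prefix lookup. This is only an illustration of the idea, not how any particular reverse proxy is configured; the upstream hostnames and the `ROUTES` table are hypothetical, and the `/mysystem/` and `/c4a/` prefixes are taken from the example URLs.

```python
from typing import Optional

# Hypothetical routing table: path prefix -> upstream system.
ROUTES = [
    ("/mysystem/", "https://hana.internal.example"),  # on-premise data source (assumed name)
    ("/c4a/", "https://c4a.example.com"),             # C4A tenant (assumed name)
]


def pick_upstream(path: str) -> Optional[str]:
    """Return the upstream for a request path, or None if no prefix matches."""
    for prefix, upstream in ROUTES:
        if path.startswith(prefix):
            return upstream
    return None


print(pick_upstream("/mysystem/fetchdata"))
print(pick_upstream("/c4a/render"))
```

The browser only ever sees mydomain.com; the choice of upstream is made entirely inside the proxy.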
So what IS stored in C4A?
The metadata. The queries for building the stories, measure names, column names, filter values. Basically the metadata required to replay your last query. But none of the actual data, not even the query results like totals.
End-to-end SSO can be accomplished with SAML. In order for this to work, both C4A and your on-premise data source must be configured to trust the same identity provider, such as your Active Directory using ADFS (Active Directory Federation Services).
Because each user authenticates to the data source as themselves, the data security implemented at the source will always be respected with every request.
All communication between your browser and C4A is automatically encrypted. The on-premise communication from your reverse proxy to your remote source should also be encrypted using TLS. All data persisted on C4A (yes, even just the metadata) is also fully encrypted.
In order for the business logic running on the C4A side to build the queries that your browser passes down to your data, you need to configure the proxy path in the C4A administration. This is all detailed in the user guide. However, the important thing to understand is that the C4A system will construct the queries using this reverse proxy path, so that your browser can execute them. Again, the actual query and the results are between the browser and the data source. C4A does NOT communicate with your on-premise data in this case.
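As a rough illustration of why the proxy path matters, the sketch below builds a query URL that points at the reverse proxy rather than at the data source directly. The function name, the `fetchdata` endpoint (borrowed from the earlier example URL) and the query parameters are assumptions; the real query format is defined by C4A and is not shown here.

```python
from urllib.parse import urlencode


def build_query_url(proxy_base: str, proxy_path: str, params: dict) -> str:
    """Join the proxy host, the configured proxy path and the query parameters."""
    base = proxy_base.rstrip("/")
    path = "/" + proxy_path.strip("/")
    # "fetchdata" is a placeholder endpoint name, not a real C4A or HANA API.
    return f"{base}{path}/fetchdata?{urlencode(params)}"


url = build_query_url("https://mydomain.com", "mysystem", {"measure": "profit"})
print(url)
```

Because the URL targets the proxy's own domain, the browser can execute it without leaving the origin it loaded the page from.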
Understanding security implications:
In this setup, the browser communicates with the reverse proxy which passes on requests to your data source.
Your data source, let’s take HANA here, will reject any unauthenticated requests.
Are you opening up your on-premise data to any internet requests?
That depends on your setup. Access is always gated by authentication.
However, you have the option of taking it further.
Unless the reverse proxy is exposed to the internet, for example by running in your DMZ, users must be on the corporate network to use this setup. So it is possible, if you so desire, to require users to be on the corporate network (or on VPN) in order to access the system. How much flexibility to allow in accessing the data is in the hands of the administrator.
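If the administrator chooses to restrict access to the corporate network, the check amounts to testing whether the client address falls inside known internal ranges. The sketch below shows that idea only; the network ranges and function name are assumptions, and in practice this restriction is usually enforced by network topology or firewall rules rather than by application code.

```python
import ipaddress

# Hypothetical corporate ranges; replace with your own network layout.
CORPORATE_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_corporate_client(client_ip: str) -> bool:
    """True when the client address belongs to one of the corporate ranges."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in network for network in CORPORATE_NETWORKS)


print(is_corporate_client("10.1.2.3"))     # inside 10.0.0.0/8
print(is_corporate_client("203.0.113.5"))  # outside both ranges
```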