Cloud computing introduces a number of compliance risks for organisations because of the lack of transparency it entails. Regulations such as the EU Data Protection Directive can restrict where sensitive data, in particular personally identifiable information, may be processed and stored. A major problem for cloud service consumers (CSCs), acting as data controllers, is how to demonstrate compliance with data transfer constraints. Currently, however, cloud service providers (CSPs) often do not provide transparency on how they handle sensitive data and where it is stored, and even when they do, they rarely give supporting evidence. As a result, the CSCs’ sensitive data may get spread across several parties and countries along the service supply chain without the CSCs’ knowledge and control. This is one of the reasons why some organisations are reluctant to move their IT processes to the cloud. The figure below shows an example of proliferation of sensitive data in the cloud resulting in an unauthorized transfer to an untrusted server.
Of course, there are preventive means to protect sensitive data in the cloud, such as encryption and access control. However, security specialists increasingly agree that preventive approaches alone are unlikely to ever provide the required level of protection, and that detective mechanisms supporting accountability are needed instead. This is the main idea behind the EU FP7 A4CLOUD (Cloud Accountability) project, in which my colleagues from the Product Security research team at SAP Labs France and I participate. We argue that CSPs that provide transparency over the handling of sensitive data, supported by reliable evidence, will have a competitive advantage, especially given the tightening of data protection regulations in the EU.
This is equally relevant for SAP, with its growing portfolio of cloud offerings at the software, platform and infrastructure levels, such as Business ByDesign and HANA Enterprise Cloud. These applications may store highly sensitive data, such as business-confidential data, personal data, or payment card data. Providing cloud consumers not only with assurance that their privacy preferences are enforced correctly but also with a way to verify this can help win additional customers concerned about the privacy and security of their data.
One outcome of our work is the paper “Monitoring data transfers in the cloud”, which I presented at the IEEE CloudCom 2013 conference. In this work we address the lack of mechanisms to support accountable data localization and transfer control across cloud software, platform and infrastructure services. We introduce a framework for automating the collection of evidence that obligations regarding transfers of sensitive data are being met in the cloud service supply chain. We evaluated our approach on the OpenStack open-source IaaS implementation, showing how interested parties can verify whether data transfers were compliant.
The proposed architecture consists of trusted monitors, Data Tracking Monitors (DTMs), introduced at each party in the cloud service supply chain, together with an accountability service.
A DTM is managed by a trusted third party, possibly accredited by a Data Protection Authority (DPA) in the case of personal data. Its purpose is to monitor data transfer events invoked through the service API and to verify compliance with the data transfer policies (authorizations) granted by the data controller and/or the DPA. DTMs record all parties that processed the data for a cloud consumer at all times and make this record available for further analysis by the data controller or the DPA.
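To make the compliance check concrete, here is a minimal sketch of how a DTM could verify a transfer event against the authorizations granted by the data controller or DPA. The class and field names are illustrative assumptions, not the actual implementation from the paper.

```python
# Hypothetical sketch of a DTM policy check; names and data structures
# are assumptions for illustration, not the paper's implementation.
from dataclasses import dataclass


@dataclass(frozen=True)
class TransferPolicy:
    data_set: str                  # identifier of the tenant's data set
    allowed_parties: frozenset     # CSPs authorized by the controller/DPA
    allowed_countries: frozenset   # countries where processing is allowed


def is_compliant(policy, target_party, target_country):
    """Return True if transferring the data set to the given party in the
    given country is covered by the controller's authorization."""
    return (target_party in policy.allowed_parties
            and target_country in policy.allowed_countries)


policy = TransferPolicy("tenant42/crm-records",
                        frozenset({"SaaS-A", "PaaS-B"}),
                        frozenset({"DE", "FR"}))

is_compliant(policy, "PaaS-B", "DE")   # an authorized transfer
is_compliant(policy, "IaaS-X", "US")   # would be flagged as a violation
```

A real DTM would evaluate such checks for every monitored transfer event and record the outcome as evidence.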
The data can move horizontally (e.g. from one SaaS provider to another) or vertically (e.g. from a SaaS provider to a PaaS or IaaS provider) along the cloud service supply chain. Both vertical and horizontal data transfers are monitored in order to provide accountability.
The figure below shows the architecture of the DTM.
The service API calls from cloud consumers (tenants) are proxied by a service-specific Traffic Analysis component that translates data transfer calls into normalized events. These events are logged and further translated into logical facts using PyKe, a Prolog-like extension of Python.
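The normalization step can be sketched as follows. The event schema and the textual fact format are assumptions made for illustration; in the prototype, such facts are fed into PyKe rules for reasoning.

```python
# Illustrative sketch of translating a service-specific API call into a
# normalized event and a Prolog-like fact. Field names are assumptions.
import time


def normalize_transfer(api_call):
    """Translate a service-specific call (here, an OpenStack-style dict)
    into a normalized data-transfer event."""
    return {
        "event": "data_transfer",
        "tenant": api_call["project_id"],
        "object": api_call["object_id"],
        "source": api_call["source_endpoint"],
        "target": api_call["target_endpoint"],
        "time": time.time(),
    }


def to_fact(event):
    """Render the normalized event as a Prolog-like fact over which
    PyKe-style rules can reason."""
    return "transfer({tenant}, {object}, {source}, {target})".format(**event)
```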
This information, along with the cloud topology description, is then used to enable the CSC (data controller), the CSPs (data processors) and internal or external auditors to query DTMs for data transfer events over time and answer questions such as:
- Who holds the data set relating to a cloud consumer?
- Who processed the data over time?
- Where has the data been physically over time?
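The audit queries above can be sketched as simple functions over the DTM event log. The log format follows the normalized events produced by the Traffic Analysis component; the structure shown here is an assumption for illustration.

```python
# Sketch of audit queries over a DTM event log; the event structure
# (object, target, time) is an illustrative assumption.

def parties_holding(log, data_set):
    """Who holds (i.e. has ever received) the given data set?"""
    return {e["target"] for e in log if e["object"] == data_set}


def location_history(log, data_set, party_country):
    """Where has the data set been over time, given a party->country map?"""
    return [party_country[e["target"]]
            for e in sorted(log, key=lambda e: e["time"])
            if e["object"] == data_set]


log = [{"object": "d1", "target": "SaaS-A", "time": 1},
       {"object": "d1", "target": "IaaS-X", "time": 2},
       {"object": "d2", "target": "SaaS-A", "time": 3}]

parties_holding(log, "d1")                                    # both recipients
location_history(log, "d1", {"SaaS-A": "DE", "IaaS-X": "US"}) # country trail
```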
To deduce the physical location, a mapping must also be provided between the virtual machine, the physical host and the availability zone. In the simplest case this is just a VM-to-country mapping provided by the infrastructure CSP. More elaborate approaches can use geolocation techniques based on IP addresses or timing measurements.
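The chained mapping can be illustrated in a few lines; the table contents are made-up examples of what an infrastructure CSP would provide.

```python
# Minimal sketch of resolving a VM to a country via the infrastructure
# topology. The mapping tables are illustrative placeholders.
vm_to_host = {"vm-17": "compute-03"}
host_to_zone = {"compute-03": "eu-west-1a"}
zone_to_country = {"eu-west-1a": "FR"}


def vm_country(vm):
    """Follow VM -> physical host -> availability zone -> country."""
    return zone_to_country[host_to_zone[vm_to_host[vm]]]


vm_country("vm-17")  # resolves to "FR"
```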
This solution is not without shortcomings, which are discussed in our paper. The main one is the possibility for the CSP, or someone acting on its behalf (e.g. a malicious insider), to bypass the DTM and transfer the sensitive data to an unauthorized location by means other than the service API.
One way to address this is to also monitor data transfers at a lower abstraction level (e.g. at the operating system and network level). This may involve, for example, tagging files that contain sensitive data and monitoring file operations. Another problem arises when an untrustworthy service application mangles the data and transfers it to another location as seemingly non-sensitive data.
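The file-tagging idea can be sketched as a registry of sensitivity labels consulted before any out-of-band copy. This is a deliberately simplified in-memory model; a real implementation might use extended file attributes and OS-level file operation monitoring instead.

```python
# Hedged sketch of file tagging: an in-memory registry of sensitivity
# labels, checked before copying a file to another location. All names
# and paths are illustrative assumptions.
SENSITIVE = {}


def tag(path, label="personal-data"):
    """Mark a file as containing sensitive data."""
    SENSITIVE[path] = label


def check_copy(path, destination_country, allowed=frozenset({"DE", "FR"})):
    """Raise if a tagged file would be copied to an unauthorized country;
    untagged files are unrestricted."""
    if path in SENSITIVE and destination_country not in allowed:
        raise PermissionError(
            f"{path} ({SENSITIVE[path]}) -> {destination_country} not authorized")


tag("/srv/data/customers.db")
check_copy("/srv/data/customers.db", "DE")  # authorized, passes silently
check_copy("/tmp/readme.txt", "US")         # untagged, unrestricted
```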
In summary, moving sensitive data to the cloud does not have to mean losing control over it. With the tightening of data protection regulation and the saturation of the cloud market, SAP and other CSPs will have to develop ways to demonstrate their compliance to their customers, and to be held accountable when they fail to do so. Here we illustrated a possible approach to provide such transparency by monitoring sensitive data transfers along the cloud service provision chain. Further work in the scope of the A4CLOUD project includes implementing a framework for enforcing personal data handling policies, which goes beyond mere data location policies.