What should I consider in terms of security in my cloud scenarios?
If you are one of those who constantly tried to avoid paying attention to security because you saw no value in it, or because it was too boring or too complicated, you are probably not alone. But times change: the digital transformation, along with the benefits of cloud software, imposes other rules, and it is very important to understand a large set of fairly unrelated topics typically referred to as “security” and make sure they are all in place before going live. The rest of this document aims to explain this area in very simple language, focused on practical needs rather than conceptual correctness; I leave that part to the real experts…
First, let’s try to brainstorm the different aspects of security and how we would protect against each of them. Of course, this list is by no means complete, but I think it covers most of the topics we need to consider.
Part I- Threats brainstorming
A) Spying – Encrypting the Data for Confidentiality
Threats and safeguards in data handling confidentiality
Information is transmitted between two remote components over a network. The content of the transmission may contain private data, like usernames and passwords, and other sensitive payload data like credit card numbers. It could be a simple HTTP request to invoke a web service or to start a front-end application from a mobile phone. From the sender to the receiver (server), the information will go through a long list of network devices (…all around the world…), and it could be saved and analyzed afterwards.
This is a data transport issue; normally it won’t happen inside the device (unless it is infected with a virus!), but there are standards to add confidentiality to the transmission. You have most probably heard about TLS (the successor of SSL) for internet communications, or even something called SSH for file transfers. We can search Wikipedia to find out what those mean, but for us developers what is important is to understand what needs to be done so that if the transmission is saved (e.g. in a router), the content is not readable, or at least hard enough to break with the latest standards and the existing computers.
So, like in many other situations in the security realm, the way to implement a confidential transmission is to use TLS/SSL. Many times, when the communication involves external parties (like in cloud environments), this is achieved with the help of a third party. That means it is not the sender or the receiver, but another well-known company, known as a trusted certification authority, that helps validate the communication by signing the certificates (which are small, digitally signed text files). The server side must share its public certificate with the clients that are going to establish the confidential communication (HTTPS protocol), which will in turn store the certificate in their system. It is also required to store on the client side other certificates, known as intermediate and root certificates, which state that the company signing the end-user certificate is a trusted certification authority. The certification authority is a company (you might have heard about Verisign, Symantec, QuoVadis, Go Daddy, etc.) that sells this type of service; by the way, SAP also provides this service.
This certificate won’t let the receiver know who the sender is, and at this stage it doesn’t need to, because the purpose of this process is to establish a confidential communication; the authentication process (who the sender is) will happen afterwards.
If your eyes are wide open in surprise at the complexity of this process, let me propose a simple exercise to demonstrate that you have probably already done it hundreds of times today. Open a browser and access your webmail (it works for some newspapers or search engines as well). You will notice somewhere in the address bar a lock icon, a notification that the site is safe, or even the company name. If you are experienced enough to open the browser developer tools, you can see that on the fly the TLS setup downloaded to your browser the complete trust chain, which involved at least two certificates used to establish the confidential channel. All in about 0.5 seconds.
For server-to-server communications (when browsers are not involved) this procedure is not automatic, and the client-side administrators (outbound) must save the certificates in secure stores beforehand. The technologies to store the certificates are diverse. In the ABAP world, the transaction STRUST allows you to define many “PSEs”, represented as folders where the certificates are stored. In the latest versions you should typically place all the end-user, intermediate and root certificates used for client TLS in one PSE called “anonymous”. On the other hand, in SAP NetWeaver Java stack based solutions the secure store for certificates is structured in views; there is no anonymous view, and the root certificates are stored in a view called “Trusted CAs”. Finally, SAP Cloud Platform Integration – Process Integration (formerly “HCI”) uses a plain keystore with two sections where SAP and customers can upload security elements, but without any hierarchical structure.
Basically, this is the procedure to establish a confidential HTTPS connection to the SAP datacenter to run any SAP Cloud Platform service, i.e. storing the two certificates SAP provides on the client side.
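As a quick illustration of what “trusting the server certificates” means on the client side, here is a minimal Python sketch. It only shows the shape of the client-side setup; the certificate file name in the comment is a placeholder, not an actual SAP artifact:

```python
import ssl

# A default client-side TLS context: it requires a valid certificate
# chain (end-entity -> intermediate -> trusted root CA) and checks that
# the certificate matches the hostname we are connecting to.
ctx = ssl.create_default_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED)  # True: the server must present a valid chain
print(ctx.check_hostname)                    # True

# A server-to-server client would typically add the provider's root and
# intermediate certificates explicitly (hypothetical file name):
# ctx.load_verify_locations(cafile="provider_root_and_intermediate.pem")
```

Browsers do all of this automatically; in the server-to-server case, loading the trust material is exactly the manual step the previous paragraphs described.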
In many situations, generally related to system-to-system integration, a secure transport is not enough and the payload is also encrypted for confidentiality (and also for integrity… and even more: to compress it). Consider situations where the message goes through multiple systems where it needs to be partially modified. There could be several reasons and techniques coming from different angles, like PKCS#7 and PGP. The adoption of these techniques differs from region to region.
Finally, there is a third level of encryption that corresponds neither to the transport nor to the payload alone, but where the complete message implements a security scheme. For more information on this, consider the WS-Security protocol, which defines a large and flexible list of options to add security (from a pretty generic standpoint, not only confidentiality), and S/MIME as well.
Storing confidential data safely
What could happen, from a confidentiality perspective, if you lose your mobile device or computer? Well, hopefully you have locked your device so that access to the storage is not simple, or much better, the storage is encrypted. All in all, there are certain pieces of information that must be kept especially confidential. We noticed lately how the Microsoft Windows 10 Sticky Notes and the Eclipse basic authentication data moved to secure stores inside the laptop instead of the plain file system. It actually makes even more sense on a server where many processes are executed. Today you can find secure store technologies in all devices, including database or complete device encryption, which basically means that it is required to provide a password to access the information contained in the store (but that’s “authentication” and I have not explained it yet).
As explained, some pieces of information are far more confidential, and those are kept safe behind a couple of passwords, like the certificate private keys, also explained later.
B) Impersonation – Authentication for user identity validation
Before we access some application or functionality, it is required to validate that we are who we say we are. There are two ways to set up this validation: either we have established a relationship with the provider (server) in advance, or we involve a mutually trusted entity as we did before.
The first option is the easy one: we simply get a publicly available unique identifier (“user name”) and a “safe” procedure to generate the private identifier (“password”). Here it is important to remember that confidentiality is critical: the password must be exchanged safely using TLS/SSL, the way it was described in the prior section, so that nobody else can see it during transport.
This is known as the “basic authentication” mechanism. It is also the easiest one, and it has a broad set of use cases, mostly oriented to business-to-consumer UX scenarios where there is a massive number of consumers who need a fully automated way to set up the authentication and a rather simple authorization granularity, if any. It also has some drawbacks, as the secret password lasts for a rather long time, and for system-to-system integrations changing passwords is too complicated.
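To see how little protection the mechanism itself offers, here is a small Python sketch of a basic authentication header (the credentials are made up):

```python
import base64

# Basic authentication is just "user:password", Base64-encoded, in a header.
user, password = "jdoe", "s3cret"  # hypothetical credentials
token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
header = {"Authorization": f"Basic {token}"}

# The encoding is trivially reversible, which is exactly why the
# transport must be protected with TLS/SSL:
decoded = base64.b64decode(token).decode("utf-8")
print(decoded)  # jdoe:s3cret
```

Nothing secret is involved in the encoding itself; all the confidentiality comes from the TLS channel underneath.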
The second group of methods works a bit differently. Imagine ourselves arriving in another country. The first step is going through the immigration procedure so that they can determine who we are (and later, but not now, determine whether we are allowed to stay). For this purpose, we present to the receiving country’s authorities a passport that some entity or organization (our origin country here) printed with some information about us. Notice a couple of things in this process: first, if instead of a passport printed by our country we present a paper we printed at home, it won’t work. Also, the receiving country may require additional information, like a visa. Consider the required elements present here: one is the trust relationship between the receiving country and the issuing country (receiver and issuer), and the other is the authenticity of the passport and visa we present.
There are several technological options for this procedure, let’s take a look….
Certificates are similar to the basic authentication method in the sense that I get a unique identifier (“distinguished name”) as part of the public key, and a private identifier (“private key”), but here the issuer is a mutually trusted entity, not the application or service provider itself. The certificates last for a long time, in general about two years, have to be paid for and, unlike the anonymous ones we use for confidentiality purposes, these are specific to the user. Technically, these certificates with both a private and a public key are known as “key pairs” (PKCS#12 for personal information), and application or service consumers are sometimes requested to present the public one to the provider, while the private one is typically kept secret, protected with a password as well.
This is a good solution for scenarios where the consumer list is rather small and static, since getting a certificate takes time and money.
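On the client side, using such a key pair boils down to loading it into the TLS context. A hedged Python sketch follows; the file names and password are placeholders, and the PKCS#12 file is assumed to have been converted to PEM first:

```python
import ssl

# Start from a context that validates the server as usual...
ctx = ssl.create_default_context()

# ...and add the client's own key pair, so the server can authenticate us.
# The private key file is itself protected with a password (all names
# below are hypothetical placeholders):
# ctx.load_cert_chain(certfile="client_cert.pem",
#                     keyfile="client_key.pem",
#                     password="key-password")
```

The actual call is commented out because the files do not exist here; the point is that client certificate authentication reuses the same TLS machinery as the confidentiality setup, plus the client’s own key pair.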
SAML (from the authentication perspective!) defines the way a consumer, a service provider and a security provider interact at runtime so as to let the consumer negotiate with a single trusted identity provider the access to diverse service providers’ functionalities (“resources”). That is, the consumer will ask the target service provider to run some application (…access a page, or resource… whichever you prefer), the provider will ask the consumer to talk to the identity provider, which will let the service provider know whether the consumer is who it really says it is.
SAML does more than that in terms of authorization as well, and it also overlaps with OAuth to a certain extent, but conceptually this flow is one of the key differences.
Again from the authentication point of view, let’s imagine we want to keep on using the same “login” functionality we had in the past with a single, fully integrated system, but now on a highly distributed cloud infrastructure. How would it be possible to simulate a login session? The answer is a central server that handles session tokens accepted by multiple service providers, and a generic schema flexible enough to support a wide variety of clients, ranging from pretty insecure third-party isolated self-contained web pages to static clients running inside a server. OAuth describes a list of scenarios and procedures to provision authorization tokens to all possible clients.
Another benefit of this technology is that the token lifetime can be configured. Suppose somebody is able to steal a token which has a 5-minute lifetime: in the worst case, the security issue will last for 5 minutes (actually there are more measures, like same-origin policy control, that prevent unauthorized access from other places).
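The lifetime idea can be sketched in a few lines of Python; the token structure here is hypothetical, not a literal OAuth response:

```python
# Hypothetical bearer token with a 5-minute lifetime, issued at t=0.
token = {"access_token": "abc123", "issued_at": 0.0, "expires_in": 300}

def is_valid(tok: dict, now: float) -> bool:
    """A resource server rejects the token once its lifetime has elapsed."""
    return now < tok["issued_at"] + tok["expires_in"]

print(is_valid(token, now=60.0))   # True: one minute after issue
print(is_valid(token, now=301.0))  # False: a stolen token is already useless
```

Real authorization servers combine this with refresh tokens so that legitimate clients can obtain new short-lived tokens without bothering the user.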
The Identity provider
Also in this process, it is important to understand that some mechanisms require the existence of a system or entity called the “Identity Provider”, which is the one in charge of keeping the access data. In older systems, the system itself used to have a couple of tables with user data, but in highly distributed cloud environments this entity runs in an independent place, loosely coupled and trusted by the applications that are going to be used. This schema is referred to as “Federated Authentication”, where a central identity provider manages authentication for many distributed and technologically different systems and applications, and provides a “Single Sign-On” option from the user’s (aka “principal”) perspective.
In SAP Cloud Platform, we have two services: the default SAP ID service (default authentication for development and testing) and the SAP Cloud Platform Identity Authentication Service (IAS), running in isolated datacenters with a failover one, to guarantee maximum quality of service and availability. As a general rule, all identity providers connected to SAP Cloud Platform must be SAML 2.0 compliant.
Furthermore, the identity provider could also run on premise, either in locations accessible via the internet or even in internal ones.
Also, in the business-to-consumer world, considering the social networks, it is very practical to exploit the open authentication tools offered by the social networks (or similar); needless to say, normally the price to pay is sharing some marketing-relevant data from my profile.
Suppose the following situation: I develop a portal-based application to create purchase orders, but behind the scenes I have to point to three different back-end systems depending on the goods or services contained in the order. Most probably some type of mediation middleware (at least one!) like SAP Cloud Platform Integration will be part of the architecture to help route the order. From the authentication perspective, the end user (aka “principal”) will authenticate against the portal, but those three back-end systems need to run the same authentication as well, for the same user. This is what is called principal propagation: keeping the same user in every system involved. How this principal propagation needs to be configured depends on the architecture and can be seen in this blueprint:
Example of SAP CLOUD PLATFORM Principal Propagation (public release pending! – sorry)
C) Repudiation – Signatures for message Integrity
While on one hand we experience authentication and authorization in first person on a daily basis, integrity looks a bit more distant; nevertheless, the need is very simple to understand in the context of system-to-system integration. In UI use cases, integrity is still relevant.
Suppose a corporate is sending a bank a request to pay € 10,000 to a vendor. The following four hypothetical “repudiation” scenarios may occur:
- Corporate: I have not sent any payment initiation request!
- Corporate: I have not requested to pay € 10,000 but € 1,000!
- Bank: I have not been requested to pay € 1,000 but € 10,000
- Bank: I have not received any payment initiation request!
The implementation for this also involves a certificate from a (let’s say…) mutually trusted CA. This is not a public-key-only certificate like the ones used to establish TLS/SSL, but a key-pair file with two components inside (PKCS#12), like before. The first component is a private key, which is safely kept on the sender side (the corporate in this example) and is able to calculate and encrypt the most important part of the message content (called a “digest”, using standards like PKCS#7 or XML-Signature) following the rules from the bank. The second component is the public key, which the corporate shares with the bank and which is only useful to decrypt that digest. So, let’s see how the process works step by step.
1- Non-Repudiation of Origin Implementation – “Proof of Origin”
The corporate calculates the message digest (probably over the payee identifier, amount, currency code and date) using the rules from the bank and encrypts it with the private key they have; then the message is sent to the bank along with the encrypted digest. The bank receives the message (all going through some authentication and authorization process with confidentiality in place), and using the public key of the key pair the digest is successfully decrypted. At this moment, the bank understands the message came from the specific corporate, so the origin is proven, since nobody else has the private key to generate the same encrypted digest.
2- Non-Repudiation of Emission Implementation
The following step is for the bank to compare the result of the digest decryption with the message content, using the rules it defined. If the content of the digest is the same as the content in the message, then the message integrity is guaranteed (nobody changed the content), and furthermore the digest value represents valid proof that the sender issued the message.
3- Non-Repudiation of Emission from Receiver
Since the bank is only able to decrypt the digest and compare it with the payload content, but has no private key to calculate a digest, there is no way for it to simulate a valid one. In general, this method is combined with a second security factor to maximize security.
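The digest-and-compare part of these steps can be illustrated with Python’s standard library. Note that this sketch only covers the hashing and comparison; the asymmetric encryption of the digest with the corporate’s private key (RSA, XML-Signature, etc.) requires a cryptography library and is omitted here. The field list and “rules from the bank” are invented for the example:

```python
import hashlib

def digest(payload: dict) -> str:
    # Hypothetical "rules from the bank": digest over these fields, in this order.
    canonical = "|".join(payload[k] for k in ("payee", "amount", "currency", "date"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Corporate side: compute the digest (in the real flow it would then be
# encrypted with the corporate's private key before sending).
message = {"payee": "VENDOR-42", "amount": "10000.00",
           "currency": "EUR", "date": "2017-06-01"}
sent_digest = digest(message)

# Bank side: recompute from the received payload and compare.
print(digest(message) == sent_digest)   # True: integrity holds

tampered = dict(message, amount="1000.00")
print(digest(tampered) == sent_digest)  # False: the content was changed
```

The € 10,000 versus € 1,000 repudiation scenarios above are exactly what the second comparison catches.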
4- Non-Repudiation of Receipt Implementation
Industry standards define specific message flows for this purpose, but there is nothing in this generic context, so this requirement could be achieved with further logging or other elements. As an example, the blockchain technology used in Bitcoin transactions generates a proof that a transaction was executed and cannot be altered by any means. The way it works is by connecting transactions like chain links, where every new link depends on the previous one, so changing one link invalidates every link that follows it, all the way back to the very first root link. That is, the transaction identifier reported from the financial services provider side is sufficient proof that the transaction was successfully executed.
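The chain-link idea can be demonstrated with a toy hash chain in Python. This is a drastic simplification of how Bitcoin actually works (blocks, proof of work, and the distributed ledger are all left out); it only shows why tampering is evident:

```python
import hashlib

def link(prev_hash: str, transaction: str) -> str:
    # Each link's hash covers the previous link's hash, chaining the history.
    return hashlib.sha256((prev_hash + transaction).encode("utf-8")).hexdigest()

transactions = ["pay 10 to A", "pay 5 to B", "pay 7 to C"]
chain = ["0" * 64]  # root link
for tx in transactions:
    chain.append(link(chain[-1], tx))

# Verification: replaying the untampered history reproduces the last hash...
h = "0" * 64
for tx in transactions:
    h = link(h, tx)
print(h == chain[-1])  # True

# ...while altering any earlier transaction changes every hash after it.
h2 = "0" * 64
for tx in ["pay 10 to A", "pay 500 to B", "pay 7 to C"]:
    h2 = link(h2, tx)
print(h2 == chain[-1])  # False: the tampering is evident
```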
Understanding the purpose of encoding and reliable messaging
It might be a childish explanation for security experts, but since I have seen this confusion many times, I think it’s worth explaining. In the early days of IT, each English language character (and some others…) was defined with a number ranging from 0 to 127 (blame me for anything except not being pedagogical here!). It is clear that there are far more characters in all human languages than 128, so the standards changed to accommodate this, bringing also a bit of confusion: namely, if we are sending something as simple as a German “ß” or a Portuguese “ç”, the recipient on the other side may receive a different character. Somebody came up with a good solution for this: translating all the tiny bits of information into old English characters before transmission and reverting the process after reception on the other side. When I see the transmission it is not readable, but on the other hand it is encoded using a standard procedure (Base64) that anyone can revert and read (e.g.: https://www.base64encode.org/ ), which is not possible with encryption.

Long story short: encoding is not intended to provide confidentiality, but data integrity. If we ask our bank to pay € 1,000 to Jörg Nuñez, then they need to see the correct payee name, otherwise the transaction request might not be accepted; that is, “JNrg NuAez” won’t get the money. So this encoding, even though it is not a security measure per se, is a way to protect the information from an integrity hazard of a different nature, similar to messaging reliability in the transactional conversation context (e.g. WS-RM: Web Services Reliable Messaging, or the SAP RM version).
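The round trip is easy to try in Python; note that Base64, unlike encryption, involves no secret at all:

```python
import base64

payee = "Jörg Nuñez"

# Encode the UTF-8 bytes into plain ASCII characters for safe transport...
encoded = base64.b64encode(payee.encode("utf-8")).decode("ascii")

# ...and the receiver reverses the process losslessly.
decoded = base64.b64decode(encoded).decode("utf-8")
print(decoded == payee)  # True: the umlaut and the tilde survive intact
```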
D) Authorization Violation- Authorization control for access rights validation
Sometimes the authentication process is confused with the authorization check process, because they typically go hand in hand, one after the other, but they are fundamentally different. It is not so visible in simple resource access control, but it is simple to understand in a business platform with several levels of authorization control (existence, visualization, creation, modification, deletion and execution), including business content filtering. The user authorizations are checked all the time, when executing functionalities, displaying data on the screen, etc., but in general the user roles (…authorizations, …profiles, etc.) are cached right after the authentication succeeds.
But something more fundamental is to understand how, in general, most of this combined authentication and authorization process works from a conceptual perspective. That is, after successful authentication (I am who I say I am), the system needs to know what I am allowed to do by accessing the user authorization database and loading the roles.
In SAML, I can map the SAML token attributes to a specific application role or group to control what the user is allowed to do.
OAuth allows determining the scope of resources the user is allowed to access.
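Conceptually, a resource server’s scope check is just a membership test on the granted scopes. A hypothetical sketch (the scope names and token layout are invented for illustration):

```python
# Hypothetical decoded OAuth token: the authorization server granted
# only these scopes to this client.
token = {"user": "jdoe", "scope": ["orders.read", "orders.write"]}

def authorized(tok: dict, required_scope: str) -> bool:
    """A resource server checks the granted scopes before serving a request."""
    return required_scope in tok["scope"]

print(authorized(token, "orders.read"))    # True
print(authorized(token, "invoices.read"))  # False: scope was never granted
```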
E) Systems Overloading – Control for Denial of Service
This is basically sending an unmanageable number of requests to a particular system to cause a denial of service. This type of threat is controlled with specific software and hardware, using statically defined rules as well as dynamic determinations based on monitoring the incoming traffic.
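A toy version of such a statically defined rule is a sliding-window rate limiter; real DoS protection happens in dedicated network appliances, and the limits below are arbitrary:

```python
from collections import deque

class RateLimiter:
    """Allow at most `limit` requests per `window` seconds from one source."""
    def __init__(self, limit: int, window: float):
        self.limit, self.window = limit, window
        self.hits = deque()

    def allow(self, now: float) -> bool:
        # Drop hits that have fallen out of the window, then decide.
        while self.hits and now - self.hits[0] >= self.window:
            self.hits.popleft()
        if len(self.hits) < self.limit:
            self.hits.append(now)
            return True
        return False

rl = RateLimiter(limit=3, window=1.0)
print([rl.allow(now=0.0) for _ in range(4)])  # [True, True, True, False]
print(rl.allow(now=1.5))                      # True: the window has moved on
```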
Part II – Specific Malicious Techniques
Cross Site Scripting (XSS) and phishing
XSS is a vulnerability that mainly applies to browser-based applications, exploiting the rather open and flexible tools available in web technologies to interfere with the system behavior and generate different functionality. Even though the main target of this threat is an authorization control breach, this change in behavior could, for example, simulate an authentication request to grab data (aka a trojan screen). This type of malicious attack is known as “phishing” when it is specifically intended to steal passwords; it is not purely implemented via XSS but also in many other ways, e.g. using similar URLs. This is where you start to pay attention to browser lock icons and the value of trusted CAs in this matter, as well as the need to switch from basic authentication to something more advanced, and to turn on the same-origin policy in your browser, which prevents any communication with any site other than the one you accessed.
This threat also affects authentication, authorization, availability and integrity, and is implemented by interfering with the UI or communication protocol using malicious code that accesses memory data to grab or change it.
SQL Injection
This technique affects programs that generate SQL or OData instructions dynamically from user input (UI-based or not), changing the intended functionality to affect authorization or availability.
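The classic defense is to never concatenate user input into the statement, but to bind it as a parameter. A self-contained Python/sqlite3 demonstration (the table and input are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin'), ('bob', 'user')")

malicious = "bob' OR '1'='1"  # a classic injection attempt

# Vulnerable: the input is concatenated into the SQL text, so the
# injected condition becomes part of the statement and matches every row.
unsafe = conn.execute(
    "SELECT name FROM users WHERE name = '" + malicious + "'").fetchall()
print(len(unsafe))  # 2

# Safe: a bound parameter is treated strictly as data, never as SQL.
safe = conn.execute(
    "SELECT name FROM users WHERE name = ?", (malicious,)).fetchall()
print(len(safe))    # 0: no user is literally named that
```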
Another group of techniques refers to unauthorized attempts to change the system software (and thus its behavior), in particular in web environments where the resource administration techniques are open, exploiting errors in default web server configurations and network administration. It may affect authentication and authorization, among others.
Part III- Summary
Security deserves attention from day one of the cloud architecture setup: analyze each of these aspects and, most importantly, work with tooling that natively protects against these threats.