Jay Thoden van Velzen

Public Cloud Infrastructure Compliance Scanning at SAP with Chef InSpec

One of the most important security requirements for public cloud is to ensure that misconfigurations in the landscape are avoided and, where they do occur, quickly remediated. Misconfigurations can leave landscapes inadvertently exposed and vulnerable without the operator being aware. Previous research has shown this, and a recent report from Sonrai Security describes cloud misconfigurations as a growing problem. Well-known incidents like the Capital One data breach in 2019 have shown how damaging these misconfigurations can be.

Scanning for security-relevant misconfigurations is known in the industry as Cloud Security Posture Management (CSPM). It is tied to common security policies and frameworks (CIS benchmarks, NIST 800-53, etc.), as well as certification requirements like ISO, SOC and PCI. SAP has been conducting CSPM scanning centrally across the landscape for years through our Multi Cloud Security Operations team, and has deployed two different tools from the market. We are now well underway in deploying a third, and this blog describes our journey to get there, and where we’re headed.

SAP’s Size & Growth Rate in Public Cloud

I mentioned in an earlier blog that one of the biggest challenges we have is scale and growth rate. Since then, the landscape has grown further to over 9,500 active cloud accounts and over 11 million cloud resources deployed in them, across AWS, Azure, GCP and AliCloud. SAP has doubled its use of public cloud each year for the past three years, and in 2021 that growth has accelerated further. As a provider of solutions to customers operating on public cloud, and through SAP’s own internal public cloud use, this growth will only continue.

Number of cloud objects deployed (projected and actual)

The market for security tooling in public cloud is not very mature, and much of what is available has trouble scaling to our volumes. The important question is: how do we solve the security problem as growth continues? Compliance scanning is a non-negotiable requirement, but at this scale there are not many options available, and cost considerations limit those options even further.

SAP is committed to a multi-cloud strategy. This is not common, and vendors prioritize the most obvious IaaS providers. The situation has improved, but a few years ago coverage of GCP or Azure was not always a given, and support for Alibaba Cloud is still rare. Vendor support may also not be global: many security vendors don’t have operations or partners in China, which is an important market for SAP.

Complex Organization

SAP is a large organization of over 105,000 employees across multiple board areas. Use of public cloud naturally dominates in the development organizations for platform, products and services, but is spread across various board areas and teams of all kinds and sizes. There are customer-facing landscapes, but also many internal systems of various kinds – from development landscapes to training and demonstration environments.

The breadth of the SAP portfolio of solutions offered in public cloud alone means there are many different organizations within the company involved, with a variety of development tooling and pipelines and different ways of operating. We must accommodate these differences and make it as easy as possible for teams to avoid security misconfigurations in the first place. Where they do occur, we must ensure that they are followed up on quickly, ideally as early as possible in the development lifecycle.

The variety of use cases also means that teams may have valid business reasons for exceptions. Managing security exceptions – only granted after careful consideration and with appropriate controls in place to limit the risk – on this scale is a challenge in its own right, and SAP has developed our own solutions to manage this process.
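
To make the exception handling concrete, here is a minimal sketch of how time-limited waivers might be applied to scan findings. It is an illustration under assumptions, not SAP's actual exception management solution: the `Waiver` and `Finding` types, their fields, and the account and control identifiers are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Waiver:
    """A granted exception for one control in one account (hypothetical shape)."""
    account_id: str
    control_id: str
    justification: str
    expires: Optional[date] = None  # None means the waiver has no expiry date


@dataclass(frozen=True)
class Finding:
    """A single failed control on a single resource (hypothetical shape)."""
    account_id: str
    control_id: str
    resource: str


def is_waived(finding: Finding, waivers: List[Waiver], today: date) -> bool:
    """Return True if an unexpired waiver covers this finding."""
    return any(
        w.account_id == finding.account_id
        and w.control_id == finding.control_id
        and (w.expires is None or today <= w.expires)
        for w in waivers
    )


def open_findings(findings: List[Finding], waivers: List[Waiver], today: date) -> List[Finding]:
    """Keep only the findings that still need follow-up."""
    return [f for f in findings if not is_waived(f, waivers, today)]
```

Giving each waiver an expiry date means an exception lapses on its own and the finding reappears in the next scan, unless the exception is reviewed and renewed.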

Finally, when we as a team run central compliance scans and hold other teams in the company to account on the basis of them, it is important that the alerts generated are valid and correct. At this scale and across this wide variety of landscapes and cloud providers, running into false positives is inevitable. If you depend on a vendor to investigate and correct the checks that generate false positives, the resulting delay can be considerable and has to be explained to the various teams involved. Such false positives erode trust in the quality of the compliance scans and cause organizational reporting issues. They also make teams less inclined to follow up on misconfiguration alerts with high priority.

Taking Control: CSPM with Chef InSpec for Public Cloud

We first reviewed Chef InSpec in 2019. When we started working on our secure-by-default public cloud infrastructure planning at the end of that year, it came strongly recommended by an expert from one of our hyperscaler partners, which confirmed our thinking. The Multi Cloud SecOps team was confident that Chef InSpec could be containerized and run within a Kubernetes cluster, and as a result scale as needed.

SAP Global Security defines policies and hardening procedures for public cloud that are abstractions of the baseline security requirements found in common security frameworks, certification audits, regulations, and contractual terms and conditions. These policies overlap with, but don’t necessarily fully align with, the compliance checks included in CSPM tools on the market, so we need to be able to adjust the ruleset to our policies. Chef InSpec, as compliance-as-code, allows us to do that in whatever way we might need.

With the CIS Benchmarks controls already available through their open source community, there was a good base to work from to modify them to SAP’s needs and accelerate our development work. The code base being open source allows us to add new functionality and even cloud platforms to Chef InSpec’s capabilities, and therefore gives us the freedom and control to implement the detective controls we want, as long as the public cloud provider’s API supports it.

With the support of the Chef team we deployed a first MVP at the end of last year, provided initial Alibaba Cloud coverage, and are now on track to completely move public cloud infrastructure compliance scanning across all landscapes to this solution.

Scale as Needed

The containerization is a success. We now run a fully private Chef InSpec Kubernetes cluster of three nodes that scanned the entire landscape (around 8 million cloud resources) in three hours, while taking over 900 exceptions (or waivers) into account. During this test we ran up to 280 containers; during normal operations we run 150. As the number of cloud resources grows, we can both dial up the number of nodes and extend the time the scan is allowed to run. With the most critical misconfigurations covered by our preventative controls, daily scans are considered timely enough for the organization to absorb, which leaves us room to adjust both knobs to keep operating costs under control as the cloud environment grows.
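
The two knobs described here, node count and scan window, can be explored with some back-of-the-envelope arithmetic. The sketch below uses the figures from this article (around 8 million resources scanned in three hours on three nodes, and a landscape of over 11 million resources). It assumes throughput scales linearly with resources and nodes, which is a simplification made for illustration, not a measured property of the cluster.

```python
import math


def throughput_per_hour(resources: int, hours: float) -> float:
    """Resources scanned per hour for an observed scan."""
    return resources / hours


def projected_hours(resources: int, rate_per_hour: float) -> float:
    """Scan duration for a given resource count at a fixed cluster throughput."""
    return resources / rate_per_hour


def nodes_needed(resources: int, per_node_rate: float, window_hours: float) -> int:
    """Smallest node count that finishes within the window (linear scaling assumed)."""
    return math.ceil(resources / (per_node_rate * window_hours))


# Observed test from the article: ~8 million resources, 3 hours, 3 nodes.
cluster_rate = throughput_per_hour(8_000_000, 3)
per_node_rate = cluster_rate / 3

# Same three-node cluster, 11 million resources: how long would the scan take?
print(projected_hours(11_000_000, cluster_rate))

# Nodes needed to keep a 3-hour window, and to fit within a daily (24-hour) window.
print(nodes_needed(11_000_000, per_node_rate, 3))
print(nodes_needed(11_000_000, per_node_rate, 24))
```

Under these assumptions, a longer scan window is the cheaper knob: a daily scan leaves far more headroom than the three-hour test required.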

Chef InSpec Production Kubernetes cluster

Shift-Left: Empower Teams and Solve Misconfigurations Early

Containerizing Chef InSpec has additional benefits. It gives us a highly flexible tool that teams throughout the company can use to manage the compliance of their cloud accounts. We provide a consumer version of the container that is started with a “docker run” command and thus runs wherever Docker runs. Developer teams can run it interactively or integrate it into their development, testing and deployment pipelines, whatever toolset the respective team uses. This allows teams using public cloud to “shift left” and adopt DevSecOps practices for public cloud infrastructure, as well as verify the status of their cloud accounts whenever they want. It also helps developer teams during remediation, because they can confirm instantly whether a configuration change brings the cloud account into compliance.
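
As an illustration of what such a “docker run” invocation could look like in a pipeline step, the sketch below assembles the command line in Python. The image name, profile path and choice of environment variables are hypothetical, since the article does not describe the consumer container's interface. The sketch also assumes the image's entrypoint is the `inspec` binary, as in the public `chef/inspec` image.

```python
import shlex
from typing import List, Sequence

# Hypothetical image name; the article does not name SAP's consumer container.
IMAGE = "registry.example.com/cspm/inspec-consumer:latest"


def build_scan_command(
    image: str,
    profile_dir: str,
    target: str,
    env_vars: Sequence[str] = (),
) -> List[str]:
    """Return the argv list for one containerized InSpec run."""
    cmd = ["docker", "run", "--rm"]
    for name in env_vars:
        # "-e NAME" without a value forwards NAME from the caller's environment.
        cmd += ["-e", name]
    # Mount the profile directory read-only at /profile inside the container.
    cmd += ["-v", f"{profile_dir}:/profile:ro"]
    # Assumes the image entrypoint is `inspec`, so the remaining arguments are
    # `exec <profile> -t <target> --reporter <name>`.
    cmd += [image, "exec", "/profile", "-t", target, "--reporter", "cli"]
    return cmd


command = build_scan_command(
    IMAGE,
    "/work/cloud-baseline",  # hypothetical local profile directory
    "aws://",                # InSpec target for an AWS account
    env_vars=("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION"),
)
print(shlex.join(command))
```

Building the command as a list rather than a single string makes it easy to pass to `subprocess.run` in a pipeline step without shell quoting problems.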

Control Coverage

We now have a total of 60 controls implemented across AWS, Azure, GCP and Alibaba Cloud, with detective coverage – depending on hyperscaler capabilities – in the following areas:

  • Public storage buckets
  • API logging centrally collected with appropriate log retention (minimum 6 months) and not publicly accessible
  • Internet exposed admin ports and common database ports
  • Disk volume and storage bucket encryption
  • Encrypted communication for storage accounts
  • TLS 1.2+ SSL policies
  • Password policy, MFA and verified corporate identities for cloud admins
  • Kubernetes master node logging
  • KMS configuration and key rotation policies
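
To give a feel for what one of these detective controls checks, here is a minimal sketch of the logic behind an “internet exposed admin ports” control. Actual Chef InSpec controls are written in InSpec's Ruby-based DSL against cloud resource packs; the Python below only illustrates the underlying check. The rule dictionary shape and the choice of SSH (22) and RDP (3389) as admin ports are assumptions made for the example.

```python
from typing import Dict, List

# Assumed admin ports for the example: SSH and RDP.
ADMIN_PORTS = {22, 3389}
# CIDR ranges that mean "reachable from anywhere on the internet".
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def exposed_admin_ports(rules: List[Dict]) -> List[int]:
    """Return the admin ports that any inbound rule opens to the internet.

    Each rule is a dict with "cidr", "from_port" and "to_port" keys
    (a hypothetical, provider-neutral shape).
    """
    exposed = set()
    for rule in rules:
        if rule["cidr"] not in OPEN_CIDRS:
            continue  # restricted source range, not internet exposed
        for port in ADMIN_PORTS:
            if rule["from_port"] <= port <= rule["to_port"]:
                exposed.add(port)
    return sorted(exposed)


def is_compliant(rules: List[Dict]) -> bool:
    """A firewall or security group passes if it exposes no admin port."""
    return not exposed_admin_ports(rules)


rules = [
    {"cidr": "0.0.0.0/0", "from_port": 0, "to_port": 1024},    # wide range covers 22
    {"cidr": "10.0.0.0/8", "from_port": 3389, "to_port": 3389},  # internal only
    {"cidr": "0.0.0.0/0", "from_port": 443, "to_port": 443},   # HTTPS is fine
]
print(exposed_admin_ports(rules))
```

Checking port ranges rather than exact ports matters, because a wide-open range such as 0 to 1024 exposes SSH without ever naming port 22.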

Over the coming months this control coverage will be extended by another ~200 “medium” severity controls to match (and go beyond) the ruleset of the authoritative CSPM solution currently in place.

Tools Don’t Solve Problems, People Do: Organizational Support

Ultimately, tools don’t solve problems, but people do. Whether caught early during the development and testing pipeline, as part of a deployment, or during operational central scanning, teams need to follow up.

We have built up a support structure within SAP at multiple levels – from notifications to account owners, to direct interaction during weekly office hours with security experts and stakeholders within the business units, to executive reporting and weekly follow-up meetings with board area representatives to ensure any outstanding misconfigurations are responded to with the appropriate urgency. Scanning alerts are enriched with account metadata and organizational structure to facilitate security analytics and assignment of responsibility to the appropriate teams. This is already in place with our existing toolset and has proved very effective in ensuring accountability throughout the organization.
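
The enrichment step can be pictured as a simple join between scan alerts and an account inventory. The sketch below is an assumption-laden illustration rather than a description of SAP's pipeline: the field names (`account_id`, `owner`, `board_area`) and the fallback values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def enrich(alert: Dict, accounts: Dict[str, Dict]) -> Dict:
    """Attach owner and organizational metadata to a single alert."""
    meta = accounts.get(alert["account_id"], {})
    return {
        **alert,
        "owner": meta.get("owner", "unassigned"),
        "board_area": meta.get("board_area", "unknown"),
    }


def group_by_board_area(alerts: List[Dict]) -> Dict[str, List[Dict]]:
    """Group enriched alerts so each board area sees its own follow-ups."""
    grouped: Dict[str, List[Dict]] = defaultdict(list)
    for alert in alerts:
        grouped[alert["board_area"]].append(alert)
    return dict(grouped)


# Hypothetical account inventory and alerts.
accounts = {
    "acct-1": {"owner": "team-a@example.com", "board_area": "Products"},
}
alerts = [
    {"account_id": "acct-1", "control_id": "storage-public"},
    {"account_id": "acct-9", "control_id": "mfa-admins"},  # not in the inventory
]
report = group_by_board_area([enrich(a, accounts) for a in alerts])
print(sorted(report))
```

Alerts for accounts missing from the inventory land in an explicit "unknown" group, so they are surfaced for triage and not silently dropped.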

The integration becomes even tighter with the transition to Chef InSpec as there is a guarantee that the ruleset scanned for centrally in daily operations across the landscape is the same as the ruleset developer teams can scan for during the lifecycle of their cloud accounts.

This integration also allows us to work efficiently and directly with teams to deal with any suspected false positives. The compliance-as-code control set itself is available for developer teams to inspect, and the controls are very explicit in what they check for. Teams can submit pull requests or reach out to the Multi Cloud team directly to work with us to test and, if necessary, correct a control. Since this is all an internal process, the turnaround time is much shorter. The transparency alone raises confidence and trust.

Open Source: Flexibility for SAP, Benefits Resonating Beyond

Not only can we develop our own control set, we can also expand coverage where we need to through the Chef InSpec open source process, both for the resource pack and for the back-end system. Through this collaboration there is now support for Alibaba Cloud, and we are adding support for new features and API changes for the different platform providers as our policy controls require.

This gives SAP the flexibility to support additional cloud providers in the future should business needs move in that direction, and to respond quickly to new and changing security requirements coming from SGS as changes in technology and cybersecurity require. It also allows other Chef InSpec users to benefit from those enhancements.

Next Steps

With the coverage already in place, the solution provides great value to teams inside our company by supporting stakeholders in their security compliance needs. Before the end of 2021 we plan to complete the build-out of medium severity controls, as well as refine the reporting pipeline, dashboarding and data delivery to internal stakeholders, including API access for teams to integrate scan results into their own workflows.

We’ve since found additional use cases like “right-now” compliance scans to support security incident response (beyond the daily alert stream already going into our SIEM environment), as well as during service requests and internal support calls with developer teams.

For more information, please reach out via a comment below!

      2 Comments
      Johannes Goerlich

      Thank you Jay for sharing all these details. Very interesting to read how other big players approach the same problems.

      Best regards

      Joe

      Atanu Mandal

      Great article. This is a real-world example of how MC and SGS are working together to overcome cybersecurity challenges in a growing public cloud landscape and add value. Worth reading through. Thanks, Jay Thoden van Velzen

      thanks & Regards,

      Atanu