Perform more accurate vulnerability reviews by code-centric analysis
This blog post is part of a series outlining the processes and tools that we at SAP apply for a secure handling of our open-source supply chain. If you haven’t read the introductory blog post, you may want to read it first here. Find below the references to the related blog posts:
- Introduction to SAP tools supporting the secure development process with Open Source
- Support the selection of Open Source by security ratings
- Performing security scans early in development
- Perform more accurate vulnerability reviews by code-centric analysis (the current article)
If you have followed the previous blog posts, you have understood that it is important to carefully review and select open source components prior to including them in your software development project. CII badges, Fosstars and comparable initiatives and tools support this selection: They show if open source projects follow security best-practices, meet your quality criteria and whether they are alive and kicking.
Still, no matter how carefully you choose, your dependencies will have bugs and vulnerabilities that potentially affect your application and thus its users. This blog post will explain how to detect, assess and mitigate such vulnerabilities in your dependencies.
In this regard, it is important to understand one difference between software development and car manufacturing: Whatever is built into a car can potentially be used throughout the car’s lifetime. Take the spare wheel, seatbelts or air conditioning as examples: every part of the car is needed in one use case or another. The majority of today’s software applications, however, comprise open source code that can never be used, no matter the use case and program input.
This phenomenon is a variation of a problem called “software bloat”, and results from the fact that developers typically only use a fraction of the functionality of upstream open source components. Code related to unused functionality, along with its upstream dependencies, is nevertheless pulled into your project, even though it can be considered dead code in the context of that specific project.
Coming back to vulnerability assessments, it follows that vulnerabilities in unused, bloated code do not need the same attention as vulnerable code that is executed on a regular basis.
At this point, you could argue that it is cheaper to simply update vulnerable components than to travel on the long and bumpy road of issue prioritization and review. Indeed, that is often true: You just bump a version identifier in order to pull a non-vulnerable release of a given open source component.
However, it quickly becomes trickier if we talk about transitive dependencies, versions that are not supported anymore, components that do not follow semantic versioning, or <insert other developer nightmares here>. Not to mention customers of on-premise software, who must download, test and install every single update you produce.
Other phenomena in the world of open source consumption are re-bundling and re-packing, where code from one open source project is included in artifacts of other projects, e.g., to create self-contained executables or to avoid name-clashes. This common practice makes it difficult for tools relying on artifact metadata, e.g., digests, file names or package identifiers, to understand the origins of code and thus to identify components with known vulnerabilities.
To cater for the above-described phenomena, we advocate the use of code-centric analyses to identify, assess and mitigate the use of open source code with known vulnerabilities. In this context, SAP initiated two projects, Eclipse Steady and Project KB. They go hand in hand, and both have themselves been released as open source.
Project KB – Open and collaborative exchange of code-level information about open source vulnerabilities
Code-centric analysis requires knowing which code fragments are vulnerable and what the vulnerable and fixed code look like. So far, this information is not covered by public vulnerability databases like the CVE/NVD. A few smaller open source projects like the Vulncode-DB try to close this gap, but they are by no means as comprehensive as CVE/NVD or proprietary databases maintained by commercial players.
We initiated Project KB to collect and share such code-level information in an open and collaborative fashion in developer and researcher communities. To this end, Project KB comes with a YAML format to capture such information (see Figure 1 for an example), a tool to create, download and merge it, and an actual dataset.
The YAML format references so-called “fix commits”, which are the commits of open source projects that fix given vulnerabilities. It also references package identifiers that follow the PURL (package URL) specification, in order to denote versions that are known to be affected by a vulnerability or free of it (non-affected).
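To make the two kinds of references more tangible, here is a minimal Python sketch. It is not Project KB's actual schema or tooling: the dictionary keys, the CVE identifier, the repository URL, the commit id and the `com.example` coordinates are all hypothetical. The parser covers only the common `pkg:type/namespace/name@version` shape of a package URL and deliberately ignores qualifiers, subpaths and percent-encoding.

```python
from typing import NamedTuple, Optional


class Purl(NamedTuple):
    pkg_type: str
    namespace: Optional[str]
    name: str
    version: Optional[str]


def parse_purl(purl: str) -> Purl:
    """Simplified parser: drops qualifiers ('?...') and subpath ('#...')."""
    if not purl.startswith("pkg:"):
        raise ValueError(f"not a package URL: {purl!r}")
    rest = purl[len("pkg:"):].split("#", 1)[0].split("?", 1)[0]
    path, _, version = rest.partition("@")
    parts = path.strip("/").split("/")
    if len(parts) < 2 or not all(parts):
        raise ValueError(f"package URL needs at least a type and a name: {purl!r}")
    namespace = "/".join(parts[1:-1]) or None
    return Purl(parts[0].lower(), namespace, parts[-1], version or None)


# Hypothetical statement: every value below is made up for illustration.
statement = {
    "vulnerability_id": "CVE-0000-0000",
    "fix_commits": [("https://example.org/acme/xml-lib.git", "0123abc")],
    "affected": ["pkg:maven/com.example/xml-lib@1.0.0"],
    "unaffected": ["pkg:maven/com.example/xml-lib@1.0.1"],
}


def is_known_affected(stmt: dict, dependency_purl: str) -> bool:
    """True if the dependency is explicitly listed as affected."""
    dependency = parse_purl(dependency_purl)
    return any(parse_purl(p) == dependency for p in stmt["affected"])
```

A lookup like `is_known_affected(statement, "pkg:maven/com.example/xml-lib@1.0.0")` only answers the metadata question. The fix commits are what enable the code-level checks, since they point to the exact changes that separate vulnerable from fixed code.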
The tool can be used to create, export, import and merge such YAML files. Merging becomes necessary as we opted for a distributed model, where information consumers can pull vulnerability information from different sources, e.g., individual open source projects disclosing vulnerability information themselves in their Git repositories, open source foundations covering the disclosures for all their projects, or third parties. The distributed character makes it possible to scale the maintenance and management, and in addition allows private data repositories with complementary information, e.g., specific remediation recommendations, and additional vulnerabilities for non-public components.
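The following Python sketch illustrates what merging statements from several sources could look like. It is only an illustration of the idea, not Project KB's merge logic: the `Statement` class, its fields, the simple set-union strategy and all identifiers in the example data are assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set, Tuple


@dataclass
class Statement:
    """Hypothetical in-memory form of one vulnerability statement."""
    vulnerability_id: str
    fix_commits: Set[Tuple[str, str]] = field(default_factory=set)  # (repository URL, commit id)
    affected: Set[str] = field(default_factory=set)                 # package URLs
    unaffected: Set[str] = field(default_factory=set)               # package URLs


def merge_statements(sources: Iterable[List[Statement]]) -> Dict[str, Statement]:
    """Union all statements that share a vulnerability id (assumed strategy)."""
    merged: Dict[str, Statement] = {}
    for statements in sources:
        for s in statements:
            target = merged.setdefault(s.vulnerability_id, Statement(s.vulnerability_id))
            target.fix_commits |= s.fix_commits
            target.affected |= s.affected
            target.unaffected |= s.unaffected
    return merged


def conflicts(statement: Statement) -> Set[str]:
    """Package URLs that different sources classify in contradictory ways."""
    return statement.affected & statement.unaffected


# Two hypothetical sources: the project itself and a third party.
project_source = [Statement(
    "CVE-0000-0001",
    fix_commits={("https://example.org/acme/lib.git", "abc123")},
    affected={"pkg:maven/com.example/lib@1.0"},
)]
third_party_source = [Statement(
    "CVE-0000-0001",
    affected={"pkg:maven/com.example/lib@1.1"},
    unaffected={"pkg:maven/com.example/lib@1.0"},
)]

merged = merge_statements([project_source, third_party_source])
```

A plain union keeps every source's contribution but cannot decide who is right when sources disagree. In this example, `conflicts(merged["CVE-0000-0001"])` flags version 1.0, which a real setup would have to resolve, for instance by ranking sources.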
Finally, we also published 700+ vulnerabilities for Java components that we have manually curated over the course of several years, when running Eclipse Steady productively at SAP.
Eclipse Steady – Determine the reachability of vulnerable code
As mentioned above, not all code of all open source dependencies is used or usable in a specific software application. Eclipse Steady offers a unique combination of static and dynamic analysis techniques to determine the reachability of vulnerable code.
Starting from the fix commits available in Project KB’s database, Eclipse Steady determines the signatures of vulnerable methods as well as the abstract syntax tree of the fixed and vulnerable method bodies.
This information is used to improve the detection of vulnerable code, no matter if it is contained in the original project artifacts, or whether it has been re-bundled. In a second step, Eclipse Steady builds the call graph of the application in order to check whether the application methods can be executed so that the control flow enters vulnerable methods. Finally, dynamic analysis is used to also collect actual execution information from the unit or integration tests, in order to enrich the data collected during the static analysis.
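The reachability check can be pictured as a graph search. The Python sketch below is not Eclipse Steady's implementation: it assumes the call graph is already available as a plain dictionary, uses a breadth-first search to find one call path from application entry points to a vulnerable method, and intersects the vulnerable methods with a set of methods observed during tests. All `com.example` method names and the trace data are hypothetical; only the constructor `org.dom4j.Namespace(String,String)` is taken from the example discussed in this post.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set


def find_call_path(call_graph: Dict[str, List[str]],
                   entry_points: Iterable[str],
                   vulnerable: Set[str]) -> Optional[List[str]]:
    """Breadth-first search; returns one shortest call path or None."""
    parent: Dict[str, Optional[str]] = {}
    queue = deque()
    for entry in entry_points:
        if entry not in parent:
            parent[entry] = None
            queue.append(entry)
    while queue:
        method = queue.popleft()
        if method in vulnerable:
            # Walk back through the parents to reconstruct the path.
            path = []
            current: Optional[str] = method
            while current is not None:
                path.append(current)
                current = parent[current]
            return path[::-1]
        for callee in call_graph.get(method, ()):
            if callee not in parent:
                parent[callee] = method
                queue.append(callee)
    return None


VULNERABLE = {"org.dom4j.Namespace(String,String)"}

# Hypothetical call graph of an application (caller -> callees).
call_graph = {
    "com.example.App.main(String[])": ["com.example.Report.load(String)",
                                       "com.example.App.usage()"],
    "com.example.Report.load(String)": ["org.dom4j.Namespace(String,String)"],
    "com.example.Legacy.unused()": ["org.dom4j.Namespace(String,String)"],
}

static_path = find_call_path(call_graph, ["com.example.App.main(String[])"], VULNERABLE)

# Hypothetical set of methods observed while running the tests.
traced = {"com.example.App.main(String[])", "com.example.Report.load(String)",
          "org.dom4j.Namespace(String,String)"}
dynamically_reached = VULNERABLE & traced
```

The two results complement each other: the static search can report paths that no test happens to exercise, while the traces confirm actual executions, including calls that a static call graph may miss, for example those made via reflection.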
The following screenshots are taken from Steady’s user interface and illustrate the drill-down using the example of vulnerability CVE-2018-1000632 in constructor org.dom4j.Namespace(String,String) in dom4j 1.6.1. Figure 2 shows that the application depends on components with vulnerable code, whereby the red paws indicate that this code is reachable both statically and dynamically.
Figure 3 shows the reachability information on the level of individual methods, and Figure 4 shows the actual call path from application methods (in green) to the vulnerable constructor org.dom4j.Namespace (in red).
Once vulnerabilities are detected, mitigation steps need to be taken. The most common and preferable approach is to update vulnerable dependencies to a fixed version. Though it sounds as easy as changing a version number, it comes at a cost: newer non-vulnerable versions may include breaking changes that require considerable migration effort. Eclipse Steady supports developers in migrating application dependencies to fixed versions by offering update metrics that help estimate the effort. For example, it reports the number of calls from application to library code that need to be modified, and the number of (reachable) methods that exist in identical form in both versions, which helps estimate the likelihood of regressions.
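As a rough illustration of these two metrics, the Python sketch below computes them for a toy library. It is an assumption-laden simplification, not how Eclipse Steady computes its metrics: each library version is modeled as a mapping from method signature to a fingerprint of the method body, a call is counted as "to be modified" only when its target signature no longer exists in the new version, and all names and fingerprints are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def update_metrics(app_calls: Iterable[Tuple[str, str]],
                   reachable: Set[str],
                   old_version: Dict[str, str],
                   new_version: Dict[str, str]) -> Dict[str, int]:
    """Estimate update effort and regression risk between two library versions."""
    # Calls whose target exists in the old version but not in the new one.
    calls_to_modify = sum(
        1 for _caller, callee in app_calls
        if callee in old_version and callee not in new_version
    )
    # Reachable library methods with the same body fingerprint in both versions.
    reachable_in_old = [m for m in reachable if m in old_version]
    identical = sum(1 for m in reachable_in_old
                    if new_version.get(m) == old_version[m])
    return {
        "calls_to_modify": calls_to_modify,
        "reachable_identical": identical,
        "reachable_total": len(reachable_in_old),
    }


# Hypothetical data: signature -> body fingerprint.
old_version = {"lib.Parser.parse(String)": "h1",
               "lib.Parser.reset()": "h2",
               "lib.Util.escape(String)": "h3"}
new_version = {"lib.Parser.parse(String)": "h1",
               "lib.Util.escape(String)": "h3-fixed",
               "lib.Parser.clear()": "h4"}
app_calls = [("app.Main.run()", "lib.Parser.parse(String)"),
             ("app.Main.run()", "lib.Parser.reset()"),
             ("app.Job.exec()", "lib.Parser.reset()")]
reachable = {"lib.Parser.parse(String)", "lib.Parser.reset()", "lib.Util.escape(String)"}

metrics = update_metrics(app_calls, reachable, old_version, new_version)
```

In this toy case, two application calls target a removed method and one of three reachable methods is unchanged. A high share of identical reachable methods suggests a low regression risk, whereas many calls to modify point to a larger migration effort.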
No doubt, the management of vulnerabilities in open source is challenging, considering the wide-spread use of open source, the frequency of disclosures, shortened response windows and all those nitty-gritty technicalities.
With Eclipse Steady and Project KB, SAP promotes open source solutions to address this industry-wide problem. Their code-centric approach has the potential to both increase the detection accuracy and to support advanced assessment and mitigation features.
Join us in our efforts to make the consumption of open source more secure – as users and/or contributors.