Code-level vulnerability analysis with Eclipse Steady
Importing open-source libraries in your project allows you to focus on the novel parts of your work while relying on free, community-developed components for everything else. No piece of software, however, is free from defects, and even the most mature open-source components are no exception. When incorporating a component, you not only reap the benefits of a ready-to-use implementation, you also take on the risks posed by its vulnerabilities, which could easily become your project’s vulnerabilities. In this context, a Software Composition Analysis (SCA) tool can help manage the risk by discovering all direct and indirect component dependencies.
In the past few years, a number of such tools have emerged, and Eclipse Steady (https://github.com/eclipse/steady) stands out among them. Developed by SAP Security Research, Eclipse Steady provides extremely accurate detection of vulnerable dependencies, state-of-the-art reachability analysis through a novel combination of static and dynamic analysis, and smart support for dependency updates. Not only is Steady at the forefront of what SCA tools can do, it is also free and open source itself. It is so accurate because it uses knowledge of how each vulnerability was fixed at the source code level. This information is found in the source code repositories of the affected open-source components and is extracted by mining the so-called fix-commits, that is, the commits that introduce the fix.
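To make the idea of code-level detection concrete, here is a minimal sketch of how knowledge of a fix-commit could be used to judge a library release. It is a deliberate simplification and not Eclipse Steady's actual algorithm: the function names (digest, classify), the dictionaries of function bodies, and the toy code strings are all assumptions made for illustration.

```python
import hashlib
from typing import Dict


def digest(source: str) -> str:
    """Hash a function body, ignoring indentation and blank lines."""
    lines = [line.strip() for line in source.splitlines() if line.strip()]
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


def classify(
    library_functions: Dict[str, str],
    before_fix: Dict[str, str],
    after_fix: Dict[str, str],
) -> str:
    """Compare the functions touched by a fix-commit with those shipped in a library.

    All three arguments map a function name to its source code (hypothetical inputs).
    """
    verdicts = set()
    for name, vulnerable_body in before_fix.items():
        shipped = library_functions.get(name)
        if shipped is None:
            continue  # the library does not contain this function at all
        fixed_body = after_fix.get(name)
        if fixed_body is not None and digest(shipped) == digest(fixed_body):
            verdicts.add("fixed")
        elif digest(shipped) == digest(vulnerable_body):
            verdicts.add("vulnerable")
        else:
            verdicts.add("unknown")
    if "vulnerable" in verdicts:
        return "vulnerable"
    if verdicts == {"fixed"}:
        return "fixed"
    return "unknown"


# Toy data standing in for what a mined fix-commit would provide.
before_fix = {"parse": "def parse(x):\n    return eval(x)\n"}
after_fix = {"parse": "def parse(x):\n    return int(x)\n"}

old_release = {"parse": "def parse(x):\n        return eval(x)\n"}
new_release = {"parse": "def parse(x):\n    return int(x)\n"}

print(classify(old_release, before_fix, after_fix))
print(classify(new_release, before_fix, after_fix))
```

The point of the sketch is that the verdict depends on the code actually shipped in the library rather than on its declared name or version number, which is what knowledge of the fix-commit makes possible.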
While operating Eclipse Steady at SAP, serving thousands of developers executing over 250k scans per month, we spent a substantial amount of time mining source code repositories and curating a knowledge base of fix-commits. Such information is the fuel of Eclipse Steady, and it needs to be continuously harvested, through considerable effort and expert knowledge.
Given the increasing size of open-source ecosystems and the pace at which new vulnerabilities are discovered, the current human-intensive approach is not adequate and cannot scale.
The growing interest in SCA tools has led to the emergence of a new market with several commercial offerings, each of which has its own proprietary vulnerability knowledge base.
The need for fine-grained (code-level) vulnerability data
The data about the vulnerabilities that affect open-source software (OSS) are often scattered across different sources and therefore difficult to obtain: public vulnerability databases such as the NVD, project-specific issue trackers, websites publishing security advisories, and proprietary databases offered by vendors of SCA tools, for example Snyk.io (https://snyk.io/vuln), SourceClear (https://www.veracode.com/products/software-composition-analysis), WhiteSource (https://www.whitesourcesoftware.com/vulnerability-database/) and others.
This situation is unfortunate for two reasons. Firstly, the difficulty in obtaining vulnerability data hinders the development of new tools that could push the state of the art in vulnerability detection and mitigation. Secondly, the software industry is spending (wasting?) considerable effort replicating the same activities in different organizations, each of which searches for data and keeps it in its own silo, thus experiencing the same scalability and coverage issues (multiplied), but never addressing the problem at the root.
The paradox of proprietary databases of vulnerabilities affecting free open source software
The root of the problem is that the link between a vulnerability, as described in an advisory (for example in the National Vulnerability Database, NVD), and its fix in the source code repository of the affected project is not considered worth preserving and is thus lost. With a lot of effort, it is then recovered by SCA vendors to become part of the many (proprietary) vulnerability databases that have emerged in the past few years.
The fact that data about open-source software are not open themselves is somewhat paradoxical. This paradox has the effect of forcing the industry to allocate resources to the wrong task (that is, mining data that should be readily available in the first place) while they would be better spent investigating innovative, more efficient ways to use those data to deliver more secure software.
We are convinced that vulnerability knowledge bases about open-source software should be open source themselves and adopt the same community-oriented contribution and management model that governs the rest of the open-source ecosystem.
Project “KB”: the open-source way
To realize this vision, SAP Security Research released project “KB” (https://github.com/sap/project-kb), an initiative to promote a different way of collecting and publishing vulnerability data, based on a collaborative and distributed approach. Using a simple, machine- and human-readable format to represent vulnerability data, project “KB” captures essential information such as which commits in which repository fixed a given vulnerability.
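As an illustration of the kind of information such a statement carries, the following sketch models a statement as a vulnerability identifier plus the fix-commits in each affected repository. The class and field names (Statement, Fix, vulnerability_id, fixes) and the JSON serialization are assumptions made for this example and do not reproduce the actual project “KB” schema; the identifier, repository URL, and commit id are placeholders.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Fix:
    repository: str      # URL of the source code repository (placeholder below)
    commits: List[str]   # identifiers of the commits that fix the vulnerability


@dataclass
class Statement:
    vulnerability_id: str
    fixes: List[Fix] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize to text that is readable by both humans and machines."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "Statement":
        data = json.loads(text)
        return cls(
            vulnerability_id=data["vulnerability_id"],
            fixes=[Fix(**fix) for fix in data.get("fixes", [])],
        )


# Placeholder values only; this is not real vulnerability data.
example = Statement(
    vulnerability_id="CVE-0000-0000",
    fixes=[Fix(repository="https://example.org/some/project.git", commits=["0123abc"])],
)
print(example.to_json())
```

Keeping the record this small is what makes it practical for independent parties to write, review, and exchange statements.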
This format is accompanied by a tool (kaybee) to create, publish and consume data from distinct independent sources. The tool allows users to aggregate data from different sources using customizable policies. For example, one can assign a higher priority to a particular source, or only accept sources that offer digitally-signed statements.
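The two example policies can be sketched in a few lines. This is not the kaybee implementation or its configuration format: the SourcedStatement type, the aggregate function, the source names, and the convention that a lower number means a higher priority are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class SourcedStatement:
    vulnerability_id: str
    fix_commits: Tuple[str, ...]
    source: str    # name of the source that published the statement
    signed: bool   # whether the statement carries a valid digital signature


def aggregate(
    statements: Iterable[SourcedStatement],
    priority: Dict[str, int],
    require_signed: bool = False,
) -> Dict[str, SourcedStatement]:
    """Keep one statement per vulnerability, preferring higher-priority sources.

    A lower number in `priority` means a higher priority; statements from
    sources that are not listed in `priority` are ignored.
    """
    selected: Dict[str, SourcedStatement] = {}
    for statement in statements:
        if require_signed and not statement.signed:
            continue
        if statement.source not in priority:
            continue
        current = selected.get(statement.vulnerability_id)
        if current is None or priority[statement.source] < priority[current.source]:
            selected[statement.vulnerability_id] = statement
    return selected


# Placeholder statements from two hypothetical sources.
statements = [
    SourcedStatement("VULN-1", ("aaa111",), "community", signed=False),
    SourcedStatement("VULN-1", ("bbb222",), "vendor", signed=True),
    SourcedStatement("VULN-2", ("ccc333",), "community", signed=False),
]
priority = {"vendor": 0, "community": 1}

merged = aggregate(statements, priority)
strict = aggregate(statements, priority, require_signed=True)
print(sorted(merged), sorted(strict))
```

With the default policy, the conflicting statements for VULN-1 are resolved in favor of the higher-priority source; with require_signed=True, unsigned statements are dropped before priorities are considered.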
Together with the tool and the statement format, SAP Security Research released data for 700+ vulnerabilities (as of March 2020) and more will be published in the coming months.
The project is actively seeking contributions of different kinds: vulnerability data (in the form of statements), code implementing new features, bug reports and/or fixes, user feedback, and feature requests.
Project “KB” was presented at EclipseCon 2020, in a talk titled “Vulnerability data about open-source software should be open too!” by Antonino Sabetta, Henrik Plate, and Serena E. Ponta. A recording of the talk and of the Q&A session is now available on YouTube.