SAP Security Research steps up for the French-funded “Investments for the future”
“Investments for the future” is a French government-funded initiative which seeks to modernize the country and make it more competitive, while also making it more attractive to investment and innovation. It focuses on strategic sectors including artificial intelligence (AI) for defense and security. The program works in three phases:
- Industry sponsors are invited to submit a technical challenge to be solved using AI
- An open call is issued for innovative companies to apply and propose a solution to the challenge
- The selected winner receives a grant to develop their approach and show their results
Within this program, we here at the Intelligent Code Analysis team of SAP Security Research proposed the VULN-AI Challenge, dedicated to finding vulnerabilities in source code.
At SAP Security Research, we have been tackling this problem for some time. The relevance for SAP is straightforward, as the landscape has shifted from mostly in-house developed software to a greater reliance on open-source libraries and components. While this practice helps increase productivity by focusing development on the core functions that distinguish SAP products from their competition, it comes with its own set of risks.
A key question arises: how can one securely integrate open-source components? Security-relevant changes (commits) are pushed to the source code repositories of these components on a continuous basis. It is important to identify these changes in a timely manner. Changes that introduce security fixes are critical because they are precursors to releases that typically need to be adopted urgently.
We have tried different approaches to detect commits that fix vulnerabilities, including methods inspired by natural language processing applied to source code tokens or to the syntax trees obtained by parsing (commit2vec).
For the VULN-AI Challenge, we provided a manually curated dataset of code changes (commits) that correspond to vulnerabilities affecting more than 200 industry-relevant Java open-source projects. The dataset is augmented with samples that correspond to code changes unrelated to vulnerabilities, resulting in almost 2000 code change samples. The challenge therefore amounts to correctly separating the security-relevant commits from the non-security-relevant ones.
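To make the task concrete, the sketch below frames it as binary classification over labelled commits. It is only an illustration under stated assumptions: the `CommitSample` fields, the example diffs, and the `SECURITY_HINTS` keyword list are all hypothetical, and the keyword rule is a deliberately naive stand-in, not commit2vec or the method used in the challenge.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class CommitSample:
    """One labelled code change (hypothetical schema, for illustration only)."""
    repository: str
    commit_id: str
    diff_text: str
    is_security_fix: bool


# Hypothetical hint words, chosen only for this example.
SECURITY_HINTS = frozenset(
    {"sanitize", "escape", "validate", "overflow", "xxe", "csrf"}
)


def tokenize(diff_text):
    """Split a diff into lowercase identifier-like tokens."""
    return [token.lower() for token in re.findall(r"[A-Za-z_]+", diff_text)]


def predict_security_fix(sample):
    """Naive rule: flag the commit if any token contains a hint word."""
    tokens = tokenize(sample.diff_text)
    return any(hint in token for token in tokens for hint in SECURITY_HINTS)


# Two made-up samples: one security-relevant, one not.
samples = [
    CommitSample("example/lib", "abc123",
                 "+ String safe = escapeHtml(userInput);", True),
    CommitSample("example/lib", "def456",
                 "+ int retries = 3; // tune default", False),
]

predictions = [predict_security_fix(sample) for sample in samples]
```

A keyword rule like this ignores code structure entirely, which is the kind of limitation that learned representations of code changes aim to address.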
The company selected from the open call to tackle the challenge was Yagaan, a French deep tech start-up established in Brittany, at the heart of the Cyber Ecosystem. The company is an application security software vendor which has developed an innovative approach to application security, with the aim of overcoming the limitations of state-of-the-art static analysis. What sets their approach apart is an extensive analysis of an application's source code, which augments the usual static analysis with knowledge about the semantics of its code sequences and their context. They provide innovative features such as contextual analysis of the code, individual qualification of each vulnerability warning in terms of relevance (true/false positives) and criticality (CVSS scoring), intuitive contextual remediation support, and code querying. Yagaan proposed to use their code mining technology to meet the challenge. This approach was expected to offer a good opportunity to improve on the state-of-the-art performance in classifying open-source commits as vulnerability fixes.
Yagaan built upon our commit2vec method by additionally feeding it the semantic and contextual knowledge extracted from the source code. An intense experimentation phase took place to assess and fine-tune the classification algorithm, for example by identifying which information is relevant to feed to the deep learning model and how different properties of the source code influence the results.
Yagaan obtained great results: using their code mining approach to feed a deep learning network improved classification performance by almost 10% compared to the commit2vec baseline, with an F1 score as high as 82.6!
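For readers less familiar with the metric, the F1 score is the harmonic mean of precision and recall. The sketch below shows how it can be computed for a binary commit classifier. The labels are made up for illustration and are not taken from the challenge dataset; the function name `f1_score` is our own choice for this example.

```python
def f1_score(true_labels, predicted_labels):
    """Compute F1 for the positive class (True = security-relevant commit)."""
    pairs = list(zip(true_labels, predicted_labels))
    true_positives = sum(1 for truth, pred in pairs if truth and pred)
    false_positives = sum(1 for truth, pred in pairs if not truth and pred)
    false_negatives = sum(1 for truth, pred in pairs if truth and not pred)

    # With no true positives, precision and recall are zero or undefined.
    if true_positives == 0:
        return 0.0

    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


# Made-up labels: three security fixes and two unrelated commits.
true_labels = [True, True, True, False, False]
predicted_labels = [True, True, False, True, False]

score = f1_score(true_labels, predicted_labels)
print(f"F1 = {score:.3f}")
```

Because F1 penalizes both missed fixes and false alarms, it is a natural choice when both kinds of error carry a cost for the teams consuming the classifier's output.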
These results confirm that using semantic and contextualised information extracted from the source code is a promising approach to detecting vulnerability fixes in open-source commits. Additionally, it opens new perspectives for supporting application security experts in managing their dependencies, significantly reducing the risk of inadvertently importing security vulnerabilities.
Discover how SAP Security Research serves as a security thought leader at SAP, continuously transforming SAP by improving security.