Fosstars: a framework for defining ratings for open source projects
Open source components can extend an application’s functionality, save time and reduce costs. The main criterion for selecting an open source component is the functionality it provides. However, other criteria such as project activity, popularity and security are also worth checking. I am a security engineer at SAP, where I help development teams mitigate risks when using open source components. In Feb 2020, we open sourced a project that helps to assess open source projects in an automated manner. In this blog post, I talk about Fosstars, an open source Java-based framework for defining ratings that help check security, activity and other properties of open source projects. You will get a better understanding of how Fosstars works and why it can help you choose reliable open source components.
Why do we need ratings for open source projects?
When we start using an open source component in an application, we rely on the project’s team and community, who deliver new features, fix issues, write documentation and support users. There are many projects that regularly fix bugs, maintain good documentation and take great care of security. Unfortunately, it is no secret that other projects have a growing backlog, do not maintain documentation and do not take security seriously. Relying on such projects may be dangerous for several reasons: important bugs may never get fixed, new functionality is unlikely to be added, and security issues may remain unpatched.
Detecting risky open source projects can be tricky. Sometimes it is relatively easy to figure out whether an open source project is abandoned. For example, one can check how many commits and pull requests have been made in the last several months, or how often a new version is shipped. An obvious sign that a project is inactive is that there have been no commits and no new releases for several years. But what if there have been some commits and a new version recently? Is the project reliable, or is this a bad sign? Sometimes it is not easy to judge. Security in particular is extremely difficult to assess. How can one check whether an open source project takes good care of security risks? Which information is useful for such an assessment? A security engineer can confidently answer these questions. However, the analysis may take a long time for those who don’t have a strong background in software security. Even then, they may not feel confident and would appreciate the advice of an expert. This is exactly where Fosstars comes into play.
Defining a rating for open source projects
Fosstars offers a Java-based framework that helps to assess open source projects by defining various ratings. A rating describes a particular property of a project, for example, security. A domain expert determines what kind of data should be used for assessing an open source project, and how this data is used for calculating a score for the project. In other words, the expert defines data features and an algorithm that turns the data features into a score. Users can then apply the defined rating to assess open source projects that they use, or want to use, in their applications.
What the Fosstars framework looks like
Let me explain the main terms that Fosstars uses for defining ratings: data feature, scoring function and rating procedure.
A data feature (or just a feature) is a measurable characteristic of an open source project. A feature has a type and may have constraints. Here are several examples:
- The number of commits in the last month. It is a non-negative integer.
- The CVSS score of a vulnerability. It is a floating-point number from 0 to 10.
- Whether a project has a security policy. It is a Boolean: yes or no.
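The three examples above can be sketched in Java as typed features with constraints. This is only an illustration of the idea, not the actual Fosstars API: the names FeatureSketch, Feature, Value and the three constants are hypothetical and chosen for this post.

```java
import java.util.function.Predicate;

// Hypothetical sketch of the "data feature" concept; not the actual Fosstars API.
final class FeatureSketch {

    /** A measurable characteristic with a value type T and a constraint on its values. */
    static final class Feature<T> {
        final String name;
        final Predicate<T> constraint;

        Feature(String name, Predicate<T> constraint) {
            this.name = name;
            this.constraint = constraint;
        }

        /** Wraps a raw value, rejecting it if it is null or violates the constraint. */
        Value<T> value(T content) {
            if (content == null || !constraint.test(content)) {
                throw new IllegalArgumentException(
                        "Invalid value for '" + name + "': " + content);
            }
            return new Value<>(this, content);
        }
    }

    /** A concrete value of a feature for one project. */
    static final class Value<T> {
        final Feature<T> feature;
        final T content;

        Value(Feature<T> feature, T content) {
            this.feature = feature;
            this.content = content;
        }
    }

    // A non-negative integer.
    static final Feature<Integer> COMMITS_LAST_MONTH =
            new Feature<>("Number of commits in the last month", n -> n >= 0);

    // A floating-point number from 0 to 10.
    static final Feature<Double> CVSS_SCORE =
            new Feature<>("CVSS score of a vulnerability", s -> s >= 0.0 && s <= 10.0);

    // A Boolean; any non-null value is allowed.
    static final Feature<Boolean> HAS_SECURITY_POLICY =
            new Feature<>("Has a security policy", b -> true);

    public static void main(String[] args) {
        System.out.println(COMMITS_LAST_MONTH.value(42).content);
        System.out.println(CVSS_SCORE.value(9.8).content);
        System.out.println(HAS_SECURITY_POLICY.value(true).content);
    }
}
```

Attaching the constraint to the feature means that an invalid value, such as a CVSS score of 11, is rejected before it ever reaches a scoring function.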
A scoring function is a procedure that takes several data features and produces a score from 0 to 10. The score describes a particular property of an open source project. The higher the score is, the better the property is implemented in the project. A scoring function may also take other scores as input.
Here are several examples of scores:
- The security testing scoring function describes how well security testing is implemented in an open source project. This scoring function may be based on data features that tell which security tools are used in the project.
- The project activity scoring function describes how active an open source project is. This scoring function may be based on statistics from a code repository such as the number of commits and contributors in the last month.
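To make the idea more concrete, here is a minimal sketch of the two scoring functions above, plus a third one that takes other scores as input. Everything in it is an assumption made for illustration: the class and method names, the thresholds, the point values and the use of a weighted average are not taken from Fosstars.

```java
// Hypothetical sketch of scoring functions; names, thresholds and weights are made up.
final class ScoringSketch {

    /** Keeps a score within the range from 0 to 10. */
    static double clamp(double value) {
        return Math.max(0.0, Math.min(10.0, value));
    }

    /** Project activity: up to 6 points for commits, up to 4 points for contributors. */
    static double projectActivityScore(int commitsLastMonth, int contributorsLastMonth) {
        if (commitsLastMonth < 0 || contributorsLastMonth < 0) {
            throw new IllegalArgumentException("Counts must be non-negative");
        }
        double commitPoints = 6.0 * Math.min(commitsLastMonth, 50) / 50.0;
        double contributorPoints = 4.0 * Math.min(contributorsLastMonth, 10) / 10.0;
        return clamp(commitPoints + contributorPoints);
    }

    /** Security testing: points for each kind of security tool used in the project. */
    static double securityTestingScore(boolean usesStaticAnalysis,
                                       boolean usesFuzzing,
                                       boolean scansDependencies) {
        double points = 0.0;
        if (usesStaticAnalysis) points += 4.0;
        if (usesFuzzing) points += 3.0;
        if (scansDependencies) points += 3.0;
        return clamp(points);
    }

    /** A scoring function that takes other scores as input: their weighted average. */
    static double weightedAverage(double[] scores, double[] weights) {
        if (scores.length != weights.length || scores.length == 0) {
            throw new IllegalArgumentException("Need one weight per score");
        }
        double weightedSum = 0.0;
        double totalWeight = 0.0;
        for (int i = 0; i < scores.length; i++) {
            weightedSum += scores[i] * weights[i];
            totalWeight += weights[i];
        }
        if (totalWeight <= 0.0) {
            throw new IllegalArgumentException("Total weight must be positive");
        }
        return clamp(weightedSum / totalWeight);
    }

    public static void main(String[] args) {
        double activity = projectActivityScore(25, 5);
        double testing = securityTestingScore(true, false, true);
        // In this example, security testing counts three times as much as activity.
        double overall = weightedAverage(
                new double[] {activity, testing}, new double[] {1.0, 3.0});
        System.out.println(activity + " " + testing + " " + overall);
    }
}
```

Composing scores this way keeps each sub-score small and easy to review, while the weights express how much each property matters for the overall result.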
Finally, a rating procedure is a combination of a scoring function, a set of labels and a function that maps a score to one of the labels. As an input, a rating procedure takes data features that are necessary to calculate a score. Next, it passes the data to a scoring function to produce this score. Finally, it passes the score to the label function to convert it to a label. In other words, a rating procedure interprets a score by translating it to a human-readable label. A score combined with a label is called a rating.
For example, a security rating procedure for an open source project may be based on a security scoring function that assesses how well the project takes care of security. The rating procedure may then label the project as “good” if the score is greater than 7, and as “bad” otherwise.
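The “good” and “bad” example can be sketched as follows. This is a simplified illustration under stated assumptions, not the Fosstars implementation: RatingSketch, RatingProcedure, SecurityData and the toy securityScore formula are all hypothetical. Only the threshold of 7 comes from the example above.

```java
import java.util.function.DoubleFunction;
import java.util.function.ToDoubleFunction;

// Hypothetical sketch of a rating procedure; not the actual Fosstars API.
final class RatingSketch {

    enum Label { GOOD, BAD }

    /** A rating is a score combined with a label. */
    static final class Rating {
        final double score;
        final Label label;

        Rating(double score, Label label) {
            this.score = score;
            this.label = label;
        }
    }

    /** Combines a scoring function with a function that maps a score to a label. */
    static final class RatingProcedure<D> {
        private final ToDoubleFunction<D> scoringFunction;
        private final DoubleFunction<Label> labelFunction;

        RatingProcedure(ToDoubleFunction<D> scoringFunction,
                        DoubleFunction<Label> labelFunction) {
            this.scoringFunction = scoringFunction;
            this.labelFunction = labelFunction;
        }

        Rating rate(D data) {
            double score = scoringFunction.applyAsDouble(data);
            return new Rating(score, labelFunction.apply(score));
        }
    }

    /** Made-up input data for the toy security score below. */
    static final class SecurityData {
        final boolean hasSecurityPolicy;
        final int unpatchedVulnerabilities;

        SecurityData(boolean hasSecurityPolicy, int unpatchedVulnerabilities) {
            this.hasSecurityPolicy = hasSecurityPolicy;
            this.unpatchedVulnerabilities = unpatchedVulnerabilities;
        }
    }

    /** Toy scoring function: start at 10 and subtract penalties (assumed values). */
    static double securityScore(SecurityData data) {
        double score = 10.0 - 2.0 * data.unpatchedVulnerabilities;
        if (!data.hasSecurityPolicy) {
            score -= 2.0;
        }
        return Math.max(0.0, Math.min(10.0, score));
    }

    // "good" if the score is greater than 7, "bad" otherwise.
    static final RatingProcedure<SecurityData> SECURITY_RATING =
            new RatingProcedure<SecurityData>(
                    RatingSketch::securityScore,
                    score -> score > 7.0 ? Label.GOOD : Label.BAD);

    public static void main(String[] args) {
        Rating rating = SECURITY_RATING.rate(new SecurityData(false, 1));
        System.out.println(rating.score + " " + rating.label);
    }
}
```

Keeping the label function separate from the scoring function means that the thresholds can be adjusted without touching the score calculation.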
Quality assurance for a rating
Once a rating procedure is defined, we have to make sure that it produces meaningful results. A common way to ensure this is to create a test suite.
The Fosstars framework allows you to create test suites for both scoring functions and rating procedures. A test suite consists of several test vectors. A test vector includes three elements:
- A data set about an open source project that can be passed to the rating procedure.
- An expected range of a score that is calculated for the data set.
- An optional expected label that is calculated for the data set.
There are two main ways of defining test vectors:
- The first option is to create a test vector based on a real open source project. This way, we collect data about an existing open source project and set an expected score range as well as an expected label for the project.
- The second option is to create a test vector based on a hypothetical open source project. This way, we try to come up with a hypothetical open source project by manually defining a data set for it. Then, we set an expected score range and an expected label for this hypothetical project.
While building a test suite, the two strategies above can also be combined.
A test suite defines quality requirements for a rating procedure, making sure that it produces expected ratings for data sets that correspond to specific open source projects. A basic test suite should contain at least two test vectors to make sure that the rating procedure returns low scores for “bad” projects and high scores for “good” projects. One test vector addresses an open source project with a very low score, another a project with a high score. The more test vectors are defined, the better.
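A basic test suite with two test vectors could look like the sketch below. It only illustrates the concept and does not use the Fosstars test API: TestSuiteSketch, TestVector, failures and the toy activityRating procedure are hypothetical, and the data set is reduced to a single number of commits to keep the example short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch of test vectors and a test suite; not the actual Fosstars API.
final class TestSuiteSketch {

    static final class Rating {
        final double score;
        final String label;

        Rating(double score, String label) {
            this.score = score;
            this.label = label;
        }
    }

    /** A data set, an expected score range and an optional expected label. */
    static final class TestVector<D> {
        final String alias;
        final D data;
        final double minScore;
        final double maxScore;
        final Optional<String> expectedLabel;

        TestVector(String alias, D data, double minScore, double maxScore,
                   Optional<String> expectedLabel) {
            this.alias = alias;
            this.data = data;
            this.minScore = minScore;
            this.maxScore = maxScore;
            this.expectedLabel = expectedLabel;
        }

        boolean passes(Rating rating) {
            boolean scoreOk = rating.score >= minScore && rating.score <= maxScore;
            // The label is checked only if the test vector defines one.
            boolean labelOk = expectedLabel.map(l -> l.equals(rating.label)).orElse(true);
            return scoreOk && labelOk;
        }
    }

    /** Runs all test vectors and returns the aliases of those that failed. */
    static <D> List<String> failures(Function<D, Rating> procedure,
                                     List<TestVector<D>> vectors) {
        List<String> failed = new ArrayList<>();
        for (TestVector<D> vector : vectors) {
            if (!vector.passes(procedure.apply(vector.data))) {
                failed.add(vector.alias);
            }
        }
        return failed;
    }

    /** Toy rating procedure; the data set is just the number of commits last month. */
    static Rating activityRating(Integer commitsLastMonth) {
        double score = Math.min(commitsLastMonth, 50) / 5.0;
        return new Rating(score, score > 7.0 ? "good" : "bad");
    }

    // Two hypothetical projects: one "bad" and one "good".
    static final List<TestVector<Integer>> SUITE = Arrays.asList(
            new TestVector<Integer>("abandoned project", 0, 0.0, 2.0, Optional.of("bad")),
            new TestVector<Integer>("very active project", 60, 8.0, 10.0, Optional.empty()));

    public static void main(String[] args) {
        List<String> failed = failures(TestSuiteSketch::activityRating, SUITE);
        System.out.println(failed.isEmpty() ? "All test vectors passed" : "Failed: " + failed);
    }
}
```

Expecting a score range rather than an exact value leaves room to tune the scoring function without rewriting every test vector.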
Currently, Fosstars provides a comprehensive security rating that helps identify projects that can be a security risk for an application. You can already try it out. Fosstars offers a command-line tool to calculate a security rating for any project on GitHub. Or check out a report for some popular open source projects. Besides the security rating, we are planning to add other ratings. We are also working on a GitHub action and a Maven plugin for integrating Fosstars ratings into CI/CD. Feel free to reach out to us if you have any questions. Feedback is also much appreciated!