An Empirical Analysis of Input Validation Mechanisms in Web Applications and Languages
Security researchers from the SAP Security & Trust Research Practice attended the 27th Symposium On Applied Computing, held in Riva del Garda, Italy. There we presented several scientific publications, including our study on input validation mechanisms in web applications and languages, which was accepted at the symposium's security track. The study asks how well input validation, in isolation, mitigates common web vulnerabilities such as cross-site scripting and SQL injection.
Input Validation vs. Output Sanitization
One class of defense techniques against vulnerabilities in web applications is known as input validation. Input validation essentially involves checking whether user input to an application respects a specification of legitimate values. Examples include checking that the value of a parameter to a web application is indeed an integer, or that a URL supplied to a web application conforms to the URL specification.
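The two checks mentioned above can be sketched in a few lines of Python. This is only an illustration: the function names `is_integer` and `is_http_url` are made up for this post, and the URL check is a rough approximation (scheme and host only), not a full conformance test against the URL specification.

```python
from urllib.parse import urlsplit


def is_integer(value: str) -> bool:
    """Return True if value is an optionally signed decimal integer."""
    digits = value[1:] if value[:1] in ("+", "-") else value
    # isascii() excludes non-ASCII digit characters that isdigit() alone accepts.
    return digits.isascii() and digits.isdigit()


def is_http_url(value: str) -> bool:
    """Return True if value parses as an absolute http(s) URL with a host.

    Assumption: only http and https are considered legitimate here.
    """
    try:
        parts = urlsplit(value)
    except ValueError:
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```

With these definitions, a value such as `1 OR 1=1` is rejected by `is_integer`, and `javascript:alert(1)` is rejected by `is_http_url`, simply because neither matches the expected type.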
Input validation has the broader goal of program correctness rather than preventing specific classes of attacks. However, input validation is often recommended as a first line of defense against common classes of web-related vulnerabilities such as cross-site scripting and SQL injection.
An alternative defense mechanism against the introduction of those vulnerabilities in web applications is the use of output sanitization. This mechanism is preferred over input validation by security experts as it provides a high degree of assurance that vulnerabilities are indeed mitigated. This is because if a sanitizer is applied to data immediately prior to its inclusion in a SQL query or web document, then the protection system’s view of the data is identical to the real system’s view.
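As a rough illustration of treating data at the point of output, the Python sketch below escapes a value right before it enters an HTML fragment and binds a value as a query parameter right before it reaches the database. The function names and the `users` table are hypothetical. Parameter binding is used here in place of a hand-written SQL escaping function; it serves the same purpose of handling the data at the last moment before use.

```python
import html
import sqlite3


def render_greeting(name: str) -> str:
    """Escape untrusted text immediately before placing it into HTML."""
    return "<p>Hello, " + html.escape(name) + "</p>"


def find_user_id(conn: sqlite3.Connection, name: str):
    """Bind untrusted text as a parameter instead of concatenating it into SQL.

    Assumption: a table users(id, name) exists in the connected database.
    """
    row = conn.execute("SELECT id FROM users WHERE name = ?", (name,)).fetchone()
    return row[0] if row else None
```

Because both functions see exactly the string that is about to be output, no later transformation can slip in between the check and the use.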
Input validation has clear drawbacks compared to output sanitization. First, input validators might misclassify malicious input as benign. Second, input might undergo arbitrary transformations as part of subsequent application processing prior to being output in a SQL query or a web document, making input validation ineffective.
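The second drawback can be made concrete with a small Python sketch. The validator `looks_benign` is a deliberately naive, hypothetical check, and the later URL-decoding step stands in for whatever transformation an application might apply after validation.

```python
from urllib.parse import unquote


def looks_benign(value: str) -> bool:
    """Hypothetical validator: reject input that contains angle brackets."""
    return "<" not in value and ">" not in value


raw = "%3Cscript%3Ealert(1)%3C/script%3E"

# The percent-encoded form contains no angle brackets, so it passes validation.
accepted = looks_benign(raw)

# Later processing decodes the value, and the markup appears after the check.
decoded = unquote(raw)
```

Here `accepted` is true, yet `decoded` is a script element that the validator would have rejected had it seen the value in its final form.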
Compared to output sanitization, input validation has some advantages as well. Output sanitization requires that the right sanitization function is applied to a piece of untrusted data prior to its use in output such as a URL, a web document or a SQL query. Operational experience has shown that developers are not particularly good at this. Input validation, on the other hand, requires that validation functions are applied to a relatively small number of inputs. Therefore, it is much simpler to achieve complete coverage with input validation than with output sanitization.
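One way to picture this coverage argument is a single table of validators applied once at the request entry point. The sketch below is an assumption for illustration only: the parameter names `page` and `debug`, the `VALIDATORS` table and the `validate_request` function are all hypothetical and do not come from our study.

```python
from typing import Callable, Dict

# Hypothetical specification: one validator per declared request parameter.
VALIDATORS: Dict[str, Callable[[str], bool]] = {
    "page": lambda v: v.isascii() and v.isdigit(),
    "debug": lambda v: v in ("true", "false"),
}


def validate_request(params: Dict[str, str]) -> Dict[str, str]:
    """Return params unchanged if every parameter is declared and valid.

    Raises ValueError for undeclared parameters or values that fail their check.
    """
    for name, value in params.items():
        check = VALIDATORS.get(name)
        if check is None or not check(value):
            raise ValueError(f"rejected parameter: {name}")
    return params
```

The point of the design is that the list of inputs is short and lives in one place, whereas output sanitization has to be remembered at every location where data leaves the application.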
Empirical Web Security
So, we have input validation as a technique for mitigating vulnerabilities in applications. This technique is imprecise but easy to apply. Then one question we can ask is: how well does it actually perform?
To answer this question, we performed an analysis on a large number of vulnerability reports. These reports were drawn from the Common Vulnerabilities and Exposures (CVE) dataset hosted by MITRE. Each report describes a known vulnerability in an application, giving the name and version of the application and, in many cases, the vector that can be used to exploit the vulnerability.
For the 20 applications with the highest incidence of cross-site scripting vulnerabilities and the 20 applications with the highest incidence of SQL injection vulnerabilities, we manually determined:
- The exact means to exploit a vulnerability.
- The data type of the input vector that can be used to exploit the vulnerability.
As not all information is available in the vulnerability report, we manually linked the vulnerability reports to the vulnerable source code. We repeated this process for 895 vulnerability reports describing cross-site scripting and SQL injection flaws.
In total, we were able to determine the data type of 270 parameters corresponding to cross-site scripting attack vectors and 248 parameters for SQL injection attack vectors. First, we found that 35 % of the cross-site scripting attack vectors correspond to a simple data type such as a Boolean or a number. Thus, 35 % of the cross-site scripting vulnerabilities could have been prevented by rigorously validating input using simple data types. While input validation in isolation is clearly not able to prevent all cross-site scripting vulnerabilities, it is still a significant improvement, which could be even bigger if input were validated based on more complex types such as a username, URL or search query. More surprisingly, 68 % of the SQL injection vulnerabilities from our dataset could have been prevented by rigorously validating input with the same set of validators.
Figure: Data types of XSS attack vectors
Figure: Data types of SQL injection attack vectors
Conclusion and Future Work
To conclude, the security of web applications could be significantly improved by rigorously performing input validation. The technique is imprecise but performs quite well in practice. Perhaps the real problem is that developers are not very good at applying validators to every possible entry point of an application. Our follow-up work, which will be presented soon at the IEEE COMPSAC 2012 conference in Izmir, Turkey, will address this problem. I hope to blog about this work in the coming weeks. In the meantime, I encourage you to read the paper for more details about our empirical study.
The work leading to these research results has been a collaborative effort between Prof. Engin Kirda and Prof. William Robertson from Northeastern University (Boston, United States), Prof. Davide Balzarotti from Institute Eurecom (Sophia-Antipolis, France) and the author of this article who is employed as a Research Engineer by SAP Research (Sophia-Antipolis, France).