Preventing Input Validation Vulnerabilities in Web Applications through Automated Type Analysis
Last month, researchers from the Security and Trust Research Practice attended the IEEE COMPSAC conference to present a paper on the IPAAS (Input PArameter Analysis System) project. This year, COMPSAC was held in Izmir, Turkey.
In a previous blog post, we reported on the results of an empirical study on Input Validation Mechanisms in Web Applications and Languages. In that study, we examined a large number of vulnerable web applications, and the results suggested that the majority of SQL injection vulnerabilities and a significant number of cross-site scripting vulnerabilities could have been prevented by rigorously validating input. The study also suggested that one reason critical web vulnerabilities such as cross-site scripting and SQL injection remain so common is that developers are not particularly good at rigorously validating input everywhere in a web application where it is needed. The main goal of the IPAAS (Input PArameter Analysis System) project is to tackle this problem.
IPAAS is an approach to securing web applications against XSS and SQL injection attacks using input validation. The key insight behind IPAAS is to automatically and transparently augment otherwise insecure web application development environments with input validators that result in significant and tangible security improvements for real web applications.
An architectural overview of IPAAS is shown in the figure above. IPAAS can be decomposed into three phases:
- Parameter Extraction. This is a data collection step in which a proxy server intercepts the HTTP requests and responses exchanged between the web client and the web application during testing. For each intercepted request, the parameter key/value pairs are extracted and stored in a database. The HTML payload of each intercepted HTTP response is processed by an HTML parser, which extracts the hyperlinks and forms that point to the web application under test. For each link, it parses the query string and extracts and stores the parameter key/value pairs in the same way as for HTTP requests. For HTML forms, the names of the input elements are extracted and stored.
- Parameter Analysis. The goal of this phase is to label each parameter with a data type based on the values observed for that parameter; these labels are then combined into input validation policies. The labeling is performed by an analysis engine that applies a set of validators to the test inputs. Validators are functions that check whether a value meets a particular set of constraints denoting a type. For each parameter, the analysis engine first initializes a score vector whose length equals the number of validators, that is, the number of data types. It then passes every observed value for that parameter to every validator. If a validator accepts a value, the score corresponding to that validator is incremented. After all the values have been passed to all the validators, the data type whose validator has the highest score is chosen. Input validation policies are composed for each resource by grouping the parameter/type pairs by resource. The parameter analysis phase is augmented with static analysis to find the parameters and application resources that were missed due to insufficient training data.
- Runtime Enforcement. Once the web application has been deployed together with the learned input validation policies, IPAAS intercepts incoming HTTP requests at runtime. It checks whether the payload of each request complies with the input validation policy. If the request does not meet the constraints specified in the policy, IPAAS drops the request. Otherwise, processing of the request continues.
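To make the three phases more concrete, here is a minimal Python sketch of the idea. It is an illustration under stated assumptions, not the IPAAS prototype (which targets PHP): the validator set, the function names (`extract_params`, `infer_type`, `learn_policies`, `is_allowed`), and the sample URLs are all hypothetical, and it only looks at query-string parameters rather than full requests and HTML responses. Two behaviours are also assumptions of this sketch, since the description above does not specify them: ties between validators are broken in favour of the one listed first (the most specific), and parameters or resources without a learned policy are let through.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit, parse_qsl

# Hypothetical validators, ordered from most to least specific.
# Each one returns True when a value satisfies the constraints of its type.
VALIDATORS = {
    "integer": lambda v: re.fullmatch(r"[+-]?[0-9]+", v) is not None,
    "boolean": lambda v: v.lower() in {"true", "false"},
    "word": lambda v: re.fullmatch(r"[A-Za-z0-9_]+", v) is not None,
    "free_text": lambda v: True,
}


def extract_params(url):
    """Parameter extraction: return the resource path and its key/value pairs."""
    parts = urlsplit(url)
    return parts.path, parse_qsl(parts.query, keep_blank_values=True)


def infer_type(values):
    """Parameter analysis: score every validator against every observed value."""
    scores = {name: 0 for name in VALIDATORS}  # the score vector
    for value in values:
        for name, accepts in VALIDATORS.items():
            if accepts(value):
                scores[name] += 1
    # max() returns the first key with the highest score, so ties go to the
    # validator listed first in VALIDATORS (an assumption of this sketch).
    return max(scores, key=scores.get)


def learn_policies(training_urls):
    """Group the inferred parameter types per resource into policies."""
    observed = defaultdict(lambda: defaultdict(list))
    for url in training_urls:
        resource, pairs = extract_params(url)
        for key, value in pairs:
            observed[resource][key].append(value)
    return {
        resource: {key: infer_type(values) for key, values in params.items()}
        for resource, params in observed.items()
    }


def is_allowed(policies, url):
    """Runtime enforcement: reject a request whose values violate the policy."""
    resource, pairs = extract_params(url)
    policy = policies.get(resource, {})
    for key, value in pairs:
        type_name = policy.get(key)
        # Assumption: parameters without a learned type are not checked.
        if type_name is not None and not VALIDATORS[type_name](value):
            return False
    return True


# Benign training requests, as they might be collected while testing.
training = [
    "/item.php?id=12&sort=price",
    "/item.php?id=7&sort=name",
    "/search.php?q=red+shoes",
]
policies = learn_policies(training)
print(policies)
print(is_allowed(policies, "/item.php?id=12"))
print(is_allowed(policies, "/item.php?id=1%27%20OR%201=1"))
```

In this sketch, `id` is learned as an integer, so a request that smuggles `1' OR 1=1` into it is rejected, whereas `q` is learned as free text and is therefore not constrained at all. How strict the learned type is determines how much protection the policy can offer for that parameter.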
To evaluate our approach, we implemented a prototype of IPAAS for protecting PHP web applications and assessed its effectiveness on five real-world web applications containing known SQL injection and cross-site scripting vulnerabilities. To create input validation policies, IPAAS requires a training set of benign requests submitted to the web application. We collected this input data by manually exercising the web application and providing valid data for each parameter. Then, we ran IPAAS to automatically determine the data types of the web application's input parameters and to create input validation policies. The policies were deployed in the web application's runtime environment. Finally, we tested whether it was still possible to exploit the SQL injection and cross-site scripting vulnerabilities in the web application while the runtime enforcement component was enabled. During this experiment, we explored different ways to perform the attacks and to evade possible sanitization and validation routines, as reported by XSS and SQL cheat sheets available on the Internet. In this experiment, IPAAS prevented the exploitation of 65% of the cross-site scripting vulnerabilities and 83% of the SQL injection vulnerabilities.
In this blog post, I presented the IPAAS approach, which improves the secure development of web applications by transparently learning types for web application parameters during testing, and automatically applying robust validators for these parameters at runtime. The evaluation of IPAAS confirms the results of our empirical study that a large number of SQL injection and cross-site scripting vulnerabilities could be prevented if input is rigorously validated.
The work leading to these research results has been a collaborative effort between Prof. Engin Kirda and Prof. William Robertson from Northeastern University (Boston, United States), Prof. Davide Balzarotti from Institute Eurecom (Sophia-Antipolis, France), and the author of this article, who is employed as a Research Engineer by SAP Research (Sophia-Antipolis, France).