Piping as a form of programming
Authors: Michael Fichter and Florian Urmetzer
In a wider study we looked at piping as a form of programming. This way of connecting visualized information resources is very common in the world of SOA and mashups. The FAST project has already been introduced in other blog posts (http://fast-fp7project.morfeo-project.org). In this post I would like to introduce the work of Michael Fichter, who collaborated with me in this area. The focus here is on lightweight resource composition styles for mashups, referred to below as piping.
This post first defines the concept of piping by describing its origins and introducing its use in the context of mashups. It then outlines how piping compares to other approaches for expressing data combination logic. After that, a piping stack is presented that allows existing piping-related literature to be classified. For each layer, existing approaches and alternatives are described. Based on this, comprehensive approaches for piping are identified. The post concludes with an overview of existing tools and languages that support the integration of data with pipes.
Concept of Piping
In computer science, piping relates to data processing pipelines that process data sequentially in successive data processing steps. The architectural style for pipelines is known as pipes and filters. Filters are individual processing steps that are chained together through pipes.
As shown in Figure 1, the processing functionality of a filter is implemented by a component, whereas a pipe links components. This means that the output of Component 1 is the input of Component 2. All components use the same external interface. Hence, they can be composed in different ways. Connecting components to different pipes results in different pipelines. New components can be added, existing ones can be omitted, or they can be rearranged into other sequences without the need to change the components themselves.
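The composability described above can be illustrated with a minimal Python sketch. It assumes that every filter shares one interface (an iterable of strings in, an iterator of strings out). The names drop_blank, to_upper and pipeline are hypothetical and chosen only for illustration.

```python
from typing import Callable, Iterable, Iterator

# Shared external interface of every filter (an assumption of this sketch).
Filter = Callable[[Iterable[str]], Iterator[str]]


def drop_blank(lines: Iterable[str]) -> Iterator[str]:
    """Filter: pass on only lines that contain non-whitespace text."""
    for line in lines:
        if line.strip():
            yield line


def to_upper(lines: Iterable[str]) -> Iterator[str]:
    """Filter: convert every line to upper case."""
    for line in lines:
        yield line.upper()


def pipeline(source: Iterable[str], *filters: Filter) -> Iterable[str]:
    """Pipe: feed the output of each filter into the next one."""
    stream = source
    for f in filters:
        stream = f(stream)
    return stream


if __name__ == "__main__":
    data = ["a", " ", "b"]
    # The same filters can be chained, reordered or omitted freely.
    print(list(pipeline(data, drop_blank, to_upper)))
    print(list(pipeline(data, to_upper)))
```

Because both filters accept and return the same kind of stream, neither needs to change when the pipeline is rearranged.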
The pipes and filters style focuses on data flows. Data flows through the filters and is processed. For example, once Component 1 has finished processing a message and passed its output to Component 2, it can already start processing the next message. Moreover, as Components 3 and 4 in Figure 1 illustrate, tasks can be processed in parallel, which increases system throughput (Hohpe and Woolf, 2008, p. 74).
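The overlapping of processing steps can be sketched in Python with one thread per filter and queues acting as pipes. This is only one possible realization, not the design described by Hohpe and Woolf. The names stage and run_pipeline are hypothetical, and the sketch covers sequential stages only, not the parallel branch of Components 3 and 4.

```python
import queue
import threading

_DONE = object()  # sentinel that marks the end of the message stream


def stage(func, inbox, outbox):
    """Start one filter in its own thread: read from inbox, write to outbox."""
    def worker():
        while True:
            item = inbox.get()
            if item is _DONE:
                outbox.put(_DONE)  # forward the end marker downstream
                break
            outbox.put(func(item))

    thread = threading.Thread(target=worker)
    thread.start()
    return thread


def run_pipeline(messages, funcs):
    """Connect the filters with queues and collect the final outputs."""
    pipes = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [stage(f, pipes[i], pipes[i + 1]) for i, f in enumerate(funcs)]
    for message in messages:
        pipes[0].put(message)
    pipes[0].put(_DONE)

    results = []
    while True:
        item = pipes[-1].get()
        if item is _DONE:
            break
        results.append(item)
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    print(run_pipeline([1, 2, 3], [lambda x: x + 1, lambda x: x * 10]))
```

While the second stage works on one message, the first stage is free to take the next one from its queue. Each stage runs in a single thread, so the message order is preserved.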
Because complex tasks can be broken down into small processing steps implemented by filters, this architectural style reduces complexity and increases flexibility. Components are decoupled and can be exchanged easily. Furthermore, components can be tested separately. Additionally, the results of processing steps or the whole processing pipeline could be cached, reused, and shared across multiple processing threads.
Drawbacks of the pipes and filters style tend to appear when a system has a large number of filters. Error handling can then become difficult. Furthermore, the overhead can be high, both for transferring data between the filters and for translating the data from the application-internal format into the format of the piping infrastructure.
Unix shell scripts are the most popular example of a pipes and filters architecture. There, the output of one command can serve as the input of another by connecting the two with the Unix pipe symbol (|).
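The same mechanism can be driven from Python with the standard subprocess module. The following sketch reproduces the shell pipeline sort | uniq and assumes a Unix-like system on which both commands are installed. The function name sort_unique is hypothetical.

```python
import subprocess


def sort_unique(text: str) -> str:
    """Run the equivalent of the shell pipeline: sort | uniq"""
    sort_proc = subprocess.Popen(
        ["sort"], stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
    )
    # The pipe: uniq reads directly from the output of sort.
    uniq_proc = subprocess.Popen(
        ["uniq"], stdin=sort_proc.stdout, stdout=subprocess.PIPE, text=True
    )
    # Close our copy so that sort is notified if uniq exits early.
    sort_proc.stdout.close()

    # Suitable for small inputs; sort reads everything before it writes.
    sort_proc.stdin.write(text)
    sort_proc.stdin.close()

    output, _ = uniq_proc.communicate()
    sort_proc.wait()
    return output


if __name__ == "__main__":
    print(sort_unique("b\na\nb\n"), end="")
```

Neither command knows about the other. Each one only reads its standard input and writes its standard output, which is the shared interface that makes the filters interchangeable.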
Piping and Mashups
The pipes and filters style is applied to mashup platforms to allow users to create data mashups. Users can configure and combine the data flows of data services using flow composition tools that visualize those flows. Such tools make data processing user-configurable (Riabov et al., 2008).
Piping in mashup tools allows users to express data combination logic. Mashup tools such as Yahoo! Pipes and IBM Damia were early tools that provided components implementing operations such as accessing sources, filtering input data, or merging feeds. Users can link such operators visually and thus configure a sequence of data flow pipelines and, ideally, integrate heterogeneous Web-based resources. Some authors have compared the procedural approach of creating a pipe to the explicit definition of a query execution plan. The main advantage of the piping approach is its simplicity. Mashup tools based on the piping approach therefore aim to let users with little or no programming skill create mashups. A good example is the FAST GVS (Gadget Visual Storyboard). FAST is an EU-sponsored research project that focuses on building a tool for the visual “programming” of mashups.
In the video you can see how piping can be used to create an operation based on predefined interfaces and web-based resources. The pipe defines a screen in the FAST platform, which has an input and an output. These inputs and outputs connect to other screens, and multiple screens form a process. Here, piping serves only to integrate the information flow between the resources (e.g. web services) and the interfaces of the screens.
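The kind of operators mentioned above (accessing sources, filtering input data, merging feeds) can be sketched in Python as follows. This is not the API of Yahoo! Pipes, IBM Damia or FAST. The feed contents and the names fetch, merge, filter_items and mashup_pipe are invented for illustration, and the feeds are held in memory rather than fetched from the Web.

```python
# Hypothetical in-memory stand-ins for two Web feeds.
FEEDS = {
    "news": [
        {"title": "Mashup tools compared", "published": "2010-03-02"},
        {"title": "Weather update", "published": "2010-03-01"},
    ],
    "blog": [
        {"title": "Piping for mashups", "published": "2010-03-03"},
    ],
}


def fetch(name):
    """Source operator: return the items of one feed."""
    return list(FEEDS[name])


def merge(*feeds):
    """Merge operator: combine feeds, newest item first."""
    items = [item for feed in feeds for item in feed]
    return sorted(items, key=lambda item: item["published"], reverse=True)


def filter_items(items, keyword):
    """Filter operator: keep items whose title contains the keyword."""
    return [item for item in items if keyword.lower() in item["title"].lower()]


def mashup_pipe(keyword):
    """A pipe: the fixed wiring of operators a user would draw visually."""
    return filter_items(merge(fetch("news"), fetch("blog")), keyword)


if __name__ == "__main__":
    for item in mashup_pipe("mashup"):
        print(item["published"], item["title"])
```

The body of mashup_pipe spells out the order of operations explicitly, much like a query execution plan. A visual tool lets the user draw the same wiring instead of writing it.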
Hohpe, G. and Woolf, B. (2008). Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions. The Addison-Wesley Signature Series. Addison-Wesley, Boston, Mass., 12th printing.
Riabov, A. V., Bouillet, E., Feblowitz, M. D., Liu, Z., and Ranganathan, A. (2008). Wishful search: interactive composition of data mashups. In WWW ’08: Proceeding of the 17th International Conference on World Wide Web, pages 775-784. ACM.