Introduction to StAX
StAX, the Streaming API for XML, is a new API for pull-parsing of XML, developed under the Java Community Process as JSR 173. This blog gives an introduction to this API, which combines the efficiency of SAX with the ease of use of tree-based APIs.
Most XML parsers fall into two broad categories: tree based (e.g., DOM) or event based (e.g., SAX). Although StAX is more closely aligned with the latter, it bridges the gap between the two. In SAX, the parser pushes data to application code handlers via events. In StAX, the application “pulls” the data from the XML data stream at its convenience. Application code can filter, skip tags, or stop parsing at any time. The application, not the parser, is in control, which enables a more intuitive way to process data.
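To make the "stop parsing at any time" point concrete, here is a minimal sketch that pulls events until it finds the first title element and then simply stops asking for more. The class name FirstTitleFinder and the catalog/book/title document are made up for illustration; the StAX calls themselves are from javax.xml.stream.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class, not part of the StAX API.
class FirstTitleFinder {

    /** Returns the text of the first title element, or null if there is none. */
    static String firstTitle(String xml) throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                // The application asks for the next event; nothing is pushed to it.
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "title".equals(reader.getLocalName())) {
                    // Return immediately: the rest of the document is never read.
                    return reader.getElementText();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<catalog><book><title>StAX Basics</title></book>"
                   + "<book><title>Second</title></book></catalog>";
        System.out.println(firstTitle(xml)); // prints "StAX Basics"
    }
}
```

With SAX, the equivalent early exit usually means throwing an exception out of a handler; here it is an ordinary return.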
StAX API
The StAX API exposes methods for iterative, event-based processing of XML documents. XML documents are treated as a filtered series of events, and infoset states can be stored in a procedural fashion. Moreover, unlike SAX, the StAX API is bidirectional, enabling both reading and writing of XML documents.
The StAX API is really two distinct API sets: a cursor API and an iterator API.
Cursor API
As the name implies, the StAX cursor API represents a cursor with which you can walk an XML document from beginning to end. This cursor can point to one thing at a time, and always moves forward, never backward, usually one infoset element at a time.
The two main cursor interfaces are XMLStreamReader and XMLStreamWriter.
XMLStreamReader
An instance of XMLStreamReader is used to read XML content. The StAX API provides XMLInputFactory to create an instance of XMLStreamReader.
Calling next() on an XMLStreamReader advances the cursor and returns an integer code identifying the event it now points at, for example START_ELEMENT, END_ELEMENT, CHARACTERS, COMMENT, or END_DOCUMENT. The full set of codes is defined in XMLStreamConstants.
Depending on the event, you can get more information by calling the methods appropriate to that event. For example, if next() returns START_ELEMENT, then calling getLocalName() returns the local name of the element.
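A typical read loop might look like the sketch below. The class name CursorReadDemo and the sample document are assumptions for illustration. Coalescing is switched on so that adjacent text arrives as a single CHARACTERS event.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class, not part of the StAX API.
class CursorReadDemo {

    /** Walks the document once and describes each element and text event. */
    static List<String> describe(String xml) throws XMLStreamException {
        List<String> out = new ArrayList<>();
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Ask the parser to merge adjacent character data into one event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                int event = reader.next(); // advance the cursor
                switch (event) {
                    case XMLStreamConstants.START_ELEMENT:
                        out.add("START " + reader.getLocalName());
                        break;
                    case XMLStreamConstants.CHARACTERS:
                        if (!reader.isWhiteSpace()) {
                            out.add("TEXT " + reader.getText());
                        }
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        out.add("END " + reader.getLocalName());
                        break;
                    default:
                        break; // other event types are ignored in this sketch
                }
            }
        } finally {
            reader.close();
        }
        return out;
    }

    public static void main(String[] args) throws XMLStreamException {
        for (String line : describe("<person><name>Ann</name></person>")) {
            System.out.println(line);
        }
    }
}
```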
XMLStreamWriter
An instance of XMLStreamWriter is used to write XML content to an output. The StAX API provides XMLOutputFactory to create an instance of XMLStreamWriter.
The writer can then be used to write events. For example:
To write a start element: writer.writeStartElement("Name")
To write an end element: writer.writeEndElement()
To write a comment: writer.writeComment("This is a comment")
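Put together, those calls might be used as in the following sketch, which writes a tiny document into a StringWriter. The class name CursorWriteDemo and the text "Ann" are invented for the example. The exact form of the XML declaration varies between StAX implementations, so no expected output is shown.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical example class, not part of the StAX API.
class CursorWriteDemo {

    /** Builds a one-element document with a leading comment. */
    static String buildDocument() throws XMLStreamException {
        StringWriter buffer = new StringWriter();
        XMLStreamWriter writer =
                XMLOutputFactory.newInstance().createXMLStreamWriter(buffer);
        writer.writeStartDocument();               // XML declaration
        writer.writeComment("This is a comment");
        writer.writeStartElement("Name");
        writer.writeCharacters("Ann");             // text is escaped as needed
        writer.writeEndElement();                  // closes the innermost open element
        writer.writeEndDocument();
        writer.flush();
        writer.close();                            // does not close the underlying StringWriter
        return buffer.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(buildDocument());
    }
}
```

Note that writeEndElement() takes no name: the writer keeps track of which element is open.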
Iterator API
The StAX iterator API represents an XML document stream as a set of discrete event objects. These events are pulled by the application and provided by the parser in the order in which they are read in the source XML document.
The base iterator interface is called XMLEvent, and there are subinterfaces for each event type. The primary parser interface for reading iterator events is XMLEventReader, and the primary interface for writing iterator events is XMLEventWriter. XMLEventReader declares only a handful of methods, the most important of which is nextEvent(), which returns the next event in an XML stream. XMLEventReader extends java.util.Iterator, which means that the events it returns can be cached or passed into routines that work with the standard Java Iterator.
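As a rough illustration of the Iterator point, the helper below accepts any java.util.Iterator and has no idea its input comes from a parser. EventIteratorDemo and its method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical example class, not part of the StAX API.
class EventIteratorDemo {

    /** Generic routine: collects start-element names from any Iterator of events. */
    static List<String> startElementNames(Iterator<?> events) {
        List<String> names = new ArrayList<>();
        while (events.hasNext()) {
            Object next = events.next();
            if (next instanceof XMLEvent && ((XMLEvent) next).isStartElement()) {
                XMLEvent event = (XMLEvent) next;
                names.add(event.asStartElement().getName().getLocalPart());
            }
        }
        return names;
    }

    /** Parses the given XML and hands the reader to the generic routine above. */
    static List<String> namesFromXml(String xml) throws XMLStreamException {
        XMLEventReader reader =
                XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
        try {
            return startElementNames(reader); // an XMLEventReader is an Iterator
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(namesFromXml("<a><b/><c>t</c></a>")); // prints [a, b, c]
    }
}
```

Unlike the cursor API, each event here is a self-contained object, so it stays valid after the reader has moved on.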
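Similarly, on the output side of the iterator API, XMLEventWriter accepts events through its add() methods and writes them to the underlying output; call flush() and close() when you are done.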
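One way to see the reader and writer halves working together is a small filter that copies a document event by event and drops comments along the way. This is only a sketch; EventCopyDemo and stripComments are names made up for this post.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical example class, not part of the StAX API.
class EventCopyDemo {

    /** Copies the document to a string, skipping every comment event. */
    static String stripComments(String xml) throws XMLStreamException {
        XMLEventReader reader =
                XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
        StringWriter buffer = new StringWriter();
        XMLEventWriter writer =
                XMLOutputFactory.newInstance().createXMLEventWriter(buffer);
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.getEventType() != XMLStreamConstants.COMMENT) {
                    writer.add(event); // pass the event through unchanged
                }
            }
            writer.flush();
        } finally {
            writer.close();
            reader.close();
        }
        return buffer.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(stripComments("<a><!--note--><b>x</b></a>"));
    }
}
```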
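Summing up, StAX puts the client application, rather than the parser, in control of XML processing. Because the document is streamed instead of being loaded into memory as a tree, processing can be much faster and more memory-efficient than with tree-based APIs such as DOM.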
As far as I can see, this API is available in Java EE 5 (see http://java.sun.com/javaee/5/docs/tutorial/doc/StAX.html).
Do you have any implementations of this API for J2SE (primarily 1.4) which you could recommend?
Found a couple on my own:
http://www.extreme.indiana.edu/xgws/xsoap/xpp/mxp1/index.html
http://xerces.apache.org/xerces2-j/xni-config.html