Former Member

Some time ago I came across the following entry in the
Java Specialists' Newsletter: Fast Exceptions in RIFE. Since I had no
chance to play with the RIFE continuations framework, I just skipped this
entry, assuming that such a trick is anything but practical in real life.
However, real life is more interesting than we tend to think...



Imagine the following task: you have to parse just a few fields
from a large, remotely located XML file. More precisely, you have a
large RSS feed but you are interested only in a tiny subset of data
located somewhere at the beginning of the file. Obviously, DOM is not
the best option here, because the whole content would be downloaded
and would eat a fair amount of memory. A SAX parser is more appropriate.
Sure, there is always the option of a home-grown XML stream parser,
but that takes too much effort.



So SAX parsing is our starting point. Let us make the task more concrete:
we will parse the very latest post to SDN blogs. Since RSS engines
order items in reverse chronological order (the latest item goes first),
we just have to parse the sub-elements of the first item and skip the rest.
The code of the handler looks like this:




import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Relies on uri/localName, so the SAX parser must be namespace-aware.
class SDNLastPostHandler extends DefaultHandler {
    String title, link, description;

    private ParserState parserState;
    private final StringBuilder content = new StringBuilder();
    private boolean done = false;

    @Override public void startDocument () {
        parserState = ParserState.NONE;
        title = link = description = null;
        done = false;
    }

    @Override public void startElement (
        final String uri,
        final String localName,
        final String qName,
        final Attributes attributes) {

        if (done) return;
        switch (parserState) {
            case NONE:
                if ( isRdfRoot(uri, localName) ) parserState = ParserState.RDF;
                break;
            case RDF:
                if ( isItem(uri, localName) ) parserState = ParserState.item;
                break;
            case item:
                if ( "http://purl.org/rss/1.0/".equals(uri) ) {
                    final ParserState next = fieldState(localName);
                    // Ignore elements we are not interested in
                    if ( null == next ) return;
                    parserState = next;
                    content.setLength(0);
                }
                break;
            default:
                break;
        }
    }

    @Override public void endElement (final String uri, final String localName, final String qName) {
        if (done) return;
        switch (parserState) {
            case NONE:
            case RDF:
                break;
            case item:
                // The first item is complete; everything after it is ignored
                if ( isItem(uri, localName) ) done = true;
                break;
            case title:
                title = content.toString(); parserState = ParserState.item;
                break;
            case link:
                link = content.toString(); parserState = ParserState.item;
                break;
            case description:
                description = content.toString(); parserState = ParserState.item;
                break;
        }
    }

    @Override public void characters (final char[] ch, final int start, final int length) {
        // Collect text only inside the fields we care about, so the buffer
        // does not grow with the rest of the document
        switch (parserState) {
            case title:
            case link:
            case description:
                content.append(ch, start, length);
                break;
            default:
                break;
        }
    }

    // Maps an element name to a field state, or null for anything else.
    // (Enum.valueOf cannot be used for this: it throws
    // IllegalArgumentException for unknown names instead of returning null.)
    private static ParserState fieldState(final String localName) {
        if ( "title".equals(localName) ) return ParserState.title;
        if ( "link".equals(localName) ) return ParserState.link;
        if ( "description".equals(localName) ) return ParserState.description;
        return null;
    }

    private static boolean isRdfRoot(final String uri, final String localName) {
        return "RDF".equals(localName) && "http://www.w3.org/1999/02/22-rdf-syntax-ns#".equals(uri);
    }

    private static boolean isItem(final String uri, final String localName) {
        return "item".equals(localName) && "http://purl.org/rss/1.0/".equals(uri);
    }

    static enum ParserState {NONE, RDF, item, title, link, description}
}





 

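Notice how the done flag is handled: it is set to true once the first
<item> element has been fully parsed; afterwards any other content is ignored.

The flag alone does not stop the parser, though: SAX keeps reading the
rest of the remote file and keeps calling the handler. To abort parsing as
soon as the first item is complete, the handler can throw a special "jump"
exception that the calling code catches. Note that with a dedicated
exception type we can easily distinguish between actual errors and jumps.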

Also pay attention to how our Jump class is defined. The most important point
here is that we suppress populating the stack trace in an overridden method:
the fillInStackTrace call accounts for the lion's share of the overhead of
creating and throwing an exception. Since our semi-exception carries no
stack trace information, we can freely use a singleton instance.
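
The idea can be sketched roughly as follows. This is only a minimal illustration, not the original listing: the JumpDemo and FirstTitleHandler classes, the firstTitle method and the tiny sample document are hypothetical names made up for the example, and the Jump class is assumed to extend SAXException so that a handler method is allowed to throw it.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class JumpDemo {

    // Assumed shape of the "jump": a SAXException that never records a stack trace.
    static final class Jump extends SAXException {
        static final Jump INSTANCE = new Jump();

        private Jump() { super("parsing stopped on purpose"); }

        // Skip the expensive stack walk; this also makes the shared instance safe to reuse.
        @Override public Throwable fillInStackTrace() { return this; }
    }

    // Hypothetical handler: collects the title of the first <item>, then jumps out.
    static final class FirstTitleHandler extends DefaultHandler {
        final StringBuilder title = new StringBuilder();
        private boolean inItem = false;
        private boolean inTitle = false;

        @Override public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("item".equals(qName)) inItem = true;
            else if (inItem && "title".equals(qName)) inTitle = true;
        }

        @Override public void characters(char[] ch, int start, int length) {
            if (inTitle) title.append(ch, start, length);
        }

        @Override public void endElement(String uri, String localName, String qName) throws SAXException {
            if ("title".equals(qName)) inTitle = false;
            else if ("item".equals(qName)) throw Jump.INSTANCE; // first item done: stop parsing
        }
    }

    public static String firstTitle(String xml) {
        FirstTitleHandler handler = new FirstTitleHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Jump expected) {
            // Not an error: the handler has everything it needs.
        } catch (SAXException | IOException | ParserConfigurationException e) {
            // A real failure (malformed XML, I/O problem, parser setup).
            throw new IllegalStateException("parsing failed", e);
        }
        return handler.title.toString();
    }

    public static void main(String[] args) {
        String xml = "<feed><item><title>First</title></item>"
                   + "<item><title>Second</title></item></feed>";
        System.out.println(firstTitle(xml));
    }
}
```

Catching Jump in its own clause, ahead of the general SAXException, is what keeps a deliberate early exit separate from a genuine parsing error. The sample uses plain element names and the default (not namespace-aware) parser only to keep the sketch short.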



It is worth mentioning that using exceptions for control flow is
something that should be done only under very specific circumstances and
with great care. Most importantly, strive to keep the code that throws
such a "jump" exception and the code that catches it as close to each
other as possible, as a pair of outer/inner classes or as classes within
the same package, to minimize the impact on overall code clarity.
