The purpose of this document is to describe a technique to simplify the development of SAX parsers.
Introduction
One of the difficulties in writing SAX parsers is that, unlike DOM, SAX is not state aware: it is the application's responsibility to track its own state. This is made more difficult by the nature of the events that SAX reports. For example, the startElement method is triggered for every element, whereas the developer usually wants to know when a particular element starts. For character data the situation is even worse. The characters method is triggered for any character data without reporting the containing element, so it is up to the application to know this. A parser may also split a single run of text across several characters calls.
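To make the event model concrete, the following sketch records every callback for a tiny document. The class name SaxEventLog is hypothetical, and the sketch uses only the SAX parser bundled with the JDK. The characters entries carry only text. Which element that text belongs to has to be inferred from the start and end events around it, and how the text is split into chunks is up to the parser.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical demo class: records the raw SAX callbacks for a small document. */
public class SaxEventLog {

    /** Parses xml with the JDK's SAX parser and returns one line per callback. */
    public static List<String> log(String xml) {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                events.add("start " + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Only the text is passed in; the enclosing element is not reported.
                events.add("chars \"" + new String(ch, start, length) + "\"");
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("end " + qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return events;
    }

    public static void main(String[] args) {
        for (String e : log("<store><item>First</item><item>Second</item></store>")) {
            System.out.println(e);
        }
    }
}
```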
The usual solution to these issues is for the application to maintain its state itself, and this is where the problems start: there are numerous ways to do this, and the code often quickly becomes unmanageable.
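As an illustration of this kind of hand-maintained state, here is a hypothetical handler that collects item descriptions using a boolean flag and a text buffer. FlagBasedItemHandler is an invented name, not one of the article's classes. It works for this one element, but every additional element of interest needs its own flag, its own buffer and extra branches in all three callbacks.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical example of tracking state by hand with a flag per element of interest. */
public class FlagBasedItemHandler extends DefaultHandler {
    final List<String> barcodes = new ArrayList<>();
    final List<String> descriptions = new ArrayList<>();

    // State for <item>; every further element of interest needs its own flag and buffer.
    private boolean inItem = false;
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if (qName.equals("item")) {
            inItem = true;
            text.setLength(0);
            barcodes.add(attrs.getValue("barcode"));
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // The text is only meaningful if the flag says an <item> is open.
        if (inItem) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("item")) {
            descriptions.add(text.toString().trim());
            inItem = false;
        }
    }

    /** Parses xml with a non-namespace-aware JDK parser and returns the filled handler. */
    public static FlagBasedItemHandler parse(String xml) {
        FlagBasedItemHandler h = new FlagBasedItemHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), h);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return h;
    }

    public static void main(String[] args) {
        FlagBasedItemHandler h = parse("<store><item barcode=\"12345\">First item</item>"
                + "<item barcode=\"67890\">Second item</item></store>");
        System.out.println(h.barcodes + " " + h.descriptions);
    }
}
```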
My technique is to extend the DefaultHandler with a new class that is state aware and to provide a mechanism to register event listeners for each of the elements.
StateAwareHandler
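This class extends DefaultHandler, using a stack to record its state and a list of handlers that listen for events. Only elements that have a registered handler are pushed onto the stack, so character data always goes to the handler of the innermost registered element.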
```java
package name.paulshipley.Common.XML;

import java.util.ArrayDeque;
import java.util.ArrayList;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class StateAwareHandler extends DefaultHandler {
    private ArrayList<TagHandler> taghandlerlist = new ArrayList<TagHandler>();
    private ArrayDeque<TagHandler> tagstack = new ArrayDeque<TagHandler>();

    public void addTagHandler(TagHandler th) {
        this.taghandlerlist.add(th);
    }

    // Returns the handler registered for tagname (the last one if several match), or null
    private TagHandler getTagHandler(String tagname) {
        TagHandler th = null;
        for (TagHandler t : this.taghandlerlist) {
            if (t.tagName().equals(tagname)) {
                th = t;
            }
        }
        return th;
    }

    @Override
    public void startElement(String namespaceURI, // namespace
            String sName, // simple name
            String qName, // qualified name
            Attributes attrs) throws SAXException {
        // The simple name is empty unless the parser is namespace aware
        String eName = (sName.equals("")) ? qName : sName;
        TagHandler th = getTagHandler(eName);
        if (th != null) {
            tagstack.addFirst(th);
            // create a collection of attributes to be used by the Tag Handler
            th.clearAttrib();
            if (attrs != null) {
                for (int i = 0; i < attrs.getLength(); i++) {
                    // Same fallback as for element names: local name, else qualified name
                    String aName = attrs.getLocalName(i);
                    if (aName == null || aName.equals("")) {
                        aName = attrs.getQName(i);
                    }
                    th.setAttrib(aName, attrs.getValue(i));
                }
            }
            th.start();
        }
    }

    @Override
    public void characters(char[] buf, int offset, int len) throws SAXException {
        String s = new String(buf, offset, len);
        // Deliver the text to the innermost registered element, if any
        if (!tagstack.isEmpty()) {
            TagHandler th = tagstack.peekFirst();
            th.characters(s);
        }
    }

    @Override
    public void endElement(String namespaceURI,
            String sName, // simple name
            String qName // qualified name
    ) throws SAXException {
        String eName = (sName.equals("")) ? qName : sName;
        TagHandler th = getTagHandler(eName);
        if (th != null) {
            th.end();
            th.clearAttrib();
            tagstack.removeFirst(); // assuming that XML is well formed
        }
    }
}
```
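Because only registered elements are pushed, text inside an element that has no TagHandler is delivered to the nearest registered ancestor. The sketch below restates that dispatch rule in stripped-down form so it can be run on its own. It is hypothetical and not the author's class: StackDispatchDemo is an invented name, plain element names in a Set stand in for TagHandler objects, and it matches on qName because it uses a non-namespace-aware parser.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical restatement of the stack rule: only registered names are pushed. */
public class StackDispatchDemo extends DefaultHandler {
    private final Set<String> registered;
    private final Deque<String> stack = new ArrayDeque<>();
    // Text received by each registered element, keyed by element name.
    private final Map<String, StringBuilder> textByOwner = new LinkedHashMap<>();

    StackDispatchDemo(Set<String> registered) {
        this.registered = registered;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if (registered.contains(qName)) {
            stack.addFirst(qName); // unregistered elements never reach the stack
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (!stack.isEmpty()) {
            // Text goes to whichever registered element is innermost
            textByOwner.computeIfAbsent(stack.peekFirst(), k -> new StringBuilder())
                    .append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (registered.contains(qName)) {
            stack.removeFirst();
        }
    }

    /** Parses xml and returns the text collected per registered element. */
    public static Map<String, String> run(String xml, Set<String> registered) {
        StackDispatchDemo h = new StackDispatchDemo(registered);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), h);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        Map<String, String> result = new LinkedHashMap<>();
        h.textByOwner.forEach((k, v) -> result.put(k, v.toString()));
        return result;
    }

    public static void main(String[] args) {
        String xml = "<store><item>Red <b>bold</b> text</item></store>";
        System.out.println(run(xml, Set.of("store", "item"))); // {item=Red bold text}
    }
}
```

The text of the unregistered `<b>` element ends up with `item`. This is convenient for mixed content, but it also means a TagHandler only sees the text it wants if every child element it should ignore has its own handler.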
TagHandler
This is an abstract class that provides the implementation framework for the event listeners called by the StateAwareHandler. Subclasses must implement the tagName method to return the name of the element being listened for. The start and end methods are called when that element starts and ends. The characters method is called with any character data received while it is the innermost registered element. All three do nothing by default and can be overridden as required.
```java
package name.paulshipley.Common.XML;

import java.util.HashMap;

public abstract class TagHandler {
    private HashMap<String, String> attribs = new HashMap<String, String>();

    // The element name this handler listens for
    abstract public String tagName();

    public final void clearAttrib() {
        attribs.clear();
    }

    public final void setAttrib(String attribName, String attribValue) {
        attribs.put(attribName, attribValue);
    }

    // Returns the attribute value, or null if the element did not have it
    public final String getAttrib(String attribName) {
        return attribs.get(attribName);
    }

    // Called when the element starts; its attributes are already available
    public void start() {
        // Debug.println(tagName() + ".start()");
    }

    // Called for character data while this is the innermost registered element;
    // one text node may arrive in several calls
    public void characters(String s) {
        // Debug.println(tagName() + ".characters(\"" + s.trim() + "\")");
    }

    // Called when the element ends, before its attributes are cleared
    public void end() {
        // Debug.println(tagName() + ".end()");
    }
}
```
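Because each TagHandler names its element through tagName(), the registry could also be a map keyed by that name instead of a list that getTagHandler scans on every event. The sketch below is a hypothetical alternative, not part of the author's classes. TagHandlerRegistry and NamedHandler are invented names, with NamedHandler standing in for TagHandler. Since HashMap.put replaces an existing entry, a later registration for the same name wins, which matches getTagHandler returning the last match.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical alternative registry: look handlers up by element name. */
public class TagHandlerRegistry {

    /** Minimal stand-in for TagHandler; only tagName() matters for lookup. */
    public interface NamedHandler {
        String tagName();
    }

    private final Map<String, NamedHandler> byName = new HashMap<>();

    /** Registers h under its tagName(); a later handler for the same name replaces it. */
    public void add(NamedHandler h) {
        byName.put(h.tagName(), h);
    }

    /** Returns the handler for the element, or null when none is registered. */
    public NamedHandler get(String elementName) {
        return byName.get(elementName);
    }

    public static void main(String[] args) {
        TagHandlerRegistry registry = new TagHandlerRegistry();
        registry.add(() -> "store");
        registry.add(() -> "item");
        System.out.println(registry.get("item").tagName()); // item
        System.out.println(registry.get("price"));          // null
    }
}
```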
Example
The following example builds a list of the items stocked in an online store. While this is a trivial example that would probably be better implemented using DOM anyway, it illustrates the use of these classes.
XML
```xml
<?xml version="1.0" encoding="iso-8859-1"?>
<store xmlns="https://paulshipley.id.au/store"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="https://paulshipley.id.au/store store.xsd">
    <item barcode="12345" colour="red">First item</item>
    <item barcode="67890" colour="blue">Second item</item>
    <item barcode="09876" colour="green">Third item</item>
</store>
```
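This document declares a default namespace, so the names a handler receives for `<store>` depend on whether the parser is namespace aware. The SAX contract is that localName and uri are empty strings when namespace processing is off, which is why StateAwareHandler falls back from the simple name to the qualified name. The sketch below shows the difference using the JDK parser. NameReportDemo and rootNames are hypothetical names introduced for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical demo: what startElement receives with and without namespace processing. */
public class NameReportDemo {

    /** Returns "localName|qName|uri" as reported for the root element. */
    public static String rootNames(String xml, boolean namespaceAware) {
        StringBuilder out = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if (out.length() == 0) { // record the root element only
                    out.append(localName).append('|').append(qName).append('|').append(uri);
                }
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(namespaceAware);
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String xml = "<store xmlns=\"https://paulshipley.id.au/store\">"
                + "<item barcode=\"12345\">First item</item></store>";
        System.out.println(rootNames(xml, false)); // local name is empty
        System.out.println(rootNames(xml, true));  // local name and namespace URI filled in
    }
}
```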
StoreParser
```java
package name.paulshipley.example;

import java.io.IOException;
import java.util.ArrayList;

import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

import name.paulshipley.Common.ExceptionHandler;
import name.paulshipley.Common.XML.StateAwareHandler;
import name.paulshipley.Common.XML.TagHandler;

public class StoreParser {
    private ArrayList<Item> itemlist = new ArrayList<Item>();

    private class Item {
        private String barcode;
        private String colour;
        private String description;

        public String getBarcode() { return barcode; }
        public void setBarcode(String barcode) { this.barcode = barcode; }
        public String getColour() { return colour; }
        public void setColour(String colour) { this.colour = colour; }
        public String getDescription() { return description; }
        public void setDescription(String description) { this.description = description; }

        public String toString() {
            return "Item [barcode=" + barcode + ", colour=" + colour
                    + ", description=" + description + "]";
        }
    }

    /**
     * StoreTag class handles the <store> elements.
     */
    private class StoreTag extends TagHandler {
        public String tagName() { return "store"; }
        public void start() { System.out.println("Start store"); }
        public void end() { System.out.println("End store"); }
    }

    /**
     * ItemTag class handles the <item> elements.
     */
    private class ItemTag extends TagHandler {
        Item i;
        StringBuilder description;

        public String tagName() { return "item"; }

        public void start() {
            // The attributes of the current <item> are available while it is open
            i = new Item();
            i.setBarcode(this.getAttrib("barcode"));
            i.setColour(this.getAttrib("colour"));
            description = new StringBuilder();
        }

        public void characters(String s) {
            // Text may arrive in several chunks; keep it intact and trim once in end()
            description.append(s);
        }

        public void end() {
            i.setDescription(description.toString().trim());
            itemlist.add(i);
            System.out.println("Item: " + i.getDescription());
        }
    }

    public void parse() throws ParserConfigurationException, SAXException, IOException {
        // Configure the SAX event handler
        StateAwareHandler handler = new StateAwareHandler();
        handler.addTagHandler(new StoreTag());
        handler.addTagHandler(new ItemTag());

        // Define validation
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        StreamSource xsd = new StreamSource(StoreParser.class.getResourceAsStream("store.xsd"));
        Schema schema = sf.newSchema(xsd);
        Validator validator = schema.newValidator();

        // Define input as xml file
        InputSource is = new InputSource(StoreParser.class.getResourceAsStream("store.xml"));
        SAXSource saxIn = new SAXSource(is);

        // Output to my handler
        SAXResult saxOut = new SAXResult(handler);

        // Parse and validate
        validator.validate(saxIn, saxOut);
    }

    public static void main(String[] args) {
        try {
            // Create a new parser and parse the file
            StoreParser sp = new StoreParser();
            sp.parse();

            // Output the Item list
            System.out.println("Item List");
            for (Item i : sp.itemlist) {
                System.out.println(i.toString());
            }
        } catch (Exception e) {
            ExceptionHandler.handleAndTerminate(e);
        }
        System.exit(0);
    }
}
```

Running the example produces the following output:

```
>java -jar StoreParser.jar
Start store
Item: First item
Item: Second item
Item: Third item
End store
Item List
Item [barcode=12345, colour=red, description=First item]
Item [barcode=67890, colour=blue, description=Second item]
Item [barcode=09876, colour=green, description=Third item]
>
```
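A note on the ItemTag text handling: the SAX contract allows a parser to deliver one text node in several characters calls. Trimming each chunk separately, as calling s.trim() inside a characters method would, can therefore drop a space that happens to sit at a chunk boundary. The sketch below simulates such a split by hand, since a real parser chooses its own boundaries and the split shown is invented for illustration. It compares trimming per chunk with trimming once at the end. ChunkTrimDemo and its methods are hypothetical names.

```java
import java.util.List;

/** Hypothetical demo: why text should be trimmed once, after all chunks have arrived. */
public class ChunkTrimDemo {

    /** Trims every chunk as it arrives. */
    public static String trimEachChunk(List<String> chunks) {
        StringBuilder sb = new StringBuilder();
        for (String s : chunks) {
            sb.append(s.trim());
        }
        return sb.toString();
    }

    /** Appends chunks unchanged and trims the complete text once. */
    public static String trimAtEnd(List<String> chunks) {
        StringBuilder sb = new StringBuilder();
        for (String s : chunks) {
            sb.append(s);
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // One text node, "First item", split at an arbitrary point
        List<String> chunks = List.of("First ", "item");
        System.out.println(trimEachChunk(chunks)); // Firstitem
        System.out.println(trimAtEnd(chunks));     // First item
    }
}
```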
Resources
The full source code of these classes, including javadoc, is available in the StoreParser.jar file on my website.
About the author
Paul Shipley has been a software developer for over twenty years and has done a bit of just about everything. He is a co-author of Photoshop Elements 2: Zero to Hero (ISBN 1 904344 23 2).