State Aware SAX Parser

The purpose of this document is to describe a technique to simplify the development of SAX parsers.

Introduction

One of the difficulties in writing SAX parsers is that, unlike DOM, SAX is not state aware, and it is the application's responsibility to track its own state. This is made more difficult by the nature of the events that SAX reports. For example, the startElement method is triggered for every element, whereas the developer usually wants to know when a particular element starts. For character data the situation is even worse: the characters method is triggered for any character data and the containing element is not reported, so it is up to the application to know this.

The usual solution to these issues is for the application to somehow maintain its own state. This is where the problems start, as there are numerous ways to do this and the code often quickly becomes unmanageable.
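
To illustrate the kind of hand-maintained state described above, here is a minimal sketch of a plain DefaultHandler that uses a boolean flag to remember whether it is inside an item element. It is not part of the original classes: the names FlagBasedHandlerDemo, ItemTextHandler and parseItems are hypothetical, and the sketch assumes a default (non-namespace-aware) parser, so the element name arrives in qName.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: tracks parser state by hand with a boolean flag.
class FlagBasedHandlerDemo {

	static class ItemTextHandler extends DefaultHandler {
		final List<String> items = new ArrayList<>();
		private boolean inItem = false; // hand-maintained state
		private final StringBuilder text = new StringBuilder();

		@Override
		public void startElement(String uri, String localName, String qName, Attributes attrs) {
			// Called for every element, so the handler must filter by name itself
			if (qName.equals("item")) {
				inItem = true;
				text.setLength(0);
			}
		}

		@Override
		public void characters(char[] buf, int offset, int len) {
			// SAX does not report which element this text belongs to
			if (inItem) {
				text.append(buf, offset, len);
			}
		}

		@Override
		public void endElement(String uri, String localName, String qName) {
			if (qName.equals("item")) {
				items.add(text.toString().trim());
				inItem = false;
			}
		}
	}

	// Parses the given XML string and returns the text of each item element.
	static List<String> parseItems(String xml) {
		try {
			ItemTextHandler handler = new ItemTextHandler();
			SAXParserFactory.newInstance().newSAXParser()
					.parse(new InputSource(new StringReader(xml)), handler);
			return handler.items;
		} catch (Exception e) {
			throw new RuntimeException(e);
		}
	}

	public static void main(String[] args) {
		System.out.println(parseItems(
				"<store><item>First item</item><item>Second item</item></store>"));
	}
}
```

One flag is manageable, but every additional element of interest needs another flag and another branch in all three methods, which is how this style of handler tends to grow out of control.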

My technique is to extend the DefaultHandler with a new class that is state aware and to provide a mechanism to register event listeners for each of the elements.

StateAwareHandler

This class extends the DefaultHandler using a stack to record its state and a list of handlers that listen for events.

package name.paulshipley.Common.XML;

import java.util.ArrayDeque;
import java.util.ArrayList;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class StateAwareHandler extends DefaultHandler {
	private ArrayList<TagHandler> taghandlerlist = new ArrayList<TagHandler>();
	private ArrayDeque<TagHandler> tagstack = new ArrayDeque<TagHandler>();

	public void addTagHandler(TagHandler th) {
		this.taghandlerlist.add(th);
	}

	private TagHandler getTagHandler(String tagname) {
		TagHandler th = null;

		for (TagHandler t : this.taghandlerlist) {
			if (t.tagName().equals(tagname)) {
				th = t;
			}
		}

		return th;
	}

	public void startElement(String namespaceURI, // namespace
			String sName, // simple name
			String qName, // qualified name
			Attributes attrs) throws SAXException {

		String eName = (sName.equals("")) ? qName : sName;

		TagHandler th = getTagHandler(eName);
		if (th != null) {
			tagstack.addFirst(th);

			// create a collection of attributes to be used by the Tag Handler
			th.clearAttrib();
			if (attrs != null) {
				for (int i = 0; i < attrs.getLength(); i++) {
					th.setAttrib(attrs.getLocalName(i), attrs.getValue(i));
				}
			}

			th.start();
		}
	}

	public void characters(char[] buf, int offset, int len) throws SAXException {
		String s = new String(buf, offset, len);

		if (!tagstack.isEmpty()) {
			TagHandler th = tagstack.peekFirst();
			th.characters(s);
		}
	}

	public void endElement(String namespaceURI, String sName, // simple name
			String qName // qualified name
	) throws SAXException {
		String eName = (sName.equals("")) ? qName : sName;

		TagHandler th = getTagHandler(eName);
		if (th != null) {
			th.end();
			th.clearAttrib();
			tagstack.removeFirst();  // assuming that XML is well formed
		}
	}
}
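
The stack discipline used by the handler above can be sketched in isolation. The following is only an illustration: TagStackDemo and trace are hypothetical names, and plain strings stand in for TagHandler objects. It shows why peekFirst always yields the innermost open element that has a registered handler, which is the one that receives character data.

```java
import java.util.ArrayDeque;

// Hypothetical demo class: strings stand in for TagHandler instances.
class TagStackDemo {

	// Simulates <store><item>text</item>text</store> and records
	// which entry is on top of the stack whenever text arrives.
	static String trace() {
		ArrayDeque<String> stack = new ArrayDeque<>();
		StringBuilder log = new StringBuilder();

		stack.addFirst("store");               // <store> opens
		stack.addFirst("item");                // <item> opens
		log.append(stack.peekFirst());         // text inside <item> goes to "item"
		stack.removeFirst();                   // </item> closes
		log.append(',').append(stack.peekFirst()); // later text goes back to "store"
		stack.removeFirst();                   // </store> closes
		log.append(',').append(stack.isEmpty());   // nothing left open

		return log.toString();
	}

	public static void main(String[] args) {
		System.out.println(trace());
	}
}
```

Because only elements with a registered handler are pushed, text inside an unregistered child element is delivered to the nearest registered ancestor.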

TagHandler

This is an abstract class that provides the implementation framework for the event listeners called by the StateAwareHandler. The tagName method must be overridden to define the element name that is being listened for. The start, characters and end methods are called when the current state matches the tagName, and can be overridden as required.

package name.paulshipley.Common.XML;

import java.util.HashMap;

public abstract class TagHandler {
	private HashMap<String, String> attribs = new HashMap<String, String>();

	abstract public String tagName();

	public final void clearAttrib() {
		attribs.clear();
	}

	public final void setAttrib(String attribName, String attribValue) {
		attribs.put(attribName, attribValue);
	}

	public final String getAttrib(String attribName) {
		return attribs.get(attribName);
	}

	public void start() {
		// Debug.println(tagName() + ".start()");
	}

	public void characters(String s) {
		// Debug.println(tagName() + ".characters(\"" + s.trim() + "\")");
	}

	public void end() {
		// Debug.println(tagName() + ".end()");
	}
}

Example

The following example builds a list of items stocked in an online store. While this is a trivial example and would probably be better implemented using DOM anyway, it illustrates the use of these classes.

XML

<?xml version="1.0" encoding="iso-8859-1"?>
<store xmlns="https://paulshipley.id.au/store" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="https://paulshipley.id.au/store store.xsd">
	<item barcode="12345" colour="red">First item</item>
	<item barcode="67890" colour="blue">Second item</item>
	<item barcode="09876" colour="green">Third item</item>
</store>

StoreParser

package name.paulshipley.example;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;

import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

import name.paulshipley.Common.ExceptionHandler;
import name.paulshipley.Common.XML.StateAwareHandler;
import name.paulshipley.Common.XML.TagHandler;

public class StoreParser {
	private ArrayList<Item> itemlist = new ArrayList<Item>();

	private class Item {
		private String barcode;
		private String colour;
		private String description;

		public String getBarcode() {
			return barcode;
		}

		public void setBarcode(String barcode) {
			this.barcode = barcode;
		}

		public String getColour() {
			return colour;
		}

		public void setColour(String colour) {
			this.colour = colour;
		}

		public String getDescription() {
			return description;
		}

		public void setDescription(String description) {
			this.description = description;
		}

		public String toString() {
			return "Item [barcode=" + barcode + ", colour=" + colour
					+ ", description=" + description + "]";
		}
	}

	/**
	 * StoreTag class handles the <store> elements.
	 */
	private class StoreTag extends TagHandler {
		public String tagName() {
			return "store";
		}

		public void start() {
			System.out.println("Start store");
		}

		public void end() {
			System.out.println("End store");
		}
	}

	/**
	 * ItemTag class handles the <item> elements.
	 */
	private class ItemTag extends TagHandler {
		Item i;
		StringBuffer description;

		public String tagName() {
			return "item";
		}

		public void start() {
			i = new Item();
			i.setBarcode(this.getAttrib("barcode"));
			i.setColour(this.getAttrib("colour"));
			description = new StringBuffer();
		}

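		public void characters(String s) {
			// SAX may deliver the text in several chunks, so append each
			// chunk unchanged and trim once in end()
			description.append(s);
		}

		public void end() {
			i.setDescription(description.toString().trim());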
			itemlist.add(i);
			System.out.println("Item: " + i.getDescription());
		}
	}

	public void parse() throws ParserConfigurationException, SAXException,
			IOException {
		// Configure the SAX event handler
		StateAwareHandler handler = new StateAwareHandler();
		handler.addTagHandler(new StoreParser.StoreTag());
		handler.addTagHandler(new StoreParser.ItemTag());

		// Define validation
		SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
		StreamSource xsd = new StreamSource(
					StoreParser.class.getResourceAsStream("store.xsd"));
		Schema schema = sf.newSchema(xsd);
		Validator validator = schema.newValidator();

		// Define input as xml file
		InputSource is = new InputSource(StoreParser.class.getResourceAsStream("store.xml"));
		SAXSource saxIn = new SAXSource(is);

		// Output to my handler
		SAXResult saxOut = new SAXResult(handler);

		// Parse and validate
		validator.validate(saxIn, saxOut);
	}

	public static void main(String[] args) {
		try {
			// Create a new parser and parse the file
			StoreParser sp = new StoreParser();
			sp.parse();

			// Output the Item list
			System.out.println("Item List");
			Iterator<Item> iit = sp.itemlist.iterator();
			while (iit.hasNext()) {
				Item i = iit.next();
				System.out.println(i.toString());
			}
		} catch (Exception e) {
			ExceptionHandler.handleAndTerminate(e);
		}

		System.exit(0);
	}

}


>java -jar StoreParser.jar

Start store
Item: First item
Item: Second item
Item: Third item
End store
Item List
Item [barcode=12345, colour=red, description=First item]
Item [barcode=67890, colour=blue, description=Second item]
Item [barcode=09876, colour=green, description=Third item]

>

Resources

The full source code, including javadoc, of these classes is available in the StoreParser.jar file on my web site.

Website

https://paulshipley.id.au

Reference

Tip: Set up a SAX ContentHandler http://www.ibm.com/developerworks/library/x-tiphandl.html

Tip: Elements and text in ContentHandler http://www.ibm.com/developerworks/xml/library/x-tiphndl3.html

Simplify document handler programs with the SAX parser http://www.ibm.com/developerworks/xml/library/x-dochan.html

SAX, the power API https://www.ibm.com/developerworks/xml/library/x-saxapi/

About the author

Paul Shipley has been a software developer for over twenty years and has done a bit of just about everything. He is a co-author of Photoshop Elements 2: Zero to Hero (ISBN 1 904344 23 2).

— End of Document —
