Package org.apache.commons.digester

The Digester package provides for rules-based processing of arbitrary XML documents.

Interface Summary
ObjectCreationFactory Interface for use with FactoryCreateRule.
Rules Public interface defining a collection of Rule instances (and corresponding matching patterns) plus an implementation of a matching policy that selects the rules that match a particular pattern of nested elements discovered during parsing.
RuleSet Public interface defining a shorthand means of configuring a complete set of related Rule definitions, possibly associated with a particular namespace URI, in one operation.
 

Class Summary
AbstractObjectCreationFactory Abstract base class for ObjectCreationFactory implementations.
AbstractRulesImpl AbstractRulesImpl provides basic services for Rules implementations.
BeanPropertySetterRule Rule implementation that sets a bean property on the top object to the body text.
CallMethodRule Rule implementation that calls a method on an object on the stack (normally the top/parent object), passing arguments collected from subsequent CallParamRule rules or from the body of this element.
CallParamRule Rule implementation that saves a parameter for use by a surrounding CallMethodRule.
Digester A Digester processes an XML input stream by matching a series of element nesting patterns to execute Rules that have been added prior to the start of parsing.
ExtendedBaseRules Extension of RulesBase for complex schema.
FactoryCreateRule Rule implementation that uses an ObjectCreationFactory to create a new object which it pushes onto the object stack.
NodeCreateRule A rule implementation that creates a DOM Node containing the XML at the element that matched the rule.
ObjectCreateRule Rule implementation that creates a new object and pushes it onto the object stack.
ObjectParamRule Rule implementation that saves a parameter for use by a surrounding CallMethodRule.
ParserFeatureSetterFactory Creates a SAXParser based on the underlying parser.
PathCallParamRule Rule implementation that saves a parameter containing the Digester matching path for use by a surrounding CallMethodRule.
RegexMatcher Regular expression matching strategy for RegexRules.
RegexRules Rules implementation that uses regular expression matching for paths.
Rule Concrete implementations of this class implement actions to be taken when a corresponding nested pattern of XML elements has been matched.
RulesBase Default implementation of the Rules interface that supports the standard rule matching behavior.
RuleSetBase Convenience base class that implements the RuleSet interface.
SetNestedPropertiesRule Rule implementation that sets properties on the object at the top of the stack, based on child elements with names matching properties on that object.
SetNextRule Rule implementation that calls a method on the (top-1) (parent) object, passing the top object (child) as an argument.
SetPropertiesRule Rule implementation that sets properties on the object at the top of the stack, based on attributes with corresponding names.
SetPropertyRule Rule implementation that sets an individual property on the object at the top of the stack, based on attributes with specified names.
SetRootRule Rule implementation that calls a method on the root object on the stack, passing the top object (child) as an argument.
SetTopRule Rule implementation that calls a "set parent" method on the top (child) object, passing the (top-1) (parent) object as an argument.
SimpleRegexMatcher Simple regex pattern matching algorithm.
Substitutor (Logical) Interface for substitution strategies.
WithDefaultsRulesWrapper Rules Decorator that returns default rules when no matches are returned by the wrapped implementation.
 

Package org.apache.commons.digester Description

The Digester package provides for rules-based processing of arbitrary XML documents.

[Dependencies] [Introduction] [Configuration Properties] [The Object Stack] [Element Matching Patterns] [Processing Rules] [Logging] [Usage Example] [Namespace Aware Parsing] [Pluggable Rules Processing] [Encapsulated Rule Sets] [Using Named Stacks For Inter-Rule Communication] [Registering DTDs] [Troubleshooting] [FAQ] [Extensions] [Known Limitations]

External Dependencies

The Digester component is dependent upon implementations of the following standard libraries:

It is also dependent on a compatible set of Apache Commons library components:

Compatible Dependency Sets
Digester+Logging 1.0.x+BeanUtils 1.x+Collections 2.x
Digester+Logging 1.0.x+BeanUtils 1.x+Collections 3.x
Digester+Logging 1.0.x+BeanUtils 1.7-

Introduction

In many application environments that deal with XML-formatted data, it is useful to be able to process an XML document in an "event driven" manner, where particular Java objects are created (or methods of existing objects are invoked) when particular patterns of nested XML elements have been recognized. Developers familiar with the Simple API for XML Parsing (SAX) approach to processing XML documents will recognize that the Digester provides a higher level, more developer-friendly interface to SAX events, because most of the details of navigating the XML element hierarchy are hidden -- allowing the developer to focus on the processing to be performed.

In order to use a Digester, the following basic steps are required:

Alternatively, a Digester may be used as a SAX event handler, as follows:

For example code, see the usage examples and the FAQ.

Digester Configuration Properties

An org.apache.commons.digester.Digester instance contains several configuration properties that can be used to customize its operation. These properties must be configured before you call one of the parse() variants, in order for them to take effect on that parse.

Property Description
classLoader You can optionally specify the class loader that will be used to load classes when required by the ObjectCreateRule and FactoryCreateRule rules. If not specified, application classes will be loaded from the thread's context class loader (if the useContextClassLoader property is set to true) or the same class loader that was used to load the Digester class itself.
errorHandler You can optionally specify a SAX ErrorHandler that is notified when parsing errors occur. By default, any parsing errors that are encountered are logged, but Digester will continue processing as well.
namespaceAware A boolean that is set to true to perform parsing in a manner that is aware of XML namespaces. Among other things, this setting affects how elements are matched to processing rules. See Namespace Aware Parsing for more information.
ruleNamespaceURI The public URI of the namespace for which all subsequently added rules are associated, or null for adding rules that are not associated with any namespace. See Namespace Aware Parsing for more information.
rules The Rules component that actually performs matching of Rule instances against the current element nesting pattern is pluggable. By default, Digester includes a Rules implementation that behaves as described in this document. See Pluggable Rules Processing for more information.
useContextClassLoader A boolean that is set to true if you want application classes required by FactoryCreateRule and ObjectCreateRule to be loaded from the context class loader of the current thread. By default, classes will be loaded from the class loader that loaded this Digester class. NOTE - This property is ignored if you set a value for the classLoader property; that class loader will be used unconditionally.
validating A boolean that is set to true if you wish to validate the XML document against a Document Type Definition (DTD) that is specified in its DOCTYPE declaration. The default value of false requests a parse that only detects "well formed" XML documents, rather than "valid" ones.

In addition to the scalar properties defined above, you can also register a local copy of a Document Type Definition (DTD) that is referenced in a DOCTYPE declaration. Such a registration tells the XML parser that, whenever it encounters a DOCTYPE declaration with the specified public identifier, it should utilize the actual DTD content at the registered system identifier (a URL), rather than the one in the DOCTYPE declaration.

For example, the Struts framework controller servlet uses the following registration in order to tell Struts to use a local copy of the DTD for the Struts configuration file. This allows usage of Struts in environments that are not connected to the Internet, and speeds up processing even at Internet connected sites (because it avoids the need to go across the network).

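    // Look the DTD up as a class path resource; new URL("/org/...") would
    // throw MalformedURLException because the path has no protocol.
    URL url = getClass().getResource
      ("/org/apache/struts/resources/struts-config_1_0.dtd");
    digester.register
      ("-//Apache Software Foundation//DTD Struts Configuration 1.0//EN",
       url.toString());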

As a side note, the system identifier used in this example is the path that would be passed to java.lang.ClassLoader.getResource() or java.lang.ClassLoader.getResourceAsStream(). The actual DTD resource is loaded through the same class loader that loads all of the Struts classes -- typically from the struts.jar file.
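
The effect of such a registration can be sketched with a plain SAX EntityResolver. This is only an illustration of the lookup described above, not Digester's own code: the class name LocalDtdResolverSketch, the file: URL and the example.org system identifier are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical illustration class; not part of the Digester API.
class LocalDtdResolverSketch implements EntityResolver {

    // Maps a public identifier to the URL of a local copy of the DTD.
    private final Map<String, String> registered = new HashMap<>();

    void register(String publicId, String entityUrl) {
        registered.put(publicId, entityUrl);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String localUrl = registered.get(publicId);
        // Returning null tells the parser to use the system identifier
        // from the DOCTYPE declaration instead.
        return localUrl == null ? null : new InputSource(localUrl);
    }

    public static void main(String[] args) {
        LocalDtdResolverSketch resolver = new LocalDtdResolverSketch();
        resolver.register(
            "-//Apache Software Foundation//DTD Struts Configuration 1.0//EN",
            "file:/opt/dtds/struts-config_1_0.dtd"); // assumed local location
        InputSource source = resolver.resolveEntity(
            "-//Apache Software Foundation//DTD Struts Configuration 1.0//EN",
            "http://example.org/struts-config_1_0.dtd"); // placeholder system identifier
        System.out.println(source.getSystemId()); // prints file:/opt/dtds/struts-config_1_0.dtd
    }
}
```

An unregistered public identifier falls through to the parser's default behaviour, which is why offline use needs every referenced DTD to be registered.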

The Object Stack

One very common use of org.apache.commons.digester.Digester technology is to dynamically construct a tree of Java objects, whose internal organization, as well as the details of property settings on these objects, are configured based on the contents of the XML document. In fact, the primary reason that the Digester package was created (it was originally part of Struts, and then moved to the Commons project because it was recognized as being generally useful) was to facilitate the way that the Struts controller servlet configures itself based on the contents of your application's struts-config.xml file.

To facilitate this usage, the Digester exposes a stack that can be manipulated by processing rules that are fired when element matching patterns are satisfied. The usual stack-related operations are made available, including the following:

A typical design pattern, then, is to fire a rule that creates a new object and pushes it on the stack when the beginning of a particular XML element is encountered. The object will remain there while the nested content of this element is processed, and it will be popped off when the end of the element is encountered. As we will see, the standard "object create" processing rule supports exactly this functionality in a very convenient way.
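
To make the push-on-start, pop-on-end pattern concrete, here is a minimal sketch written against plain SAX rather than Digester. The class ObjectStackSketch and its Node type are hypothetical names invented for this illustration; Digester's own stack handling is more general.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; not part of the Digester API.
class ObjectStackSketch extends DefaultHandler {

    // Minimal tree node standing in for an application object.
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        Node(String name) { this.name = name; }
        @Override public String toString() {
            return children.isEmpty() ? name : name + children;
        }
    }

    private final Deque<Node> stack = new ArrayDeque<>();
    private Node root;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        Node node = new Node(qName);
        if (root == null) {
            root = node;              // remember the first object ever pushed
        }
        stack.push(node);             // stays on the stack while nested content is parsed
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        Node child = stack.pop();     // popped when the end of the element is seen
        Node parent = stack.peek();   // null once the root itself has been popped
        if (parent != null) {
            parent.children.add(child);
        }
    }

    static Node parse(String xml) {
        ObjectStackSketch handler = new ObjectStackSketch();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.root;
    }

    public static void main(String[] args) {
        System.out.println(parse("<a><b><c/></b><b/></a>")); // prints a[b[c], b]
    }
}
```

Attaching the child to its parent inside endElement plays roughly the role that a "set next" rule plays in Digester.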

Several potential issues with this design pattern are addressed by other features of the Digester functionality:

Element Matching Patterns

A primary feature of the org.apache.commons.digester.Digester parser is that it automatically navigates the element hierarchy of the XML document you are parsing, without requiring any developer attention to this process. Instead, you focus on deciding what functions you would like to have performed whenever a certain arrangement of nested elements is encountered in the XML document being parsed. The mechanism for specifying such arrangements is called element matching patterns.

The Digester can be configured to use different pattern-matching algorithms via the Digester.setRules method. However, for the vast majority of cases, the default matching algorithm works fine. The default pattern matching behaviour is described below.

A very simple element matching pattern is a simple string like "a". This pattern is matched whenever an <a> top-level element is encountered in the XML document, no matter how many times it occurs. Note that nested <a> elements will not match this pattern -- we will describe means to support this kind of matching later.

The next step up in matching pattern complexity is "a/b". This pattern will be matched when a <b> element is found nested inside a top-level <a> element. Again, this match can occur as many times as desired, depending on the content of the XML document being parsed. You can use multiple slashes to define a hierarchy of any desired depth that will be matched appropriately.

For example, assume you have registered processing rules that match patterns "a", "a/b", and "a/b/c". For an input XML document with the following contents, the indicated patterns will be matched when the corresponding element is parsed:

  <a>         -- Matches pattern "a"
    <b>       -- Matches pattern "a/b"
      <c/>    -- Matches pattern "a/b/c"
      <c/>    -- Matches pattern "a/b/c"
    </b>
    <b>       -- Matches pattern "a/b"
      <c/>    -- Matches pattern "a/b/c"
      <c/>    -- Matches pattern "a/b/c"
      <c/>    -- Matches pattern "a/b/c"
    </b>
  </a>

It is also possible to match a particular XML element, no matter how it is nested (or not nested) in the XML document, by using the "*" wildcard character in your matching pattern strings. For example, an element matching pattern of "*/a" will match an <a> element at any nesting position within the document.

It is quite possible that, when a particular XML element is being parsed, the pattern for more than one registered processing rule will be matched because you registered more than one processing rule with the exact same matching pattern.

When this occurs, the corresponding processing rules will all be fired in order. Rule methods begin (and body) are executed in the order that the Rules were initially registered with the Digester, whilst end method calls are executed in reverse order. In other words - the order is first in, last out.
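
The firing order can be written out as a small sketch. It only models the ordering stated above, using rule names as plain strings; FiringOrderSketch and callsFor are hypothetical names, not Digester API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; not part of the Digester API.
class FiringOrderSketch {

    // Returns the sequence of event-method calls for rules that all match one element.
    static List<String> callsFor(List<String> rulesInRegistrationOrder) {
        List<String> calls = new ArrayList<>();
        for (String rule : rulesInRegistrationOrder) {
            calls.add(rule + ".begin");          // registration order
        }
        for (String rule : rulesInRegistrationOrder) {
            calls.add(rule + ".body");           // registration order
        }
        for (int i = rulesInRegistrationOrder.size() - 1; i >= 0; i--) {
            calls.add(rulesInRegistrationOrder.get(i) + ".end");  // reverse order
        }
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(callsFor(Arrays.asList("A", "B", "C")));
        // prints [A.begin, B.begin, C.begin, A.body, B.body, C.body, C.end, B.end, A.end]
    }
}
```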

Note that wildcard patterns are ignored if an explicit match can be found (and when multiple wildcard patterns match, only the longest, i.e. most explicit, pattern is considered a match). The result is that rules can be added for "an <a> tag anywhere", and that behaviour can then be explicitly overridden for specific cases, e.g. "but not an <a> that is a direct child of an <x>". Therefore, if you have rules A and B registered for pattern "*/a" and then want to add an additional rule C for pattern "x/a" only, you need to add *three* rules for "x/a": A, B and C. Note that by using:

  Rule ruleA = new ObjectCreateRule();
  Rule ruleB = new SetNextRule();
  Rule ruleC = new SetPropertiesRule();

  digester.addRule("*/a", ruleA);
  digester.addRule("*/a", ruleB);
  digester.addRule("x/a", ruleA);
  digester.addRule("x/a", ruleB);
  digester.addRule("x/a", ruleC);

you have associated the same rule instances A and B with multiple patterns, thus avoiding creating extra rule object instances.
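
The selection policy described in this section (an exact match wins, otherwise the longest matching wildcard pattern is used) can be sketched as follows. This is a simplified model that stores rule names as strings; PatternMatchSketch is a hypothetical class written for this illustration and is not the RulesBase source.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of the Digester API.
class PatternMatchSketch {

    // pattern -> rule names, in registration order
    private final Map<String, List<String>> rules = new LinkedHashMap<>();

    void add(String pattern, String ruleName) {
        rules.computeIfAbsent(pattern, k -> new ArrayList<>()).add(ruleName);
    }

    List<String> match(String path) {
        List<String> exact = rules.get(path);
        if (exact != null) {
            return exact;                        // explicit match: wildcards are ignored
        }
        String best = null;
        for (String pattern : rules.keySet()) {
            if (!pattern.startsWith("*/")) {
                continue;
            }
            // "*/a" matches the path "a" itself and any path ending in "/a".
            boolean hit = path.equals(pattern.substring(2))
                       || path.endsWith(pattern.substring(1));
            if (hit && (best == null || pattern.length() > best.length())) {
                best = pattern;                  // keep only the longest wildcard pattern
            }
        }
        return best == null ? Collections.<String>emptyList() : rules.get(best);
    }

    public static void main(String[] args) {
        PatternMatchSketch rules = new PatternMatchSketch();
        rules.add("*/a", "A");
        rules.add("*/a", "B");
        rules.add("x/a", "A");
        rules.add("x/a", "B");
        rules.add("x/a", "C");
        System.out.println(rules.match("x/a"));   // prints [A, B, C]
        System.out.println(rules.match("y/z/a")); // prints [A, B]
        System.out.println(rules.match("b"));     // prints []
    }
}
```

If "x/a" had been registered with rule C alone, match("x/a") would return only [C], which is why A and B have to be registered again for the explicit pattern.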

Processing Rules

The previous section documented how you identify when you wish to have certain actions take place. The purpose of processing rules is to define what should happen when the patterns are matched.

Formally, a processing rule is a Java class that subclasses the org.apache.commons.digester.Rule abstract class. Each Rule implements one or more of the following event methods that are called at well-defined times when the matching patterns corresponding to this rule trigger it:

As you are configuring your digester, you can call the addRule() method to register a specific element matching pattern, along with an instance of a Rule class that will have its event handling methods called at the appropriate times, as described above. This mechanism allows you to create Rule implementation classes dynamically, to implement any desired application specific functionality.

In addition, a set of processing rule implementation classes is provided, which deals with many common programming scenarios. These classes include the following:

You can create instances of the standard Rule classes and register them by calling digester.addRule(), as described above. However, because their usage is so common, shorthand registration methods are defined for each of the standard rules, directly on the Digester class. For example, the following code sequence:

    Rule rule = new SetNextRule(digester, "addChild",
                                "com.mycompany.mypackage.MyChildClass");
    digester.addRule("a/b/c", rule);

can be replaced by:

    digester.addSetNext("a/b/c", "addChild",
                        "com.mycompany.mypackage.MyChildClass");

Logging

Logging is a vital tool for debugging Digester rulesets. Digester can log copious amounts of debugging information. So, you need to know how logging works before you start using Digester seriously.

Digester uses Apache Commons Logging. This component is not really a logging framework - rather an extensible, configurable bridge. It can be configured to swallow all log messages, to provide very basic logging by itself or to pass logging messages on to more sophisticated logging frameworks. Commons-Logging comes with connectors for many popular logging frameworks. Consult the commons-logging documentation for more information.

Two main logs are used by Digester:

Complete documentation of how to configure Commons-Logging can be found in the Commons Logging package documentation. However, as a simple example, let's assume that you want to use the SimpleLog implementation that is included in Commons-Logging, and set up Digester to log events from the Digester logger at the DEBUG level, while you want to log events from the Digester.sax logger at the INFO level. You can accomplish this by creating a commons-logging.properties file in your classpath (or setting corresponding system properties on the command line that starts your application) with the following contents:

  org.apache.commons.logging.Log=org.apache.commons.logging.impl.SimpleLog
  org.apache.commons.logging.simplelog.log.org.apache.commons.digester.Digester=debug
  org.apache.commons.logging.simplelog.log.org.apache.commons.digester.Digester.sax=info

Usage Examples

Creating a Simple Object Tree

Let's assume that you have two simple JavaBeans, Foo and Bar, with the following method signatures:

  package mypackage;
  public class Foo {
    public void addBar(Bar bar);
    public Bar findBar(int id);
    public Iterator getBars();
    public String getName();
    public void setName(String name);
  }

  package mypackage;
  public class Bar {
    public int getId();
    public void setId(int id);
    public String getTitle();
    public void setTitle(String title);
  }

and you wish to use Digester to parse the following XML document:

  <foo name="The Parent">
    <bar id="123" title="The First Child"/>
    <bar id="456" title="The Second Child"/>
  </foo>

A simple approach is to set up the parsing rules on a Digester as follows, and then process an input file containing this document:

  Digester digester = new Digester();
  digester.setValidating(false);
  digester.addObjectCreate("foo", "mypackage.Foo");
  digester.addSetProperties("foo");
  digester.addObjectCreate("foo/bar", "mypackage.Bar");
  digester.addSetProperties("foo/bar");
  digester.addSetNext("foo/bar", "addBar", "mypackage.Bar");
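  // parse() needs a source: "input" stands for the File, InputStream or
  // InputSource that contains the document shown above.
  Foo foo = (Foo) digester.parse(input);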

In order, these rules do the following tasks:

  1. When the outermost <foo> element is encountered, create a new instance of mypackage.Foo and push it on to the object stack. At the end of the <foo> element, this object will be popped off of the stack.
  2. Cause properties of the top object on the stack (i.e. the Foo object that was just created and pushed) to be set based on the values of the attributes of this XML element.
  3. When a nested <bar> element is encountered, create a new instance of mypackage.Bar and push it on to the object stack. At the end of the <bar> element, this object will be popped off of the stack (i.e. after the remaining rules matching foo/bar are processed).
  4. Cause properties of the top object on the stack (i.e. the Bar object that was just created and pushed) to be set based on the values of the attributes of this XML element. Note that type conversions are automatically performed (such as String to int for the id property), for all converters registered with the ConvertUtils class from the commons-beanutils package.
  5. Cause the addBar method of the next-to-top element on the object stack (which is why this is called the "set next" rule) to be called, passing the element that is on the top of the stack, which must be of type mypackage.Bar. This is the rule that causes the parent/child relationship to be created.

Once the parse is completed, the first object that was ever pushed on to the stack (the Foo object in this case) is returned to you. It will have had its properties set, and all of its child Bar objects created for you.

Processing A Struts Configuration File

As stated earlier, the primary reason that the Digester package was created is because the Struts controller servlet itself needed a robust, flexible, easy to extend mechanism for processing the contents of the struts-config.xml configuration that describes nearly every aspect of a Struts-based application. Because of this, the controller servlet contains a comprehensive, real world, example of how the Digester can be employed for this type of a use case. See the initDigester() method of class org.apache.struts.action.ActionServlet for the code that creates and configures the Digester to be used, and the initMapping() method for where the parsing actually takes place.

(Struts binary and source distributions can be acquired at http://struts.apache.org.)

The following discussion highlights a few of the matching patterns and processing rules that are configured, to illustrate the use of some of the Digester features. First, let's look at how the Digester instance is created and initialized:

    Digester digester = new Digester();
    digester.push(this); // Push controller servlet onto the stack
    digester.setValidating(true);

We see that a new Digester instance is created, and is configured to use a validating parser. Validation will occur against the struts-config_1_0.dtd DTD that is included with Struts (as discussed earlier). In order to provide a means of tracking the configured objects, the controller servlet instance itself will be added to the digester's stack.

    digester.addObjectCreate("struts-config/global-forwards/forward",
                             forwardClass, "className");
    digester.addSetProperties("struts-config/global-forwards/forward");
    digester.addSetNext("struts-config/global-forwards/forward",
                        "addForward",
                        "org.apache.struts.action.ActionForward");
    digester.addSetProperty
      ("struts-config/global-forwards/forward/set-property",
       "property", "value");

The rules created by these lines are used to process the global forward declarations. When a <forward> element is encountered, the following actions take place:

Later on, the digester is actually executed as follows:

    InputStream input =
      getServletContext().getResourceAsStream(config);
    ...
    try {
        digester.parse(input);
        input.close();
    } catch (SAXException e) {
        ... deal with the problem ...
    }

As a result of the call to parse(), all of the configuration information that was defined in the struts-config.xml file is now represented as collections of objects cached within the Struts controller servlet, as well as being exposed as servlet context attributes.

Parsing Body Text In XML Files

The Digester module also allows you to process the nested body text in an XML file, not just the elements and attributes that are encountered. The following example is based on an assumed need to parse the web application deployment descriptor (/WEB-INF/web.xml) for the current web application, and record the configuration information for a particular servlet. To record this information, assume the existence of a bean class with the following method signatures (among others):

  package com.mycompany;
  public class ServletBean {
    public void setServletName(String servletName);
    public void setServletClass(String servletClass);
    public void addInitParam(String name, String value);
  }

We are going to process the web.xml file that declares the controller servlet in a typical Struts-based application (abridged for brevity in this example):

  <web-app>
    ...
    <servlet>
      <servlet-name>action</servlet-name>
      <servlet-class>org.apache.struts.action.ActionServlet</servlet-class>
      <init-param>
        <param-name>application</param-name>
        <param-value>org.apache.struts.example.ApplicationResources</param-value>
      </init-param>
      <init-param>
        <param-name>config</param-name>
        <param-value>/WEB-INF/struts-config.xml</param-value>
      </init-param>
    </servlet>
    ...
  </web-app>

Next, let's define some Digester processing rules for this input file:

  digester.addObjectCreate("web-app/servlet",
                           "com.mycompany.ServletBean");
  digester.addCallMethod("web-app/servlet/servlet-name", "setServletName", 0);
  digester.addCallMethod("web-app/servlet/servlet-class",
                         "setServletClass", 0);
  digester.addCallMethod("web-app/servlet/init-param",
                         "addInitParam", 2);
  digester.addCallParam("web-app/servlet/init-param/param-name", 0);
  digester.addCallParam("web-app/servlet/init-param/param-value", 1);

Now, as elements are parsed, the following processing occurs:
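
In plain SAX terms, the effect of these rules is roughly what the sketch below does by hand: body text is collected per element, two parameters are saved by index, and the two-argument method is invoked when the enclosing element ends. BodyTextSketch is a hypothetical class written for this illustration. It matches on element names only, not on full nesting patterns, and it is not how Digester is implemented.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; not part of the Digester API.
class BodyTextSketch extends DefaultHandler {

    String servletName;
    String servletClass;
    final Map<String, String> initParams = new LinkedHashMap<>();

    private final StringBuilder body = new StringBuilder();
    private final String[] params = new String[2];   // slots 0 and 1, as in addCallParam

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        body.setLength(0);                            // start collecting this element's text
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        body.append(ch, start, length);               // text may arrive in several chunks
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String text = body.toString().trim();
        body.setLength(0);
        switch (qName) {
            case "servlet-name":  servletName = text; break;
            case "servlet-class": servletClass = text; break;
            case "param-name":    params[0] = text; break;   // saved for the pending call
            case "param-value":   params[1] = text; break;
            case "init-param":    initParams.put(params[0], params[1]); break; // the call itself
            default:              break;
        }
    }

    static BodyTextSketch parse(String xml) {
        BodyTextSketch handler = new BodyTextSketch();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler;
    }

    public static void main(String[] args) {
        BodyTextSketch result = parse(
            "<web-app><servlet><servlet-name>action</servlet-name>"
            + "<init-param><param-name>config</param-name>"
            + "<param-value>/WEB-INF/struts-config.xml</param-value></init-param>"
            + "</servlet></web-app>");
        System.out.println(result.servletName + " " + result.initParams);
        // prints action {config=/WEB-INF/struts-config.xml}
    }
}
```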

Namespace Aware Parsing

For digesting XML documents that do not use XML namespaces, the default behavior of Digester, as described above, is generally sufficient. However, if the document you are processing uses namespaces, it is often convenient to have sets of Rule instances that are only matched on elements that use the prefix of a particular namespace. This approach, for example, makes it possible to deal with element names that are the same in different namespaces, but where you want to perform different processing for each namespace.

Digester does not provide full support for namespaces, but it does provide enough to accomplish most tasks. Enabling Digester's namespace support is done by following these steps:

  1. Tell Digester that you will be doing namespace aware parsing, by adding this statement in your initialization of the Digester's properties:
        digester.setNamespaceAware(true);
        
  2. Declare the public namespace URI of the namespace with which following rules will be associated. Note that you do not make any assumptions about the prefix - the XML document author is free to pick whatever prefix they want:
        digester.setRuleNamespaceURI("http://www.mycompany.com/MyNamespace");
        
  3. Add the rules that correspond to this namespace, in the usual way, by calling methods like addObjectCreate() or addSetProperties(). In the matching patterns you specify, use only the local name portion of the elements (i.e. the part after the prefix and associated colon (":") character):
        digester.addObjectCreate("foo/bar", "com.mycompany.MyFoo");
        digester.addSetProperties("foo/bar");
        
  4. Repeat the previous two steps for each additional public namespace URI that should be recognized on this Digester run.

Now, consider that you might wish to digest the following document, using the rules that were set up in the steps above:

<m:foo
   xmlns:m="http://www.mycompany.com/MyNamespace"
   xmlns:y="http://www.yourcompany.com/YourNamespace">

  <m:bar name="My Name" value="My Value"/>

  <y:bar id="123" product="Product Description"/>

</m:foo>

Note that your object create and set properties rules will be fired for the first occurrence of the bar element, but not the second one. This is because we declared that our rules only matched for the particular namespace we are interested in. Any elements in the document that are associated with other namespaces (or no namespaces at all) will not be processed. In this way, you can easily create rules that digest only the portions of a compound document that they understand, without placing any restrictions on what other content is present in the document.
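
The filtering described here can be observed with a namespace-aware SAX parser alone. The sketch below records the local names of elements whose namespace URI equals the one the rules were declared for. NamespaceFilterSketch and its matched method are hypothetical names, and the document deliberately uses the prefix q to show that the prefix chosen by the document author does not matter.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; not part of the Digester API.
class NamespaceFilterSketch {

    // Returns the local names of all elements that belong to ruleNamespaceUri.
    static List<String> matched(String xml, String ruleNamespaceUri) {
        List<String> hits = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);   // the parser now reports URI and local name
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes attrs) {
                        if (ruleNamespaceUri.equals(uri)) {
                            hits.add(localName);   // elements in other namespaces are skipped
                        }
                    }
                });
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return hits;
    }

    public static void main(String[] args) {
        String xml =
            "<q:foo xmlns:q='http://www.mycompany.com/MyNamespace'"
            + " xmlns:y='http://www.yourcompany.com/YourNamespace'>"
            + "<q:bar/><y:bar/></q:foo>";
        System.out.println(matched(xml, "http://www.mycompany.com/MyNamespace"));
        // prints [foo, bar]
    }
}
```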

You might also want to look at Encapsulated Rule Sets if you wish to reuse a particular set of rules, associated with a particular namespace, in more than one application context.

Using Namespace Prefixes In Pattern Matching

Using rules with namespaces is very useful when you have orthogonal rulesets. One ruleset applies to a namespace and is independent of other rulesets applying to other namespaces. However, if your rule logic requires mixed namespaces, then matching namespace prefix patterns might be a better strategy.

When you set the NamespaceAware property to false, digester uses the qualified element name (which includes the namespace prefix) rather than the local name as the pattern component for the element. This means that your pattern matches can include namespace prefixes as well as element names. So, rather than create namespace-aware rules, create pattern matches including the namespace prefixes.

For example, (with NamespaceAware false), the pattern 'foo:bar' will match a top level element named 'bar' in the namespace with (local) prefix 'foo'.

Limitations of Digester Namespace support

Digester does not provide general "xpath-compliant" matching; only the namespace attached to the last element in the match path is involved in the matching process. Namespaces attached to parent elements are ignored for matching purposes.

Pluggable Rules Processing

By default, Digester selects the rules that match a particular pattern of nested elements as described under Element Matching Patterns. If you prefer to use different selection policies, however, you can create your own implementation of the org.apache.commons.digester.Rules interface, or subclass the corresponding convenience base class org.apache.commons.digester.RulesBase. Your implementation of the match() method will be called when the processing for a particular element is started or ended, and you must return a List of the rules that are relevant for the current nesting pattern. The order of the rules you return is significant, and should match the order in which rules were initially added.

Your policy for rule selection should generally be sensitive to whether Namespace Aware Parsing is taking place. In general, if namespaceAware is true, you should select only rules that:

ExtendedBaseRules

ExtendedBaseRules adds some additional expression syntax for pattern matching to the default mechanism, but it also executes more slowly. See the JavaDocs for more details on the new pattern matching syntax, and suggestions on when this implementation should be used. To use it, simply do the following as part of your Digester initialization:

  Digester digester = ...
  ...
  digester.setRules(new ExtendedBaseRules());
  ...

RegexRules

RegexRules is an advanced Rules implementation which does not build on the default pattern matching rules. It uses a pluggable RegexMatcher implementation to test if a path matches the pattern for a Rule. All matching rules are returned (note that this behaviour differs from the longest-match behaviour of the default pattern matching rules). See the JavaDocs for more details.

Example usage:

  Digester digester = ...
  ...
  digester.setRules(new RegexRules(new SimpleRegexMatcher()));
  ...

RegexMatchers

Digester ships with only one RegexMatcher implementation: SimpleRegexMatcher. This implementation is unsophisticated and lacks many good features found in more powerful regex libraries. There are some good reasons why this approach was adopted. The first is that SimpleRegexMatcher is simple: it is easy to write and runs quickly. The second has to do with the way that RegexRules is intended to be used.

There are many good regex libraries available (for example Jakarta ORO, Jakarta Regex, GNU Regex and Java 1.4 Regex). Not only do different people have different personal tastes when it comes to regular expression matching, but these products all offer different functionality and different strengths.

The pluggable RegexMatcher is a thin bridge designed to adapt other regex systems. This allows any regex library the user desires to be plugged in and used just by creating one class. Digester does not (currently) ship with bridges to the major regex libraries, so that the dependencies required by Digester are kept to a minimum.
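
As a rough illustration of how small such a bridge can be, the sketch below puts the matching logic for java.util.regex into a single class. It is deliberately standalone: the class name JavaUtilRegexBridge is hypothetical, and to plug it into RegexRules it would have to extend RegexMatcher and override its matching method, whose exact signature should be checked against the JavaDocs. Treating the rule pattern as a full java.util.regex expression that must match the whole path is also an assumption of this sketch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;

/** Hypothetical bridge: tests an element path against a java.util.regex pattern. */
public class JavaUtilRegexBridge {

    // Cache compiled patterns, since the same rule pattern is tested many times.
    private final Map<String, Pattern> compiled = new ConcurrentHashMap<>();

    /**
     * Returns true if the whole element path (for example "foo/bar")
     * matches the regular expression registered for a rule.
     */
    public boolean match(String path, String rulePattern) {
        Pattern pattern = compiled.computeIfAbsent(rulePattern, Pattern::compile);
        return pattern.matcher(path).matches();
    }
}
```

Caching the compiled patterns keeps the per-element cost low, which matters because the matcher is consulted for every registered pattern on every element.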

WithDefaultsRulesWrapper

WithDefaultsRulesWrapper allows default Rule instances to be added to any existing Rules implementation. These default Rule instances will be returned for any match for which the wrapped implementation does not return any matches.

For example,

    Rule alpha;
    ...
    WithDefaultsRulesWrapper rules = new WithDefaultsRulesWrapper(new RulesBase());
    rules.addDefault(alpha);
    ...
    digester.setRules(rules);
    ...

When a pattern does not match any other rule, rule alpha will be called.

WithDefaultsRulesWrapper follows the Decorator pattern.

Encapsulated Rule Sets

All of the examples above have described a scenario where the rules to be processed are registered with a Digester instance immediately after it is created. However, this approach makes it difficult to reuse the same set of rules in more than one application environment. Ideally, one could package a set of rules into a single class, which could be easily loaded and registered with a Digester instance in one easy step.

The RuleSet interface (and the convenience base class RuleSetBase) makes it possible to do this. In addition, the rule instances registered with a particular RuleSet can optionally be associated with a particular namespace, as described under Namespace Aware Processing.

An example of creating a RuleSet might be something like this:

public class MyRuleSet extends RuleSetBase {

  public MyRuleSet() {
    this("");
  }

  public MyRuleSet(String prefix) {
    super();
    this.prefix = prefix;
    this.namespaceURI = "http://www.mycompany.com/MyNamespace";
  }

  protected String prefix = null;

  public void addRuleInstances(Digester digester) {
    digester.addObjectCreate(prefix + "foo/bar",
      "com.mycompany.MyFoo");
    digester.addSetProperties(prefix + "foo/bar");
  }

}

You might use this RuleSet as follows to initialize a Digester instance:

  Digester digester = new Digester();
  ... configure Digester properties ...
  digester.addRuleSet(new MyRuleSet("baz/"));

A couple of interesting notes about this approach:

Using Named Stacks For Inter-Rule Communication

Digester is based on Rule instances working together to process XML. For anything other than the most trivial processing, communication between Rule instances is necessary. Since Rule instances are processed in sequence, this usually means storing an Object somewhere where later instances can retrieve it.

Digester is based on SAX. The most natural data structure to use with SAX-based XML processing is the stack. This allows more powerful processes to be specified more simply, since the pushing and popping of objects can mimic the nested structure of the XML.
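
To make the connection between SAX events and a stack concrete, here is a minimal sketch that uses only the JDK's SAX classes, not Digester itself. The class name ElementPathTracker and its paths() helper are hypothetical. Each start tag pushes an element name and each end tag pops one, so the stack always describes the current nesting, written here in the same slash-separated form used for matching patterns.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler: records the nesting path of every element it sees. */
public class ElementPathTracker extends DefaultHandler {

    private final Deque<String> openElements = new ArrayDeque<>();
    private final List<String> seenPaths = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        openElements.addLast(qName);                     // push on the start tag
        seenPaths.add(String.join("/", openElements));   // outermost element first
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        openElements.removeLast();                       // pop on the end tag
    }

    /** Parses the given XML text and returns the element paths in document order. */
    public static List<String> paths(String xml) {
        ElementPathTracker tracker = new ElementPathTracker();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), tracker);
        } catch (SAXException | ParserConfigurationException | IOException e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
        return tracker.seenPaths;
    }
}
```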

Digester uses two basic stacks: one for the main beans and the other for parameters for method calls. These are inadequate for complex processing where many different Rule instances need to communicate through different channels.

In this case, it is recommended that named stacks are used. In addition to the two basic stacks, Digester allows rules to use an unlimited number of other stacks referred to by an identifying string (the name). (That's where the term named stack comes from.) These stacks are accessed through calls to:

Note: all stack names beginning with org.apache.commons.digester are reserved for future use by the Digester component. It is also recommended that users choose stack names prefixed by the name of their own domain to avoid conflicts with other Rule implementations.
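
The idea behind named stacks can be sketched with a map from stack name to stack. This is only an illustration of the concept, not Digester's implementation: the class name NamedStacks, the method names, the behaviour on an empty stack and the stack name used in the usage note are all assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

/** Hypothetical sketch of named stacks: one independent stack per name. */
public class NamedStacks {

    private final Map<String, Deque<Object>> stacks = new HashMap<>();

    /** Pushes a value onto the named stack, creating the stack on first use. */
    public void push(String stackName, Object value) {
        // ArrayDeque rejects null values; a real implementation may need to allow them.
        stacks.computeIfAbsent(stackName, name -> new ArrayDeque<>()).push(value);
    }

    /** Removes and returns the top value of the named stack. */
    public Object pop(String stackName) {
        return nonEmpty(stackName).pop();
    }

    /** Returns the top value of the named stack without removing it. */
    public Object peek(String stackName) {
        return nonEmpty(stackName).peek();
    }

    private Deque<Object> nonEmpty(String stackName) {
        Deque<Object> stack = stacks.get(stackName);
        if (stack == null || stack.isEmpty()) {
            throw new NoSuchElementException("stack is empty: " + stackName);
        }
        return stack;
    }
}
```

One rule could push a value under a domain-prefixed name such as com.example.pending-ids, and a later rule could pop it, without either rule disturbing the main object stack.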

Registering DTDs

Brief (But Still Too Long) Introduction To System and Public Identifiers

A definition for an external entity comes in one of two forms:

  1. SYSTEM system-identifier
  2. PUBLIC public-identifier system-identifier

The system-identifier is a URI from which the resource can be obtained (either directly or indirectly). Many valid URIs may identify the same resource. The public-identifier is an additional free identifier which may be used (by the parser) to locate the resource.

In practice, the weakness with a system-identifier is that most parsers will attempt to interpret this URI as a URL, try to download the resource directly from the URL and stop the parsing if this download fails. So, this means that almost always the URI will have to be a URL from which the declaration can be downloaded.

URLs may be local or remote, but if the URL is chosen to be local, it is likely only to function correctly on a small number of machines (which are configured precisely to allow the XML to be parsed). This is usually unsatisfactory and so a universally accessible URL is preferred. This usually means an internet URL.

To recap, in practice the system-identifier will (most likely) be an internet URL. Unfortunately, downloading from an internet URL is not only slow but unreliable (since successfully downloading a document from the internet relies on the client being connected to the internet and the server being able to satisfy the request).

The public-identifier is a freely defined name but (in practice) it is strongly recommended that a unique, readable and open format is used (for reasons that should become clear later). A Formal Public Identifier (FPI) is a very common choice. This public identifier is often used to provide a unique and location independent key which can be used to substitute local resources for remote ones (hint: this is why ;).

By using the second (PUBLIC) form combined with some form of local catalog (which matches public-identifiers to local resources) and where the public-identifier is a unique name and the system-identifier is an internet URL, the practical disadvantages of specifying just a system-identifier can be avoided. Those external entities which have been stored locally (on the machine parsing the document) can be identified and used. Only when no local copy exists is it necessary to download the document from the internet URL. This naming scheme is recommended when using Digester.

External Entity Resolution Using Digester

SAX factors out the resolution of external entities into an EntityResolver. Digester supports the use of a custom EntityResolver but ships with a simple internal implementation. This implementation allows local URLs to be easily associated with public-identifiers.

For example:

    digester.register("-//Example Dot Com //DTD Sample Example//EN", "assets/sample.dtd");

will make digester return the relative file path assets/sample.dtd whenever an external entity with public id -//Example Dot Com //DTD Sample Example//EN is needed.

Note: This is a simple (but useful) implementation. Greater sophistication requires a custom EntityResolver.
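
For readers who do need a custom EntityResolver, the following is a minimal sketch of the local catalog idea using only the standard SAX interfaces. The class name CatalogEntityResolver and its register() method are hypothetical, and it is not Digester's internal implementation. Returning null for an unknown public identifier asks the parser to fall back to its default behaviour, which is to open the system-identifier.

```java
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

/** Hypothetical catalog: maps public identifiers to local copies of external entities. */
public class CatalogEntityResolver implements EntityResolver {

    private final Map<String, String> localUrlsByPublicId = new HashMap<>();

    /** Associates a public identifier with the URL of a locally stored copy. */
    public void register(String publicId, String localUrl) {
        localUrlsByPublicId.put(publicId, localUrl);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String localUrl = (publicId == null) ? null : localUrlsByPublicId.get(publicId);
        if (localUrl == null) {
            // No local copy known: let the parser open the system-identifier itself.
            return null;
        }
        InputSource source = new InputSource(localUrl);
        source.setPublicId(publicId);
        return source;
    }
}
```

An instance of such a class can be handed to any SAX parser through its entity resolver setting.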

Troubleshooting

Debugging Exceptions

Digester is based on SAX. Digestion throws two kinds of Exception:

The first is rarely thrown and indicates the kind of fundamental IO exception that developers know all about. The second is thrown by SAX parsers when the processing of the XML cannot be completed. So, to diagnose the cause, a certain familiarity with the way that SAX error handling works is very useful.

Diagnosing SAX Exceptions

This is a short, potted guide to SAX error handling strategies. It's not intended as a proper guide to error handling in SAX.

When a SAX parser encounters a problem with the XML (well, ok - sometime after it encounters a problem) it will throw a SAXParseException. This is a subclass of SAXException and contains a bit of extra information about what exactly went wrong - and more importantly, where it went wrong. If you catch an exception of this sort, you can be sure that the problem is with the XML and not Digester or your rules. It is usually a good idea to catch this exception and log the extra information to help with diagnosing the reason for the failure.
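
The extra information mentioned above is available through the getLineNumber() and getColumnNumber() methods of SAXParseException. The sketch below uses a plain JDK SAX parser rather than Digester, and the class name ParseErrorReporter is hypothetical. It shows one way to turn that information into a log-friendly message.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: reports where a document stopped being parseable. */
public class ParseErrorReporter {

    /** Returns null if the XML parses cleanly, otherwise a description of the failure. */
    public static String describeProblem(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            // The problem is in the XML itself; report where the parser gave up.
            return "line " + e.getLineNumber() + ", column " + e.getColumnNumber()
                    + ": " + e.getMessage();
        } catch (SAXException | ParserConfigurationException | IOException e) {
            // Anything else is not tied to a position in the document.
            return "no position available: " + e;
        }
    }
}
```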

General SAXException instances may wrap a causal exception. When exceptions are thrown by Digester, each of these is wrapped in a SAXException and rethrown. So, catch these and examine the wrapped exception to diagnose what went wrong.
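
The wrapped exception is available from SAXException.getException(). As a small sketch (the class name SaxCauses and the method name are hypothetical), the helper below follows the chain of wrapped exceptions until it reaches one that is not a SAXException or that wraps nothing further.

```java
import org.xml.sax.SAXException;

/** Hypothetical helper: digs the underlying cause out of nested SAXExceptions. */
public class SaxCauses {

    /** Returns the innermost exception wrapped by the given SAXException. */
    public static Throwable unwrap(SAXException exception) {
        Throwable current = exception;
        while (current instanceof SAXException) {
            Exception nested = ((SAXException) current).getException();
            if (nested == null) {
                break; // nothing wrapped: this SAXException is itself the cause
            }
            current = nested;
        }
        return current;
    }
}
```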

Frequently Asked Questions

Extensions

Three extension packages are included within the Digester distribution. These provide extra functionality extending the core Digester concepts. Detailed descriptions are contained within their own package documentation.

Known Limitations

Accessing Public Methods In A Default Access Superclass

There is an issue when invoking public methods contained in a default access superclass. Reflection locates these methods fine and correctly assigns them as public. However, an IllegalAccessException is thrown if the method is invoked.

MethodUtils contains a workaround for this situation. It will attempt to call setAccessible on this method. If this call succeeds, then the method can be invoked as normal. This call will only succeed when the application has sufficient security privileges. If this call fails, then a warning will be logged and the method may fail.
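
The general shape of that workaround can be sketched with plain reflection. This is not the MethodUtils source: the class AccessibleInvoker, its nested sample classes and the warning text are hypothetical. Because everything here lives in one file, the sample cannot reproduce the IllegalAccessException itself; it only shows the "try setAccessible, warn on failure, then invoke" sequence.

```java
import java.lang.reflect.Method;

/** Hypothetical sketch of the setAccessible workaround described above. */
public class AccessibleInvoker {

    /** Default access superclass declaring a public method. */
    static class HiddenBase {
        public String greet() {
            return "hello";
        }
    }

    /** Public subclass that inherits greet() from the default access superclass. */
    public static class Visible extends HiddenBase {
    }

    /** Invokes a public no-argument method, first trying to make it accessible. */
    public static Object invokeLeniently(Object target, String methodName) {
        try {
            Method method = target.getClass().getMethod(methodName);
            try {
                method.setAccessible(true);
            } catch (RuntimeException e) {
                // Insufficient privileges (for example a SecurityException): warn and try anyway.
                System.err.println("warning: cannot make " + methodName + " accessible: " + e);
            }
            return method.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("could not invoke " + methodName, e);
        }
    }
}
```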

Digester uses MethodUtils and so there may be an issue accessing methods of this kind from a high security environment. If you think that you might be experiencing this problem, please ask on the mailing list.



Copyright © 2001-2005 The Apache Software Foundation. All Rights Reserved.