Tech and Media Labs




Java DOM: The Document Object

Jakob Jenkov
Last update: 2014-05-21

The DOM Document object represents an XML document. When you parse an XML file using a Java DOM parser, you get back a Document object. In this text I will give you a head start in traversing a DOM graph. I cannot cover it all, but that isn't necessary; you just need enough to get the picture. The rest you can read in the JavaDoc.
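To get a Document in the first place, the JDK includes a DOM parser in the javax.xml.parsers package. The following is a minimal sketch, assuming the XML is available as a string; for a file you would pass a java.io.File to DocumentBuilder.parse() instead. The class name ParseDomExample and the sample XML are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

class ParseDomExample {

    // Parses XML text into a DOM Document using the JDK's built-in DOM parser.
    static Document parse(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        try (InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))) {
            return builder.parse(in);
        }
    }

    public static void main(String[] args) throws Exception {
        Document document = parse("<root><child width=\"100\"/></root>");
        // Prints the tag name of the root element: root
        System.out.println(document.getDocumentElement().getTagName());
    }
}
```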

The two most commonly used features of DOM are:

  1. Accessing Child Elements of an Element
  2. Accessing Attributes of an Element

This text covers these two features.

The Document interface and all related interfaces are located in the Java package org.w3c.dom, because they were specified by the World Wide Web Consortium (W3C). You need to know this when looking for the DOM interfaces in the JavaDoc.

The DOM Document Element

A DOM document consists of many nodes connected in a tree structure. At the top is the Document object. The Document object has a single root element, which you get by calling getDocumentElement() like this:

Element rootElement = document.getDocumentElement();
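As a small sketch (the class name RootElementExample and the sample XML are hypothetical), the following parses a document whose root element is preceded by a comment. The comment is a child of the Document, but getDocumentElement() still returns the one root element:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class RootElementExample {

    // Parses the XML and returns the tag name of the document's single root element.
    static String rootTagName(String xml) throws Exception {
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Element rootElement = document.getDocumentElement();
        return rootElement.getTagName();
    }

    public static void main(String[] args) throws Exception {
        // The comment belongs to the Document, but it is not the root element.
        System.out.println(rootTagName("<!-- note --><catalog><book/></catalog>")); // catalog
    }
}
```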

DOM Elements, Child Elements, and the Node Interface

The root element has children, which can be elements, comments, processing instructions, text nodes, etc. You get the children of an element like this:

NodeList nodes = element.getChildNodes();

for (int i = 0; i < nodes.getLength(); i++) {
  Node node = nodes.item(i);

  // getChildNodes() also returns text, comment and other nodes, so check the type
  if (node instanceof Element) {
    // a child element to process
    Element child = (Element) node;
    String attribute = child.getAttribute("width");
  }
}
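Here is that loop as a self-contained sketch. The class name ChildElementsExample, the childWidths() method, and the sample XML are hypothetical. The whitespace between the tags and the comment become Text and Comment nodes in the NodeList, which is why the instanceof Element check is needed:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class ChildElementsExample {

    // Collects the "width" attribute of every direct child element of the root.
    // Text nodes (whitespace) and comments in the NodeList are skipped.
    static List<String> childWidths(String xml) throws Exception {
        Element element = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();

        List<String> widths = new ArrayList<>();
        NodeList nodes = element.getChildNodes();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node node = nodes.item(i);
            if (node instanceof Element) {
                Element child = (Element) node;
                widths.add(child.getAttribute("width"));
            }
        }
        return widths;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<boxes>\n  <box width=\"10\"/>\n  <!-- skipped -->\n  <box width=\"20\"/>\n</boxes>";
        System.out.println(childWidths(xml)); // [10, 20]
    }
}
```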

The getChildNodes() method returns a NodeList object, which is a list of Node objects. The Node interface is the superinterface of practically all the node types in DOM. This means that the Document interface inherits from (extends) Node, the Element interface extends Node, the Attr (attribute) interface extends Node, etc.

Because Node is the superinterface of all the node interfaces in DOM, you will sometimes have to look in the Node interface for the methods you need, such as getChildNodes(). Keep this in mind when iterating through a Document graph.
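To illustrate, the following sketch walks an entire document using only methods declared on Node (getNodeType(), getNodeName() and getChildNodes()), so the same recursive method accepts the Document, its elements, and its text nodes alike. The class and method names are hypothetical; the sketch builds an indented outline of the element names:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class TreeWalkExample {

    // Recursively visits every node using only Node methods.
    // Element names are written with two spaces of indentation per nesting level.
    static void walk(Node node, int depth, StringBuilder out) {
        int childDepth = depth;
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            for (int i = 0; i < depth; i++) {
                out.append("  ");
            }
            out.append(node.getNodeName()).append('\n');
            childDepth = depth + 1;
        }
        // For Text nodes this NodeList is simply empty.
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), childDepth, out);
        }
    }

    static String outline(String xml) throws Exception {
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        StringBuilder out = new StringBuilder();
        walk(document, 0, out); // a Document is a Node, so it can be passed directly
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Prints:
        // library
        //   book
        //     title
        System.out.print(outline("<library>\n  <book><title>DOM</title></book>\n</library>"));
    }
}
```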

DOM Element Attributes

As you have already seen, you can access the attributes of an element via the Element interface. There are two ways to do so:

String attrValue = element.getAttribute("attrName");

Attr attribute = element.getAttributeNode("attrName");

Most of the time the getAttribute() method will do just fine.
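The two methods differ when the attribute is missing: getAttribute() returns an empty string, whereas getAttributeNode() returns null. If you need to tell a missing attribute apart from one that is present but empty, Element.hasAttribute() does that. A minimal sketch (the class name AttributeLookupExample and the sample XML are hypothetical):

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;

class AttributeLookupExample {

    // Looks up one attribute of the root element in three ways and reports the results.
    static String lookup(String xml, String name) throws Exception {
        Element element = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();

        String value = element.getAttribute(name);    // "" when the attribute is missing
        Attr attr = element.getAttributeNode(name);   // null when the attribute is missing
        boolean present = element.hasAttribute(name); // distinguishes missing from empty

        return "value='" + value + "', node="
                + (attr == null ? "null" : "'" + attr.getValue() + "'")
                + ", has=" + present;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<img width=\"100\" alt=\"\"/>";
        System.out.println(lookup(xml, "width"));  // value='100', node='100', has=true
        System.out.println(lookup(xml, "alt"));    // value='', node='', has=true
        System.out.println(lookup(xml, "height")); // value='', node=null, has=false
    }
}
```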

The Attr interface extends Node. Among other things, it lets you access the element that owns the attribute via getOwnerElement(). Accessing an attribute through this interface is mostly handy when you need to pass the attribute to a method that needs more information about it than just its value, such as its name or its owning element.
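A sketch of that situation, assuming a hypothetical helper describe() that receives only the Attr. It can still find out which element the attribute belongs to via getOwnerElement(). The class name AttrOwnerExample and the sample XML are also made up:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;

class AttrOwnerExample {

    // Hypothetical helper: it receives only the Attr, yet can still reach
    // the owning element through getOwnerElement().
    static String describe(Attr attribute) {
        Element owner = attribute.getOwnerElement();
        return owner.getTagName() + "@" + attribute.getName() + "=" + attribute.getValue();
    }

    // Parses the XML, looks up the named attribute on the root element,
    // and passes the Attr object on to describe().
    static String describeAttribute(String xml, String name) throws Exception {
        Element root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        Attr attribute = root.getAttributeNode(name); // null if the attribute is missing
        return attribute == null ? null : describe(attribute);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(describeAttribute("<image width=\"100\" height=\"50\"/>", "height")); // image@height=50
    }
}
```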

There is a lot more you can do with the Document object and its related nodes, but accessing child elements and attributes is what you will be doing most of the time. The rest you can find in the JavaDoc. Sooner or later you will have to look there anyway.





Copyright Jenkov Aps