Introduction to XML Processing


Date: January 21, 2006

Objectives

Links

Instructions

We will use the DOM parser interface in the Xerces package to read and print a given XML file. The file contains data in XGMML, an XML language used to describe graphs. Our input file is graph.xml, shown below (you may also modify the file to add or change elements).

<?xml version="1.0"?>
<graph>
     <node id="1" label="http://www.example.org/" weight="2345">
             <att name="title" value="Example Main Page"/>
             <att name="mime" value="text/html"/>
             <att name="size" value="2345"/>
             <att name="date" value="Wed Jun  9 23:01:06 2000"/>
             <att name="code" value="200"/>
     </node>
     <node id="2" label="http://www.example.org/software/" weight="1234">
             <att name="title" value="Software Examples"/>
             <att name="mime" value="text/html"/>
             <att name="size" value="1234"/>
             <att name="date" value="Wed Sep  19 13:11:23 2000"/>
             <att name="code" value="200"/>
     </node>
     <edge label="Software" source="1" target="2"/>
</graph>

The metadata (or schema) of XGMML is described in xgmml.xsd. Usually, the schema is used to validate the XML data before it is processed. In this exercise, however, we will not use the schema and will assume that the given data has already been validated.
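
Although validation is not part of this exercise, the sketch below shows what validating a document against a schema could look like. It assumes the javax.xml.validation API that ships with current JDKs (it is not part of Xerces 1.4.4), and the class and method names (SchemaCheck, isValid) are made up for illustration. Both the document and the schema are passed in as strings so that the example stays self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class SchemaCheck {

    // Returns true if the XML text is valid against the XSD text.
    // Hypothetical helper: the name and signature are assumptions for this sketch.
    static boolean isValid(String xml, String xsd) {
        Schema schema;
        try {
            SchemaFactory factory =
                    SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            // The schema itself could not be compiled.
            throw new IllegalArgumentException("Invalid schema", e);
        }
        try {
            // With no error handler set, the validator throws on the first error.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

To check a file on disk instead, a StreamSource can also be built from a java.io.File.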

You need to do the following:

  1. Download Xerces Java Parser 1.4.4 from http://archive.apache.org/dist/xml/xerces-j/.
  2. Create a new Java project in Eclipse and put xerces.jar in your project folder (you can also choose not to use Eclipse).
  3. Add xerces.jar to the project build path via 'Project Properties' / 'Java Build Path' / 'Libraries' / 'Add JARs'.

  4. In your code, import the required classes:

    import org.apache.xerces.parsers.DOMParser;
    import org.w3c.dom.*;
    

  5. Create a DOMParser object:

    DOMParser p = new DOMParser();
    

  6. Parse the input file:

    p.parse("graph.xml");
    

  7. Get the Document object and then extract the root element:

    Node root = p.getDocument().getDocumentElement();
    

    In our case, root should point to a graph element.

  8. You will now need to traverse all child elements of the graph element (nodes and edges) and print their information.
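
One possible way to approach the traversal in step 8 is sketched below. It is not the required solution. To keep the example runnable without xerces.jar, it obtains the document through the JDK's built-in javax.xml.parsers.DocumentBuilder rather than the Xerces DOMParser from steps 5 to 7. The traversal itself uses only org.w3c.dom interfaces, so it should apply equally to the root obtained in step 7. The class and method names (GraphPrinter, parseRoot, render, renderXml) are assumptions made for this sketch. Note that the children of an element also include text nodes holding the whitespace between tags, so the code skips every child that is not an element.

```java
import java.io.File;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class GraphPrinter {

    // Parse the XML source and return its root element.
    static Node parseRoot(InputSource source) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(source)
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML input", e);
        }
    }

    // Append one line for this element (name followed by its attributes),
    // then recurse into its child elements with one more level of indentation.
    static void render(Node element, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(element.getNodeName());
        NamedNodeMap attrs = element.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            out.append("  ").append(attr.getNodeName())
               .append('=').append(attr.getNodeValue());
        }
        out.append('\n');
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Skip whitespace text nodes, comments, and other non-element children.
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                render(child, depth + 1, out);
            }
        }
    }

    // Convenience wrapper: render an XML document given as a string.
    static String renderXml(String xml) {
        StringBuilder out = new StringBuilder();
        render(parseRoot(new InputSource(new StringReader(xml))), 0, out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Reads graph.xml from the working directory unless a path is given.
        File file = new File(args.length > 0 ? args[0] : "graph.xml");
        if (!file.isFile()) {
            System.err.println("Input file not found: " + file);
            return;
        }
        StringBuilder out = new StringBuilder();
        render(parseRoot(new InputSource(file.toURI().toString())), 0, out);
        System.out.print(out);
    }
}
```

The DOM interfaces do not guarantee the order in which a NamedNodeMap returns attributes, so the attribute order in the printed lines may differ from the order in the file.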

Note the following:

The output of your program should be similar to the one below:

graph
  node  id=1  label=http://www.example.org/  weight=2345  
    att  name=title  value=Example Main Page  
    att  name=mime  value=text/html  
    att  name=size  value=2345  
    att  name=date  value=Wed Jun  9 23:01:06 2000  
    att  name=code  value=200  
  node  id=2  label=http://www.example.org/software/  weight=1234  
    att  name=title  value=Software Examples  
    att  name=mime  value=text/html  
    att  name=size  value=1234  
    att  name=date  value=Wed Sep  19 13:11:23 2000  
    att  name=code  value=200  
  edge  label=Software  source=1  target=2



Yarom Gabay 2006-01-21