XML Processing APIs

The XML Processing service contains three packages:

  • the querying package
  • the resolving package
  • the transforming package

XML Querying

This package is used to query, update, and delete parts of XML files. All queries are based on the XPath language.
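
As a point of reference for the XPath expressions used in this section, the sketch below evaluates the same kind of expression with the JDK's own javax.xml.xpath API rather than with XMLQueryingService. The class name XPathSelectSketch and the sample employee data are assumptions made for illustration only; they are not part of the service.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class XPathSelectSketch {

    // Hypothetical sample data, shaped like the employees.xml file used in this section.
    public static final String EMPLOYEES =
        "<employees>"
      + "<employee><firstname>Gennady</firstname><lastname>Azarenkov</lastname></employee>"
      + "<employee><firstname>John</firstname><lastname>Smith</lastname></employee>"
      + "</employees>";

    // Evaluates an XPath expression against an XML string and returns its string value.
    public static String evaluate(String expression, String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Select the last name of the employee whose first name is Gennady.
        System.out.println(
            evaluate("employees/employee[firstname='Gennady']/lastname/text()", EMPLOYEES));
        // XPath functions such as count() can be used in the same way.
        System.out.println(evaluate("count(employees/employee)", EMPLOYEES));
    }
}
```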

The main entry point for this service is XMLQueryingService, but the most important class is XMLQuery.

There are three ways to create a new XMLQuery.

To obtain the Statement object that an XMLQuery consumes, use the SimpleStatementHelper class. The helper package also provides an XMLDataManager, which is used to create several types of XMLData objects.

The following examples show the API in use.

To select part of an XML document:

XMLQueryingService service = (XMLQueryingService) manager.getService(XMLQueryingServiceImpl.class);
Query query = (Query)service.createQuery();

Statement qc = service.createStatementHelper().select("employees/employee[firstname='Gennady']",
                                                      "tmp/employees.xml",
                                                      "tmp/employees-out.xml");
query.prepare(qc);
query.execute();
query.serialize();

qc = service.createStatementHelper().select("employees/employee[firstname='Gennady']/lastname/text()",
                                            "tmp/employees.xml");
query.prepare(qc);
query.execute();
assertEquals( "Azarenkov", query.getResult().toString());
        

To delete part of an XML document:

XMLQueryingService service = (XMLQueryingService) manager.getService(XMLQueryingServiceImpl.class);
Query query = (Query)service.createQuery();

Statement qc = service.createStatementHelper().delete("employees/employee[firstname='Gennady']",
                                                      "tmp/employees.xml",
                                                      "tmp/employees-del.xml");
query.prepare(qc);
query.execute();
query.serialize();

qc = service.createStatementHelper().select("count(employee[firstname='Gennady']/lastname/text())");
query.prepareNext(qc);
query.execute();

assertEquals( "0", query.getResult().toString());
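
The delete-then-count check above can also be reproduced with plain JDK classes, which may help when reasoning about what the XPath expressions match. This is only a sketch under stated assumptions: XPathDeleteSketch, its methods, and the sample data are hypothetical, and it works on an in-memory DOM instead of going through the querying service or the files in tmp/.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathDeleteSketch {

    // Hypothetical sample data, shaped like the employees.xml file used in this section.
    public static final String EMPLOYEES =
        "<employees>"
      + "<employee><firstname>Gennady</firstname><lastname>Azarenkov</lastname></employee>"
      + "<employee><firstname>John</firstname><lastname>Smith</lastname></employee>"
      + "</employees>";

    // Parses an XML string into a DOM document.
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Removes every node matched by the expression and returns how many were removed.
    public static int delete(Document doc, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList matches = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            for (int i = 0; i < matches.getLength(); i++) {
                Node node = matches.item(i);
                node.getParentNode().removeChild(node);
            }
            return matches.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Delete failed: " + expression, e);
        }
    }

    // Counts the nodes matched by the expression.
    public static int count(Document doc, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            Double n = (Double) xpath.evaluate("count(" + expression + ")", doc, XPathConstants.NUMBER);
            return n.intValue();
        } catch (Exception e) {
            throw new IllegalStateException("Count failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(EMPLOYEES);
        delete(doc, "employees/employee[firstname='Gennady']");
        System.out.println(count(doc, "employees/employee[firstname='Gennady']"));
    }
}
```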
          

Many other manipulations are possible, such as update and append. As usual, more examples can be found in the unit tests.

XML Resolving

The resolving package provides two services for resolving DTDs locally:

  • A simple resolving service that uses DTDs stored in a local repository
  • An XML Catalog resolving service based on the Apache XML Commons Resolver. It is an implementation of the OASIS Entity Resolution Technical Committee definitions and supports the OASIS XML Catalogs, OASIS TR9401 Catalogs, and XCatalogs (supported by Apache) catalog formats.

Using the simple service:

// Look up the resolving service from the service manager
SimpleDirResolvingService service = (SimpleDirResolvingService) manager.getService(SimpleDirResolvingServiceImpl.class);

// Create a namespace-aware SAX parser and get its underlying XMLReader
javax.xml.parsers.SAXParserFactory factory = javax.xml.parsers.SAXParserFactory.newInstance();
factory.setNamespaceAware(true);
javax.xml.parsers.SAXParser jaxpParser = factory.newSAXParser();
org.xml.sax.XMLReader reader = jaxpParser.getXMLReader();

// DTD references are now resolved by the service instead of being fetched remotely
reader.setEntityResolver(service.getEntityResolver());

reader.parse("tmp/web.xml");
          
The use of the XML commons resolving service is identical.
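
To illustrate what resolving DTDs from a local repository involves, here is a minimal sketch of an org.xml.sax.EntityResolver that maps a DTD's system identifier to a file of the same name in a local directory. It is not the implementation behind SimpleDirResolvingService; the class name LocalDirEntityResolver and the lookup-by-file-name strategy are assumptions chosen for illustration.

```java
import java.io.File;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical resolver: looks up DTDs by file name in a local directory.
public class LocalDirEntityResolver implements EntityResolver {

    private final File repository;

    public LocalDirEntityResolver(File repository) {
        this.repository = repository;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        if (systemId == null) {
            return null; // nothing to map; the parser uses its default behaviour
        }
        // Keep only the last path segment, e.g. "web-app_2_3.dtd".
        String fileName = systemId.substring(systemId.lastIndexOf('/') + 1);
        File local = new File(repository, fileName);
        if (!local.isFile()) {
            return null; // not in the local repository; fall back to the default lookup
        }
        InputSource source = new InputSource(local.toURI().toString());
        source.setPublicId(publicId);
        return source;
    }
}
```

Such a resolver would be plugged into a parser in the same way as in the example above, for instance with reader.setEntityResolver(new LocalDirEntityResolver(new File("dtds"))), where "dtds" is a placeholder directory name.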

XML Transforming

This package provides several services for transforming incoming XML.

All transformers implement the Transformer interface and the TransformingService class acts as a factory.

HTML

This Transformer takes HTML that is not well formed as input and transforms it into XHTML.

Transformer tt = transformingService.getTransformer(TidyTransformer.class);

StreamResult res = new StreamResult(new FileOutputStream("tmp/rss-out.out"));
FileInputStream fis = new FileInputStream("tmp/rss-out.html");
tt.transform(new StreamSource(fis), res);
            

The default implementation uses jTidy for this work.

Trax

This service is used to apply XSL transformation to incoming XML streams.

The initStyle() method allows you to define the XSL file to use during the transformation.
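
For readers unfamiliar with TrAX, the sketch below shows the underlying JDK mechanism: a stylesheet is compiled into a javax.xml.transform.Transformer, which is then applied to an XML source. It does not use the service's Trax transformer or its initStyle() method; the class TraxSketch, the stylesheet, and the sample data are assumptions made for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class TraxSketch {

    // Hypothetical stylesheet: emits every employee's last name followed by ';' as plain text.
    public static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:for-each select='employees/employee'>"
      + "<xsl:value-of select='lastname'/><xsl:text>;</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Hypothetical input document.
    public static final String EMPLOYEES =
        "<employees>"
      + "<employee><firstname>Gennady</firstname><lastname>Azarenkov</lastname></employee>"
      + "<employee><firstname>John</firstname><lastname>Smith</lastname></employee>"
      + "</employees>";

    // Compiles the stylesheet and applies it to the XML, returning the output as a string.
    public static String apply(String xsl, String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSL transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(apply(STYLESHEET, EMPLOYEES));
    }
}
```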

Pipe

This service is an abstract model of the Chain of Responsibility design pattern as defined in the GoF book.

You can create and manage a list of Transformer objects that are called one after the other.

Transformer pipe = service.getTransformer(PipeTransformer.class);

StreamResult res = new StreamResult(new FileOutputStream("tmp/rss-out.pipe"));
FileInputStream fis = new FileInputStream("tmp/rss-out.html");
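// Note: "rules", "transformerA" and "transformerB" are not declared in this snippet;
// "rules" presumably refers to the pipe transformer obtained above.
rules.addTransformer(transformerA).addTransformer(transformerB);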

pipe.transform(new StreamSource(fis), res);
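
The chaining idea can be sketched with JDK classes alone: each stage consumes the output of the previous one. This is not the PipeTransformer API. The class PipeSketch, its pipe() method, and the two stylesheets are hypothetical, and intermediate results are simply held in memory as strings.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class PipeSketch {

    // Hypothetical stage A: wraps the text of <greeting> in a <message> element.
    public static final String STAGE_A =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='/greeting'>"
      + "<message><xsl:value-of select='.'/></message>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Hypothetical stage B: emits the text of <message> in upper case as plain text.
    public static final String STAGE_B =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/message'>"
      + "<xsl:value-of select=\"translate(., 'helo', 'HELO')\"/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheets one after the other; each output becomes the next input.
    public static String pipe(String xml, String... stylesheets) {
        try {
            TransformerFactory factory = TransformerFactory.newInstance();
            String current = xml;
            for (String xsl : stylesheets) {
                Transformer stage = factory.newTransformer(new StreamSource(new StringReader(xsl)));
                StringWriter out = new StringWriter();
                stage.transform(new StreamSource(new StringReader(current)), new StreamResult(out));
                current = out.toString();
            }
            return current;
        } catch (TransformerException e) {
            throw new IllegalStateException("Pipeline stage failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(pipe("<greeting>hello</greeting>", STAGE_A, STAGE_B));
    }
}
```

With no stylesheets the input is returned unchanged, which mirrors an empty pipe.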