Implementation


Two Java classes handle the normalization chores. XMLNormalization takes a file name as input, converts it to a URL, and passes the URL to a SAXParser whose content handler is a MyDocumentBuilder instance. MyDocumentBuilder normalizes the whitespace in the character data as it builds the document. When parsing completes, XMLNormalization.main retrieves the normalized document from MyDocumentBuilder and prints it to standard output.

   public static void main(String[] args) throws Exception
   {
     SAXParser saxParser = new SAXParser();
     DocumentBuilder docBuilder = new MyDocumentBuilder();

     // Route the parser's SAX events to the whitespace-normalizing builder.
     saxParser.setContentHandler(docBuilder);
     saxParser.parse(fileNameToURL(args[0]));

     // Retrieve the document that the builder assembled and print it.
     XMLDocument doc = docBuilder.getDocument();
     doc.print(System.out);
   }
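
The listing above calls fileNameToURL without showing it. The following is a minimal sketch of what such a helper might look like, not the tutorial's actual implementation: the class name FileNameToUrlSketch and the method body are assumptions, built only on the standard java.io.File and java.net.URL classes.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical stand-in for the tutorial's fileNameToURL helper (assumption).
class FileNameToUrlSketch {

    // Converts a file name, relative or absolute, into a file: URL.
    static URL fileNameToURL(String fileName) throws MalformedURLException {
        // Resolve relative names against the current working directory,
        // then go through File.toURI() so that characters such as spaces
        // are escaped before the URL is created.
        return new File(fileName).getAbsoluteFile().toURI().toURL();
    }

    public static void main(String[] args) throws MalformedURLException {
        // "example.xml" is only a placeholder name for illustration.
        String name = args.length > 0 ? args[0] : "example.xml";
        System.out.println(fileNameToURL(name));
    }
}
```

Going through File.toURI() rather than concatenating "file:" with the name is one way to keep names that contain spaces or other reserved characters valid as URLs.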

MyDocumentBuilder extends DocumentBuilder, which provides the getDocument method called above. DocumentBuilder also provides the characters method, which receives notification of character data inside an element. The following code from MyDocumentBuilder overrides DocumentBuilder.characters to normalize whitespace. First, it replaces tabs, newlines, and carriage returns with spaces. Next, it calls String.trim to remove leading and trailing spaces. Finally, it collapses each remaining run of consecutive spaces into a single space and passes the result to the superclass method.

   /**
    * Receive notification of character data inside an element.
    * @param ch The characters.
    * @param start The start position in the character array.
    * @param length The number of characters to use from the
    * character array.
    * @exception org.xml.sax.SAXException Any SAX exception, possibly
    * wrapping another exception.
    * @see org.xml.sax.DocumentHandler#characters
    */
   public void characters(char ch[], int start, int length)
     throws SAXException
   {
     String str = new String(ch, start, length);

     // Replace: turn tabs, newlines, and carriage returns into spaces.
     str = str.replace('\t', ' ');
     str = str.replace('\n', ' ');
     str = str.replace('\r', ' ');

     // Trim: drop leading and trailing spaces.
     str = str.trim();

     // Collapse: copy characters in place, keeping only the first
     // space of each run. i is the write index, j the read index.
     char[] ca = str.toCharArray();
     int i, j;
     boolean seenWS = false;
     for (i = 0, j = 0; j < ca.length; j++)
     {
       if (ca[j] != ' ' || !seenWS)
       {
         ca[i++] = ca[j];
         seenWS = (ca[j] == ' ');
       }
     }

     // Hand the first i characters (the normalized text) to DocumentBuilder.
     super.characters(ca, 0, i);
   }
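
To try the replace, trim, and collapse steps without a parser, the same logic can be lifted into a standalone method. This is a sketch under stated assumptions: the class WhitespaceNormalizer and its normalize method are hypothetical names introduced for illustration and are not part of the tutorial's classes.

```java
// Hypothetical standalone version of the normalization steps (assumption).
class WhitespaceNormalizer {

    // Returns text with tabs, newlines, and carriage returns replaced by
    // spaces, the ends trimmed, and each run of spaces reduced to one.
    static String normalize(String text) {
        // Replace the three whitespace characters, then trim both ends.
        String str = text.replace('\t', ' ')
                         .replace('\n', ' ')
                         .replace('\r', ' ')
                         .trim();

        // Collapse runs of spaces in place: out is the write index,
        // in is the read index.
        char[] ca = str.toCharArray();
        int out = 0;
        boolean seenWS = false;
        for (int in = 0; in < ca.length; in++) {
            if (ca[in] != ' ' || !seenWS) {
                ca[out++] = ca[in];
                seenWS = (ca[in] == ' ');
            }
        }
        return new String(ca, 0, out);
    }

    public static void main(String[] args) {
        // Prints: [Hello, world]
        System.out.println("[" + normalize("  \tHello,\r\n   world \n") + "]");
    }
}
```

One point to keep in mind when applying this inside a SAX handler: a parser is allowed to deliver a single run of character data through several characters calls, so trimming each chunk separately may drop a space that happens to fall on a chunk boundary.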
