XHTML Validation

I've added the first new and interesting feature to my weblog in a while: it now validates every article I post or edit for XHTML 1.0 Transitional compliance.

Here's a bit of the code that does it:

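import java.io.IOException;
import java.io.StringReader;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

/**
 * Validates that the input String is proper XHTML 1.0 Transitional.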
 * @author John M Flinchbaugh
 */
public class XmlValidator extends DefaultHandler {
    private String errorMessage = null;

    /**
     * validate input and return the message.  null means good.
     * @param in input xhtml string
     * @return parser message.  null means it passed.
     * @throws SAXException exception from SAXParser
     * @throws ParserConfigurationException exception from
     *      SAXParserFactory
     */
    public String validate(String in)
        throws SAXException, ParserConfigurationException {
        errorMessage = null;
        StringBuilder sb = new StringBuilder();

        // wrap article string in context HTML
        sb.append("<!DOCTYPE html"
            + " PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\""
            + " \"http://www.w3.org/TR/xhtml1/DTD/"
            + "xhtml1-transitional.dtd\">"
            + "<html><head><title>test</title></head><body><div>");
        sb.append(in);
        sb.append("</div></body></html>");
        InputSource source = new InputSource(
            new StringReader(sb.toString()));
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        SAXParser parser = factory.newSAXParser();
        try {
            parser.parse(source, this);
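        } catch (IOException ioEx) {
            // the StringReader itself won't fail, but a validating parser
            // fetches the DTD from w3.org, and that fetch can fail
            errorMessage = "could not load DTD: " + ioEx.getMessage();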
        } catch (SAXParseException parseEx) {
            errorMessage = parseEx.getMessage();
        }

        return errorMessage;
    }

    /**
     * error method from DefaultHandler.
     * saves the error message to be returned by the validate() method.
     * @param e parse exception
     */
    public void error(SAXParseException e) throws SAXParseException {
        errorMessage = e.getMessage();
    }
}

(It wouldn't even let me post this message until I cleaned up the embedded HTML in the code.)
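
If you want to play with the same wrap-and-parse idea without pulling the XHTML DTD from w3.org, here's a rough sketch. It is not the code my weblog runs: the class name FragmentCheckSketch, the check() method, and the tiny inline DTD (a div that may only contain p elements) are all made up for illustration. It shows the two ways a validating SAX parser reports trouble: validity errors come through the error() callback, while well-formedness errors arrive as a thrown SAXParseException.

```java
import java.io.IOException;
import java.io.StringReader;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class FragmentCheckSketch {
    // Hypothetical stand-in for the XHTML DTD, declared inline so that
    // nothing is fetched over the network: a div may hold only p elements.
    private static final String TINY_DTD =
        "<!DOCTYPE div ["
        + "<!ELEMENT div (p)*>"
        + "<!ELEMENT p (#PCDATA)>"
        + "]>";

    /** Returns null if the fragment is valid, else the first parser message. */
    public static String check(String fragment) {
        final String[] firstError = new String[1];
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                // validity errors are recoverable: keep the first one
                // and let the parser carry on
                if (firstError[0] == null) {
                    firstError[0] = e.getMessage();
                }
            }
        };
        // wrap the fragment in a root element, as validate() does with <div>
        String doc = TINY_DTD + "<div>" + fragment + "</div>";
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            SAXParser parser = factory.newSAXParser();
            parser.parse(new InputSource(new StringReader(doc)), handler);
        } catch (SAXParseException fatal) {
            // well-formedness errors are fatal and arrive as an exception
            if (firstError[0] == null) {
                firstError[0] = fatal.getMessage();
            }
        } catch (SAXException | ParserConfigurationException | IOException e) {
            // not expected here: the DTD is inline and the input is a String
            throw new IllegalStateException(e);
        }
        return firstError[0];
    }

    public static void main(String[] args) {
        System.out.println(check("<p>fine</p>"));        // valid fragment
        System.out.println(check("<b>undeclared</b>"));  // validity error
        System.out.println(check("<p>unclosed"));        // not well-formed
    }
}
```

Keeping the first message instead of the last is a choice made in this sketch; the error() method in my class above simply overwrites the message each time it is called.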


Filed Under: Web-Dev Java Blog-Code