Marc, himself, his blogs, and you reading them.
I didn't, did I? But I should have, a long time ago already.
So here is my shameless plug concerning the CyberNeko tools from Andy Clark. Utterly great stuff.
To get my RadioUserLand-2-MT conversion done, I need to parse the Radio HTML files I exported, so I went with cyberneko-html. I'm quite a dedicated JAXP and SAX adept, so finding out that I can use the CyberNeko HTML parser just like any other JAXP/SAX parser makes me a happy man. Just slide the jar into your classpath ahead of the Xerces jar, and the magic of META-INF/services does the rest. Only now you get a SAXParser that cleans up the parsed HTML by balancing tags and the like, so it comes through as if it were well-formed XML. Yummy!
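Because the parser is reached through JAXP, the code that consumes the events is ordinary SAX and does not care which parser produced them. A minimal sketch, assuming Java 7 or later: the hypothetical LinkCollector handler below (the class name and its collect helper are made up for illustration, not part of CyberNeko) gathers the href of every a element. On a plain JDK it uses the built-in XML parser, so the input must be well-formed XHTML; with the nekohtml jar placed ahead of the Xerces jar, the same code should also accept messier HTML. Names are compared case-insensitively because NekoHTML, by default, reports element names in upper case.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the href of every <a> element it sees.
// It relies only on SAX callbacks, so any JAXP-provided parser can drive it.
public class LinkCollector extends DefaultHandler {
    private final List<String> links = new ArrayList<>();

    public List<String> getLinks() {
        return links;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Without namespace awareness the name arrives in qName; fall back to localName.
        String name = (qName != null && !qName.isEmpty()) ? qName : localName;
        // Case-insensitive match: NekoHTML upper-cases element names by default.
        if ("a".equalsIgnoreCase(name)) {
            for (int i = 0; i < atts.getLength(); i++) {
                if ("href".equalsIgnoreCase(atts.getQName(i))) {
                    links.add(atts.getValue(i));
                }
            }
        }
    }

    // Same JAXP setup as in the post, followed by an actual parse call.
    public static List<String> collect(String html) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        SAXParser sp = spf.newSAXParser();
        XMLReader reader = sp.getXMLReader();
        LinkCollector handler = new LinkCollector();
        reader.setContentHandler(handler);
        reader.setErrorHandler(handler);
        reader.parse(new InputSource(new StringReader(html)));
        return handler.getLinks();
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><p><a href=\"one.html\">1</a> "
                + "<a href=\"two.html\">2</a></p></body></html>";
        System.out.println(collect(page)); // prints [one.html, two.html]
    }
}
```

DefaultHandler's fatalError rethrows, so with the stock JDK parser a malformed page ends in a SAXParseException; that is exactly the gap the CyberNeko parser fills.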
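My own setup code is nothing but plain JAXP:

    import javax.xml.parsers.SAXParser;
    import javax.xml.parsers.SAXParserFactory;

    // With the nekohtml jar ahead of the Xerces jar on the classpath,
    // this ordinary JAXP lookup ends up parsing HTML through CyberNeko.
    SAXParserFactory spf = SAXParserFactory.newInstance();
    SAXParser sp = spf.newSAXParser();
    this.htmlReader = sp.getXMLReader();
    this.htmlReader.setContentHandler(this.defaultHandler);
    this.htmlReader.setErrorHandler(this.defaultHandler);

Posted by mpo at 09:54 AM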

