OpenWGA 7.8 - TMLScript reference
parse(htmlText)
On object | WGA.Html |
Usage | Parses HTML text and returns it as a DOM document object |
Description | The returned object is of type "org.dom4j.Document", the DOM document class of Dom4J, which is the preferred XML parser in OpenWGA. See the Dom4J API documentation for the operations available on it. The input HTML need be neither XHTML nor well-formed: NekoHTML is tolerant of common HTML errors and returns a corrected DOM. Element names in the returned DOM are uppercase, regardless of how the tags were written in the source. |
Parameters | htmlText (String): HTML code to be parsed |
Return value | Java object of type org.dom4j.Document |
Allowed in script types | |
Examples |
Parsing a very simple HTML document:
var dom = WGA.Html.parse("<html><head><title>The page</title></head><body><h2>The contents</h2> This is the contents</body></html>");
Retrieving the text-only contents of the body tag. Note that we use the uppercase name "BODY" although the tag was written in lowercase in the source:
var theBody = dom.selectSingleNode("//BODY").getText();
The result is "This is the contents". The text of the <h2> tag is not included because it belongs to a nested element, one level deeper than BODY.
Outputting the body tag and all its contents as XML:
var theBody = dom.selectSingleNode("//BODY").asXML();
Result: <BODY><H2>The contents</H2> This is the contents</BODY> |