OpenWGA 7.7 - TMLScript reference

Method:

parse(htmlText)

On object WGA.Html
Usage Parses HTML text and returns it as DOM document object
Description The returned object is of type "org.dom4j.Document", the DOM document object of Dom4J, which is the preferred XML parser in OpenWGA. See the Dom4J API documentation for the operations available on it.

The input HTML need not be XHTML or even well-formed, because NekoHTML is quite tolerant of common HTML errors and returns a corrected DOM.
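The tolerance described above can be inspected with a minimal sketch like the following. The helper name normalizeHtml and the sample input are assumptions made for illustration and are not part of the WGA API. Only WGA.Html.parse() and the Dom4J method asXML() are taken from this page.

```javascript
// Hypothetical helper, not part of the WGA API.
// Parses possibly malformed HTML and returns the corrected markup as a string.
function normalizeHtml(htmlText) {
    var dom = WGA.Html.parse(htmlText);   // org.dom4j.Document
    return String(dom.asXML());           // serialize the corrected DOM
}

// Usage inside TMLScript; the <p> and <b> tags are deliberately left unclosed:
// var fixed = normalizeHtml("<p>First paragraph<p>Second <b>paragraph");
```

The exact shape of the corrected markup depends on the parser, so it is best to inspect the returned string rather than rely on a specific result.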

Parameters htmlText (String):
HTML code to be parsed
Return value Java object of type org.dom4j.Document
Allowed in script types
  • WebTML pages and normal WebTML actions
  • Master actions
  • TMLScript tasks in jobs
  • Content type events
    portletevent
Examples Parsing a very simple HTML document:

var dom = WGA.Html.parse("<html><head><title>The page</title></head><body><h2>The contents</h2> This is the contents</body></html>");



Retrieving the text-only contents of the body tag. Note that we use the uppercase name "BODY" although the tag was written lowercase in the source:

var theBody = dom.selectSingleNode("//BODY").getText();


The result of this is "This is the contents". The text content of the <h2> tag is not included because it sits on a deeper tag level.
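If the text of nested tags is wanted as well, the Dom4J method getStringValue() may be an option, since it returns the XPath string value of a node, which includes the text of all descendants. The following is a minimal sketch. The helper name getDeepText is an assumption and not part of the WGA API.

```javascript
// Hypothetical helper, not part of the WGA API.
// dom:   an org.dom4j.Document as returned by WGA.Html.parse()
// xpath: an XPath expression selecting a single element
function getDeepText(dom, xpath) {
    var node = dom.selectSingleNode(xpath);
    if (node == null) {
        return null;   // nothing matched the XPath
    }
    // getStringValue() concatenates the text of the node and all of its
    // descendants, whereas getText() only returns the node's own direct text.
    return String(node.getStringValue());
}

// Usage inside TMLScript:
// var allText = getDeepText(dom, "//BODY");
```

For the example document, the value returned for "//BODY" should then also contain the heading text from the <h2> tag.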

Output the body tag and all its contents as XML:

var theBody = dom.selectSingleNode("//BODY").asXML();


Result:

<BODY><H2>The contents</H2> This is the contents</BODY>
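When more than one element is of interest, the Dom4J method selectNodes() returns a Java list of all matching nodes. The sketch below collects the direct text of each match. The helper name collectTexts is an assumption and not part of the WGA API.

```javascript
// Hypothetical helper, not part of the WGA API.
// dom:   an org.dom4j.Document as returned by WGA.Html.parse()
// xpath: an XPath expression that may match several elements
function collectTexts(dom, xpath) {
    var nodes = dom.selectNodes(xpath);   // java.util.List of org.dom4j.Node
    var texts = [];
    for (var i = 0; i < nodes.size(); i++) {
        // getText() returns only the direct text of each matched node
        texts.push(String(nodes.get(i).getText()));
    }
    return texts;
}

// Usage inside TMLScript (element names in uppercase, as in the examples above):
// var headings = collectTexts(dom, "//H2");
```

The list is accessed through size() and get() because it is a Java list and not a JavaScript array.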