OpenWGA 5.4 - Query languages reference



Lucene is a library for executing fulltext queries that is embedded in OpenWGA. It is available for all WGA Content Store types to index and query all of the contained content documents and their data. You may use lucene as a site-internal search engine to perform fulltext queries for special terms. You can also use lucene like a database query language to run more specific queries on special items and metadata fields and have the results sorted the way you want.

As the only query language in WGA, it also allows queries on multiple WGA Content Stores at once.

Querying the lucene fulltext index must be enabled for individual content stores in administration. There you can also configure how lucene treats items and metadata fields in the index, modifying importance, sorting capability and indexing type. If a lucene query does not return the results you expect, chances are that the behaviour of the lucene index can be adapted to your needs in the administrative setup.

One general drawback of lucene is the fact that the index is updated asynchronously after each data change. Because of that the index may not include the latest data additions and modifications of the content store. If you need your query to return realtime results you should choose another query language like HQL.


The following document demonstrates the most commonly used search syntaxes for lucene. For more in-depth documentation you can use the official lucene documentation.

A lucene query consists of a number of individual search clauses. A search clause may be a simple term or a specific search for terms in a field. Individual clauses are separated by space characters. Therefore the following query consists of two clauses:
<tml:query type="lucene">
    Content Management
</tml:query>

It searches for documents that have both words "Content" and "Management" somewhere in their item data, or in textual metadata fields like title and description.
A clause that searches for a term in a specific field consists of field name and term separated by a colon, i.e. <fieldname>:<term>.

The field names are interpreted as content items when they are lowercase, or as metadata fields when they are uppercase, for example:
body:WGA TITLE:Google

This searches for documents whose item "body" contains the term "WGA" and whose metadata field "title" contains the term "Google". A list of valid metadata field names is at the end of this document.
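The naming convention for field clauses can be sketched as a small helper. This is an illustrative Python sketch and not part of OpenWGA; the function name field_clause and its parameters are assumptions made for this example.

```python
def field_clause(name: str, term: str, metadata: bool = False) -> str:
    """Build a '<fieldname>:<term>' clause.

    Item names are written lowercase and metadata field names uppercase,
    following the convention described in the text.
    """
    field = name.upper() if metadata else name.lower()
    return f"{field}:{term}"


# Clauses are separated by space characters, as in the example query.
query = " ".join([
    field_clause("body", "WGA"),
    field_clause("title", "Google", metadata=True),
])
print(query)  # body:WGA TITLE:Google
```

The resulting string is what would be placed inside the <tml:query type="lucene"> tag.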


Sorting

The default sort order of lucene results is by "relevance", i.e. those documents with the "best matches" are displayed first. What distinguishes a "better match" from a "worse match" depends on the field in which the terms are found. For example, a match in the title has a higher relevance than a match in any item. Also, the configuration of the lucene index in administration can "boost" special items so that matches in them are regarded as "better" than matches in other items. For an in-depth treatment of the mathematics behind relevance determination you can read the lucene documentation about Scoring.

In any case, you can retrieve a numerical representation of the relevance of each result document via the metadata field "searchscore", which returns a fractional value between 1 and 0.

Alternatively, you can sort the lucene results by item and metadata values, provided that the field used was indexed as sortable (which again can be configured in administration). Use the <tml:query> attribute options to specify the desired sorting:
<tml:query type="lucene" options="sort: myitem (asc)"> terms...</tml:query>

The sort expression has the following syntax as seen in the example above:
  • The prefix "sort:"
  • The name of the field that should determine the sorting order, written as it would be specified in the query itself (so items lowercase, metadata fields uppercase)
  • The suffix "(asc)" or "(desc)", determining whether you want ascending or descending sort order

The metadata table at the end of this document describes which metadata fields are sortable.
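The three parts of the sort expression can be assembled as follows. This is a minimal Python sketch for illustration only; the helper name sort_option is hypothetical, and the string it returns is the value you would place in the options attribute of <tml:query>.

```python
def sort_option(field: str, descending: bool = False) -> str:
    """Build a sort expression: prefix 'sort:', field name, '(asc)' or '(desc)'.

    The field name must already follow the query convention
    (items lowercase, metadata fields uppercase).
    """
    direction = "desc" if descending else "asc"
    return f"sort: {field} ({direction})"


print(sort_option("myitem"))                     # sort: myitem (asc)
print(sort_option("MODIFIED", descending=True))  # sort: MODIFIED (desc)
```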

Sorting based on more than one field at once is currently not possible. If you need something like this, you might want to fall back on WebTML's sorting capabilities.


Search clauses can be combined with a variety of operators. An operator is placed either before a clause or between clauses, depending on its nature. The complete syntax of a search query including optional operators would therefore be:
<preceding-operator><fieldname>:<term>  <between operator> <preceding-operator><fieldname>:<term>  ...

But as we have seen before, most queries have neither a preceding nor a "between" operator. In that case lucene implicitly uses default operators. The default preceding operator is "+" (meaning that the clause is positive). The default operator between clauses is "AND" (meaning that result documents must match both clauses).

The following operators are available:
  • AND, && (position: between two clauses): Combines two clauses so that all documents are found that match both clauses. This is a default operator of lucene, which is implicitly used if multiple clauses are just separated by space characters without an explicit operator.
  • OR, || (position: between two clauses): Combines two clauses so that all documents are found that match either one of them or both clauses.
  • + (position: directly preceding the clause): Marks the clause as "positive", i.e. all documents must match the clause. This is a default operator of lucene, which is implicitly used when clauses have no preceding operator.
  • NOT, -, ! (position: directly preceding the clause in case of "-" and "!"; preceding the clause but separated from it by a space character in case of NOT): Marks the clause as "negative", i.e. documents must not match the clause. A query may not consist solely of negative clauses.
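To make the operator positions concrete, here is a small Python sketch that assembles queries with explicit operators. It is only an illustration of the syntax described above; the helper names negative and combine are assumptions and do not exist in OpenWGA or lucene.

```python
NEGATIVE_PREFIXES = ("-", "!", "NOT ")


def negative(clause: str) -> str:
    """Mark a clause as negative using the preceding operator '-'."""
    return "-" + clause


def combine(clauses: list, operator: str = "AND") -> str:
    """Join clauses with an explicit operator placed between them."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    if not clauses:
        raise ValueError("at least one clause is required")
    # A query may not consist solely of negative clauses.
    if all(c.startswith(NEGATIVE_PREFIXES) for c in clauses):
        raise ValueError("a query needs at least one positive clause")
    return f" {operator} ".join(clauses)


# Explicit form of what the text describes as the implicit defaults
# for the query "Content Management".
print(combine(["+Content", "+Management"]))           # +Content AND +Management
print(combine(["body:WGA", negative("TITLE:Google")]))  # body:WGA AND -TITLE:Google
```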

Advanced syntax


A search term can contain two types of wildcard characters:

  • A question mark "?" is a wildcard for exactly one arbitrary character.
  • An asterisk "*" is a wildcard for any number of arbitrary characters (including none).

Wildcards may NOT be used as the first character of a search term.
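The meaning of the two wildcards can be illustrated by translating a search term into a regular expression. This Python sketch only demonstrates the matching semantics described above; it does not reflect how lucene evaluates wildcards internally, and the function names are assumptions.

```python
import re


def wildcard_to_regex(term: str) -> re.Pattern:
    """Translate a wildcard term into an equivalent regular expression."""
    if term[:1] in ("?", "*"):
        raise ValueError("a wildcard may not be the first character of a term")
    parts = []
    for ch in term:
        if ch == "?":
            parts.append(".")    # exactly one arbitrary character
        elif ch == "*":
            parts.append(".*")   # any number of characters, including none
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def matches(term: str, word: str) -> bool:
    """Return True if the whole word matches the wildcard term."""
    return wildcard_to_regex(term).fullmatch(word) is not None


print(matches("te?t", "test"))      # True
print(matches("te?t", "tet"))       # False
print(matches("test*", "test"))     # True
print(matches("test*", "testing"))  # True
```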

Space characters in search terms

When searching for terms that contain space characters, it does not work to just specify the term. As lucene normally uses the space character to separate individual search clauses, it will take everything after the space character as a separate clause.

For example, the following query will search for the term "Content" in item "body", but then for the terms "Management", "with" and "WGA" in all items (plus the metadata fields title and description):

body:Content Management with WGA

To search for a term with space characters exactly the way that it is entered, you have to enclose it in double quotes. This will make lucene recognize it as one single term:

body:"Content Management with WGA"
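When queries are assembled from user input, the quoting rule can be applied programmatically. The following Python sketch is a hedged illustration; the helper name quote_term is hypothetical, and terms that themselves contain double quotes are deliberately not handled.

```python
def quote_term(term: str) -> str:
    """Enclose a term in double quotes if it contains space characters.

    Without the quotes, everything after the first space would be
    treated as a separate search clause.
    """
    return f'"{term}"' if " " in term else term


print("body:" + quote_term("Content Management with WGA"))
# body:"Content Management with WGA"
print("body:" + quote_term("WGA"))
# body:WGA
```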

Searching the contents of file attachments

Optionally, lucene can also index files attached to content documents. This is disabled in the default configuration and needs special "analyzer" modules to interpret the contents of the file types used, which are not part of the OpenWGA standard distribution. Analyzer modules for the most frequently used file types are available in the OpenWGA Enterprise Edition.

There is a special "item name" in lucene, "allattachments", for explicitly searching the contents of file attachments. So if you also want to search in file attachments, you may add an item-specific clause for it to your query:

"Content Management" AND allattachments:"Content Management"

Searching content relations

Content relations are also indexed in lucene but do not provide direct links to the target, as lucene can only index text. The index name of a relation is $rel_relationname, and its index field contains the struct key and language of the target content, separated by a dot. For example: "4028fbe5125651ea01125656704d000f.en".

Relations are indexed as keywords and are also sortable.

Searching for date and number values

As lucene is a fulltext indexing engine it treats all values as text, including dates and numbers which are converted to a standard text format. This must be considered when searching for those value types.

Date values

Date values are indexed as text in the format "yyyyMMddHHmmss". The characters mean (y)ear, (M)onth, (d)ay in month, (H)our, (m)inute and (s)econd. If a date contains no time information, the time values are indexed as 0, so 1 September 2005 is indexed as "20050901000000". You can use wildcards when searching for dates if the time does not matter. For example, the clause MODIFIED:20050901* searches for documents that were modified on that day, no matter at what time.

If you want to search for date items that way, they must be configured in the WGA Admin Client to be indexed as type "KEYWORD".
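The text representation of dates can be reproduced outside of OpenWGA, for example to build query strings in a test script. The Python sketch below assumes that strftime's "%Y%m%d%H%M%S" corresponds to the pattern "yyyyMMddHHmmss"; the function names index_date and day_clause are hypothetical.

```python
from datetime import date, datetime

# strftime equivalent of the index format "yyyyMMddHHmmss"
INDEX_DATE_FORMAT = "%Y%m%d%H%M%S"


def index_date(value) -> str:
    """Return the indexed text form of a date or datetime.

    A plain date carries no time information, so its time part is 0.
    """
    if not isinstance(value, datetime):
        value = datetime(value.year, value.month, value.day)
    return value.strftime(INDEX_DATE_FORMAT)


def day_clause(field: str, day: date) -> str:
    """Build a wildcard clause matching any time on the given day."""
    return f"{field}:{day.strftime('%Y%m%d')}*"


print(index_date(date(2005, 9, 1)))              # 20050901000000
print(day_clause("MODIFIED", date(2005, 9, 1)))  # MODIFIED:20050901*
```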

Number values

Numbers are simply converted to text, using the dot "." as decimal separator where needed and no grouping separator.

Again, items with number values must be configured to be indexed as type "KEYWORD" if they are meant to be queried that way.

Specifying ranges

In the following syntax it is possible to specify a range of values that a field may have:
<fieldname>:[<start> TO <end>] or
<fieldname>:{<start> TO <end>}

The difference between these two syntaxes is that the square bracket syntax treats start and end values as inclusive (documents whose values equal <start> or <end> are also found), while the curly bracket syntax treats them as exclusive (the values must be higher than <start> and lower than <end> for a document to be found).

The range syntax is most useful when searching for date ranges. The following search finds documents that were modified between 15 August and 1 September 2005 inclusive:
MODIFIED:[20050815000000 TO 20050901235959]
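A range clause like this can be generated from date values. The following Python sketch is illustrative only; range_clause is a hypothetical helper. It also shows why ranges work on text values: fixed-width "yyyyMMddHHmmss" strings sort lexicographically in chronological order.

```python
from datetime import datetime

INDEX_DATE_FORMAT = "%Y%m%d%H%M%S"  # corresponds to "yyyyMMddHHmmss"


def range_clause(field: str, start: datetime, end: datetime,
                 inclusive: bool = True) -> str:
    """Build '<fieldname>:[<start> TO <end>]' or the exclusive '{...}' form."""
    opening, closing = ("[", "]") if inclusive else ("{", "}")
    start_text = start.strftime(INDEX_DATE_FORMAT)
    end_text = end.strftime(INDEX_DATE_FORMAT)
    return f"{field}:{opening}{start_text} TO {end_text}{closing}"


clause = range_clause(
    "MODIFIED",
    datetime(2005, 8, 15),
    datetime(2005, 9, 1, 23, 59, 59),
)
print(clause)  # MODIFIED:[20050815000000 TO 20050901235959]

# Text comparison of the indexed values follows chronological order.
print("20050815000000" < "20050901235959")  # True
```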

Searching multiple databases

As stated, lucene is able to search multiple content stores at once. To specify which databases to include in the search, you can use the following values on the <tml:query> attribute db:
Value for attribute "db" | Description
dbkey [, dbkey, ...] | Comma-separated list of databases to be searched
* | Search all lucene-indexed databases in the same domain as the context database
** | Search all lucene-indexed databases
The default value for attribute db is the dbkey of the current context database. So if you just want to search this database, you may omit this attribute.

Further functionality

Search score

As stated above, lucene provides a "search score" for each found content document, providing information about the relevance of the document for the search query. However, it is only available when the query result is sorted by relevance (which is the default when no other sort order is declared).

It is retrievable as metadata field "SEARCHSCORE" on each result document and is a numeric fraction value ranging from 1 (perfect match) to nearly 0 (weak match).

The relevance of a document for a search query is calculated based on many parameters:
  • Count of found terms
  • Items/Metadata fields where the terms were found and their importance
  • Position of the terms inside the field data
  • Configured "boost" value for the field (settable in WGA Admin Client under the "Fulltext configuration" of the content store)
Further information about this topic is found in the chapter "Sorting".


Highlighting

The highlighting feature allows you to highlight the searched terms in the data of found documents. To enable this, just set the attribute highlight of the <tml:query> tag to "true". Also, when outputting the data of found documents via <tml:item>, set the attribute highlight of this tag to "true" to enable automatic highlighting.

The default highlighting simply marks the terms bold. You can change this by using the <tml:item> attributes highlightprefix and highlightsuffix to explicitly specify the HTML code that is output right before and after the term.

The following example highlights terms by wrapping them in a HTML span of CSS class "highlight":
<tml:item name="body" highlight="true"
highlightprefix="<span class='highlight'>" highlightsuffix="</span>"/>
This feature does not help you find the fields where the matches occurred. You need to know the item that is to be output via <tml:item> and enable highlighting there. Therefore this feature is most useful with documents whose main data is in just one "body" item that can always be output.

Best fragments

The feature "best fragments" automatically detects those text fragments in an item that matched the query terms and is able to return them. This is useful if the text of a data item is normally too long to be output in full on a search result page.

You can retrieve these fragments by the TMLScript method this.bestFragments(), which returns the fragments for a specific item on the current result document. It always uses the fragments data for the last lucene search on the current user session. So executing another lucene query will delete the fragments data of a previous search.

Including virtual documents

Virtual documents by default are excluded from the result list of lucene as their data is not the one shown when the virtual document is displayed. You may however choose to include them by specifying the native query option "includeVirtualContent":
<tml:query type="lucene" options="includeVirtualContent" ... />

Metadata fields in lucene index

This table shows all metadata fields that are contained in the lucene index. There are different indexing types which allow different usages:
  • keyword: The field value is stored unmodified and not analyzed, and therefore can (only) be found when querying for the exact and complete contents of the field.
  • analyzed: The field value is analyzed and tokenized. It can be found querying for any single word token.
  • fulltext: Like "analyzed", but the field can also be found when using field-unspecific search clauses
  • date: Like "keyword". Only for dates, that will be indexed in the text form "yyyyMMddHHmmss". See chapter "date values" for details.
Metadata field | Description | Index type | Sortable
AREA | Name of the area containing the content | keyword | Yes
AUTHOR | Author of the content | analyzed | Yes
COAUTHORS | Additional authors of the content (only OpenWGA content stores of version 5 or higher) | fulltext | No
CONTENTCLASS | Name of the content class of the content | keyword | Yes
CONTENTTYPE | Name of the content type of the page | keyword | Yes
CREATED | Date and time of creation | date | Yes
DBKEY | Key of the containing database | keyword | Yes
DESCRIPTION | Short description of the content | fulltext | Yes
DOCNAME, NAME, UNIQUENAME | Unique name of the content | keyword | Yes
HIDDENINNAV | Is "true" if the document is to be hidden in navigators, "false" otherwise | keyword | Yes
HIDDENINSEARCH | Is "true" if the document is to be hidden in query results, "false" otherwise | keyword | Yes
HIDDENINSITEMAP | Is "true" if the document is to be hidden in sitemaps, "false" otherwise | keyword | Yes
KEY | The complete content key of syntax "structkey.language.version" | keyword | Yes
KEYWORDS | Keywords for this content to be used by internet search engines | keyword | No
LANGUAGE | Code of the language of this content, for example "en" or "de" | keyword | Yes
LASTCLIENT | Type of the last OpenWGA authoring client that edited this content | keyword | Yes
LASTMODIFIED, MODIFIED | Date and time of last modification | date | Yes
OWNER | The owner of the content (only OpenWGA content stores of version 5 or higher) | fulltext | Yes
PAGEPUBLISHED | The published date of the first ever published version of this content (only OpenWGA content stores of version 5 or higher) | keyword | Yes
PARENT | Struct key of the parent page | keyword | No
PATH | Struct keys of all pages up the page hierarchy to the root page. Querying for the struct key of a specific page on PATH will return all contents that are in the hierarchy below that page | keyword | No
PUBLISHED | The published date of the content (only OpenWGA content stores of version 5 or higher) | keyword | Yes
STATUS | Workflow state of the content: "w" - Working copy; "g" - In approval process; "p" - Published; "a" - Archived | keyword | Yes
STRUCTENTRY, STRUCTKEY | Key of the struct entry belonging to this content | keyword | Yes
TITLE | Title of the content | fulltext | Yes
VALIDFROM | Optional date and time before which the document should be invisible | date | Yes
VALIDTO | Optional date and time after which the document should be invisible | date | Yes
VERSION | Version number of this content | keyword | Yes
VIRTUALLINK | If this document is a virtual document, points to its target. The content depends on the type of virtual link (which is indexed as VIRTUALLINKTYPE): "int" - Content key of the target document; "exturl" - URL of an external website; "file" - Syntax: <documentkey>/<filename>, where <documentkey> is the name of a file container or the key of a content document; "intfile" - Name of a file attachment on this content | keyword | Yes
VIRTUALLINKTYPE | Type of virtual document: "int" - Targets a content document in this database; "exturl" - Targets some custom URL; "file" - Targets a file attachment on a file container or content document in this database; "intfile" - Targets a file attachment on this content document | keyword | Yes
VISIBLE | General visibility flag holding "true" or "false" | keyword | Yes