OpenWGA 7.10 - Query languages reference

lucene

Advanced syntax

Wildcards

A search term can contain two types of wildcard characters:

A question mark "?" is a wildcard for one arbitrary character.
A star sign "*" is a wildcard for any number of arbitrary characters (including none).

Wildcards may NOT be used as the first character of a search clause.
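To make the two wildcard rules concrete, here is a minimal Python sketch that translates a wildcard term into a regular expression and rejects a leading wildcard. The helper names `wildcard_to_regex` and `matches` are hypothetical and not part of OpenWGA or lucene. The sketch only models the pattern semantics: its matching is case-sensitive and ignores any analysis (such as lowercasing) that lucene may apply to the indexed text.

```python
import re

def wildcard_to_regex(term: str) -> str:
    """Translate a lucene-style wildcard term into a regular expression.

    "?" becomes exactly one arbitrary character, "*" becomes any number of
    arbitrary characters (including none); everything else matches literally.
    """
    if term[:1] in ("?", "*"):
        # Mirrors the documented rule: no wildcard as the first character.
        raise ValueError(f"wildcard not allowed as first character: {term!r}")
    parts = []
    for ch in term:
        if ch == "?":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)

def matches(term: str, word: str) -> bool:
    """True if the whole word matches the wildcard term (hypothetical helper)."""
    return re.fullmatch(wildcard_to_regex(term), word, re.DOTALL) is not None
```

For example, `matches("con*", "content")` and `matches("con*", "con")` are both true, `matches("te?t", "tet")` is false because "?" needs exactly one character, and `wildcard_to_regex("*ment")` raises a `ValueError`.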

Space characters in search terms

When searching for terms that contain space characters, it is not enough to just specify the term. Lucene normally uses the space character to separate individual search clauses, so it treats everything after the space as a separate clause.

For example, the following query searches for the term "Content" in item "body", but then for the terms "Management", "with" and "WGA" in all other items (plus the metas title and description):

body:Content Management with WGA

To search for a term with space characters exactly as it is entered, you have to enclose it in double quotes. This makes lucene recognize it as one single term:

body:"Content Management with WGA"
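When queries are assembled programmatically, the quoting rule can be wrapped in a small helper. The following Python sketch uses a hypothetical function `phrase_clause`; it assumes the classic lucene query syntax, in which a backslash escapes special characters, so any double quote or backslash inside the phrase is escaped before the phrase is enclosed in quotes.

```python
def phrase_clause(field: str, text: str) -> str:
    """Build a clause that searches `text` as one single term in `field`.

    Backslashes are escaped first, then double quotes, so that the phrase
    cannot terminate the surrounding quotes early.
    """
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'
```

`phrase_clause("body", "Content Management with WGA")` returns the quoted query shown above, whereas simply concatenating `"body:" + text` would reproduce the unquoted query that lucene splits into four clauses.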

Searching the contents of file attachments


Optionally, lucene can also index files attached to content documents. This is disabled in the default configuration and requires special "analyzer" modules to interpret the contents of the file types used. These modules are not part of the OpenWGA standard distribution; analyzer modules for the most frequently used file types are available in the OpenWGA Enterprise Edition.

There is a special "item name" in lucene named "allattachments" for explicitly searching the contents of file attachments. So if you also want to search in file attachments, you can add an item-specific clause for it to your query:

"Content Management" AND allattachments:"Content Management"
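The following Python sketch builds the combined query from a single phrase, using a hypothetical helper `with_attachments`. It defaults to `AND`, as in the example above; in lucene's query syntax `AND` requires both clauses to match, so the optional `operator` parameter (also an assumption of this sketch) can be set to `OR` if a match in either the content or its attachments should be enough.

```python
def with_attachments(phrase: str, operator: str = "AND") -> str:
    """Combine a phrase search with the same phrase on "allattachments"."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    # Escape backslashes and double quotes, then enclose as one single term.
    escaped = phrase.replace("\\", "\\\\").replace('"', '\\"')
    quoted = f'"{escaped}"'
    return f"{quoted} {operator} allattachments:{quoted}"
```

`with_attachments("Content Management")` yields exactly the query shown above.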

Searching content relations

Content relations are also indexed in lucene. Since lucene can only index text, the index does not provide direct links to the target content. The index name of a normal relation is $rel_relationname, and its index field contains the struct key and language of the target content, separated by a dot. For example: "4028fbe5125651ea01125656704d000f.en".

Relation groups are indexed with name $relgroup_groupname and contain the same data.


Relations are indexed as keywords and are also sortable.
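A relation clause can be assembled from the index name and the "structkey.language" value. In this Python sketch, `relation_clause` and `parse_relation_value` are hypothetical helpers, and the relation names used with them are made up. The sketch quotes the value so the dot stays inside one term; whether quoting is required depends on how the query is analyzed, which is an assumption here.

```python
def relation_clause(relation: str, struct_key: str, language: str,
                    group: bool = False) -> str:
    """Build a clause matching documents whose relation points to the target.

    Normal relations use the index name "$rel_<name>", relation groups use
    "$relgroup_<name>"; both store "<structkey>.<language>".
    """
    prefix = "$relgroup_" if group else "$rel_"
    return f'{prefix}{relation}:"{struct_key}.{language}"'

def parse_relation_value(value: str) -> tuple:
    """Split an indexed relation value back into (struct key, language).

    Splits at the last dot, since the language code is the final part.
    """
    struct_key, language = value.rsplit(".", 1)
    return struct_key, language
```

With a hypothetical relation named "parent", `relation_clause("parent", "4028fbe5125651ea01125656704d000f", "en")` produces `$rel_parent:"4028fbe5125651ea01125656704d000f.en"`.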

Searching for date and number values

As lucene is a full-text indexing engine, it treats all values as text, including dates and numbers, which are converted to a standard text format. This must be taken into account when searching for these value types.

Date values

Date values are indexed as text in the format "yyyyMMddHHmmss". The characters stand for (y)ear, (M)onth, (d)ay in month, (H)our, (m)inute and (s)econd. If a date contains no time information, the time values are indexed as 0, so September 1, 2005 is indexed as "20050901000000". If the time does not matter, you can use wildcards when searching for dates. This query finds documents that were modified on that day, regardless of the time:

MODIFIED:20050901*


To search date items this way, they must be configured in the WGA Admin Client to be indexed as type "KEYWORD".
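The "yyyyMMddHHmmss" format corresponds to the Python `strftime` pattern `%Y%m%d%H%M%S`. This sketch, with hypothetical helpers `index_date` and `same_day_clause`, reproduces the indexed text and the same-day wildcard query. It does not model time zones, since the source does not say which time zone the index uses.

```python
from datetime import date, datetime

INDEX_FORMAT = "%Y%m%d%H%M%S"  # equivalent of "yyyyMMddHHmmss"

def index_date(value) -> str:
    """Render a date or datetime the way it is indexed as text."""
    # datetime is a subclass of date, so it must be checked first.
    if isinstance(value, datetime):
        return value.strftime(INDEX_FORMAT)
    # A plain date has no time information: hours, minutes, seconds become 0.
    return datetime(value.year, value.month, value.day).strftime(INDEX_FORMAT)

def same_day_clause(field: str, day: date) -> str:
    """Match any time on the given day by putting a wildcard after the date part."""
    return f"{field}:{day.strftime('%Y%m%d')}*"
```

`index_date(date(2005, 9, 1))` gives "20050901000000", and `same_day_clause("MODIFIED", date(2005, 9, 1))` gives the query `MODIFIED:20050901*` shown above.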

Number values

Numbers are simply converted to text, using the dot "." as the decimal separator where needed and no grouping separator.

VERSION:5

Again, items with number values must be configured to be indexed as type "KEYWORD" if they are meant to be queried this way.
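Because a keyword query only matches the exact indexed text, the number in the query must be written the same way it was indexed. This Python sketch, using a hypothetical helper `number_clause`, writes numbers with a dot as decimal separator, no grouping separator and no exponent notation. It additionally assumes that whole numbers are indexed without a decimal part (5, not 5.0), which the source does not state; check the actual indexed form if that matters.

```python
from decimal import Decimal

def number_clause(field: str, value) -> str:
    """Render a number as plain text and build a keyword clause for it."""
    d = Decimal(str(value))
    if d == d.to_integral_value():
        # Assumption: whole numbers carry no decimal part in the index.
        text = str(int(d))
    else:
        # normalize() drops trailing zeros; format "f" avoids exponent notation.
        text = format(d.normalize(), "f")
    return f"{field}:{text}"
```

`number_clause("VERSION", 5)` yields the query `VERSION:5` shown above; for a hypothetical item "PRICE", `number_clause("PRICE", 1234.5)` yields `PRICE:1234.5` rather than a locale-formatted "1,234.5".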

Specifying ranges

The following syntax specifies a range of values that a field may have:

<fieldname>:[<start> TO <end>] or
<fieldname>:{<start> TO <end>}


The difference between these two forms is that the square-bracket syntax treats the start and end values as inclusive (documents whose value is exactly equal to <start> or <end> are also found), while the curly-bracket syntax treats them as exclusive (a document's value must be greater than <start> and less than <end> for the document to be found).

The range syntax is most useful when searching for date ranges. The following query finds documents that were modified between August 15 and September 1, 2005, inclusive:

MODIFIED:[20050815000000 TO 20050901235959]
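Since all values are text, a range on a text field is compared character by character. This works for dates because the "yyyyMMddHHmmss" format is fixed-width and zero-padded, so text order equals chronological order. It does not work for unpadded numbers, where "10" sorts before "9". The Python sketch below uses hypothetical helpers `range_clause` and `date_range_clause` to build both bracket forms and the whole-day date range shown above.

```python
from datetime import date

def range_clause(field: str, start: str, end: str, inclusive: bool = True) -> str:
    """Build a range clause: [..] includes the bounds, {..} excludes them."""
    open_, close = ("[", "]") if inclusive else ("{", "}")
    return f"{field}:{open_}{start} TO {end}{close}"

def date_range_clause(field: str, first_day: date, last_day: date) -> str:
    """Inclusive range from first_day 00:00:00 to last_day 23:59:59."""
    start = first_day.strftime("%Y%m%d") + "000000"
    end = last_day.strftime("%Y%m%d") + "235959"
    return range_clause(field, start, end)

# Text comparison: fixed-width dates order correctly, unpadded numbers do not.
print("20050815000000" < "20050901235959")  # True
print(sorted(["5", "10", "9"]))              # ['10', '5', '9']
```

`date_range_clause("MODIFIED", date(2005, 8, 15), date(2005, 9, 1))` reproduces the query above.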


Searching multiple databases

As stated, lucene can search multiple content stores at once. To specify which databases to include in the search, use the following values for the "db" attribute of <tml:query>:
Value of attribute "db"    Description
dbkey [, dbkey, ...]       Comma-separated list of databases to be searched
*                          Search all lucene-indexed databases in the same domain as the context database
**                         Search all lucene-indexed databases

The default value of attribute "db" is the dbkey of the current context database, so if you only want to search this database, you can omit the attribute.
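The table's rules can be modelled as a small resolution function. This Python sketch is not OpenWGA's implementation: `Database` and `resolve_db_attribute` are hypothetical, the database keys and domains in the usage note are made up, and the sketch assumes that an explicit dbkey list is used as given, without checking whether those databases are lucene-indexed.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Database:
    key: str
    domain: str
    lucene_indexed: bool

def resolve_db_attribute(db: Optional[str], context: Database,
                         databases: List[Database]) -> List[str]:
    """Return the dbkeys searched for a given value of the "db" attribute."""
    if db is None:
        # Attribute omitted: only the current context database.
        return [context.key]
    value = db.strip()
    if value == "**":
        # All lucene-indexed databases, regardless of domain.
        return [d.key for d in databases if d.lucene_indexed]
    if value == "*":
        # Lucene-indexed databases in the context database's domain.
        return [d.key for d in databases
                if d.lucene_indexed and d.domain == context.domain]
    # Otherwise a comma-separated list of dbkeys.
    return [k.strip() for k in value.split(",") if k.strip()]
```

With databases "intranet" and "docs" (domain "default", indexed), "archive" (domain "default", not indexed) and "shop" (domain "commerce", indexed), and "intranet" as context, `"*"` resolves to intranet and docs, while `"**"` adds shop.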