OpenWGA 6.3 - Query languages reference


hql / fullhql

Query types "hql" and "fullhql"


Queries against OpenWGA Content Stores on relational database systems normally are done using HQL, the integrated query language of  Hibernate, a software library that OpenWGA uses internally to transform the relational database data into document objects and vice versa. You can find detail information about general HQL in the hibernate dokumentation.

OpenWGA simplifies the usage of HQL so that you only have to specify filter criterias by which you want yor result contents to be selected and optionally order statements. The following query for example searches for a content document with the unique name "home":

<tml:query type="hql">content.uniquename = 'home'</tml:query>


In special cases you will want to formulate a "complete HQL query" which also specifies the objects that are to be returned. Therefor you are able to declare your query of type "fullhql", which expects complete HQL queries, like the following.

<tml:query type="fullhql">select content from Content as content where content.uniquename = 'home'</tml:query>

Both queries seem to do the same, selecting a document of unique name "home". But with type "hql" the query is also internally enhanced according to the current WebTML environment, so it only selects released documents which are visible. In "fullhql" the query is executed unmodified and generally returns documents that normally are invisible. The WebTML output via <tml:foreach> normally filters out documents that are invisible, but when you use other functionality to put out "fullhql" query results it is important to note that it may contain invisible contents. Readability however is enforced by both query types.


General syntax of HQL

HQL is an "object oriented" query language as you specify document objects and their properties and decide by what values you want to filter the documents in your result or by which properties you want them to be ordered. Those properties may be text, number or date values or even further objects with their own set of properties. But the base of a query is always the object "content", representing a content document.

Many other things in HQL resemble closely SQL, so in most cases it can be used just like SQL. String literals are specified by delimiting them with single quote characters ' (as you could see in the introductory example). The most commonly know operators and techniques are also available in HQL. So besides the object oriented nature of HQL you should have no problems understanding HQL concepts when coming from SQL.

A property of an object is queried by specifying the object and the property name, divided by a dot. The following HQL selects a content document based on its title property:

content.title = 'Home'


As stated some properties of objects are again objects themselves. You can also query the properties of those sub objects by continuing the expression with the propery name after a dot. For example the "language" property of "content" is a language object which itself has a property "name". To query for all contents of name "de" specify the following:

content.language.name = 'de'



Common operators


The following operators can be used anywhere in HQL and are not dependent on any special functionalities. So you can combine multiple HQL expressions by the operators AND/OR and can negate an expression by NOT:

content.title = 'Home' AND content.language.name = 'de'

content.title = 'Home' OR content.language.name = 'de'

content.title = 'Home' AND NOT content.language.name = 'en'


Using normal brackets you can specify the precendence of expressions:

content.title = 'Home' AND (content.language.name = 'en' OR  content.language.name = 'de')


You can query for empty properties with IS NULL or IS NOT NULL

content.virtuallink is not null


You can test for equality and non equality. For numeric properties you can also test for larger/smaller values:

content.structentry.position != 0

content.structentry.position > 0

content.structentry.position >= 0


The "like" operator like in SQL is used for string matching. You can use "?" as a wildcard for a single arbitrary character and "%" as wildcard for an unspecified number of arbitrary characters:

content.title like '%wga%'


The "in" operator tests if a property is in a selection of values:

content.structentry.doctype.name in ('standard', 'homepage', 'news')


The "between" operator tests on numeric values if they are in a given range:

content.structentry.position between 0 and 10


Treatment of list properties

Some properties do not consist of single values but are effectively a list of values or objects. The individual values of those lists are either addressable by specifying a numeric index or a text descriptor, depending on the list type.  You can specify this index by appending it to the property name in square brackets. For example the "keywords" property is a list indexed by numbers:

content.keywords[0] = 'Hamburg'


An example for a list property indexed by a text descriptor - and the most common use case of this syntax - is the property "items" on the content object. This lists all the custom data items that a content document contains, which themselves are objects of type "contentItem". The index text is the name of the item in lowercase. Those item objects have individual properties for addressing the items value as "text", "number", "date" or "boolean". Only the property that matches the actual data type of the item can be queried. Following examples test an item "revision" of number type and an item "docname" of text type:

content.items['revision'].number = 1 AND

content.items['docname'].text = 'ref'


More about treatment of items will be shown later in this document.

Text indexes of content items should always be specified lowercase, as they are internally stored that way.

All list properties have a property "size" which provides the number of values in the list:

content.keywords.size > 10


Sorting

You can specify a sorting clause "order by" in HQL just like you do in SQL. You should prefer sorting inside a HQL query rather than in WebTML because in most cases it is faster and more resource effective:

content.language.name='de' order by content.title, content.created


You can also use item values for sorting:

order by content.items['shortname'].text


Items in HQL

Items in HQL, as already stated, are single objects that are available via the list property "items" of the content objects. They have individual properties for each data type that can be stored natively in an item. Testing string and number items is straightforward:content.items['type'].text = 'article'

content.items['type'].text = 'article'

content.items['count'].number = 5

You can of course combine multiple expressions testing for individual item values using AND, OR etc.

content.items['type'].text = 'article' AND content.items['count'].number = 5

Selecting and sorting items in HQL has its caveats which may let your query not work as expected:

  • When selecting or sorting for an item you are effectively filtering the query result to contain only documents that have an item of that name. All documents who do not contain an item of the given name will not appear in the result

  • In HQL you cannot check for values of items that do not contain a single primitive value like a text, number, date or boolean. For example: retrieving an item which contains a string list will result in the retrieval of some XML code. While non-primitive values can be stored and retrieved in WebTML they are serialized into the database in an XML form that is not reliably queryable.

Subqueries

If your database platform supports it you can use subqueries in HQL by specifying the syntax "object in (subquery)". This subquery effectively has the syntax of a "fullhql" query where you can query a different HQL object than "content". The result of the main query will then include all contents where object, typically some part of a content or the content itself, is in the query result of the respective subquery. 

For example, using a subquery you can query for an items value without implicitly filtering out all contents that do not have this item:

content.contentclass='product' or

content in (select item.parentcontent from ContentItem as item where item.name="contenttype" and item.text='product')

This query returns contents which either have the content class "product" or do have an item named "contenttype" with the value of "product". The subquery effectively does the same as

content.items['contenttype'].text='product'

...but this term would have filtered out all contents which do not have an item "contenttype" (see advice box above), so the other term content.contentclass='product' would have no change to select contents without this item.

Many things that seem impossible in regular HQL can be done using subqueries, but the resulting code may become very complicated.

Functions

HQL provides some predefined functions which are not dependent on the database platform that is used. Here are some of the most frequently used:

The functions elements and indices return the elements / index values of a list property. They often are combined with operator "in" to test if a certain value/index is contained in the list property:

'body' in indices(content.items)


The functions upper and lower convert strings to upper/lower case:

upper(content.title) like '%HAMBURG%'


In most cases you can also use native SQL functions of your database platform to modify values. The following example uses the MySQL functions SUBDATE()  and NOW() to select contents who were modified less than 10 days ago:

content.lastmodified < SUBDATE(NOW(), 10)



There are problems when using database functions with unusual parameter syntax. For example the MySQL function DATE_SUB() (which is a variant of function SUBDATE() used above) has a second parameter that itself consists of multiple keywords:


DATE_SUB(NOW(), INTERVAL 10 DAYS).


In that case OpenWGA cannot correctly identify and merge HQL and SQL parts and the query will fail.



Query parameters

HQL supports the specification of query parameters in the query which can be dynamically injected, like with <tml:param> in WebTML. Specify the parameter with a colon and a custom name directly inside the query:

content.item['city'] = :city


Do not surround the parameter with type specific delimiters, like single quotes for strings. The parameter injection functionality will automatically take care of this. Her is a complete example in WebTML:

<tml:query type="hql">

  <tml:param name="city">Berlin</tml:param>

  content.items['city'].text = :city

</tml:query>



While at first it might seem easier and better to just inject the needed value at the right position into the query string query parameters have some crucial advantages why they should be preferred where possible:
  • Query parameters are automatically protected against SQL injection attacks. They take care that the injected value will only be used as a literal.
  • Query parameters allow database drivers to precompile the SQL statement and reuse it with different parameter values, preventing unneccessary recompilation, therefor improving performance

Content Relations

Content relations (available on OpenWGA content stores of version 5 or higher) are also queryable in HQL. They are usable just as items - specify the relation name in square brackets - instead that there are no data properties to read out but a "target" property which points to a Content object that is the relations target:

content.relations['dependenton'].target.contentclass = 'project'

This selects all contents whose relation 'dependenton' point to a content of content class "project".

These queries really get useful in HQL in combination with query parameters holding content objects, like default parameter ":content" holding the content currently in context of the <tml:query> tag. You may just compare the "target" property of the relation to it to select documents whose named relation points to the current context document:

content.relations['parent-project'].target = :content

This will select all contents whose relation "parent-project" points to the content given as parameter. So this is a query for an "incoming relation", which is the usual way of querying regular relations. The "other way round" normally does not make sense in a query as a relation only may have one target. You can use other WebTML features, like context expression "relation:<relation-name>" to resolve that.

You can inject other WGAPI content objects (WGContent) as query parameters with <tml:param> and use them the same way.

Content relation groups

Relation groups consist of regular relations that also have a filled "group" property. Others than with relation names there may be multiple relations with the same "group" name, so this construct is able to represent 1:n relations.

So relations from relation groups they are also retrievable via "content.relations". But as their name is not of importance (it is a unique, generated string) you query them differently, using the "group" property on the contentRelation object for reference. In most cases it needs subqueries against these relation objects to do something meaningful.

Here is an example how to select all target contents in the relation group "customers" of the content document referenced by the ":content" query variable.

content in (select rel.target from ContentRelation as rel where rel.parentcontent=:content and rel.group="customers")

The following query goes the other way round: It selects all contents whose relation group "customers" addresses the content on the ":contetn" query variable:

content in (select rel.parentcontent from ContentRelation as rel where rel.target=:content and rel.group="customers")

So it's effectively just a question if your parameter content is used for the target or the parent content of the queried relation.

Fetch types "lazy" and "straight" in HQL

Query type "hql" can be executed in two fetch types which influence the way content data is read from the query result and therefor influence performance: "lazy" and "straight". You can influence which fetch type a query uses by specifying either "lazy" or "straight" in the native options (for example, in attribute "options" on WebTML tag <tml:query>) of a query. Note: This has no effect on query type "fullhql". The following query uses fetch type "straight":

<tml:query type="hql" options="straight">content.structentry.contenttype.name='standard'</tml:query>

Fetch type "lazy" is the default fetch type, used when none is specified. When a "lazy" query fetches the contents from the query result it does not actually return the whole content data but only the content key of the document. This is very lightweight data that is transmitted very fast. Then it tries to fetch the actual document from the WGAPI document cache using this key. If the document is there in the cache everything is ok and the document can be returned. If it is not yet there OpenWGA needs to retrieve the specific document from the database in an individual backend call.

This fetch type is fastest when many documents in it are already contained in the cache. It however gets slower with every document in the query result that is not in the cache and needs to be retrieved from the backend, as this is done with 1 backend call per document.

Fetch type "straight" on the opposite reads the -nearly - complete content data from the backend for each result document of the query (file data is excluded and retrieved separately). This might be much data to retrieve but it is done in a single backend call. It therefor gets slower with every result document that is fetched, but not as much as fetch type "lazy" when it needs to fetch a backend document in an individual call.

The fetch type used may make a huge difference in query performance, so if you have performance trouble with a query it might be a good idea to test the other fetch type. Here are some rules of thumb about what fetch type may be best in what situations.

lazy
straight
  • Not very much documents to actually process (<= 100)
  • Result documents likely in cache
  • Many documents to process
  • Many result documents likely not in cache

A note about what documents are really "processed"  from a query result:

  • All documents that are actually retrieved, like in a <tml:foreach>, until the page size is filled or the end of the result is reached.
    • When using a paged collection and retrieving page 2 to n of it the preceding documents before the selected page are skipped and not processed.
  • All documents that are not readable for the current user and needed to be loaded until the retrieved documents were complete
If there are many documents in the query result that the current user cannot read this may have a huge impact on the number of processed documents, even when using a page size. Imagine a <tml.foreach pagesize="10"> which normally just needs to retrieve 10 documents per request. If the query result contains 1000 documents where only 3 documents are readable for the current user then the query will actually process all of those 1000 documents because it will never reach the page size intended, even after processing all those docs.

Virtual resultsets in "fullhql"

The query type "fullhql" can be used to specify the full HQL query instead of only the part after WHERE. That way you have complete control over the SELECT part of the query and are able to return other data than content documents, like the result of aggregate functions. OpenWGA will interpret each "result row" as a "virtual document" (in true meaning of virtual, not meaning docs that point to some other content) holding the row colums as items. Therefor you should specify an alias for each value that you select which will become the virtual item name.

See the following simple "fullhql" query which just counts the available content documents:

select count(*) as cnt from Content as content


This will return a result set of a single "virtual" content document. It will contain a single item "cnt" containing the result of the aggregate function.

Of course you can combine this syntax with grouping clauses, like this query which groups contents by their content type and then counts each group:

select content.structentry.contenttype.name, count(*) as cnt from Content as content

group by content.structentry.contenttype


This query will return a "virtual" content document for each group, providing items "name" containing the name of the contenttype and again "cnt", which is the count of content documents with the given content type.

Unfortunately it is impossible in HQL to reference items via the syntax "content.items['itemname']" in the select clause. So to use aggregate functions on items you will need to select them directly and filter the item of the name you want via the where clause. Selections on the document itself can then be done via the "parentcontent" property of the item. For example like this:

select AVG(ratingitem.number) as avrating from ContentItem ratingitem

where ratingitem.name='rating' and rating.parentcontent.contentclass='pollresult'


Queries returning virtual result sets cannot be cached in Query cache. Therefor you cannot combine these queries together with <tml:query> attribute cache="true".


Native query options and object reference

The following documents provide complete reference information about the native query options usable with HQL and the objects queryable in HQL:

Table of contents: