RDBMS : Queries

Using an RDBMS datastore DataNucleus allows you to query the objects in the datastore using the following

  • JDOQL - language based around the objects that are persisted and using Java-type syntax
  • SQL - language found on alomst all RDBMS.
  • JPQL - language defined in the JPA1 specification for JPA persistence which closely mirrors SQL.

When using queries with RDBMS there are some specific situations where it can be useful to benefit from special treatment. These are listed here.

Result Set : Type

java.sql.ResultSet defines three possible result set types.

  • forward-only : the result set is navegable forwards only
  • scroll-sensitive : the result set is scrollable in both directions and is sensitive to changes in the datastore
  • scroll-insensitive : the result set is scrollable in both directions and is insensitive to changes in the datastore

DataNucleus allows specification of this type as a query extension datanucleus.rdbms.query.resultSetType .

To do this on a per query basis for JDO you would do

query.addExtension("datanucleus.rdbms.query.resultSetType", "scroll-insensitive");

To do this on a per query basis for JPA you would do

query.setHint("datanucleus.rdbms.query.resultSetType", "scroll-insensitive");

The default is forward-only . The benefit of the other two is that the result set will be scrollable and hence objects will only be read in to memory when accessed. So if you have a large result set you should set this to one of the scrollable values.



Result Set : Caching of Results

When using a "scrollable" result set (see above for datanucleus.rdbms.query.resultSetType ) by default the query result will cache the rows that have been read. You can control this caching to optimise it for your memory requirements. You can set the query extension datanucleus.query.resultCacheType and it has the following possible values

  • weak : use a weak hashmap for caching (default)
  • soft : use a soft reference map for caching
  • hard : use a HashMap for caching (objects not garbage collected)
  • none : no caching (hence uses least memory)

To set this on a per query basis for JDO you would do

query.addExtension("datanucleus.query.resultCacheType", "weak");

To do this on a per query basis for JPA you would do

query.setHint("datanucleus.query.resultCacheType", "weak");


Large Result Sets : Size

If you have a large result set you clearly don't want to instantiate all objects since this would hit the memory footprint of your application. To get the number of results many JDBC drivers will load all rows of the result set. This is to be avoided so DataNucleus provides control over the mechanism for getting the size of results. The persistence property datanucleus.query.resultSizeMethod has a default of last (which means navigate to the last object - hence hitting the JDBC driver problem). If you set this to count then it will use a simple "count()" query to get the size.

To do this on a per query basis for JDO you would do

query.addExtension("datanucleus.query.resultSizeMethod", "count");

To do this on a per query basis for JPA you would do

query.setHint("datanucleus.query.resultSizeMethod", "count");


Large Result Sets : Loading Results at Commit()

When a transaction is committed by default all remaining results for a query are loaded so that the query is usable thereafter. With a large result set you clearly don't want this to happen. So in this case you should set the extension datanucleus.query.loadResultsAtCommit to false.

To do this on a per query basis for JDO you would do

query.addExtension("datanucleus.query.loadResultsAtCommit", "false");

To do this on a per query basis for JPA you would do

query.setHint("datanucleus.query.loadResultsAtCommit", "false");


Result Set : Control

DataNucleus provides a useful extension allowing control over the ResultSet's that are created by queries. You have at your convenience some properties that give you the power to control whether the result set is read only, whether it can be read forward only, the direction of fetching etc.

To do this on a per query basis for JDO you would do

query.addExtension("datanucleus.rdbms.query.fetchDirection", "forward");
query.addExtension("datanucleus.rdbms.query.resultSetConcurrency", "read-only");

To do this on a per query basis for JPA you would do

query.setHint("datanucleus.rdbms.query.fetchDirection", "forward");
query.setHint("datanucleus.rdbms.query.resultSetConcurrency", "read-only");

Alternatively you can specify these as persistence properties so that they apply to all queries for that PMF/EMF. Again, the properties are

  • datanucleus.rdbms.query.fetchDirection - controls the direction that the ResultSet is navigated. By default this is forwards only. Use this property to change that.
  • datanucleus.rdbms.query.resultSetConcurrency - controls whether the ResultSet is read only or updateable.

Bear in mind that not all RDBMS support all of the possible values for these options. That said, they do add a degree of control that is often useful.



JDOQL : SQL Generation

When using the method contains on a collection (or containsKey , containsValue on a map) this will either add an EXISTS subquery (if there is a NOT or OR present in the query) or will add an INNER JOIN across to the element table. Let's take an example

SELECT FROM org.datanucleus.samples.A
WHERE (elements.contains(b1) && b1.name == 'Jones')
VARIABLES org.datanucleus.samples.B b1

Note that we add the contains first that binds the variable "b1" to the element table, and then add the condition on the variable. The order is important here. If we instead had put the condition on the variable first we would have had to do a CROSS JOIN to the variable table and then try to repair the situation and change it to INNER JOIN if possible. In this case the generated SQL will be like

SELECT `A0`.`ID`
FROM `A` `A0`
INNER JOIN `B` `B0` ON `A0`.ID = `B`.ELEMENT
WHERE `B0`.NAME = 'Jones'


JDOQL : Use of variables and joining

In all situations we aim for DataNucleus JDOQL implementation to work out the right way of linking a variable into the query, whether this is via a join (INNER, LEFT OUTER), or via a subquery. As you can imagine this can be complicated to work out the optimum for all situations so with that in mine we allow (for a limited number of situations) the option of specifying the join type. This is achieved by setting the query extension datanucleus.query.jdoql.{varName}.join to the required type. For 1-1 relations this would be either "INNERJOIN" or "LEFTOUTERJOIN", and for 1-N relations this would be either "INNERJOIN" or "SUBQUERY".

Please, if you find a situation where the optimum join type is not chosen then report it in JIRA for project "NUCRDBMS" as priority "Minor" so it can be registered for future work



JPQL : SQL Generation

With a JPQL query running on an RDBMS the query is compiled into SQL. Here we give a few examples of what SQL is generated. You can of course try this for yourself observing the content of the DataNucleus log.

In JPQL you specify a candidate class and its alias (identifier). In addition you can specify joins with their respective alias. The DataNucleus implementation of JPQL will preserve these aliases in the generated SQL.

JPQL:
SELECT Object(P) FROM mydomain.Person P INNER JOIN P.bestFriend AS B

SQL:
SELECT P.ID
FROM PERSON P INNER JOIN PERSON B ON B.ID = P.BESTFRIEND_ID

With the JPQL MEMBER OF syntax this is typically converted into an EXISTS query.

JPQL:
SELECT DISTINCT Object(p) FROM mydomain.Person p WHERE :param MEMBER OF p.friends

SQL:
SELECT DISTINCT P.ID FROM PERSON P
WHERE EXISTS (
    SELECT 1 FROM PERSON_FRIENDS P_FRIENDS, PERSON P_FRIENDS_1 
    WHERE P_FRIENDS.PERSON_ID = P.ID
    AND P_FRIENDS_1.GLOBAL_ID = P_FRIENDS.FRIEND_ID 
    AND 101 = P_FRIENDS_1.ID)