DataNucleus provides a generic query processing engine. It compiles string-based query
languages and also allows in-memory evaluation of these queries. This is very useful when
adding support for new datastores that either have no native query language, so the only
alternative is for DataNucleus to evaluate the queries, or where it will take some time to
map the compiled query to the equivalent query in the native language of the datastore.
When a user invokes a query using the JDO/JPA APIs, they provide either
- A single-string query made up of keywords and clauses
- A query object that has the clauses specified directly
The first step is to convert these two forms into the constituent clauses. It is assumed
that a string-based query is of the form
SELECT {resultClause} FROM {fromClause} WHERE {filterClause}
GROUP BY {groupingClause} HAVING {havingClause}
ORDER BY {orderClause}
The two primary supported query languages have helper classes that perform this conversion
from the single-string query form into the individual clauses. These can be found in
org.datanucleus.query.JDOQLSingleStringParser and org.datanucleus.query.JPQLSingleStringParser.
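To make the conversion from the single-string form concrete, here is a minimal sketch of a clause splitter. It is not the DataNucleus parser: the class SingleStringClauseSplitter and its split method are hypothetical names invented for this illustration, and the sketch assumes ASCII input, keywords in the order shown above, single spaces inside "GROUP BY" and "ORDER BY", and no keywords hidden inside string literals.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical, simplified clause splitter; NOT the DataNucleus single-string parser. */
final class SingleStringClauseSplitter {
    // Keywords in the order the single-string form expects them.
    private static final String[] KEYWORDS = {
        "SELECT", "FROM", "WHERE", "GROUP BY", "HAVING", "ORDER BY"
    };

    /** Returns keyword -> clause text for each keyword present in the query. */
    static Map<String, String> split(String query) {
        // Assumption: ASCII input, so upper-casing does not change string length.
        String upper = query.toUpperCase(Locale.ROOT);
        Map<String, String> clauses = new LinkedHashMap<>();
        String currentKeyword = null;
        int clauseStart = 0;
        for (String keyword : KEYWORDS) {
            int pos = indexOfKeyword(upper, keyword, clauseStart);
            if (pos < 0) {
                continue; // this clause is absent from the query
            }
            if (currentKeyword != null) {
                // The previous clause ends where this keyword begins.
                clauses.put(currentKeyword, query.substring(clauseStart, pos).trim());
            }
            currentKeyword = keyword;
            clauseStart = pos + keyword.length();
        }
        if (currentKeyword != null) {
            clauses.put(currentKeyword, query.substring(clauseStart).trim());
        }
        return clauses;
    }

    // Finds the keyword as a whole word (delimited by whitespace or string ends).
    private static int indexOfKeyword(String upper, String keyword, int from) {
        int pos = upper.indexOf(keyword, from);
        while (pos >= 0) {
            int end = pos + keyword.length();
            boolean startOk = pos == 0 || Character.isWhitespace(upper.charAt(pos - 1));
            boolean endOk = end == upper.length() || Character.isWhitespace(upper.charAt(end));
            if (startOk && endOk) {
                return pos;
            }
            pos = upper.indexOf(keyword, pos + 1);
        }
        return -1;
    }
}
```

Calling split on "SELECT field1, field2 FROM MyClass WHERE field1 > 5" would give three entries, keyed by SELECT, FROM and WHERE. A real parser also has to cope with literals, subqueries and language-specific keywords, which this sketch deliberately ignores.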
So we have a series of clauses and we want to compile them. So what does this mean?
Well, in simple terms, we are going to convert the individual clauses from above into
expression tree(s) so that they can be evaluated. The end result of a compilation is an
org.datanucleus.query.compiler.QueryCompilation.
So if you think about a typical query you may have
SELECT field1, field2 FROM MyClass
This has 2 result expressions, field1 and field2, each of which is a "PrimaryExpression"
(a representation of a field).
The query compilation of a particular clause has 2 stages
- Compilation into a Node tree, with operations between the nodes
- Compilation of the Node tree into an Expression tree of supported expressions
Compilation is performed by a JavaQueryCompiler, so look at
org.datanucleus.query.compiler.JDOQLCompiler and org.datanucleus.query.compiler.JPQLCompiler.
These each have a Parser that performs the extraction of the different components of the
clauses and generation of the Node tree. Once a Node tree is generated it can then be converted
into the compiled Expression tree; this is handled inside the JavaQueryCompiler.
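The two stages can be illustrated with a small sketch that turns an untyped node tree for a filter such as "field1 > 5" into a typed expression tree. All types here (Node, FieldExpression, LiteralExpression, BinaryExpression) are hypothetical stand-ins written for this example; they are not the DataNucleus classes, and the real compilers handle many more node and expression kinds.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical types for illustration only; these are NOT the DataNucleus classes. */
final class NodeToExpressionSketch {
    enum NodeType { IDENTIFIER, LITERAL, OPERATOR }

    /** Stage 1 output: an untyped parse node with child nodes. */
    static final class Node {
        final NodeType type;
        final Object value;
        final List<Node> children = new ArrayList<>();

        Node(NodeType type, Object value, Node... children) {
            this.type = type;
            this.value = value;
            this.children.addAll(Arrays.asList(children));
        }
    }

    /** Stage 2 output: a typed expression that an evaluator can work with. */
    interface Expression {}

    static final class FieldExpression implements Expression {
        final String name;
        FieldExpression(String name) { this.name = name; }
        @Override public String toString() { return "Field(" + name + ")"; }
    }

    static final class LiteralExpression implements Expression {
        final Object value;
        LiteralExpression(Object value) { this.value = value; }
        @Override public String toString() { return "Literal(" + value + ")"; }
    }

    static final class BinaryExpression implements Expression {
        final Expression left;
        final String operator;
        final Expression right;
        BinaryExpression(Expression left, String operator, Expression right) {
            this.left = left;
            this.operator = operator;
            this.right = right;
        }
        @Override public String toString() {
            return "(" + left + " " + operator + " " + right + ")";
        }
    }

    /** Recursively converts a node tree into an expression tree. */
    static Expression compile(Node node) {
        switch (node.type) {
            case IDENTIFIER:
                return new FieldExpression((String) node.value);
            case LITERAL:
                return new LiteralExpression(node.value);
            case OPERATOR:
                if (node.children.size() != 2) {
                    throw new IllegalArgumentException("sketch supports binary operators only");
                }
                return new BinaryExpression(
                    compile(node.children.get(0)), (String) node.value, compile(node.children.get(1)));
            default:
                throw new IllegalArgumentException("unsupported node type: " + node.type);
        }
    }
}
```

The point of the split is that the first stage only needs to understand the syntax of one query language, while the second stage produces a language-neutral tree that both in-datastore and in-memory evaluation can share.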
The other part of a query compilation is the org.datanucleus.query.symbol.SymbolTable,
which is a lookup table (map) of identifiers and their values. So, for example, an input
parameter will have a name, so it has an entry in the table, and its value is stored there.
This is then used during evaluation.
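As a rough picture of the lookup table described above, the sketch below keeps named symbols in a map and lets a value be attached to each one. The class SymbolTableSketch, its Kind enum and its methods are assumptions made for illustration; they do not reproduce the interface of the real SymbolTable.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical symbol table for illustration; NOT the DataNucleus SymbolTable API. */
final class SymbolTableSketch {
    enum Kind { PARAMETER, VARIABLE, IDENTIFIER }

    /** One named entry in the table, with an optional value. */
    static final class Symbol {
        final String name;
        final Kind kind;
        Object value;

        Symbol(String name, Kind kind) {
            this.name = name;
            this.kind = kind;
        }
    }

    private final Map<String, Symbol> symbols = new HashMap<>();

    /** Registers a symbol found during compilation, e.g. an input parameter. */
    void declare(String name, Kind kind) {
        symbols.put(name, new Symbol(name, kind));
    }

    boolean hasSymbol(String name) {
        return symbols.containsKey(name);
    }

    /** Stores the value supplied at execution time for a declared symbol. */
    void setValue(String name, Object value) {
        lookup(name).value = value;
    }

    /** Reads the value back during evaluation. */
    Object getValue(String name) {
        return lookup(name).value;
    }

    private Symbol lookup(String name) {
        Symbol symbol = symbols.get(name);
        if (symbol == null) {
            throw new IllegalArgumentException("unknown symbol: " + name);
        }
        return symbol;
    }
}
```

In this sketch a parameter is declared when the query is compiled and given its value when the query is executed, so the evaluator only ever needs the parameter name to find the value.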
Intuitively it is more efficient to evaluate a query within the datastore since it means
that fewer objects need instantiating in order to determine the result objects.
To evaluate a compiled query in the datastore there needs to be a compiler that takes the
generic expression compilation and converts it into a native query. Note that you aren't
forced to evaluate the whole of the query in the datastore; you could evaluate just the
filter clause, for example. This would be done where the native language of the datastore
provides only limited query capabilities. For example, with db4o we evaluate the filter and
ordering in the datastore, using its SODA query language. The remaining clauses can be
evaluated on the resultant objects in-memory (see below).
Obviously for a datastore like an RDBMS it should be possible to evaluate the whole query
in-datastore.
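One way to think about this split is as a set difference: the clauses present in the query, minus the clauses the datastore can evaluate natively, are what is left for in-memory evaluation. The sketch below expresses only that idea. The Clause enum and the remainingForInMemory helper are hypothetical names, and how a real datastore plugin decides what it supports is not shown here.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical planning helper; the names are illustrative, NOT DataNucleus API. */
final class SplitEvaluationSketch {
    enum Clause { FILTER, ORDERING, GROUPING, HAVING, RESULT }

    /**
     * Returns the clauses of the query that the datastore cannot evaluate
     * and that therefore have to be evaluated in-memory.
     */
    static Set<Clause> remainingForInMemory(Set<Clause> queryClauses, Set<Clause> datastoreSupported) {
        EnumSet<Clause> remaining = EnumSet.noneOf(Clause.class);
        remaining.addAll(queryClauses);
        remaining.removeAll(datastoreSupported);
        return remaining;
    }
}
```

For a datastore that handles only filter and ordering, a query with filter, ordering and result clauses would leave just the result clause for in-memory evaluation. For a datastore that supports every clause, the returned set is empty and nothing needs evaluating in-memory.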
Evaluation of queries in-memory assumes that we have a series of "candidate" objects.
These are either user-input to the query itself, or retrieved from the datastore. We then
use the in-memory evaluator org.datanucleus.query.evaluator.memory.InMemoryExpressionEvaluator.
This takes in each candidate object one-by-one and evaluates whichever of the query clauses
are desired to be evaluated. For example we could just evaluate the filter clause.
Evaluation makes use of the values of the fields of the candidate objects (and related objects)
and uses the SymbolTable for values of parameters etc. Where a candidate fails a particular
clause in the filter then it is excluded from the results.
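The candidate-by-candidate evaluation described above can be sketched as a simple loop over the candidates with a filter applied to each. This is not how InMemoryExpressionEvaluator is implemented. As simplifying assumptions, candidates are modelled as maps of field name to value rather than real persistable objects, parameter values come from a plain map standing in for the SymbolTable, and the Filter interface and greaterThan helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory filter evaluation; NOT the DataNucleus evaluator. */
final class InMemoryFilterSketch {
    /** A compiled filter clause: accepts or rejects one candidate. */
    interface Filter {
        boolean accepts(Map<String, Object> candidate, Map<String, Object> parameters);
    }

    /** Builds a filter equivalent to "field > :parameterName" for numeric values. */
    static Filter greaterThan(String field, String parameterName) {
        return (candidate, parameters) -> {
            Object fieldValue = candidate.get(field);
            Object parameterValue = parameters.get(parameterName);
            if (!(fieldValue instanceof Number) || !(parameterValue instanceof Number)) {
                return false; // missing or non-numeric values fail the clause
            }
            return ((Number) fieldValue).doubleValue() > ((Number) parameterValue).doubleValue();
        };
    }

    /** Evaluates the filter against each candidate in turn, keeping those that pass. */
    static List<Map<String, Object>> evaluate(
            List<Map<String, Object>> candidates, Filter filter, Map<String, Object> parameters) {
        List<Map<String, Object>> results = new ArrayList<>();
        for (Map<String, Object> candidate : candidates) {
            if (filter.accepts(candidate, parameters)) {
                results.add(candidate);
            }
        }
        return results;
    }
}
```

With two candidates whose field1 values are 3 and 10 and a parameter minValue of 5, only the second candidate passes greaterThan("field1", "minValue") and appears in the results.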