Issue Details (XML | Word | Printable)

Key: NUCRDBMS-594
Type: Improvement Improvement
Status: Closed Closed
Resolution: Fixed
Priority: Minor Minor
Assignee: Andy Jefferson
Reporter: Martin Brauner
Votes: 0
Watchers: 1
Operations

If you were logged in you would be able to see more operations.
DataNucleus Store RDBMS

Optimizing the fetch of collection fields (1:N relations) could significantly reduce number of sql statements

Created: 04/Feb/06 09:12 AM   Updated: 28/Nov/13 03:29 PM   Resolved: 27/Nov/13 10:06 AM
Component/s: Queries
Affects Version/s: None
Fix Version/s: 3.2.9


 Description  « Hide
When executing a query, the current JPOX implementation does not immediately fetch all objects B that are part of a collection field of the candidate class A of that query (so there is a A :1 --- :N B relationship), even though the user would expect this according to the given fetch plan (fetch groups and/or fetch depths). Instead, the B objects are fetched when the collection field is accessed during iteration of the A result objects. This means n+1 SQL statements which could be improved. See also thread "How to fetch objects including a 1:N relationship in 1 SQL join statement?" in the forum.


May Q be the generated SQL statement of a JDOQL query on a candidate class A and B the type of a collection field of A.

Case 1: direct join in 1 statement
Q' = select A.*, B.* from A, B where all B.fk (+) = A.pk and (WHERE clause of Q)
pros:
a) in lot of cases the most performant solution
cons:
a) would mean more than 1 candidate classes (here: A and B) to be fetched from the sql result set which probably means a lot of changes in the current implementation.
b) if not only 1 collection field of A should be fetched according to the fetch plan, but more, this could lead to a complex join over many tables with a lot of repetitive data being sent from the database in large rows and long query execution times. So this approach, good for 1 relation, can backfire with more relations.
c) a RANGE specified in Q cannot be handled in the database any longer in Q'
d) aggregations in the JDOQL query that result in a GROUP BY clause in Q will cause trouble for Q'
e) probably there can be situations in Q where also ORDER BY or DISTINCT clauses will be a problem

Case 2: separate direct joins
Q' = select B.* from A, B where all B.fk (+) = A.pk and (WHERE clause of Q)
Q' is executed after Q for each collection field of A according to fetch plan.
pros:
a) will eliminate cons a) + b) of case 1
cons:
a) cons c) to e) of case 1 remain
b) means 1+k (k = number of :N relations to navigate according fatch plan), but still far better than n+1

Case 3: separate subselect joins
Q' = select 'this'.* from B 'this', (Q) 'that' where all 'this'.fk fields = 'that'.pk fields
Q' is executed after Q for each collection field of A according to fetch plan.
pros:
a) would eliminate all cons of case 1
b) relativly easy to implement as SQL string of Q can directly be embedded into Q' (works even when recursively fetching collection fields of B)
cons:
a) can be inefficient in relation to a direct join. Don't know how good todays RDBMS query optimizers will handle these subselects. But of course
should still be better than running n statements against the database.

So, personally, I would say Case 3 could be a solution. Would be nice of you can improve JPOX with something like this.

Martin



Sort Order: Ascending order - Click to sort in descending order
Erik Bengtson added a comment - 04/Feb/06 05:55 PM
There are 2 issues that users of these solutions must be aware:

1- the fetched objects may already be in the cache. The cache is one the main mechanisms that improves the ORM engines performance.
2-may add more I/O for the above reason, and also for returning redundant data in the result set

Andy Jefferson added a comment - 26/Nov/13 12:27 PM
The "simplest" implementation of this would be to have a "bulk-fetch" mode for a SCO collection/map, as in case 3 above. In particular the SQL (for collections) would become
1. FK case
SELECT ELEM.COL1, ELEM.COL2, ..., ELEM.OWNER_ID
FROM ELEM
WHERE EXISTS (SELECT OWNER_ID FROM OWNER WHERE (query where clause) AND OWNER.ID = ELEM.OWNER_ID)
2. JoinTable case
SELECT ELEM.COL1, ELEM.COL2, ..., JOIN.OWNER_ID
FROM ELEM, JOIN
WHERE JOIN.ELEM_ID = ELEM.ID AND EXISTS (SELECT OWNER_ID FROM OWNER WHERE (query where clause) AND OWNER.ID = JOIN.OWNER_ID)


Note that Case 2 would involve much more work, needing to adapt the query generation process to update an existing SQLStatement (for the element) applying all joins to it, so would not be worth the effort.

Note also that there is a Case 4 where instead of EXISTS you could use "OWNER_ID IN (.., .., ..)" but that would only work with single column owner PKs and would be limited by the size of the SQL (and number of IN terms).

Andy Jefferson added a comment - 26/Nov/13 08:48 PM
SVN trunk has case 3 support for JDOQL for fields of type Set (join table, or foreign key) and had basic testing. Still need to test for embedded join table case, add support for lists and maps, and then copy across to JPQL.

Andy Jefferson added a comment - 27/Nov/13 10:06 AM
SVN trunk has this for collection fields now, for JDOQL and JPQL