Issue Details

Type: Improvement
Status: Open
Priority: Major
Assignee: Unassigned
Reporter: Chris Rued
Votes: 0
Watchers: 0

DataNucleus Store MongoDB

Support for SUM, MIN, MAX, AVG in MongoDB natively (via mapReduce, OR via AggregationFramework)

Created: 26/Mar/12 05:45 PM   Updated: 31/Aug/12 10:37 AM
Component/s: Query
Affects Version/s: None
Fix Version/s: None

File Attachments: 1. Text File NUCMONGODB-68-all-aggregates.patch (21 kB)
2. Text File vcs-diff7927918977296091893.patch (37 kB)

Datastore: MongoDB

Chris Rued added a comment - 26/Mar/12 05:52 PM - edited
NUCMONGODB-68-all-aggregates.patch contains some changes to support a wider selection of Aggregates.

The return types are a little tricky, and I don't think the implementation is completely correct yet.

According to a JPA Spec Reference I'm reading [1]:

   * COUNT returns Long.
   * MAX, MIN return the type of the state-field to which they are applied.
   * AVG returns Double.
   * SUM returns Long when applied to state-fields of integral types (other than BigInteger); Double when
     applied to state-fields of floating point types; BigInteger when applied to state-fields of type
     BigInteger; and BigDecimal when applied to state-fields of type BigDecimal.

   If SUM, AVG, MAX, or MIN is used, and there are no values to which the aggregate function can be
   applied, the result of the aggregate function is NULL. If COUNT is used, and there are no values to
   which COUNT can be applied, the result of the aggregate function is 0.

So COUNT and AVG are simple. SUM, MAX and MIN require knowledge of the type of the expression used as an argument. This might be available during compileResult, but I haven't yet looked into it (limited time).
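
For illustration only, the return-type rules quoted above could be captured in a small lookup like the following. This is a minimal sketch, not code from the attached patch: the class name AggregateResultTypes and the method resultType are hypothetical, and it assumes the field's Java type has already been resolved (for example from the field metadata).

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Locale;

// Hypothetical helper (not part of DataNucleus): maps an aggregate name and the
// Java type of its argument to the result type described in the quoted JPA rules.
final class AggregateResultTypes {

    static Class<?> resultType(String aggregate, Class<?> fieldType) {
        switch (aggregate.toUpperCase(Locale.ROOT)) {
            case "COUNT":
                return Long.class;          // COUNT always returns Long
            case "AVG":
                return Double.class;        // AVG always returns Double
            case "MIN":
            case "MAX":
                return fieldType;           // same type as the state-field
            case "SUM":
                if (fieldType == BigInteger.class) return BigInteger.class;
                if (fieldType == BigDecimal.class) return BigDecimal.class;
                if (isFloatingPoint(fieldType)) return Double.class;
                if (isIntegral(fieldType)) return Long.class;
                throw new IllegalArgumentException("SUM not applicable to " + fieldType);
            default:
                throw new IllegalArgumentException("Unknown aggregate: " + aggregate);
        }
    }

    private static boolean isIntegral(Class<?> t) {
        return t == byte.class || t == Byte.class || t == short.class || t == Short.class
            || t == int.class || t == Integer.class || t == long.class || t == Long.class;
    }

    private static boolean isFloatingPoint(Class<?> t) {
        return t == float.class || t == Float.class || t == double.class || t == Double.class;
    }

    public static void main(String[] args) {
        System.out.println(resultType("SUM", int.class));
        System.out.println(resultType("MAX", String.class));
    }
}
```

The "no values" case (NULL for SUM, AVG, MAX and MIN; 0 for COUNT) is not covered by this sketch and would still need handling where the result is read back.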

This patch currently tries to get the results for these first as a Long, then as a Double, then as a String, regardless of the expression type. This could cause problems where a numeric value is stored in a property persisted as a String...

Any advice on this?


Andy Jefferson added a comment - 26/Mar/12 07:20 PM
The JPA spec and JDO spec details are in the DN docs, and they are consistent with each other, so there is no need to handle one differently from the other; the rules are as you stated.

The result expression "field" will, of course, be linkable to the metadata for the field (AbstractMemberMetaData), and hence its type. From the type it ought to be possible to convert to the required type, see also org.datanucleus.util.TypeConversionHelper.convertTo() method which may help in some simple cases.

Andy Jefferson added a comment - 04/Apr/12 11:04 AM
Is this considered a final patch (i.e. reasonable testing has been done, etc.), or will you be submitting an update to cater for the type conversion comments? I ask only because I'm releasing 3.1.0-m2 very soon.

Chris Rued added a comment - 10/May/12 10:38 PM
Hi Andy,

Sorry for the extended silence. I've been swamped with other work and I get too much email so JIRA email was buried...

I wouldn't call the current patch final. I will try to get a "final" version out soon...

On a related subject, I recently noticed that MongoDB will soon release an "Aggregation Framework" which I suspect is likely to perform better than a map-reduce implementation:

The problem is it's only supported after version 2.1 (which is not yet a "Production Release").

I suppose the right thing to do is to have a different implementation that is called depending on the version of MongoDB in use.
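
A rough sketch of what that version-based dispatch might look like follows. The names AggregateStrategySelector, Strategy and select are hypothetical, and the sketch assumes the server version is already available as a string such as "2.0.4"; it also assumes "after version 2.1" means 2.1 and later. How the version is obtained from the server is left out.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical selector: picks an aggregate implementation from the MongoDB
// server version string. Assumes the Aggregation Framework is usable from 2.1 on.
final class AggregateStrategySelector {

    enum Strategy { MAP_REDUCE, AGGREGATION_FRAMEWORK }

    // Matches the leading "major.minor" part, ignoring suffixes like ".0-rc1".
    private static final Pattern VERSION = Pattern.compile("^(\\d+)\\.(\\d+)");

    static Strategy select(String serverVersion) {
        Matcher m = VERSION.matcher(serverVersion);
        if (!m.find()) {
            throw new IllegalArgumentException("Unrecognised version: " + serverVersion);
        }
        int major = Integer.parseInt(m.group(1));
        int minor = Integer.parseInt(m.group(2));
        boolean hasAggregation = major > 2 || (major == 2 && minor >= 1);
        return hasAggregation ? Strategy.AGGREGATION_FRAMEWORK : Strategy.MAP_REDUCE;
    }

    public static void main(String[] args) {
        System.out.println(select("2.0.4"));
        System.out.println(select("2.2.0-rc1"));
    }
}
```

Keeping the choice behind one method would let the map-reduce path stay as the fallback for older servers.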

Chris Rued added a comment - 25/May/12 06:08 AM
I've been poking around the internals for a while, trying to find a simple way of loading an Entity from the session (or into the session)... So far, I'm missing it.

The bit I'm trying to get working now is returning Entity references that are part of the GROUP BY portion of the query. Here's an example:

    SELECT count(*), o.related FROM Entity1 o GROUP BY o.related

I have access to the ExecutionContext and the object's id...Any advice?
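
For comparison, the grouping part of the example query could be expressed as a single $group stage in the Aggregation Framework. The sketch below only builds that stage as plain java.util maps and makes no driver call; the class name GroupCountPipeline is hypothetical, and it assumes the relation is stored in the document under a field named "related". Turning the grouped id back into an Entity is a separate step not shown here.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical builder: describes "count(*) ... GROUP BY <field>" as an
// aggregation pipeline made of plain maps (no MongoDB driver types involved).
final class GroupCountPipeline {

    static List<Map<String, Object>> build(String groupField) {
        Map<String, Object> groupSpec = new LinkedHashMap<>();
        groupSpec.put("_id", "$" + groupField);                       // group key
        groupSpec.put("count", Collections.singletonMap("$sum", 1));  // count(*)

        Map<String, Object> stage = Collections.singletonMap("$group", groupSpec);
        return Collections.singletonList(stage);
    }

    public static void main(String[] args) {
        // Assumed field name taken from the example query above.
        System.out.println(build("related"));
    }
}
```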


Andy Jefferson added a comment - 25/May/12 09:08 AM
If your query returns the "id" of the related object then you can call ExecutionContext.findObject(id, ...);

There are a couple of possible methods in ExecutionContext, one which just has the "id" and flags for whether it is the right inheritance level or not etc, and one which also allows applying field values for the retrieved object (this is for the case like with RDBMS where the SQL returns a ResultSet and so has the values of the field(s) that are stored in the DB for the related object).

Chris Rued added a comment - 29/May/12 09:22 PM
Thanks for the pointers. I had found those methods, but I didn't quite know how to call them. It looks like what I needed to do was to call ApiAdapter.getNewApplicationIdentityObjectId to get an ID that could be used in the call to findObject.

I've got the functionality I need completed. It was a bit more involved than I had initially imagined...hopefully not very much more complicated than is necessary.

Patch to follow for review.

Chris Rued added a comment - 29/May/12 09:49 PM
Attached patch for converting arguments to aggregates to JS expressions and applying the specified (single) aggregate. Also supports GROUP BY arguments and returning the GROUP BY values, looking up entities when appropriate.

Andy Jefferson added a comment - 30/May/12 03:31 PM
Thx for the patch. Before looking further: you can't just do "getNewApplicationIdentityObjectId", since the class may be using datastore identity or nondurable identity.
Consider IdentityUtils.getObjectFromIdString if you have the String form of the id. (see FetchFieldManager for an example usage).

Andy Jefferson added a comment - 06/Jul/12 09:57 AM
In addition to the previous comment, various new files in that patch need comments explaining *what* they are there for, and also need javadocs/comments for how the process works