
Key: NUCMONGODB-145
Type: Bug
Status: Closed
Resolution: Won't Fix
Priority: Major
Assignee: Unassigned
Reporter: Petteri Manninen
Votes: 0
Watchers: 1
DataNucleus Store MongoDB

Cannot insert a large byte array to MongoDB via DataNucleus/JDO

Created: 24/Apr/14 12:03 PM   Updated: 14/Nov/14 02:58 PM   Resolved: 07/Nov/14 09:22 AM
Component/s: Persistence
Affects Version/s: 3.1.0.release, 3.2.8, 4.0.0.m2
Fix Version/s: None

File Attachments: 1. Zip Archive test-jdo-byte-array.zip (16 kB)

Environment:
- DataNucleus Access Platform 3.3.8
- MongoDB win32-i386-2.6.0 (also tested with 2.4.5 on CentOS 6.4)
- Mongo Java driver 2.12.0

Datastore: MongoDB
Severity: Development


Description
Inserting a persistent object containing a large byte array into MongoDB fails because the resulting document is too large. Even a simple object with a two-megabyte array causes the Mongo driver to reject the insertion, because the database object built by DataNucleus exceeds the BSON size limit:

>>> com.mongodb.MongoInternalException: DBObject of size 26151951 is over Max BSON size 16777216...

Would it be correct to assume that the memory consumption of a database object should never be much larger than its byte-stream representation? Given that BSON has a reasonable maximum size limit, the consumption seems far too high, as we could easily run out of memory before we are even close to having 16 MB of data.

This problem seems to have first appeared in DataNucleus Access Platform 3.1.0.release, as the earlier milestone versions did not suffer from it. It looks like the array persistence scheme changed in that release: earlier versions stored the bytes in MongoDB using the BSON BinData type, whereas later versions seem to store arrays as a plain sequence of values. Is that correct?
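The size difference between the two encodings can be estimated directly from the BSON layout. The sketch below is only an illustration and rests on two assumptions that the report does not confirm: that the array holds exactly 2 * 1024 * 1024 bytes, and that the "plain value sequence" is a BSON array in which every byte becomes one int32 element. The class and method names are hypothetical.

```java
final class BsonSizeEstimate {

    /**
     * Size in bytes of a BSON array with n int32 elements.
     * Each element costs: 1 type byte, the decimal index as a
     * null-terminated key, and a 4-byte int32 value.
     */
    static long int32ArraySize(int n) {
        long size = 4 + 1; // int32 document length + trailing 0x00
        for (int i = 0; i < n; i++) {
            int keyLength = Integer.toString(i).length() + 1; // digits + null terminator
            size += 1 + keyLength + 4;
        }
        return size;
    }

    /** Size in bytes of a BSON BinData value holding n bytes. */
    static long binDataSize(int n) {
        return 4 + 1 + (long) n; // int32 payload length + subtype byte + payload
    }

    public static void main(String[] args) {
        int n = 2 * 1024 * 1024; // assumed size of the "two megabyte" array
        System.out.println("as int32 array: " + int32ArraySize(n) + " bytes");
        System.out.println("as BinData:     " + binDataSize(n) + " bytes");
    }
}
```

Under these assumptions the array form comes to 26,151,871 bytes, which is within about 80 bytes of the 26151951 reported by the driver, while the BinData form stays close to the raw 2 MB. That is consistent with the suspicion that the bytes are being written element by element, though it does not prove it.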

Changing the Mongo driver version has no effect on the issue. It also does not seem to affect RDBMS persistence, since persisting objects with byte arrays of up to 20 MB caused no problems there.

Our test configuration was as follows:

- DataNucleus Access Platform 3.1.0_m5, 3.3.8 (and 4.0.0_m2)
- MongoDB win32-i386-2.6.0 (and 2.4.5 on CentOS 6.4)
- Mongo Java driver 2.12.0 (and 2.9.3 where applicable)
- (H2 database 1.4.177)


Petteri Manninen added a comment - 06/Nov/14 01:02 PM
A colleague found that adding the @Serialized annotation to the byte array field seems to resolve the issue with storing larger arrays in Mongo. The size is obviously still constrained by the max BSON size, but that should be bearable. Tested with DN 4.0.3 and Mongo driver 2.12.4.
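For reference, the workaround amounts to a one-line change in the field mapping. This is a minimal sketch, assuming the standard JDO annotations in javax.jdo.annotations are on the classpath; the class and field names follow the DataObject.byteArray naming from the test case, and the rest of the class body is hypothetical.

```java
package mydomain.model;

import javax.jdo.annotations.PersistenceCapable;
import javax.jdo.annotations.Serialized;

@PersistenceCapable
public class DataObject {

    // Ask the JDO implementation to store this field in serialised form
    // instead of as an array of individual values.
    @Serialized
    private byte[] byteArray;

    public byte[] getByteArray() {
        return byteArray;
    }

    public void setByteArray(byte[] byteArray) {
        this.byteArray = byteArray;
    }
}
```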

Andy Jefferson added a comment - 06/Nov/14 04:24 PM
FWIW when you have serialised=true it will do
dbObject.put(colName, javaObjectSerialisedByteArray);
where "javaObjectSerialisedByteArray" is generated by ObjectOutputStream and toByteArray.

and if not serialised then
dbObject.put(colName, fieldArray);

I don't see that there is anything that needs changing here, since you have a reasonable way of persisting that type (i.e. one way is serialised and one way is not, hence catering for both options). If you disagree then please state what you'd expect and why, otherwise I'll close this.
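The serialised path described above can be approximated with plain JDK classes. The following is only a sketch of the ObjectOutputStream plus toByteArray step, not the DataNucleus code itself; the class and method names are hypothetical, and the 2 MB payload is an assumed size chosen to match the report.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

final class SerialisedFieldSketch {

    /** Java-serialises a field value into a single byte array. */
    static byte[] serialise(Object fieldValue) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(fieldValue);
        }
        return bytes.toByteArray();
    }

    /** Reverses serialise(): reads the original value back. */
    static Object deserialise(byte[] stored) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(stored))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] payload = new byte[2 * 1024 * 1024]; // assumed field content
        byte[] stored = serialise(payload);
        System.out.println("payload bytes:    " + payload.length);
        System.out.println("serialised bytes: " + stored.length);
    }
}
```

For a byte[] the serialised form adds only a small stream header on top of the raw bytes, so a value stored this way stays close to its original size.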

Petteri Manninen added a comment - 07/Nov/14 09:16 AM
Based on my experience, and as the attached test case shows, serializing is clearly the only option for byte arrays larger than a few megabytes. If we don't serialize (DataObject.byteArray), the test fails as follows:

09:35:57,998 (main) ERROR [DataNucleus.Persistence] - Exception inserting object
 StateManager[pc=mydomain.model.DataObject@7627291d, lifecycle=P_NEW]
com.mongodb.MongoInternalException: DBObject of size 26151951 is over Max BSON size 16777216
        at com.mongodb.OutMessage.putObject(OutMessage.java:291)
        at com.mongodb.DBApiLayer$MyCollection.insert(DBApiLayer.java:239)
        at com.mongodb.DBApiLayer$MyCollection.insert(DBApiLayer.java:204)
        at com.mongodb.DBCollection.insert(DBCollection.java:148)
        at com.mongodb.DBCollection.insert(DBCollection.java:91)
        at org.datanucleus.store.mongodb.MongoDBPersistenceHandler.insertObject(MongoDBPersistenceHandler.java:223)
        ...

Frankly, we don't need both options to work, but if you mean to say that either of them should, then I'd suggest rechecking my test case once more.

Andy Jefferson added a comment - 07/Nov/14 09:22 AM
They both store slightly different things (as you've seen), so I haven't said anything about "both should work" in your case.
We need both options since some people may have been saving fields using the non-serialised option, and removing that option would shaft their usage. Consequently I'll leave the code as-is and both are available.