Issue Details

Key: NUCRDBMS-502
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: Andy Jefferson
Reporter: Gerd Behrmann
Votes: 0
Watchers: 0
DataNucleus Store RDBMS

Race in statement batching

Created: 21/Feb/11 04:40 PM   Updated: 07/Mar/11 05:37 PM   Resolved: 22/Feb/11 09:00 AM
Component/s: None
Affects Version/s: 2.2.1
Fix Version/s: 2.2.3, 3.0.0.m2

File Attachments: 1. Zip Archive bug.zip (2 kB)
2. Text File sqlcontroller-map.patch (0.9 kB)

Environment:
Ubuntu 8.04, PostgreSQL 8.3.9, Java(TM) SE Runtime Environment (build 1.6.0_22-b04), all running in a KVM virtual machine.

Libs used:

asm-3.3.1.jar
datanucleus-cache-2.1.2.jar
datanucleus-core-2.2.2.jar
datanucleus-enhancer-2.1.3.jar
datanucleus-jpa-2.1.5.jar
datanucleus-rdbms-2.2.2.jar
jdo2-api-2.3-ec.jar
postgresql-9.0-801.jdbc4.jar

Datastore: PostgreSQL
Severity: Production


 Description  « Hide
The attached test case demonstrates a race condition in concurrent delete operations. Each thread uses its own PersistenceManager and thus its own transaction. This usage should be safe even when datanucleus.Multithreaded is false; in any case, the problem occurs independently of that flag's value.
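The structure of the test is roughly the following (a sketch for illustration, not the attached bug.zip; the JDO calls are shown as comments so that the skeleton stays self-contained):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ConcurrentDeleteSketch {
    // Runs 'threads' concurrent workers; returns true when all of them finish.
    static boolean runConcurrently(int threads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch done = new CountDownLatch(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                // In the real test each worker obtains its own PersistenceManager
                // (and therefore its own transaction) from the shared factory:
                //   PersistenceManager pm = pmf.getPersistenceManager();
                //   Transaction tx = pm.currentTransaction();
                //   tx.begin();
                //   pm.deletePersistent(obj);
                //   tx.commit();   // the flush here issues the batched DELETE
                //   pm.close();
                done.countDown();
            });
        }
        pool.shutdown();
        return done.await(30, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runConcurrently(8) ? "all workers finished" : "timed out");
    }
}
```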

The test fails with the following stack trace:

javax.jdo.JDODataStoreException: Exception thrown flushing changes to datastore
at org.datanucleus.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:420)
at org.datanucleus.jdo.JDOTransaction.commit(JDOTransaction.java:139)
at org.datanucleus.test.Main.delete(Main.java:78)
at org.datanucleus.test.Main.access$000(Main.java:8)
at org.datanucleus.test.Main$1.run(Main.java:56)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
NestedThrowablesStackTrace:
org.postgresql.util.PSQLException: This statement has been closed.
at org.postgresql.jdbc2.AbstractJdbc2Statement.checkClosed(AbstractJdbc2Statement.java:2508)
at org.postgresql.jdbc2.AbstractJdbc2Statement.executeBatch(AbstractJdbc2Statement.java:2683)
at org.datanucleus.store.rdbms.SQLController.processConnectionStatement(SQLController.java:585)
at org.datanucleus.store.rdbms.SQLController.processStatementsForConnection(SQLController.java:548)
at org.datanucleus.store.rdbms.SQLController$1.managedConnectionFlushed(SQLController.java:638)
at org.datanucleus.store.connection.AbstractManagedConnection.flush(AbstractManagedConnection.java:115)
at org.datanucleus.store.connection.ConnectionManagerImpl$2.transactionFlushed(ConnectionManagerImpl.java:375)
at org.datanucleus.TransactionImpl.flush(TransactionImpl.java:208)
at org.datanucleus.ObjectManagerImpl.flushInternal(ObjectManagerImpl.java:3212)
at org.datanucleus.ObjectManagerImpl.flush(ObjectManagerImpl.java:3119)
at org.datanucleus.ObjectManagerImpl.preCommit(ObjectManagerImpl.java:3260)
at org.datanucleus.ObjectManagerImpl$2.transactionPreCommit(ObjectManagerImpl.java:324)
at org.datanucleus.TransactionImpl.internalPreCommit(TransactionImpl.java:394)
at org.datanucleus.TransactionImpl.commit(TransactionImpl.java:279)
at org.datanucleus.jdo.JDOTransaction.commit(JDOTransaction.java:90)
at org.datanucleus.test.Main.delete(Main.java:78)
at org.datanucleus.test.Main.access$000(Main.java:8)
at org.datanucleus.test.Main$1.run(Main.java:56)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

The problem is very racy and doesn't occur on every run. We have been able to reproduce it in several environments, although one may have to run the test 10 times before seeing the problem even once (if at all in a given environment).

If the built-in DBCP connection pool is used instead, the exception message changes slightly, although it reflects the same underlying error.

javax.jdo.JDODataStoreException: Exception thrown flushing changes to datastore
at org.datanucleus.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:420)
at org.datanucleus.jdo.JDOTransaction.commit(JDOTransaction.java:139)
at org.datanucleus.test.Main.delete(Main.java:78)
at org.datanucleus.test.Main.access$000(Main.java:8)
at org.datanucleus.test.Main$1.run(Main.java:56)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
NestedThrowablesStackTrace:
java.sql.SQLException: org.datanucleus.store.rdbms.datasource.dbcp.DelegatingPreparedStatement with address: "DELETE FROM "MYCLASS" WHERE "MYCLASS_ID"='51059'" is closed.
at org.datanucleus.store.rdbms.datasource.dbcp.DelegatingStatement.checkOpen(DelegatingStatement.java:137)
at org.datanucleus.store.rdbms.datasource.dbcp.DelegatingStatement.executeBatch(DelegatingStatement.java:297)
at org.datanucleus.store.rdbms.SQLController.processConnectionStatement(SQLController.java:585)
at org.datanucleus.store.rdbms.SQLController.processStatementsForConnection(SQLController.java:548)
at org.datanucleus.store.rdbms.SQLController$1.managedConnectionFlushed(SQLController.java:638)
at org.datanucleus.store.connection.AbstractManagedConnection.flush(AbstractManagedConnection.java:115)
at org.datanucleus.store.connection.ConnectionManagerImpl$2.transactionFlushed(ConnectionManagerImpl.java:375)
at org.datanucleus.TransactionImpl.flush(TransactionImpl.java:208)
at org.datanucleus.ObjectManagerImpl.flushInternal(ObjectManagerImpl.java:3212)
at org.datanucleus.ObjectManagerImpl.flush(ObjectManagerImpl.java:3119)
at org.datanucleus.ObjectManagerImpl.preCommit(ObjectManagerImpl.java:3260)
at org.datanucleus.ObjectManagerImpl$2.transactionPreCommit(ObjectManagerImpl.java:324)
at org.datanucleus.TransactionImpl.internalPreCommit(TransactionImpl.java:394)
at org.datanucleus.TransactionImpl.commit(TransactionImpl.java:279)
at org.datanucleus.jdo.JDOTransaction.commit(JDOTransaction.java:90)
at org.datanucleus.test.Main.delete(Main.java:78)
at org.datanucleus.test.Main.access$000(Main.java:8)
at org.datanucleus.test.Main$1.run(Main.java:56)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

If the transaction begin/commit lines are uncommented, the problem can no longer be triggered, most likely because the deletes are then no longer batched.


I searched through the DataNucleus code base and may have found the problem; please keep in mind that I am new to this code base (i.e. the following analysis may be completely wrong). In the test case, JDOPersistenceManager, the associated ObjectManagerImpl, and the transaction are allocated per thread. However, JDOPersistenceManagerFactory (which is documented to be thread-safe) is shared, and RDBMSStoreManager and its embedded SQLController are associated with JDOPersistenceManagerFactory and are thus shared as well. Hence SQLController ought to be thread-safe.

The only mutable state in SQLController (beyond configuration that is never changed) is the map from Connection to ConnectionStatementState. This is a regular HashMap. HashMap is not thread-safe and thus needs to be protected. Yet neither SQLController.removeConnectionStatement, SQLController.getConnectionStatement, nor setConnectionStatementState is synchronized. So unless calls to SQLController are synchronized somewhere outside of this class, I suspect this code is not thread-safe. Corruption of this map could cause the wrong ConnectionStatementState to be returned, and thus the same prepared statement to be executed and closed twice. If this is the case, then marking those three methods as synchronized, or using a ConcurrentHashMap, should fix the problem.
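The suggested fix can be sketched with a simplified stand-in for the controller (hypothetical names and integer keys in place of real Connections; this is not DataNucleus code). With a plain HashMap, concurrent puts and removes can corrupt the table during a resize; a ConcurrentHashMap keeps each thread's state isolated:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class StatementStateMapDemo {
    // Simplified stand-in for the per-connection batching state held by
    // org.datanucleus.store.rdbms.SQLController.
    static class ConnectionStatementState {
        final int owner;
        ConnectionStatementState(int owner) { this.owner = owner; }
    }

    // The proposed fix: a ConcurrentHashMap instead of a plain HashMap.
    private final Map<Integer, ConnectionStatementState> state = new ConcurrentHashMap<>();

    void setConnectionStatementState(int conn, ConnectionStatementState s) { state.put(conn, s); }
    ConnectionStatementState getConnectionStatement(int conn) { return state.get(conn); }
    void removeConnectionStatement(int conn) { state.remove(conn); }

    // Each worker uses its own key, mirroring one PersistenceManager (and one
    // connection) per thread; returns true if no worker ever saw foreign state.
    static boolean stress(int threads, int iterations) throws InterruptedException {
        StatementStateMapDemo demo = new StatementStateMapDemo();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        ConcurrentLinkedQueue<Boolean> results = new ConcurrentLinkedQueue<>();
        for (int t = 0; t < threads; t++) {
            final int conn = t;
            pool.submit(() -> {
                boolean ok = true;
                for (int i = 0; i < iterations; i++) {
                    demo.setConnectionStatementState(conn, new ConnectionStatementState(conn));
                    ConnectionStatementState s = demo.getConnectionStatement(conn);
                    // With an unsynchronized HashMap, a concurrent resize could
                    // lose this entry or return another thread's state.
                    ok &= (s != null && s.owner == conn);
                    demo.removeConnectionStatement(conn);
                }
                results.add(ok);
            });
        }
        pool.shutdown();
        pool.awaitTermination(60, TimeUnit.SECONDS);
        return results.size() == threads && !results.contains(false);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(stress(8, 10_000) ? "consistent" : "corrupted");
    }
}
```

Swapping the field's type is enough here because each of the three accessors is a single map operation; if any method performed a compound check-then-act on the map, it would additionally need the atomic ConcurrentHashMap operations (putIfAbsent, compute) or external locking.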

Let me know if you need anything else. If you can confirm that the above description is a likely bug, then I don't mind writing a patch (even though it is trivial). I just need the experts to tell me if I am on the right track.

Gerd Behrmann added a comment - 21/Feb/11 04:43 PM
Testcase

Andy Jefferson added a comment - 21/Feb/11 05:07 PM
Certainly looks like there ought to be locking added there, so why not just replace the HashMap with a ConcurrentHashMap and rebuild/rerun it?

Gerd Behrmann added a comment - 21/Feb/11 05:09 PM
Yeah, I am trying that now (currently tracking down external dependencies not in maven). Will let you know how that goes.

Gerd Behrmann added a comment - 21/Feb/11 08:02 PM
Proposed patch for the issue.

Gerd Behrmann added a comment - 21/Feb/11 08:04 PM
The issue appears to be gone after replacing the HashMap with a ConcurrentHashMap. I have attached a patch.

Chris Colman added a comment - 22/Feb/11 07:25 AM
Thanks Gerd. Looks like you've found and fixed that issue. I wonder if it's worth converting your test case to a unit test and adding it to the set of unit tests run regularly. Given that it is an intermittent problem, it might need to be wrapped in a loop that runs the test 10-20 times, in the hope of catching this type of timing/sequencing-dependent intermittent issue and any future regression.

Gerd Behrmann added a comment - 22/Feb/11 08:27 AM
In my experience it is tricky to write unit tests for race conditions unless you manage to trigger the race reliably (e.g. by sleeping in strategic places). Otherwise any statistics gathered from unit tests will fluctuate. Running the test 20 times may catch it in many cases, but that will also make it a fairly slow test (in my KVM virtual machine it failed on almost every run, but on a colleague's Xen VM we barely caught it once in 20 runs).

Chris, Andy, can I assume you will take care of the patch from here, or is there something else you want me to do?
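The looped regression test Chris suggests could be sketched as a small harness (hypothetical names, not part of the attached test case; as Gerd notes, without a deterministic trigger this only raises the odds of catching the race):

```java
import java.util.concurrent.Callable;

public class RepeatedRaceTest {
    // Runs a flaky scenario several times; fails fast on the first bad run.
    static boolean repeat(int runs, Callable<Boolean> scenario) throws Exception {
        for (int i = 0; i < runs; i++) {
            if (!scenario.call()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in scenario; a real regression test would run the concurrent
        // delete workload here and return whether every commit succeeded.
        Callable<Boolean> scenario = () -> true;
        System.out.println(repeat(20, scenario) ? "stable" : "failed");
    }
}
```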

Andy Jefferson added a comment - 22/Feb/11 09:00 AM
Gerd, nothing more to do. Thanks for the patch.
SVN now has this patch applied for 2.2 and 3.0.