DataNucleus provides a range of functionality by default. In some circumstances parts of this
functionality are not needed, and turning particular features on or off can improve the
performance of the application in question. This section contains a few common tips.
You should perform enhancement before runtime. That is, do not use the Java agent, since it
enhances classes at runtime, which is exactly when you want responsiveness from your application.
DataNucleus provides 4 PersistenceManagerFactory properties, datanucleus.autoCreateSchema,
datanucleus.autoCreateTables, datanucleus.autoCreateColumns, and datanucleus.autoCreateConstraints,
that allow creation of the datastore tables. This can cause performance issues at startup.
We recommend setting these to false at runtime, and instead using SchemaTool to generate any
required database schema before running DataNucleus.
DataNucleus provides 3 PersistenceManagerFactory properties, datanucleus.validateTables,
datanucleus.validateConstraints, and datanucleus.validateColumns, that enforce strict
validation of the datastore tables against the tables defined in the MetaData. This can cause
performance issues at startup. In general validation should be run only at schema generation, and
should be turned off for production usage. Set all of these properties to false. In addition there
is a PMF property datanucleus.rdbms.CheckExistTablesOrViews which checks whether the tables/views
that the classes map onto are present in the datastore. This should be set to false if you require
fast start-up. Finally, the property datanucleus.rdbms.initializeColumnInfo determines whether
the default values for columns are loaded from the database. This property should be set to NONE
to avoid loading database metadata.
To sum up, the optimal settings with schema creation and validation disabled are:
#schema creation
datanucleus.autoCreateSchema=false
datanucleus.autoCreateTables=false
datanucleus.autoCreateColumns=false
datanucleus.autoCreateConstraints=false
#schema validation
datanucleus.validateTables=false
datanucleus.validateConstraints=false
datanucleus.validateColumns=false
datanucleus.rdbms.CheckExistTablesOrViews=false
datanucleus.rdbms.initializeColumnInfo=NONE
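The same settings can also be supplied programmatically. The sketch below only builds a java.util.Properties object from the keys listed above, so it runs without any DataNucleus jars. The class and method names are illustrative, and the hand-off to JDOHelper is left as a comment because it assumes the JDO API and your own connection settings are available.

```java
import java.util.Properties;

// Illustrative helper (not part of DataNucleus): collects the startup-related
// settings listed above into a Properties object.
class StartupTuning {
    static Properties fastStartupProperties() {
        Properties props = new Properties();
        // Schema creation: disabled at runtime, the schema is generated up front with SchemaTool.
        for (String key : new String[] {"autoCreateSchema", "autoCreateTables",
                                        "autoCreateColumns", "autoCreateConstraints"}) {
            props.setProperty("datanucleus." + key, "false");
        }
        // Schema validation: disabled for production usage.
        for (String key : new String[] {"validateTables", "validateConstraints",
                                        "validateColumns"}) {
            props.setProperty("datanucleus." + key, "false");
        }
        props.setProperty("datanucleus.rdbms.CheckExistTablesOrViews", "false");
        props.setProperty("datanucleus.rdbms.initializeColumnInfo", "NONE");
        return props;
    }

    public static void main(String[] args) {
        Properties props = fastStartupProperties();
        // Assumption: with the JDO API on the classpath you would add your connection
        // settings and pass props to javax.jdo.JDOHelper.getPersistenceManagerFactory(props).
        props.stringPropertyNames().stream().sorted()
             .forEach(k -> System.out.println(k + "=" + props.getProperty(k)));
    }
}
```

Keeping these settings in one place makes it easier to use a different, stricter set during development, when schema creation and validation are still useful.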
Creation of PersistenceManagerFactory and EntityManagerFactory objects can be expensive and
should be kept to a minimum. Depending on the structure of your application, use a single
persistence factory per datastore wherever possible. If your application spans multiple servers
this may be impractical, but it should still be borne in mind.
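One way to enforce a single factory per datastore is a small registry that creates each factory once and returns the same instance afterwards. The sketch below is generic so that it runs without the JDO jars. FactoryRegistry and the datastore names are hypothetical, and in a JDO application the type parameter would be PersistenceManagerFactory.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Illustrative registry (hypothetical, not a DataNucleus class): creates at most
// one factory per datastore name and hands back the same instance afterwards.
class FactoryRegistry<F> {
    private final Map<String, F> factories = new ConcurrentHashMap<>();
    private final Function<String, F> creator;

    FactoryRegistry(Function<String, F> creator) {
        this.creator = creator;
    }

    // computeIfAbsent invokes the creator at most once per key, even when
    // several threads ask for the same datastore at the same time.
    F forDatastore(String name) {
        return factories.computeIfAbsent(name, creator);
    }
}
```

Assuming the JDO API is on the classpath, the creator could be something like `name -> JDOHelper.getPersistenceManagerFactory(name)`, so that the expensive factory construction happens only on the first request for each datastore.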
You can improve startup speed by setting the property datanucleus.autoStartMechanism to None.
This means that DataNucleus won't try to load the classes (or, more precisely, the metadata of
the classes) handled the previous time that this schema was used. If this isn't an issue for
your application then you can make this change. Please refer to the Auto-Start Mechanism
documentation for full details.
Some RDBMS (such as Oracle) have trouble returning information across multiple catalogs/schemas
and so, when DataNucleus starts up and tries to obtain information about the existing tables, it
can take some time. This is easily remedied by specifying the catalog/schema name to be used,
either for the PMF as a whole (using the persistence properties javax.jdo.mapping.Catalog and
javax.jdo.mapping.Schema) or for the package/class using attributes in the MetaData.
This reduces the amount of information that the RDBMS needs to search through and so can give
significant speed-ups when the RDBMS manages many catalogs/schemas.
The structure of your application will have a major influence on how you utilise a
PersistenceManager or EntityManager.
A pattern that gives a clean definition of process is to use a different persistence manager for
each request to the data access layer. This reduces the risk of conflicts where an operation
performed by one thread affects the successful completion of an operation being performed by
another thread. Creation of PMs/EMs is not an expensive process, and having multiple threads
write to the same persistence manager should be avoided.
Where you have an inheritance tree it is best to add a discriminator to the base
class so that it's simple for DataNucleus to determine the class name for a particular row.
This results in cleaner/simpler SQL which is faster to execute. Otherwise it would be
necessary to do a UNION of all possible tables.
DataNucleus, by default, will allocate connections when they are required and then close them
after use. In addition, when it needs to perform something via JDBC (RDBMS datastores) it
will allocate a PreparedStatement, and then discard the statement after use. This can be
inefficient compared with using a database connection and statement pooling facility such as
Apache DBCP. With Apache DBCP a Connection is allocated when required, and when it is closed the
Connection isn't actually closed but is saved in a pool for the next request that comes in for
a Connection. This saves the time taken to establish a Connection and hence can give performance
speed-ups on the order of maybe 30% or more. You can read about how to enable connection pooling
with DataNucleus in the Connection Pooling Guide.
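The saving comes from reusing an already-open resource instead of opening a new one. The toy pool below sketches only that idea. It is not how Apache DBCP is implemented, SimplePool is a hypothetical name, and validation, size limits and timeouts are all omitted.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Toy pool for illustration only: real pools such as Apache DBCP also validate,
// limit and expire the resources they hold.
class SimplePool<T> {
    private final Deque<T> idle = new ArrayDeque<>();
    private final Supplier<T> opener;   // stands in for the costly "open a connection" step
    private int opened = 0;

    SimplePool(Supplier<T> opener) {
        this.opener = opener;
    }

    // Reuse an idle resource if one exists, otherwise pay the cost of opening one.
    synchronized T borrow() {
        T resource = idle.pollFirst();
        if (resource == null) {
            resource = opener.get();
            opened++;
        }
        return resource;
    }

    // "Closing" a pooled resource just returns it to the idle queue.
    synchronized void release(T resource) {
        idle.addFirst(resource);
    }

    synchronized int openedCount() {
        return opened;
    }
}
```

In this sketch a borrow that follows a release costs no new open, which is the effect the pooling facility provides for Connections.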
When retrieving objects using their identity, and when the object is cached, DataNucleus by default
will validate the existence of the object before handing it out. You can skip this check
by setting the persistence property datanucleus.findObject.validateWhenCached to false.
At commit, DataNucleus verifies whether newly persisted objects are reachable in memory. If they
are not, they are removed from the database. This process mirrors garbage collection, where
objects that are no longer referenced are removed from memory. Reachability is expensive because
it traverses the whole object tree and may require reloading data from the database. If
reachability is not needed by your application, you should disable it by setting the persistence
property datanucleus.persistenceByReachabilityAtCommit to false.
DataNucleus will, by default, perform a check on any bidirectional relations to make sure
that they are set at both sides at commit. If they aren't set at both sides then they will be
made consistent. This check process can involve the (re-)loading of some instances. You can
skip this step if you always set both sides of a relation by setting the persistence
property datanucleus.manageRelationships to false.
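The three runtime checks described above (validation of cached objects, reachability at commit, and relationship management) are all plain persistence properties, so they can be switched off together. The sketch below only assembles the property keys named in the text. RuntimeCheckTuning is an illustrative name, and whether each check is safe to disable depends on your application.

```java
import java.util.Properties;

// Illustrative helper (not part of DataNucleus): groups the three check-related
// properties named above so they can be merged into the factory properties.
class RuntimeCheckTuning {
    static Properties relaxedChecks() {
        Properties props = new Properties();
        // Skip the existence check when an object is found in the cache.
        props.setProperty("datanucleus.findObject.validateWhenCached", "false");
        // Skip the reachability traversal at commit.
        props.setProperty("datanucleus.persistenceByReachabilityAtCommit", "false");
        // Skip the bidirectional consistency check; only safe if the application
        // always sets both sides of every relation itself.
        props.setProperty("datanucleus.manageRelationships", "false");
        return props;
    }
}
```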
DataNucleus provides a series of value generators for generation of identity values.
These can have an impact on performance depending on the choice of generator, and also on the
configuration of the generator.
- The max strategy should not really be used for production since it makes a separate DB
call for each insertion of an object. Something like the increment strategy should be
used instead. Better still would be to choose native and let DataNucleus decide for you.
- The sequence strategy allows configuration of the datastore sequence. The default can
be non-optimal. As a guide, you can try setting key-cache-size to 10 and key-increment-by to 10.
The native identity generator value is the recommended choice since this will allow DataNucleus to
decide which identity generator is best for the RDBMS in use.
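The reason a cache size helps can be sketched with a toy allocator that reserves identity values in blocks. This is a general illustration of block allocation under simplifying assumptions, not the code of any DataNucleus generator. BlockIdAllocator is a hypothetical name, and the "datastore" is simulated by a counter so that round trips can be counted.

```java
// Illustrative block allocator (hypothetical; not DataNucleus's generator code).
class BlockIdAllocator {
    private final int blockSize;
    private long datastoreNext = 1;   // stands in for the sequence/table held in the datastore
    private int datastoreCalls = 0;   // number of simulated round trips
    private long next = 0;            // next id to hand out from the current block
    private long limit = 0;           // exclusive upper bound of the current block

    BlockIdAllocator(int blockSize) {
        if (blockSize < 1) {
            throw new IllegalArgumentException("blockSize must be >= 1");
        }
        this.blockSize = blockSize;
    }

    synchronized long nextId() {
        if (next >= limit) {
            // Current block exhausted: one simulated round trip reserves blockSize ids.
            datastoreCalls++;
            next = datastoreNext;
            datastoreNext += blockSize;
            limit = next + blockSize;
        }
        return next++;
    }

    synchronized int datastoreCalls() {
        return datastoreCalls;
    }
}
```

With a block size of 10, handing out 25 identity values needs 3 simulated round trips, whereas a block size of 1 needs 25, which is the per-insert cost the text warns about.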
DataNucleus has 2 ways of handling calls to SCO Collections/Maps. The original method was to
pass all calls through to the datastore. The second method (which is now the default) is to
cache the collection/map elements/keys/values. This second method will read the
elements/keys/values once only and thereafter use the internally cached values. This second
method gives significant performance gains relative to the original method. You can configure
the handling of collections/maps as follows:
- Globally for the PMF/EMF: this is controlled by setting the persistence property
datanucleus.cache.collections. Set it to true to cache the collections (default), and
false to pass through to the datastore.
- For the specific Collection/Map: this overrides the global setting and is controlled
by adding a MetaData <collection> or <map> extension cache. Set it to true to cache the
collection data, and false to pass through to the datastore.
The second method also allows a finer degree of control. It allows the use of lazy loading
of data, so that elements are only loaded if they are needed. You can configure this as follows:
- Globally for the PersistenceManagerFactory: this is controlled by setting the PMF property
datanucleus.cache.collections.lazy. Set it to true to use lazy loading, and set it to false
to load the elements when the collection/map is initialised.
- For the specific Collection/Map: this overrides the global PMF setting and is controlled
by adding a MetaData <collection> or <map> extension cache-lazy-loading. Set it to true
to use lazy loading, and false to load once at initialisation.
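The combination of caching and lazy loading can be pictured as a wrapper that reads its elements on first access and then serves every later call from memory. The sketch below illustrates only that behaviour. LazyCachedList is a hypothetical name, the Supplier stands in for a datastore read, and DataNucleus's own SCO wrappers are considerably more involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Illustrative wrapper (hypothetical; not a DataNucleus class).
class LazyCachedList<E> {
    private final Supplier<List<E>> loader; // stands in for a datastore read
    private List<E> cached;                 // null until the first access

    LazyCachedList(Supplier<List<E>> loader) {
        this.loader = loader;
    }

    // Lazy: nothing is read until an element is needed.
    // Cached: the read happens once and later calls reuse the in-memory copy.
    private synchronized List<E> elements() {
        if (cached == null) {
            cached = new ArrayList<>(loader.get());
        }
        return cached;
    }

    int size() {
        return elements().size();
    }

    E get(int index) {
        return elements().get(index);
    }
}
```

Constructing the wrapper triggers no read at all, and any number of size() or get() calls afterwards trigger exactly one, which mirrors the true/true combination of the two settings described above.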
NontransactionalRead has advantages and disadvantages in terms of performance and the freshness
of cached data. In NontransactionalRead=true mode, the PersistenceManager is able to read objects
outside a transaction. The objects read are held cached by the PersistenceManager. The second time
a user application requests the same objects from the PersistenceManager they are retrieved from
cache. The time spent reading the object from cache is minimal, but the objects may become stale
and no longer represent the database state. If fresh values need to be loaded from the database,
then the user application should first call refresh on the object.
Another disadvantage of NontransactionalRead=true mode is that each operation opens a new
database connection, although this cost can be minimized with the use of connection pools.
Reading objects outside a transaction and PersistenceManager is a trivial task, but the way in
which it is performed can determine the application performance. The objective here is not to give
you an absolute answer on the subject, but to point out the benefits and drawbacks of the many
possible solutions.
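- Use the makeTransient method.
MyClass pc = null; // MyClass is a placeholder for your own persistable class
PersistenceManager pm = pmf.getPersistenceManager();
try
{
    pm.currentTransaction().begin();
    // retrieve the object in some way: query, getObjectById, etc.
    pc = (MyClass) pm.getObjectById(id);
    pm.makeTransient(pc);
    pm.currentTransaction().commit();
}
finally
{
    // roll back if commit was never reached, then release the PersistenceManager
    if (pm.currentTransaction().isActive())
    {
        pm.currentTransaction().rollback();
    }
    pm.close();
}
// read the now-transient object here
System.out.println(pc.getName());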
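- Use RetainValues=true.
MyClass pc = null; // MyClass is a placeholder for your own persistable class
PersistenceManager pm = pmf.getPersistenceManager();
try
{
    pm.currentTransaction().setRetainValues(true);
    pm.currentTransaction().begin();
    // retrieve the object in some way: query, getObjectById, etc.
    pc = (MyClass) pm.getObjectById(id);
    pm.currentTransaction().commit();
}
finally
{
    // roll back if commit was never reached, then release the PersistenceManager
    if (pm.currentTransaction().isActive())
    {
        pm.currentTransaction().rollback();
    }
    pm.close();
}
// read the object here; its values were retained at commit
System.out.println(pc.getName());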
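- Use the detachCopy method.
MyClass copy = null; // MyClass is a placeholder for your own detachable class
PersistenceManager pm = pmf.getPersistenceManager();
try
{
    pm.currentTransaction().begin();
    // retrieve the object in some way: query, getObjectById, etc.
    MyClass pc = (MyClass) pm.getObjectById(id);
    copy = pm.detachCopy(pc);
    pm.currentTransaction().commit();
}
finally
{
    // roll back if commit was never reached, then release the PersistenceManager
    if (pm.currentTransaction().isActive())
    {
        pm.currentTransaction().rollback();
    }
    pm.close();
}
// read or change the detached object here
System.out.println(copy.getName());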
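- Use detachAllOnCommit.
MyClass pc = null; // MyClass is a placeholder for your own detachable class
PersistenceManager pm = pmf.getPersistenceManager();
try
{
    pm.setDetachAllOnCommit(true);
    pm.currentTransaction().begin();
    // retrieve the object in some way: query, getObjectById, etc.
    pc = (MyClass) pm.getObjectById(id);
    pm.currentTransaction().commit(); // Object "pc" is now detached
}
finally
{
    // roll back if commit was never reached, then release the PersistenceManager
    if (pm.currentTransaction().isActive())
    {
        pm.currentTransaction().rollback();
    }
    pm.close();
}
// read or change the detached object here
System.out.println(pc.getName());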
The most expensive of these in terms of performance is detachCopy, because it makes copies of
persistent objects. The advantage of detachment (via detachCopy or detachAllOnCommit)
is that changes made outside the transaction can later be used to update the database in a
new transaction. The other methods also allow changes outside of the transaction, but the
changed instances can't be used to update the database.
With RetainValues=true and makeTransient no object copies are made, and the object values
are set in the instances when the PersistenceManager disassociates them. Both methods are
equivalent in performance; however, makeTransient sets the values of the object at the instant
the makeTransient method is invoked, whereas RetainValues=true sets the values of the object
during commit.
The bottom line is to not use detachment if instances will only be used to read values.
If you are retrieving an object by its identity and know that it will be present in the
Level2 cache, for example, you can set the persistence property
datanucleus.findObject.validateWhenCached to false, and this will skip
a separate call to the datastore to validate that the object exists in the datastore.
I/O can consume a large share of the total processing time, so it is recommended to reduce or
disable logging in production. To disable logging, set the DataNucleus category to OFF in the
Log4j configuration. See the Logging documentation for more information.
log4j.category.DataNucleus=OFF