DataNucleus Core

datanucleus-core is the base persistence implementation of DataNucleus. All other DataNucleus projects build on top of this, and so it is the pre-requisite for any DataNucleus-enabled application.

Source Code

trunk can be checked out as follows

svn co https://svn.code.sf.net/p/datanucleus/code/platform/core/trunk core

Download

datanucleus-core is downloadable as follows


Dependencies

datanucleus-core is dependent on the following packages of software. Click on the name to go to the home page for that software to download it.

Package   Version   Description                       Required?
JDO       3.0+      Apache JDO API.                   Yes
log4j     1.2.x+    Apache Log4J Logging framework.   No. Use it or JDK1.4 logging


Core : Persistence Process

datanucleus-core provides the framework for the persistence process. Whether you are using optimistic or pessimistic transactions governs which persistence process is followed.

Optimistic

With the optimistic route, when a user tries to persist an object, the object will not reach the datastore immediately. ObjectManager will call StateManager.makePersistent and this will run reachability on the object to make this object and any reachable persistable objects provisionally persistent. This means that their lifecycle state is PERSISTENT_NEW, but they haven't been flushed to the datastore.

When a user tries to delete an object, the object will not be removed from the datastore immediately. ObjectManager will call StateManager.deletePersistent and this will run reachability on the object to make this object and any reachable persistable objects provisionally deleted. This means that their lifecycle state is PERSISTENT_DELETED, but they haven't been flushed to the datastore.

When the user calls flush or tx.commit, the list of objects with outstanding changes is processed one by one. Each outstanding persist results in a call to StorePersistenceHandler.insertObject for that object. Similarly, each outstanding delete results in a call to StorePersistenceHandler.deleteObject for that object. The same goes for any updates. Each datastore persistence handler can detect that the related object(s) are provisionally persistent or provisionally deleted, and so knows not to cascade to them.


Pessimistic

With the pessimistic route, when a user tries to persist an object, the object will reach the datastore immediately. ObjectManager will call StateManager.makePersistent and this will change the lifecycle state to PERSISTENT_NEW, and will relay the call to StorePersistenceHandler.insertObject. It is the responsibility of this method to handle cascading to related objects.

When a user tries to delete an object, the object will be removed from the datastore immediately. ObjectManager will call StateManager.deletePersistent and this will change the lifecycle state to PERSISTENT_DELETED, and relay the call to StorePersistenceHandler.deleteObject. It is the responsibility of this method to handle cascading to related objects.

When the user calls flush or tx.commit then the list of objects with outstanding changes will be processed one by one. In general for the pessimistic route there will be little to do here.
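
The difference between the two routes can be sketched in a few lines of Java. This is only an illustration under stated assumptions: RecordingHandler and SketchManager are hypothetical stand-ins (not the DataNucleus ObjectManager, StateManager or StorePersistenceHandler), objects are reduced to String ids, and reachability and lifecycle states are left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a store persistence handler; it only records calls. */
class RecordingHandler {
    final List<String> calls = new ArrayList<>();

    void insertObject(String id) { calls.add("insert:" + id); }

    void deleteObject(String id) { calls.add("delete:" + id); }
}

/** Hypothetical, heavily simplified manager (assumption: not the real ObjectManager). */
class SketchManager {
    private final boolean optimistic;
    private final RecordingHandler handler;
    // Outstanding operations; only ever filled on the optimistic route.
    private final List<Runnable> pending = new ArrayList<>();

    SketchManager(boolean optimistic, RecordingHandler handler) {
        this.optimistic = optimistic;
        this.handler = handler;
    }

    void makePersistent(String id) {
        if (optimistic) {
            pending.add(() -> handler.insertObject(id)); // provisional, deferred
        } else {
            handler.insertObject(id);                    // reaches the store at once
        }
    }

    void deletePersistent(String id) {
        if (optimistic) {
            pending.add(() -> handler.deleteObject(id));
        } else {
            handler.deleteObject(id);
        }
    }

    /** Processes the outstanding changes one by one, in the order they were made. */
    void flush() {
        for (Runnable op : pending) {
            op.run();
        }
        pending.clear();
    }

    public static void main(String[] args) {
        RecordingHandler handler = new RecordingHandler();
        SketchManager manager = new SketchManager(true, handler);
        manager.makePersistent("A");
        System.out.println("before flush: " + handler.calls);
        manager.flush();
        System.out.println("after flush: " + handler.calls);
    }
}
```

With the optimistic flag set, nothing reaches the handler until flush is called; with it unset, flush has nothing left to do.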


Logging

DataNucleus provides flexibility with logging. You can choose whether to use the popular Log4J or JDK1.4 logging for example. Moreover, DataNucleus allows you to log messages to various categories, allowing users to filter the logged messages by these categories.

org.datanucleus.util.NucleusLogger

The NucleusLogger class provides the central registry of logging categories used by DataNucleus. It provides an accessor for retrieving a particular logging category -- used as follows

NucleusLogger.METADATA.info("my log message");

The various categories are defined in the NucleusLogger class. Only add new ones after discussion with other developers. NucleusLogger will decide whether Log4J, JDK1.4 logging, or another mechanism should be used; you don't need to do anything in your code.

Logging messages

NucleusLogger allows you to log messages at various severity levels. These are DEBUG, INFO, WARN, ERROR, FATAL. Each message is logged at a particular level to a category (as described above).

Logging a message is very simple. See below for a few examples

    NucleusLogger.DATASTORE_SCHEMA.info("my log message");
    NucleusLogger.DATASTORE_SCHEMA.error("my log message");
    if (NucleusLogger.DATASTORE_SCHEMA.isDebugEnabled())
    {
        NucleusLogger.DATASTORE_SCHEMA.debug("my log message at debug level");
    }

Please refer to the Log4J Manual for details of what you can do with a Log4J Logger. To see how you can use the logging from a user's perspective, refer to the User Logging Guide for DataNucleus AccessPlatform.

Using other logging mechanisms

DataNucleus provides an interface for allowing other types of logging if you so wish.



Internationalisation of Messages

The DataNucleus system is internationalisable, hence messages (to log files or exceptions) can be displayed in multiple languages. Currently DataNucleus contains localisation files in the default locale (English), but can be extended easily by adding localisation files in languages such as Spanish, French, etc. The internationalisation operates around the org.datanucleus.util.Localiser class, which is responsible for generating the messages in the specified locale. Each class needs to instantiate a Localiser

    private static final Localiser LOCALISER = Localiser.getInstance("org.datanucleus.store.Localisation",
                        MyClass.class);

and then output messages via

    LOCALISER.msg("012345", schemaName, autoStartMechanism)

The messages themselves are contained in a file for each package. For example, with the above example, we have org.datanucleus.store.Localisation.properties. This contains entries such as

    012345=Initialising Schema "{0}" using "{1}" auto-start option

So the 2 parameters specified in the LOCALISER.msg call are inserted into the message. The language-specific parts are always contained in the Localisation.properties file. To extend the current system to internationalise in, for example, Spanish you would add a file org.datanucleus.store.Localisation_es.properties and add an entry such as

    012345=Inicializando la esquema "{0}" con la opción de empezar "{1}"

With this file installed, anybody running an application where the JDK is running in Spanish as default locale would see the above Spanish message. It is intended to include such files in a future release.
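
The parameter substitution and locale selection described above can be tried out with the standard java.text.MessageFormat class. The sketch below is an assumption-laden illustration: it does not use the DataNucleus Localiser, the class LocalisedMessageSketch and its msg method are hypothetical, and the two properties files are replaced by in-memory maps so the example is self-contained.

```java
import java.text.MessageFormat;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical illustration of keyed, parameterised, per-locale messages. */
class LocalisedMessageSketch {
    // In-memory stand-ins for Localisation.properties ("") and Localisation_es.properties ("es").
    private static final Map<String, Map<String, String>> MESSAGES = new HashMap<>();

    static {
        Map<String, String> defaults = new HashMap<>();
        defaults.put("012345", "Initialising Schema \"{0}\" using \"{1}\" auto-start option");
        Map<String, String> spanish = new HashMap<>();
        spanish.put("012345", "Inicializando la esquema \"{0}\" con la opción de empezar \"{1}\"");
        MESSAGES.put("", defaults);
        MESSAGES.put("es", spanish);
    }

    /** Looks up the pattern for the locale's language (falling back to the default) and fills in {0}, {1}, ... */
    static String msg(Locale locale, String key, Object... params) {
        Map<String, String> bundle = MESSAGES.getOrDefault(locale.getLanguage(), MESSAGES.get(""));
        String pattern = bundle.get(key);
        if (pattern == null) {
            return key; // assumption: an unknown key is returned unchanged
        }
        return new MessageFormat(pattern, locale).format(params);
    }

    public static void main(String[] args) {
        System.out.println(msg(Locale.ENGLISH, "012345", "MYSCHEMA", "Classes"));
        System.out.println(msg(Locale.forLanguageTag("es"), "012345", "MYSCHEMA", "Classes"));
    }
}
```

The values "MYSCHEMA" and "Classes" are made-up arguments for the demonstration only.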

If you want to extend this to another language and contribute the files for your language you need to find all files "Localisation.properties" and provide an alternative variant. The key ones are

  • core : org/datanucleus/Localisation.properties
  • api.jdo : org/datanucleus/api/jdo/Localisation.properties
  • store.rdbms : org/datanucleus/store/rdbms/Localisation.properties

Note that the second argument used in constructing the Localiser is important for OSGi. It has to be a class in the same OSGi bundle as the Localisation.properties file.

You will find alternates in Spanish already present named "Localisation_es.properties", so if you wanted to create a French localisation then provide "Localisation_fr.properties".

Further references: International Components for Unicode for Java



Query Compilation and Evaluation

DataNucleus provides a generic query processing engine. It provides for compilation of string-based query languages. Additionally it allows in-memory evaluation of these queries. This is very useful when providing support for new datastores which either don't have a native query language and so the only alternative is for DataNucleus to evaluate the queries, or where it will take some time to map the compiled query to the equivalent query in the native language of the datastore.

Input Processing

When a user invokes a query, using the JDO/JPA APIs, they are providing either

  • A single-string query made up of keywords and clauses
  • A query object that has the clauses specified directly

The first step is to convert these two forms into the constituent clauses. It is assumed that a string-based query is of the form

SELECT {resultClause} FROM {fromClause} WHERE {filterClause}
GROUP BY {groupingClause} HAVING {havingClause}
ORDER BY {orderClause}

The two primary supported query languages have helper classes that convert the single-string query form into the individual clauses. These can be found in org.datanucleus.query.JDOQLSingleStringParser and org.datanucleus.query.JPQLSingleStringParser.
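
To make the idea of splitting a single-string query into clauses concrete, here is a deliberately naive sketch. It is not how JDOQLSingleStringParser or JPQLSingleStringParser work internally; the class SingleStringSplitterSketch is hypothetical, and it assumes the keywords never appear inside string literals or as identifiers.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical, naive splitter of a single-string query into its clauses. */
class SingleStringSplitterSketch {
    // Clause keywords of the assumed query form, matched as whole words, ignoring case.
    private static final Pattern KEYWORD = Pattern.compile(
        "\\b(SELECT|FROM|WHERE|GROUP\\s+BY|HAVING|ORDER\\s+BY)\\b",
        Pattern.CASE_INSENSITIVE);

    /** Returns keyword -> clause text, in the order the keywords appear in the query. */
    static Map<String, String> split(String query) {
        Map<String, String> clauses = new LinkedHashMap<>();
        Matcher matcher = KEYWORD.matcher(query);
        String currentKeyword = null;
        int clauseStart = 0;
        while (matcher.find()) {
            if (currentKeyword != null) {
                // The previous clause runs up to the start of this keyword.
                clauses.put(currentKeyword, query.substring(clauseStart, matcher.start()).trim());
            }
            // Normalise e.g. "order   by" to "ORDER BY".
            currentKeyword = matcher.group(1).toUpperCase(Locale.ROOT).replaceAll("\\s+", " ");
            clauseStart = matcher.end();
        }
        if (currentKeyword != null) {
            clauses.put(currentKeyword, query.substring(clauseStart).trim());
        }
        return clauses;
    }

    public static void main(String[] args) {
        System.out.println(split("SELECT name FROM mydomain.Person WHERE age > 18 ORDER BY name ASC"));
    }
}
```

The class name mydomain.Person and the fields in the sample query are invented for the example.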


Compilation

We now have a series of clauses and want to compile them. In simple terms, this means converting the individual clauses from above into expression tree(s) so that they can be evaluated. The end result of a compilation is an org.datanucleus.query.compiler.QueryCompilation.

So if you think about a typical query you may have

SELECT field1, field2 FROM MyClass

This has 2 result expressions - field1, and field2 (where they are each a "PrimaryExpression" meaning a representation of a field).

The query compilation of a particular clause has 2 stages

  1. Compilation into a Node tree, with operations between the nodes
  2. Compilation of the Node tree into an Expression tree of supported expressions

Compilation is performed by a JavaQueryCompiler, so look at org.datanucleus.query.compiler.JDOQLCompiler and org.datanucleus.query.compiler.JPQLCompiler. These each have a Parser that performs the extraction of the different components of the clauses and generation of the Node tree. Once a Node tree is generated it can then be converted into the compiled Expression tree; this is handled inside the JavaQueryCompiler.

The other part of a query compilation is the org.datanucleus.query.symbol.SymbolTable, which is a lookup table (map) of identifiers and their values. So, for example, an input parameter will have a name, so it has an entry in the table, and its value is stored there. This is then used during evaluation.


Evaluation : In-datastore

Intuitively it is more efficient to evaluate a query within the datastore since it means that fewer objects need instantiating in order to determine the results. To evaluate a compiled query in the datastore there needs to be a compiler for taking the generic expression compilation and converting it into a native query. Additionally it should be noted that you aren't forced to evaluate the whole of the query in the datastore, maybe just the filter clause. This would be done where the datastore native language only provides a limited amount of query capabilities. For example with db4o we evaluate the filter and ordering in the datastore, using their SODA query language. The remaining clauses can be evaluated on the resultant objects in-memory (see below). Obviously for a datastore like RDBMS it should be possible to evaluate the whole query in-datastore.


Evaluation : In-memory

Evaluation of queries in-memory assumes that we have a series of "candidate" objects. These are either user-input to the query itself, or retrieved from the datastore. We then use the in-memory evaluator org.datanucleus.query.evaluator.memory.InMemoryExpressionEvaluator. This takes in each candidate object one-by-one and evaluates whichever of the query clauses are desired to be evaluated. For example we could just evaluate the filter clause. Evaluation makes use of the values of the fields of the candidate objects (and related objects) and uses the SymbolTable for values of parameters etc. Where a candidate fails a particular clause in the filter, it is excluded from the results.
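
A minimal sketch of this candidate-by-candidate filtering is shown below. All names in it (ExprSketch, InMemoryFilterSketch and their methods) are hypothetical and much simpler than the real expression classes and InMemoryExpressionEvaluator: candidates are plain maps of field name to value, and the symbol table is a plain map of parameter name to value.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical expression node, evaluated against one candidate and a symbol table. */
interface ExprSketch {
    Object eval(Map<String, Object> candidate, Map<String, Object> symbols);
}

/** Hypothetical in-memory evaluation of a filter clause over candidates. */
class InMemoryFilterSketch {
    /** Reads a field value from the candidate (compare a "PrimaryExpression"). */
    static ExprSketch field(String name) {
        return (candidate, symbols) -> candidate.get(name);
    }

    /** Reads a named parameter value from the symbol table. */
    static ExprSketch param(String name) {
        return (candidate, symbols) -> symbols.get(name);
    }

    /** Numeric "left > right"; assumes both sides evaluate to a Number. */
    static ExprSketch greaterThan(ExprSketch left, ExprSketch right) {
        return (candidate, symbols) ->
            ((Number) left.eval(candidate, symbols)).doubleValue()
                > ((Number) right.eval(candidate, symbols)).doubleValue();
    }

    /** Keeps only the candidates for which the filter expression evaluates to true. */
    static List<Map<String, Object>> filter(List<Map<String, Object>> candidates,
            ExprSketch filterExpr, Map<String, Object> symbols) {
        List<Map<String, Object>> results = new ArrayList<>();
        for (Map<String, Object> candidate : candidates) {
            if (Boolean.TRUE.equals(filterExpr.eval(candidate, symbols))) {
                results.add(candidate);
            }
        }
        return results;
    }

    /** Builds a candidate "object" as a map of field name to value. */
    static Map<String, Object> person(String name, int age) {
        Map<String, Object> fields = new HashMap<>();
        fields.put("name", name);
        fields.put("age", age);
        return fields;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> candidates = new ArrayList<>();
        candidates.add(person("Ann", 15));
        candidates.add(person("Bob", 30));
        Map<String, Object> symbols = new HashMap<>();
        symbols.put("minAge", 18);
        // Equivalent of the filter "age > :minAge".
        System.out.println(filter(candidates, greaterThan(field("age"), param("minAge")), symbols));
    }
}
```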


Results

There are two primary ways to return results to the user.

  • Instantiate all into memory and return a (java.util.)List. This is the simplest, but obviously can impact on memory footprint.
  • Return a wrapper to a List, and intercept calls so that you can load objects as they are accessed. This is more complex, but has the advantage of not imposing a large footprint on the application.

To make use of the second route, consider extending the class org.datanucleus.store.query.AbstractQueryResult and implement the key methods. Also, for the iterator, you can extend org.datanucleus.store.query.AbstractQueryResultIterator.
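
The second route can be illustrated with a small wrapper built on java.util.AbstractList. This is a sketch only: LazyResultListSketch is a hypothetical class that does not extend AbstractQueryResult, the loader function stands in for a datastore fetch, and it assumes the number of results is known up front.

```java
import java.util.AbstractList;
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

/** Hypothetical lazy result list: elements are loaded the first time they are accessed. */
class LazyResultListSketch<T> extends AbstractList<T> {
    private final int size;
    private final IntFunction<T> loader;            // stands in for a datastore fetch
    private final Map<Integer, T> loaded = new HashMap<>();

    LazyResultListSketch(int size, IntFunction<T> loader) {
        this.size = size;
        this.loader = loader;
    }

    @Override
    public T get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("Index: " + index + ", Size: " + size);
        }
        // Load on first access only, then serve from the local map.
        return loaded.computeIfAbsent(index, i -> loader.apply(i));
    }

    @Override
    public int size() {
        return size;
    }

    /** Number of elements actually instantiated so far. */
    int loadedCount() {
        return loaded.size();
    }

    public static void main(String[] args) {
        LazyResultListSketch<String> results = new LazyResultListSketch<>(1000, i -> "row-" + i);
        System.out.println(results.get(5) + ", loaded so far: " + results.loadedCount());
    }
}
```

Because AbstractList builds its iterator on get and size, iterating such a list also loads elements one at a time.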



Second-Class Objects

When a persistable class is persisted and has a field of a second-class type (Collection, Map, Date, etc) then DataNucleus needs to know when the user calls operations on it to change the contents of the object. To do this, at the first reference to the field once enlisted in a transaction, DataNucleus will replace the field value with a proxy wrapper wrapping the real object. This has no effect for the user in that the field is still castable to the same type as they had in that field, but all operations are intercepted.

Container fields : Caching of Values

By default when a container field is replaced by a second-class object (SCO) wrapper it will be enabled to cache the values in that field. This means that once the values are loaded in that field there will be no need to make any call to the datastore unless changing the container. This gives significant speed ups when compared to relaying all calls via the datastore. You can change to not use caching by setting either

  • Globally for the PersistenceManagerFactory - this is controlled by setting the PMF property org.datanucleus.cache.collections. Set it to false to pass through to the datastore.
  • For the specific Collection/Map - add a MetaData <collection> or <map> extension named cache, setting it to false to pass through to the datastore.

This is implemented in a typical SCO proxy wrapper by using the SCOUtils method useContainerCache() which determines if caching is required, and by having a method load() on all proxy wrapper container classes.


Container fields : Lazy Loading

JDO and JPA provide mechanisms for specifying whether fields are loaded lazily (when required) or whether they are loaded eagerly (when the object is first met). DataNucleus follows these specifications but also allows the user to override the lazy loading for a SCO container. For example if a collection field was marked as being part of the default fetch group it should be loaded eagerly which means that when the owning object is instantiated the collection is loaded up too. If the user overrides the lazy loading for that field in that situation to make it lazy, DataNucleus will instantiate the owning object and instantiate the collection but leave it marked as "to be loaded" and the elements will be loaded up when needed. You can change the lazy loading setting via

  • Globally for the PersistenceManagerFactory - this is controlled by setting the PMF property org.datanucleus.cache.collections.lazy. Set it to true to use lazy loading, and set it to false to load the elements when the collection/map is initialised.
  • For the specific Collection/Map - add a MetaData <collection> or <map> extension cache-lazy-loading. Set it to true to use lazy loading, and false to load once at initialisation.

Container fields : Queuing operations

When DataNucleus is using an optimistic transaction it attempts to delay all datastore operations until commit is called on the transaction or flush is called on the PersistenceManager/EntityManager. This implies a change to operation of SCO proxy wrappers in that they must queue up all mutating operations (add, clear, remove etc) until such a time as they need to be sent to the datastore. All SCO proxy wrappers have a List of queued operations for this purpose.

All code for the actual queued operations is stored under org.datanucleus.sco.queued.
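
The queueing idea can be sketched as follows. QueueingCollectionSketch is a hypothetical class, not one of the wrappers under org.datanucleus.sco.queued; the "backing store" is just another in-memory collection, and each mutating call is recorded as an operation to replay at flush time.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical wrapper that queues mutating operations until flush() is called. */
class QueueingCollectionSketch<E> {
    private final Collection<E> backingStore;   // stands in for the datastore
    private final List<Consumer<Collection<E>>> queued = new ArrayList<>();

    QueueingCollectionSketch(Collection<E> backingStore) {
        this.backingStore = backingStore;
    }

    void add(E element) {
        queued.add(store -> store.add(element));
    }

    void remove(E element) {
        queued.add(store -> store.remove(element));
    }

    void clear() {
        queued.add(Collection::clear);
    }

    /** Number of operations waiting to be sent to the backing store. */
    int queuedCount() {
        return queued.size();
    }

    /** Replays the queued operations against the backing store, in order. */
    void flush() {
        for (Consumer<Collection<E>> operation : queued) {
            operation.accept(backingStore);
        }
        queued.clear();
    }

    public static void main(String[] args) {
        List<String> store = new ArrayList<>();
        QueueingCollectionSketch<String> wrapper = new QueueingCollectionSketch<>(store);
        wrapper.add("a");
        wrapper.add("b");
        wrapper.remove("a");
        System.out.println("before flush: " + store);
        wrapper.flush();
        System.out.println("after flush: " + store);
    }
}
```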


Simple SCO interceptors

There are actually two sets of SCO wrappers in DataNucleus. The first set provides lazy loading, queueing, etc. The second set consists of simple wrappers that intercept operations and mark the field as dirty in the StateManager. This second set is for use with datastores such as db4o that don't utilise backing stores and just want to know when the field is dirty and hence should be written.

All code for the simple SCO wrappers is stored under org.datanucleus.sco.simple.
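
A simple interceptor of this kind might look like the sketch below. Both classes are hypothetical: DirtyTrackerSketch stands in for the StateManager, and DirtyMarkingListSketch overrides only three mutators to keep the example short (a real wrapper would need to cover every mutating method, including addAll, set and removal through iterators). Because the wrapper extends ArrayList, the field stays castable to the type the user declared.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical owner callback standing in for the StateManager. */
class DirtyTrackerSketch {
    final Set<String> dirtyFields = new HashSet<>();

    void makeDirty(String fieldName) {
        dirtyFields.add(fieldName);
    }
}

/** Hypothetical simple SCO wrapper: intercepts mutations and marks the owning field dirty. */
class DirtyMarkingListSketch<E> extends ArrayList<E> {
    private final transient DirtyTrackerSketch owner;
    private final String fieldName;

    DirtyMarkingListSketch(DirtyTrackerSketch owner, String fieldName) {
        this.owner = owner;
        this.fieldName = fieldName;
    }

    @Override
    public boolean add(E element) {
        boolean changed = super.add(element);
        if (changed) {
            owner.makeDirty(fieldName);
        }
        return changed;
    }

    @Override
    public boolean remove(Object element) {
        boolean changed = super.remove(element);
        if (changed) {
            owner.makeDirty(fieldName);
        }
        return changed;
    }

    @Override
    public void clear() {
        boolean hadElements = !isEmpty();
        super.clear();
        if (hadElements) {
            owner.makeDirty(fieldName);
        }
    }

    public static void main(String[] args) {
        DirtyTrackerSketch tracker = new DirtyTrackerSketch();
        DirtyMarkingListSketch<String> tags = new DirtyMarkingListSketch<>(tracker, "tags");
        tags.size();                       // reads do not mark the field dirty
        System.out.println("after read: " + tracker.dirtyFields);
        tags.add("new");                   // mutations do
        System.out.println("after add: " + tracker.dirtyFields);
    }
}
```

The field name "tags" is an invented example.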

Enhancer

DataNucleus relies on classes implementing PersistenceCapable and Detachable. Users could do this manually, but we provide the byte-code enhancement option. The DataNucleus Enhancer is structured to firstly determine from the input which classes are required to be enhanced, and secondly to enhance each class using the selected ClassEnhancer. DataNucleus has the JDOClassEnhancer, which provides enhancement to the JDO bytecode enhancement contract.

JDOClassEnhancer

The JDOClassEnhancer is built on the ASM bytecode library. ASM is very lightweight and fast, and operates using the same pattern as a SAX parser: a Visitor pattern. First the class is visited, then fields and methods, and finally an "end" point where you can add on any new fields/methods etc. The JDOClassEnhancer uses the JDOClassVisitor to obtain information about a class to be enhanced and adds on all required fields/methods.

A very useful utility when developing with ASM is its "Bytecode Outline" Eclipse plugin. To install it, simply add an "Eclipse Update site" to your Eclipse config with the URL "http://download.forge.objectweb.org/eclipse-update/" and the name "ObjectWeb". You then install the "Bytecode Outline" plugin. Once you have it installed, select "Window" -> "Show View" -> "Other" -> "Java : Bytecode". This provides a window showing the Java bytecode for the class being edited. If you click on the "ASM" button on this window it shows you the ASM commands you would need to create the class, or a particular method/field. This makes developing new ASMClassMethod implementations a doddle - just create a class with the method you want generating and then cut and paste the ASM code in.

Decompiling Classes

If you ever need to check the byte-code enhanced class for correctness you can always decompile it back to the Java file. This can be done with a bytecode decompiler such as JD. Unpack the JD-GUI download so that you have the following

  • jd-gui
  • readme.txt

and invoke the following command

jd-gui

and select "Open", choosing a class file, and it shows the Java code.