DataNucleus - Tutorial for JDO using HBase
Background

An application can be JDO-enabled via many routes, depending on the development process of the project in question. For example, the project could use Eclipse as the IDE for developing classes, in which case it would typically use the DataNucleus Eclipse plugin. Alternatively, the project could use Ant, Maven or some other build tool. In that case this tutorial serves as a guide to using DataNucleus in the application, with HBase as the datastore. The JDO process is quite straightforward.

  • Step 0 : Download DataNucleus AccessPlatform
  • Step 1 : Design your domain/model classes as you would do normally
  • Step 2 : Define their persistence definition using Meta-Data.
  • Step 3 : Compile your classes, and instrument them (using the DataNucleus enhancer).
  • Step 4 : Generate any schema where your classes are to be persisted.
  • Step 5 : Write your code to persist your objects within the DAO layer.
  • Step 6 : Run your application.
  • Step 7 : Things to add.

The tutorial guides you through this. You can obtain the code referenced in this tutorial from SourceForge (one of the files entitled "datanucleus-samples-tutorial-*").



Step 0 : Download DataNucleus AccessPlatform

You can download DataNucleus in many ways, but the simplest is to download the distribution zip appropriate to your datastore (HBase in this case). You can do this from the SourceForge DataNucleus download page. When you open the zip you will find the DataNucleus jars in the lib directory, and the dependency jars in the deps directory.

Step 1 : Create your domain/model classes

Do this as you would normally. The only JDO constraint on any Java class that needs persisting is that it has a default constructor (this can be private if you prefer, and the DataNucleus Enhancer will add it for you if you don't). To give a working example, let us consider an application using accounts and logins.

package org.datanucleus.samples.jdo.hbase;

public class Account
{
    String firstName = null;
    String lastName = null;
    int level = 0;

    Login login = null;

    ...
}

package org.datanucleus.samples.jdo.hbase;

public class Login
{
    String login = null;
    String password = null;

    ...
}

So we have a relation between two classes: each Account holds a reference to a Login.
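The "..." in the classes above is left open by the tutorial. As a minimal sketch of what a completed pair of classes might look like, the following adds the default constructors mentioned earlier plus the constructors and setLogin method that the persistence code in Step 5 calls. The accessors and toString are assumptions made for illustration, and the two classes are nested in one hypothetical wrapper class (DomainSketch) only so that the example fits in a single file; in the tutorial they are top-level classes in org.datanucleus.samples.jdo.hbase.

```java
// Hypothetical completion of the tutorial's domain classes (not the sample's actual source).
public class DomainSketch {

    public static class Login {
        String login = null;
        String password = null;

        // Default constructor: the one constraint JDO places on a persistable class.
        public Login() {}

        public Login(String login, String password) {
            this.login = login;
            this.password = password;
        }

        public String getLogin() { return login; }
        public String getPassword() { return password; }
    }

    public static class Account {
        String firstName = null;
        String lastName = null;
        int level = 0;

        Login login = null;

        public Account() {}

        public Account(String firstName, String lastName, int level) {
            this.firstName = firstName;
            this.lastName = lastName;
            this.level = level;
        }

        public String getFirstName() { return firstName; }
        public String getLastName() { return lastName; }
        public int getLevel() { return level; }
        public Login getLogin() { return login; }
        public void setLogin(Login login) { this.login = login; }

        @Override
        public String toString() {
            return firstName + " " + lastName + " [level=" + level + "]";
        }
    }

    public static void main(String[] args) {
        Account acct = new Account("John", "Cameron", 3);
        acct.setLogin(new Login("jcameron", "xxxx"));
        System.out.println("Account : " + acct); // prints "Account : John Cameron [level=3]"
    }
}
```

Nothing here depends on JDO yet; the classes are plain Java at this stage.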



Step 2 : Define the Persistence for your classes

You now need to define how the classes should be persisted, in terms of which fields are persisted etc. With JDO you could use

  • XML Metadata
  • Annotations
  • Annotations + XML
  • MetaData API at runtime

Here we use what could be considered a best practice: specifying basic persistence information as annotations, and then adding ORM information in XML (so that if we later want to persist to a different datastore we only need to change the XML file, rather than update and recompile our classes). So for our 2 domain classes

package org.datanucleus.samples.jdo.hbase;

import javax.jdo.annotations.Embedded;
import javax.jdo.annotations.PersistenceCapable;

@PersistenceCapable
public class Account
{
    String firstName = null;
    String lastName = null;
    int level = 0;

    @Embedded
    Login login = null;

    ...
}

package org.datanucleus.samples.jdo.hbase;

import javax.jdo.annotations.PersistenceCapable;
import javax.jdo.annotations.PrimaryKey;

@PersistenceCapable
public class Login
{
    @PrimaryKey
    String login = null;

    String password = null;

    ...
}

So we will store the Account in its own HBase table, and embed the Login information into it. Now we define ORM information in an XML file

<?xml version="1.0"?>
<!DOCTYPE orm PUBLIC 
    "-//Sun Microsystems, Inc.//DTD Java Data Objects Metadata 2.0//EN" 
    "http://java.sun.com/dtd/orm_2_0.dtd">
<orm>
    <package name="org.datanucleus.samples.jdo.hbase">
        <class name="Account" table="Accounts">
            <datastore-identity strategy="IDENTITY"/>
            <field name="firstName" column="CHRISTIAN_NAME"/>
            <field name="lastName" column="SURNAME"/>
            <field name="level" column="LEVEL"/>
            <field name="login" column="LOGIN"/>
        </class>

        <class name="Login">
            <field name="login" column="LOGIN"/>
            <field name="password" column="PWD"/>
        </class>
    </package>
</orm>

With JDO you have various options as to where these XML MetaData files are placed in the file structure, and whether each refers to a single class or to multiple classes in a package. In the above example, both classes are specified in the same file, package-hbase.orm, which is placed in the package that these classes are in and carries the "-hbase" suffix since we want to persist to HBase.

In this tutorial we are using datastore identity for Account, which means that all objects of this type will be assigned an identity by DataNucleus so that they can be referenced. You should read about datastore identity and application identity when designing your system's persistence.



Step 3 : Enhance your classes

JDO relies on the classes that you want to persist being PersistenceCapable. That is, they need to implement this Java interface. You could write your classes manually to do this, but this would be laborious. Alternatively you can use an "enhancer" that post-processes your compiled classes, adding the extra methods needed to make them PersistenceCapable. The enhancer can be run in several ways: at compile time (with JDK1.6+), at runtime, or as a post-compile step. We use the post-compile step in this tutorial.

DataNucleus JDO provides its own byte-code enhancer for instrumenting/enhancing your classes for use by any JDO implementation. You will need to obtain the datanucleus-enhancer JAR for this.

To understand how to invoke the enhancer you need to visualise where the various source and JDO files are stored

src/java/org/datanucleus/samples/jdo/hbase/Account.java
src/java/org/datanucleus/samples/jdo/hbase/Login.java
src/java/org/datanucleus/samples/jdo/hbase/package-hbase.orm

target/classes/org/datanucleus/samples/jdo/hbase/Account.class
target/classes/org/datanucleus/samples/jdo/hbase/Login.class
target/classes/org/datanucleus/samples/jdo/hbase/package-hbase.orm

lib/jdo-api.jar
lib/datanucleus-core.jar
lib/datanucleus-api-jdo.jar
lib/datanucleus-enhancer.jar
lib/asm.jar

The first thing to do is compile your domain/model classes. You can do this in any way you wish, but the downloadable JAR provides an Ant task, and a Maven2 project to do this for you.

Using Ant :
ant compile


Using Maven2 :
mvn compile

To enhance classes using the DataNucleus Enhancer, you need to invoke a command something like this from the root of your project.

Using Ant :
ant enhance

Using Maven : (this is usually done automatically after the "compile" goal)
mvn datanucleus:enhance


Manually on Linux/Unix :
java -cp target/classes:lib/datanucleus-enhancer.jar:lib/datanucleus-core.jar:
         lib/datanucleus-api-jdo.jar:lib/jdo-api.jar:lib/asm.jar
     org.datanucleus.enhancer.DataNucleusEnhancer 
     target/classes/org/datanucleus/samples/jdo/hbase/*.class

Manually on Windows :
java -cp target\classes;lib\datanucleus-enhancer.jar;lib\datanucleus-core.jar;
         lib\datanucleus-api-jdo.jar;lib\jdo-api.jar;lib\asm.jar
     org.datanucleus.enhancer.DataNucleusEnhancer 
     target\classes\org\datanucleus\samples\jdo\hbase\*.class

[Command shown on many lines to aid reading - should be on single line]

This command enhances the .class files that have @PersistenceCapable annotations. If you accidentally omit this step, you will get a ClassNotPersistenceCapableException at the point of running your application and trying to persist an object. The use of the enhancer is documented in more detail in the Enhancer Guide. The output of this step is a set of class files that represent PersistenceCapable classes.
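One way to confirm that enhancement actually happened is to check whether a compiled class implements the PersistenceCapable interface. The sketch below does this with plain reflection. It assumes the interface's fully-qualified name is javax.jdo.spi.PersistenceCapable, as in the JDO API; other DataNucleus releases may add a different interface, so treat the constant as something to adjust. The class name EnhancementCheck is hypothetical.

```java
// Reflection-based check for whether a class on the CLASSPATH has been enhanced.
public class EnhancementCheck {

    // Assumption: the enhancer adds this JDO API interface to enhanced classes.
    static final String JDO_INTERFACE = "javax.jdo.spi.PersistenceCapable";

    /** Returns true if cls (or any superclass or superinterface) implements the named interface. */
    public static boolean implementsInterface(Class<?> cls, String interfaceName) {
        for (Class<?> c = cls; c != null; c = c.getSuperclass()) {
            for (Class<?> itf : c.getInterfaces()) {
                if (itf.getName().equals(interfaceName)
                        || implementsInterface(itf, interfaceName)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass e.g. org.datanucleus.samples.jdo.hbase.Account with target/classes on the CLASSPATH.
        String className = args.length > 0 ? args[0] : "java.lang.String";
        // Load without running static initialisers; only the type hierarchy is needed.
        Class<?> cls = Class.forName(className, false, EnhancementCheck.class.getClassLoader());
        System.out.println(className + " enhanced: " + implementsInterface(cls, JDO_INTERFACE));
    }
}
```

Run against an unenhanced class this reports false, which is the situation that would otherwise surface later as a ClassNotPersistenceCapableException.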



Step 4 : Generate any schema required for your domain classes

This step is optional, depending on whether you have an existing database schema or just want things to be created at runtime. Anyway, here you can use the SchemaTool to generate the HBase tables/columns where these domain objects will be persisted. DataNucleus SchemaTool is a command line utility (it can be invoked from Maven2/Ant in a similar way to how the Enhancer is invoked). The first thing that you need is to update the datanucleus.properties file with your database details. Here we have a sample file (for HBase)

javax.jdo.option.ConnectionURL=hbase:
javax.jdo.option.Mapping=hbase
datanucleus.autoCreateSchema=true

Now we need to run DataNucleus SchemaTool. For our case above you would do something like this

Using Ant :
ant createschema


Using Maven2 :
mvn datanucleus:schema-create


Manually on Linux/Unix :
java -cp target/classes:lib/datanucleus-core.jar:lib/datanucleus-hbase.jar:
         lib/datanucleus-api-jdo.jar:lib/jdo-api.jar:lib/{hbase.jar}
     org.datanucleus.store.schema.SchemaTool
     -props datanucleus.properties
     -create
     target/classes/org/datanucleus/samples/jdo/hbase/*.class

Manually on Windows :
java -cp target\classes;lib\datanucleus-core.jar;lib\datanucleus-hbase.jar;
         lib\datanucleus-api-jdo.jar;lib\jdo-api.jar;lib\{hbase.jar}
     org.datanucleus.store.schema.SchemaTool
     -props datanucleus.properties
     -create
     target\classes\org\datanucleus\samples\jdo\hbase\*.class

[Command shown on many lines to aid reading. Should be on single line]

This will generate the required tables and indexes for the classes defined in the JDO Meta-Data file.





Step 5 : Write the code to persist objects of your classes

Writing your own classes to be persisted is the start point, but you now need to define which objects of these classes are actually persisted, and when. Interaction with the persistence framework of JDO is performed via a PersistenceManager. This provides methods for persisting of objects, removal of objects, querying for persisted objects, etc. This section gives examples of typical scenarios encountered in an application.

The initial step is to obtain access to a PersistenceManager, which you do as follows

PersistenceManagerFactory pmf = JDOHelper.getPersistenceManagerFactory("datanucleus.properties");
PersistenceManager pm = pmf.getPersistenceManager();

So we are creating a PersistenceManagerFactory using the file datanucleus.properties as used above for DataNucleus SchemaTool. This will contain all properties necessary for our persistence usage. This file is found at the root of the CLASSPATH.
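If you would rather not depend on a file, the same settings can be assembled in code. The sketch below builds a java.util.Properties object holding the three entries from the sample datanucleus.properties shown earlier; the class and method names (TutorialProps, tutorialProperties) are hypothetical. With the JDO and DataNucleus jars on the CLASSPATH, such an object could be handed to the JDOHelper.getPersistenceManagerFactory overload that accepts a Map, which is left as a comment here so that the example runs with the JDK alone.

```java
import java.util.Properties;
import java.util.TreeSet;

// Builds the tutorial's persistence properties programmatically.
public class TutorialProps {

    public static Properties tutorialProperties() {
        Properties props = new Properties();
        props.setProperty("javax.jdo.option.ConnectionURL", "hbase:");
        props.setProperty("javax.jdo.option.Mapping", "hbase");
        props.setProperty("datanucleus.autoCreateSchema", "true");
        return props;
    }

    public static void main(String[] args) {
        Properties props = tutorialProperties();

        // With JDO available, roughly:
        //   PersistenceManagerFactory pmf = JDOHelper.getPersistenceManagerFactory(props);

        // Print the entries in sorted order so the output is stable.
        for (String name : new TreeSet<>(props.stringPropertyNames())) {
            System.out.println(name + "=" + props.getProperty(name));
        }
    }
}
```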

Now that the application has a PersistenceManager it can persist objects. This is performed as follows

Account acct = new Account("John", "Cameron", 3);
Login login = new Login("jcameron", "xxxx");
acct.setLogin(login);
pm.makePersistent(acct);

Please note that you could have done this within a transaction, but HBase doesn't support ACID transactions, so the benefit of that is limited. Note also that we had two persistent objects there, and both are persisted by the single call to the PersistenceManager. This is called persistence-by-reachability.
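To make the idea of persistence-by-reachability concrete, here is a toy illustration in plain Java. It is not how DataNucleus implements it; it only shows the principle that, starting from one root object, every object reachable through fields of a persistable type is swept up in the same operation. The class ReachabilitySketch, its cut-down Account and Login classes, and the reachable method are all hypothetical.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Collections;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.Set;

// Toy model of persistence-by-reachability: collect everything reachable from a root object.
public class ReachabilitySketch {

    public static class Login {
        String login;
        String password;
        public Login(String login, String password) { this.login = login; this.password = password; }
    }

    public static class Account {
        String firstName;
        Login login;
        public Account(String firstName, Login login) { this.firstName = firstName; this.login = login; }
    }

    /** Returns root plus all objects of persistable classes reachable from it via instance fields. */
    public static Set<Object> reachable(Object root, Set<Class<?>> persistable) {
        // Identity-based set: two distinct objects are never merged just because they are equal.
        Set<Object> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        visit(root, persistable, seen);
        return seen;
    }

    private static void visit(Object obj, Set<Class<?>> persistable, Set<Object> seen) {
        // Stop at nulls, non-persistable types (e.g. String) and already-visited objects.
        if (obj == null || !persistable.contains(obj.getClass()) || !seen.add(obj)) {
            return;
        }
        for (Field f : obj.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers())) {
                continue;
            }
            try {
                f.setAccessible(true);
                visit(f.get(obj), persistable, seen);
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read field " + f.getName(), e);
            }
        }
    }

    public static void main(String[] args) {
        Set<Class<?>> persistable = new HashSet<>();
        persistable.add(Account.class);
        persistable.add(Login.class);

        Account acct = new Account("John", new Login("jcameron", "xxxx"));
        System.out.println(reachable(acct, persistable).size() + " objects reachable from the Account");
    }
}
```

Starting from the Account alone, the sketch finds both the Account and its Login, mirroring how one makePersistent call covers both objects.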

If you want to retrieve an object from persistent storage, something like this will give what you need. This uses a "Query", and retrieves all Account objects that have a level of 3 or greater, ordering them by the surname.

Query q = pm.newQuery("SELECT FROM " + Account.class.getName() +
          " WHERE level >= 3 ORDER BY lastName ASC");
List<Account> c = (List<Account>)q.execute();
Iterator<Account> iter = c.iterator();
while (iter.hasNext())
{
    Account acct = iter.next();
    // ... (use the retrieved objects)
}
// Release the resources held by the query results
q.closeAll();
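For readers new to JDOQL, it may help to see what the filter and ordering in that query mean when expressed over an ordinary in-memory list. The sketch below is only an analogy using Java streams: it keeps accounts with level >= 3 and sorts them by lastName ascending. The class QuerySemanticsSketch, its cut-down Account class, and the sample data other than John Cameron are made up for illustration; a real JDOQL query is evaluated by DataNucleus against the datastore, not like this.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// In-memory analogy for "WHERE level >= 3 ORDER BY lastName ASC".
public class QuerySemanticsSketch {

    public static class Account {
        public final String firstName;
        public final String lastName;
        public final int level;

        public Account(String firstName, String lastName, int level) {
            this.firstName = firstName;
            this.lastName = lastName;
            this.level = level;
        }
    }

    /** Keeps accounts whose level is at least minLevel, ordered by surname ascending. */
    public static List<Account> levelAtLeast(List<Account> accounts, int minLevel) {
        return accounts.stream()
                .filter(a -> a.level >= minLevel)                       // WHERE level >= minLevel
                .sorted(Comparator.comparing((Account a) -> a.lastName)) // ORDER BY lastName ASC
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical sample data; only John Cameron appears in the tutorial itself.
        List<Account> accounts = Arrays.asList(
                new Account("John", "Cameron", 3),
                new Account("Bob", "Smith", 1),
                new Account("Jane", "Adams", 5));

        for (Account a : levelAtLeast(accounts, 3)) {
            System.out.println(a.firstName + " " + a.lastName + " [level=" + a.level + "]");
        }
    }
}
```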

If you want to delete an object from persistence, you would perform an operation something like

Account acct = (Account)pm.getObjectById(acctId);

pm.deletePersistent(acct);

Clearly you can perform a large range of operations on objects. We can't hope to show all of these here. Any good JDO book will provide many examples.



Step 6 : Run your application

Running your JDO-enabled application requires a few things to be available in the Java CLASSPATH, these being

  • Any properties file for the PersistenceManagerFactory creation
  • The JDO MetaData files for your persistable classes
  • The HBase jar(s) needed for accessing your datastore
  • The JDO API JAR (defining the JDO interface)
  • The DataNucleus Core, DataNucleus JDO API, and DataNucleus HBase JARs
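A missing CLASSPATH entry tends to show up as a confusing error at startup, so a small pre-flight check can save time. The sketch below is one possible approach using only the JDK: it reports whether the properties file and one representative class from each required JAR can be found. The class ClasspathCheck is hypothetical, and the representative class names for the DataNucleus and HBase JARs are assumptions that you should adjust to the versions you actually use.

```java
// Pre-flight check that the pieces listed above are visible on the CLASSPATH.
public class ClasspathCheck {

    /** True if the named class can be located (without initialising it). */
    public static boolean classAvailable(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** True if the named resource (e.g. a properties or metadata file) can be located. */
    public static boolean resourceAvailable(String resourceName) {
        return ClasspathCheck.class.getClassLoader().getResource(resourceName) != null;
    }

    public static void main(String[] args) {
        System.out.println("datanucleus.properties : " + resourceAvailable("datanucleus.properties"));
        System.out.println("package-hbase.orm      : "
                + resourceAvailable("org/datanucleus/samples/jdo/hbase/package-hbase.orm"));
        System.out.println("JDO API                : " + classAvailable("javax.jdo.JDOHelper"));
        // Assumed representative classes; these may differ between releases.
        System.out.println("DataNucleus JDO API    : "
                + classAvailable("org.datanucleus.api.jdo.JDOPersistenceManagerFactory"));
        System.out.println("HBase                  : "
                + classAvailable("org.apache.hadoop.hbase.HBaseConfiguration"));
    }
}
```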

Now you should start up HBase. This is typically a call like

${HBASE_HOME}/bin/start-hbase.sh

After that it is simply a question of starting your application and all should be taken care of. You can access the DataNucleus Log file by specifying the logging configuration properties, and any messages from DataNucleus will be output in the normal way. The DataNucleus log is a very powerful way of finding problems since it can list all communications actually sent to the datastore as well as many other parts of the persistence process.

Using Ant (you need the included "datanucleus.properties" to specify your database)
ant run


Using Maven2:
mvn exec:java


Manually on Linux/Unix :
java -cp lib/jdo-api.jar:lib/datanucleus-core.jar:lib/datanucleus-hbase.jar:
         lib/datanucleus-api-jdo.jar:lib/hbase.jar:target/classes/:. 
             org.datanucleus.samples.jdo.hbase.Main


Manually on Windows :
java -cp lib\jdo-api.jar;lib\datanucleus-core.jar;lib\datanucleus-hbase.jar;
         lib\datanucleus-api-jdo.jar;lib\hbase.jar;target\classes\;. 
             org.datanucleus.samples.jdo.hbase.Main

Output :

DataNucleus AccessPlatform with JDO
===================================
Persisting Account+Login
Account+Login have been persisted, with account-id=
                4d5ad6d75d466947e7c221cc[OID]org.datanucleus.samples.jdo.hbase.Account

Executing Query for Accounts with level of 3 or above
>  Account : John Cameron [level=3]

Deleting all Accounts from persistence
Deleted 1 accounts

End of Tutorial

Step 7 : Things to add

Now that you have your simple tutorial working, you can look at adding other features. For example, add datanucleus-cache to the CLASSPATH, together with something like EHCache, and you get more scalable L2 caching.

Any questions?

If you have any questions about this tutorial and how to develop applications for use with DataNucleus, please read the online documentation, since answers are to be found there. If you don't find what you're looking for, go to our Forums.

Again, you can download the sample classes from this tutorial from SourceForge.

The DataNucleus Team