DataNucleus - JDO 1-N Relationships : Collection

JDO : 1-N Relationships with Collections

You have a 1-N (one to many) or N-1 (many to one) when you have one object of a class that has a Collection of objects of another class. Please note that Collections allow duplicates, and so the persistence process reflects this with the choice of primary keys. There are two ways in which you can represent this in a datastore : Join Table (where a join table is used to provide the relationship mapping between the objects), and Foreign-Key (where a foreign key is placed in the table of the object contained in the Collection.

The various possible relationships are described below.

1-N Unidirectional using Join Table
1-N Unidirectional using Foreign-Key
1-N Bidirectional using Join Table
1-N Bidirectional using Foreign-Key
1-N Unidirectional of non-PC using Join Table
1-N embedded elements using Join Table
1-N Serialised collection
1-N using shared join table (DataNucleus Extension)
1-N using shared foreign key (DataNucleus Extension)
1-N Bidirectional "Compound Identity" (owner object as part of PK in element)

Important : If you declare a field as a Collection, you can instantiate it as either Set-based or as List-based. With a List an "ordering" column is required, whereas with a Set it isn't. Consequently DataNucleus needs to know if you plan on using it as Set-based or List-based. You do this by adding an "order" element to the field if it is to be instantiated as a List-based collection. If there is no "order" element, then it will be assumed to be Set-based.

Please note that RDBMS supports the full range of options on this page, whereas other datastores (ODF, Excel, HBase, MongoDB, etc) persist the Collection in a column in the owner object (as well as a column in the non-owner object when bidirectional) rather than using join-tables or foreign-keys since those concepts are RDBMS-only.

equals() and hashCode()

Important : The element of a Collection ought to define the methods equals and hashCode so that updates are detected correctly. This is because any Java Collection will use these to determine equality and whether an element is contained in the Collection. Note also that the hashCode() should be consistent throughout the lifetime of a persistable object. By that we mean that it should not use some basis before persistence and then use some other basis (such as the object identity) after persistence, for this reason we do not recommend usage of JDOHelper.getObjectId(obj) in the equals/hashCode methods.

1-N Collection Unidirectional

We have 2 sample classes Account and Address. These are related in such a way as Account contains a Collection of objects of type Address, yet each Address knows nothing about the Account objects that it relates to. Like this

There are 2 ways that we can persist this relationship. These are shown below

Using Join Table

If you define the XML metadata for these classes as follows

<package name="com.mydomain">
    <class name="Account">
        <field name="id" primary-key="true">
            <column name="ACCOUNT_ID"/>
        </field>
        ...
        <field name="addresses" table="ACCOUNT_ADDRESSES">
            <collection element-type="com.mydomain.Address"/>
            <join column="ACCOUNT_ID_OID"/>
            <element column="ADDRESS_ID_EID"/>
        </field>
    </class>

    <class name="Address">
        <field name="id" primary-key="true">
            <column name="ADDRESS_ID"/>
        </field>
        ...
    </class>
</package>

or alternatively using annotations

public class Account
{
    ...

    @Persistent(table="ACCOUNT_ADDRESSES")
    @Join(column="ACCOUNT_ID_OID")
    @Element(column="ADDRESS_ID_EID")
    Collection<Address> addresses;
}

public class Address
{
    ...
}

The crucial part is the join element on the field element - this signals to JDO to use a join table.

This will create 3 tables in the database, one for Address, one for Account, and a join table, as shown below.

The join table is used to link the 2 classes via foreign keys to their primary key. This is useful where you want to retain the independence of one class from the other class.

If you wish to fully define the schema table and column names etc, follow these tips

To specify the name of the table where a class is stored, specify the table attribute on the class element
To specify the names of the columns where the fields of a class are stored, specify the column attribute on the field element.
To specify the name of the join table, specify the table attribute on the field element with the collection.
To specify the names of the join table columns, use the column attribute of join, element elements.
To specify the foreign-key between container table and join table, specify <foreign-key> below the <join> element.
To specify the foreign-key between join table and element table, specify <foreign-key> below either the <field> element or the <element> element.
If you wish to share the join table with another relation then use the DataNucleus "shared join table" extension
The join table will, by default, be given a primary key. If you want to omit this then you can turn it off using the DataNucleus metadata extension "primary-key" (within <join>) set to false.
The column "ADPT_PK_IDX" is added by DataNucleus so that duplicates can be stored. You can control this by adding an <order> element and specifying the column name for the order column (within <field>).
If you want the set to include nulls, you can turn on this behaviour by adding the DataNucleus extension metadata "allow-nulls" to the <field> set to true

Using Foreign-Key

In this relationship, the Account class has a List of Address objects, yet the Address knows nothing about the Account. In this case we don't have a field in the Address to link back to the Account and so DataNucleus has to use columns in the datastore representation of the Address class. So we define the XML metadata like this

<package name="com.mydomain">
    <class name="Account">
        <field name="id" primary-key="true">
            <column name="ACCOUNT_ID"/>
        </field>
        ...
        <field name="addresses">
            <collection element-type="com.mydomain.Address"/>
            <element column="ACCOUNT_ID"/>
        </field>
    </class>

    <class name="Address">
        <field name="id" primary-key="true">
            <column name="ADDRESS_ID"/>
        </field>
        ...
    </class>
</package>

or alternatively using annotations

public class Account
{
    ...

    @Element(column="ACCOUNT_ID")
    Collection<Address> addresses;
}

public class Address
{
    ...
}

Again there will be 2 tables, one for Address, and one for Account. Note that we have no "mapped-by" attribute specified, and also no "join" element. If you wish to specify the names of the columns used in the schema for the foreign key in the Address table you should use the element element within the field of the collection.

In terms of operation within your classes of assigning the objects in the relationship. You have to take your Account object and add the Address to the Account collection field since the Address knows nothing about the Account.

If you wish to fully define the schema table and column names etc, follow these tips

To specify the name of the table where a class is stored, specify the table attribute on the class element
To specify the names of the columns where the fields of a class are stored, specify the column attribute on the field element.
To specify the foreign-key between container table and element table, specify <foreign-key> below either the <field> element or the <element> element.

Limitation : Since each Address object can have at most one owner (due to the "Foreign Key") this mode of persistence will not allow duplicate values in the Collection. If you want to allow duplicate Collection entries, then use the "Join Table" variant above.

1-N Collection Bidirectional

We have 2 sample classes Account and Address. These are related in such a way as Account contains a Collection of objects of type Address, and each Address has a reference to the Account object that it relates to. Like this

There are 2 ways that we can persist this relationship. These are shown below

Using Join Table

If you define the XML metadata for these classes as follows

<package name="com.mydomain">
    <class name="Account">
        <field name="id" primary-key="true">
            <column name="ACCOUNT_ID"/>
        </field>
        ...
        <field name="addresses" mapped-by="account">
            <collection element-type="com.mydomain.Address"/>
            <join/>
        </field>
    </class>

    <class name="Address">
        <field name="id" primary-key="true">
            <column name="ADDRESS_ID"/>
        </field>
        ...
        <field name="account"/>
    </class>
</package>

or alternatively using annotations

public class Account
{
    ...

    @Persistent(mappedBy="account")
    @Join
    Collection<Address> addresses;
}

public class Address
{
    ...
}

The crucial part is the join element on the field element - this signals to JDO to use a join table.

This will create 3 tables in the database, one for Address, one for Account, and a join table, as shown below.

The join table is used to link the 2 classes via foreign keys to their primary key. This is useful where you want to retain the independence of one class from the other class.

If you wish to fully define the schema table and column names etc, follow these tips

To specify the name of the table where a class is stored, specify the table attribute on the class element
To specify the names of the columns where the fields of a class are stored, specify the column attribute on the field element.
To specify the name of the join table, specify the table attribute on the field element with the collection.
To specify the names of the join table columns, use the column attribute of join, element elements.
To specify the foreign-key between container table and join table, specify <foreign-key> below the <join> element.
To specify the foreign-key between join table and element table, specify <foreign-key> below either the <field> element or the <element> element.
If you wish to share the join table with another relation then use the DataNucleus "shared join table" extension
The join table will, by default, be given a primary key. If you want to omit this then you can turn it off using the DataNucleus metadata extension "primary-key" (within <join>) set to false.
The column "ADPT_PK_IDX" is added by DataNucleus so that duplicates can be stored. You can control this by adding an <order> element and specifying the column name for the order column (within <field>).
When forming the relation please make sure that you set the relation at BOTH sides since DataNucleus would have no way of knowing which end is correct if you only set one end.
If you want the set to include nulls, you can turn on this behaviour by adding the extension metadata "allow-nulls" to the <field> set to true

Using Foreign-Key

Here we have the 2 classes with both knowing about the relationship with the other.

If you define the XML metadata for these classes as follows

<package name="com.mydomain">
    <class name="Account">
        <field name="id" primary-key="true">
            <column name="ACCOUNT_ID"/>
        </field>
        ...
        <field name="addresses" mapped-by="account">
            <collection element-type="com.mydomain.Address"/>
        </field>
    </class>

    <class name="Address">
        <field name="id" primary-key="true">
            <column name="ADDRESS_ID"/>
        </field>
        ...
        <field name="account">
            <column name="ACCOUNT_ID"/>
        </field>
    </class>
</package>

or alternatively using annotations

public class Account
{
    ...

    @Persistent(mappedBy="account")
    Collection<Address> addresses;
}

public class Address
{
    ...
}

The crucial part is the mapped-by on the "1" side of the relationship. This tells the JDO implementation to look for a field called account on the Address class.

This will create 2 tables in the database, one for Address (including an ACCOUNT_ID to link to the ACCOUNT table), and one for Account. Notice the subtle difference to this set-up to that of the Join Table relationship earlier.

If you wish to fully define the schema table and column names etc, follow these tips

To specify the name of the table where a class is stored, specify the table attribute on the class element
To specify the names of the columns where the fields of a class are stored, specify the column attribute on the field element.
To specify the foreign-key between container table and element table, specify <foreign-key> below either the <field> element or the <element> element.
When forming the relation please make sure that you set the relation at BOTH sides since DataNucleus would have no way of knowing which end is correct if you only set one end.

Limitation : Since each Address object can have at most one owner (due to the "Foreign Key") this mode of persistence will not allow duplicate values in the Collection. If you want to allow duplicate Collection entries, then use the "Join Table" variant above.

1-N Collection of non-persistable objects

All of the examples above show a 1-N relationship between 2 persistable classes. DataNucleus can also cater for a Collection of primitive or Object types. For example, when you have a Collection of Strings. This will be persisted in the same way as the "Join Table" examples above. A join table is created to hold the collection elements. Let's take our example. We have an Account that stores a Collection of addresses. These addresses are simply Strings. We define the XML metadata like this

<package name="com.mydomain">
    <class name="Account">
        <field name="id" primary-key="true">
            <column name="ACCOUNT_ID"/>
        </field>
        ...
        <field name="addresses" persistence-modifier="persistent">
            <collection element-type="java.lang.String"/>
            <join/>
            <element column="ADDRESS"/>
        </field>
    </class>

or alternatively using annotations

public class Account
{
    ...

    @Persistent
    @Join
    @Element(column="ADDRESS")
    Collection<String> addresses;
}

In the datastore the following is created

The ACCOUNT table is as before, but this time we only have the "join table". In our MetaData we used the <element> tag to specify the column name to use for the actual address String.

Please note that the column ADPT_PK_IDX is added by DataNucleus so that duplicates can be stored. You can control the name of this column by adding an <order> element and specifying the column name for the order column (within <field>).

Embedded into a Join Table

The above relationship types assume that both classes in the 1-N relation will have their own table. A variation on this is where you have a join table but you embed the elements of the collection into this join table. To do this you use the embedded-element attribute on the collection MetaData element. This is described in Embedded Collection Elements.

Serialised into a Join Table

The above relationship types assume that both classes in the 1-N relation will have their own table. A variation on this is where you have a join table but you serialise the elements of the collection into this join table in a single column. To do this you use the serialised-element attribute on the collection MetaData element. This is described in Serialised Collection Elements

Shared Join Tables

The relationships using join tables shown above rely on the join table relating to the relation in question. DataNucleus allows the possibility of sharing a join table between relations. The example below demonstrates this. We take the example as show above (1-N Unidirectional Join table relation), and extend Account to have 2 collections of Address records. One for home addresses and one for work addresses, like this

We now change the metadata we had earlier to allow for 2 collections, but sharing the join table

<package name="com.mydomain">
    <class name="Account">
        <field name="id" primary-key="true">
            <column name="ACCOUNT_ID"/>
        </field>
        ...
        <field name="workAddresses" persistence-modifier="persistent" table="ACCOUNT_ADDRESSES">
            <collection element-type="com.mydomain.Address"/>
            <join column="ACCOUNT_ID_OID"/>
            <element column="ADDRESS_ID_EID"/>
            <extension vendor-name="datanucleus" key="relation-discriminator-column" value="ADDRESS_TYPE"/>
            <extension vendor-name="datanucleus" key="relation-discriminator-pk" value="true"/>
            <extension vendor-name="datanucleus" key="relation-discriminator-value" value="work"/>
        </field>
        <field name="homeAddresses" persistence-modifier="persistent" table="ACCOUNT_ADDRESSES">
            <collection element-type="com.mydomain.Address"/>
            <join column="ACCOUNT_ID_OID"/>
            <element column="ADDRESS_ID_EID"/>
            <extension vendor-name="datanucleus" key="relation-discriminator-column" value="ADDRESS_TYPE"/>
            <extension vendor-name="datanucleus" key="relation-discriminator-pk" value="true"/>
            <extension vendor-name="datanucleus" key="relation-discriminator-value" value="home"/>
        </field>
    </class>

    <class name="Address">
        <field name="id" primary-key="true">
            <column name="ADDRESS_ID"/>
        </field>
        ...
    </class>
</package>

So we have defined the same join table for the 2 collections "ACCOUNT_ADDRESSES", and the same columns in the join table, meaning that we will be sharing the same join table to represent both relations. The important step is then to define the 3 DataNucleus extension tags. These define a column in the join table (the same for both relations), and the value that will be populated when a row of that collection is inserted into the join table. In our case, all "home" addresses will have a value of "home" inserted into this column, and all "work" addresses will have "work" inserted. This means we can now identify easily which join table entry represents which relation field.

This results in the following database schema

Shared Foreign Key

The relationships using foreign keys shown above rely on the foreign key relating to the relation in question. DataNucleus allows the possibility of sharing a foreign key between relations between the same classes. The example below demonstrates this. We take the example as show above (1-N Unidirectional Foreign Key relation), and extend Account to have 2 collections of Address records. One for home addresses and one for work addresses, like this

We now change the metadata we had earlier to allow for 2 collections, but sharing the join table

<package name="com.mydomain">
    <class name="Account">
        <field name="id" primary-key="true">
            <column name="ACCOUNT_ID"/>
        </field>
        ...
        <field name="workAddresses" persistence-modifier="persistent">
            <collection element-type="com.mydomain.Address"/>
            <element column="ACCOUNT_ID_OID"/>
            <extension vendor-name="datanucleus" key="relation-discriminator-column" value="ADDRESS_TYPE"/>
            <extension vendor-name="datanucleus" key="relation-discriminator-value" value="work"/>
        </field>
        <field name="homeAddresses" persistence-modifier="persistent">
            <collection element-type="com.mydomain.Address"/>
            <element column="ACCOUNT_ID_OID"/>
            <extension vendor-name="datanucleus" key="relation-discriminator-column" value="ADDRESS_TYPE"/>
            <extension vendor-name="datanucleus" key="relation-discriminator-value" value="home"/>
        </field>
    </class>

    <class name="Address">
        <field name="id" primary-key="true">
            <column name="ADDRESS_ID"/>
        </field>
        ...
    </class>
</package>

So we have defined the same foreign key for the 2 collections "ACCOUNT_ID_OID", The important step is then to define the 2 DataNucleus extension tags. These define a column in the element table (the same for both relations), and the value that will be populated when a row of that collection is inserted into the element table. In our case, all "home" addresses will have a value of "home" inserted into this column, and all "work" addresses will have "work" inserted. This means we can now identify easily which element table entry represents which relation field.

This results in the following database schema


				Quick Link