Object/NoSQL Mapping for Riak with Dasein Persist

By George Reese
May 28, 2012

One of the more snarky things you will hear said about NoSQL databases is that they are "write-only" databases or "no query" databases. It is fair to say that NoSQL databases are often challenging to pull data from when you are doing more than fetching values by their keys. Dasein Persist—an Open Source object/relational mapping tool for Java—recently incorporated support for mapping to Riak (one of many NoSQL databases) to help address this problem. Through Dasein Persist, you simply annotate your Java objects and immediately have the ability to store those objects in Riak as well as execute complex searches without having to learn about the intricacies of NoSQL data stores or the specifics of Riak.

Object/Relational Mapping

Java programmers have long relied on object/relational mapping (ORM) tools to automatically persist Java objects in relational databases. When you use an ORM tool, you focus on the creation of Java objects. The tool takes care of all of the database integration. You don't program SQL or JDBC.

Dasein Persist is an ORM tool originally created as part of my book Database Programming with JDBC and Java (the first edition of which was written in 1997). It's one of a bajillion Java ORM tools and was never among the popular Open Source options. I've always used it in my applications largely because I like the way it does memory management compared to the other ORM tools out there.

Using Dasein Persist is not radically different from other ORM tools out there. First, you write your core Java class:

import org.dasein.util.CachedItem;

import org.dasein.persist.PersistenceException;
import org.dasein.persist.PersistentCache;
import org.dasein.persist.Transaction;
import org.dasein.persist.annotations.Index;
import org.dasein.persist.annotations.IndexType;

import java.util.HashMap;

public class Person implements CachedItem {
    static private PersistentCache<Person> personCache;

    static private PersistentCache<Person> getCache() throws PersistenceException {
        if( personCache == null ) {
            personCache = (PersistentCache<Person>)PersistentCache.getCache(Person.class);
        }
        return personCache;
    }

    static public Person newPerson(String givenName, String familyName) throws PersistenceException {
        HashMap<String,Object> state = new HashMap<String,Object>();

        state.put("personId", getCache().getNewKeyValue());
        state.put("givenName", givenName);
        state.put("familyName", familyName);

        Transaction xaction = Transaction.getInstance();

        try {
            Person person = getCache().create(xaction, state);
            xaction.commit();
            return person;
        }
        finally {
            // a no-op if the commit above succeeded; otherwise undoes the create
            xaction.rollback();
        }
    }
    
    static public Person getPerson(long personId) throws PersistenceException {
        return getCache().get(personId);
    }
    
    @Index(type=IndexType.PRIMARY)
    private long   personId;
    
    private String familyName;
    private String givenName;

    public Person() { }

    public String getFamilyName() {
        return familyName;
    }

    public String getGivenName() {
        return givenName;
    }

    public long getPersonId() {
        return personId;
    }

    private transient long nextSync = -1L;
    
    @Override
    public boolean isValidForCache() {
        synchronized( this ) {
            if( nextSync == -1L ) {
                // in reality, you would do something random here
                // to prevent everything going stale at once
                nextSync = System.currentTimeMillis() + 300000L;
                return true;
            }
            return nextSync > System.currentTimeMillis();
        }
    }    
}

This simple class is all you have to write for a fully persistent Person object. All of the magic mapping between your object and the relational database happens within the PersistentCache implementation you specify in a configuration file. Until recently, there's been only one implementation: RelationalCache.

Your class implements a special interface called CachedItem that helps with object caching. You implement an isValidForCache() method that determines how long your object lives in the in-memory cache. The implementation in this example is good enough for a small application, but you'd want something more interesting (and centralized) for a large system.
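
To make the comment in isValidForCache() concrete: a minimal version of that randomness, with arbitrary numbers, might replace the fixed five-minute line like this:

// inside isValidForCache(): a four-minute base lifetime plus up to two
// minutes of random jitter, so objects loaded together don't all go
// stale in the same instant (the numbers are arbitrary)
nextSync = System.currentTimeMillis() + 240000L + (long)(Math.random() * 120000L);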

The remaining instance methods are simply getters. There are no setters.

The static methods enable you to create and access instances. Depending on the complexity of your application, you may want to split these methods out into a factory class. In this case, I just kept them in the core class as static methods.

These methods interact with the shared object cache that manages the object/relational mapping. This cache makes sure that every row in the person table of your relational database is represented by a shared Person object in the JVM. The cached object is weakly reachable in the cache, meaning that it is eligible for garbage collection at any time unless you are maintaining references to it in your code. In addition, Dasein Persist will discard the reference if the object declares it's no longer valid.
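
In practice, that means two lookups of the same key in the same JVM hand back one shared instance (the first reference keeps the object from being collected before the second lookup):

Person a = Person.getPerson(1234L);
Person b = Person.getPerson(1234L);

// one row, one shared object: both references point at the same instance
assert a == b;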

That's it. The only other code you might write would be for operations like deletes, updates, and searches.
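
A delete, for instance, follows the same transactional pattern as newPerson(). The remove() method name here is my assumption and may differ in your version of Dasein Persist:

// hypothetical delete helper following the newPerson() transaction
// pattern; remove() is an assumed PersistentCache method name
static public void removePerson(Person person) throws PersistenceException {
    Transaction xaction = Transaction.getInstance();

    try {
        getCache().remove(xaction, person);
        xaction.commit();
    }
    finally {
        xaction.rollback();
    }
}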

Object/NoSQL Mapping

Dasein Persist introduced Riak support in February, with enhancements this week. All of the code above works without change for integration with Riak. The only change you make is in your dasein-persistence.properties configuration file. Instead of setting your default cache to org.dasein.persist.RelationalCache, you instead set it to org.dasein.persist.riak.RiakCache. Actually, the modeling support for Riak is better than the Dasein Persist relational mapping for several reasons:

  • The relational mapping requires you to create a person table first (Dasein Persist won't create it for you) with the proper schema. The Riak mapping does not require you to do anything in Riak other than get Riak running with the proper storage backend.
  • The relational mapping expects you to normalize multi-valued fields. The Riak mapping will map Java arrays to JSON arrays and even enable you to index those values.
  • The Riak mapping builds all indexes for you and elegantly handles mixed JSON schemas; the relational mapping expects you to create your SQL database indexes properly and manage data model changes.

In short, the Riak mapping isn't some kind of bastardized overlay that attempts to force relational or object semantics on top of a NoSQL database. Instead, it takes advantage of the features that make Riak an excellent NoSQL database for use in Java applications.
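
For the record, the backend switch happens entirely in dasein-persistence.properties. The property key below is a placeholder for whatever your version of Dasein Persist actually expects; treat it as illustrative, not authoritative:

# dasein-persistence.properties (key name is illustrative)
# relational backend:
# dasein.persist.defaultCache=org.dasein.persist.RelationalCache
# Riak backend:
dasein.persist.defaultCache=org.dasein.persist.riak.RiakCache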

Under the Covers

Dasein Persist maps all non-transient non-static attributes of an object to the backend data store. It expects at least one of those attributes to be annotated with an index annotation of type IndexType.PRIMARY. In Riak, that value is used to access the object. So, the Person instance with the person ID of 1234 running on a local Riak server would be accessible via:

curl -i http://localhost:8098/buckets/person/keys/1234

Dasein Persist maps classes to buckets using the lower-cased class name, with underscores inserted at the camel-case boundaries. In other words, a VirtualMachine class would map to a virtual_machine bucket.

The Person object is then serialized into JSON and stored under that key. The JSON mapping is pretty much exactly what you would expect, except for mapping to custom objects. Dasein Persist allows you to optionally map those fields as JSON or string values.
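
For the Person class above, the stored document would look something like this (sample values; Riak controls formatting and field order):

{
  "personId": 1234,
  "givenName": "George",
  "familyName": "Reese"
}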

All of that is probably exactly what you would expect. Where things get interesting is when you want to search Riak for a specific value. For example, let's say you added the finder method:

static public Collection<Person> findPersonsWithFamilyName(String familyName) throws PersistenceException {
    return getCache().find(new SearchTerm("familyName", familyName));
}

The PersistentCache has multiple variants of the find() method, all of which accept one or more SearchTerm arguments. This simple query maps to a map/reduce operation in Riak that finds all objects in the person bucket with a familyName matching your query.

While it's cool that you don't have to worry about writing map/reduce functions, what I've described above sucks for performance—especially if it happens exactly as I described it. Fortunately, that's not the case.

First of all, Dasein Persist makes heavy use of Riak secondary indexes. When it creates any object, it adds a special secondary index with the name of the bucket. This special secondary index means Dasein Persist never walks keys for any operation, ever. For example, the above map/reduce first performs a secondary index search on the bucket to get the proper keys and uses those keys as input into the map/reduce function. In fact, any time Dasein Persist is forced into a map/reduce operation, it uses the bucket index instead of a key walk to generate the input keys.

More important, Dasein Persist uses map/reduce operations only as a last resort. Let's make one quick change to the code:

@Index(type=IndexType.SECONDARY)
private String familyName;

By annotating familyName with an Index annotation of type IndexType.SECONDARY, you are telling Dasein Persist that this field should be indexed as a secondary index. Dasein Persist will then write each value in the person bucket with a binary secondary index on the familyName field. It base64 encodes the value when building the index entry so that the entry always corresponds to a proper Riak index, no matter what you are doing with the string. If the field is numeric, Dasein Persist will create the index as a numeric index. Now, when you search on familyName, Dasein Persist will use the secondary index instead of performing a map/reduce.
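
Under the covers, a search on that field becomes an ordinary Riak secondary index query. Against a local server it would look something like the request below; the index name family_name_bin and the base64-encoded value (U21pdGg= for "Smith") are my illustration of the scheme, not something to rely on:

curl -i http://localhost:8098/buckets/person/index/family_name_bin/U21pdGg=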

Dasein Persist will also create secondary indexes on multi-valued (array) fields as well as multi-column, cascading secondary indexes (for example, familyName and givenName as a single compound index).
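
A finder that takes advantage of such a compound index could look like the sketch below; I'm assuming, based on the find() variants described earlier, that multiple SearchTerm arguments combine into a single query:

// hypothetical compound finder; assumes multiple SearchTerm arguments
// are combined into one query against the compound index
static public Collection<Person> findByFullName(String familyName, String givenName) throws PersistenceException {
    return getCache().find(new SearchTerm("familyName", familyName),
            new SearchTerm("givenName", givenName));
}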

The final type of index has type=IndexType.FOREIGN plus an identifies attribute naming the referenced class. For example, let's say the Person had an employer reference:

@Index(type=IndexType.FOREIGN, identifies=Employer.class)
private long employerId;

This annotation will create both a secondary index on the employerId field and a Riak Link to the employer identified by the employerId value.
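
On the Java side, resolving that reference is just another cache lookup. Assuming an Employer class built along the same lines as Person (the getEmployer(long) factory method is hypothetical), a getter might look like this:

// hypothetical accessor; assumes Employer exposes a getEmployer(long)
// factory method like Person.getPerson() above
public Employer getEmployer() throws PersistenceException {
    return Employer.getEmployer(employerId);
}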

Things to Remember

First of all, to leverage all of the secondary indexes that Dasein Persist creates, you must use the right storage backend. Currently, the only Riak storage backend that supports secondary indexes is eLevelDB.

Another thing to remember is that adding indexes via annotations won't retroactively index existing objects. Writing a tool to occasionally re-index is fairly trivial. The important thing to remember, however, is that annotation-based searching will miss any record that doesn't have the proper index.
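
Such a re-indexing pass could be as simple as loading every object and saving it back so the new indexes get written. This is only a sketch: the list() and update() calls below are my assumptions about the PersistentCache API, not documented methods:

// hypothetical re-indexing tool: re-save every Person so newly annotated
// secondary indexes get built for pre-existing records; list() and
// update() are assumed method names
static public void reindexPersons() throws PersistenceException {
    for( Person person : getCache().list() ) {
        Transaction xaction = Transaction.getInstance();

        try {
            HashMap<String,Object> state = new HashMap<String,Object>();

            state.put("givenName", person.getGivenName());
            state.put("familyName", person.getFamilyName());
            getCache().update(xaction, person, state);
            xaction.commit();
        }
        finally {
            xaction.rollback();
        }
    }
}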

