Marrying JPA with graph databases


For the past month or two, I have been exploring Neo4J, a graph database built for storing huge amounts of data. Another popular graph database that I will be delving into is InfiniteGraph from Objectivity.

I have also been working on Kundera (a JPA 2.0 based object-datastore mapping library for NoSQL datastores) as a key contributor. It already supports popular databases like Cassandra, HBase, MongoDB and Redis. So the next thought on our minds was to support this wonderful and popular graph database; you guessed it right: Neo4J.

The JPA specification was not written with NoSQL datastores in mind, and graph databases are a different story altogether: they take object-mapping challenges to the next level. It made us sweat and argue for countless hours over how to fit JPA into the graph world. We dived deeper into both JPA and Neo4J, and here is how our journey unfolded and the key decisions we made…

Spring Data for Neo4J is a similar effort that attempts POJO-based development for Neo4J. In terms of ease of use, our goals converged. It introduces its own annotations for two categories of entities: NodeEntity and RelationshipEntity. We were constrained to JPA standards and decided not to introduce any new annotations.

The next item on our minds was to make the rules for entity definition expressive enough for users to describe a graph structure in the form of Java entity classes. Here is what we thought was best suited and feasible:

1. Both Node and Relationship POJOs would be annotated with @Entity.

2. Because of the very nature of a graph, a relationship between entities is always @ManyToMany. So we decided to discard other forms of relationships for the sake of simplicity.

3. The biggest challenge was to fit a “Relationship Entity” between different classes of “Node Entities”.

Take, for example, the case of Actor and Movie node entities joined via the relationship entity Role: an Actor is related to a Movie via a Role. Until now we had been keeping a List or Set of entities as relations. But this approach wasn’t sufficient, as there was a third dimension here (in the form of Role).

Map collections in JPA came to our rescue. This means the Actor entity class can have a Map containing Role as key and Movie as value. So far so good. The next thing was to choose the relationship type and direction.

The relationship type could be read from the @MapKeyJoinColumn annotation. Direction was implicit, because a bidirectional relationship always has an owning side (mappedBy is specified on the other side). So the relationship direction could easily be derived as OUTGOING from Actor to Movie.

4. The next consideration was to replicate the flexibility of navigation that Neo4J provides in jumping from one node to other nodes via relationships. The bidirectional relationship made it possible to navigate from Actor to Movies via Role and vice versa. We decided to let the user define Incoming and Outgoing Node entity attributes in the relationship entity too (in addition to the relationship entity’s own attributes), which would make it easy to navigate from Role to both Actors and Movies.
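
To make points 1 to 4 concrete, here is a rough sketch of how the three entities might look. Treat it as an illustration under assumptions, not as Kundera's final mapping: the field names, the "ACTS_IN" relationship type and the @OneToOne annotations on Role's two node attributes are my own guesses (the rules above only fix @ManyToMany for node-to-node relationships). It needs the JPA 2.0 API on the classpath to compile.

```java
import java.util.HashMap;
import java.util.Map;

import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.ManyToMany;
import javax.persistence.MapKeyJoinColumn;
import javax.persistence.OneToOne;

// Node entity on the owning side of the relationship.
@Entity
class Actor {
    @Id
    private String id;
    private String name;

    // Key = relationship entity, value = node at the other end.
    // The join column name is read as the relationship type;
    // "ACTS_IN" is an illustrative name only.
    @ManyToMany
    @MapKeyJoinColumn(name = "ACTS_IN")
    private Map<Role, Movie> movies = new HashMap<>();

    public Map<Role, Movie> getMovies() { return movies; }
}

// Node entity on the inverse side. mappedBy points back at Actor.movies,
// so the direction is derived as OUTGOING from Actor to Movie.
@Entity
class Movie {
    @Id
    private String id;
    private String title;

    @ManyToMany(mappedBy = "movies")
    private Map<Role, Actor> actors = new HashMap<>();

    public Map<Role, Actor> getActors() { return actors; }
}

// Relationship entity: its own attributes plus both end nodes.
@Entity
class Role {
    @Id
    private String id;
    private String roleName;

    @OneToOne // assumption: the annotation for this attribute is a guess
    private Actor actor; // outgoing node

    @OneToOne // assumption, as above
    private Movie movie; // incoming node

    public Actor getActor() { return actor; }
    public Movie getMovie() { return movie; }
}
```

With a shape like this, actor.getMovies() walks from an Actor to its Movies through each Role, movie.getActors() walks the other way, and a Role can reach both ends through getActor() and getMovie().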

5. In my previous experience with other NoSQL databases, it didn’t matter whether the database was on localhost or some other machine. We provided a host and port for creating the connection and used it just like an RDBMS. In Neo4J, we’ve got two ways:

  • Embedded database – when the database is expected to run on the same machine (faster but less flexible)
  • REST interface – when the database is on some other machine (slower but more flexible)

This means we needed to create two translations for user CRUD calls and give users a way of choosing between Embedded and REST.
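
One way to picture this choice is a single interface with two implementations, picked from a configuration property. The sketch below is purely hypothetical: GraphStore, EmbeddedStore, RestStore and the graph.* property names are made up for illustration and are not Kundera's API. The implementations only describe where they would connect instead of talking to a real database.

```java
import java.util.Map;

// Hypothetical abstraction over the two access modes.
interface GraphStore {
    String describe();
}

// Stands in for a store that opens a database directory on the local machine.
final class EmbeddedStore implements GraphStore {
    private final String path;

    EmbeddedStore(String path) { this.path = path; }

    public String describe() { return "embedded:" + path; }
}

// Stands in for a store that talks to a remote server over HTTP.
final class RestStore implements GraphStore {
    private final String host;
    private final int port;

    RestStore(String host, int port) {
        this.host = host;
        this.port = port;
    }

    public String describe() { return "rest:http://" + host + ":" + port; }
}

final class GraphStoreFactory {
    // Picks an implementation from user-supplied properties (names are assumptions).
    static GraphStore fromProperties(Map<String, String> props) {
        String mode = props.getOrDefault("graph.mode", "embedded");
        switch (mode) {
            case "embedded":
                return new EmbeddedStore(props.getOrDefault("graph.path", "data/graph.db"));
            case "rest":
                return new RestStore(
                        props.getOrDefault("graph.host", "localhost"),
                        Integer.parseInt(props.getOrDefault("graph.port", "7474")));
            default:
                throw new IllegalArgumentException("Unknown graph.mode: " + mode);
        }
    }
}
```

The point of the design is that the CRUD code above this layer only sees GraphStore, so the embedded and REST translations can evolve separately.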

6. The next item on our plate was how to interpret JPA queries and run them on indexes. Since indexes in Neo4J are stored in Lucene in the simplest configuration (and it was easy to build a JPA-to-Lucene conversion engine), we decided to translate all JPA queries into Lucene ones and run them directly on the indexes.

We identified three types of native queries, though: Lucene, Cypher and Gremlin. We started with Lucene because it was the simplest to implement, and decided to support the remaining ones in subsequent releases.
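
To give a feel for what a JPA-to-Lucene translation involves, here is a toy converter. It is a minimal sketch and not Kundera's actual engine: the class name JpqlToLuceneSketch is hypothetical, it only understands equality conditions joined by AND/OR, and it assumes the Lucene field name is simply the entity attribute name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
final class JpqlToLuceneSketch {

    private static final Pattern WHERE = Pattern.compile("(?i)\\bwhere\\b");

    // One condition, optionally preceded by AND/OR:
    //   [AND|OR] alias.field = 'quoted value'   or   [AND|OR] alias.field = bareValue
    private static final Pattern CONDITION = Pattern.compile(
            "(?i)(?:\\b(and|or)\\s+)?\\w+\\.(\\w+)\\s*=\\s*(?:'([^']*)'|(\\S+))");

    static String translate(String jpql) {
        Matcher where = WHERE.matcher(jpql);
        if (!where.find()) {
            throw new IllegalArgumentException("This sketch needs a WHERE clause");
        }
        Matcher m = CONDITION.matcher(jpql.substring(where.end()));
        StringBuilder lucene = new StringBuilder();
        while (m.find()) {
            // Lucene's query syntax expects upper-case boolean operators.
            if (m.group(1) != null && lucene.length() > 0) {
                lucene.append(' ').append(m.group(1).toUpperCase()).append(' ');
            }
            lucene.append(m.group(2)).append(':');
            if (m.group(3) != null) {
                // Quoted JPQL strings become Lucene phrase terms.
                lucene.append('"').append(m.group(3)).append('"');
            } else {
                lucene.append(m.group(4));
            }
        }
        return lucene.toString();
    }
}
```

For example, translate("SELECT a FROM Actor a WHERE a.name = 'Tom Hanks' AND a.birthYear = 1956") yields the Lucene query string name:"Tom Hanks" AND birthYear:1956. A real translator would also have to cover ranges, LIKE, parentheses and escaping, none of which this sketch attempts.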

 

So, summing it all up, it was challenging but rewarding to marry these two heterogeneous worlds. Once this comes to fruition, we shall look for further refinements and additions. I shall post Kundera-Neo4J documentation links and code examples once it’s released.

 


4 thoughts on “Marrying JPA with graph databases”

  1. Pingback: A JPA Facade for Neo4J | My Blog

  2. My personal dissatisfaction with Spring-data-neo4j was that entity equals/hashCode was supposed to be implemented using the graph ID (and not some natural key), which causes awful problems when one uses entities in hash collections *before* they are persisted (the graph ID is null). Moreover, since persistence is not based on a thread-bound persistence context (like a Hibernate Session), the same entity can be loaded repeatedly during the same transaction if it is traversed from different references/owning entities, because Spring-data-neo4j doesn’t have a first-level cache to recognize that the entity has already been fetched from the DB, so you end up with multiple copies of the same entity.
    Since Kundera is a JPA implementation, I assume you have a proper approach to the problems above?

  3. Hi Vjeran,
    Node IDs generated by Neo4j for each node are internal to Neo4j, and Kundera doesn’t attempt to map @Id fields to them. Instead, it treats an @Id field as a property stored on the node, with a manual or auto index mandatorily created for it.

    Kundera stores each entity (and in turn nodes and relationships) in the persistence cache it maintains, which is the first-level cache as described in JPA. When we search for a node using entityManager.find(), Kundera first looks for it in the persistence cache and returns it if found. Otherwise it searches for the node in the index. This approach is consistent with all other database implementations in Kundera.
