Chapter 17. Introduction to Neo4j

Neo4j is a graph database. It is a fully ACID transactional database that stores data structured as graphs. A graph consists of nodes, connected by relationships. It is a flexible data structure that allows for high query performance on complex data, while being intuitive for the developer.

Neo4j has been in commercial development for 10 years and in production for over 7 years. It is a mature and robust graph database that:

has an intuitive graph-oriented model for data representation. Instead of tables, rows, and columns, you work with a flexible graph network consisting of nodes, relationships, and properties.
has a disk-based, native storage manager completely optimized for storing graph structures for maximum performance and scalability.
is scalable. Neo4j can handle graphs of several billion nodes/relationships/properties on a single machine, but can also be scaled out across multiple machines for high availability.
has a powerful traversal framework for fast traversals in the node space.
can be deployed as a standalone server or an embedded database with a very small footprint (~700k jar).
has a simple and convenient API.

In addition, Neo4j includes the usual database features: ACID transactions, durable persistence, concurrency control, transaction recovery, high availability and everything else you’d expect from an enterprise database. Neo4j is released under a dual free software/commercial license model.

17.1. What is a graph database?

A graph database is a storage engine that specializes in storing and retrieving vast networks of data. It efficiently stores nodes and relationships and allows high-performance traversal of those structures. In a property graph, an arbitrary number of properties can be added to both nodes and relationships.
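
To make the property graph model concrete, here is a minimal sketch in plain Java. It is a toy in-memory structure, not Neo4j's storage engine or API; the class names PropertyGraphSketch, Node and Relationship and the relateTo method are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory property graph; illustrative only, not Neo4j's storage engine or API.
final class PropertyGraphSketch {

    // A node carries arbitrary key/value properties and its outgoing relationships.
    static final class Node {
        final Map<String, Object> properties = new HashMap<>();
        final List<Relationship> outgoing = new ArrayList<>();

        // Creates a typed, directed relationship from this node to the target node.
        Relationship relateTo(Node target, String type) {
            Relationship rel = new Relationship(this, target, type);
            outgoing.add(rel);
            return rel;
        }
    }

    // A relationship has a type, a direction (start -> end) and its own properties.
    static final class Relationship {
        final Node start;
        final Node end;
        final String type;
        final Map<String, Object> properties = new HashMap<>();

        Relationship(Node start, Node end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    public static void main(String[] args) {
        Node alice = new Node();
        alice.properties.put("name", "Alice");
        Node bob = new Node();
        bob.properties.put("name", "Bob");

        Relationship knows = alice.relateTo(bob, "KNOWS");
        knows.properties.put("since", 2010);

        // Walk the outgoing relationships of one node.
        for (Relationship rel : alice.outgoing) {
            System.out.println(rel.start.properties.get("name") + " -[" + rel.type + "]-> "
                    + rel.end.properties.get("name"));
        }
    }
}
```

Because each node in the sketch holds direct references to its relationships, following a relationship is a pointer hop rather than a lookup by key.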

17.2. GraphDatabaseService

The interface org.neo4j.graphdb.GraphDatabaseService provides access to the storage engine. Its features include creating and retrieving Nodes and Relationships, managing indexes via an IndexManager, database lifecycle callbacks, transaction management and more.

The EmbeddedGraphDatabase is an implementation of GraphDatabaseService that is used to embed Neo4j in a Java application. This implementation provides the tightest integration, because the database runs in the same process as the application. There are other, remote implementations that provide access to Neo4j stores via REST.

17.3. Creating Nodes and Relationships

Using the API of GraphDatabaseService it is easy to create nodes and relate them to each other. Relationships are named. Both nodes and relationships can have properties. Property values can be primitive Java types and Strings, byte arrays for binary data, or arrays of other Java primitives or Strings. Node creation and modification has to happen within a transaction, while reading from the graph store can be done with or without a transaction.

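import org.neo4j.graphdb.DynamicRelationshipType;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Relationship;
import org.neo4j.graphdb.Transaction;
import org.neo4j.kernel.EmbeddedGraphDatabase;

// "helloworld" is the directory that holds the store files
GraphDatabaseService graphDb = new EmbeddedGraphDatabase( "helloworld" );
Transaction tx = graphDb.beginTx();
try {
	// all writes below happen inside the transaction
	Node firstNode = graphDb.createNode();
	Node secondNode = graphDb.createNode();
	firstNode.setProperty( "message", "Hello, " );
	secondNode.setProperty( "message", "world!" );

	// relationships are directed and typed; here the type is created by name
	Relationship relationship = firstNode.createRelationshipTo( secondNode,
		DynamicRelationshipType.withName( "KNOWS" ) );
	relationship.setProperty( "message", "brave Neo4j " );
	tx.success(); // mark the transaction as successful
} finally {
	tx.finish(); // commits if success() was called, otherwise rolls back
}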

17.4. Graph traversal

Getting a single node or relationship and examining it is not the main use case of a graph database. Fast graph traversal and the application of graph algorithms are. Neo4j provides a concise DSL for defining TraversalDescriptions. A description is applied to a start node and lazily produces a stream of nodes and/or relationships, exposed as an Iterable.

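import org.neo4j.graphdb.Direction;
import org.neo4j.graphdb.Path;
import org.neo4j.graphdb.traversal.TraversalDescription;
import org.neo4j.kernel.Traversal;

// KNOWS and LIKES are assumed to be RelationshipType constants defined elsewhere;
// myStartNode is a Node obtained earlier, for example from an index lookup
TraversalDescription traversalDescription = Traversal.description()
          .depthFirst()
          .relationships( KNOWS )                     // follow KNOWS in both directions
          .relationships( LIKES, Direction.INCOMING ) // follow only incoming LIKES
          .prune( Traversal.pruneAfterDepth( 5 ) );   // do not expand beyond depth 5
for ( Path position : traversalDescription.traverse( myStartNode )) {
    System.out.println( "Path from start node to current position is " + position );
}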
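
For readers new to traversals, the following plain-Java sketch shows roughly what a depth-first traversal with a depth limit does. It is not Neo4j code and does not use the traversal framework: the graph is a simple adjacency map, the class and method names are hypothetical, and the result is built eagerly rather than lazily.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Plain-Java illustration of a depth-first traversal pruned at a maximum depth.
// Not Neo4j code: the graph is an adjacency map keyed by node name.
final class DepthLimitedTraversalSketch {

    // Returns the path from start to every reached node, in depth-first order.
    static List<List<String>> traverse(Map<String, List<String>> graph, String start, int maxDepth) {
        List<List<String>> paths = new ArrayList<>();
        visit(graph, start, maxDepth, new HashSet<>(), new ArrayList<>(), paths);
        return paths;
    }

    private static void visit(Map<String, List<String>> graph, String node, int remainingDepth,
                              Set<String> visited, List<String> current, List<List<String>> paths) {
        if (!visited.add(node)) {
            return; // each node is visited at most once
        }
        current.add(node);
        paths.add(new ArrayList<>(current)); // record the path from the start node to here
        if (remainingDepth > 0) { // prune: do not expand beyond the depth limit
            for (String next : graph.getOrDefault(node, List.of())) {
                visit(graph, next, remainingDepth - 1, visited, current, paths);
            }
        }
        current.remove(current.size() - 1); // backtrack
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = Map.of(
                "a", List.of("b", "c"),
                "b", List.of("d"),
                "d", List.of("e"));
        for (List<String> path : traverse(graph, "a", 2)) {
            System.out.println("Path from start node to current position is " + path);
        }
    }
}
```

With a depth limit of 2, node "e" is never reached, because it lies three relationships away from the start node "a".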

17.5. Indexing

The best way to retrieve start nodes for traversals is to use Neo4j's index facilities. The GraphDatabaseService provides access to the IndexManager, which in turn retrieves named indexes for nodes and relationships. Both can be indexed by property name and value. Retrieval is done through the query methods on Index, which return an IndexHits iterator.

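import org.neo4j.graphdb.index.Index;
import org.neo4j.graphdb.index.IndexManager;

IndexManager indexManager = graphDb.index();
Index<Node> nodeIndex = indexManager.forNodes( "a-node-index" );
// node is an existing Node whose "property" is set to "value";
// adding to the index is a write, so it must happen inside a transaction
nodeIndex.add( node, "property", "value" );
for ( Node foundNode : nodeIndex.get( "property", "value" ) ) {
    assert foundNode.getProperty( "property" ).equals( "value" );
}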
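
Conceptually, an exact-match lookup like the one above maps a key/value pair to the set of entities that were added under it. The sketch below shows that idea with plain Java collections; it says nothing about how Neo4j's indexes are implemented, and the class name ExactMatchIndexSketch is hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Toy exact-match index: maps a (key, value) pair to the entities added under it.
// Illustrative only; this is not Neo4j's index implementation.
final class ExactMatchIndexSketch<T> {

    private final Map<String, Map<Object, Set<T>>> entries = new HashMap<>();

    // Registers the entity under the given key/value pair.
    void add(T entity, String key, Object value) {
        entries.computeIfAbsent(key, k -> new HashMap<>())
               .computeIfAbsent(value, v -> new LinkedHashSet<>())
               .add(entity);
    }

    // Returns all entities added under the key/value pair, or an empty set if none match.
    Set<T> get(String key, Object value) {
        Map<Object, Set<T>> byValue = entries.get(key);
        if (byValue == null) {
            return Collections.emptySet();
        }
        Set<T> hits = byValue.get(value);
        if (hits == null) {
            return Collections.emptySet();
        }
        return Collections.unmodifiableSet(hits);
    }

    public static void main(String[] args) {
        ExactMatchIndexSketch<String> index = new ExactMatchIndexSketch<>();
        index.add("node-1", "property", "value");
        index.add("node-2", "property", "value");
        for (String found : index.get("property", "value")) {
            System.out.println("Found " + found);
        }
    }
}
```

As with the Neo4j API, the sketch requires every entity to be added to the index explicitly before it can be found.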

Note: Spring Data Graph provides auto-indexing via the @Indexed annotation, whereas indexing is still a manual process when using the Neo4j API directly.
