Spring Data Key-Value Retwis Sample - Reference Documentation

Costin Leau

SpringSource

Copies of this document may be made for your own use and for distribution to others, provided that you do not charge any fee for such copies and further provided that each copy contains this Copyright Notice, whether distributed in print or electronically.


Table of Contents

1. Retwis Tutorial
1.1. Requirements
1.2. Setup
1.3. Redis Data Layout
1.4. Application Design
1.4.1. Web Layer
1.4.2. Persistence

1. Retwis Tutorial

The Spring Data Key-Value RetwisJ sample project showcases a simple Twitter-like clone built on top of Redis using Spring Data Key Value. It is inspired by the original Redis example, Retwis. In short, it demonstrates a simple social messaging service based entirely on Redis.

1.1 Requirements

To successfully build and run RetwisJ, one needs:

  • JDK 6.0
  • Redis 2.2.x (Redis 2.0.x should work as well)
  • Spring Data Key Value 1.0 M3
  • A servlet 2.4 container (such as Tomcat 6)

The version numbers above have been used to develop and test the demo. Other versions (especially higher ones) may or may not work.

It is assumed that users of this tutorial have a basic knowledge of object-oriented design, Java, Spring, JSP, Java web applications and Redis in particular.

1.2 Setup

RetwisJ uses Gradle as its build system. To build the artifact, simply type at the command line:

gradlew war

which will create a WAR ready to be deployed into a container.

Note
If Gradle is installed, one can use gradle instead of gradlew in the command above.

Once the WAR is created, deploy it into your container of choice and access it using a web browser (typically at http://localhost:8080/retwisj). Redis must be running before the application is accessed.

1.3 Redis Data Layout

For a detailed introduction to Redis and how it can be used as a datastore for Twitter, take a look at the original Retwis documentation. This document will describe the RetwisJ data structure without going into details of the various Redis features.

To better understand the data layout, it helps to identify the main "domain" objects inside RetwisJ. In its current form, RetwisJ allows users to be created, to post messages, and to follow and be followed by other users. Each user automatically sees the posts of the users she follows, and can also see other users' posts in the global timeline. Users, posts, follow relationships and timelines are the "domain" objects whose data and relationships need to be represented in Redis.

With a "traditional" database, one would use tables, indexes and so on. Redis, however, is not an RDBMS; it is a key-value store. That is, it allows simple values (called strings in Redis terminology), lists, sets and sorted sets (z-sets) to be stored under a unique key. So rather than storing data in a certain table at a certain id, we can store data directly under a key (using a key pattern of choice for easy retrieval) and take advantage of the various Redis data types. Again, for more details, see the "Data Layout" section in the Retwis docs.

The user data (name and password) is stored in a hash (or a map). To generate the key for each new user, a dedicated counter is used (called global:uid); thanks to Redis atomic operations, the counter can simply be incremented to generate a new user id (uid). We can now store the user data under the key uid:[uid], where [uid] represents the value generated by the global:uid key. For example, with two users "springrod" and "costinl", Redis will contain the following keys:

Table 1.1. 

Key Name    Type    Value
global:uid  string  2
uid:1       hash    {name: springrod, pass: interface21}
uid:2       hash    {name: costinl, pass: secret}
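The counter-then-hash scheme above can be sketched in plain Java. This is only an in-memory illustration under stated assumptions, not the RetwisJ implementation: the class UserKeysSketch and its methods incr and addUser are hypothetical names, the maps stand in for Redis string and hash keys, and Java 8 or later is assumed.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the Redis keys described above; not the RetwisJ code.
class UserKeysSketch {
    // Models Redis string keys such as global:uid.
    final Map<String, String> strings = new HashMap<>();
    // Models Redis hash keys such as uid:1.
    final Map<String, Map<String, String>> hashes = new HashMap<>();

    // Mimics the effect of INCR: a missing key counts as 0 and the new value is returned.
    long incr(String key) {
        long next = Long.parseLong(strings.getOrDefault(key, "0")) + 1;
        strings.put(key, Long.toString(next));
        return next;
    }

    // Allocates a uid from global:uid and stores the user data under uid:[uid].
    long addUser(String name, String pass) {
        long uid = incr("global:uid");
        Map<String, String> user = new LinkedHashMap<>();
        user.put("name", name);
        user.put("pass", pass);
        hashes.put("uid:" + uid, user);
        return uid;
    }

    public static void main(String[] args) {
        UserKeysSketch redis = new UserKeysSketch();
        redis.addUser("springrod", "interface21");
        redis.addUser("costinl", "secret");
        System.out.println(redis.strings);
        System.out.println(redis.hashes);
    }
}
```

Against a real Redis server the increment is performed atomically by the store itself, which is what makes the counter safe when several application instances create users at the same time; the map-based incr here has no such guarantee.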

The uid is used internally to store and look up all user information, but we also need to store the relationship between the username and its internal uid, so that, for example, when a user logs on we can find the uid for the user name. A simple way to do that is to create a lookup or reverse key that relies on the username as the key and the uid as the value, such as user:[name]:uid. We can also add a key that holds all users for easy retrieval - we will use a list for that, called users. Following our example above, the layout becomes:

Table 1.2. 

Key Name            Type    Value
global:uid          string  2
uid:1               hash    {name: springrod, pass: interface21}
uid:2               hash    {name: costinl, pass: secret}
user:springrod:uid  string  1
user:costinl:uid    string  2
users               list    {1, 2}
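As a rough illustration of the reverse key, the following sketch builds the user:[name]:uid key name and resolves a user name to its uid. The class ReverseLookupSketch and the methods uidKey, register and findUid are hypothetical names introduced for this example, and a plain map stands in for Redis.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of the reverse lookup key; not the RetwisJ code.
class ReverseLookupSketch {
    // Models Redis string keys such as user:springrod:uid.
    final Map<String, String> strings = new HashMap<>();

    // Builds the reverse key name following the user:[name]:uid pattern.
    static String uidKey(String name) {
        return "user:" + name + ":uid";
    }

    // Records the username -> uid association when a user is created.
    void register(String name, String uid) {
        strings.put(uidKey(name), uid);
    }

    // Resolves a user name to its uid at login; returns null for unknown names.
    String findUid(String name) {
        return strings.get(uidKey(name));
    }
}
```

Keeping the key construction in one helper avoids scattering string concatenation through the code; the KeyUtils class referenced later in this document appears to play that role in RetwisJ.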

The posts can be stored in a similar way - we can use a key (global:pid) as a counter to generate the post id (pid) and save the post information (content, time and author) in a hash (pid:[pid]). We can use a list to store the posts for each user, or rather their ids (pids), say under uid:[uid]:posts, and all posts under the timeline key:

Table 1.3. 

Key Name     Type    Value
global:pid   string  3
pid:1        hash    {content: Hello World, time: 1301931414757, uid: 1}
pid:2        hash    {content: Working on some cool stuff, time: 1301931414897, uid: 1}
pid:3        hash    {content: Checking out RetwisJ, time: 1301931454897, uid: 2}
uid:1:posts  list    {1, 2}
uid:2:posts  list    {3}
timeline     list    {1, 2, 3}
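The post layout can be sketched along the same lines. The class PostKeysSketch and its post method are hypothetical, the maps model Redis hashes and lists in memory, and appending to the end of each list is an assumption made for readability; the ordering used by RetwisJ itself is not stated here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of the post-related keys; not the RetwisJ code.
class PostKeysSketch {
    long pidCounter = 0; // stands in for the global:pid counter
    final Map<String, Map<String, String>> hashes = new HashMap<>();
    final Map<String, List<String>> lists = new HashMap<>();

    // Stores a post under pid:[pid] and records its id for the author and the global timeline.
    long post(long uid, String content, long timeMillis) {
        long pid = ++pidCounter;
        Map<String, String> data = new LinkedHashMap<>();
        data.put("content", content);
        data.put("time", Long.toString(timeMillis));
        data.put("uid", Long.toString(uid));
        hashes.put("pid:" + pid, data);
        push("uid:" + uid + ":posts", pid);
        push("timeline", pid);
        return pid;
    }

    // Appends the pid to the list stored under the given key, creating the list if needed.
    private void push(String key, long pid) {
        lists.computeIfAbsent(key, k -> new ArrayList<>()).add(Long.toString(pid));
    }
}
```

Only the ids are duplicated across the lists; the post content lives once, under its pid:[pid] hash.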

The same approach can be used to store the relationships between users that follow each other. Using the names above, costinl following springrod can be mapped through the uid:[uid]:following and uid:[uid]:followers keys, which indicate the users a certain uid follows or is followed by:

Table 1.4. 

Key Name         Type    Value
global:pid       string  3
pid:1            hash    {content: Hello World, time: 1301931414757, uid: 1}
pid:2            hash    {content: Working on some cool stuff, time: 1301931414897, uid: 1}
pid:3            hash    {content: Checking out RetwisJ, time: 1301931454897, uid: 2}
uid:1:posts      list    {1, 2}
uid:2:posts      list    {3}
timeline         list    {1, 2, 3}
uid:1:followers  list    {2}
uid:2:following  list    {1}
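A follow relationship touches two keys at once, one on each side. The sketch below keeps both in step; FollowKeysSketch and its methods are hypothetical names, and the collections are modelled as Java sets on the assumption that a user can appear at most once in another user's network (the persistence code shown later in this document wraps the following key in a RedisSet).

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of the following/followers keys; not the RetwisJ code.
class FollowKeysSketch {
    final Map<String, Set<String>> sets = new HashMap<>();

    // Records that uid follows targetUid, updating both sides of the relationship.
    void follow(String uid, String targetUid) {
        members("uid:" + uid + ":following").add(targetUid);
        members("uid:" + targetUid + ":followers").add(uid);
    }

    // Removes the relationship from both sides.
    void stopFollowing(String uid, String targetUid) {
        members("uid:" + uid + ":following").remove(targetUid);
        members("uid:" + targetUid + ":followers").remove(uid);
    }

    // Checks the following key of uid without creating it.
    boolean isFollowing(String uid, String targetUid) {
        return sets.getOrDefault("uid:" + uid + ":following", Collections.<String>emptySet())
                .contains(targetUid);
    }

    private Set<String> members(String key) {
        return sets.computeIfAbsent(key, k -> new LinkedHashSet<>());
    }
}
```

Storing the relationship twice trades a little space for cheap reads in both directions: listing who a user follows and listing who follows a user are each a single key lookup.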

Since a user who follows others sees not only her own posts but also the posts of the users she follows, we add a new key, uid:[uid]:timeline, similar in functionality to the timeline key, representing the "user post view" or the user timeline.
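One way to keep the per-user timelines up to date is to write each new post id to the author's timeline and to the timeline of every current follower. The sketch below illustrates that idea in memory; TimelineSketch, follow and publish are hypothetical names, and writing at post time is an assumption of this example rather than a description of the RetwisJ code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of the per-user timeline; not the RetwisJ code.
class TimelineSketch {
    // Maps a uid to the uids of its followers.
    final Map<String, Set<String>> followers = new HashMap<>();
    // Models the Redis lists (uid:[uid]:posts, uid:[uid]:timeline, timeline).
    final Map<String, List<String>> lists = new HashMap<>();

    void follow(String uid, String targetUid) {
        followers.computeIfAbsent(targetUid, k -> new LinkedHashSet<>()).add(uid);
    }

    // Records a new post id for the author, each current follower and the global timeline.
    void publish(String uid, String pid) {
        push("uid:" + uid + ":posts", pid);
        push("uid:" + uid + ":timeline", pid);
        for (String follower : followers.getOrDefault(uid, Collections.<String>emptySet())) {
            push("uid:" + follower + ":timeline", pid);
        }
        push("timeline", pid);
    }

    private void push(String key, String pid) {
        lists.computeIfAbsent(key, k -> new ArrayList<>()).add(pid);
    }
}
```

Because posts are only pushed to the followers that exist at publish time, a user who starts following someone sees that person's new posts but not the earlier ones.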

Just like in the original Retwis, RetwisJ does not rely on the HTTP session to identify its authenticated users; rather, it tracks each user through a dedicated cookie containing a unique random value. Each time a user successfully logs in, the tracking value is generated, stored under uid:[uid]:auth (and under auth:[generated-string] as the reverse key) and sent as a cookie to the client. On each request, if the cookie is present, the app looks up the associated uid and identifies the user.
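The token handling can be sketched as follows. AuthTokenSketch, login and uidForCookie are hypothetical names, a map stands in for Redis, and the 16-byte hex token is an arbitrary choice for this example; the source does not state how RetwisJ generates its random value.

```java
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of the auth keys; not the RetwisJ code.
class AuthTokenSketch {
    private final SecureRandom random = new SecureRandom();
    // Models Redis string keys such as uid:2:auth and auth:[generated-string].
    final Map<String, String> strings = new HashMap<>();

    // Generates a random token at login and stores it under both the forward and reverse key.
    String login(String uid) {
        byte[] bytes = new byte[16]; // token size is an assumption of this sketch
        random.nextBytes(bytes);
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02x", b));
        }
        String token = hex.toString();
        strings.put("uid:" + uid + ":auth", token);
        strings.put("auth:" + token, uid);
        return token; // the value that would be sent to the client as a cookie
    }

    // Resolves the cookie value of an incoming request to a uid, or null if it is unknown.
    String uidForCookie(String cookieValue) {
        return cookieValue == null ? null : strings.get("auth:" + cookieValue);
    }
}
```

The reverse key is what makes the per-request check cheap: the cookie value alone is enough to find the user with a single lookup.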

To wrap everything up, the Redis store can look as follows:

Table 1.5. 

Key Name            Type    Value                                                               Description
User-related keys
global:uid          string  2                                                                   Global user id (uid) counter
uid:1               hash    {name: springrod, pass: interface21}                                User info
uid:2               hash    {name: costinl, pass: secret}                                       User info
user:springrod:uid  string  1                                                                   username -> uid association
user:costinl:uid    string  2                                                                   username -> uid association
users               list    {1, 2}                                                              List of "active" users
uid:1:followers     list    {2}                                                                 Followers of user 1 (springrod)
uid:2:following     list    {1}                                                                 Users followed by user 2 (costinl)
uid:2:auth          string  {3b7b0677...}                                                       uid -> auth key. Random string used for authenticating user 2 (costinl)
auth:3b7b0677...    string  {2}                                                                 auth key -> uid
Post-related keys
global:pid          string  4                                                                   Global post id (pid) counter
pid:1               hash    {content: Hello World, time: 1301931414757, uid: 1}                 Post 1 data
pid:2               hash    {content: Working on some cool stuff, time: 1301931414897, uid: 1}  Post 2 data
pid:3               hash    {content: Checking out RetwisJ, time: 1301931454897, uid: 2}        Post 3 data
pid:4               hash    {content: Getting stuff done, time: 1301933414897, uid: 1}          Post 4 data
uid:1:posts         list    {1, 2, 4}                                                           User 1 (springrod) posts
uid:1:timeline      list    {1, 2, 4}                                                           User 1 (springrod) timeline - identical to uid:1:posts if the user does not follow other users
uid:2:posts         list    {3}                                                                 User 2 (costinl) posts
uid:2:timeline      list    {3, 4}                                                              User 2 (costinl) timeline - contains the user's posts and all the new posts of the users that are followed
timeline            list    {1, 2, 3, 4}                                                        List of all "active" posts

1.4 Application Design

The RetwisJ structure is fairly straightforward: the application consists of two layers, the web layer (package org.springframework.data.redis.sample.retwisj.web) and the persistence layer (package org.springframework.data.redis.sample.retwisj.redis). The domain objects are available under package org.springframework.data.redis.sample.retwisj.

1.4.1 Web Layer

The RetwisJ web layer is built on top of the Spring 3.x MVC framework, with JSP as the presentation technology. The web tier is intentionally kept to a minimum and simplified as much as possible, as it is not the central piece of the application - developers unfamiliar with these two technologies should be able to understand the code with minimal effort. For more information on Spring MVC, the Spring reference documentation is recommended.

The entire RetwisJ web front is handled by the RetwisJController, an annotation-based controller that handles the various web requests and model population. Its methods map the actions available to the application: timeline, user mentions, user posts and so forth. The methods of interest are posts and mentions (similar in structure), which handle saving and loading of posts.

The rendering is handled by JSP pages under WEB-INF/jsp. The bulk of the work, such as listing the followers, the following network, users in common, posts and so on, is handled through the JSP fragments under WEB-INF/templates, namely posts.jspf and network.jspf.

Just like the original Retwis, RetwisJ does not rely on Servlet sessions for user authentication; rather, it uses cookie tracking. CookieInterceptor, a Spring MVC interceptor, handles that by verifying the cookies of each new request and setting the authentication details. For production environments, we strongly recommend using a dedicated, mature solution such as Spring Security - the intent of the sample is to showcase the use of Redis in an easy fashion, without introducing other dependencies.

For internationalization (i18n), RetwisJ relies on Spring MVC support, namely ResourceBundleMessageSource and CookieLocaleResolver - see the corresponding section in the Spring reference documentation for more information.

1.4.2 Persistence

The Redis interaction is handled through the RetwisRepository class, which demonstrates many of the Spring Data Redis classes, such as the atomic counters, the collection abstractions and the template, as well as the SORT/GET pattern for avoiding the dreaded n+1 problem.

To interact with the post and user counters, RetwisRepository uses two RedisAtomicLong instances which, just like the java.util.concurrent.atomic.AtomicLong class, allow numbers to be manipulated in an atomic fashion on top of Redis.
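To illustrate the semantics being mirrored, the sketch below uses the JDK's own java.util.concurrent.atomic.AtomicLong to hand out ids from several threads. It is not the RetwisJ code and does not touch Redis; AtomicCounterSketch and allocateIds are hypothetical names. A Redis-backed counter is meant to give the same guarantee across processes, with the state held in the store rather than in the JVM.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical demo of atomic id allocation using the JDK AtomicLong.
class AtomicCounterSketch {
    // Starts the given number of threads, each drawing perThread ids from one shared counter.
    static Set<Long> allocateIds(int threads, int perThread) {
        AtomicLong counter = new AtomicLong();
        Set<Long> ids = ConcurrentHashMap.newKeySet();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    // incrementAndGet is atomic, so no two callers receive the same value.
                    ids.add(counter.incrementAndGet());
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return ids;
    }
}
```

If the increment were a separate read followed by a write, two threads could observe the same value and produce a duplicate id; the atomic operation rules that out.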

Since the users and timeline entries are mapped as lists, they can be accessed through the Redis collection abstraction, in this case the RedisList interface. This way, items can be loaded and saved through the familiar Collection interface, without having to resort to Redis commands. In fact, the Redis collections are also used to interact with the user network and execute operations on it. For example, the code for finding out the common followers of two users looks as follows:

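// Wraps the Redis set stored under the "following" key of the given user.
private RedisSet<String> following(String uid) {
   return new DefaultRedisSet<String>(KeyUtils.following(uid), template);
}

public Collection<String> commonFollowers(String uid, String targetUid) {
   // Intersect the two sets and store the result under a dedicated key.
   RedisSet<String> tempSet = following(uid).
       intersectAndStore(following(targetUid),
       KeyUtils.commonFollowers(uid, targetUid));

   // Let the temporary result expire so it does not linger in the store.
   tempSet.expire(5, TimeUnit.SECONDS);

   // Resolve the stored uids to user names.
   return covertUidsToNames(tempSet.getKey());
}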

The collections do not hold any data; they simply provide a view into the Redis store. The intersection is therefore computed inside Redis, and the set members are not transferred between the store and the application while it runs. The method above intersects two sets (the sets returned by following for the two given users), stores the result in a dedicated set (with a small timeout to avoid the data lingering around) and then returns it to the user. The result is stored so that it can be properly parsed by the uid-to-name conversion helper called at the end of the method (see below) and so that it can be properly paginated. A potential improvement would be to check whether the result set already exists before creating a new one, thus reusing the result of a previous intersection.
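The reuse idea mentioned above can be sketched with plain Java sets. CommonNetworkSketch, its common method and the destination key used in the usage example are hypothetical; the map stands in for Redis, and expiry of the stored result is left out, so this only illustrates the check-before-compute step.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory model of intersect-and-store with result reuse; not the RetwisJ code.
class CommonNetworkSketch {
    final Map<String, Set<String>> sets = new HashMap<>();
    int intersectionsComputed = 0;

    // Returns the stored intersection if present, otherwise computes and stores it under destKey.
    Set<String> common(String keyA, String keyB, String destKey) {
        Set<String> existing = sets.get(destKey);
        if (existing != null) {
            return existing; // reuse the result of a previous intersection
        }
        Set<String> result = new TreeSet<>(sets.getOrDefault(keyA, Collections.<String>emptySet()));
        result.retainAll(sets.getOrDefault(keyB, Collections.<String>emptySet()));
        sets.put(destKey, result);
        intersectionsComputed++;
        return result;
    }
}
```

For instance, with uid:1:following holding {3, 4, 5} and uid:2:following holding {4, 5, 6}, the first call computes and stores the common members 4 and 5, and a second call with the same destination key returns the stored set without recomputing. With a real store, the short timeout bounds how stale a reused result can be.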

A common problem with any store is dealing efficiently with normalized data. In RetwisJ, to avoid data redundancy, the users and the posts are referred to by their ids (uid and pid); for example, the key uid:[uid]:posts contains a list of post ids (pids), not the actual posts. This means that when the posts for a user are loaded, the posts referred to in the list need to be loaded as well. A simple approach would be to iterate through the list and load each post one by one, but clearly this is not efficient, as it means one (slow) round trip between the application and the database for every post.

The best solution in such cases is to use the SORT/GET combination, which allows data to be loaded based on its key; see the Redis documentation for the SORT command for more information. SORT/GET can be seen as the equivalent of an RDBMS join. It is particularly handy when loading hashes, since it permits field selection and thus avoids loading unnecessary data. Spring Data provides support for the SORT/GET pattern through its sort method and the SortQuery and BulkMapper interfaces, for querying and for mapping the bulk result back to an object. The convertPidsToPosts method shows how these classes can be used to load the posts by executing a join over a hash.
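To make the difference concrete, the following in-memory sketch models the lookup that SORT with GET pattern->field performs (each list element replaces the * in the key pattern, and the named hash field is returned) and counts simulated requests for both approaches. SortGetSketch and all of its methods are hypothetical, no sorting is performed, and each pattern is assumed to contain exactly one * and one ->.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of SORT ... GET; it only counts simulated requests.
class SortGetSketch {
    final Map<String, List<String>> lists = new HashMap<>();
    final Map<String, Map<String, String>> hashes = new HashMap<>();
    int roundTrips = 0;

    void rpush(String key, String value) {
        lists.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    void hset(String key, String field, String value) {
        hashes.computeIfAbsent(key, k -> new LinkedHashMap<>()).put(field, value);
    }

    // Naive n+1 approach: one request for the list, then one request per post.
    List<String> loadContentOneByOne(String listKey) {
        roundTrips++; // fetch the list of pids
        List<String> contents = new ArrayList<>();
        for (String pid : lists.getOrDefault(listKey, Collections.<String>emptyList())) {
            roundTrips++; // one hash lookup per pid
            contents.add(field("pid:" + pid, "content"));
        }
        return contents;
    }

    // Models a single request that resolves every pattern for every list element.
    List<String> sortGet(String listKey, String... patterns) {
        roundTrips++;
        List<String> values = new ArrayList<>();
        for (String element : lists.getOrDefault(listKey, Collections.<String>emptyList())) {
            for (String pattern : patterns) {
                int arrow = pattern.indexOf("->");
                String keyPattern = pattern.substring(0, arrow);
                int star = keyPattern.indexOf('*');
                // Substitute the list element for the '*' to obtain the hash key.
                String key = keyPattern.substring(0, star) + element + keyPattern.substring(star + 1);
                values.add(field(key, pattern.substring(arrow + 2)));
            }
        }
        return values;
    }

    private String field(String key, String name) {
        Map<String, String> hash = hashes.get(key);
        return hash == null ? null : hash.get(name);
    }
}
```

Calling sortGet("uid:1:posts", "pid:*->content", "pid:*->time") returns the selected fields as one flat list, grouped per post, in a single simulated request, whereas loadContentOneByOne needs one request plus one per post. Mapping such a flat result back to objects is the job the text assigns to BulkMapper.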