15. Importing Data

The dusty archives

It was now time to pull the data from themoviedb.org. Registering there and getting an API key was simple, as was using the API on the command-line with curl. Looking at the JSON returned for movies and people, we decided to enhance our domain model and add some more fields to enrich the UI.

Example 15.1. JSON movie response

[{"popularity":3,
"translated":true, "adult":false, "language":"en",
"original_name":"[Rec]", "name":"[Rec]", "alternative_name":"[REC]",
"movie_type":"movie",
"id":8329, "imdb_id":"tt1038988", "url":"http://www.themoviedb.org/movie/8329",
"votes":11, "rating":7.2,
"status":"Released",
"tagline":"One Witness. One Camera",
"certification":"R",
"overview":"\"REC\" turns on a young TV reporter and her cameraman who cover the night shift
 at the local fire station...
"keywords":["terror", "lebende leichen", "obsession", "camcorder", "firemen", "reality tv ",
 "bite", "cinematographer",
"attempt to escape", "virus", "lodger", "live-reportage", "schwerverletzt"],
"released":"2007-08-29",
"runtime":78,
"budget":0,
"revenue":0,
"homepage":"http://www.3l-filmverleih.de/rec",
"trailer":"http://www.youtube.com/watch?v=YQUkX_XowqI",
"genres":[{"type":"genre",
"url":"http://themoviedb.org/genre/horror",
"name":"Horror",
"id":27}],
"studios":[{"url":"http://www.themoviedb.org/company/2270", "name":"Filmax Group", "id":2270}],
"languages_spoken":[{"code":"es", "name":"Spanish", "native_name":"Espa\u00f1ol"}],
"countries":[{"code":"ES", "name":"Spain", "url":"http://www.themoviedb.org/country/es"}],
"posters":[{"image":{"type":"poster",
"size":"original", "height":1000, "width":706,
"url":"http://cf1.imgobject.com/posters/3a0/4cc8df415e73d650240003a0/rec-original.jpg",
"id":"4cc8df415e73d650240003a0"}},
....
"cast":[{"name":"Manuela Velasco",
"job":"Actor", "department":"Actors",
"character":"Angela Vidal",
"id":34793, "order":0, "cast_id":1,
"url":"http://www.themoviedb.org/person/34793",
"profile":"http://cf1.imgobject.com/profiles/390/.../manuela-velasco-thumb.jpg"},
...
{"name":"Gl\u00f2ria Viguer",
"job":"Costume Design", "department":"Costume \u0026 Make-Up",
"character":"",
"id":54531, "order":0, "cast_id":21,
"url":"http://www.themoviedb.org/person/54531",
"profile":""}],
"version":150, "last_modified_at":"2011-02-20 23:16:57"}]
    


Example 15.2. JSON actor response

[{"popularity":3,
"name":"Glenn Strange", "known_as":[{"name":"George Glenn Strange"}, {"name":"Glen Strange"},
{"name":"Glen 'Peewee' Strange"}, {"name":"Peewee Strange"}, {"name":"'Peewee' Strange"}],
"id":30112,
"biography":"",
"known_movies":4,
"birthday":"1899-08-16", "birthplace":"Weed, New Mexico, USA",
"url":"http://www.themoviedb.org/person/30112",
"filmography":[{"name":"Bud Abbott Lou Costello Meet Frankenstein",
"id":3073,
"job":"Actor", "department":"Actors",
"character":"The Frankenstein Monster",
"cast_id":23,
"url":"http://www.themoviedb.org/movie/3073",
"poster":"http://cf1.imgobject.com/posters/4ca/.../bud-abbott-lou-costello-meet-frankenstein-cover.jpg",
"adult":false, "release":"1948-06-15"},
...],
"profile":[],
"version":19, "last_modified_at":"2011-03-07 13:02:35"}]
    


For the import process we created a separate importer using Jackson (a JSON library) to fetch and parse the data, and then some transactional methods in the MovieDbImportService to actually import it as movies, roles, and actors. The importer used a simple caching mechanism to keep downloaded actor and movie data on the filesystem, so that we didn't have to overload the remote API. In the code below you can see that we've changed the actor to a person so that we can also accommodate the other folks that participate in movie production.

Example 15.3. Importing the data

@Transactional
public Movie importMovie(String movieId) {
  Movie movie = movieRepository.findById(movieId);
  if (movie == null) { // Not found: Create fresh
      movie = new Movie(movieId,null);
  }

  Map data = loadMovieData(movieId);
  if (data.containsKey("not_found")) throw
     new RuntimeException("Data for Movie "+movieId+" not found.");
  movieDbJsonMapper.mapToMovie(data, movie);
  movieRepository.save(movie);
  relatePersonsToMovie(movie, data);
  return movie;
}

private void relatePersonsToMovie(Movie movie, Map data) {
  Collection<Map> cast = (Collection<Map>) data.get("cast");
  for (Map entry : cast) {
    String id = "" + entry.get("id");
    String jobName = (String) entry.get("job");
    Roles job = movieDbJsonMapper.mapToRole(jobName);
    if (job==null) {
      continue;
    }
    switch (job) {
      case DIRECTED:
        final Director director = doImportPerson(id, new Director(id));
        director.directed(movie);
        directorRepository.save(director);
        break;
      case ACTS_IN:
        final Actor actor = doImportPerson(id, new Actor(id));
        actor.playedIn(movie, (String) entry.get("character"));
        actorRepository.save(actor);
        break;
    }
  }
}

public void mapToMovie(Map data, Movie movie) {
  movie.setTitle((String) data.get("name"));
  movie.setLanguage((String) data.get("language"));
  movie.setTagline((String) data.get("tagline"));
  movie.setReleaseDate(toDate(data, "released", "yyyy-MM-dd"));
...
  movie.setImageUrl(selectImageUrl((List<Map>) data.get("posters"), "poster", "mid"));
}

        


The last part involved adding a protected URI to the MovieController to allow importing ranges of movies. During testing, it became obvious that the calls to themoviedb.org were a limiting factor. As soon as the data was stored locally, the Neo4j import was a sub-second deal.