Spring for Apache Hadoop (SHDP) requires JDK 6.0 or later (the same as Hadoop) and Spring Framework 3.0 or later (3.2 recommended), and is built against Apache Hadoop 1.2.1. SHDP supports and is tested daily against Apache Hadoop 1.2.1, 1.1.2 and 2.0.x alpha, as well as against the following Hadoop distributions:
- Pivotal HD 1.1
- Cloudera CDH4 (cdh4.3.1 MRv1) distributions
- Hortonworks Data Platform 1.3
Any distro compatible with Apache Hadoop 1.x stable should be supported.
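The JDK 6 minimum stated above can be verified at application startup. The following is a minimal sketch rather than anything shipped with SHDP: the class name `JdkRequirementCheck` and its methods are hypothetical, and it relies only on the standard `java.specification.version` system property.

```java
// Hypothetical helper for illustration; not part of Spring for Apache Hadoop.
class JdkRequirementCheck {

    // Minimum major Java version stated in this section.
    static final int MIN_MAJOR = 6;

    /**
     * Converts a "java.specification.version" value into a major version number.
     * Older JDKs report values such as "1.6" or "1.8"; JDK 9 and later report "9", "11", and so on.
     */
    static int majorVersion(String specVersion) {
        String[] parts = specVersion.trim().split("\\.");
        int first = Integer.parseInt(parts[0]);
        if (first == 1 && parts.length > 1) {
            // Legacy "1.x" scheme: the second component is the major version.
            return Integer.parseInt(parts[1]);
        }
        return first;
    }

    /** Returns true if the given specification version is at least MIN_MAJOR. */
    static boolean meetsMinimum(String specVersion) {
        return majorVersion(specVersion) >= MIN_MAJOR;
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        if (meetsMinimum(spec)) {
            System.out.println("JDK " + spec + " satisfies the JDK " + MIN_MAJOR + " minimum");
        } else {
            System.out.println("JDK " + spec + " is older than the required JDK " + MIN_MAJOR);
        }
    }
}
```

The check covers only the JDK level; the Spring Framework and Hadoop versions are better confirmed from the project's build configuration.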
We have recently added support to allow Hadoop 2.x based distributions to be used with the current functionality. We are running test builds against Apache Hadoop 2.0.x alpha, Pivotal HD 1.1 and Hortonworks Data Platform 2.0.
Note: Hadoop YARN support is only available in Spring for Apache Hadoop version 2.0 and later.
Spring Data Hadoop is provided out of the box and is certified to work on the Greenplum HD 1.2 and Pivotal HD 1.0 and 1.1 distributions. It is also certified to run against Hortonworks HDP 1.3.
Instructions for setting up project builds using the various supported distributions are provided on the Spring for Apache Hadoop wiki: https://github.com/spring-projects/spring-hadoop/wiki
Regarding Hadoop-related projects, SHDP supports Cascading 2.1 (http://cascading.org/), HBase 0.90.x (http://hbase.apache.org/), Hive 0.8.x (http://hive.apache.org/) and Pig 0.9.x and above. As a rule of thumb, when using Hadoop-related projects such as Hive or Pig, use the required Hadoop version as a basis for discovering the supported versions.
Spring for Apache Hadoop also requires a running Hadoop installation. If you don't already have a Hadoop cluster in your environment, a good first step is to create a single-node cluster. To install Hadoop 1.2.1, the "Getting Started" page from the official Apache documentation is a good general guide. If you are running on Ubuntu, the tutorial from Michael G. Noll, "Running Hadoop On Ubuntu Linux (Single-Node Cluster)", provides more details. It is also convenient to download a virtual machine where Hadoop is set up and ready to go. Cloudera, Hortonworks and Pivotal all provide VM downloads on their product pages. Additionally, the appendix provides information on how to use Spring for Apache Hadoop and set up Hadoop with cloud providers, such as Amazon Web Services.
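Before pointing an application at a cluster, it can help to confirm that something is listening on the NameNode address. The sketch below uses only plain JDK sockets and is not an SHDP or Hadoop API: the class name `HadoopPortProbe` is hypothetical, and the default host and port (`localhost`, `9000`) are assumptions that should be replaced with the values from your own `fs.default.name` setting.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper for illustration; not part of Spring for Apache Hadoop.
class HadoopPortProbe {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isListening(String host, int port, int timeoutMillis) {
        Socket socket = new Socket();
        try {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timeout, or unresolved host: treat all as "not listening".
            return false;
        } finally {
            try {
                socket.close();
            } catch (IOException ignored) {
                // Nothing useful to do if close fails.
            }
        }
    }

    public static void main(String[] args) {
        // Assumed defaults; pass your own host and port as arguments.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9000;

        if (isListening(host, port, 2000)) {
            System.out.println("A service is listening on " + host + ":" + port);
        } else {
            System.out.println("Nothing reachable on " + host + ":" + port);
        }
    }
}
```

A successful connection shows only that the port is open, not that the service behind it is a healthy NameNode, so treat it as a quick first check.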