Spring Hadoop Reference Manual

Costin Leau

SpringSource, a division of VMware

1.0.0.M2

Copies of this document may be made for your own use and for distribution to others, provided that you do not charge any fee for such copies and further provided that each copy contains this Copyright Notice, whether distributed in print or electronically.


Table of Contents

Preface
I. Introduction
1. Requirements
II. Spring and Hadoop
2. Hadoop Configuration, MapReduce, and Distributed Cache
2.1. Using the Spring for Apache Hadoop Namespace
2.2. Configuring Hadoop
2.3. Creating a Hadoop Job
2.3.1. Creating a Hadoop Streaming Job
2.3.2. Running a Hadoop Job
2.4. Using the Hadoop Job tasklet
2.5. Running a Hadoop Tool
2.5.1. Replacing Hadoop shell invocations with tool
2.6. Using the Hadoop Tool tasklet
2.7. MapReduce Generic Options
2.8. Configuring the Hadoop DistributedCache
3. Working with the Hadoop File System
3.1. Configuring the file-system
3.2. Scripting the Hadoop API
3.2.1. Using scripts
3.3. Scripting implicit variables
3.4. File System Shell (FsShell)
3.4.1. DistCp API
3.5. Scripting Lifecycle
3.6. Using the Scripting tasklet
4. Working with HBase
4.1. Data Access Object (DAO) Support
5. Hive integration
5.1. Starting a Hive Server
5.2. Using the Hive Thrift Client
5.3. Using the Hive JDBC Client
5.4. Using the Hive tasklet
6. Pig support
6.1. Using the Pig tasklet
7. Cascading integration
7.1. Using the Cascading tasklet
7.2. Using Scalding
7.3. Spring-specific local Taps
8. Security Support
8.1. HDFS permissions
8.2. User impersonation (Kerberos)
III. Developing Spring for Apache Hadoop Applications
9. Guidance and Examples
9.1. Scheduling
9.2. Batch Job Listeners
IV. Spring for Apache Hadoop sample applications
10. Sample prerequisites
11. Wordcount sample using the Spring Framework
11.1. Introduction
12. Wordcount sample using Spring Batch
12.1. Introduction
12.2. Basic Spring for Apache Hadoop configuration
12.3. Build and run the sample application
12.4. Run the sample application as a standalone Java application
V. Other Resources
13. Useful Links
VI. Appendices
A. Spring for Apache Hadoop Schema