Spring for Apache Hadoop Reference Manual

Authors

Costin Leau, Thomas Risberg, Janne Valkealahti

2.0.0.RC2

Copies of this document may be made for your own use and for distribution to others, provided that you do not charge any fee for such copies and further provided that each copy contains this Copyright Notice, whether distributed in print or electronically.

Table of Contents

I. Introduction

1. Requirements
2. Additional Resources

II. Spring and Hadoop

3. Hadoop Configuration, MapReduce, and Distributed Cache

3.1. Using the Spring for Apache Hadoop Namespace

3.2. Configuring Hadoop

3.3. Creating a Hadoop Job

3.3.1. Creating a Hadoop Streaming Job

3.4. Running a Hadoop Job

3.4.1. Using the Hadoop Job tasklet

3.5. Running a Hadoop Tool

3.5.1. Replacing Hadoop shell invocations with tool-runner
3.5.2. Using the Hadoop Tool tasklet

3.6. Running a Hadoop Jar

3.6.1. Using the Hadoop Jar tasklet

3.7. Configuring the Hadoop DistributedCache

3.8. Map Reduce Generic Options

4. Working with the Hadoop File System

4.1. Configuring the file-system

4.2. Using HDFS Resource Loader

4.3. Scripting the Hadoop API

4.3.1. Using scripts

4.4. Scripting implicit variables

4.4.1. Running scripts
4.4.2. Using the Scripting tasklet

4.5. File System Shell (FsShell)

4.5.1. DistCp API

5. Working with HBase

5.1. Data Access Object (DAO) Support

6. Hive integration

6.1. Starting a Hive Server

6.2. Using the Hive Thrift Client

6.3. Using the Hive JDBC Client

6.4. Running a Hive script or query

6.4.1. Using the Hive tasklet

6.5. Interacting with the Hive API

7. Pig support

7.1. Running a Pig script

7.1.1. Using the Pig tasklet

7.2. Interacting with the Pig API

8. Using the runner classes

9. Security Support

9.1. HDFS permissions
9.2. User impersonation (Kerberos)

10. Yarn Support

10.1. Using the Spring for Apache Yarn Namespace

10.2. Using the Spring for Apache Yarn JavaConfig

10.3. Configuring Yarn

10.4. Local Resources

10.5. Container Environment

10.6. Application Client

10.7. Application Master

10.8. Application Container

10.9. Application Master Services

10.9.1. Basic Concepts
10.9.2. Using JSON
10.9.3. Converters

10.10. Application Master Service

10.11. Application Master Service Client

10.12. Using Spring Batch

10.12.1. Batch Jobs

10.12.2. Partitioning

Configuring Master
Configuring Container

10.13. Using Spring Boot Application Model

10.13.1. Auto Configuration

10.13.2. Application Files

10.13.3. Application Classpath

Simple Executable Jar
Simple Zip Archive

10.13.4. Container Runners

Custom Runner

10.13.5. Resource Localizing

10.13.6. Container as POJO

10.13.7. Configuration Properties

10.13.8. Controlling Applications

Generic Usage
Using Configuration Properties
Using YarnPushApplication
Using YarnSubmitApplication
Using YarnInfoApplication
Using YarnKillApplication

11. Testing Support

11.1. Testing MapReduce

11.1.1. Mini Clusters for MapReduce
11.1.2. Configuration
11.1.3. Simplified Testing
11.1.4. Wordcount Example

11.2. Testing Yarn

11.2.1. Mini Clusters for Yarn
11.2.2. Configuration
11.2.3. Simplified Testing
11.2.4. Multi Context Example

11.3. Testing Boot Based Applications

III. Developing Spring for Apache Hadoop Applications

12. Guidance and Examples

12.1. Scheduling
12.2. Batch Job Listeners

IV. Spring for Apache Hadoop sample applications

V. Other Resources

13. Useful Links

VI. Appendices

A. Using Spring for Apache Hadoop with Amazon EMR

A.1. Start up the cluster
A.2. Open an SSH Tunnel as a SOCKS proxy
A.3. Configuring Hadoop to use a SOCKS proxy
A.4. Accessing the file-system
A.5. Shutting down the cluster
A.6. Example configuration

B. Using Spring for Apache Hadoop with EC2/Apache Whirr

B.1. Setting up the Hadoop cluster on EC2 with Apache Whirr

C. Spring for Apache Hadoop Schema
