Spring Hadoop Reference Manual

Costin Leau

SpringSource, a division of VMware

1.0.0.M1

Copies of this document may be made for your own use and for distribution to others, provided that you do not charge any fee for such copies and further provided that each copy contains this Copyright Notice, whether distributed in print or electronically.

Table of Contents

Preface

I. Introduction

1. Requirements

II. Spring and Hadoop

2. Hadoop Configuration, MapReduce, and Distributed Cache

2.1. Using the Spring Hadoop Namespace

2.2. Configuring Hadoop

2.3. Creating a Hadoop Job

2.3.1. Creating a Hadoop Streaming Job
2.3.2. Running a Hadoop Job

2.4. Configuring the Hadoop DistributedCache

2.5. Using the Hadoop Job tasklet

2.6. Running a Hadoop Tool

2.7. Using the Hadoop Tool tasklet

3. Working with the Hadoop File System

3.1. Scripting the Hadoop API

3.1.1. Using scripts

3.2. Scripting implicit variables

3.3. File System Shell (FsShell)

3.3.1. DistCp API

3.4. Scripting Lifecycle

3.5. Using the Scripting tasklet

4. Working with HBase

5. Hive integration

5.1. Starting a Hive Server
5.2. Using the Hive Thrift Client
5.3. Using the Hive JDBC Client
5.4. Using the Hive tasklet

6. Pig support

6.1. Using the Pig tasklet

7. Cascading integration

7.1. Using the Cascading tasklet
7.2. Using Scalding

III. Developing Spring Hadoop Applications

8. Guidance and Examples

8.1. Scheduling
8.2. Batch Job Listeners

IV. Spring Hadoop sample applications

9. Sample prerequisites

10. Wordcount sample using the Spring Framework

10.1. Introduction

11. Wordcount sample using Spring Batch

11.1. Introduction
11.2. Basic Spring Hadoop configuration
11.3. Build and run the sample application
11.4. Run the sample application as a standalone Java application

V. Other Resources

12. Useful Links

VI. Appendices

A. Spring Data Hadoop Schema
