Spring for Apache Hadoop provides for each Hadoop interaction type, whether it is vanilla Map/Reduce, Hive or Pig, a runner, a dedicated class used for declarative (or programmatic) interaction. The list below illustrates the existing runner classes for each type, their name and namespace element.
Table 10.1. Available _Runner_s
Type | Name | Namespace element | Description |
---|---|---|---|
Map/Reduce |
|
| Runner for Map/Reduce jobs, whether vanilla M/R or streaming |
Hadoop |
|
| Runner for Hadoop `Tool`s (whether stand-alone or as jars). |
Hadoop `jar`s |
|
| Runner for Hadoop jars. |
Hive queries and scripts |
|
| Runner for executing Hive queries or scripts. |
Pig queries and scripts |
|
| Runner for executing Pig scripts. |
JSR-223/JVM scripts |
|
| Runner for executing JVM 'scripting' languages (implementing the JSR-223 API). |
While most of the configuration depends on the underlying type, the runners share common attributes and behaviour so one can use them in a predictive, consistent way. Below is a list of common features:
declaration does not imply execution
The runner allows a script, a job to run but the execution can be triggered either programmatically or by the container at start-up.
run-at-startup
Each runner can execute its action at start-up. By default, this flag is
set to false
. For multiple or on demand execution (such as scheduling)
use the Callable contract (see below).
JDK Callable interface
Each runner implements the JDK Callable interface. Thus one can inject the runner into other beans or its own classes to trigger the execution (as many or as little times as she wants).
pre
and post
actions
Each runner allows one or multiple, pre or/and post actions to be
specified (to chain them together such as executing a job after another
or perfoming clean up). Typically other runners can be used but any
Callable
can be specified. The actions will be executed before and
after the main action, in the declaration order. The runner uses a
fail-safe behaviour meaning, any exception will interrupt the run and
will propagated immediately to the caller.
consider Spring Batch
The runners are meant as a way to execute basic tasks. When multiple executions need to be coordinated and the flow becomes non-trivial, we strongly recommend using Spring Batch which provides all the features of the runners and more (a complete, mature framework for batch execution).