3. Spark Cluster Task

This task launches a Spark application by submitting it to a Spark cluster for execution. It is appropriate for deployments where any file references can be resolved to a shared location.

3.1 Options

The spark-cluster task has the following options:

spark.app-args
The arguments for the Spark application. (String[], default: [])
spark.app-class
The main class for the Spark application. (String, default: <none>)
spark.app-jar
The path to a bundled jar that includes your application and its dependencies, excluding any Spark dependencies. (String, default: <none>)
spark.app-name
The name to use for the Spark application submission. (String, default: <none>)
spark.app-status-poll-interval
The interval (ms) to use for polling for the App status. (Long, default: 10000)
spark.executor-memory
The memory setting to be used for each executor. (String, default: 1024M)
spark.master
The master setting to be used (spark://host:port). (String, default: spark://localhost:7077)
spark.resource-archives
A comma separated list of archive files to be included with the app submission. (String, default: <none>)
spark.resource-files
A comma separated list of files to be included with the application submission. (String, default: <none>)
spark.rest-url
The URL for the Spark REST API to be used (spark://host:port). (String, default: spark://localhost:6066)
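Since the launcher is a standard Spring Boot application, each option can be passed as a --name=value command-line argument (as in the example below) or supplied through any other Spring Boot property source, such as an application.properties file. A minimal sketch, using placeholder host and jar values:

spark.app-class=org.apache.spark.examples.JavaSparkPi
spark.app-jar=/shared/drive/spark-pi-test.jar
spark.master=spark://<host>:7077
spark.rest-url=spark://<host>:6066
spark.app-args=10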

3.2 Building with Maven

$ ./mvnw clean install -PgenerateApps
$ cd apps/spark-cluster-task
$ ./mvnw clean package

3.3 Example

The following example assumes you have a Spark 1.6.3 cluster running and that the app jar location is reachable from the cluster. One option is to store the jar in HDFS.
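For example, to stage the jar in HDFS (a sketch, assuming a jar named spark-pi-test.jar and a target directory of /app/spark; the command below uses a shared filesystem path instead):

hadoop fs -mkdir -p /app/spark
hadoop fs -copyFromLocal spark-pi-test.jar /app/spark/spark-pi-test.jar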

Run the spark-cluster-task app with the following command and parameters (this example uses org.apache.spark.examples.JavaSparkPi as the --spark.app-class value):

java -jar spark-cluster-task-{version}.jar --spark.app-class=org.apache.spark.examples.JavaSparkPi \
  --spark.app-jar=/shared/drive/spark-pi-test.jar \
  --spark.master=spark://<host>:7077 \
  --spark.rest-url=spark://<host>:6066 \
  --spark.app-args=10

Then review the stdout log for the finished driver to make sure the app completed.
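You can also poll the driver state directly through the Spark standalone REST API that the task submits to. A sketch, assuming a hypothetical submission id of driver-20170101000000-0000 (use the id reported for your submission), and note the http:// scheme:

curl http://<host>:6066/v1/submissions/status/driver-20170101000000-0000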

3.4 Contributing

We welcome contributions! Follow this link for more information on how to contribute.

4. Spark YARN Task

This task launches a Spark application by submitting it to a YARN cluster for execution. It is appropriate for deployments that have access to a Hadoop YARN cluster. The Spark application jar and the Spark Assembly jar should be referenced from an HDFS location.

4.1 Options

The spark-yarn task has the following options:

spark.app-args
The arguments for the Spark application. (String[], default: [])
spark.app-class
The main class for the Spark application. (String, default: <none>)
spark.app-jar
The path to a bundled jar that includes your application and its dependencies, excluding any Spark dependencies. (String, default: <none>)
spark.app-name
The name to use for the Spark application submission. (String, default: <none>)
spark.assembly-jar
The path for the Spark Assembly jar to use. (String, default: <none>)
spark.executor-memory
The memory setting to be used for each executor. (String, default: 1024M)
spark.num-executors
The number of executors to use. (Integer, default: 1)
spark.resource-archives
A comma separated list of archive files to be included with the app submission. (String, default: <none>)
spark.resource-files
A comma separated list of files to be included with the application submission. (String, default: <none>)
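The main additions compared to the spark-cluster task are spark.assembly-jar and spark.num-executors. A sketch of equivalent application.properties entries, with placeholder HDFS paths and illustrative sizing values:

spark.app-class=org.apache.spark.examples.JavaSparkPi
spark.app-jar=hdfs:///app/spark/spark-pi-test.jar
spark.assembly-jar=hdfs:///app/spark/spark-assembly-1.6.3-hadoop2.6.0.jar
spark.num-executors=2
spark.executor-memory=2048M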

4.2 Building with Maven

$ ./mvnw clean install -PgenerateApps
$ cd apps/spark-yarn-task
$ ./mvnw clean package

4.3 Example

The following example assumes you have a Hadoop cluster available and have downloaded the Spark 1.6.3 release for Hadoop 2.6.
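If you do not have the Spark Assembly jar locally, it ships in the lib directory of the Spark binary distribution. A sketch of fetching and extracting it (the URL follows the standard Apache archive layout; verify it before use):

wget https://archive.apache.org/dist/spark/spark-1.6.3/spark-1.6.3-bin-hadoop2.6.tgz
tar xzf spark-1.6.3-bin-hadoop2.6.tgz
ls spark-1.6.3-bin-hadoop2.6/lib/spark-assembly-1.6.3-hadoop2.6.0.jar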

Copy the Spark Assembly jar to HDFS. This example uses a directory named /app/spark:

hadoop fs -copyFromLocal spark-assembly-1.6.3-hadoop2.6.0.jar /app/spark/spark-assembly-1.6.3-hadoop2.6.0.jar

Copy your Spark app jar to HDFS. This example uses a jar named spark-pi-test.jar:

hadoop fs -copyFromLocal spark-pi-test.jar /app/spark/spark-pi-test.jar
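You can verify that both jars are in place before launching the task:

hadoop fs -ls /app/spark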

Run the spark-yarn-task app with the following command and parameters (this example uses org.apache.spark.examples.JavaSparkPi as the --spark.app-class value):

java -jar spark-yarn-task-{version}.jar --spark.app-class=org.apache.spark.examples.JavaSparkPi \
  --spark.app-jar=hdfs:///app/spark/spark-pi-test.jar \
  --spark.assembly-jar=hdfs:///app/spark/spark-assembly-1.6.3-hadoop2.6.0.jar \
  --spring.hadoop.fsUri=hdfs://{hadoop-host}:8020 --spring.hadoop.resourceManagerHost={hadoop-host} \
  --spring.hadoop.resourceManagerPort=8032 --spring.hadoop.jobHistoryAddress={hadoop-host}:10020 \
  --spark.app-args=10

Then review the stdout log for the launched Hadoop YARN container to make sure the app completed.
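If you have the YARN application id from the task output, the aggregated container logs (including stdout) can also be fetched from the command line, provided YARN log aggregation is enabled. A sketch with a placeholder application id:

yarn logs -applicationId application_1484771234567_0001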

4.4 Contributing

We welcome contributions! Follow this link for more information on how to contribute.

5. Timestamp Task

A task that prints a timestamp to stdout. It is intended primarily for testing.

5.1 Options

The timestamp task has the following options:

timestamp.format
The timestamp format. (String, default: yyyy-MM-dd HH:mm:ss.SSS)

5.2 Building with Maven

$ ./mvnw clean install -PgenerateApps
$ cd apps/timestamp-task
$ ./mvnw clean package

5.3 Example

java -jar timestamp-task-{version}.jar
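To override the default format, pass the timestamp.format option; the default value suggests a java.text.SimpleDateFormat-style pattern. For example:

java -jar timestamp-task-{version}.jar --timestamp.format="yyyy-MM-dd HH:mm"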

5.4 Contributing

We welcome contributions! Follow this link for more information on how to contribute.