3. Spark Cluster Task

This task launches a Spark application by submitting it to a Spark cluster for execution. It is appropriate for deployments where any file references can be resolved to a shared location.

3.1 Options

The spark-cluster task has the following options:

spark.app-args
The arguments for the Spark application. (String[], default: [])
spark.app-class
The main class for the Spark application (a minimal example class is sketched after this list). (String, default: <none>)
spark.app-jar
The path to a bundled jar that includes your application and its dependencies, excluding any Spark dependencies. (String, default: <none>)
spark.app-name
The name to use for the Spark application submission. (String, default: <none>)
spark.app-status-poll-interval
The interval (ms) at which to poll for the application status. (Long, default: 10000)
spark.executor-memory
The memory setting to be used for each executor. (String, default: 1024M)
spark.master
The master setting to be used (spark://host:port). (String, default: spark://localhost:7077)
spark.resource-archives
A comma-separated list of archive files to be included with the application submission. (String, default: <none>)
spark.resource-files
A comma-separated list of files to be included with the application submission. (String, default: <none>)
spark.rest-url
The URL for the Spark REST API to be used (spark://host:port). (String, default: spark://localhost:6066)
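
The spark.app-class and spark.app-jar options identify the code to be submitted. For reference, here is a minimal sketch of the kind of application class such a jar might contain, written against the Spark 1.6 Java API; the class name SparkPiApp and its argument handling are illustrative rather than part of this project:

import java.util.ArrayList;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public final class SparkPiApp {

    public static void main(String[] args) {
        // The value passed via spark.app-args (e.g. "10") arrives here
        // as an ordinary program argument.
        int slices = args.length > 0 ? Integer.parseInt(args[0]) : 2;
        int n = 100000 * slices;

        // The master is supplied by the cluster at submission time,
        // so it is not set on the SparkConf.
        SparkConf conf = new SparkConf().setAppName("SparkPiApp");
        JavaSparkContext sc = new JavaSparkContext(conf);

        List<Integer> samples = new ArrayList<Integer>(n);
        for (int i = 0; i < n; i++) {
            samples.add(i);
        }

        // Monte Carlo estimate: count random points that fall inside
        // the unit circle.
        long inside = sc.parallelize(samples, slices).filter(i -> {
            double x = Math.random() * 2 - 1;
            double y = Math.random() * 2 - 1;
            return x * x + y * y < 1;
        }).count();

        System.out.println("Pi is roughly " + 4.0 * inside / n);
        sc.stop();
    }
}

A jar built from a class like this, bundled with its non-Spark dependencies, is what spark.app-jar should point to.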

3.2 Building with Maven

$ ./mvnw clean install -PgenerateApps
$ cd apps/spark-cluster-task
$ ./mvnw clean package

3.3 Example

The following example assumes you have a Spark 1.6.3 cluster running and that the application jar is stored at a location reachable from the cluster. HDFS is one such option.
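
If you choose HDFS, one way to copy the jar there programmatically is with the Hadoop FileSystem API. The sketch below is illustrative only; the namenode address and the paths are assumptions to adapt to your environment (the hdfs dfs -put command achieves the same result):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class JarUpload {

    public static void main(String[] args) throws Exception {
        // Hypothetical namenode address; replace with your cluster's.
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020");

        // Copy the local application jar to a shared location.
        FileSystem fs = FileSystem.get(conf);
        fs.copyFromLocalFile(new Path("/local/spark-pi-test.jar"),
                new Path("/shared/drive/spark-pi-test.jar"));
        fs.close();
    }
}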

Run the spark-cluster-task app with the following command and parameters (this example uses org.apache.spark.examples.JavaSparkPi as the --spark.app-class):

$ java -jar spark-cluster-task-{version}.jar --spark.app-class=org.apache.spark.examples.JavaSparkPi \
  --spark.app-jar=/shared/drive/spark-pi-test.jar \
  --spark.master=spark://<host>:7077 \
  --spark.rest-url=spark://<host>:6066 \
  --spark.app-args=10

Then review the stdout log of the finished driver to verify that the application completed successfully.
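
The task itself polls for completion (at the spark.app-status-poll-interval) against the REST endpoint configured by spark.rest-url. If you want to inspect the driver state by hand, the Spark standalone REST API exposes a status endpoint over HTTP; the following sketch assumes that API, with a placeholder host and submission id to substitute for your own (note that although the option value uses the spark:// scheme, an HTTP client reaches the same port with http://):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public final class DriverStatusCheck {

    public static void main(String[] args) throws Exception {
        // Hypothetical host and submission id; substitute your own.
        String restUrl = "http://localhost:6066";
        String submissionId = "driver-20170101120000-0000";

        // Query the Spark standalone REST API for the driver's status.
        URL url = new URL(restUrl + "/v1/submissions/status/" + submissionId);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");

        // The response is JSON; its driverState field reports values
        // such as RUNNING, FINISHED, or FAILED.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}

A driverState of FINISHED corresponds to the completed driver whose stdout log is reviewed above.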