3. Spark Cluster Task

This task launches a Spark application by submitting it to a Spark cluster for execution. It is appropriate for deployments where any file references can be resolved to a shared location.

3.1 Options

The spark-cluster task has the following options:

spark.app-args
The arguments for the Spark application. (String[], default: [])
spark.app-class
The main class for the Spark application. (String, default: <none>)
spark.app-jar
The path to a bundled jar that includes your application and its dependencies, excluding any Spark dependencies. (String, default: <none>)
spark.app-name
The name to use for the Spark application submission. (String, default: <none>)
spark.app-status-poll-interval
The interval (ms) to use for polling for the App status. (Long, default: 10000)
spark.executor-memory
The memory setting to be used for each executor. (String, default: 1024M)
spark.master
The master setting to be used (spark://host:port). (String, default: spark://localhost:7077)
spark.resource-archives
A comma separated list of archive files to be included with the app submission. (String, default: <none>)
spark.resource-files
A comma separated list of files to be included with the application submission. (String, default: <none>)
spark.rest-url
The URL for the Spark REST API to be used (spark://host:port). (String, default: spark://localhost:6066)
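Since the launcher is a standard Spring Boot application, each option can be passed as a --name=value command-line argument (as in the example below) or supplied through any other Spring Boot property source, such as an application.properties file. A minimal sketch, using placeholder host and jar values:

spark.app-class=org.apache.spark.examples.JavaSparkPi
spark.app-jar=/shared/drive/spark-pi-test.jar
spark.master=spark://<host>:7077
spark.rest-url=spark://<host>:6066
spark.app-args=10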

3.2 Building with Maven

$ ./mvnw clean install -PgenerateApps
$ cd apps/spark-cluster-task
$ ./mvnw clean package

3.3 Example

The following example assumes you have a Spark 1.6.3 cluster running and that the app jar location is reachable from the cluster. One option is to store the jar in HDFS.
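For example, to stage the jar in HDFS (a sketch, assuming a jar named spark-pi-test.jar and a target directory of /app/spark; the command below uses a shared filesystem path instead):

hadoop fs -mkdir -p /app/spark
hadoop fs -copyFromLocal spark-pi-test.jar /app/spark/spark-pi-test.jar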

Run the spark-cluster-task app with the following command and parameters (this example uses org.apache.spark.examples.JavaSparkPi as the --spark.app-class value):

java -jar spark-cluster-task-{version}.jar --spark.app-class=org.apache.spark.examples.JavaSparkPi \
  --spark.app-jar=/shared/drive/spark-pi-test.jar \
  --spark.master=spark://<host>:7077 \
  --spark.rest-url=spark://<host>:6066 \
  --spark.app-args=10

Then review the stdout log for the finished driver to make sure the app completed.
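You can also poll the driver state directly through the Spark standalone REST API that the task submits to. A sketch, assuming a hypothetical submission id of driver-20170101000000-0000 (use the id reported for your submission), and note the http:// scheme:

curl http://<host>:6066/v1/submissions/status/driver-20170101000000-0000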

3.4 Contributing

We welcome contributions! Follow this link for more information on how to contribute.

4. Spark YARN Task

This task launches a Spark application by submitting it to a YARN cluster for execution. It is appropriate for deployments that have access to a Hadoop YARN cluster. The Spark application jar and the Spark Assembly jar should be referenced from an HDFS location.

4.1 Options

The spark-yarn task has the following options:

spark.app-args
The arguments for the Spark application. (String[], default: [])
spark.app-class
The main class for the Spark application. (String, default: <none>)
spark.app-jar
The path to a bundled jar that includes your application and its dependencies, excluding any Spark dependencies. (String, default: <none>)
spark.app-name
The name to use for the Spark application submission. (String, default: <none>)
spark.assembly-jar
The path for the Spark Assembly jar to use. (String, default: <none>)
spark.executor-memory
The memory setting to be used for each executor. (String, default: 1024M)
spark.num-executors
The number of executors to use. (Integer, default: 1)
spark.resource-archives
A comma separated list of archive files to be included with the app submission. (String, default: <none>)
spark.resource-files
A comma separated list of files to be included with the application submission. (String, default: <none>)
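The main additions compared to the spark-cluster task are spark.assembly-jar and spark.num-executors. A sketch of equivalent application.properties entries, with placeholder HDFS paths and illustrative sizing values:

spark.app-class=org.apache.spark.examples.JavaSparkPi
spark.app-jar=hdfs:///app/spark/spark-pi-test.jar
spark.assembly-jar=hdfs:///app/spark/spark-assembly-1.6.3-hadoop2.6.0.jar
spark.num-executors=2
spark.executor-memory=2048M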

4.2 Building with Maven

$ ./mvnw clean install -PgenerateApps
$ cd apps/spark-yarn-task
$ ./mvnw clean package

4.3 Example

The following example assumes you have a Hadoop cluster available and have downloaded the Spark 1.6.3 release for Hadoop 2.6.
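If you do not have the Spark Assembly jar locally, it ships in the lib directory of the Spark binary distribution. A sketch of fetching and extracting it (the URL follows the standard Apache archive layout; verify it before use):

wget https://archive.apache.org/dist/spark/spark-1.6.3/spark-1.6.3-bin-hadoop2.6.tgz
tar xzf spark-1.6.3-bin-hadoop2.6.tgz
ls spark-1.6.3-bin-hadoop2.6/lib/spark-assembly-1.6.3-hadoop2.6.0.jar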

Copy the Spark Assembly jar to HDFS. This example uses a directory named /app/spark:

hadoop fs -copyFromLocal spark-assembly-1.6.3-hadoop2.6.0.jar /app/spark/spark-assembly-1.6.3-hadoop2.6.0.jar

Copy your Spark app jar to HDFS. This example uses a jar named spark-pi-test.jar:

hadoop fs -copyFromLocal spark-pi-test.jar /app/spark/spark-pi-test.jar
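You can verify that both jars are in place before launching the task:

hadoop fs -ls /app/spark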

Run the spark-yarn-task app with the following command and parameters (this example uses org.apache.spark.examples.JavaSparkPi as the --spark.app-class value):

java -jar spark-yarn-task-{version}.jar --spark.app-class=org.apache.spark.examples.JavaSparkPi \
  --spark.app-jar=hdfs:///app/spark/spark-pi-test.jar \
  --spark.assembly-jar=hdfs:///app/spark/spark-assembly-1.6.3-hadoop2.6.0.jar \
  --spring.hadoop.fsUri=hdfs://{hadoop-host}:8020 --spring.hadoop.resourceManagerHost={hadoop-host} \
  --spring.hadoop.resourceManagerPort=8032 --spring.hadoop.jobHistoryAddress={hadoop-host}:10020 \
  --spark.app-args=10

Then review the stdout log for the launched Hadoop YARN container to make sure the app completed.
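If you have the YARN application id from the task output, the aggregated container logs (including stdout) can also be fetched from the command line, provided YARN log aggregation is enabled. A sketch with a placeholder application id:

yarn logs -applicationId application_1484771234567_0001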

4.4 Contributing

We welcome contributions! Follow this link for more information on how to contribute.

5. Timestamp Task

A task that prints a timestamp to stdout. It is intended primarily for testing.

5.1 Options

The timestamp task has the following options:

timestamp.format
The timestamp format. (String, default: yyyy-MM-dd HH:mm:ss.SSS)

5.2 Building with Maven

$ ./mvnw clean install -PgenerateApps
$ cd apps/timestamp-task
$ ./mvnw clean package

5.3 Example

java -jar timestamp-task-{version}.jar
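To override the default format, pass the timestamp.format option; the default value suggests a java.text.SimpleDateFormat-style pattern. For example:

java -jar timestamp-task-{version}.jar --timestamp.format="yyyy-MM-dd HH:mm"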

5.4 Contributing

We welcome contributions! Follow this link for more information on how to contribute.