This task launches a Spark application by submitting it to a Spark cluster for execution. It is appropriate for deployments where any file references can be resolved to a shared location.
The spark-cluster task has the following options:

spark.app-args (default: [])
spark.app-class (default: <none>)
spark.app-jar (default: <none>)
spark.app-name (default: <none>)
spark.app-status-poll-interval (default: 10000)
spark.executor-memory (default: 1024M)
spark.master (default: spark://localhost:7077)
spark.resource-archives (default: <none>)
spark.resource-files (default: <none>)
spark.rest-url (default: spark://localhost:6066)

To build the spark-cluster-task app:

$ ./mvnw clean install -PgenerateApps
$ cd apps/spark-cluster-task
$ ./mvnw clean package
The following example assumes you have a Spark 1.6.3 cluster running. It also assumes that the app jar resource location is reachable from the cluster; for example, you can store the jar in HDFS.
Run the spark-cluster-task app using the following command and parameters (in this example we use a class name of org.apache.spark.examples.JavaSparkPi for the --spark.app-class parameter):
java -jar spark-cluster-task-{version}.jar --spark.app-class=org.apache.spark.examples.JavaSparkPi \
--spark.app-jar=/shared/drive/spark-pi-test.jar \
--spark.master=spark://<host>:7077 \
--spark.rest-url=spark://<host>:6066 \
--spark.app-args=10

Then review the stdout log for the finished driver to make sure the app completed.
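The JavaSparkPi example used above estimates pi with a Monte Carlo simulation; the --spark.app-args value controls how many slices the sampling is split into. As a rough illustration of what the job computes, here is a plain-Java sketch of the same idea without Spark (the class name, method name, sample count, and seed are illustrative, not part of the example app):

```java
import java.util.Random;

public class PiEstimate {
    // Monte Carlo estimate of pi: sample random points in the unit square
    // and count how many fall inside the quarter circle of radius 1.
    static double estimatePi(int samples, long seed) {
        Random rnd = new Random(seed);
        int inside = 0;
        for (int i = 0; i < samples; i++) {
            double x = rnd.nextDouble();
            double y = rnd.nextDouble();
            if (x * x + y * y <= 1.0) {
                inside++;
            }
        }
        return 4.0 * inside / samples;
    }

    public static void main(String[] args) {
        System.out.println("pi ~= " + estimatePi(1_000_000, 42L));
    }
}
```

The Spark version parallelizes the inner loop across executors, which is why a larger slice count lets the cluster do more of the sampling concurrently.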
We welcome contributions! See the project's contribution guidelines for more information on how to contribute.

= Spark YARN Task
This task launches a Spark application by submitting it to a YARN cluster for execution. It is appropriate for a deployment that has access to a Hadoop YARN cluster. The Spark application jar and the Spark Assembly jar should be referenced from an HDFS location.
The spark-yarn task has the following options:

spark.app-args (default: [])
spark.app-class (default: <none>)
spark.app-jar (default: <none>)
spark.app-name (default: <none>)
spark.assembly-jar (default: <none>)
spark.executor-memory (default: 1024M)
spark.num-executors (default: 1)
spark.resource-archives (default: <none>)
spark.resource-files (default: <none>)

To build the spark-yarn-task app:

$ ./mvnw clean install -PgenerateApps
$ cd apps/spark-yarn-task
$ ./mvnw clean package
The following example assumes you have a Hadoop cluster available and have downloaded the Spark 1.6.3 for Hadoop 2.6 release.
Copy the spark-assembly jar to HDFS; in this example we use a directory named /app/spark:
hadoop fs -copyFromLocal spark-assembly-1.6.3-hadoop2.6.0.jar /app/spark/spark-assembly-1.6.3-hadoop2.6.0.jar
Copy your Spark app jar to HDFS; in this example we use a jar named spark-pi-test.jar:
hadoop fs -copyFromLocal spark-pi-test.jar /app/spark/spark-pi-test.jar
Run the spark-yarn-task app using the following command and parameters (in this example we use a class name of org.apache.spark.examples.JavaSparkPi for the --spark.app-class parameter):
java -jar spark-yarn-task-{version}.jar --spark.app-class=org.apache.spark.examples.JavaSparkPi \
--spark.app-jar=hdfs:///app/spark/spark-pi-test.jar \
--spark.assembly-jar=hdfs:///app/spark/spark-assembly-1.6.3-hadoop2.6.0.jar \
--spring.hadoop.fsUri=hdfs://{hadoop-host}:8020 --spring.hadoop.resourceManagerHost={hadoop-host} \
--spring.hadoop.resourceManagerPort=8032 --spring.hadoop.jobHistoryAddress={hadoop-host}:10020 \
--spark.app-args=10

Then review the stdout log for the launched Hadoop YARN container to make sure the app completed.
We welcome contributions! See the project's contribution guidelines for more information on how to contribute.

= Timestamp Task
A task that prints a timestamp to stdout. It is intended primarily for testing.
The timestamp task has the following options:

timestamp.format (default: yyyy-MM-dd HH:mm:ss.SSS)

To build the timestamp-task app:

$ ./mvnw clean install -PgenerateApps
$ cd apps/timestamp-task
$ ./mvnw clean package
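The default value of timestamp.format follows the java.text.SimpleDateFormat pattern syntax. The sketch below shows what that pattern produces for a known instant (the class and method names are illustrative; the UTC time zone is set only to make the output deterministic):

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class TimestampFormatDemo {
    // Format a Date using the timestamp task's default pattern.
    static String format(Date date) {
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS");
        sdf.setTimeZone(TimeZone.getTimeZone("UTC")); // deterministic output
        return sdf.format(date);
    }

    public static void main(String[] args) {
        // The Unix epoch renders as 1970-01-01 00:00:00.000 in UTC.
        System.out.println(format(new Date(0L)));
    }
}
```

Passing a different pattern via --timestamp.format (for example, yyyy-MM-dd) changes the printed timestamp accordingly.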
We welcome contributions! See the project's contribution guidelines for more information on how to contribute.