This task launches a Spark application by submitting it to a Spark cluster for execution. It is appropriate for deployments where any file references can be resolved to a shared location.
The spark-cluster task has the following options:

spark.app-args:: The arguments for the Spark application. *(String[], default: `[]`)*
spark.app-class:: The main class for the Spark application. *(String, default: `<none>`)*
spark.app-jar:: The path to a bundled jar that includes your application and its dependencies, excluding any Spark dependencies. *(String, default: `<none>`)*
spark.app-name:: The name to use for the Spark application submission. *(String, default: `<none>`)*
spark.app-status-poll-interval:: The interval (ms) to use for polling for the application status. *(long, default: `10000`)*
spark.executor-memory:: The memory setting to be used for each executor. *(String, default: `1024M`)*
spark.master:: The master setting to be used (spark://host:port). *(String, default: `spark://localhost:7077`)*
spark.resource-archives:: A comma separated list of archive files to be included with the app submission. *(String, default: `<none>`)*
spark.resource-files:: A comma separated list of files to be included with the app submission. *(String, default: `<none>`)*
spark.rest-url:: The URL for the Spark REST API to be used (spark://host:port). *(String, default: `spark://localhost:6066`)*
[source,shell]
----
$ ./mvnw clean install -PgenerateApps
$ cd apps/spark-cluster-task
$ ./mvnw clean package
----
The following example assumes you have a Spark 1.6.3 cluster running. It also assumes that the app jar resource location is reachable from the cluster. You can store this jar in HDFS.
Run the spark-cluster-task app using the following command and parameters (this example uses the class name org.apache.spark.examples.JavaSparkPi for the --spark.app-class parameter):
[source,shell]
----
java -jar spark-cluster-task-{version}.jar --spark.app-class=org.apache.spark.examples.JavaSparkPi \
  --spark.app-jar=/shared/drive/spark-pi-test.jar \
  --spark.master=spark://<host>:7077 \
  --spark.rest-url=spark://<host>:6066 \
  --spark.app-args=10
----
Then review the stdout log for the finished driver to make sure the app completed.
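The JavaSparkPi example estimates π by Monte Carlo sampling, with the `--spark.app-args=10` value above controlling how many slices the work is split into. As a rough illustration of the computation the driver log should report, here is a plain-Java sketch of the same estimate without the Spark distribution step (the sample count and seed are illustrative, not part of the Spark example):

```java
import java.util.Random;

public class PiEstimate {
    // Estimate pi by sampling random points in the unit square and
    // counting how many fall inside the quarter circle of radius 1.
    static double estimate(int slices, int samplesPerSlice, long seed) {
        Random rnd = new Random(seed);
        long inside = 0;
        long total = (long) slices * samplesPerSlice;
        for (long i = 0; i < total; i++) {
            double x = rnd.nextDouble();
            double y = rnd.nextDouble();
            if (x * x + y * y <= 1.0) {
                inside++;
            }
        }
        // The quarter circle has area pi/4, so pi is roughly 4 * inside / total.
        return 4.0 * inside / total;
    }

    public static void main(String[] args) {
        int slices = args.length > 0 ? Integer.parseInt(args[0]) : 10;
        System.out.println("Pi is roughly " + estimate(slices, 100_000, 42L));
    }
}
```

In the real JavaSparkPi application, the sampling loop is distributed across the cluster's executors; the driver collects the partial counts and prints the final estimate.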
We welcome contributions! Follow this link for more information on how to contribute.

= Spark YARN Task
This task launches a Spark application by submitting it to a YARN cluster for execution. It is appropriate for a deployment that has access to a Hadoop YARN cluster. The Spark application jar and the Spark Assembly jar should be referenced from an HDFS location.
The spark-yarn task has the following options:

spark.app-args:: The arguments for the Spark application. *(String[], default: `[]`)*
spark.app-class:: The main class for the Spark application. *(String, default: `<none>`)*
spark.app-jar:: The path to a bundled jar that includes your application and its dependencies, excluding any Spark dependencies. *(String, default: `<none>`)*
spark.app-name:: The name to use for the Spark application submission. *(String, default: `<none>`)*
spark.assembly-jar:: The path for the Spark Assembly jar to use. *(String, default: `<none>`)*
spark.executor-memory:: The memory setting to be used for each executor. *(String, default: `1024M`)*
spark.num-executors:: The number of executors to use. *(int, default: `1`)*
spark.resource-archives:: A comma separated list of archive files to be included with the app submission. *(String, default: `<none>`)*
spark.resource-files:: A comma separated list of files to be included with the app submission. *(String, default: `<none>`)*
[source,shell]
----
$ ./mvnw clean install -PgenerateApps
$ cd apps/spark-yarn-task
$ ./mvnw clean package
----
The following example assumes you have a Hadoop cluster available and have downloaded the Spark 1.6.3 release built for Hadoop 2.6.
Copy the spark-assembly jar to HDFS; in this example we use a directory named /app/spark:

[source,shell]
----
hadoop fs -copyFromLocal spark-assembly-1.6.3-hadoop2.6.0.jar /app/spark/spark-assembly-1.6.3-hadoop2.6.0.jar
----
Copy your Spark app jar to HDFS; in this example we use a jar named spark-pi-test.jar:

[source,shell]
----
hadoop fs -copyFromLocal spark-pi-test.jar /app/spark/spark-pi-test.jar
----
Run the spark-yarn-task app using the following command and parameters (this example uses the class name org.apache.spark.examples.JavaSparkPi for the --spark.app-class parameter):
[source,shell]
----
java -jar spark-yarn-task-{version}.jar --spark.app-class=org.apache.spark.examples.JavaSparkPi \
  --spark.app-jar=hdfs:///app/spark/spark-pi-test.jar \
  --spark.assembly-jar=hdfs:///app/spark/spark-assembly-1.6.3-hadoop2.6.0.jar \
  --spring.hadoop.fsUri=hdfs://{hadoop-host}:8020 --spring.hadoop.resourceManagerHost={hadoop-host} \
  --spring.hadoop.resourceManagerPort=8032 --spring.hadoop.jobHistoryAddress={hadoop-host}:10020 \
  --spark.app-args=10
----
Then review the stdout log for the launched Hadoop YARN container to make sure the app completed.
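Note that the jar locations above use the form `hdfs:///app/spark/...`, which has an empty authority (no host between the second and third slash). Hadoop resolves such URIs against the default filesystem, i.e. the `spring.hadoop.fsUri` setting. A small sketch with `java.net.URI` illustrates the resolution (the `namenode-host` name is a hypothetical stand-in for your NameNode):

```java
import java.net.URI;

public class HdfsUriDemo {
    public static void main(String[] args) {
        // Empty authority: no host is named, only a path.
        URI appJar = URI.create("hdfs:///app/spark/spark-pi-test.jar");
        // The default filesystem, as configured via spring.hadoop.fsUri.
        URI fsUri = URI.create("hdfs://namenode-host:8020/");

        System.out.println(appJar.getHost());            // null (no authority)
        // Resolving the path against the default filesystem yields the full location.
        System.out.println(fsUri.resolve(appJar.getPath()));
        // hdfs://namenode-host:8020/app/spark/spark-pi-test.jar
    }
}
```

You can therefore also write the jar locations with an explicit host and port if you prefer fully qualified paths.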
We welcome contributions! Follow this link for more information on how to contribute.

= Timestamp Task
A task that prints a timestamp to stdout. It is intended primarily for testing.
The timestamp task has the following options:

timestamp.format:: The timestamp format. *(String, default: `yyyy-MM-dd HH:mm:ss.SSS`)*
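The default format is a `java.text.SimpleDateFormat` pattern: date, time, and milliseconds. A quick sketch of what output with this pattern looks like (the UTC time zone and epoch-zero date are chosen here only to make the example reproducible):

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class TimestampDemo {
    public static void main(String[] args) {
        // The task's default pattern.
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC")); // fixed zone for reproducibility
        // Epoch millis 0 formats as the start of 1970 in UTC.
        System.out.println(fmt.format(new Date(0L))); // 1970-01-01 00:00:00.000
    }
}
```

Setting `timestamp.format` to any other valid SimpleDateFormat pattern changes the printed output accordingly.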
[source,shell]
----
$ ./mvnw clean install -PgenerateApps
$ cd apps/timestamp-task
$ ./mvnw clean package
----
We welcome contributions! Follow this link for more information on how to contribute.