This task launches a Spark application by submitting it to a YARN cluster for execution. It is appropriate for deployments that have access to a Hadoop YARN cluster. Both the Spark application jar and the Spark Assembly jar should be referenced from an HDFS location.
The spark-yarn task has the following options:
spark.app-args: the arguments for the Spark application (String[], default: [])
spark.app-class: the main class for the Spark application (String, default: <none>)
spark.app-jar: the jar containing the Spark application (String, default: <none>)
spark.app-name: the name of the Spark application (String, default: <none>)
spark.assembly-jar: the Spark Assembly jar to use (String, default: <none>)
spark.executor-memory: the memory setting for each Spark executor (String, default: 1024M)
spark.num-executors: the number of Spark executors to use (Integer, default: 1)
spark.resource-archives: a comma-separated list of archive files to include with the application (String, default: <none>)
spark.resource-files: a comma-separated list of files to include with the application (String, default: <none>)

To build the stand-alone task application, run:

$ ./mvnw clean install -PgenerateApps
$ cd apps/spark-yarn-task
$ ./mvnw clean package
The following example assumes you have a Hadoop cluster available and have downloaded the Spark 1.6.3 for Hadoop 2.6 release.
Copy the Spark Assembly jar to HDFS; this example uses a directory named /app/spark:
hadoop fs -copyFromLocal spark-assembly-1.6.3-hadoop2.6.0.jar /app/spark/spark-assembly-1.6.3-hadoop2.6.0.jar
Copy your Spark application jar to HDFS; this example uses a jar named spark-pi-test.jar:
hadoop fs -copyFromLocal spark-pi-test.jar /app/spark/spark-pi-test.jar
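The two copy commands follow the same pattern: the jar keeps its file name under the /app/spark directory. A small illustrative helper (the put_jar name is hypothetical) that derives the HDFS destination from the local file name; it echoes the command rather than running it, since actually executing it requires a live cluster:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the copyFromLocal command that uploads a
# local jar into /app/spark, keeping the jar's own file name. This
# mirrors the two commands above; replace echo with direct execution
# on a machine that has the hadoop CLI configured.
put_jar() {
  local jar="$1"
  echo "hadoop fs -copyFromLocal $jar /app/spark/$(basename "$jar")"
}

put_jar spark-assembly-1.6.3-hadoop2.6.0.jar
put_jar spark-pi-test.jar
```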
Run the spark-yarn-task app with the following command and parameters (this example uses org.apache.spark.examples.JavaSparkPi as the --spark.app-class value):
java -jar spark-yarn-task-{version}.jar --spark.app-class=org.apache.spark.examples.JavaSparkPi \
--spark.app-jar=hdfs:///app/spark/spark-pi-test.jar \
--spark.assembly-jar=hdfs:///app/spark/spark-assembly-1.6.3-hadoop2.6.0.jar \
--spring.hadoop.fsUri=hdfs://{hadoop-host}:8020 --spring.hadoop.resourceManagerHost={hadoop-host} \
--spring.hadoop.resourceManagerPort=8032 --spring.hadoop.jobHistoryAddress={hadoop-host}:10020 \
--spark.app-args=10

Then review the stdout log for the launched Hadoop YARN container to make sure the app completed.
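A launcher command this long is easier to keep correct in a small script. A minimal sketch, assuming the HDFS paths and the {hadoop-host} placeholder from this example; substitute your own cluster host and directories:

```shell
#!/usr/bin/env bash
# Sketch of a launch script for the example above. HADOOP_HOST and the
# HDFS directory are assumptions carried over from this example.
HADOOP_HOST="hadoop-host"
SPARK_DIR="hdfs:///app/spark"

CMD="java -jar spark-yarn-task-{version}.jar \
 --spark.app-class=org.apache.spark.examples.JavaSparkPi \
 --spark.app-jar=$SPARK_DIR/spark-pi-test.jar \
 --spark.assembly-jar=$SPARK_DIR/spark-assembly-1.6.3-hadoop2.6.0.jar \
 --spring.hadoop.fsUri=hdfs://$HADOOP_HOST:8020 \
 --spring.hadoop.resourceManagerHost=$HADOOP_HOST \
 --spring.hadoop.resourceManagerPort=8032 \
 --spring.hadoop.jobHistoryAddress=$HADOOP_HOST:10020 \
 --spark.app-args=10"

# Print the assembled command; replace echo with eval "$CMD" to run it
# on a machine that can reach the cluster.
echo "$CMD"
```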