Spring for Apache Hadoop

org.springframework.yarn.batch.partition
Class MultiHdfsResourcePartitioner

java.lang.Object
  extended by org.springframework.yarn.batch.partition.MultiHdfsResourcePartitioner
All Implemented Interfaces:
org.springframework.batch.core.partition.support.Partitioner

public class MultiHdfsResourcePartitioner
extends java.lang.Object
implements org.springframework.batch.core.partition.support.Partitioner

Implementation of Partitioner that locates multiple resources and associates their file names with execution context keys. Creates an ExecutionContext per resource, and labels them as {partition0, partition1, ..., partitionN}. The grid size is ignored.
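
The labeling scheme described above can be illustrated with a minimal, dependency-free sketch. It is not the class's implementation: plain maps stand in for ExecutionContext, and the class name PartitionLabelSketch and its partition helper are hypothetical names introduced only for this illustration. The "fileName" key is the documented default for setKeyFileName.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PartitionLabelSketch {

    // Hypothetical stand-in: one context map per file name, labeled
    // partition0, partition1, ... in input order.
    static Map<String, Map<String, Object>> partition(List<String> fileNames, int gridSize) {
        // gridSize is deliberately unused, mirroring "The grid size is ignored."
        Map<String, Map<String, Object>> contexts = new LinkedHashMap<>();
        int index = 0;
        for (String fileName : fileNames) {
            Map<String, Object> context = new LinkedHashMap<>();
            context.put("fileName", fileName); // documented default key name
            contexts.put("partition" + index, context);
            index++;
        }
        return contexts;
    }

    public static void main(String[] args) {
        // Example paths are assumptions chosen for illustration.
        Map<String, Map<String, Object>> result =
                partition(List.of("/data/a.txt", "/data/b.txt", "/data/c.txt"), 1);
        result.forEach((label, context) -> System.out.println(label + " -> " + context));
    }
}
```

Passing a different gridSize leaves the result unchanged, since the number of partitions follows the number of resources.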

Author:
Janne Valkealahti

Constructor Summary
MultiHdfsResourcePartitioner()
           
 
Method Summary
 java.util.Map<java.lang.String,org.springframework.batch.item.ExecutionContext> partition(int gridSize)
          Assign the filename of each of the injected resources to an ExecutionContext.
 void setConfiguration(org.apache.hadoop.conf.Configuration configuration)
          Sets the Yarn configuration.
 void setForceSplit(boolean forceSplit)
          Sets the force split.
 void setKeyFileName(java.lang.String keyFileName)
          The name of the key for the file name in each ExecutionContext.
 void setKeySplitLength(java.lang.String keySplitLength)
          The name of the key for the file split length in each ExecutionContext.
 void setKeySplitStart(java.lang.String keySplitStart)
          The name of the key for the file split start in each ExecutionContext.
 void setResources(org.springframework.core.io.Resource[] resources)
          The resources to assign to each partition.
 void setSplitFile(boolean splitFile)
          Sets the flag indicating if file input should be split.
 void setSplitSize(int splitSize)
          Sets the input split size relative to block size of the HDFS file.
 void setUsePath(boolean usePath)
          If set to true, the resource path is set using URL.getPath(); otherwise URL.toExternalForm() is used.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

MultiHdfsResourcePartitioner

public MultiHdfsResourcePartitioner()

Method Detail

setResources

public void setResources(org.springframework.core.io.Resource[] resources)
The resources to assign to each partition. In Spring configuration you can use a pattern to select multiple resources.

Parameters:
resources - the resources to use

setKeyFileName

public void setKeyFileName(java.lang.String keyFileName)
The name of the key for the file name in each ExecutionContext. Defaults to "fileName".

Parameters:
keyFileName - the value of the key

setKeySplitStart

public void setKeySplitStart(java.lang.String keySplitStart)
The name of the key for the file split start in each ExecutionContext. Defaults to "splitStart".

Parameters:
keySplitStart - the value of the key

setKeySplitLength

public void setKeySplitLength(java.lang.String keySplitLength)
The name of the key for the file split length in each ExecutionContext. Defaults to "splitLength".

Parameters:
keySplitLength - the value of the key

setSplitSize

public void setSplitSize(int splitSize)
Sets the input split size relative to the block size of the HDFS file. The default value is 1, meaning that if splitting is enabled, every partition handles exactly one block. Setting the split size higher allows better parallel handling of file input.

Parameters:
splitSize - the new split size

setSplitFile

public void setSplitFile(boolean splitFile)
Sets the flag indicating if file input should be split. If splitting is enabled, the number of splits can be configured using setSplitSize(int). Default value is TRUE.

Parameters:
splitFile - whether file input should be split

setForceSplit

public void setForceSplit(boolean forceSplit)
Sets the force split. If set to TRUE, input is forced to split even if the file size is below the HDFS block size. Useful for testing and for cases where processing the data is very CPU intensive. Default value is FALSE.

Parameters:
forceSplit - whether to force a split for files smaller than the HDFS block size
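
How the split size, split file, and force split settings might interact can be sketched with plain arithmetic. This is an interpretation under stated assumptions, not the class's actual algorithm: it assumes a split size of n divides each block into n byte ranges, that a file no larger than one block stays whole unless force split is enabled, and that each range is described by a start offset and a length (the values stored under the "splitStart" and "splitLength" keys by default). SplitRangeSketch and its splits helper are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitRangeSketch {

    // Returns {start, length} pairs covering the file. Hypothetical helper;
    // the real partitioner may compute its ranges differently.
    static List<long[]> splits(long fileLength, long blockSize, int splitSize,
                               boolean splitFile, boolean forceSplit) {
        if (splitSize < 1 || blockSize < 1) {
            throw new IllegalArgumentException("splitSize and blockSize must be positive");
        }
        List<long[]> ranges = new ArrayList<>();
        // Assumption: split only when enabled and the file either spans more
        // than one block or forceSplit asks for it.
        boolean split = splitFile && fileLength > 0
                && (fileLength > blockSize || forceSplit);
        if (!split) {
            ranges.add(new long[] {0L, fileLength});
            return ranges;
        }
        // Assumption: a split size of n yields n ranges per block.
        long rangeLength = Math.max(1L, blockSize / splitSize);
        for (long start = 0; start < fileLength; start += rangeLength) {
            ranges.add(new long[] {start, Math.min(rangeLength, fileLength - start)});
        }
        return ranges;
    }

    public static void main(String[] args) {
        // A 300-byte file with 128-byte blocks; sizes are illustrative only.
        for (long[] range : splits(300, 128, 1, true, false)) {
            System.out.println("splitStart=" + range[0] + " splitLength=" + range[1]);
        }
    }
}
```

Under these assumptions, a file smaller than one block produces a single range unless forceSplit is true and splitSize is greater than 1.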

setUsePath

public void setUsePath(boolean usePath)
If set to true, the resource path is set using URL.getPath(); otherwise URL.toExternalForm() is used. Default value is TRUE.

Parameters:
usePath - whether path part is used
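
The difference between the two forms can be seen with the JDK's own java.net.URL. The sketch below is only an illustration: UsePathSketch and resourcePath are hypothetical names, and the example location uses the http scheme because the JDK has no built-in handler for hdfs URLs.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

public class UsePathSketch {

    // Hypothetical helper mirroring the documented usePath switch.
    static String resourcePath(String location, boolean usePath) {
        try {
            URL url = URI.create(location).toURL();
            // getPath() keeps only the path part; toExternalForm() keeps
            // the scheme and authority as well.
            return usePath ? url.getPath() : url.toExternalForm();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a valid URL: " + location, e);
        }
    }

    public static void main(String[] args) {
        // Host, port, and path are assumptions chosen for illustration.
        String location = "http://namenode.example:8020/data/input.txt";
        System.out.println(resourcePath(location, true));
        System.out.println(resourcePath(location, false));
    }
}
```

With usePath enabled, the stored value carries no scheme or host, so whatever reads it must resolve the path against its own file system configuration.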

setConfiguration

public void setConfiguration(org.apache.hadoop.conf.Configuration configuration)
Sets the Yarn configuration.

Parameters:
configuration - the new Yarn configuration

partition

public java.util.Map<java.lang.String,org.springframework.batch.item.ExecutionContext> partition(int gridSize)
Assign the filename of each of the injected resources to an ExecutionContext.

Specified by:
partition in interface org.springframework.batch.core.partition.support.Partitioner
See Also:
Partitioner.partition(int)
