Spring for Apache Hadoop

org.springframework.data.hadoop.store.split
Class AbstractSplitter

java.lang.Object
  extended by org.springframework.data.hadoop.store.split.AbstractSplitter
All Implemented Interfaces:
Splitter
Direct Known Subclasses:
SlopBlockSplitter

public abstract class AbstractSplitter
extends java.lang.Object
implements Splitter

A base class for Splitter implementations.

Author:
Janne Valkealahti

Constructor Summary
AbstractSplitter()
          Instantiates a new abstract splitter.
AbstractSplitter(org.apache.hadoop.conf.Configuration configuration)
          Instantiates a new abstract splitter with the given Hadoop configuration.
 
Method Summary
protected  Split buildSplit(long start, long length, java.lang.String[] hosts)
          Builds the split.
protected  long computeSplitSize(long blockSize, long minSize, long maxSize)
          Computes the split size.
protected  int getBlockIndex(org.apache.hadoop.fs.BlockLocation[] blocks, long offset)
          Gets the block index.
 org.apache.hadoop.conf.Configuration getConfiguration()
          Gets the Hadoop configuration.
abstract  java.util.List<Split> getSplits(org.apache.hadoop.fs.Path path)
          Gets the input splits for a Path.
 void setConfiguration(org.apache.hadoop.conf.Configuration configuration)
          Sets the configuration.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

AbstractSplitter

public AbstractSplitter()
Instantiates a new abstract splitter.


AbstractSplitter

public AbstractSplitter(org.apache.hadoop.conf.Configuration configuration)
Instantiates a new abstract splitter with the given Hadoop configuration.

Parameters:
configuration - the configuration
Method Detail

getSplits

public abstract java.util.List<Split> getSplits(org.apache.hadoop.fs.Path path)
                                         throws java.io.IOException
Description copied from interface: Splitter
Gets the input splits for a Path. The path must refer to a resource that can be divided into a list of splits. The implementation determines whether the path must be a single file or may be a collection of files.

Specified by:
getSplits in interface Splitter
Parameters:
path - the path
Returns:
the input splits
Throws:
java.io.IOException - Signals that an I/O exception has occurred.
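As an illustration of what "a list of splits" means, the following minimal sketch cuts a resource of known length into consecutive (start, length) ranges of a fixed size. It is not the algorithm of any concrete Splitter in this library: the class FixedSizeSplitSketch, its split method, and the long[] {start, length} representation are hypothetical stand-ins for the real Split type, chosen so the example runs without Hadoop on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: fixed-size chunking of a resource of known length.
// Not the library's implementation; Split is modelled as long[] {start, length}.
final class FixedSizeSplitSketch {

    static List<long[]> split(long fileLength, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<long[]> splits = new ArrayList<>();
        long start = 0;
        while (start < fileLength) {
            // The last range may be shorter than splitSize.
            long length = Math.min(splitSize, fileLength - start);
            splits.add(new long[] {start, length});
            start += length;
        }
        return splits;
    }
}
```

Under these assumptions, a real subclass would typically build each range with buildSplit and derive the split size from computeSplitSize rather than taking it as a parameter.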

getConfiguration

public org.apache.hadoop.conf.Configuration getConfiguration()
Gets the Hadoop configuration.

Returns:
the Hadoop configuration

setConfiguration

@Autowired(required=false)
public void setConfiguration(org.apache.hadoop.conf.Configuration configuration)
Sets the configuration.

Parameters:
configuration - the new configuration

computeSplitSize

protected long computeSplitSize(long blockSize,
                                long minSize,
                                long maxSize)
Computes the split size. The default implementation returns the larger of minSize and the smaller of maxSize and blockSize, that is, max(minSize, min(maxSize, blockSize)).

Parameters:
blockSize - the block size
minSize - the min size
maxSize - the max size
Returns:
the computed split size
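The rule described for computeSplitSize can be sketched in a few lines. This is a reading of the description above, not a copy of the library source; the class name SplitSizeSketch is hypothetical.

```java
// Sketch of the documented rule only; the class name is an assumption.
final class SplitSizeSketch {

    // Take the smaller of maxSize and blockSize, then make sure the
    // result is not below minSize.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Block size lies between minSize and maxSize, so it is used as-is.
        System.out.println(computeSplitSize(128 * mb, 1, Long.MAX_VALUE));
        // maxSize caps the block size.
        System.out.println(computeSplitSize(128 * mb, 1, 64 * mb));
        // minSize wins when it exceeds the capped value.
        System.out.println(computeSplitSize(128 * mb, 256 * mb, 64 * mb));
    }
}
```

Note that under this rule minSize takes precedence: if minSize is larger than maxSize, the result is minSize.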

getBlockIndex

protected int getBlockIndex(org.apache.hadoop.fs.BlockLocation[] blocks,
                            long offset)
Gets the block index.

Parameters:
blocks - the block locations
offset - the offset
Returns:
the block index
Throws:
java.lang.IllegalArgumentException - if offset is outside of blocks
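One plausible way to implement such a lookup is a linear scan for the block whose byte range contains the offset. The sketch below assumes that behaviour; it is not taken from the library source. To stay runnable without Hadoop, it uses two hypothetical arrays, blockOffsets and blockLengths, in place of the values a BlockLocation reports through getOffset() and getLength().

```java
// Hypothetical sketch: find the block whose range [start, start + length)
// contains the offset. Plain arrays stand in for BlockLocation[].
final class BlockIndexSketch {

    static int getBlockIndex(long[] blockOffsets, long[] blockLengths, long offset) {
        for (int i = 0; i < blockOffsets.length; i++) {
            long start = blockOffsets[i];
            long end = start + blockLengths[i]; // exclusive upper bound
            if (offset >= start && offset < end) {
                return i;
            }
        }
        // Mirrors the documented contract: an offset outside all blocks is an error.
        throw new IllegalArgumentException("Offset " + offset + " is outside of blocks");
    }
}
```

With blocks starting at 0, 100 and 200 and lengths 100, 100 and 50, offset 99 falls in block 0, offset 100 in block 1, and offset 250 raises IllegalArgumentException.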

buildSplit

protected Split buildSplit(long start,
                           long length,
                           java.lang.String[] hosts)
Builds the split.

Parameters:
start - the start
length - the length
hosts - the hosts
Returns:
the split
