Spring for Apache Hadoop

org.springframework.data.hadoop.store.split
Class SlopBlockSplitter

java.lang.Object
  extended by org.springframework.data.hadoop.store.split.AbstractSplitter
      extended by org.springframework.data.hadoop.store.split.SlopBlockSplitter
All Implemented Interfaces:
Splitter
Direct Known Subclasses:
StaticBlockSplitter, StaticLengthSplitter

public class SlopBlockSplitter
extends AbstractSplitter

A SlopBlockSplitter is a Splitter that splits roughly on block boundaries, allowing the last block to be combined with the previous one if it is too small. How far this last split may overflow is controlled by a slop factor.

The default slop factor is 1.1, which allows the last block to overflow by 10%.
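
The slop rule described above can be illustrated with a small standalone sketch. It does not use this class or any Hadoop types. The class name SlopSplitSketch and the method splitLengths are hypothetical, and the loop is an assumption modeled on the slop check in Hadoop's FileInputFormat, not taken from the SlopBlockSplitter source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Spring for Apache Hadoop.
final class SlopSplitSketch {

    /**
     * Returns the lengths of the splits for a file of fileLength bytes,
     * assuming full-size splits are cut while the remainder is larger
     * than splitSize * slop.
     */
    static List<Long> splitLengths(long fileLength, long splitSize, double slop) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        if (slop < 1.0) {
            throw new IllegalArgumentException("slop must be at least 1.0");
        }
        List<Long> lengths = new ArrayList<>();
        long remaining = fileLength;
        // Keep cutting full-size splits while the remainder exceeds the slop limit.
        while ((double) remaining / splitSize > slop) {
            lengths.add(splitSize);
            remaining -= splitSize;
        }
        // The remainder (at most splitSize * slop bytes) becomes the last split.
        if (remaining > 0) {
            lengths.add(remaining);
        }
        return lengths;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Split lengths are printed in bytes.
        System.out.println(splitLengths(265 * mb, 128 * mb, 1.1));
        System.out.println(splitLengths(300 * mb, 128 * mb, 1.1));
    }
}
```

Under these assumptions, a 265 MB file with a 128 MB split size yields two splits of 128 MB and 137 MB, because the 137 MB remainder is within 10% of the split size. A 300 MB file yields 128 MB, 128 MB and 44 MB.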

Author:
Janne Valkealahti

Field Summary
protected static double DEFAULT_SPLIT_SLOP
           
 
Constructor Summary
SlopBlockSplitter()
          Instantiates a new slop block splitter.
SlopBlockSplitter(org.apache.hadoop.conf.Configuration configuration)
          Instantiates a new slop block splitter.
SlopBlockSplitter(org.apache.hadoop.conf.Configuration configuration, long minSplitSize, long maxSplitSize)
          Instantiates a new slop block splitter.
SlopBlockSplitter(org.apache.hadoop.conf.Configuration configuration, long minSplitSize, long maxSplitSize, double slop)
          Instantiates a new slop block splitter.
 
Method Summary
 long getMaxSplitSize()
          Gets the maximum split size.
 long getMinSplitSize()
          Gets the minimum split size.
 java.util.List<Split> getSplits(org.apache.hadoop.fs.Path path)
          Gets the input splits for a Path.
 void setMaxSplitSize(long maxSplitSize)
          Sets the maximum split size.
 void setMinSplitSize(long minSplitSize)
          Sets the minimum split size.
 void setSlop(double slop)
          Sets the slop factor.
 
Methods inherited from class org.springframework.data.hadoop.store.split.AbstractSplitter
buildSplit, computeSplitSize, getBlockIndex, getConfiguration, setConfiguration
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DEFAULT_SPLIT_SLOP

protected static final double DEFAULT_SPLIT_SLOP
See Also:
Constant Field Values
Constructor Detail

SlopBlockSplitter

public SlopBlockSplitter()
Instantiates a new slop block splitter.


SlopBlockSplitter

public SlopBlockSplitter(org.apache.hadoop.conf.Configuration configuration)
Instantiates a new slop block splitter.

Parameters:
configuration - the configuration

SlopBlockSplitter

public SlopBlockSplitter(org.apache.hadoop.conf.Configuration configuration,
                         long minSplitSize,
                         long maxSplitSize)
Instantiates a new slop block splitter.

Parameters:
configuration - the configuration
minSplitSize - the min split size
maxSplitSize - the max split size

SlopBlockSplitter

public SlopBlockSplitter(org.apache.hadoop.conf.Configuration configuration,
                         long minSplitSize,
                         long maxSplitSize,
                         double slop)
Instantiates a new slop block splitter.

Parameters:
configuration - the configuration
minSplitSize - the min split size
maxSplitSize - the max split size
slop - the slop factor
Method Detail

getSplits

public java.util.List<Split> getSplits(org.apache.hadoop.fs.Path path)
                                throws java.io.IOException
Description copied from interface: Splitter
Gets the input splits for a Path. The path must be a resource that can be divided into a list of splits. The actual implementation defines whether a split is restricted to a single file or may span a collection of files.

Specified by:
getSplits in interface Splitter
Specified by:
getSplits in class AbstractSplitter
Parameters:
path - the path
Returns:
the input splits
Throws:
java.io.IOException - Signals that an I/O exception has occurred.

getMinSplitSize

public long getMinSplitSize()
Gets the minimum split size.

Returns:
the minimum split size

setMinSplitSize

public void setMinSplitSize(long minSplitSize)
Sets the minimum split size.

Parameters:
minSplitSize - the new minimum split size

getMaxSplitSize

public long getMaxSplitSize()
Gets the maximum split size.

Returns:
the maximum split size

setMaxSplitSize

public void setMaxSplitSize(long maxSplitSize)
Sets the maximum split size.

Parameters:
maxSplitSize - the new maximum split size
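
How the minimum and maximum split sizes interact with the block size is not stated on this page. A minimal sketch, assuming the inherited computeSplitSize clamps the block size the same way Hadoop's FileInputFormat does (the class SplitSizeSketch below is hypothetical and standalone):

```java
// Hypothetical illustration only; assumes the clamp used by Hadoop's FileInputFormat.
final class SplitSizeSketch {

    /** Clamps blockSize into the range [minSplitSize, maxSplitSize]. */
    static long computeSplitSize(long blockSize, long minSplitSize, long maxSplitSize) {
        return Math.max(minSplitSize, Math.min(maxSplitSize, blockSize));
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // No effective limits: the block size is used as the split size.
        System.out.println(computeSplitSize(128 * mb, 1L, Long.MAX_VALUE));
        // A minimum above the block size raises the split size.
        System.out.println(computeSplitSize(128 * mb, 256 * mb, Long.MAX_VALUE));
        // A maximum below the block size lowers the split size.
        System.out.println(computeSplitSize(128 * mb, 1L, 64 * mb));
    }
}
```

If that assumption holds, the minimum takes precedence when the two limits conflict, since it is applied last.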

setSlop

public void setSlop(double slop)
Sets the slop factor.

Parameters:
slop - the new slop factor
