public class SlopBlockSplitter extends AbstractSplitter
SlopBlockSplitter is a Splitter that splits roughly on block boundaries and allows the last block to be combined with the previous one if its size is too small. How far this last block may overflow is controlled by a slop factor. The default slop factor is 1.1, which allows the last block to overflow by 10%.
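The description above does not spell out the exact splitting loop. The sketch below assumes it mirrors the slop check used by Hadoop's FileInputFormat. Full-size splits are cut while the unassigned remainder exceeds slop × splitSize, and whatever is left (at most slop × splitSize bytes) becomes the final split. SlopSplitSketch, Range and split are hypothetical names, not part of this API. Real splits also carry block locations, which are omitted here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating slop-based splitting; NOT the SlopBlockSplitter API.
// Assumes a Hadoop FileInputFormat-style loop, which this page does not confirm.
final class SlopSplitSketch {

    /** A split as the byte range [offset, offset + length). */
    record Range(long offset, long length) {}

    private SlopSplitSketch() {}

    /**
     * Cuts a file of fileLength bytes into splitSize chunks, letting the final
     * split grow up to slop * splitSize instead of emitting a tiny tail split.
     */
    static List<Range> split(long fileLength, long splitSize, double slop) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<Range> splits = new ArrayList<>();
        long remaining = fileLength;
        // Emit full-size splits while the remainder is more than slop * splitSize.
        while ((double) remaining / splitSize > slop) {
            splits.add(new Range(fileLength - remaining, splitSize));
            remaining -= splitSize;
        }
        // The leftover (at most slop * splitSize bytes) becomes the final split.
        if (remaining > 0) {
            splits.add(new Range(fileLength - remaining, remaining));
        }
        return splits;
    }
}
```

With the default slop of 1.1 and a 100-byte split size, a 205-byte file yields two splits, the second 105 bytes long (a 5% overflow). A 250-byte file yields three splits, because the 150-byte remainder after the first cut exceeds 110 bytes.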
Modifier and Type | Field and Description |
---|---|
protected static double | DEFAULT_SPLIT_SLOP |
Constructor | Description |
---|---|
SlopBlockSplitter() | Instantiates a new slop block splitter. |
SlopBlockSplitter(org.apache.hadoop.conf.Configuration configuration) | Instantiates a new slop block splitter. |
SlopBlockSplitter(org.apache.hadoop.conf.Configuration configuration, long minSplitSize, long maxSplitSize) | Instantiates a new slop block splitter. |
SlopBlockSplitter(org.apache.hadoop.conf.Configuration configuration, long minSplitSize, long maxSplitSize, double slop) | Instantiates a new slop block splitter. |
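The constructors and setters accept minSplitSize and maxSplitSize, while the split size itself is derived by the inherited computeSplitSize. This page does not document how these bounds combine with the block size. The sketch below assumes the clamp used by Hadoop's FileInputFormat.computeSplitSize, max(minSize, min(maxSize, blockSize)). SplitSizeSketch is a hypothetical name.

```java
// Hypothetical helper; assumes the Hadoop FileInputFormat clamp rule,
// which this page does not confirm for AbstractSplitter.computeSplitSize.
final class SplitSizeSketch {

    private SplitSizeSketch() {}

    /**
     * Clamps blockSize into [minSplitSize, maxSplitSize].
     * If the bounds conflict (min > max), minSplitSize wins.
     */
    static long computeSplitSize(long blockSize, long minSplitSize, long maxSplitSize) {
        return Math.max(minSplitSize, Math.min(maxSplitSize, blockSize));
    }
}
```

Under that assumption, a 128 MiB block with no effective bounds gives 128 MiB splits. A maxSplitSize of 64 MiB halves them, and a minSplitSize of 256 MiB makes each split span two blocks.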
Modifier and Type | Method | Description |
---|---|---|
long | getMaxSplitSize() | Gets the maximum split size. |
long | getMinSplitSize() | Gets the minimum split size. |
java.util.List<Split> | getSplits(org.apache.hadoop.fs.Path path) | Gets the input splits for a Path. |
void | setMaxSplitSize(long maxSplitSize) | Sets the maximum split size. |
void | setMinSplitSize(long minSplitSize) | Sets the minimum split size. |
void | setSlop(double slop) | Sets the slop factor. |
Methods inherited from class AbstractSplitter: buildSplit, computeSplitSize, getBlockIndex, getConfiguration, setConfiguration
protected static final double DEFAULT_SPLIT_SLOP
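Assuming the Hadoop-style loop (cut full splits while the remainder R exceeds s·S, where S is the split size and s the slop factor), the final split of a non-empty file is bounded as follows. With the default slop factor of 1.1 and 128 MiB splits, the last split is never larger than 140.8 MiB and, for files longer than that, never smaller than 12.8 MiB.

```latex
\begin{aligned}
&\text{while } R / S > s:\quad \text{emit a split of } S \text{ bytes},\quad R \leftarrow R - S\\
&0 < R_{\mathrm{last}} \le s\,S\\
&R_{\mathrm{last}} > (s - 1)\,S \quad \text{if the file is longer than } s\,S\\
&s = 1.1,\ S = 128\ \mathrm{MiB}:\quad s\,S = 140.8\ \mathrm{MiB},\quad (s - 1)\,S = 12.8\ \mathrm{MiB}
\end{aligned}
```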

public SlopBlockSplitter()
Instantiates a new slop block splitter.

public SlopBlockSplitter(org.apache.hadoop.conf.Configuration configuration)
Instantiates a new slop block splitter.
Parameters:
configuration - the configuration

public SlopBlockSplitter(org.apache.hadoop.conf.Configuration configuration, long minSplitSize, long maxSplitSize)
Instantiates a new slop block splitter.
Parameters:
configuration - the configuration
minSplitSize - the minimum split size
maxSplitSize - the maximum split size

public SlopBlockSplitter(org.apache.hadoop.conf.Configuration configuration, long minSplitSize, long maxSplitSize, double slop)
Instantiates a new slop block splitter.
Parameters:
configuration - the configuration
minSplitSize - the minimum split size
maxSplitSize - the maximum split size
slop - the slop factor

public java.util.List<Split> getSplits(org.apache.hadoop.fs.Path path) throws java.io.IOException
Description copied from interface: Splitter
Gets the input splits for a Path. The path must be a resource that can be split into a list of splits. The actual implementation defines whether a split is restricted to a single file or may cover a collection of files.
Specified by:
getSplits in interface Splitter
getSplits in class AbstractSplitter
Parameters:
path - the path
Throws:
java.io.IOException - Signals that an I/O exception has occurred.

public long getMinSplitSize()
Gets the minimum split size.

public void setMinSplitSize(long minSplitSize)
Sets the minimum split size.
Parameters:
minSplitSize - the new minimum split size

public long getMaxSplitSize()
Gets the maximum split size.

public void setMaxSplitSize(long maxSplitSize)
Sets the maximum split size.
Parameters:
maxSplitSize - the new maximum split size

public void setSlop(double slop)
Sets the slop factor.
Parameters:
slop - the new slop factor