Spring for Apache Hadoop

org.springframework.data.hadoop.fs
Class HdfsResourceLoader

java.lang.Object
  extended by org.springframework.core.io.DefaultResourceLoader
      extended by org.springframework.data.hadoop.fs.HdfsResourceLoader
All Implemented Interfaces:
java.io.Closeable, org.springframework.beans.factory.DisposableBean, org.springframework.beans.factory.InitializingBean, org.springframework.core.io.ResourceLoader, org.springframework.core.io.support.ResourcePatternResolver, org.springframework.core.Ordered, org.springframework.core.PriorityOrdered

public class HdfsResourceLoader
extends org.springframework.core.io.DefaultResourceLoader
implements org.springframework.core.io.support.ResourcePatternResolver, org.springframework.core.PriorityOrdered, java.io.Closeable, org.springframework.beans.factory.DisposableBean, org.springframework.beans.factory.InitializingBean

Spring ResourceLoader over Hadoop FileSystem.

Author:
Costin Leau, Janne Valkealahti

Nested Class Summary
 
Nested classes/interfaces inherited from class org.springframework.core.io.DefaultResourceLoader
org.springframework.core.io.DefaultResourceLoader.ClassPathContextResource
 
Field Summary
 
Fields inherited from interface org.springframework.core.io.support.ResourcePatternResolver
CLASSPATH_ALL_URL_PREFIX
 
Fields inherited from interface org.springframework.core.io.ResourceLoader
CLASSPATH_URL_PREFIX
 
Fields inherited from interface org.springframework.core.Ordered
HIGHEST_PRECEDENCE, LOWEST_PRECEDENCE
 
Constructor Summary
HdfsResourceLoader(org.apache.hadoop.conf.Configuration config)
          Constructs a new HdfsResourceLoader instance.
HdfsResourceLoader(org.apache.hadoop.conf.Configuration config, java.net.URI uri)
          Constructs a new HdfsResourceLoader instance.
HdfsResourceLoader(org.apache.hadoop.conf.Configuration config, java.net.URI uri, java.lang.String user)
          Constructs a new HdfsResourceLoader instance.
HdfsResourceLoader(org.apache.hadoop.fs.FileSystem fs)
          Constructs a new HdfsResourceLoader instance.
 
Method Summary
 void afterPropertiesSet()
           
 void close()
           
 void destroy()
           
protected  java.lang.String determineRootDir(java.lang.String location)
          Determine the root directory for the given location.
protected  java.util.Set<org.springframework.core.io.Resource> doFindMatchingFileSystemResources(org.apache.hadoop.fs.Path rootDir, java.lang.String subPattern)
          Find all resources in the file system that match the given location pattern via the Ant-style PathMatcher.
protected  java.util.Set<org.springframework.core.io.Resource> doFindPathMatchingFileResources(org.springframework.core.io.Resource rootDirResource, java.lang.String subPattern)
          Find all resources in the HDFS file system that match the given location pattern via the Ant-style PathMatcher.
protected  void doRetrieveMatchingFiles(java.lang.String fullPattern, org.apache.hadoop.fs.Path dir, java.util.Set<org.apache.hadoop.fs.Path> result)
          Recursively retrieve files that match the given pattern, adding them to the given result set.
protected  org.springframework.core.io.Resource[] findPathMatchingResources(java.lang.String locationPattern)
          Find all resources that match the given location pattern via the Ant-style PathMatcher.
 java.lang.ClassLoader getClassLoader()
           
 org.apache.hadoop.fs.FileSystem getFileSystem()
          Returns the Hadoop file system used by this resource loader.
 int getOrder()
           
 org.springframework.core.io.Resource getResource(java.lang.String location)
           
protected  org.springframework.core.io.Resource getResourceByPath(java.lang.String path)
           
 org.springframework.core.io.Resource[] getResources(java.lang.String locationPattern)
           
protected  java.util.Set<org.apache.hadoop.fs.Path> retrieveMatchingFiles(org.apache.hadoop.fs.Path rootDir, java.lang.String pattern)
          Retrieve files that match the given path pattern, checking the given directory and its subdirectories.
 void setHandleNoprefix(boolean handleNoprefix)
          Sets the handleNoprefix flag.
 void setResourcePatternResolver(org.springframework.core.io.support.ResourcePatternResolver resourcePatternResolver)
          Sets the resource pattern resolver.
 void setUseCodecs(boolean useCodecs)
          Indicates whether to use (or not) the codecs found inside the Hadoop configuration.
 
Methods inherited from class org.springframework.core.io.DefaultResourceLoader
setClassLoader
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

HdfsResourceLoader

public HdfsResourceLoader(org.apache.hadoop.conf.Configuration config)
Constructs a new HdfsResourceLoader instance.

Parameters:
config - Hadoop configuration to use.

HdfsResourceLoader

public HdfsResourceLoader(org.apache.hadoop.conf.Configuration config,
                          java.net.URI uri,
                          java.lang.String user)
Constructs a new HdfsResourceLoader instance.

Parameters:
config - Hadoop configuration to use.
uri - Hadoop file system URI.
user - Hadoop user for accessing the file system.

HdfsResourceLoader

public HdfsResourceLoader(org.apache.hadoop.conf.Configuration config,
                          java.net.URI uri)
Constructs a new HdfsResourceLoader instance.

Parameters:
config - Hadoop configuration to use.
uri - Hadoop file system URI.

HdfsResourceLoader

public HdfsResourceLoader(org.apache.hadoop.fs.FileSystem fs)
Constructs a new HdfsResourceLoader instance.

Parameters:
fs - Hadoop file system to use.

Method Detail

getResourceByPath

protected org.springframework.core.io.Resource getResourceByPath(java.lang.String path)
Overrides:
getResourceByPath in class org.springframework.core.io.DefaultResourceLoader

getResource

public org.springframework.core.io.Resource getResource(java.lang.String location)
Specified by:
getResource in interface org.springframework.core.io.ResourceLoader
Overrides:
getResource in class org.springframework.core.io.DefaultResourceLoader

getResources

public org.springframework.core.io.Resource[] getResources(java.lang.String locationPattern)
                                                    throws java.io.IOException
Specified by:
getResources in interface org.springframework.core.io.support.ResourcePatternResolver
Throws:
java.io.IOException

getOrder

public int getOrder()
Specified by:
getOrder in interface org.springframework.core.Ordered

destroy

public void destroy()
             throws java.io.IOException
Specified by:
destroy in interface org.springframework.beans.factory.DisposableBean
Throws:
java.io.IOException

close

public void close()
           throws java.io.IOException
Specified by:
close in interface java.io.Closeable
Throws:
java.io.IOException

afterPropertiesSet

public void afterPropertiesSet()
                        throws java.lang.Exception
Specified by:
afterPropertiesSet in interface org.springframework.beans.factory.InitializingBean
Throws:
java.lang.Exception

getClassLoader

public java.lang.ClassLoader getClassLoader()
Specified by:
getClassLoader in interface org.springframework.core.io.ResourceLoader
Overrides:
getClassLoader in class org.springframework.core.io.DefaultResourceLoader

setHandleNoprefix

public void setHandleNoprefix(boolean handleNoprefix)
Sets the handleNoprefix flag.

Parameters:
handleNoprefix - the new value of the handleNoprefix flag

getFileSystem

public org.apache.hadoop.fs.FileSystem getFileSystem()
Returns the Hadoop file system used by this resource loader.

Returns:
the Hadoop file system in use.

setUseCodecs

public void setUseCodecs(boolean useCodecs)
Sets whether to use the codecs found in the Hadoop configuration. This affects the content of the streams backing the resources: the raw content is either delivered as is or decompressed on the fly (if the configuration allows it). Decompression on the fly is the default.

Parameters:
useCodecs - whether to use any codecs defined in the Hadoop configuration
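
The difference between raw and decompressed delivery can be illustrated with a minimal, JDK-only sketch. It is not this class's implementation: GZIP from java.util.zip stands in for the Hadoop codecs, the ".gz" suffix check stands in for a codec lookup, and the class CodecSketch and its methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class CodecSketch {

    // Opens the stored bytes either raw or wrapped in a decompressing stream.
    // Assumption: a codec is chosen from the file name suffix; only ".gz" is known here.
    static InputStream open(String name, byte[] stored, boolean useCodecs) {
        InputStream raw = new ByteArrayInputStream(stored);
        if (!useCodecs || !name.endsWith(".gz")) {
            return raw; // raw content delivered as is
        }
        try {
            return new GZIPInputStream(raw); // decompressed on the fly
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Helper that produces GZIP-compressed bytes for the demonstration.
    static byte[] gzip(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Drains a stream into a byte array and closes it.
    static byte[] readAll(InputStream in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try (InputStream source = in) {
            int n;
            while ((n = source.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

With useCodecs set to true the caller reads the original text; with false the caller receives the compressed bytes exactly as stored.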

setResourcePatternResolver

public void setResourcePatternResolver(org.springframework.core.io.support.ResourcePatternResolver resourcePatternResolver)
Sets the resource pattern resolver.

Parameters:
resourcePatternResolver - the new resource pattern resolver

findPathMatchingResources

protected org.springframework.core.io.Resource[] findPathMatchingResources(java.lang.String locationPattern)
                                                                    throws java.io.IOException
Find all resources that match the given location pattern via the Ant-style PathMatcher.

Parameters:
locationPattern - the location pattern to match
Returns:
the result as Resource array
Throws:
java.io.IOException - in case of I/O errors

doFindPathMatchingFileResources

protected java.util.Set<org.springframework.core.io.Resource> doFindPathMatchingFileResources(org.springframework.core.io.Resource rootDirResource,
                                                                                              java.lang.String subPattern)
                                                                                       throws java.io.IOException
Find all resources in the HDFS file system that match the given location pattern via the Ant-style PathMatcher.

Parameters:
rootDirResource - the root directory as Resource
subPattern - the sub pattern to match (below the root directory)
Returns:
the Set of matching Resource instances
Throws:
java.io.IOException - in case of I/O errors

doFindMatchingFileSystemResources

protected java.util.Set<org.springframework.core.io.Resource> doFindMatchingFileSystemResources(org.apache.hadoop.fs.Path rootDir,
                                                                                                java.lang.String subPattern)
                                                                                         throws java.io.IOException
Find all resources in the file system that match the given location pattern via the Ant-style PathMatcher.

Parameters:
rootDir - the root directory in the file system
subPattern - the sub pattern to match (below the root directory)
Returns:
the Set of matching Resource instances
Throws:
java.io.IOException - in case of I/O errors
See Also:
PathMatcher
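
As a rough illustration of what Ant-style matching means for a sub pattern, the sketch below translates a pattern into a regular expression. It is a simplified approximation, not Spring's PathMatcher implementation: it only covers '?', '*' and '**', and the class AntStyleSketch and its methods are hypothetical names.

```java
import java.util.regex.Pattern;

public class AntStyleSketch {

    // Translates a simplified Ant-style pattern into a regular expression.
    static Pattern toRegex(String antPattern) {
        StringBuilder regex = new StringBuilder();
        int i = 0;
        while (i < antPattern.length()) {
            char c = antPattern.charAt(i);
            if (antPattern.startsWith("**/", i)) {
                regex.append("(?:[^/]+/)*"); // zero or more whole directories
                i += 3;
            } else if (antPattern.startsWith("**", i)) {
                regex.append(".*");          // anything, including '/'
                i += 2;
            } else if (c == '*') {
                regex.append("[^/]*");       // anything within one path segment
                i++;
            } else if (c == '?') {
                regex.append("[^/]");        // exactly one character of a segment
                i++;
            } else {
                regex.append(Pattern.quote(String.valueOf(c))); // literal character
                i++;
            }
        }
        return Pattern.compile(regex.toString());
    }

    // Returns true when the whole path matches the pattern.
    static boolean matches(String antPattern, String path) {
        return toRegex(antPattern).matcher(path).matches();
    }
}
```

In this sketch a single '*' never crosses a '/' boundary, which is why "/dir/*.xml" does not match files in subdirectories while "/dir/**/*.xml" does.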

retrieveMatchingFiles

protected java.util.Set<org.apache.hadoop.fs.Path> retrieveMatchingFiles(org.apache.hadoop.fs.Path rootDir,
                                                                         java.lang.String pattern)
                                                                  throws java.io.IOException
Retrieve files that match the given path pattern, checking the given directory and its subdirectories.

Parameters:
rootDir - the directory to start from
pattern - the pattern to match against, relative to the root directory
Returns:
the Set of matching Path instances
Throws:
java.io.IOException - if directory contents could not be retrieved

doRetrieveMatchingFiles

protected void doRetrieveMatchingFiles(java.lang.String fullPattern,
                                       org.apache.hadoop.fs.Path dir,
                                       java.util.Set<org.apache.hadoop.fs.Path> result)
                                throws java.io.IOException
Recursively retrieve files that match the given pattern, adding them to the given result set.

Parameters:
fullPattern - the pattern to match against, with prepended root directory path
dir - the current directory
result - the Set of matching Path instances to add to
Throws:
java.io.IOException - if directory contents could not be retrieved
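
The recursive walk described here can be sketched without a Hadoop cluster. The example below is an assumption-laden illustration, not the library's code: an in-memory map stands in for the directory listings of the file system, a Predicate stands in for the pattern matcher, only files (not directories) are collected, and RecursiveRetrieveSketch and its methods are hypothetical names.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

public class RecursiveRetrieveSketch {

    // Entry point: creates the result set and starts the walk at rootDir.
    static Set<String> retrieve(Map<String, List<String>> tree,
                                Predicate<String> matches,
                                String rootDir) {
        Set<String> result = new LinkedHashSet<>();
        doRetrieve(tree, matches, rootDir, result);
        return result;
    }

    // Visits one directory: descends into subdirectories, adds matching files.
    // Hypothetical listing format: directory path -> child names,
    // where a child name ending in "/" denotes a subdirectory.
    static void doRetrieve(Map<String, List<String>> tree,
                           Predicate<String> matches,
                           String dir,
                           Set<String> result) {
        for (String child : tree.getOrDefault(dir, Collections.emptyList())) {
            String path = dir + child;
            if (child.endsWith("/")) {
                doRetrieve(tree, matches, path, result); // recurse into subdirectory
            } else if (matches.test(path)) {
                result.add(path);                        // full path matched the pattern
            }
        }
    }
}
```

Passing the result set down the recursion, rather than returning a new collection per directory, mirrors the signature above, where the caller supplies the Set to add to.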

determineRootDir

protected java.lang.String determineRootDir(java.lang.String location)
Determine the root directory for the given location.

Used for determining the starting point for file matching: resolves the root directory location and passes it into doFindPathMatchingFileResources, with the remainder of the location as the pattern.

Will return "/dir/" for the pattern "/dir/*.xml", for example.

Parameters:
location - the location to check
Returns:
the part of the location that denotes the root directory
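
One way to compute such a root directory is to strip trailing path segments until the remainder no longer contains a wildcard. The sketch below follows that idea under stated assumptions: isPattern is a simplified stand-in that treats any '*' or '?' as a pattern marker, and RootDirSketch is a hypothetical class, not the code of this method.

```java
public class RootDirSketch {

    // Simplified stand-in for a pattern check: any '*' or '?' marks a pattern.
    static boolean isPattern(String path) {
        return path.indexOf('*') != -1 || path.indexOf('?') != -1;
    }

    // Returns the longest leading part of the location that contains no wildcard.
    static String determineRootDir(String location) {
        int prefixEnd = location.indexOf(':') + 1; // 0 when there is no "scheme:" prefix
        int rootDirEnd = location.length();
        // Drop one trailing segment at a time while the remainder is still a pattern.
        while (rootDirEnd > prefixEnd
                && isPattern(location.substring(prefixEnd, rootDirEnd))) {
            rootDirEnd = location.lastIndexOf('/', rootDirEnd - 2) + 1;
        }
        if (rootDirEnd == 0) {
            rootDirEnd = prefixEnd; // no '/' found: keep only the prefix, if any
        }
        return location.substring(0, rootDirEnd);
    }
}
```

The remainder of the location, after the returned root directory, is what would then be used as the sub pattern.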
