Spring Integration provides support for file transfer operations via FTP and FTPS.
The File Transfer Protocol (FTP) is a simple network protocol which allows you to transfer files between two computers on the Internet.
There are two actors when it comes to FTP communication: client and server. To transfer files with FTP/FTPS, you use a client which initiates a connection to a remote computer that is running an FTP server. After the connection is established, the client can choose to send and/or receive copies of files.
Spring Integration supports sending and receiving files over FTP/FTPS by providing three client side endpoints: Inbound Channel Adapter, Outbound Channel Adapter, and Outbound Gateway. It also provides convenient namespace-based configuration options for defining these client components.
To use the FTP namespace, add the following to the header of your XML file:
xmlns:int-ftp="http://www.springframework.org/schema/integration/ftp" xsi:schemaLocation="http://www.springframework.org/schema/integration/ftp http://www.springframework.org/schema/integration/ftp/spring-integration-ftp.xsd"
Important | |
---|---|
Starting with version 3.0, sessions are no longer cached by default. See Section 14.6, “FTP Session Caching”. |
Before configuring FTP adapters you must configure an FTP Session Factory. You can configure
the FTP Session Factory with a regular bean definition where the implementation class is org.springframework.integration.ftp.session.DefaultFtpSessionFactory
:
Below is a basic configuration:
<bean id="ftpClientFactory" class="org.springframework.integration.ftp.session.DefaultFtpSessionFactory"> <property name="host" value="localhost"/> <property name="port" value="22"/> <property name="username" value="kermit"/> <property name="password" value="frog"/> <property name="clientMode" value="0"/> <property name="fileType" value="2"/> <property name="bufferSize" value="100000"/> </bean>
For FTPS connections all you need to do is use org.springframework.integration.ftp.session.DefaultFtpsSessionFactory
instead.
Below is the complete configuration sample:
<bean id="ftpClientFactory" class="org.springframework.integration.ftp.client.DefaultFtpsClientFactory"> <property name="host" value="localhost"/> <property name="port" value="22"/> <property name="username" value="oleg"/> <property name="password" value="password"/> <property name="clientMode" value="1"/> <property name="fileType" value="2"/> <property name="useClientMode" value="true"/> <property name="cipherSuites" value="a,b.c"/> <property name="keyManager" ref="keyManager"/> <property name="protocol" value="SSL"/> <property name="trustManager" ref="trustManager"/> <property name="prot" value="P"/> <property name="needClientAuth" value="true"/> <property name="authValue" value="oleg"/> <property name="sessionCreation" value="true"/> <property name="protocols" value="SSL, TLS"/> <property name="implicit" value="true"/> </bean>
Every time an adapter requests a session object from its SessionFactory
the session is
returned from a session pool maintained by a caching wrapper around the factory. A Session in the session pool might go stale
(if it has been disconnected by the server due to inactivity) so the SessionFactory
will perform validation to make sure that it never returns a stale session to the adapter. If a stale session was encountered,
it will be removed from the pool, and a new one will be created.
Note | |
---|---|
If you experience connectivity problems and would like to trace Session creation as well as see which Sessions are polled you may enable it by setting the logger to TRACE level (e.g., log4j.category.org.springframework.integration.file=TRACE) |
Now all you need to do is inject these session factories into your adapters. Obviously the protocol (FTP or FTPS) that an adapter will use depends on the type of session factory that has been injected into the adapter.
Note | |
---|---|
A more practical way to provide values for FTP/FTPS Session Factories is by using Spring's property placeholder support (See: http://static.springsource.org/spring/docs/3.0.x/spring-framework-reference/html/beans.html#beans-factory-placeholderconfigurer). |
Advanced Configuration
DefaultFtpSessionFactory
provides an abstraction over the underlying client API which,
since Spring Integration 2.0, is Apache Commons Net.
This spares you from the low level configuration details
of the org.apache.commons.net.ftp.FTPClient
. Several common properties are exposed on the
session factory (since version 4.0, this now includes connectTimeout
,
defaultTimeout
and dataTimeout
).
However there are times when access to lower level FTPClient
configuration is
necessary to achieve more advanced configuration (e.g., setting the port range for active mode etc.).
For that purpose, AbstractFtpSessionFactory
(the base class for all FTP Session Factories) exposes hooks, in the form of the two post-processing methods below.
/** * Will handle additional initialization after client.connect() method was invoked, * but before any action on the client has been taken */ protected void postProcessClientAfterConnect(T t) throws IOException { // NOOP } /** * Will handle additional initialization before client.connect() method was invoked. */ protected void postProcessClientBeforeConnect(T client) throws IOException { // NOOP }
As you can see, there is no default implementation for these two methods. However, by extending DefaultFtpSessionFactory
you can override these methods
to provide more advanced configuration of the FTPClient
. For example:
public class AdvancedFtpSessionFactory extends DefaultFtpSessionFactory { protected void postProcessClientBeforeConnect(FTPClient ftpClient) throws IOException { ftpClient.setActivePortRange(4000, 5000); } }
The FTP Inbound Channel Adapter is a special listener that will connect to the FTP server and will listen for the remote directory events (e.g., new file created) at which point it will initiate a file transfer.
<int-ftp:inbound-channel-adapter id="ftpInbound" channel="ftpChannel" session-factory="ftpSessionFactory" charset="UTF-8" auto-create-local-directory="true" delete-remote-files="true" filename-pattern="*.txt" remote-directory="some/remote/path" remote-file-separator="/" preserve-timestamp="true" local-filename-generator-expression="#this.toUpperCase() + '.a'" local-filter="myFilter" temporary-file-suffix=".writing" local-directory="."> <int:poller fixed-rate="1000"/> </int-ftp:inbound-channel-adapter>
As you can see from the configuration above you can configure an FTP Inbound Channel Adapter via the inbound-channel-adapter
element while also providing values for various attributes such as local-directory
, filename-pattern
(which is based on simple pattern matching, not regular expressions), and of course the reference to a session-factory
.
By default the transferred file will carry the same name as the original file. If you want to override this behavior you
can set the local-filename-generator-expression
attribute which allows you to provide a SpEL Expression to generate
the name of the local file. Unlike outbound gateways and adapters where the root object of the SpEL Evaluation Context
is a Message
, this inbound adapter does not yet have the Message at the time of evaluation since
that's what it ultimately generates with the transferred file as its payload. So, the root object of the SpEL Evaluation Context
is the original name of the remote file (String).
Starting with Spring Integration 3.0, you can specify the preserve-timestamp
attribute (default false
); when true
, the local file's modified timestamp will be set to the value
retrieved from the server; otherwise it will be set to the current time.
Sometimes file filtering based on the simple pattern specified via filename-pattern
attribute might not be
sufficient. If this is the case, you can use the filename-regex
attribute to specify a Regular Expression
(e.g. filename-regex=".*\.test$"
). And of course if you need complete control you can use filter
attribute and provide a reference to any custom implementation of the
org.springframework.integration.file.filters.FileListFilter
, a strategy interface for filtering a
list of files. This filter determines which remote files are retrieved. You can also combine a pattern based filter
with other filters, such as an AcceptOnceFileListFilter
to avoid synchronizing files that
have previously been fetched, by using a CompositeFileListFilter
.
The AcceptOnceFileListFilter
stores its state in memory. If you wish the
state to survive a system restart, consider using the
FtpPersistentAcceptOnceFileListFilter
instead. This filter stores
the accepted file names in an instance of the
MetadataStore
strategy (Section 8.4, “Metadata Store”).
This filter matches on the filename and the remote modified time.
Since version 4.0, this filter requires a ConcurrentMetadataStore
. When used with a shared data store
(such as Redis
with the RedisMetadataStore
) this allows
filter keys to be shared across multiple application or server instances.
The above discussion refers to filtering the files before retrieving them. Once the files have been
retrieved, an additional filter is applied to the files on the file system. By default, this is an
AcceptOnceFileListFilter
which, as discussed, retains state in memory and does
not consider the file's modified time. Unless your application removes files after processing, the
adapter will re-process the files on disk by default after an application restart.
Also, if you configure the filter
to use a
FtpPersistentAcceptOnceFileListFilter
, and the remote file
timestamp changes (causing it to be re-fetched), the default local filter will not allow this
new file to be processed.
Use the local-filter
attribute to configure the behavior of the local file system
filter. To solve these particular use cases, you can use a
FileSystemPersistentAcceptOnceFileListFilter
as a local filter instead.
This filter also stores the accepted file names and modified timestamp in an instance of the
MetadataStore
strategy (Section 8.4, “Metadata Store”), and
will detect the change in the local file modified time.
Important | |
---|---|
Further, if you use a distributed MetadataStore (such as
Section 23.5, “Redis Metadata Store” or Section 15.7, “Gemfire Metadata Store”) you can
have multiple instances of the same adapter/application and be sure that one and only one will
process a file.
|
The actual local filter is a CompositeFileListFilter
containing the supplied
filter and a pattern filter that prevents processing files that are in the process of
being downloaded (based on the temporary-file-suffix
); files are downloaded with
this suffix (default: .writing
) and the file is renamed to its final name when the
transfer is complete, making it 'visible' to the filter.
The remote-file-separator
attribute allows you to configure a
file separator character to use if the default '/' is not applicable for your particular environment.
Please refer to the schema for more details on these attributes.
It is also important to understand that the FTP Inbound Channel Adapter is a Polling Consumer and
therefore you must configure a poller (either via a global default or a local sub-element).
Once a file has been transferred, a Message with a java.io.File
as its payload will be generated and sent to the channel
identified by the channel
attribute.
More on File Filtering and Large Files
Sometimes the file that just appeared in the monitored (remote) directory is not complete. Typically such a file
will be written with temporary extension (e.g., foo.txt.writing) and then renamed after the writing process finished.
As a user in most cases you are only interested in files that are complete and would like to filter only files that are complete.
To handle these scenarios you can use the filtering support provided by the filename-pattern
, filename-regex
and filter
attributes. Here is an example that uses a custom Filter implementation.
<int-ftp:inbound-channel-adapter channel="ftpChannel" session-factory="ftpSessionFactory" filter="customFilter" local-directory="file:/my_transfers"> remote-directory="some/remote/path" <int:poller fixed-rate="1000"/> </int-ftp:inbound-channel-adapter> <bean id="customFilter" class="org.example.CustomFilter"/>
Poller configuration notes for the inbound FTP adapter
The job of the inbound FTP adapter consists of two tasks: 1) Communicate with a remote server in order to transfer files from a remote directory to a local directory. 2) For each transferred file, generate a Message with that file as a payload and send it to the channel identified by the 'channel' attribute. That is why they are called 'channel-adapters' rather than just 'adapters'. The main job of such an adapter is to generate a Message to be sent to a Message Channel. Essentially, the second task mentioned above takes precedence in such a way that *IF* your local directory already has one or more files it will first generate Messages from those, and *ONLY* when all local files have been processed, will it initiate the remote communication to retrieve more files.
Also, when configuring a trigger on the poller you should pay close attention to the max-messages-per-poll
attribute. Its default value is 1 for all SourcePollingChannelAdapter
instances (including FTP).
This means that as soon as one file is processed, it will wait for the next execution time as determined by your
trigger configuration. If you happened to have one or more files sitting in the local-directory
, it would process
those files before it would initiate communication with the remote FTP server. And, if the max-messages-per-poll
were set to 1 (default), then it would be processing only one file at a time with intervals as defined by your trigger,
essentially working as one-poll = one-file.
For typical file-transfer use cases, you most likely want the opposite behavior: to process all the files you can for each
poll and only then wait for the next poll. If that is the case, set max-messages-per-poll
to -1. Then, on
each poll, the adapter will attempt to generate as many Messages as it possibly can. In other words, it will process
everything in the local directory, and then it will connect to the remote directory to transfer everything that is available
there to be processed locally. Only then is the poll operation considered complete, and the poller will wait for the next execution time.
You can alternatively set the 'max-messages-per-poll' value to a positive value indicating the upward limit of Messages to be created from files with each poll. For example, a value of 10 means that on each poll it will attempt to process no more than 10 files.
The FTP Outbound Channel Adapter relies upon a MessageHandler
implementation that will connect to the
FTP server and initiate an FTP transfer for every file it receives in the payload of incoming Messages. It also supports several
representations of the File so you are not limited only to java.io.File typed payloads.
The FTP Outbound Channel Adapter
supports the following payloads: 1) java.io.File
- the actual file object;
2) byte[]
- a byte array that represents the file contents; and 3) java.lang.String
-
text that represents the file contents.
<int-ftp:outbound-channel-adapter id="ftpOutbound" channel="ftpChannel" session-factory="ftpSessionFactory" charset="UTF-8" remote-file-separator="/" auto-create-directory="true" remote-directory-expression="headers.['remote_dir']" temporary-remote-directory-expression="headers.['temp_remote_dir']" filename-generator="fileNameGenerator" use-temporary-filename="true" mode="REPLACE"/>
As you can see from the configuration above you can configure an FTP Outbound Channel Adapter via the
outbound-channel-adapter
element while also providing values for various attributes such as filename-generator
(an implementation of the org.springframework.integration.file.FileNameGenerator
strategy interface),
a reference to a session-factory
, as well as other attributes. You can also see
some examples of *expression
attributes which allow you to use SpEL
to configure things like remote-directory-expression
, temporary-remote-directory-expression
and remote-filename-generator-expression
(a SpEL alternative to filename-generator
shown above). As with any component that allows the usage of SpEL, access to Payload and Message Headers is available via
'payload' and 'headers' variables.
Please refer to the schema for more details on
the available attributes.
Note | |
---|---|
By default Spring Integration will use o.s.i.file.DefaultFileNameGenerator if none is specified.
DefaultFileNameGenerator will determine the file name based on the value of the file_name header (if it exists)
in the MessageHeaders, or if the payload of the Message is already a java.io.File , then it will use the original name of that file.
|
Important | |
---|---|
Defining certain values (e.g., remote-directory) might be platform/ftp server dependent. For example as it was reported on this forum http://forum.springsource.org/showthread.php?p=333478&posted=1#post333478 on some platforms you must add slash to the end of the directory definition (e.g., remote-directory="/foo/bar/" instead of remote-directory="/foo/bar") |
Starting with version 4.1, you can specify the mode
when transferring the file. By default,
an existing file will be overwritten; the modes are defined on enum
FileExistsMode
, having values REPLACE
(default), APPEND
,
IGNORE
, and FAIL
. With IGNORE
and FAIL
, the file is not
transferred; FAIL
causes an exception to be thrown whereas IGNORE
silently
ignores the transfer (although a DEBUG
log entry is produced).
Avoiding Partially Written Files
One of the common problems, when dealing with file transfers, is the possibility of processing a partial file - a file might appear in the file system before its transfer is actually complete.
To deal with this issue, Spring Integration FTP adapters use a very common algorithm where files are transferred under a temporary name and then renamed once they are fully transferred.
By default, every file that is in the process of being transferred will appear in the file system with an additional suffix
which, by default, is .writing
; this can be changed using the temporary-file-suffix
attribute.
However, there may be situations where you don't want to use this technique (for example, if the server does not
permit renaming files). For situations like this, you can disable this feature by setting use-temporary-file-name
to false
(default is true
). When this attribute is false
, the file is written with its
final name and the consuming application will need some other mechanism to detect that the file is completely uploaded before accessing it.
The FTP Outbound Gateway provides a limited set of commands to interact with a remote FTP/FTPS server.
Commands supported are:
ls
ls lists remote file(s) and supports the following options:
FileInfo
objects.
In addition, filename filtering is provided, in the same manner as the
inbound-channel-adapter
.
The message payload resulting from an ls operation is a list of file names,
or a list of FileInfo
objects. These objects provide
information such as modified time, permissions etc.
The remote directory that the ls command acted on is provided
in the file_remoteDirectory
header.
When using the recursive option (-R
), the fileName
includes any subdirectory
elements, representing a relative path to the file (relative to the remote directory). If the -dirs
option is included, each recursive directory is also returned as an element in the list. In this case,
it is recommended that the -1
is not used because you would not be able to determine files Vs.
directories, which is achievable using the FileInfo
objects.
get
get retrieves a remote file and supports the following option:
The message payload resulting from a get operation is a
File
object representing the retrieved file.
The remote directory is provided in the file_remoteDirectory
header, and the filename is
provided in the file_remoteFile
header.
mget
mget retrieves multiple remote files based on a pattern and supports the following option:
The message payload resulting from an mget operation is a
List<File>
object - a List of File objects, each representing
a retrieved file.
The remote directory is provided in the file_remoteDirectory
header, and the pattern
for the filenames is
provided in the file_remoteFile
header.
Notes for when using recursion (-R ) | |
---|---|
The pattern is ignored, and
The
Typically, you would use the |
put
put sends a file to the remote server; the payload of the message can be a
java.io.File
, a byte[]
or a String
.
A remote-filename-generator
(or expression) is used to name the remote file. Other available attributes include
remote-directory
, temporary-remote-directory
(and their *-expression
)
equivalents, use-temporary-file-name
, and auto-create-directory
. Refer to the
schema documentation for more information.
The message payload resulting from a put operation is a
String
representing the full path of the file on the server after transfer.
mput
mput sends multiple files to the server and supports the following option:
The message payload must be a java.io.File
representing a local directory.
The same attributes as the put
command are supported. In addition, files in the local
directory can be filtered with one of mput-pattern
, mput-regex
or
mput-filter
. The filter works with recursion, as long as the subdirectories themselves
pass the filter. Subdirectories that do not pass the filter are not recursed.
The message payload resulting from an mget operation is a
List<String>
object - a List of remote file paths resulting from
the transfer.
rm
The rm command has no options.
The message payload resulting from an rm operation is Boolean.TRUE if the
remove was successful, Boolean.FALSE otherwise.
The remote directory is provided in the file_remoteDirectory
header, and the filename is
provided in the file_remoteFile
header.
mv
The mv command has no options.
The expression attribute defines the "from" path and the
rename-expression attribute defines the "to" path. By default, the
rename-expression is headers['file_renameTo']
. This
expression must not evaluate to null, or an empty String
. If necessary,
any remote directories needed will be created.
The payload of the result message is Boolean.TRUE
.
The original remote directory is provided in the file_remoteDirectory
header, and the filename is
provided in the file_remoteFile
header. The new path is in
the file_renameTo
header.
Additional Information
The get and mget commands support
the local-filename-generator-expression attribute. It
defines a SpEL expression to generate the name of local file(s) during the transfer.
The root object of the evaluation context is the request Message but, in addition, the remoteFileName
variable is also available, which is particularly useful for mget, for
example: local-filename-generator-expression="#remoteFileName.toUpperCase() + headers.foo"
.
The get and mget commands support
the local-directory-expression attribute. It
defines a SpEL expression to generate the name of local directory(ies) during the transfer.
The root object of the evaluation context is the request Message but, in addition, the remoteDirectory
variable is also available, which is particularly useful for mget, for
example: local-directory-expression="'/tmp/local/' + #remoteDirectory.toUpperCase() + headers.foo"
.
This attribute is mutually exclusive with local-directory attribute.
For all commands, the PATH that the command acts on is provided by the 'expression' property of the gateway. For the mget command, the expression might evaluate to '*', meaning retrieve all files, or 'somedirectory/*' etc.
Here is an example of a gateway configured for an ls command...
<int-ftp:outbound-gateway id="gateway1" session-factory="ftpSessionFactory" request-channel="inbound1" command="ls" command-options="-1" expression="payload" reply-channel="toSplitter"/>
The payload of the message sent to the toSplitter channel is a list of String objects
containing the filename of each file. If the command-options
was
omitted, it would be a list of FileInfo
objects. Options are
provided space-delimited, e.g. command-options="-1 -dirs -links"
.
Important | |
---|---|
Starting with Spring Integration version 3.0, sessions are no longer cached by default; the cache-sessions attribute
is no longer supported on endpoints. You must use a CachingSessionFactory (see below) if you
wish to cache sessions.
|
In versions prior to 3.0, the sessions were cached automatically by default. A cache-sessions
attribute was available for
disabling the auto caching, but that solution did not provide a way to configure other session caching attributes. For example,
you could not limit on the number of sessions created. To support that requirement and other configuration options, a
CachingSessionFactory
was provided. It provides sessionCacheSize
and sessionWaitTimeout
properties. As its name suggests, the sessionCacheSize
property controls how many active sessions the factory will
maintain in its cache (the DEFAULT is unbounded). If the sessionCacheSize
threshold has been reached, any attempt to
acquire another session will block until either one of the cached sessions becomes available or until the wait time for a Session
expires (the DEFAULT wait time is Integer.MAX_VALUE). The sessionWaitTimeout
property enables configuration of that value.
If you want your Sessions to be cached, simply configure your default Session Factory as described above and then
wrap it in an instance of CachingSessionFactory
where you may provide those additional properties.
<bean id="ftpSessionFactory" class="o.s.i.ftp.session.DefaultFtpSessionFactory"> <property name="host" value="localhost"/> </bean> <bean id="cachingSessionFactory" class="o.s.i.file.remote.session.CachingSessionFactory"> <constructor-arg ref="ftpSessionFactory"/> <constructor-arg value="10"/> <property name="sessionWaitTimeout" value="1000"/> </bean>
In the above example you see a CachingSessionFactory
created with the
sessionCacheSize
set to 10 and the sessionWaitTimeout
set to 1 second (its value is in millliseconds).
Starting with Spring Integration version 3.0, the CachingConnectionFactory
provides a resetCache()
method. When invoked, all idle sessions are immediately closed and in-use
sessions are closed when they are returned to the cache. New requests for sessions will establish new sessions as necessary.
Starting with Spring Integration version 3.0 a new abstraction is provided over the
FtpSession
object. The template provides methods to send, retrieve (as an
InputStream
), remove, and rename files. In addition an execute
method is provided allowing the caller to execute multiple operations on the session. In all cases,
the template takes care of reliably closing the session.
For more information, refer to the javadocs
for RemoteFileTemplate
There is a subclass for FTP:
FtpRemoteFileTemplate
.
Additional methods were added in version 4.1 including getClientInstance()
which provides access to the underlying FTPClient
enabling access to low-level
APIs.