14. File Support

14.1 Introduction

Spring Integration’s File support extends the Spring Integration Core with a dedicated vocabulary to deal with reading, writing, and transforming files. It provides a namespace that enables elements defining Channel Adapters dedicated to files and support for Transformers that can read file contents into strings or byte arrays.

This section will explain the workings of FileReadingMessageSource and FileWritingMessageHandler and how to configure them as beans. Also the support for dealing with files through file specific implementations of Transformer will be discussed. Finally the file specific namespace will be explained.

14.2 Reading Files

A FileReadingMessageSource can be used to consume files from the filesystem. This is an implementation of MessageSource that creates messages from a file system directory.

<bean id="pollableFileSource"
    class="org.springframework.integration.file.FileReadingMessageSource"
    p:directory="${input.directory}"/>

To prevent creating messages for certain files, you may supply a FileListFilter. By default the following 2 filters are used:

  • IgnoreHiddenFileListFilter
  • AcceptOnceFileListFilter

The IgnoreHiddenFileListFilter ensures that hidden files are not being processed. Please keep in mind that the exact definition of hidden is system-dependent. For example, on UNIX-based systems, a file beginning with a period character is considered to be hidden. Microsoft Windows, on the other hand, has a dedicated file attribute to indicate hidden files.

[Important]Important

The IgnoreHiddenFileListFilter was introduced with version 4.2. In prior versions hidden files were included. With the default configuration, the IgnoreHiddenFileListFilter will be triggered first, then the AcceptOnceFileListFilter.

The AcceptOnceFileListFilter ensures files are picked up only once from the directory.

[Note]Note

The AcceptOnceFileListFilter stores its state in memory. If you wish the state to survive a system restart, consider using the`FileSystemPersistentAcceptOnceFileListFilter` instead. This filter stores the accepted file names in a MetadataStore implementation (Section 9.5, “Metadata Store”). This filter matches on the filename and modified time.

Since version 4.0, this filter requires a ConcurrentMetadataStore. When used with a shared data store (such as Redis with the RedisMetadataStore) this allows filter keys to be shared across multiple application instances, or when a network file share is being used by multiple servers.

Since version 4.1.5, this filter has a new property flushOnUpdate which will cause it to flush the metadata store on every update (if the store implements Flushable).

<bean id="pollableFileSource"
    class="org.springframework.integration.file.FileReadingMessageSource"
    p:inputDirectory="${input.directory}"
    p:filter-ref="customFilterBean"/>

A common problem with reading files is that a file may be detected before it is ready. The default AcceptOnceFileListFilter does not prevent this. In most cases, this can be prevented if the file-writing process renames each file as soon as it is ready for reading. A filename-pattern or filename-regex filter that accepts only files that are ready (e.g. based on a known suffix), composed with the default`AcceptOnceFileListFilter` allows for this. The CompositeFileListFilter enables the composition.

<bean id="pollableFileSource"
    class="org.springframework.integration.file.FileReadingMessageSource"
    p:inputDirectory="${input.directory}"
    p:filter-ref="compositeFilter"/>
<bean id="compositeFilter"
    class="org.springframework.integration.file.filters.CompositeFileListFilter">
    <constructor-arg>
        <list>
            <bean class="o.s.i.file.filters.AcceptOnceFileListFilter"/>
            <bean class="o.s.i.file.filters.RegexPatternFileListFilter">
                <constructor-arg value="^test.*$"/>
            </bean>
        </list>
    </constructor-arg>
</bean>

If it is not possible to create the file with a temporary name and rename to the final name, another alternative is provided. The LastModifiedFileListFilter was added in version 4.2. This filter can be configured with an age property and only files older than this will be passed by the filter. The age defaults to 60 seconds, but you should choose an age that is large enough to avoid picking up a file early, due to, say, network glitches.

<bean id="filter" class="org.springframework.integration.file.filters.LastModifiedFileListFilter">
    <property name="age" value="120" />
</bean>

14.2.1 Namespace Support

The configuration for file reading can be simplified using the file specific namespace. To do this use the following template.

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns:int="http://www.springframework.org/schema/integration"
  xmlns:int-file="http://www.springframework.org/schema/integration/file"
  xsi:schemaLocation="http://www.springframework.org/schema/beans
    http://www.springframework.org/schema/beans/spring-beans.xsd
    http://www.springframework.org/schema/integration
    http://www.springframework.org/schema/integration/spring-integration.xsd
    http://www.springframework.org/schema/integration/file
    http://www.springframework.org/schema/integration/file/spring-integration-file.xsd">
</beans>

Within this namespace you can reduce the FileReadingMessageSource and wrap it in an inbound Channel Adapter like this:

<int-file:inbound-channel-adapter id="filesIn1"
    directory="file:${input.directory}" prevent-duplicates="true" ignore-hidden="true"/>

<int-file:inbound-channel-adapter id="filesIn2"
    directory="file:${input.directory}"
    filter="customFilterBean" />

<int-file:inbound-channel-adapter id="filesIn3"
    directory="file:${input.directory}"
    filename-pattern="test*" />

<int-file:inbound-channel-adapter id="filesIn4"
    directory="file:${input.directory}"
    filename-regex="test[0-9]+\.txt" />

The first channel adapter example is relying on the default `FileListFilter`s:

  • IgnoreHiddenFileListFilter (Do not process hidden files)
  • AcceptOnceFileListFilter (Prevents duplication)

Therefore, you can also leave off the 2 attributes prevent-duplicates and ignore-hidden as they are true by default.

[Important]Important

The ignore-hidden attribute was introduced with Spring Integration 4.2. In prior versions hidden files were included.

The second channel adapter example is using a custom filter, the third is using the filename-pattern attribute to add an AntPathMatcher based filter, and the fourth is using the filename-regex attribute to add a regular expression Pattern based filter to the FileReadingMessageSource. The filename-pattern and filename-regex attributes are each mutually exclusive with the regular filter reference attribute. However, you can use the filter attribute to reference an instance of CompositeFileListFilter that combines any number of filters, including one or more pattern based filters to fit your particular needs.

When multiple processes are reading from the same directory it can be desirable to lock files to prevent them from being picked up concurrently. To do this you can use a FileLocker. There is a java.nio based implementation available out of the box, but it is also possible to implement your own locking scheme. The nio locker can be injected as follows

<int-file:inbound-channel-adapter id="filesIn"
    directory="file:${input.directory}" prevent-duplicates="true">
    <int-file:nio-locker/>
</int-file:inbound-channel-adapter>

A custom locker you can configure like this:

<int-file:inbound-channel-adapter id="filesIn"
    directory="file:${input.directory}" prevent-duplicates="true">
    <int-file:locker ref="customLocker"/>
</int-file:inbound-channel-adapter>
[Note]Note

When a file inbound adapter is configured with a locker, it will take the responsibility to acquire a lock before the file is allowed to be received. It will not assume the responsibility to unlock the file. If you have processed the file and keeping the locks hanging around you have a memory leak. If this is a problem in your case you should call FileLocker.unlock(File file) yourself at the appropriate time.

When filtering and locking files is not enough it might be needed to control the way files are listed entirely. To implement this type of requirement you can use an implementation of DirectoryScanner. This scanner allows you to determine entirely what files are listed each poll. This is also the interface that Spring Integration uses internally to wire FileListFilters FileLocker to the FileReadingMessageSource. A custom DirectoryScanner can be injected into the <int-file:inbound-channel-adapter/> on the scanner attribute.

<int-file:inbound-channel-adapter id="filesIn" directory="file:${input.directory}"
    prevent-duplicates="true" scanner="customDirectoryScanner"/>

This gives you full freedom to choose the ordering, listing and locking strategies.

[Important]Important

It is important to understand that filters (including patterns, regex, prevent-duplicates etc) and lockers, are actually used by the scanner. Any of these attributes set on the adapter are subsequently injected into the scanner. For this reason, if you need to provide a custom scanner and you have multiple file inbound adapters in the same application context, each adapter must be provided with its own instance of the scanner, either by declaring separate beans, or declaring scope="prototype" on the scanner bean so that the context will create a new instance for each use.

14.2.2 WatchServiceDirectoryScanner

This scanner was added in version 4.2. It replaces the existing RecursiveLeafOnlyDirectoryScanner which is inefficient for large directory trees. The WatchServiceDirectoryScanner requires Java 7 or above.

This scanner relies on file system events when new files are added to the directory. During initialization, the directory is registered to generate events; the initial file list is also built. While walking the directory tree, any subdirectories encountered are also registered to generate events. On the first poll, the initial file list from walking the directory is returned. On subsequent polls, files from new creation events are returned. If a new subdirectory is added, its creation event is used to walk the new subtree to find existing files, as well as registering any new subdirectories found.

[Note]Note

There is a case with WatchKey, when its internal events queue isn’t drained by the program as quickly as the directory modification events occur. If the queue size is exceeded, a StandardWatchEventKinds.OVERFLOW is emitted to indicate that some file system events may be lost. In this case, the root directory is re-scanned completely. To avoid duplicates consider using an appropriate FileListFilter such as the AcceptOnceFileListFilter and/or remove files when processing is completed.

<bean id="wsScanner" class="org.springframework.integration.file.WatchServiceDirectoryScanner">
    <constructor-arg value="/tmp/myDir" />
</bean>
@Bean
public DirectoryScanner scanner() {
    return new WatchServiceDirectoryScanner("/tmp/myDir");
}

14.2.3 Limiting Memory Consumption

A HeadDirectoryScanner can be used to limit the number of files retained in memory. This can be useful when scanning large directories. With XML configuration, this is enabled using the queue-size property on the inbound channel adapter.

Prior to version 4.2, this setting was incompatible with the use of any other filters. Any other filters (including prevent-duplicates="true") overwrote the filter used to limit the size.

[Note]Note

The use of a HeadDirectoryScanner is incompatible with an AcceptOnceFileListFilter. Since all filters are consulted during the poll decision, the AcceptOnceFileListFilter does not know that other filters might be temporarily filtering files. Even if files that were previously filtered by the HeadDirectoryScanner.HeadFilter are now available, the AcceptOnceFileListFilter will filter them.

Generally, instead of using an AcceptOnceFileListFilter in this case, one would simply remove the processed files so that the previously filtered files will be available on a future poll.

14.2.4 'Tail’ing Files

Another popular use case is to get lines from the end (or tail) of a file, capturing new lines when they are added. Two implementations are provided; the first, OSDelegatingFileTailingMessageProducer, uses the native tail command (on operating systems that have one). This is likely the most efficient implementation on those platforms. For operating systems that do not have a tail command, the second implementation ApacheCommonsFileTailingMessageProducer which uses the Apache commons-io Tailer class.

In both cases, file system events, such as files being unavailable etc, are published as ApplicationEvent s using the normal Spring event publishing mechanism. Examples of such events are:

[message=tail: cannot open `/tmp/foo' for reading: No such file or directory, file=/tmp/foo]

[message=tail: `/tmp/foo' has become accessible, file=/tmp/foo]

[message=tail: `/tmp/foo' has become inaccessible: No such file or directory, file=/tmp/foo]

[message=tail: `/tmp/foo' has appeared; following end of new file, file=/tmp/foo]

This sequence of events might occur, for example, when a file is rotated.

[Note]Note

Not all platforms supporting a tail command provide these status messages.

Example configurations:

<int-file:tail-inbound-channel-adapter id="native"
	channel="input"
	task-executor="exec"
	file="/tmp/foo"/>

This creates a native adapter with default -F -n 0 options (follow the file name from the current end).

<int-file:tail-inbound-channel-adapter id="native"
	channel="input"
	native-options="-F -n +0"
	task-executor="exec"
	file-delay=10000
	file="/tmp/foo"/>

This creates a native adapter with -F -n +0 options (follow the file name, emitting all existing lines). If the tail command fails (on some platforms, a missing file causes the tail to fail, even with -F specified), the command will be retried every 10 seconds.

<int-file:tail-inbound-channel-adapter id="apache"
	channel="input"
	task-executor="exec"
	file="/tmp/bar"
	delay="2000"
	end="false"
	reopen="true"
	file-delay="10000"/>

This creates an Apache commons-io Tailer adapter that examines the file for new lines every 2 seconds, and checks for existence of a missing file every 10 seconds. The file will be tailed from the beginning (end="false") instead of the end (which is the default). The file will be reopened for each chunk (the default is to keep the file open).

[Important]Important

Specifying the delay, end or reopen attributes, forces the use of the Apache commons-io adapter and the native-options attribute is not allowed.

14.3 Writing files

To write messages to the file system you can use a FileWritingMessageHandler. This class can deal with the following payload types:

  • File,
  • String
  • byte array
  • InputStream (since version 4.2)

You can configure the encoding and the charset that will be used in case of a String payload.

To make things easier, you can configure the FileWritingMessageHandler as part of an Outbound Channel Adapter or Outbound Gateway using the provided XML namespace support.

14.3.1 Generating Filenames

In its simplest form, the FileWritingMessageHandler only requires a destination directory for writing the files. The name of the file to be written is determined by the handler’shttp://static.springsource.org/spring-integration/api/org/springframework/integration/file/FileNameGenerator.html[FileNameGenerator]. The default implementation looks for a Message header whose key matches the constant defined as FileHeaders.FILENAME.

Alternatively, you can specify an expression to be evaluated against the Message in order to generate a file name, e.g.:_headers[myCustomHeader] + '.foo'. The expression must evaluate to a String. For convenience, the DefaultFileNameGenerator also provides the _setHeaderName method, allowing you to explicitly specify the Message header whose value shall be used as the filename.

Once setup, the DefaultFileNameGenerator will employ the following resolution steps to determine the filename for a given Message payload:

  1. Evaluate the expression against the Message and, if the result is a non-empty String, use it as the filename.
  2. Otherwise, if the payload is a java.io.File, use the file’s filename.
  3. Otherwise, use the Message ID appended with .msg as the filename.

When using the XML namespace support, both, the File Oubound Channel Adapter and the File Outbound Gateway support the following two mutually exclusive configuration attributes:

  • filename-generator (a reference to a FileNameGenerator) implementation)
  • filename-generator-expression (an expression evaluating to a String)

While writing files, a temporary file suffix will be used (default: .writing). It is appended to the filename while the file is being written. To customize the suffix, you can set the temporary-file-suffix attribute on both the File Oubound Channel Adapter and the File Outbound Gateway.

[Note]Note

When using the APPEND file mode, the temporary-file-suffix attribute is ignored, since the data is appended to the file directly.

14.3.2 Specifying the Output Directory

Both, the File Oubound Channel Adapter and the File Outbound Gateway provide two configuration attributes for specifying the output directory:

  • directory
  • directory-expression
[Note]Note

The directory-expression attribute is available since Spring Integration 2.2.

Using the directory attribute

When using the directory attribute, the output directory will be set to a fixed value, that is set at intialization time of the FileWritingMessageHandler. If you don’t specify this attribute, then you must use the_directory-expression_ attribute.

Using the directory-expression attribute

If you want to have full SpEL support you would choose the directory-expression attribute. This attribute accepts a SpEL expression that is evaluated for each message being processed. Thus, you have full access to a Message’s payload and its headers to dynamically specify the output file directory.

The SpEL expression must resolve to either a String or to java.io.File. Furthermore the resulting String or File must point to a directory. If you don’t specify the_directory-expression_ attribute, then you must set the directory attribute.

Using the auto-create-directory attribute

If the destination directory does not exists, yet, by default the respective destination directory and any non-existing parent directories are being created automatically. You can set the auto-create-directory attribute to false in order to prevent that. This attribute applies to both, the directory and the directory-expression attribute.

[Note]Note

When using the directory attribute and auto-create-directory is false, the following change was made starting with Spring Integration 2.2:

Instead of checking for the existence of the destination directory at initialization time of the adapter, this check is now performed for each message being processed.

Furthermore, if auto-create-directory is true and the directory was deleted between the processing of messages, the directory will be re-created for each message being processed.

14.3.3 Dealing with Existing Destination Files

When writing files and the destination file already exists, the default behavior is to overwrite that target file. This behavior, though, can be changed by setting the mode attribute on the respective File Outbound components. The following options exist:

  • REPLACE (Default)
  • APPEND
  • FAIL
  • IGNORE
[Note]Note

The mode attribute and the options APPEND, FAIL and IGNORE, are available since Spring Integration 2.2.

REPLACE

If the target file already exists, it will be overwritten. If the mode attribute is not specified, then this is the default behavior when writing files.

APPEND

This mode allows you to append Message content to the existing file instead of creating a new file each time. Note that this attribute is mutually exclusive with temporary-file-suffix attribute since when appending content to the existing file, the adapter no longer uses a temporary file.

FAIL

If the target file exists, a MessageHandlingException is thrown.

IGNORE

If the target file exists, the message payload is silently ignored.

[Note]Note

When using a temporary file suffix (default: .writing), the IGNORE mode will apply if the final file name exists, or the temporary file name exists.

14.3.4 File Outbound Channel Adapter

<int-file:outbound-channel-adapter id="filesOut" directory="${input.directory.property}"/>

The namespace based configuration also supports a delete-source-files attribute. If set to true, it will trigger the deletion of the original source files after writing to a destination. The default value for that flag is false.

<int-file:outbound-channel-adapter id="filesOut"
    directory="${output.directory}"
    delete-source-files="true"/>
[Note]Note

The delete-source-files attribute will only have an effect if the inbound Message has a File payload or if the FileHeaders.ORIGINAL_FILE header value contains either the source File instance or a String representing the original file path.

Starting with version 4.2 The FileWritingMessageHandler supports an append-new-line option. If set to true, a new line is appended to the file after a message is written. The default attribute value is false.

<int-file:outbound-channel-adapter id="newlineAdapter"
	append-new-line="true"
    directory="${output.directory}"/>

14.3.5 Outbound Gateway

In cases where you want to continue processing messages based on the written file, you can use the outbound-gateway instead. It plays a very similar role as the outbound-channel-adapter. However, after writing the file, it will also send it to the reply channel as the payload of a Message.

<int-file:outbound-gateway id="mover" request-channel="moveInput"
    reply-channel="output"
    directory="${output.directory}"
    mode="REPLACE" delete-source-files="true"/>

As mentioned earlier, you can also specify the mode attribute, which defines the behavior of how to deal with situations where the destination file already exists. Please seeSection 14.3.3, “Dealing with Existing Destination Files” for further details. Generally, when using the_File Outbound Gateway_, the result file is returned as the Message payload on the reply channel.

This also applies when specifying the IGNORE mode. In that case the pre-existing destination file is returned. If the payload of the request message was a file, you still have access to that original file through the Message Header FileHeaders.ORIGINAL_FILE.

[Note]Note

The outbound-gateway works well in cases where you want to first move a file and then send it through a processing pipeline. In such cases, you may connect the file namespace’s inbound-channel-adapter element to the outbound-gateway and then connect that gateway’s reply-channel to the beginning of the pipeline.

If you have more elaborate requirements or need to support additional payload types as input to be converted to file content you could extend the FileWritingMessageHandler, but a much better option is to rely on a Transformer.

14.4 File Transformers

To transform data read from the file system to objects and the other way around you need to do some work. Contrary to FileReadingMessageSource and to a lesser extent FileWritingMessageHandler, it is very likely that you will need your own mechanism to get the job done. For this you can implement the`Transformer` interface. Or extend the AbstractFilePayloadTransformer for inbound messages. Some obvious implementations have been provided.

FileToByteArrayTransformer transforms Files into byte[]s using Spring’s FileCopyUtils. It is often better to use a sequence of transformers than to put all transformations in a single class. In that case the File to byte[] conversion might be a logical first step.

FileToStringTransformer will convert Files to Strings as the name suggests. If nothing else, this can be useful for debugging (consider using with a Wire Tap).

To configure File specific transformers you can use the appropriate elements from the file namespace.

<int-file:file-to-bytes-transformer  input-channel="input" output-channel="output"
    delete-files="true"/>

<int-file:file-to-string-transformer input-channel="input" output-channel="output"
    delete-files="true" charset="UTF-8"/>

The delete-files option signals to the transformer that it should delete the inbound File after the transformation is complete. This is in no way a replacement for using the`AcceptOnceFileListFilter` when the FileReadingMessageSource is being used in a multi-threaded environment (e.g. Spring Integration in general).

14.5 File Splitter

The FileSplitter was added in version 4.1.2 and namespace support was added in version 4.2. The FileSplitter splits text files into individual lines, based on BufferedReader.readLine(). By default, the splitter uses an Iterator to emit lines one-at-a-time as they are read from the file. Setting the iterator property to false causes it to read all the lines into memory before emitting them as messages. One use case for this might be if you want to detect I/O errors on the file before sending any messages containing lines. However, it is only practical for relatively short files.

Inbound payloads can be File, String (a File path), InputStream, or Reader. Other payload types will be emitted unchanged.

<int-file:splitter id="splitter" 1
    iterator="" 2
    markers="" 3
    apply-sequence="" 4
    requires-reply="" 5
    charset="" 6
    input-channel="" 7
    output-channel="" 8
    send-timeout="" 9
    auto-startup="" 10
    order="" 11
    phase="" /> 12

1

The bean name of the splitter.

2

Set to true to use an iterator (default); false to load the file into memory before sending lines.

3

Set to true to emit start/end of file marker messages before and after the file data. Markers are messages with FileSplitter.FileMarker payloads (with START and END values in the mark property). Markers might be used when sequentially processing files in a downstream flow where some lines are filtered. They enable the downstream processing to know when a file has been completely processed. The END marker includes a line count. Default: false. When true, apply-sequence is false by default.

4

Set to false to disable the inclusion of sequenceSize and sequenceNumber headers in messages. Default: true, unless markers is true. When true and markers is true, the markers are included in the sequencing. When true and iterator is true, the sequenceSize header is set to 0 because the size is unknown.

5

Set to true to cause a RequiresReplyException to be thrown if there are no lines in the file. Default: false.

6

Set the charset name to be used when reading the text data into String payloads. Default: platform charset.

7

Set the input channel used to send messages to the splitter.

8

Set the output channel to which messages will be sent.

9

Set the send timeout - only applies if the output-channel can block - such as a full QueueChannel.

10

Set to false to disable automatically starting the splitter when the context is refreshed. Default: true.

11

Set the order of this endpoint if the input-channel is a <publish-subscribe-channel/>.

12

Set the startup phase for the splitter (used when auto-startup is true).

Java Configuration

@Splitter(inputChannel="toSplitter")
@Bean
public MessageHandler fileSplitter() {
    FileSplitter splitter = new FileSplitter(true, true);
    splitter.setApplySequence(true);
    splitter.setOutputChannel(outputChannel);
    return splitter;
}