Read a file line by line and process each line into a database insert, for example using the JDBC API. Commit periodically; if a fault causes the database transaction to roll back, reset the file reader to the position it held after the last successful commit.
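A minimal sketch of this loop in plain Java follows. The chunk size, the in-memory H2 connection URL, the input file name, and the items table are all illustrative assumptions, and the reader is "reset" in the simplest possible way: by re-opening the file and skipping to the last commit point.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    public class FileToDatabaseJob {

        private static final int CHUNK_SIZE = 100; // commit interval (assumed)

        public static void main(String[] args) throws Exception {
            long committedLines = 0; // position after the last successful commit
            // Assumes an H2 driver on the classpath and an existing items table.
            try (Connection connection = DriverManager.getConnection("jdbc:h2:mem:batch")) {
                connection.setAutoCommit(false);
                while (true) {
                    try {
                        long processed = processChunk(connection, "input.txt", committedLines);
                        connection.commit();
                        if (processed == 0) {
                            break; // end of file
                        }
                        committedLines += processed;
                    } catch (Exception e) {
                        // Roll back the chunk; a retry would re-read from committedLines.
                        connection.rollback();
                        throw e; // or apply a retry/skip policy here
                    }
                }
            }
        }

        // Skips the lines already committed, then inserts up to CHUNK_SIZE new lines.
        private static long processChunk(Connection connection, String file, long skip) throws Exception {
            try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
                for (long i = 0; i < skip; i++) {
                    reader.readLine(); // "reset" by re-reading to the last commit point
                }
                long count = 0;
                String line;
                try (PreparedStatement insert =
                        connection.prepareStatement("insert into items (payload) values (?)")) {
                    while (count < CHUNK_SIZE && (line = reader.readLine()) != null) {
                        insert.setString(1, line);
                        insert.executeUpdate();
                        count++;
                    }
                }
                return count;
            }
        }
    }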
Developing a batch process to achieve the goal above should be as simple as possible. The more that can be done with simple POJOs and Spring configuration, the better.
To keep things simple for now, assume that:
An integration test confirms that:
The vanilla successful batch use case proceeds as follows:
If there is an unrecoverable database exception during execution of client code:
If there is an error in the input data in the middle of a chunk (which could manifest as a database exception, e.g. a uniqueness or nullability violation):
There is no need to reset the input source because the error is fatal.
To restart:
Variations on this theme are also necessary, e.g. a tolerance for a small number of bad records in the input data.
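As one hedged illustration of such a variation, a skip-tolerant handler might count bad records and only propagate the failure once a limit is exceeded; the class and its names are hypothetical, not an existing API.

    import org.springframework.dao.DataIntegrityViolationException;

    // Hypothetical skip-tolerant item handler: bad records are counted and
    // skipped until the limit is reached, after which the failure propagates
    // and the chunk rolls back as usual.
    public class SkipLimitHandler {

        private final int skipLimit;
        private int skipped;

        public SkipLimitHandler(int skipLimit) {
            this.skipLimit = skipLimit;
        }

        public void handle(Runnable itemWork) {
            try {
                itemWork.run();
            } catch (DataIntegrityViolationException e) {
                if (++skipped > skipLimit) {
                    throw e; // tolerance exhausted: fail the chunk
                }
                // otherwise skip the bad record and carry on
            }
        }
    }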
In this version of events there are two kinds of resource in play: the transaction itself, and the data sources that are aware of the transaction. The comparison with DataSourceTransactionManager and JdbcTemplate is obvious. The client is often completely unaware of the transaction manager, which is applied through an interceptor, whereas the data source is used explicitly, through its own API, via a template. The client can concentrate on its domain, and need not be concerned with infrastructure or resource handling.
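A sketch of what that separation looks like from the client's side, assuming a hypothetical ItemWriter POJO: the code sees only the JdbcTemplate, while the transaction is demarcated externally (e.g. by an AOP interceptor configured in Spring).

    import javax.sql.DataSource;
    import org.springframework.jdbc.core.JdbcTemplate;

    // The client only touches the explicit resource API (JdbcTemplate);
    // the transaction is wrapped around it by infrastructure the POJO
    // never sees.
    public class ItemWriter {

        private final JdbcTemplate jdbcTemplate;

        public ItemWriter(DataSource dataSource) {
            this.jdbcTemplate = new JdbcTemplate(dataSource);
        }

        // Called inside a transaction that is demarcated elsewhere.
        public void write(String payload) {
            jdbcTemplate.update("insert into items (payload) values (?)", payload);
        }
    }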
    RepeatCallback chunkCallback = new RepeatCallback() {
        public boolean doInIteration(RepeatContext context) throws Exception {
            int count = 0;
            Object result = null;
            do {
                // delegate to the item-level callback; null means no more input
                result = callback.doWithRepeat(context);
            } while (result != null && count++ < chunkSize);
            return result != null;
        }
    };
    batchTemplate.iterate(chunkCallback);
The transaction boundary is demarcated at the chunk level (chunkCallback.doInIteration()). The termination policy depends only on the data source eventually returning null.
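To make the termination condition concrete, here is a sketch of the item-level callback whose null result drives the loop above, reusing the ItemWriter sketched earlier. The ItemCallback interface follows the doWithRepeat naming in the snippet and, like RepeatContext, is an assumption here rather than a released API.

    import java.io.BufferedReader;

    // Hypothetical interface matching the inner callback in the snippet above.
    interface ItemCallback {
        Object doWithRepeat(RepeatContext context) throws Exception;
    }

    // One item per invocation: the null return at end of input is the only
    // termination signal the chunk loop needs.
    public class LineItemCallback implements ItemCallback {

        private final BufferedReader reader;
        private final ItemWriter itemWriter; // e.g. the JdbcTemplate-based writer above

        public LineItemCallback(BufferedReader reader, ItemWriter itemWriter) {
            this.reader = reader;
            this.itemWriter = itemWriter;
        }

        public Object doWithRepeat(RepeatContext context) throws Exception {
            String line = reader.readLine(); // null at end of input
            if (line != null) {
                itemWriter.write(line);
            }
            return line;
        }
    }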