11. LDIF Parsing

11.1 Introduction

LDAP Directory Interchange Format (LDIF) files are the standard medium for describing directory data in a flat file format. The most common uses of this format are information transfer and archival. However, the standard also defines a way to describe modifications to stored data in a flat file format. LDIFs of this latter type are typically referred to as changetype or modify LDIFs.

The org.springframework.ldap.ldif package provides classes needed to parse LDIF files and deserialize them into tangible objects. The LdifParser is the main class of the org.springframework.ldap.ldif package and is capable of parsing files that are RFC 2849 compliant. This class reads lines from a resource and assembles them into an LdapAttributes object. The LdifParser currently ignores changetype LDIF entries as their usefulness in the context of an application has yet to be determined.

11.2 Object Representation

Two classes in the org.springframework.ldap.core package provide the means to represent an LDIF in code:

  • LdapAttribute - Extends javax.naming.directory.BasicAttribute, adding support for LDIF options as defined in RFC 2849.

  • LdapAttributes - Extends javax.naming.directory.BasicAttributes, adding specialized support for DNs.

LdapAttribute objects represent options as a Set<String>. The DN support added to the LdapAttributes object employs the org.springframework.ldap.core.DistinguishedName class.
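To illustrate the idea of carrying options alongside a JNDI attribute, here is a minimal sketch that uses only JDK classes. OptionAwareAttribute is a hypothetical stand-in written for this example and is not the Spring LDAP LdapAttribute class; it assumes an attribute description of the form "name;option1;option2" and keeps the options in a Set<String>.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.naming.directory.BasicAttribute;

/** Hypothetical stand-in for illustration only; not the Spring LDAP LdapAttribute. */
class OptionAwareAttribute extends BasicAttribute {

    private static final long serialVersionUID = 1L;

    // Options in the order they appeared in the attribute description.
    private final Set<String> options = new LinkedHashSet<>();

    /** Accepts a description such as "cn;lang-en" and a single value. */
    OptionAwareAttribute(String description, Object value) {
        // The part before the first ';' is the attribute ID.
        super(description.split(";")[0], value);
        String[] parts = description.split(";");
        // Everything after the ID is treated as an option.
        options.addAll(Arrays.asList(parts).subList(1, parts.length));
    }

    /** Read-only view of the options. */
    Set<String> getOptions() {
        return Collections.unmodifiableSet(options);
    }
}
```

Constructing new OptionAwareAttribute("cn;lang-en", "Some One") in this sketch yields an attribute whose ID is cn and whose option set contains lang-en.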

11.3 The Parser

The Parser interface provides the foundation for operation and employs three supporting policy definitions:

  • SeparatorPolicy - establishes the mechanism by which lines are assembled into attributes.

  • AttributeValidationPolicy - ensures that attributes are correctly structured prior to parsing.

  • Specification - provides a mechanism by which object structure can be validated after assembly.

The default implementations of the Parser interface and its three policies are org.springframework.ldap.ldif.parser.LdifParser, org.springframework.ldap.ldif.support.SeparatorPolicy, org.springframework.ldap.ldif.support.DefaultAttributeValidationPolicy, and org.springframework.ldap.schema.DefaultSchemaSpecification, respectively. Together, these four classes parse a resource line by line and translate the data into LdapAttributes objects.

The SeparatorPolicy determines how individual lines read from the source file should be interpreted, since the LDIF specification allows attributes to span multiple lines. The default policy assesses each line in the context of the order in which lines were read to determine the nature of the line under consideration. Control attributes and changetype records are ignored.
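The multi-line behaviour comes from the folding rule in RFC 2849: a physical line that begins with a single space continues the previous line, and lines beginning with "#" are comments. The following sketch shows one way such lines could be joined for a single record. LdifLineUnfolder is a hypothetical helper written for this example with JDK classes only; it is not the library's SeparatorPolicy and makes no claim about how that class is implemented.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of Spring LDAP. */
final class LdifLineUnfolder {

    private LdifLineUnfolder() {
    }

    /** Joins the folded physical lines of one LDIF record into logical lines. */
    static List<String> unfold(List<String> physicalLines) {
        List<String> logical = new ArrayList<>();
        StringBuilder current = null; // logical line being assembled, if any
        boolean inComment = false;    // true while inside a (possibly folded) comment

        for (String line : physicalLines) {
            if (line.startsWith(" ")) {
                // Continuation line: drop exactly one leading space and append.
                if (!inComment && current != null) {
                    current.append(line, 1, line.length());
                }
                continue;
            }
            // Any non-continuation line ends the logical line in progress.
            if (current != null) {
                logical.add(current.toString());
                current = null;
            }
            inComment = line.startsWith("#");
            if (!inComment && !line.isEmpty()) {
                current = new StringBuilder(line);
            }
        }
        if (current != null) {
            logical.add(current.toString());
        }
        return logical;
    }
}
```

Given the lines "dn: cn=Some One,ou=people," and " dc=example,dc=com", this sketch produces the single logical line "dn: cn=Some One,ou=people,dc=example,dc=com".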

The DefaultAttributeValidationPolicy uses regular expressions to ensure that each attribute, once parsed, conforms to a valid attribute format according to RFC 2849. If an attribute fails validation, an InvalidAttributeFormatException is logged and the record is skipped (the parser returns null).
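As a rough illustration of regex-based validation, the sketch below checks a logical attribute line against a simplified reading of the RFC 2849 grammar: an attribute name or numeric OID, optional ";option" suffixes, and then ":" (plain value), "::" (base64 value), or ":<" (URL). The class name AttributeLineCheck and both patterns are assumptions made for this example; they are deliberately simpler than the full grammar and are not the expressions used by DefaultAttributeValidationPolicy.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical, simplified validator for illustration; not the library's policy. */
final class AttributeLineCheck {

    // Group 1: name or numeric OID. Group 2: options. Group 3: separator. Group 4: value.
    private static final Pattern LINE = Pattern.compile(
            "^([A-Za-z][A-Za-z0-9-]*|\\d+(?:\\.\\d+)*)((?:;[A-Za-z0-9-]+)*)(::|:<|:) *(.*)$");

    // Base64 alphabet with up to two padding characters.
    private static final Pattern BASE64 = Pattern.compile("^[A-Za-z0-9+/]*={0,2}$");

    private AttributeLineCheck() {
    }

    static boolean isValid(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return false;
        }
        String separator = m.group(3);
        String value = m.group(4);
        if ("::".equals(separator)) {
            return BASE64.matcher(value).matches();
        }
        if (":<".equals(separator)) {
            // A URL reference must name something; the URL itself is not checked here.
            return !value.isEmpty();
        }
        // A plain value may be empty but must not start with ':' or '<'.
        return value.isEmpty() || (value.charAt(0) != ':' && value.charAt(0) != '<');
    }
}
```

In this sketch a caller would test each logical line with AttributeLineCheck.isValid(line) and skip the record when the result is false, which mirrors the skip-on-failure behaviour described above without reproducing the library's exception handling.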

11.4 Schema Validation

A mechanism for validating parsed objects against a schema is available via the Specification interface in the org.springframework.ldap.schema package. The DefaultSchemaSpecification does not perform any validation and is intended for cases where records are known to be valid and do not need to be checked. This option avoids the performance penalty that validation imposes. The BasicSchemaSpecification applies basic checks, such as ensuring that DN and object class declarations have been provided. Currently, validation against an actual schema requires a custom implementation of the Specification interface.
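The difference between a no-op check and a basic structural check can be sketched with JDK classes alone. Everything named below (ParsedRecord, RecordSpecification, PermissiveRecordSpecification, BasicRecordSpecification) is hypothetical and exists only for this example; the types merely mirror the roles described above and do not reproduce the Spring LDAP interfaces or their signatures.

```java
import javax.naming.directory.Attribute;
import javax.naming.directory.Attributes;

/** Hypothetical record type: a DN plus its attributes. */
final class ParsedRecord {
    final String dn;
    final Attributes attributes;

    ParsedRecord(String dn, Attributes attributes) {
        this.dn = dn;
        this.attributes = attributes;
    }
}

/** Hypothetical validation contract used only in this sketch. */
interface RecordSpecification {
    boolean isSatisfiedBy(ParsedRecord record);
}

/** Accepts every record; suitable when input is already known to be valid. */
final class PermissiveRecordSpecification implements RecordSpecification {
    @Override
    public boolean isSatisfiedBy(ParsedRecord record) {
        return true;
    }
}

/** Requires a non-blank DN and at least one objectClass value. */
final class BasicRecordSpecification implements RecordSpecification {
    @Override
    public boolean isSatisfiedBy(ParsedRecord record) {
        if (record == null || record.dn == null || record.dn.trim().isEmpty()) {
            return false;
        }
        if (record.attributes == null) {
            return false;
        }
        Attribute objectClass = record.attributes.get("objectClass");
        return objectClass != null && objectClass.size() > 0;
    }
}
```

A schema-aware check would follow the same shape, with isSatisfiedBy consulting whatever schema source the application has available.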

11.5 Spring Batch Integration

While the LdifParser can be employed by any application that requires parsing of LDIF files, Spring also provides a batch processing framework, Spring Batch, with many file processing utilities for parsing delimited files such as CSV. The org.springframework.ldap.ldif.batch package offers the classes necessary for using the LdifParser as a valid configuration option in the Spring Batch framework.

The five classes in this package support three basic use cases:

  • Use Case 1: Read LDIF records from a file and return an LdapAttributes object.

  • Use Case 2: Read LDIF records from a file and map records to Java objects (POJOs).

  • Use Case 3: Write LDIF records to a file.

The first use case is accomplished with the LdifReader. This class extends Spring Batch's AbstractItemCountingItemStreamItemReader and implements its ResourceAwareItemReaderItemStream interface. It fits naturally into the framework and can be used to read LdapAttributes objects from a file.

The second use case is handled by the MappingLdifReader, which can be used to map LDIF objects directly to any POJO. This class requires that an implementation of the RecordMapper interface be provided. That implementation should contain the logic for mapping parsed records to POJOs.
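The mapping step can be pictured as a function from a set of attributes to a domain object. The sketch below uses only JDK classes; Person, RecordToPojoMapper, and PersonMapper are hypothetical names invented for this example, and the interface shown is an assumption that does not reproduce the Spring LDAP RecordMapper signature.

```java
import javax.naming.NamingException;
import javax.naming.directory.Attribute;
import javax.naming.directory.Attributes;

/** Hypothetical POJO used only in this sketch. */
final class Person {
    final String fullName;
    final String mail;

    Person(String fullName, String mail) {
        this.fullName = fullName;
        this.mail = mail;
    }
}

/** Hypothetical mapping contract; not the Spring LDAP RecordMapper. */
interface RecordToPojoMapper<T> {
    T mapRecord(Attributes attributes);
}

/** Copies the first cn and mail values into a Person. */
final class PersonMapper implements RecordToPojoMapper<Person> {

    @Override
    public Person mapRecord(Attributes attributes) {
        return new Person(firstValue(attributes, "cn"), firstValue(attributes, "mail"));
    }

    // Returns the first value of the attribute as a string, or null if absent.
    private static String firstValue(Attributes attributes, String id) {
        Attribute attribute = attributes.get(id);
        if (attribute == null || attribute.size() == 0) {
            return null;
        }
        try {
            Object value = attribute.get();
            return value == null ? null : value.toString();
        } catch (NamingException e) {
            // Treated as "no value" in this sketch; real code may prefer to rethrow.
            return null;
        }
    }
}
```

Keeping the mapper free of I/O, as in this sketch, lets the same mapping logic be exercised in unit tests without a reader or an LDIF file.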

The RecordCallbackHandler can be implemented and provided to either reader. This handler can be used to operate on skipped records. Consult the Spring Batch documentation for more information.

The last member of this package, the LdifAggregator, can be used to write LDIF records to a file. This class simply invokes the toString() method of the LdapAttributes object.