Class Document

java.lang.Object
org.springframework.ai.document.Document

public class Document extends Object
A Document is a container for a piece of content, its metadata, and a unique ID. A Document can hold either text content or media content, but not both. It is intended for bringing data from external sources into spring-ai's ETL pipeline.

Example of creating a text document:


 import java.util.Map;
 import org.springframework.ai.document.Document;

 // Using constructor
 Document textDoc = new Document("Sample text content", Map.of("source", "user-input"));

 // Using builder
 Document builtTextDoc = Document.builder()
     .text("Sample text content")
     .metadata("source", "user-input")
     .build();
 

Example of creating a media document:


 // Using constructor
 Media imageContent = new Media(MediaType.IMAGE_PNG, new byte[] {...});
 Document mediaDoc = new Document(imageContent, Map.of("filename", "sample.png"));

 // Using builder
 Document builtMediaDoc = Document.builder()
     .media(new Media(MediaType.IMAGE_PNG, new byte[] {...}))
     .metadata("filename", "sample.png")
     .build();
 

Example of checking content type and accessing content:


 if (document.isText()) {
     String textContent = document.getText();
     // Process text content
 } else {
     Media mediaContent = document.getMedia();
     // Process media content
 }
 
  • Field Details

    • DEFAULT_CONTENT_FORMATTER

      public static final ContentFormatter DEFAULT_CONTENT_FORMATTER
  • Constructor Details

  • Method Details

    • builder

      public static Document.Builder builder()
    • getId

      public String getId()
      Returns the unique identifier for this document.

      This ID is either explicitly provided during document creation or generated using the configured IdGenerator (defaults to RandomIdGenerator).

      Returns:
      the unique identifier of this document
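The ID rule above (explicit ID if provided, otherwise a generated one) can be sketched in plain Java. `IdSource`, `RandomIds`, and `IdResolution` are hypothetical stand-ins, not Spring AI's `IdGenerator` or `RandomIdGenerator`; the sketch assumes a random generator simply returns a fresh UUID string and that "not provided" means `null`.

```java
import java.util.UUID;

// Hypothetical stand-in for an ID generator; not Spring AI's IdGenerator interface.
interface IdSource {
    String generateId(Object... contents);
}

// Hypothetical random generator: ignores the contents and returns a fresh UUID string.
final class RandomIds implements IdSource {
    @Override
    public String generateId(Object... contents) {
        return UUID.randomUUID().toString();
    }
}

final class IdResolution {
    private IdResolution() {}

    // Keep an explicitly provided ID; only when it is null, ask the generator for one.
    static String resolveId(String explicitId, IdSource generator, Object... contents) {
        if (explicitId != null) {
            return explicitId;
        }
        return generator.generateId(contents);
    }
}
```

Passing `"doc-42"` returns `"doc-42"` unchanged, while passing `null` falls back to the generator, so two such calls produce different random IDs.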
    • getContent

      @Deprecated public String getContent()
      Deprecated.
      Use getText() instead, as it more accurately reflects the content type.
    • getText

      @Nullable public String getText()
      Returns the document's text content, if any.
      Returns:
      the text content if isText() is true, null otherwise
    • isText

      public boolean isText()
      Determines whether this document contains text or media content.
      Returns:
      true if this document contains text content (accessible via getText()), false if it contains media content (accessible via getMedia())
    • getMedia

      @Nullable public Media getMedia()
      Returns the document's media content, if any.
      Returns:
      the media content if isText() is false, null otherwise
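The contract shared by getText(), isText(), and getMedia() (exactly one kind of content, with the other accessor returning null) can be illustrated with a small stand-in. `ContentHolder` is hypothetical and models media as a raw byte array rather than Spring AI's `Media` type; it is a sketch of the rule, not the library's implementation.

```java
// Hypothetical stand-in for the text-or-media rule; media is simplified to raw bytes.
final class ContentHolder {
    private final String text;   // set only for text content
    private final byte[] media;  // set only for media content

    ContentHolder(String text, byte[] media) {
        // Reject "both set" and "neither set": exactly one kind of content is allowed.
        if ((text == null) == (media == null)) {
            throw new IllegalArgumentException("exactly one of text or media must be non-null");
        }
        this.text = text;
        this.media = media;
    }

    boolean isText() {
        return text != null;
    }

    // Returns null for media content, matching the getText() contract.
    String getText() {
        return text;
    }

    // Returns a defensive copy for media content, or null for text content.
    byte[] getMedia() {
        return media == null ? null : media.clone();
    }
}
```

Because the constructor rejects both the "both set" and "neither set" cases, callers can branch on `isText()` alone, as in the class-level example.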
    • getFormattedContent

      public String getFormattedContent()
    • getFormattedContent

      public String getFormattedContent(MetadataMode metadataMode)
    • getFormattedContent

      public String getFormattedContent(ContentFormatter formatter, MetadataMode metadataMode)
      Helper content extractor that uses an external ContentFormatter.
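The idea of rendering a document through a pluggable formatter, with a mode that controls which metadata is included, can be sketched as follows. `SimpleFormatter` and `Mode` are hypothetical; the "key: value" layout, the blank-line separator, and the per-mode exclusion sets are assumptions for illustration and do not describe the behavior of DEFAULT_CONTENT_FORMATTER.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical metadata modes; the names are assumptions modeled loosely on MetadataMode.
enum Mode { ALL, EMBED, INFERENCE, NONE }

// Hypothetical formatter: renders "key: value" lines, a blank line, then the text.
final class SimpleFormatter {
    private final Set<String> excludedForEmbed;
    private final Set<String> excludedForInference;

    SimpleFormatter(Set<String> excludedForEmbed, Set<String> excludedForInference) {
        this.excludedForEmbed = excludedForEmbed;
        this.excludedForInference = excludedForInference;
    }

    String format(String text, Map<String, Object> metadata, Mode mode) {
        if (mode == Mode.NONE) {
            return text; // no metadata at all
        }
        // Sort keys so the output is deterministic regardless of the map implementation.
        String header = new TreeMap<>(metadata).entrySet().stream()
                .filter(e -> isIncluded(e.getKey(), mode))
                .map(e -> e.getKey() + ": " + e.getValue())
                .collect(Collectors.joining("\n"));
        return header.isEmpty() ? text : header + "\n\n" + text;
    }

    private boolean isIncluded(String key, Mode mode) {
        switch (mode) {
            case EMBED: return !excludedForEmbed.contains(key);
            case INFERENCE: return !excludedForInference.contains(key);
            default: return true; // ALL keeps every key
        }
    }
}
```

Keeping the exclusion sets on the formatter rather than on each document lets one formatter instance be shared across many documents, which is the kind of reuse an external formatter enables.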
    • getMetadata

      public Map<String,Object> getMetadata()
      Returns the metadata associated with this document.

      The metadata values are restricted to simple types (string, int, float, boolean) for compatibility with vector databases.

      Returns:
      the metadata map
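Because vector stores typically accept only simple metadata values, a caller may want to find offending entries before storing a document. `MetadataCheck` below is a hypothetical caller-side helper, not part of Spring AI; the accepted set (String, Integer, Float, Double, Boolean) is an assumption that reads the Javadoc's "string, int, float, boolean" literally and adds Double because Java floating-point literals default to double.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical caller-side check; not part of Spring AI.
final class MetadataCheck {
    private MetadataCheck() {}

    // Returns, in sorted order, the keys whose values are not one of the assumed simple types.
    static List<String> unsupportedKeys(Map<String, Object> metadata) {
        return metadata.entrySet().stream()
                .filter(e -> !isSimple(e.getValue()))
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }

    // A null value fails every instanceof test and is therefore reported as unsupported.
    private static boolean isSimple(Object value) {
        return value instanceof String
                || value instanceof Integer
                || value instanceof Float
                || value instanceof Double
                || value instanceof Boolean;
    }
}
```

For example, metadata containing a `List` value would report that key, signaling that it should be flattened (for instance, joined into a string) before the document is stored.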
    • getScore

      @Nullable public Double getScore()
    • getContentFormatter

      @Deprecated(since="1.0.0-M4") public ContentFormatter getContentFormatter()
      Deprecated.
      We are considering removing this method. Please leave feedback at https://github.com/spring-projects/spring-ai/issues/1782
      Returns the content formatter associated with this document.
      Returns:
      the current ContentFormatter instance used for formatting the document content.
    • setContentFormatter

      public void setContentFormatter(ContentFormatter contentFormatter)
      Replace the document's ContentFormatter.
      Parameters:
      contentFormatter - new formatter to use.
    • mutate

      public Document.Builder mutate()
    • equals

      public boolean equals(Object o)
      Overrides:
      equals in class Object
    • hashCode

      public int hashCode()
      Overrides:
      hashCode in class Object
    • toString

      public String toString()
      Overrides:
      toString in class Object