org.springframework.ai.document.Document

public class Document extends Object

A Document is a container for content and its associated metadata, together with a unique ID. A Document holds either text content or media content, but not both. It is intended for carrying data from external sources through Spring AI's ETL pipeline.

Example of creating a text document:


 // Using constructor
 Document textDoc = new Document("Sample text content", Map.of("source", "user-input"));

 // Using builder
 Document builtTextDoc = Document.builder()
     .text("Sample text content")
     .metadata("source", "user-input")
     .build();

Example of creating a media document:


 // Using constructor
 Media imageContent = new Media(MediaType.IMAGE_PNG, new byte[] {...});
 Document mediaDoc = new Document(imageContent, Map.of("filename", "sample.png"));

 // Using builder
 Document builtMediaDoc = Document.builder()
     .media(new Media(MediaType.IMAGE_PNG, new byte[] {...}))
     .metadata("filename", "sample.png")
     .build();

Example of checking content type and accessing content:


 if (document.isText()) {
     String textContent = document.getText();
     // Process text content
 } else {
     Media mediaContent = document.getMedia();
     // Process media content
 }
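
The either-text-or-media rule and the ID behaviour described on this page can be sketched with a small stand-in class. This is only an illustration using the JDK: `DocSketch`, `ofText`, and `ofMedia` are hypothetical names, media is reduced to a `byte[]`, and a random UUID string is assumed as a substitute for whatever the configured IdGenerator would produce. It is not Spring AI's implementation.

```java
import java.util.Map;
import java.util.Objects;
import java.util.UUID;

// Hypothetical stand-in for illustration only; this is NOT Spring AI's Document.
final class DocSketch {

    private final String id;
    private final String text;   // null when the document holds media
    private final byte[] media;  // null when the document holds text
    private final Map<String, Object> metadata;

    private DocSketch(String id, String text, byte[] media, Map<String, Object> metadata) {
        // Exactly one of text or media must be present, never both and never neither.
        if ((text == null) == (media == null)) {
            throw new IllegalArgumentException("exactly one of text or media must be set");
        }
        // Use the caller's ID when given; otherwise generate one
        // (assumption: a random UUID string stands in for the generated ID).
        this.id = (id != null) ? id : UUID.randomUUID().toString();
        this.text = text;
        this.media = media;
        this.metadata = Map.copyOf(Objects.requireNonNull(metadata, "metadata"));
    }

    static DocSketch ofText(String id, String text, Map<String, Object> metadata) {
        return new DocSketch(id, text, null, metadata);
    }

    static DocSketch ofMedia(String id, byte[] media, Map<String, Object> metadata) {
        return new DocSketch(id, null, media, metadata);
    }

    String getId() { return id; }

    // True for text content, false for media content.
    boolean isText() { return text != null; }

    String getText() { return text; }

    byte[] getMedia() { return media; }

    Map<String, Object> getMetadata() { return metadata; }
}
```

With this sketch, `DocSketch.ofText(null, "Sample text content", Map.of("source", "user-input"))` yields an object whose `isText()` is true, whose `getMedia()` is null, and whose ID was generated rather than supplied.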

Nested Class Summary

Nested Classes (Modifier and Type, Class, Description):
- static class Document.Builder

Field Summary

Fields (Modifier and Type, Field, Description):
- static final ContentFormatter DEFAULT_CONTENT_FORMATTER

Constructor Summary

Constructors (Constructor, Description):
- Document(String content)
- Document(String id, String text, Map<String,Object> metadata)
- Document(String text, Map<String,Object> metadata)
- Document(String id, Media media, Map<String,Object> metadata)
- Document(Media media, Map<String,Object> metadata)

Method Summary

Modifier and Type, Method, Description:
- static Document.Builder builder()
- boolean equals(Object o)
- ContentFormatter getContentFormatter()
  Returns the content formatter associated with this document.
- String getFormattedContent()
- String getFormattedContent(ContentFormatter formatter, MetadataMode metadataMode)
  Helper content extractor that uses an external ContentFormatter.
- String getFormattedContent(MetadataMode metadataMode)
- String getId()
  Returns the unique identifier for this document.
- Media getMedia()
  Returns the document's media content, if any.
- Map<String,Object> getMetadata()
  Returns the metadata associated with this document.
- Double getScore()
- String getText()
  Returns the document's text content, if any.
- int hashCode()
- boolean isText()
  Determines whether this document contains text or media content.
- Document.Builder mutate()
- void setContentFormatter(ContentFormatter contentFormatter)
  Replace the document's ContentFormatter.
- String toString()

Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait

Field Details
- DEFAULT_CONTENT_FORMATTER
  
  public static final ContentFormatter DEFAULT_CONTENT_FORMATTER

Constructor Details
- Document
  
  public Document(String content)
- Document
  
  public Document(String text, Map<String,Object> metadata)
- Document
  
  public Document(String id, String text, Map<String,Object> metadata)
- Document
  
  public Document(Media media, Map<String,Object> metadata)
- Document
  
  public Document(String id, Media media, Map<String,Object> metadata)

Method Details
- builder
  
  public static Document.Builder builder()
- getId
  
  public String getId()
  
  Returns the unique identifier for this document.
  This ID is either explicitly provided during document creation or generated using the configured IdGenerator (defaults to RandomIdGenerator).
  Returns:
  
  the unique identifier of this document
  
  See Also:
  
  RandomIdGenerator
- getText
  
  @Nullable public String getText()
  
  Returns the document's text content, if any.
  Returns:
  
  the text content if isText() is true, null otherwise
  
  See Also:
  
  isText()
  
  getMedia()
- isText
  
  public boolean isText()
  
  Determines whether this document contains text or media content.
  
  Returns:
  
  true if this document contains text content (accessible via getText()), false if it contains media content (accessible via getMedia())
- getMedia
  
  @Nullable public Media getMedia()
  
  Returns the document's media content, if any.
  Returns:
  
  the media content if isText() is false, null otherwise
  
  See Also:
  
  isText()
  
  getText()
- getFormattedContent
  
  public String getFormattedContent()
- getFormattedContent
  
  public String getFormattedContent(MetadataMode metadataMode)
- getFormattedContent
  
  public String getFormattedContent(ContentFormatter formatter, MetadataMode metadataMode)
  
  Helper content extractor that uses an external ContentFormatter.
- getMetadata
  
  public Map<String,Object> getMetadata()
  
  Returns the metadata associated with this document.
  The metadata values are restricted to simple types (string, int, float, boolean) for compatibility with Vector Databases.
  
  Returns:
  
  the metadata map
- getScore
  
  @Nullable public Double getScore()
- getContentFormatter
  
  public ContentFormatter getContentFormatter()
  
  Returns the content formatter associated with this document.
  
  Returns:
  
  the current ContentFormatter instance used for formatting the document content.
- setContentFormatter
  
  public void setContentFormatter(ContentFormatter contentFormatter)
  
  Replace the document's ContentFormatter.
  
  Parameters:
  
  contentFormatter - new formatter to use.
- mutate
  
  public Document.Builder mutate()
- equals
  
  public boolean equals(Object o)
  
  Overrides:
  
  equals in class Object
- hashCode
  
  public int hashCode()
  
  Overrides:
  
  hashCode in class Object
- toString
  
  public String toString()
  
  Overrides:
  
  toString in class Object
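
The getMetadata() entry above notes that metadata values are restricted to simple types (string, int, float, boolean) for compatibility with vector databases. One way to catch unsupported values before building a Document is a small pre-check like the one below. This is a minimal sketch using only the JDK: `MetadataCheck` and `invalidKeys` are hypothetical names, and treating "simple" as `String`, `Number`, or `Boolean` is an assumption made here, not a rule taken from the library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of Spring AI.
final class MetadataCheck {

    // Assumption: "simple types" is read here as String, Number, or Boolean.
    // A null value is not an instance of any of these, so it counts as invalid.
    static boolean isSimple(Object value) {
        return value instanceof String || value instanceof Number || value instanceof Boolean;
    }

    // Returns the keys whose values are not simple, in the map's iteration order.
    static List<String> invalidKeys(Map<String, Object> metadata) {
        List<String> invalid = new ArrayList<>();
        for (Map.Entry<String, Object> entry : metadata.entrySet()) {
            if (!isSimple(entry.getValue())) {
                invalid.add(entry.getKey());
            }
        }
        return invalid;
    }
}
```

A caller could reject the metadata map, or flatten the offending values to strings, whenever `invalidKeys` returns a non-empty list.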
