Package org.springframework.ai.document
Class Document
java.lang.Object
org.springframework.ai.document.Document
A Document is a container for content and its associated metadata. It also carries
a unique ID.
A Document can hold either text content or media content, but not both.
It is intended to carry data from external sources through Spring AI's ETL
pipeline.
Example of creating a text document:
// Using constructor
Document textDoc = new Document("Sample text content", Map.of("source", "user-input"));
// Using builder
Document textDoc = Document.builder()
    .text("Sample text content")
    .metadata("source", "user-input")
    .build();
Example of creating a media document:
// Using constructor
Media imageContent = new Media(MediaType.IMAGE_PNG, new byte[] {...});
Document mediaDoc = new Document(imageContent, Map.of("filename", "sample.png"));
// Using builder
Document mediaDoc = Document.builder()
    .media(new Media(MediaType.IMAGE_PNG, new byte[] {...}))
    .metadata("filename", "sample.png")
    .build();
Example of checking content type and accessing content:
if (document.isText()) {
    String textContent = document.getText();
    // Process text content
} else {
    Media mediaContent = document.getMedia();
    // Process media content
}
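The either-text-or-media rule described above can be illustrated without Spring AI on the classpath. The following is a minimal sketch, not the library's implementation: ContentHolder is a hypothetical stand-in for Document, a byte[] stands in for Media, and the constructor check is only an assumption about how such an invariant might be enforced.

```java
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for Document; NOT the Spring AI class.
final class ContentHolder {
    private final String text;   // null when this holder carries media
    private final byte[] media;  // null when this holder carries text (stand-in for Media)
    private final Map<String, Object> metadata;

    // Assumed enforcement point: exactly one of text/media must be non-null.
    ContentHolder(String text, byte[] media, Map<String, Object> metadata) {
        if ((text == null) == (media == null)) {
            throw new IllegalArgumentException("exactly one of text or media must be specified");
        }
        this.text = text;
        this.media = media;
        this.metadata = Map.copyOf(Objects.requireNonNull(metadata, "metadata"));
    }

    static ContentHolder ofText(String text, Map<String, Object> metadata) {
        return new ContentHolder(text, null, metadata);
    }

    static ContentHolder ofMedia(byte[] media, Map<String, Object> metadata) {
        return new ContentHolder(null, media, metadata);
    }

    boolean isText() { return text != null; }
    String getText() { return text; }
    byte[] getMedia() { return media; }
    Map<String, Object> getMetadata() { return metadata; }

    public static void main(String[] args) {
        ContentHolder doc = ofText("Sample text content", Map.of("source", "user-input"));
        if (doc.isText()) {
            System.out.println("text: " + doc.getText());
        } else {
            System.out.println("media bytes: " + doc.getMedia().length);
        }
    }
}
```

Because exactly one of the two fields is ever set, a single isText() check is enough for callers to know which accessor returns a non-null value.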
Method Summary
Modifier and Type, Method: Description
static Document.Builder builder()
boolean equals(Object)
ContentFormatter getContentFormatter(): Returns the content formatter associated with this document.
String getFormattedContent()
String getFormattedContent(ContentFormatter formatter, MetadataMode metadataMode): Helper content extractor that uses an external ContentFormatter.
String getFormattedContent(MetadataMode metadataMode)
String getId(): Returns the unique identifier for this document.
@Nullable Media getMedia(): Returns the document's media content, if any.
Map<String, Object> getMetadata(): Returns the metadata associated with this document.
@Nullable Double getScore()
@Nullable String getText(): Returns the document's text content, if any.
int hashCode()
boolean isText(): Determines whether this document contains text or media content.
Document.Builder mutate()
void setContentFormatter(ContentFormatter contentFormatter): Replace the document's ContentFormatter.
String toString()
-
Field Details
-
DEFAULT_CONTENT_FORMATTER
-
-
Constructor Details
-
Document (five overloaded constructors)
-
-
Method Details
-
builder
-
getId
Returns the unique identifier for this document. This ID is either explicitly provided during document creation or generated using the configured IdGenerator (defaults to RandomIdGenerator).
Returns: the unique identifier of this document
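The explicit-or-generated ID rule can be sketched as follows. This is a minimal illustration under stated assumptions, not Spring AI code: IdSource is a hypothetical interface loosely modeled on the idea of an ID generator, resolveId is a hypothetical helper, and using a random UUID string as the default is an assumption.

```java
import java.util.UUID;

// Hypothetical interface; not Spring AI's IdGenerator.
@FunctionalInterface
interface IdSource {
    String generateId(Object... contents);
}

final class IdResolutionSketch {
    // Assumed default: ignore the contents and return a random UUID string.
    static final IdSource RANDOM = contents -> UUID.randomUUID().toString();

    // Returns the explicit ID if one was given, otherwise asks the generator.
    static String resolveId(String explicitId, IdSource generator, Object... contents) {
        if (explicitId != null) {
            return explicitId;
        }
        return generator.generateId(contents);
    }

    public static void main(String[] args) {
        // Explicit ID wins.
        System.out.println(resolveId("doc-42", RANDOM, "Sample text content"));
        // No explicit ID: a fresh random ID is generated on each call.
        System.out.println(resolveId(null, RANDOM, "Sample text content"));
    }
}
```

Passing the generator as a parameter keeps the choice of ID strategy (random, content-derived, or fixed for tests) outside the code that builds the document.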
-
getText
Returns the document's text content, if any.
Returns: the text content if isText() is true, null otherwise
-
isText
public boolean isText()
Determines whether this document contains text or media content.
Returns: true if this document contains text content (accessible via getText()), false if it contains media content (accessible via getMedia())
-
getMedia
Returns the document's media content, if any.
Returns: the media content if isText() is false, null otherwise
-
getFormattedContent
-
getFormattedContent
-
getFormattedContent
Helper content extractor that uses an external ContentFormatter.
-
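To make the idea of formatting content with an externally supplied formatter concrete, here is a small self-contained sketch. It does not use Spring AI: Formatter and Mode are hypothetical, simplified stand-ins for ContentFormatter and MetadataMode, and the "key: value" layout followed by a blank line is an assumption chosen for illustration.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class FormattingSketch {
    // Simplified stand-in for MetadataMode (hypothetical).
    enum Mode { ALL, NONE }

    // Simplified stand-in for ContentFormatter (hypothetical).
    @FunctionalInterface
    interface Formatter {
        String format(String text, Map<String, Object> metadata, Mode mode);
    }

    // Renders sorted "key: value" lines, a blank line, then the text.
    static final Formatter KEY_VALUE = (text, metadata, mode) -> {
        if (mode == Mode.NONE || metadata.isEmpty()) {
            return text;
        }
        String header = new TreeMap<>(metadata).entrySet().stream()
                .map(e -> e.getKey() + ": " + e.getValue())
                .collect(Collectors.joining("\n"));
        return header + "\n\n" + text;
    };

    // Mirrors the shape of a "format with an external formatter" helper.
    static String getFormattedContent(String text, Map<String, Object> metadata,
                                      Formatter formatter, Mode mode) {
        return formatter.format(text, metadata, mode);
    }

    public static void main(String[] args) {
        Map<String, Object> metadata = Map.of("source", "user-input", "page", 3);
        System.out.println(getFormattedContent("Sample text content", metadata, KEY_VALUE, Mode.ALL));
    }
}
```

Supplying the formatter per call, as in the two-argument overload above, lets one document be rendered differently for different consumers without changing the formatter stored on the document.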
getMetadata
Returns the metadata associated with this document. The metadata values are restricted to simple types (string, int, float, boolean) for compatibility with vector databases.
Returns: the metadata map
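A caller who wants to catch unsupported metadata values before handing a document to a vector store could run a pre-check along these lines. This is a hedged sketch: MetadataCheckSketch and both of its methods are hypothetical helpers, not part of Spring AI, and treating Long and Double as acceptable alongside Integer and Float is an assumption.

```java
import java.util.Map;

final class MetadataCheckSketch {
    // True for the simple types named above; Long and Double are an added assumption.
    static boolean isSimpleValue(Object value) {
        return value instanceof String
                || value instanceof Integer
                || value instanceof Long
                || value instanceof Float
                || value instanceof Double
                || value instanceof Boolean;
    }

    // Throws if any metadata value is not one of the simple types.
    static void requireSimpleValues(Map<String, ?> metadata) {
        for (Map.Entry<String, ?> entry : metadata.entrySet()) {
            if (!isSimpleValue(entry.getValue())) {
                throw new IllegalArgumentException(
                        "unsupported metadata value for key '" + entry.getKey() + "'");
            }
        }
    }

    public static void main(String[] args) {
        requireSimpleValues(Map.of("source", "user-input", "page", 3, "reviewed", true));
        System.out.println("metadata contains only simple values");
    }
}
```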
-
getScore
-
getContentFormatter
Returns the content formatter associated with this document.
Returns: the current ContentFormatter instance used for formatting the document content.
-
setContentFormatter
Replace the document's ContentFormatter.
Parameters: contentFormatter - new formatter to use.
-
mutate
-
equals
-
hashCode
public int hashCode()
-
toString
-