Package org.springframework.ai.document
Class Document
java.lang.Object
org.springframework.ai.document.Document
A Document is a container for a piece of content and its metadata. It also holds
the document's unique ID.
A Document can hold either text content or media content, but not both.
It is intended to carry data from external sources through Spring AI's ETL
pipeline.
Example of creating a text document:
// Using constructor
Document textDoc = new Document("Sample text content", Map.of("source", "user-input"));
// Using builder
Document textDoc = Document.builder()
    .text("Sample text content")
    .metadata("source", "user-input")
    .build();
Example of creating a media document:
// Using constructor
Media imageContent = new Media(MediaType.IMAGE_PNG, new byte[] {...});
Document mediaDoc = new Document(imageContent, Map.of("filename", "sample.png"));
// Using builder
Document mediaDoc = Document.builder()
    .media(new Media(MediaType.IMAGE_PNG, new byte[] {...}))
    .metadata("filename", "sample.png")
    .build();
Example of checking content type and accessing content:
if (document.isText()) {
    String textContent = document.getText();
    // Process text content
} else {
    Media mediaContent = document.getMedia();
    // Process media content
}
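The "either text or media, but not both" rule can be illustrated with a small stand-in class. This is only a sketch under stated assumptions: TextOrMediaSketch and SketchDocument are hypothetical names, a byte[] stands in for Media so the example needs no Spring AI dependency, and the exception type and message are illustrative rather than taken from Document itself.

```java
public class TextOrMediaSketch {

    // Hypothetical stand-in for Document; byte[] replaces the Media type.
    static final class SketchDocument {
        private final String text;
        private final byte[] media;

        SketchDocument(String text, byte[] media) {
            // Exactly one of the two must be non-null (logical XOR).
            if ((text != null) == (media != null)) {
                throw new IllegalArgumentException(
                        "exactly one of text or media must be specified");
            }
            this.text = text;
            this.media = media;
        }

        boolean isText() {
            return text != null;
        }

        String getText() {
            return text;
        }

        byte[] getMedia() {
            return media;
        }
    }

    public static void main(String[] args) {
        SketchDocument textDoc = new SketchDocument("Sample text content", null);
        System.out.println(textDoc.isText()); // prints "true"

        SketchDocument mediaDoc = new SketchDocument(null, new byte[] {1, 2, 3});
        System.out.println(mediaDoc.isText()); // prints "false"

        try {
            new SketchDocument("text", new byte[] {1}); // both set: rejected
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Checking the invariant once in the constructor means callers can rely on isText() alone to decide which accessor to use.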
-
Method Summary
| Modifier and Type | Method | Description |
|---|---|---|
| static Document.Builder | builder() | |
| boolean | equals(Object) | |
| String | getContent() | Deprecated. Use getText() instead as it more accurately reflects the content type. |
| ContentFormatter | getContentFormatter() | Deprecated. We are considering getting rid of this, please comment on https://github.com/spring-projects/spring-ai/issues/1782 |
| String | getFormattedContent() | |
| String | getFormattedContent(ContentFormatter formatter, MetadataMode metadataMode) | Helper content extractor that uses an external ContentFormatter. |
| String | getFormattedContent(MetadataMode metadataMode) | |
| String | getId() | Returns the unique identifier for this document. |
| Media | getMedia() | Returns the document's media content, if any. |
| Map<String, Object> | getMetadata() | Returns the metadata associated with this document. |
| Double | getScore() | |
| String | getText() | Returns the document's text content, if any. |
| int | hashCode() | |
| boolean | isText() | Determines whether this document contains text or media content. |
| Document.Builder | mutate() | |
| void | setContentFormatter(ContentFormatter contentFormatter) | Replace the document's ContentFormatter. |
| String | toString() | |
-
Field Details
-
DEFAULT_CONTENT_FORMATTER
-
-
Constructor Details
-
Document
-
Document
-
Document
-
Document
-
Document
-
-
Method Details
-
builder
-
getId
Returns the unique identifier for this document.
This ID is either explicitly provided during document creation or generated using the configured IdGenerator (defaults to RandomIdGenerator).
- Returns:
- the unique identifier of this document
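The "explicit ID, otherwise generated" behaviour described for getId can be sketched as follows. This is an illustration under assumptions, not the library's implementation: IdFallbackSketch, SketchIdGenerator, RANDOM, and resolveId are hypothetical names, and the random generator here simply uses java.util.UUID.

```java
import java.util.UUID;

public class IdFallbackSketch {

    // Hypothetical stand-in for an ID generator; not the Spring AI interface.
    interface SketchIdGenerator {
        String generateId(Object... contents);
    }

    // Hypothetical random generator: ignores the contents and returns a UUID string.
    static final SketchIdGenerator RANDOM = contents -> UUID.randomUUID().toString();

    // Keep an explicitly supplied ID; otherwise fall back to the generator.
    static String resolveId(String explicitId, SketchIdGenerator generator, Object... contents) {
        if (explicitId != null && !explicitId.isBlank()) {
            return explicitId;
        }
        return generator.generateId(contents);
    }

    public static void main(String[] args) {
        // An explicit ID is returned unchanged.
        System.out.println(resolveId("doc-42", RANDOM, "Sample text content")); // prints "doc-42"

        // Without an explicit ID, a fresh random ID is generated on every call.
        System.out.println(resolveId(null, RANDOM, "Sample text content"));
    }
}
```

Passing the contents to the generator leaves room for a deterministic, content-derived generator to be swapped in for the random one.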
-
getContent
Deprecated. Use getText() instead, as it more accurately reflects the content type.
-
getText
Returns the document's text content, if any.
- Returns:
- the text content if isText() is true, null otherwise
-
isText
public boolean isText()
Determines whether this document contains text or media content.
- Returns:
- true if this document contains text content (accessible via getText()), false if it contains media content (accessible via getMedia())
-
getMedia
Returns the document's media content, if any.
-
getFormattedContent
-
getFormattedContent
-
getFormattedContent
Helper content extractor that uses an external ContentFormatter.
-
getMetadata
Returns the metadata associated with this document.
The metadata values are restricted to simple types (string, int, float, boolean) for compatibility with vector databases.
- Returns:
- the metadata map
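Because metadata values are expected to be simple types, it can help to check a map before attaching it to a document. The sketch below is a hypothetical helper, not part of Spring AI: MetadataCheckSketch, isSimpleValue, and firstNonSimpleKey are illustrative names, and treating Long and Double as acceptable alongside Integer and Float is an assumption.

```java
import java.util.List;
import java.util.Map;

public class MetadataCheckSketch {

    // Assumption: "simple" means String, Boolean, or a common boxed numeric type.
    static boolean isSimpleValue(Object value) {
        return value instanceof String
                || value instanceof Boolean
                || value instanceof Integer
                || value instanceof Long
                || value instanceof Float
                || value instanceof Double;
    }

    // Returns the first key whose value is not a simple type, or null if every value is simple.
    static String firstNonSimpleKey(Map<String, Object> metadata) {
        for (Map.Entry<String, Object> entry : metadata.entrySet()) {
            if (!isSimpleValue(entry.getValue())) {
                return entry.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // All values are simple, so no offending key is reported.
        System.out.println(firstNonSimpleKey(Map.of("source", "user-input", "page", 3))); // prints "null"

        // A nested list is not a simple value, so its key is reported.
        System.out.println(firstNonSimpleKey(Map.of("tags", List.of("a", "b")))); // prints "tags"
    }
}
```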
-
getScore
-
getContentFormatter
Deprecated. We are considering getting rid of this; please comment on https://github.com/spring-projects/spring-ai/issues/1782
Returns the content formatter associated with this document.
- Returns:
- the current ContentFormatter instance used for formatting the document content.
-
setContentFormatter
Replaces the document's ContentFormatter.
- Parameters:
- contentFormatter - new formatter to use.
-
mutate
-
equals
-
hashCode
public int hashCode()
-
toString
-