Class JsoupDocumentReaderConfig.Builder
java.lang.Object
org.springframework.ai.reader.jsoup.config.JsoupDocumentReaderConfig.Builder
- Enclosing class:
- JsoupDocumentReaderConfig
-
Method Summary
Method: Description
- additionalMetadata(String key, Object value): Adds this additional metadata to all built Documents.
- additionalMetadata(Map<String, Object> additionalMetadata): Adds this additional metadata to all built Documents.
- allElements(boolean allElements): Enables extracting text from all elements in the body, creating a single document.
- build()
- charset(String charset): Sets the character encoding to use for reading the HTML.
- groupByElement(boolean groupByElement): Determines whether the content of the selected elements is read on a per-element basis.
- includeLinkUrls(boolean includeLinkUrls): Enables the inclusion of link URLs in the document metadata.
- metadataTag(String metadataTag): Adds a metadata tag name to extract from the HTML tags.
- metadataTags(List<String> metadataTags): Sets the metadata tags to extract from the HTML tags.
- selector(String selector): Sets the CSS selector to use for extracting elements.
- separator(String separator): Sets the separator string to use when joining text from multiple elements.
-
Method Details
-
charset
Sets the character encoding to use for reading the HTML. Defaults to UTF-8.
- Parameters:
- charset - The charset to use.
- Returns:
- This builder.
-
selector
Sets the CSS selector to use for extracting elements. Defaults to "body".
- Parameters:
- selector - The CSS selector.
- Returns:
- This builder.
-
separator
Sets the separator string to use when joining text from multiple elements. Defaults to "\n".
- Parameters:
- separator - The separator string.
- Returns:
- This builder.
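
The effect of a separator on joined element text can be sketched in plain Java. This is a minimal illustration only: `SeparatorSketch` and `joinTexts` are hypothetical names, not part of the library, and the sketch assumes the element texts have already been extracted into a list.

```java
import java.util.List;

public class SeparatorSketch {

    // Hypothetical helper: joins already-extracted element texts with the separator.
    // It models the documented idea, not the library's internal implementation.
    public static String joinTexts(List<String> elementTexts, String separator) {
        return String.join(separator, elementTexts);
    }

    public static void main(String[] args) {
        List<String> texts = List.of("First paragraph", "Second paragraph");

        // With the documented default separator "\n", each text lands on its own line.
        System.out.println(joinTexts(texts, "\n"));

        // With a custom separator, the texts are joined on a single line.
        System.out.println(joinTexts(texts, " | ")); // prints "First paragraph | Second paragraph"
    }
}
```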
-
allElements
Enables extracting text from all elements in the body, creating a single document. Overrides the selector setting. Defaults to false.
- Parameters:
- allElements - True to extract all text, false otherwise.
- Returns:
- This builder.
-
groupByElement
Determines whether the content of the selected elements is read on a per-element basis.
- Parameters:
- groupByElement - Whether to read text using each element as a separator.
- Returns:
- This builder.
-
includeLinkUrls
Enables the inclusion of link URLs in the document metadata. Defaults to false.
- Parameters:
- includeLinkUrls - True to include link URLs, false otherwise.
- Returns:
- This builder.
-
metadataTag
Adds a metadata tag name to extract from the HTML tags.
- Parameters:
- metadataTag - The name of the metadata tag.
- Returns:
- This builder.
-
metadataTags
Sets the metadata tags to extract from the HTML tags. Overwrites any previously added tags.
- Parameters:
- metadataTags - The list of metadata tag names.
- Returns:
- This builder.
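
The difference between adding a single tag and setting the whole tag list can be sketched with a simplified stand-in. `MetadataTagsSketch.TagBuilder` is hypothetical and is not the Spring AI builder: it models only the add versus overwrite behavior described above, and it assumes an empty starting list, which may differ from the real builder's initial state.

```java
import java.util.ArrayList;
import java.util.List;

public class MetadataTagsSketch {

    // Hypothetical stand-in; NOT the library class. Starts from an empty tag list (assumption).
    public static final class TagBuilder {
        private List<String> tags = new ArrayList<>();

        // Appends one tag name to the current list.
        public TagBuilder metadataTag(String tag) {
            this.tags.add(tag);
            return this;
        }

        // Replaces the current list, discarding any tags added earlier.
        public TagBuilder metadataTags(List<String> tags) {
            this.tags = new ArrayList<>(tags);
            return this;
        }

        // Returns an immutable snapshot of the collected tag names.
        public List<String> build() {
            return List.copyOf(this.tags);
        }
    }

    public static void main(String[] args) {
        // Two single-tag calls accumulate.
        System.out.println(new TagBuilder().metadataTag("author").metadataTag("keywords").build());
        // prints [author, keywords]

        // A later list call overwrites the tag added before it.
        System.out.println(new TagBuilder().metadataTag("author").metadataTags(List.of("description")).build());
        // prints [description]
    }
}
```

In this sketch the order of calls matters: a single-tag call placed after the list call would append to the new list rather than being discarded.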
-
additionalMetadata
Adds this additional metadata to all built Documents.
- Returns:
- This builder.
-
additionalMetadata
Adds this additional metadata to all built Documents.
- Returns:
- This builder.
-
build
-