Class JsoupDocumentReaderConfig.Builder
java.lang.Object
org.springframework.ai.reader.jsoup.config.JsoupDocumentReaderConfig.Builder
- Enclosing class:
- JsoupDocumentReaderConfig
-
Method Summary
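- JsoupDocumentReaderConfig.Builder additionalMetadata(String key, Object value): Adds this additional metadata to all built Documents.
- JsoupDocumentReaderConfig.Builder additionalMetadata(Map<String, Object> additionalMetadata): Adds this additional metadata to all built Documents.
- JsoupDocumentReaderConfig.Builder allElements(boolean allElements): Enables extracting text from all elements in the body, creating a single document.
- JsoupDocumentReaderConfig build(): Builds the configuration from the values set on this builder.
- JsoupDocumentReaderConfig.Builder charset(String charset): Sets the character encoding to use for reading the HTML.
- JsoupDocumentReaderConfig.Builder groupByElement(boolean groupByElement): Determines whether the content of the selected elements is read on a per-element basis.
- JsoupDocumentReaderConfig.Builder includeLinkUrls(boolean includeLinkUrls): Enables the inclusion of link URLs in the document metadata.
- JsoupDocumentReaderConfig.Builder metadataTag(String metadataTag): Adds a metadata tag name to extract from the HTML meta tags.
- JsoupDocumentReaderConfig.Builder metadataTags(List<String> metadataTags): Sets the metadata tags to extract from the HTML meta tags.
- JsoupDocumentReaderConfig.Builder selector(String selector): Sets the CSS selector to use for extracting elements.
- JsoupDocumentReaderConfig.Builder separator(String separator): Sets the separator string to use when joining text from multiple elements.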
-
Method Details
-
charset
Sets the character encoding to use for reading the HTML. Defaults to UTF-8.
- Parameters:
- charset - The charset to use.
- Returns:
- This builder.
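The charset matters because the raw HTML bytes have to be decoded into text before they can be parsed. The stdlib-only sketch below illustrates what a matching and a mismatched charset do to the same bytes. The class CharsetDemo and its decode method are hypothetical, not part of Spring AI, and null stands in for "not set", falling back to the documented UTF-8 default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not Spring AI code: models how a configured
// charset name is used to turn raw HTML bytes into text.
class CharsetDemo {

    // Decodes the bytes with the named charset, falling back to UTF-8
    // (the documented default) when no charset name is given.
    static String decode(byte[] html, String charsetName) {
        Charset charset = (charsetName == null)
                ? StandardCharsets.UTF_8
                : Charset.forName(charsetName);
        return new String(html, charset);
    }

    public static void main(String[] args) {
        // "café", with the accented letter written as a Unicode escape.
        byte[] utf8Bytes = "<p>caf\u00e9</p>".getBytes(StandardCharsets.UTF_8);

        // Matching charset: decode(...) returns "<p>café</p>".
        System.out.println(decode(utf8Bytes, null));

        // Mismatched charset: the two UTF-8 bytes of the accented letter are
        // read as two separate Latin-1 characters, giving "<p>cafÃ©</p>".
        System.out.println(decode(utf8Bytes, "ISO-8859-1"));
    }
}
```

A mismatch does not raise an error; it silently produces garbled text, which is why the charset should match the encoding the HTML was actually saved in.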
-
selector
Sets the CSS selector to use for extracting elements. Defaults to "body".
- Parameters:
- selector - The CSS selector.
- Returns:
- This builder.
-
separator
Sets the separator string to use when joining text from multiple elements. Defaults to "\n".
- Parameters:
- separator - The separator string.
- Returns:
- This builder.
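The separator only affects how the text of several matched elements is glued together. A minimal stdlib sketch of that joining step, assuming the element texts have already been extracted; the class SeparatorDemo and its join method are hypothetical names for illustration.

```java
import java.util.List;

// Hypothetical illustration, not Spring AI code: joins the text of several
// matched elements the way the separator setting describes.
class SeparatorDemo {

    static final String DEFAULT_SEPARATOR = "\n"; // documented default

    // Joins element texts with the given separator, or the default if null.
    static String join(List<String> elementTexts, String separator) {
        return String.join(separator == null ? DEFAULT_SEPARATOR : separator, elementTexts);
    }

    public static void main(String[] args) {
        List<String> texts = List.of("Introduction", "Details", "Summary");
        System.out.println(join(texts, null));  // one element text per line
        System.out.println(join(texts, " | ")); // Introduction | Details | Summary
    }
}
```

A visible separator such as " | " can be handy when debugging which elements were matched; the newline default keeps each element's text on its own line.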
-
allElements
Enables extracting text from all elements in the body, creating a single document. Overrides the selector setting. Defaults to false.
- Parameters:
- allElements - True to extract all text, false otherwise.
- Returns:
- This builder.
-
groupByElement
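Determines whether the text of the elements matched by the selector is read on a per-element basis, so that each selected element produces its own document.
- Parameters:
- groupByElement - True to read each selected element as a separate document, false otherwise.
- Returns:
- This builder.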
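The allElements, groupByElement, and separator settings together decide how many documents are produced and what text they hold. The sketch below models this with plain Java. The class ExtractionModes and its Element record are hypothetical; an HTML body is reduced to a flat list of elements, and selection is plain tag-name matching instead of a real CSS selector. Three points are assumptions not stated on this page: that allElements also takes precedence over groupByElement, that the all-elements text is joined with single spaces, and that without either flag the selected texts are joined with the separator into one document.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical model, not Spring AI code. An HTML body is reduced to a flat
// list of elements, and "selection" is tag-name matching instead of CSS.
class ExtractionModes {

    record Element(String tag, String text) {}

    static List<String> extract(List<Element> body, String selectorTag, String separator,
            boolean allElements, boolean groupByElement) {
        List<String> documents = new ArrayList<>();
        if (allElements) {
            // Assumption: ignores the selector (as documented) and also wins over
            // groupByElement, producing exactly one document from all body text.
            documents.add(body.stream().map(Element::text).collect(Collectors.joining(" ")));
        } else if (groupByElement) {
            // One document per selected element.
            for (Element e : body) {
                if (e.tag().equals(selectorTag)) {
                    documents.add(e.text());
                }
            }
        } else {
            // Assumption: a single document joining the selected texts with the separator.
            documents.add(body.stream()
                    .filter(e -> e.tag().equals(selectorTag))
                    .map(Element::text)
                    .collect(Collectors.joining(separator)));
        }
        return documents;
    }

    public static void main(String[] args) {
        List<Element> body = List.of(
                new Element("h1", "Title"),
                new Element("p", "First paragraph."),
                new Element("p", "Second paragraph."));
        System.out.println(extract(body, "p", "\n", true, false));  // 1 document: all body text
        System.out.println(extract(body, "p", "\n", false, true));  // 2 documents, one per <p>
        System.out.println(extract(body, "p", "\n", false, false)); // 1 document, <p> texts joined
    }
}
```

Per-element documents are usually the better fit when each selected element (an article, a section) is a meaningful unit for later embedding.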
-
includeLinkUrls
Enables the inclusion of link URLs in the document metadata. Defaults to false.
- Parameters:
- includeLinkUrls - True to include link URLs, false otherwise.
- Returns:
- This builder.
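The flag only decides whether link URLs end up in the document metadata. A minimal sketch of that decision, assuming the link targets have already been collected from the page. The class LinkUrlsDemo and the metadata key "linkUrls" are hypothetical names chosen for illustration; check the metadata the reader actually produces for the real key.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model, not Spring AI code: the metadata key "linkUrls" is an
// assumed name chosen for illustration.
class LinkUrlsDemo {

    static Map<String, Object> metadata(List<String> hrefs, boolean includeLinkUrls) {
        Map<String, Object> metadata = new HashMap<>();
        if (includeLinkUrls) {
            // Store an immutable copy so later changes to the input do not leak in.
            metadata.put("linkUrls", List.copyOf(hrefs));
        }
        return metadata;
    }

    public static void main(String[] args) {
        List<String> hrefs = List.of("https://example.com/a", "https://example.com/b");
        System.out.println(metadata(hrefs, false)); // {}
        System.out.println(metadata(hrefs, true));  // {linkUrls=[https://example.com/a, https://example.com/b]}
    }
}
```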
-
metadataTag
Adds the name of a metadata tag to extract from the HTML meta tags.
- Parameters:
- metadataTag - The name of the metadata tag.
- Returns:
- This builder.
-
metadataTags
Sets the metadata tags to extract from the HTML meta tags. Overwrites any previously added tags.
- Parameters:
- metadataTags - The list of metadata tag names.
- Returns:
- This builder.
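The difference between metadataTag and metadataTags is accumulate versus overwrite, so call order matters. The hypothetical MetadataTagsDemo class below models only that documented behaviour. It starts from an empty list, whereas the real builder may come with default tags.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not Spring AI code: mirrors only the documented
// add-versus-overwrite behaviour of metadataTag and metadataTags.
class MetadataTagsDemo {

    private final List<String> tags = new ArrayList<>();

    // Appends one tag name, keeping earlier ones.
    MetadataTagsDemo metadataTag(String tag) {
        tags.add(tag);
        return this;
    }

    // Replaces every previously added tag name with the given list.
    MetadataTagsDemo metadataTags(List<String> newTags) {
        tags.clear();
        tags.addAll(newTags);
        return this;
    }

    List<String> tags() {
        return List.copyOf(tags);
    }

    public static void main(String[] args) {
        MetadataTagsDemo b = new MetadataTagsDemo().metadataTag("author").metadataTag("keywords");
        System.out.println(b.tags()); // [author, keywords]
        b.metadataTags(List.of("description"));
        System.out.println(b.tags()); // [description]
        b.metadataTag("author");
        System.out.println(b.tags()); // [description, author]
    }
}
```

In practice, call metadataTags first to set a base list and metadataTag afterwards to extend it; the reverse order discards the individually added tags.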
-
additionalMetadata
Adds this additional metadata to all built Documents.
- Parameters:
- key - The metadata key.
- value - The metadata value.
- Returns:
- This builder.
-
additionalMetadata
Adds this additional metadata to all built Documents.
- Parameters:
- additionalMetadata - The metadata entries to add.
- Returns:
- This builder.
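The two additionalMetadata overloads differ only in shape: one takes a single key/value pair, the other a map. The sketch below models both feeding the same metadata that every built document receives. The class AdditionalMetadataDemo and its Doc record are hypothetical. It assumes that both overloads accumulate entries, as the word "Adds" suggests; whether the real map overload merges or replaces is not stated on this page.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model, not Spring AI code. Assumption: both overloads accumulate
// entries (as "Adds" suggests) and every built document gets a copy of them.
class AdditionalMetadataDemo {

    // A built document reduced to its text and metadata.
    record Doc(String text, Map<String, Object> metadata) {}

    private final Map<String, Object> additional = new HashMap<>();

    // Single-entry overload.
    AdditionalMetadataDemo additionalMetadata(String key, Object value) {
        additional.put(key, value);
        return this;
    }

    // Map overload (assumed to merge, not replace).
    AdditionalMetadataDemo additionalMetadata(Map<String, Object> entries) {
        additional.putAll(entries);
        return this;
    }

    // Attaches a fresh copy of the accumulated metadata to each document.
    List<Doc> build(List<String> texts) {
        return texts.stream()
                .map(text -> new Doc(text, new HashMap<>(additional)))
                .toList();
    }

    public static void main(String[] args) {
        List<Doc> docs = new AdditionalMetadataDemo()
                .additionalMetadata("source", "docs-site")
                .additionalMetadata(Map.of("lang", "en", "version", 2))
                .build(List.of("first chunk", "second chunk"));
        // Both documents carry source, lang and version (map order is unspecified).
        docs.forEach(d -> System.out.println(d.text() + " -> " + d.metadata()));
    }
}
```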
-
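build
Builds a JsoupDocumentReaderConfig from the values set on this builder.
- Returns:
- The built configuration.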
-
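The builder methods above all return the builder, so settings are chained and finished with build(). The self-contained sketch below illustrates that pattern using the defaults stated on this page. MiniConfig is a hypothetical stand-in, not the Spring AI class; its record layout and the static builder() factory are assumptions made for illustration, and it covers only the settings whose defaults are documented here.

```java
// Hypothetical stand-in, not the Spring AI class. Defaults are the ones stated
// on this page; the record type and field layout are assumptions.
record MiniConfig(String charset, String selector, String separator,
        boolean allElements, boolean includeLinkUrls) {

    static Builder builder() {
        return new Builder();
    }

    static final class Builder {
        private String charset = "UTF-8";
        private String selector = "body";
        private String separator = "\n";
        private boolean allElements = false;
        private boolean includeLinkUrls = false;

        Builder charset(String charset) { this.charset = charset; return this; }
        Builder selector(String selector) { this.selector = selector; return this; }
        Builder separator(String separator) { this.separator = separator; return this; }
        Builder allElements(boolean allElements) { this.allElements = allElements; return this; }
        Builder includeLinkUrls(boolean includeLinkUrls) { this.includeLinkUrls = includeLinkUrls; return this; }

        // Snapshots the current settings into an immutable record, so later
        // builder calls do not change configs that were already built.
        MiniConfig build() {
            return new MiniConfig(charset, selector, separator, allElements, includeLinkUrls);
        }
    }

    public static void main(String[] args) {
        MiniConfig defaults = MiniConfig.builder().build();
        System.out.println(defaults.selector()); // body

        MiniConfig custom = MiniConfig.builder()
                .selector("article p")
                .separator("\n\n")
                .includeLinkUrls(true)
                .build();
        System.out.println(custom.selector()); // article p
    }
}
```

Returning an immutable object from build() means one builder can safely produce several configurations without earlier ones changing underneath their readers.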