Class JsoupDocumentReaderConfig.Builder

java.lang.Object
org.springframework.ai.reader.jsoup.config.JsoupDocumentReaderConfig.Builder
Enclosing class:
JsoupDocumentReaderConfig

public static class JsoupDocumentReaderConfig.Builder extends Object
  • Method Details

    • charset

      public JsoupDocumentReaderConfig.Builder charset(String charset)
      Sets the character encoding to use for reading the HTML. Defaults to UTF-8.
      Parameters:
      charset - The charset to use.
      Returns:
      This builder.
    • selector

      public JsoupDocumentReaderConfig.Builder selector(String selector)
      Sets the CSS selector to use for extracting elements. Defaults to "body".
      Parameters:
      selector - The CSS selector.
      Returns:
      This builder.
    • separator

      public JsoupDocumentReaderConfig.Builder separator(String separator)
      Sets the separator string to use when joining text from multiple elements. Defaults to "\n".
      Parameters:
      separator - The separator string.
      Returns:
      This builder.
    • allElements

      public JsoupDocumentReaderConfig.Builder allElements(boolean allElements)
      Enables extracting text from all elements in the body, creating a single document. Overrides the selector setting. Defaults to false.
      Parameters:
      allElements - True to extract all text, false otherwise.
      Returns:
      This builder.
    • groupByElement

      public JsoupDocumentReaderConfig.Builder groupByElement(boolean groupByElement)
      Determines whether the content of the selected elements is read on a per-element basis.
      Parameters:
      groupByElement - True to read the text using each selected element as a separator, false otherwise.
      Returns:
      This builder.
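The selector, separator, allElements, and groupByElement settings together decide how many documents are produced. The following is a minimal sketch of that interaction in plain Java, not the Spring AI implementation: the class ExtractionModeSketch and the method toDocuments are hypothetical, the element texts are passed in as ready-made strings instead of being parsed from HTML, and the assumption that allElements takes precedence over groupByElement is inferred from the descriptions above rather than confirmed by them.

```java
import java.util.List;

// Hypothetical stand-in, NOT Spring AI code: it only models how the
// documented flags could map already-extracted text onto documents.
public class ExtractionModeSketch {

    // bodyText      : text of the whole body (what allElements would use)
    // selectedTexts : text of each element matched by the selector
    public static List<String> toDocuments(String bodyText, List<String> selectedTexts,
            boolean allElements, boolean groupByElement, String separator) {
        if (allElements) {
            // Assumed precedence: one document with all body text, selector ignored.
            return List.of(bodyText);
        }
        if (groupByElement) {
            // One document per selected element.
            return List.copyOf(selectedTexts);
        }
        // Default: one document, element texts joined with the separator.
        return List.of(String.join(separator, selectedTexts));
    }

    public static void main(String[] args) {
        List<String> paragraphs = List.of("First paragraph.", "Second paragraph.");
        String body = "Heading First paragraph. Second paragraph.";

        System.out.println(toDocuments(body, paragraphs, false, false, "\n").size()); // 1
        System.out.println(toDocuments(body, paragraphs, false, true, "\n").size());  // 2
        System.out.println(toDocuments(body, paragraphs, true, true, "\n").size());   // 1
    }
}
```

In this sketch the separator only matters in the default mode, because that is the only branch that joins text from several elements into one document.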
    • includeLinkUrls

      public JsoupDocumentReaderConfig.Builder includeLinkUrls(boolean includeLinkUrls)
      Enables the inclusion of link URLs in the document metadata. Defaults to false.
      Parameters:
      includeLinkUrls - True to include link URLs, false otherwise.
      Returns:
      This builder.
    • metadataTag

      public JsoupDocumentReaderConfig.Builder metadataTag(String metadataTag)
      Adds a metadata tag name to extract from the HTML tags.
      Parameters:
      metadataTag - The name of the metadata tag.
      Returns:
      This builder.
    • metadataTags

      public JsoupDocumentReaderConfig.Builder metadataTags(List<String> metadataTags)
      Sets the metadata tags to extract from the HTML tags. Overwrites any previously added tags.
      Parameters:
      metadataTags - The list of metadata tag names.
      Returns:
      This builder.
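The difference between metadataTag (append one name) and metadataTags (overwrite the list) can be sketched in plain Java. This is an illustration under stated assumptions, not the library's code: MetadataTagSketch is a hypothetical class, and it starts from an empty tag list for simplicity, whereas the builder's own initial tag list is not stated on this page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, NOT the Spring AI builder: it only models the
// documented append-versus-overwrite behaviour of the two tag methods.
public class MetadataTagSketch {

    // Assumption: starts empty; the real builder's default is not shown here.
    private List<String> tags = new ArrayList<>();

    // Mirrors metadataTag(String): appends a single tag name.
    public MetadataTagSketch metadataTag(String tag) {
        this.tags.add(tag);
        return this;
    }

    // Mirrors metadataTags(List<String>): discards previously added names.
    public MetadataTagSketch metadataTags(List<String> newTags) {
        this.tags = new ArrayList<>(newTags);
        return this;
    }

    public List<String> tags() {
        return List.copyOf(this.tags);
    }

    public static void main(String[] args) {
        List<String> appended = new MetadataTagSketch()
                .metadataTag("description")
                .metadataTag("keywords")
                .tags();
        System.out.println(appended); // [description, keywords]

        List<String> replaced = new MetadataTagSketch()
                .metadataTag("description")
                .metadataTags(List.of("author"))
                .tags();
        System.out.println(replaced); // [author]
    }
}
```

Because metadataTags overwrites, call order matters: names added with metadataTag before a metadataTags call are lost, while names added afterwards are kept.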
    • additionalMetadata

      public JsoupDocumentReaderConfig.Builder additionalMetadata(String key, Object value)
      Adds the given key-value pair as additional metadata to all built Documents.
      Parameters:
      key - The metadata key.
      value - The metadata value.
      Returns:
      This builder.
    • additionalMetadata

      public JsoupDocumentReaderConfig.Builder additionalMetadata(Map<String,Object> additionalMetadata)
      Adds the given additional metadata to all built Documents.
      Parameters:
      additionalMetadata - The map of metadata entries.
      Returns:
      This builder.
    • build

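      public JsoupDocumentReaderConfig build()
      Builds the JsoupDocumentReaderConfig from the settings on this builder.
      Returns:
      The configured JsoupDocumentReaderConfig.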