Class TokenTextSplitter

java.lang.Object
  org.springframework.ai.transformer.splitter.TextSplitter
    org.springframework.ai.transformer.splitter.TokenTextSplitter
All Implemented Interfaces:
Function<List<Document>, List<Document>>, DocumentTransformer

public class TokenTextSplitter extends TextSplitter
A TextSplitter that splits text into chunks of a target size in tokens.
Author:
Raphael Yu, Christian Tzolov, Ricken Bazolo
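TokenTextSplitter gets its Function<List<Document>, List<Document>> behavior from TextSplitter and supplies splitText, the abstract hook listed under Method Details. Judging by the signatures, splitText presumably passes the text to doSplit along with a configured chunk size. The sketch below shows that division of labor using hypothetical stand-ins. Doc replaces Document, SplitterSketch plays the TextSplitter role, and WordSplitterSketch counts whitespace-separated words instead of real tokens. Copying the source document's metadata into every chunk is an assumption about the base class. This page does not state it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for org.springframework.ai.document.Document.
record Doc(String text, Map<String, Object> metadata) {}

// TextSplitter-like role: the Function contract fans each input document
// out into one output document per chunk.
abstract class SplitterSketch implements Function<List<Doc>, List<Doc>> {

    @Override
    public List<Doc> apply(List<Doc> documents) {
        List<Doc> result = new ArrayList<>();
        for (Doc doc : documents) {
            for (String chunk : splitText(doc.text())) {
                // ASSUMPTION: each chunk gets its own copy of the source metadata.
                result.add(new Doc(chunk, new HashMap<>(doc.metadata())));
            }
        }
        return result;
    }

    // Subclasses decide how a single string is cut into chunks.
    protected abstract List<String> splitText(String text);
}

// TokenTextSplitter-like role: splitText delegates to doSplit with the
// configured chunk size. "Tokens" here are whitespace-separated words.
final class WordSplitterSketch extends SplitterSketch {

    private final int chunkSize;

    WordSplitterSketch(int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
    }

    @Override
    protected List<String> splitText(String text) {
        return doSplit(text, this.chunkSize);
    }

    // Toy doSplit: fixed windows of chunkSize words, no punctuation handling.
    protected List<String> doSplit(String text, int chunkSize) {
        List<String> chunks = new ArrayList<>();
        if (text.isBlank()) {
            return chunks;
        }
        String[] words = text.trim().split("\\s+");
        for (int i = 0; i < words.length; i += chunkSize) {
            int end = Math.min(i + chunkSize, words.length);
            chunks.add(String.join(" ", Arrays.copyOfRange(words, i, end)));
        }
        return chunks;
    }
}
```

For example, `new WordSplitterSketch(2).apply(List.of(new Doc("a b c d e", Map.of("source", "x.txt"))))` yields three documents, `"a b"`, `"c d"` and `"e"`, and each one carries `source=x.txt`.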
  • Constructor Details

    • TokenTextSplitter

      public TokenTextSplitter()
    • TokenTextSplitter

      public TokenTextSplitter(boolean keepSeparator)
    • TokenTextSplitter

      public TokenTextSplitter(int chunkSize, int minChunkSizeChars, int minChunkLengthToEmbed, int maxNumChunks, boolean keepSeparator)
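The five-argument constructor exposes every setting, and the two shorter overloads presumably fill the rest with defaults. Our reading of the Spring AI source is as follows, though this page does not document it. chunkSize is the target number of tokens per chunk. minChunkSizeChars is the minimum character offset at which a punctuation cut is allowed. minChunkLengthToEmbed is the length a chunk must exceed to be kept. maxNumChunks caps the number of chunks. keepSeparator decides whether line separators survive in the output. The static builder() factory listed under Method Details offers the same settings fluently.

The record below is a hypothetical mirror of those settings, not the real class. The default values and the with... builder method names are assumptions modeled on recent Spring AI versions, so verify them against the version you use.

```java
// Hypothetical mirror of TokenTextSplitter's five settings; not the real class.
record SplitterSettings(int chunkSize, int minChunkSizeChars, int minChunkLengthToEmbed,
                        int maxNumChunks, boolean keepSeparator) {

    // ASSUMED defaults, modeled on recent Spring AI sources; verify for your version.
    static final int DEFAULT_CHUNK_SIZE = 800;
    static final int DEFAULT_MIN_CHUNK_SIZE_CHARS = 350;
    static final int DEFAULT_MIN_CHUNK_LENGTH_TO_EMBED = 5;
    static final int DEFAULT_MAX_NUM_CHUNKS = 10_000;
    static final boolean DEFAULT_KEEP_SEPARATOR = true;

    // Mirrors TokenTextSplitter(): every setting takes its default.
    SplitterSettings() {
        this(DEFAULT_KEEP_SEPARATOR);
    }

    // Mirrors TokenTextSplitter(boolean keepSeparator): only the flag is chosen.
    SplitterSettings(boolean keepSeparator) {
        this(DEFAULT_CHUNK_SIZE, DEFAULT_MIN_CHUNK_SIZE_CHARS, DEFAULT_MIN_CHUNK_LENGTH_TO_EMBED,
                DEFAULT_MAX_NUM_CHUNKS, keepSeparator);
    }

    // ASSUMPTION: keepSeparator=false flattens platform line separators to spaces.
    String finishChunk(String chunkText) {
        return keepSeparator
                ? chunkText.trim()
                : chunkText.replace(System.lineSeparator(), " ").trim();
    }

    static Builder builder() {
        return new Builder();
    }

    // Fluent alternative to the five-argument constructor; unset fields keep defaults.
    // The with... names are an assumption modeled on Spring AI's Builder.
    static final class Builder {
        private int chunkSize = DEFAULT_CHUNK_SIZE;
        private int minChunkSizeChars = DEFAULT_MIN_CHUNK_SIZE_CHARS;
        private int minChunkLengthToEmbed = DEFAULT_MIN_CHUNK_LENGTH_TO_EMBED;
        private int maxNumChunks = DEFAULT_MAX_NUM_CHUNKS;
        private boolean keepSeparator = DEFAULT_KEEP_SEPARATOR;

        Builder withChunkSize(int value) { this.chunkSize = value; return this; }
        Builder withMinChunkSizeChars(int value) { this.minChunkSizeChars = value; return this; }
        Builder withMinChunkLengthToEmbed(int value) { this.minChunkLengthToEmbed = value; return this; }
        Builder withMaxNumChunks(int value) { this.maxNumChunks = value; return this; }
        Builder withKeepSeparator(boolean value) { this.keepSeparator = value; return this; }

        SplitterSettings build() {
            return new SplitterSettings(chunkSize, minChunkSizeChars, minChunkLengthToEmbed,
                    maxNumChunks, keepSeparator);
        }
    }
}
```

With this sketch, `SplitterSettings.builder().withChunkSize(256).withKeepSeparator(false).build()` changes only those two settings. Calling `finishChunk` on that result turns a two-line chunk into a single line joined by a space.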
  • Method Details

    • builder

      public static TokenTextSplitter.Builder builder()
    • splitText

      protected List<String> splitText(String text)
      Specified by:
      splitText in class TextSplitter
    • doSplit

      protected List<String> doSplit(String text, int chunkSize)
      Splits text into chunks based on token count.

      Punctuation-based splitting applies only when the token count exceeds the chunk size (tokens.size() > chunkSize). Text whose token count equals or is below the chunk size is returned as a single chunk, without punctuation-based truncation.

      Parameters:
      text - the text to split
      chunkSize - the target chunk size in tokens
      Returns:
      list of text chunks
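A small model of doSplit makes the punctuation rule easier to see. The sketch below is not the Spring AI implementation. It treats each space-separated word as a token, whereas the real splitter uses a BPE encoding. It also looks for '.', '?', '!' or a newline only at the end of a token, and it leaves out keepSeparator handling. The names minChunkSizeChars, minChunkLengthToEmbed and maxNumChunks are borrowed from the five-argument constructor. Their roles here are our reading of the library rather than something documented on this page:

- minChunkSizeChars is the minimum character offset before a punctuation cut is allowed.
- minChunkLengthToEmbed is the length a chunk must exceed to be kept.
- maxNumChunks caps the number of chunks, and any leftover tokens are emitted as one final chunk.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of doSplit under a toy tokenizer. ASSUMPTION: one token per
// space-separated word, and punctuation is only recognized at token ends.
final class TokenChunkSketch {

    // Toy encoder: split on runs of spaces; newlines stay inside tokens.
    static List<String> encode(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split(" +")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Toy decoder: rejoin tokens with single spaces.
    static String decode(List<String> tokens) {
        return String.join(" ", tokens);
    }

    static boolean endsWithPunctuation(String token) {
        char last = token.charAt(token.length() - 1);
        return last == '.' || last == '?' || last == '!' || last == '\n';
    }

    static List<String> doSplit(String text, int chunkSize, int minChunkSizeChars,
                                int minChunkLengthToEmbed, int maxNumChunks) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<String> chunks = new ArrayList<>();
        if (text == null || text.isBlank()) {
            return chunks;
        }
        List<String> tokens = encode(text);
        int start = 0;
        int numChunks = 0;
        while (start < tokens.size() && numChunks < maxNumChunks) {
            int remaining = tokens.size() - start;
            List<String> window = tokens.subList(start, start + Math.min(chunkSize, remaining));
            int cut = window.size();
            // Truncate at punctuation only when the remaining tokens exceed chunkSize.
            if (remaining > chunkSize) {
                for (int i = window.size() - 1; i >= 0; i--) {
                    if (endsWithPunctuation(window.get(i))) {
                        int punctIndex = decode(window.subList(0, i + 1)).length() - 1;
                        if (punctIndex > minChunkSizeChars) {
                            cut = i + 1;
                        }
                        break; // only the last punctuation mark is considered
                    }
                }
            }
            String chunkText = decode(window.subList(0, cut)).strip();
            start += cut;
            if (chunkText.isEmpty()) {
                continue; // whitespace-only window: skip it without counting a chunk
            }
            if (chunkText.length() > minChunkLengthToEmbed) {
                chunks.add(chunkText);
            }
            numChunks++;
        }
        // Tokens left over after maxNumChunks become one final chunk.
        if (start < tokens.size()) {
            String rest = decode(tokens.subList(start, tokens.size())).strip();
            if (rest.length() > minChunkLengthToEmbed) {
                chunks.add(rest);
            }
        }
        return chunks;
    }
}
```

With the toy tokenizer, `doSplit("a. b c", 3, 0, 0, 100)` returns `["a. b c"]` because three tokens do not exceed a chunk size of 3. By contrast, `doSplit("a. b c d", 3, 0, 0, 100)` returns `["a.", "b c d"]`: the four tokens exceed the chunk size, so the first window is cut after its last punctuation mark.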