Class TokenTextSplitter
java.lang.Object
org.springframework.ai.transformer.splitter.TextSplitter
org.springframework.ai.transformer.splitter.TokenTextSplitter
A TextSplitter that splits text into chunks of a target size in tokens.
- Author:
- Raphael Yu, Christian Tzolov, Ricken Bazolo
-
Nested Class Summary
Nested Classes -
Constructor Summary
Constructors
TokenTextSplitter()
TokenTextSplitter(boolean keepSeparator)
TokenTextSplitter(int chunkSize, int minChunkSizeChars, int minChunkLengthToEmbed, int maxNumChunks, boolean keepSeparator)
-
Method Summary
Methods inherited from class org.springframework.ai.transformer.splitter.TextSplitter
apply, isCopyContentFormatter, setCopyContentFormatter, split, split
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.springframework.ai.document.DocumentTransformer
transform
-
Constructor Details
-
TokenTextSplitter
public TokenTextSplitter()
-
TokenTextSplitter
public TokenTextSplitter(boolean keepSeparator)
-
TokenTextSplitter
public TokenTextSplitter(int chunkSize, int minChunkSizeChars, int minChunkLengthToEmbed, int maxNumChunks, boolean keepSeparator)
-
-
Method Details
-
builder
-
splitText
- Specified by:
- splitText in class TextSplitter
-
doSplit
Splits text into chunks based on token count.
Punctuation-based splitting only applies when the token count exceeds the chunk size (tokens.size() > chunkSize). Text whose token count is equal to or smaller than the chunk size is returned as a single chunk without punctuation-based truncation.
- Parameters:
- text - the text to split
- chunkSize - the target chunk size in tokens
- Returns:
- list of text chunks
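The rule described for doSplit can be illustrated with a small, self-contained sketch. This is not the library's implementation: the class name ChunkSketch, the whitespace-based tokenize helper, and the endsSentence check are hypothetical stand-ins chosen for illustration, and the sketch ignores minChunkSizeChars, minChunkLengthToEmbed, maxNumChunks, and keepSeparator. It only shows the condition that punctuation-based truncation is attempted when the remaining token count exceeds chunkSize.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; names and tokenization are assumptions, not Spring AI code.
final class ChunkSketch {

    // Hypothetical tokenizer: one token per whitespace-separated word.
    static List<String> tokenize(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
    }

    // Hypothetical sentence-end check on a single token.
    static boolean endsSentence(String token) {
        return token.endsWith(".") || token.endsWith("?") || token.endsWith("!");
    }

    static List<String> split(String text, int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be at least 1");
        }
        List<String> tokens = tokenize(text);
        List<String> chunks = new ArrayList<>();
        while (!tokens.isEmpty()) {
            int take = Math.min(chunkSize, tokens.size());
            // Truncate at the last sentence end only when the remaining
            // token count exceeds the chunk size (tokens.size() > chunkSize).
            if (tokens.size() > chunkSize) {
                for (int i = take - 1; i >= 0; i--) {
                    if (endsSentence(tokens.get(i))) {
                        take = i + 1;
                        break;
                    }
                }
            }
            chunks.add(String.join(" ", tokens.subList(0, take)));
            tokens = new ArrayList<>(tokens.subList(take, tokens.size()));
        }
        return chunks;
    }

    public static void main(String[] args) {
        String text = "One two three. Four five six seven.";
        // 7 tokens > 5: the first chunk is cut back to the last sentence end.
        System.out.println(split(text, 5));
        // 7 tokens == 7: returned as a single chunk, no truncation.
        System.out.println(split(text, 7));
    }
}
```

With a chunk size of 5 the seven-token input is cut after "three." and the remaining four tokens form the second chunk; with a chunk size of 7 the input comes back unchanged as one chunk. The real class works on encoded tokens rather than words, so actual chunk boundaries will differ.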
-