Class TokenTextSplitter
java.lang.Object
org.springframework.ai.transformer.splitter.TextSplitter
org.springframework.ai.transformer.splitter.TokenTextSplitter
A TextSplitter that splits text into chunks of a target size in tokens.
- Author:
- Raphael Yu, Christian Tzolov, Ricken Bazolo
-
Nested Class Summary
Nested Classes -
Constructor Summary
Constructors
TokenTextSplitter(boolean keepSeparator)
TokenTextSplitter(int chunkSize, int minChunkSizeChars, int minChunkLengthToEmbed, int maxNumChunks, boolean keepSeparator)
-
Method Summary
Methods inherited from class TextSplitter
apply, isCopyContentFormatter, setCopyContentFormatter, split, split
Methods inherited from class Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface DocumentTransformer
transform
-
Constructor Details
-
TokenTextSplitter
public TokenTextSplitter()
-
TokenTextSplitter
public TokenTextSplitter(boolean keepSeparator)
-
TokenTextSplitter
public TokenTextSplitter(int chunkSize, int minChunkSizeChars, int minChunkLengthToEmbed, int maxNumChunks, boolean keepSeparator)
-
-
Method Details
-
builder
-
splitText
- Specified by:
splitText in class TextSplitter
-
doSplit
Splits text into chunks based on token count.
Punctuation-based splitting only applies when the token count exceeds the chunk size (tokens.size() > chunkSize). Text whose token count is equal to or smaller than the chunk size is returned as a single chunk without punctuation-based truncation.
- Parameters:
text - the text to split
chunkSize - the target chunk size in tokens
- Returns:
- list of text chunks
-
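The rule described for doSplit can be illustrated with a minimal, self-contained sketch. This is not the library's implementation. The class name TokenChunkSketch and its helpers tokenize and endsSentence are hypothetical, and whitespace-separated words stand in for real tokens as a simplifying assumption. The sketch only shows the documented condition: a chunk is truncated at punctuation only while more tokens remain than fit in one chunk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; not the TokenTextSplitter source.
class TokenChunkSketch {

    // Assumption: one "token" per whitespace-separated word (a real tokenizer differs).
    static List<String> tokenize(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
    }

    // Hypothetical helper: treats '.', '?' and '!' as sentence-ending punctuation.
    static boolean endsSentence(String token) {
        return token.endsWith(".") || token.endsWith("?") || token.endsWith("!");
    }

    static List<String> split(String text, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<String> tokens = tokenize(text);
        List<String> chunks = new ArrayList<>();
        while (!tokens.isEmpty()) {
            int take = Math.min(chunkSize, tokens.size());
            // Punctuation-based truncation applies only when tokens.size() > chunkSize.
            if (tokens.size() > chunkSize) {
                for (int i = take - 1; i >= 0; i--) {
                    if (endsSentence(tokens.get(i))) {
                        take = i + 1; // cut right after the last sentence end in the window
                        break;
                    }
                }
            }
            chunks.add(String.join(" ", tokens.subList(0, take)));
            // Drop the consumed tokens and continue with the remainder.
            tokens = new ArrayList<>(tokens.subList(take, tokens.size()));
        }
        return chunks;
    }
}
```

With a chunk size of 3, the five-word input "One two. Three four five" is cut after "two." because five tokens exceed the chunk size, while the three-word input "One two. Three" comes back as a single chunk because its token count equals the chunk size.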