Package org.springframework.ai.reader
Class ExtractedTextFormatter
java.lang.Object
org.springframework.ai.reader.ExtractedTextFormatter
A utility to reformat extracted text content before encapsulating it in a Document. This formatter provides the following functionalities:
- Left alignment of text
- Removal of specified lines from the beginning and end of content
- Consolidation of consecutive blank lines
Instances of the formatter are created and customized through the ExtractedTextFormatter.Builder nested class.
Author:
- Christian Tzolov
Nested Class Summary
Modifier and Type: static class
Class: ExtractedTextFormatter.Builder
Description: The Builder class is a nested static class of ExtractedTextFormatter designed to facilitate the creation and customization of instances of ExtractedTextFormatter.
Method Summary
Each entry lists the modifier and type, the method, and its description.
- static String alignToLeft(String pageText)
- static ExtractedTextFormatter.Builder builder(): Provides an instance of the builder for this formatter.
- static ExtractedTextFormatter defaults(): Provides a default instance of the formatter.
- static String deleteBottomTextLines(String pageText, int numberOfLines): Removes the specified number of lines from the bottom part of the text.
- static String deleteTopTextLines(String pageText, int numberOfLines): Removes a specified number of lines from the top part of the given text.
- String format(String pageText): Formats the provided text according to the formatter's configuration.
- String format(String pageText, int pageNumber): Formats the provided text based on the formatter's configuration, considering the page number.
- static String trimAdjacentBlankLines(String pageText): Replaces multiple adjacent blank lines with a single blank line.
Method Details
builder
Provides an instance of the builder for this formatter.
Returns:
- an instance of the builder.
defaults
Provides a default instance of the formatter.
Returns:
- default instance of the formatter.
format
Formats the provided text according to the formatter's configuration.
Parameters:
- pageText - Text to be formatted.
Returns:
- Formatted text.
format
Formats the provided text based on the formatter's configuration, considering the page number.
Parameters:
- pageText - Text to be formatted.
- pageNumber - Page number of the provided text.
Returns:
- Formatted text.
trimAdjacentBlankLines
Replaces multiple adjacent blank lines with a single blank line.
Parameters:
- pageText - Text to adjust the blank lines for.
Returns:
- The same text with adjacent blank lines collapsed.
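The blank-line consolidation described above can be sketched in plain Java. This is an illustration only, not the library's implementation: the class BlankLineSketch and the method collapseBlankLines are hypothetical names, and the sketch assumes that a "blank" line is one containing only whitespace and that the output uses "\n" as its line break.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of org.springframework.ai.reader.
class BlankLineSketch {

    // Collapses every run of adjacent blank lines into a single empty line.
    // Assumption: whitespace-only lines count as blank; output is joined with "\n".
    static String collapseBlankLines(String text) {
        if (text == null || text.isEmpty()) {
            return text;
        }
        List<String> kept = new ArrayList<>();
        boolean previousBlank = false;
        // "\\R" matches any line break; the -1 limit keeps trailing empty segments.
        for (String line : text.split("\\R", -1)) {
            boolean blank = line.trim().isEmpty();
            if (blank && previousBlank) {
                continue; // skip the second and later blank lines of a run
            }
            kept.add(blank ? "" : line);
            previousBlank = blank;
        }
        return String.join("\n", kept);
    }
}
```

Iterating line by line keeps the rule explicit; a single regular-expression replacement would also work but is harder to read when blank lines contain stray spaces or tabs.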
alignToLeft
Parameters:
- pageText - Text to align.
Returns:
- The same text aligned to the left side.
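The page does not spell out what left alignment means. A minimal sketch, assuming it amounts to removing leading whitespace from every line, could look as follows. LeftAlignSketch and stripLeadingWhitespace are hypothetical names, and the library's actual behavior may differ.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical helper, not part of org.springframework.ai.reader.
class LeftAlignSketch {

    // Removes leading whitespace from each line.
    // Assumption: "aligned to the left" means no leading spaces or tabs;
    // output lines are joined with "\n".
    static String stripLeadingWhitespace(String text) {
        if (text == null || text.isEmpty()) {
            return text;
        }
        return Arrays.stream(text.split("\\R", -1))
                .map(line -> line.replaceFirst("^\\s+", ""))
                .collect(Collectors.joining("\n"));
    }
}
```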
deleteBottomTextLines
Removes the specified number of lines from the bottom part of the text.
Parameters:
- pageText - Text to remove lines from.
- numberOfLines - Number of lines to remove.
Returns:
- The text with the last lines stripped.
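Removing trailing lines can be sketched as below. The edge cases are assumptions, since this page does not document them for deleteBottomTextLines: blank input and non-positive counts are returned unchanged, a count at or above the line count yields an empty string, and a trailing line break is not counted as an extra line. BottomLinesSketch and dropLastLines are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical helper, not part of org.springframework.ai.reader.
class BottomLinesSketch {

    // Drops the last `count` lines of `text`.
    // Assumptions: see the lead-in; output lines are joined with "\n".
    static String dropLastLines(String text, int count) {
        if (text == null || text.trim().isEmpty() || count <= 0) {
            return text;
        }
        // Without a limit argument, split discards trailing empty segments,
        // so a final line break does not create an extra empty line.
        String[] lines = text.split("\\R");
        if (count >= lines.length) {
            return "";
        }
        return String.join("\n", Arrays.copyOfRange(lines, 0, lines.length - count));
    }
}
```

This kind of trimming is typically useful for stripping repeated page footers from extracted text.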
deleteTopTextLines
Removes a specified number of lines from the top part of the given text.
This method takes a text and trims it by removing a certain number of lines from the top. If the provided text is null or contains only whitespace, it will be returned as is. If the number of lines to remove exceeds the actual number of lines in the text, the result will be an empty string.
The method identifies lines based on the system's line separator, making it compatible with different platforms.
Parameters:
- pageText - The text from which the top lines need to be removed. If this is null, empty, or consists only of whitespace, it will be returned unchanged.
- numberOfLines - The number of lines to remove from the top of the text. If this exceeds the actual number of lines in the text, an empty string will be returned.
Returns:
- The text with the specified number of lines removed from the top.
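The rules documented for deleteTopTextLines (blank input returned as is, an empty string when too many lines are requested, lines split on the system line separator) can be sketched as follows. This is an illustration under those stated rules rather than the library's code. TopLinesSketch and dropFirstLines are hypothetical names, and returning the text unchanged for a non-positive count is an added assumption.

```java
// Hypothetical helper, not part of org.springframework.ai.reader.
class TopLinesSketch {

    // Drops the first `count` lines of `text`, where lines are delimited by
    // System.lineSeparator().
    static String dropFirstLines(String text, int count) {
        // Null or whitespace-only text is returned as is.
        // Assumption: a non-positive count also leaves the text unchanged.
        if (text == null || text.trim().isEmpty() || count <= 0) {
            return text;
        }
        String separator = System.lineSeparator();
        int start = 0;
        for (int i = 0; i < count; i++) {
            int next = text.indexOf(separator, start);
            if (next < 0) {
                // Fewer separators than lines to drop: nothing is left.
                return "";
            }
            start = next + separator.length();
        }
        return text.substring(start);
    }
}
```

Because the sketch splits only on the platform separator, text produced on another platform (for example "\r\n" content processed on Linux) would need its line breaks normalized first.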