Class ParagraphManager
java.lang.Object
org.springframework.ai.reader.pdf.config.ParagraphManager
The ParagraphManager class is responsible for managing the paragraphs and hierarchy of
a PDF document. It can process bookmarks and generate a structured tree of paragraphs,
representing the table of contents (TOC) of the PDF document.
Author: Christian Tzolov
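To make the "structured tree of paragraphs" concrete, here is a minimal, self-contained sketch of a table-of-contents tree and a depth-first walk over it. `TocEntry`, `TocSketch`, `sampleToc` and `flattenTitles` are hypothetical names invented for illustration. They are not the library's `ParagraphManager.Paragraph` record, and the library's own `flatten()` may behave differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for a TOC entry (not the library's Paragraph record).
record TocEntry(String title, int level, List<TocEntry> children) {
}

class TocSketch {

    // Builds a tiny two-level table of contents by hand.
    static TocEntry sampleToc() {
        TocEntry setup = new TocEntry("1.1 Setup", 2, List.of());
        TocEntry usage = new TocEntry("1.2 Usage", 2, List.of());
        TocEntry intro = new TocEntry("1 Introduction", 1, List.of(setup, usage));
        TocEntry reference = new TocEntry("2 Reference", 1, List.of());
        return new TocEntry("root", 0, List.of(intro, reference));
    }

    // Depth-first, pre-order walk that lists the title of every entry below the root.
    static List<String> flattenTitles(TocEntry root) {
        List<String> titles = new ArrayList<>();
        for (TocEntry child : root.children()) {
            collect(child, titles);
        }
        return titles;
    }

    private static void collect(TocEntry entry, List<String> out) {
        out.add(entry.title());
        for (TocEntry child : entry.children()) {
            collect(child, out);
        }
    }

    public static void main(String[] args) {
        // Prints [1 Introduction, 1.1 Setup, 1.2 Usage, 2 Reference]
        System.out.println(flattenTitles(sampleToc()));
    }
}
```

The synthetic root is skipped in the output on the assumption that it only anchors the top-level entries and carries no content of its own.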
Nested Class Summary
static final record ParagraphManager.Paragraph
    Represents the metadata and hierarchy of a document paragraph.
Constructor Summary
ParagraphManager(org.apache.pdfbox.pdmodel.PDDocument document)
Method Summary
flatten()
protected ParagraphManager.Paragraph generateParagraphs(ParagraphManager.Paragraph parentParagraph, org.apache.pdfbox.pdmodel.interactive.documentnavigation.outline.PDOutlineNode bookmark, Integer level)
    For the given PDOutlineNode bookmark, converts all sibling PDOutlineItem items into ParagraphManager.Paragraph instances under parentParagraph.
List<ParagraphManager.Paragraph> getParagraphsByLevel(ParagraphManager.Paragraph paragraph, int level, boolean interLevelText)
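The sibling-to-paragraph conversion described for generateParagraphs can be sketched without PDFBox as follows. `OutlineItem`, `Section` and `OutlineSketch` are hypothetical stand-ins. The first-child and next-sibling links mirror the navigation PDFBox offers through `PDOutlineNode.getFirstChild()` and `PDOutlineItem.getNextSibling()`, but this is an illustration under those assumptions, not the library's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an outline item: a title plus first-child and next-sibling links.
class OutlineItem {
    final String title;
    OutlineItem firstChild;
    OutlineItem nextSibling;

    OutlineItem(String title) {
        this.title = title;
    }
}

// Hypothetical stand-in for a paragraph node in the resulting tree.
class Section {
    final String title;
    final int level;
    final List<Section> children = new ArrayList<>();

    Section(String title, int level) {
        this.title = title;
        this.level = level;
    }
}

class OutlineSketch {

    // Converts every child item of `bookmark` into a Section under `parent`,
    // following the sibling chain and recursing into nested items.
    static Section generateSections(Section parent, OutlineItem bookmark, int level) {
        OutlineItem current = bookmark.firstChild;
        while (current != null) {
            Section section = new Section(current.title, level);
            parent.children.add(section);
            generateSections(section, current, level + 1);
            current = current.nextSibling;
        }
        return parent;
    }

    // Builds a small outline (A with child A.1, then B) and converts it.
    static Section sample() {
        OutlineItem outline = new OutlineItem("outline");
        OutlineItem a = new OutlineItem("A");
        OutlineItem b = new OutlineItem("B");
        OutlineItem a1 = new OutlineItem("A.1");
        outline.firstChild = a;
        a.nextSibling = b;
        a.firstChild = a1;
        return generateSections(new Section("root", 0), outline, 1);
    }

    public static void main(String[] args) {
        Section root = sample();
        for (Section top : root.children) {
            System.out.println(top.level + " " + top.title + " (" + top.children.size() + " children)");
        }
    }
}
```

Passing the level down explicitly, rather than computing it from the tree afterwards, keeps each node's depth available at creation time.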
Constructor Details
ParagraphManager
public ParagraphManager(org.apache.pdfbox.pdmodel.PDDocument document)
Method Details
flatten
getParagraphsByLevel
public List<ParagraphManager.Paragraph> getParagraphsByLevel(ParagraphManager.Paragraph paragraph, int level, boolean interLevelText)
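A level-based selection over a paragraph tree might look like the sketch below. `Node`, `LevelSketch`, `titlesByLevel` and the `includeAncestors` flag are hypothetical. In particular, treating a flag such as interLevelText as "also emit the entries above the target level" is an assumption made for this example, since this page does not document the parameter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified tree node (not the library's Paragraph record).
record Node(String title, int level, List<Node> children) {
}

class LevelSketch {

    // Collects the titles of nodes at exactly `targetLevel`.
    // ASSUMPTION: when `includeAncestors` is true, nodes above the target level
    // that have children are emitted as well, ahead of their descendants.
    static List<String> titlesByLevel(Node node, int targetLevel, boolean includeAncestors) {
        List<String> result = new ArrayList<>();
        if (node.level() == targetLevel) {
            result.add(node.title());
        }
        else if (node.level() < targetLevel && !node.children().isEmpty()) {
            if (includeAncestors) {
                result.add(node.title());
            }
            for (Node child : node.children()) {
                result.addAll(titlesByLevel(child, targetLevel, includeAncestors));
            }
        }
        return result;
    }

    // Sample tree: root -> A (A.1, A.2), B.
    static Node sample() {
        Node a1 = new Node("A.1", 2, List.of());
        Node a2 = new Node("A.2", 2, List.of());
        Node a = new Node("A", 1, List.of(a1, a2));
        Node b = new Node("B", 1, List.of());
        return new Node("root", 0, List.of(a, b));
    }

    public static void main(String[] args) {
        System.out.println(titlesByLevel(sample(), 2, false)); // prints [A.1, A.2]
        System.out.println(titlesByLevel(sample(), 2, true));  // prints [root, A, A.1, A.2]
    }
}
```

In this sketch a branch that ends above the target level (here `B`) contributes nothing; whether the library behaves the same way is not stated on this page.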