Why is this diagram NOT extracted as images by pymupdf4llm.to_markdown(write_images=True)?

Zhaobin · July 18, 2025, 10:03am

Hi there, I wonder why this vector graphic is not extracted as an image by pymupdf4llm.to_markdown("uart.pdf", write_images=True).

https://github.com/pymupdf/RAG/issues/75#issuecomment-2228835925
I found this comment. According to its description of how “significant” vector graphics are detected, this one should be detected, because the rectangle contains text rather than being empty.

This is the original PDF; the diagram is on page 5.
uart.pdf (221.4 KB)

Jamie_Lemon · July 18, 2025, 1:18pm

Strange. I agree, I can’t see the drawings in the MD content (I made sure to set ignore_graphics=False). However, if I do:

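import pymupdf

doc = pymupdf.open("uart.pdf")
page = doc[4]  # page 5 (pages are zero-indexed)
drawings = page.get_drawings()  # list of dicts, one per vector path

for drawing in drawings:
    print(drawing)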

I can get the drawing info okay. I need to investigate what is going on with PyMuPDF4LLM here.
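When inspecting a long list of drawings, it can help to tally the paths by kind before reading each one. The following is only a sketch: it assumes each entry is a dictionary with a "type" key as returned by page.get_drawings() ("f" for fill-only, "s" for stroke-only, "fs" for both), and the helper name tally_drawing_types and the sample data are made up for illustration.

```python
from collections import Counter


def tally_drawing_types(drawings):
    """Count vector paths by their 'type' key ('f', 's' or 'fs')."""
    return Counter(d.get("type") for d in drawings)


# Hand-made stand-ins for get_drawings() entries (hypothetical data,
# not taken from uart.pdf).
sample = [
    {"type": "f", "fill": (1.0, 1.0, 1.0), "color": None},
    {"type": "s", "fill": None, "color": (0.0, 0.0, 0.0)},
    {"type": "f", "fill": (1.0, 1.0, 1.0), "color": None},
]

counts = tally_drawing_types(sample)
print(counts)
```

With a real page, one would pass page.get_drawings() instead of the sample list. A large share of fill-only paths would be a hint to look at their fill colors next.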

Jamie_Lemon · July 18, 2025, 1:23pm

Aha, I see there is also an answer in the GitHub issue: [Bug] A specific diagram recognized as significant is not extracted as images by pymupdf4llm.to_markdown (Issue #296, pymupdf/RAG)

HaraldLieder · July 18, 2025, 5:24pm

There is obviously a bug in pymupdf4llm. We usually ignore fill-only vectors (those without borders) that have the same color as the page background, which is white in this case.
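The rule described above can be sketched roughly as follows. This is not the actual pymupdf4llm code, only an illustration under stated assumptions: drawings are dictionaries shaped like page.get_drawings() entries (with "type" and "fill" keys), the background is assumed to be white, and the names is_background_fill, significant_drawings and the tolerance value are hypothetical.

```python
WHITE = (1.0, 1.0, 1.0)


def is_background_fill(drawing, background=WHITE, tol=1e-3):
    """True for a fill-only path whose fill color matches the page background."""
    if drawing.get("type") != "f":  # stroked paths have a visible border
        return False
    fill = drawing.get("fill")
    if fill is None or len(fill) != len(background):
        return False
    return all(abs(a - b) <= tol for a, b in zip(fill, background))


def significant_drawings(drawings, background=WHITE):
    """Drop paths that would be invisible against the background."""
    return [d for d in drawings if not is_background_fill(d, background)]


# Hand-made sample entries (hypothetical data, not taken from uart.pdf).
sample = [
    {"type": "f", "fill": (1.0, 1.0, 1.0), "color": None},   # white fill only
    {"type": "s", "fill": None, "color": (0.0, 0.0, 0.0)},   # black border
    {"type": "f", "fill": (0.5, 0.5, 0.5), "color": None},   # grey fill only
]

kept = significant_drawings(sample)
print(len(kept))
```

Only the white fill-only path is dropped in this sample; the bordered path and the grey fill are kept. A diagram whose boxes are drawn with borders would therefore be expected to survive such a filter.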
