Why is this diagram NOT extracted as an image by pymupdf4llm.to_markdown(write_images=True)?

Hi there, I wonder why this vector graphic is not extracted as an image by pymupdf4llm.to_markdown("uart.pdf", write_images=True).

https://github.com/pymupdf/RAG/issues/75#issuecomment-2228835925
I found this comment. According to its description of how “significant” vector graphics are detected, this diagram should be detected, because the rectangle contains text rather than being empty.

This is the original PDF; the diagram is on page 5.
uart.pdf (221.4 KB)

Strange. I agree, I can’t see the drawings in the MD content (I made sure to have ignore_graphics=False). However, if I do:

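import pymupdf

doc = pymupdf.open("uart.pdf")
page = doc[4]  # page 5 (0-based index)
drawings = page.get_drawings()  # list of dicts, one per vector path

for drawing in drawings:
    print(drawing)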

I can get the drawing info okay. Need to investigate what is going on with PyMuPDF4LLM here …

Aha, I appreciate the answer in the issue here too: [Bug] A specific diagram recognized as significant is not extracted as images by pymupdf4llm.to_markdown (Issue #296, pymupdf/RAG on GitHub)

There is obviously a bug in pymupdf4llm. We usually ignore fill-only vectors (those with no borders) that have the same color as the page background, which is white in this case.
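
To check whether that rule is what affects page 5, something like the rough sketch below could help. It is not pymupdf4llm's actual code. The helper names (is_background_fill, split_drawings, report) are made up for illustration, and the white background and tolerance are assumptions. It relies only on the "type" and "fill" keys of the dicts returned by page.get_drawings().

```python
WHITE = (1.0, 1.0, 1.0)  # assumed page background color (RGB, 0..1)


def is_background_fill(drawing, background=WHITE, tol=1e-3):
    """Return True if the path is fill-only and its fill matches the background.

    `drawing` is a dict shaped like the items of page.get_drawings():
    "type" is "f" (fill only), "s" (stroke only) or "fs" (both),
    and "fill" is an RGB tuple or None.
    """
    if drawing.get("type") != "f":
        return False
    fill = drawing.get("fill")
    if fill is None or len(fill) != len(background):
        return False
    return all(abs(a - b) <= tol for a, b in zip(fill, background))


def split_drawings(drawings, background=WHITE):
    """Split paths into (kept, ignored) using the hypothetical rule above."""
    kept, ignored = [], []
    for d in drawings:
        (ignored if is_background_fill(d, background) else kept).append(d)
    return kept, ignored


def report(pdf_path="uart.pdf", page_index=4):
    """Print how many paths on one page the rule would drop."""
    import pymupdf  # imported here so the helpers above work without it

    doc = pymupdf.open(pdf_path)
    drawings = doc[page_index].get_drawings()
    kept, ignored = split_drawings(drawings)
    print(f"{len(drawings)} paths: {len(kept)} kept, "
          f"{len(ignored)} background-colored fill-only paths")
```

Calling report() on uart.pdf shows how many paths on page 5 are fill-only and white. If most of the diagram's paths land in the ignored group, that would be consistent with the explanation above.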