How to fix code=4: no font file for digest?

eamag · June 29, 2025, 11:47am

I’m trying to extract text from this PDF https://openreview.net/pdf?id=g90RNzs8wX using pymupdf4llm.to_markdown(pdf_path). Is there a way to fix a font error? Thanks!

Jamie_Lemon · June 30, 2025, 2:07pm

Interesting. I can reproduce the error, I think on page 26:
[========================================e=RuntimeError('code=4: no font file for digest')

I was running the following command:

md_text = pymupdf4llm.to_markdown("1522_Unifying_Unsupervised_Gra.pdf", page_chunks=False, extract_words=False, show_progress=True)

If I extract that page then it works (see my 1522_Unifying_Unsupervised_Gra-edit.pdf file).

@HaraldLieder What do you think is “wrong” with page 26 here?

1522_Unifying_Unsupervised_Gra-26.pdf (720.9 KB)
1522_Unifying_Unsupervised_Gra-edit.pdf (1.0 MB)
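
One way to narrow down which page triggers the error is to convert the document one page at a time and record the pages that fail. The following is only a sketch: the helper names `find_failing_pages` and `convert_pdf_skipping_bad_pages` are made up for illustration, and it assumes the error reaches the caller as a `RuntimeError`, as the output above suggests. It uses the `pages` argument of `pymupdf4llm.to_markdown`, which takes 0-based page numbers.

```python
from typing import Callable, List, Tuple


def find_failing_pages(
    page_count: int,
    convert_page: Callable[[int], str],
) -> Tuple[List[str], List[Tuple[int, str]]]:
    """Convert pages one by one.

    Returns the Markdown of the pages that converted, plus a list of
    (0-based page index, error message) for the pages that did not.
    """
    chunks: List[str] = []
    failures: List[Tuple[int, str]] = []
    for index in range(page_count):
        try:
            chunks.append(convert_page(index))
        except RuntimeError as exc:  # assumption: the font error is a RuntimeError
            failures.append((index, str(exc)))
    return chunks, failures


def convert_pdf_skipping_bad_pages(pdf_path: str) -> Tuple[str, List[Tuple[int, str]]]:
    """Hypothetical wrapper: convert a PDF and skip pages that raise."""
    # Imported here so the helper above can be used without PyMuPDF installed.
    import pymupdf
    import pymupdf4llm

    doc = pymupdf.open(pdf_path)
    chunks, failures = find_failing_pages(
        doc.page_count,
        lambda i: pymupdf4llm.to_markdown(doc, pages=[i]),
    )
    return "\n".join(chunks), failures
```

Calling `convert_pdf_skipping_bad_pages("1522_Unifying_Unsupervised_Gra.pdf")` would return the Markdown of the pages that converted together with the failing page indices. Page 26 would be reported as index 25. Text on skipped pages is lost, so this is a stopgap rather than a fix.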

Jamie_Lemon · June 30, 2025, 2:08pm

Also @eamag Welcome to the forum and thanks for your post!!!

HaraldLieder · June 30, 2025, 2:28pm

This is caused by an upstream (MuPDF) problem. Recent versions of PyMuPDF4LLM make active use of MuPDF’s advanced detection of “faked” bold text. This is text written in a standard (non-bold) font that is made to appear bold by writing the same text twice with a small displacement.

This algorithm is quite complex and only works for fonts that are not Type 3. The error you report currently happens because of a missing check for text in a Type 3 font.
A MuPDF bug report has already been submitted.
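
Until the upstream fix is available, it may help to know which pages reference a Type 3 font at all. The sketch below is one possible way to check, not part of PyMuPDF4LLM: the helper names are hypothetical, and it assumes PyMuPDF's `Page.get_fonts()` returns tuples whose third item is the font type string (for example `"Type3"`) and whose fourth is the base font name. A page listed here is only a candidate. Whether it actually triggers the error depends on the text on it.

```python
from typing import Dict, List, Sequence, Tuple


def type3_fonts(font_entries: Sequence[Tuple]) -> List[Tuple]:
    """Keep only the font entries whose type field (index 2) is "Type3"."""
    return [entry for entry in font_entries if entry[2] == "Type3"]


def pages_with_type3_fonts(pdf_path: str) -> Dict[int, List[str]]:
    """Hypothetical helper: map 1-based page numbers to Type 3 font names."""
    # Imported here so type3_fonts() can be used without PyMuPDF installed.
    import pymupdf

    doc = pymupdf.open(pdf_path)
    hits: Dict[int, List[str]] = {}
    for page in doc:
        found = type3_fonts(page.get_fonts())
        if found:
            # entry[3] is assumed to be the base font name
            hits[page.number + 1] = [entry[3] for entry in found]
    return hits
```

For the file in this thread, `pages_with_type3_fonts("1522_Unifying_Unsupervised_Gra.pdf")` would be expected to include page 26 if the diagnosis above applies.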
