1. Install the required Python libraries
pip3 install openpyxl -i https://pypi.tuna.tsinghua.edu.cn/simple
pip3 install PyMuPDF -i https://pypi.tuna.tsinghua.edu.cn/simple
pip3 install paddleocr -i https://pypi.tuna.tsinghua.edu.cn/simple
pip3 install paddlepaddle -i https://mirror.baidu.com/pypi/simple
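Before running the script, it can help to confirm that the packages are importable. The following is a minimal sketch, not part of the original workflow. The helper name `missing_modules` is hypothetical, and the import names are assumptions based on the imports used later in this post (PyMuPDF is imported as `fitz`, paddlepaddle as `paddle`).

```python
import importlib.util

# Import names assumed for this walkthrough: PyMuPDF is imported as "fitz",
# paddlepaddle as "paddle".
REQUIRED_MODULES = ["fitz", "paddleocr", "paddle", "openpyxl"]


def missing_modules(names):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing modules:", missing if missing else "none")
```

`find_spec` only looks the module up and does not import it, so the check is quick even for heavy packages such as paddlepaddle.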
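2. Fields to extract
Invoice number (发票号码)
Invoice date (开票日期)
Seller name (销售方名称)
Seller taxpayer identification number (销售方纳税人识别号)
Amount (金额)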
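Once the OCR text is available, fields like these could be pulled out with regular expressions. The sketch below is only an illustration under stated assumptions: the OCR output has been joined into a single string, each label is printed directly before its value, and the value formats (digit counts, date layout) match the patterns shown. The names `FIELD_PATTERNS` and `extract_fields` are hypothetical. Seller name and amount are left out because they usually depend on the position of the text on the page rather than on a unique label.

```python
import re

# Hypothetical patterns: label text and value formats are assumptions
# about how the invoice is printed and how the OCR returns it.
FIELD_PATTERNS = {
    "invoice_number": re.compile(r"发票号码[:：]?\s*(\d{8,20})"),
    "invoice_date": re.compile(r"开票日期[:：]?\s*(\d{4}年\d{1,2}月\d{1,2}日)"),
}

# Buyer and seller both carry a taxpayer ID, so collect every labeled match.
TAXPAYER_ID_PATTERN = re.compile(r"纳税人识别号[:：]?\s*([0-9A-Z]{15,20})")


def extract_fields(text):
    """Return a dict of extracted fields; a missing field maps to None."""
    result = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        result[name] = match.group(1) if match else None
    result["taxpayer_ids"] = TAXPAYER_ID_PATTERN.findall(text)
    return result


if __name__ == "__main__":
    sample = "发票号码：12345678 开票日期：2023年05月06日 纳税人识别号：91110000123456789X"
    print(extract_fields(sample))
```

Deciding which taxpayer ID belongs to the seller depends on the invoice layout, so the sketch returns all candidates and leaves that choice to the caller.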
import fitz
from paddleocr import PaddleOCR
from openpyxl import load_workbook
pdf = fitz.open('test.pdf')
page = pdf.load_page(0)  # load the first page (index 0)
# Increase the resolution
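zoom_x = 8  # horizontal zoom factor
zoom_y = 8  # vertical zoom factor
mat = fitz.Matrix(zoom_x, zoom_y)
# Render the page with the zoom matrix so the image is 8x larger in each direction
pix = page.get_pixmap(matrix=mat)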