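1. Clean and organize the contents of the Excel file
1.1) Remove the fields that are not needed
1.2) Use Python + pandas to turn each row into a sentence a person can read
Each row becomes a sentence like the following (person name, date, account code, account name, amount, summary):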
人员姓名: 阿桂 ,日期: 2022-01-26, 科目编码: 660116, 科目名称: 业务招待费, 金额: 180.0, 摘要: ...
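As a rough illustration of steps 1.1 and 1.2, here is a minimal pandas sketch. The sample row, the summary text, and the dropped column name 凭证号 are all made up for the example; the real column list depends on the exported journal.

```python
import pandas as pd

# Hypothetical sample row; the real data comes from the Excel journal.
df = pd.DataFrame(
    {
        "日期": ["2022-01-26"],
        "科目编码": [660116],
        "科目名称": ["业务招待费"],
        "金额": [180.0],
        "摘要": ["_阿桂_ 招待费"],   # made-up summary text
        "凭证号": ["记-0012"],       # assumed to be a field we do not need
    }
)

# Step 1.1: drop the fields that are not needed (column name is an assumption).
cleaned = df.drop(columns=["凭证号"])


def row_to_sentence(row: dict) -> str:
    """Join every 'column: value' pair of one row into a single line."""
    return ", ".join(f"{key}: {value}" for key, value in row.items())


# Step 1.2: one readable sentence per row.
sentences = [row_to_sentence(r) for r in cleaned.to_dict(orient="records")]
print(sentences[0])
```

Dropping columns before formatting keeps the sentences short, which matters later when the text file is fed to a model.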
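This job can be handed to Kimi to write a short program. The approach is:
1) Fetch all employee names from the database
2) Check whether the summary (摘要) contains one of those names (note: names appear in the summary in the form _xxx_)
3) On a match, generate a sentence in the format above and save it to a txt file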
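The matching in step 2 can be tried in isolation before touching the database. The sketch below uses only the standard library; the names in KNOWN_NAMES and the sample summaries are invented, and find_person is a hypothetical helper, not part of the generated program.

```python
import re

# Names would normally come from the hr table; these are made-up examples.
KNOWN_NAMES = {"阿桂", "小李"}
NAME_PATTERN = re.compile(r"_(.*?)_")  # non-greedy: shortest text between two underscores


def find_person(summary, known_names):
    """Return the first _xxx_ token in the summary that is a known name, else None."""
    if not isinstance(summary, str):  # empty Excel cells arrive as NaN (a float)
        return None
    for candidate in NAME_PATTERN.findall(summary):
        if candidate in known_names:
            return candidate
    return None


print(find_person("_阿桂_ 报销招待费", KNOWN_NAMES))
print(find_person("_张三_ 报销差旅费", KNOWN_NAMES))
print(find_person(float("nan"), KNOWN_NAMES))
```

Keeping the names in a set makes each lookup cheap, and the isinstance check avoids a TypeError on rows whose summary cell is empty.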
The code is below. Most of it was generated by Kimi; AI really does boost productivity, yeah
import re
from urllib.parse import quote_plus as urlquote

import pandas as pd
from sqlalchemy import create_engine


def get_hrnameslist():
    """Fetch all employee names (the lastname column) from the hr table."""
    USERNAME = 'xx'
    PSSWD = 'xx'
    SERVERNAME = 'ss'
    INSTANCENAME = r"\SQLEXPRESS"  # raw string: "\S" is not a valid escape sequence
    PORT = 1433
    DB = 'xx'
    DRIVER = "ODBC Driver 18 for SQL Server"
    # The password and the driver name are URL-encoded because they may contain
    # special characters or spaces. charset=utf8mb4 is kept from the original.
    engine = create_engine(
        f"mssql+pyodbc://{USERNAME}:{urlquote(PSSWD)}@{SERVERNAME}{INSTANCENAME}:{PORT}/{DB}"
        f"?driver={urlquote(DRIVER)}&charset=utf8mb4&TrustServerCertificate=yes",
        fast_executemany=True)
    sql_str = "select lastname from hr"
    df = pd.read_sql(sql_str, engine)
    # Only one column is needed, so convert it straight to a list
    return df['lastname'].tolist()
names_list = get_hrnameslist()
filename = '序时账(Ewangda2022年)'
infile = f'./static/data/{filename}.xlsx'
outfile = f"./static/data/{filename}.txt"  # output file path
df = pd.read_excel(infile)
data_list = df.to_dict(orient="records")
# print(data_list)  # uncomment to inspect the raw rows
result = []
pattern = r"_(.*?)_"  # names appear in the summary as _xxx_
for item in data_list:
    text = str(item['摘要'])  # empty cells are read as NaN, so force a string
    # Use a regular expression to extract the content of every _xxx_ token
    matches = re.findall(pattern, text)
    for match in matches:
        if match in names_list:
            fields = ', '.join(f'{key}: {value}' for key, value in item.items())
            sentence = f"人员: {match}, {fields}"
            result.append(sentence)
            break  # one sentence per row is enough
final_string = "\n".join(result)
print(final_string)
# Write the string to a .txt file
with open(outfile, "w", encoding="utf-8") as file:
    file.write(final_string)
print(f"Content written to file: {outfile}")
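The connection URL in get_hrnameslist is built by hand and mixes a named instance, a port, and a driver name containing spaces. One alternative, sketched here and not what the generated code does, is SQLAlchemy's odbc_connect pass-through, where a plain ODBC connection string is URL-encoded as a whole. The helper name build_mssql_url is hypothetical, and the placeholder values are the same 'xx' and 'ss' as above.

```python
from urllib.parse import quote_plus


def build_mssql_url(server, database, username, password,
                    driver="ODBC Driver 18 for SQL Server"):
    """Build a SQLAlchemy URL that passes a raw ODBC connection string through."""
    odbc_str = (
        f"DRIVER={{{driver}}};"      # braces are part of the ODBC syntax
        f"SERVER={server};"          # host\instance for a named instance
        f"DATABASE={database};"
        f"UID={username};"
        f"PWD={password};"
        "TrustServerCertificate=yes;"
    )
    # Encode the whole string once so spaces, braces and backslashes survive.
    return "mssql+pyodbc:///?odbc_connect=" + quote_plus(odbc_str)


url = build_mssql_url(r"ss\SQLEXPRESS", "xx", "xx", "xx")
print(url)
# With SQLAlchemy and pyodbc installed, the URL would be used as:
# engine = sqlalchemy.create_engine(url, fast_executemany=True)
```

This sketch does not handle passwords that contain a semicolon or braces; those would need extra escaping inside the ODBC string.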
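2. Feed the result to DeepSeek and see whether it gives good results
1) Add the new knowledge
It took a little over 5 minutes to finish processing the 625 kB file
Now it can be tested:
The test result was very discouraging: completely wrong.
I asked one question; after 10 seconds with no response plus a few seconds of thinking, it produced a nonsense conclusion
Summary: this test ended in failure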