Python: Word Frequency Counting for Chinese Text Files + Chinese Word Clouds

License: CC 4.0 BY-SA

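This article shows how to count word frequencies in a Chinese text file with Python. As an example, it counts how often words occur in Romance of the Three Kingdoms (《三国演义》) to estimate how often each character appears. The most frequent names are Cao Cao (曹操), Kongming (孔明), and Liu Bei (刘备, who appears in the output under his courtesy name 玄德, Xuande). It also explains how to fix the garbled characters that can appear when generating a word cloud, and links to related solutions.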

1. Word frequency counting:

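import jieba

# Read the whole novel; the with-block closes the file afterwards
with open("threekingdoms3.txt", "r", encoding="utf-8") as f:
    txt = f.read()

words = jieba.lcut(txt)  # segment the Chinese text into a list of words
counts = {}
for word in words:
    if len(word) == 1:  # skip single characters (particles, punctuation)
        continue
    counts[word] = counts.get(word, 0) + 1

# Sort the (word, count) pairs by count, highest first
items = sorted(counts.items(), key=lambda x: x[1], reverse=True)
# Slicing avoids an IndexError if there are fewer than 15 distinct words
for word, count in items[:15]:
    print("{0:<10}{1:>5}".format(word, count))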
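The dictionary loop and sort above can also be expressed with the standard library's `collections.Counter`, whose `most_common(n)` returns the n most frequent (word, count) pairs already sorted. This is a minimal sketch, not the author's code: to keep it runnable without jieba or the novel, the hypothetical helper `top_words` takes an already-segmented word list (in the article this would be the result of `jieba.lcut(txt)`), and the sample list is made up for illustration.

```python
from collections import Counter


def top_words(words, n=15, min_len=2):
    """Return the n most frequent words with at least min_len characters.

    `words` is an already-segmented list, e.g. the output of jieba.lcut(txt).
    With min_len=2, single-character tokens are skipped like in the loop above.
    """
    counts = Counter(w for w in words if len(w) >= min_len)
    return counts.most_common(n)


if __name__ == "__main__":
    # Hypothetical pre-segmented sample standing in for jieba.lcut(txt)
    sample = ["曹操", "曰", "孔明", "曹操", "，", "玄德", "孔明", "曹操"]
    for word, count in top_words(sample, n=3):
        print("{0:<10}{1:>5}".format(word, count))
```

`most_common(n)` simply returns fewer pairs when there are fewer than n distinct words, so no bounds check is needed.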

Output:

曹操 946
孔明 737
将军 622
玄德 585
却说 534
关公 509
荆州 413
二人 410
丞相 405
玄德曰 390
不可 387
孔明曰
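The listing mixes character names with ordinary words (将军 "general", 却说 "meanwhile", 二人 "the two", 不可 "must not") and a place name (荆州, Jingzhou). It also splits one person across several forms, such as 孔明 and 孔明曰 ("Kongming said"), or 玄德 and 玄德曰. To count characters rather than words, one possible refinement is to drop an exclusion set and fold aliases into a single canonical name. The sketch below is an assumption, not the author's method: `EXCLUDES` and `ALIASES` are hypothetical tables seeded from the output above plus a few well-known alternate names, and the input is an already-segmented word list.

```python
# Hypothetical exclusion set: frequent words from the output that are not character names
EXCLUDES = {"将军", "却说", "荆州", "二人", "丞相", "不可"}

# Hypothetical alias table: variant forms -> one canonical name
ALIASES = {
    "孔明曰": "孔明",
    "诸葛亮": "孔明",
    "玄德": "刘备",
    "玄德曰": "刘备",
    "关公": "关羽",
    "云长": "关羽",
}


def count_characters(words, excludes=EXCLUDES, aliases=ALIASES):
    """Count multi-character words after alias folding, dropping excluded words.

    `words` is an already-segmented list, e.g. the output of jieba.lcut(txt).
    """
    counts = {}
    for word in words:
        if len(word) == 1:
            continue  # skip single characters, as in the original loop
        name = aliases.get(word, word)  # map an alias to its canonical name
        if name in excludes:
            continue  # drop words that are not character names
        counts[name] = counts.get(name, 0) + 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample standing in for jieba.lcut(txt)
    sample = ["孔明曰", "将军", "孔明", "玄德", "却说", "玄德曰", "曹操"]
    ranked = sorted(count_characters(sample).items(), key=lambda x: x[1], reverse=True)
    for word, count in ranked:
        print("{0:<10}{1:>5}".format(word, count))
```

Both tables need tuning by inspecting the output: whether to exclude a title such as 丞相 (chancellor), for instance, is a judgment call.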