Topical Word Embeddings

The paper "Topical Word Embeddings" addresses the homonymy and polysemy problems faced by word embeddings and proposes three models to remedy the shortcomings of existing multi-prototype methods. The models are evaluated on contextual word similarity and text classification tasks; the results suggest that a word's multiple senses are correlated, and macro- and micro-averaged precision, recall, and F1 scores demonstrate the models' effectiveness.


Notes on the paper "Topical Word Embeddings"

paper
code

Problems faced by word embeddings

homonymy and polysemy

Approaches to homonymy and polysemy

multi-prototype: assign each word multiple embeddings, one per sense
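
As a rough illustration of this idea (a sketch in the spirit of earlier multi-prototype work, not this paper's method), one can cluster a word's context vectors and keep one prototype per cluster; the function name and the fixed sense count below are assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

def multi_prototype(context_vectors, n_senses=3):
    """Cluster the contexts of one word; each centroid is a sense prototype.

    context_vectors: (n_occurrences, dim) array, each row the averaged
    embedding of the window around one occurrence of the word.
    """
    km = KMeans(n_clusters=n_senses, n_init=10).fit(context_vectors)
    return km.cluster_centers_  # shape (n_senses, dim)

# Toy usage: 100 occurrences of a word with 50-dim context vectors.
prototypes = multi_prototype(np.random.randn(100, 50))
```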

Shortcomings of current multi-prototype methods

1) These models generate multi-prototype vectors for each word in isolation, ignoring complicated correlations among words as well as their contexts. (Stated rather abstractly.)
2) In the multi-prototype setting, the contexts of a word are divided into clusters with no overlap. In reality, a word's several senses may correlate with each other, and there is no clear semantic boundary between them.

How the paper addresses these shortcomings (three proposed models)

TWE (Topical Word Embeddings): three variants, TWE-1, TWE-2, and TWE-3, each pairing a word with the topic assigned to it by LDA

Shortcomings of the three TWE models

  • TWE-1: TWE-1 does not consider the immediate interaction between a word and its assigned topic for learning (the word and topic vectors never interact directly).
  • TWE-2: TWE-2 considers the inner interaction of a word-topic pair by simply regarding the pair as a pseudo word, but it suffers from a sparsity issue because the occurrences of each word are rigidly discriminated into different topics (a word that occurs N times in the corpus is, on average, observed only N/T times under each of the T topics); see the sketch after this list.
  • TWE-3: TWE-3 provides a trade-off between discrimination and sparsity. But during the learning process of TWE-3, topic embeddings will influence the corresponding word embeddings, which may make words in the same topic less discriminative (since the number of topics T is much smaller than the vocabulary size W).
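
To make the discrimination/sparsity contrast concrete, here is a minimal sketch (the token construction is my own illustration, not the authors' code) of how training tokens could be formed from an LDA-annotated corpus for TWE-1 versus TWE-2:

```python
# Each corpus position is a (word, topic) pair produced by LDA inference.
annotated = [("bank", 3), ("river", 3), ("bank", 7)]  # toy example

# TWE-1: word and topic keep separate vectors, so "bank" still pools
# evidence from all its occurrences while its topic vector is learned
# alongside it (but the two never interact directly during training).
twe1_tokens = [(w, f"TOPIC_{z}") for w, z in annotated]

# TWE-2: each word-topic pair is one pseudo word, so "bank_3" and
# "bank_7" are trained as unrelated symbols; with T topics, a word seen
# N times contributes only ~N/T occurrences per pseudo word (sparsity).
twe2_tokens = [f"{w}_{z}" for w, z in annotated]

print(twe1_tokens)  # [('bank', 'TOPIC_3'), ('river', 'TOPIC_3'), ('bank', 'TOPIC_7')]
print(twe2_tokens)  # ['bank_3', 'river_3', 'bank_7']
```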

Training details

Initialization is important for learning TWE models. In TWE-1, we first learn word embeddings using Skip-Gram. Afterwards, we initialize each topic vector with the average over all words assigned to that topic, and learn topic embeddings while keeping word embeddings unchanged. In TWE-2, we initialize the vector of each topic-word pair with the corresponding word vector from Skip-Gram, and then learn the TWE model. In TWE-3, we initialize word vectors using those from Skip-Gram and topic vectors using those from TWE-1, and then learn the TWE model.
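
A minimal sketch of the TWE-1 initialization step, under stated assumptions: gensim provides the Skip-Gram embeddings, and topic_assignments (a hypothetical placeholder for LDA output) maps each topic id to the words assigned to it.

```python
import numpy as np
from gensim.models import Word2Vec

# Step 1: plain Skip-Gram word embeddings (sg=1 selects Skip-Gram).
sentences = [["the", "bank", "of", "the", "river"],
             ["the", "bank", "raised", "interest", "rates"]]
w2v = Word2Vec(sentences, vector_size=50, sg=1, min_count=1)

# Step 2 (TWE-1 init): each topic vector starts as the average of the
# vectors of all words assigned to that topic by LDA.
topic_assignments = {0: ["river", "bank"], 1: ["bank", "interest", "rates"]}
topic_vecs = {z: np.mean([w2v.wv[w] for w in words], axis=0)
              for z, words in topic_assignments.items()}
# Topic embeddings would then be refined while word vectors stay fixed.
```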

Experiments

Contextual Word Similarity

Since a word's sense can only be disambiguated given its context, the contextual word similarity task is used to evaluate multi-prototype models. The experimental results are shown below:

[Table: Contextual Word Similarity results]

My takeaway: AvgSimC outperforms MaxSimC, which suggests that a word's senses do share semantic overlap, just as the authors note: "In reality, a word's several senses may correlate with each other, and there is no clear semantic boundary between them."
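
For reference, a sketch of the two scores as defined in the paper, assuming per-topic word vectors and a topic posterior P(z | w, c) are already available (all variable names below are hypothetical placeholders):

```python
import numpy as np

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def avg_sim_c(vecs_w1, vecs_w2, p_z_w1, p_z_w2):
    """AvgSimC: cosine similarities of all topic-specific vector pairs,
    weighted by each word's topic posterior given its context."""
    return sum(p_z_w1[i] * p_z_w2[j] * cos(vecs_w1[i], vecs_w2[j])
               for i in range(len(p_z_w1)) for j in range(len(p_z_w2)))

def max_sim_c(vecs_w1, vecs_w2, p_z_w1, p_z_w2):
    """MaxSimC: similarity under each word's single most probable topic."""
    return cos(vecs_w1[int(np.argmax(p_z_w1))], vecs_w2[int(np.argmax(p_z_w2))])
```

AvgSimC keeps contributions from every sense pair, which is exactly why it benefits when senses overlap; MaxSimC commits to one sense per word.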

Text Classification

Macro-average and micro-average (precision, recall, F1-measure)

Personally, I feel these averages are only meaningful for multi-class classification.

For binary classification, the confusion matrix is:

|                   | Predicted positive | Predicted negative |
| ----------------- | ------------------ | ------------------ |
| Actually positive | TP                 | FN                 |
| Actually negative | FP                 | TN                 |

With precision denoted P and recall denoted R:

$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad F_1 = \frac{2PR}{P + R}$$
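
A small sketch of the macro/micro distinction (the per-class counts below are made-up toy numbers):

```python
# Hypothetical per-class counts: {class: (TP, FP, FN)}.
per_class = {"sports": (80, 10, 5), "tech": (40, 20, 15), "arts": (5, 2, 30)}

# Macro-average: compute precision per class, then take the unweighted
# mean, so rare classes count as much as frequent ones.
macro_p = sum(tp / (tp + fp) for tp, fp, _ in per_class.values()) / len(per_class)

# Micro-average: pool the raw counts first, so frequent classes dominate.
tp_sum = sum(tp for tp, _, _ in per_class.values())
fp_sum = sum(fp for _, fp, _ in per_class.values())
micro_p = tp_sum / (tp_sum + fp_sum)

print(f"macro P = {macro_p:.3f}, micro P = {micro_p:.3f}")
```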
