Topic discovery through data dependent and random projections

6. Usually, starting from
p(w|α,β) = ∫ dz p(w|z, β) * p(z|α)
p(z|α) = ∫ dΘ p(z, Θ|α)
p(w|z, β) = ∫ dφ p(w, φ|z, β)
          = ∫ dφ p(w|z, φ) * p(φ|β)
(Θ and φ are drawn from Dirichlet distributions with parameters α and β.)
the log-likelihood of p(w|α,β) is written down explicitly.
Next, a lower bound on the log-likelihood is obtained from Jensen's inequality and maximized (E-step); the bound is then maximized with respect to α and β (M-step), which yields the estimates of α and β.
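As a sketch of the bound referred to above (the standard variational treatment of LDA; the variational distribution q and its arguments are my notation, not the slides'):

log p(w|α,β) ≥ E_q[log p(w, z, Θ, φ | α, β)] − E_q[log q(z, Θ, φ)] =: L(q; α, β)

The E-step maximizes L over q with α and β held fixed; the M-step maximizes L over α and β with q held fixed, and the two steps are alternated until convergence.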
9. 1. Assumption
Each topic has exactly one characteristic ("novel") word.
2. Discovery of the novel words
Given the observed word frequencies of each document in the corpus, the novel words are extracted by one of the following (a sketch of the random-projections variant follows this list):
Data Dependent Projections Algorithm
Random Projections Algorithm
Binning Algorithm
3. Clustering of the novel words
4. Topic estimation
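A minimal sketch of step 2 with the Random Projections Algorithm, under my reading of the idea: after row-normalization, the rows of novel words are extreme points of the set of word rows, and extreme points are found as maximizers along random directions. The function name, vote scheme, and direction distribution are my assumptions, not necessarily the paper's exact procedure.

import numpy as np

def novel_words_rp(C, num_projections=1000, seed=0):
    # C: W x M word-document frequency matrix (W words, M documents).
    # Assumes every word occurs at least once, so no row sums to zero.
    rng = np.random.default_rng(seed)
    C_bar = C / C.sum(axis=1, keepdims=True)  # row-normalize each word's row
    votes = np.zeros(C.shape[0], dtype=int)
    for _ in range(num_projections):
        d = rng.standard_normal(C.shape[1])   # a random projection direction
        votes[np.argmax(C_bar @ d)] += 1      # the extreme point along d
    return np.nonzero(votes)[0]               # candidate novel words

Words that win at least one direction are returned as candidates; the paper's actual selection thresholds may differ.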
16. Simulation method
Step 1.1: iid 1×K row-vectors corresponding to the non-novel words are generated uniformly.
Step 1.2: W_1 iid Uniform[0, 1] values are generated for the nonzero entries in the rows of the novel words.
Step 1.3: The resulting matrix is then column-normalized to get one realization of β.
ρ := W_1 / W
Step 2: M iid K×1 column-vectors are generated for the θ matrix according to a Dirichlet prior.
Step 3: X is obtained by generating N iid words for each document.
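A minimal sketch of this generative procedure, assuming W words of which the first W_1 are novel; the default parameter values, the assignment of each novel word to a single topic, and the function name are my assumptions.

import numpy as np

def simulate_corpus(W=500, W1=50, K=10, M=1000, N=100, alpha=0.1, seed=0):
    rng = np.random.default_rng(seed)
    beta = np.zeros((W, K))
    # Step 1.1: non-novel words get iid uniform 1xK rows.
    beta[W1:, :] = rng.uniform(size=(W - W1, K))
    # Step 1.2: each novel word has a single nonzero entry (its topic),
    # drawn iid Uniform[0, 1]; assigning novel word i to topic i % K is my choice.
    beta[np.arange(W1), np.arange(W1) % K] = rng.uniform(size=W1)
    # Step 1.3: column-normalize to get one realization of beta.
    beta /= beta.sum(axis=0, keepdims=True)
    # Step 2: M iid K x 1 columns of theta from a Dirichlet prior.
    theta = rng.dirichlet(np.full(K, alpha), size=M).T   # K x M
    # Step 3: N iid words per document from p(w|doc) = beta @ theta.
    word_probs = beta @ theta                            # W x M
    docs = []
    for m in range(M):
        p = word_probs[:, m]
        docs.append(rng.multinomial(N, p / p.sum()))     # renormalize for float safety
    X = np.stack(docs, axis=1)                           # W x M word counts
    return beta, theta, X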
19. NIPS dataset / NY Times
Most of the topics extracted by RP and DDP are similar and are comparable with those of Gibbs sampling.
For example, the "Chip design" topic is not extracted by RecL2, and RecL2 misses "weather" and "emotions".