LITM

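The author published 《Topic-Sensitive PageRank》 at WWW in 2002, followed in 2003 by the TKDE paper 《Topic-sensitive PageRank: a context-sensitive ranking algorithm for Web search》.
《Topic-Sensitive PageRank》 describes the random-walk-based PersonalRank algorithm (《推荐系统实践》 (Recommender System Practice), p. 74).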
allow the query to influence the link-based score
we compute offline a set of PageRank vectors, each biased with a different topic, to create for each page a set of importance scores with
respect to particular topics.
Pages considered important in some subject domains may not be considered important in others, regardless of what keywords may appear
either in the page or in anchor text referring to the page.
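《推荐系统实践》 pp. 73-74 discusses how to measure the relatedness of two vertices in a bipartite graph, lists several influencing factors, and presents an algorithm built on those factors; see the 2007 TKDE paper 《Random-Walk Computation of Similarities between Nodes of a Graph with Application to Collaborative Recommendation》.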
Then instead of using a single global ranking vector, we take the linear combination of the topic-sensitive vectors, weighted using the
similarities of the query (and any available context) to the topics. By using a set of rank vectors, we are able to determine more accurately
which pages are truly the most important with respect to a particular query or query-context.
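The query-time combination quoted above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the topic names, page ids, scores, and query weights are hypothetical toy values, and the per-topic vectors are assumed to have been computed offline.

```python
def combine_topic_ranks(topic_ranks, topic_weights):
    """Blend per-topic PageRank vectors into one query-time score per page.

    topic_ranks:   {topic: {page: score}}, assumed precomputed offline
    topic_weights: {topic: similarity of the query (and context) to that topic}
    """
    total = sum(topic_weights.values())
    if total <= 0:
        raise ValueError("at least one topic weight must be positive")
    combined = {}
    for topic, weight in topic_weights.items():
        share = weight / total  # normalise the weights so they sum to 1
        for page, score in topic_ranks[topic].items():
            combined[page] = combined.get(page, 0.0) + share * score
    return combined


# Hypothetical toy data: two topics, three pages.
topic_ranks = {
    "sports": {"a": 0.6, "b": 0.3, "c": 0.1},
    "finance": {"a": 0.1, "b": 0.2, "c": 0.7},
}
query_weights = {"sports": 0.25, "finance": 0.75}
scores = combine_topic_ranks(topic_ranks, query_weights)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Because the expensive PageRank runs happen offline, the only query-time work is this weighted sum, which is what makes the approach practical for search.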
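Normalize the weights to values in [0,1] by some method, then minimize the sum of squared errors.
A PageRank-style method can give the influence of each user and each movie; influence is asymmetric.
Explicit feedback versus implicit feedback, which leads to the notion of positive and negative samples; see 《推荐系统实践》 p. 67 and the formula on p. 68. For how to generate negative samples, see the paper 《One-Class Collaborative Filtering》.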
《Bipartite Graph for Topic Extraction》
Aim: Our aim is to investigate and develop techniques that combine the expressive network representation made possible by the
complex networks theory with data streaming techniques for dealing with the problem of topic extraction.
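The Introduction contains a description of the bipartite graph and of its physical mapping that can be reused (that paper uses a weighted bipartite graph).
Adding new nodes: the end of the Introduction says "The bipartite graph structure can be easily adjusted with the insertion of new vertices. Moreover, our propagation method can be parallelizable and work in a dynamic context of stream of documents." This can be reused for the analysis of parallelization and stream processing; the closing part of the paper covers it as well.
The Introduction has a passage "Our proposed method is based on exploration of an effective unsupervised learning ...... However, unlikely traditional label propagation techniques, ...." that states the basis of the method and analyzes its novelty.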
Dimensionality reduction can be considered a subtype of clustering; these include a well-known technique LSA based on SVD. Besides being a subtype of clustering, it can also be viewed as a subtype of topic mining.
The paper mentions the drawbacks of matrix methods (SVD, LSA, etc.), such as expensive storage requirements and computing time. It then moves to pLSA, from pLSA's shortcomings to LDA, and then to LDA's shortcomings (can be reused directly): LDA based models have a rigorous mathematical treatment of decomposed operations that discover the latent groups (topics). From the practitioner’s perspective, creating a new model and deriving it to an effective and implementable inference algorithm are hard and tiresome tasks [Rajesh et al., 2014]. Moreover, the mathematical rigour hampers a rapid exploration of new assumptions, heuristics, or adaptations that could be useful in many real scenarios. From there it argues for the benefits of using a bipartite graph.
NBI(network-based inference)
《How does label propagation algorithm work in bipartite networks》
Mainly used for community detection in networks; in the result, nodes that share a label form one community.
Bipartite graph description: A bipartite network is a special and important class of networks, where nodes can be divided into two disjoint sets, such that no two nodes within the same set are adjacent (Fig.1). Examples of such networks are actor-movie networks, author-paper networks, etc.
Description of updating each side separately: In synchronous updating [1], node x at the t-th iteration updates its label based on the labels of its neighbors at the (t-1)-th iteration.
Only one side of the bipartite graph needs to be initialized: In light of this, at the start, rather than assigning labels to every node as what the standard LPA does, we only need to assign each red (blue) node with a unique label and keep blue (red) nodes unlabeled.
Parallelization: one passage is "we propagate labels from red (blue) nodes to blue (red) nodes, the relabeling process for different blue (red) nodes is independent of each other"; another is "Parallelism is very important if the network is extremely large, or time is demanding, such as the case of real time community detection for some online social networks".
The paper mentions several bipartite datasets. Those clearly usable for topics are: "A network representing the authorship relations between authors and papers on the condensed matter archive at" (the URL is missing in these notes), and the actor-movie network [8], "with each edge describing player X plays in movie Y".
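The one-sided initialization and the red-to-blue, blue-to-red propagation described in these notes can be sketched as below. This is a hedged illustration rather than the paper's exact procedure: the function name, the toy author-paper edges, and in particular the deterministic tie-break (smallest label wins) are assumptions made so that the run is repeatable.

```python
from collections import Counter


def bipartite_lpa(edges, max_rounds=20):
    """edges: iterable of (red_node, blue_node). Returns (red_labels, blue_labels)."""
    red_nbrs, blue_nbrs = {}, {}
    for r, b in edges:
        red_nbrs.setdefault(r, []).append(b)
        blue_nbrs.setdefault(b, []).append(r)

    def majority(labels):
        counts = Counter(labels)
        best = max(counts.values())
        # Assumption: ties go to the smallest label so the result is deterministic.
        return min(label for label, c in counts.items() if c == best)

    # Only red nodes receive a unique initial label; blue nodes start unlabeled.
    red_labels = {r: i for i, r in enumerate(sorted(red_nbrs))}
    blue_labels = {}
    for _ in range(max_rounds):
        # Each blue node depends only on red labels, so these updates are independent.
        blue_labels = {b: majority(red_labels[r] for r in rs)
                       for b, rs in blue_nbrs.items()}
        new_red = {r: majority(blue_labels[b] for b in bs)
                   for r, bs in red_nbrs.items()}
        if new_red == red_labels:  # labels stopped changing
            break
        red_labels = new_red
    return red_labels, blue_labels


# Hypothetical author-paper network with two disconnected groups.
edges = [("a1", "m1"), ("a2", "m1"), ("a2", "m2"), ("a3", "m3"), ("a4", "m3")]
red, blue = bipartite_lpa(edges)
```

The two dictionary comprehensions are the parts that could be distributed across workers, since no node on one side reads a label from its own side.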
《Bipartite network projection and personal recommendation》
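Two kinds of networks: Two kinds of bipartite networks are important because of their particular significance in social, economic, and information systems: the "collaboration network" and the "opinion network".
Two-step paths are used: "w_ij sums the contribution from all two-step paths between x_i and x_j" (the sentence right after Eq. (6)).
An asymmetric measure of the dependence between nodes (one weakness: the denominator of Eq. (7) only counts the adjacent nodes, i.e. the node's degree, and ignores that different neighbors are not equivalent), and a measure of the independence of a single node (Eq. (8); the paper does not say why it uses a sum of squares rather than, say, a plain sum).
The average recommended position and the hitting rate are used as the evaluation metrics for the experimental comparison.
Name for a link: for each incident entry a->b
The complexity analysis in the Conclusion and Discussion section can be borrowed.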
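The two-step-path weight can be illustrated with a small sketch. The explicit formula used here, w_ij = (1 / k(x_j)) * sum_l a_il * a_jl / k(y_l), is recalled from the network-based inference literature and is an assumption relative to these notes, which only quote the sentence after Eq. (6); the adjacency matrix is hypothetical toy data.

```python
def nbi_weights(adj):
    """adj[i][l] = 1 if X-node i is linked to Y-node l, else 0.

    Returns W with W[i][j] = (1 / k(x_j)) * sum_l adj[i][l] * adj[j][l] / k(y_l),
    i.e. the share of x_j's resource that reaches x_i over all two-step paths.
    """
    n, m = len(adj), len(adj[0])
    k_x = [sum(row) for row in adj]                              # degrees of X-nodes
    k_y = [sum(adj[i][l] for i in range(n)) for l in range(m)]   # degrees of Y-nodes
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if k_x[j] == 0:
                continue  # an isolated node has no resource to distribute
            two_step = sum(adj[i][l] * adj[j][l] / k_y[l]
                           for l in range(m) if k_y[l] > 0)
            W[i][j] = two_step / k_x[j]
    return W


# Hypothetical toy network: 3 X-nodes, 3 Y-nodes.
adj = [[1, 1, 0],
       [0, 1, 1],
       [0, 0, 1]]
W = nbi_weights(adj)
column_sums = [sum(W[i][j] for i in range(3)) for j in range(3)]
```

In this toy case W[1][2] and W[2][1] differ, which shows the asymmetry noted above, and each column sums to 1 because a node's resource is fully redistributed.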
《Inductive Model Generation for Text Classification Using a Bipartite Heterogeneous Network》 (two versions: ICDM 2012 and JCST 2014)
Relation between text, document, and term: texts are represented by a document-term matrix.
Definition of a heterogeneous network
Description of the vectors: creating a weight vector for each network object, in which each position of the vector corresponds to a category of the data set. The weight vector for objects in which there is no information is calculated during the learning process by propagating information from labeled to unlabeled vertices. The weight of the edges could also be considered to improve the learning.
Description of the stopping point: Step 2 and Step 3 are repeated for every document until a stopping criterion is reached. We adopted as stopping criterion the maximum number of epochs and a minimum mean squared error, i.e., when the mean squared error of an epoch is less than a small given value.
Newcomers (unseen documents): The algorithm induces weights to objects that represents terms of the collection, which indicates the influence of these terms in the definition of the classes of the documents. After obtaining the weights of the terms for each class, this information is used as a model to classify unseen documents.
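《LFM与矩阵分解》 (LFM and matrix factorization) covers the drawbacks of having people classify items by hand, and asks why not start from the data and find the classes automatically (《推荐系统实践》 pp. 65-66)
Manual classes mostly follow a book's content rather than its readership, so they cannot represent the views of different users.
The granularity of the classification is hard to control.
It is hard to assign one item to several classes, although some books belong to many.
It is hard to classify along several dimensions, e.g. by author, translator, or publisher.
It is hard to decide an item's weight within a given class.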
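Xiang Liang's 《Temporal Recommendation》
Section 2.3.3: In [《Latent class models for collaborative filtering》], Hofmann proposed the Latent Class Model, which links users and items through latent classes. It assumes that users are not interested in items directly; rather, users are interested in several categories, and items belong to different categories. The model therefore learns these categories, and each user's interest in them, from user behavior data. Building on the Latent Class Model, many researchers later proposed matrix factorization models, also called the Latent Factor Model [《Modeling relationships at multiple scales to improve accuracy of large recommender systems》]. There are many kinds of matrix-factorization-based models.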
《Modeling relationships at multiple scales to improve accuracy of large recommender systems》 proposes LFM
Works without imputing missing data: propose a method that avoids the need for a gauge set or for imputation, by working directly on the sparse set of known ratings.
Notation for the LFM formula: Here, p_u is the u-th row of P, which corresponds to user u. Likewise, q_i is the i-th row of Q, which corresponds to item i. Similar to Roweis' method, we could alternate between fixing Q and P, thereby obtaining a series of efficiently solvable least squares problems without requiring impu.... Each update of Q or P decreases Err(P,Q), so the process must converge.
Discusses the trade-off in the number of latent factors: a large number is more flexible but risks overfitting; a small number leaves the objective function relatively large. To overcome this, the paper describes an algorithm that computes the f-th factor from the first f-1 factors. (The tension remains, but it is tolerated. This is an undesirable situation, as we want to benefit by increasing the number of factors, thereby explaining more latent aspects of the data. However, we find that we can still treat only the known entries, but accompany the process with shrinkage to alleviate the overfitting problem.)

Explanation of the inner-product formula, and the advantage of the model's offline/online split: ... way, each rating r_ui is estimated as the inner product of the f factors that we learned for u and i, that is, p_u^T q_i. ... major advantage of such a regional, factorization-based approach is its computational efficiency. The computational burden lies in an offline, preprocessing step where all factors are computed. The actual, online rating prediction is done instantaneously by taking the inner product of two length-f vectors. Moreover, since the factors are computed by an iterative algorithm, it is easy to adapt them to changes in the data such as addition of new ratings, users, or items.
《Towards Explaining Latent Factors with Topic Models in Collaborative Recommender Systems》
Drawback of LFM: lack of interpretability. (Abstract) (cited from 《》) Latent factor models have been proved to be the state of the art for the Collaborative Filtering approach in a Recommender System. However, latent factors obtained with mathematical methods applied to the user-item matrix can be hardly interpreted by humans. (Introduction) Furthermore, it is hard to explain to users how a specific recommendation has been derived and why it matches to their presumed preferences when following a white-box explanations strategy, i.e. their interpretability is rather low as opposed to explicit knowledge-representation formalisms such as constraints or logic. (The paper addresses two aspects: portability, i.e. the model can be applied to new users (users without or with only few known ratings), and interpretability, i.e. it can be exploited for explaining their recommendations.)
Purpose of topic models: The main purpose of these algorithms is the analysis of words in natural language texts in order to discover themes
represented by sorted lists of words. The paper then presents the basic idea of LDA.
Purpose of recommender systems: The general idea behind those systems is to exploit information about users, items and relationships
between them such as rating or purchasing actions in order to identify additional serendipitous matches, i.e. pointing users to items they
would otherwise not have found.
Basic idea of latent factor models: these models factorize the user-item matrix containing the ratings into two smaller matrices, which
summarize information in a lower dimensional space. They are based on the assumption that few dimensions capture most of the signal in
the data and mostly noise, such as erratic or inconsistent rating behavior, is filtered.
One-Class Collaborative Filtering
Phrasing for claiming that a method is general: Different application scenarios in the field of business intelligence and analytics would benefit from research
progress into this direction, however this paper focuses on the domain of movie recommendation as a first step.
Data cleaning: After removing users and movies with no single remaining rating value the dataset consisted of U_M=6038 users and
3086 movies.
The choice of the optimal number of topics in this range, was guided by the consideration that a high number of topics could
bother the user in the evaluation phase. For this reason we set the number of topics equal to T=30, which seemed to be a good
compromise between topics’ granularity and the cognitive effort of users in order to select the appropriate topics.
Probability distribution and probability matrix: The LDA algorithm provides for every movie its probability distribution over topics. This set of probability values can be represented as a vector $\theta_i \in \mathbb{R}^T$. This vector is strictly non-negative, $0 \le \theta_{ij}$ for $j = 1, \dots, T$, and sums to 1, $\sum_{j=1}^{T} \theta_{ij} = 1$. All the probability distributions of movies have been organized to form the matrix $D \in \mathbb{R}^{M \times T}$, which by construction is a stochastic matrix.

《A Taxonomy for Generating Explanations in Recommender Systems》
Definition of explanations in recommender systems: by two properties. First, they are information about recommendations, where a recommendation is typically a ranked list of items (i.e. a ranking). Second, explanations support objectives defined by the recommender system designer (i.e. they serve a goal).
Why explain: the intention behind disclosing the reasoning process of the system could be to increase the user's confidence in making the right decision or to provide additional information such that the user can validate the rationality of the proposed purchase.
《A Probabilistic Model of Local Sequence Alignment That Simplifies Statistical Significance Estimation》
Optimal alignment scores are less powerful than probabilistic scores that integrate over alignment uncertainty.
Benefit of a probabilistic formulation: When parameters are probabilities rather than arbitrary scores, they are more readily optimized by objective mathematical
criteria. This enables building more complex, biologically realistic models with large numbers of parameters.
《Transparent User Models for Personalization》
The paper argues that explanations are needed and uses word-cloud figures for them. we must provide users with meaningful and interpretable answers when they ask, "why did I get
badge X?" A convenient way to visualize badge definitions is via word clouds, with the size of an action proportional to its weight in
the badge. Figure 4 shows six examples of badges learned from running our model on the Twitter data set described above.
《Probabilistic Matrix Factorization》
A variety of probabilistic factor-based models has been proposed recently. All these models can be viewed as graphical models in which hidden factor variables have directed connections to variables that represent user ratings. The major drawback of such models is that [insert the premise of our paper here: lack of interpretability].
Discussion of why prior (regularization) parameters are introduced: Given sufficiently many factors, a PMF model can approximate any given matrix arbitrarily well....The simplest way
to control the capacity of a PMF model is by changing the dimensionality of feature vectors. However, when the dataset is unbalanced, i.e.
the number of observations differs significantly among different rows or columns, this approach fails, since any single number of feature
dimensions will be too high for some feature vectors and too low for others. Regularization parameters such as λU and λV defined above
provide a more flexible approach to regularization......The complexity of the model is controlled by the hyperparameters.....
Normalization (the rating values can be normalized): passed through the logistic function g(x)=1/(1+exp(-x)), which bounds the range of predictions. Another passage:
we map the ratings 1,...K to the interval [0,1] using the function t(x)=(x-1)/(K-1), so that the range of valid rating values matches the range
of predictions our model makes.
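The two mappings quoted here are simple enough to write down directly. The sketch below is illustrative only: the function names are hypothetical, and the inverse mapping back to the 1..K scale is an addition not mentioned in the notes.

```python
import math


def logistic(x):
    """g(x) = 1 / (1 + exp(-x)); squashes any real score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def to_unit_interval(rating, K):
    """t(x) = (x - 1) / (K - 1); maps ratings 1..K onto [0, 1]."""
    return (rating - 1) / (K - 1)


def from_unit_interval(t, K):
    """Assumed inverse of t(x), for reporting predictions on the 1..K scale."""
    return 1 + t * (K - 1)


# Example with a hypothetical 5-star scale.
scaled = [to_unit_interval(r, 5) for r in (1, 2, 3, 4, 5)]
```

With both sides in [0,1], a squared-error comparison between g(u^T v) and t(r) is on a common scale.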
Complexity: scale well to large datasets
Justification for introducing the parameters of our model: Second, most of the existing algorithms have trouble making accurate predictions for users who have very
few ratings......remove all users with fewer than some minimal number of ratings.
Local optimum: a local minimum of the objective function given by Eq. 4 can be found by performing gradient descent in U and V.
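A minimal sketch of gradient descent in U and V follows. It assumes the objective has the usual PMF form, half the squared error over observed ratings plus an L2 penalty on U and V; for brevity it uses a single penalty weight `lam` for both matrices and omits the logistic link, both of which are simplifications of this sketch. The toy ratings are hypothetical and already scaled to [0,1].

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pmf_gradient_descent(ratings, n_users, n_items, dim=2, lam=0.01,
                         lr=0.05, epochs=200, seed=0):
    """ratings: list of (user, item, value), value assumed scaled to [0, 1]."""
    rng = random.Random(seed)
    U = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_users)]
    V = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_items)]

    def objective():
        sq_err = sum((r - dot(U[i], V[j])) ** 2 for i, j, r in ratings)
        penalty = sum(x * x for row in U + V for x in row)
        return 0.5 * sq_err + 0.5 * lam * penalty

    history = [objective()]
    for _ in range(epochs):
        # Start from the gradient of the penalty, then add the data term.
        grad_U = [[lam * x for x in row] for row in U]
        grad_V = [[lam * x for x in row] for row in V]
        for i, j, r in ratings:
            err = dot(U[i], V[j]) - r
            for d in range(dim):
                grad_U[i][d] += err * V[j][d]
                grad_V[j][d] += err * U[i][d]
        # Take one simultaneous step in U and V.
        U = [[x - lr * g for x, g in zip(row, g_row)]
             for row, g_row in zip(U, grad_U)]
        V = [[x - lr * g for x, g in zip(row, g_row)]
             for row, g_row in zip(V, grad_V)]
        history.append(objective())
    return U, V, history


# Hypothetical toy data: 3 users, 3 items.
toy = [(0, 0, 1.0), (0, 1, 0.75), (1, 0, 0.75), (1, 1, 1.0),
       (1, 2, 0.25), (2, 1, 0.0), (2, 2, 1.0)]
U, V, history = pmf_gradient_descent(toy, n_users=3, n_items=3)
```

Because the objective is non-convex in (U, V) jointly, a run like this only reaches a local minimum, and different seeds can land on different factorizations.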
《User interest and topic detection for personalized recommendation》
Topics are obtained from contents: propose a novel graphical model to extract hidden topics from web contents, cluster web contents, and detect users' interests on each cluster.
Phrasing for the experimental comparison in the abstract: Experiment results on a public dataset demonstrated the limitation of a traditional content-boosted approach, and also showed the validity of our proposed techniques.
Breadth of application scenarios: it can be widely applied to many other scenarios such as
The paper describes collaborative filtering, content-based methods, and hybrids of the two.
Drawbacks of content-based methods: existing techniques suffer in serious sparsity problems, because messages in online social media are usually short and sparse. In addition, web users use language creatively and generate rarely used and unknown vocabularies. These combined factors cause a great difficulty ... to make the most of its advantages.
bipartite graph model: one common approach in collaborative filtering is based on a bipartite graph model. A bipartite graph represents the relationships between users and threads.
we first introduce two baseline methods. (LFM and PersonalRank could be introduced at this point.)
A LDA-like model cannot be applied directly to discover users' interest since it lacks of the clustering property.
《Multidimensional mining of large-scale search logs: a topic-concept cube approach》
Mainly two figures: one of the bipartite graph and one of the graphical representation.
One term: topic-concept model
《Private traits and attributes are predictable from digital records of human behavior》
A user-like matrix with 0/1 entries is decomposed with SVD, and each user's low-dimensional vector is then used for regression.
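GRG (Generalized Reduced Gradient Method): extends Wolfe's reduced gradient method to problems with nonlinear equality constraints.
Reduced Gradient Method (1963): extends the simplex method of linear programming to problems with a nonlinear objective function. The basic idea is to split the variables into basic and nonbasic variables, express the basic variables in terms of the nonbasic ones, and eliminate the basic variables from the objective, which yields a reduced objective in the nonbasic variables only. The negative gradient of this function is then used to construct a feasible descent direction. The gradient of the reduced objective with respect to the nonbasic variables is called the reduced gradient of the objective.
Frank-Wolfe method (proposed in 1956): like the reduced gradient method, it is an algorithm for linearly constrained problems, but the basic idea differs. At each iteration Frank-Wolfe linearizes the objective f(x), solves a linear program to obtain a feasible descent direction, and then performs a one-dimensional search along that direction within the feasible region.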
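The Frank-Wolfe iteration can be sketched on the probability simplex, where the linear program has a closed-form answer (the vertex with the smallest gradient component). Two choices here are assumptions of this sketch: the feasible region is the simplex, and the one-dimensional search is replaced by the standard step size 2/(k+2). The quadratic objective and target point are hypothetical.

```python
def frank_wolfe_simplex(grad, x0, iters=200):
    """Minimise a smooth convex f over the probability simplex.

    grad: function returning the gradient of f at x (a list of floats)
    x0:   feasible starting point (non-negative entries summing to 1)
    """
    x = list(x0)
    for k in range(iters):
        g = grad(x)
        # Linearised subproblem: min over the simplex of <g, s> is a vertex.
        best = min(range(len(x)), key=lambda i: g[i])
        step = 2.0 / (k + 2.0)  # assumption: fixed schedule, not a line search
        # Move towards the vertex; a convex combination keeps x feasible.
        x = [(1.0 - step) * xi for xi in x]
        x[best] += step
    return x


# Hypothetical objective f(x) = ||x - target||^2 with the target inside the simplex.
target = [0.2, 0.5, 0.3]
grad = lambda x: [2.0 * (xi - ti) for xi, ti in zip(x, target)]
x = frank_wolfe_simplex(grad, [1.0, 0.0, 0.0])
sq_dist = sum((xi - ti) ** 2 for xi, ti in zip(x, target))
```

Every iterate stays feasible without any projection step, which is the practical appeal of the method when the linear subproblem is cheap.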
Use the reduced gradient method to find the minimum of a function.
《推荐系统实践》 p. 72: cannot provide explanations for recommendations.
《Latent Dirichlet Allocation》
Idea (from the abstract): in which each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities.
Principle of pLSI: pLSI models each word in a document as a sample from a mixture model, where the mixture components are multinomial random variables that can be viewed as representations of "topics".
xxx's work (pLSI) is a useful step toward probabilistic modeling of text.
Why a mixture distribution: A classic representation theorem due to xxx establishes that any collection of exchangeable random variables has a representation as a mixture distribution---in general an infinite mixture.
Convenience of the Dirichlet: The Dirichlet is a convenient distribution......; these properties will facilitate the development of inference and parameter estimation algorithms.
《Probabilistic latent semantic indexing》
The rationale is that documents which share frequently co-occurring terms will have a similar representation in the latent space, even if they have no terms in common. LSA thus performs some sort of noise reduction (LFM has this property too) and has the potential benefit to detect synonyms as well as words that refer to the same topic.
Mathematical foundation of pLSA: since it is based on the likelihood principle and defines a proper generative model of the data.
Reference: steps for constructing the probabilistic pLSI generative model: In terms of a generative model it can be defined in the following way:xxxxx
《Diversity Maximization Under Matroid Constraints》
We study this problem from an algorithmic perspective as well as experimentally using simulations and a user study.
《Joint latent topic models for text and citations》
(Background) Proliferation of large electronic document collections such as the web, news articles, blogs and scientific literature in the recent
past has posed several new, interesting challenges to researchers in the data mining community. In particular, there is an increasing need
for automatic techniques to visualize, analyze and mine these document collections. In the recent past, latent topic modeling has
become very popular as a completely unsupervised technique for topic discovery in large document collections.
《A topic modeling approach and its integration into the random walk framework for academic search》
Section 2 contains an introduction to LDA.
Drawback of LDA: it cannot directly model users and movies together; it can only model movies.
《Bipartite Networks of Wikipediaʼs Articles and Authors》
Investigations on ...... are of interest to both network and quantitative analysis studies, as well as to the social sciences.
(The preceding statement is independent of the application scenario.) Connecting this to the network of editors and articles ...
《A Neural Probabilistic Language Model》: the method must be adaptive
Here we must deal with data of variable length, like sentences, so the above approach must be adapted.
《Multi-label learning by exploiting label dependency》
From the Bayesian point of view, this problem can be reduced to model the conditional joint distribution ....
we can first eliminate the influences of x in all labels, and then discover the conditional independencies among y_k (conditioned on x) by analyzing the errors.
http://stanford.edu/~rezab/dao/notes/lec14.pd
Notice that this objective is non-convex (because of the x_u^T y_i term); in fact it’s NP-hard to optimize. Gradient descent can be used as an
approximate approach here, however it turns out to be slow and costs lots of iterations. Note however, that if we fix the set of variables X and
treat them as constants, then the objective is a convex function of Y and vice versa. Our approach will therefore be to fix Y and optimize X, then
fix X and optimize Y, and repeat until convergence.
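The alternating scheme in this passage can be sketched in plain Python. The objective assumed here is the squared error on observed ratings plus an L2 penalty `lam` on both factor matrices; the penalty, the helper names, and the toy ratings are assumptions of this sketch. With one side fixed, each row of the other side is a small ridge-regression problem solved exactly through its normal equations.

```python
import random


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * pv for a, pv in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def ridge_update(fixed, observed, dim, lam):
    """Exact minimiser for one row: (sum v v^T + lam I) x = sum r v."""
    A = [[lam if a == b else 0.0 for b in range(dim)] for a in range(dim)]
    rhs = [0.0] * dim
    for idx, r in observed:
        v = fixed[idx]
        for a in range(dim):
            rhs[a] += r * v[a]
            for b in range(dim):
                A[a][b] += v[a] * v[b]
    return solve(A, rhs)


def als(ratings, n_users, n_items, dim=2, lam=0.1, sweeps=20, seed=0):
    rng = random.Random(seed)
    X = [[rng.uniform(0.1, 1.0) for _ in range(dim)] for _ in range(n_users)]
    Y = [[rng.uniform(0.1, 1.0) for _ in range(dim)] for _ in range(n_items)]
    by_user = {u: [] for u in range(n_users)}
    by_item = {i: [] for i in range(n_items)}
    for u, i, r in ratings:
        by_user[u].append((i, r))
        by_item[i].append((u, r))

    def loss():
        err = sum((r - sum(a * b for a, b in zip(X[u], Y[i]))) ** 2
                  for u, i, r in ratings)
        return err + lam * sum(v * v for row in X + Y for v in row)

    history = [loss()]
    for _ in range(sweeps):
        X = [ridge_update(Y, by_user[u], dim, lam) for u in range(n_users)]  # fix Y
        Y = [ridge_update(X, by_item[i], dim, lam) for i in range(n_items)]  # fix X
        history.append(loss())
    return X, Y, history


# Hypothetical toy ratings: (user, item, rating).
toy = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
X, Y, history = als(toy, n_users=3, n_items=3)
```

Since each half-step minimizes the objective exactly over one block, the recorded loss cannot increase from sweep to sweep, which is the convergence argument the passage relies on.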
ftp://ftp.cc.gatech.edu/pub/tech_reports/cse/2007/GT-CSE-07-01.pdf
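《使用LFM（Latent factor model）隐语义模型开展Top-N推荐》 (Top-N recommendation with the LFM latent factor model)
The formulas for parameter training, and the parameters of LFM (《推荐系统实战》 p. 69).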

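Citations in parentheses: check the spacing around them.
Logic: every sentence must have a reason to exist, and that reason must be stated explicitly rather than hidden.
In Methodology, describe the model with the movie-user scenario, so that item topic and user interest are easy to understand.
In Methodology, mention that "latent" automatically covers certain things, echoing the corresponding part of the introduction.
Point out that the experiments use the extensional model of LITM.
Add more latent-topic literature to related work and tidy up the flow.
Supplement the principle behind Equations 2, 9, and 10 (e.g. the maximum likelihood principle; "Maximizing the log-posterior with respect to U and V is equivalent to minimizing the sum-of-squared-errors objective function") and the derivation (how Equation 10 is obtained).
Tense.
The number of keywords, and the maximum number of words per keyword.
The key contributions are the modeling and the optimization method; neither can be left out, and both must be mentioned in the abstract and the introduction.
Decide whether the components of a vector are called elements or entries.
Topic and interest: when they first appear in the introduction, illustrate them with a movie-watching example.
The Introduction mentions the low interpretability of LFM (add a citation) and gives the reason as "hard to explain to users how a specific recommendation has been derived". Since the proposed LITM model improves on LFM, state how it derives recommendations.
Where a monograph is cited, give the page range.
For multi-line pseudocode, should the text say "Lines No.1~No.2" with a plural verb, and "Line No." with a singular verb for a single line?
The setting of parameter K.
The reference entry for 《Algorithms for non-negative matrix factorization》 has an extra (NIPS).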
《Analyzing Entities and Topics in News Articles》
《Joint latent topic models for text and citations》
A tale about LDA2vec: when LDA meets word2vec
http://www.datasciencecentral.com/profiles/blogs/a-tale-about-lda2vec-when-lda-meets-word2vec
Although the topics look dirty enough, it is possible to label some of them with real topic names.

《Sparse Forward-Backward Using Minimum Divergence Beams for Fast Training Of Conditional Random Fields》
Learning curves for CRF training on synthetic data.
《Modeling relationships at multiple scales to improve accuracy of large recommender systems》
RMSEs of regional, factorization-based methods on Probe data, plotted against varying number of factors.
《Learning Large-Scale Conditional Random Fields》
The vertical axis is the objective value.
《Learning sparse CRFs for feature selection and classification of hyperspectral imagery》
The figure's main title is "Convergence of the training method."
《Using Maximum Entropy for Text Classification》
Accuracy over iterations of improved iterative scaling on the Industry Sector dataset with the full vocabulary, where it does best on this dataset.
