The document discusses methods for estimating the probability of unseen word pairs by drawing on information from distributionally similar words. It compares four similarity-based estimation methods, distinguished by their similarity functions: Kullback-Leibler (KL) divergence, total divergence to the average, the L1 norm, and confusion probability. These are evaluated against Katz's back-off scheme and maximum likelihood estimation (MLE). The total divergence to the average performs best, producing estimates for unseen word pairs up to 40% better than those from the back-off and MLE methods. The approach measures the similarity between a target word and other words, then combines the conditional probabilities contributed by the most similar words, weighting each by its similarity to the target.
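The scheme above can be sketched in a few lines. The following is a minimal illustration, not the paper's exact formulation: the helper names, the toy distributions, and the exponential weighting `exp(-beta * dist)` are assumptions made here for clarity; only the total divergence to the average and the similarity-weighted combination come from the description.

```python
import math

def kl(p, q):
    # KL divergence D(p || q); requires q(x) > 0 wherever p(x) > 0.
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)

def total_div_to_avg(p, q):
    # Total divergence to the average: A(p, q) = D(p || m) + D(q || m),
    # where m is the pointwise average of p and q. Since m's support
    # covers both p's and q's, the KL terms are always finite.
    m = {x: 0.5 * (p.get(x, 0.0) + q.get(x, 0.0)) for x in set(p) | set(q)}
    return kl(p, m) + kl(q, m)

def similarity_weighted_prob(w1, w2, cond, k=2, beta=1.0):
    """Estimate P(w2 | w1) from the k words most similar to w1.

    `cond` maps each word to its MLE conditional distribution over
    co-occurring words. Words closer to w1 (smaller divergence) get
    larger weights via exp(-beta * distance) -- a hedged choice of
    weighting function, not necessarily the one used in the paper.
    """
    dists = {w: total_div_to_avg(cond[w1], cond[w]) for w in cond if w != w1}
    nearest = sorted(dists, key=dists.get)[:k]
    weights = {w: math.exp(-beta * dists[w]) for w in nearest}
    norm = sum(weights.values())
    return sum(weights[w] * cond[w].get(w2, 0.0) for w in nearest) / norm

# Toy data: "apple" was never observed with "peel", so MLE gives 0,
# but the similar word "pear" supplies a nonzero estimate.
cond = {
    "apple": {"eat": 0.6, "buy": 0.4},
    "pear":  {"eat": 0.4, "buy": 0.3, "peel": 0.3},
    "car":   {"drive": 0.9, "buy": 0.1},
}
p_unseen = similarity_weighted_prob("apple", "peel", cond)
```

Because "pear" has a much smaller divergence from "apple" than "car" does, its distribution dominates the weighted sum, so the unseen pair ("apple", "peel") receives a nonzero probability even though its MLE is zero.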