This document summarizes algorithms for large-scale data mining using MapReduce, including:
1) Information retrieval algorithms like distributed grep, calculating URL access frequency, and constructing the reverse web link graph.
2) Graph algorithms like PageRank, which is computed through an iterative process of message passing between nodes.
3) Clustering algorithms like canopy clustering, which uses two distance thresholds to create overlapping clusters in a single pass over the data.