This document summarizes a benchmark of the MinHash and Locality Sensitive Hashing (LSH) algorithms for calculating pairwise similarity on Reddit post data in Spark. MinHash was used to reduce the dimensionality of the data, and LSH was then applied to reduce it further and find similar items. Benchmarking showed that MinHash+LSH was significantly faster than a brute-force approach: it calculated similarities for 100k entries in 7.68 seconds, compared to 9.99 billion seconds for brute force. Precision was lower for MinHash+LSH, at 0.009 compared to 1 for brute force, but recall was higher, at 0.036 compared to a vanishingly small value for brute force. The techniques were also applied to a real-time streaming