Computer Science > Distributed, Parallel, and Cluster Computing

arXiv:1607.08220 (cs)
[Submitted on 27 Jul 2016]

Title: PANDA: Extreme Scale Parallel K-Nearest Neighbor on Distributed Architectures

Authors: Md. Mostofa Ali Patwary, Nadathur Rajagopalan Satish, Narayanan Sundaram, Jialin Liu, Peter Sadowski, Evan Racah, Suren Byna, Craig Tull, Wahid Bhimji, Prabhat, Pradeep Dubey
Abstract: Computing $k$-Nearest Neighbors (KNN) is one of the core kernels used in many machine learning, data mining, and scientific computing applications. Although kd-tree based $O(\log n)$ algorithms have been proposed for computing KNN, linear algorithms are used in practice because of the inherent sequentiality of the kd-tree approaches. This limits the applicability of such methods to millions of data points, with limited scalability for Big Data analytics challenges in the scientific domain. In this paper, we present parallel and highly optimized kd-tree based KNN algorithms (both construction and querying) suitable for distributed architectures. Our algorithm includes novel approaches for pruning the search space and for improving load balancing and partitioning among nodes and threads. Using TB-sized datasets from three science applications (astrophysics, plasma physics, and particle physics), we show that our implementation can construct a kd-tree of 189 billion particles in 48 seconds using $\sim$50,000 cores. We also demonstrate computation of KNN for 19 billion queries in 12 seconds. We demonstrate almost linear speedup on both shared-memory and distributed-memory computers. Our algorithms outperform earlier implementations by more than an order of magnitude, thereby radically improving the applicability of our implementation to state-of-the-art Big Data analytics problems. In addition, we showcase performance and scalability on the recently released Intel Xeon Phi processor, showing that our algorithm scales well even on massively parallel architectures.
Comments: 11 pages. In: PANDA: Extreme Scale Parallel K-Nearest Neighbor on Distributed Architectures, Md. Mostofa Ali Patwary, IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2016
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as: arXiv:1607.08220 [cs.DC]
  (or arXiv:1607.08220v1 [cs.DC] for this version)
  https://doi.org/10.48550/arXiv.1607.08220
Related DOI: https://doi.org/10.1109/IPDPS.2016.57

Submission history

From: Mostofa Patwary
[v1] Wed, 27 Jul 2016 19:13:07 UTC (2,790 KB)
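
Illustrative sketch (not from the paper): the abstract contrasts kd-tree based KNN with linear scans and mentions pruning the search space. The single-threaded Python sketch below shows the textbook form of that idea, assuming a median-split kd-tree and a bounded max-heap of candidates. The names `build_kdtree`, `sq_dist`, `knn_query`, and `knn` are hypothetical, and none of the paper's parallel construction, load balancing, or distributed querying is represented here.

```python
import heapq

def build_kdtree(points, depth=0):
    """Build a kd-tree by median split, cycling through the axes.
    Each node is a tuple (point, axis, left_subtree, right_subtree)."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid], axis,
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def knn_query(node, query, k, heap=None):
    """Collect the k nearest points in a max-heap of (-sq_dist, point)."""
    if heap is None:
        heap = []
    if node is None:
        return heap
    point, axis, left, right = node
    d = sq_dist(point, query)
    if len(heap) < k:
        heapq.heappush(heap, (-d, point))
    elif d < -heap[0][0]:
        heapq.heapreplace(heap, (-d, point))
    # Descend first into the side of the splitting plane containing the query.
    diff = query[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    knn_query(near, query, k, heap)
    # Pruning: visit the far side only if the splitting plane is closer
    # than the current k-th best candidate (or fewer than k were found).
    if len(heap) < k or diff * diff < -heap[0][0]:
        knn_query(far, query, k, heap)
    return heap

def knn(tree, query, k):
    """Return the k nearest points to `query`, closest first."""
    found = knn_query(tree, query, k)
    return [p for _, p in sorted(found, key=lambda t: -t[0])]
```

A typical call would be `knn(build_kdtree(points), query, k)` with `points` given as a list of equal-length tuples. The pruning test is what lets a query skip whole subtrees, in contrast to a linear scan that examines every point.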