A simple work-optimal broadcast algorithm for message-passing parallel systems

JL Träff. European Parallel Virtual Machine/Message Passing Interface Users' Group Meeting, 2004. Springer.
Abstract
In this note we give a simple bandwidth- and latency-optimal algorithm for the problem of broadcasting m units of data from a distinguished root processor to all p − 1 other processors in one-ported (hypercubic) message-passing systems. Assuming linear, uniform communication cost, the time for the broadcast to complete is O(m + log₂ p). More precisely, no processor is involved in more than ⌈log₂ p⌉ communication operations (send, receive, and send-receive), and for any constant message size threshold b each processor (except the root) sends at most m − b′ + (⌈log₂ p⌉ − ℓ)b′ units of data, where b′ is determined by the smallest ℓ ≤ ⌈log₂ p⌉ such that b′ = m/2^ℓ ≤ b (the root sends 2m − b′ + (⌈log₂ p⌉ − ℓ)b′ units of data). Non-root processors receive m units of data.
Building on known ideas, the salient features of the algorithm presented here are its simplicity of implementation and its smooth transition from latency-dominated to bandwidth-dominated performance as the data size m increases. The implementation performs very well in practice.
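The send-volume bound stated in the abstract can be evaluated numerically. The following is a minimal sketch, not the paper's implementation. It assumes the bound reads m − b′ + (⌈log₂ p⌉ − ℓ)b′ for non-root processors and 2m − b′ + (⌈log₂ p⌉ − ℓ)b′ for the root, with b′ = m/2^ℓ ≤ b for the smallest such ℓ ≤ ⌈log₂ p⌉. The function name `send_bounds` and the fallback ℓ = ⌈log₂ p⌉ when no ℓ satisfies the threshold are hypothetical choices made for illustration.

```python
def send_bounds(m, p, b):
    """Evaluate the send-volume bounds under the assumed reading of the abstract.

    m: units of data to broadcast, p: number of processors,
    b: message size threshold. Returns (ell, b_prime, non_root, root).
    """
    if m <= 0 or b <= 0 or p < 2:
        raise ValueError("need m > 0, b > 0, p >= 2")

    d = (p - 1).bit_length()  # equals ceil(log2(p)) for integer p >= 1

    # Smallest ell <= d with m / 2**ell <= b.
    # Assumption: if no such ell exists, fall back to ell = d.
    ell = 0
    while ell < d and m / 2**ell > b:
        ell += 1
    b_prime = m / 2**ell

    non_root = m - b_prime + (d - ell) * b_prime
    root = 2 * m - b_prime + (d - ell) * b_prime
    return ell, b_prime, non_root, root


if __name__ == "__main__":
    # Small threshold: most of the work is done on halved blocks.
    print(send_bounds(1024, 16, 128))
    # Threshold as large as the message: ell = 0, full-size messages in every round.
    print(send_bounds(1024, 16, 1024))
```

With b ≥ m the sketch gives ℓ = 0 and a non-root bound of ⌈log₂ p⌉·m, the latency-dominated extreme. As b shrinks relative to m the non-root bound approaches m, which illustrates the transition to bandwidth-dominated behavior mentioned above.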