Cannon's algorithm

Next: Sorting Up: Matrix multiplication Previous: Recursive subdivision

Cannon's algorithm

There are many more matrix multiplications, and some that are slightly more efficient than those presented here. We now consider (last but not least) the implementation of Cannon's algorithm (1969).

Cannon's algorithm uses a mesh of processes that are connected as a torus. Process at location initially begins with submatrices $\mathsf{A}_{i,j}$ and $\mathsf{B}_{i,j}$ . As the algorithm progresses, the submatrices are passed left and upwards.

$\includegraphics[width=2in]{matrix-cannon.eps}$

Initially $P_{i,j}$ begins with $\mathsf{A}_{i,j}$ and $\mathsf{B}_{i,j}$ .
Elements are moved from their initial positions to align them so that the correct submatrices are multiplied with one another. Note that submatrices on the diagonal don't actually require alignment. Alignment is done by shifting the -th row of $\mathsf{A}$ positions left and the -th column of $\mathsf{B}$ positions up.
Each process, $P_{i,j}$ multiplies its current submatrices and adds to a cumulative sum.
The submatrices are shifted left and upwards.
The above two steps are repeated through the remaining submatrices.

The total information received by a process is bytes to receive the starting submatrix, at most bytes during the alignment phase, and then another bytes during the computation. The root process must gather bytes from each process to complete the result. This is a total of

$\displaystyle (2m^2+4(s-1)m^2)s^2L\approx(2+4s)n^2$

Cannon's algorithm is suitable when the inputs/outputs of a matrix operation are generated/used locally. In this case, the scatter and gather operations are not required.

New matrix multiplications algorithms have continued to be published into the mid 1990's.

Next: Sorting Up: Matrix multiplication Previous: Recursive subdivision

Aaron HARWOOD 2003-10-22