Table 3 Comparison with FPGA and GPU implementations processing the MNIST and DIBCO 2017 datasets.
From: Efficient Hardware Architectures for 1D- and MD-LSTM Networks
| Dataset | Name | Platform | Model [bits] | Params [M] | Ops.\(^{a}\) [M] | Acc. [%] | Freq. [MHz] | \(\theta\) [GOp/s] | kFPS\(^{b}\) | \(P_{chip}\) [W] | \(P_{board}\) [W] | \(\frac{kFPS}{P_{chip}}\) | \(\frac{kFPS}{P_{board}}\) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MNIST | FINN [53] | ZC706 | 1 | 3.34E-1 | 6.69E-1 | 95.83 | 200 | 8265.45 | 12361 | 7.30 | 21.2 | 1.69E+3 | 5.83E+2 |
|  | FINN [53] | ZC706 | 1 | 2.91E+0 | 5.82E+0 | 98.40 | 200 | 9085.67 | 1561 | 8.80 | 22.6 | 1.77E+2 | 6.91E+1 |
|  | BNN [33] | Stratix-V 5SGSD8 | 1 | 1.00E+1 | 2.00E+1 | 98.32 | 150 | 12219.40 | 610.36 | - | 26.2 | - | 2.33E+1 |
|  | TNN [4] | Kintex-7 160T | 2 | 1.99E-1 | 3.97E-1 | 97.76 | 200 | 101.28 | 255.102 | 0.32 | - | 7.97E+2 | - |
|  | TNN [4] | Kintex-7 160T | 2 | 1.72E+0 | 3.44E+0 | 98.33 | 200 | 877.81 | 255.102 | 2.86 | - | 8.92E+1 | - |
|  | [42] | ZC706 | 3 | 2.90E+0 | 5.80E+0 | 98.92 | 172 | 384.16 | 66.255 | 4.98 | 11.4 | 1.41E+1 | 4.98E+0 |
|  | This work | ZCU102 | 1 | 6.39E-1 | 2.77E+1 | 99.37 | 300 | 8710.28 | 314.82 | 13.20 | 39.3 | 2.39E+1 | 8.01E+0 |
|  | This work\(^{D}\) | Tesla K80 | 32F | 6.39E-1 | 2.77E+1 | 99.46 | - | 239.83 | 8.66 | 273.85 | - | 3.16E-2 | - |
|  | This work\(^{P}\) | Tesla K80 | 35F | 6.39E-1 | 2.77E+1 | 99.46 | - | 103.33 | 3.73 | 193.21 | - | 1.93E-2 | - |
| DIBCO | This work | ZCU102 | 4/8 | 6.75E-2 | 1.35E-1 | 87.54 | 240 | 3618.23 | 6.53 | 15.47 | 43.6 | 4.22E-1 | 1.50E-1 |
|  | This work\(^{D}\) | Tesla K80 | 32F | 6.75E-2 | 1.35E-1 | 88.00 | - | 319.27 | 0.58 | 247.47 | - | 2.34E-3 | - |
|  | This work\(^{P}\) | Tesla K80 | 35F | 6.75E-2 | 1.35E-1 | 88.00 | - | 101.91 | 0.18 | 183.11 | - | 9.83E-4 | - |
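The two rightmost columns are, by their headers, kFPS divided by chip power and by board power. The minimal Python sketch below recomputes them for three MNIST rows of the table. It also checks an assumed relation, \(\theta \approx\) Ops \(\times\) kFPS (M op/frame times k frame/s gives G op/s), which is a reading of the units rather than a definition stated in this excerpt. The names `ROWS`, `efficiency`, `throughput_gops` and `fmt` are hypothetical helpers introduced only for illustration.

```python
from typing import Optional

# (Ops [M], kFPS, P_chip [W], P_board [W]) copied from three MNIST rows of Table 3.
# None marks a power figure that the table reports as "-".
ROWS = {
    "FINN [53], ZC706": (6.69e-1, 12361.0, 7.30, 21.2),
    "This work, ZCU102": (2.77e1, 314.82, 13.20, 39.3),
    "This work^D, Tesla K80": (2.77e1, 8.66, 273.85, None),
}


def efficiency(kfps: float, power_w: Optional[float]) -> Optional[float]:
    """Energy efficiency in kFPS/W; None when the power was not reported."""
    return None if power_w is None else kfps / power_w


def throughput_gops(ops_m: float, kfps: float) -> float:
    """Assumed relation: (1e6 op/frame) * (1e3 frame/s) = 1e9 op/s, i.e. GOp/s."""
    return ops_m * kfps


def fmt(value: Optional[float]) -> str:
    """Scientific notation with two decimals, or '-' for a missing value."""
    return "-" if value is None else f"{value:.2E}"


if __name__ == "__main__":
    for name, (ops_m, kfps, p_chip, p_board) in ROWS.items():
        print(
            f"{name}: theta ~ {throughput_gops(ops_m, kfps):.1f} GOp/s, "
            f"kFPS/P_chip = {fmt(efficiency(kfps, p_chip))}, "
            f"kFPS/P_board = {fmt(efficiency(kfps, p_board))}"
        )
```

Run as a script, it prints one line per row; the recomputed ratios agree with the tabulated values to within rounding, and the \(\theta\) estimates land within about 0.2% of the reported throughput. The same product does not reproduce the DIBCO rows (0.135 M × 6.53 kFPS ≈ 0.88 GOp/s versus 3618.23 GOp/s reported), so the Ops column there is presumably counted on a basis other than one whole frame; footnote a, which defines that column, is not part of this excerpt.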