Efficient Hardware Architectures for 1D- and MD-LSTM Networks

Table 3 Comparison with FPGA and GPU implementations on the MNIST and DIBCO 2017 datasets.

Dataset	Name	Platform	Model [bits]	Params [M]	Ops.^a [M]	Acc. [%]	Freq. [MHz]	𝜃 [GOp/s]	kFPS^b	P_chip [W]	P_board [W]	\(\frac{kFPS}{P_{chip}}\)	\(\frac{kFPS}{P_{board}}\)
MNIST	FINN [53]	ZC706	1	3.34E-1	6.69E-1	95.83	200	8265.45	12361	7.30	21.2	1.69E+3	5.83E+2
	FINN [53]	ZC706	1	2.91E+0	5.82E+0	98.40	200	9085.67	1561	8.80	22.6	1.77E+2	6.91E+1
	BNN [33]	Stratix-V 5SGSD8	1	1.00E+1	2.00E+1	98.32	150	12219.40	610.36	-	26.2	-	2.33E+1
	TNN [4]	Kintex-7 160T	2	1.99E-1	3.97E-1	97.76	200	101.28	255.102	0.32	-	7.97E+2	-
	TNN [4]	Kintex-7 160T	2	1.72E+0	3.44E+0	98.33	200	877.81	255.102	2.86	-	8.92E+1	-
	[42]	ZC706	3	2.90E+0	5.80E+0	98.92	172	384.16	66.255	4.98	11.4	1.41E+1	4.98E+0
	This work	ZCU102	1	6.39E-1	2.77E+1	99.37	300	8710.28	314.82	13.20	39.3	2.39E+1	8.01E+0
	This work^D	Tesla K80	32F	6.39E-1	2.77E+1	99.46	-	239.83	8.66	273.85	-	3.16E-2	-
	This work^P	Tesla K80	32F	6.39E-1	2.77E+1	99.46	-	103.33	3.73	193.21	-	1.93E-2	-
DIBCO	This work	ZCU102	4/8	6.75E-2	1.35E-1	87.54	240	3618.23	6.53	15.47	43.6	4.22E-1	1.50E-1
	This work^D	Tesla K80	32F	6.75E-2	1.35E-1	88.00	-	319.27	0.58	247.47	-	2.34E-3	-
	This work^P	Tesla K80	32F	6.75E-2	1.35E-1	88.00	-	101.91	0.18	183.11	-	9.83E-4	-

^a Number of operations per 28×28 image for MNIST, and per pixel for DIBCO.
^b Measured on 28×28 images for MNIST, and on 64×64 patches for DIBCO.
^D Diagonal-wise order of execution.
^P Pixel-by-pixel order of execution.
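
The derived columns of Table 3 can be reproduced from the measured ones. The following is a minimal sketch using the "This work" FPGA rows. The efficiency columns follow directly from the table headers (kFPS divided by power). The throughput relation (operations per frame times frames per second) is an assumption that is not stated in the table but is numerically consistent with it. The function names `kfps_per_watt` and `throughput_gops` are hypothetical and chosen only for illustration.

```python
def kfps_per_watt(kfps, power_w):
    """Energy efficiency in kFPS/W, as in the last two columns of Table 3."""
    return kfps / power_w


def throughput_gops(ops_m, kfps, units_per_frame=1):
    """Approximate throughput in GOp/s (assumed relation, see lead-in).

    ops_m           -- operations in millions per unit (footnote a):
                       per image for MNIST, per pixel for DIBCO
    kfps            -- thousands of frames per second (footnote b)
    units_per_frame -- 1 for MNIST images, 64*64 pixels for DIBCO patches
    """
    ops_per_frame = ops_m * 1e6 * units_per_frame
    frames_per_second = kfps * 1e3
    return ops_per_frame * frames_per_second / 1e9


# Values copied from the "This work" ZCU102 rows of Table 3.
mnist_eff_chip = kfps_per_watt(314.82, 13.20)    # table reports 2.39E+1
mnist_eff_board = kfps_per_watt(314.82, 39.3)    # table reports 8.01E+0
mnist_theta = throughput_gops(27.7, 314.82)      # table reports 8710.28
dibco_theta = throughput_gops(0.135, 6.53, units_per_frame=64 * 64)  # table reports 3618.23

print(mnist_eff_chip, mnist_eff_board, mnist_theta, dibco_theta)
```

Under these assumptions the computed values agree with the reported ones to within about 1%, with the residual attributable to rounding of the tabulated inputs.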