\begin{gathered}
\begin{bmatrix}
0.01 & 0.90 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.02
\\
0.91 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01
\\
0.90 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.01 & 0.02
\end{bmatrix}
\end{gathered}
\begin{gathered}
E = -\frac{1}{3}\left( 1 \cdot \log 0.90 + 1 \cdot \log 0.01 + 1 \cdot \log 0.90 \right)
\end{gathered}
PS: see the textbook for the corresponding implementation.
What y[np.arange(batch_size), t] means:
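Suppose batch_size = 3. Then np.arange(batch_size) generates the array [0, 1, 2]. The index [np.arange(batch_size), t] pairs [0, 1, 2] element by element with the correct labels stored in t, [1, 3, 0], giving the coordinates (0, 1), (1, 3), (2, 0). These coordinates are then used to fetch the elements of the matrix y at those positions; for example, y[0, 1] is the value in the first row, second column, which is 0.90. So this line of code produces the NumPy array [0.90, 0.01, 0.90]. Finally, take the logarithm of each element of this array, sum the results, divide by the number of samples, and negate.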
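The indexing step described above can be reproduced with a minimal sketch. It assumes y is the 3×10 output matrix shown earlier and t holds the labels [1, 3, 0]; the names rows and picked are introduced here purely for illustration.

```python
import numpy as np

# Network outputs for a batch of 3 samples over 10 classes (the matrix shown earlier).
y = np.array([
    [0.01, 0.90, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.02],
    [0.91, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01],
    [0.90, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.02],
])

# Correct labels in non-one-hot form: one class index per sample.
t = np.array([1, 3, 0])

batch_size = y.shape[0]
rows = np.arange(batch_size)   # row coordinates 0, 1, 2 (illustrative name)

# Integer-array indexing pairs rows[i] with t[i],
# so this reads y[0, 1], y[1, 3] and y[2, 0].
picked = y[rows, t]            # illustrative name
print(picked)
```

Because the row indices and the column indices are paired element by element, the result has one entry per sample rather than a 3×3 sub-matrix.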
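Summary: the two routes arrive at the same place; in the end both compute with [0.90, 0.01, 0.90]. With one-hot labels, the positions of the 1s in t supply both the row and the column coordinates used to pick values from y, and every other entry is multiplied by 0 and drops out. With non-one-hot labels, np.arange(batch_size) supplies the row coordinates and t supplies the column coordinates, and the combined pairs are used to pick values from y. The values picked out are identical in both cases.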
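As a rough check of this equivalence, the sketch below computes the error both ways for the same batch. The function names are hypothetical, and the small constant delta added inside the logarithm is an assumption made here to guard against log(0); it is not taken from the text above.

```python
import numpy as np

def cross_entropy_one_hot(y, t_one_hot, delta=1e-7):
    # One-hot labels: multiplying by t zeroes every term except the correct class.
    batch_size = y.shape[0]
    return -np.sum(t_one_hot * np.log(y + delta)) / batch_size

def cross_entropy_labels(y, t, delta=1e-7):
    # Label indices: pick the correct-class probability of each row directly.
    batch_size = y.shape[0]
    return -np.sum(np.log(y[np.arange(batch_size), t] + delta)) / batch_size

y = np.array([
    [0.01, 0.90, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.02],
    [0.91, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01],
    [0.90, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.02],
])
t = np.array([1, 3, 0])        # non-one-hot labels
t_one_hot = np.eye(10)[t]      # the same labels as one-hot rows, shape (3, 10)

e_one_hot = cross_entropy_one_hot(y, t_one_hot)
e_labels = cross_entropy_labels(y, t)
print(e_one_hot, e_labels)
```

Both calls should print essentially the same value, since each one sums the logarithms of 0.90, 0.01 and 0.90 and divides by 3.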