[Translation] TF-api(3) tf.nn.softmax_cross_entropy_with

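This article explains how tf.nn.softmax_cross_entropy_with_logits works and how it is implemented internally, covering its input and output parameters, the underlying math, and a code example.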


tf.nn.softmax_cross_entropy_with_logits

args:

_sentinel: Used to prevent positional parameters. Internal, do not use.

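Judging from the source code, this parameter exists precisely so that you do not use it: if you pass a value for it, the function raises an error. As a result, logits and labels have to be passed by name, which helps keep you from mixing them up.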
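The sentinel idea can be illustrated with a minimal sketch in plain Python. The function `cross_entropy_stub` and its error messages are hypothetical stand-ins written for this example; they are not the TensorFlow source.

```python
def cross_entropy_stub(_sentinel=None, labels=None, logits=None):
    """Hypothetical stand-in that only demonstrates the sentinel pattern."""
    # Any positional argument lands in _sentinel first, so a non-None
    # value means the caller did not name their arguments.
    if _sentinel is not None:
        raise ValueError("Call with named arguments only (labels=..., logits=...)")
    if labels is None or logits is None:
        raise ValueError("Both labels and logits must be provided")
    return labels, logits


# Named arguments are accepted.
cross_entropy_stub(labels=[0.0, 1.0], logits=[2.0, 3.0])

# Positional arguments are rejected before labels and logits can be swapped.
try:
    cross_entropy_stub([0.0, 1.0], [2.0, 3.0])
except ValueError as err:
    print("rejected:", err)
```

Because the first positional slot is taken by the sentinel, a call that forgets the keywords fails loudly instead of silently treating labels as logits.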

labels: Each row labels[i] must be a valid probability distribution.

These are the labels, usually given in one-hot form.
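As a small illustration of what a one-hot label row looks like, here is a sketch in plain Python. The helper `one_hot` is a hypothetical name introduced for this example, not a TensorFlow call.

```python
def one_hot(class_index, nb_class):
    """Return a list with 1.0 at class_index and 0.0 elsewhere (illustrative helper)."""
    return [1.0 if k == class_index else 0.0 for k in range(nb_class)]


label_row = one_hot(2, 4)
print(label_row)  # [0.0, 0.0, 1.0, 0.0]

# A one-hot row is a valid probability distribution:
# every entry is non-negative and the entries sum to 1.
print(sum(label_row) == 1.0 and all(v >= 0.0 for v in label_row))  # True
```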

logits: Unscaled log probabilities.

This is the input tensor, i.e. the output of the model's final fully connected layer. Its shape is usually [batch_size, nb_class].

dim: The class dimension. Defaulted to -1 which is the last dimension.

dim indicates which axis holds the classes, i.e. the nb_class axis.
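To make the role of the class axis concrete, here is a small sketch in plain Python (no TensorFlow). The `softmax` helper and the sample values are assumptions made for illustration. It shows that the normalization has to run along whichever axis indexes the classes, whatever the storage layout.

```python
import math


def softmax(values):
    """Plain softmax over a flat list (illustrative helper, not TensorFlow's)."""
    exps = [math.exp(v) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Layout [batch_size, nb_class]: classes sit on the last axis (dim=-1),
# so each row is normalized on its own.
logits = [[0.0, 1.0, 2.0], [3.0, 3.0, 3.0]]
probs_last_axis = [softmax(row) for row in logits]

# Same data stored as [nb_class, batch_size]: classes now sit on axis 0,
# so each sample is a column and must be normalized column-wise.
logits_t = [list(col) for col in zip(*logits)]
probs_axis0 = [softmax(list(col)) for col in zip(*logits_t)]

# Both layouts give the same per-sample distributions.
print(probs_last_axis == probs_axis0)
```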

name: A name for the operation (optional).

The tf.nn.softmax_cross_entropy_with_logits() API is effectively equivalent to combining tf.nn.softmax(), tf.log(), and tf.reduce_sum(). It works in three steps:

Normalize the input into probabilities with softmax:

$p(x_j) = \frac{e^{x_j}}{\sum_{i=1}^{n}e^{x_i}}$

Take the cross entropy with the labels:

$h(x_j) = -y_{label}\log p(x_j)$

Sum along the class axis (axis=1):

$H(x) = \sum_{j=1}^{n} h(x_j)$, which is what tf.reduce_sum(h(x), axis=1) computes.
Code verification:

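# Written against the TensorFlow 1.x graph API (tf.Session, tf.log).
import tensorflow as tf

# Four identical samples with logits [0, 1, 2, 3]; the true class is index 2.
data = [[i for i in range(4)] for _ in range(4)]
label = [[0., 0., 1., 0.] for _ in range(4)]
x = tf.constant(value=data, dtype=tf.float32)
y_label = tf.constant(value=label, dtype=tf.float32)

# Built-in op: one loss value per sample.
H1 = tf.nn.softmax_cross_entropy_with_logits(logits=x, labels=y_label)

# Manual version of the same three steps.
p = tf.nn.softmax(x)               # step 1: softmax normalization
h = -y_label * tf.log(p)           # step 2: cross entropy with the labels
H2 = tf.reduce_sum(h, axis=1)      # step 3: sum over the class axis

# H1 and H2 should print the same values.
with tf.Session() as sess:
    print(sess.run(H1))
    print(sess.run(H2))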
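The same three steps can also be cross-checked without TensorFlow. The following is a minimal sketch using only the Python standard library; the function name `softmax_cross_entropy` is a hypothetical helper for this example, and it uses the naive softmax formula as written above rather than whatever numerically stabilized form the library may use internally.

```python
import math


def softmax_cross_entropy(logits_row, label_row):
    """Softmax, then cross entropy, then sum, for one sample (illustrative)."""
    # Step 1: softmax normalization.
    exps = [math.exp(x) for x in logits_row]
    total = sum(exps)
    p = [e / total for e in exps]
    # Step 2: per-class cross entropy term.
    h = [-y * math.log(pj) for y, pj in zip(label_row, p)]
    # Step 3: sum over the class axis.
    return sum(h)


# Same inputs as the TensorFlow check: logits [0, 1, 2, 3], true class 2.
data = [[float(i) for i in range(4)] for _ in range(4)]
label = [[0.0, 0.0, 1.0, 0.0] for _ in range(4)]

losses = [softmax_cross_entropy(x, y) for x, y in zip(data, label)]
print(losses)  # each entry is about 1.44019
```

With a one-hot label, the loss reduces to log(sum of exp(logits)) minus the logit of the true class, which for these inputs is log(1 + e + e^2 + e^3) - 2, roughly 1.44019.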