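The line

    self.Q = tf.reduce_sum(tf.multiply(self.output, self.actions_))

should be

    self.Q = tf.reduce_sum(tf.multiply(self.output, self.actions_), axis=1)

Without axis=1, tf.reduce_sum adds up every element of the [batch_size, n_actions] product, so self.Q collapses to a single scalar for the whole batch. With axis=1, it sums only across the action dimension. Assuming self.actions_ holds one-hot encoded actions, that picks out the Q-value of the action taken in each sample, so self.Q has shape [batch_size].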
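Here is a small sketch of the shape difference. It uses NumPy, with np.sum standing in for tf.reduce_sum and elementwise * standing in for tf.multiply. The axis argument means the same thing in both libraries. The array values are made up for illustration and are not taken from the original code. The actions are assumed to be one-hot encoded.

```python
import numpy as np

# Hypothetical batch of 3 samples with 2 actions each (stand-in for self.output).
q_values = np.array([[1.0, 2.0],
                     [3.0, 4.0],
                     [5.0, 6.0]])
# Hypothetical one-hot actions taken in each sample (stand-in for self.actions_).
actions_one_hot = np.array([[0.0, 1.0],
                            [1.0, 0.0],
                            [0.0, 1.0]])

# No axis: sums every element, collapsing the whole batch to one scalar.
q_all = np.sum(q_values * actions_one_hot)
# axis=1: sums across actions only, leaving one Q-value per sample.
q_per_sample = np.sum(q_values * actions_one_hot, axis=1)

print(q_all)         # 11.0
print(q_per_sample)  # [2. 3. 6.]
```

With the per-sample version, a squared-error loss against a [batch_size] target compares each sample's Q-value with its own target. This no longer holds with the single batch-wide sum.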