Preliminary
Perceptron
- Threshold unit
- “Fires” if the weighted sum of inputs exceeds a threshold
- Soft perceptron
- Using a sigmoid function instead of a hard threshold at the output
- Activation: The function that acts on the weighted combination of inputs (and threshold)
- Affine combination
- Different from a linear combination: because of the bias (threshold) term, an affine map need not send zero to zero (see the sketch after this list)
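A minimal sketch of the two units above (illustrative code, not from the notes; the weights, bias, and inputs are arbitrary): both act on the same affine combination $w \cdot x + b$, one with a hard threshold and one with a sigmoid.

```python
# Minimal sketch (assumed example): a threshold perceptron and its "soft"
# sigmoid variant, both acting on the affine combination w·x + b.
import numpy as np

def perceptron(x, w, b):
    """Fires (outputs 1) if the weighted sum of inputs exceeds the threshold -b."""
    return 1.0 if np.dot(w, x) + b > 0 else 0.0

def soft_perceptron(x, w, b):
    """Same affine combination, passed through a sigmoid instead of a hard threshold."""
    z = np.dot(w, x) + b          # affine: not zero at x = 0 unless b = 0
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.0, 2.0])
w = np.array([1.0, 1.0, 1.0])
print(perceptron(x, w, b=-1.0))       # hard 0/1 decision
print(soft_perceptron(x, w, b=-1.0))  # graded value in (0, 1)
```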
Multi-layer perceptron
- Depth
- The length of the longest path from a source (input) to a sink (output); see the sketch after this list
- Deep: Depth greater than 2
- Inputs/Outputs are real or Boolean stimuli
- What can this network compute?
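A toy illustration of the depth definition, treating the network as a directed acyclic graph (the graph below is hypothetical; only the longest-path computation is the point):

```python
# Depth of a layered network viewed as a DAG: the length of the longest
# path from any source (input) to any sink (output).
from functools import lru_cache

# Hypothetical network: node -> list of successor nodes.
edges = {
    "x1": ["h1", "h2"], "x2": ["h1", "h2"],   # inputs (sources)
    "h1": ["h3"], "h2": ["h3"],               # first hidden layer
    "h3": ["y"],                              # second hidden layer
    "y": [],                                  # output (sink)
}

@lru_cache(maxsize=None)
def longest_path_from(node):
    successors = edges[node]
    return 0 if not successors else 1 + max(longest_path_from(s) for s in successors)

depth = max(longest_path_from(source) for source in ("x1", "x2"))
print(depth)  # 3: longest input->output path; "deep" here means depth > 2
```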
Universal Boolean functions
- A perceptron can model any simple binary Boolean gate
- Using weights of 1 or -1 (and an appropriate threshold) to model the gate
- The universal AND gate: $(\bigwedge_{i=1}^{L} X_{i}) \wedge (\bigwedge_{i=L+1}^{N} \bar{X}_{i})$
- The universal OR gate: $(\bigvee_{i=1}^{L} X_{i}) \vee (\bigvee_{i=L+1}^{N} \bar{X}_{i})$
- Cannot compute XOR, which is not linearly separable
- MLPs can compute XOR (see the sketch after this list)
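A hedged sketch of these gates (the helper names and thresholds are assumptions chosen to match the formulas above): a single threshold unit with weights of $\pm 1$ realizes the universal AND/OR gates, while XOR takes two layers (OR and NAND feeding an AND).

```python
# Sketch: universal AND/OR gates as single threshold units, and XOR as a
# two-layer network, which no single perceptron can compute.
import itertools
import numpy as np

def fires(x, w, theta):
    """Threshold unit: outputs 1 if the weighted sum reaches the threshold theta."""
    return int(np.dot(w, x) >= theta)

def universal_and(x, L):
    """(X1 ∧ ... ∧ XL) ∧ (X̄_{L+1} ∧ ... ∧ X̄_N): weights ±1, threshold L."""
    w = [1] * L + [-1] * (len(x) - L)
    return fires(x, w, L)

def universal_or(x, L):
    """(X1 ∨ ... ∨ XL) ∨ (X̄_{L+1} ∨ ... ∨ X̄_N): weights ±1, threshold L - N + 1."""
    w = [1] * L + [-1] * (len(x) - L)
    return fires(x, w, L - len(x) + 1)

def xor(x1, x2):
    """Two layers: OR and NAND units feed an AND unit."""
    h_or = fires([x1, x2], [1, 1], 1)
    h_nand = fires([x1, x2], [-1, -1], -1)
    return fires([h_or, h_nand], [1, 1], 2)

print(universal_and([1, 1, 0], L=2), universal_and([1, 1, 1], L=2))  # 1 0
print(universal_or([0, 0, 1], L=2), universal_or([0, 0, 0], L=2))    # 0 1
for x1, x2 in itertools.product([0, 1], repeat=2):
    print(x1, x2, "->", xor(x1, x2))                                 # 0 1 1 0
```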
MLPs are universal Boolean functions
- Can compute any Boolean function
- A Boolean function is just a truth table
- So the function can be expressed in disjunctive normal form (DNF), with one AND term per input combination for which the output is 1, e.g.
- $Y = \bar{X}_{1}\bar{X}_{2}X_{3}X_{4}\bar{X}_{5} + \bar{X}_{1}X_{2}\bar{X}_{3}X_{4}X_{5} + \bar{X}_{1}X_{2}X_{3}\bar{X}_{4}\bar{X}_{5} + X_{1}\bar{X}_{2}\bar{X}_{3}\bar{X}_{4}X_{5} + X_{1}\bar{X}_{2}X_{3}X_{4}X_{5} + X_{1}X_{2}\bar{X}_{3}\bar{X}_{4}X_{5}$
- In this case, one hidden neuron is needed per DNF term: 6 neurons in the hidden layer (see the sketch after this list)
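A sketch of that one-hidden-layer construction for the DNF above (the code itself is assumed, not from the notes): one hidden threshold unit per product term, and an OR unit at the output.

```python
# One hidden AND unit per DNF term (6 terms -> 6 hidden neurons), OR at the output.
import itertools
import numpy as np

# Each row encodes one product term over (X1..X5): +1 for Xi, -1 for X̄i.
terms = np.array([
    [-1, -1, +1, +1, -1],
    [-1, +1, -1, +1, +1],
    [-1, +1, +1, -1, -1],
    [+1, -1, -1, -1, +1],
    [+1, -1, +1, +1, +1],
    [+1, +1, -1, -1, +1],
])

def mlp_dnf(x):
    x = np.asarray(x)
    # Hidden layer: a term fires iff every literal matches, i.e. w·x reaches the
    # number of uncomplemented literals in that term.
    thresholds = (terms == 1).sum(axis=1)
    hidden = (terms @ x >= thresholds).astype(int)
    # Output layer: OR of the hidden units.
    return int(hidden.sum() >= 1)

# The network fires on exactly the 6 input patterns listed in the DNF.
print(sum(mlp_dnf(x) for x in itertools.product([0, 1], repeat=5)))  # 6
```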
Need for depth
- A one-hidden-layer MLP is a universal Boolean function
- But the required number of perceptrons can be exponential in the worst case: on the order of $2^N$
- How about depth?
- Will require $3(N-1)$ perceptrons, linear in $N$, to express the same function
- Using the associativity of XOR, these can be arranged in $2\log_2 N$ layers
- e.g. modeling $O = W \oplus X \oplus Y \oplus Z$ (see the sketch after this list)
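A hedged sketch of the depth-based construction (illustrative code): the $N$-input parity is computed by a balanced tree of 2-input XOR gates, each built from 3 perceptrons (OR, NAND, AND), giving $3(N-1)$ units arranged in roughly $2\log_2 N$ perceptron layers.

```python
# N-input parity W ⊕ X ⊕ Y ⊕ Z ⊕ ... as a balanced tree of 2-input XOR gates.
import itertools

def fires(x, w, theta):
    return int(sum(wi * xi for wi, xi in zip(w, x)) >= theta)

def xor_gate(a, b):
    """One 2-input XOR = 3 perceptrons (OR, NAND, AND), i.e. 2 layers."""
    h_or = fires([a, b], [1, 1], 1)
    h_nand = fires([a, b], [-1, -1], -1)
    return fires([h_or, h_nand], [1, 1], 2)

def parity(bits):
    """Reduce pairwise, level by level: ~log2(N) XOR stages for N inputs."""
    level = list(bits)
    while len(level) > 1:
        nxt = [xor_gate(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:           # odd element carries over to the next stage
            nxt.append(level[-1])
        level = nxt
    return level[0]

for bits in itertools.product([0, 1], repeat=4):   # O = W ⊕ X ⊕ Y ⊕ Z
    assert parity(bits) == sum(bits) % 2
print("tree of 3(N-1) perceptrons reproduces the N-input XOR")
```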
The challenge of depth
- Using only $K$ hidden layers will require $O(2^{CN})$ neurons in the $K$-th layer, where $C = 2^{-(K-1)/2}$ (a toy calculation follows this list)
- A network with fewer than the minimum required number of neurons cannot model the function
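A toy calculation that simply plugs numbers into the bound as stated above ($N$ and the choices of $K$ are arbitrary), to show how the required width explodes when depth is restricted:

```python
# Width required by the stated bound ~ 2^(C·N), with C = 2^(-(K-1)/2).
N = 64
for K in (2, 4, 8):
    C = 2 ** (-(K - 1) / 2)
    print(f"K={K} hidden layers -> on the order of 2^{C * N:.1f} neurons")
# Fewer layers (small K) -> exponentially more neurons: depth trades off against width.
```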
Universal classifiers
- Composing complicated “decision” boundaries
- Using an OR unit to combine multiple regions into a more complex decision boundary
- Can compose arbitrarily complex decision boundaries
- Even using a one-hidden-layer MLP (see the sketch after this list)
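A minimal sketch of boundary composition in 2D (the regions and coordinates are assumed examples): each hidden threshold unit is a half-plane, an AND unit detects each convex region, and an OR unit combines the regions; here the composed decision region is "inside a unit square or inside a triangle".

```python
# Compose a (disconnected) decision boundary: AND of half-planes per convex
# region, OR over the regions.
import numpy as np

def threshold(z):
    return 1 if z >= 0 else 0

# Each half-plane fires when w·(x, y) + b >= 0.
square = [((1, 0), 0), ((-1, 0), 1), ((0, 1), 0), ((0, -1), 1)]   # 0<=x<=1, 0<=y<=1
triangle = [((1, 0), -2), ((0, 1), -2), ((-1, -1), 7)]            # x>=2, y>=2, x+y<=7

def region(point, halfplanes):
    """AND of the half-plane units: 1 only inside the convex region."""
    h = [threshold(np.dot(w, point) + b) for w, b in halfplanes]
    return int(sum(h) >= len(halfplanes))

def classify(point):
    """OR over region detectors: an arbitrarily complex composed boundary."""
    return int(region(point, square) + region(point, triangle) >= 1)

print(classify((0.5, 0.5)))  # 1: inside the square
print(classify((3.0, 3.0)))  # 1: inside the triangle
print(classify((1.5, 1.5)))  # 0: outside both regions
```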
Need for depth
- A naïve one-hidden-layer neural network may require infinitely many hidden neurons
- Constructing a basic unit and adding more layers decreases the number of neurons required
- The number of neurons required in a shallow network is potentially exponential in the dimensionality of the input
Universal approximators
- A one-hidden-layer MLP can model an arbitrary function of a single input (see the sketch after this list)
- MLPs can actually compose arbitrary functions in any number of dimensions
- Even without an “activation” at the output unit
- Activation
- A universal map from the entire domain of input values to the entire range of the output activation
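A hedged sketch of the one-hidden-layer construction for a single input (the target function, grid, and steepness are arbitrary illustrative choices): the function is approximated by a sum of scaled, shifted, steep sigmoids, one hidden unit per grid point, with a plain weighted sum (no activation) at the output.

```python
# Approximate an arbitrary 1D function with one hidden layer of steep sigmoids.
import numpy as np

def target(x):
    return np.sin(3 * x) + 0.5 * x               # arbitrary function to approximate

def shallow_approximator(x, n_units=200, lo=-2.0, hi=2.0):
    centers = np.linspace(lo, hi, n_units)       # one hidden unit per grid point
    steepness = 5.0 * n_units                    # sharp sigmoids approximate steps
    f_vals = target(centers)
    heights = np.diff(f_vals, prepend=f_vals[0]) # output weight = step height at its center
    # Hidden layer: shifted sigmoids; output: plain weighted sum (no activation).
    z = np.clip(steepness * (x[:, None] - centers[None, :]), -30.0, 30.0)
    hidden = 1.0 / (1.0 + np.exp(-z))
    return target(lo) + hidden @ heights

xs = np.linspace(-2.0, 2.0, 1000)
err = np.max(np.abs(shallow_approximator(xs) - target(xs)))
print(f"max approximation error: {err:.3f}")     # shrinks as the grid is refined
```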
Optimal depth and width
- Deeper networks will require far fewer neurons for the same approximation error
- Sufficiency of architecture
- Not all architectures can represent any function
- Continuous activation functions result in graded output at the layer
- To capture information “missed” by the lower layer
Width vs. Activations vs. Depth
- Narrow layers can still pass information to subsequent layers if the activation function is sufficiently graded
- But will require greater depth, to permit later layers to capture patterns
- Capacity of the network
- Information or Storage: how many patterns can it remember
- VC dimension: bounded by the square of the number of weights in the network (see the sketch after this list)
- A more straightforward measure: the largest number of disconnected convex regions it can represent
- A network with insufficient capacity cannot exactly model a function that requires more convex regions than the network can represent
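A small illustrative helper (assumed code, following the note's claim above rather than a precise theorem statement): count the weights of a fully connected MLP and report the quadratic bound on the VC dimension in terms of that count.

```python
# Count trainable weights of a fully connected MLP given its layer widths, and
# quote the VC-dimension bound claimed above: on the order of (number of weights)^2.
def count_weights(layer_widths, bias=True):
    return sum((fan_in + (1 if bias else 0)) * fan_out
               for fan_in, fan_out in zip(layer_widths[:-1], layer_widths[1:]))

widths = [5, 6, 1]          # e.g. the 5-input, 6-hidden-unit DNF network from above
W = count_weights(widths)
print(f"{W} weights -> VC dimension bounded by roughly W^2 = {W ** 2}")
```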