CMU 11-785 L02 What can a network represent

This post covers the basic building blocks of neural networks, including the perceptron, threshold units, and activation functions, and then looks at the structure and function of the multi-layer perceptron. By examining properties of deep networks, such as their ability to represent arbitrary Boolean functions, the importance of depth, and the number of neurons required, it highlights the role depth plays in network capacity, approximation power, and classification.


Preliminary

Perceptron

  • Threshold unit
    • "Fires" if the weighted sum of inputs exceeds a threshold
  • Soft perceptron
    • Using sigmoid function instead of a threshold at the output
    • Activation: The function that acts on the weighted combination of inputs (and threshold)
  • Affine combination
    • Different from a linear combination: an affine map need not send zero to zero, because the bias/threshold shifts the output (see the sketch after this list)
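
As a concrete reference, here is a minimal NumPy sketch (the variable names and toy inputs are my own, not from the lecture) of a hard threshold unit and its "soft" sigmoid counterpart acting on the same affine combination:

```python
import numpy as np

def perceptron(x, w, b):
    """Hard threshold unit: "fires" (outputs 1) iff the affine combination w.x + b exceeds 0."""
    return 1 if np.dot(w, x) + b > 0 else 0

def soft_perceptron(x, w, b):
    """Soft perceptron: a sigmoid acting on the same affine combination."""
    z = np.dot(w, x) + b              # affine: z need not be 0 when x is the zero vector
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 0.0])
w = np.array([1.0, 1.0])
print(perceptron(x, w, b=-1.5))       # 0: weighted sum 1.0 does not exceed the threshold 1.5
print(soft_perceptron(x, w, b=-1.5))  # ~0.38: graded output instead of all-or-nothing
```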

Multi-layer perceptron

  • Depth
    • Is the length of the longest path from a source to a sink
    • Deep: Depth greater than 2
  • Inputs/Outputs are real or Boolean stimuli
  • What can this network compute?

Universal Boolean functions

  • A perceptron can model any simple binary Boolean gate
    • Using weights of $1$ or $-1$ (plus a suitable threshold) to model the gate
    • The universal AND gate: $(\bigwedge_{i=1}^{L} X_{i}) \wedge (\bigwedge_{i=L+1}^{N} \bar{X}_{i})$
    • The universal OR gate: $(\bigvee_{i=1}^{L} X_{i}) \vee (\bigvee_{i=L+1}^{N} \bar{X}_{i})$
    • Cannot compute an XOR
  • MLPs can compute the XOR (see the sketch below)
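
The following sketch (illustrative weights and helper names, not taken from the slides) implements AND, OR, and NAND as single threshold units and then composes XOR as a two-layer MLP:

```python
import numpy as np

def gate(w, b):
    """Return a threshold unit over Boolean inputs: fires iff w.x + b >= 0."""
    return lambda x: int(np.dot(w, np.asarray(x)) + b >= 0)

AND  = gate([1, 1], -2)     # fires only when both inputs are 1
OR   = gate([1, 1], -1)     # fires when at least one input is 1
NAND = gate([-1, -1], 1.5)  # negation of AND

def XOR(x):
    """Two-layer MLP: hidden layer {OR, NAND}, output layer AND."""
    return AND([OR(x), NAND(x)])

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, "AND:", AND(x), "OR:", OR(x), "XOR:", XOR(x))
```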


  • MLPs are universal Boolean functions

    • Can compute any Boolean function
  • A Boolean function is just a truth table

    • So the function can be expressed in disjunctive normal form (DNF), e.g.

    • $Y = \bar{X}_{1}\bar{X}_{2}X_{3}X_{4}\bar{X}_{5} + \bar{X}_{1}X_{2}\bar{X}_{3}X_{4}X_{5} + \bar{X}_{1}X_{2}X_{3}\bar{X}_{4}\bar{X}_{5} + X_{1}\bar{X}_{2}\bar{X}_{3}\bar{X}_{4}X_{5} + X_{1}\bar{X}_{2}X_{3}X_{4}X_{5} + X_{1}X_{2}\bar{X}_{3}\bar{X}_{4}X_{5}$

    • In this case, one hidden neuron is needed per term of the DNF (six here), plus an OR neuron at the output (see the sketch below)
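
A minimal sketch of this construction, assuming one hidden AND-neuron per DNF term and an OR-neuron at the output (the helper names and code structure are my own):

```python
import numpy as np
from itertools import product

# Input patterns (X1..X5) for which Y = 1, read off the six terms of the DNF above.
minterms = [
    (0, 0, 1, 1, 0), (0, 1, 0, 1, 1), (0, 1, 1, 0, 0),
    (1, 0, 0, 0, 1), (1, 0, 1, 1, 1), (1, 1, 0, 0, 1),
]

def and_neuron(pattern):
    """Hidden AND-neuron for one minterm: weight +1 for X_i, -1 for the complement."""
    w = np.array([1 if p else -1 for p in pattern])
    b = -sum(pattern)                      # fires only on the exact pattern
    return lambda x: int(np.dot(w, np.asarray(x)) + b >= 0)

hidden = [and_neuron(p) for p in minterms]  # one hidden neuron per DNF term

def Y(x):
    """Output OR-neuron over the hidden-layer activations."""
    return int(sum(h(x) for h in hidden) >= 1)

# The one-hidden-layer network reproduces the truth table exactly.
assert all(Y(x) == (x in minterms) for x in product((0, 1), repeat=5))
print("truth table reproduced for all 32 inputs")
```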

Need for depth

  • A one-hidden-layer MLP is a Universal Boolean Function

    • But the number of perceptrons required can be exponential: $O(2^N)$ in the worst case
  • How about depth?

    • A deeper network requires only $3(N-1)$ perceptrons, linear in $N$, to express the same (parity) function
    • Using the associativity of XOR, these can be arranged in $2\log_2 N$ layers
    • e.g. model $O = W \oplus X \oplus Y \oplus Z$ (see the sketch after this list)
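
A sketch of the pairwise-XOR tree, reusing the three-perceptron XOR block shown earlier (function names and the example input are illustrative):

```python
# Parity of N bits from pairwise XOR blocks, each block made of 3 threshold units
# (OR, NAND, AND).  A tree of N-1 blocks uses 3(N-1) perceptrons in ~2*log2(N) layers.

def xor2(a, b):
    """One XOR block built from three threshold units."""
    or_  = int(a + b - 1 >= 0)
    nand = int(-a - b + 1.5 >= 0)
    return int(or_ + nand - 2 >= 0)   # AND of the two hidden units

def parity(bits):
    """Reduce pairwise until one value remains (a balanced tree of XOR blocks)."""
    layer = list(bits)
    while len(layer) > 1:
        nxt = [xor2(layer[i], layer[i + 1]) for i in range(0, len(layer) - 1, 2)]
        if len(layer) % 2:            # an odd element passes straight to the next level
            nxt.append(layer[-1])
        layer = nxt
    return layer[0]

print(parity([1, 0, 1, 1]))  # 1, i.e. W xor X xor Y xor Z for (W, X, Y, Z) = (1, 0, 1, 1)
```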


  • The challenge of depth

    • Using only $K$ hidden layers will require $O(2^{CN})$ neurons in the $K$-th layer, where $C = 2^{-(K-1)/2}$ (a rough numeric illustration follows this list)
    • A network with fewer than the minimum required number of neurons cannot model the function
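
To get a feel for this bound, here is a purely illustrative evaluation of the expression as stated above, for a hypothetical input width of $N = 64$:

```python
# Evaluating the bound as stated above for a hypothetical input width N = 64:
# the required width explodes as the number of hidden layers K shrinks.
N = 64
for K in (16, 8, 4, 2):
    C = 2 ** (-(K - 1) / 2)
    print(f"K={K:2d}  ->  about 2^{C * N:5.1f} = {2 ** (C * N):.3g} neurons in layer K")
```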

Universal classifiers

  • Composing complicated “decision” boundaries


  • Using an OR over such regions to create more complex decision boundaries
    • Can compose arbitrarily complex decision boundaries
    • Even with a one-hidden-layer MLP (a sketch of the AND/OR construction follows)
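
For illustration, the sketch below builds each convex region as an AND of half-plane perceptrons and then ORs the regions together; the specific regions (two axis-aligned squares) and helper names are my own choices, not the lecture's example:

```python
import numpy as np

def halfplane(w, b):
    """Linear threshold unit: fires on one side of the line w.x + b >= 0."""
    return lambda x: int(np.dot(w, x) + b >= 0)

def convex_region(planes):
    """AND of its bounding half-planes: fires only inside the convex polygon."""
    return lambda x: int(sum(h(x) for h in planes) >= len(planes))

# Two unit squares as example convex regions; any polygons would work the same way.
square1 = convex_region([halfplane([ 1, 0],  0), halfplane([-1, 0], 1),
                         halfplane([ 0, 1],  0), halfplane([ 0,-1], 1)])   # [0,1] x [0,1]
square2 = convex_region([halfplane([ 1, 0], -2), halfplane([-1, 0], 3),
                         halfplane([ 0, 1], -2), halfplane([ 0,-1], 3)])   # [2,3] x [2,3]

def classifier(x):
    """OR over the regions: the decision boundary is a union of convex pieces."""
    return int(square1(x) + square2(x) >= 1)

print(classifier(np.array([0.5, 0.5])))  # 1: inside the first square
print(classifier(np.array([2.5, 2.5])))  # 1: inside the second square
print(classifier(np.array([1.5, 1.5])))  # 0: in neither region
```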

Need for depth

  • A naïve one-hidden-layer neural network would require infinitely many hidden neurons
  • Constructing a basic unit and adding more layers decreases the number of neurons required
  • The number of neurons required in a shallow network is potentially exponential in the dimensionality of the input

Universal approximators

  • A one-hidden-layer MLP can model an arbitrary function of a single input (see the sketch after this list)
  • MLPs can actually compose arbitrary functions in any number of dimensions
    • Even without “activation”
  • Activation
    • A universal map from the entire domain of input values to the entire range of the output activation
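
A minimal sketch of the one-hidden-layer approximation of a 1-D function: each hidden sigmoid contributes a (soft) step, and the output weights are the increments of the target between consecutive grid points. The target function, grid, and sharpness constant here are illustrative choices, not the lecture's example:

```python
import numpy as np

# One-hidden-layer approximation of a 1-D function as a sum of (soft) steps.
f = np.sin                                   # illustrative target function
knots = np.linspace(0.0, 2 * np.pi, 100)     # hidden-unit thresholds (step locations)
increments = np.diff(np.concatenate(([0.0], f(knots))))  # output-layer weights

def mlp_approx(x, sharpness=200.0):
    """Sum of sigmoid steps; sharper sigmoids approach hard thresholds."""
    z = np.clip(sharpness * (x - knots), -60.0, 60.0)   # clip to avoid overflow in exp
    steps = 1.0 / (1.0 + np.exp(-z))
    return float(np.dot(increments, steps))

for x in (0.5, 1.5, 3.0, 5.0):
    print(f"x={x:.1f}  target={np.sin(x):+.3f}  mlp={mlp_approx(x):+.3f}")
```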

Optimal depth and width

  • Deeper networks will require far fewer neurons for the same approximation error
  • Sufficiency of architecture
    • Not all architectures can represent any function
  • Continuous activation functions result in graded output at the layer
    • To capture information “missed” by the lower layer

Width vs. Activations vs. Depth

  • Narrow layers can still pass information to subsequent layers if the activation function is sufficiently graded
    • But will require greater depth, to permit later layers to capture patterns
  • Capacity of the network
    • Information or Storage: how many patterns can it remember
    • VC dimension: bounded by the square of the number of weights in the network
    • A straightforward measure: the largest number of disconnected convex regions it can represent
  • A network with insufficient capacity cannot exactly model a function whose minimal number of convex regions exceeds the capacity of the network