1. Data Preparation
The character sequence "hello" is converted to a one-hot encoded representation:
- Input: ['h', 'e', 'l', 'l']
- Output: ['e', 'l', 'l', 'o']
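The data preparation above can be sketched in numpy. The vocabulary order (h, e, l, o) is an assumption; the text does not fix it explicitly, but it matches the one-hot vectors used later in the example.

```python
import numpy as np

# Vocabulary order is an assumption consistent with x_1 = [1,0,0,0] for 'h'
# and y_1 = [0,1,0,0] for 'e' later in the worked example.
vocab = ['h', 'e', 'l', 'o']
char_to_idx = {c: i for i, c in enumerate(vocab)}

def one_hot(char):
    """Return the one-hot vector for a character."""
    v = np.zeros(len(vocab))
    v[char_to_idx[char]] = 1.0
    return v

inputs = [one_hot(c) for c in "hell"]   # 'h', 'e', 'l', 'l'
targets = [one_hot(c) for c in "ello"]  # 'e', 'l', 'l', 'o'
```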
2. Parameter Initialization
We use a single-layer RNN (N vs. N) with a hidden size of 2, feeding one character at each time step. The initial parameters are:
$$
W_{xh} = \begin{pmatrix} 0.1 & 0.2 \\ 0.3 & 0.4 \\ 0.5 & 0.6 \\ 0.7 & 0.8 \end{pmatrix}, \quad
W_{hh} = \begin{pmatrix} 0.1 & 0.2 \\ 0.3 & 0.4 \end{pmatrix}, \quad
W_{hy} = \begin{pmatrix} 0.1 & 0.2 & 0.3 & 0.4 \\ 0.5 & 0.6 & 0.7 & 0.8 \end{pmatrix}
$$
The bias terms are initialized to 0.
3. Forward and Backward Propagation
Time step 1 (input 'h'):
Input vector $x_1 = [1, 0, 0, 0]$
$$
h_1 = \tanh(W_{xh} x_1 + W_{hh} h_0) = \tanh \left( \begin{pmatrix} 0.1 & 0.2 \\ 0.3 & 0.4 \\ 0.5 & 0.6 \\ 0.7 & 0.8 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} + \begin{pmatrix} 0.1 & 0.2 \\ 0.3 & 0.4 \end{pmatrix} \begin{pmatrix} 0 \\ 0 \end{pmatrix} \right) = \tanh \left( \begin{pmatrix} 0.1 \\ 0.3 \end{pmatrix} \right) = \begin{pmatrix} 0.0997 \\ 0.2913 \end{pmatrix}
$$
$$
y_1 = W_{hy} h_1 = \begin{pmatrix} 0.1 & 0.2 & 0.3 & 0.4 \\ 0.5 & 0.6 & 0.7 & 0.8 \end{pmatrix} \begin{pmatrix} 0.0997 \\ 0.2913 \end{pmatrix} = \begin{pmatrix} 0.1695 \\ 0.3889 \\ 0.6083 \\ 0.8277 \end{pmatrix}
$$
Predicted value: $\hat{y}_1 = \text{softmax}(y_1)$
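The forward step above can be sketched in numpy. The shapes here are chosen so every matrix–vector product is defined ($W_{xh}$: 2×4, $W_{hh}$: 2×2, $W_{hy}$: 4×2), and the uniform 0.1 values are placeholders, not the example's exact matrices.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - z.max())  # shift by the max to avoid overflow
    return e / e.sum()

# Placeholder parameters with shapes that compose (not the example's values).
W_xh = np.full((2, 4), 0.1)
W_hh = np.full((2, 2), 0.1)
W_hy = np.full((4, 2), 0.1)

x1 = np.array([1.0, 0.0, 0.0, 0.0])  # one-hot 'h'
h0 = np.zeros(2)                     # initial hidden state

# h_1 = tanh(W_xh x_1 + W_hh h_0), y_hat_1 = softmax(W_hy h_1)
h1 = np.tanh(W_xh @ x1 + W_hh @ h0)
y_hat1 = softmax(W_hy @ h1)
```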
Assume the true output is 'e', whose one-hot encoding is $y_1 = [0, 1, 0, 0]$.
Cross-entropy loss:
$$
\text{loss}_1 = - \sum_{i} y_{1i} \log(\hat{y}_{1i})
$$
Gradient computation:
$$
\frac{\partial \text{loss}_1}{\partial W_{hy}} = (\hat{y}_1 - y_1) h_1^T
$$
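The loss and gradient above can be checked numerically: for softmax outputs with a cross-entropy loss, the gradient with respect to the logits is $\hat{y}_1 - y_1$, so the gradient with respect to $W_{hy}$ is the outer product $(\hat{y}_1 - y_1) h_1^T$. Here $h_1$ is taken from the worked example, while $W_{hy}$ is shaped 4×2 so that $W_{hy} h_1$ yields the 4 logits (an assumption about orientation, since the printed 2×4 shape does not compose with a 2-vector).

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - z.max())
    return e / e.sum()

h1 = np.array([0.0997, 0.2913])           # hidden state from time step 1
y_true = np.array([0.0, 1.0, 0.0, 0.0])   # one-hot target 'e'
W_hy = np.array([[0.1, 0.2],              # 4x2 so W_hy @ h1 gives 4 logits
                 [0.3, 0.4],
                 [0.5, 0.6],
                 [0.7, 0.8]])

y_hat = softmax(W_hy @ h1)
loss = -np.sum(y_true * np.log(y_hat))    # cross-entropy for a one-hot target
dW_hy = np.outer(y_hat - y_true, h1)      # (y_hat - y_1) h_1^T, shape (4, 2)
```

Because the softmax probabilities and the one-hot target both sum to 1, the rows of `dW_hy` sum to zero column-wise, which is a quick sanity check on the derivation.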