Implementing a Multi-Layer LSTM Model in PyTorch, with Embedding, Dropout, and Weight-Sharing Optimizations

Overview

This article extends the earlier post "Pytorch封装简单RNN模型,进行中文训练及文本预测" and makes the following changes:

1. Replace nn.RNN with nn.LSTM and stack multiple LSTM layers:

Since we are using PyTorch, there is no need to implement the stacking by hand: both nn.RNN and nn.LSTM accept a num_layers argument at instantiation to specify the number of layers. This article uses num_layers=2.

2. Add an embedding layer to replace the previous nn.functional.one_hot vectorization; the resulting embedding layer can also serve as a distributed (dense) word-vector representation;

3. Add Dropout after the embedding layer, inside the LSTM, and after the LSTM to curb overfitting:

The Dropout inside nn.LSTM is configured through the dropout argument at instantiation. Note that PyTorch applies this Dropout only between LSTM layers; it is not applied to the output of the last LSTM layer.

The Dropout after the embedding layer and the one between the LSTM output and the linear layer have to be added manually.

4. Consider sharing weights between the embedding layer and the final Linear layer:

This reduces the number of learnable parameters without hurting accuracy, but the code in this article does not implement it (a rough sketch is shown after the diagram below).

Leaving out the fourth change, the model structure looks like this:

[Figure: model architecture diagram]
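
For reference, the weight sharing mentioned in item 4 could be sketched roughly as follows inside the model's __init__. This is only a sketch and not part of this article's code; it assumes wordvec_size == hidden_size, otherwise the two weight matrices do not have matching shapes.

# Hypothetical sketch of tying the embedding and output weights (not implemented in this article).
# nn.Embedding(vocab_size, wordvec_size).weight has shape (vocab_size, wordvec_size) and
# nn.Linear(hidden_size, vocab_size).weight has shape (vocab_size, hidden_size),
# so tying them requires wordvec_size == hidden_size.
self.linear = nn.Linear(hidden_size, vocab_size, bias=False)
self.linear.weight = self.embedding.weight  # both layers now train the same parameter tensor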

Code

Model code:

import torch
from torch import nn


class MyLSTM(nn.Module):
    def __init__(self, vocab_size, wordvec_size, hidden_size, num_layers=2, dropout=0.5):  
        super(MyLSTM, self).__init__()  
        self.vocab_size = vocab_size  
        self.word_vec_size = wordvec_size  
        self.hidden_size = hidden_size  
  
        self.embedding = nn.Embedding(vocab_size, wordvec_size)  
        self.dropout = nn.Dropout(dropout)  
        self.rnn = nn.LSTM(wordvec_size, hidden_size, num_layers=num_layers, dropout=dropout)  
        # self.rnn = rnn_layer  
        self.linear = nn.Linear(self.hidden_size, vocab_size)  
  
    def forward(self, x, h0=None, c0=None):  
        # nn.Embedding expects an IntTensor or LongTensor.
        # The incoming x has shape (batch_size, seq); after embedding it becomes (batch_size, seq, wordvec_size).
        # nn.LSTM expects input of shape (seq, batch_size, input_size) by default.
        # To feed it (batch_size, seq, input_size) instead, the nn.LSTM must be created with batch_first=True.
        # Here we keep the (seq, batch_size, ...) layout, so transpose x first, then embed it and pass the result to the LSTM.
        x = x.T
        x = x.long()
        x = self.embedding(x)
  
        x = self.dropout(x)  
        # h0/c0 may be None on the first call; in that case nn.LSTM initializes the state to zeros itself.
        state = None if h0 is None or c0 is None else (h0, c0)
        outputs, (hn, cn) = self.rnn(x, state)

        outputs = self.dropout(outputs)

        # Flatten (seq, batch_size, hidden_size) into (seq * batch_size, hidden_size) for the linear layer.
        outputs = outputs.reshape(-1, self.hidden_size)

        outputs = self.linear(outputs)
        return outputs, (hn, cn)
  
    def init_state(self, device, batch_size=1):  
        return (torch.zeros((self.rnn.num_layers, batch_size, self.hidden_size), device=device),  
                torch.zeros((self.rnn.num_layers, batch_size, self.hidden_size), device=device))
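
As a quick sanity check of the shape conventions described in the comments above, the model can be run on a dummy batch. The sizes below are arbitrary and only for illustration:

import torch

vocab_size, wordvec_size, hidden_size = 1000, 100, 256
net = MyLSTM(vocab_size, wordvec_size, hidden_size, num_layers=2, dropout=0.5)

batch_size, time_size = 4, 8
x = torch.randint(0, vocab_size, (batch_size, time_size))
h0, c0 = net.init_state(device=torch.device('cpu'), batch_size=batch_size)

outputs, (hn, cn) = net(x, h0, c0)
print(outputs.shape)  # (time_size * batch_size, vocab_size) -> torch.Size([32, 1000])
print(hn.shape)       # (num_layers, batch_size, hidden_size) -> torch.Size([2, 4, 256])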

Training code:

For how to apply the model (data loading and text prediction), see the post "Pytorch封装简单RNN模型,进行中文训练及文本预测".

import numpy as np
import torch
from torch import nn, optim
from torch.utils import data
from torch.utils.tensorboard import SummaryWriter

# load_corpus, make_dataset and grad_clipping are helpers from the earlier post referenced above.


def start_train():
    # device = torch.device("cpu")  
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")  
    print(f'\ndevice: {device}')  
  
    corpus, vocab = load_corpus("../data/COIG-CQIA/chengyu_qa.txt")  
  
    vocab_size = len(vocab)  
    wordvec_size = 100  
    hidden_size = 256  
    epochs = 1  
    batch_size = 50  
    learning_rate = 0.01  
    time_size = 4  
    max_grad_max_norm = 0.5  
    num_layers = 2  
    dropout = 0.5  
  
    dataset = make_dataset(corpus=corpus, time_size=time_size)  
    data_loader = data.DataLoader(dataset=dataset, batch_size=batch_size, shuffle=True)  
  
    net = MyLSTM(vocab_size=vocab_size, wordvec_size=wordvec_size, hidden_size=hidden_size, num_layers=num_layers, dropout=dropout)  
    net.to(device)  
  
    # print(net.state_dict())  
  
    criterion = nn.CrossEntropyLoss()  
    criterion.to(device)  
    optimizer = optim.Adam(net.parameters(), lr=learning_rate)  
  
    writer = SummaryWriter('./train_logs')  
    # Define a dummy input so that add_graph can be called
    tmp = torch.randint(0, 100, size=(batch_size, time_size)).to(device)  
    h0, c0 = net.init_state(batch_size=batch_size, device=device)  
    writer.add_graph(net, [tmp, h0, c0])  
  
    loss_counter = 0  
    total_loss = 0  
    ppl_list = list()  
    total_train_step = 0  
  
    for epoch in range(epochs):  
        print('------------Epoch {}/{}'.format(epoch + 1, epochs))  
  
        for X, y in data_loader:  
            X, y = X.to(device), y.to(device)  
            # batch_size=X.shape[0] here because the DataLoader was not told to drop incomplete batches, so the last batch may be smaller than the configured batch_size
            h0, c0 = net.init_state(batch_size=X.shape[0], device=device)  
            outputs, (hn, cn) = net(X, h0, c0)  
            optimizer.zero_grad()  
            # Reshape y to seq * batch_size rows so it lines up with outputs
            y = y.T.reshape(-1)
            # The target passed to CrossEntropyLoss must be a LongTensor
            loss = criterion(outputs, y.long())
            loss.backward()  
            # After computing the gradients, optionally clip them before the optimizer step
            grad_clipping(net, max_grad_max_norm)  
            optimizer.step()  
  
            total_loss += loss.item()  
            loss_counter += 1  
            total_train_step += 1  
            if total_train_step % 10 == 0:  
                print(f'Epoch: {epoch + 1}, total training steps: {total_train_step}, current loss: {loss.item():.4f}')
                writer.add_scalar('train_loss', loss.item(), total_train_step)  
  
        ppl = np.exp(total_loss / loss_counter)  
        ppl_list.append(ppl)  
        print(f'Epoch {epoch + 1} finished, batch_loss_average: {total_loss / loss_counter}, perplexity: {ppl}')
        writer.add_scalar('ppl', ppl, epoch + 1)  
        total_loss = 0  
        loss_counter = 0  
  
        torch.save(net.state_dict(), './save/epoch_{}_ppl_{}.pth'.format(epoch + 1, ppl))  
  
    writer.close()  
    return net, ppl_list
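
load_corpus, make_dataset and grad_clipping come from the earlier post and are not repeated here. For gradient clipping specifically, a minimal sketch of grad_clipping, assuming it simply wraps PyTorch's built-in norm clipping with the signature used above, could look like this:

import torch

# Minimal sketch (assumed signature): rescale all gradients in place so that their
# combined L2 norm does not exceed max_norm.
def grad_clipping(net, max_norm):
    torch.nn.utils.clip_grad_norm_(net.parameters(), max_norm=max_norm)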