Sequence Model
Orozco Hsu
2024-04-17
1
About me
• Education
• NCU (MIS)、NCCU (CS)
• Experiences
• Telecom big data Innovation
• Retail Media Network (RMN)
• Customer Data Platform (CDP)
• Know-your-customer (KYC)
• Digital Transformation
• Research
• Data Ops (ML Ops)
• Business Data Analysis, AI
2
Tutorial Content
3
• Sequence Models
  • Sequence Data & Sequence Models
  • Vanilla RNN (Recurrent Neural Network)
  • LSTM (Long Short-Term Memory)
  • GRU (Gated Recurrent Unit)
  • Transformer Model
• Homework
PyTorch tutorial
• PyTorch Tutorials
• Welcome to PyTorch Tutorials — PyTorch Tutorials 2.2.1+cu121 documentation
4
Sequence Data: Types
• Sequence Data:
• The order of elements is significant.
• It can have variable lengths. In natural language, sentences can be of different
lengths, and in genomics, DNA sequences can vary in length depending on the
organism.
5
Sequence Data: Examples
• Examples:
• Image Captioning.
6
Sequence Data: Examples
• Examples:
• Speech Signals.
7
Sequence Data: Examples
• Examples:
• Time Series Data, such as time stamped transactional data.
8
Lagged features can help you capture the patterns, trends, and seasonality
in your time series data, as well as the effects of external factors or events.
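As a minimal illustration of lagged features (not part of the original slides; the column name and lag values are made up), a pandas sketch:

```python
import pandas as pd

# Hypothetical daily series; the column name and lag values are made up for illustration.
df = pd.DataFrame({"sales": [100, 120, 130, 90, 150, 160, 170]})

# A lag-k feature is simply the value observed k steps earlier.
for lag in (1, 2):
    df[f"sales_lag{lag}"] = df["sales"].shift(lag)

# The first rows have no history for their lags and are dropped.
df = df.dropna()
print(df)
```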
Sequence Data: Examples
• Examples:
• Language Translation (Natural Language Text).
• Chatbot.
• Text summarization.
• Text categorization.
• Parts of speech tagging.
• Stemming.
• Text mining.
9
Supplement
• PyTorch model forward() practice
10
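A minimal sketch of the kind of forward() exercise this slide points to (the actual notebook content is not shown here; layer sizes are arbitrary):

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    """A two-layer network used only to practice writing forward()."""
    def __init__(self, in_dim=8, hidden=16, out_dim=1):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden)
        self.fc2 = nn.Linear(hidden, out_dim)

    def forward(self, x):
        # forward() defines how a batch of inputs flows through the layers.
        h = torch.relu(self.fc1(x))
        return self.fc2(h)

model = TinyNet()
x = torch.randn(4, 8)      # a batch of 4 samples with 8 features each
print(model(x).shape)      # torch.Size([4, 1])
```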
Supplement
• RNN inputs and outputs
11
RNN — PyTorch 2.2 documentation
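A small sketch of nn.RNN input and output shapes, following the linked RNN — PyTorch 2.2 documentation; the sizes are arbitrary:

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=10, hidden_size=20, num_layers=2, batch_first=True)

x  = torch.randn(3, 5, 10)     # (batch, seq_len, input_size) because batch_first=True
h0 = torch.zeros(2, 3, 20)     # (num_layers, batch, hidden_size)

output, hn = rnn(x, h0)
print(output.shape)            # torch.Size([3, 5, 20]): top-layer hidden state at every timestep
print(hn.shape)                # torch.Size([2, 3, 20]): final hidden state of every layer
```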
Supplement
• Computing the activation values
12
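A sketch of how the activation value of one RNN timestep can be reproduced by hand from the layer's own weights, h_t = tanh(x_t W_ih^T + b_ih + h_{t-1} W_hh^T + b_hh), matching the formula in the PyTorch RNN documentation:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=4, hidden_size=3, num_layers=1, batch_first=True)

x  = torch.randn(1, 1, 4)      # one sample, one timestep
h0 = torch.zeros(1, 1, 3)

# Activation computed by nn.RNN ...
out, _ = rnn(x, h0)

# ... reproduced by hand: h_t = tanh(x_t @ W_ih.T + b_ih + h_{t-1} @ W_hh.T + b_hh)
h_manual = torch.tanh(
    x[0] @ rnn.weight_ih_l0.T + rnn.bias_ih_l0
    + h0[0] @ rnn.weight_hh_l0.T + rnn.bias_hh_l0
)
print(torch.allclose(out[0], h_manual, atol=1e-6))   # True
```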
Sequence Models: applications
• Sequence models are a class of machine learning models designed for
tasks that involve sequential data, where the order of elements in the
input is important.
• Model applications:
13
one to one: Fixed length input/output, a general neural network model.
one to many: Image captioning.
many to one: Sentiment analysis.
many to many: Machine translation.
Sequence Model: Recurrent Neural Networks (RNNs)
• RNNs are a fundamental type of sequence model.
• They process sequences one element at a time sequentially while
maintaining an internal hidden state that stores information about
previous elements in the sequence.
• Traditional RNNs suffer from the Vanishing gradient problem, which
limits their ability to capture long-range dependencies.
14
Sequence Model: Recurrent Neural Networks (RNNs)
15
Sequence Model: Recurrent Neural Networks (RNNs)
• Stacked RNNs are also called Deep RNNs.
• The hidden state is responsible for memorizing information from the previous timestep and using it to further adjust the weights when training the model.
16
If your data sequences are short, don't use more than 2-3 layers; the unnecessary extra training time may leave your model under-optimized.
Vanilla_RNN02.ipynb (adjust the num_layers parameter and compare the convergence results; see the sketch below)
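A minimal sketch of the kind of num_layers comparison Vanilla_RNN02.ipynb asks for (the notebook itself is not reproduced; sizes are arbitrary):

```python
import torch.nn as nn

def n_params(model):
    return sum(p.numel() for p in model.parameters())

shallow = nn.RNN(input_size=1, hidden_size=32, num_layers=1, batch_first=True)
deep    = nn.RNN(input_size=1, hidden_size=32, num_layers=3, batch_first=True)

# Every extra layer adds its own input-to-hidden and hidden-to-hidden weights,
# so deeper stacks train more slowly and can overfit short sequences.
print(n_params(shallow), n_params(deep))
```

Deeper stacks add parameters and training time, which is why 2-3 layers are usually enough for short sequences.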
Sequence Model: Long Short-Term Memory Networks (LSTM)
• They are a type of RNNs designed to overcome the Vanishing gradient
problem.
• They introduce specialized memory cells and gating mechanisms that
allow them to capture and preserve information over long sequences.
• Gates (input, forget, and output) to regulate the flow of information.
• They usually perform better than Hidden Markov Models (HMM).
17
(The vanishing gradient problem still exists; it is just milder than in plain RNNs.)
Sequence Model: Long Short-Term Memory Networks (LSTM)
18
Pytorch_lstm_02.ipynb
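A minimal nn.LSTM sketch in the spirit of Pytorch_lstm_02.ipynb (the notebook is not reproduced; sizes are arbitrary). Note the extra cell state carried alongside the hidden state:

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=1, batch_first=True)

x  = torch.randn(3, 5, 10)     # (batch, seq_len, input_size)
h0 = torch.zeros(1, 3, 20)     # hidden state: (num_layers, batch, hidden_size)
c0 = torch.zeros(1, 3, 20)     # cell state: the long-term memory the gates write to

output, (hn, cn) = lstm(x, (h0, c0))
print(output.shape, hn.shape, cn.shape)
# torch.Size([3, 5, 20]) torch.Size([1, 3, 20]) torch.Size([1, 3, 20])
```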
Sequence Model: Gated Recurrent Units (GRUs)
• They are another variant of RNNs that are similar to LSTM but with a
simplified structure.
• They also use gating mechanisms to control the flow of information
within the network.
• Gates (reset and update) to regulate the flow of information.
• They are computationally more efficient than LSTM while still being
able to capture dependencies in sequential data.
19
pytorch_gru_01.ipynb
(The number of weight parameters is smaller, so convergence is faster; see the sketch below.)
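A minimal sketch comparing parameter counts of nn.GRU and nn.LSTM, in the spirit of pytorch_gru_01.ipynb (sizes are arbitrary), to illustrate the note above:

```python
import torch.nn as nn

def n_params(model):
    return sum(p.numel() for p in model.parameters())

lstm = nn.LSTM(input_size=10, hidden_size=20, batch_first=True)  # 4 gate blocks
gru  = nn.GRU(input_size=10, hidden_size=20, batch_first=True)   # 3 gate blocks

# With the same sizes, GRU carries roughly 3/4 of the LSTM's recurrent weights,
# which is why it usually trains and converges a bit faster.
print(n_params(lstm), n_params(gru))
```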
Sequence Model: Seq2seq
20
Sequence Model: Transformer Models
• They are a more recent and highly effective architecture for sequence
modeling.
• They move away from recurrence and rely on a self-attention
mechanism to process sequences in parallel and capture long-term
dependencies in data, making them more efficient than traditional
RNNs.
• Self-attention mechanisms to weight the importance of different parts of
input data.
• They have been particularly successful in NLP tasks and have led to
models like BERT, GPT, and others.
21
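A minimal sketch using PyTorch's built-in encoder layer to show the parallel, self-attention-based processing described above (this is not one of the course notebooks; sizes are arbitrary):

```python
import torch
import torch.nn as nn

# One encoder block = multi-head self-attention + feed-forward network (with residuals and norm).
layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)

x = torch.randn(3, 7, 512)     # (batch, seq_len, d_model): all 7 positions are processed in parallel
out = encoder(x)
print(out.shape)               # torch.Size([3, 7, 512])
```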
Sequence Model: Transformer Models
• In the paper Attention is all you need (2017).
• It abandons traditional CNNs and RNNs; the entire network structure is composed of attention mechanisms and feed-forward networks (FNNs).
22
https://arxiv.org/abs/1706.03762
• It includes an Encoder and a Decoder.
Sequence Model: Transformer Models (BERT)
• Bidirectional Encoder Representations from Transformers.
• It uses the Encoder for encoding; the self-attention mechanism lets each token attend to its surrounding context while being encoded, and this context in both directions is what "bidirectional" means.
• It uses multi-head self-attention to extract comprehensive information.
• It is a pre-trained model and can be viewed as a feature extractor.
• It uses the Transformer's Encoder for feature extraction; the Encoder essentially is the BERT model.
23
Sequence Model: Transformer Models (Encoder)
• How are you? → 你好嗎?
[Encoder diagram: the sentence is tokenized into <start> "How" "are" "you" "?" <end>, mapped to indices through a vocabulary (word-to-index mapping, e.g. How→1, are→10, you→300, ?→4), embedded as d=512 vectors, and positional encoding (PE) is added. Each encoder block applies Multi-Head Attention (MHA) to all positions in parallel, then a feed-forward layer, each followed by a residual connection and layer normalization; blocks are stacked and output context vectors.]
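A sketch of the sinusoidal positional encoding used in the "Add PE" step, following the formula from Attention Is All You Need; d_model=512 matches the slide:

```python
import torch

def positional_encoding(seq_len: int, d_model: int = 512) -> torch.Tensor:
    """PE[pos, 2i] = sin(pos / 10000^(2i/d_model)); PE[pos, 2i+1] = cos(...)."""
    pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)   # (seq_len, 1)
    i = torch.arange(0, d_model, 2, dtype=torch.float32)            # even dimension indices
    angle = pos / torch.pow(10000.0, i / d_model)                   # (seq_len, d_model/2)
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(angle)
    pe[:, 1::2] = torch.cos(angle)
    return pe

# Added to the token embeddings so the model knows where each token sits in the sequence.
embeddings = torch.randn(6, 512)           # e.g. <start> How are you ? <end>
x = embeddings + positional_encoding(6)
print(x.shape)                             # torch.Size([6, 512])
```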
Sequence Model: Multi-head-attention
• Attention mechanism:
• Select more useful information from words.
• Q, K and V are obtained by applying a linear transformation to the input word
vector x.
• Each matrix W can be learned through training.
25
Q: Information to be queried
K: Vectors being queried
V: Values obtained from the query
(The "multi-head" idea is analogous to using multiple convolution kernels in a CNN.)
1. MLPs/CNNs are different: each layer has its own parameters, and therefore its own gradients.
2. RNN weights are shared across timesteps, and updates use the sum of the gradients over all timesteps; the total gradient does not vanish, but the contributions from distant timesteps do, so the gradient is dominated by nearby timesteps and long-range dependencies are missing.
3. Long-range dependencies are therefore hard to learn, which is why the attention mechanism is introduced.
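A sketch of single-head scaled dot-product attention as described above: Q, K and V come from learned linear transformations of the input vectors x (sizes are arbitrary):

```python
import torch
import torch.nn as nn

d_model, d_k = 512, 64
W_q = nn.Linear(d_model, d_k, bias=False)   # learned projection for queries
W_k = nn.Linear(d_model, d_k, bias=False)   # learned projection for keys
W_v = nn.Linear(d_model, d_k, bias=False)   # learned projection for values

x = torch.randn(6, d_model)                 # six token vectors

Q, K, V = W_q(x), W_k(x), W_v(x)
scores = Q @ K.T / d_k ** 0.5               # how strongly each token attends to every other token
weights = torch.softmax(scores, dim=-1)
out = weights @ V                           # attention output: a weighted sum of the values
print(out.shape)                            # torch.Size([6, 64])
```

Multi-head attention repeats this with several independent projections and concatenates the results, much like multiple convolution kernels in a CNN.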
Sequence Model: Transformer Models (Decoder)
26
[Decoder diagram: the encoder's output vectors feed the cross-attention step (providing K and V, while the decoder states provide Q). Decoding starts from a <BOS> token; each step adds positional encoding, passes through Masked MHA, Cross Attention, and a feed-forward layer, then takes the max over a distribution of size V (the vocabulary of common characters) to emit the next token: 你, 好, 嗎, ?, <END>. Training minimizes the cross entropy between each predicted distribution and the one-hot ground truth.]
Sequence Model: Summarized
27
Sequence Model: Transformer Models
28
Homework
• Modify HW02.ipynb to add the LSTM and GRU models, and compare the change in RMSE along with the parameters you used.
29
RNN Types | Train Score | Test Score | Parameters
Vanilla RNN | 4.25 | 7.3 | hidden_size=32, seq_length=19, num_layers=1, num_epochs=500
LSTM | | |
GRU | | |
Supplement
30