Character-level Convolutional Neural
Network for Sentence Paraphrase Detection
Vladislav Maraev
NLX-Group, Faculty of Sciences, University of Lisbon
Paraphrase detection for Russian workshop
AINL FRUCT 2016
Objective
What
Task 2 — Binary classification (paraphrase/non-paraphrase).
How
Apply convolutional neural network (CNN) architecture:
Standard Non-standard
Word embeddings ✓ ✓
Character embeddings ✓
Vladislav Maraev AINL FRUCT 2016 Paraphrase detection for Russian workshop (10.11.2016) 2 / 16
Related work
Convolutional neural networks in NLP
• Detecting semantically equivalent questions with CNN and
word embeddings (Bogdanova et al., 2015)
• Convolutional Neural Networks for Sentence Classification
(Zhang and Wallace, 2015)
• Attention-based CNN for modeling sentence pairs (Yin et al.,
2016)
• Character embeddings for text classification (Zhang et al.,
2015)
• Word+character embeddings for sentiment analysis (dos
Santos and Gatti, 2014)
How does a CNN work?
TR CONV POOL cosine
similarity
Steps:
1. Token representation (Embedding)
2. Convolution
3. Pooling
4. Pair similarity estimation
Convolutional Neural Network
1. Token representation
Input
s = {t1, t2, . . . , tN}
Token representation
rt = W0 vt , (1)
where
• W0 ∈ Rd×V is an embedding matrix
• vt ∈ RV is a one-hot encoded vector of size V
Output
sTR = {rt1 , rt2 , . . . , rtN } , where rtn ∈ Rd
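The lookup in (1) can be sketched in NumPy (a toy sketch with made-up dimensions; multiplying W0 by a one-hot vector is simply selecting a column of the embedding matrix):

```python
import numpy as np

rng = np.random.default_rng(0)

V, d = 10, 4                    # toy vocabulary size and embedding dimension
W0 = rng.normal(size=(d, V))    # embedding matrix W0 with shape d x V

def token_representation(token_ids):
    """Map token indices to embedding columns (equivalent to W0 @ one_hot(t))."""
    return [W0[:, t] for t in token_ids]

s = [1, 3, 7]                   # a toy sentence of N = 3 token ids
s_tr = token_representation(s)

# Multiplying W0 by a one-hot vector selects the corresponding column:
v = np.zeros(V)
v[3] = 1.0
assert np.allclose(W0 @ v, W0[:, 3])
```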
Convolutional Neural Network
2. Convolution
Convolution
1. Concatenations zn of k-grams
2. Multiply by W1, add bias b1, and
apply tanh function:
rzn = tanh ( W1 zn + b1 )
where:
• zn ∈ Rdk
• W1 ∈ Rclu×dk
• rzn ∈ Rclu
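The two convolution steps above can be sketched as follows (a toy NumPy sketch, not the actual Keras implementation; all dimensions are made up):

```python
import numpy as np

rng = np.random.default_rng(1)
d, k, c_lu, N = 4, 3, 5, 6              # embedding dim, k-gram size, filter size, sentence length

s_tr = rng.normal(size=(N, d))          # token representations r_t, one row per token
W1 = rng.normal(size=(c_lu, d * k))     # convolution weights W1 with shape c_lu x dk
b1 = rng.normal(size=c_lu)              # bias b1

def convolve(s_tr):
    """For each k-gram, concatenate its token vectors into z_n and apply tanh(W1 z_n + b1)."""
    out = []
    for n in range(N - k + 1):
        z_n = s_tr[n:n + k].reshape(-1)   # concatenation of k token vectors, length dk
        out.append(np.tanh(W1 @ z_n + b1))
    return np.array(out)

r_z = convolve(s_tr)                    # one c_lu-dimensional feature vector per k-gram
```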
Convolutional Neural Network
3. Pooling
Sum (or Max) over all the rzn element-wise and apply tanh
function:
rs = tanh ( ∑n rzn )
which gives us the sentence representation rs ∈ Rclu
* This means that the sentence representation doesn’t depend on
sentence length.
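Sum-pooling can be sketched in one line; the point is that the result has a fixed size regardless of how many k-grams the sentence produced (toy NumPy sketch with made-up dimensions):

```python
import numpy as np

rng = np.random.default_rng(2)
c_lu = 5
r_z = rng.normal(size=(4, c_lu))    # per-k-gram features from the convolution step

r_s = np.tanh(r_z.sum(axis=0))      # sum over k-grams element-wise, then tanh
assert r_s.shape == (c_lu,)         # fixed size, independent of sentence length

# A longer sentence (more k-grams) yields a representation of the same size:
r_z_long = rng.normal(size=(9, c_lu))
assert np.tanh(r_z_long.sum(axis=0)).shape == (c_lu,)
```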
Convolutional Neural Network
4. Compute similarity
TR CONV POOL cosine
similarity
Estimate the similarity between the pair of sentence
representations using the cosine measure:
similarity = (rs1 · rs2) / (∥rs1∥ ∥rs2∥)
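The cosine measure is straightforward to implement directly (a minimal sketch; parallel vectors score 1, orthogonal vectors score 0):

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the norms."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

r_s1 = np.array([1.0, 2.0, 3.0])
r_s2 = np.array([2.0, 4.0, 6.0])    # same direction as r_s1
assert abs(cosine(r_s1, r_s2) - 1.0) < 1e-12

assert abs(cosine(np.array([1.0, 0.0]), np.array([0.0, 1.0]))) < 1e-12
```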
Training the network
We train W0, W1 and b1.
Steps
1. Compute mean-squared error (w.r.t. cosine similarity)
2. Use backpropagation to compute the gradients and update
the weights with SGD/RMSProp
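A toy version of the loss in step 1 (illustrative only; `mse_loss` and the numbers below are made up, the actual training was done in Keras):

```python
import numpy as np

def mse_loss(similarities, labels):
    """Mean-squared error between predicted cosine similarities and binary labels."""
    diff = np.asarray(similarities) - np.asarray(labels)
    return float(np.mean(diff ** 2))

# Paraphrase pairs should score near 1, non-paraphrases near 0:
preds  = [0.9, 0.2, 0.8]
labels = [1.0, 0.0, 1.0]
loss = mse_loss(preds, labels)
assert abs(loss - 0.03) < 1e-12
```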
Several convolutional filters
Standard Run Hyperparameters
Word embeddings
Parameter Value Description
k {3, 5, 8, 12} Sizes of k-grams
clu 100 Size of each convolutional filter
d 300 Size of word representation
epochs 5 Number of training epochs
pooling MAX pooling layer function
optimiser RMSProp Keras’s optimiser
word embeddings Random (uniform)
Sentences were tokenised and lowercased using Keras.
Standard Run Hyperparameters
Character embeddings
Parameter Value Description
k {2, 3, 5, 7, 9, 11} Sizes of k-grams
clu 100 Size of each conv. filter
d 100 Size of word representation
epochs 20 Number of training epochs
pooling MAX pooling layer function
optimiser RMSProp Keras’s optimiser
char. embeddings Random (uniform)
Characters were lowercased, non-word characters were removed.
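The character preprocessing described above can be sketched as follows (an approximation: the exact character set kept by the original pipeline is not specified, so keeping word characters and whitespace is an assumption here):

```python
import re

def preprocess_chars(sentence):
    """Lowercase and strip non-word characters, keeping letters, digits and spaces.
    \\w is Unicode-aware in Python 3, so Cyrillic letters are preserved."""
    return re.sub(r"[^\w\s]", "", sentence.lower())

assert preprocess_chars("Привет, Мир!") == "привет мир"
assert preprocess_chars("Hello, World!") == "hello world"
```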
Non-Standard Run Hyperparameters
Parameter Value Description
k 3 Size of k-gram
clu 300 Size of convolutional filter
d 300 Size of word representation
epochs 5 Number of training epochs
pooling MAX pooling layer function
optimiser RMSProp Keras’s optimiser
word embeddings RusVectōrēs trained on Russian National Corpus
(Kutuzov and Andreev, 2015)
Input sentences were tokenised, lemmatised and PoS-tagged
with MyStem (Segalovich, 2003).
Main results
Accuracy F1
Standard
NLX (characters) 72.74 78.80
NLX (words) 66.19 76.44
Non-standard NLX (words) 69.94 76.80
BASELINE 49.66 54.03
Discussion
1. The result for the standard run is competitive with the best
system and can be further improved by tuning hyperparameters
automatically and by picking the testing epoch automatically,
based on the validation results.
2. Surprisingly, the standard run outperformed the
non-standard one, even though the non-standard run used external
resources for lemmatisation and initial word embeddings.
(Probably due to a higher focus on the standard run.)
Next? Attention-based CNN (Yin et al., 2016), combination of
character and word embeddings (dos Santos and Gatti, 2014).
References
Dasha Bogdanova, Cícero dos Santos, Luciano Barbosa, and Bianca Zadrozny.
Detecting semantically equivalent questions in online user forums. CoNLL 2015,
page 123, 2015.
Cicero dos Santos and Maira Gatti. Deep convolutional neural networks for sentiment
analysis of short texts. In Proceedings of COLING 2014, the 25th International
Conference on Computational Linguistics: Technical Papers, pages 69–78, Dublin,
Ireland, August 2014. Dublin City University and Association for Computational
Linguistics.
Andrey Kutuzov and Igor Andreev. Texts in, meaning out: neural language models in
semantic similarity task for Russian. In Proceedings of the Dialog Conference, 2015.
Ilya Segalovich. A fast morphological algorithm with unknown word guessing induced
by a dictionary for a web search engine. In MLMTA, pages 273–280. Citeseer, 2003.
Wenpeng Yin, Hinrich Schütze, Bing Xiang, and Bowen Zhou. ABCNN: Attention-based
convolutional neural network for modeling sentence pairs. Transactions of the
Association for Computational Linguistics, 4:259–272, 2016. ISSN 2307-387X.
Xiang Zhang, Junbo Zhao, and Yann LeCun. Character-level convolutional networks
for text classification. In Advances in Neural Information Processing Systems,
pages 649–657, 2015.
Ye Zhang and Byron Wallace. A sensitivity analysis of (and practitioners’ guide to)
convolutional neural networks for sentence classification. arXiv preprint
arXiv:1510.03820, 2015.
