Automated product categorization
Chapters
1. Introduction
2. Service Architecture
3. Data
4. Training
5. Inference
6. Evaluation
#1 What is Skroutz.gr?
• Skroutz.gr is a marketplace & shopping assistant which
makes online shopping easier and more reliable
• It includes more than 11,000,000 products from 3,200
different e-shops
• On a monthly basis the website welcomes more than 8
million unique visitors ranking in the top positions in the
Greek Web
#1 Some Numbers
• 3,200 merchants
• 11 million products
• 270 mil. pageviews/mo
• 1.1 mil. searches/day
• 33 mil. sessions/mo
#1 The Problem
• Each day we collect thousands of new products by
downloading e-shop feeds (XML, CSV etc. - product
catalogs)
• We want to categorize incoming product payloads, as provided by
e-shops, into the most relevant categories of the Skroutz category
tree (taxonomy) with minimum human intervention.
- Difficult
- Important
#1 Why Difficult?
• Many leaf categories in
Skroutz taxonomy (>2k)
• Sibling categories
(subjective categorization)
• Misleading product titles
and shop-categories from
shops
#1 Why Important?
1. Robot MO collects products from shop feeds and stores them to DB
2. Megatron category classifier categorizes products to the correct category
3. Tron groups similar products to entities called SKUs to be ready for indexing
4. Elasticsearch indexes products to be searchable from user interface
#1 Facts
•Merchants send more than 15k new products to Skroutz every day!
•2.3k unique leaf categories in our category tree (taxonomy)
•Manual “move-to-category” action:
- Costs ~7.8s on average for content managers
- Subjective decisions may add extra overhead
#1 Old Solution - Overview
•Use Elasticsearch to match specific product attributes:
- PN (manufacturer part number)
- Name
- Shop-category
•Aggregate matches and group by categories
•Normalize results and use custom weights to calculate a score
•Take Top-K results
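The scoring steps above can be sketched roughly as follows. This is a minimal illustration, not the production heuristic: the field weights, the per-field max normalization, and the `(field, category_id, raw_score)` match format are all assumptions, since the slides only mention "custom weights" and normalization.

```python
from collections import defaultdict

# Hypothetical per-field weights; the slides only say "custom weights".
FIELD_WEIGHTS = {"pn": 3.0, "name": 1.5, "shop_category": 1.0}

def score_categories(matches, top_k=3):
    """matches: list of (field, category_id, raw_score) tuples (assumed format)."""
    # Normalize raw scores per field so that different fields are comparable.
    max_per_field = defaultdict(float)
    for field, _, raw in matches:
        max_per_field[field] = max(max_per_field[field], raw)

    # Aggregate weighted, normalized scores by category.
    totals = defaultdict(float)
    for field, category_id, raw in matches:
        if max_per_field[field] > 0:
            totals[category_id] += FIELD_WEIGHTS[field] * raw / max_per_field[field]

    # Keep the Top-K categories by total score.
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]

matches = [
    ("pn", 40, 8.0),
    ("name", 40, 2.0),
    ("name", 12, 4.0),
    ("shop_category", 12, 1.0),
]
top = score_categories(matches, top_k=2)
```

Here category 40 wins because the part-number match carries the largest weight, which also shows why hand-tuned weights are hard to maintain.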
#1 Old Solution - Limitations
•Plain cosine similarity on TF/IDF weights:
- No learning feedback loop
- No advanced statistics utilization (e.g. correlation between price
value and text features)
•No easy way to tune custom weights applied on final scoring
•Heuristics don’t take into account category specific context
•Heuristics don’t take into account word level context. E.g.
word “samsung” is followed by word “galaxy” most of the time
and then probably follows a model number.
#1 Old Solution - Good Parts
•Simple solution (except for custom scoring stuff)
•Easy to debug
•Easy to deploy
•Online
#1 New Solution - “Megatron”
#1 Overview
•Approach problem as a supervised learning task
•Rely on probabilities to obtain a meaningful score
•Use more features from multiple sources and train on datasets
•Learn new patterns and relations by training
•Measure performance on dataset splits
•Use a microservice to serve classification requests
•Apply threshold for low confidence results
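The last point, thresholding low-confidence results, can be illustrated with a small sketch. The threshold value of 0.5 and the function name are hypothetical; the slides do not state the value used in production.

```python
def decide(probabilities, threshold=0.5):
    """Return (category_id, score), or (None, score) when confidence is too low.

    probabilities: dict mapping category_id -> predicted probability.
    threshold: hypothetical cut-off, not taken from the slides.
    """
    category_id = max(probabilities, key=probabilities.get)
    score = probabilities[category_id]
    if score < threshold:
        # Counted as "no prediction"; the product is left for manual handling.
        return None, score
    return category_id, score

confident = decide({40: 0.9, 12: 0.1})
unsure = decide({40: 0.4, 12: 0.35, 7: 0.25})
```

Raising the threshold trades a higher "no prediction" rate for a lower failure rate.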
#2 Service Architecture
1.Training Phase
2.Inference Phase
3.APIs
#2.1 Training Phase
1. Export dataset (product features labeled with category_id) and upload to Swift
2. Download specific dataset version in “training VM”
3. Start a training session using a train/val split from dataset
4. Save best performing model params snapshot (based on validation set loss)
5. Compress and upload model params to Swift container
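Step 4 above (keeping the snapshot with the best validation loss) amounts to tracking a running minimum across epochs. A minimal sketch, with made-up loss values and a hypothetical function name:

```python
def best_snapshot(val_losses):
    """Return (epoch_index, loss) for the lowest validation loss seen."""
    best_epoch, best_loss = None, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # A real training run would write the model params to disk here.
            best_epoch, best_loss = epoch, loss
    return best_epoch, best_loss

# Illustrative values only: the loss improves, then starts to overfit.
snapshot = best_snapshot([0.90, 0.60, 0.55, 0.58])
```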
#2.2 Inference Phase
1. Application Part: Send classification request
upon new product arrivals:
- Kafka producer (asynchronous request)
- Megatron Client HTTP synchronous
requests (2nd alternative)
2. Category Classifier Microservice Part:
- Pop messages from stream (Kafka
consumer)
- Dispatch messages to in-memory Neural
Network instance
- Fetch predictions (scores) and post-back
to Core Application API endpoint
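The microservice loop described above can be sketched as follows. This is only an illustration: a `queue.Queue` stands in for the Kafka consumer, `predict` stands in for the in-memory neural network, and the message fields (`product_id`, `features`) are assumptions rather than the actual payload format.

```python
import json
import queue

def run_worker(stream, predict, post_back):
    """Drain the stream, classify each message, and post the scores back."""
    while True:
        try:
            raw = stream.get_nowait()
        except queue.Empty:
            break  # a real consumer would block and keep polling
        message = json.loads(raw)
        scores = predict(message["features"])
        post_back({"product_id": message["product_id"], "scores": scores})

# Stand-ins for the stream, the model, and the Core Application endpoint.
stream = queue.Queue()
stream.put(json.dumps({"product_id": 1, "features": {"name": "Samsung TV 32''"}}))
results = []
run_worker(stream, predict=lambda features: {"40": 0.9}, post_back=results.append)
```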
#2.3 APIs
1. Megatron microservice internal API
- Common API (wraps Keras API)
- Basic methods:
✓ build
✓ train
✓ save
✓ load
✓ predict
- CLI commands
#2.3 APIs(2)
1. Skroutz Application Ecosystem (Ruby client)
- Megatron::Client
✓ Issues requests to microservice
- Megatron DB model
✓ Stores prediction results
- ApiController endpoint
✓ Receives callbacks from microservice
#3 Data
•Product attribute values (potential features)
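Product attributes: Name, Shop manufacturer, Part number, EAN, Price, Shop category, ...
Example:
- Name: Samsung TV 32'' DF324 (PNDFD22) Full HD Black NEW
- Shop category: Αρχική > Ηλεκτρονικά > Τηλεοράσεις (Home > Electronics > Televisions)
- Part number: PNDFD22
- Price: 300 €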
#3 Data(2)
•Training Dataset - Raw Features
- Image
- Numerical
- Categorical
- Label
- Text
#3 Data(3)
•Preprocessing
- Text
- Numerical
- Categorical
- Labels
- Output: X (features), y (labels)
#3 Preprocessing - Text
• Our best solution involves “Word Vectors”
• Steps to prepare for word vectors:
- Learn a word Vocabulary (mapping of words to numeric ids)
- Transform text sentences to Sequences of ids based on the Vocabulary
- Decide on a representative sequence length (e.g. 60 words)
- Apply zero padding (pre or post) and truncation to maintain a fixed length
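The preparation steps above can be sketched in plain Python. The whitespace tokenizer, lowercasing, and dropping of unknown words are assumptions made for this illustration; id 0 is reserved for padding.

```python
def build_vocabulary(sentences):
    """Map each distinct word to a numeric id, starting at 1 (0 is padding)."""
    vocab = {}
    for sentence in sentences:
        for word in sentence.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab) + 1
    return vocab

def to_padded_sequence(sentence, vocab, max_len, padding="pre"):
    """Convert a sentence to a fixed-length list of ids."""
    # Unknown words are dropped here; this is an assumption of the sketch.
    ids = [vocab[w] for w in sentence.lower().split() if w in vocab]
    ids = ids[:max_len]                 # truncate overly long sequences
    pad = [0] * (max_len - len(ids))    # zero padding up to max_len
    return pad + ids if padding == "pre" else ids + pad

vocab = build_vocabulary(["Samsung Galaxy S9", "Samsung TV 32"])
seq = to_padded_sequence("samsung tv black", vocab, max_len=5)
```

With a sequence length of 60, as in the slide, most product names would be padded rather than truncated.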
#3 Preprocessing - Text(2)
• Use of Pretrained Embeddings (see Word2Vec, FastText, GloVe etc.)
• We use FastText library with skipgram algorithm (unsupervised)
- https://fasttext.cc/docs/en/unsupervised-tutorial.html
#3 Preprocessing - Text(3)
• Embeddings:
- Outputs 100 dim Vector
- Total 1,500,000 rows (vocab)
• 2 versions (Name, Shop-category)
#3 Preprocessing - Numerical
• “Pricevat” and “Name Length” values
• Apply Standard Scaling
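Standard scaling subtracts the mean and divides by the standard deviation, so each numerical feature ends up with zero mean and unit variance. A minimal sketch with illustrative price values (in practice the mean and deviation would be learned on the training set and reused at inference):

```python
from statistics import mean, pstdev

def standard_scale(values):
    """Scale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant feature carries no signal
    return [(v - mu) / sigma for v in values]

scaled = standard_scale([100.0, 200.0, 300.0])
```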
#3 Preprocessing - Categorical
• All discrete value attributes/features:
- shop_id
- matching Product PNs category_id list
• One-Hot encoding:
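As a rough illustration of the encoding, a single-valued feature such as shop_id becomes a vector with exactly one 1. For the list of category ids from matching PNs, this sketch assumes a multi-hot vector (one 1 per matching category); the slides do not spell out that detail, and the id values are made up.

```python
def one_hot(value, known_values):
    """Vector with a single 1 at the position of `value`."""
    return [1 if value == known else 0 for known in known_values]

def multi_hot(values, known_values):
    """Vector with a 1 for every known value present in `values`."""
    present = set(values)
    return [1 if known in present else 0 for known in known_values]

shop_ids = [7, 19, 42]            # hypothetical shop ids
category_ids = [12, 40, 55, 90]   # hypothetical category ids

shop_vector = one_hot(19, shop_ids)
pn_vector = multi_hot([40, 90], category_ids)
```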
#3 Label Encoding
• “category_id” values are the “true” labels which the NN should learn
• One-Hot encoding
• OR just use IDs and rely on Keras conventions (e.g. an internal sparse categorical
representation, which saves huge amounts of RAM)
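The two label options give the same loss value, which is why integer ids suffice. The sketch below checks this on a toy example; in Keras the sparse variant corresponds to the `sparse_categorical_crossentropy` loss, while the helper functions here are written only for illustration.

```python
import math

def cross_entropy_one_hot(one_hot_label, probs):
    """Categorical cross-entropy against a one-hot label vector."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_label, probs))

def cross_entropy_sparse(label_index, probs):
    """Same loss, computed directly from the integer class index."""
    return -math.log(probs[label_index])

probs = [0.1, 0.7, 0.2]                     # predicted class probabilities
dense = cross_entropy_one_hot([0, 1, 0], probs)
sparse = cross_entropy_sparse(1, probs)
```

With roughly 2.3k classes, a one-hot label needs about 2.3k numbers per example, while the sparse form needs a single integer.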
#4 Training
1.Basic Concepts
2.Model Architecture
3.Training “In Action”
#4.1 Basic Concepts
•Objective:
- Find a combination of mathematical functions and a set of
corresponding params to maximize prediction accuracy (or minimize
error rate).
- Ensure that the above generalizes well for production.
- Learn params in an acceptable time window.
•Experiment with Neural Network architectures
•GPUs to the rescue (10x speedup)
#4.1 Basic Concepts(2)
•Loss function
- Categorical Crossentropy
•Optimizer
- Adam (Gradient Descent)
•Hyper-params
- Mini-Batch Size
- Learning Rate
- Epochs
#4.1 Validation
•Why?
- Simulate unseen data
- Compare different:
✓ training methods
✓ hyper-params
- Avoid Overfitting
•Should be representative
•Validation Strategy
- 10% of whole Dataset
- Stratification on Categories
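A stratified 10% split keeps each category's share the same in the training and validation sets. The following is a minimal sketch; the seed, rounding rule, and function name are choices made for this illustration.

```python
import random
from collections import defaultdict

def stratified_split(labels, val_fraction=0.1, seed=0):
    """Return (train_indices, val_indices), split per category."""
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    rng = random.Random(seed)
    train, val = [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        # Very small categories may round down to zero validation examples.
        n_val = round(len(indices) * val_fraction)
        val.extend(indices[:n_val])
        train.extend(indices[n_val:])
    return sorted(train), sorted(val)

labels = [40] * 20 + [12] * 10   # toy dataset with two categories
train_idx, val_idx = stratified_split(labels)
```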
#4.2 Model Architecture
•Hybrid End-to-End architecture
•4 branches (4 input vectors):
A. Name Features Branch
B. Shop-Category Features Branch
C. Basic Features Branch (Numerics, Categorical)
D. Matching PNs Branch (Categorical)
#4.2 Text Branches
• Inspired by “Embed, Encode, Attend, Predict”
- https://explosion.ai/blog/deep-learning-formula-nlp
• Each of “name” and “shop-category” sequence flows through:
- 1 x Embeddings Layer
- 1 x Bi-LSTM Encoder
- 1 x Attention Module
- 1 x LSTM Encoder
#4.2 Text Branches - Why LSTM?
• LSTM stands for “Long Short Term Memory” Layer (Encoder):
- Memory Cells / Captures context
- Propagates signal from previous words to the next in a Sequence
- 2 Stacked Layers performed better in our experiments
- 128 dimension output vector
- https://colah.github.io/posts/2015-08-Understanding-LSTMs/
- Diagram labels: 128-dim output; #cells = sequence length
#4.2 Text Branches - Why pay Attention?
• Attention Mechanism:
- Controls how much signal should be propagated to next layers
- https://distill.pub/2016/augmented-rnns/
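One common way to control the propagated signal is a softmax-weighted sum over the encoder outputs. The sketch below shows that generic form; the slides do not say which attention variant Megatron uses, and in a real model the per-word scores come from learned parameters rather than being passed in by hand.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(hidden_states, scores):
    """Weight each timestep's hidden state and sum them into one context vector."""
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states))
        for d in range(dim)
    ]
    return weights, context

# Three timesteps with 2-dim hidden states; equal scores give equal weights.
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attend(states, scores=[0.0, 0.0, 0.0])
```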
#4.2 Other Branches
• Basic Features Branch
- Inputs a concatenation of basic feats
- 1 Dense layer with #classes output
- ReLU activation
• Matching PNs Branch
- Inputs a concatenation of PN feats
- Short-circuited to final layer
Diagram: input vector -> #classes outputs (~2k for Skroutz)
#4.2 Final Layer
• Merging Layer
- Concatenates all 4 branches outputs
- softmax activation
- Output: probabilities for each class
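The merging step can be sketched as a concatenation followed by a softmax. This illustration assumes a dense projection between the two, which the slide does not state explicitly; the branch outputs, weights, and three-class setup are toy values.

```python
import math

def dense_softmax(inputs, weights, biases):
    """Dense layer (one weight row per class) followed by a stable softmax."""
    logits = [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy outputs of the 4 branches (name, shop-category, basic, matching PNs).
branches = [[0.2, 0.1], [0.4], [1.0], [0.0, 1.0]]
merged = [x for branch in branches for x in branch]  # concatenation

weights = [[1.0] * 6, [0.0] * 6, [-1.0] * 6]  # 3 classes, illustrative only
biases = [0.0, 0.0, 0.0]
probs = dense_softmax(merged, weights, biases)
```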
#4.2 Model Architecture
• Model Capacity/Complexity:
#4.3 Training In Action - Model Selection
•Conducted 100s of experiments with different combinations
of features, layers, modules (e.g. Embeddings, Bag of Words,
TF/IDF, LSTM, etc.)
•10s of ablation studies: remove specific features to see how
performance is affected
•Read many papers and applied some common tricks (Bi-LSTM,
AdaptivePooling etc.)
•It is alchemy!
#4.3 Training In Action - Tools
•Training Scheduler Process runs weekly
•CLI training commands
- CUDA_VISIBLE_DEVICES=1 python -m category_classifier.cli scrooge --model end2end --train --epochs 8 --batch_size 128
•Model Versioning
- E.g. “skroutz_models_2018_09_01_v1.tar.gz”
#4.3 Training In Action
Training run output example:
GPU monitoring:
#4.3 Training In Action
Learning Curves (Tensorboard):
Legend: Current best, Previous Arch
#5 Inference
1.Inference Pipeline
2.Inference API
3.Production
#5.1 Inference Pipeline
•Online execution:
- preprocessing
- vectorization
- Prediction
•Utilized by CategoryClassifier Class
- Wrapper of external API
•Utilize scikit-learn Pipelines
- http://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html
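The idea of chaining preprocessing, vectorization, and prediction can be shown without any library. The sketch below mimics the pipeline pattern in plain Python instead of using scikit-learn itself; the step functions, the tiny vocabulary, the category names, and the scoring rule are placeholders, not the real CategoryClassifier.

```python
def preprocess(product):
    """Lowercase and tokenize the product name."""
    return product["name"].lower().split()

def vectorize(tokens, vocab=("samsung", "tv", "sofa")):
    """Count occurrences of each vocabulary word (toy bag of words)."""
    return [tokens.count(word) for word in vocab]

def predict(vector):
    """Placeholder scoring rule standing in for the neural network."""
    return "electronics" if vector[0] + vector[1] > vector[2] else "furniture"

STEPS = [preprocess, vectorize, predict]

def run_pipeline(product, steps=STEPS):
    """Feed the output of each step into the next one."""
    value = product
    for step in steps:
        value = step(value)
    return value

category = run_pipeline({"name": "Samsung TV 32''"})
```

Keeping the same ordered steps for training and online inference avoids mismatches between how features were built and how they are served.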
#5.2 Inference API
• REPL
• Kafka Worker
• Flask App
#5.3 Production
•x2 inference VMs
- inference1.skroutz.gr, inference2.skroutz.gr (Kafka Workers)
•x2 Flavors (Greece, UK)
•Grafana Monitoring for Kafka Part
#6 Evaluation
•More than 6% error rate reduction overall in Skroutz!
•Currently, more than 2 content-editor hours saved per day in
Skroutz (this is scaling)!
•“Move” operations on the list of “uncategorized” products
dropped significantly (by an order of magnitude)!
#6 Performance Summary
                                Success Rate        Failure Rate        No Prediction Rate
                                Megatron   Old      Megatron   Old      Megatron   Old
Skroutz (GR), 2.3k categories   90.10%     82.6%    7.9%       13.8%    2%         3.5%
                                91.85%     85.7%    8.14%      14.32%   N/A        N/A
Scrooge (UK), 350 categories    87.56%     38.9%    2.5%       26.24%   9.9%       58.48%
                                97.1%      93.67%   2.8%       6.32%    N/A        N/A
#6 Monitoring Dashboard
#Future Improvements
• Utilize Image Features (in End-To-End model)
• Utilize Entity Recognition to extract more features
• Find ways to utilize more features (color, sizes etc.)
• Categorical Self-Trained Embeddings
• Experiment with newer solutions like “Transformer”
#Contact Info
Andreas Loupasakis
• Email: alup@skroutz.gr
• Kaggle: https://www.kaggle.com/andreaslup
• Twitter: https://twitter.com/andy_lupo
• LinkedIn: https://www.linkedin.com/in/andreas-loupasakis-06399a47
Thank you!
