Anatomy of an eCommerce Search Engine by Mayur Datar
● Search is one of the most important discovery tools in e-commerce.
● It powers other features such as merchandising (promotions) and recommendations.
● It accounts for a big fraction of the units sold and GMV.
● Important signals that affect search: price, offers, popularity, availability, serviceability, etc.
● These signals are used in the ranking of products.
● They are exposed as filters and sorts to end users.
● They are very dynamic, particularly during sales.
● E-commerce search != web search.
● Documents have a structure to them
● Queries have an implicit structure
● Challenges:
○ Large document collection with a long heavy tail
○ Extremely high rate of changes/updates (Thousands per sec)
○ Geo specific ranking
○ Multi-objective optimization (GMV, Units, Ads revenue, Long Term Value)
● Opportunities:
○ Broad queries: personalization can play a huge role
● Queries per day: XXX Millions / week
● Latencies:
○ Average: ~ 100 ms
○ Median: ~ 50 ms
○ 90th percentile: ~ 500 ms
● Documents retrieved and scored from index:
○ Median: 1K to 10K
○ 95th percentile: 200K to 500K
○ 99th percentile: 500K to 3M+
● Search CTR: Around 50%
● Architectural overview of the search platform
○ Serving and Ingestion
○ Serving functional view
○ Serving architectural view
○ Ingestion architectural view
○ Example ingestion topology
● Search quality
○ Challenges
○ Life of a query: Typical flow for query understanding
○ Illustrative problems
● 1,000,000 Compute Cores
● 2.56 Petabytes RAM
● 120 Petabytes Disk Storage
● 1 Petabyte NVMe SSD
● 128 Tbps bisection bandwidth Clos network
Serving: Arch View
● Query Rewriter (Spell Check, Concept, NLP, Intent, Augmentation, Retrieval/Scoring query formulation)
○ Pluggable Rewriter Modules
● Reverse Proxy (Geo Coding, User Context, Caching, Isolation, Rate Limit, Tee-off test framework)
● Search Broker (Distributed Search across shards, Blending of Results from shards)
● Searcher (Matching, Scoring, Faceting, Top-K Retrieval (pass-1 ranking))
○ Text index
○ NRT index
○ Metadata
● Re-ranking (Pass-2 Ranking): ML Model
○ Pluggable Ranking Models
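The broker and the two ranking passes can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the platform's actual code: shards are modelled as plain dictionaries of product id to pass-1 score, and `searcher_top_k`, `broker_blend`, and `rerank` are hypothetical names. The sketch shows each searcher returning its own top-K, the broker blending the per-shard lists into a global top-K, and a pluggable model re-ordering that short list in pass 2.

```python
import heapq
import itertools
from typing import Callable, Dict, List, Tuple

Hit = Tuple[float, str]  # (pass-1 score, product id); hypothetical shape


def searcher_top_k(shard: Dict[str, float], k: int) -> List[Hit]:
    """Pass 1 on one shard: return the k best hits, highest score first."""
    return heapq.nlargest(k, ((score, pid) for pid, score in shard.items()))


def broker_blend(shards: List[Dict[str, float]], k: int) -> List[Hit]:
    """Fan out to every shard, then merge the sorted per-shard lists."""
    per_shard = [searcher_top_k(shard, k) for shard in shards]
    # heapq.merge with reverse=True expects inputs sorted largest first,
    # which is what heapq.nlargest returns.
    merged = heapq.merge(*per_shard, reverse=True)
    return list(itertools.islice(merged, k))


def rerank(hits: List[Hit], model: Callable[[str, float], float]) -> List[Hit]:
    """Pass 2: re-order the blended short list with a pluggable model."""
    return sorted(hits, key=lambda hit: model(hit[1], hit[0]), reverse=True)


if __name__ == "__main__":
    shards = [{"p1": 0.9, "p2": 0.4}, {"p3": 0.7, "p4": 0.8}]
    blended = broker_blend(shards, k=3)
    # Hypothetical model: boost one product, e.g. because it is on offer.
    boosted = rerank(blended, lambda pid, s: s + (0.5 if pid == "p3" else 0.0))
    print(blended)
    print(boosted)
```

Only the blended top-K reaches the second pass, which is what makes a more expensive ML model affordable there even when pass 1 scores hundreds of thousands of documents.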
● Marketplace
○ Catalog entries vary in quality from seller to seller. Spam is rampant.
● Diversity of users
● Mobile-heavy users: limited real estate on the UI
● Poor internet connectivity
● Literacy/Internet awareness
● Language
● Economic power
● Regional preferences
● Abstraction: City-tier
○ Query/Intent Solicitation
○ Result Presentation
○ Product Ranking
40% increase in proportion of tier-3 customers vis-a-vis metro
Query: samsang
Relative ratio of query Tier-3 Vs Metro: 1.8
Query: jins
Relative ratio of query Tier-3 Vs Metro: 2.2
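The slides do not define "relative ratio". One plausible reading, assumed here, is the query's share of all tier-3 queries divided by its share of all metro queries. The sketch below computes that quantity; the function name and the counts are made up for illustration.

```python
from typing import Dict


def relative_ratio(query: str,
                   tier3_counts: Dict[str, int],
                   metro_counts: Dict[str, int]) -> float:
    """Share of `query` among tier-3 queries over its share among metro queries.

    This definition is an assumption; the source only reports the ratios.
    """
    tier3_share = tier3_counts.get(query, 0) / sum(tier3_counts.values())
    metro_share = metro_counts.get(query, 0) / sum(metro_counts.values())
    if metro_share == 0:
        raise ValueError(f"query {query!r} never seen in metro traffic")
    return tier3_share / metro_share


if __name__ == "__main__":
    # Made-up counts chosen so that the ratio comes out close to 1.8.
    tier3 = {"samsang": 18, "other": 982}
    metro = {"samsang": 10, "other": 990}
    print(round(relative_ratio("samsang", tier3, metro), 2))
```

A ratio above 1 under this reading means the spelling is over-represented in tier-3 traffic relative to metro traffic.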
Query Understanding
● Query
● Init Context
● Normalisation (index time as well)
○ String clean-up
○ Lowercasing
● Spell Correction
○ Resource-based
○ term -> term
○ query -> query
○ Online
● Phrasing (index time as well)
○ Frequent bi/tri-grams
● Stemming (index time as well)
○ Core e-commerce stemmer
○ Plurals
● Synonyms
○ Resource-based
● Intent
○ Deductions
○ Tagging (CRF)
● Query Rewrite
○ Best query selection
○ Partial match
● SOLR interface
● Output Generator
● Retrieval ranking logic
● Scoring
● Common MetaData Store (Query Level)
○ Raw Data: metrics (CTR, Impression, NDCG…)
○ Derived Data: Store, LM score, Features
● Store Classifier
● Query LM
● Feature Store
● Classification
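The first few stages (normalisation, phrasing, plural stemming) can be sketched as a small pipeline. This is a toy under stated assumptions, not the production logic: the phrase list is a hypothetical hand-written set standing in for mined frequent bi/tri-grams, and the plural rule is deliberately naive compared with a real e-commerce stemmer.

```python
import re
from typing import List, Set, Tuple

# Hypothetical phrase list; a real one would be mined from query logs.
FREQUENT_PHRASES: Set[Tuple[str, ...]] = {("red", "tape"), ("pack", "of", "2")}


def normalise(query: str) -> str:
    """String clean-up and lowercasing."""
    cleaned = re.sub(r"[^a-z0-9\s-]", " ", query.lower())
    return re.sub(r"\s+", " ", cleaned).strip()


def phrase(tokens: List[str], phrases: Set[Tuple[str, ...]]) -> List[str]:
    """Greedily join known tri-grams, then bi-grams, into single tokens."""
    out: List[str] = []
    i = 0
    while i < len(tokens):
        for n in (3, 2):
            gram = tuple(tokens[i:i + n])
            if len(gram) == n and gram in phrases:
                out.append(" ".join(gram))
                i += n
                break
        else:  # no phrase starts at position i
            out.append(tokens[i])
            i += 1
    return out


def stem_plural(token: str) -> str:
    """Naive plural stripping: drop a trailing 's' unless the word ends in 'ss'."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def understand(query: str) -> List[str]:
    tokens = normalise(query).split()
    return [stem_plural(t) for t in phrase(tokens, FREQUENT_PHRASES)]


if __name__ == "__main__":
    print(understand("  Red Tape SHOES!! "))
```

Because the slide marks these steps "index time as well", the same functions would need to run over product text during indexing so that query tokens and indexed tokens stay comparable.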
• Special patterns:
– Segmented words: lgnexus5
• Counting: “samsang” & no-click followed by “samsung” & click, a million times
– Context aware counting
• Language modeling and edit distance
• Term to vector models in deep learning.
(Ordered from specific to general.)
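The counting idea can be sketched as follows: scan each session for a query with no click that is immediately followed by a different query with a click, and keep pairs that are close in edit distance and seen often enough. The session format, thresholds, and function names are assumptions for illustration; the slides do not describe the actual implementation.

```python
from collections import Counter
from typing import Dict, List, Tuple

Event = Tuple[str, bool]  # (query, had_click); hypothetical log format


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def mine_corrections(sessions: List[List[Event]],
                     max_dist: int = 2,
                     min_count: int = 2) -> Dict[str, str]:
    """Map a misspelt query to its most frequent clicked reformulation."""
    counts: Counter = Counter()
    for session in sessions:
        for (q1, clicked1), (q2, clicked2) in zip(session, session[1:]):
            if (not clicked1 and clicked2 and q1 != q2
                    and edit_distance(q1, q2) <= max_dist):
                counts[(q1, q2)] += 1
    corrections: Dict[str, str] = {}
    for (q1, q2), n in counts.most_common():
        if n >= min_count and q1 not in corrections:
            corrections[q1] = q2
    return corrections


if __name__ == "__main__":
    sessions = [[("samsang", False), ("samsung", True)]] * 3
    sessions.append([("samsang", False), ("shoes", True)])
    print(mine_corrections(sessions))
```

The edit-distance filter keeps unrelated follow-up queries out of the table, and the minimum count guards against one-off reformulations.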
● Intent: From query tokens to (implicit) attributes that are represented by those tokens
● Examples:
○ “red tape shoes” -> (brand) “red tape” (store) “shoes”
○ “kids party dress 4-5 years pack of 2” -> (ideal_for) “kids” (occasion) “party” (store) “dress” (size) “4-5 years” (pack_of) “pack of 2”
○ “samsung e6 cases” -> (compatible_with) “samsung e6” (store) “cases”
● Memorization, Language modeling, CRF
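Of the three techniques listed, memorization is the simplest to illustrate: look up the longest known span at each position in a lexicon of previously seen attribute values. The sketch below assumes a tiny hand-written lexicon; the lexicon, the `unknown` label, and the function name are hypothetical, and a CRF or language model would be needed for spans the lexicon has never seen or that are ambiguous in context.

```python
from typing import Dict, List, Tuple

# Hypothetical memorised lexicon: span -> attribute it represents.
LEXICON: Dict[str, str] = {
    "red tape": "brand",
    "shoes": "store",
    "kids": "ideal_for",
    "party": "occasion",
    "dress": "store",
}


def tag_query(query: str,
              lexicon: Dict[str, str],
              max_len: int = 3) -> List[Tuple[str, str]]:
    """Tag a query by greedy longest-match lookup against the lexicon."""
    tokens = query.lower().split()
    tags: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        # Try the longest span first so "red tape" wins over "red".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = " ".join(tokens[i:i + n])
            if span in lexicon:
                tags.append((lexicon[span], span))
                i += n
                break
        else:  # nothing in the lexicon starts at this token
            tags.append(("unknown", tokens[i]))
            i += 1
    return tags


if __name__ == "__main__":
    print(tag_query("red tape shoes", LEXICON))
    print(tag_query("kids party dress", LEXICON))
```

Longest-match lookup handles "red tape" as a brand, but it cannot tell when "red" is meant as a colour, which is the kind of case where sequence taggers such as a CRF earn their place.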
Personalization
● Users’ activity on the platform (past orders, product views)
● Customised search ranking for each user segment

Price Personalization
● 5 price ranges (1 to 5, economical to expensive) defined for each vertical, e.g. shoes, watches.
● User segments based on price affinities
● Price affinities come from users’ past activity on the platform (past orders, product views).
● Customised search ranking for each user segment
[Figure: # of users across price ranges 1 to 5 (economical to expensive), for shoes and watches]
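A minimal sketch of price-affinity segmentation follows. All specifics are assumptions for illustration: the price boundaries are invented, the segment is simply the price range the user interacted with most, and the boost formula is one arbitrary way to favour products near that range. The source states only that five ranges exist per vertical and that ranking is customised per segment.

```python
from bisect import bisect_right
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical boundaries splitting each vertical's prices into 5 ranges.
PRICE_EDGES: Dict[str, List[float]] = {
    "shoes": [500, 1000, 2000, 4000],
    "watches": [1000, 2500, 5000, 10000],
}


def price_range(vertical: str, price: float) -> int:
    """Return the price range, 1 (economical) to 5 (expensive)."""
    return bisect_right(PRICE_EDGES[vertical], price) + 1


def user_segment(vertical: str, past_prices: List[float]) -> Optional[int]:
    """Segment = most frequent range in past orders and product views."""
    counts = Counter(price_range(vertical, p) for p in past_prices)
    if not counts:
        return None  # no history for this vertical
    # Ties are broken in favour of the cheaper range.
    return min(counts, key=lambda r: (-counts[r], r))


def personalised_score(base: float, product_range: int,
                       segment: Optional[int], boost: float = 0.1) -> float:
    """Scale the base score up for products close to the user's segment."""
    if segment is None:
        return base
    closeness = 1 - abs(product_range - segment) / 4  # 1 = same range
    return base * (1 + boost * closeness)


if __name__ == "__main__":
    segment = user_segment("shoes", [450, 700, 800, 3000])
    print(segment)
    print(personalised_score(1.0, price_range("shoes", 900), segment))
```

Defining the ranges per vertical matters because an "expensive" pair of shoes and an "expensive" watch sit at very different absolute prices.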