Real Time Recommender System with Kiji
Jan 22, 2014

Daqing Zhao, Director of Advanced Analytics

Macy’s.com
Agenda

▪ Big data analytics versus traditional BI
▪ Macy’s Advanced Analytics Team
▪ Our analytics projects
▪ Example: site recommendations using Kiji
▪ High-level architecture
▪ Kiji Schema table structure
▪ Model deployment using Kiji
▪ Key benefits of Kiji and the WibiData team

Traditional BI process
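[Figure: pyramid of BI stages, listed top to bottom as extracted: knowledge discovery; segmentation and predictive modeling; multidimensional report; standard report; schema definition and ETL into RDBMS. Annotation: "Most companies stay in this area." Source: Baseline Consulting.]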

▪ Data can be accessed and analyzed only after ETL
▪ Schema definition may not be optimal

Hadoop/NoSQL: paradigm shift

[Figure: Hadoop/NoSQL analytics stack. Labels, as extracted: decisions, insights, models, and a decision agent; segmentation and predictive modeling; multidimensional and standard reports; tools (Hive, Mahout, Cascading, Scalding, Kiji, …) on MapReduce; raw data (volume, velocity, variety) with write, append, and read; distributed storage with computation near the data (Hadoop, HBase, Avro, …).]

▪ We can access raw data and analyze it using MapReduce
▪ This approach has its own pros and cons
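As a rough illustration of analyzing raw data with MapReduce, the sketch below runs the map, shuffle, and reduce steps in plain Python over a few event lines. The tab-separated log format, the event names, and the product IDs are assumptions made up for this example; a real job would run on Hadoop rather than in a single process.

```python
from collections import defaultdict

# Hypothetical raw event lines: customer_id <TAB> event_type <TAB> product_id
RAW_EVENTS = [
    "c1\tadd_to_bag\tp10",
    "c2\tview\tp10",
    "c2\tadd_to_bag\tp10",
    "c3\tadd_to_bag\tp20",
]


def map_phase(line):
    """Emit (product_id, 1) for each add-to-bag event; skip everything else."""
    _customer, event, product = line.split("\t")
    if event == "add_to_bag":
        yield product, 1


def shuffle(pairs):
    """Group mapped values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Sum the counts for one product."""
    return key, sum(values)


def add_to_bag_counts(lines):
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


counts = add_to_bag_counts(RAW_EVENTS)
```

No schema has to be defined before the data is read here; the map step interprets each raw line at analysis time.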

Macy’s.com Advanced Analytics Group
▪ We are at the frontiers of Big Data science:
• Using Big Data technology
• Machine learning and statistical algorithms

▪ We have predictive modeling, experimental design, and data science teams

▪ Our team members have very strong backgrounds in
• Quantitative fields: math, statistics, physics, bioinformatics, decision sciences, and computer science

▪ We collaborate with systems and IT teams internally as well as third-party vendors such as WibiData, SAS Research, IBM Research, …

▪ We use a wide range of tools
• Hadoop, SAS, R, Mahout, and others, as well as Kiji Models

▪ We are data scientists with a keen focus on domain problems

Customer acquisition and retention
▪ Targeting the right message to the right customer at the right time
• Build predictive models of purchase behavior and identify drivers

▪ Site recommendation algorithms
• Recommend products based on items that are added to bag, for cross-sell and up-sell
• We also look at market basket analysis
• Most work is in batch mode, expanding slowly into real time

▪ Rapid prototyping and testing of algorithms and policies
• All done in short development cycles

▪ The output of the team’s work supports other marketing teams in identifying and reaching the best customers
• Search, display, social network, affiliates, retention, customer services, …
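The market basket analysis mentioned above can be sketched in a few lines. The example below computes support, confidence, and lift for product pairs; the baskets and product names are invented for illustration, and this is one common formulation rather than the team's actual method.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets; each set holds the product IDs of one order.
BASKETS = [
    {"dress", "heels", "clutch"},
    {"dress", "heels"},
    {"dress", "scarf"},
    {"heels", "clutch"},
]


def pair_rules(baskets):
    """Return {(lhs, rhs): (support, confidence, lift)} for every rule lhs -> rhs."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = {}
    for (a, b), together in pair_counts.items():
        for lhs, rhs in ((a, b), (b, a)):
            support = together / n                      # share of baskets with both
            confidence = together / item_counts[lhs]    # P(rhs | lhs)
            lift = confidence / (item_counts[rhs] / n)  # confidence vs. base rate
            rules[(lhs, rhs)] = (support, confidence, lift)
    return rules


rules = pair_rules(BASKETS)
```

A lift above 1 suggests the two products are bought together more often than their individual popularity would predict, which makes the pair a candidate for cross-sell.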

Some other projects
▪ Data organization or data munging
• Data collections, individual and event level, 360 degrees, …
• Segmentation of customers
• Customer value, revenue, costs
• Multiple-channel attribution of marketing contacts
• Product attributes

▪ Experimentation platform
• Success of online marketing depends highly on testing, learning, and optimization
• Both for site layout and for contents and recommendations

▪ Forecast and optimization
• Prediction, simulation, search, and optimization

▪ Big data refinement and scalability
• New data sources and more efficient ways of accessing, organizing, and processing data
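For the experimentation platform, one standard way to compare two variants is a two-proportion z-test on conversion rates. The sketch below is a generic statistical example, not a description of the platform itself; the visitor and conversion counts are made up.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates B - A."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution via erfc.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical test: layout A vs. layout B, 10,000 visitors each.
z, p = two_proportion_z_test(conv_a=200, n_a=10_000, conv_b=260, n_b=10_000)
```

A small p-value suggests the difference between the two layouts is unlikely to be due to chance alone, under the usual large-sample assumptions.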

Example: similar and complementary products

Example: customer segmentation

Demographic
Socio-economic
Behavioral
Values and styles
Channels
Modality

Example: product social network

Demographic
Style
Size

Brand
Price range
Season

Example: site product recommendation
▪ Customer adds one or more products to the bag

▪ We recommend similar or complementary products in real time
• Based on product associations and customer profile

▪ We use various machine learning algorithms
• Association rules
• Collaborative filtering
• Predictive modeling
• Business rules
• And others, …
• Models built offline

▪ Real-time data, real-time model scoring, and real-time decisions
▪ Champion/challenger tests; models evolve quickly over time
▪ Frequent model updates, adding new data
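The flow above (offline-built associations, real-time lookup, business rules, and a champion/challenger split) might look roughly like the following sketch. All tables, scores, product names, and the 10% challenger share are assumptions for illustration, and the customer profile is left out to keep the example short.

```python
import hashlib

# Hypothetical association tables, assumed to be built offline by two competing models.
# Each maps a bagged product to (candidate product, score) pairs.
CHAMPION = {"dress": [("heels", 0.9), ("clutch", 0.7), ("scarf", 0.2)]}
CHALLENGER = {"dress": [("clutch", 0.8), ("belt", 0.6), ("heels", 0.5)]}

OUT_OF_STOCK = {"scarf"}  # hypothetical input to a business rule


def assign_model(customer_id, challenger_percent=10):
    """Deterministically route a fixed share of customers to the challenger."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return "challenger" if bucket < challenger_percent else "champion"


def recommend(customer_id, bag, top_n=2):
    """Score candidates for the current bag and return (model name, top product IDs)."""
    model = assign_model(customer_id)
    table = CHALLENGER if model == "challenger" else CHAMPION
    scores = {}
    for item in bag:
        for candidate, score in table.get(item, []):
            # Business rules: never recommend bagged or out-of-stock products.
            if candidate in bag or candidate in OUT_OF_STOCK:
                continue
            scores[candidate] = max(scores.get(candidate, 0.0), score)
    ranked = sorted(scores, key=lambda c: (-scores[c], c))
    return model, ranked[:top_n]


model, recs = recommend("customer-42", bag=["dress"])
```

Hashing the customer ID keeps each customer on the same model across requests, so the outcomes of the champion and the challenger can be compared cleanly.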

Architecture

[Figure: architecture diagram. Labels, as extracted: real-time data access, scoring, and decisions; data mining environments (Kiji Express, Mahout, R, SAS, others); products; Kiji Model, Kiji Scoring, and Kiji REST; Hadoop and HBase.]

Kiji Schema table structure

Customer table: entity id | customer | email | metadata | order
Product table: entity id | product | category | metadata | inventory

▪ Schemas have column names and types, in contrast to the raw bytes stored in HBase
▪ Group column families are structured, while map column families are flexible
▪ Tables are accessible as collections from Kiji Express
▪ Scala code focuses on model and business logic
▪ Scalding underneath takes care of generating MapReduce jobs
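To make the difference between group and map column families concrete, here is a small Python model of the idea. It is not the Kiji API: the class names, the column names (`name`, `zip_code`), and the use of order IDs as map qualifiers are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class GroupFamily:
    """Family with a fixed, declared set of typed columns."""
    columns: dict  # column name -> expected Python type
    cells: dict = field(default_factory=dict)

    def put(self, column, value):
        expected = self.columns.get(column)
        if expected is None:
            raise KeyError(f"undeclared column: {column}")
        if not isinstance(value, expected):
            raise TypeError(f"{column} expects {expected.__name__}")
        self.cells[column] = value


@dataclass
class MapFamily:
    """Family whose column qualifiers are arbitrary keys sharing one value type."""
    value_type: type
    cells: dict = field(default_factory=dict)

    def put(self, qualifier, value):
        if not isinstance(value, self.value_type):
            raise TypeError(f"values must be {self.value_type.__name__}")
        self.cells[qualifier] = value


# Hypothetical row of the customer table, keyed by entity id.
customer_row = {
    "entity_id": "customer-42",
    "customer": GroupFamily(columns={"name": str, "zip_code": str}),
    "order": MapFamily(value_type=float),  # qualifier = order id, value = order total
}
customer_row["customer"].put("name", "Pat")
customer_row["order"].put("order-1001", 129.99)
customer_row["order"].put("order-1002", 35.50)
```

The group family rejects columns that were never declared, while the map family accepts any qualifier as long as the value has the declared type, which suits open-ended data such as a customer's orders.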

Model Build and Deployment

[Figure: model build and deployment. Offline: model building with Kiji Modeling, R, SAS, Mahout, … Deployment: Kiji Express, Kiji Scoring, Kiji PMML, and Kiji MR on top of Kiji Schema, HBase, and Hadoop. Real time: data update, scoring, and decisions.]
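The split between offline model building and real-time scoring can be sketched as follows. Here a JSON document stands in for an exported model (the deck mentions PMML for this role); the feature names, coefficients, and decision threshold are invented, and none of this uses Kiji itself.

```python
import json
import math

# Offline step (stand-in): a fitted logistic regression exported as JSON.
# Feature names and coefficients are made up for illustration.
EXPORTED_MODEL = json.dumps({
    "intercept": -2.0,
    "coefficients": {"items_in_bag": 0.4, "visits_last_30d": 0.1},
})


class Scorer:
    """Loads exported parameters once, then scores feature dicts on request."""

    def __init__(self, exported):
        params = json.loads(exported)
        self.intercept = params["intercept"]
        self.coefficients = params["coefficients"]

    def score(self, features):
        # Missing features default to 0 so a sparse profile can still be scored.
        linear = self.intercept + sum(
            weight * features.get(name, 0.0)
            for name, weight in self.coefficients.items()
        )
        return 1.0 / (1.0 + math.exp(-linear))


def decide(probability, threshold=0.5):
    """Turn a score into a decision: show the recommendation or not."""
    return "show_recommendation" if probability >= threshold else "no_action"


scorer = Scorer(EXPORTED_MODEL)
probability = scorer.score({"items_in_bag": 3, "visits_last_30d": 12})
decision = decide(probability)
```

Because the scorer only reads exported parameters, a retrained model can be swapped in by replacing the exported document, without changing the scoring code.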

Key benefits of partnership with WibiData

▪ Open source Kiji suite, abstracted with a focus on modeling
• Kiji Schema, KijiMR, Kiji Model, Kiji Scoring, Kiji Express, Kiji REST
• Allows a quick development cycle

▪ Packages popular open source projects
• Hadoop, HBase, Avro, Cascading, Scalding, Scala

▪ Better organization
• Create tables, query by field name, flexibility, …; more database-like than HBase

▪ WibiData professional services team helps develop, integrate, maintain, train the in-house team, consult, …
• Competence, knowledge
• Supports infrastructure, so that we can focus on the science

▪ Scalable, real-time model deployment environment
• Interactive
• In milliseconds

Acknowledgement

▪ Macy’s teams
• Analytics team: Kerem Tomak, Albert Zhai
• Infrastructure team: Winslow Holmes, Rakesh Sharma, Cherry Peng

▪ WibiData team
• Professional Services team: Adam, Christophe, Renuka, Lynn


