Making Sense of Data




        Lily goes shopping –
real-time recommendations with HBase
                         HBaseCon, May 2012




         Steven Noels – VP Product – @stevenn


                             WWW.NGDATA.COM
Lily Core 2’ recap
•  HBase-backed data repository,
   with batteries included
•  Data model:
    •  high-level data model on top of HBase’s
       byte[]’s
    •  schema
    •  versioning (schema and data)
    •  links, variants
•  Java & REST APIs
•  Indexing:
    •  through configuration, not implementation
    •  incremental and batch index maintenance
•  RowLog: distributed, durable queue for secondary
   actions
•  Open Source: www.lilyproject.org (Apache
   License)

[Diagram, top to bottom: client app; Lily; RowLog; HBase and Solr et al.]


Why HBase?
•  BigTable model
•  sparseness
•  atomic row updates aka consistency
•  auto-partitioning
•  Apache license
•  A great community led by a Saint :-)




Portfolio Overview

[Diagram: layered stack, top to bottom]
•  Real-time AI; Recommendations; Industry algorithms and rules
•  Trend Analytics; Pattern Detection
•  Profile Development; Context and Activity Tracking; Social Stream Ingestion
•  Schema and Data Management; Total Data Aggregation; Real-time Index and Retrieval; Security and Enterprise Connectors

Side labels on the diagram: "commercial availability" (toward the top) and "open source" (toward the bottom)




Lily (=HBase) In Use
Some of the larger Lily deployments

•  media
    •  aggregation, database publishing and online archives
•  finance
    •  real-time identity fraud detection
•  retail banking
    •  contextualized (time+loc+person) mobile coupons
•  retail
    •  e-commerce platform:
       product catalog, consumer data store, real-time
       indexing




Collaborative Filtering?

  Recommend items similar to a user’s highly-preferred items




Collaborative Filtering is … Matrices

   Sean likes “Scarface” a lot             (123,654,5.0)
   Robin likes “Scarface” somewhat         (789,654,3.0)
   Grant likes “The Notebook” not at all   (345,876,1.0)
   …                                       …

                                              (Magic)

   Grant may like “Scarface” quite a bit   (345,654,4.5)
   …                                       …
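
The "(Magic)" step can be made concrete with a small item-based estimate over (user, item, preference) triples. This is only a sketch of one common collaborative-filtering technique, not a description of Lily's algorithm. The first three triples come from the slide; items 111 and 222, their ratings, and the helper names `build_matrices`, `cosine` and `predict` are invented for illustration.

```python
from collections import defaultdict
from math import sqrt

# (user_id, item_id, preference) triples. The first three come from the slide;
# the rest are invented so the example has enough overlap to work with.
PREFERENCES = [
    (123, 654, 5.0),  # Sean likes "Scarface" a lot
    (789, 654, 3.0),  # Robin likes "Scarface" somewhat
    (345, 876, 1.0),  # Grant likes "The Notebook" not at all
    (123, 111, 5.0),  # hypothetical item 111
    (789, 111, 3.0),
    (345, 111, 4.5),
    (123, 222, 4.0),  # hypothetical item 222
    (345, 222, 4.0),
]


def build_matrices(triples):
    """Index the sparse preference matrix by user and by item."""
    by_user = defaultdict(dict)
    by_item = defaultdict(dict)
    for user, item, pref in triples:
        by_user[user][item] = pref
        by_item[item][user] = pref
    return by_user, by_item


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(value * b[key] for key, value in a.items() if key in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def predict(user, item, by_user, by_item):
    """Estimate a preference as a similarity-weighted average of the user's own ratings."""
    weighted, total = 0.0, 0.0
    for other_item, pref in by_user[user].items():
        sim = cosine(by_item[item], by_item[other_item])
        weighted += sim * pref
        total += sim
    return weighted / total if total else None


by_user, by_item = build_matrices(PREFERENCES)
estimate = predict(345, 654, by_user, by_item)
print(f"(345, 654, {estimate:.1f})")
```

Items that share no raters with the target item get similarity 0 and drop out of the average, which is why "The Notebook" does not pull the estimate down here.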



Contextualized recommendations

[Diagram: Profile, Activity and Item combine into Personalized offers]
•  Input: credit card statements
•  Item: shops & merchants, product families, offers/coupons

Fitting Recommendations into the Lily
Architecture

[Diagram: architecture components]
•  LILY CRUD API
•  read/write demultiplexer
•  Lily Core Repository
•  rowlog, data store, indexes
•  Lily/HBase Secondary Indexes
•  activity store, profile store
•  LILY recommender engine: data, activity, profile scoring
•  co-occurrence lookup matrix
•  algorithm support: ALS, k-means, propensity, custom ...

Steven Noels
stevenn@ngdata.com
www.ngdata.com
telephone: +32 9 33 88 220
Gent (Belgium)
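
To illustrate what a co-occurrence lookup matrix holds, here is a minimal sketch: count how many users touched both items of a pair, then rank unseen items by their summed counts. The activity data, the in-memory dictionaries and the names `build_cooccurrence` and `recommend` are assumptions for the example; the slide does not say how Lily builds or stores its matrix.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical activity store: user -> set of items the user interacted with.
ACTIVITY = {
    "u1": {"coffee", "croissant", "newspaper"},
    "u2": {"coffee", "croissant"},
    "u3": {"coffee", "tea"},
    "u4": {"tea", "biscuit"},
}


def build_cooccurrence(activity):
    """Count, for every ordered item pair, how many users touched both items."""
    matrix = defaultdict(Counter)
    for items in activity.values():
        for a, b in permutations(sorted(items), 2):
            matrix[a][b] += 1
    return matrix


def recommend(user_items, matrix, top_n=3):
    """Sum co-occurrence counts over the user's items and rank unseen items."""
    scores = Counter()
    for item in user_items:
        for other, count in matrix.get(item, {}).items():
            if other not in user_items:
                scores[other] += count
    # Sort by score, then by name so ties are deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


matrix = build_cooccurrence(ACTIVITY)
print(recommend({"coffee"}, matrix))
```

Because a lookup only touches the rows of items the user already has, this kind of matrix suits real-time serving: new activity increments a few cells instead of triggering a full recomputation.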



Preferencing aka Feeding the Matrix
•  Transaction-based preferencing
    •  Pluggable preference strategies, using Lily-based data
       (HBase & Solr) for decision making
        •  e.g. credit card statement = transactions between users and product
           families
    •  Preference weighting
    •  Ingest: REST API, bulk support
    •  Real-time updating of the recommendation model

•  Profile Store
    •  Profile activities can be preferenced
    •  Support for Profile behavior analysis
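
A pluggable preference strategy with weighting could look roughly like the sketch below: each strategy maps one transaction to a weight, and the weights accumulate into (user, product family) cells of the matrix. The `Transaction` type, both strategies, the 100.0 cap and the sample statement are hypothetical; they are not Lily's API.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Tuple


@dataclass(frozen=True)
class Transaction:
    """One line of a (hypothetical) credit card statement."""
    user_id: str
    product_family: str
    amount: float


# A strategy turns one transaction into a preference weight.
Strategy = Callable[[Transaction], float]


def count_strategy(tx: Transaction) -> float:
    """Every purchase counts equally."""
    return 1.0


def capped_amount_strategy(tx: Transaction) -> float:
    """Weight by spend, capped so one large purchase does not dominate."""
    return min(tx.amount, 100.0) / 100.0


def preference(transactions: Iterable[Transaction], strategy: Strategy) -> Dict[Tuple[str, str], float]:
    """Accumulate weights into (user, product family) preference cells."""
    cells: Dict[Tuple[str, str], float] = defaultdict(float)
    for tx in transactions:
        cells[(tx.user_id, tx.product_family)] += strategy(tx)
    return dict(cells)


STATEMENT = [
    Transaction("u1", "coffee", 4.0),
    Transaction("u1", "coffee", 6.0),
    Transaction("u1", "electronics", 250.0),
    Transaction("u2", "coffee", 50.0),
]

by_count = preference(STATEMENT, count_strategy)
by_spend = preference(STATEMENT, capped_amount_strategy)
print(by_count)
print(by_spend)
```

Since each transaction only adds to one cell, the same function works for bulk ingest and for applying a single new transaction as it arrives.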



Making recommendations
•  Recommender
    •  Pluggable recommender strategies, using Lily-based data
       (HBase & Solr) for decision making
    •  Multi-model support: user-item & item-user recommendations
    •  Estimation of both preferenced and non-preferenced items
    •  Geolocation-based recommendations
    •  Re-scoring
    •  REST API

•  (Planned)
    •  Support for Classifications
       (scenario: "Recommend me all (possible) coffee drinkers")
    •  Matrix / recommendation indexing
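
The slide does not say how geolocation and re-scoring interact, so the following is one plausible reading, offered as a sketch: take the recommender's scored candidates, drop those outside a radius around the user, and damp the rest by distance. The candidate tuples, the 5 km radius, the linear damping and the function names are all assumptions.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def rescore(candidates, user_lat, user_lon, radius_km=5.0):
    """Drop offers outside the radius and damp the rest linearly with distance."""
    rescored = []
    for offer, score, lat, lon in candidates:
        distance = haversine_km(user_lat, user_lon, lat, lon)
        if distance <= radius_km:
            rescored.append((offer, score * (1.0 - distance / radius_km)))
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Hypothetical recommender output: (offer, score, latitude, longitude).
CANDIDATES = [
    ("coffee coupon", 0.9, 51.0600, 3.7300),
    ("book discount", 0.8, 51.0543, 3.7174),
    ("museum ticket", 0.95, 50.8503, 4.3517),  # roughly 50 km away
]

ranked = rescore(CANDIDATES, user_lat=51.0543, user_lon=3.7174)
print(ranked)
```

Keeping re-scoring as a separate pass over the candidate list means the underlying model does not need to know about location at all.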


Other upcoming Lily Features
•  Secondary indexes (= Lily Core!)
    •  indexes are defined through configuration
    •  single or multi-field indexes
    •  range queries and prefix queries
    •  asc or desc sorted results
    •  can read huge, sorted lists
    •  synchronously updated: index updates are applied by rowlog
       secondary actions
    •  online building of new indexes (no table locks)
    •  MapReduce integration


•  SolrCloud integration
    •  Index shards and configuration managed through ZooKeeper
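
Range queries, prefix queries and ascending or descending results all follow from keeping index entries sorted by field value. The toy below shows the idea with a sorted Python list standing in for sorted storage; the `SortedIndex` class and its methods are hypothetical and do not reflect Lily's actual index API.

```python
import bisect


class SortedIndex:
    """Toy in-memory secondary index: sorted (field value, record id) entries."""

    def __init__(self):
        self._entries = []

    def add(self, value, record_id):
        """Insert while keeping entries sorted by value, then record id."""
        bisect.insort(self._entries, (value, record_id))

    def range(self, low, high, descending=False):
        """Return record ids whose value is in [low, high)."""
        # A 1-tuple sorts before any 2-tuple with the same first element.
        start = bisect.bisect_left(self._entries, (low,))
        end = bisect.bisect_left(self._entries, (high,))
        ids = [record_id for _, record_id in self._entries[start:end]]
        return ids[::-1] if descending else ids

    def prefix(self, prefix, descending=False):
        """A prefix query is a range query from the prefix to its successor."""
        if not prefix:
            raise ValueError("prefix must be non-empty")
        upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
        return self.range(prefix, upper, descending)


index = SortedIndex()
for name, record_id in [("noels", "r1"), ("nolan", "r2"), ("smith", "r3"), ("noah", "r4")]:
    index.add(name, record_id)

print(index.prefix("no"))
print(index.range("n", "o", descending=True))
```

Both query types reduce to locating two boundaries and reading the slice between them, which is why such an index can stream long sorted lists without sorting at query time.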



Making Sense of Data




Questions? Thank you!





HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily & HBase - ngdata
