Event Processing and Data Analytics
with Lucidworks Fusion
Kiran Chitturi,
Software Engineer
Lucidworks Is Search
Lucidworks Fusion
Figure: Lucidworks Fusion platform overview. Capabilities shown: Connector Framework, Index Pipelines (ETL), Scale, Fault Tolerance, Real-Time, Fusion APIs, Recommendations, Personalization, Contextual Search, Relevancy Tool, Machine Learning / Signal Processing, Analytics, Security. Also shown: Ecommerce Site, Customer Analytics, Product Catalog, User History, Conversion Data.
Problem Statement
• How to capture user events?
• How to use events for recommendations?
• How to produce reports from user events?
• What types of recommendations can be generated for different user types?
Event collection - Snowplow JS tracker
• Library to collect user events from the client-side tier of websites and apps
• Sends events using a tracking pixel
• The Signals API acts as a collector for Snowplow events
• Tracks page views, page pings, clicks, links and any custom configured events
• https://github.com/snowplow/snowplow/wiki/javascript-tracker
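To make the tracking-pixel idea concrete, here is a minimal Python sketch of the kind of GET request a Snowplow tracker fires for a page view: the event fields are URL-encoded into the query string of an image request. The collector URL is hypothetical, and only a few tracker-protocol parameters are used (`e=pv` for a page view, `url`, `page`, `p` for platform), based on our reading of the Snowplow tracker protocol. Check the tracker documentation before relying on them.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical collector endpoint; the real path depends on the deployment.
COLLECTOR = "https://collector.example.com/i"


def pixel_url(page_url: str, page_title: str) -> str:
    """Build a tracking-pixel URL for a page-view event (sketch)."""
    params = {
        "e": "pv",           # event type: page view
        "url": page_url,     # URL of the page being viewed
        "page": page_title,  # page title
        "p": "web",          # platform
    }
    return f"{COLLECTOR}?{urlencode(params)}"


if __name__ == "__main__":
    url = pixel_url("http://www.example.com/abws-mcl008-080201", "Dark Gray Wool Suit")
    # The collector recovers the event fields by decoding the query string.
    print(parse_qs(urlsplit(url).query)["e"])  # ['pv']
```

The browser loads this URL as an image, so the event reaches the collector without any script-initiated request.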
Signals - data flow
Figure: Signals data flow. JSON payloads and Snowplow payloads are sent to the Signals Service, which stores them in Solr. Collections shown: test (primary collection), test_signals (raw signals collection) and test_signals_aggr (aggregated signals collection).
Event collection - JSON payloads
• Examples:
  • page-view, query, search-click, add-to-cart, rating
• Signals schema:
  • required field: type
  • additional properties can be specified in the ‘params’ map
  • special treatment for the fields ‘docId’, ‘userId’, ‘query’, ‘filterQueries’, ‘collection’, ‘weight’, ‘count’
• Processing logic lives in the ‘_signals_ingest’ pipeline
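A minimal Python sketch of a payload that follows this schema: `type` is the only required field, and everything else goes into the `params` map. The helper name `make_signal` is hypothetical; the slides do not show how the payload is posted to the Signals API, so the sketch stops at producing the JSON.

```python
from __future__ import annotations

import json
from datetime import datetime, timezone
from typing import Any


def make_signal(signal_type: str, params: dict[str, Any] | None = None,
                timestamp: datetime | None = None) -> dict[str, Any]:
    """Build a signal payload: 'type' is required, other data goes in 'params'."""
    if not signal_type:
        raise ValueError("'type' is required")
    ts = (timestamp or datetime.now(timezone.utc)).astimezone(timezone.utc)
    # ISO-8601 UTC timestamp with millisecond precision, as in the examples.
    iso = ts.isoformat(timespec="milliseconds").replace("+00:00", "Z")
    return {"timestamp": iso, "type": signal_type, "params": dict(params or {})}


signal = make_signal(
    "pv",
    {"page": "Dark Gray Wool Suit", "userId": "12891291"},
    datetime(2015, 9, 14, 10, 12, 13, 456000, tzinfo=timezone.utc),
)
print(json.dumps(signal))
```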
Example: page-view signal
Input signal:
{
  "timestamp": "2015-09-14T10:12:13.456Z",
  "type": "pv",
  "params": {
    "url": "http://www.ecommerce.com/abws-mcl008-080201"
  }
}
Indexed signal document:
{
  "type_s": "pv",
  "flag_s": "event",
  "params.url_s": "http://www.ecommerce.com/abws-mcl008-080201",
  "id": "62a26152-7971-406e-bf06-3df44974c220",
  "timestamp_tdt": "2015-09-14T10:12:13.45Z",
  "count_i": 1,
  "_version_": 1515057367743463400
}
Example: page-view signal
Input signal:
{
  "timestamp": "2015-09-14T10:12:13.456Z",
  "type": "pv",
  "params": {
    "page": "Dark Gray Wool Suit",
    "url": "http://www.ecommerce.com/abws-mcl008-080201",
    "userId": "12891291",
    "useragent_type_name_s": "Browser",
    "ipAddr": "64.134.151.1",
    "tz": "America/NewYork"
  }
}
Indexed signal document:
{
  "type_s": "pv",
  "params.tz_s": "America/NewYork",
  "user_id_s": "12891291",
  "params.page_s": "Dark Gray Wool Suit",
  "tz_timestamp_txt": [
    "Mon 2015-09-14 10:12:13.456 UTC"
  ],
  "flag_s": "event",
  "params.ipAddr_s": "64.134.151.1",
  "params.url_s": "http://www.ecommerce.com/abws-mcl008-080201",
  "id": "4b993f85-67d3-4523-b2b3-cf4e3ff2f202",
  "timestamp_tdt": "2015-09-14T10:12:13.45Z",
  "count_i": 1,
  "_version_": 1515057643959353300
}
Example: click signal
Input signal:
{
  "type": "click",
  "params": {
    "query": "Madden 12",
    "docId": "2375201",
    "userId": "abc121",
    "position": "4",
    "filterQueries": [
      "cat00000",
      "abcat0700000",
      "abcat0703000",
      "abcat0703002",
      "abcat0703008"
    ]
  }
}
Indexed signal document:
{
  "filters_orig_ss": [
    "abcat0700000",
    "abcat0703000",
    "abcat0703002",
    "abcat0703008",
    "cat00000"
  ],
  "user_id_s": "abc121",
  "query_s": "madden 12",
  "type_s": "click",
  "params.position_s": "4",
  "query_t": "madden 12",
  "doc_id_s": "2375201",
  "tz_timestamp_txt": ["Tue 2015-10-13 18:33:04.012 UTC"],
  "filters_s": "abcat0700000 $ abcat0703000 $ abcat0703002 $ abcat0703008 $ cat00000",
  "flag_s": "event",
  "query_orig_s": "Madden 12",
  "id": "69c609f6-a2c1-4f89-990e-88a63e68063d",
  "timestamp_tdt": "2015-10-13T18:33:04.01Z",
  "count_i": 1,
  "_version_": 1514941903557099520
}
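The three examples above suggest how input fields become indexed fields: `docId` becomes `doc_id_s`, `userId` becomes `user_id_s`, `query` is stored lowercased in `query_s`/`query_t` with the original in `query_orig_s`, `filterQueries` is sorted into `filters_orig_ss` and joined with " $ " into `filters_s`, and any other param becomes `params.<name>_s`. The Python sketch below approximates that mapping. `index_signal` is a hypothetical helper, not the `_signals_ingest` pipeline; the handling of `weight` and `count` is an assumption, `collection` is not modeled, and Solr-assigned fields such as `_version_` are omitted.

```python
from __future__ import annotations

import uuid
from datetime import datetime, timezone
from typing import Any


def index_signal(signal: dict[str, Any]) -> dict[str, Any]:
    """Approximate the input-signal -> indexed-document mapping in the examples."""
    if "type" not in signal:
        raise ValueError("signal is missing the required 'type' field")
    params = dict(signal.get("params", {}))
    now = datetime.now(timezone.utc).isoformat(timespec="milliseconds")
    doc: dict[str, Any] = {
        "id": str(uuid.uuid4()),            # generated document id
        "type_s": signal["type"],
        "flag_s": "event",                  # raw event, as opposed to "aggr"
        "timestamp_tdt": signal.get("timestamp", now.replace("+00:00", "Z")),
        "count_i": int(params.pop("count", 1)),  # assumption: defaults to 1
    }
    # Special fields get dedicated index fields.
    if "docId" in params:
        doc["doc_id_s"] = params.pop("docId")
    if "userId" in params:
        doc["user_id_s"] = params.pop("userId")
    if "weight" in params:                  # assumption: stored as a double
        doc["weight_d"] = float(params.pop("weight"))
    if "query" in params:
        query = params.pop("query")
        doc["query_orig_s"] = query         # original casing
        doc["query_s"] = query.lower()      # normalized for grouping
        doc["query_t"] = query.lower()      # text copy for analysis
    if "filterQueries" in params:
        filters = sorted(params.pop("filterQueries"))
        doc["filters_orig_ss"] = filters
        doc["filters_s"] = " $ ".join(filters)
    # Every other param becomes a string field under the "params." prefix.
    for name, value in params.items():
        doc[f"params.{name}_s"] = str(value)
    return doc
```

Not every detail matches the slides: for example, the indexed page-view document above has no field for `useragent_type_name_s`, whereas this sketch would keep it.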
Aggregations
• Batch processing using Apache Spark
• spark-solr library (https://github.com/LucidWorks/spark-solr)
• Types:
  • Simple
  • Click
  • EventMiner
Aggregations - data flow
Figure: an aggregation job is run by the Aggregator on Spark (Spark Agent, Spark Driver, Cluster Mgr. and Workers). Spark fetches raw signals for processing from the raw signals collection (test_signals) and stores the aggregated results in the aggregated signals collection (test_signals_aggr). The primary collection (test) is also shown.
Aggregation examples
• Simple aggregations
  • Top queries
  • Top clicked documents
  • Most popular categories
  • …
• Complex aggregations
  • Click-stream aggregations with decaying weights
  • Generating a co-occurrence matrix for (user, docId, query) tuples
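The slides do not show how the co-occurrence matrix is built. One simple reading, sketched below in plain Python, counts how often two documents were interacted with by the same user; the query element of each tuple is carried along but ignored here (grouping by query instead of user would give query-level co-occurrence). The function name and this interpretation are assumptions, not Fusion's EventMiner implementation.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Iterable, Tuple


def doc_cooccurrence(events: Iterable[Tuple[str, str, str]]) -> Counter:
    """Count document pairs seen by the same user.

    events: (user, doc_id, query) tuples. Returns a sparse matrix as a
    Counter keyed by (doc_a, doc_b) with doc_a < doc_b.
    """
    docs_by_user = defaultdict(set)
    for user, doc_id, _query in events:
        docs_by_user[user].add(doc_id)

    counts: Counter = Counter()
    for docs in docs_by_user.values():
        # Each unordered pair is counted once per user.
        for a, b in combinations(sorted(docs), 2):
            counts[(a, b)] += 1
    return counts
```

Storing only the upper triangle as a sparse `Counter` keeps memory proportional to the pairs actually observed rather than to the full document-by-document matrix.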
Example: simple aggregation
[
  {
    "type": "rating",
    "params": {
      "rating": "5.0",
      "source": "web"
    }
  },
  {
    "type": "rating",
    "params": {
      "rating": "1.0",
      "source": "web"
    }
  },
  {
    "type": "rating",
    "params": {
      "rating": "2.0",
      "source": "web"
    }
  },
  {
    "type": "rating",
    "params": {
      "rating": "2.0",
      "source": "web"
    }
  },
  {
    "type": "rating",
    "params": {
      "rating": "1.0",
      "source": "web"
    }
  }
]
Figure: the rating signals are sent through the API to the Signals Service, which indexes them in Solr into the raw signals collection (test_signals). The primary collection (test) and the aggregated signals collection (test_signals_aggr) are also shown.
Example: simple aggregation (continued)
Figure: an aggregation job is submitted to the Aggregation Service manually or via a scheduler. Spark fetches the raw signals from test_signals for processing and stores the aggregated results in test_signals_aggr; all collections, including the primary collection (test), live in Solr.
Aggregation definition (job submission):
{
  "id": "test_simple_aggr",
  "signalTypes": [ "rating" ],
  "selectQuery": "*:*",
  "aggregator": "simple",
  "groupingFields": "params.source_s",
  "aggregates": [
    {
      "type": "stddev",
      "sourceFields": [ "params.rating_s" ],
      "targetField": "stddev_rating_d"
    },
    {
      "type": "topk",
      "sourceFields": [ "params.rating_s" ],
      "targetField": "topk_rating_ss"
    },
    {
      "type": "mean",
      "sourceFields": [ "params.rating_s" ],
      "targetField": "mean_position_d"
    }
  ]
}
Example: simple aggregation (continued)
• Aggregated document:
{
  "aggr_job_id_s": "b91ffdebc44d4e128a8431c2f8a3deb7",
  "aggr_type_s": "simple@doc_id_s-query_s-filters_s",
  "flag_s": "aggr",
  "type_s": "rating",
  "id": "24494dba-93a6-4fc5-bb4d-5b546c3c0c5e",
  "aggr_id_s": "test_simple_aggr",
  "timestamp_tdt": "2015-10-15T02:26:17.337Z",
  "count_i": 5,
  "grouping_key_s": "web",
  "stddev_rating_d": 1.6431676725154982,
  "mean_position_d": 2.2,
  "values.topk_rating_ss": ["2.0", "1.0", "5.0"],
  "counts.topk_rating_ss": ["2", "2", "1"],
  "errors.topk_rating_ss": ["0", "0", "0"]
}
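The numbers in this document can be reproduced by hand: the five ratings 5, 1, 2, 2, 1 have mean 11/5 = 2.2 and sample variance 10.8/4 = 2.7, whose square root is about 1.6432. The Python sketch below interprets the aggregation definition from the previous slide over raw signal documents: it filters by `signalTypes`, groups by `groupingFields` and applies each aggregate. It is an illustrative stand-in for the Spark aggregator, not its code. `selectQuery` is not modeled, and top-k is counted exactly here, whereas the `errors.*` field suggests Fusion uses an approximate top-k structure; ties such as "2.0" and "1.0" may come out in a different order.

```python
from __future__ import annotations

import statistics
from collections import Counter, defaultdict
from typing import Any

# Mirrors the aggregation definition from the previous slide.
AGGR_DEF: dict[str, Any] = {
    "id": "test_simple_aggr",
    "signalTypes": ["rating"],
    "groupingFields": "params.source_s",
    "aggregates": [
        {"type": "stddev", "sourceFields": ["params.rating_s"], "targetField": "stddev_rating_d"},
        {"type": "topk", "sourceFields": ["params.rating_s"], "targetField": "topk_rating_ss"},
        {"type": "mean", "sourceFields": ["params.rating_s"], "targetField": "mean_position_d"},
    ],
}


def simple_aggregation(definition: dict[str, Any],
                       raw_docs: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Group raw signal docs and compute each aggregate per group (sketch)."""
    groups: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for doc in raw_docs:
        if doc.get("type_s") in definition["signalTypes"]:
            groups[doc[definition["groupingFields"]]].append(doc)

    results = []
    for key, docs in groups.items():
        out: dict[str, Any] = {"aggr_id_s": definition["id"], "flag_s": "aggr",
                               "type_s": docs[0]["type_s"],
                               "grouping_key_s": key, "count_i": len(docs)}
        for agg in definition["aggregates"]:
            values = [d[agg["sourceFields"][0]] for d in docs]
            target = agg["targetField"]
            if agg["type"] == "mean":
                out[target] = statistics.mean([float(v) for v in values])
            elif agg["type"] == "stddev":
                # Sample standard deviation (n - 1 denominator).
                out[target] = statistics.stdev([float(v) for v in values])
            elif agg["type"] == "topk":
                top = Counter(values).most_common()
                out["values." + target] = [v for v, _ in top]
                out["counts." + target] = [str(c) for _, c in top]
            else:
                raise ValueError(f"unsupported aggregate type: {agg['type']}")
        results.append(out)
    return results
```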
Example: Click aggregation
Raw docs:
[
  {
    "timestamp": "2014-09-01T23:44:52.533Z",
    "params": {
      "query": "Sharp",
      "docId": "2009324"
    },
    "type": "click"
  },
  {
    "timestamp": "2014-09-05T12:25:37.420Z",
    "params": {
      "query": "Sharp",
      "docId": "2009324"
    },
    "type": "click"
  },
  {
    "timestamp": "2014-08-24T12:56:58.910Z",
    "params": {
      "query": "Sharp TV",
      "docId": "1517163"
    },
    "type": "click"
  },
  {
    "timestamp": "2015-10-25T07:18:14.722Z",
    "params": {
      "query": "rca",
      "docId": "2877125"
    },
    "type": "click"
  }
]
Signals indexed and aggregated
Aggregated docs:
[
  {
    "doc_id_s": "1517163",
    "query_s": "sharp tv",
    "weight_d": 0.000006602878329431405,
    "count_i": 1
  },
  {
    "doc_id_s": "2009324",
    "query_s": "sharp",
    "weight_d": 0.000016734602468204685,
    "count_i": 2
  },
  {
    "doc_id_s": "2877125",
    "query_s": "rca",
    "weight_d": 0.06324164569377899,
    "count_i": 1
  }
]
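The weights above reward recent clicks: the single 2015 click on "rca" far outweighs the 2014 clicks, and the two "sharp" clicks outweigh the one older "sharp tv" click. The slides do not give the decay formula, so the Python sketch below swaps in a plain exponential half-life decay with a hypothetical 30-day half-life and a caller-supplied reference time. It reproduces the ordering of the aggregated documents, not their exact `weight_d` values.

```python
from collections import defaultdict
from datetime import datetime, timezone


def parse_ts(ts: str) -> datetime:
    """Parse timestamps like '2014-09-01T23:44:52.533Z' as UTC."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)


def click_aggregation(clicks, now: datetime, half_life_days: float = 30.0):
    """Aggregate clicks per (docId, lowercased query) with time-decayed weights."""
    agg = defaultdict(lambda: {"weight_d": 0.0, "count_i": 0})
    for click in clicks:
        if click.get("type") != "click":
            continue
        key = (click["params"]["docId"], click["params"]["query"].lower())
        age_days = max(0.0, (now - parse_ts(click["timestamp"])).total_seconds() / 86400)
        # Each click contributes 1.0 when fresh, halving every half_life_days.
        agg[key]["weight_d"] += 0.5 ** (age_days / half_life_days)
        agg[key]["count_i"] += 1
    return [{"doc_id_s": d, "query_s": q, **v} for (d, q), v in agg.items()]
```

Summing decayed contributions lets both frequency (two "sharp" clicks) and recency (the recent "rca" click) raise a document's weight.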
Driving search relevancy
• How to mix signals with search results?
• Recommendation API
• Generic query pipeline configuration using a 3-stage approach:
  • Sub-query
  • Rollup-results
  • Advanced-boost
Boosting search results using aggregated documents
Figure: a user's app sends a search query through the query-pipeline stages (Set Params, Recommendation stages, Query Solr) and receives the search response. The recommendation stages 1. query the aggregated documents in the aggregated signals collection (test_signals_aggr), 2. process the results, and 3. add parameters to the request sent to the primary collection (test). The raw signals collection (test_signals) is also shown.
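The three recommendation steps can be sketched in plain Python. This is not the implementation of Fusion's Sub-query, Rollup-results or Advanced-boost stages. Assumptions: the aggregated documents have already been fetched (step 1), their `doc_id_s` values match the primary collection's `id` field, and the boost is expressed through Solr's `bq` (boost query) parameter with weights normalized so that the strongest document gets a boost of 1.

```python
from collections import defaultdict


def rollup(aggr_docs, query: str) -> dict:
    """Step 2: sum weight_d per doc_id_s for aggregated docs matching the query."""
    q = query.strip().lower()  # aggregated docs store lowercased queries
    weights = defaultdict(float)
    for d in aggr_docs:
        if d.get("query_s") == q:
            weights[d["doc_id_s"]] += d["weight_d"]
    return dict(weights)


def boost_params(weights: dict, field: str = "id", top_n: int = 10) -> dict:
    """Step 3: turn rolled-up weights into a Solr 'bq' request parameter."""
    if not weights:
        return {}
    top = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    max_w = top[0][1]
    # Normalize so tiny decayed weights still yield readable boost values.
    clauses = [f'{field}:"{doc_id}"^{w / max_w:.4f}' for doc_id, w in top]
    return {"bq": " ".join(clauses)}
```

Keeping only the top-N documents bounds the size of the boost query regardless of how many aggregated documents match.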
Figure: Before / After
Demo
Events & Signals
Using Signals = Modifying Your Behavior in Response to Your Environment
Webinar: Event Processing & Data Analytics with Lucidworks Fusion
