Ryan Kohl
POLITICO
• Business Overview (5 slides)
• Business Case (3 slides)
• Evaluation (5 slides)
• Prototype (6 slides)
• Lessons Learned (3 slides)
• Production System (7 slides)
Core Site Subscription Site
Oregon judge says he’ll block Trump’s abortion rule
Pelosi, Schumer to meet with Trump on infrastructure next week
Trump met with Twitter CEO amid bias complaints
Bob Corker: Primary challenger for Trump would be ‘good thing for our country’
FERC denies groups’ legal fees in pipeline challenge
House Democrats say Wheeler left biofuels client off disclosure
Court sides with EPA in ozone region expansion fight
Virginia uranium case may set nuclear precedent
Core Site
Subscriber Site
Agriculture
Budget & Appropriations
Campaigns
Cybersecurity
Defense
Education
eHealth
Employment & Immigration
Energy
Financial Services
Health Care
Tax
Technology
Transportation
California
Florida
New Jersey
New York
Canada
Web Reads ~25% Email Reads ~75%
Most users want to customize the emails they receive
They do this by selecting
• Topics
• People
• Keywords
of interest
Sometimes news happens that
• Is not the kind of thing you usually care about
• But you find very interesting
We want to recommend stories
• Because a user may have missed something of interest / importance
• Because a user may not have been aware of an interesting kind of news that we write about
[Figure: cluster analysis of ~2000 stories from 2018 by topic. Cluster labels: Defense, Agriculture, New York & New Jersey, Education, Health Care. Highlighted: content read by a user]
In a case like this, we want to
• Recommend Health Care stories
• Occasionally suggest Defense and Education news
• Stay away from New Jersey
• We evaluate our system to
  • Figure out if the current version of the system is doing better than the previous version
  • Identify users for whom the system is doing particularly badly
Version 2
1. Senate Commerce taps Ireland data chief for privacy hearing
2. U.S. Navy drafting new guidelines for reporting UFOs
3. 5G fight among Trump advisers likely to continue
Version 1
1. 5G fight among Trump advisers likely to continue
2. Lockheed Martin net sales jump to $14.3B
3. U.S. tech companies see hope that talks could pry open China’s market
How do we determine if this is interesting?
Our situation
• No direct feedback
  • Historically, our users have not interacted with rating systems on our site
• Dynamic interests
  • Reads are driven by big events in the news cycle in addition to a user’s historical behavior
• Recommendations strongly tied to time
  • A news organization publishes new content throughout the day, so we can’t compare a week’s worth of consumption with the recommendations made on Monday.
(insert popular Presidential tweet)
Short-term prediction of news reads
• Sum a per-story score over the news you read in the past 7 days
  • A story scores only if it was in your top 10 recommended news at the time of reading
  • The score is discounted by how far down that top 10 list the story appears (10 – rank + 1)
• Normalized by total possible score
  • 100 * (score / (10 * # read))
Stories read this week | Rec. Rank | Score
5G fight among Trump advisers likely to continue | 2 | 9
U.S. tech companies see hope that talks could pry open China’s market | 7 | 4
Senate Commerce taps Ireland data chief for privacy hearing | - | 0
U.S. Navy drafting new guidelines for reporting UFOs | - | 0
5G fight among Trump advisers likely to continue | 3 | 8
Evaluation Score: 42
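The scoring rule can be sketched in a few lines of Python. This is an illustration of the formula on the slide, not the production code; the function name `evaluation_score` and the use of `None` for "not in the top 10 at read time" are assumptions made for the example.

```python
def evaluation_score(ranks, top_n=10):
    """Score one user's week of reads.

    ranks: one entry per story read; the story's rank (1..top_n) in the
    user's recommendations at read time, or None if it was not recommended.
    """
    if not ranks:
        return 0.0
    # A recommended story earns (top_n - rank + 1) points; others earn 0.
    score = sum(top_n - r + 1 for r in ranks if r is not None and 1 <= r <= top_n)
    # Normalize by the best possible score: every read ranked first.
    return 100 * score / (top_n * len(ranks))


# Ranks taken from the example table: 2, 7, not recommended, not recommended, 3.
print(evaluation_score([2, 7, None, None, 3]))  # 42.0
```

The same function reproduces the other two worked examples in the deck: ranks (-, -, 3, -) give 20 and ranks (1, 2, 1, 3) give 92.5.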
A very low score means our user could be missing news they’ve demonstrated an interest in
Stories read this week | Rec. Rank | Score
Northrop Grumman's sales up 22 percent | - | 0
General Dynamics reports 23 percent jump in revenue | - | 0
Lockheed Martin net sales jump to $14.3B | 3 | 8
5G fight among Trump advisers likely to continue | - | 0
Evaluation Score: 20
Recommendations
• Inhofe ‘no longer concerned’ about border deployments harming readiness
• Supreme Court divided on citizenship question for census
• Budget reform gets a reboot as talks on a broader deal begin
A very high score indicates our user could be missing news they didn’t know they were interested in
Stories read this week | Rec. Rank | Score
Northrop Grumman's sales up 22 percent | 1 | 10
General Dynamics reports 23 percent jump in revenue | 2 | 9
Lockheed Martin net sales jump to $14.3B | 1 | 10
5G fight among Trump advisers likely to continue | 3 | 8
Evaluation Score: 92.5
We started with two streams of
information
• Published News Documents
• Content Reads (web clicks, email opens)
[Diagram: CMS, Annotation Pipeline, User Activity, Transform Pipeline, Redshift, Elasticsearch, ???, Recommendations]
Content Filtering
• You read certain kinds of news
• We think you’d like to keep reading those kinds of news
• Based on annotations of news that we do in a separate system
  • People
  • Organizations & Committees
  • Taxonomic topics
• We do this because the market for old news is very small.
  • Thus we need to deal with kinds of news
Cluster Model
Cluster Model Training
[Diagram: Elasticsearch (Content id, tags), Apache Spark, Cluster maker, Cluster Model]
• K-means clustering
• Normal metrics to choose K
• Used Jaccard distances based on Content Tags
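As a small illustration of the distance named above, here is the Jaccard distance between two stories' tag sets. It is a sketch only: the helper name `jaccard_distance` and the sample tags are made up for the example, and it does not reproduce the Spark clustering job itself.

```python
def jaccard_distance(tags_a, tags_b):
    """Return 1 - |A ∩ B| / |A ∪ B| for two collections of content tags."""
    a, b = set(tags_a), set(tags_b)
    if not a and not b:
        # Convention chosen for this sketch: two untagged stories are identical.
        return 0.0
    return 1 - len(a & b) / len(a | b)


# Hypothetical tags: one shared tag out of three distinct tags overall.
print(jaccard_distance({"Corn", "Boats"}, {"Corn", "Water"}))
```

Stories with identical tags are at distance 0 and stories with no tags in common are at distance 1, which is what lets tag overlap stand in for topical similarity.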
Collaborative Filtering
• There are people who read the kind of stuff that you do
• We think you’d like to read the stuff they’ve been reading
People who read math books like to color turtles.
We see you’ve been reading a bit of math lately…
Recommendation
Model
Recommendation Model Training
[Diagram: Elasticsearch, Redshift, Cluster Model, Apache Spark, Recommendation Model]
Steps shown: cluster, join, aggregate, collaborative filtering
Data labels on the diagram:
• Content id, tags
• Visitor id, Content id, timestamp
• Content id, Cluster id
• Visitor id, Cluster id, timestamp
• Visitor id, Cluster id, # views
• Visitor id, Cluster 0 preference, Cluster 1 preference, …, Cluster N preference
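The join and aggregate steps named on this slide can be sketched in plain Python. This is an assumption-laden illustration: the prototype ran on Apache Spark, the function name `views_per_cluster` and the sample rows are hypothetical, and the collaborative filtering step that follows is not shown.

```python
from collections import Counter


def views_per_cluster(reads, content_cluster):
    """Join reads to clusters, then count views per (visitor, cluster).

    reads: iterable of (visitor_id, content_id, timestamp) rows.
    content_cluster: dict mapping content_id -> cluster_id (cluster model output).
    """
    views = Counter()
    for visitor_id, content_id, _timestamp in reads:
        cluster_id = content_cluster.get(content_id)
        if cluster_id is not None:  # skip content the cluster model has not seen
            views[(visitor_id, cluster_id)] += 1
    return views


# Hypothetical sample rows for illustration only.
reads = [
    ("A", "id-1", "2019-04-25 13:23:47"),
    ("A", "id-2", "2019-04-25 13:38:10"),
    ("B", "id-1", "2019-04-25 14:12:57"),
]
content_cluster = {"id-1": 0, "id-2": 0}
print(views_per_cluster(reads, content_cluster))
```

The resulting (visitor, cluster, # views) counts are the kind of input a collaborative filtering model would be trained on to produce per-cluster preferences.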
Runtime System
[Diagram: Cluster Model, Recommendation Model, CMS, Annotation Pipeline, Recommendation App]
Data labels on the diagram:
• Visitor id, Cluster 0 preference, Cluster 1 preference, …, Cluster N preference
• Content id, Cluster id
• Content id, tags
• Visitor id, Content id
• Performance was good
  • Able to train a model in a few hours
  • Evaluation scores were decent
• Iteration was hard
  • We couldn’t give a good explanation for why a recommendation was made
  • Improving the model felt like guesswork
• The system was rather complex
  • Lots of moving parts
The real world intervened
Two months later, our new search system was humming along in production
That gave us time to think about recommendations…
We got together and figured out how we’d want to
explain/defend a recommendation:
• Similar to what you’ve (recently) read?
• Something that a lot of people read?
• Something that a lot of subscribers read?
• Something that a lot of people like you read?
• Something that a lot of your colleagues read?
This made it sound like a search problem…
(ironic picture of people getting
excited in a meeting)
[Diagram: CMS, Annotation Pipeline, User Activity, Transform Pipeline, Redshift, Elasticsearch, Elasticsearch, Recommendation App]
We had no idea if
these searches would
• Give good results
• Be fast enough
General Reads Search
What is popular amongst all of our readers?
Transform
• we roll up reads by the hour
Search
• All reads within the last 2 days
• Sum aggregation on content id over # reads
Notes
• Very fast
• Relatively small data footprint
Data Used
Date | Content | # reads
2019-04-25 13:00 | id-1 | 20,000
2019-04-25 13:00 | id-2 | 15,000
2019-04-25 14:00 | id-1 | 3,000
2019-04-25 15:00 | id-2 | 40,000
2019-04-25 15:00 | id-3 | 25,000
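A sketch of what this search could look like as an Elasticsearch request body, alongside a local equivalent run on the sample rows. The field names `date`, `content_id`, and `reads`, and both function names, are assumptions for illustration; the actual index mapping is not shown in the slides.

```python
from collections import Counter


def general_reads_query(window="now-2d", size=10):
    """Build a request body: sum hourly read counts per content id."""
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "query": {"range": {"date": {"gte": window}}},
        "aggs": {
            "by_content": {
                "terms": {
                    "field": "content_id",
                    "size": size,
                    "order": {"total_reads": "desc"},
                },
                "aggs": {"total_reads": {"sum": {"field": "reads"}}},
            }
        },
    }


def rank_locally(rows):
    """Same roll-up in plain Python: rows are (hour, content_id, reads)."""
    totals = Counter()
    for _hour, content_id, reads in rows:
        totals[content_id] += reads
    return totals.most_common()


ROWS = [
    ("2019-04-25 13:00", "id-1", 20_000),
    ("2019-04-25 13:00", "id-2", 15_000),
    ("2019-04-25 14:00", "id-1", 3_000),
    ("2019-04-25 15:00", "id-2", 40_000),
    ("2019-04-25 15:00", "id-3", 25_000),
]
print(rank_locally(ROWS))  # [('id-2', 55000), ('id-3', 25000), ('id-1', 23000)]
```

Because reads are rolled up by the hour before indexing, the index holds one document per content id per hour, which is consistent with the small data footprint noted on the slide.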
Subscriber Reads Search
What is popular amongst our subscribers?
Search
• All reads within the last 2 days
• Count aggregation on content id
Notes
• Very fast
• Larger data footprint
  • We determined it’s tolerable for 50k – 100k subscribers
  • More than that would call for scaling up the Elasticsearch cluster
Data Used
Date | User | Content
2019-04-25 13:23:47 | A | id-1
2019-04-25 13:38:10 | A | id-2
2019-04-25 14:12:57 | B | id-1
2019-04-25 15:00:07 | C | id-2
2019-04-25 15:32:54 | A | id-3
Account Reads Search
What is popular amongst people you work with?
Search
• All reads within the last 2 days
• Term query to restrict to user’s account
• Count aggregation on content id
Notes
• Very fast
• Introduces some serendipity
Data Used
Date | User | Content
2019-04-25 13:23:47 | A | id-1
2019-04-25 13:38:10 | A | id-2
2019-04-25 14:12:57 | B | id-1
2019-04-25 15:00:07 | C | id-2
2019-04-25 15:32:54 | A | id-3
Date | User | Account
2019-04-25 | A | X
2019-04-25 | B | X
2019-04-25 | C | Y
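The term filter plus count aggregation could be sketched as follows. This assumes, for illustration only, that each read document carries an `account` field next to `date` and `content_id`; the function names are hypothetical, and the local version joins the two sample tables instead.

```python
from collections import Counter


def account_reads_query(account, window="now-2d", size=10):
    """Build a request body: count recent reads per content id within one account."""
    return {
        "size": 0,
        "query": {
            "bool": {
                "filter": [
                    {"range": {"date": {"gte": window}}},
                    {"term": {"account": account}},
                ]
            }
        },
        # A terms aggregation reports a document count per bucket.
        "aggs": {"by_content": {"terms": {"field": "content_id", "size": size}}},
    }


def account_reads_locally(reads, user_account, account):
    """Plain-Python equivalent: reads are (timestamp, user, content_id)."""
    counts = Counter(
        content_id
        for _timestamp, user, content_id in reads
        if user_account.get(user) == account
    )
    return counts.most_common()


READS = [
    ("2019-04-25 13:23:47", "A", "id-1"),
    ("2019-04-25 13:38:10", "A", "id-2"),
    ("2019-04-25 14:12:57", "B", "id-1"),
    ("2019-04-25 15:00:07", "C", "id-2"),
    ("2019-04-25 15:32:54", "A", "id-3"),
]
USER_ACCOUNT = {"A": "X", "B": "X", "C": "Y"}
print(account_reads_locally(READS, USER_ACCOUNT, "X"))
```

On the sample data, id-1 ranks first for account X because both A and B read it, while C's read of id-2 is excluded since C belongs to account Y.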
Community Reads Search
What are people like you reading?
Search
A series of 3 queries per request
• Bucket 1: the 150 most recent news stories you’ve read in the last 7 days
• Bucket 2: the 50 users who have read news in Bucket 1, ranked by clicks/opens
• Bucket 3: the 150 most recent news stories that users in Bucket 2 have read, ranked by how many of them clicked/opened each
Notes
• Surprisingly fast
• Introduces some serendipity
Data Used
Date | User | Content
2019-04-25 13:23:47 | A | id-1
2019-04-25 13:38:10 | A | id-2
2019-04-25 14:12:57 | B | id-1
2019-04-25 15:00:07 | C | id-2
2019-04-25 15:32:54 | A | id-3
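The three buckets can be walked through in plain Python. This is a sketch under stated assumptions, not the Elasticsearch queries themselves: the function name is hypothetical, the requesting user is excluded from Bucket 2 (the slide does not say either way), and the last two sample rows are invented so that Bucket 3 has something new to surface.

```python
from collections import Counter
from datetime import datetime, timedelta


def community_reads(reads, user, now, n_news=150, n_users=50, days=7):
    """reads: list of (timestamp, user, content_id); returns [(content_id, n_users)]."""
    cutoff = now - timedelta(days=days)
    recent = sorted((r for r in reads if r[0] >= cutoff), reverse=True)  # newest first

    # Bucket 1: the user's most recent reads.
    bucket1 = {c for _, u, c in [r for r in recent if r[1] == user][:n_news]}

    # Bucket 2: other users who read Bucket 1 content, ranked by number of reads.
    overlap = Counter(u for _, u, c in recent if u != user and c in bucket1)
    bucket2 = {u for u, _ in overlap.most_common(n_users)}

    # Bucket 3: most recent reads by Bucket 2 users, ranked by distinct readers.
    peer_reads = [r for r in recent if r[1] in bucket2][:n_news]
    distinct = {(u, c) for _, u, c in peer_reads}
    return Counter(c for _, c in distinct).most_common()


def t(h, m, s):
    return datetime(2019, 4, 25, h, m, s)


READS = [
    (t(13, 23, 47), "A", "id-1"),
    (t(13, 38, 10), "A", "id-2"),
    (t(14, 12, 57), "B", "id-1"),
    (t(15, 0, 7), "C", "id-2"),
    (t(15, 32, 54), "A", "id-3"),
    (t(16, 0, 0), "B", "id-4"),  # hypothetical extra row
    (t(16, 5, 0), "C", "id-4"),  # hypothetical extra row
]
print(community_reads(READS, "A", now=datetime(2019, 4, 26))[0])  # ('id-4', 2)
```

Here id-4 surfaces for user A even though A never read anything like it, which is the serendipity the slide mentions: it comes from B and C, who overlap with A on id-1 and id-2.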
Similar News Search
What kind of stuff do you usually read?
Search
• All news you’ve read in the past 30 days
• Count aggregation on annotations
• News with at least one annotation in the 30-day bucket
  • Boosted by the frequency of the annotations in the user’s reads
Notes
• Very fast
• Addresses the cold-start problem
• But: loses correlations between annotations
  • A user may like articles about Corn & Boats together, but each annotation is counted on its own
Data Used
Date | User | Content | Annotations
2019-04-25 13:23:47 | A | id-1 | Water, Corn
2019-04-25 13:38:10 | A | id-2 | Corn, Boats
2019-04-25 14:12:57 | A | id-1 | Water, Corn
2019-04-25 15:00:07 | A | id-3 | Tables, Walls
2019-04-25 15:32:54 | A | id-4 | Chairs, Boats
Content | Annotations
id-5 | Airplanes
id-6 | Boats, Corn
id-7 | Tables, Walls, Corn
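A simplified sketch of this search on the sample data. It assumes a candidate's score is just the sum of how often each of its annotations appears in the user's reads; real Elasticsearch relevance scoring with boosts would produce different numbers, and the function name is hypothetical.

```python
from collections import Counter


def similar_news(read_annotations, candidates):
    """Rank candidates by the user's annotation frequencies.

    read_annotations: one list of annotations per read event.
    candidates: dict mapping content_id -> list of annotations.
    """
    freq = Counter(tag for tags in read_annotations for tag in tags)
    scores = {}
    for content_id, tags in candidates.items():
        matched = [tag for tag in tags if tag in freq]
        if matched:  # keep only news sharing at least one annotation
            scores[content_id] = sum(freq[tag] for tag in matched)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


USER_READS = [
    ["Water", "Corn"],
    ["Corn", "Boats"],
    ["Water", "Corn"],
    ["Tables", "Walls"],
    ["Chairs", "Boats"],
]
CANDIDATES = {
    "id-5": ["Airplanes"],
    "id-6": ["Boats", "Corn"],
    "id-7": ["Tables", "Walls", "Corn"],
}
print(similar_news(USER_READS, CANDIDATES))  # [('id-6', 5), ('id-7', 5)]
```

Under this simplification id-6 and id-7 tie at 5, even though only id-6 matches a pair of annotations (Corn and Boats) that the user actually read together. That is one way to see the lost-correlation issue noted above.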
Things we’re happy about
• The system has relatively few moving parts
• We can explain our recommendations (and troubleshoot them)
• Recommendations are available for newly published content immediately
• Our scaling is mostly managed by scaling Elasticsearch
• It’s very easy to add additional constraints
  • Example: if you don’t subscribe to the Energy vertical, we don’t want any of its content affecting your recommendations
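One way such a constraint might be layered onto any of the searches is to wrap the existing query in a bool filter. This is a sketch only: the `vertical` field name and the helper `restrict_to_verticals` are assumptions, not details from the talk.

```python
import copy


def restrict_to_verticals(body, subscribed):
    """Return a copy of a request body limited to the user's subscribed verticals."""
    restricted = copy.deepcopy(body)
    original_query = restricted.get("query", {"match_all": {}})
    restricted["query"] = {
        "bool": {
            "must": [original_query],
            # Filter clauses narrow the result set without affecting scoring.
            "filter": [{"terms": {"vertical": sorted(subscribed)}}],
        }
    }
    return restricted


base = {"size": 0, "query": {"range": {"date": {"gte": "now-2d"}}}}
print(restrict_to_verticals(base, {"Health Care", "Defense"}))
```

A user who does not subscribe to Energy simply never has "Energy" in the `terms` list, so that content cannot contribute to any aggregation built on top of the query.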
A few challenges/opportunities we’ve identified
• It’s weird to use something so different from the standard architecture
• That’s a big reason we want your feedback
• We want to revisit the Similar News Search
• It seems like we should honor the correlations between annotations
• Each recommendation search/component should not be equally weighted
• Some are likely to be more pertinent to some users
• There are obvious dependencies
• If something is generally popular, it’s more likely to be popular for people in your account
Haystack 2019 - Search-based recommendations at Politico - Ryan Kohl
