Building an Activity Feed with Cassandra
Mark Dunphy, Software Engineer
Behance/Adobe
@dunphtastic
Disclaimer
Not an operations person.
Will pretend to be one for the purpose of this talk.
Quick Overview
What is the Behance Activity Feed?
• Actions
• Comments, Appreciations, Etc
• Entities
• Projects, Works in Progress
• Actors
• Users
Project Entity
Actions taken
by actors
Activity Fan Out
User A publishes
a new project
Write to Follower A’s feed
Write to Follower B’s feed
Write to Follower C’s feed
Write to Follower D’s feed
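The fan-out above can be sketched in a few lines of Python. This is only an in-memory illustration: the names fan_out, feeds, and followers_of are hypothetical, and a plain dictionary stands in for the feed store.

```python
from collections import defaultdict


def fan_out(feeds, followers_of, actor_id, activity):
    """Write one activity into the feed of every follower of actor_id.

    feeds: maps follower id -> list of activities (stand-in for the feed store)
    followers_of: maps actor id -> list of follower ids
    Returns the number of feeds written to.
    """
    written = 0
    for follower_id in followers_of.get(actor_id, ()):
        feeds[follower_id].append(activity)
        written += 1
    return written


feeds = defaultdict(list)
followers_of = {
    "user_a": ["follower_a", "follower_b", "follower_c", "follower_d"],
}
activity = {"actor": "user_a", "verb": "published", "entity": "project_1"}
count = fan_out(feeds, followers_of, "user_a", activity)
```

One publish becomes one write per follower, which is why write volume grows with the size of the actor's network.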
Now that that’s over…
MongoDB
2011
• Smaller user base (~340,000).
• Built very quickly. Worked well at the time.
• Not well researched.
Fast forward to 2014
• Frequent node failures
• Heavy disk fragmentation caused by deletes
• Slow reads from disk. Started storing in RAM.
• Primary -> secondary failover caused downtime for
some users.
• Scaled both vertically and horizontally.
Why Cassandra?
• Riak
• Very close. Community seemed lacking.
• Redis
• No native cluster. Too much maintenance.
• Memcached/MySQL
• Too much complex app logic.
Cassandra Wins.
• Fantastic community. #cassandra on Twitter
• Easy to read documentation
• Linearly scalable. Easy to grow cluster.
• Low maintenance overhead for ops team.
• Handles time series data very well.
Learning
• Cassandra Summit 2014
• Other team in Adobe
• Long nights reading documentation
Our Data
• Ephemeral
• “Source of truth” lives in a MySQL database
• Okay with *some* data loss
Our Rules
• A user’s feed is composed of entities, each with
one set of actions
• User’s feed only contains one of any given entity
• An entity’s set of actions contains up to seven of
the most recent actions taken by that user’s
network
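A minimal sketch of these rules in Python, assuming a feed is modelled as a dictionary from entity id to a newest-first list of actions. The names apply_action and MAX_ACTIONS are hypothetical; only the limit of seven comes from the slide.

```python
MAX_ACTIONS = 7  # "up to seven of the most recent actions"


def apply_action(feed, entity_id, action):
    """Record a new action for an entity in one user's feed.

    feed maps entity_id -> list of actions, newest first. Keying by
    entity id means the feed holds at most one entry per entity, and
    the slice keeps only the most recent MAX_ACTIONS actions.
    """
    actions = [action] + feed.get(entity_id, [])
    feed[entity_id] = actions[:MAX_ACTIONS]
    return feed


feed = {}
for n in range(9):
    apply_action(feed, "project_1", {"verb": "comment", "seq": n})
```

After nine actions on the same entity, the feed still holds a single entry for it, with the two oldest actions dropped.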
Planning
Language Support
• Most services on Behance are PHP
• No official DataStax PHP driver
“Looks like I’m learning Python.”
–Mark Dunphy, 2014
Go to Production
No, nothing is working yet. I didn’t skip a slide.
• App/cluster in production before anything works
• Test real life load
• Fail spectacularly without anybody noticing
• Deploy risky changes without fear
• Run alongside MongoDB
January 19th, 2015
Query Patterns
• “Create your data models based on the queries
you want to run” - Basically Everybody
• Wanted to…
• Read a user’s feed entities by type and time of
most recent action…separately.
• Write/Update a user’s feed entities with new
actions while knowing only user id and entity id
Data Models
“An UPDATE in Cassandra works like an
UPSERT! Let’s store the user’s entire feed in a
single row in a table! It’s so simple!”
–Mark Dunphy, January 2015
First Data Model
CREATE TYPE activity.action (
    created_on timestamp,
    secondary_entity_id int,
    actor_id int,
    verb_id int
);

CREATE TYPE activity.entity (
    entity_type_id int,
    entity_id int
);

-- One row per (user, entity): the list of actions for that entity.
CREATE TABLE activity.project_actions (
    modified_on timestamp,
    entity_id int,
    user_id int,
    actions list<frozen<action>>,
    PRIMARY KEY (user_id, entity_id)
);

-- One row per user: the entire feed lives in a single row.
CREATE TABLE activity.feeds (
    modified_entities list<frozen<entity>>,
    modified_on timestamp,
    project_ids list<int>,
    user_id int,
    wip_revision_ids list<int>,
    PRIMARY KEY (user_id)
);
Moments Before Everything Exploded
“Okay let’s keep nearly the same model, but
use INSERT and DELETE instead of always
UPDATE. Just use batch statements.”
–Mark Dunphy, January 2015
Second Data Model
Using batch statements this way was also a very, very bad idea.
• You lose the benefit of Cassandra being distributed.
• Every statement in a batch goes through the same
coordinator, which puts a lot of stress and
responsibility on one node.
• Use concurrency and prepared statements
instead. The DataStax drivers make this easy.
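As a rough sketch of what "concurrency and prepared statements" can look like with the DataStax Python driver: the statement is prepared once, then executed asynchronously for each row, so requests are sent individually rather than as one batch. Session.prepare, Session.execute_async, and the returned future's result() are driver methods. The helper name write_concurrently, the window size, and the idea of waiting in fixed-size groups are assumptions for illustration, not the talk's actual code.

```python
def write_concurrently(session, cql, rows, window=100):
    """Run one prepared statement for many rows without a batch.

    session: a connected driver session (anything exposing prepare()
             and execute_async() works, which keeps this easy to test)
    cql:     statement text with ? placeholders
    rows:    iterable of parameter tuples, one per statement
    window:  assumed cap on requests in flight before waiting
    Returns the number of statements that completed.
    """
    prepared = session.prepare(cql)  # parsed once, reused for every row
    in_flight = []
    completed = 0
    for params in rows:
        in_flight.append(session.execute_async(prepared, params))
        if len(in_flight) >= window:
            for future in in_flight:
                future.result()  # blocks; raises if the request failed
            completed += len(in_flight)
            in_flight = []
    for future in in_flight:  # drain whatever is left
        future.result()
    return completed + len(in_flight)
```

The driver also ships helpers in cassandra.concurrent (for example execute_concurrent_with_args) that manage the in-flight window for you.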
Oops
Okay…
Now we’ve got it.
Winning Data Model
CREATE TYPE activity.action (
    created_on timestamp,
    secondary_entity_id int,
    actor_id int,
    verb_id int
);

-- One row per (user, time, entity). created_on is a clustering column,
-- so a user's rows can be range-queried by time.
CREATE TABLE activity.projects (
    created_on timestamp,
    user_id int,
    entity_id int,
    actions list<frozen<action>>,
    PRIMARY KEY (user_id, created_on, entity_id)
);

-- One row per (user, entity): look up an entity's actions by id alone.
CREATE TABLE activity.project_actions (
    modified_on timestamp,
    entity_id int,
    user_id int,
    actions list<frozen<action>>,
    PRIMARY KEY (user_id, entity_id)
);
Much Nicer
Write Strategy
• “User A comments on Project A. User B follows
User A.”
• Request out to add the comment action to User
B’s feed
• Read existing actions for that entity (Project A) in
B’s feed. Push new action on top.
• Write new actions list into new “row” in projects
table
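The write path above can be sketched with two dictionaries standing in for the two tables, keyed the same way as their primary keys. This is an in-memory illustration only. The function name add_action is hypothetical, and removing the entity's previous row from projects is an assumption made here so the feed keeps one row per entity; the slide itself only mentions writing the new row.

```python
def add_action(project_actions, projects, user_id, entity_id, action, now):
    """Add one action for an entity to one user's feed.

    project_actions: dict keyed by (user_id, entity_id), mirroring that
                     table's primary key; value holds modified_on and actions
    projects:        dict keyed by (user_id, created_on, entity_id)
    now:             timestamp used for the new row
    """
    # 1. Read the existing actions knowing only user id and entity id.
    current = project_actions.get((user_id, entity_id))
    old_actions = current["actions"] if current else []

    # 2. Push the new action on top, keeping the seven most recent.
    new_actions = ([action] + old_actions)[:7]

    # 3. Assumption: delete the entity's previous time-ordered row.
    if current:
        projects.pop((user_id, current["modified_on"], entity_id), None)

    # 4. Write the new list as a new row, and update the lookup table.
    projects[(user_id, now, entity_id)] = new_actions
    project_actions[(user_id, entity_id)] = {
        "modified_on": now,
        "actions": new_actions,
    }
    return new_actions


project_actions, projects = {}, {}
first = {"actor": "user_a", "verb": "appreciate"}
second = {"actor": "user_a", "verb": "comment"}
add_action(project_actions, projects, "user_b", "project_a", first, now=100)
add_action(project_actions, projects, "user_b", "project_a", second, now=200)
```

In this sketch project_actions exists to answer "what is already in B's feed for Project A?" without knowing the timestamp of the existing row.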
Read Strategy
• SELECT * FROM projects WHERE user_id
= 123 AND created_on > 123214373
• Optimized for quick/easy reads. It is more
important that a user’s feed loads quickly than
that it updates quickly.
• Use timestamp to “page” through data.
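Paging by timestamp can be sketched as follows, with a list of tuples standing in for the projects table. The function names, the page size, and the ascending order are assumptions for illustration. Note that in this simple form, rows sharing the same created_on across a page boundary would be skipped.

```python
def read_page(rows, user_id, after, limit):
    """Emulate: SELECT ... WHERE user_id = ? AND created_on > ? LIMIT ?

    rows: list of (user_id, created_on, entity_id) tuples
    Returns the page and the created_on value to use as the next cursor.
    """
    matches = sorted(r for r in rows if r[0] == user_id and r[1] > after)
    page = matches[:limit]
    next_cursor = page[-1][1] if page else None
    return page, next_cursor


def read_all(rows, user_id, page_size):
    """Walk a user's whole feed one page at a time."""
    cursor, out = 0, []
    while True:
        page, next_cursor = read_page(rows, user_id, cursor, page_size)
        if not page:
            return out
        out.extend(page)
        cursor = next_cursor  # the last timestamp seen becomes the cursor


rows = [
    (123, 10, 1), (123, 20, 2), (123, 30, 3), (123, 40, 4), (123, 50, 5),
    (999, 15, 1),
]
first_page, cursor = read_page(rows, 123, after=0, limit=2)
everything = read_all(rows, 123, page_size=2)
```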
Lessons Learned
• Duplicate your data to achieve desired queries.
Storage is cheap. Writes are cheap.
• Think outside the box. Cassandra is not
relational.
• Never, ever, ever ignore inserts/deletes in favor of
an update-only workflow. Never. It is literally
insane.
Final Specs
• 16 node cluster on AWS EC2 c3.8xlarge
• Mix of SizeTieredCompactionStrategy and
DateTieredCompactionStrategy
• NetworkTopologyStrategy
• Replication factor 3
• ConsistencyLevel = ONE for most requests
• Bursty write volume. Consistent read volume.
• 5k to 80k writes per second
• 2k to 4k reads per second
Questions?
I might have answers.
Thank you!
Mark Dunphy, Software Engineer
Behance/Adobe
@dunphtastic
