Hailo - a case study for Cassandra & Acunu

Dai Clegg
October 2013
JAX London
@daiclegg @acunu
What is Hailo?

‣ The world’s highest-rated taxi app – over 11,000 five-star reviews
‣ Over 500,000 registered passengers
‣ A Hailo hail is accepted around the world every 4 seconds
‣ Hailo operates in 15 cities on 3 continents, from Tokyo to Toronto, in nearly 2 years of operation

The Adoption of Cassandra & Acunu at Hailo

‣ Launched on AWS
‣ Two PHP/MySQL web apps plus a Java backend
‣ Mostly built by a team of 3 or 4 backend engineers
‣ MySQL multi-master for single availability zone resilience
  ‣ Get/create/update entity
  ‣ Analytics
  ‣ Text search

The Adoption of Cassandra & Acunu at Hailo

‣ A desire for greater resilience – “become a utility”
  ‣ Cassandra is designed for high availability
‣ Plans for international expansion around a single consumer app
  ‣ Cassandra is good at global replication
‣ Expected growth
  ‣ Cassandra scales linearly for both reads and writes
‣ Prior experience
  ‣ Successful in-team experience with Cassandra

The Adoption of Cassandra & Acunu at Hailo

‣ Replacement of key consumer app functionality
  ‣ split PHP/MySQL web app into:
    ‣ a mixture of PHP/Java services
    ‣ backed by a Cassandra data store
‣ Launched into production in September 2012
  ‣ originally just powering North American expansion,
  ‣ gradually switching over Dublin and London

The Adoption of Cassandra & Acunu at Hailo

‣ Further decompose functionality into Go/Java SOA
‣ Migrating:
  ‣ Entity databases to Cassandra
  ‣ Analytics to Acunu
  ‣ Search into Elasticsearch

Cassandra

“Cassandra just works”
Dom W, Senior Engineer, Hailo

Some Considerations for Data Modeling
‣ Do not read the entire entity, update one property and then write back a mutation containing every column
  ‣ Only mutate columns that have been set
  ‣ This avoids read-before-write race conditions
‣ Choose row key carefully, since this partitions the records
‣ Think about how many records you want in a single row
‣ Denormalise on write into many indexes/views
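
The advice about mutating only the columns that were set can be illustrated with a toy model. The sketch below is not the Cassandra client API. It is a small in-memory stand-in (the names `apply_mutation`, `read_entity` and the sample entity are hypothetical) that assumes per-column last-write-wins reconciliation by timestamp, which is how Cassandra resolves conflicting column writes. It compares two clients that each write back the whole entity with two clients that each write only the column they changed.

```python
# Toy model of one row: column name -> (value, write timestamp).
# Hypothetical stand-in for a column store; not the Cassandra driver API.

def apply_mutation(row, columns, timestamp):
    """Merge a mutation into the row, keeping the newest write per column."""
    for name, value in columns.items():
        current = row.get(name)
        if current is None or timestamp >= current[1]:
            row[name] = (value, timestamp)


def read_entity(row):
    """Return the current value of every column, without timestamps."""
    return {name: value for name, (value, _ts) in row.items()}


def new_row():
    """Create a row holding one sample entity written at timestamp 1."""
    row = {}
    apply_mutation(row, {"name": "Ann", "phone": "111", "city": "London"}, 1)
    return row


def full_rewrite():
    """Both clients read the entity, change one property, write every column."""
    row = new_row()
    seen_by_a = read_entity(row)
    seen_by_b = read_entity(row)
    seen_by_a["phone"] = "222"
    seen_by_b["city"] = "Dublin"
    apply_mutation(row, seen_by_a, 2)
    # B's mutation carries the stale phone value with a newer timestamp.
    apply_mutation(row, seen_by_b, 3)
    return read_entity(row)


def partial_mutation():
    """Each client writes only the column it actually set."""
    row = new_row()
    apply_mutation(row, {"phone": "222"}, 2)
    apply_mutation(row, {"city": "Dublin"}, 3)
    return read_entity(row)


if __name__ == "__main__":
    print(full_rewrite())
    print(partial_mutation())
```

In the full-rewrite case client A's phone change is silently overwritten by client B's stale copy; in the partial case both changes survive.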

Some Considerations for Data Modeling
not obvious!

[Chart: average years of experience per team member, MySQL vs. Cassandra]

Some Repercussions of Data Modeling
whoops!

Some Considerations for Application Development
[Diagram: people who can attempt to query MySQL vs. people who can attempt to query Cassandra]

Some Considerations for Application Development

Acunu Analytics

Acunu Analytics
Hailo needed to understand system performance/business SLAs

‣ Raw Cassandra lacks analytic primitives
  ‣ e.g. COUNT, SUM, AVG, GROUP BY
‣ Acunu Analytics provides a platform for real-time analytics
  ‣ for pre-planned query templates
‣ It uses Cassandra as the store
  ‣ so it is highly available, resilient and globally distributed
‣ Integration is straightforward
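
One way to picture what pre-planned query templates imply: if the store has no COUNT, SUM, AVG or GROUP BY, the aggregates can be maintained at write time for groupings that were decided up front. The Python sketch below works under that assumption. The `RunningAggregates` class, the `city` grouping and the `wait_seconds` measure are hypothetical illustrations, not Acunu's implementation or API.

```python
from collections import defaultdict


class RunningAggregates:
    """Keeps COUNT and SUM per group up to date on every write (hypothetical helper)."""

    def __init__(self, group_by, measure):
        self.group_by = group_by   # event field used as the GROUP BY key
        self.measure = measure     # numeric event field to aggregate
        self._count = defaultdict(int)
        self._sum = defaultdict(float)

    def record(self, event):
        """Fold one event into the aggregates; no stored events are re-read."""
        key = event[self.group_by]
        self._count[key] += 1
        self._sum[key] += event[self.measure]

    def count(self, key):
        return self._count.get(key, 0)

    def total(self, key):
        return self._sum.get(key, 0.0)

    def average(self, key):
        """AVG is derived from the stored COUNT and SUM at read time."""
        n = self.count(key)
        return self.total(key) / n if n else None


def demo():
    agg = RunningAggregates(group_by="city", measure="wait_seconds")
    for event in [
        {"city": "London", "wait_seconds": 120},
        {"city": "London", "wait_seconds": 180},
        {"city": "Dublin", "wait_seconds": 90},
    ]:
        agg.record(event)
    return agg
```

Reads then become lookups of precomputed values, at the cost of having to know the grouping before the events arrive.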

Acunu Analytics: technology
Real-time incremental cubing provides instant answers to Big Data questions

[Diagram: build cube from history]

Acunu Analytics: technology
Apache Cassandra is the repository

[Diagram: build cube from history, stored in Apache Cassandra]

Acunu Analytics: an example
Rich instant queries over cubes
(SELECT, FROM, WHERE, AND, GROUP BY, JOIN, HAVING, ORDER BY)

Define aggregate cubes:

CREATE CUBE APPROX TOP(keyword)
WHERE browser, time
GROUP BY time

[Diagram: TOP(keyword) cube table filtered by browser = ‘chrome’ and time BETWEEN ..; build cube from history]

New events update cubes
Populate new cubes from historic data
Drill down to raw events
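
The CREATE CUBE statement on this slide can be read as: keep keyword counts per browser and time bucket, and update them as each event arrives. The Python sketch below illustrates that reading under stated assumptions. It keeps exact counts, whereas the slide's APPROX suggests an approximate top-k structure; hourly integer buckets are assumed; and `KeywordCube` with all of its methods is a hypothetical name, not Acunu's API.

```python
from collections import Counter, defaultdict


class KeywordCube:
    """Exact keyword counts per (browser, hour); a stand-in for an approximate top-k cube."""

    def __init__(self):
        self._cells = defaultdict(Counter)  # (browser, hour) -> keyword counts
        self._raw = defaultdict(list)       # (browser, hour) -> raw events for drill-down

    def update(self, event):
        """Fold one new event into the cube without re-scanning history."""
        cell = (event["browser"], event["hour"])
        self._cells[cell][event["keyword"]] += 1
        self._raw[cell].append(event)

    def build_from_history(self, events):
        """Populate a newly defined cube by replaying historic events."""
        for event in events:
            self.update(event)

    def top(self, browser, start_hour, end_hour, k=3):
        """Top keywords for one browser, grouped by hour, over an inclusive hour range."""
        result = {}
        for hour in range(start_hour, end_hour + 1):
            counts = self._cells.get((browser, hour))
            if counts:
                result[hour] = counts.most_common(k)
        return result

    def drill_down(self, browser, hour):
        """Return the raw events behind one cube cell."""
        return list(self._raw.get((browser, hour), []))


def demo():
    cube = KeywordCube()
    cube.build_from_history([
        {"browser": "chrome", "hour": 9, "keyword": "taxi"},
        {"browser": "chrome", "hour": 9, "keyword": "taxi"},
        {"browser": "chrome", "hour": 9, "keyword": "cab"},
        {"browser": "firefox", "hour": 9, "keyword": "cab"},
    ])
    # A live event arriving after the historic load updates the same cube.
    cube.update({"browser": "chrome", "hour": 10, "keyword": "cab"})
    return cube
```

A query such as `demo().top("chrome", 9, 10)` only reads the cells for the requested browser and hours, which is why answers stay fast as the raw event volume grows.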

Acunu Analytics: summary
Overview of the workflow
‣ develop queries in AQL, query builder or self-service data explorer
‣ invoke queries from within applications with JSON query API
‣ define aggregation cubes with DDL or infer from self-service queries
‣ define alerts to be raised on trigger conditions
‣ define connector: either from library, toolkit or REST
‣ populate new cubes from historic data (fill cube from history)
‣ define pre-processors: programmatic, Java or JavaScript; or AQL query
‣ define event schema with DDL or infer from sample events

Acunu Analytics at Hailo
some sample screenshots

“drill-across” to see breakdown of data and in-depth analysis

Acunu Analytics at Hailo
use cases

‣ Infrastructure and Application monitoring
‣ Real-time A/B testing of app layout and incentives
‣ Real-time geo-view of supply/demand for drivers
‣ More in the pipeline

Conclusions

Conclusions
Choosing the Platform

‣ Solid Cassandra design
  ‣ High availability characteristics
  ‣ Easy multi-data centre setup
  ‣ Simplicity of operation
‣ With Acunu
  ‣ SQL-like rich queries
  ‣ Easier data modeling

Conclusions
Exploiting the platform

‣ Have an advocate
  ‣ sell the dream
‣ Learn the fundamentals
  ‣ get the best out of Cassandra
‣ Invest in tools to make life easier
‣ Keep management in the loop
  ‣ explain the trade-offs

Thank You.

Apache, Apache Cassandra, Cassandra and the eye logo
are trademarks of the Apache Software Foundation.


Acunu and Hailo: a realtime analytics case study on Cassandra
