Lessons learned from embedding Cassandra in xPatterns

Lessons learned from embedding
Cassandra in the xPatterns Platform
Seattle Cassandra Users
April 2014

Agenda
• Cassandra use within xPatterns
• What we had to build
• Data model optimization
• Robust REST APIs
• Geo-replication
• Demo: Export to NoSQL API

xPatterns
The Cloud-based, Big Data Analytics Platform
Benefits
Intelligent apps in man-days
Differentiators
End-to-End Big Data Platform
Cutting-Edge Intelligence
Real-time unsupervised analytics
Hybrid Intelligence System
Learning & Feedback
Automated repair & inductive reasoning
Measurably, best-ever analytical performance

[Diagram: xPatterns platform architecture. Labels recovered from the slide:]
Layers: connect, IaaS (Infrastructure as a Service); discover, AaaS (Analytics as a Service); act, SaaS (Software as a Service)
Infrastructure: Virtual Private Cloud; Hadoop, NoSQL, Search; Streaming, Batch / ELT, Federated, Interactive
Cooperative Distributed Inferencing (CDI): Neural Network, Inference, Natural Language, Topic Modeling, Data Mining, Prediction, Optimization, Machine Learning, Relevance, Meta Learning
Dashboards: 40+ report types, live dashboards, self-serve Studio
Visualization: 2D & 3D viewer, interactive explorer, Search & Connect
Web Services: rich query language, add & edit content
Tools: Admin Consoles, Data Integration Studio, Dashboard Studio, REST APIs, Experimentation Platform, Ad-Hoc Queries, Metadata, Processing Framework, Labeling Tools, Extrapolation Platform
Roles: Data Scientist, Data Analyst, Application Engineer

Provider Referral Network: An interactive big data visualization tool for investigating
upstream and downstream referral patterns among physicians, connecting physicians to
specialties and to other physicians’ practice details.

Cassandra multi DC ring – read latency

What we’d like to share tonight
• Export to NoSQL demo
• Data model optimization
o Publishing from HDFS/Hive/Shark to Cassandra
• Robust REST APIs
o Instrumentation
o Throttling & auto-retries
• Geo-replication
o Cross-data-center replication, encryption & failover
• Lessons learned from 0.6 through 2.0.6
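The "throttling & auto-retries" item above can be illustrated with a minimal Python sketch. The deck does not show how xPatterns implements either feature, so everything here is an assumption for illustration: the names `call_with_retries` and `Throttle`, the retry limits, and the choice of exponential backoff with jitter are all hypothetical.

```python
import random
import time


def call_with_retries(operation, max_attempts=4, base_delay=0.1,
                      retry_on=(TimeoutError, ConnectionError),
                      sleep=time.sleep):
    """Run operation(); on a transient error, back off and try again.

    Hypothetical helper: the parameters are illustrative defaults,
    not values taken from the presentation.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # retries exhausted: surface the error to the caller
            # Exponential backoff plus jitter so clients do not retry in lockstep.
            delay = base_delay * (2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay))


class Throttle:
    """Client-side rate limit: spaces calls at least 1/max_per_second apart."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = clock()

    def wait(self):
        # Block until the next call is allowed, then reserve the following slot.
        now = self.clock()
        if now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.min_interval
```

A caller would typically invoke `throttle.wait()` before each request and wrap the request itself in `call_with_retries`, so that a slow or briefly unavailable node produces delayed responses rather than a burst of failures.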

VPC-to-VPC IPSEC Tunnel

Export to NoSQL API
• Datasets in the warehouse need to be exposed through high-throughput, low-latency,
real-time APIs. Each application requires extra processing on top of the core datasets,
so additional transformations are executed to build data marts inside the warehouse.
• The exporter tool builds an efficient data model and exports data from a Shark/Hive
table to a Cassandra column family through a custom Spark job with configurable
throughput (configurable Spark processors against a Cassandra ring). An instrumentation
dashboard is embedded; logs, progress, and instrumentation events are pushed through SSE.
• Data modeling is driven by the read access patterns provided by the application engineer
building dashboards and visualizations: lookup key, columns (record fields to
read), paging, sorting, and filtering.
• The end result of a job run is a REST API endpoint (instrumented, monitored, resilient,
geo-replicated) that uses the generated Cassandra data model and feeds data to
the dashboards.
• A configuration API is provided for creating export jobs and executing them (ad hoc or
scheduled).
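To make "data modeling driven by read access patterns" concrete, here is a minimal Python sketch that turns a declared read pattern into a CQL table definition. It is not the xPatterns exporter: the deck does not show the generated model or its configuration format, so the `ReadPattern` fields, the `to_cql` function, the use of CQL rather than Thrift column families, and typing every column as `text` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ReadPattern:
    """Hypothetical description of how an API will read the exported data."""
    table: str                      # target table name (illustrative)
    lookup_key: str                 # column the API looks records up by
    columns: List[str]              # record fields the API returns
    sort_by: Optional[str] = None   # optional column used for sorting and paging
    descending: bool = False


def to_cql(p: ReadPattern) -> str:
    """Build a CREATE TABLE statement whose primary key matches the read pattern."""
    names = [p.lookup_key]
    if p.sort_by:
        names.append(p.sort_by)
    names += [c for c in p.columns if c not in names]
    cols = ", ".join(f"{n} text" for n in names)  # all columns text, for brevity
    if p.sort_by:
        # Lookup key becomes the partition key; the sort column becomes the
        # clustering column, so one partition holds a pre-sorted slice.
        order = "DESC" if p.descending else "ASC"
        return (f"CREATE TABLE {p.table} ({cols}, "
                f"PRIMARY KEY (({p.lookup_key}), {p.sort_by})) "
                f"WITH CLUSTERING ORDER BY ({p.sort_by} {order})")
    return f"CREATE TABLE {p.table} ({cols}, PRIMARY KEY ({p.lookup_key}))"
```

The design point is that the key layout follows the query, not the source table: a single lookup by key returns an already-sorted range of columns, which is what paging and sorting in a dashboard need.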

Cassandra multi DC ring – write latency

Mesos/Spark cluster

Nagios monitoring

Lessons learned 0.6 - 2.0.6
• NTP: synchronize ALL clocks (servers and clients)
• Reduce the number of CFs (avoid OOM)
• Rows not too skinny and not too wide (avoid OOM)
o Less memory pressure during high-throughput writes
o Reduced network I/O: fewer rows, more column slices
o Key cache & bloom filter index size affects perf
o Efficient compaction, avoid hot spots
• Custom serialization and dynamic columns for maximum perf gain
• Do not drop CFs before emptying them (truncate/compact first)
• Monitoring, instrumentation, automatic restarts
• ConsistencyLevel: ONE is best … for our use cases
• Key cache, Snappy compression
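One common way to keep rows "not too skinny and not too wide" is to split a logical row into fixed-size buckets and fold the bucket number into the row key. The deck does not say how xPatterns sized its rows, so the following Python sketch is only an illustration of that general technique; the function names, the `key:bucket` format, and the bucket size are hypothetical.

```python
from typing import List


def bucketed_row_key(lookup_key: str, position: int, columns_per_row: int) -> str:
    """Row key for the item at `position` within one logical row.

    Each physical row holds at most `columns_per_row` columns, which bounds
    row width while still packing many columns into each row.
    """
    if columns_per_row <= 0:
        raise ValueError("columns_per_row must be positive")
    return f"{lookup_key}:{position // columns_per_row}"


def row_keys_for_range(lookup_key: str, start: int, end: int,
                       columns_per_row: int) -> List[str]:
    """Row keys needed to read positions start..end-1 (for example, one page)."""
    if columns_per_row <= 0:
        raise ValueError("columns_per_row must be positive")
    if end <= start:
        return []
    first = start // columns_per_row
    last = (end - 1) // columns_per_row
    return [f"{lookup_key}:{bucket}" for bucket in range(first, last + 1)]
```

With a bucket size of 1000, a page covering positions 950 to 1049 touches two physical rows, so a reader issues two column slices instead of scanning one unbounded row.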

© 2013 Atigeo, LLC. All rights reserved. Atigeo and the xPatterns logo are trademarks of Atigeo. The information herein is for informational purposes only and represents the current view of Atigeo as of the date of this
presentation. Because Atigeo must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Atigeo, and Atigeo cannot guarantee the accuracy of any information provided
after the date of this presentation. ATIGEO MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
