introduction to cassandra
eben hewitt
september 29, 2010
web 2.0 expo
new york city
• director, application architecture
at a global corp
• focus on SOA, SaaS, Events
• i wrote this
@ebenhewitt
agenda
• context
• features
• data model
• api
“nosql” → “big data”
• mongodb
• couchdb
• tokyo cabinet
• redis
• riak
• what about?
– Poet, Lotus, Xindice
– they’ve been around forever…
– rdbms was once the new kid…
innovation at scale
• google bigtable (2006)
– consistency model: strong
– data model: sparse map
– clones: hbase, hypertable
• amazon dynamo (2007)
– O(1) dht
– consistency model: client tune-able
– clones: riak, voldemort
cassandra ~= bigtable + dynamo
proven
• The Facebook stores 150TB of data on 150 nodes
web 2.0
• used at Twitter, Rackspace, Mahalo, Reddit,
Cloudkick, Cisco, Digg, SimpleGeo, Ooyala, OpenX,
others
cap theorem
• consistency
– all clients have same view of data
• availability
– writeable in the face of node failure
• partition tolerance
– processing can continue in the face of network failure (crashed router, broken network)
daniel abadi: pacelc
write consistency
Level                        Description
ZERO                         Good luck with that
ANY                          1 replica (hints count)
ONE                          1 replica. read repair in bkgnd
QUORUM (DCQ for RackAware)   (N / 2) + 1
ALL                          N = replication factor
read consistency
Level                        Description
ZERO                         Ummm…
ANY                          Try ONE instead
ONE                          1 replica
QUORUM (DCQ for RackAware)   Return most recent TS after (N / 2) + 1 report
ALL                          N = replication factor
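The QUORUM arithmetic in the consistency tables can be sketched in a few lines. This is an illustration only: `ConsistencyMath` and its method names are made up for this example and are not part of Cassandra. The overlap rule (read replicas + write replicas > N) is not on the slide; it is the usual way to reason about when a read is guaranteed to see the latest write.

```java
// Sketch: how many replicas must answer at each consistency level.
final class ConsistencyMath {
    // QUORUM from the slide: (N / 2) + 1, using integer division.
    static int quorum(int replicationFactor) {
        return (replicationFactor / 2) + 1;
    }

    // Replicas that must respond for a named level. ZERO and ANY are left
    // out because they make no per-replica guarantee in the tables.
    static int required(String level, int replicationFactor) {
        switch (level) {
            case "ONE":    return 1;
            case "QUORUM": return quorum(replicationFactor);
            case "ALL":    return replicationFactor;
            default: throw new IllegalArgumentException("unknown level: " + level);
        }
    }

    // Assumption (not on the slide): reads see the latest write when the
    // read set and the write set must share at least one replica.
    static boolean overlaps(String readLevel, String writeLevel, int n) {
        return required(readLevel, n) + required(writeLevel, n) > n;
    }

    public static void main(String[] args) {
        System.out.println(quorum(3));                       // 2
        System.out.println(overlaps("QUORUM", "QUORUM", 3)); // true
        System.out.println(overlaps("ONE", "ONE", 3));       // false
    }
}
```

With N = 3, QUORUM reads plus QUORUM writes touch 2 + 2 = 4 replicas, so at least one replica is in both sets; ONE plus ONE touches only 2, so a read can miss the newest write.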
agenda
• context
• features
• data model
• api
cassandra properties
• tuneably consistent
• very fast writes
• highly available
• fault tolerant
• linear, elastic scalability
• decentralized/symmetric
• ~12 client languages
– Thrift RPC API
• ~automatic provisioning of new nodes
• O(1) dht
• big data
write op
Staged Event-Driven Architecture
• A general-purpose framework for high
concurrency & load conditioning
• Decomposes applications into stages
separated by queues
• Adopt a structured approach to event-driven
concurrency
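A minimal sketch of the staged idea, assuming two made-up stages ("parse" and "write") joined by queues. It is not Cassandra's actual stage code; `SedaSketch`, the `STOP` marker, and the stage names are hypothetical and exist only to show stages separated by queues.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.UnaryOperator;

final class SedaSketch {
    // Hypothetical end-of-input marker passed down the pipeline.
    private static final String STOP = "__stop__";

    // A stage is a thread that drains its inbound queue, does its work,
    // and hands the result to the next stage's queue.
    static Thread stage(BlockingQueue<String> in, BlockingQueue<String> out,
                        UnaryOperator<String> work) {
        return new Thread(() -> {
            try {
                while (true) {
                    String event = in.take();
                    if (event.equals(STOP)) {
                        out.put(STOP); // forward the marker, then finish
                        return;
                    }
                    out.put(work.apply(event));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }

    // Pushes requests through parse -> write and collects the results in order.
    static List<String> run(List<String> requests) {
        BlockingQueue<String> parseQueue = new LinkedBlockingQueue<>();
        BlockingQueue<String> writeQueue = new LinkedBlockingQueue<>();
        BlockingQueue<String> done = new LinkedBlockingQueue<>();
        Thread parse = stage(parseQueue, writeQueue, s -> s.trim().toLowerCase(Locale.ROOT));
        Thread write = stage(writeQueue, done, s -> "wrote:" + s);
        parse.start();
        write.start();
        List<String> results = new ArrayList<>();
        try {
            for (String r : requests) {
                parseQueue.put(r);
            }
            parseQueue.put(STOP);
            for (String r = done.take(); !r.equals(STOP); r = done.take()) {
                results.add(r);
            }
            parse.join();
            write.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        }
        return results;
    }
}
```

Each stage owns one thread here, so order is preserved; a real staged design would give each stage its own thread pool and could bound the queues to apply back-pressure.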
instrumentation
data replication
• configurable replication factor
• replica placement strategy
rack unaware → Simple Strategy
rack aware → Old Network Topology Strategy
data center shard → Network Topology Strategy
partitioner smack-down
Random
• system will use MD5(key) to
distribute data across nodes
• even distribution of keys
from one CF across
ranges/nodes
Order Preserving
• key distribution determined
by token
• lexicographical ordering
• required for range queries
– scan over rows like cursor in
index
• can specify the token for
this node to use
• ‘scrabble’ distribution
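The two partitioners can be compared with a small sketch. The ring lookup below is a simplification for illustration, and `PartitionerSketch`, `owner`, and the node names are hypothetical; only the MD5(key) idea and the "token is the key" idea come from the slide.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

final class PartitionerSketch {
    // Random: the token is the MD5 of the key, read as a non-negative integer.
    static BigInteger randomToken(String key) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            return new BigInteger(1, md5.digest(key.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 unavailable", e);
        }
    }

    // Order preserving: the token is the key itself, so lexical order survives.
    static String orderPreservingToken(String key) {
        return key;
    }

    // Simplified ring (assumption): a node owns every token up to and
    // including its own; tokens past the last node wrap to the first.
    // The ring must not be empty.
    static <T> String owner(TreeMap<T, String> ring, T token) {
        Map.Entry<T, String> e = ring.ceilingEntry(token);
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        TreeMap<String, String> ring = new TreeMap<>();
        ring.put("h", "nodeA");
        ring.put("p", "nodeB");
        ring.put("z", "nodeC");
        System.out.println(owner(ring, orderPreservingToken("eben"))); // nodeA
        System.out.println(owner(ring, orderPreservingToken("mary"))); // nodeB
    }
}
```

With order-preserving tokens, neighbouring keys land on the same node, which is what makes range scans possible and also why load follows the letter frequencies of the keys. With MD5 tokens, neighbouring keys scatter across the ring.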
agenda
• context
• features
• data model
• api
structure
keyspace: settings (eg, partitioner)
  column family: settings (eg, comparator, type [Std])
    column: name, value, clock
keyspace
• ~= database
• typically one per application
• some settings are configurable only per
keyspace
column family
• group records of similar kind
• not same kind, because CFs are sparse tables
• ex:
– User
– Address
– Tweet
– PointOfInterest
– HotelRoom
think of cassandra as
row-oriented
• each row is uniquely identifiable by key
• rows group columns and super columns
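column family
[figure: one column family with two rows, key 123 and key 456, each row holding its own set of columns; columns shown: n=42, user=eben, user=alison, icon=, nickname=The Situation]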
json-like notation
User {
123 : { email: alison@foo.com,
icon: },
456 : { email: eben@bar.com,
location: The Danger Zone}
}
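One way to picture the json-like notation is as a map of sorted maps. The sketch below is an in-memory stand-in, not the Cassandra API; `ColumnFamilySketch` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

final class ColumnFamilySketch {
    // row key -> (column name -> value); columns stay sorted by name.
    private final SortedMap<String, SortedMap<String, String>> rows = new TreeMap<>();

    void set(String rowKey, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    // Returns null when the row or the column does not exist.
    String get(String rowKey, String column) {
        SortedMap<String, String> row = rows.get(rowKey);
        return row == null ? null : row.get(column);
    }

    // Rows are sparse: each one lists only the columns written to it.
    List<String> columnNames(String rowKey) {
        SortedMap<String, String> row = rows.get(rowKey);
        return row == null ? Collections.<String>emptyList() : new ArrayList<>(row.keySet());
    }

    public static void main(String[] args) {
        ColumnFamilySketch user = new ColumnFamilySketch();
        user.set("123", "email", "alison@foo.com");
        user.set("456", "email", "eben@bar.com");
        user.set("456", "location", "The Danger Zone");
        System.out.println(user.columnNames("123")); // [email]
        System.out.println(user.columnNames("456")); // [email, location]
    }
}
```

Row 123 never declares a `location` column and row 456 never declares an `icon`; nothing in the structure requires the rows to match.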
0.6 example
$ cassandra -f
$ bin/cassandra-cli
cassandra> connect localhost/9160
cassandra> set Keyspace1.Standard1['eben']['age']='29'
cassandra> set Keyspace1.Standard1['eben']['email']='e@e.com'
cassandra> get Keyspace1.Standard1['eben']['age']
=> (column=6e616d65, value=39, timestamp=1282170655390000)
a column has 3 parts
1. name
– byte[]
– determines sort order
– used in queries
– indexed
2. value
– byte[]
– you don’t query on column values
3. timestamp
– long (clock)
– last write wins conflict resolution
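The three parts and the last-write-wins rule can be sketched as a small value class. `ColumnSketch` is a hypothetical stand-in for the Thrift `Column`, and the tie-breaking choice (keep the first argument on equal timestamps) is an assumption made for this example only.

```java
import java.nio.charset.StandardCharsets;

final class ColumnSketch {
    final byte[] name;    // byte[]: determines sort order, used in queries
    final byte[] value;   // byte[]: opaque to the server, not queried
    final long timestamp; // clock supplied by the writer

    ColumnSketch(String name, String value, long timestamp) {
        this.name = name.getBytes(StandardCharsets.UTF_8);
        this.value = value.getBytes(StandardCharsets.UTF_8);
        this.timestamp = timestamp;
    }

    // Last write wins: of two versions of one column, keep the one with
    // the higher timestamp. Ties keep the first argument (assumption).
    static ColumnSketch resolve(ColumnSketch a, ColumnSketch b) {
        return b.timestamp > a.timestamp ? b : a;
    }

    String valueAsString() {
        return new String(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ColumnSketch older = new ColumnSketch("age", "29", 1L);
        ColumnSketch newer = new ColumnSketch("age", "30", 2L);
        System.out.println(resolve(older, newer).valueAsString()); // 30
        System.out.println(resolve(newer, older).valueAsString()); // 30
    }
}
```

Because the winner depends only on the timestamps, the order in which replicas receive the two writes does not change the result, which is why writers need reasonably synchronized clocks.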
column comparators
• byte
• utf8
• long
• timeuuid
• lexicaluuid
• <pluggable>
– ex: lat/long
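Why the comparator matters: the same column names sort differently depending on how their bytes are interpreted. The sketch below uses plain Java comparators as stand-ins; `ComparatorSketch` is hypothetical, and `String` ordering only approximates a UTF-8 byte comparison (the two agree for ASCII names).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class ComparatorSketch {
    // utf8-style: compare column names as text.
    static final Comparator<String> UTF8 = Comparator.naturalOrder();

    // long-style: compare column names as 64-bit integers.
    static final Comparator<String> LONG =
            Comparator.comparingLong((String s) -> Long.parseLong(s));

    // Returns a sorted copy, leaving the input list untouched.
    static List<String> sorted(List<String> names, Comparator<String> comparator) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(comparator);
        return copy;
    }

    public static void main(String[] args) {
        List<String> names = List.of("10", "9", "100");
        System.out.println(sorted(names, UTF8)); // [10, 100, 9]
        System.out.println(sorted(names, LONG)); // [9, 10, 100]
    }
}
```

Since slices are taken over the sorted column names, choosing the comparator is effectively choosing which range queries a column family can answer.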
super column
super columns group columns under a common name
<<SCF>> PointOfInterest (super column family)
key 10017
  <<SC>> Central Park: desc=Fun to walk in.
  <<SC>> Empire State Bldg: phone=212.555.11212, desc=Great view from 102nd floor!
key 85255
  <<SC>> Phoenix Zoo
PointOfInterest {
key: 85255 {
Phoenix Zoo { phone: 480-555-5555, desc: They have animals here. },
Spring Training { phone: 623-333-3333, desc: Fun for baseball fans. },
}, //end phx
key: 10019 {
Central Park { desc: Walk around. It's pretty.} ,
Empire State Building { phone: 212-777-7777,
desc: Great view from 102nd floor. }
} //end nyc
}
(callouts on the notation above: key, super column family, super column, column, flexible schema)
about super column families
• sub-column names in a SCF are not indexed
– top level columns (SCF Name) are always indexed
• often used for denormalizing data from
standard CFs
agenda
• context
• features
• data model
• api
slice predicate
• data structure describing columns to return
– SliceRange
• start column name
• finish column name (can be empty to stop on count)
• reverse
• count (like LIMIT)
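The four SliceRange fields map naturally onto a sorted map. This is an in-memory sketch, not the Thrift `SliceRange` class; `SliceSketch.slice` is a hypothetical helper, and treating both bounds as inclusive is an assumption of the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

final class SliceSketch {
    // Returns up to `count` column names from start to finish (both inclusive).
    // An empty start or finish means "unbounded on that side". When reversed
    // is true, iteration runs from high names to low, so start should be the
    // higher name. start must not come after finish in iteration order.
    static List<String> slice(NavigableMap<String, String> row, String start,
                              String finish, boolean reversed, int count) {
        NavigableMap<String, String> view = reversed ? row.descendingMap() : row;
        if (!start.isEmpty()) {
            view = view.tailMap(start, true);
        }
        if (!finish.isEmpty()) {
            view = view.headMap(finish, true);
        }
        List<String> names = new ArrayList<>();
        for (String name : view.keySet()) {
            if (names.size() == count) {
                break; // count works like LIMIT
            }
            names.add(name);
        }
        return names;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> row = new TreeMap<>();
        for (String n : new String[] {"a", "b", "c", "d", "e"}) {
            row.put(n, "");
        }
        System.out.println(slice(row, "b", "d", false, 10)); // [b, c, d]
        System.out.println(slice(row, "b", "", false, 2));   // [b, c]
        System.out.println(slice(row, "", "", true, 2));     // [e, d]
    }
}
```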
read api
• get() : Column
– get the Col or SC at given ColPath
COSC cosc = client.get(key, path, CL);
• get_slice() : List<ColumnOrSuperColumn>
– get Cols in one row, specified by SlicePredicate:
List<ColumnOrSuperColumn> results =
client.get_slice(key, parent, predicate, CL);
• multiget_slice() : Map<key, List<CoSC>>
– get slices for list of keys, based on SlicePredicate
Map<byte[],List<ColumnOrSuperColumn>> results =
client.multiget_slice(rowKeys, parent, predicate, CL);
• get_range_slices() : List<KeySlice>
– returns multiple Cols according to a range
– range is startkey, endkey, starttoken, endtoken:
List<KeySlice> slices = client.get_range_slices(
parent, predicate, keyRange, CL);
write api
client.insert(userKeyBytes, parent,
new Column("band".getBytes(UTF8),
"Funkadelic".getBytes(UTF8), clock), CL);
batch_mutate
– void batch_mutate(
map<byte[], map<String, List<Mutation>>> , CL)
remove
– void remove(byte[],
ColumnPath column_path, Clock, CL)
batch_mutate
//create param
Map<byte[], Map<String, List<Mutation>>> mutationMap =
new HashMap<byte[], Map<String, List<Mutation>>>();
//create Cols for Muts
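Column nameCol = new Column("name".getBytes(UTF8),
"Funkadelic".getBytes("UTF-8"), new Clock(System.nanoTime()));
//wrap the Col in a ColumnOrSuperColumn so a Mutation can carry it
ColumnOrSuperColumn nameCosc = new ColumnOrSuperColumn();
nameCosc.column = nameCol;
Mutation nameMut = new Mutation();
nameMut.column_or_supercolumn = nameCosc; //also phone, etc
Map<String, List<Mutation>> muts = new HashMap<String, List<Mutation>>();
List<Mutation> cols = new ArrayList<Mutation>();
cols.add(nameMut);
cols.add(phoneMut); //phoneMut is built the same way as nameMut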
muts.put(CF, cols);
//outer map key is a row key; inner map key is the CF name
mutationMap.put(rowKey.getBytes(), muts);
//send to server
client.batch_mutate(mutationMap, CL);
raw thrift: for masochists only
• pycassa (python)
• fauna (ruby)
• hector (java)
• pelops (java)
• kundera (JPA)
• hectorSharp (C#)
what about…
SELECT WHERE
ORDER BY
JOIN ON
GROUP
rdbms: domain-based model
what answers do I have?
cassandra: query-based model
what questions do I have?
SELECT WHERE
cassandra is an index factory
<<cf>>USER
Key: UserID
Cols: username, email, birth date, city, state
How to support this query?
SELECT * FROM User WHERE city = 'Scottsdale'
Create a new CF called UserCity:
<<cf>>USERCITY
Key: city
Cols: IDs of the users in that city.
Also uses the Valueless Column pattern
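A sketch of the index-CF idea, using in-memory maps in place of the two column families. `UserCityIndex` and its methods are hypothetical; the point is only that the application writes the index row itself whenever it writes the user, and that the index columns carry their information in the name (the user ID) with no value.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

final class UserCityIndex {
    // Stands in for the city column of the USER column family.
    private final Map<String, String> cityByUser = new HashMap<>();
    // Stands in for USERCITY: key = city, column names = user IDs, no values.
    private final SortedMap<String, SortedSet<String>> userCity = new TreeMap<>();

    // Writes the user and keeps the index row in step with it.
    void saveUser(String userId, String city) {
        String oldCity = cityByUser.put(userId, city);
        if (oldCity != null && !oldCity.equals(city)) {
            SortedSet<String> oldIds = userCity.get(oldCity);
            if (oldIds != null) {
                oldIds.remove(userId); // drop the stale index entry
            }
        }
        userCity.computeIfAbsent(city, k -> new TreeSet<>()).add(userId);
    }

    // Answers "WHERE city = ?" with a single row read on the index.
    List<String> usersIn(String city) {
        SortedSet<String> ids = userCity.get(city);
        return ids == null ? Collections.<String>emptyList() : new ArrayList<>(ids);
    }
}
```

For example, after `saveUser("u1", "Scottsdale")` and `saveUser("u2", "Scottsdale")`, `usersIn("Scottsdale")` returns `[u1, u2]`. The cost moves to write time: every change to a user's city is two writes, one of them a delete from the old index row.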
SELECT WHERE pt 2
• Use an aggregate key
state:city: { user1, user2}
• Get rows between AZ: & AZ; for all Arizona users
• Get rows between AZ:Scottsdale & AZ:Scottsdale1 for all Scottsdale users
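The aggregate-key ranges work because `;` is the character right after `:` in ASCII, so every key starting with `AZ:` sorts between `AZ:` and `AZ;`. A sketch with a sorted map standing in for rows kept in key order (which assumes an order-preserving partitioner); `AggregateKeySketch` and its methods are hypothetical, and both bounds are treated as inclusive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

final class AggregateKeySketch {
    // Row key "state:city" -> user IDs stored as valueless column names.
    private final TreeMap<String, SortedSet<String>> rows = new TreeMap<>();

    void add(String state, String city, String userId) {
        rows.computeIfAbsent(state + ":" + city, k -> new TreeSet<>()).add(userId);
    }

    // Collects the users of every row whose key lies in [start, end].
    List<String> usersBetween(String start, String end) {
        List<String> users = new ArrayList<>();
        for (SortedSet<String> ids : rows.subMap(start, true, end, true).values()) {
            users.addAll(ids);
        }
        return users;
    }

    public static void main(String[] args) {
        AggregateKeySketch index = new AggregateKeySketch();
        index.add("AZ", "Scottsdale", "user1");
        index.add("AZ", "Phoenix", "user2");
        index.add("CA", "Fresno", "user3");
        System.out.println(index.usersBetween("AZ:", "AZ;"));                     // [user2, user1]
        System.out.println(index.usersBetween("AZ:Scottsdale", "AZ:Scottsdale1")); // [user1]
    }
}
```

The first query returns the Phoenix row before the Scottsdale row because rows come back in key order, not insertion order.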
ORDER BY
Rows
• are placed according to their Partitioner:
– Random: MD5 of key
– Order-Preserving: actual key
• are sorted by key, regardless of partitioner
Columns
• are sorted according to CompareWith or CompareSubcolumnsWith
rdbms
cassandra
is cassandra a good fit?
• you need really fast writes
• you need durability
• you have lots of data
> GBs
>= three servers
• your app is evolving
– startup mode, fluid data
structure
• loose domain data
– “points of interest”
• your programmers can deal
– documentation
– complexity
– consistency model
– change
– visibility tools
• your operations can deal
– hardware considerations
– can move data
– JMX monitoring
thank you!
@ebenhewitt
