BIG DATA! 
Great! Now what? 
Ricard Clau 
SymfonyCon 2014
HELLO WORLD! 
• Ricard Clau, born and raised in Barcelona 
• Server engineer at Another Place Productions 
• Symfony2 lover and PHP believer (sometimes…) 
• Open-source contributor, sometimes I give talks 
• Twitter (@ricardclau) / Gmail ricard.clau@gmail.com
WE WILL TALK ABOUT… 
• Where / How to store / query our “BIG” DATA 
• SQL vs NoSQL: how did we end up here? 
• Strengths and weaknesses of both approaches 
• PHP / Symfony Status with these technologies 
• Some war stories and recommendations
QUICK DISCLAIMERS 
• Not your average PHP talk, not sure if you will 
be able to use this next week at work 
• I am still learning about all these technologies 
• 100M records is NOT BIG DATA
“Big data is like teenage sex; 
everyone talks about it, 
nobody really knows how to do it, 
everyone thinks everyone else is doing it, 
so everyone claims they are doing it”. 
Dan Ariely, Duke University
2 BIG PROBLEMS
PROBLEM 1: STORAGE
PROBLEM 2: QUERYING
A BIT OF HISTORY 
Maybe we have not learnt so much…
A (NOT SO) LONG TIME AGO 
• Programmers processed files directly 
• Lots of people doing the same, first 
databases appeared, different APIs, 
strengths and weaknesses 
• In the early 70s IBM came up with the 
SEQUEL (Structured English Query 
Language) idea, and the rest is history
Big Data! Great! Now What? #SymfonyCon 2014
WHY DOES NOSQL EXIST? 
• RDBMSs are not great at scaling horizontally 
• Google, Amazon, Facebook, etc… started building 
their own solutions to meet their unique needs 
• When your data does not fit in one box, you need to 
give up consistency or availability 
• Some problems need a different approach
THE CURRENT CHAOS
RDBMS SYSTEMS 
Old rockers never die
SQL 
• A “common” query language 
• We can normalise data and query it 
• Easy to do joins, filters, aggregations 
• We don’t need to know in advance how we will access the data 
• We rely on each database server’s query optimiser (and 
sometimes we need a DBA)
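The points above can be sketched in a few lines. This is a minimal illustration using Python's built-in sqlite3 module; the table names, column names and rows are invented for the example and are not from the talk.

```python
import sqlite3

# In-memory database; schema and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Barcelona'), (2, 'London'), (3, 'Barcelona');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 2, 25.0), (3, 3, 5.0), (4, 1, 7.5);
""")


def revenue_by_city(connection):
    """Join, filter and aggregate in one declarative query.

    We only state what we want; the query optimiser decides how to fetch it.
    """
    rows = connection.execute("""
        SELECT c.city, SUM(o.amount)
        FROM orders o
        JOIN customers c ON c.id = o.customer_id
        WHERE o.amount > 6
        GROUP BY c.city
        ORDER BY c.city
    """)
    return rows.fetchall()


result = revenue_by_city(conn)
print(result)
```

The schema was not designed around this particular question, which is the point of the "we don't need to know in advance" bullet.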
ACID PROPERTIES 
• Atomicity: transactions are all or nothing 
• Consistency: a transaction is subject to a set of rules 
• Isolation: transactions do not affect each other 
• Durability: written data will not get lost
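As a rough illustration of atomicity and consistency, here is a sketch with Python's sqlite3 module. The accounts table, the names and the CHECK rule are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint plays the role of a consistency rule (hypothetical schema).
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(connection, source, target, amount):
    """Move money between two accounts: both updates apply, or neither does."""
    try:
        with connection:  # commits on success, rolls back if an exception escapes
            connection.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            connection.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit broke the CHECK rule, so the earlier credit was rolled back too.
        return False


def balances(connection):
    return dict(connection.execute("SELECT name, balance FROM accounts"))
```

Calling `transfer(conn, "alice", "bob", 500)` should return False and leave both balances untouched, even though the credit to bob ran before the failing debit.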
WE NEED ACID 
• Banking, logistics, finance, e-commerce,… 
• Systems we started building 30 years ago… and we 
still work on them generating millions of $ daily! 
• There are many applications that still fit the relational 
model and have structured data
USUAL PROBLEMS 
• You can painfully achieve sharding, but 
you need to give up some ACID goods 
• Tricky for unstructured data 
• Not great when the read / write ratio is small (write-heavy workloads) 
• Some data structures are awkward to model in tables
TRICKY SCENARIOS 
• Geospatial queries for augmented reality 
• Leaderboards for social activity, set operations 
• Columnar aggregations on big tables 
• Graph data traversing to analyse your customers 
• Search engines over big chunks of text
NOSQL SYSTEMS 
Different problems, different solutions
BASE PROPERTIES 
• Basically Available: appears 
to work most of the time 
• Soft state: state of the 
system may change even 
without a query 
• Eventual consistency
CAP THEOREM 
• A shared-data system cannot guarantee 
simultaneously: 
• Consistency: All clients have the same view of the data 
• Availability: Each client can always read and write 
• Partition tolerance: The system works well even 
when there are network partitions
“During a network partition, a 
distributed system must choose 
between either Consistency or 
Availability”
• Consistency + Availability: single node, mostly RDBMS (MySQL, PostgreSQL, DB2, SQLite…) 
• Availability + Partition Tolerance: all nodes have the same role (Cassandra, Riak, DynamoDB…) 
• Consistency + Partition Tolerance: special nodes (Zookeeper, HBase, MongoDB, Redis…)
CONSISTENT HASHING
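The slide itself is only a diagram, so here is a minimal sketch of the idea in Python. The class name, node names and the choice of MD5 with 100 virtual points per node are assumptions for illustration, not how any particular database implements it.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring with virtual points per node."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas
        self._keys = []  # sorted positions on the ring
        self._ring = {}  # position -> node name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add(self, node):
        for i in range(self.replicas):
            position = self._hash(f"{node}#{i}")
            self._ring[position] = node
            bisect.insort(self._keys, position)

    def remove(self, node):
        for i in range(self.replicas):
            position = self._hash(f"{node}#{i}")
            del self._ring[position]
            self._keys.remove(position)

    def get(self, key):
        """Return the node owning the first ring position after the key's hash."""
        position = self._hash(key)
        index = bisect.bisect(self._keys, position) % len(self._keys)
        return self._ring[self._keys[index]]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {key: ring.get(key) for key in keys}
ring.add("node-d")
after = {key: ring.get(key) for key in keys}
moved = [key for key in keys if before[key] != after[key]]
print(f"{len(moved)} of {len(keys)} keys moved after adding a node")
```

With a plain `hash(key) % number_of_nodes` scheme, adding a node would remap most keys. On the ring, only keys that now fall to the new node move.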
I TOTALLY NEED ACID! 
Are you sure about that?
EVENTUAL CONSISTENCY 
If you are using master-slave replication, 
you already have eventual consistency in your reads
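A toy simulation of that statement, assuming asynchronous replication through a log that the replica applies later. The Primary and Replica classes are hypothetical and only model the lag, not a real database.

```python
class Primary:
    """Accepts writes and records them in a replication log."""

    def __init__(self):
        self.data = {}
        self.log = []  # ordered (key, value) entries waiting to be replicated

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


class Replica:
    """Serves reads from its own copy, which may lag behind the primary."""

    def __init__(self, primary):
        self.primary = primary
        self.data = {}
        self.applied = 0  # number of log entries already applied

    def read(self, key):
        return self.data.get(key)

    def sync(self):
        # Apply every log entry we have not seen yet, in order.
        for key, value in self.primary.log[self.applied:]:
            self.data[key] = value
        self.applied = len(self.primary.log)


primary = Primary()
replica = Replica(primary)
primary.write("balance", 100)
stale = replica.read("balance")   # the replica has not caught up yet
replica.sync()
fresh = replica.read("balance")   # now it sees the write
```

Between the write and the sync, a read routed to the replica returns stale data. That window is the eventual consistency many setups already live with.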
ANALYTICS / STATS 
We can possibly afford to lose a small % of the data
TRANSACTIONS 
Bank transfers happen asynchronously as well!
WHAT ABOUT PHP & SYMFONY? 
Is there any hope for us?
PHP: BEST WEB PLATFORM? 
• PHP is still heavily used, despite its many quirks 
• Mature, actively maintained libraries for everything 
• Composer makes things much easier these days 
• Symfony bundles for almost everything 
• Some databases consider PHP a second class citizen
Key-value Graph 
Column Document
KEY-VALUE STORES 
• Simple APIs, easy to install and use. You are 
already using them for caching, sessions, etc… 
• PHP Extensions: memcached, phpredis 
• Libraries: nrk/predis, basho/riak, aws/aws-sdk-php 
• Bundles: snc/redis-bundle, leaseweb/memcache-bundle, 
kbrw/riak-bundle
GRAPH DATABASES 
• Very verbose queries, access via REST APIs 
• Maybe not mature enough for source of truth 
• Libraries: everyman/neo4jphp 
• Bundles: klaussilveira/neo4j-ogm-bundle 
• IMHO, one of the next big things
CYPHER QUERY EXAMPLES 
• Top 5 Sushi restaurants in New York for Philip’s friends 
• 2nd degree co-actors who have never acted with Tom Hanks
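The original queries were shown as images and are not in this transcript. As a stand-in, here is a plain Python sketch (not Cypher) of one reading of the second example: people who acted with Tom Hanks's co-actors but never with him. The cast lists are invented placeholders.

```python
from collections import defaultdict

# Hypothetical cast lists, made up for illustration.
CASTS = {
    "Movie A": {"Tom Hanks", "Actor 1", "Actor 2"},
    "Movie B": {"Actor 1", "Actor 3"},
    "Movie C": {"Actor 2", "Actor 3", "Actor 4"},
    "Movie D": {"Tom Hanks", "Actor 4"},
}


def co_actor_graph(casts):
    """Build an undirected graph: actor -> set of actors they shared a movie with."""
    graph = defaultdict(set)
    for cast in casts.values():
        for actor in cast:
            graph[actor] |= cast - {actor}
    return graph


def second_degree_co_actors(graph, start):
    """Co-actors of co-actors, excluding the start actor and direct co-actors."""
    direct = graph[start]
    second = set()
    for actor in direct:
        second |= graph[actor]
    return second - direct - {start}


graph = co_actor_graph(CASTS)
suggestions = second_degree_co_actors(graph, "Tom Hanks")
print(suggestions)
```

A graph database expresses the same traversal as a pattern match and avoids the chain of self-joins a relational schema would need.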
COLUMN-BASED STORES 
• Possibly the most suitable for Big Data 
• Redshift supports SQL on a petabyte-scale 
database 
• Libraries: thobbs/phpcassa, pop/pop_hbase, 
PDO for Redshift (with some quirks) 
• IMHO, Cassandra will become THE database
DOCUMENT DATABASES 
• MongoDB and Couchbase look very shiny… but the 
Internet is FULL of scaling horror stories 
• PHP Extensions: mongodb, couchbase 
• Libraries: doctrine/mongodb 
• Bundles: doctrine/mongodb-odm-bundle
SEARCH ENGINES 
• Mostly Lucene based 
• PHP Extensions: solr, sphinx 
• Libraries: solarium/solarium, 
elasticsearch/elasticsearch 
• Bundles: nelmio/solarium-bundle, 
friendsofsymfony/elastica-bundle
DATA ANALYSIS 
All businesses need this!
QUERY VS PROCESSING 
• SQL is great because we can query by any field 
• There is no standard in NoSQL databases 
• NoSQL systems are more limited, only keys (some 
allow secondary indexes) or complex graph syntax 
• We sometimes need processing for complex queries
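To make the "only keys, or secondary indexes" point concrete, here is a toy in-memory key-value store in Python. The class and method names are hypothetical, and real stores implement secondary indexes very differently.

```python
from collections import defaultdict


class KeyValueStore:
    """Toy key-value store that can only be queried by key or by an indexed field."""

    def __init__(self, indexed_fields=()):
        self._data = {}
        self._indexed_fields = tuple(indexed_fields)
        # One index per declared field: field value -> set of primary keys.
        self._indexes = {field: defaultdict(set) for field in self._indexed_fields}

    def put(self, key, record):
        self.delete(key)  # drop stale index entries if the key already exists
        self._data[key] = dict(record)
        for field in self._indexed_fields:
            if field in record:
                self._indexes[field][record[field]].add(key)

    def delete(self, key):
        old = self._data.pop(key, None)
        if old is None:
            return
        for field in self._indexed_fields:
            if field in old:
                self._indexes[field][old[field]].discard(key)

    def get(self, key):
        return self._data.get(key)

    def find_by(self, field, value):
        """Look up keys through a secondary index; other fields are not queryable."""
        if field not in self._indexes:
            raise KeyError(f"no secondary index on {field!r}")
        return sorted(self._indexes[field].get(value, ()))


store = KeyValueStore(indexed_fields=["city"])
store.put("u1", {"name": "Ana", "city": "Barcelona"})
store.put("u2", {"name": "Bob", "city": "London"})
store.put("u3", {"name": "Carla", "city": "Barcelona"})
```

Here `store.find_by("city", "Barcelona")` works because the index was declared up front, while `store.find_by("name", "Ana")` raises. In SQL either query would simply run.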
MAP-REDUCE
HADOOP VS SPARK 
• Techniques to extract subsets of the data (MAP) and 
operate on them in parallel before aggregating (REDUCE) 
• Not real time; Hadoop is the most popular implementation 
• Apache Spark opens a new paradigm for near real-time 
• You need other languages for these techniques
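The map and reduce steps described above can be sketched as a word count. This is a single-machine Python illustration with a thread pool standing in for a cluster; the function names are made up and it says nothing about how Hadoop or Spark are actually used.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_chunk(lines):
    """MAP: turn one chunk of input into (word, 1) pairs."""
    pairs = []
    for line in lines:
        for word in line.lower().split():
            pairs.append((word, 1))
    return pairs


def shuffle(mapped_chunks):
    """Group all values emitted for the same key, across every chunk."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_counts(grouped):
    """REDUCE: aggregate the grouped values for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks, workers=2):
    # Each chunk is mapped independently, so the map step can run in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_chunk, chunks))
    return reduce_counts(shuffle(mapped))


counts = word_count([["big data big"], ["data is big"]])
print(counts)
```

On a real cluster the chunks live on different machines and the shuffle moves data over the network, which is a large part of why classic batch jobs are not real time.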
FINAL THOUGHTS 
Now what?
ENGINEERING CHALLENGES 
• The Internet of things will generate real BIG DATA 
• SQL / ACID technologies are not going anywhere 
• Be very careful when using NoSQL in production 
• Databases… and life… are full of tradeoffs 
• The next decade will be fascinating for the industry
READ THE DOCS CAREFULLY
CHOOSE THE RIGHT TOOL
QUESTIONS? 
• Twitter: @ricardclau 
• E-mail: ricard.clau@gmail.com 
• Github: https://github.com/ricardclau 
• Please rate the talk at https://joind.in/talk/view/12958
