CURRENT TRENDS IN BIG DATA
AGENDA
• Business Scenarios – Big Data
• Technologies solving business challenges
• Big data Tools – Solving Business Cases
• Engagement Database vs Transactional
• Why Couchbase
• Couchbase Key Capabilities
• Couchbase Unique Value Proposition
• Fourth dimension of Big Data
• Framework – Layered Approach
• Tools & Technologies
• Questions
Business Scenarios – Need of Big Data
• Actionable insights with the right team
• Customer experience with connected devices
• Data governance & compliance
• Provide insights for humans and AI tools
• Choose the right technology and the right platform
• Leverage the proper model to align the right product with the right customers
Technologies solving business challenges
Batch use cases (analytic):
• Customer segmentation
• Associations of products
• Customer churn analysis
• Cart abandonment
Real-time use cases (engagement):
• Fraudulent transactions
• Credit card application validation
• Customer churn analysis
• Customer experience analytics
• Personalization
Hadoop features:
• Processes large volumes of data
• Processing takes seconds
• Latency varies with the load
• Disk-centric
• Can handle petabytes of data
• Batch and ad hoc processing
Couchbase features:
• Real-time, interactive access to data
• Very low latency (< 1 ms per operation)
• Predictable performance
• Memory-centric
• Can handle tens of terabytes
• Online and ad hoc access
Big data Tools – Solving Business Cases
Spark use cases:
• Real-time recommendations
• Component failure detection
• Network intruder detection
• Fraud detection
Kafka use cases:
• Web activity tracking
• Log aggregation
• Event sourcing
• RSS feed stream analysis
Storm use cases:
• Social media analytics
• Real-time trade-in analytics
• Malfunction detection
• Log & metric analysis
Flink use cases:
• Fraud detection
• Anomaly detection
• Rule-based alerting
• Real-time search index building
Spark features:
• Uses DStreams for streaming data and RDDs (Resilient Distributed Datasets) for batch data
• Cached datasets, mainly used for ML
• Persisted RDDs
• Micro-batch processing
Kafka features:
• Distributed messaging system
• Acts as a message broker
• Persists data on the file system (such as ext4 or XFS)
• Small-batch processing
• Depends on ZooKeeper (a coordination service)
Storm features:
• Real-time messaging
• Stream processing
• Data is not persisted
• Micro-batch processing
Flink features:
• Built primarily for streaming
• Data is not persisted
• Uses checkpoints to divide streaming data into finite sets for consistent state snapshots
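The micro-batch idea listed under Spark above can be sketched in a few lines. This is plain Python, not the Spark API: the function name `micro_batches`, the `(timestamp, payload)` event shape, and the one-second interval are all assumptions made for illustration. It only shows the principle that a stream is cut into small fixed-interval batches, each processed as a unit.

```python
from itertools import groupby


def micro_batches(events, interval_s):
    """Group (timestamp, payload) events into fixed-length intervals.

    Assumes events arrive ordered by timestamp, as they would on a stream.
    Yields (batch_start_time, [payloads]) for each non-empty interval.
    """
    for window, group in groupby(events, key=lambda e: int(e[0] // interval_s)):
        yield window * interval_s, [payload for _, payload in group]


# Hypothetical click-stream events: (seconds since start, event type).
events = [
    (0.2, "click"),
    (0.9, "view"),
    (1.1, "click"),
    (2.5, "purchase"),
    (2.7, "click"),
]

batches = list(micro_batches(events, interval_s=1))
# Each batch is processed as a whole; here we simply count its events.
counts = [(start, len(items)) for start, items in batches]
print(counts)
```

A smaller interval lowers latency but yields more, smaller batches; that trade-off is the practical difference between micro-batch and record-at-a-time stream processing.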
Characteristics of RDBMS Versus Engagement (NoSQL) Databases
RDBMS | NoSQL databases
Moderate velocity of data | High velocity of data (devices, sensors, etc.)
Data coming in from one or a few locations | Data coming in from many locations
Primarily structured data | Structured, with semi-structured and unstructured data
Complex/nested transactions | Simple transactions
Protect uptime via failover/log shipping | Protect uptime via architecture
High availability | Continuous availability
Deploy app in a central location/one server | Deploy app everywhere/many servers
Primary concern: scale reads | Scale reads and writes
Scale up for more users/data | Scale out for more users/data
Maintain data volumes by purging | High data volumes; data is retained forever
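The "scale out" row can be made concrete with a small sketch of hash-based key distribution. This is a generic illustration, not the partitioning scheme of Couchbase or any other product: the node names, the `device-N` keys, and the use of CRC32 modulo the node count are assumptions. It shows that adding servers spreads the same keys over more machines, so each one holds less.

```python
import zlib
from collections import Counter


def node_for_key(key, nodes):
    """Pick the node that owns a key by hashing the key (illustrative only)."""
    return nodes[zlib.crc32(key.encode("utf-8")) % len(nodes)]


def distribution(keys, nodes):
    """Count how many keys land on each node."""
    return Counter(node_for_key(k, nodes) for k in keys)


# Hypothetical device keys and cluster layouts.
keys = [f"device-{i}" for i in range(10_000)]
three_nodes = distribution(keys, ["node-a", "node-b", "node-c"])
six_nodes = distribution(keys, [f"node-{c}" for c in "abcdef"])

print("busiest of 3 nodes:", max(three_nodes.values()))
print("busiest of 6 nodes:", max(six_nodes.values()))
```

Simple modulo hashing moves many keys whenever the node count changes; production systems typically add a level of indirection (fixed partitions or consistent hashing) to limit that movement.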
Couchbase - Key Capabilities
Agility
• ANSI joins: a standardized join syntax widely used in relational databases.
• Eventing Service: enables server-side functions using the familiar Event-Condition-Action model.
Manageability
Automate multi-cloud deployment and management with Kubernetes.
▪ Auto-failover for common disk failures without operator intervention
▪ Compliance with enhanced certificate authentication, auditing support, and log redaction of user-sensitive data
• Ease of deployment
• Gain app insights
Performance
Large-scale N1QL deployments with hash joins, faster aggregates, and index partitioning.
▪ Orders of magnitude faster N1QL queries with hash joins and aggregate optimization
▪ End-to-end data compression from client to server
• Find answers faster
• Scale more easily
• Improve TCO
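Hash joins are named above as a source of faster N1QL queries. The sketch below shows the general hash-join technique in plain Python; it is not Couchbase's implementation, and the `customers`/`orders` records and the `hash_join` function are hypothetical. The point is that one input is hashed once (build phase) and the other is streamed past it (probe phase), avoiding a comparison of every row pair.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Inner-join two lists of dicts using a hash table on the build side."""
    # Build phase: index the (ideally smaller) input by its join key.
    table = defaultdict(list)
    for row in build_rows:
        table[row[build_key]].append(row)

    # Probe phase: look up each row of the other input in the hash table.
    joined = []
    for row in probe_rows:
        for match in table.get(row[probe_key], []):
            joined.append({**match, **row})
    return joined


# Hypothetical data for illustration.
customers = [
    {"customer_id": 1, "name": "Ana"},
    {"customer_id": 2, "name": "Ben"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "total": 25.0},
    {"order_id": 11, "customer_id": 1, "total": 40.0},
    {"order_id": 12, "customer_id": 3, "total": 5.0},
]

result = hash_join(customers, orders, "customer_id", "customer_id")
for row in result:
    print(row)
```

With n build rows and m probe rows, this does roughly n + m hash operations, whereas a naive nested-loop join does n × m comparisons.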
Couchbase Unique Value Proposition
• End-to-end compression
• GSI index partitioning
• Aggregate pushdown
• Index build time: 50%
• Index size: -70%
• FTS Scorch indexing (DP)
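Index partitioning and aggregate pushdown can be illustrated together. The sketch below is a simplified model, not how Couchbase GSI works internally: the partition count, the CRC32 hash, and the `city`/`amount` fields are assumptions. Index entries are split across partitions by hashing the key, each partition computes its own partial totals, and only those small totals are merged.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical partition count


def partition_of(key):
    """Map an index key to a partition by hashing it."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def build_partitioned_index(docs):
    """Store (city, amount) index entries in the partition that owns the city."""
    partitions = [[] for _ in range(NUM_PARTITIONS)]
    for doc in docs:
        partitions[partition_of(doc["city"])].append((doc["city"], doc["amount"]))
    return partitions


def partial_sums(partition):
    """Aggregate inside one partition, so only per-key totals leave it."""
    totals = defaultdict(float)
    for city, amount in partition:
        totals[city] += amount
    return dict(totals)


def total_by_city(partitions):
    """Merge per-partition results; each city lives in exactly one partition."""
    merged = {}
    for partition in partitions:
        merged.update(partial_sums(partition))
    return merged


# Hypothetical documents for illustration.
docs = [
    {"city": "Paris", "amount": 10.0},
    {"city": "Oslo", "amount": 7.0},
    {"city": "Paris", "amount": 5.0},
]
totals = total_by_city(build_partitioned_index(docs))
print(totals)
```

Because aggregation happens next to the index entries, the caller receives one number per city rather than every matching entry, which is the general motivation for pushing aggregates down.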
Strategy: The fourth dimension of Big Data
• Volume (Data at Rest): terabytes to exabytes of existing data to process
• Velocity (Data in Motion): streaming data, milliseconds to seconds to respond
• Variety (Data in Many Forms): structured, unstructured, text, multimedia
• Veracity* (Data in Doubt): uncertainty due to data inconsistency and incompleteness, ambiguities, latency, deception, model approximations
Framework – Layered Approach
Collection Layer
• Data collection, registration and profile
Storage & Consolidation Layer
• Storage solution for structured/unstructured data
Process Layer
• Data mining data set
• Point of Sale data set
• Reporting data set
Analytical Layer
Miscellaneous ways in which data is consumed to derive value
• Predictive analytics
• Ad-hoc analytics and optimization
• Segmentation, profiling
• Customer tastes, affinities and segments
• Recommendations and context engines
Monetization and Experience
• Personalization and experiences
• Brand marketing
• Channel and operation optimization
Payable Omniture Other
Tools and Technologies
Ingest/Propagate: Apache Flume, Apache Kafka, Apache Sqoop, Spark, Storm, Flink
Implement, Search, Analyze: Apache MapReduce, Apache Solr, Apache Blur and Lucene, Apache Hama, Apache Giraph, Cloudera Impala, Apache Storm, Apache HBase
Persist:
• File system: Apache HDFS, MapR Distributed File System
• Serialization: Apache Avro
• DBMS: Apache Cassandra, Couchbase, Apache HBase, Apache Accumulo, MongoDB
Monitor, Administer, Manage: Apache Ambari, Apache Oozie, Apache ZooKeeper, Cloudera Enterprise Manager
Advanced Analytics and Machine Learning: Apache Drill, Apache Mahout, Datameer, RHadoop, IBM SPSS, SAS, MATLAB, R, D3, Python



Editor's Notes

  • #8: Source: Couchbase website