MongoDB Europe 2016
Old Billingsgate, London
15th November
Distributed Ledgers, Blockchain + MongoDB
Bryan Reinero
MongoDB Connector For Spark
Live Demo: Introducing the Spark Connector for MongoDB
[Diagram: the Spark ecosystem stack, built up over several slides]
Distributed Data: HDFS
Distributed Resources: Spark Stand Alone, YARN, Mesos
Distributed Processing: Hadoop, Spark
Domain Specific Languages: Hive, Pig, Spark SQL, Spark Shell, Spark Streaming
[Diagram: Driver Application, Master, and two Worker Nodes each running an executor, with the Spark Connector]
[Diagram: Parallelize, then Transform, repeated across four parallel partitions]
Transformations
filter( func )
union( otherRDD )
intersection( otherRDD )
distinct( n )
map( function )
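As a minimal sketch of the transformations listed above, the following spark-shell snippet builds RDDs from small in-memory sequences. It assumes the shell's predefined `sc` (the SparkContext); the sample values and variable names are illustrative and do not come from the talk. Transformations are lazy, so none of these lines computes anything until an action is called.

```scala
// Run inside spark-shell, where `sc` is the predefined SparkContext.
val ratings = sc.parallelize(Seq(3, 5, 1, 4, 5, 2)) // illustrative data
val more    = sc.parallelize(Seq(5, 5, 0))          // illustrative data

val high     = ratings.filter(r => r >= 4) // keep elements matching a predicate
val combined = ratings.union(more)         // elements of both RDDs, duplicates kept
val shared   = ratings.intersection(more)  // elements present in both RDDs, de-duplicated
val unique   = combined.distinct()         // remove duplicate elements
val scaled   = ratings.map(r => r * 20)    // apply a function to every element
```

Note that `union` and `intersection` take another RDD rather than a function.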
[Diagram: Parallelize, Transform, Transform, then Action, repeated across four parallel partitions]
Actions
collect()
count()
first()
take( n )
reduce( function )
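The actions listed above could be exercised in spark-shell roughly as follows. This is a sketch under the assumption that `sc` is the shell's predefined SparkContext; the data and names are illustrative. Unlike a transformation, each action triggers a job and returns a value to the driver.

```scala
// Run inside spark-shell, where `sc` is the predefined SparkContext.
val nums = sc.parallelize(Seq(3, 5, 1, 4, 5, 2)) // illustrative data

val all    = nums.collect()     // Array holding every element, returned to the driver
val n      = nums.count()       // number of elements, as a Long
val head   = nums.first()       // first element
val firstN = nums.take(3)       // first three elements, as an Array
val total  = nums.reduce(_ + _) // combine all elements with an associative function
```

`collect()` pulls the whole dataset into driver memory, so `take( n )` is usually the safer way to peek at a large RDD.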
[Diagram: Parallelize, Transform, Transform, Action, then Result, repeated across four parallel partitions]
Lineage
[Diagram: the Parallelize, Transform, Transform, Action chain behind each Result, repeated across four parallel partitions]
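One way to look at lineage from the shell is `toDebugString`, which describes the chain of parent RDDs that an RDD was derived from. The sketch below assumes spark-shell with its predefined `sc`; the pipeline itself is illustrative and not taken from the talk.

```scala
// Run inside spark-shell, where `sc` is the predefined SparkContext.
val lineageDemo = sc.parallelize(1 to 100)
  .map(_ * 2)         // first transformation
  .filter(_ % 3 == 0) // second transformation

// Prints the recorded chain of parent RDDs. Spark can replay this chain
// to recompute a partition instead of storing intermediate results.
println(lineageDemo.toDebugString)
```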
Using the Connector
https://github.com/mongodb/mongo-spark
http://spark.apache.org/docs/latest/
{
"_id" : ObjectId("578be1fe1fe699f2deb80807"),
"user_id" : 196,
"movie_id" : 242,
"rating" : 3,
"timestamp" : 881250949
}
./bin/spark-shell \
  --conf "spark.mongodb.input.uri=mongodb://127.0.0.1/movies.movie_ratings" \
  --conf "spark.mongodb.output.uri=mongodb://127.0.0.1/movies.user_recommendations" \
  --packages org.mongodb.spark:mongo-spark-connector_2.10:1.0.0
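For a standalone application rather than the shell, the same two settings could be placed on a `SparkConf`. This is a sketch, not something shown in the talk: the application name and the `local[*]` master are placeholders, and the connector package would still need to be on the classpath.

```scala
import org.apache.spark.SparkConf

// Same configuration keys as the --conf flags passed to spark-shell.
val conf = new SparkConf()
  .setAppName("movie-ratings") // hypothetical application name
  .setMaster("local[*]")       // assumption: run locally on all cores
  .set("spark.mongodb.input.uri", "mongodb://127.0.0.1/movies.movie_ratings")
  .set("spark.mongodb.output.uri", "mongodb://127.0.0.1/movies.user_recommendations")
```

A `SparkContext` built from this `conf` carries the input and output URIs with it.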
import com.mongodb.spark._
import com.mongodb.spark.rdd.MongoRDD
import org.bson.Document
val rdd = sc.loadFromMongoDB()
for( doc <- rdd.take( 10 ) ) println( doc )
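Each element of the loaded RDD is an `org.bson.Document`, so ordinary RDD operations apply to it. The sketch below computes an average rating per movie. To stay runnable without a database, it substitutes a few hand-written documents in the shape shown earlier for the loaded RDD; apart from the first document, the values are made up for illustration.

```scala
import org.bson.Document

// Stand-in for the RDD loaded from MongoDB: hand-written sample documents.
val sample = sc.parallelize(Seq(
  Document.parse("""{ "user_id": 196, "movie_id": 242, "rating": 3 }"""),
  Document.parse("""{ "user_id": 186, "movie_id": 242, "rating": 5 }"""),
  Document.parse("""{ "user_id": 22, "movie_id": 377, "rating": 1 }""")
))

val avgByMovie = sample
  // key by movie_id, carrying (rating sum, count) as the value
  .map(doc => (doc.getInteger("movie_id").intValue, (doc.getInteger("rating").intValue, 1)))
  .reduceByKey { case ((s1, c1), (s2, c2)) => (s1 + s2, c1 + c2) }
  .mapValues { case (sum, count) => sum.toDouble / count }

avgByMovie.collect().foreach(println)
```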
Read Config Write Config
Aggregation Filters
$match | $project | $group
[Diagram: JSON documents]
val aggRdd =
  rdd.withPipeline(
    Seq(
      // Triple quotes let the JSON keep its own double quotes.
      Document.parse(
        """{ $match: { Country: "USA" } }"""
      )
    )
  )
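A pipeline can hold more than one stage. As a sketch against the movie ratings documents shown earlier, the following builds a `$match` stage followed by a `$project` stage; the threshold and the projected fields are illustrative choices, not taken from the talk.

```scala
import org.bson.Document

// Two aggregation stages, applied in order on the MongoDB side.
val pipeline = Seq(
  Document.parse("""{ "$match": { "rating": { "$gte": 4 } } }"""),
  Document.parse("""{ "$project": { "movie_id": 1, "rating": 1 } }""")
)
```

The resulting `Seq` would be passed to `withPipeline` in the same way as the single-stage example, so that filtering happens in MongoDB before the documents reach Spark.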
Spark SQL + DataFrames
RDD + Schema = DataFrame
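The "RDD + Schema" idea can be sketched with plain Spark APIs. The example assumes a Spark 1.6-style spark-shell, where both `sc` and `sqlContext` are predefined; the `Rating` case class and its rows are illustrative and mirror the document shape shown earlier.

```scala
// The case class plays the role of the schema.
case class Rating(user_id: Int, movie_id: Int, rating: Int)

// The RDD supplies the rows (illustrative values).
val ratingRdd = sc.parallelize(Seq(
  Rating(196, 242, 3),
  Rating(186, 242, 5),
  Rating(22, 377, 1)
))

// RDD + schema = DataFrame
val df = sqlContext.createDataFrame(ratingRdd)
df.printSchema()

// Once registered as a table, the data can be queried with SQL.
df.registerTempTable("ratings")
val avgRatings = sqlContext.sql(
  "SELECT movie_id, AVG(rating) AS avg_rating FROM ratings GROUP BY movie_id")
avgRatings.show()
```

On Spark 2.x and later, the shell exposes a `spark` session instead, and `createOrReplaceTempView` replaces `registerTempTable`.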
[Diagram: JSON documents]
$sample
Data Locality
mongos
Courses and Resources
https://university.mongodb.com/courses/M233/about
THANKS!
@blimpyacht
