Analytics and Machine Learning with Spark and MongoDB


The document discusses various aspects of utilizing Spark and MongoDB for machine learning, particularly focusing on clustering algorithms and data processing. It provides a series of configurations for Spark shell commands with MongoDB for accessing and managing precipitation and weather station data. Additionally, it references resources and examples related to Spark and MongoDB integrations.

#MDBlocal
Bryan Reinero, Product Manager
Analytics and Machine Learning with Spark and MongoDB

Parallelism
Machine Learning
Stream Processing
Native Aggregation

Distributed Resources
Spark Standalone, YARN, Mesos, HDFS

Distributed Processing
YARN, Spark, Mesos, HDFS, Spark Standalone, Hadoop

Domain Specific Languages
YARN, Spark, Mesos, Hive, Pig, Spark, Spark Shell, Spark Streaming, Spark Standalone, Hadoop, HDFS


https://github.com/mongodb/mongo-spark

Data Refuge
https://www.datarefuge.org/dataset/15-minute-precipitation-data-dsi-3260

Precipitation Data
{ "_id" : "01006303201304010015QPCP", "value" : 0, "station" : "006303", "time" : "0015", "state" : "01", "year" : 2013, "month" : 4, "day" : 1 }
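The `_id` of the sample precipitation document looks like a concatenation of its other fields. The following is a minimal Python sketch that splits it back apart. The fixed-width layout (state, station, year, month, day, time, trailing code) is inferred from this single sample and is an assumption, as are the names `ReadingKey` and `parse_reading_id`.

```python
from typing import NamedTuple


class ReadingKey(NamedTuple):
    """Fields that appear to be packed into a precipitation _id."""
    state: str
    station: str
    year: int
    month: int
    day: int
    time: str
    suffix: str  # trailing code, "QPCP" in the sample document


def parse_reading_id(reading_id: str) -> ReadingKey:
    """Split an _id using the assumed layout:
    state(2) station(6) year(4) month(2) day(2) time(4) suffix(rest).
    """
    if len(reading_id) < 20:
        raise ValueError(f"_id too short for the assumed layout: {reading_id!r}")
    return ReadingKey(
        state=reading_id[0:2],
        station=reading_id[2:8],
        year=int(reading_id[8:12]),
        month=int(reading_id[12:14]),
        day=int(reading_id[14:16]),
        time=reading_id[16:20],
        suffix=reading_id[20:],
    )


key = parse_reading_id("01006303201304010015QPCP")
print(key)
```

For the sample document, the parsed values agree with the separate `state`, `station`, `year`, `month`, `day`, and `time` fields.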

Weather Station Data
{ "_id" : NumberLong(10957), "NCDCSTN_ID" : "20000282", "NWSLI_ID" : "BOZA1", "GHCND_ID" : "USC00010957",
  "NAME_COOP_SHORT" : "BOAZ", "FIPS_COUNTRY_NAME" : "UNITED STATES", "STATE_PROV" : "AL", "COUNTY" : "MARSHALL",
  "NWS_CLIM_DIV" : "02", "NWS_CLIM_DIV_NAME" : "APPALACHIAN MOUNTAINS",
  "LON_DEC" : -86.1633, "LAT_DEC" : 34.2008, "LAT_LON_PRECISION" : "DDdddd",
  "ELEV_GROUND" : "1070", "ELEV_GROUND_UNIT" : "FEET", "UTC_OFFSET" : -6,
  "NWS_REGION" : "SOUTHERN", "NWS_WFO" : "HUN", "COOP_SOD" : "Y", "COOP_HPD" : "Y" }
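The station document stores its elevation as a string with a separate unit field, so it needs some cleanup before it can serve as numeric input. The slides do not say which fields feed the analysis. The sketch below assumes longitude, latitude, and elevation in meters, and the function name `station_features` is hypothetical.

```python
from typing import List

FEET_TO_METERS = 0.3048  # exact by definition of the international foot


def station_features(doc: dict) -> List[float]:
    """Build a numeric [longitude, latitude, elevation_m] vector from a station document."""
    elevation = float(doc["ELEV_GROUND"])  # stored as a string in the sample
    if doc.get("ELEV_GROUND_UNIT") == "FEET":
        elevation *= FEET_TO_METERS
    return [float(doc["LON_DEC"]), float(doc["LAT_DEC"]), elevation]


# Subset of the sample station document (BOAZ, AL).
boaz = {
    "LON_DEC": -86.1633,
    "LAT_DEC": 34.2008,
    "ELEV_GROUND": "1070",
    "ELEV_GROUND_UNIT": "FEET",
}
features = station_features(boaz)
print(features)
```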

./spark-2.2.0/bin/spark-shell \
  --conf "spark.mongodb.input.uri=mongodb://10.11.12.13/kmeans.log" \
  --conf "spark.mongodb.output.uri=mongodb://10.11.12.13/kmeans.cluster" \
  --packages org.mongodb.spark:mongo-spark-connector_2.11:2.0.0
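Each URI in the command above appears to follow the `mongodb://host/database.collection` form, so the input is the `log` collection and the output is the `cluster` collection, both in the `kmeans` database. A small Python sketch makes the split explicit. The helper name `split_mongo_uri` is hypothetical, and the sketch ignores credentials and query options.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_mongo_uri(uri: str) -> Tuple[str, str, str]:
    """Return (host, database, collection) from mongodb://host/database.collection."""
    parsed = urlparse(uri)
    if parsed.scheme != "mongodb":
        raise ValueError(f"not a mongodb:// URI: {uri!r}")
    namespace = parsed.path.lstrip("/")
    # The database name ends at the first dot; the rest is the collection name.
    database, _, collection = namespace.partition(".")
    if not database or not collection:
        raise ValueError(f"expected database.collection in {uri!r}")
    return parsed.netloc, database, collection


source = split_mongo_uri("mongodb://10.11.12.13/kmeans.log")
target = split_mongo_uri("mongodb://10.11.12.13/kmeans.cluster")
print(source, target)
```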

./spark-2.1.0-bin-hadoop2.7/bin/spark-shell \
  --conf "spark.mongodb.input.uri=mongodb://10.11.12.13/kmeans.log" \
  --conf "spark.mongodb.output.uri=mongodb://10.11.12.13/kmeans.cluster" \
  --packages org.mongodb.spark:mongo-spark-connector_2.11:2.0.0

./spark-2.1.0-bin-hadoop2.7/bin/spark-shell \
  --conf "spark.mongodb.input.uri=mongodb://10.0.0.10/kmeans.log" \
  --conf "spark.mongodb.output.uri=mongodb://10.0.0.10/kmeans.cluster" \
  --packages org.mongodb.spark:mongo-spark-connector_2.11:2.0.0
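The database name `kmeans` suggests that the session runs k-means clustering, reading points from one collection and writing results to another. The slides do not show that code. The following is a plain-Python sketch of the algorithm itself (Lloyd's iteration), not the Spark MLlib API, and the sample points and starting centroids are made up for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def closest(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid nearest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points: Sequence[Point], initial: Sequence[Point],
           iterations: int = 10) -> Tuple[List[Point], List[int]]:
    """Lloyd's algorithm with caller-supplied initial centroids."""
    centroids = list(initial)
    for _ in range(iterations):
        assignments = [closest(p, centroids) for p in points]
        for i in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:  # keep the old centroid if its cluster is empty
                centroids[i] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    # Final assignment against the updated centroids.
    return centroids, [closest(p, centroids) for p in points]


# Hypothetical 2-D points forming two obvious groups.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centroids, labels = kmeans(points, initial=[(0.0, 0.0), (10.0, 10.0)])
print(centroids, labels)
```

Supplying the initial centroids keeps the sketch deterministic. Library implementations usually choose them with a randomized seeding step instead.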

Resources
MongoDB University Course M233
Spark Connector Examples in Repo
