mongoDB Project
Relational databases & Document-Oriented databases
Athens University of Economics and Business
Dept. of Management Science and Technology
Prof. Damianos Chatziantoniou
Lamprini Koutsokera | lkoutsokera@gmail.com
Stratos Gounidellis | stratos.gounidellis@gmail.com
BDSMasters
SQL Server vs. mongoDB
Property                | Microsoft SQL Server                                                        | MongoDB
Description             | Microsoft’s relational DBMS                                                 | One of the most popular document stores
Database model          | Relational DBMS                                                             | Document store
Implementation language | C++                                                                         | C++
Data scheme             | yes                                                                         | schema-free
Triggers                | yes                                                                         | no
Replication methods     | yes, depending on the SQL Server edition                                    | Master-slave replication
Partitioning methods    | tables can be distributed across several files, sharding through federation | Sharding
References: [1]
From theory to practice
Tools
Required installations
MongoDB installation
1. Determine which MongoDB build you need.
2. Download MongoDB for Windows.
3. Install MongoDB Community Edition.
4. Set up the MongoDB environment.
Required Python packages installation
References: [2]
From software configuration to coding
Python coding [1] – table parsing
Part 1 - Queries and the Aggregation Pipeline [1]
[Figure: the prep.js file is loaded into the mongo shell with the load() function]
References: [3]
Part 1 - Queries and the Aggregation Pipeline [2]
Query 1: How many students in your database are currently taking at least 1 class (i.e. have a class with a course_status of “In Progress”)?
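Query 1 could be expressed from Python roughly as follows. This is a sketch rather than the authors' code: the slides do not show the student schema, so the `courses` array holding the `course_status` field is an assumed name, and `collection` stands for a PyMongo collection.

```python
# Sketch for Query 1. Assumed schema: each student document holds a
# "courses" array whose items carry a "course_status" string.
IN_PROGRESS = "In Progress"


def in_progress_pipeline():
    """Pipeline counting students with at least one class in progress."""
    return [
        # A dotted path into an array matches when ANY element matches.
        {"$match": {"courses.course_status": IN_PROGRESS}},
        # $count emits one document, or nothing when no student matches.
        {"$count": "students_in_progress"},
    ]


def count_students_in_progress(collection):
    """`collection` is a pymongo Collection (anything with .aggregate works)."""
    result = list(collection.aggregate(in_progress_pipeline()))
    return result[0]["students_in_progress"] if result else 0
```

The `$count` stage needs MongoDB 3.4 or later.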
Query 2: Produce a grouping of the documents that contains the name of each home city and the number of students enrolled from that home city.
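A possible shape for Query 2, given as a sketch: it groups on the `home_city` field that appears in the MapReduce slides, while the output field name `students` and the helper names are illustrative choices.

```python
def students_per_city_pipeline():
    """Pipeline producing one document per home city with its student count."""
    return [
        # One output document per distinct home_city; $sum: 1 counts students.
        {"$group": {"_id": "$home_city", "students": {"$sum": 1}}},
        # Largest cities first, ties broken alphabetically.
        {"$sort": {"students": -1, "_id": 1}},
    ]


def students_per_city(collection):
    """Return {city: count}; `collection` is a pymongo Collection."""
    return {
        doc["_id"]: doc["students"]
        for doc in collection.aggregate(students_per_city_pipeline())
    }
```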
Part 1 - Queries and the Aggregation Pipeline [3]
Query 3: Which hobby or hobbies are the most popular?
[Code screenshots: the most popular hobby; the top 5 most popular hobbies]
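Both variants of Query 3 (the most popular hobby, and the top 5) can be sketched with one pipeline. The `hobbies` array field is an assumed name, since the schema is not shown, and the tie handling in `most_popular_hobbies` is an illustrative choice made in Python rather than in the pipeline.

```python
def hobby_counts_pipeline(limit=None):
    """Pipeline ranking hobbies by how many students list them."""
    pipeline = [
        {"$unwind": "$hobbies"},       # one document per (student, hobby)
        {"$sortByCount": "$hobbies"},  # emits {_id: hobby, count: n}, descending
    ]
    if limit is not None:
        pipeline.append({"$limit": limit})
    return pipeline


def top_hobbies(collection, limit=5):
    """Return [(hobby, count), ...] for the `limit` most frequent hobbies."""
    return [
        (doc["_id"], doc["count"])
        for doc in collection.aggregate(hobby_counts_pipeline(limit))
    ]


def most_popular_hobbies(collection):
    """Return every hobby tied for first place."""
    ranked = top_hobbies(collection, limit=None)
    if not ranked:
        return []
    best = ranked[0][1]
    return [hobby for hobby, count in ranked if count == best]
```

`$sortByCount` is available from MongoDB 3.4; on older servers a `$group` followed by a `$sort` gives the same ranking.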
Part 1 - Queries and the Aggregation Pipeline [4]
Query 4: What is the GPA (ignoring dropped classes and in progress classes) of the best student?
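One way to approach Query 4 is to average each student's completed grades and keep the highest average. In this sketch the `courses` array, its `grade` field and the status string "Complete" are assumptions, because the slides do not show the exact schema or status values.

```python
COMPLETE = "Complete"  # assumed status string for a finished class


def best_gpa_pipeline():
    """Pipeline returning the student with the highest GPA."""
    return [
        {"$unwind": "$courses"},                          # one document per class taken
        {"$match": {"courses.course_status": COMPLETE}},  # drop dropped / in-progress classes
        {"$group": {"_id": "$_id", "gpa": {"$avg": "$courses.grade"}}},
        {"$sort": {"gpa": -1}},
        {"$limit": 1},
    ]


def best_gpa(collection):
    """Return the top GPA, or None when no class has been completed."""
    top = list(collection.aggregate(best_gpa_pipeline()))
    return top[0]["gpa"] if top else None
```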
Query 5: Which student has the largest number of grades of 10?
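Query 5 follows the same unwind-and-group pattern. The sketch below assumes grades are stored as numbers in a `grade` field inside an assumed `courses` array; the output field `tens` is an illustrative name.

```python
def most_tens_pipeline():
    """Pipeline returning the student with the most grades equal to 10."""
    return [
        {"$unwind": "$courses"},
        {"$match": {"courses.grade": 10}},                 # keep only the perfect grades
        {"$group": {"_id": "$_id", "tens": {"$sum": 1}}},  # count them per student
        {"$sort": {"tens": -1}},
        {"$limit": 1},
    ]


def student_with_most_tens(collection):
    """Return (student_id, number_of_tens), or None if nobody scored a 10."""
    top = list(collection.aggregate(most_tens_pipeline()))
    return (top[0]["_id"], top[0]["tens"]) if top else None
```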
Part 1 - Queries and the Aggregation Pipeline [5]
Query 6: Which class has the highest average GPA?
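For Query 6 the grouping key becomes the class instead of the student. As a sketch under the same assumptions (a `courses` array with `course_code`, `course_title`, `course_status` and `grade`, and "Complete" as the status of a finished class):

```python
COMPLETE = "Complete"  # assumed status string for a finished class


def best_class_pipeline():
    """Pipeline returning the class with the highest average grade."""
    return [
        {"$unwind": "$courses"},
        {"$match": {"courses.course_status": COMPLETE}},
        {"$group": {
            "_id": "$courses.course_code",
            "course_title": {"$first": "$courses.course_title"},
            "avg_grade": {"$avg": "$courses.grade"},
        }},
        {"$sort": {"avg_grade": -1}},
        {"$limit": 1},
    ]


def best_class(collection):
    """Return the winning class document, or None for an empty result."""
    top = list(collection.aggregate(best_class_pipeline()))
    return top[0] if top else None
```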
Query 7: Which class has been dropped the most times?
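Query 7 can be sketched by counting dropped enrolments per class. The status string "Dropped" and the `courses.course_code` path are assumptions about the data, and `drops` is an illustrative output name.

```python
DROPPED = "Dropped"  # assumed status string for a dropped class


def most_dropped_pipeline():
    """Pipeline returning the class that was dropped most often."""
    return [
        {"$unwind": "$courses"},
        {"$match": {"courses.course_status": DROPPED}},
        {"$group": {"_id": "$courses.course_code", "drops": {"$sum": 1}}},
        {"$sort": {"drops": -1}},
        {"$limit": 1},
    ]


def most_dropped_class(collection):
    """Return (course_code, drops), or None if no class was ever dropped."""
    top = list(collection.aggregate(most_dropped_pipeline()))
    return (top[0]["_id"], top[0]["drops"]) if top else None
```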
Part 1 - Queries and the Aggregation Pipeline [6]
Query 8: Produce a count of classes that have been COMPLETED by class type. The class type is found by taking the first letter of the course code so that M102 has type M.
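The class-type rule of Query 8 maps onto a substring expression used as the grouping key. A minimal sketch, assuming a `courses` array with `course_code` and `course_status` fields and "Complete" as the completed status; `class_type` is a plain-Python mirror of the same rule, included only to make the rule explicit.

```python
COMPLETE = "Complete"  # assumed status string for a finished class


def class_type(course_code):
    """First letter of the course code, e.g. 'M102' has type 'M'."""
    return course_code[:1]


def completed_by_type_pipeline():
    """Pipeline counting completed classes per class type."""
    return [
        {"$unwind": "$courses"},
        {"$match": {"courses.course_status": COMPLETE}},
        {"$group": {
            # $substrCP [string, start, length] takes the first character.
            "_id": {"$substrCP": ["$courses.course_code", 0, 1]},
            "completed": {"$sum": 1},
        }},
        {"$sort": {"_id": 1}},
    ]


def completed_by_type(collection):
    """Return {class_type: completed_count}."""
    return {
        doc["_id"]: doc["completed"]
        for doc in collection.aggregate(completed_by_type_pipeline())
    }
```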
Part 1 - Queries and the Aggregation Pipeline [7]
Query 9: Produce a transformation of the documents so that the documents now have an additional boolean field called “hobbyist” that is true when the student has more than 3 hobbies and false otherwise.
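The `hobbyist` flag of Query 9 fits a single `$addFields` stage. The sketch assumes the hobbies live in an array field named `hobbies`; `is_hobbyist` is a plain-Python mirror of the rule, not part of the pipeline.

```python
def hobbyist_pipeline():
    """Pipeline adding a boolean `hobbyist` field to every student document."""
    return [
        {"$addFields": {
            "hobbyist": {
                # $ifNull guards against students without a hobbies array,
                # because $size raises an error on a missing field.
                "$gt": [{"$size": {"$ifNull": ["$hobbies", []]}}, 3],
            },
        }},
    ]


def is_hobbyist(student):
    """Same rule in plain Python: more than 3 hobbies."""
    return len(student.get("hobbies") or []) > 3
```

`$addFields` requires MongoDB 3.4 or later; it keeps all existing fields, which is what the query asks for.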
Part 1 - Queries and the Aggregation Pipeline [8]
Query 10: Produce a transformation of the documents so that the documents now have an additional field that contains the number of classes that the student has completed.
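Query 10 can be sketched by filtering the class array and taking its size. The field names `courses`, `course_status` and the new `completed_classes`, as well as the status string "Complete", are assumptions for illustration.

```python
COMPLETE = "Complete"  # assumed status string for a finished class


def completed_count_pipeline():
    """Pipeline adding the number of completed classes to each student."""
    completed = {
        "$filter": {
            "input": "$courses",
            "as": "c",
            "cond": {"$eq": ["$$c.course_status", COMPLETE]},
        }
    }
    return [{"$addFields": {"completed_classes": {"$size": completed}}}]


def completed_count(student):
    """Same computation in plain Python, handy for spot checks."""
    return sum(
        1 for c in student.get("courses", [])
        if c.get("course_status") == COMPLETE
    )
```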
Part 1 - Queries and the Aggregation Pipeline [9]
Query 11: Produce a transformation of the documents in the collection so that they look like the following output. The GPA is the average grade of all the completed classes. The other two computed fields are the number of classes currently in progress and the number of classes dropped. No other fields should be present.
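Since the expected output is not reproduced in this transcript, the following is only a guess at the shape of Query 11. The output names `GPA`, `classesInProgress` and `droppedClasses`, the status strings, and the `courses` array layout are all assumptions.

```python
IN_PROGRESS = "In Progress"
COMPLETE = "Complete"  # assumed status string
DROPPED = "Dropped"    # assumed status string


def _with_status(status):
    """$filter expression keeping the classes that have the given status."""
    return {
        "$filter": {
            "input": "$courses",
            "as": "c",
            "cond": {"$eq": ["$$c.course_status", status]},
        }
    }


def summary_pipeline():
    """Pipeline projecting each student down to three computed fields."""
    completed_grades = {
        "$map": {"input": _with_status(COMPLETE), "as": "c", "in": "$$c.grade"}
    }
    return [
        {"$project": {
            # $project drops every field not listed here; _id is kept by
            # default and can be removed by adding "_id": 0.
            "GPA": {"$avg": completed_grades},
            "classesInProgress": {"$size": _with_status(IN_PROGRESS)},
            "droppedClasses": {"$size": _with_status(DROPPED)},
        }},
    ]
```

The sketch assumes every student document has a `courses` array; `$size` fails on documents where it is missing.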
Part 1 - Queries and the Aggregation Pipeline [10]
Query 12: Produce a NEW collection (HINT: Use $out in the aggregation pipeline) so that the new documents in this correspond to the classes on offer. The structure of the documents should be like the following output.
The _id field should be the course code.
The course_title is what it was before. The numberOfDropouts is the number of students who dropped out. The numberOfTimesCompleted is the number of students that completed this class. The currentlyRegistered array is an array of ObjectIDs corresponding to the students who are currently taking the class. Finally, for the students that completed the class, the maxGrade, minGrade and avgGrade are the summary statistics for that class.
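A sketch of how Query 12 might be assembled with `$group` and `$out`. The output field names come from the query text; the input layout (`courses` array with `course_code`, `course_title`, `course_status`, `grade`), the status strings, and the target collection name `classes` are assumptions.

```python
IN_PROGRESS = "In Progress"
COMPLETE = "Complete"  # assumed status string
DROPPED = "Dropped"    # assumed status string


def _is(status):
    """Expression that is true when the unwound class has this status."""
    return {"$eq": ["$courses.course_status", status]}


def classes_pipeline(target="classes"):
    """Pipeline writing one document per class into `target`."""
    # Grades count only for completed classes; None is ignored by
    # $max, $min and $avg.
    completed_grade = {"$cond": [_is(COMPLETE), "$courses.grade", None]}
    return [
        {"$unwind": "$courses"},
        {"$group": {
            "_id": "$courses.course_code",
            "course_title": {"$first": "$courses.course_title"},
            "numberOfDropouts": {"$sum": {"$cond": [_is(DROPPED), 1, 0]}},
            "numberOfTimesCompleted": {"$sum": {"$cond": [_is(COMPLETE), 1, 0]}},
            # Push the student _id for in-progress classes, None otherwise.
            "currentlyRegistered": {"$push": {"$cond": [_is(IN_PROGRESS), "$_id", None]}},
            "maxGrade": {"$max": completed_grade},
            "minGrade": {"$min": completed_grade},
            "avgGrade": {"$avg": completed_grade},
        }},
        # Remove the None placeholders pushed for non-registered students.
        {"$addFields": {"currentlyRegistered": {"$filter": {
            "input": "$currentlyRegistered",
            "as": "s",
            "cond": {"$ne": ["$$s", None]},
        }}}},
        {"$out": target},  # $out must be the last stage
    ]


def build_classes(collection, target="classes"):
    """Run the pipeline; $out replaces `target` and returns no documents."""
    list(collection.aggregate(classes_pipeline(target)))
    return target
```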
Part 1 - Queries and the Aggregation Pipeline [11]
Part 2 - Python & MongoDB [1]
python_mongodb.py: Implement simple operations on a MongoDB database.
Connect to a MongoDB database and collection.
Connect to a MongoDB database and collection and insert a record.
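The two operations above might look like this with PyMongo. This is a sketch, not the contents of python_mongodb.py: the connection URI, the database name `school` and the collection name `students` are placeholder assumptions.

```python
def connect(uri="mongodb://localhost:27017", db_name="school", coll_name="students"):
    """Return a collection handle. Needs pymongo and a reachable server."""
    # Imported lazily so insert_record below has no hard pymongo dependency.
    from pymongo import MongoClient

    return MongoClient(uri)[db_name][coll_name]


def insert_record(collection, record):
    """Insert one document and return the _id MongoDB assigned to it."""
    return collection.insert_one(record).inserted_id
```

A typical call would be `insert_record(connect(), {"home_city": "Athina"})`, which requires a running MongoDB instance.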
Part 2 - Python & MongoDB [2]
Connect to a MongoDB database and collection and insert multiple records.
Connect to a MongoDB database and collection and print its content.
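Inserting several records and printing a collection's content could be sketched as below. The helper names are illustrative, and `collection` is assumed to be a PyMongo collection obtained elsewhere.

```python
def insert_records(collection, records):
    """Insert several documents at once and return their generated _ids."""
    return collection.insert_many(list(records)).inserted_ids


def print_collection(collection):
    """Print every document in the collection and return how many were printed."""
    count = 0
    for doc in collection.find():  # find() without a filter returns everything
        print(doc)
        count += 1
    return count
```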
Part 2 - Python & MongoDB [3]
Connect to a MongoDB database and collection and update its documents.
Connect to a MongoDB database and collection and print a specific field.
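Updating documents and reading back a single field might be written as follows. This is a hedged sketch with illustrative helper names; it handles top-level fields only.

```python
def set_field(collection, query, field, value):
    """Set `field` to `value` on every document matching `query`."""
    result = collection.update_many(query, {"$set": {field: value}})
    return result.modified_count


def field_values(collection, field):
    """Return the value of one top-level field for every document."""
    # The projection keeps only `field`; _id must be excluded explicitly.
    cursor = collection.find({}, {field: 1, "_id": 0})
    return [doc.get(field) for doc in cursor]
```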
Part 2 - Python & MongoDB [4]
Connect to a MongoDB database and collection and convert the collection to a dataframe.
Connect to a MongoDB database and collection and import data from a dataframe.
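The round trip between a collection and a dataframe could be sketched like this, assuming pandas is the dataframe library in use (the slides do not name it). Helper names are illustrative.

```python
def collection_to_dataframe(collection):
    """Load every document (without _id) into a pandas DataFrame."""
    import pandas as pd  # imported lazily; only this helper needs pandas

    return pd.DataFrame(list(collection.find({}, {"_id": 0})))


def dataframe_to_collection(df, collection):
    """Insert each dataframe row as one document; return the row count."""
    # Depending on the pandas version, values may be NumPy scalars that
    # need converting to plain Python types before PyMongo can encode them.
    records = df.to_dict("records")
    if not records:
        return 0  # insert_many rejects an empty list
    collection.insert_many(records)
    return len(records)
```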
Part 2 - Python & MongoDB [5]
1. Clone this repository:
2. Install the required Python packages.
3. Run python_mongodb.py to implement basic operations (insert_one, insert_many, update, delete_one, delete_many, etc.) on MongoDB.
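The delete operations named in step 3 are not illustrated elsewhere in the deck. A minimal sketch with PyMongo's `delete_one` and `delete_many`, using illustrative helper names:

```python
def delete_first(collection, query):
    """Delete at most one document matching `query`; return 0 or 1."""
    return collection.delete_one(query).deleted_count


def delete_all(collection, query):
    """Delete every document matching `query`; return how many were removed."""
    return collection.delete_many(query).deleted_count
```

Passing an empty filter `{}` to `delete_all` empties the collection, so the filter deserves a second look before running it.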
Part 2 - Python & MongoDB [6]
Output
Part 3 - MapReduce (Word Count) [1]
MapReduce 1: Write a map reduce job on the students collection similar to the classic word count example. More specifically, implement a word count using the course title field as the text. In addition, exclude stop words from this list. You should find/write your own list of stop words. (Stop words are the common words in the English language like “a”, “in”, “to”, “the”, etc.)
References: [4,5]
Part 3 - MapReduce (Word Count) [2]
Mapper (map, map, map, ...)
Input: course_title
[Predictive Modeling]
[MongoDB Operations]
[Hadoop and MapReduce]
[Data Mining]
[MongoDB Operations]
[Data Mining]
. . .
Output: key, value
[word1, 1]
[word2, 1]
[word3, 1]
[word2, 1]
[word4, 1]
. . .
[wordx, 1]
Part 3 - MapReduce (Word Count) [3]
Reducer (reduce, reduce, reduce, ...)
Input: key, value
[word1, {1, 1, 1, …}]
[word2, {1, 1, 1, …}]
[word3, {1, 1, 1, …}]
. . .
Output: key, value
[word1, count1]
[word2, count2]
[word3, count3]
. . .
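The map and reduce steps of the word count can be simulated in memory to see the data flow. Note that this is a plain-Python illustration, not MongoDB's mapReduce command (which takes JavaScript functions), and the stop-word list is a small made-up sample.

```python
import re
from collections import defaultdict

# Illustrative stop-word list; a real job would use a fuller one.
STOP_WORDS = {"a", "an", "and", "in", "of", "the", "to", "for", "with"}


def map_title(title):
    """Mapper: emit one (word, 1) pair per non-stop word in a course title."""
    for word in re.findall(r"[a-z0-9]+", title.lower()):
        if word not in STOP_WORDS:
            yield word, 1


def group_by_key(pairs):
    """Shuffle step: collect all values emitted for the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(word, ones):
    """Reducer: collapse [1, 1, 1, ...] into a single count."""
    return sum(ones)


def word_count(titles):
    """Run map, group and reduce over an iterable of course titles."""
    pairs = (pair for title in titles for pair in map_title(title))
    return {w: reduce_counts(w, vs) for w, vs in group_by_key(pairs).items()}


titles = ["Predictive Modeling", "MongoDB Operations", "Hadoop and MapReduce",
          "Data Mining", "MongoDB Operations", "Data Mining"]
counts = word_count(titles)
print(counts["mongodb"], counts["data"], "and" in counts)  # 2 2 False
```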
Part 3 - MapReduce (Average grade) [1]
MapReduce 2: Write a map reduce job on the students collection whose goal is to compute average GPA scores for completed courses by home city and by course type (M, B, P, etc.).
References: [4,5]
Part 3 - MapReduce (Average grade) [2]
Mapper (map, map, map, ...)
Input: (course_code, home_city), grade
[(S201, Athina), 10]
[(M101, Mytilini), 9]
[(S202, Kavala), 3]
[(D102, Chania), 5]
[(P103, Athina), 6]
[(P101, Arta), 10]
. . .
Output: key, value
[{home_city: Athina, course_type: M}, {count: 1, sum: 8}]
[{home_city: Chania, course_type: P}, {count: 1, sum: 6}]
[{home_city: Thyra, course_type: V}, {count: 1, sum: 3}]
[{home_city: Arta, course_type: M}, {count: 1, sum: 10}]
. . .
Part 3 - MapReduce (Average grade) [3]
Reducer (reduce, reduce, reduce, ...)
Input: key, value
[{home_city: Athina, course_type: M}, [{count: 1, sum: 8}, {count: 1, sum: 3}, {count: 1, sum: 7}]]
[{home_city: Chania, course_type: P}, [{count: 1, sum: 6}, {count: 1, sum: 5}, {count: 1, sum: 4}]]
[{home_city: Thyra, course_type: V}, [{count: 1, sum: 3}, {count: 1, sum: 7}, {count: 1, sum: 10}]]
. . .
Output: key, value
[{home_city: Athina, course_type: M}, {count: 22, sum: 80}]
[{home_city: Chania, course_type: P}, {count: 8, sum: 100}]
[{home_city: Thyra, course_type: V}, {count: 47, sum: 300}]
[{home_city: Arta, course_type: M}, {count: 19, sum: 150}]
. . .
Part 3 - MapReduce (Average grade) [4]
Finalizer (finalize, finalize, finalize, ...)
Input: key, value
[{home_city: Athina, course_type: M}, {count: 22, sum: 80}]
[{home_city: Chania, course_type: P}, {count: 8, sum: 100}]
[{home_city: Thyra, course_type: V}, {count: 47, sum: 300}]
[{home_city: Arta, course_type: M}, {count: 19, sum: 150}]
. . .
Output: key, value
[{home_city: Athina, course_type: M}, {avg: 8.5012}]
[{home_city: Chania, course_type: P}, {avg: 5.5314}]
[{home_city: Thyra, course_type: V}, {avg: 7.7713}]
[{home_city: Arta, course_type: M}, {avg: 6.4344}]
. . .
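The map, reduce and finalize stages for the average grade can be traced with a small in-memory simulation. This is a plain-Python illustration rather than MongoDB's mapReduce command; the `courses` array layout and the status string "Complete" are assumptions, and the sample students are invented for the example.

```python
from collections import defaultdict

COMPLETE = "Complete"  # assumed status string for a finished class


def map_student(student):
    """Mapper: emit ((home_city, course_type), {count, sum}) per completed class."""
    for course in student.get("courses", []):
        if course.get("course_status") == COMPLETE:
            key = (student["home_city"], course["course_code"][0])
            yield key, {"count": 1, "sum": course["grade"]}


def reduce_partials(key, values):
    """Reducer: add up partial counts and sums. Its output has the same shape
    as its input, so it can safely be re-applied to already reduced values."""
    return {
        "count": sum(v["count"] for v in values),
        "sum": sum(v["sum"] for v in values),
    }


def finalize(key, reduced):
    """Finalizer: turn the accumulated {count, sum} into an average."""
    return {"avg": reduced["sum"] / reduced["count"]}


def average_grades(students):
    """Run map, group, reduce and finalize over an iterable of students."""
    grouped = defaultdict(list)
    for student in students:
        for key, value in map_student(student):
            grouped[key].append(value)
    return {k: finalize(k, reduce_partials(k, vs)) for k, vs in grouped.items()}


students = [
    {"home_city": "Athina", "courses": [
        {"course_code": "M101", "course_status": COMPLETE, "grade": 8},
        {"course_code": "M102", "course_status": COMPLETE, "grade": 10},
        {"course_code": "P101", "course_status": "Dropped", "grade": 6},
    ]},
    {"home_city": "Athina", "courses": [
        {"course_code": "M201", "course_status": COMPLETE, "grade": 3},
    ]},
]
print(average_grades(students))  # {('Athina', 'M'): {'avg': 7.0}}
```

Carrying `count` and `sum` instead of a running average is what lets the reducer be applied repeatedly without skewing the result; the division happens once, in the finalizer.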
References
[1] Db-engines.com. (n.d.). System Properties Comparison Microsoft SQL Server vs. MongoDB vs. Oracle NoSQL. [online] Available at: https://db-engines.com/en/system/Microsoft+SQL+Server%3BMongoDB%3BOracle+NoSQL [Accessed 5 May 2017].
[2] Install MongoDB Community Edition on Windows. MongoDB Manual 3.4. [online] Available at: https://docs.mongodb.com/manual/tutorial/install-mongodb-on-windows/ [Accessed 2 May 2017].
[3] Aggregation Pipeline. MongoDB Manual 3.4. [online] Available at: https://docs.mongodb.com/manual/core/aggregation-pipeline/ [Accessed 2 May 2017].
[4] Map-Reduce Examples. MongoDB Manual 3.4. [online] Available at: https://docs.mongodb.com/manual/tutorial/map-reduce-examples/ [Accessed 2 May 2017].
[5] Map-Reduce. MongoDB Manual 3.4. [online] Available at: https://docs.mongodb.com/manual/core/map-reduce/ [Accessed 2 May 2017].
