Python for Big Data Analytics
www.edureka.in/python
View Complete Course at : www.edureka.in/python
Post your Questions on Twitter on @edurekaIN: #askEdureka
Objectives of this Session
• Un
• Why Python?
• Web Scraping example using Python
• Pydoop : Python API for Hadoop
• Word Count example in Pydoop
• Data Science with Python
• Zombie Invasion modeling using Python
For Queries during the session and class recording:
Post on Twitter @edurekaIN: #askEdureka
Post on Facebook /edurekaIN
www.edureka.in/python
Why Python?
 Python is a great language for beginner programmers because it is easy to
learn and easy to maintain.
 Python’s biggest strength is that the bulk of its library is portable. It also
supports GUI programming and can be used to create applications that are
portable across Mac, Windows, and the Unix X Window System.
 With libraries like PyDoop and SciPy, it’s a dream come true for Big Data
Analytics.
Growing Interest in Python
Demo: Web Scraping using Python
 This example demonstrates how to scrape basic financial data from the
https://www.google.com/finance website for a given list of companies.
 We shall use Beautiful Soup, an open-source Python library for parsing
HTML, to extract data from web pages.
 Scraping is used for a wide range of purposes, from data mining to
monitoring and automated testing.
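The extraction step of a scraper can be sketched in a few lines. Beautiful Soup is a third-party package, so this sketch swaps in the standard library's `html.parser` as a stand-in so that it runs without any installation. The markup, the class names `company` and `price`, and the company names are all hypothetical; a real finance page will be structured differently.

```python
from html.parser import HTMLParser

# Hypothetical markup standing in for a downloaded finance page.
SAMPLE_HTML = """
<table>
  <tr><td class="company">ExampleCorp</td><td class="price">101.50</td></tr>
  <tr><td class="company">SampleInc</td><td class="price">47.25</td></tr>
</table>
"""


class PriceParser(HTMLParser):
    """Collect (company, price) pairs from <td class="company"/"price"> cells."""

    def __init__(self):
        super().__init__()
        self._field = None   # class of the <td> currently open, if any
        self._row = {}       # cells collected for the current <tr>
        self.rows = []       # finished (company, price) tuples

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._field = dict(attrs).get("class")

    def handle_data(self, data):
        # Only keep text that sits inside one of the two cells of interest.
        if self._field in ("company", "price"):
            self._row[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "td":
            self._field = None
        elif tag == "tr" and self._row:
            self.rows.append((self._row["company"], float(self._row["price"])))
            self._row = {}


parser = PriceParser()
parser.feed(SAMPLE_HTML)
for company, price in parser.rows:
    print(company, price)
```

With Beautiful Soup the same idea is usually expressed by searching the parsed tree for tags and classes instead of tracking parser state by hand.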
Demo: Collecting Tweets using Python
 This example demonstrates how to extract historical tweets for a particular
brand like “nike” or “apple”.
 We shall make a REST API call to Twitter to extract tweets.
 This data can be further used to perform sentiment analysis for a particular
brand on Twitter.
Big Data
 Lots of Data (Terabytes or Petabytes)
 Big data is the term for a collection of data sets
so large and complex that it becomes difficult to
process using on-hand database management
tools or traditional data processing applications.
 The challenges include capture, curation,
storage, search, sharing, transfer, analysis, and
visualization.
Unstructured Data Is Exploding
Big Data Scenarios: Hospital Care
Hospitals are analyzing medical data and patient records to predict which
patients are likely to seek readmission within a few months of discharge. The
hospital can then intervene in the hope of preventing another costly hospital
stay.
A medical diagnostics company analyzed millions of lines of data to develop the
first non-invasive test for predicting coronary artery disease. To do so,
researchers at the company analyzed over 100 million gene samples to ultimately
identify the 23 primary predictive genes for coronary artery disease.
Big Data Scenarios: Amazon.com
http://wp.streetwise.co/wp-content/uploads/2012/08/Amazon-Recommendations.png
Amazon has an unrivalled bank of data on online consumer purchasing behaviour
that it can mine from its 152 million customer accounts.
Amazon also uses Big Data to monitor, track, and secure the 1.5 billion items in
its retail store, which are held in its 200 fulfilment centres around the world.
Amazon stores the product catalogue data in S3.
S3 can write, read, and delete objects of up to 5 TB each. The catalogue stored
in S3 receives more than 50 million updates a week, and every 30 minutes all
data received is crunched and reported back to the different warehouses and the
website.
Big Data Scenarios: Netflix
http://smhttp.23575.nexcesscdn.net/80ABE1/sbmedia/blog/wp-content/uploads/2013/03/netflix-in-asia.png
Netflix uses 1 petabyte to store the videos for streaming.
BitTorrent Sync has transferred over 30 petabytes of data since its pre-alpha
release in January 2013.
The 2009 movie Avatar is reported to have taken over 1 petabyte of local storage
at Weta Digital for the rendering of the 3D CGI effects.
One petabyte of average MP3-encoded songs (for mobile, roughly one megabyte per
minute) would take about 2,000 years to play.
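The "2,000 years of MP3 playback" figure can be checked with a short calculation. This is a rough sketch that takes the slide's rate of one megabyte per minute at face value; the function name `playback_years` and the choice of a 365.25-day year are assumptions made for illustration.

```python
MINUTES_PER_YEAR = 60 * 24 * 365.25


def playback_years(total_bytes, bytes_per_minute):
    """Years of continuous playback for a given amount of audio data."""
    return total_bytes / bytes_per_minute / MINUTES_PER_YEAR


# Decimal units: 1 PB = 10**15 bytes, 1 MB = 10**6 bytes.
years_decimal = playback_years(10**15, 10**6)
# Binary units: 1 PiB = 2**50 bytes, 1 MiB = 2**20 bytes.
years_binary = playback_years(2**50, 2**20)

print(f"decimal units: about {years_decimal:.0f} years")
print(f"binary units:  about {years_binary:.0f} years")
```

Both conventions land close to 2,000 years (roughly 1,900 and 2,040), so the figure holds as an order-of-magnitude estimate whichever definition of a petabyte is used.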
IBM’s Definition – Big Data Characteristics
http://www-01.ibm.com/software/data/bigdata/
 Characteristics: Volume, Velocity, Variety
 Data types shown: web logs, images, videos, audio, sensor data
Hadoop for Big Data
 Apache Hadoop is a framework that allows for the distributed processing of large data sets across
clusters of commodity computers using a simple programming model.
 It is an open-source data management platform with scale-out storage and distributed processing.
Hadoop and MapReduce
Hadoop is a system for large-scale data processing.
It has two main components:
 HDFS – Hadoop Distributed File System (Storage)
   Distributed across “nodes”
   Natively redundant
   NameNode tracks locations
 MapReduce (Processing)
   Splits a task across processors “near” the data and assembles results
   Self-healing, high-bandwidth clustered storage
   JobTracker manages the TaskTrackers
Diagram labels: Map-Reduce, Key, Value
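The MapReduce processing model can be illustrated without a cluster. The sketch below runs the three phases (map, shuffle, reduce) in a single Python process over key/value pairs. The helper `map_reduce`, the sample records, and the "maximum temperature per year" task are hypothetical and chosen only for illustration; Hadoop itself distributes these phases across many machines.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run map, shuffle and reduce in-process over an iterable of records."""
    groups = defaultdict(list)
    for record in records:
        # Map phase: each record yields zero or more (key, value) pairs.
        for key, value in mapper(record):
            # Shuffle phase: values are grouped by key.
            groups[key].append(value)
    # Reduce phase: each key's list of values is collapsed to one result.
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


# Hypothetical input records in the form "year,temperature".
records = ["2012,31", "2013,28", "2012,35", "2013,30", "2012,29"]


def mapper(record):
    year, temperature = record.split(",")
    yield year, int(temperature)


def reducer(key, values):
    return max(values)


result = map_reduce(records, mapper, reducer)
print(result)
```

Because the mapper looks at one record at a time and the reducer at one key at a time, both can run in parallel on different nodes, which is what lets the framework place the work "near" the data.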
PyDoop – Hadoop with Python
 The PyDoop package provides a Python API for Hadoop MapReduce and
HDFS.
 PyDoop has several advantages over Hadoop’s built-in solutions for
Python programming, i.e., Hadoop Streaming and Jython.
 One of the biggest advantages of PyDoop is its HDFS API. This
allows you to connect to an HDFS installation, read and write files, and
get information on files, directories and global file system properties.
 The MapReduce API of PyDoop allows you to solve many complex
problems with minimal programming effort. Advanced MapReduce
concepts such as ‘Counters’ and ‘Record Readers’ can be
implemented in Python using PyDoop.
With the PyDoop package, Python can be used to write Hadoop
MapReduce programs and applications that access the HDFS API.
Demo: Word Count using Hadoop Streaming API
 This example shows a simple word count application written in Python.
 We shall use the Hadoop Streaming API to run MapReduce code written in Python.
 A word count application can be used to index text documents/files for a given “search query”.
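A word count in the Hadoop Streaming style might look like the sketch below. Hadoop Streaming passes data to the mapper and reducer as lines of text, with key and value separated by a tab, and sorts the mapper output by key before the reducer sees it. The function names and sample sentences here are hypothetical, and the final lines only simulate the pipeline locally; this is not the session's actual demo code.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line per word, as a streaming mapper would print."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines):
    """Sum the counts for each word; the input must already be sorted by key."""
    pairs = (line.split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Local simulation of the job: map, sort by key, then reduce.
sample = ["big data with python", "python for big data analytics"]
output = list(reducer(sorted(mapper(sample))))
print("\n".join(output))
```

In a real job the mapper and reducer would typically be two separate scripts that read `sys.stdin` and print to standard output, passed to the Hadoop Streaming jar through its `-mapper` and `-reducer` options.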
Python and Data Science
 Python is an excellent choice for a data scientist’s day-to-day
activities because it provides libraries for all of the tasks listed
below.
 Python has a diverse range of open-source libraries for
just about everything that a data scientist does in
day-to-day work.
 Python and most of its libraries are both open source
and free.
The day-to-day tasks of a data scientist involve many
interrelated but different activities, such as accessing and
manipulating data, computing statistics, creating visual
reports on that data, building predictive and explanatory
models, evaluating these models on additional data, and
integrating models into production systems.
SciPy.org
SciPy (pronounced “Sigh Pie”) is a Python-based ecosystem of open-source software for mathematics,
science, and engineering.
Demo: Zombie Invasion Model
This is a lighthearted example: a system of ODEs (ordinary differential equations) can be used to model a "zombie
invasion", using the equations specified by Philip Munz.
The system is given as:
dS/dt = P - B*S*Z - d*S
dZ/dt = B*S*Z + G*R - A*S*Z
dR/dt = d*S + A*S*Z - G*R
This involves solving a system of first-order ODEs of the form dy/dt = f(y, t), where y = [S, Z, R].
Where:
S: the number of susceptible victims
Z: the number of zombies
R: the number of people "killed"
P: the population birth rate
d: the chance of a natural death
B: the chance the "zombie disease" is transmitted (an alive person becomes a zombie)
G: the chance a dead person is resurrected into a zombie
A: the chance a zombie is totally destroyed
There are three scenarios given in the program to show how the zombie apocalypse varies with different initial
conditions.
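The system above can be integrated numerically in a few lines. This sketch uses a hand-written fixed-step fourth-order Runge-Kutta (RK4) integrator instead of a SciPy solver so that it runs with only the standard library. The parameter values, initial conditions, time span, and step count are illustrative assumptions and are not taken from the slides.

```python
def zombie_rhs(y, t, P, d, B, G, A):
    """Right-hand side f(y, t) of the system, with y = [S, Z, R]."""
    S, Z, R = y
    dS = P - B * S * Z - d * S
    dZ = B * S * Z + G * R - A * S * Z
    dR = d * S + A * S * Z - G * R
    return [dS, dZ, dR]


def rk4_solve(f, y0, t_end, steps, *params):
    """Integrate dy/dt = f(y, t) from t = 0 to t_end with fixed-step RK4."""
    h = t_end / steps
    t, y = 0.0, list(y0)
    history = [(t, y)]
    for _ in range(steps):
        k1 = f(y, t, *params)
        k2 = f([yi + 0.5 * h * ki for yi, ki in zip(y, k1)], t + 0.5 * h, *params)
        k3 = f([yi + 0.5 * h * ki for yi, ki in zip(y, k2)], t + 0.5 * h, *params)
        k4 = f([yi + h * ki for yi, ki in zip(y, k3)], t + h, *params)
        y = [yi + h / 6.0 * (a + 2 * b + 2 * c + e)
             for yi, a, b, c, e in zip(y, k1, k2, k3, k4)]
        t += h
        history.append((t, y))
    return history


# Illustrative parameter values (assumptions, not taken from the slides).
P, d, B, G, A = 0.0, 0.0001, 0.0095, 0.0001, 0.0001
y0 = [500.0, 0.0, 0.0]   # initial S, Z, R

history = rk4_solve(zombie_rhs, y0, 5.0, 1000, P, d, B, G, A)
t_final, (S, Z, R) = history[-1]
print(f"t={t_final:.1f}  S={S:.1f}  Z={Z:.1f}  R={R:.1f}")
```

Adding the three equations gives d(S + Z + R)/dt = P, so with a birth rate of zero the total population stays constant, which is a convenient sanity check on any solver. Different scenarios can be explored by changing `y0` and the parameters.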
Questions?
www.edureka.in/python
Complete Course curriculum at : www.edureka.in/python
