BIG DATA ANALYTICS
USING HADOOP
SUBMITTED BY
V.N.V. SRIKANTH
138W1A12B4
ABSTRACT
• Big data analytics is the process of examining large data sets
that contain a variety of data types (i.e., big data) to uncover
hidden patterns, unknown correlations, market trends,
customer preferences, and other useful business information.
• The analytical findings can lead to more effective marketing,
new revenue opportunities, better customer service, improved
operational efficiency, competitive advantages over rival
organizations, and other business benefits.
• Many big data projects originate from the need to answer
specific business questions. With the right big data analytics
platforms in place, an enterprise can boost sales, increase
efficiency, and improve operations, customer service, and risk
management.
MOTIVATION
• By using big data analytics you can extract only the relevant
information from terabytes, petabytes and exabytes, and
analyze it to transform your business decisions for the future.
• With the right big data analytics platforms in place, an
enterprise can boost sales, increase efficiency, and improve
operations, customer service and risk management.
• Technologies that include Hadoop and related tools such as
YARN, MapReduce, Spark, Hive, and Pig, as well as NoSQL
databases, support the processing of large and diverse
data sets across clustered systems.
PROBLEM STATEMENT
• The first challenge is in breaking down data silos to access all
data an organization stores in different places and often in
different systems.
• A second big data challenge is in creating platforms that can
pull in unstructured data as easily as structured data.
• This massive volume of data is typically so large that it's
difficult to process using traditional database and software
methods.
PROBLEM STATEMENT (CONT.)
• The above challenges can be overcome by implementing the
following technologies:
  • Parallel database technologies
  • MapReduce
• The best open source tools available are
KNOWLEDGE FROM LITERATURE SURVEY
[Timeline figures not recovered; surviving year labels: 1996, 1996, 1997, 1996, 1998, 2013]
• 2004: Initial versions of HDFS and MapReduce were implemented.
• 2005: Nutch used GFS and MapReduce to perform operations.
• 2006: Yahoo! created Hadoop based on GFS and MapReduce.
• 2007: Yahoo! started using Hadoop on a 1,000-node cluster.
• 2008: Apache took over Hadoop; a 4,000-node cluster was tested with it.
• 2009: Hadoop successfully sorted a petabyte of data in less than 17 hours,
in support of handling billions of searches and indexing millions of web pages.
• 2011: Hadoop version 1.0 was released.
• 2013: Version 2.0.6 became available.
LITERATURE SURVEY METHODS
[Figure not recovered; surviving year labels: 2003, 2004, 2006]
Methods | Author | Year
RDBMS (Relational Database Management Systems) | E. F. Codd | 1970
Grid computing | Ian Foster, Carl Kesselman | Early 1990s
Volunteer computing | Luis F. G. Sarmenta | 1996
Hadoop HDFS | Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung | 2003
Hadoop MapReduce | Jeffrey Dean and Sanjay Ghemawat | 2004
Apache Hadoop | Doug Cutting and Mike Cafarella | 2011
DEMERITS OF PREVIOUS METHODS
• Hardware failure:
As soon as we start using many pieces of hardware,
the chance that at least one will fail is fairly high.
• Combining the data after analysis:
Most analysis tasks need to be able to combine the
data in some way; when the data is spread across 100 disks,
data read from one disk may need to be combined with
the data from any of the other 99 disks.
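The hardware-failure point above can be made concrete with a little arithmetic. As a rough sketch, assume every machine fails independently with the same probability p over some period; the chance that at least one of n machines fails is then 1 - (1 - p)^n. The independence assumption, the function name, and the 1% rate below are illustrative assumptions, not figures from the slides.

```python
def prob_any_failure(n_machines: int, p_single: float) -> float:
    """Chance that at least one of n_machines fails.

    Assumes every machine fails independently with probability p_single
    (a simplifying assumption for illustration only).
    """
    if n_machines < 0 or not 0.0 <= p_single <= 1.0:
        raise ValueError("need n_machines >= 0 and 0 <= p_single <= 1")
    return 1.0 - (1.0 - p_single) ** n_machines


if __name__ == "__main__":
    # Hypothetical 1% failure chance per machine over the period of interest.
    for n in (1, 10, 100, 1000):
        print(f"{n:>5} machines -> {prob_any_failure(n, 0.01):.3f}")
```

Under these assumed numbers, the chance rises from 1% for a single machine to roughly 63% for 100 machines, which is why a large cluster has to treat failure as routine.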
HADOOP ADVANTAGES
Apache Hadoop is a framework for running applications on large clusters
built of commodity hardware.
A common way of avoiding data loss is through replication: the system keeps
redundant copies of the data so that in the event of failure, another copy
is available. The Hadoop Distributed Filesystem (HDFS) takes care of this
problem.
The second problem, combining the data, is solved by a simple programming
model: MapReduce. Hadoop is the popular open source implementation of
MapReduce, a powerful tool designed for deep analysis and transformation of
very large data sets.
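The MapReduce programming model mentioned above can be illustrated with the classic word-count example. The sketch below is a single-process imitation in plain Python, not Hadoop's actual API: the names map_phase, shuffle, reduce_phase, and word_count are hypothetical, and a real cluster would run the map and reduce steps in parallel on many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    # Tiny made-up input standing in for files spread across a cluster.
    print(word_count(["big data analytics", "big data using hadoop"]))
```

The point of the split is that each map call looks at one line in isolation and each reduce call looks at one key in isolation, so both steps can be spread across machines; only the shuffle needs to bring related data together.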
PROJECT IDEAS RELATED TO THE TOPIC
• Traffic Congestion Control
• Hospital Management
• College Management Systems
CONCLUSION
By using big data analytics you can extract only the relevant
information from terabytes, petabytes and exabytes, and analyze it
to transform your business decisions for the future.
With the right big data analytics platforms in place, an enterprise can
boost sales, increase efficiency, and improve operations, customer
service and risk management.
Pros | Cons
Cost effective | Cluster management is hard
Parallel processing | Single point of failure
Fault tolerance | Security issues
Scalability |