Handling Big Data Using a Data-Aware HDFS and
Evolutionary Clustering Technique
ABSTRACT:
The increased use of cyber-enabled systems and the Internet of Things (IoT) has
led to a massive amount of data with different structures. Most big data solutions
are built on top of the Hadoop ecosystem or use its distributed file system (HDFS).
However, studies have shown that such systems are inefficient when dealing with
today’s data. Some research has overcome these problems for specific types of
graph data, but today’s data span more than one type. According to researchers,
these inefficiencies lead to large-scale problems, including more space required
in data centers and wasted resources (such as power), which in turn lead to
environmental problems (such as higher carbon emissions). We propose a data-aware
module for the Hadoop ecosystem. We also propose a distributed encoding technique
for Genetic Algorithms. Our framework allows Hadoop to manage the distribution of
data and its placement based on cluster analysis of the data itself. We are able to
handle a broad range of data types as well as optimize query time and resource
usage. We performed our experiments on multiple datasets generated via the Lehigh
University Benchmark (LUBM).
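The abstract does not give the chromosome encoding, the fitness function, or the way clusters map to nodes, so the following is only a minimal, centralized sketch of the general idea: a genetic algorithm evolves an assignment of records to clusters, and each cluster id stands in for a storage node. It is not the authors' distributed encoding or their HDFS module. The class name GaPlacementSketch, the within-cluster squared-distance cost, the tournament selection, and the 5% mutation rate are all assumptions chosen for illustration.

```java
import java.util.Arrays;
import java.util.Random;

/**
 * Hypothetical sketch (not the authors' code): a small genetic algorithm that
 * groups records into k clusters, where each cluster id stands in for a node.
 */
class GaPlacementSketch {

    /** Within-cluster sum of squared distances to each centroid (lower is better). */
    static double cost(double[][] points, int[] assignment, int k) {
        int dim = points[0].length;
        double[][] centroid = new double[k][dim];
        int[] count = new int[k];
        for (int i = 0; i < points.length; i++) {
            count[assignment[i]]++;
            for (int d = 0; d < dim; d++) centroid[assignment[i]][d] += points[i][d];
        }
        for (int c = 0; c < k; c++) {
            if (count[c] == 0) continue; // empty cluster contributes nothing
            for (int d = 0; d < dim; d++) centroid[c][d] /= count[c];
        }
        double total = 0.0;
        for (int i = 0; i < points.length; i++) {
            for (int d = 0; d < dim; d++) {
                double diff = points[i][d] - centroid[assignment[i]][d];
                total += diff * diff;
            }
        }
        return total;
    }

    /** Returns the lowest-cost individual in the population. */
    static int[] fittest(double[][] points, int[][] pop, int k) {
        int[] best = pop[0];
        for (int[] individual : pop) {
            if (cost(points, individual, k) < cost(points, best, k)) best = individual;
        }
        return best;
    }

    /** Binary tournament: the cheaper of two randomly chosen individuals wins. */
    static int[] tournament(double[][] points, int[][] pop, int k, Random rng) {
        int[] a = pop[rng.nextInt(pop.length)];
        int[] b = pop[rng.nextInt(pop.length)];
        return cost(points, a, k) <= cost(points, b, k) ? a : b;
    }

    /** Evolves an assignment; gene i is the cluster (node) id chosen for record i. */
    static int[] evolve(double[][] points, int k, int popSize, int generations, long seed) {
        Random rng = new Random(seed);
        int n = points.length;
        int[][] pop = new int[popSize][n];
        for (int[] individual : pop) {
            for (int g = 0; g < n; g++) individual[g] = rng.nextInt(k);
        }
        int[] best = fittest(points, pop, k).clone();
        for (int gen = 0; gen < generations; gen++) {
            int[][] next = new int[popSize][];
            next[0] = best.clone(); // elitism: the best solution so far always survives
            for (int p = 1; p < popSize; p++) {
                int[] a = tournament(points, pop, k, rng);
                int[] b = tournament(points, pop, k, rng);
                int[] child = new int[n];
                for (int g = 0; g < n; g++) {
                    child[g] = rng.nextBoolean() ? a[g] : b[g];             // uniform crossover
                    if (rng.nextDouble() < 0.05) child[g] = rng.nextInt(k); // mutation (assumed rate)
                }
                next[p] = child;
            }
            pop = next;
            best = fittest(points, pop, k).clone();
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy feature vectors for six records; real features would come from the data itself.
        double[][] points = {
            {1.0, 1.0}, {1.2, 0.9}, {0.8, 1.1},
            {9.0, 9.0}, {9.1, 8.8}, {8.9, 9.2}
        };
        int[] placement = evolve(points, 2, 30, 200, 42L);
        System.out.println("record -> node: " + Arrays.toString(placement));
        System.out.println("cost: " + cost(points, placement, 2));
    }
}
```

Because the best individual is copied unchanged into every new generation, the cost of the returned assignment never exceeds that of the best initial individual. A distributed variant would need to split the chromosome or the population across nodes, which this sketch does not attempt.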
SYSTEM REQUIREMENTS:
HARDWARE REQUIREMENTS:
• System: i3 Processor
• Hard Disk: 500 GB
• Monitor: 15" LED
• Input Devices: Keyboard, Mouse
• RAM: 4 GB
SOFTWARE REQUIREMENTS:
• Operating System: Windows 7 / Ubuntu
• Coding Language: Java 1.7, Hadoop 0.8.1
• IDE: Eclipse
• Database: MySQL
REFERENCE:
Mustafa Hajeer, Member, IEEE, and Dipankar Dasgupta, Fellow, IEEE, “Handling
Big Data Using a Data-Aware HDFS and Evolutionary Clustering Technique”,
IEEE Transactions on Big Data, 2019.
