Data Mining:
Implementation of Data
Mining Techniques using
RapidMiner software
Prepared by
Mohammed Kharma
Definitions review
• Cluster: A collection of data objects
– similar (or related) to one another within the
same group
– dissimilar (or unrelated) to the objects in other
groups
• Cluster analysis
– Finding similarities between data according to the
characteristics found in the data and grouping
similar data objects into clusters
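As a rough illustration of what "similar" can mean for numeric data, the sketch below uses Euclidean distance as the dissimilarity measure. The slides do not say which measure was used, so this choice, the class name `DistanceSketch`, and the three points are assumptions made only for the example.

```java
final class DistanceSketch {
    // Euclidean distance between two points of equal dimension.
    // A small distance means "similar", a large one "dissimilar".
    static double euclidean(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        double[] p = {1, 2}; // hypothetical points, not taken from the slides
        double[] q = {4, 6};
        double[] r = {1, 3};
        System.out.println(euclidean(p, q)); // prints 5.0 (p and q are far apart)
        System.out.println(euclidean(p, r)); // prints 1.0 (p and r are close)
    }
}
```

Under this measure, p and r would be candidates for the same cluster, while q would more likely belong to a different one.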
Clustering Methods
• Partitioning:
– Unsupervised learning algorithms that construct various
partitions and then evaluate them by some criterion,
e.g., minimizing the sum of squared errors
– Typical methods: k-means, k-medoids
• Hierarchical:
– Create a hierarchical decomposition of the set of data
(or objects) using some criterion
– Typical methods: DIANA, AGNES, BIRCH, ROCK,
CHAMELEON
Illustration and comparison of two
clustering techniques using the
RapidMiner tool and a Java
application
Illustration of two clustering techniques
using the RapidMiner tool and Java
• K-means algorithm:
We performed two tests
1. Using a Java program, with program parameters:
K = 2;
Data:
22 21
19 20
18 22
1 3
3 2
K-means Clustering
• Input: the number of clusters K and the collection of n
instances
• Output: a set of k clusters that minimizes the squared error
criterion
• Method:
– Arbitrarily choose k instances as the initial cluster centers
– Repeat
• (Re)assign each instance to the cluster to which the
instance is the most similar, based on the mean value of
the instances in the cluster
• Update cluster means (compute mean value of the
instances for each cluster)
– Until no change in the assignment
• Squared Error Criterion
– $E = \sum_{i=1}^{k} \sum_{p \in C_i} |p - m_i|^2$
– where $m_i$ is the mean of cluster $C_i$ and $p$ ranges over the points in $C_i$
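The method above can be sketched in Java roughly as follows. This is not the program used for the slides (its source is not shown); it is a minimal sketch that assumes Euclidean distance and, in place of an arbitrary choice, takes the first k instances as the initial centers so that the run is reproducible. The class and method names are hypothetical. The `main` method uses K = 2 and the five data points listed earlier in the deck.

```java
import java.util.Arrays;

final class KMeansSketch {
    // Squared Euclidean distance between two points.
    static double sqDist(double[] a, double[] b) {
        double s = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            s += diff * diff;
        }
        return s;
    }

    // Mean of each cluster; null for a cluster with no instances.
    static double[][] means(double[][] data, int[] assign, int k) {
        int dim = data[0].length;
        double[][] sums = new double[k][dim];
        int[] counts = new int[k];
        for (int i = 0; i < data.length; i++) {
            counts[assign[i]]++;
            for (int d = 0; d < dim; d++) sums[assign[i]][d] += data[i][d];
        }
        for (int c = 0; c < k; c++) {
            if (counts[c] == 0) { sums[c] = null; continue; }
            for (int d = 0; d < dim; d++) sums[c][d] /= counts[c];
        }
        return sums;
    }

    // Returns the cluster index of every instance once assignments stop changing.
    static int[] cluster(double[][] data, int k) {
        double[][] centers = new double[k][];
        // Assumption: first k instances as initial centers (the slides say "arbitrarily").
        for (int c = 0; c < k; c++) centers[c] = data[c].clone();
        int[] assign = new int[data.length];
        Arrays.fill(assign, -1);
        boolean changed = true;
        while (changed) {
            changed = false;
            // (Re)assign each instance to the nearest cluster mean.
            for (int i = 0; i < data.length; i++) {
                int best = 0;
                for (int c = 1; c < k; c++) {
                    if (sqDist(data[i], centers[c]) < sqDist(data[i], centers[best])) best = c;
                }
                if (best != assign[i]) { assign[i] = best; changed = true; }
            }
            // Update cluster means; an empty cluster keeps its previous center.
            double[][] updated = means(data, assign, k);
            for (int c = 0; c < k; c++) {
                if (updated[c] != null) centers[c] = updated[c];
            }
        }
        return assign;
    }

    // Squared error criterion E for a given assignment.
    static double sse(double[][] data, int[] assign, int k) {
        double[][] m = means(data, assign, k);
        double e = 0.0;
        for (int i = 0; i < data.length; i++) e += sqDist(data[i], m[assign[i]]);
        return e;
    }

    public static void main(String[] args) {
        double[][] data = {{22, 21}, {19, 20}, {18, 22}, {1, 3}, {3, 2}};
        int[] assign = cluster(data, 2);
        System.out.println(Arrays.toString(assign)); // prints [0, 0, 0, 1, 1]
        System.out.println(sse(data, assign, 2));
    }
}
```

With a different choice of initial centers the loop may take a different number of iterations, and on less well-separated data it may converge to a different local minimum of E.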
The result of K-Means-Java program
The result of K-Means-RapidMiner
Continued-The result of K-Means-
RapidMiner
K-medoids
• Input: the number of clusters K and the collection of n
instances
• Output: A set of k clusters that minimizes the sum of the
dissimilarities of all the instances to their nearest medoids
• Method:
– Arbitrarily choose k instances as the initial medoids
– Repeat
• (Re)assign each remaining instance to the cluster with
the nearest medoid
• Randomly select a non-medoid instance (call it Or)
• Compute the total cost, S, of swapping a current medoid Oj with Or
• If S < 0 then swap Oj with Or to form the new set of k
medoids
– Until no change
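The swap-based method above can be sketched in Java as follows. Two things differ from the slide and are assumptions of this sketch: the initial medoids are the first k instances, and every non-medoid candidate is tried in turn instead of one being picked at random, which keeps the run deterministic. Euclidean distance and all class and method names are also assumptions. The `main` method reuses the five data points listed earlier in the deck.

```java
import java.util.Arrays;

final class KMedoidsSketch {
    static double dist(double[] a, double[] b) {
        double s = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            s += diff * diff;
        }
        return Math.sqrt(s);
    }

    // Sum of the distances of all instances to their nearest medoid.
    static double totalCost(double[][] data, int[] medoids) {
        double cost = 0.0;
        for (double[] p : data) {
            double best = Double.MAX_VALUE;
            for (int m : medoids) best = Math.min(best, dist(p, data[m]));
            cost += best;
        }
        return cost;
    }

    static boolean isMedoid(int index, int[] medoids) {
        for (int m : medoids) if (m == index) return true;
        return false;
    }

    // Returns the indices (into data) of the final medoids.
    static int[] findMedoids(double[][] data, int k) {
        int[] medoids = new int[k];
        for (int j = 0; j < k; j++) medoids[j] = j; // assumption: first k instances
        double cost = totalCost(data, medoids);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (int j = 0; j < k; j++) {
                for (int r = 0; r < data.length; r++) {
                    if (isMedoid(r, medoids)) continue;
                    int old = medoids[j];
                    medoids[j] = r; // tentatively swap Oj with Or
                    double newCost = totalCost(data, medoids);
                    if (newCost - cost < 0) { // S < 0: keep the swap
                        cost = newCost;
                        changed = true;
                    } else {
                        medoids[j] = old; // otherwise undo it
                    }
                }
            }
        }
        return medoids;
    }

    // Assigns each instance to the position of its nearest medoid.
    static int[] assign(double[][] data, int[] medoids) {
        int[] result = new int[data.length];
        for (int i = 0; i < data.length; i++) {
            int best = 0;
            for (int j = 1; j < medoids.length; j++) {
                if (dist(data[i], data[medoids[j]]) < dist(data[i], data[medoids[best]])) best = j;
            }
            result[i] = best;
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] data = {{22, 21}, {19, 20}, {18, 22}, {1, 3}, {3, 2}};
        int[] medoids = findMedoids(data, 2);
        System.out.println("medoid indices: " + Arrays.toString(medoids));
        System.out.println("assignment:     " + Arrays.toString(assign(data, medoids)));
    }
}
```

Because a swap is kept only when it strictly lowers the total cost, the loop must terminate. Recomputing the full cost for every candidate is simple but slow, so this form is suitable only for small data sets.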
The result of k-medoids-RapidMiner
Java Live Demo:
http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/AppletKM.html
Comparison
• On this test data, both algorithms produce the same clusters
• Both require k to be specified as input
• K-medoids is less influenced by outliers in the data
• Both methods assign each instance to exactly one cluster
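The point about outliers can be seen in a small one-dimensional sketch. The values below are hypothetical and not taken from the slides; the class name `OutlierSketch` is likewise an assumption. A single extreme value drags the mean (the k-means center) far from the bulk of the data, while the medoid, which must be an actual instance, stays among the typical values.

```java
final class OutlierSketch {
    // Arithmetic mean, as used for a k-means cluster center.
    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    // Medoid: the data value with the smallest total distance to all values.
    static double medoid(double[] xs) {
        double best = xs[0];
        double bestCost = Double.MAX_VALUE;
        for (double candidate : xs) {
            double cost = 0.0;
            for (double x : xs) cost += Math.abs(x - candidate);
            if (cost < bestCost) {
                bestCost = cost;
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] values = {1, 2, 3, 4, 100}; // hypothetical data; 100 is the outlier
        System.out.println(mean(values));   // prints 22.0
        System.out.println(medoid(values)); // prints 3.0
    }
}
```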
Thank you
