Data Mining: Implementation of Data Mining Techniques using RapidMiner software

K-means and k-medoids clustering techniques are illustrated using the RapidMiner tool and a Java application. K-means partitions data into k groups by minimizing the sum of squared distances between data points and their cluster centers. It assigns each data point to exactly one cluster. K-medoids is similar but uses actual data points as centers instead of means. Both require the number of clusters k to be specified in advance and can be affected by outliers, though k-medoids is less sensitive to them. The document demonstrates both techniques using different software and compares the results.

Data Mining: Implementation of Data Mining Techniques using RapidMiner Software
Prepared by Mohammed Kharma

Definitions review
• Cluster: a collection of data objects
– similar (or related) to one another within the same group
– dissimilar (or unrelated) to the objects in other groups
• Cluster analysis
– finding similarities between data according to the characteristics found in the data, and grouping similar data objects into clusters
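The definitions above leave "similar" open. For numeric data such as the points used later in these slides, one common choice (an assumption here, not something the slides specify) is Euclidean distance: the smaller the distance, the more similar two objects are. The class name `Distance` is hypothetical.

```java
// Hypothetical helper class: Euclidean distance as one possible dissimilarity measure.
final class Distance {
    private Distance() {}

    /** Euclidean distance between two points of the same dimension. */
    static double euclidean(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        double[] p = {22, 21};
        double[] q = {19, 20};
        double[] r = {1, 3};
        // p is much closer to q than to r: p and q are "similar", p and r "dissimilar".
        System.out.println(euclidean(p, q)); // sqrt(10), about 3.16
        System.out.println(euclidean(p, r)); // sqrt(765), about 27.66
    }
}
```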

Clustering Methods
• Partitioning:
– Unsupervised learning algorithms that construct various partitions and then evaluate them by some criterion, e.g., minimizing the sum of squared errors
– Typical methods: k-means, k-medoids
• Hierarchical:
– Create a hierarchical decomposition of the set of data (or objects) using some criterion
– Typical methods: DIANA, AGNES, BIRCH, ROCK, CHAMELEON
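As a rough sketch of the agglomerative idea behind AGNES, the Java class below starts with every instance in its own cluster and repeatedly merges the two closest clusters until a requested number remains. Single linkage, Euclidean distance, and the stopping rule are assumptions for illustration, not a description of any particular tool; `Agnes` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical AGNES-style sketch: bottom-up merging with single linkage (an assumption).
final class Agnes {
    private Agnes() {}

    static double dist(double[] a, double[] b) {
        double dx = a[0] - b[0], dy = a[1] - b[1];
        return Math.sqrt(dx * dx + dy * dy);
    }

    // Single linkage: distance between the closest pair of members of the two clusters.
    static double linkage(List<Integer> c1, List<Integer> c2, double[][] pts) {
        double best = Double.POSITIVE_INFINITY;
        for (int i : c1) {
            for (int j : c2) best = Math.min(best, dist(pts[i], pts[j]));
        }
        return best;
    }

    /** Merges the two closest clusters until only targetK clusters remain. */
    static List<List<Integer>> cluster(double[][] pts, int targetK) {
        List<List<Integer>> clusters = new ArrayList<>();
        for (int i = 0; i < pts.length; i++) {
            List<Integer> singleton = new ArrayList<>();
            singleton.add(i);
            clusters.add(singleton); // every instance starts as its own cluster
        }
        while (clusters.size() > targetK) {
            int bestA = 0, bestB = 1;
            double bestD = Double.POSITIVE_INFINITY;
            for (int a = 0; a < clusters.size(); a++) {
                for (int b = a + 1; b < clusters.size(); b++) {
                    double d = linkage(clusters.get(a), clusters.get(b), pts);
                    if (d < bestD) { bestD = d; bestA = a; bestB = b; }
                }
            }
            // bestB > bestA, so removing bestB does not shift the index of bestA.
            clusters.get(bestA).addAll(clusters.remove(bestB));
        }
        return clusters;
    }

    public static void main(String[] args) {
        double[][] pts = {{22, 21}, {19, 20}, {18, 22}, {1, 3}, {3, 2}};
        System.out.println(cluster(pts, 2));
    }
}
```

On the five sample points from the K-means test, stopping at two clusters, this sketch prints `[[0, 1, 2], [3, 4]]`.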

Illustration and comparison of two clustering techniques using the RapidMiner tool and a Java application

Illustration of two clustering techniques using the RapidMiner tool and Java
• K-means algorithm:
We performed two tests.
1. Using a Java program, with these parameters:
K = 2
Data (five two-dimensional points):
22 21
19 20
18 22
1 3
3 2
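The slide does not show how the Java program reads its input. Assuming one point per line with whitespace-separated coordinates, as listed above, a minimal reader could look like this; `PointReader` and `parse` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical reader for the layout above: one point per line, coordinates separated by whitespace.
final class PointReader {
    private PointReader() {}

    static double[][] parse(String text) {
        List<double[]> points = new ArrayList<>();
        for (String line : text.split("\\R")) { // \R matches any line break
            String trimmed = line.trim();
            if (trimmed.isEmpty()) continue; // skip blank lines
            String[] parts = trimmed.split("\\s+");
            double[] p = new double[parts.length];
            for (int i = 0; i < parts.length; i++) p[i] = Double.parseDouble(parts[i]);
            points.add(p);
        }
        return points.toArray(new double[0][]);
    }

    public static void main(String[] args) {
        double[][] pts = parse("22 21\n19 20\n18 22\n1 3\n3 2\n");
        System.out.println(pts.length + " points, first = " + Arrays.toString(pts[0]));
    }
}
```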

K-means Clustering
• Input: the number of clusters K and a collection of n instances
• Output: a set of K clusters that minimizes the squared-error criterion
• Method:
– Arbitrarily choose K instances as the initial cluster centers
– Repeat
• (Re)assign each instance to the cluster to which it is most similar, based on the mean value of the instances in that cluster
• Update the cluster means (compute the mean value of the instances in each cluster)
– Until no change in the assignment
• Squared-error criterion
– $E = \sum_{i=1}^{k} \sum_{p \in C_i} \lvert p - m_i \rvert^{2}$
– where $m_i$ is the mean of cluster $C_i$ and $p$ ranges over the points in $C_i$
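The method and criterion above can be sketched in Java as follows. This is not the program used in the test; it is a minimal Lloyd-style version assuming the first K instances serve as the arbitrary initial centers, squared Euclidean distance, and that an empty cluster keeps its previous mean. `KMeans`, `cluster`, and `sse` are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical sketch, not the program used in the test.
// Assumptions: the first K instances are the initial centers; squared Euclidean distance.
final class KMeans {
    private KMeans() {}

    static double sqDist(double[] a, double[] b) {
        double s = 0;
        for (int d = 0; d < a.length; d++) {
            double t = a[d] - b[d];
            s += t * t;
        }
        return s;
    }

    /** Runs K-means and returns the cluster index (0..k-1) of each instance. */
    static int[] cluster(double[][] pts, int k) {
        int n = pts.length, dim = pts[0].length;
        double[][] means = new double[k][];
        for (int c = 0; c < k; c++) means[c] = pts[c].clone(); // "arbitrary" initial centers
        int[] assign = new int[n];
        Arrays.fill(assign, -1); // no assignment yet
        boolean changed = true;
        while (changed) { // repeat until no change in the assignment
            changed = false;
            // (Re)assign each instance to the cluster with the nearest mean.
            for (int i = 0; i < n; i++) {
                int best = 0;
                for (int c = 1; c < k; c++) {
                    if (sqDist(pts[i], means[c]) < sqDist(pts[i], means[best])) best = c;
                }
                if (assign[i] != best) {
                    assign[i] = best;
                    changed = true;
                }
            }
            // Update each cluster mean from its current members.
            for (int c = 0; c < k; c++) {
                double[] sum = new double[dim];
                int count = 0;
                for (int i = 0; i < n; i++) {
                    if (assign[i] != c) continue;
                    for (int d = 0; d < dim; d++) sum[d] += pts[i][d];
                    count++;
                }
                if (count > 0) { // an empty cluster keeps its previous mean
                    for (int d = 0; d < dim; d++) sum[d] /= count;
                    means[c] = sum;
                }
            }
        }
        return assign;
    }

    /** Squared-error criterion E: sum of |p - m_i|^2 over all clusters. */
    static double sse(double[][] pts, int[] assign, int k) {
        int dim = pts[0].length;
        double[][] sum = new double[k][dim];
        int[] count = new int[k];
        for (int i = 0; i < pts.length; i++) {
            count[assign[i]]++;
            for (int d = 0; d < dim; d++) sum[assign[i]][d] += pts[i][d];
        }
        double e = 0;
        for (int i = 0; i < pts.length; i++) {
            double[] mean = new double[dim];
            for (int d = 0; d < dim; d++) mean[d] = sum[assign[i]][d] / count[assign[i]];
            e += sqDist(pts[i], mean);
        }
        return e;
    }

    public static void main(String[] args) {
        double[][] pts = {{22, 21}, {19, 20}, {18, 22}, {1, 3}, {3, 2}};
        int[] assign = cluster(pts, 2);
        System.out.println(Arrays.toString(assign) + " E = " + sse(pts, assign, 2));
    }
}
```

With K = 2 on the five test points, this sketch converges to {(22, 21), (19, 20), (18, 22)} and {(1, 3), (3, 2)}, with E = 79/6 (about 13.17). K-means is sensitive to the initial centers, so a different initial choice can converge to a different local optimum.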

Continued: the result of K-Means (RapidMiner)

K-medoids
• Input: the number of clusters K and a collection of n instances
• Output: a set of K clusters that minimizes the sum of the dissimilarities of all instances to their nearest medoids
• Method:
– Arbitrarily choose K instances as the initial medoids
– Repeat
• (Re)assign each remaining instance to the cluster with the nearest medoid
• Randomly select a non-medoid instance, Or
• Compute the total cost, S, of swapping a medoid Oj with Or
• If S < 0, swap Oj with Or to form the new set of K medoids
– Until no change
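A PAM-style sketch of the swap loop above. One deliberate change: rather than selecting a random non-medoid Or, it tries every (Oj, Or) pair and performs a swap whenever S < 0, which makes the result deterministic. It also assumes Euclidean dissimilarity and the first K instances as initial medoids; `KMedoids` and its methods are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical PAM-style sketch. Assumptions: Euclidean dissimilarity, the first K
// instances as initial medoids, and an exhaustive scan over (Oj, Or) swaps
// instead of selecting Or at random.
final class KMedoids {
    private KMedoids() {}

    static double dist(double[] a, double[] b) {
        double s = 0;
        for (int d = 0; d < a.length; d++) {
            double t = a[d] - b[d];
            s += t * t;
        }
        return Math.sqrt(s);
    }

    /** Sum of the dissimilarities of all instances to their nearest medoid. */
    static double cost(double[][] pts, int[] medoids) {
        double total = 0;
        for (double[] p : pts) {
            double best = Double.POSITIVE_INFINITY;
            for (int m : medoids) best = Math.min(best, dist(p, pts[m]));
            total += best;
        }
        return total;
    }

    static boolean isMedoid(int[] medoids, int i) {
        for (int m : medoids) if (m == i) return true;
        return false;
    }

    /** Returns the indices of the final medoids. */
    static int[] medoids(double[][] pts, int k) {
        int[] medoids = new int[k];
        for (int c = 0; c < k; c++) medoids[c] = c; // "arbitrary" initial medoids
        double current = cost(pts, medoids);
        boolean swapped = true;
        while (swapped) { // repeat until no swap lowers the total cost
            swapped = false;
            for (int j = 0; j < k; j++) {
                for (int r = 0; r < pts.length; r++) {
                    if (isMedoid(medoids, r)) continue; // Or must be a non-medoid
                    int[] candidate = medoids.clone();
                    candidate[j] = r; // swap Oj with Or
                    double newCost = cost(pts, candidate);
                    double s = newCost - current; // total cost S of the swap
                    if (s < 0) {
                        medoids = candidate;
                        current = newCost;
                        swapped = true;
                    }
                }
            }
        }
        return medoids;
    }

    /** Assigns each instance to the cluster of its nearest medoid. */
    static int[] assign(double[][] pts, int[] medoids) {
        int[] labels = new int[pts.length];
        for (int i = 0; i < pts.length; i++) {
            int best = 0;
            for (int c = 1; c < medoids.length; c++) {
                if (dist(pts[i], pts[medoids[c]]) < dist(pts[i], pts[medoids[best]])) best = c;
            }
            labels[i] = best;
        }
        return labels;
    }

    public static void main(String[] args) {
        double[][] pts = {{22, 21}, {19, 20}, {18, 22}, {1, 3}, {3, 2}};
        int[] m = medoids(pts, 2);
        System.out.println("medoids " + Arrays.toString(m) + ", labels " + Arrays.toString(assign(pts, m)));
    }
}
```

On the five test points with K = 2, this sketch settles on (19, 20) and (1, 3) as medoids, which yields the same two groups as the K-means sketch. (3, 2) would be an equally cheap medoid for the second group; the strict S < 0 test keeps the first one found.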

Java Live Demo:
http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/AppletKM.html

Comparison
• The results of both algorithms are the same in these tests
• Both require K to be specified as input
• K-medoids is less influenced by outliers in the data
• Both methods assign each instance to exactly one cluster
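To illustrate the outlier point, the sketch below compares a cluster mean (the K-means center) with a cluster medoid. It uses the first group from the test data, (22, 21), (19, 20), (18, 22), plus one hypothetical outlier (100, 100) added only for this example; `OutlierDemo` is a hypothetical name.

```java
import java.util.Arrays;

// Hypothetical demo: how one outlier moves a cluster mean versus a cluster medoid.
final class OutlierDemo {
    private OutlierDemo() {}

    /** Component-wise mean of the points (what K-means uses as a center). */
    static double[] mean(double[][] pts) {
        double[] m = new double[pts[0].length];
        for (double[] p : pts) {
            for (int d = 0; d < m.length; d++) m[d] += p[d];
        }
        for (int d = 0; d < m.length; d++) m[d] /= pts.length;
        return m;
    }

    static double dist(double[] a, double[] b) {
        double s = 0;
        for (int d = 0; d < a.length; d++) {
            double t = a[d] - b[d];
            s += t * t;
        }
        return Math.sqrt(s);
    }

    /** Medoid: the member with the smallest total distance to all members. */
    static double[] medoid(double[][] pts) {
        double[] best = pts[0];
        double bestCost = Double.POSITIVE_INFINITY;
        for (double[] candidate : pts) {
            double cost = 0;
            for (double[] p : pts) cost += dist(candidate, p);
            if (cost < bestCost) {
                bestCost = cost;
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] group = {{22, 21}, {19, 20}, {18, 22}};
        double[][] withOutlier = {{22, 21}, {19, 20}, {18, 22}, {100, 100}}; // (100, 100) is made up
        System.out.println("mean:   " + Arrays.toString(mean(group)) + " -> " + Arrays.toString(mean(withOutlier)));
        System.out.println("medoid: " + Arrays.toString(medoid(group)) + " -> " + Arrays.toString(medoid(withOutlier)));
    }
}
```

Adding the outlier drags the mean from about (19.67, 21) to (39.75, 40.75), outside the group, while the medoid only moves from (19, 20) to (22, 21), which is still one of the original members.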

7. The result of K-Means (Java program)

8. The result of K-Means (RapidMiner)

9. The result of K-Means (RapidMiner)

12. The result of k-medoids (RapidMiner)

13. The result of k-medoids (RapidMiner)

16. Thank you