KALTHOOM ALMAQBALI & MATT COURTNEY
WEKA
WEB AND SOCIAL COMPUTING
OUTLINE
• Weka.
• REPTree results, compared with other tree learners.
• REPTree analysis.
• JRip results, compared with other rule learners.
• JRip rule analysis.
• Conclusion.
• Questions & Answers.
WEKA
• A collection of machine learning algorithms for data mining tasks.
• It is also well suited to developing new machine learning schemes.
• The algorithms can either be applied directly to a dataset or called from your own Java code.
WEKA
Weka contains tools for:
• Data pre-processing: "filters" for discretization, normalization, resampling, attribute selection, and the transformation and combination of attributes.
• Data classification: predicting nominal or numeric quantities.
• Data clustering: finding groups of similar instances in a dataset.
• Data association: learning association rules, with an implementation of the Apriori algorithm.
• Attribute selection: searching through all possible combinations of attributes in the data to find the subset that works best for prediction.
• Data visualization: a 2-D plot of the current working relation, which is useful in practice for gauging the difficulty of the learning problem.
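To make two of the pre-processing steps above concrete, here is a minimal sketch of what normalization and equal-width discretization compute. It is plain Java written for illustration, not Weka's filter classes; the class name `PreprocessSketch`, its method names, and the sample values are all assumptions.

```java
import java.util.Arrays;

/** Illustrative pre-processing helpers; plain Java, not Weka's filter classes. */
final class PreprocessSketch {

    /** Min-max normalization: rescales every value into the range [0, 1]. */
    static double[] normalize(double[] values) {
        double min = Arrays.stream(values).min().orElse(0.0);
        double max = Arrays.stream(values).max().orElse(0.0);
        double span = max - min;
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            // A constant column has no spread, so map it to 0 to avoid dividing by zero.
            out[i] = span == 0.0 ? 0.0 : (values[i] - min) / span;
        }
        return out;
    }

    /** Equal-width discretization: maps each value to a bin index in [0, bins - 1]. */
    static int[] discretize(double[] values, int bins) {
        double[] scaled = normalize(values);
        int[] out = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            // The maximum value would land in bin `bins`, so clamp it into the last bin.
            out[i] = Math.min((int) (scaled[i] * bins), bins - 1);
        }
        return out;
    }

    public static void main(String[] args) {
        double[] ages = {20, 30, 40, 60}; // hypothetical attribute values
        System.out.println(Arrays.toString(normalize(ages)));
        System.out.println(Arrays.toString(discretize(ages, 4)));
    }
}
```

Discretization reuses the normalized values here only to keep the sketch short; a real filter would normally compute the bin boundaries once and apply them to new data as well.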
REDUCED ERROR PRUNING TREE (REPTREE)
• Uses regression tree logic.
• Splits data based on information gain (for a nominal class) or variance reduction (for a numeric class).
• Creates multiple trees, then selects the best one.
• Prunes the selected tree with reduced-error pruning.
• Results in a "fast" learning algorithm.
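The two split criteria named above can be written out directly. The sketch below is not REPTree's source code; it only shows, under the assumption of a binary split, how information gain is computed from class counts and how variance reduction is computed from numeric target values. The class name `SplitCriteria` and its method names are made up for this example, and the arrays are assumed to be non-empty.

```java
import java.util.Arrays;

/** Illustrative split criteria; a sketch, not Weka's REPTree source. */
final class SplitCriteria {

    /** Shannon entropy (in bits) of a vector of class counts. */
    static double entropy(int[] counts) {
        int total = Arrays.stream(counts).sum();
        double h = 0.0;
        for (int c : counts) {
            if (c > 0) {
                double p = (double) c / total;
                h -= p * Math.log(p) / Math.log(2);
            }
        }
        return h;
    }

    /** Information gain of splitting `parent` into the `left` and `right` class counts. */
    static double informationGain(int[] parent, int[] left, int[] right) {
        double n = Arrays.stream(parent).sum();
        double nl = Arrays.stream(left).sum();
        double nr = Arrays.stream(right).sum();
        return entropy(parent) - (nl / n) * entropy(left) - (nr / n) * entropy(right);
    }

    /** Population variance of a non-empty numeric target. */
    static double variance(double[] y) {
        double mean = Arrays.stream(y).average().orElse(0.0);
        return Arrays.stream(y).map(v -> (v - mean) * (v - mean)).sum() / y.length;
    }

    /** Variance reduction of splitting `parent` into `left` and `right`. */
    static double varianceReduction(double[] parent, double[] left, double[] right) {
        double n = parent.length;
        return variance(parent)
                - (left.length / n) * variance(left)
                - (right.length / n) * variance(right);
    }
}
```

A tree learner would evaluate one of these scores for every candidate split and keep the split with the highest value: a split that separates the classes perfectly recovers the full parent entropy, and one that groups identical target values recovers the full parent variance.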
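TEST DATA [slide content not preserved]
AGGREGATE DATA [slide content not preserved]
GRAPH - MEAN CORRECT CLASSIFICATIONS (%) [figure not preserved]
GRAPH - RANGE CORRECT CLASSIFICATIONS (%) [figure not preserved]
GRAPH - MEAN TIME (MS) [figure not preserved]
AGGREGATE DATA - RANKED [slide content not preserved]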
WHAT DOES THIS MEAN?
• REPTree is a fast learning algorithm.
• On average, REPTree ran 5x faster than J48.
• On average, J48 produced 1.05x as many correct classifications as REPTree.
• REPTree may be useful for a first pass over a dataset, to check whether there is knowledge to be gained with a good chance of success. It is more likely than J48 to do this quickly.
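Figures such as "5x faster" and "1.05x more correct" are ratios of per-algorithm means. The sketch below shows that arithmetic only. The measurements in `main` are hypothetical placeholders, deliberately not the deck's data (its tables are not preserved), and the class and method names are assumptions.

```java
import java.util.Arrays;

/** Ratio-of-means arithmetic behind "N times faster" style comparisons. */
final class RatioSketch {

    /** Arithmetic mean; NaN for an empty array. */
    static double mean(double[] xs) {
        return Arrays.stream(xs).average().orElse(Double.NaN);
    }

    /** How many times larger the mean of `a` is than the mean of `b`. */
    static double meanRatio(double[] a, double[] b) {
        return mean(a) / mean(b);
    }

    public static void main(String[] args) {
        // Hypothetical per-dataset build times in milliseconds, NOT the deck's results.
        double[] slowerMs = {40, 80, 120};
        double[] fasterMs = {10, 20, 30};
        System.out.println("speed-up: " + meanRatio(slowerMs, fasterMs) + "x");
    }
}
```

The same helper applies to accuracy: passing the two algorithms' per-dataset percentages of correct classifications gives the accuracy ratio, so the speed gain and the accuracy cost can be read side by side.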
JRIP RULE
• JRip is a rule-based learner that builds a set of rules identifying the classes. It is Weka's implementation of RIPPER (Repeated Incremental Pruning to Produce Error Reduction).
• The algorithm was proposed by William W. Cohen in 1995.
• JRip is especially efficient on large, noisy datasets.
• The JRip algorithm has two kinds of loop: an outer loop, which adds one rule at a time to the rule base, and an inner loop, which adds one condition at a time to the current rule.
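The two loops described above can be sketched as a bare sequential-covering learner. This is not JRip or full RIPPER: it leaves out rule pruning and rule-set optimization, handles only boolean attributes and a single positive class, and scores conditions by plain precision where RIPPER uses an information-gain measure. All names (`CoveringSketch`, `Condition`, `Example`, `growRule`, `learn`) are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Sequential covering on boolean attributes; shows only the two loops, not full RIPPER. */
final class CoveringSketch {

    /** One test of the form "attribute[index] == value". */
    static final class Condition {
        final int index;
        final boolean value;
        Condition(int index, boolean value) { this.index = index; this.value = value; }
    }

    /** A labelled training example (hypothetical representation). */
    static final class Example {
        final boolean[] attrs;
        final boolean positive;
        Example(boolean[] attrs, boolean positive) { this.attrs = attrs; this.positive = positive; }
    }

    /** A rule is a conjunction: it covers an example only if every condition holds. */
    static boolean covers(List<Condition> rule, Example e) {
        for (Condition c : rule) {
            if (e.attrs[c.index] != c.value) return false;
        }
        return true;
    }

    /** Fraction of the examples covered by `rule` that are positive; 0 if none are covered. */
    static double precision(List<Condition> rule, List<Example> data) {
        int covered = 0, pos = 0;
        for (Example e : data) {
            if (covers(rule, e)) { covered++; if (e.positive) pos++; }
        }
        return covered == 0 ? 0.0 : (double) pos / covered;
    }

    /** Inner loop: add one condition at a time until the rule stops improving. */
    static List<Condition> growRule(List<Example> data, int numAttrs) {
        List<Condition> rule = new ArrayList<>();
        double best = precision(rule, data);
        while (best < 1.0) {
            Condition bestCond = null;
            for (int i = 0; i < numAttrs; i++) {
                for (boolean v : new boolean[] {true, false}) {
                    Condition cond = new Condition(i, v);
                    List<Condition> candidate = new ArrayList<>(rule);
                    candidate.add(cond);
                    double p = precision(candidate, data);
                    if (p > best) { best = p; bestCond = cond; }
                }
            }
            if (bestCond == null) break; // no single condition improves precision
            rule.add(bestCond);
        }
        return rule;
    }

    /** Outer loop: add one rule at a time, removing the examples each new rule covers. */
    static List<List<Condition>> learn(List<Example> data, int numAttrs) {
        List<Example> remaining = new ArrayList<>(data);
        List<List<Condition>> rules = new ArrayList<>();
        while (remaining.stream().anyMatch(e -> e.positive)) {
            List<Condition> rule = growRule(remaining, numAttrs);
            // An empty rule that still covers negatives means nothing separates the classes.
            if (rule.isEmpty() && precision(rule, remaining) < 1.0) break;
            rules.add(rule);
            remaining.removeIf(e -> covers(rule, e));
        }
        return rules;
    }
}
```

Maximizing precision greedily tends to produce narrow rules that fit noise, which is the problem the pruning steps in RIPPER are there to address.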
TEST DATA [slide content not preserved]
AGGREGATE DATA [slide content not preserved]
AVERAGE GRAPH [figure not preserved]
• What did I find after applying JRip and comparing its results with those of other rule learners?
WHAT I FOUND
• JRip is a fast learning algorithm.
• JRip did well on correct classifications, in both mean and range, and on time (compared with other rule learners).
• JRip is neither the highest nor the lowest performer (it is average).
• Do I recommend using JRip?
Q & A

weka data mining
