Submitted To:
Dr. Neelima Gupta
Dr. Sandhya Aneja
Dr. Vipin Kumar
Submitted By:
Saurabh Kumar Chaudhary
M.Sc. (Computer Science)
Roll No. 20829
MESSAGE PASSING INTERFACE (MPI)
• MPI is an industry standard that specifies the library routines needed for writing message-passing programs.
• MPI allows the development of scalable, portable message-passing programs.
• MPI uses a library approach to support parallel programming.
• MPI programs are compiled with a regular compiler (e.g., gcc) and linked with an MPI library.
• No shared memory: processes communicate only by exchanging messages.
BASIC MPI ROUTINES
• MPI_INIT : INITIATE AN MPI COMPUTATION.
• MPI_FINALIZE : TERMINATE A COMPUTATION.
• MPI_COMM_SIZE : DETERMINE NUMBER OF PROCESSES.
• MPI_COMM_RANK : DETERMINE MY PROCESS IDENTIFIER.
• MPI_SEND : SEND A MESSAGE.
• MPI_RECV : RECEIVE A MESSAGE.
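A minimal sketch of these six routines in use (the file name, message contents, and the mpicc/mpirun commands are conventional, not taken from the slides): every non-root process sends its rank to process 0, which receives and prints each message.

/* hello_mpi.c -- the six basic routines in one program.
 * Build and run on a typical MPI installation:
 *   mpicc hello_mpi.c -o hello_mpi
 *   mpirun -np 4 ./hello_mpi
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);                  /* initiate an MPI computation */
    MPI_Comm_size(MPI_COMM_WORLD, &size);    /* number of processes         */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* this process's identifier   */

    if (rank != 0) {
        /* each worker sends its rank to process 0 */
        MPI_Send(&rank, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    } else {
        for (int src = 1; src < size; src++) {
            int msg;
            MPI_Recv(&msg, 1, MPI_INT, src, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("process 0 received rank %d\n", msg);
        }
    }

    MPI_Finalize();                          /* terminate the computation   */
    return 0;
}

There is no shared state here: the only way data moves between processes is through the matching send/receive pair, which is exactly the "no shared memory" model above.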
PROBLEMS WITH SERIAL PROGRAMMING
• MOORE'S LAW AND ITS LIMITS
• CHIP PERFORMANCE DOUBLES EVERY 18-24 MONTHS
• LIMITS OF SERIAL COMPUTING
• HEATING ISSUES
• LIMITS TO TRANSMISSION SPEEDS.
WHY MPI?
• We therefore need to develop explicit parallel algorithms that are based on a fundamental understanding of the parallelism inherent in a problem, and that exploit this parallelism with minimum interaction/communication between the parallel parts.
• MPI provides a powerful, efficient, and portable way to express parallel programs
• MPI was explicitly designed to enable libraries…
• Which may eliminate the need for many users to learn (much of) MPI
MACHINE LEARNING
• Field of study that gives computers the ability to learn without being explicitly programmed.
• Machine learning explores the study and construction of algorithms/models that can learn from and make predictions on data.
• A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.
CLASSIFICATION OF MACHINE LEARNING
•Supervised learning
•Classification
•Regression
•Unsupervised learning
•Clustering
K-NN: K NEAREST NEIGHBOR SEARCH ALGORITHM
• Searches the feature space for the k training instances that are closest to the unknown instance (test tuple).
• Closeness/similarity is measured in terms of a distance metric (a sketch of the serial search follows).
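A compact sketch of the serial search, assuming squared Euclidean distance and a linear scan (the deck only requires some distance metric; all function and variable names here are illustrative):

#include <float.h>
#include <stddef.h>

/* Squared Euclidean distance between two d-dimensional instances. */
double dist2(const double *a, const double *b, size_t d)
{
    double s = 0.0;
    for (size_t j = 0; j < d; j++) {
        double diff = a[j] - b[j];
        s += diff * diff;
    }
    return s;
}

/* Linear-scan k-NN: writes the indices of the k training instances
 * closest to `query` into `idx`.  Assumes n >= k.  O(ndk) worst case. */
void knn_search(const double *train, size_t n, size_t d,
                const double *query, size_t k, size_t *idx)
{
    double best[k];                       /* C99 variable-length array */
    for (size_t i = 0; i < k; i++) best[i] = DBL_MAX;

    for (size_t i = 0; i < n; i++) {
        double dsq = dist2(train + i * d, query, d);

        /* find where this instance ranks among the k best so far */
        size_t pos = k;
        while (pos > 0 && dsq < best[pos - 1]) pos--;
        if (pos == k) continue;           /* not among the k nearest */

        /* shift the worse entries down and insert */
        for (size_t m = k - 1; m > pos; m--) {
            best[m] = best[m - 1];
            idx[m]  = idx[m - 1];
        }
        best[pos] = dsq;
        idx[pos]  = i;
    }
}

The scan touches all n training instances for every query; that is the serial cost the parallel approach below divides across processes.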
PROBLEMS WITH SERIAL K-NN:
• High time complexity: every query must scan the entire training set.
• Sensitive to the local structure of the data
• Curse of dimensionality.
SOLUTION:
• Use a parallel approach to search for the k nearest neighbors of the query instance.
WORK DONE
• CNN (CONDENSED NEAREST NEIGHBOR) RULE.
• SNN (SELECTIVE NEAREST NEIGHBOR) RULE.
• RNN (REDUCED NEAREST NEIGHBOR) RULE.
• REPEATED ENN (EDITED NEAREST NEIGHBOR) RULE.
• IB2 & IB3 (INSTANCE-BASED) ALGORITHMS.
OUR PROPOSED APPROACH
•Pre-processing step
• Perform clustering on the training set to divide it into p mutually exclusive partitions {P1, P2, …, Pp}, where p is the number of processes.
• Create a Representative Instance (R.I.) to represent each partition.
STEP-II
For i=1 to p
• Apply the k-means approach.
• Evaluate the nearest-neighbor similarity with the representative instance (centroid) of each partition.
• Perform
• Competence Enhancement – Repeated Wilson Editing Rule (noise removal)
• Competence Preservation (removal of superfluous instances)
• Update the centroid of the cluster (see the sketch after this list).
• Repeat Steps I & II until the number of instances in the selected partition is >= k.
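A minimal sketch of the centroid (R.I.) update for one partition, assuming each partition is stored as a flat row-major array of its current members (the names and memory layout are assumptions of the sketch, not taken from the slides):

#include <stddef.h>

/* Recompute the Representative Instance (centroid) of one partition
 * as the mean of its m remaining members after editing.
 * `members` holds m instances of dimension d, row-major.            */
void update_centroid(const double *members, size_t m, size_t d,
                     double *centroid)
{
    for (size_t j = 0; j < d; j++) centroid[j] = 0.0;
    for (size_t i = 0; i < m; i++)
        for (size_t j = 0; j < d; j++)
            centroid[j] += members[i * d + j];
    if (m > 0)
        for (size_t j = 0; j < d; j++) centroid[j] /= (double)m;
}

Because the editing rules remove instances, the centroid drifts between iterations, which is why the update is repeated until the stopping condition above holds.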
STEP-III
•Take a test instance.
•Select the partition whose R.I. is closest to the test instance.
•Apply the majority rule.
•Select the class label with the majority among the neighbors as the label of the test instance (a sketch follows).
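A sketch of this step, reusing dist2() from the earlier k-NN sketch; encoding class labels as small non-negative integers is an assumption made here, not something the slides specify:

#include <float.h>
#include <stddef.h>

double dist2(const double *a, const double *b, size_t d);  /* defined above */

/* Return the index of the partition whose Representative Instance
 * is closest to the test instance.  `ris` holds p centroids of
 * dimension d, row-major.                                           */
size_t closest_partition(const double *ris, size_t p, size_t d,
                         const double *test)
{
    size_t best = 0;
    double bestd = DBL_MAX;
    for (size_t i = 0; i < p; i++) {
        double dsq = dist2(ris + i * d, test, d);
        if (dsq < bestd) { bestd = dsq; best = i; }
    }
    return best;
}

/* Majority rule over the labels of the k nearest neighbors found
 * inside the selected partition.  Labels are ints in [0, nclasses). */
int majority_label(const int *labels, size_t k, int nclasses)
{
    int counts[nclasses];                  /* C99 variable-length array */
    for (int c = 0; c < nclasses; c++) counts[c] = 0;
    for (size_t i = 0; i < k; i++) counts[labels[i]]++;

    int best = 0;
    for (int c = 1; c < nclasses; c++)
        if (counts[c] > counts[best]) best = c;
    return best;
}

Ties in the vote are broken here in favor of the lower label index; the slides do not specify a tie-breaking rule.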
UPDATING THE TRAINING SET
• When the distance of a new test instance from the R.I. of a partition exceeds the maximum radius value that we store during the pre-processing step, the training set is updated.
• Update the R.I. of only that partition which is closest to the new test instance.
RESULTS AND CONCLUSIONS
TIME COMPLEXITY
(Single Time Investment)
IN EACH ITERATION
• STEP I (PRE-PROCESSING STEP) : O(nd/p)
• STEP II : O(ndk/p)
IN THE LAST ITERATION
• STEP III : O(kd)
Here n is the number of training instances, d the dimensionality, p the number of processes, and k the number of neighbors.
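Combining the per-iteration costs above (with I denoting the number of Step I/II iterations, a symbol introduced only for this summary), the overall cost is roughly:

T \;=\; I\,\Big( O\big(\tfrac{nd}{p}\big) + O\big(\tfrac{ndk}{p}\big) \Big) + O(kd) \;=\; O\!\Big(\tfrac{I \cdot ndk}{p}\Big) + O(kd)

so the dominant term scales inversely with the number of processes p.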
FUTURE WORK
•Selection of the initial centroids in clustering in order to obtain speed-up.
•Selection of k.
•Choice of the number of processors.
THANK YOU…
