International Journal of Research in Advent Technology, Vol.3, No.12, December 2015
E-ISSN: 2321-9637
Available online at www.ijrat.org
A Study on Class Imbalance Classification Using
Fuzzy Total Margin Based Support Vector Machine
S. Lavanya1, Dr. S. Palaniswami2, R. Premalatha3
Assistant Professor, Department of CSE, Anna University Regional Campus, Coimbatore, India1
Principal, Government College of Engineering, Bodinayakanur, Tamil Nadu, India2
PG Scholar, Department of CSE, Anna University Regional Campus, Coimbatore, India3
Abstract - The classification of imbalanced data is a difficult challenge for machine learning, involving parameter
selection, feature selection, and classification techniques. Employing traditional classifiers such as SVM leads to
overfitting toward the majority class. The fuzzy total margin based support vector machine (FTM-SVM) method for
handling class imbalance learning (CIL) is analysed: it identifies outliers in the feature vectors, incorporates the
total margin algorithm, different cost functions, and a suitable fuzzification of the penalty into FTM-SVM, and
formulates them for the nonlinear case. Comparison with state-of-the-art data stream classification techniques
establishes the effectiveness of the approach.
Index Terms - Outlier Detection, Classification, Class Imbalance Data, Data Classification
1 INTRODUCTION
Class imbalance learning (CIL) is an emerging topic
that is attracting growing attention. It aims to tackle
the combined challenge of online learning [1] and class
imbalance learning [2]. Different from incremental
learning, which processes data in batches, online learning
here means learning from data examples one by one,
without storing and reprocessing observed examples.
Class imbalance learning handles classification problems
in which some classes of data are severely
underrepresented relative to other classes. Combining
these two problems, online class imbalance learning
deals with data streams in which data arrive
progressively and the class distribution is imbalanced.
Although online learning and class imbalance learning
have each been well studied in the literature
individually, the combined problem has not been
discussed much. When both problems of online learning
and class imbalance exist, new challenges and
interesting research questions arise regarding the
expected accuracy on the minority class and adaptivity
to dynamic environments. The difficulty of learning
from imbalanced data stems from the relatively or
absolutely underrepresented class, which cannot draw
attention from the learning algorithm equal to that drawn
by the majority class. This often leads to very specific
classification rules, or missing rules, for the minority
class, with little generalization ability for future prediction.
Total margin-based adaptive fuzzy support vector
machines (TAF-SVM) [4] and fuzzy support vector
machines for class imbalance learning (FSVM-CIL)
[3] are emerging. TAF-SVM not only mitigates the
overfitting problem caused by outliers and noise
through fuzzification of the penalties, but also corrects
the skew of the optimal separating hyperplane (OSH)
arising from imbalanced data sets by using different
cost functions. The total margin algorithm, fuzzy
membership functions, and different cost functions are
incorporated into the traditional SVM, so the TAF-SVM
is reformulated for both the linear and nonlinear cases.
In FSVM-CIL, fuzzy membership values for training
examples are assigned to reduce the effect of both the
CIL problem and the problems of outliers and noise,
under the principle of cost-sensitive learning. FSVM-CIL
can thus be used to handle the CIL problem in the
presence of outliers and noise. The proposed method
incorporates the total margin algorithm, different cost
functions, and a suitable approach to fuzzification of the
penalty. FTM-SVM introduces a variety of cost functions
and applicable fuzzy membership functions. Therefore,
FTM-SVM can mitigate the CIL problem better than
some existing CIL methods. FTM-SVM introduces
robust fuzzy membership functions that are not sensitive to outliers
and noise, and that can cope with the overfitting
problem when the data sets contain outlier and
noise examples. FTM-SVM has good generalization
ability because it introduces the total margin algorithm
to replace the traditional soft margin algorithm. We
considered six forms of fuzzy membership values,
yielding six FTM-SVM settings. We evaluated the FTM-
SVM method on two artificial data sets and on
imbalanced data sets and compared its performance
with existing CIL methods. Experimental results show
that the proposed FTM-SVM method achieves higher
F-measure, precision, and recall values than some
existing CIL methods. The rest of the paper is
organized as follows: in Section 2, some related work
is reviewed; in Section 3, the outline of the work is
analysed; Section 4 provides a review of the literature;
Section 5 concludes the paper.
2. RELATED WORKS
2.1. Diversity in Classification Ensembles
Diversity of ensembles has been a hot
topic during the past few years. It is generally agreed
that the success of an ensemble is attributed to diversity,
the degree of disagreement within the ensemble. In the
regression context, diversity has previously been
quantified and measured explicitly in terms of the
correlation between individual learners. In the
classification context, it is loosely described as "making
errors on different examples".
2. 2. Ensemble Methods
Ensemble learning methods have become a
major category of solutions for class imbalance
learning, due to their flexibility and ability to improve
generalization. First, an ensemble method is applicable
to most classification algorithms. Second, it is easy to
combine with resampling techniques. Third, combining
multiple classifiers can reduce error arising from
bias and variance [5]. These attractive features have led
to a variety of ensemble methods proposed to handle
imbalanced data sets at both the data and algorithm
levels.
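A common data-level ensemble recipe for imbalance is to give each ensemble member a balanced training subset: keep all minority examples and undersample the majority class, as in EasyEnsemble [7]. The following sketch (an illustration, not code from the surveyed papers) shows only the sampling step; any base classifier could then be trained on each subset.

```python
import random

def easy_ensemble_subsets(y, n_members, seed=0):
    """EasyEnsemble-style sampling: each subset keeps every minority-class
    index plus an equally sized random draw from the majority class."""
    rng = random.Random(seed)
    pos = [i for i, lab in enumerate(y) if lab == 1]
    neg = [i for i, lab in enumerate(y) if lab == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    return [sorted(minority + rng.sample(majority, len(minority)))
            for _ in range(n_members)]

# toy labels: 3 positives (minority) vs. 9 negatives (majority)
y = [1, 1, 1] + [0] * 9
subsets = easy_ensemble_subsets(y, n_members=3)
for s in subsets:
    print(s)  # each subset is balanced: indices 0-2 plus 3 majority indices
```

Each member sees a balanced view of the data, while the ensemble as a whole still exploits most of the majority class.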
2.3. Synthetic Minority Oversampling Technique
(SMOTE)
The algorithm populates the minority-class feature
space by placing synthetic examples on the line
segment connecting two minority instances. SMOTE
has been shown to improve classification accuracy on
the minority class over other standard approaches.
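The interpolation step described above can be sketched in a few lines. This is a simplified, self-contained illustration of the SMOTE idea (neighbour search is a naive exact scan; parameter names are ours, not from the original SMOTE paper):

```python
import math
import random

def smote(minority, n_synthetic, k=2, seed=0):
    """Generate synthetic minority examples by interpolating between a
    minority instance and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_synthetic):
        x = rng.choice(minority)
        # k nearest minority neighbours of x (excluding x itself)
        neighbours = sorted((p for p in minority if p is not x),
                            key=lambda p: math.dist(p, x))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform position on the segment x -> nb
        out.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return out

# three minority points in 2-D; create four synthetic ones
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
synthetic = smote(minority, n_synthetic=4)
```

Every synthetic point is a convex combination of two real minority points, so it lies inside the minority region rather than duplicating existing examples.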
OUTLINE OF THE SURVEY
2.4 Data labelling and Preprocessing
Slack variables are introduced. Slack variables can be
treated as counts of outliers and noise in the data sets,
because they allow the margin constraints to be violated
and allow a training point to have a smaller or even
negative margin. Thus, a method that takes account of
the slack variables can handle data that contain outliers
and noise. The noisy data are pre-processed using
stop-word removal and stemming, in order to reduce the
data size prior to clustering and classification.
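The stop-word removal and stemming step can be sketched as below. The stop-word list and the suffix-stripping rule are deliberately toy stand-ins (a real pipeline would use a full stop-word list and, e.g., a Porter stemmer):

```python
STOP_WORDS = {"the", "is", "a", "of", "and", "to", "in"}

def crude_stem(token):
    """Toy suffix-stripping stemmer (a stand-in for e.g. Porter stemming)."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Lowercase, drop stop words, and stem the remaining tokens."""
    tokens = text.lower().split()
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]

print(preprocess("The classifiers handled imbalanced classes"))
# -> ['classifier', 'handl', 'imbalanc', 'class']
```

Both steps shrink the vocabulary, which reduces the data size before clustering and classification as the text describes.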
2.5 Feature Selection
The goal of feature selection, in general,
is to select a subset of j features that allows a classifier
to reach optimal performance, where j is a user-
specified parameter. The curse of dimensionality tells
us that if many of the features are noisy, the cost of
using a classifier can be very high and its performance
may be severely hindered. A feature selection algorithm
derived from the mutual information measure and
entropy computation is therefore used to identify a
suitable feature subset with low redundancy.
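As a minimal sketch of the mutual information measure used here (assuming discrete-valued features; the estimator below is a plain plug-in count-based one):

```python
import math
from collections import Counter

def mutual_information(feature, labels):
    """Plug-in estimate of I(X;Y) = sum_{x,y} p(x,y) log( p(x,y) / (p(x)p(y)) )
    for a discrete feature X and class label Y, in nats."""
    n = len(labels)
    px = Counter(feature)
    py = Counter(labels)
    pxy = Counter(zip(feature, labels))
    mi = 0.0
    for (x, y), c in pxy.items():
        p_xy = c / n
        # p(x,y) / (p(x) p(y)) written with raw counts: c*n / (count_x * count_y)
        mi += p_xy * math.log(c * n / (px[x] * py[y]))
    return mi

# a perfectly informative feature vs. an uninformative one
labels  = [0, 0, 1, 1, 0, 1]
f_good  = [0, 0, 1, 1, 0, 1]   # copies the label exactly
f_noise = [0, 1, 0, 1, 0, 1]   # unrelated to the label
print(mutual_information(f_good, labels) > mutual_information(f_noise, labels))
```

Ranking features by this score (and penalizing mutual information between already-selected features) yields a low-redundancy subset, as the section describes.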
2.6. Data Classification using Support Vector
Machine and Fuzzy Total Margin
In SVM, the maximal margin is an important
concept. The optimal separating hyperplane (OSH)
is the one with the maximal margin of separation,
obtained by maximizing the minimum distance between
the support vectors and the separating hyperplane.
However, relying on only a few support vectors can
cause a loss of information, because most of the
information is contained in the majority of the
pre-processed training set; taking the whole set into
account can therefore improve the generalization error
bound. The surplus variables measure the distances
between the correctly classified data points and the
hyperplane. In addition to minimizing the sum of slack
variables while maximizing the margin of separation, as
in the soft margin algorithm, fuzzy membership values
are assigned to training examples to reduce the effect of
both the CIL problem and the problems of outliers and
noise, under the principle of cost-sensitive learning.
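The role of per-example weights in a cost-sensitive SVM can be sketched with a toy subgradient-descent trainer for the weighted soft-margin objective 0.5||w||^2 + C * sum_i m_i * hinge_i. This is an illustrative sketch, not the paper's FTM-SVM; the m_i play the role of fuzzy membership values.

```python
def train_weighted_linear_svm(X, y, weights, C=1.0, lr=0.01, epochs=200):
    """Subgradient descent on 0.5*||w||^2 + C * sum_i m_i * max(0, 1 - y_i*(w.x_i + b)).
    Labels y_i must be in {-1, +1}; weights are per-example memberships m_i."""
    d = len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        gw = list(w)          # gradient of the 0.5*||w||^2 regularizer
        gb = 0.0
        for x, yi, mi in zip(X, y, weights):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1:    # hinge active: subgradient is -m_i * y_i * x_i
                for j in range(d):
                    gw[j] -= C * mi * yi * x[j]
                gb -= C * mi * yi
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b

# tiny separable 1-D problem with equal weights
X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [-1, -1, 1, 1]
m = [1.0, 1.0, 1.0, 1.0]
w, b = train_weighted_linear_svm(X, y, m)
preds = [1 if w[0] * x[0] + b > 0 else -1 for x in X]
print(preds)  # [-1, -1, 1, 1]
```

Lowering m_i for a suspected outlier shrinks its pull on the hyperplane, which is exactly how the fuzzy memberships counteract both noise and class imbalance.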
3. Equations
The FSVM method assigns different fuzzy
membership values, or weights, to different examples
to reflect their different importance within their own
classes. In the proposed FTM-SVM method, these
membership functions are defined as follows:

m_i+ = f(x_i+) · r+   for positive-class examples,
m_i− = f(x_i−) · r−   for negative-class examples,

where f(x_i) generates a value between 0 and 1, and
f(x_i) reflects the importance of x_i in its own class. We
assigned r+ = 1 and r− = r, where r is the minority-to-
majority class ratio. Therefore, according to this
assignment of values, a positive-class example can
take a membership value in the [0, 1] interval, and a
negative-class example can take a membership value in
[0, r], where r < 1.
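The assignment just described can be computed directly; the sketch below (function name is ours) takes precomputed f(x_i) values and the class labels, with the positive class assumed to be the minority:

```python
def class_ratio_memberships(f_values, labels):
    """Membership m_i = f(x_i) * r_class, with r+ = 1 for the (minority)
    positive class and r- = r = |minority| / |majority| for the majority class."""
    n_pos = sum(1 for lab in labels if lab == 1)
    n_neg = len(labels) - n_pos
    r = n_pos / n_neg   # minority-to-majority class ratio, assumed < 1
    return [f * (1.0 if lab == 1 else r) for f, lab in zip(f_values, labels)]

f_values = [0.9, 0.8, 0.7, 0.6, 0.5]   # f(x_i) in [0, 1] for each example
labels   = [1, 1, 0, 0, 0]             # 2 positives, 3 negatives -> r = 2/3
print(class_ratio_memberships(f_values, labels))
# positives keep f(x_i); negatives are scaled down by r = 2/3
```

Scaling only the majority-class memberships by r is what shifts the effective misclassification cost toward the minority class.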
4.1 Flow Diagram:
[Figure: UCI ML Repository dataset collection → pre-processing / noise removal → parameter selection using kernel spaces (radial basis function) → SVM → data point extraction / error boundary extraction → fuzzy total margin detection → data classification]
4.2 TABLE
4.3 TABLE
REFERENCES
[1] L. L. Minku, "Online ensemble learning in the presence of concept drifts," Ph.D. dissertation, School of Computer Science, University of Birmingham, Birmingham, U.K., 2010.
[2] H. He and E. A. Garcia, "Learning from imbalanced data," IEEE Trans. Knowl. Data Eng., vol. 21, no. 9, pp. 1263–1284, Sep. 2009.
[3] R. Batuwita and V. Palade, "FSVM-CIL: fuzzy support vector machines for class imbalance learning," IEEE Trans. Fuzzy Syst., vol. 18, no. 3, pp. 558–571, Jun. 2010.
[4] Y. Liu and Y. Chen, "Face recognition using total margin-based adaptive fuzzy support vector machines," IEEE Trans. Neural Netw., vol. 18, no. 1, pp. 178–192, Jan. 2007.
[5] Y. Sun, M. S. Kamel, A. K. Wong, and Y. Wang, "Cost-sensitive boosting for classification of imbalanced data," Pattern Recognition, vol. 40, no. 12, pp. 3358–3378, 2007.
[6] H. Xue, S. Chen, and Q. Yang, "Structural regularized support vector machine: a framework for structural large margin classifier," IEEE Trans. Neural Netw., vol. 22, no. 4, pp. 573–587, Apr. 2011.
[7] X.-Y. Liu, J. Wu, and Z.-H. Zhou, "Exploratory undersampling for class-imbalance learning," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 39, no. 2, pp. 539–550, Apr. 2009.
[8] R. Batuwita and V. Palade, "Efficient resampling methods for training support vector machines with imbalanced datasets," in Proc. IJCNN 2010, IEEE World Congress on Computational Intelligence (WCCI 2010), Barcelona, Spain, Jul. 2010.
