A comparative study using different measures of filtration

A comparative study of different measures used in the filter method
to select relevant genes from microarray gene expression data
Presented by
Jayati Mitra

Content
Chapter 1. Introduction.
Chapter 2. Basic concepts of Bioinformatics and Molecular Biology
2.1 Introduction to Bioinformatics.
2.2 Goal of Bioinformatics.
2.3 Research area of Bioinformatics.
2.4 Real world application of Bioinformatics.
2.5 Introduction to Molecular Biology
2.6 Central dogma of Molecular Biology.
Chapter 3. DNA Microarray technology and Gene Expression Data.
Chapter 4. Literature Survey on feature selection techniques applied on Microarray Gene expression
data.
Chapter 5. Proposed Work .
5.1 Scoring function based feature selection.
5.2 Working Principle.
Chapter 6. Result Analysis.
Chapter 7. Conclusion and future scope.
References.

Chapter 1. Introduction
• Gene expression is the process of transcribing a gene’s DNA sequence
into RNA. A gene’s expression level indicates the approximate number of
copies of that gene’s RNA produced in a cell, and it is correlated with the
amount of the corresponding protein made.
• A DNA microarray (also commonly known as a DNA chip or biochip) is a
collection of microscopic DNA spots attached to a solid surface. Scientists
use DNA microarrays to measure the expression levels of large numbers of
genes simultaneously or to genotype multiple regions of a genome.
• Classification is a form of data analysis that extracts models
describing important data classes. Such models are called classifiers. Data
classification consists of two steps:
1. The learning step, or training phase, in which a classification model
(classifier) is built from the training dataset and its associated class labels.
2. The classification step, in which the classifier is applied to classify
unseen data.

• Feature selection is the process of selecting a subset of relevant and non-redundant features
from a dataset in order to improve the performance of classification algorithms in
terms of accuracy and the time needed to build the model.
• The process of feature selection is classified into three categories: (1) filter, (2) wrapper,
and (3) embedded. Filter methods evaluate a subset of genes by looking at the intrinsic
characteristics of the data with respect to the class labels.
• This project thesis reviews the score functions that are used in filter methods.

Chapter 2. Basic concepts of Bioinformatics and Molecular Biology
2.1 Introduction to Bioinformatics:
Bioinformatics is the field of science in which biology, computer science, mathematics
and information technology merge into a single discipline.
The ultimate goal of the field is to enable the discovery of new biological insights and
to create a global perspective from which unifying principles in biology can be
discerned.
2.2 Goal of Bioinformatics:
Over time, bioinformatics became more ambitious, aiming to revolutionize medicine by
making sequencing a diagnostic tool.
• The goal was to develop new approaches to eradicate diseases like cancer and to pave
the way towards personalized medicine.

2.3 Research areas of bioinformatics:
• Sequence analysis: Sequence analysis is the most basic operation in computational
biology. It consists of finding which parts of biological sequences are
alike and which parts differ during medical analysis and genome mapping processes.
• Genome annotation: In the context of genomics, annotation is the process of marking
the genes and other biological features in a DNA sequence.
• Analysis of gene expression: The expression of many genes can be determined by
measuring mRNA levels with techniques such as microarrays, expressed cDNA
sequence tag (EST) sequencing, serial analysis of gene expression (SAGE) tag
sequencing, massively parallel signature sequencing (MPSS), or various applications of
multiplexed in-situ hybridization.
• Analysis of protein expression: Gene expression is measured in many ways, including
mRNA and protein expression; however, protein expression is one of the best clues to
actual gene activity, since proteins are usually the final catalysts of cell activity. Protein
microarrays and high-throughput (HT) mass spectrometry (MS) can provide a snapshot of
the proteins present in a biological sample.

• Analysis of mutations in cancer: In cancer, the genomes of affected cells are rearranged in
complex or even unpredictable ways.
Massive sequencing efforts are used to identify previously unknown point mutations in a
variety of genes in cancer.
Bioinformaticians continue to produce specialized automated systems to manage the sheer
volume of sequence data produced, and they create new algorithms and software to
compare the sequencing results with the growing collection of human genome sequences and
germ line polymorphisms.
New physical detection technologies are employed, such as oligonucleotide microarrays to
identify chromosomal gains and losses and single-nucleotide polymorphism arrays to detect
known point mutations.
• Protein structure prediction: The amino acid sequence of a protein (the so-called primary
structure) can be easily determined from the sequence of the gene that codes for it. In most
cases, this primary structure uniquely determines the protein's structure in its native
environment.

2.4 Real world applications of bioinformatics:
• Basic Research
• Functional genomics
• Evolutionary genomics
• Epigenomics
• Genome Wide Association Analysis (GWA)
• Genomics
• Proteomics
• Omic sciences
• Systems biology/ Systems genetics
• High Performance Computing (HPC)
Biomedicine
• Drug discovery
• Personalized medicine
• Preventive medicine
• Gene therapy

Microbiology
• Biotechnology
• Waste cleanup
• Climate change
• Alternative energy sources
• Antibiotic resistance
• Epidemiological studies
Agriculture
• Crops
• Insect resistance
• Improving nutritional quality

2.5 Introduction to Molecular Biology
Molecular biology is the study of gene structure and function at the molecular level to
understand the molecular basis of hereditary genetic variation and the expression patterns
of genes.
Some common molecular biology techniques are:
• Electrophoresis: A process that separates molecules such as DNA or proteins
according to their size. Electrophoresis is a mainstay of molecular biology laboratories. It
can be used to identify molecules or fragments of molecules and as a check to make sure
that the correct molecule is present.
• Cloning: The technique of introducing a new gene into a cell or organism. This can
be used to see what effect the expression of that gene has on the organism.
• Restriction digest: The process of cutting DNA into smaller fragments using
enzymes that act only at a particular genetic sequence.
• Ligation: The process of joining two pieces of DNA together. Ligation is useful when
introducing a new piece of DNA into another genome.

Cell: The cell is the basic structural, functional, and biological unit of all known living
organisms. A cell is the smallest unit of life. Cells are often called the "building blocks of
life". The study of cells is called cell biology or cellular biology.
DNA: Deoxyribonucleic acid, or DNA, is a molecule that contains the hereditary
material and holds the instructions for building the proteins that are essential for our bodies
to function.

DNA contains four types of nitrogen-containing units called bases:
• Adenine (A)
• Cytosine (C)
• Guanine (G)
• Thymine (T)
The bases of the two strands of DNA are stuck together to create a ladder-like shape. Within
the ladder, A always sticks to T, and G always sticks to C to create the "rungs." The length of
the ladder is formed by the sugar and phosphate groups.
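
The pairing rule can be illustrated with a short Python sketch. The function names `complement` and `reverse_complement` are illustrative choices, not taken from the original slides; the sketch only assumes the A-T and G-C pairing stated above.

```python
# Watson-Crick base pairing: A pairs with T, G pairs with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the base-by-base complementary strand."""
    return "".join(PAIR[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the complementary strand read in the opposite direction."""
    return complement(strand)[::-1]


partner = complement("ATGC")
```

Each position of `partner` holds the base that would sit opposite the input base on the other side of the "rung".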

RNA: RNA stands for ribonucleic acid. RNA molecules are single-stranded nucleic acids
composed of nucleotides. RNA plays an important role in protein synthesis through the process
of translation.

Three main types of RNA are involved in the translation process:
➢ Messenger RNA (mRNA)
➢ Transfer RNA (tRNA)
➢ Ribosomal RNA (rRNA)
Gene: A gene is the basic physical and functional unit of heredity. Genes are made up of
DNA. Some genes act as instructions to make molecules called proteins. In humans, genes
vary in size from a few hundred DNA bases to more than 2 million bases. The Human
Genome Project estimated that humans have between 20,000 and 25,000 genes.

Protein: Proteins are essential nutrients for the human body. They are one of the building
blocks of body tissue and can also serve as a fuel source. As a fuel, proteins provide as
much energy density as carbohydrates. The most important aspect of protein from a
nutritional standpoint is its amino acid composition.

2.6 Central Dogma of Molecular Biology:

Post-translational modification (PTM) refers to the covalent and
generally enzymatic modification of proteins following protein biosynthesis.
Proteins are synthesized by ribosomes translating mRNA into polypeptide chains,
which may then undergo PTM to form the mature protein product. PTMs are
important components in cell signaling, as for example when prohormones are
converted to hormones.
Splicing: The detailed splicing mechanism is quite complex. In short, it involves five
snRNAs and their associated proteins. These ribonucleoproteins form a large (60S) complex
called the spliceosome. Then, after a two-step enzymatic reaction, the intron is removed and
the two neighboring exons are joined together. The branch point A residue plays a critical
role in the enzymatic reaction.

Chapter 3. DNA Microarray Technology:
A DNA microarray is a collection of microscopic DNA spots attached to a solid
surface. Scientists use DNA microarrays to measure the expression levels of large
numbers of genes simultaneously or to genotype multiple regions of a genome.
DNA microarray technology provides biologists with the ability to measure the
expression levels of thousands of genes in a single experiment. Initial experiments
suggest that genes of similar function exhibit similar expression patterns in
microarray hybridization experiments.
DNA microarray technology has empowered the scientific community to
understand the fundamental aspects underlying the growth and development of
life, as well as to explore the genetic causes of anomalies occurring in the
functioning of the human body.

Design of DNA Microarray System
1. Sample Preparation
2. Purification
3. Reverse Transcription
4. Labelling: Cyanine dyes Cy3 and Cy5 are the predominant labels used in microarray analysis
5. Hybridization
6. Scanning
7. Normalization and analysis

Types of microarray
• DNA Microarray
– cDNA microarray
– Oligonucleotide arrays
• Protein microarray
– Analytical
– Functional
– Reverse phase
• Chemical compound arrays
– collection of organic chemical compounds spotted on a solid surface
• Carbohydrate arrays
– various oligosaccharides and/or polysaccharides immobilized on a solid
support in a spatially defined arrangement
• Cellular Microarrays
– spotted with varying materials, such as antibodies, proteins, or lipids,
which can interact with the cells, leading to their capture on specific
spots.

Chapter 4. Literature Survey on Feature selection techniques applied on
microarray Gene expression data:
Gene Expression: Gene expression is the process by which information from a gene is
used in the synthesis of a functional gene product. These products are often proteins,
but in non-protein coding genes such as transfer RNA (tRNA) or small nuclear RNA
(snRNA) genes, the product is a functional RNA.
The process of gene expression is used by all known life, including eukaryotes (including
multicellular organisms) and prokaryotes (bacteria and archaea), and is utilized by viruses,
to generate the macromolecular machinery for life.
Several steps in the gene expression process may be modulated, including the
transcription, RNA splicing, translation, and post-translational modification of a protein.

Feature Selection for Classification
Feature selection mainly affects the training phase of classification. After
features are generated, instead of passing the data with all features directly to the
learning algorithm, feature selection for classification first selects a
subset of features and then passes the data with the selected features to the
learning algorithm.
The feature selection phase may be independent of the learning algorithm, as in
filter models, or it may iteratively use the performance of the learning
algorithm to evaluate the quality of the selected features, as in wrapper models.
The process of feature selection is classified into the following categories:

a) Filter: It is also called the open-loop method and is the earliest approach. It examines
features based on their intrinsic characteristics prior to the learning task. A filter
algorithm principally measures feature characteristics using four types of
evaluation criteria: dependency, information, distance, and consistency. However,
filter methods ignore the interaction with the classifier and the possible interactions
among features (combined features may have a net effect that is not necessarily
reflected by the individual features in that group). This also leads to varied prediction
performance when the selected features are applied to different learning algorithms.

Advantages and disadvantages of filter methods.

Filter-method-based work on microarray gene expression data: the table below
lists some key references for filter-based feature selection techniques in
the microarray domain.

b) Wrapper or closed-loop method: It wraps the feature selection around the learning
algorithm and uses the classification error rate or performance accuracy as the feature
evaluation criterion. It selects the most discriminative subset of features by minimizing the
prediction error of a particular classifier.

Advantages and disadvantages of Wrapper methods.

Wrapper method based works in Microarray gene expression data:

c) Embedded: This method is a built-in feature selection mechanism that embeds the feature
selection in the learning algorithm and uses its properties to guide feature evaluation.
The embedded method is more efficient and computationally more tractable than the wrapper
method while maintaining similar performance. This is because the embedded method
avoids the repeated execution of the classifier and the examination of every feature subset.

Advantages and disadvantages of Embedded methods.

Embedded method based works in Microarray gene expression data:

d) Hybrid: Hybrid methods represent the latest developments in feature selection. A hybrid
method can be formed by combining two different methods (e.g. filter and wrapper), two methods
of the same criterion, or two feature selection approaches. It uses different evaluation
criteria in different search stages to improve efficiency and prediction performance
with better computational performance.

Advantages and disadvantages of Hybrid methods:

Hybrid method based works in Microarray gene expression data:

e) Ensemble: This method aims to construct a group of feature subsets
and then produce an aggregated result from the group. It is designed to tackle
the instability and perturbation issues found in many feature selection algorithms. It
is based on different sub-sampling strategies, where a particular feature selection method
is run on a number of sub-samples and the obtained features are merged to form a more
stable subset.

Advantages and disadvantages of Ensemble methods

Ensemble method based works in Microarray gene expression data:

Chapter 5. Proposed work
5.1 Scoring Function Based Feature Selection
In this work, our main objective is to focus on the scoring functions used in filter
methods.
• Filter methods are divided into two categories: univariate methods and multivariate
methods.
In the univariate scheme, each feature is ranked independently based on some
score function or measure, and then a given number of features are selected
according to their rank.
In the multivariate scheme, feature dependency is considered. Therefore, the
multivariate scheme is naturally capable of handling redundant features.
The score functions used to rank relevant genes in gene expression
data are discussed below.

Here, $K_{N \times M}$ is a gene expression data matrix containing $N$ objects (samples)
and $M$ features (genes).
• $X = \{X_1, X_2, \ldots, X_N\}$ is the set of samples.
• $f = \{f_1, f_2, \ldots, f_M\}$ is the set of features (genes).
• $C_{N \times 1}$ is a class vector that contains the class value associated with every sample.
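
To make the notation concrete, the following minimal Python sketch stores the data matrix as a list of N rows with M values each and the class vector as a list of N labels, then ranks genes with an arbitrary univariate score function. The names `rank_genes` and `mean_gap`, the toy score and the toy data are illustrative assumptions and are not taken from the original work.

```python
def rank_genes(K, C, score, top_k):
    """Score every column (gene) of K against labels C, return the top_k column indices."""
    n_genes = len(K[0])
    scores = []
    for j in range(n_genes):
        gene_values = [row[j] for row in K]  # column j = one gene across all samples
        scores.append(score(gene_values, C))
    order = sorted(range(n_genes), key=lambda j: scores[j], reverse=True)
    return order[:top_k]


def mean_gap(values, labels):
    """Toy score (assumption): absolute difference between the two class means."""
    a = [v for v, c in zip(values, labels) if c == 0]
    b = [v for v, c in zip(values, labels) if c == 1]
    return abs(sum(a) / len(a) - sum(b) / len(b))


# Toy data: N = 4 samples, M = 3 genes; gene 1 separates the classes best.
K = [[1.0, 5.0, 0.2],
     [1.1, 5.2, 0.9],
     [1.0, 9.0, 0.3],
     [0.9, 9.1, 0.8]]
C = [0, 0, 1, 1]
top = rank_genes(K, C, mean_gap, 2)
```

Any of the measures discussed in this chapter could be passed in place of `mean_gap`, provided it takes one gene's values and the class labels and returns a number where larger means more relevant.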
1. Mutual Information: The mutual information (MI) of two random variables is a measure
of the mutual dependence between them; here, the two variables are features.
• $f_i$ and $f_s$ are individual features.

• The average normalized MI is used as a measure of redundancy between the $i$-th feature and the
subset of selected features $S = \{f_s\}$.
• $|S|$ is the cardinality of the set $S$.
To select the best features, MI is calculated using the equation
given below:
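
As a minimal sketch, the plain (unnormalized) mutual information between two discretized variables can be computed from entropies using the identity I(X;Y) = H(X) + H(Y) - H(X,Y). This is the textbook definition and not necessarily the normalized variant used in the original equation; the function names are illustrative, and continuous expression values are assumed to have been discretized beforehand.

```python
import math
from collections import Counter


def entropy(xs):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two equally long discrete sequences."""
    joint = list(zip(xs, ys))
    return entropy(xs) + entropy(ys) - entropy(joint)


labels = [0, 0, 1, 1]
mi_informative = mutual_information([0, 0, 1, 1], labels)    # feature mirrors the class
mi_uninformative = mutual_information([0, 1, 0, 1], labels)  # feature independent of class
```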

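2. Symmetric Uncertainty:
This is a normalized form of mutual information, introduced by Witten and
Frank (2005). It is defined as:
$SU(f_i, C) = \frac{2\, I(f_i; C)}{H(f_i) + H(C)}$
where $I(f_i; C)$ is the mutual information between feature $f_i$ and the class $C$, and $H(\cdot)$ denotes entropy.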
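
A small Python sketch of symmetric uncertainty for discretized values follows. It assumes the usual form, twice the mutual information divided by the sum of the two entropies, and returns 0 when both entropies are zero; the function names are illustrative.

```python
import math
from collections import Counter


def entropy(xs):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())


def symmetric_uncertainty(xs, ys):
    """SU = 2 * I(X;Y) / (H(X) + H(Y)), which lies between 0 and 1."""
    hx, hy = entropy(xs), entropy(ys)
    if hx + hy == 0:
        return 0.0  # both variables are constant (assumed convention)
    mi = hx + hy - entropy(list(zip(xs, ys)))
    return 2.0 * mi / (hx + hy)


labels = [0, 0, 1, 1]
su_informative = symmetric_uncertainty([0, 0, 1, 1], labels)
su_uninformative = symmetric_uncertainty([0, 1, 0, 1], labels)
```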
3. Information Gain:
• The information gain measure is based on the concept of entropy.
• It is commonly used as a measure of feature relevance in filter strategies that evaluate
features individually.
• Information gain (IG) measures the amount of information, in bits, about the class
prediction when the only information available is the presence of a feature and the
corresponding class distribution.
The entropy of a dataset $D$ is:
$H(D) = -\sum_{d} P_d \log_2 P_d$
where $P_d$ is the marginal probability of the $d$-th class.
Here, the dataset $D$ is partitioned with respect to a feature $f$ into parts $D_k^f$.
The information gain with respect to feature $f$ is given below:
$IG(f) = H(D) - \sum_{k} \frac{|D_k^f|}{|D|} H(D_k^f)$
where $|D_k^f|$ and $|D|$ represent the number of objects
present in $D_k^f$ and $D$ respectively.
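
The computation can be sketched in Python as follows, assuming a discretized feature so that each distinct feature value defines one part of the dataset. The function names are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Class entropy H(D) in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """IG(f) = H(D) - sum_k |D_k|/|D| * H(D_k), one part D_k per feature value."""
    n = len(labels)
    parts = {}
    for value, label in zip(feature, labels):
        parts.setdefault(value, []).append(label)
    remainder = sum(len(p) / n * entropy(p) for p in parts.values())
    return entropy(labels) - remainder


labels = [0, 0, 1, 1]
ig_informative = information_gain([0, 0, 1, 1], labels)
ig_uninformative = information_gain([0, 1, 0, 1], labels)
```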

4. Chi-square test:
• Chi-square is a statistical test commonly used to compare observed data with the data we
would expect to obtain according to a specific hypothesis.
The chi-square test always tests what scientists call the null hypothesis:
that there is no significant difference between the expected and observed results.
The formula for calculating chi-square ($\chi^2$) is:
$\chi^2 = \sum \frac{(O - E)^2}{E}$
where $O$ is an observed count and $E$ is the corresponding expected count.
Here, chi-square is calculated for every feature variable against the target variable
to observe whether a relationship exists between the feature variable and the
target variable.

• In feature selection, the two events are the occurrence of the term (here, the feature value) and the occurrence of the
class.
• $\chi^2$ is a measure of how much the expected counts and observed counts diverge from each
other.
• A high value of $\chi^2$ indicates that the hypothesis of independence, which implies that
expected and observed counts are similar, is incorrect.
• If the two events are dependent, then the occurrence of the term makes the occurrence of
the class more likely (or less likely), so it should be helpful as a feature.
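
A minimal Python sketch of the chi-square statistic between one discretized feature and the class labels is given below. It builds the contingency table of observed counts, derives the expected counts under independence, and sums (O - E)^2 / E over all cells. The function name is illustrative, and no p-value is computed.

```python
from collections import Counter


def chi_square(feature, labels):
    """Chi-square statistic of the feature-by-class contingency table."""
    n = len(labels)
    observed = Counter(zip(feature, labels))  # missing cells count as 0
    feature_totals = Counter(feature)
    class_totals = Counter(labels)
    stat = 0.0
    for f_value, f_count in feature_totals.items():
        for c_value, c_count in class_totals.items():
            expected = f_count * c_count / n  # count expected under independence
            stat += (observed[(f_value, c_value)] - expected) ** 2 / expected
    return stat


labels = [0, 0, 1, 1]
chi_dependent = chi_square([0, 0, 1, 1], labels)
chi_independent = chi_square([0, 1, 0, 1], labels)
```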

5. Gini Index
• It is a univariate and supervised feature weighting method.
• It is a measure for quantifying a feature's ability to distinguish between classes.
• Given $c$ classes, the GI of a feature $f$ can be calculated as:
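
Since the original equation is not reproduced here, the following Python sketch assumes a commonly used form: for each value of a discretized feature, the impurity 1 - sum over classes of p(class | value)^2 is computed, and the impurities are averaged with weights equal to the frequency of each value. Under this assumed form, a lower value indicates a feature that separates the classes better. The function name is illustrative.

```python
from collections import Counter


def gini_index(feature, labels):
    """Frequency-weighted Gini impurity of the class labels within each feature value."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    total = 0.0
    for members in groups.values():
        counts = Counter(members)
        impurity = 1.0 - sum((k / len(members)) ** 2 for k in counts.values())
        total += len(members) / n * impurity
    return total


labels = [0, 0, 1, 1]
gi_pure = gini_index([0, 0, 1, 1], labels)   # each feature value maps to one class
gi_mixed = gini_index([0, 1, 0, 1], labels)  # each feature value mixes both classes
```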

6. Relief
• Relief is a filter-method approach to feature selection that is notably sensitive to feature interactions.
• Relief calculates a score for each feature, which can then be used to rank features and
select the top-scoring ones.
• Relief feature scoring is based on the identification of feature value differences between
nearest-neighbor instance pairs.
• If a feature value difference is observed in a neighboring instance pair with the same class
(a 'hit'), the feature score decreases.
• If a feature value difference is observed in a neighboring instance pair with different class
values (a 'miss'), the feature score increases.
Relief finds the nearest instance from the same class (the hit) and the nearest instance from a
different class (the miss), and $W(f_t)$ is decreased or increased accordingly (Equation 11).
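
The hit and miss update can be sketched in Python as below. This is a simplified, deterministic variant that visits every instance once rather than sampling instances at random, normalizes each feature difference by the feature's range, and uses the Manhattan distance to find neighbors. These choices, and the names `relief`, `diff` and `dist`, are assumptions made for illustration and may differ from the exact update in Equation 11.

```python
def relief(X, y):
    """Return one Relief-style weight per feature (needs >= 2 samples per class)."""
    n, m = len(X), len(X[0])
    # Range of each feature, used to scale differences into [0, 1].
    spans = []
    for j in range(m):
        col = [row[j] for row in X]
        spans.append((max(col) - min(col)) or 1.0)

    def diff(j, a, b):
        return abs(a[j] - b[j]) / spans[j]

    def dist(a, b):
        return sum(diff(j, a, b) for j in range(m))

    weights = [0.0] * m
    for i in range(n):
        same = [k for k in range(n) if k != i and y[k] == y[i]]
        other = [k for k in range(n) if y[k] != y[i]]
        hit = min(same, key=lambda k: dist(X[i], X[k]))    # nearest same-class instance
        miss = min(other, key=lambda k: dist(X[i], X[k]))  # nearest other-class instance
        for j in range(m):
            # A difference to the miss raises the weight, a difference to the hit lowers it.
            weights[j] += (diff(j, X[i], X[miss]) - diff(j, X[i], X[hit])) / n
    return weights


# Toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [1.0, 0.2], [0.9, 0.8]]
y = [0, 0, 1, 1]
w = relief(X, y)
```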

7. Fisher Score:
• It is a supervised and univariate feature weighting method.
• It selects features that take similar values for samples from the same class
and different values for samples from different classes.
The Fisher score can be expressed as:
$F(f_i) = \frac{\sum_{c} n_c \left(\mu_{f_i}^{c} - \mu_{f_i}\right)^2}{\sum_{c} n_c\, \rho_{f_i}^{c}}$
• where $\mu_{f_i}^{c}$ and $\rho_{f_i}^{c}$ are the mean and variance of the $i$-th feature in class $c$,
• $\mu_{f_i}$ is the mean of the $i$-th feature, and
• $n_c$ is the number of samples of class $c$.
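
A minimal Python sketch of the Fisher score for a single feature follows. It assumes the common form in which the numerator sums the class-size-weighted squared distances between class means and the overall mean, and the denominator sums the class-size-weighted class variances (population variance is used here as an assumption). The function name is illustrative.

```python
def fisher_score(values, labels):
    """Between-class scatter divided by within-class scatter for one feature."""
    overall_mean = sum(values) / len(values)
    between = 0.0
    within = 0.0
    for c in set(labels):
        group = [v for v, lab in zip(values, labels) if lab == c]
        n_c = len(group)
        mean_c = sum(group) / n_c
        var_c = sum((v - mean_c) ** 2 for v in group) / n_c  # population variance
        between += n_c * (mean_c - overall_mean) ** 2
        within += n_c * var_c
    # A feature that is constant within every class separates perfectly.
    return between / within if within > 0 else float("inf")


score = fisher_score([1.0, 3.0, 7.0, 9.0], [0, 0, 1, 1])
```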

8. T-test:
• It measures the relationship between two samples statistically by comparing their mean
values.
• It calculates the ratio between the difference of the two class means and the variability of the two classes:
$t(f_t) = \frac{\bar{f}_{t1} - \bar{f}_{t2}}{\sqrt{\frac{S_{t1}^2}{n_1} + \frac{S_{t2}^2}{n_2}}}$    (13)
$\bar{f}_{t1}$ is the mean of the sample values of feature $f_t$ for class 1.
$\bar{f}_{t2}$ is the mean of the sample values of feature $f_t$ for class 2.
$S_{t1}$ and $S_{t2}$ are the standard deviations of the sample values of feature $f_t$ for class 1
and class 2 respectively.
$n_1$ represents the number of samples of class 1 and $n_2$ represents the number of
samples of class 2.
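
The t-test score for one feature can be sketched in Python as below. The sketch assumes the unequal-variance form, with the difference of class means divided by the square root of the sum of each class variance over its class size, and uses the sample variance with n - 1 in the denominator. The function name and argument order are illustrative.

```python
import math


def t_score(values, labels, class1, class2):
    """Two-sample t statistic of one feature between class1 and class2."""
    a = [v for v, c in zip(values, labels) if c == class1]
    b = [v for v, c in zip(values, labels) if c == class2]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    # Sample variances (each class needs at least two samples).
    var_a = sum((v - mean_a) ** 2 for v in a) / (len(a) - 1)
    var_b = sum((v - mean_b) ** 2 for v in b) / (len(b) - 1)
    return (mean_a - mean_b) / math.sqrt(var_a / len(a) + var_b / len(b))


t = t_score([1.0, 3.0, 7.0, 9.0], [0, 0, 1, 1], 0, 1)
```

For ranking genes, the absolute value of the statistic is typically used, since a large negative value is as informative as a large positive one.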

5.2 Working Principle:
Algorithm:
Filter
1. Choose k, the number of genes to be selected by each filter.
2. For each filter FTi (i = 1, 2, 3):
a) Calculate the statistical scores for all genes and rank the scores from the highest
to the lowest.
b) Select the k genes with the top-ranking scores in each list.
3. Take the union of the lists of genes obtained by FTi (i = 1, 2, 3) to produce a set of p features.
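
The steps above can be sketched in Python as follows. The two toy filters `mean_gap` and `neg_spread` stand in for the real score functions and are assumptions made only so the example runs; the helper names and the toy data are likewise illustrative.

```python
def top_k_genes(K, C, score, k):
    """Step 2: score all genes with one filter and keep the k best column indices."""
    n_genes = len(K[0])
    ranked = sorted(range(n_genes),
                    key=lambda j: score([row[j] for row in K], C),
                    reverse=True)
    return ranked[:k]


def union_of_filters(K, C, filters, k):
    """Step 3: union of the top-k lists produced by every filter."""
    selected = set()
    for score in filters:
        selected.update(top_k_genes(K, C, score, k))
    return sorted(selected)


def split_by_class(values, labels):
    a = [v for v, c in zip(values, labels) if c == 0]
    b = [v for v, c in zip(values, labels) if c == 1]
    return a, b


def mean_gap(values, labels):
    """Toy filter 1 (assumption): distance between the two class means."""
    a, b = split_by_class(values, labels)
    return abs(sum(a) / len(a) - sum(b) / len(b))


def neg_spread(values, labels):
    """Toy filter 2 (assumption): prefers genes with a small within-class range."""
    a, b = split_by_class(values, labels)
    return -((max(a) - min(a)) + (max(b) - min(b)))


K = [[1.0, 5.0, 0.2],
     [1.1, 5.2, 0.9],
     [1.0, 9.0, 0.3],
     [0.9, 9.1, 0.8]]
C = [0, 0, 1, 1]
selected = union_of_filters(K, C, [mean_gap, neg_spread], k=1)  # step 1: k = 1
```

Because the filters may agree on some genes, the resulting set contains at most k times the number of filters, which is the p mentioned in step 3.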

Flowchart of filter-based feature selection

Chapter 6. Result Analysis
The score functions (measures) are applied to different microarray gene expression
datasets. The best 100 genes are selected by each measure, and the
classification accuracy of the samples is checked using those genes for every measure. A
KNN classifier is used to check classification accuracy with the leave-one-out cross-
validation method.
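
The evaluation protocol can be sketched in Python as follows. The number of neighbors k and the Euclidean distance are assumptions, since the slides do not state them, and the function names and toy data are illustrative.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k):
    """Majority vote among the k training samples closest to the query."""
    order = sorted(range(len(train_X)), key=lambda j: math.dist(train_X[j], query))
    votes = Counter(train_y[j] for j in order[:k])
    return votes.most_common(1)[0][0]


def loocv_accuracy(X, y, k=3):
    """Leave-one-out accuracy (%): each sample is predicted from all the others."""
    correct = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        if knn_predict(rest_X, rest_y, X[i], k) == y[i]:
            correct += 1
    return 100.0 * correct / len(X)


# Toy data with one selected gene; the two classes are well separated.
X = [[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]]
y = [0, 0, 0, 1, 1, 1]
accuracy = loocv_accuracy(X, y, k=3)
```

In the setting described above, each row of X would hold the expression values of the 100 genes selected by one measure.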
Table 1. Dataset Description

Table 2. Performance of different measures on the breast cancer dataset using KNN
Measure                  Classification accuracy (%)
Fisher Score             81.6
T-test                   81.6
Chi-square               83.7
Symmetric Uncertainty    85.7
Information Gain         89.8
Gini Index               85.7
Mutual Information       89.8
Relief                   89.8

Table 3. Performance of different measures on the colon cancer dataset using KNN
Measure                  Classification accuracy (%)
Fisher Score             72.6
T-test                   72.6
Chi-square               83.9
Symmetric Uncertainty    83.9
Information Gain         90.3
Gini Index               85.5
Mutual Information       90.3
Relief                   85.5

Chapter 7. Conclusion and Future Scope:
This project focuses on the filter approach to selecting the most relevant features,
based on a study of existing univariate scoring functions.
By analyzing the outcomes obtained with these scoring functions, a
comparative study is established.
It also helps us to analyze the methodology for selecting the more relevant features and
removing irrelevant features.
In future, this work can be extended to the multivariate scheme by using filter-
based scoring functions.

