Classification using pattern probability estimators
2010 IEEE International Symposium on Information Theory, 2010 • ieeexplore.ieee.org
We consider the problem of classification, where the data of each class are generated i.i.d. according to an unknown probability distribution. The goal is to classify test data with minimum error probability, based on the training data available for the classes. The Likelihood Ratio Test (LRT) is the optimal decision rule when the distributions are known. Hence, a popular approach to classification is to estimate the likelihoods using well-known probability estimators, e.g., the Laplace and Good-Turing estimators, and use them in an LRT. We are primarily interested in situations where the alphabet of the underlying distributions is large compared to the available training data, which is indeed the case in most practical applications. We motivate and propose LRTs based on pattern probability estimators that are known to achieve low redundancy for universal compression of large-alphabet sources. While a complete proof of optimality for these decision rules is still needed, we demonstrate their performance and compare it with that of other well-known classifiers through various experiments on synthetic data and on real data for text classification.
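To make the baseline approach in the abstract concrete, the following is a minimal sketch of a plug-in LRT that uses the Laplace (add-one) estimator for two classes. It illustrates only the "estimate the likelihoods, then apply an LRT" step and is not the pattern-based estimator the paper proposes. The function names, the class labels "A" and "B", the known alphabet size, and the zero log-threshold (equal priors) are all assumptions made for illustration.

```python
import math
from collections import Counter


def laplace_log_likelihood(test, train, alphabet_size, alpha=1.0):
    """Log-probability of `test` under an add-alpha estimate fit on `train`.

    Assumes i.i.d. symbols and a known alphabet size (hypothetical setup).
    """
    counts = Counter(train)
    total = len(train) + alpha * alphabet_size
    # Unseen symbols get count 0, so their estimate is alpha / total.
    return sum(math.log((counts[s] + alpha) / total) for s in test)


def lrt_classify(test, train_a, train_b, alphabet_size, threshold=0.0):
    """Plug-in likelihood ratio test between two classes.

    Returns "A" if the estimated log-likelihood ratio is at least
    `threshold`, else "B". A threshold of 0 assumes equal priors.
    """
    llr = (laplace_log_likelihood(test, train_a, alphabet_size)
           - laplace_log_likelihood(test, train_b, alphabet_size))
    return "A" if llr >= threshold else "B"


# Toy usage over the alphabet {a, b, c}; "c" never appears in training.
label_1 = lrt_classify("aab", "aaab", "bbba", alphabet_size=3)
label_2 = lrt_classify("bbc", "aaab", "bbba", alphabet_size=3)
```

In this toy setup the unseen symbol "c" receives the same smoothed probability under both classes, so it does not affect the decision. When the alphabet is large relative to the training data, most test symbols may be unseen in this way, which is the regime the abstract identifies as its main interest.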