
NLP | Classifier-based tagging

Last Updated : 16 Dec, 2019
ClassifierBasedPOSTagger class:
  • It is a subclass of ClassifierBasedTagger that uses a classification technique to do part-of-speech tagging.
  • Features are extracted from each word and then passed to an internal classifier.
  • The classifier takes those features and returns a label, i.e. a part-of-speech tag.
  • The feature detector finds suffixes of several lengths, does some regular expression matching, and looks at the unigram, bigram, and trigram history to produce a fairly complete set of features for each word.
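To make the last point concrete, here is a minimal sketch of what such a feature detector could look like. It is a simplified, hypothetical function (the name sketch_feature_detector, the feature names, and the regular expressions are all assumptions), not NLTK's actual implementation. It only follows the (tokens, index, history) call convention that ClassifierBasedTagger uses for feature detectors.

```python
import re


def sketch_feature_detector(tokens, index, history):
    """Hypothetical, simplified feature detector (not NLTK's own code)."""
    word = tokens[index]

    # History: tags already assigned to the previous one and two words.
    prev_tag = history[index - 1] if index > 0 else None
    prev_prev_tag = history[index - 2] if index > 1 else None
    prev_word = tokens[index - 1].lower() if index > 0 else None

    # Coarse word shape found by regular expression matching.
    if re.fullmatch(r"[0-9]+(\.[0-9]*)?", word):
        shape = "number"
    elif re.fullmatch(r"\W+", word):
        shape = "punct"
    elif re.fullmatch(r"[A-Z][a-z]+", word):
        shape = "upcase"
    else:
        shape = "other"

    return {
        "word": word.lower(),
        "suffix1": word[-1:].lower(),   # suffixes of length 1, 2 and 3
        "suffix2": word[-2:].lower(),
        "suffix3": word[-3:].lower(),
        "prev_word": prev_word,
        "prev_tag": prev_tag,
        "prev_prev_tag": prev_prev_tag,
        "shape": shape,
    }


# Features for "barked", given that "The" and "dogs" were already tagged.
features = sketch_feature_detector(["The", "dogs", "barked"], 2, ["DT", "NNS"])
print(features)
```

The returned dictionary is the feature set that would be handed to the classifier for one word.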
Code #1 : Using ClassifierBasedPOSTagger
from nltk.tag.sequential import ClassifierBasedPOSTagger
from nltk.corpus import treebank

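# initializing training and testing set
# (the treebank sample must be installed, e.g. via nltk.download('treebank'))
train_data = treebank.tagged_sents()[:3000]
test_data = treebank.tagged_sents()[3000:]

# train the tagger on the first 3000 tagged sentences
tagger = ClassifierBasedPOSTagger(train=train_data)

# evaluate() returns the tagging accuracy on held-out sentences;
# recent NLTK releases also provide accuracy() for the same purpose
a = tagger.evaluate(test_data)

print("Accuracy : ", a)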
Output :
Accuracy : 0.9309734513274336
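The extract-features, classify, return-a-tag cycle described earlier can be sketched in plain Python. This is only an illustration of the idea under stated assumptions: tag_sentence, toy_detector, and toy_classify are hypothetical names, and the rule-based toy_classify stands in for a trained classifier. It is not how NLTK implements the tagger internally.

```python
def tag_sentence(tokens, feature_detector, classify):
    """Tag tokens left to right; each prediction can see earlier tags."""
    history = []
    for index in range(len(tokens)):
        featureset = feature_detector(tokens, index, history)
        history.append(classify(featureset))
    return list(zip(tokens, history))


# Toy stand-ins (hypothetical): a two-feature detector and a rule "classifier".
def toy_detector(tokens, index, history):
    return {
        "word": tokens[index].lower(),
        "prev_tag": history[-1] if history else None,
    }


def toy_classify(featureset):
    if featureset["word"] == "the":
        return "DT"
    if featureset["prev_tag"] == "DT":
        return "NN"
    return "VB"


tagged = tag_sentence(["The", "dog", "barks"], toy_detector, toy_classify)
print(tagged)  # [('The', 'DT'), ('dog', 'NN'), ('barks', 'VB')]
```

Because the tags chosen so far are passed back in as history, the label for one word can depend on the labels of the words before it.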
The ClassifierBasedPOSTagger class inherits from ClassifierBasedTagger and only implements a feature_detector() method. All the training and tagging is done in ClassifierBasedTagger. By default the classifier is built with NaiveBayesClassifier.train; the classifier_builder keyword argument lets you pass a different training function.
Code #2 : Using MaxentClassifier
from nltk.tag.sequential import ClassifierBasedPOSTagger
from nltk.classify import MaxentClassifier
from nltk.corpus import treebank

# initializing training and testing set
train_data = treebank.tagged_sents()[:3000]
test_data = treebank.tagged_sents()[3000:]

# train with a maximum entropy classifier instead of the default
tagger = ClassifierBasedPOSTagger(
        train=train_data, classifier_builder=MaxentClassifier.train)

a = tagger.evaluate(test_data)

print("Accuracy : ", a)
Output :
Accuracy : 0.9258363911072739
Custom feature detector: there are two ways to use your own feature detection:
  1. Subclass ClassifierBasedTagger and implement a feature_detector() method.
  2. Pass a function as the feature_detector keyword argument into ClassifierBasedTagger at initialization.
Code #3 : Custom Feature Detector
from nltk.tag.sequential import ClassifierBasedTagger
# tag_util is a local helper module, not part of NLTK
from tag_util import unigram_feature_detector
from nltk.corpus import treebank

# initializing training and testing set
train_data = treebank.tagged_sents()[:3000]
test_data = treebank.tagged_sents()[3000:]

# pass the custom function as the feature_detector keyword argument
tagger = ClassifierBasedTagger(
        train=train_data,
        feature_detector=unigram_feature_detector)

a = tagger.evaluate(test_data)

print("Accuracy : ", a)
Output :
Accuracy : 0.8733865745737104
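The tag_util module imported in Code #3 is not shipped with NLTK, and its contents are not shown here. As an assumption, a unigram feature detector can be as small as the sketch below: it follows the (tokens, index, history) convention and uses only the current word, which would be consistent with the lower accuracy compared with the richer default features.

```python
def unigram_feature_detector(tokens, index, history):
    """Assumed minimal detector: the current word is the only feature."""
    # The history of earlier tags is accepted but deliberately ignored.
    return {"word": tokens[index]}


example = unigram_feature_detector(["The", "dog", "barks"], 1, ["DT"])
print(example)  # {'word': 'dog'}
```

Saving a function like this in a local tag_util.py file would let the import in Code #3 resolve.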
