Text data augmentation techniques
Text data augmentation encompasses a wide range of techniques, from simple word-level manipulations to more complex semantic transformations.
Synonym replacement
This technique involves replacing words in the original text with their synonyms. We can use WordNet, a lexical database for the English language, to find synonyms:
import random

def synonym_replacement(text, n=1):
    words = text.split()
    new_words = words.copy()
    # Candidate words: unique alphanumeric tokens, visited in random order
    random_word_list = list(
        set([word for word in words if word.isalnum()])
    )
    random.shuffle(random_word_list)
    num_replaced = 0
    for random_word in random_word_list:
        # get_synonyms is a helper defined outside this snippet
        synonyms = get_synonyms(random_word)
        if len(synonyms) >= 1:
            ...
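The listing above is cut off and depends on a get_synonyms helper that is not shown here. To illustrate the full flow, here is a minimal self-contained sketch, not the original implementation. It assumes a small hand-written table (TOY_SYNONYMS, a hypothetical stand-in) instead of a real WordNet lookup, and the names synonym_replacement_sketch and rng are likewise illustrative. With NLTK installed and the WordNet corpus downloaded, get_synonyms could instead collect lemma names from nltk.corpus.wordnet.synsets(word).

```python
import random

# Hypothetical stand-in for a WordNet lookup (assumption, not real WordNet data).
TOY_SYNONYMS = {
    "quick": ["fast", "speedy"],
    "happy": ["glad", "joyful"],
    "big": ["large", "huge"],
}


def get_synonyms(word):
    """Return candidate synonyms for `word`, excluding the word itself."""
    key = word.lower()
    return [s for s in TOY_SYNONYMS.get(key, []) if s != key]


def synonym_replacement_sketch(text, n=1, rng=None):
    """Replace up to `n` distinct words in `text` with a random synonym."""
    rng = rng or random.Random()
    words = text.split()
    new_words = words.copy()
    # Unique alphanumeric tokens, sorted first so a seeded rng is reproducible.
    candidates = sorted({w for w in words if w.isalnum()})
    rng.shuffle(candidates)
    num_replaced = 0
    for word in candidates:
        synonyms = get_synonyms(word)
        if synonyms:
            synonym = rng.choice(synonyms)
            # Replace every occurrence of the chosen word.
            new_words = [synonym if w == word else w for w in new_words]
            num_replaced += 1
        if num_replaced >= n:
            break
    return " ".join(new_words)


if __name__ == "__main__":
    print(synonym_replacement_sketch("the quick dog is happy", n=1))
```

Words without an entry in the table are left untouched, so the sentence length never changes. Passing a seeded random.Random instance as rng makes the augmentation reproducible, which can help when comparing training runs.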