
Showing posts with the label Names Matching

2022-01-19: Leveraging Google Translate for matching Arabic names written in English

Introduction: There is a large body of research papers and tools for Named-Entity Recognition (NER); however, only a small portion of it addresses Arabic text, and there are even fewer tools for extracting named entities from documents written in Arabic. In August 2020, I proposed an approach for extracting named entities from Arabic text that combines two tools, Google Translate and Stanford NERC, and produced results comparable to those of the Arabic Linguistic Pipeline (ALP). The implementation of my approach, GTS, is available on GitHub. In December 2020, I wrote a blog post outlining tools and libraries for matching Arabic names written in English, which is important for Entity Linking, a subtask of Natural Language Processing (NLP). While discussing the importance of Entity Linking is beyond the scope of this post, merging Arabic named entities written in English is the first step toward Entity Linking when processing English documents. This is because discrepancie...
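The excerpt stops before describing how the Google Translate based matching works, so the sketch below does not reproduce it. It only illustrates the general idea of "merging" English spellings of the same Arabic name, swapping in plain spelling similarity from Python's standard `difflib` module. The function names `similarity` and `merge_name_variants`, the 0.8 threshold, and the sample names are hypothetical choices for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Spelling similarity in [0, 1] using difflib's ratio (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def merge_name_variants(names: list[str], threshold: float = 0.8) -> list[list[str]]:
    """Greedily group names: each name joins the first group whose first
    member (the group's representative) is at least `threshold` similar;
    otherwise it starts a new group. The threshold is an assumption."""
    groups: list[list[str]] = []
    for name in names:
        for group in groups:
            if similarity(name, group[0]) >= threshold:
                group.append(name)
                break
        else:  # no existing group was similar enough
            groups.append([name])
    return groups


if __name__ == "__main__":
    names = ["Mohammed", "Mohamed", "Muhammad", "Khalid", "Khaled"]
    print(merge_name_variants(names))
```

With these inputs the groups come out as `[['Mohammed', 'Mohamed'], ['Muhammad'], ['Khalid', 'Khaled']]`: "Muhammad" scores only 0.75 against "Mohammed", so spelling similarity alone leaves a valid variant unmerged. This kind of miss is what makes the matching problem hard.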

2020-12-29: Tools and libraries for matching Arabic names written in English

Introduction: While working on my research, I needed a way to scan a set of Arabic and non-Arabic names written in English and find matches. This is especially difficult because you cannot count on the spelling of a name being consistent and distinct when it is written in a foreign language. Discrepancies between spellings of the same name may be due to the lack of name-spelling standards, typos, translations, illiteracy, personal preferences, cultural differences, or all of the above. In this post, I discuss different approaches to solving this problem using string-matching and/or phonetic algorithms. String-matching algorithms compare two strings based on how they are spelled, while phonetic algorithms compare them based on how they sound. Real-world applications of name matching include Information Retrieval, Entity Recognition and Extraction, Natural Language Processing, Machine Translation...
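To make the spelling-versus-sound distinction concrete, here is a minimal sketch of one representative from each family: Levenshtein edit distance (string matching) and American Soundex (phonetic). These two classic algorithms are my choice for illustration. The excerpt does not say which tools or libraries the post covers, and the example names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn a into b (case-insensitive)."""
    a, b = a.lower(), b.lower()
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


# Soundex digit for each consonant group; vowels, y, h and w get no digit.
_SOUNDEX_CODES = {
    ch: digit
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6"))
    for ch in letters
}


def soundex(name: str) -> str:
    """American Soundex: first letter kept, then up to three digits."""
    letters = [c for c in name.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = _SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _SOUNDEX_CODES.get(ch, "")
        if digit and digit != prev:  # skip repeats of the same digit
            code += digit
        if ch not in "hw":  # h and w do not separate identical digits
            prev = digit
    return (code + "000")[:4]  # pad or truncate to four characters


if __name__ == "__main__":
    for a, b in [("Mohammed", "Muhammad"), ("Mohamed", "Mohammed"), ("Omar", "Umar")]:
        print(f"{a} vs {b}: edit distance={levenshtein(a, b)}, "
              f"soundex={soundex(a)}/{soundex(b)}")
```

Running it shows that "Mohammed" and "Muhammad" are two edits apart but share the Soundex code M530. "Omar" and "Umar" are only one edit apart yet get different codes (O560 and U560), because Soundex keeps the first letter unchanged. Neither family handles every variant alone, which is why the two are often combined.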