UAE researchers develop unique Arabic thesaurus

Abu Dhabi - The interface provides a novel way to search Arabic words, their related forms and English equivalents.

(Supplied image)

By Ashwani Kumar

Published: Sat 23 Jan 2021, 7:15 PM

Last updated: Sat 23 Jan 2021, 7:21 PM

Two researchers from New York University Abu Dhabi (NYUAD) have developed a first-of-its-kind Arabic thesaurus. The new tool provides a novel way to search Arabic words, their related forms and their English equivalents, said Muhamed Al Khalil, associate professor of practice of Arabic language.

Al Khalil co-developed the interface with professor of computer science Nizar Habash.

“A user can search using any conjugated form of a word, and the tool will identify all the Arabic roots and dictionary entries associated with that word based on its different vowelisations, known as ‘harakaat’ in Arabic. It will also link to related words: words sharing the same root, synonyms, antonyms and so on.

“It will provide the readability level, which ranks how difficult or easy the word is and indicates the language proficiency a reader needs to understand it. A user can search in Arabic or English, but the results and relationships are primarily focused on Arabic,” Al Khalil said.

Habash said the tool rates readability on a five-level scale. “For example, three Arabic words for beauty come from three different levels: ‘jamaal’ (level 1), ‘husn’ (level 3) and ‘sabaaha’ (level 5). By comparison, English has ‘beauty’ (lower level, easy readability) and ‘pulchritude’ (higher level, difficult readability).”

The NYUAD-funded project Simplification of Arabic Masterpieces for Extensive Reading (SAMER) laid the research groundwork for the development of the tool.

Habash underlined that Arabic is challenging and complex for many reasons.

“Words have many different conjugations. For example, a verb in Arabic can have more than 5,000 forms, compared with five or six forms in English and one in Chinese,” he said.

He added that Arabic has a very ambiguous spelling system and many dialectal variants.

The tool can be used by teachers and learners, and to improve Arabic writing in various sectors, from art and science writing to government and media. Learners can use it as a reference for simplifying Arabic fiction, which may contain archaic words or complex, multi-layered meanings and connotations.

“Students and teachers can use the thesaurus to learn new words through simpler forms and to determine whether a word is used appropriately in a particular context. An aspiring poet or novelist can use it to identify words that help them improve their writing and style. Most Arabic thesauri do not provide readability information. This is the gap we fill. To make this resource particularly useful to teachers and educators, we have also developed a simplification interface built on our lexicon and its readability levels,” Al Khalil noted.

The tool demonstrates how basic enabling technologies for Arabic can help users.

Habash added: “The specific readability lexicon aspect is interesting for automatic AI text simplification, an area we look forward to exploring in the future.”
