Abu Dhabi: This Indian AI graduate developed a smart solution to detect hateful memes on social media

Gokul Karthik Kumar says his experience at the Mohamed Bin Zayed University of Artificial Intelligence prepared him well for a career in AI research and development

Supplied photo

Detecting hate speech in memes, creating an AI-human interactive dubbing platform, and doing research on deep learning-based text-to-speech systems in 13 Indian languages are some of the breakthrough projects of a graduate of the Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI).

Ahead of commencement exercises on Sunday, Gokul Karthik Kumar, a master’s student in computer vision, spoke to Khaleej Times about his journey at the pioneering institution.


The university, he said, gave him the freedom to explore several areas of artificial intelligence (AI), including his passion for natural language processing.

“While my two-year major is in computer vision, my supervisor has supported me in pursuing projects in other domains like natural language processing and speech processing, which has been immensely fulfilling and helped me identify the areas that I’m currently passionate about.”


A native of the Indian state of Tamil Nadu, Kumar underlined that his experience at MBZUAI has been “transformative” and has prepared him well for a career in AI research and development.

“I have learned from some of the most knowledgeable professors in artificial intelligence. I have also enhanced my research skills from problem identification to research proposal to presentation.”

Unique AI solutions

Kumar has an extensive background in machine learning across text, image, speech, and time-series data, having worked with top technology and research organisations such as Microsoft Research India, TCS Research, MBZUAI, and IIT Madras. He has won numerous hackathons, including the IEEE SLT 2022 international hackathon in Qatar and eight national-level hackathons in the UAE and India, and holds a US patent.

He has co-authored papers published at major venues, including the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2023), an Association for Computational Linguistics (ACL 2022) workshop, an Empirical Methods in Natural Language Processing (EMNLP 2022) workshop, and the International Joint Conference on Neural Networks (IJCNN).

Following the commencement ceremony, he will travel to Greece to attend ICASSP 2023, where he will present his research paper titled “Towards Building Text-To-Speech Systems for the Next Billion Users”.

“This project was initiated during my summer internship at Microsoft Research India, where I collaborated with my co-author, Praveen from IIT Madras. Our work involved a systematic evaluation of design choices for text-to-speech systems, leading to the release of state-of-the-art models for 13 Indian languages. Most open-source text-to-speech is available in English but extending it to local languages could reach masses, especially people who don’t know how to read.”

Hateful memes, Autodub

Kumar’s thesis research explores efficient representation methods for multilingual and multimodal data. His work addresses tasks such as question answering, hateful meme classification, text-to-speech, and text-image retrieval. With online bullying increasingly prevalent on social media, his research is particularly timely.

Hateful memes, which encompass hate speech targeted at individuals on social media, pose a concerning challenge. While various techniques exist for classifying such memes, Kumar has devised a straightforward approach that effectively combines image and textual features to predict the probability of hatefulness. This could empower social media platforms to make informed decisions about what should and should not be published.

He has also been involved in the joint development of the award-winning Autodub, a human-in-the-loop AI dubbing platform that aims to remove language barriers in educational video content and extend remote, online learning to all corners of the globe. Autodub integrates transcription, translation, voiceover, and background audio separation to produce accurate dubbed versions and improve accessibility. Many educational videos are available primarily in English, which can be a barrier for non-native English speakers. Autodub offers a viable solution to this challenge.

“What truly excites me about my future career is the opportunity to make a tangible impact. If I can develop something that enhances processes and, consequently, positively influences a significant number of individuals, it would be truly remarkable. Only a few fields or technologies have the power to create something that instantly captures widespread attention and sparks conversations across various communities.”

Joining G42 as scientist

An ardent follower of the IPL cricket team Chennai Super Kings, he recalled his favourite UAE memory: seeing his team clinch the season title in Dubai in 2021, which coincided with the start of his master’s journey. Adding to the excitement, his team won again just days before his commencement ceremony.

His next challenge is to help develop large language models for the UAE – a country that has identified leveraging AI for good as a key priority.

“I will be joining G42’s Inception Institute of Artificial Intelligence (IIAI) as an applied scientist, where my focus will be working collaboratively in a team to develop large language models tailored to UAE-focused applications.”

Kumar is the first in his family to earn a master’s degree. He also holds a bachelor’s degree in information technology from Anna University, Chennai. He is one of 59 computer vision, machine learning, and natural language processing students graduating as part of the Class of 2023.
