MBZUAI’s Global Google PhD Fellow explores new frontiers in AI video reasoning

6 Feb 2026 08:46

MAYS IBRAHIM (ABU DHABI)

Muhammad Maaz arrived in Abu Dhabi as part of the first cohort of students at the Mohamed bin Zayed University of Artificial Intelligence (MBZUAI).

Five years later, he became the first researcher in the Gulf to be named a Global Google PhD Fellow, an award given to just over 200 doctoral students worldwide.

“I always dreamed of being recognised by companies like Google and Meta,” Maaz told Aletihad. “To see that happen now, from here in Abu Dhabi, means a lot to me personally and to the region.”

Now a final-year PhD candidate at MBZUAI, Maaz received the Graduate Research Excellence Award from the university, which celebrated its fifth anniversary earlier this week.

Maaz, originally from Pakistan, trained as an electrical engineer before discovering his passion for artificial intelligence through a bachelor’s thesis in computer vision. 

After graduating, he worked as a software engineer on real-world problems such as object detection and tracking, hands-on experience that shaped his interest in how AI systems are built from the ground up.

His path to MBZUAI began when faculty members visited his university to introduce the then-new AI-focused institution in Abu Dhabi.

“They spoke about sustainability not just financially, but intellectually,” he said. “I wanted to pursue a PhD somewhere I could focus entirely on research, without constantly worrying about survival.”

Teaching AI to Understand Videos

Maaz’s research focuses on multimodal large language models, a rapidly advancing area of AI that allows systems to understand and reason across different types of data such as text, images and video.

“You can think of it as ChatGPT for videos,” he explained. “A user gives the system a video and asks a question, and the model responds based on what it sees and understands.”

One of the projects he worked on, Video-ChatGPT, is considered the world’s first fully fledged video conversation model.

Rather than simply describing what appears on screen, the system allows users to ask natural-language questions about video content and receive meaningful answers.

Another project, GLaMM, focuses on what researchers call “grounding”: enabling AI models not just to describe images, but to precisely locate and identify objects within them.

“Grounding means localisation,” Maaz said. “The model doesn’t just talk about what’s in an image; it points to where it is.”

Both projects gained global attention not only because they were first-of-their-kind, but also because of the team’s commitment to openness.

“When we publish, we don’t just release a paper; we release the code, tutorials, demos, everything needed to reproduce the work,” Maaz said.

The Google PhD Fellowship recognises researchers whose work is both academically rigorous and highly relevant to real-world applications. Maaz believes the practical impact of his research played a key role.

“These are foundation models,” he said. “They can be applied in education, online learning, medicine, agriculture; almost any industry you can think of.”

He credits MBZUAI’s research culture and mentorship for pushing him to pursue work that is not only novel, but usable by others. “We were taught to think beyond publications and focus on how people will actually use what we build.”

Abu Dhabi Advantage

For Maaz, Abu Dhabi has become more than a place of study.

“It’s my home,” he said. “It’s where I’ve grown professionally and personally. I started my family and met my better half here; she’s also one of the brilliant researchers at MBZUAI.”

He pointed to the university’s comprehensive support system, from stipends and housing to access to world-class computing resources, as a major reason for his success.

“The biggest advantage is that you only have to focus on research,” he said. “Everything else is taken care of.”

Having built systems that can describe and converse about videos, Maaz is now working on the next frontier: video reasoning.

“The goal is not just to explain what is happening in a video, but why it is happening,” he said. “And to remember past interactions with the user.”

His team recently launched an interactive video reasoning model, believed to be the first of its kind globally. 

The project required months of intense work, often up to 16 hours a day, alongside fellow researchers and supervisors.

Copyrights reserved to Aletihad News Center © 2026