Alibaba challenges OpenAI’s GPT-4o and Google’s Nano Banana with new multimodal AI model
Two variants of Qwen3-Omni outperform GPT-4o and Gemini-2.5-Flash in audio, image and video comprehension, developers say

Chief among the new releases was Qwen3-Omni, a flagship multimodal model akin to OpenAI’s GPT-4o, which launched in May 2024. The Alibaba model is designed to process a combination of text, audio, image and video inputs and respond with text and audio.
Qwen3-Omni is the first native end-to-end multimodal system that “unifies text, images, audio and video in one model”, the development team said on social media. Alibaba owns the Post.
The model competes with similar offerings already available outside China, including OpenAI’s GPT-4o and Google’s Gemini 2.5 Flash, whose image-editing and generation variant, known as “Nano Banana”, has been making waves recently.
Citing benchmark tests on audio recognition and comprehension, as well as image and video understanding, developers said two variants of Qwen3-Omni outperformed their predecessor, Qwen2.5-Omni-7B, along with GPT-4o and Gemini 2.5 Flash.
Lin Junyang, a researcher on the Qwen team under Alibaba’s cloud unit, attributed the improvements to various foundational projects related to audio and images.
