Alibaba AI voice model cracks top 5 globally, outperforming US rivals in regional accents

The new model supports more than 30 languages, seven major Chinese dialects and over 20 regional accents



Voice-based AI systems are generally seen as easier for mainstream users to adopt than text-based interfaces. Photo: Shutterstock Images

Published: 7:00pm, 29 May 2026

A new artificial intelligence voice model from Alibaba Group Holding has beaten out Western rivals OpenAI and xAI on a major global benchmark, underscoring its technical edge in capturing complex Chinese dialects and accents.

Fun-Realtime-TTS-Preview, developed by Alibaba’s Tongyi Lab, has secured the fifth spot on the Artificial Analysis Speech Arena leaderboard with a score of 1,190. It was the only Chinese-engineered voice system in the global top five.

Alibaba owns the South China Morning Post.

The Speech Arena benchmark is operated by Artificial Analysis, a San Francisco-based AI evaluation organisation backed by investors including former GitHub chief executive Nat Friedman and Google Brain co-founder Andrew Ng.

The platform ranks models with an Elo-based system that draws on blind user evaluations of generated speech clips. Speech Arena users test how well models perform across three core capabilities: converting speech into text, enabling end-to-end voice understanding and conversational interaction, and transforming text into natural-sounding speech.
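The article does not spell out Artificial Analysis' exact rating procedure. As a rough illustration only, the textbook Elo update below shows how a single blind vote between two models could shift their scores. The K-factor of 32, the function names and the starting ratings are assumptions for this sketch, not the platform's published parameters.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A wins a vote against model B (standard Elo logistic curve)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def elo_update(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one head-to-head vote.

    k=32 is an assumed, commonly used K-factor; the real leaderboard may differ
    or may fit all votes at once rather than updating one vote at a time.
    """
    e_a = expected_score(rating_a, rating_b)
    s_a = 1.0 if a_won else 0.0  # actual result for A: 1 for a win, 0 for a loss
    new_a = rating_a + k * (s_a - e_a)
    new_b = rating_b + k * ((1.0 - s_a) - (1.0 - e_a))
    return new_a, new_b


if __name__ == "__main__":
    # Two hypothetical models start level; listeners prefer A's clip once.
    print(elo_update(1000.0, 1000.0, a_won=True))  # (1016.0, 984.0)
```

Under this model, a 400-point rating gap means the higher-rated system is expected to win roughly 10 of every 11 blind comparisons, which is why small score differences near the top of a leaderboard translate into only modest preference margins.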

In a separate Artificial Analysis Word Error Rate index, Alibaba's Fun-Realtime-ASR model ranked first with a word error rate of 1.8 per cent, meaning fewer than two errors for every 100 words transcribed.
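Word error rate is conventionally defined as (substitutions + deletions + insertions) divided by the number of words in the reference transcript, with the error count taken from a word-level edit distance. The sketch below implements that standard definition. The function name is hypothetical, and it assumes simple whitespace tokenisation with no case or punctuation normalisation; the article does not describe how Artificial Analysis normalises text before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level Levenshtein distance divided by reference length.

    Assumes whitespace tokenisation and no normalisation (a simplification).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # dp[i][j] = fewest edits turning the first i reference words into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (or exact match)
            )
    return dp[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substituted word out of six reference words gives a WER of 1/6.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

On this definition, a 1.8 per cent score corresponds to about 18 errors across a 1,000-word reference. Because insertions also count, WER can in principle exceed 100 per cent for a badly garbled transcript.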

The Alibaba logo is pictured outside its offices in Beijing on April 1, 2026. Photo: AFP
