Alibaba AI voice model cracks top 5 globally, outperforming US rivals in regional accents
The new model supports more than 30 languages, seven major Chinese dialects and over 20 regional accents

A new artificial intelligence voice model from Alibaba Group Holding has beaten out Western rivals OpenAI and xAI on a major global benchmark, underscoring its technical edge in capturing complex Chinese dialects and accents.
Alibaba owns the South China Morning Post.
The Speech Arena benchmark is operated by Artificial Analysis, a San Francisco-based AI evaluation organisation backed by investors including former GitHub chief executive Nat Friedman and Google Brain founder Andrew Ng.
The platform ranks models through blind user evaluations of generated speech clips using an Elo-based system. Speech Arena users test how well models can perform across three core capabilities – converting speech into text, enabling end-to-end voice understanding and conversational interaction, and transforming text into natural-sounding speech.
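For readers unfamiliar with Elo ratings, the sketch below shows how a single blind preference vote could shift two models' scores. It is a generic illustration, not Artificial Analysis's published method: the function names, the K-factor of 32 and the 400-point scale are conventional chess defaults assumed here, and the article does not state which parameters Speech Arena uses.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A is preferred over model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one blind pairwise vote.

    k=32 and the 400-point scale are assumed defaults, not values taken
    from the Speech Arena benchmark.
    """
    exp_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    delta = k * (score_a - exp_a)
    # Ratings are zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Two equally rated models; a listener prefers the clip from model A.
    print(update_elo(1000.0, 1000.0, a_won=True))
```

Because the size of the update depends on how surprising the result is, a win over a much higher-rated model moves the scores more than a win over an equal one.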
In a separate Artificial Analysis Word Error Rate index, Alibaba’s Fun-Realtime-ASR model ranked first with a word error rate of 1.8 per cent, meaning fewer than two words out of every 100 were transcribed incorrectly.
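Word error rate is conventionally computed as the number of word-level substitutions, deletions and insertions needed to turn the model's transcript into the reference transcript, divided by the number of words in the reference. The sketch below follows that standard definition. The function name is illustrative, and it assumes simple whitespace tokenisation with no text normalisation, which may differ from how Artificial Analysis scores its index.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from transcript
                curr[j - 1] + 1,     # insertion: extra word in transcript
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

On this definition, a rate of 1.8 per cent corresponds to about 18 errors for every 1,000 words in the reference transcripts.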
