
DeepSeek pitches new route to scale AI, but researchers call for more testing

DeepSeek’s proposed ‘mHC’ design could change how AI models are trained, but experts caution it still needs to prove itself at scale

DeepSeek’s paper is fuelling speculation that its next models could incorporate new architecture. Photo: Reuters
Eunice Xu
DeepSeek’s proposed “mHC” architecture could transform the training of large language models (LLMs) – the technology behind artificial intelligence chatbots – as developers look for ways to scale models without simply adding more computing power.

However, experts cautioned that while the approach could be far-reaching, it might still prove difficult to put into practice at scale.

In a technical paper released last week, co-authored by DeepSeek founder and CEO Liang Wenfeng, the company proposed Manifold-Constrained Hyper-Connections (mHC), a method designed to address the training instability of Hyper-Connections (HC), a network structure introduced by Chinese tech giant ByteDance in 2024.
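Broadly, hyper-connections replace the single residual stream running through a network with several parallel streams whose mixing weights are learned, and mHC constrains those weights to keep training stable. As a rough illustration only – the class name, the softmax constraint and the shapes below are assumptions made for this sketch, not DeepSeek's or ByteDance's actual design – the idea looks something like this in PyTorch:

```python
import torch
import torch.nn as nn

class ToyHyperConnection(nn.Module):
    """Toy block mixing n parallel residual streams with learnable weights.

    Illustrative only; the real HC/mHC designs differ in detail.
    """

    def __init__(self, n_streams: int, dim: int):
        super().__init__()
        # Learnable stream-mixing matrix, initialised to the identity.
        self.mix = nn.Parameter(torch.eye(n_streams))
        # Stand-in for a real transformer layer.
        self.f = nn.Linear(dim, dim)

    def forward(self, streams: torch.Tensor) -> torch.Tensor:
        # streams: (n_streams, batch, dim)
        # Constrain each row of the mixing matrix to sum to one (softmax),
        # a crude stand-in for the manifold constraint mHC describes.
        w = torch.softmax(self.mix, dim=-1)
        mixed = torch.einsum('ij,jbd->ibd', w, streams)
        # Apply the layer to one stream and add the update back, residual-style.
        return mixed + self.f(mixed[0]).unsqueeze(0)

# Example: four parallel streams of width 64.
block = ToyHyperConnection(n_streams=4, dim=64)
out = block(torch.randn(4, 2, 64))  # -> shape (4, 2, 64)
```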

HC was developed to address limitations of Residual Networks (ResNet), an architecture that underpins many modern deep-learning models, including LLMs.
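A residual connection itself is simple: each layer adds its output to its own input, so the layer only has to learn a correction and gradients can flow unimpeded through the identity path. A generic sketch in PyTorch (the standard textbook form, not tied to either paper's code):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Standard residual block: output = input + F(input)."""

    def __init__(self, dim: int):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The skip connection keeps an identity path open, which is what
        # lets very deep networks train without vanishing gradients.
        return x + self.f(x)
```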

ResNet was proposed about a decade ago by four researchers at Microsoft Research Asia, including prominent computer scientist Kaiming He.

DeepSeek’s paper marks the Chinese AI start-up’s latest effort to improve model training efficiency with limited computing resources, fuelling speculation that its next models could incorporate the new architecture.
