DeepSeek kicks off 2026 with paper signalling push to train bigger models for less

DeepSeek has published a technical paper co-authored by founder Liang Wenfeng proposing a rethink of its core deep learning architecture

Vincent Chow

Chinese artificial intelligence start-up DeepSeek has ushered in 2026 with a new technical paper, co-authored by founder Liang Wenfeng, that proposes a rethink of the fundamental architecture used to train foundational AI models.

The method – dubbed Manifold-Constrained Hyper-Connections (mHC) – forms part of the Hangzhou firm’s push to make its models more cost-effective as it strives to keep pace with better-funded US rivals with deeper access to computing power.

The paper also reflects the increasingly open, collaborative culture among Chinese AI companies, which have been publishing a growing share of their research openly.

For industry watchers, DeepSeek’s papers often provide an important early signal of the engineering choices that will shape the start-up’s next major model release.

In the paper, released on Thursday, a team of 19 DeepSeek researchers said they tested mHC on models with 3 billion, 9 billion and 27 billion parameters, and found it scaled without adding significant computational burden.

“Empirical results confirm that mHC effectively … [enables] stable large-scale training with superior scalability compared with conventional HC (hyper-connections),” wrote the researchers, led by Zhenda Xie, Yixuan Wei and Huanqi Cao.
