DeepSeek proposes shift in AI model development with ‘mHC’ architecture to upgrade ResNet
The paper comes at a time when most AI start-ups have been focusing on turning AI capabilities in LLMs into agents and other products

DeepSeek’s latest technical paper, co-authored by the firm’s founder and CEO Liang Wenfeng, has been hailed as a potential game changer in the development of artificial intelligence models, as it could translate into improvements in the fundamental architecture underlying machine learning.
The paper introduces Manifold-Constrained Hyper-Connections (mHC), an improvement on conventional hyper-connections and residual networks (ResNet), a fundamental mechanism underlying large language models (LLMs). The work showcases the Chinese AI start-up’s continued efforts to train powerful models with limited computing resources.
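For readers unfamiliar with the terminology: a residual connection simply adds a layer’s input back to its output (y = x + f(x)), and hyper-connections, as described in earlier research, generalise this by carrying several parallel residual streams whose mixing is learned. The sketch below illustrates that general idea only; it is not taken from DeepSeek’s paper, and the class names, stream count and mixing scheme are assumptions made purely for illustration.

```python
# A minimal, illustrative sketch (not DeepSeek's implementation): it contrasts a
# standard residual block with a simplified hyper-connection-style block that
# keeps several parallel residual streams mixed by learnable weights.
# Class names, stream count and mixing scheme are assumptions for illustration.
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """Conventional ResNet-style residual connection: y = x + f(x)."""
    def __init__(self, dim: int):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.f(x)


class HyperConnectionBlock(nn.Module):
    """Simplified hyper-connection: n residual streams with learnable mixing.

    The mHC paper reportedly constrains this mixing to keep training stable at
    scale; that constraint is deliberately omitted from this toy version.
    """
    def __init__(self, dim: int, n_streams: int = 4):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
        # How the parallel streams are combined into the layer's input...
        self.read = nn.Parameter(torch.full((n_streams,), 1.0 / n_streams))
        # ...and how the layer's output is written back to each stream.
        self.write = nn.Parameter(torch.ones(n_streams))

    def forward(self, streams: torch.Tensor) -> torch.Tensor:
        # streams has shape (n_streams, batch, dim)
        layer_in = torch.einsum("s,sbd->bd", self.read, streams)
        out = self.f(layer_in)
        return streams + self.write[:, None, None] * out
```

In this toy formulation, a single stream with fixed weights reduces to the ordinary residual connection, which is presumably why hyper-connections are framed as a generalisation of ResNet-style skips rather than a replacement for them.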
In the paper, a team of 19 DeepSeek researchers said they tested mHC on models with 3 billion, 9 billion and 27 billion parameters and found it scaled without adding significant computational burden.
The paper, published on January 1, immediately triggered interest and debate among developers despite its dense technical details.
Quan Long, a professor at the Hong Kong University of Science and Technology, said the new findings were “very significant for transformer architecture made for LLMs”. Quan added that he was “very excited to see the important optimisation from DeepSeek which has already revolutionised the LLM in efficiency”.
While most AI start-ups have been focused on turning the capabilities of LLMs into agents and other products, DeepSeek, which began as a side project of Liang’s quantitative trading firm, has been seeking improvements in the basic technical mechanisms of how machines learn from data.