DeepSeek founder’s latest paper proposes new AI model training technique to bypass GPU memory limits

The development underscores the start-up’s focus on maximising cost efficiency amid a deficit in computational power relative to the US

The new technical paper will be widely read by industry insiders both in China and the US for signs of progress at DeepSeek. Photo: Shutterstock Images
Vincent Chow
A technical paper co-authored by Liang Wenfeng, the founder of Chinese artificial intelligence start-up DeepSeek, and a group of Peking University researchers has proposed a new model training technique, which they say can facilitate “aggressive parameter expansion” by bypassing graphics processing unit (GPU) memory constraints.

The development underscores the Hangzhou start-up’s continued focus on maximising cost efficiency amid a deficit in computational power relative to US industry leaders, as speculation mounts over a major new model release in the run-up to the Lunar New Year.

The highly technical paper will be widely read by industry insiders both in China and the US for signs of progress at DeepSeek, which has been the poster child for China’s AI innovation over the past year.

The latest paper, published on Tuesday, introduced a “conditional memory” technique called Engram to address a key bottleneck in scaling up AI models: the limited capacity of GPU high-bandwidth memory (HBM).

The DeepSeek app is seen on a smartphone with its founder Liang Wenfeng in the background in this arranged picture taken on May 23, 2025. Photo: Shutterstock Images

Existing large language models (LLMs) retrieve even basic information through computation, which consumes processing power. The researchers said this wasted “valuable sequential depth on trivial operations that could otherwise be allocated to higher-level reasoning”.
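To make the compute-versus-lookup trade concrete, the sketch below illustrates the general idea in PyTorch: a large table kept in ordinary CPU memory, from which only the rows a batch actually needs are fetched to the GPU. It is a minimal, hypothetical illustration of the concept the paper describes, not DeepSeek’s actual Engram design; the class name ConditionalMemory and all sizes are invented for this example.

```python
# Minimal sketch of the lookup-over-compute idea, assuming only what the
# article describes: a large table kept off-GPU so its capacity is bounded
# by system RAM rather than GPU high-bandwidth memory. This is NOT
# DeepSeek's Engram implementation; ConditionalMemory and every parameter
# here are hypothetical.
import torch
import torch.nn as nn

class ConditionalMemory(nn.Module):
    def __init__(self, table_size: int, dim: int):
        super().__init__()
        # The full table lives in host (CPU) memory, never in GPU HBM.
        self.table = nn.Embedding(table_size, dim)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # Deduplicate the indices this batch actually needs, gather those
        # rows on the CPU, then move only that small slice to the model's
        # device: a table lookup stands in for layers of computation.
        unique_ids, inverse = torch.unique(token_ids.cpu(), return_inverse=True)
        rows = self.table(unique_ids).to(token_ids.device)
        return rows[inverse.to(token_ids.device)]

memory = ConditionalMemory(table_size=100_000, dim=64)
ids = torch.randint(0, 100_000, (4, 128))  # a batch of token ids
print(memory(ids).shape)                   # torch.Size([4, 128, 64])
```

Because only the deduplicated rows a batch touches ever cross the CPU-GPU boundary, the table itself can grow far beyond what GPU high-bandwidth memory could hold.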

HBM represents one of China’s biggest AI hardware gaps with the US. According to Ray Wang, a Seoul-based analyst at SemiAnalysis, China’s memory champion ChangXin Memory Technologies (CXMT) was still several years behind industry leaders such as South Korea’s Samsung Electronics and SK Hynix, and Micron Technology of the US, despite steady progress in recent years.