DeepSeek founder’s latest paper proposes new AI model training method to bypass GPU limits
The development underscores the start-up’s focus on maximising cost efficiency amid a deficit in computational power relative to the US

The development underscores the Hangzhou start-up’s continued focus on maximising cost efficiency amid a deficit in computational power relative to US industry leaders, as speculation mounts over a major new model release in the run-up to the Lunar New Year.
The highly technical paper will be widely read by industry insiders both in China and the US for signs of progress at DeepSeek, which has been the poster child for China’s AI innovation over the past year.
The latest paper, published on Tuesday, introduced a “conditional memory” technique called Engram to address a key bottleneck in scaling up AI models: the limited capacity of GPU high-bandwidth memory (HBM).

Existing large language models (LLMs) retrieve even basic information by computing it anew through their layers, a process the researchers said wasted “valuable sequential depth on trivial operations that could otherwise be allocated to higher-level reasoning”.
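The paper’s details go well beyond this article, but the general “look up instead of recompute” idea behind conditional memory can be illustrated with a toy sketch. The code below is purely hypothetical: the class and function names (EngramTable, lookup, compute_fallback) are illustrative assumptions, not DeepSeek’s actual design, and it only shows why a cheap key-value memory can spare the expensive compute path for genuinely novel inputs.

```python
# Toy sketch (not DeepSeek's code): serve memorised associations from a
# cheap key-value table so expensive computation is reserved for novel inputs.
# All names here (EngramTable, lookup, compute_fallback) are hypothetical.

import hashlib


class EngramTable:
    """A hypothetical conditional memory keyed on short token n-grams.

    Such a table could live in ordinary CPU memory or on disk, so the GPU's
    high-bandwidth memory holds only the model weights, not the memorised facts.
    """

    def __init__(self):
        self._store = {}

    def _key(self, ngram):
        # Hash the n-gram so the key has a fixed size regardless of text length.
        return hashlib.sha1(" ".join(ngram).encode()).hexdigest()

    def write(self, ngram, value):
        self._store[self._key(ngram)] = value

    def lookup(self, ngram):
        # Conditional: return a stored value if the n-gram is known,
        # otherwise signal that the model must fall back to computation.
        return self._store.get(self._key(ngram))


def answer(ngram, table, compute_fallback):
    """Use the memory when possible; spend compute only when necessary."""
    hit = table.lookup(ngram)
    if hit is not None:
        return hit                      # trivial recall: no model layers spent
    return compute_fallback(ngram)      # novel input: allocate compute to reasoning


if __name__ == "__main__":
    table = EngramTable()
    table.write(("capital", "of", "France"), "Paris")

    # A cached fact comes straight from memory...
    print(answer(("capital", "of", "France"), table, lambda q: "<run full model>"))
    # ...while an unseen query still goes through the expensive path.
    print(answer(("capital", "of", "Peru"), table, lambda q: "<run full model>"))
```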