
Attention deficit reorder: how China’s AI start-ups are rewiring the way models remember

Amid chip shortages, China’s AI start-ups are re-engineering their algorithms, hoping more efficient architecture will do the heavy lifting

China’s AI developers are hoping that algorithmic changes will help their models close the gap on Western rivals. Photo: Shutterstock

As access to advanced chips narrows, Chinese AI developers are focusing on fixing an algorithmic bottleneck at the heart of large language models (LLMs) – hoping that more efficient architecture, not more powerful hardware, will help them steal a march on their Western rivals.

By experimenting with hybrid forms of “attention” – the mechanism that allows LLMs to process and recall information – start-ups such as Moonshot AI and DeepSeek aim to stretch limited computing resources, while keeping pace with global leaders.

Their work centres on redesigning the costly “full attention” process used by most LLMs, which compares every new token of data with all previous ones. As the number of tokens grows, the cost of these comparisons grows quadratically, making long inputs increasingly demanding.
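To see why, consider a minimal sketch of full attention in Python with NumPy. The function and dimensions are illustrative, not any company’s implementation: each of the n tokens is scored against all n tokens, so the score matrix itself has n × n entries.

```python
import numpy as np

def full_attention(queries, keys, values):
    """Toy single-head full attention: every token is scored against
    every other token, producing an n-by-n matrix of comparisons."""
    scores = queries @ keys.T / np.sqrt(keys.shape[-1])    # shape (n, n)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)          # softmax over all tokens
    return weights @ values

n, d = 1024, 64                     # sequence length and head dimension (illustrative)
x = np.random.randn(n, d)
out = full_attention(x, x, x)       # ~1 million pairwise comparisons for 1,024 tokens
```

Doubling the input length roughly quadruples the number of comparisons, which is why very long prompts strain both memory and compute.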


AI experts have identified this limited “attention budget” of LLMs as one of the key choke points in the development of powerful AI agents.

Chinese developers are now exploring hybrid “linear attention” systems that make comparisons with only a subset of tokens, dramatically reducing computational costs.
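One common way to restrict attention to a subset of tokens is a sliding window, sketched below in Python. This is an illustrative example of the general idea, not the specific mechanism used by the companies named here: each token attends only to the most recent `window` tokens, so the work grows linearly with sequence length rather than quadratically.

```python
import numpy as np

def windowed_attention(queries, keys, values, window=128):
    """Toy 'subset' attention: each token only compares itself with the
    most recent `window` tokens, so cost scales roughly as n * window."""
    n, d = queries.shape
    out = np.empty_like(values)
    for i in range(n):
        lo = max(0, i - window + 1)                     # fixed-size slice of history
        scores = queries[i] @ keys[lo:i + 1].T / np.sqrt(d)
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        out[i] = weights @ values[lo:i + 1]
    return out

n, d = 1024, 64
x = np.random.randn(n, d)
out = windowed_attention(x, x, x)   # ~n * 128 comparisons instead of n * n
```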


One of the latest examples is Moonshot AI’s Kimi Linear, released in late October, which introduces a linear mechanism called “Kimi Delta Attention” (KDA) and combines it with full attention layers in a hybrid design.
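A hybrid stack of this kind can be thought of as a layer schedule: most layers use the cheaper linear mechanism, with full attention inserted at intervals to preserve long-range recall. The sketch below is a hypothetical illustration of that pattern; the 4:1 ratio and layer names are assumptions for clarity, not Moonshot AI’s published architecture.

```python
def build_layer_schedule(num_layers, full_every=4):
    """Return a per-layer plan mixing cheap linear layers with
    occasional full-attention layers (illustrative pattern only)."""
    return [
        "full_attention" if (i + 1) % full_every == 0 else "linear_attention"
        for i in range(num_layers)
    ]

print(build_layer_schedule(12))
# ['linear_attention', 'linear_attention', 'linear_attention', 'full_attention', ...]
```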
