
Attention deficit reorder: how China’s AI start-ups are rewiring the way models remember

Amid chip shortages, China’s AI start-ups are re-engineering their algorithms, hoping more efficient architecture will do the heavy lifting

China’s AI developers are hoping that algorithmic changes will help their models close the gap on Western rivals. Photo: Shutterstock

As access to advanced chips narrows, Chinese AI developers are focusing on fixing an algorithmic bottleneck at the heart of large language models (LLMs) – hoping that more efficient architecture, not more powerful hardware, will help them steal a march on their Western rivals.

By experimenting with hybrid forms of “attention” – the mechanism that allows LLMs to process and recall information – start-ups such as Moonshot AI and DeepSeek aim to stretch limited computing resources, while keeping pace with global leaders.

Their work centres on redesigning the costly “full attention” process used by most LLMs, which compares every new token of data with all previous ones. As the number of tokens grows, the cost of these comparisons grows quadratically, making long inputs increasingly demanding.
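To see why, consider a minimal sketch of full attention in Python with NumPy. The function and dimensions are illustrative, not any company’s implementation: each of the n tokens is scored against all n tokens, so the score matrix itself has n × n entries.

```python
import numpy as np

def full_attention(queries, keys, values):
    """Toy single-head full attention: every token is scored against
    every other token, producing an n-by-n matrix of comparisons."""
    scores = queries @ keys.T / np.sqrt(keys.shape[-1])    # shape (n, n)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)          # softmax over all tokens
    return weights @ values

n, d = 1024, 64                     # sequence length and head dimension (illustrative)
x = np.random.randn(n, d)
out = full_attention(x, x, x)       # ~1 million pairwise comparisons for 1,024 tokens
```

Doubling the input length roughly quadruples the number of comparisons, which is why very long prompts strain both memory and compute.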


AI experts have identified this limited “attention budget” of LLMs as one of the key choke points in the development of powerful AI agents.

Chinese developers are now exploring hybrid “linear attention” systems that make comparisons with only a subset of tokens, dramatically reducing computational costs.
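One common way to restrict attention to a subset of tokens is a sliding window, sketched below in Python. This is an illustrative example of the general idea, not the specific mechanism used by the companies named here: each token attends only to the most recent `window` tokens, so the work grows linearly with sequence length rather than quadratically.

```python
import numpy as np

def windowed_attention(queries, keys, values, window=128):
    """Toy 'subset' attention: each token only compares itself with the
    most recent `window` tokens, so cost scales roughly as n * window."""
    n, d = queries.shape
    out = np.empty_like(values)
    for i in range(n):
        lo = max(0, i - window + 1)                     # fixed-size slice of history
        scores = queries[i] @ keys[lo:i + 1].T / np.sqrt(d)
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        out[i] = weights @ values[lo:i + 1]
    return out

n, d = 1024, 64
x = np.random.randn(n, d)
out = windowed_attention(x, x, x)   # ~n * 128 comparisons instead of n * n
```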


One of the latest examples is Moonshot AI’s Kimi Linear, released in late October, which introduces a linear mechanism called “Kimi Delta Attention” (KDA) and combines it with full attention layers in a hybrid design.
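A hybrid stack of this kind can be thought of as a layer schedule: most layers use the cheaper linear mechanism, with full attention inserted at intervals to preserve long-range recall. The sketch below is a hypothetical illustration of that pattern; the 4:1 ratio and layer names are assumptions for clarity, not Moonshot AI’s published architecture.

```python
def build_layer_schedule(num_layers, full_every=4):
    """Return a per-layer plan mixing cheap linear layers with
    occasional full-attention layers (illustrative pattern only)."""
    return [
        "full_attention" if (i + 1) % full_every == 0 else "linear_attention"
        for i in range(num_layers)
    ]

print(build_layer_schedule(12))
# ['linear_attention', 'linear_attention', 'linear_attention', 'full_attention', ...]
```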
