DeepSeek innovation speeds up processing of long text, paper says

Chinese firm says its ‘native sparse attention’ (NSA) method offers AI efficiency by focusing only on key words and skipping unnecessary comparisons

DeepSeek says its NSA method combines algorithm innovations with enhanced hardware to improve efficiency without sacrificing performance. Photo: AFP
Ling Xin in Ohio
Chinese AI start-up DeepSeek has unveiled a new technology that could allow next-generation language models to process very long text much faster and cheaper than traditional methods.
By training AI to focus on key information rather than every word, the company’s “native sparse attention” (NSA) method sped up long-text processing by up to 11 times, according to a paper published by CEO Liang Wenfeng and his team.

The NSA method combined algorithmic innovations with improved hardware to boost efficiency without sacrificing performance, according to the paper published on Tuesday on arXiv, a platform for preprint papers that have not been peer reviewed.

It could improve AI’s ability to solve complex problems, write large programs and track long conversations, said the team behind R1, the open-source, low-cost model that shook the AI world last month.

“With an optimised design for modern hardware, NSA speeds up inference while reducing pre-training costs – without compromising performance,” DeepSeek posted on X just a day after Elon Musk’s AI company, xAI, released its Grok 3 model.

Trump: Chinese AI start-up DeepSeek’s strong showing a ‘wake-up call’ for US tech sector

AI models such as ChatGPT use a technique called attention to process text. Just as humans recall earlier words to understand a sentence, AI determines which words are important and how they relate to each other.
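As a rough illustration of the idea (this is a hypothetical sketch, not DeepSeek's NSA algorithm), standard "full" attention scores every token against every other token, while a sparse variant can skip most of those comparisons, for example by letting each token keep only its few highest-scoring matches:

```python
import numpy as np

def full_attention(Q, K, V):
    """Standard scaled dot-product attention: every query token
    is compared with every key token (cost grows quadratically
    with text length)."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                   # all pairwise comparisons
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V

def topk_sparse_attention(Q, K, V, k=4):
    """Illustrative sparse variant: each query keeps only its k
    highest-scoring keys and ignores the rest. This top-k masking
    is an assumption for illustration, not NSA's actual method."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    kth = np.sort(scores, axis=-1)[:, -k][:, None]  # k-th largest score per query
    scores = np.where(scores >= kth, scores, -np.inf)  # drop the rest
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

# Tiny demo with random vectors standing in for token representations
rng = np.random.default_rng(0)
n, d = 16, 8
Q = rng.standard_normal((n, d))
K = rng.standard_normal((n, d))
V = rng.standard_normal((n, d))
out = topk_sparse_attention(Q, K, V, k=4)
print(out.shape)
```

In the sparse version each of the 16 tokens attends to only 4 others instead of all 16, which is the kind of saving that, scaled to very long documents and paired with hardware-aware kernels, underlies the speed-ups the paper reports.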
