Alibaba launches AI model that can process images and video on phones and laptops
The multimodal Qwen2.5-Omni-7B model is designed to run locally on mobile devices and tops rivals in some benchmarks

Alibaba launched Qwen2.5-Omni-7B on Thursday as the latest addition to its Qwen family of models. With just 7 billion parameters, it is designed to run on mobile phones, tablets and laptops, making advanced AI capabilities more accessible to everyday users.
The company highlighted potential use cases such as assisting visually impaired users with real-time audio descriptions and providing step-by-step cooking guidance by analysing ingredients. The model's versatility reflects the growing demand for AI systems that go beyond text generation.
Qwen2.5-Omni-7B has posted strong results in benchmark tests. It scored 56.1 on OmniBench, surpassing the 42.9 achieved by Google's Gemini-1.5-Pro. On the CV15 audio benchmark it reached 92.4, one point higher than Alibaba's earlier Qwen2-Audio model. For image-related tasks, it achieved 59.2 on the Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark (MMMU), beating the Qwen2.5-VL vision-language model.