Open Source Project of the Day (Part 8): NexaSDK - Cross-Platform On-Device AI Runtime for Running Frontier Models Locally
#opensource #llm #npu #python
WonderLab
Introduction
"What if the latest AI models could run on your phone, on IoT devices, even on edge devices — without needing to rely on the cloud?"
This is Part 8 of the "Open Source Project of the Day" series. Today we explore NexaSDK (GitHub).
Imagine running the Qwen3-VL multimodal model on an Android phone, using Apple Neural Engine for speech recognition on an iOS device, and running the Granite-4 model on a Linux IoT device — all without connecting to the cloud. That's the revolutionary experience NexaSDK delivers — bringing frontier AI models truly "down to earth" on all kinds of devices.
Why this project?
🚀 NPU-first: The industry's first NPU-first on-device AI runtime
📱 Full platform support: PC, Android, iOS, Linux/IoT all covered
🎯 Day-0 model support: Supports newly released models (GGUF, MLX, NEXA formats)
🔌 Multimodal capabilities: LLM, VLM, ASR, OCR, Rerank, image generation, and more
🌟 Community recognized: 7.6k+ Stars, collaborates with Qualcomm on on-device AI competitions
What You'll Learn
Core concepts and architecture design of NexaSDK
How to run on-device AI models on various platforms
Support and usage of NPU, GPU, and CPU compute backends
Integration and use of multimodal AI capabilities
Comparative analysis with other on-device AI frameworks
How to get started building on-device AI applications with NexaSDK
Prerequisites
Basic understanding of LLMs and AI models
Familiarity with at least one programming language (Python, Go, Kotlin, Swift)
Understanding of on-device AI basics (optional)
Basic knowledge of hardware acceleration like NPU, GPU (optional)
Project Background
Project Introduction
NexaSDK is a cross-platform on-device AI runtime supporting frontier LLM and VLM models on GPU, NPU, and CPU. It provides comprehensive runtime coverage for PC (Python/C++), mobile (Android & iOS), and Linux/IoT (Arm64 & x86 Docker) platforms.
Core problems the project solves:
On-device AI runtimes are fragmented, requiring different solutions for different platforms
Lack of native NPU support, unable to fully utilize hardware acceleration
After new model releases, on-device support lags (no day-0 support)
Multimodal AI capabilities are difficult to integrate on-device
Cross-platform development costs are high, requiring separate implementations per platform
Target user groups:
Developers building on-device AI applications
Mobile app developers wanting to leverage NPU acceleration
Developers needing to run AI models on IoT devices
Researchers interested in on-device AI
Author/Team Introduction
Team: NexaAI
Background: Team focused on on-device AI solutions
Partners: Collaborates with Qualcomm to host on-device AI competitions
Contributors: 45 contributors including @RemiliaForever, @zhiyuan8, @mengshengwu, and others
Philosophy: Enable frontier AI models to run efficiently on all kinds of devices
Project created: 2024 (inferred from GitHub commit history)
Project Stats
⭐ GitHub Stars: 7.6k+ (rapidly and continuously growing)
🍴 Forks: 944+
📦 Version: v0.2.71 (latest version, released January 22, 2026)
📄 License: Apache-2.0 (CPU/GPU components); NPU components require a license
🏆 Competition: Nexa × Qualcomm On-Device AI Competition ($6,500 prize)
Project development history:
2024: Project launched, initial version released
2024-2025: Rapid development, multi-platform support added
2025: NPU support refined, collaboration with Qualcomm
2026: Continuous iteration, more models and feature support added
Supported models:
OpenAI GPT-OSS
IBM Granite-4
Qwen-3-VL
Gemma-3n
Ministral-3
And many more frontier models
Main Features
Core Purpose
NexaSDK's core purpose is to provide a unified cross-platform on-device AI runtime, enabling developers to:
Run AI models on multiple devices: PC, phones, and IoT devices all covered
Fully utilize hardware acceleration: Automatic selection from NPU, GPU, or CPU backends
Quickly integrate new models: Day-0 support, use new models as soon as they're released
Multimodal AI capabilities: Comprehensive support for text, images, audio, video, and more
Simplify development: Unified API, one codebase for all platforms
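The "automatic selection from NPU, GPU, or CPU" above boils down to a preference-ordered fallback chain. The sketch below is a self-contained illustration of that idea only; `pick_backend` and the backend names are my own illustrative assumptions, not NexaSDK's actual API:

```python
# Illustrative NPU -> GPU -> CPU fallback chain; not NexaSDK's real API.
from typing import List

PREFERENCE = ["npu", "gpu", "cpu"]  # NPU-first ordering

def pick_backend(available: List[str], preference: List[str] = PREFERENCE) -> str:
    """Return the most preferred compute backend the device reports."""
    for backend in preference:
        if backend in available:
            return backend
    raise RuntimeError("no supported compute backend found")

# Example: a laptop without an NPU falls back to its GPU.
print(pick_backend(["cpu", "gpu"]))  # -> gpu
```

The point is simply that application code asks for "the best available backend" once, and the runtime resolves it per device, so the same codebase runs on a Snapdragon NPU, a desktop GPU, or a plain CPU.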
Use Cases
Mobile AI applications
Smart assistants on phones
Offline speech recognition and translation
Image recognition and processing
Local LLM chat applications
IoT and edge computing
AI capabilities for smart home devices
Intelligent analysis for industrial IoT
AI inference on edge servers
Perception capabilities for autonomous vehicles
Desktop application integration
Local AI assistant
Intelligent document processing
Code generation tools
Creative content generation
Enterprise applications
Data privacy protection (local processing)
Offline AI capabilities
Reduce cloud costs
Real-time response requirements
Research and development
Model performance testing
Hardware acceleration research
New model validation
Algorithm optimization experiments
Quick Start
CLI Method (Simplest)
# Install Nexa CLI
# Windows (x64 with Intel/AMD NPU)
# Download: nexa-cli_windows_x86_64.exe
# macOS (x64)
# Download: nexa-cli_macos_x86_64.pkg
# Linux (ARM64)
curl -L https://github.com/NexaAI/nexa-sdk/releases/latest/download/nexa-cli_linux_arm64.sh | bash
# Run first model
nexa infer ggml-org/Qwen3-1.7B-GGUF
# Multimodal: drag and drop image to CLI
nexa infer NexaAI/Qwen3-VL-4B-Instruct-GGUF
# NPU support (Windows arm64 with Snapdragon X Elite)
nexa infer NexaAI/OmniNeural-4B
Python SDK
# Install
pip install nexaai

# Usage example
from nexaai import LLM, GenerationConfig, ModelConfig, LlmChatMessage

# Create LLM instance
llm = LLM.from_(model="NexaAI/Qwen3-0.6B-GGUF", config=ModelConfig())

# Build conversation
conversation = [LlmChatMessage(role="user", content="Hello, tell me a joke")]
prompt = llm.apply_chat_template(conversation)

# Streaming generation
for token in llm.generate_stream(prompt, GenerationConfig(max_tokens=100)):
    print(token, end="", flush=True)
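The `apply_chat_template` call in the SDK example turns a list of role-tagged messages into the single prompt string the model expects. As a rough mental model, here is a minimal pure-Python sketch using a simplified ChatML-style format; this is an assumption for illustration only, since NexaSDK applies each model's own template and the exact tokens differ per model:

```python
# Minimal sketch of chat templating (loosely ChatML-style).
# NexaSDK uses the model's own template; tokens here are illustrative.
def apply_chat_template(messages: list) -> str:
    parts = []
    for msg in messages:
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>")
    parts.append("<|im_start|>assistant\n")  # cue the model to respond
    return "\n".join(parts)

prompt = apply_chat_template([{"role": "user", "content": "Hello, tell me a joke"}])
print(prompt.startswith("<|im_start|>user"))  # -> True
```

Wrapping messages in role markers like this is what lets the same base model distinguish user turns from its own replies during generation.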
Android SDK
// Add to build.gradle.kts
dependencies {
    implementation("ai.nexa:core:0.0.19")
}

// Initialize SDK
NexaSdk.getInstance().init(this)

// Load and run model
VlmWrapper.builder()
    .vlmCreateInput(
        VlmCreateInput(
            model_name = "omni-neural",
            model_path = "/data/data/your.app/files/models/OmniNeural-4B/files-1-1.nexa",
            plugin_id = "npu",
            config = ModelConfig()
        )
    )
    .build()
    .onSuccess { vlm ->
        vlm.generateStreamFlow("Hello!", GenerationConfig())
            .collect { print(it) }
    }