Alibaba’s latest Qwen3.5 model family has landed on Ollama, bringing native vision-language capabilities to local AI development. Unlike previous generations that bolted image understanding onto text models, Qwen3.5 integrates both modalities from the ground up – making it a genuine multimodal foundation model.
What Makes Qwen3.5 Stand Out
Native Multimodal Architecture
Text and image understanding aren’t afterthoughts here. Qwen3.5 processes both input types natively within a single unified model, eliminating the clunky adapter layers that plagued earlier approaches.
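Because images are a first-class input, a single chat request can carry both text and pictures. Here is a minimal sketch of the JSON body Ollama's `/api/chat` endpoint accepts, with the image inlined as a base64 string in the message's `images` field (the `qwen3.5` tag is assumed from this post):

```python
import base64
import json

def build_vision_request(model: str, prompt: str, image_bytes: bytes) -> dict:
    """Build a JSON body for Ollama's POST /api/chat endpoint.

    Images ride along inside the user message as base64 strings
    in the "images" list.
    """
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": prompt,
            "images": [base64.b64encode(image_bytes).decode("ascii")],
        }],
        "stream": False,
    }

# Placeholder bytes for illustration; in practice, read real PNG/JPEG
# bytes from disk. The "qwen3.5" tag is assumed from this post.
req = build_vision_request("qwen3.5", "Describe this image.", b"fake-image-bytes")
print(json.dumps(req, indent=2))
```

POST this body to `http://localhost:11434/api/chat` on a running Ollama instance to get a description back.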
Massive 256K Context Window
Every model in the lineup supports a 262,144-token (256K) context window, enough to digest entire codebases, lengthy documents, or complex multi-turn conversations without losing context.
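Note that Ollama typically serves models with a much smaller default context than they support, so to actually use the full window you raise `num_ctx` via the request's `options` field. A sketch (the `qwen3.5` tag is assumed from this post):

```python
import json

# Sketch of an /api/chat request body that raises the context window
# to the full 262,144 tokens via the num_ctx option.
body = {
    "model": "qwen3.5",  # tag assumed from this post
    "messages": [{"role": "user", "content": "Summarize the codebase pasted below: ..."}],
    "options": {"num_ctx": 262144},  # 256 * 1024 tokens
    "stream": False,
}
print(json.dumps(body))
```

Larger contexts cost proportionally more memory, so only raise `num_ctx` as far as your hardware allows.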
Unprecedented Language Coverage
With support for 201 languages and dialects (up from 100+ in Qwen3), this model speaks to a truly global developer community.
Thinking Mode Built-In
All models can toggle between deliberate “thinking” mode for complex reasoning and fast inference mode when speed matters most.
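Ollama exposes a per-request `think` flag for models with reasoning support; when enabled, the response separates the reasoning trace from the final answer. A sketch of both request shapes (the model tag is assumed from this post):

```python
def chat_body(prompt: str, think: bool) -> dict:
    """Build an Ollama /api/chat body, toggling thinking mode via `think`."""
    return {
        "model": "qwen3.5",  # tag assumed from this post
        "messages": [{"role": "user", "content": prompt}],
        "think": think,      # True: deliberate reasoning; False: fast inference
        "stream": False,
    }

slow = chat_body("Prove that the square root of 2 is irrational.", think=True)
fast = chat_body("What is the capital of France?", think=False)
```

With `think=True`, the model's reasoning arrives separately from the answer, so you can log or hide it without parsing it out of the reply.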
Production-Ready Tool Calling
Native function calling and strong agentic capabilities ship across all model sizes, from the smallest 0.8B variant to the flagship 397B.
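Function calling in Ollama uses JSON-schema tool definitions passed in a `tools` array; when the model decides to invoke one, the reply carries structured tool calls instead of prose. A sketch with a hypothetical `get_weather` tool (both the tool and the model tag are illustrative assumptions):

```python
# Hypothetical tool definition in the JSON-schema format Ollama accepts.
get_weather = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

body = {
    "model": "qwen3.5",  # tag assumed from this post
    "messages": [{"role": "user", "content": "What's the weather in Oslo?"}],
    "tools": [get_weather],
}
```

Your application executes the requested function itself, appends the result as a `tool` role message, and sends the conversation back so the model can compose its final answer.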
Getting Started with Cloud Models
For maximum capability, Ollama’s cloud infrastructure lets you run the largest 397B version without local hardware constraints. This works seamlessly with tools like Claude Code and OpenClaw:
ollama launch claude --model qwen3.5:cloud
ollama launch openclaw --model qwen3.5:cloud
Choose Your Model Size
Large (Cloud-Only)
The 397B-A17B flagship model runs exclusively on Ollama’s cloud. It activates 17 billion parameters per token, delivering exceptional performance across reasoning, coding, and vision tasks.
ollama run qwen3.5:397b-cloud
Medium (Local + Cloud)
Three options balance speed and capability:
- 27B: A traditional dense model where all parameters activate on every pass, maximizing per-token reasoning density
- 35B-A3B: Activates only 3B parameters per token for faster inference while matching the performance of much larger legacy models
- 122B-A10B: The sweet spot with 10B active parameters and particularly strong tool use
ollama run qwen3.5:27b
ollama run qwen3.5:35b-a3b
ollama run qwen3.5:122b-a10b
Small (Local)
The 9B model is the default (ollama run qwen3.5) and fits comfortably on a 16GB GPU. It’s your starting point for local development with full multimodal, thinking, and tool calling support.
Even smaller variants (4B, 2B, and 0.8B) are available for resource-constrained environments:
ollama run qwen3.5:9b
ollama run qwen3.5:4b
ollama run qwen3.5:2b
ollama run qwen3.5:0.8b
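With sizes spanning local and cloud, one rough way to pick a tag programmatically is to key off available GPU memory. The thresholds below are ballpark assumptions for quantized weights, not official hardware requirements:

```python
def pick_tag(vram_gb: float) -> str:
    """Map available GPU memory to a qwen3.5 tag.

    Thresholds are rough assumptions for quantized weights,
    not official hardware requirements.
    """
    if vram_gb >= 80:
        return "qwen3.5:122b-a10b"
    if vram_gb >= 24:
        return "qwen3.5:35b-a3b"
    if vram_gb >= 20:
        return "qwen3.5:27b"
    if vram_gb >= 16:
        return "qwen3.5:9b"
    return "qwen3.5:cloud"  # fall back to Ollama's cloud for small GPUs

print(pick_tag(16.0))  # → qwen3.5:9b
```

Adjust the cutoffs for your quantization level and leave headroom for the KV cache, which grows with `num_ctx`.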
What This Means for Developers
Qwen3.5’s arrival on Ollama democratizes access to frontier-class multimodal AI. The combination of native vision, extensive language support, and flexible sizing means you can prototype locally with the 9B model, then scale to the cloud-hosted 397B for production workloads, all within the same Ollama ecosystem.
Whether you’re building code assistants, document analyzers, or multilingual chatbots, Qwen3.5 gives you the tools you need without vendor lock-in, and with no per-token API costs when you run locally.
Download Ollama and get started today.