Open Source Breaking

Qwen 3.5 Models Have Known Issues in Ollama and LM Studio - Use llama.cpp or vLLM

March 3, 2026 2 min read

Image: Meta

What Happened

A March 3, 2026 post in r/LocalLLaMA warned users experiencing poor results with Qwen 3.5 models that the problem is framework compatibility, not the model itself. Reported symptoms in Ollama and LM Studio included excessively long chain-of-thought loops that never resolved into a final answer, broken tool call formatting, and garbage output on tasks that should be straightforward.

The poster identified specific technical causes: Ollama's lack of support for the presence_penalty parameter that Qwen 3.5 relies on to terminate chain-of-thought sequences, and structural differences in how various frameworks handle the model's architectural requirements. The recommended alternatives were llama.cpp's server directly, Transformers, vLLM, or SGLang.

Why It Matters

Ollama and LM Studio are the two most popular entry points for running local models, specifically because they abstract away technical complexity. Users who encounter broken Qwen 3.5 behavior in these frontends are likely to conclude the model is low quality and move on - missing that the issue is the framework, not the model.

This is a recurring pattern with newly released models that use architectural changes or parameter requirements not yet supported by mainstream frontends. Llama.cpp, being a more direct implementation, typically achieves compatibility faster than wrapper tools built on top of it. The popular frontends usually catch up within weeks, but the gap during initial release is long enough to mislead many evaluations.

For developers running Qwen 3.5 in any production or serious evaluation context, using the recommended frameworks from the start avoids generating misleading quality data about the model. Evaluating a model through a broken frontend is one of the more common sources of inaccurate model assessments in the local LLM community.

Our Take

If you are running Qwen 3.5 through Ollama and getting poor results - runaway chain-of-thought, broken tool calls, incoherent outputs - switch to llama.cpp server before concluding the model is not suitable for your use case. You are measuring the frontend's compatibility, not the model's capability.

For production local AI deployments more broadly, vLLM and SGLang are better choices than Ollama regardless of the model. They provide proper request batching, better memory management, more complete parameter support, and more predictable behavior with newer architectures. Ollama's ease of use trades off against these capabilities in ways that become visible when running architecturally complex or recently-released models. The compatibility gap with Ollama and LM Studio typically closes within two to four weeks of a major model release as maintainers push updates.

What Happened

Why It Matters

Our Take

More from today

Unsloth's Patched Qwen 3.5 35B-A3B Build Addresses Quality Issues, Shines on Research

Helsing's HX-2 AI-guided drones reported conducting deep-strike missions in Ukraine

Claude became the top free iOS app after ChatGPT uninstalls tied to OpenAI's DoD contract

Cookie Preferences