
Ornith-1.0-35B: A Working Operator's Notes on the New Agentic-Coding Model

A verified field guide to Ornith-1.0-35B for technical small businesses: what it is, what the benchmarks show, and how to run it.

Published 2026-07-05 · By Claire Miller

Ornith-1.0-35B landed on Hugging Face as a 35-billion-parameter Mixture-of-Experts (MoE) coding model under an MIT license. The team behind it, DeepReinforce, calls it "self-improving" on the grounds that the model learns to generate not just the solutions but the scaffolds that drive them. For a small technical business, the more interesting question is not whether Ornith is real but whether this 35B MoE is a usable local coding model for agentic workloads. These are working notes.

What we verified, what we did not

The model card on Hugging Face is unusually thorough and most of what follows is drawn from it directly. The card is at huggingface.co/deepreinforce-ai/Ornith-1.0-35B, with a release blog post at deep-reinforce.com/ornith_1_0.html. Every number, every column, every serving-recipe flag below came from those two sources.

A few things we could not verify:

The model family in one paragraph

Ornith-1.0 is a family of open-weight coding models trained for tool-calling and agentic coding workloads. The family ships in four sizes: 9B Dense, 31B Dense, 35B MoE, and 397B MoE. The 35B we are profiling is the lightweight MoE member of the family and the one positioned for "single-node efficient deployment." Architecturally it is a Qwen 3.5-derived MoE with multimodal handling: Hugging Face tags it image-text-to-text in addition to text-generation, and the config exposes Qwen3_5MoeForConditionalGeneration. License: MIT, globally accessible, no regional gating.

If you want the short read of the family: it is what Qwen 3.5-MoE looks like when it has been aggressively post-trained for the loop where a model calls tools, gets results, writes more code, calls more tools, and finishes a pull request.

What the family is good at

The card frames the family around four benchmarks: Terminal-Bench 2.1, SWE-Bench (Verified, Pro, Multilingual variants), NL2Repo, and a new benchmark called ClawEval. The four benchmarks measure four different skills the model needs for an end-to-end agentic coding job:

Three auxiliary SWE Atlas variants (Question-Answer, Retrieval-Fix, Test-Write) round out the headline numbers and test subsets of the SWE skill area.

Headline numbers

The card reports Ornith-1.0-35B against four reference models: Qwen 3.5 35B, Qwen 3.6 35B, Gemma 4 31B, and Qwen 3.5 397B. Reported scores:

| Benchmark | Ornith-1.0-35B | Qwen 3.5 35B | Qwen 3.6 35B | Gemma 4 31B | Qwen 3.5 397B |
|---|---|---|---|---|---|
| Terminal-Bench 2.1 (Terminus-2) | 64.2 | 41.4 | 52.5 | 42.1 | 53.5 |
| Terminal-Bench 2.1 (Claude Code) | 62.8 | 38.9 | 49.2 | not reported | 48.6 |
| SWE-Bench Verified | 75.6 | 70.0 | 73.4 | 52.0 | 76.4 |
| SWE-Bench Pro | 50.4 | 44.6 | 49.5 | 35.7 | 51.6 |
| SWE-Bench Multilingual | 69.3 | 60.3 | 67.2 | 51.7 | 69.3 |
| NL2Repo | 34.6 | 20.5 | 29.4 | 15.5 | 36.8 |
| ClawEval average | 69.8 | 65.4 | 68.7 | 48.5 | 70.7 |
| SWE Atlas - QnA | 37.1 | 13.2 | 15.5 | not reported | 20.4 |
| SWE Atlas - RF | 29.7 | 10.2 | 11.4 | not reported | 18.4 |
| SWE Atlas - TW | 27.8 | 9.8 | 13.3 | not reported | 18.5 |

The pattern across the table is consistent: Ornith-1.0-35B beats every same-class reference (Qwen 3.5 35B, Qwen 3.6 35B, Gemma 4 31B) on every row. Against the much larger Qwen 3.5 397B it trails by small margins on SWE-Bench Verified (75.6 vs 76.4), SWE-Bench Pro (50.4 vs 51.6), NL2Repo (34.6 vs 36.8), and ClawEval (69.8 vs 70.7), and ties on SWE-Bench Multilingual (69.3). On Terminal-Bench the gap runs the other way: Ornith leads the 397B reference by 10.7 points under Terminus-2 (64.2 vs 53.5) and by 14.2 points under Claude Code (62.8 vs 48.6), and it also leads on all three SWE Atlas rows. That suggests the post-training pays off most on terminal-driven, long-horizon tool orchestration.
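The margins against the Qwen 3.5 397B column can be recomputed directly from the table. The sketch below does only that arithmetic; the scores are copied from the table, and the names SCORES and margins are illustrative. A positive value means Ornith-1.0-35B leads.

```python
# Scores copied from the table: (Ornith-1.0-35B, Qwen 3.5 397B).
SCORES = {
    "Terminal-Bench 2.1 (Terminus-2)": (64.2, 53.5),
    "Terminal-Bench 2.1 (Claude Code)": (62.8, 48.6),
    "SWE-Bench Verified": (75.6, 76.4),
    "SWE-Bench Pro": (50.4, 51.6),
    "SWE-Bench Multilingual": (69.3, 69.3),
    "NL2Repo": (34.6, 36.8),
    "ClawEval average": (69.8, 70.7),
    "SWE Atlas - QnA": (37.1, 20.4),
    "SWE Atlas - RF": (29.7, 18.4),
    "SWE Atlas - TW": (27.8, 18.5),
}


def margins(scores):
    """Return {benchmark: Ornith minus reference}, rounded to one decimal."""
    return {name: round(a - b, 1) for name, (a, b) in scores.items()}


if __name__ == "__main__":
    for name, delta in margins(SCORES).items():
        print(f"{name:34s} {delta:+.1f}")
```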

One small but important detail from the card: the chat template needs adjustment for Qwen-derived serving stacks, and any tool used for evaluation should read vLLM's reasoning_content key. Operators reproducing these numbers are likely to trip over the chat-template mismatch on the first try; the card documents the fix explicitly.

How the benchmarks were measured

The card includes the methodology for each row in a single note block at the bottom of the table. The complete list:

The temperature=1.0 settings across most of the benchmarks are worth noting. They will likely produce measurably different results from the lower-temperature, single-sample runs (temperature=0.6, for example) that many evaluators use for the OpenAI/Anthropic models. Comparisons across these benchmarks will reflect the harness and temperature, not just the model.

How to run it

The card ships serving recipes for vLLM, SGLang, Hugging Face Transformers, llama.cpp, and Ollama. Required runtimes:

The headline recipe is a single 8×80GB GPU node (tensor-parallel 8). With 8×80GB, the card's vLLM recipe sets --max-model-len 262144 for a 262K context length, with --enable-prefix-caching, --enable-auto-tool-choice --tool-call-parser qwen3_xml, and --reasoning-parser qwen3. The 262K context is the headline capability that distinguishes this model from a number of 30-ish-billion contemporaries.
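The flags quoted above can be assembled into a launch command. The following is a minimal sketch, not the card's recipe verbatim: the flag values come from the paragraph above, while the --tensor-parallel-size spelling of "tensor-parallel 8", the repo id taken from the card URL, and the helper name vllm_serve_argv are assumptions to check against the card before use.

```python
import shlex

# Repo id as it appears in the model card URL (assumed to be the serve target).
MODEL_ID = "deepreinforce-ai/Ornith-1.0-35B"


def vllm_serve_argv(model=MODEL_ID, tensor_parallel=8, max_model_len=262144):
    """Assemble an argv list for `vllm serve` from the flags quoted in the text."""
    return [
        "vllm", "serve", model,
        # Assumed flag spelling for the tensor-parallel 8 setting.
        "--tensor-parallel-size", str(tensor_parallel),
        "--max-model-len", str(max_model_len),
        "--enable-prefix-caching",
        "--enable-auto-tool-choice",
        "--tool-call-parser", "qwen3_xml",
        "--reasoning-parser", "qwen3",
    ]


if __name__ == "__main__":
    # Print the command so it can be reviewed before running it on the node.
    print(shlex.join(vllm_serve_argv()))
```

Keeping the command as a list makes it easy to lower max_model_len or tensor_parallel for a smaller node without retyping the parser flags.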

For smaller hardware, the Hugging Face community publishes quantized variants:

For a small business running on a single 24GB workstation, the GGUF build via Ollama is the realistic path: ollama run hf.co/deepreinforce-ai/Ornith-1.0-35B-GGUF. Expect throughput at 24GB to be slower than on the headline 8×80GB node, particularly at long contexts.

How the model behaves at the API level

The card documents the model's two distinctive output behaviors:

Reasoning trace. Every assistant turn opens with a think block before the final answer, and vLLM/SGLang can be configured to return the chain-of-thought in a separate reasoning_content field. Operators building agents that read the reasoning should split on </think> and handle the trace explicitly.
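A minimal sketch of that split, assuming the trace opens with a literal <think> tag (the text above only names the closing tag) and that the message is an OpenAI-style dict. The helper names split_reasoning and read_message are illustrative, not part of any serving stack.

```python
def split_reasoning(text, close_tag="</think>"):
    """Split raw assistant output into (reasoning, answer).

    If the closing tag is absent, the whole text is treated as the answer.
    """
    head, sep, tail = text.partition(close_tag)
    if not sep:
        return "", text.strip()
    # Assumption: the trace begins with a literal "<think>" opening tag.
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


def read_message(message):
    """Prefer a server-parsed reasoning_content field, else split the content."""
    content = message.get("content") or ""
    parsed = message.get("reasoning_content")
    if parsed is not None:
        return parsed.strip(), content.strip()
    return split_reasoning(content)
```

Handling both cases in one place means the agent code does not need to know whether the server was started with a reasoning parser.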

Tool calling. The model emits well-formed tool_call blocks that surface as OpenAI-style tool_calls in the API response. The serving stacks already parse those correctly when --tool-call-parser qwen3_xml (vLLM) or --tool-call-parser qwen3_coder (SGLang) is configured. The card ships a complete Python example showing tool use end-to-end.
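On the client side, the loop that consumes those tool_calls looks roughly like the sketch below. This is not the card's example: it assumes the standard OpenAI-style message shape (a tool_calls list whose entries carry an id and a function with a name and JSON-encoded arguments), and read_file, TOOLS, TOOL_SCHEMAS, and run_tool_calls are hypothetical names used only for illustration.

```python
import json


# Hypothetical tool, for illustration only; a real agent would read the file.
def read_file(path):
    return f"(contents of {path})"


TOOLS = {"read_file": read_file}

# Schema to send in the request's "tools" field so the model can call the tool.
TOOL_SCHEMAS = [{
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Return the contents of a file in the workspace.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]


def run_tool_calls(message, tools=TOOLS):
    """Execute each tool call in an assistant message; return role="tool" replies."""
    replies = []
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        handler = tools.get(fn["name"])
        if handler is None:
            result = f"error: unknown tool {fn['name']}"
        else:
            # Arguments arrive as a JSON string, not a dict.
            args = json.loads(fn["arguments"] or "{}")
            result = handler(**args)
        replies.append(
            {"role": "tool", "tool_call_id": call["id"], "content": str(result)}
        )
    return replies
```

The returned messages are appended to the conversation after the assistant message and sent back to the endpoint, which is what closes one turn of the agent loop.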

For a small business wiring Ornith into an agent pipeline, the two behaviors are good news. The reasoning trace is what you want from a coding model, and the tool-call fidelity is what makes the agent loop actually work.

What fits Ornith into the agentic-coding stack

The card lists integration paths with major agent harnesses, all of which sit on top of the OpenAI-compatible serving endpoint:

For a small business running a coding agent on customer code or on an internal codebase, the working starting stack in 2026 is: Ornith-1.0-35B-GGUF served by Ollama, OpenCode as the coding harness, and a small wrapper that records the agent's tool calls and keeps the reasoning traces alongside them. That gets you a working agent loop without an 8×80GB node.
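The recording wrapper can be very small. A minimal sketch, assuming OpenAI-style assistant messages with an optional reasoning_content field and a JSON Lines file as the audit log; the record layout and the names audit_record and append_jsonl are hypothetical.

```python
import json
import time


def audit_record(message, step, now=None):
    """Build one audit entry from an OpenAI-style assistant message."""
    calls = [
        {"name": c["function"]["name"], "arguments": c["function"]["arguments"]}
        for c in message.get("tool_calls") or []
    ]
    return {
        "step": step,
        "ts": time.time() if now is None else now,
        "reasoning": message.get("reasoning_content") or "",
        "tool_calls": calls,
    }


def append_jsonl(path, record):
    """Append one record as a single JSON line."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")
```

One line per agent step keeps the log greppable and lets a reviewer replay which tool calls followed which reasoning.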

What this changes for a small technical business

For a small business in 2026, the relevant shifts are:

A 35B that hits 75.6 on SWE-Bench Verified is now runnable at home. The 35B Ornith is within a point of the Qwen 3.5 397B reference on that headline metric (75.6 vs 76.4), and the difference between running it locally and calling an API becomes meaningful when the work is on customer code. A locally served model keeps source on your own hardware; a third-party API necessarily receives it.

A 262K context window in a 35B is a real engineering artifact. Operators wiring long-context coding agents into their own codebases have been waiting for a model that fits a real repository into the context. This one is close.
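Whether a given repository comes close to fitting can be estimated before any model is loaded. The sketch below is a rough heuristic only: it assumes about four bytes per token rather than using the model's tokenizer, the extension list and the 32768-token reserve are illustrative choices, and the function names are hypothetical.

```python
import os

CONTEXT_TOKENS = 262144   # the 262K context length quoted for the vLLM recipe
BYTES_PER_TOKEN = 4       # rough heuristic, NOT the model's real tokenizer
SOURCE_EXTS = {".py", ".ts", ".go", ".rs", ".java", ".md"}  # illustrative


def estimate_repo_tokens(root, exts=SOURCE_EXTS, bytes_per_token=BYTES_PER_TOKEN):
    """Estimate tokens by summing source-file sizes under root."""
    total_bytes = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip hidden directories such as .git in place.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in filenames:
            if os.path.splitext(name)[1] in exts:
                total_bytes += os.path.getsize(os.path.join(dirpath, name))
    return total_bytes // bytes_per_token


def fits_in_context(root, reserve=32768):
    """True if the estimate leaves `reserve` tokens for prompts and output."""
    return estimate_repo_tokens(root) <= CONTEXT_TOKENS - reserve
```

For a real decision, replace the heuristic with a count from the model's own tokenizer; code and non-English text often tokenize less efficiently than this assumes.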

A reasoning model with proper tool-call fidelity is what makes the agent loop work without a babysitter. The combination of reasoning_content and OpenAI-compatible tool_calls makes Ornith a drop-in for any coding agent that already accepts an OpenAI-compatible backend.

MIT licensing means no per-token or per-seat cost. The model can be forked, fine-tuned, served internally, redistributed. The economics are direct. The hidden cost is the operator time spent serving it and integrating it.

What to do this week

For a small technical business in 2026, the practical project is:

The cost of the experiment is one developer-day plus the GPU hours. The cost of not running the experiment is that a third-party API continues to handle code the business might rather keep in-house.

Source discipline

This article is original synthesis informed by the Ornith-1.0-35B model card and the DeepReinforce blog post. Every benchmark number was taken from the model card; every serving-recipe flag was taken from the model card. Where the publisher's measurements could not be independently verified against upstream base-model cards, we said so.

Citation

BibTeX from the model card:

@misc{ornith-35b,
    title = {{Ornith-1.0-35B}: Agentic Coding, Open to All},
    url = {https://deep-reinforce.com/ornith_1_0.html},
    author = {{DeepReinforce Team}},
    year = {2026}
}

References

DeepReinforce, Ornith-1.0-35B model card on Hugging Face. huggingface.co/deepreinforce-ai/Ornith-1.0-35B, last modified 2026-06-25.
DeepReinforce, "Ornith-1.0: Self-Scaffolding LLMs for Agentic Coding" release blog post. deep-reinforce.com/ornith_1_0.html, June 2026.
DeepReinforce, Ornith-1.0 GGUF build for llama.cpp and Ollama. huggingface.co/deepreinforce-ai/Ornith-1.0-35B-GGUF, 2026.
DeepReinforce, Ornith-1.0-35B-FP8 build. huggingface.co/deepreinforce-ai/Ornith-1.0-35B-FP8, 2026.
vLLM project, vLLM ≥ 0.19.1 serving framework documentation. docs.vllm.ai, 2024-2025.
SGLang project, SGLang ≥ 0.5.9 serving framework documentation. docs.sglang.ai, 2024-2025.
Hugging Face Transformers, Transformers ≥ 5.8.1 documentation. huggingface.co/docs/transformers, 2024-2025.
Ollama, Ollama documentation and GGUF model loading. ollama.com/docs, 2024-2025.
llama.cpp, llama.cpp server documentation. github.com/ggerganov/llama.cpp, 2024-2025.
OpenHands, OpenHands documentation and LiteLLM integration. docs.openhands.dev, 2024-2025.
OpenCode, OpenCode configuration documentation. opencode.ai/docs, 2024-2025.
Alibaba Qwen team, Qwen 3.5 model family documentation. qwenlm.github.io, 2024-2025.
Google DeepMind, Gemma 4 documentation. ai.google.dev/gemma, 2024-2025.
Terminal-Bench, Terminal-Bench 2.1 leaderboard and methodology. tbench.ai, 2025-2026.
SWE-Bench, SWE-Bench Verified / Pro / Multilingual leaderboard and methodology. swebench.com, 2024-2025.


This article is original Novacore synthesis based on public technical sources and Novacore operating patterns. Existing articles are research inputs, not copy inventory.