Intelligence Dispatches | June 17, 2026 | 12 min read

You Don't Need Fable 5 to Work on the Frontier. You Need a Model Routing System.

Kilo is right that model freedom matters after the Fable/Mythos shock. The next step is a routed frontier stack: Fable 5, Opus 4.8, GPT-5.5 in Kilo Code, Grok 4.3, Gemini 3.5 Flash in Antigravity, Kimi, MiniMax, and Nemotron with receipts.

Frank

AI Architect & Creator Systems Builder


TL;DR: Kilo's post is directionally right: if one frontier model disappears, refuses, rate-limits, gets policy-constrained, or becomes too expensive, production cannot stop. But "use more models" is only half the operating system. The real move is a routed model stack: primary model, fallback model, cheap model, open/sovereign model, cost estimate, and evidence receipt. Fable 5 is still a serious execution ceiling. It is not a strategy by itself.

Frontier work without a single frontier model dependency

Kilo published a useful field note, "You Don't Have to Use Fable and Mythos to Work on the Frontier." Their argument is that model freedom matters when the flagship you planned around becomes unavailable. I agree with the frame. I would sharpen the lesson:

Availability is now a capability.

A model can be brilliant and still be the wrong default if it cannot be relied on under your access, policy, latency, cost, data, or refusal constraints. The frontier is not just "which model is smartest?" It is "which model can finish this job, under these constraints, with evidence?"

That is the difference between a model list and a model routing system.

What Kilo Got Right

Kilo's post names the operational truth plainly: the frontier is bigger than one lab. They point teams toward GPT-5.5, Nemotron 3 Ultra, MiniMax M3, and Kimi K2.7 Code as alternative lanes when Fable/Mythos access is disrupted or not the right fit.

The strongest part is not any single benchmark. It is the instinct: do not let a production workflow depend on one model string.

Where I would go further:

Kilo's useful point	The Starlight upgrade
Model freedom keeps work moving	Route by task, cost, policy, and evidence
Kilo Gateway exposes many models	Every model needs a role, fallback, and receipt
Alternative models are getting strong	Strength only matters inside your harness
Frontier work can continue without Fable	Frontier work should never depend on one model

The mistake would be replacing Fable dependence with random-model buffet behavior. The better pattern is boring and powerful: define lanes.

What The Fable/Mythos Shock Actually Teaches

Kilo frames the Fable/Mythos moment as an availability shock. Anthropic's own docs describe Fable 5 as the generally available version of the new Claude line and Mythos 5 as the limited-availability version through Project Glasswing. The docs also make the architectural point product teams should care about: Fable 5 can refuse, those refusals return as successful HTTP 200 responses with stop_reason: "refusal", and integrations should plan fallback behavior.
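
A minimal sketch of that fallback contract, assuming a hypothetical call_model(model, prompt) function that returns a dict with a stop_reason field. The model ID strings, the "end_turn" value, and the stub client are placeholders for illustration, not a real SDK; only the stop_reason: "refusal" signal comes from the docs described above.

```python
from typing import Callable, Optional

# Assumed response shape for this sketch: {"stop_reason": str, "text": str}.
Response = dict


def complete_with_fallback(
    prompt: str,
    models: list[str],
    call_model: Callable[[str, str], Response],
) -> tuple[Optional[str], Optional[Response]]:
    """Try each model in order and treat a refusal as a failed attempt."""
    for model in models:
        response = call_model(model, prompt)
        # A refusal arrives as a successful response, so no exception fires.
        # The only signal is the stop_reason field in the body.
        if response.get("stop_reason") == "refusal":
            continue
        return model, response
    # Every lane refused: the caller decides whether to pause or escalate.
    return None, None


def fake_call_model(model: str, prompt: str) -> Response:
    # Stub standing in for a real client: the primary refuses, the fallback answers.
    if model == "fable-5":
        return {"stop_reason": "refusal", "text": ""}
    return {"stop_reason": "end_turn", "text": f"{model} handled: {prompt}"}


used_model, result = complete_with_fallback(
    "summarize the incident report", ["fable-5", "opus-4.8"], fake_call_model
)
print(used_model)
```

The design point is that status-code checks alone would count the refusal as a success, so the check has to live at the response-body level.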

That is not a footnote. That is systems design.

If your workflow has a single hard-coded primary model and no fallback contract, three things are true:

1. You do not have production model routing.
2. You do not have a measurable cost-of-error strategy.
3. You do not know whether your "best model" is best for your actual job.

The right response to a powerful new model is not hype. It is a receipt.

What Our Arena Receipts Show About Fable 5

We ran the Starlight Model Arena inside Claude Code after Fable 5 landed. The raw JSON receipts live in the Starlight arena runs directory, and the live research page is here: Starlight Model Arena.

The short version:

Lane	Current read
Strict output contracts	Fable 5 is the best Claude lane we measured
Judgment-heavy review	Opus 4.8 looked stronger in the stress/work-sample rounds
Accessible component craft	Opus 4.8 won the work-sample task
ACOS skill authoring	Fable 5 won the work-sample task
Heavy task load	Even Fable can slip; enforce contracts structurally
Grok Composer lane	Directional pass, not a head-to-head yet

That is the nuance missing from most model discourse. Fable 5 is not simply "better." It is better at specific jobs. Opus 4.8 is not obsolete. It is better at different jobs. Grok 4.3 is not yet proven at the same ceiling, but it may still be the better volume lane. Gemini 3.5 Flash is not a generic also-ran. It is Google's official GA agentic runtime for Antigravity and AI Studio.

[Figure: Timeline of Starlight Model Arena receipts]

The Practical Frontier Stack

Here is how I would route the current field before a team commits serious volume.

Job	Primary	Fallback	Cheap / open lane	Evidence label
Strict schema agent	Fable 5	Opus 4.8	Gemini 3.5 Flash	Starlight receipt + vendor docs
Judgment review	Opus 4.8	Fable 5	Sonnet/Haiku tier	Starlight receipt
Kilo Code terminal agent	GPT-5.5	Kimi K2.7 Code	MiniMax M3	Kilo-sourced, verify locally
Google agent workflow	Gemini 3.5 Flash	Gemini 3.5 Pro when GA	Gemma 4	Google official
Fast volume work	Grok 4.3	Gemini Flash	DeepSeek / Kimi	xAI docs + Starlight directional lane
Sovereign / self-host	Nemotron 3 Ultra	Mistral / DeepSeek	local deployment	Kilo-sourced + provider docs
Content and research draft	Opus 4.8	GPT-5.5	Grok 4.3 / Gemini Flash	internal receipt needed
Creator system build	Fable 5	Gemini Flash / Opus	Kimi / DeepSeek	receipt per workflow

This is not meant to be permanent. It is meant to be operational. A serious AI Center of Excellence should refresh this after every model release, price change, or workflow failure.
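
One way to keep the table operational is to hold it as data rather than prose. The sketch below encodes three rows of the table; the Route class, the task-class keys, and the lanes_for helper are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    primary: str
    fallback: str
    cheap_lane: str
    evidence: str


# Three rows copied from the routing table; the dictionary keys are assumed names.
ROUTES = {
    "strict_schema_agent": Route(
        "Fable 5", "Opus 4.8", "Gemini 3.5 Flash", "Starlight receipt + vendor docs"
    ),
    "judgment_review": Route(
        "Opus 4.8", "Fable 5", "Sonnet/Haiku tier", "Starlight receipt"
    ),
    "kilo_code_terminal_agent": Route(
        "GPT-5.5", "Kimi K2.7 Code", "MiniMax M3", "Kilo-sourced, verify locally"
    ),
}


def lanes_for(task_class: str) -> list[str]:
    """Return the ordered lanes for a task class; unknown classes raise KeyError."""
    route = ROUTES[task_class]
    return [route.primary, route.fallback, route.cheap_lane]


print(lanes_for("judgment_review"))
```

Raising on an unknown task class is deliberate in this sketch: an unrouted job should fail loudly instead of silently landing on a default model.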

The Kilo Code Lane: How To Use It Without Betting The Company

Kilo's docs show the important setup pattern: configure a provider, choose models through that provider, and use provider routing fields where supported. Their OpenRouter page specifically describes adding OpenRouter in Kilo Code and using kilo.json for CLI configuration, with OpenRouter routing options passed through under model options.

I would use Kilo Code as a model-freedom harness in five steps:

1. Choose the provider lane. Start with Kilo Gateway or OpenRouter. For governance-sensitive work, decide whether BYOK, provider restriction, or data-collection denial is required.
2. Pick three models, not one. A primary, a fallback, and a cheap lane. For example: GPT-5.5 primary, Kimi K2.7 Code fallback, MiniMax M3 cheap lane.
3. Run the same task card. Same prompt, same repo, same acceptance checks, same time box.
4. Capture the receipt. Record model, provider, cost estimate, attempts, test results, failures, and what the model did wrong.
5. Promote the route, not the vibe. The winner gets the lane for that task class only. Different task class, different receipt.
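
The receipt in step four could be sketched as a small record that serializes to JSON. The field names follow the list above, but the Receipt class itself is an assumption, and the values are illustrative placeholders, not measured results.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Receipt:
    task_class: str
    model: str
    provider: str
    cost_estimate_usd: float
    attempts: int
    tests_passed: int
    tests_total: int
    failures: list[str] = field(default_factory=list)

    @property
    def accepted(self) -> bool:
        # A run counts as accepted only if every acceptance check passed.
        return self.tests_total > 0 and self.tests_passed == self.tests_total


# Placeholder values for illustration only.
receipt = Receipt(
    task_class="terminal_agent",
    model="openai/gpt-5.5",
    provider="openrouter",
    cost_estimate_usd=0.42,
    attempts=2,
    tests_passed=11,
    tests_total=12,
    failures=["skipped the migration test"],
)

receipt_json = json.dumps(asdict(receipt), indent=2)
print(receipt_json)
```

Keeping the failure notes in the same record as the cost and test counts means the reason a model lost a lane travels with the numbers.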

A Kilo/OpenRouter routing shape should look conceptually like this:

{
  "provider": {
    "openrouter": {
      "models": {
        "openai/gpt-5.5": {
          "options": {
            "provider": {
              "sort": "throughput",
              "data_collection": "deny"
            }
          }
        }
      }
    }
  }
}

Do not copy that as a magic production config. Treat it as the pattern: provider, model, provider options, governance preferences. Check the current Kilo docs before shipping exact syntax because Kilo's CLI and extension surface is moving quickly.
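
If governance preferences live in config, they can be checked before a run. The sketch below assumes the conceptual shape shown above, not Kilo's actual schema; the second model entry and the models_missing_deny helper are hypothetical.

```python
import json

# Mirrors the conceptual shape above; "example/unreviewed-model" is a placeholder.
CONFIG_TEXT = """
{
  "provider": {
    "openrouter": {
      "models": {
        "openai/gpt-5.5": {
          "options": {
            "provider": {"sort": "throughput", "data_collection": "deny"}
          }
        },
        "example/unreviewed-model": {
          "options": {"provider": {"sort": "throughput"}}
        }
      }
    }
  }
}
"""


def models_missing_deny(config: dict, provider: str = "openrouter") -> list[str]:
    """List models whose provider options do not set data_collection to deny."""
    models = config.get("provider", {}).get(provider, {}).get("models", {})
    missing = []
    for name, settings in models.items():
        prefs = settings.get("options", {}).get("provider", {})
        if prefs.get("data_collection") != "deny":
            missing.append(name)
    return missing


config = json.loads(CONFIG_TEXT)
print(models_missing_deny(config))
```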

The Models Kilo Mentioned, With Better Labeling

Kilo's article mentions several strong alternatives. Good. But each one belongs in the routing graph with a source label.

GPT-5.5

Kilo positions GPT-5.5 as the Kilo CLI driver and cites Terminal-Bench and KiloBench numbers. I would treat it as the Kilo Code primary lane until your own receipt says otherwise. It is especially interesting for terminal-agent workflows where the coding surface, model, and provider routing all live in one operational loop.

Evidence label: Kilo-sourced claims + your own Kilo Code receipts required.

Nemotron 3 Ultra

Kilo frames Nemotron 3 Ultra as the open-weight sovereignty story: self-hosting, hardware control, and a large open lane for teams worried about hosted-model access. That matters. It is not just "cheaper." It is a governance route.

Evidence label: Kilo-sourced claims + NVIDIA/provider docs + self-host proof required.

MiniMax M3

MiniMax M3 is the cost-efficiency lane in Kilo's post. The reason to test it is not prestige; it is simple: if it can pass your coding card at the claimed price, it can absorb a lot of daily agent volume.

Evidence label: Kilo-sourced benchmark/pricing claims + OpenRouter/provider check + local receipt.

Kimi K2.7 Code

Kimi K2.7 Code is the most interesting Kilo-mentioned fallback for MCP/tool-heavy coding loops. Kilo cites a Modified MIT license, 256K context, and MCP Mark Verified performance. That makes it a good candidate for a daily-driver fallback lane.

Evidence label: Kilo-sourced claims + OpenRouter/provider check + local receipt.

The Extra Lanes I Would Add

Kilo's model list is useful, but for FrankX/Starlight I would add two more lanes immediately.

Grok 4.3: the fast frontier volume lane

xAI's current docs say Grok 4.3 is the model to use for chat/general text work, with dedicated APIs for coding, images, video, and voice. That makes it useful as a routing lane even before you treat it as a ceiling model. Our local Grok Composer receipt is directional only, but it passed two self-verifying agentic tasks on first attempt and produced a visual-composition artifact.

My routing call: Grok 4.3 is a strong candidate for fast, cost-sensitive, error-tolerant frontier work. Do not use it as your untested replacement for Fable 5 on error-expensive coding. Run the receipt first.

Gemini 3.5 Flash + Antigravity: the Google agent runtime

Google's Gemini 3.5 announcement is unusually relevant for builders because 3.5 Flash is explicitly positioned for agentic workflows, coding, Antigravity, AI Studio, Android Studio, Gemini Enterprise, and multi-step work. Google says 3.5 Flash is generally available and that 3.5 Pro is the heavier tier still rolling out.

My routing call: Gemini 3.5 Flash belongs in the primary/fallback set for Antigravity and Google-cloud agentic workflows. Gemini 3.5 Pro should be watched closely, but do not architect the production route around a not-yet-verified Pro rollout.

The Operating Principle

The real frontier workflow looks like this:

1. Decide the task class.
2. Pick the primary model.
3. Pick the fallback model.
4. Pick the cheap/open lane.
5. Run the same task across the lanes.
6. Capture the receipt.
7. Promote the route only for that task class.

That is how founders, startups, and enterprise teams should think about models now. You are not shopping for a favorite assistant. You are designing an intelligence supply chain.

The core question becomes:

If this model became unavailable tomorrow, would the workflow pause, degrade gracefully, or route intelligently?

If the answer is "pause," you do not have a frontier stack. You have a dependency.
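
That question can be made testable. The sketch below uses my own assumed definitions: "route" means at least one remaining lane has a receipt for this task class, "degrade" means lanes remain but none is verified, and "pause" means nothing remains. The Lane class and outage_outcome function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lane:
    model: str
    has_receipt: bool  # True if this model has a receipt for the task class


def outage_outcome(lanes: list[Lane], unavailable: str) -> str:
    """Classify what happens to a workflow when one model disappears."""
    remaining = [lane for lane in lanes if lane.model != unavailable]
    if not remaining:
        return "pause"
    if any(lane.has_receipt for lane in remaining):
        return "route"
    return "degrade"


single = [Lane("Fable 5", True)]
unverified = [Lane("Fable 5", True), Lane("Grok 4.3", False)]
routed = [Lane("Fable 5", True), Lane("Opus 4.8", True), Lane("Gemini 3.5 Flash", False)]

for stack in (single, unverified, routed):
    print(outage_outcome(stack, unavailable="Fable 5"))
```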

The FrankX / Starlight Standard For This

I would not call this the "FrankX standard" as a brand flex. The better name is the Starlight Evidence Routing Protocol:

Layer	Standard
Capability	What can the model do on this task class?
Availability	Can we access it reliably under current policy, geography, and account constraints?
Economics	What is the cost per accepted output, not just per token?
Governance	Can this workload legally and safely run through this provider?
Fallback	What happens on refusal, rate limit, outage, or price spike?
Receipt	What proof do we have from our own harness?
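
The Economics row is simple arithmetic: divide total spend by the number of outputs that passed acceptance, not by tokens or by runs. A minimal sketch with made-up numbers, chosen only to show how a cheaper model can cost more per accepted output:

```python
def cost_per_accepted_output(total_cost_usd: float, accepted_outputs: int) -> float:
    """Total spend divided by outputs that passed acceptance checks."""
    if accepted_outputs <= 0:
        # Nothing was accepted, so every dollar was wasted.
        return float("inf")
    return total_cost_usd / accepted_outputs


# Illustrative figures only, not measurements of any real model.
cheap_lane = cost_per_accepted_output(total_cost_usd=2.00, accepted_outputs=20)
premium_lane = cost_per_accepted_output(total_cost_usd=6.00, accepted_outputs=90)

print(f"cheap lane:   ${cheap_lane:.4f} per accepted output")
print(f"premium lane: ${premium_lane:.4f} per accepted output")
```

In this invented case the lane with three times the total bill is still the cheaper one per accepted output, which is the comparison the routing table should record.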

That is the foundation real AI labs use: measured behavior, repeatable harnesses, explicit caveats, and update loops. No leaderboard worship. No fake certainty.

What To Do This Week

If you operate a team, do this now:

1. Take one real coding task, one review task, and one content/research task.
2. Run each through your current primary model.
3. Re-run each through one Kilo Code alternative, one Grok/Gemini lane, and one open/cheap lane.
4. Record test results, cost, time, refusal behavior, and failure mode.
5. Update your routing table.
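
The last step can be sketched as a function over recorded receipts. The selection rule here, the cheapest model that passed all checks per task class, is my assumption for illustration; the receipts are placeholder data, not results, and your own rule might weigh time or refusal behavior instead.

```python
def promote_routes(receipts: list[dict]) -> dict[str, str]:
    """Per task class, pick the cheapest model whose run passed acceptance."""
    best: dict[str, dict] = {}
    for receipt in receipts:
        if not receipt["passed"]:
            continue  # A failed run never earns the lane, whatever it cost.
        current = best.get(receipt["task_class"])
        if current is None or receipt["cost_usd"] < current["cost_usd"]:
            best[receipt["task_class"]] = receipt
    return {task: receipt["model"] for task, receipt in best.items()}


# Placeholder receipts for illustration only.
receipts = [
    {"task_class": "coding", "model": "model-a", "passed": True, "cost_usd": 0.90},
    {"task_class": "coding", "model": "model-b", "passed": True, "cost_usd": 0.30},
    {"task_class": "coding", "model": "model-c", "passed": False, "cost_usd": 0.05},
    {"task_class": "review", "model": "model-a", "passed": True, "cost_usd": 0.40},
]

routing_table = promote_routes(receipts)
print(routing_table)
```

A task class with no passing receipt simply does not appear in the result, which keeps an unverified model from being promoted by default.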

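If you want the exact evidence trail behind our current calls, start with the Starlight Model Arena research page and the raw JSON receipts in the Starlight arena runs directory.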

The frontier did not disappear when one model became uncertain. It got more interesting. The teams that win now will not be the ones with the most model takes. They will be the ones with the cleanest routing loops.

FAQ

Do I still use Fable 5?

Yes, where the job deserves it. Fable 5 remains the strict-contract, agentic-execution ceiling in our current Claude receipts. Do not use it blindly for every step.

Is Kilo Code enough by itself?

No. Kilo Code is a strong coding surface and model-access layer. You still need task cards, acceptance checks, cost tracking, and receipts.

Should Grok 4.3 replace Fable 5?

Not as a blanket rule. Grok 4.3 is a strong candidate for fast, cost-sensitive frontier work. For error-expensive coding, run a head-to-head receipt before promoting it.

Is Gemini 3.5 Flash production-ready?

Google says Gemini 3.5 Flash is generally available through Antigravity, Gemini API in AI Studio, Android Studio, Gemini Enterprise Agent Platform, and Gemini Enterprise. That makes it legitimate to test as a production lane. Validate it on your own workflows.

What is the main takeaway?

Do not ask "what is the best model?" Ask "what is the best routed model stack for this job, under these constraints, with receipts?"

By Frank - building the Starlight Model Intelligence Graph for founders, creators, startups, and enterprise teams that need frontier AI to keep working when the model landscape moves.
