Why I Now Run Every Answer Through Three AI Models

What I built — without knowing the term for it — is called an agent council. It's an established multi-agent AI pattern: multiple models receive the same query in parallel, each producing an independent answer, and a synthesis layer combines their outputs into a final response. The term comes from research on "council mode" for mitigating hallucination and bias in LLMs.

The idea: different models, trained on different data with different architectures, develop complementary reasoning strategies. When they agree, confidence is high. When they diverge, that disagreement is itself useful signal — it flags exactly where uncertainty lives.

My Council Setup

Here's the specific configuration:

Claude (primary) — reads my prompt, drafts an answer and orchestrates the council
Gemini 2.5 Flash — independently receives the same prompt and drafts its own answer in parallel
GPT-4o-mini — same prompt, same parallel processing

All three run simultaneously, not sequentially. There's a 15-second timeout per model — if one stalls, the other two proceed without it. Claude then reads all three responses and synthesizes a final answer, flagging any significant disagreements between the models.
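The fan-out and timeout behaviour described above can be sketched roughly as follows. This is a minimal illustration, not my actual orchestration code: the names `run_council`, `ask_with_timeout`, and `build_synthesis_prompt` are hypothetical, and each council member is assumed to be an async function wrapping whatever API client you use.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Optional

TIMEOUT_SECONDS = 15  # per-model timeout from the setup described above

# Assumption: a council member is any async function mapping a prompt to an answer.
Member = Callable[[str], Awaitable[str]]


async def ask_with_timeout(member: Member, prompt: str, timeout: float) -> Optional[str]:
    """Return the member's answer, or None if it stalls or raises."""
    try:
        return await asyncio.wait_for(member(prompt), timeout=timeout)
    except Exception:
        # Timeouts and API errors are treated alike: the council proceeds without this member.
        return None


async def run_council(prompt: str, members: Dict[str, Member],
                      timeout: float = TIMEOUT_SECONDS) -> Dict[str, str]:
    """Query every member concurrently and keep only the answers that arrived."""
    names = list(members)
    results = await asyncio.gather(
        *(ask_with_timeout(members[name], prompt, timeout) for name in names)
    )
    return {name: text for name, text in zip(names, results) if text is not None}


def build_synthesis_prompt(prompt: str, answers: Dict[str, str]) -> str:
    """Assemble the text handed to the primary model for synthesis (hypothetical wording)."""
    sections = [f"--- {name} ---\n{text}" for name, text in answers.items()]
    return (
        f"Original question:\n{prompt}\n\n"
        "Independent answers:\n" + "\n\n".join(sections) + "\n\n"
        "Synthesize one final answer and flag any significant disagreements."
    )
```

Because the members run under `asyncio.gather`, total wall-clock time is bounded by the slowest member or the timeout, whichever comes first, rather than by the sum of all three calls.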

Why Gemini 2.5 Flash?

Gemini 2.5 Flash is a "thinking model": it reasons before answering, which makes it particularly good at catching logical errors and surfacing considerations the primary model might have missed. It's also on Gemini's free tier for the volume I'm running.

There was a technical hiccup during setup: Gemini 2.5 Flash uses thinking tokens internally, which consume from the output token budget before any visible text is generated. Setting maxOutputTokens: 200 resulted in truncated responses — the model had used most of its budget on internal reasoning. The fix was setting maxOutputTokens: 8192, which gives the model room to think and produce a complete visible response.
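To make the budget problem concrete, here is a small sketch. The `maxOutputTokens` field is the one mentioned above; the request-body shape assumes the REST-style `generateContent` format, and the helper names and the thinking-token count in the usage note are illustrative assumptions, not measured values.

```python
def build_gemini_request(prompt: str, max_output_tokens: int = 8192) -> dict:
    """Build a request body with a generous output budget (assumed REST-style shape)."""
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {"maxOutputTokens": max_output_tokens},
    }


def visible_token_budget(max_output_tokens: int, thinking_tokens: int) -> int:
    """Tokens left for visible text once internal reasoning has consumed its share."""
    return max(0, max_output_tokens - thinking_tokens)
```

If the model hypothetically spent 180 tokens thinking, `visible_token_budget(200, 180)` leaves only 20 tokens for the visible answer, while `visible_token_budget(8192, 180)` leaves 8012.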

Why GPT-4o-mini?

Cost. GPT-4o-mini costs approximately $0.00015 per 1K input tokens and $0.00060 per 1K output tokens. For a typical verification check (roughly 500 input tokens, 200 output tokens) that's about $0.0002 per call: 0.5 × $0.00015 + 0.2 × $0.00060 = $0.000195. Far less than a tenth of a cent.
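The per-call arithmetic is easy to re-run for your own token counts. A minimal sketch, using the approximate rates quoted above (the function names are my own, and the rates may change, so treat them as assumptions):

```python
# Approximate GPT-4o-mini rates quoted above, in dollars per 1K tokens.
INPUT_RATE_PER_1K = 0.00015
OUTPUT_RATE_PER_1K = 0.00060


def cost_per_call(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one call at the rates above."""
    return (input_tokens / 1000) * INPUT_RATE_PER_1K + (output_tokens / 1000) * OUTPUT_RATE_PER_1K


def calls_per_dollar(input_tokens: int, output_tokens: int) -> int:
    """How many calls of this size one dollar buys, rounded down."""
    return int(1 / cost_per_call(input_tokens, output_tokens))


if __name__ == "__main__":
    # Typical verification check: roughly 500 input tokens, 200 output tokens.
    print(f"${cost_per_call(500, 200):.6f} per call")
    print(f"{calls_per_dollar(500, 200)} calls per dollar")
```

At these rates a single dollar covers several thousand verification checks, which is why the third opinion barely registers on the bill.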

At that price, running it on every response is essentially free. The latency impact is minimal because it runs in parallel with Gemini. The marginal cost of adding a third opinion is negligible.

Total cost per response

Gemini 2.5 Flash: ~$0 (free tier). GPT-4o-mini: ~$0.0002 per call at the rates above. Claude primary: existing subscription cost. For the volume of prompts I send in a typical week, the verification layer adds less than a dollar to my monthly AI spend.

What It Actually Changes

The most visible change is catching cases where Claude gives a confident-sounding answer that Gemini or GPT-4o-mini flags as incomplete or subtly wrong. These aren't frequent, but they happen — especially for technical questions where precise details matter.

The less obvious change is that I trust the answers more. Knowing that two other models had a chance to object, and didn't, increases my confidence in the output. That's not a statistical guarantee, but it's not nothing either.

Should You Do This?

Probably not for casual queries. If you're asking "what should I have for dinner," a triple-model verification is overkill. But for anything where you're about to make a decision based on the answer — a technical implementation, a business judgment, a factual claim you're going to repeat to someone else — having two other models independently check the work is cheap insurance.

The council adds maybe 3-5 seconds of latency to responses that previously took 2-3 seconds. For the use cases where I've turned it on, that trade-off is easy to accept.
