I'd been thinking about a pre-intake / pre-output verification layer for a while. The concept: before Brandy replies to me, have one or more other AI models independently review the answer for accuracy, gaps, or errors. Then synthesize everything into a final response.

The design question was which models to use, how to run them, and whether the latency cost was worth it.

The Architecture

Here's what we landed on:

  • Claude (primary) — reads my prompt, drafts an answer
  • Gemini 2.5 Flash — independently receives the same prompt and drafts its own answer in parallel
  • GPT-4o-mini — same prompt, same parallel processing

All three run simultaneously, not sequentially. There's a 15-second timeout per model — if one stalls, the other two proceed without it. Claude then reads all three responses and synthesizes a final answer, flagging any significant disagreements between the models.

Why Gemini 2.5 Flash?

Gemini 2.5 Flash is a "thinking model" — it reasons before answering, which makes it particularly good at catching logical errors and surfacing considerations the primary model might have missed. It's also on Gemini's free tier for the volume we're running.

There was a technical hiccup during setup: Gemini 2.5 Flash uses thinking tokens internally, which consume from the output token budget before any visible text is generated. Setting maxOutputTokens: 200 resulted in truncated responses — the model had used most of its budget on internal reasoning. The fix was setting maxOutputTokens: 8192, which gives the model room to think and produce a complete visible response.

Why GPT-4o-mini?

Cost. GPT-4o-mini costs approximately $0.00015 per 1K input tokens and $0.00060 per 1K output tokens. For a typical verification check — roughly 500 input tokens, 200 output tokens — that's about $0.00027 per call. Less than a tenth of a cent.

At that price, running it on every response is essentially free. The latency impact is minimal because it runs in parallel with Gemini. The marginal cost of adding a third opinion is negligible.

Total cost per response
Gemini 2.5 Flash: ~$0 (free tier). GPT-4o-mini: ~$0.0003. Claude primary: existing subscription cost. For the volume of prompts I send in a typical week, the verification layer adds less than a dollar to my monthly AI spend.

What It Actually Changes

The most visible change is catching cases where Claude gives a confident-sounding answer that Gemini or GPT-4o-mini flags as incomplete or subtly wrong. These aren't frequent, but they happen — especially for technical questions where precise details matter.

The less obvious change is that I trust the answers more. Knowing that two other models had a chance to object, and didn't, increases my confidence in the output. That's not a statistical guarantee, but it's not nothing either.

Should You Do This?

Probably not for casual queries. If you're asking "what should I have for dinner," a triple-model verification is overkill. But for anything where you're about to make a decision based on the answer — a technical implementation, a business judgment, a factual claim you're going to repeat to someone else — having two other models independently check the work is cheap insurance.

The latency adds maybe 3-5 seconds to responses that previously took 2-3 seconds. For the use cases where I've turned it on, that trade-off is easy to accept.