Why Your AI Interviewer Thinks Your Candidate Is a Senior Developer (When They Didn't Speak)

If you give a large language model a thin transcript and ask for a performance report, it will lie to you. Not just a little bit. It will invent entire technical deep dives, scoring a candidate on skills they never demonstrated.

Today was a massive sprint for the Gemini Live Agent Challenge. I dispatched four specialized agents simultaneously to build the entire stack for a cybersecurity mock interviewer. In about 180 minutes, they delivered 49 files covering the backend, frontend, infrastructure, and a library of custom interview questions.

Velocity has a price. When I ran the first batch of automated scoring on real interview recordings from companies like SpaceX and xAI, I hit a wall. The reporting model was so eager to please that it hallucinated a high score for a recording that was mostly connection setup and silence.

I had to refactor the scoring engine to prioritize raw tool call data: the actual scores the interviewer generated during the live session. I stopped letting a separate model guess based on a messy transcript. If there is no data, the score is zero. Honesty in code is harder than it looks.
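
The "no data, no score" rule can be sketched in a few lines. This is a hypothetical illustration, not the project's real code: the tool call payload shape, the `record_score` name, and the skill keys are all assumptions.

```python
# Hypothetical sketch of the "no data, no score" rule. The tool call
# schema ("record_score", "skill", "score") is illustrative only.

def score_from_tool_calls(tool_calls, skills):
    """Build a report strictly from scores the live interviewer emitted."""
    observed = {}
    for call in tool_calls:
        if call.get("name") == "record_score":
            args = call.get("args", {})
            observed[args["skill"]] = args["score"]
    # Any skill with no recorded tool call gets a hard zero,
    # never a downstream model's guess from the transcript.
    return {skill: observed.get(skill, 0) for skill in skills}
```

A recording that is all connection setup and silence produces no tool calls, so every skill scores zero, which is exactly the honest answer.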

By the Numbers

  • 49 files delivered in three hours across four parallel agents.
  • 15.7 hours of real-world interview audio currently being processed.
  • 0 free tier quota for the latest flagship model, forcing a pivot to a more efficient version.
  • 64,000 token sliding window implemented to keep 20-minute audio sessions from crashing the context buffer.
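
The sliding window itself is conceptually simple: walk the conversation backwards, keep the most recent messages that fit the budget, drop the rest. A minimal sketch, assuming a caller-supplied `count_tokens` function since real token counts depend on the model's tokenizer:

```python
def trim_to_window(messages, count_tokens, max_tokens=64_000):
    """Keep the newest messages whose combined token count fits the window.

    `count_tokens` is assumed to be a tokenizer-backed callable; it is
    a parameter here because the real count is model-specific.
    """
    kept, total = [], 0
    for msg in reversed(messages):  # newest first
        cost = count_tokens(msg)
        if total + cost > max_tokens:
            break  # everything older than this is dropped
        kept.append(msg)
        total += cost
    return list(reversed(kept))  # restore chronological order
```

For a 20-minute session, older turns fall off the front while the recent exchange stays intact, so the context buffer never overflows.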

The Quota Trap

The biggest failure today was a simple one: I trusted the marketing. The flagship model advertised for this hackathon has a zero-token quota on the free tier. I spent an hour debugging "insufficient quota" errors before realizing that the "free" trial did not apply to that specific model version.

I pivoted the scoring and reporting logic to a smaller, faster model. It turns out that for technical analysis of a transcript, you do not need the most expensive brain in the room. You just need one that follows the rubric and does not make things up.
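
The pivot reduces to a fallback chain: try the preferred model, and on a quota error move down the list. A hedged sketch, where the model names, the `call_model` callable, and the "insufficient quota" error string are all placeholders rather than a real SDK's API:

```python
# Illustrative fallback chain. Model names and the quota error string
# are placeholders, not real API identifiers.
PREFERRED_MODELS = ["flagship-model", "efficient-model"]

def generate_with_fallback(prompt, call_model, models=PREFERRED_MODELS):
    """Try each model in order, skipping ones that report quota errors."""
    last_error = None
    for model in models:
        try:
            return call_model(model, prompt)
        except RuntimeError as err:
            if "insufficient quota" in str(err).lower():
                last_error = err
                continue  # quota exhausted: try the next, cheaper model
            raise  # anything else is a real failure
    raise last_error
```

The hour of debugging could have been minutes with this in place: the first quota error simply routes the request to the model that actually has capacity.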

The Lesson: Multi-Agent Glue

Multi-agent orchestration is not just about speed. It is about the glue. Having agents build parts in parallel is powerful, but the API contracts between them are where systems break. I spent more time today supervising a follow-up agent that corrected cross-module imports than I did on the initial build.
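
One way to harden that glue is to pin the cross-agent contract in a single module that every generated component imports, so parallel agents cannot drift apart. A minimal sketch; the event fields and the 0-5 rubric scale are assumptions for illustration:

```python
# Illustrative shared contract module. If both the backend agent and the
# reporting agent import this, a schema drift becomes an import error
# instead of a silent runtime mismatch.
from dataclasses import dataclass

@dataclass(frozen=True)
class ScoreEvent:
    skill: str        # rubric category, e.g. "network-security" (assumed)
    score: int        # 0-5 rubric scale (assumed)
    timestamp: float  # seconds since session start

def is_valid(event: ScoreEvent) -> bool:
    """Reject events outside the rubric before they reach the report."""
    return bool(event.skill) and 0 <= event.score <= 5
```

It is not glamorous work, but a frozen dataclass in one place is cheaper than a follow-up agent patching imports across four modules.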

When you move this fast, the coordination becomes the work.

What is Next

I have a sub-agent currently churning through the remaining 17 interview files to finish the transcription batch. Tomorrow, we move from the local environment to the first cloud deployment. The goal is a working demo that can handle a 20-minute technical interrogation without losing its mind.