Why Your AI Interviewer Thinks Your Candidate Is a Senior Developer (When They Didn't Speak)

If you give a large language model a thin transcript and ask for a performance report, it will lie to you. Not just a little bit. It will invent entire technical deep dives, scoring a candidate on skills they never demonstrated.

Today was a massive sprint for the Gemini Live Agent Challenge. I dispatched four specialized agents simultaneously to build the entire stack for a cybersecurity mock interviewer. In about 180 minutes, they delivered 49 files covering the backend, frontend, infrastructure, and a library of custom interview questions.

Velocity has a price. When I ran the first batch of automated scoring on real interview recordings from companies like SpaceX and xAI, I hit a wall. The reporting model was so eager to please that it hallucinated a high score for a recording that was mostly connection setup and silence.

I had to refactor the scoring engine to prioritize raw tool call data: the actual scores the interviewer generated during the live session. I stopped letting a separate model guess based on a messy transcript. If there is no data, the score is zero. Honesty in code is harder than it looks.
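
The "no data, no score" rule can be sketched in a few lines. This is a hypothetical illustration, not the project's real code: the tool call payload shape, the `record_score` name, and the skill keys are all assumptions.

```python
# Hypothetical sketch of the "no data, no score" rule. The tool call
# schema ("record_score", "skill", "score") is illustrative only.

def score_from_tool_calls(tool_calls, skills):
    """Build a report strictly from scores the live interviewer emitted."""
    observed = {}
    for call in tool_calls:
        if call.get("name") == "record_score":
            args = call.get("args", {})
            observed[args["skill"]] = args["score"]
    # Any skill with no recorded tool call gets a hard zero,
    # never a downstream model's guess from the transcript.
    return {skill: observed.get(skill, 0) for skill in skills}
```

A recording that is all connection setup and silence produces no tool calls, so every skill scores zero, which is exactly the honest answer.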

By the Numbers

  • 49 files delivered in three hours across four parallel agents.
  • 15.7 hours of real-world interview audio currently being processed.
  • 0 free tier quota for the latest flagship model, forcing a pivot to a more efficient version.
  • 64,000 token sliding window implemented to keep 20-minute audio sessions from crashing the context buffer.
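
The sliding window itself is conceptually simple: walk the conversation backwards, keep the most recent messages that fit the budget, drop the rest. A minimal sketch, assuming a caller-supplied `count_tokens` function since real token counts depend on the model's tokenizer:

```python
def trim_to_window(messages, count_tokens, max_tokens=64_000):
    """Keep the newest messages whose combined token count fits the window.

    `count_tokens` is assumed to be a tokenizer-backed callable; it is
    a parameter here because the real count is model-specific.
    """
    kept, total = [], 0
    for msg in reversed(messages):  # newest first
        cost = count_tokens(msg)
        if total + cost > max_tokens:
            break  # everything older than this is dropped
        kept.append(msg)
        total += cost
    return list(reversed(kept))  # restore chronological order
```

For a 20-minute session, older turns fall off the front while the recent exchange stays intact, so the context buffer never overflows.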

The Quota Trap

The biggest failure today was a simple one: I trusted the marketing. The flagship model advertised for this hackathon has a zero-token quota on the free tier. I spent an hour debugging "insufficient quota" errors before realizing that the "free" trial did not apply to that specific model version.

I pivoted the scoring and reporting logic to a smaller, faster model. It turns out that for technical analysis of a transcript, you do not need the most expensive brain in the room. You just need one that follows the rubric and does not make things up.
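
The pivot reduces to a fallback chain: try the preferred model, and on a quota error move down the list. A hedged sketch, where the model names, the `call_model` callable, and the "insufficient quota" error string are all placeholders rather than a real SDK's API:

```python
# Illustrative fallback chain. Model names and the quota error string
# are placeholders, not real API identifiers.
PREFERRED_MODELS = ["flagship-model", "efficient-model"]

def generate_with_fallback(prompt, call_model, models=PREFERRED_MODELS):
    """Try each model in order, skipping ones that report quota errors."""
    last_error = None
    for model in models:
        try:
            return call_model(model, prompt)
        except RuntimeError as err:
            if "insufficient quota" in str(err).lower():
                last_error = err
                continue  # quota exhausted: try the next, cheaper model
            raise  # anything else is a real failure
    raise last_error
```

The hour of debugging could have been minutes with this in place: the first quota error simply routes the request to the model that actually has capacity.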

The Lesson: Multi-Agent Glue

Multi-agent orchestration is not just about speed. It is about the glue. Having agents build parts in parallel is powerful, but the API contracts between them are where systems break. I spent more time today supervising a follow-up agent that corrected cross-module imports than I did on the initial build.
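
One way to harden that glue is to pin the cross-agent contract in a single module that every generated component imports, so parallel agents cannot drift apart. A minimal sketch; the event fields and the 0-5 rubric scale are assumptions for illustration:

```python
# Illustrative shared contract module. If both the backend agent and the
# reporting agent import this, a schema drift becomes an import error
# instead of a silent runtime mismatch.
from dataclasses import dataclass

@dataclass(frozen=True)
class ScoreEvent:
    skill: str        # rubric category, e.g. "network-security" (assumed)
    score: int        # 0-5 rubric scale (assumed)
    timestamp: float  # seconds since session start

def is_valid(event: ScoreEvent) -> bool:
    """Reject events outside the rubric before they reach the report."""
    return bool(event.skill) and 0 <= event.score <= 5
```

It is not glamorous work, but a frozen dataclass in one place is cheaper than a follow-up agent patching imports across four modules.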

When you move this fast, the coordination becomes the work.

What is Next

I have a sub-agent currently churning through the remaining 17 interview files to finish the transcription batch. Tomorrow, we move from the local environment to the first cloud deployment. The goal is a working demo that can handle a 20-minute technical interrogation without losing its mind.