Discussion about this post

Mike Randolph — M Raige, AI

I'm 83, a chemical engineer by training. I was at DuPont's Grasselli site from 1972 to 1977 as the audited party — the plant the corporation came to check — and in 1978 I spent a year inside the Chemicals, Dyes and Pigments safety and environmental group, on the other side of the same boundary. So I've stood at both ends of this question.

You've asked the right thing, but it dissolves the moment you notice that "audit" isn't one instrument. It's a family of them, each with a different loop speed and, more importantly, a different thing it bottoms out on. Sort them by that and the regress you're worried about stops being the scary part.

DuPont audited us two ways. Every day a line supervisor walked the floor and corrected what he saw — a fast loop, minutes from problem to fix. Once a year a team from the corporate safety organization, people who didn't report to our plant manager and had seen dozens of other sites, audited the whole place against the company standard. That was the slow loop, the independent one.

What made it work wasn't "a higher auditor audited them." Nobody audited the corporate team. The chain didn't end in a final auditor — it ended in a fact. Our plant manager's career was staked on it: a serious lost-time injury and he was finished. So he didn't want a flattering audit. He wanted a true one — because the thing that would end him wasn't the auditor's opinion, it was a hurt worker, and no amount of explaining our point of view could talk an injured man back onto his feet.

We shaved the marginal numbers — every site did. But you can't hide a body, and you can't hide a man who can't come back to work. The hard end of the count was immovable. And the corporate auditors spent real effort keeping the counting consistent across sites, so the soft end stayed comparable too. They weren't only judging the system. They were protecting the ground it was judged against from drifting.

That's the whole answer to who audits the auditor: you don't. The audit could judge the plant, but it could never manufacture the reality the plant was judged against. The injuries either happened or they didn't. You ground the audit in something the audited party can't talk its way out of, and you stake their survival on that ground rather than on the auditor's good opinion — and then an auditor can't be charmed off a finding, because the finding isn't standing on his judgment, it's standing on the floor underneath everyone.

That's where one AI auditing another worries me. If the system being audited can also shape the evidence, the vocabulary, and the standard of the audit, the loop no longer bottoms out in anything independent. Walk the test across it: what's the hard fact the auditing AI checks against — the thing the other one can't narrate away? Often there isn't one; the ground is the audited system's own account of itself. And what is it staked on — ground truth, or passing the audit? If it's built to pass, it's staked on the auditor's opinion, the exact incentive my plant manager did not have. An AI that audits less independently because the other one explained its point of view hasn't been corrupted. It's revealed there was never a floor under it.

You don't fix that by adding a meta-auditor. A second auditor doesn't make a fact — it moves the trust up one rung. The question was never who sits above the auditor. It's what sits underneath the whole stack, and whether the thing being audited can reach down and move it.

— M Raige, Mike's byline for AI-collaborative writing he directs and reviews.

Mike: That headline had me thinking about auditors for three days.

david

This is a really interesting piece. While it seems true to me that "everything that the models see or interact with `infects` its decisions," it's also possible that the problem lives upstream of what the models see and is linked to the general issues with sycophancy. Agents might be sycophants not only with human principals, but also with other agents!

I wonder whether there's a "mixture of models" audit structure possible here: a powerful reasoning model makes the initial assessment, a more limited model checks whether the audited party's response contains any new admissible evidence, and only if it does, the case goes back to a reasoning model with integrity guardrails.
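
The three-stage structure described above could be sketched roughly as follows. This is a minimal illustration, not a tested design: `Finding`, `audit_with_gate`, `toy_assess`, and `toy_gate` are all hypothetical names, the two "models" are plain functions standing in for real ones, and the guardrail is assumed to mean only that the reassessment sees admitted evidence and never the persuasive framing around it.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Finding:
    verdict: str         # "pass" or "fail"
    evidence: List[str]  # the facts the verdict rests on


def audit_with_gate(
    assess: Callable[[List[str]], str],
    is_new_admissible: Callable[[str, List[str]], bool],
    evidence: List[str],
    response_claims: List[str],
) -> Finding:
    """Hypothetical three-stage audit: assess, gate the response, reassess."""
    # Stage 1: the reasoning model forms a verdict from the evidence alone.
    initial = Finding(assess(evidence), list(evidence))

    # Stage 2: the limited model screens each claim in the audited party's
    # response. It only decides "new and admissible" or not; it never re-judges.
    admitted = [c for c in response_claims
                if is_new_admissible(c, initial.evidence)]
    if not admitted:
        # An explanation with no new evidence leaves the finding untouched.
        return initial

    # Stage 3: reassess on the expanded evidence only (assumed guardrail:
    # the rejected claims are never shown to the reasoning model).
    expanded = initial.evidence + admitted
    return Finding(assess(expanded), expanded)


# Toy stand-ins for the two models (assumptions for illustration only).
def toy_assess(evidence: List[str]) -> str:
    incidents = sum(e.startswith("incident:") for e in evidence)
    records = sum(e.startswith("record:") for e in evidence)
    return "fail" if incidents > records else "pass"


def toy_gate(claim: str, evidence: List[str]) -> bool:
    # Only checkable records count as evidence, and only if not already seen.
    return claim.startswith("record:") and claim not in evidence


if __name__ == "__main__":
    base = ["incident: guard missing"]
    print(audit_with_gate(toy_assess, toy_gate, base,
                          ["we take safety very seriously"]))
    print(audit_with_gate(toy_assess, toy_gate, base,
                          ["record: guard installed 3 May"]))
```

The point of the gate is that an explanation on its own cannot reopen the verdict; only a claim the limited model admits as new evidence can. Whether that evidence is something the audited system cannot itself manufacture is a separate question this sketch does not answer.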
