5 Comments
Mike Randolph

“LLMs ability to judge their own work remains remarkably bad” — that’s the load-bearing sentence in the whole post. Three years working the same problem from a different angle and the same finding keeps surfacing.

I’m starting to think this isn’t just an LLM problem. Any system judging from inside its own work has a built-in blind spot; LLMs just make the failure unusually easy to see.

Victualis

I think for a PhD you now need to do a lot more: stuff that would have taken years is now a few prompts away. In 2021 this was a PhD; now it's one small part of what's expected. But being able, outside the academy, to investigate any domain in non-trivial detail based on available data is exciting. Thanks for sharing your inspiring experience!

Rohit Krishnan

You're right. But then it's an essay, sir, about what one can do now, with a cheeky subtitle.

Goutham Kurra

This is cool and inspiring. I'm really excited about generally curious people using AI to play armchair vibe scientists, and maybe there will be a real discovery or two amid all the fun and exploration. And what's the worst that can happen? arXiv gets filled up with slop-science articles that are then recursively used to train the next models? :-)

Rapa-Nui

Come on, man. That second graph (the dot plot) would give Nassim Taleb a meltdown. You can't draw strong conclusions when a trend is driven by a single outlier, even when you have a reasonable hypothesis (which you do).