The more I read such writing, the more my respect grows for those few rare authors who deeply understand a complex subject and write on it with an earnest effort to make the language as simple and accessible as possible.
Some models exhibit a sine-wave-like pattern of recency bias as a dialogue continues. Picture it with the number of inquiries on the X axis and recency bias on the Y axis. At first, recency bias rises steeply, since responses are naturally grounded in the original queries. Then the model draws deeply on sources and instructions, and the recency bias levels off. As the dialogue continues, however, outputs increasingly reflect the most recent responses and forget earlier interactions, a second steep rise in recency bias, until continuing a given thread becomes tedious.
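A rough sketch of that curve, purely as an illustration of the shape Paul describes; the breakpoints and slopes below are made-up numbers, not measurements:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical recency-bias curve over a long dialogue: steep initial rise,
# a plateau while the model leans on sources and instructions, then a
# second steep rise as the most recent turns start to dominate.
inquiries = np.arange(1, 41, dtype=float)          # X axis: number of inquiries
recency_bias = np.piecewise(
    inquiries,
    [inquiries <= 10, (inquiries > 10) & (inquiries <= 25), inquiries > 25],
    [lambda x: 0.05 * x,                           # steep rise
     lambda x: 0.50 + 0.005 * (x - 10),            # levelling off
     lambda x: 0.575 + 0.03 * (x - 25)],           # second steep rise
)

plt.plot(inquiries, recency_bias)
plt.xlabel("Number of inquiries")
plt.ylabel("Recency bias (illustrative)")
plt.show()
```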
That's good insight, Paul! I wonder if recency bias is problematic, though. Considering that these are socio-technical systems and humans may exhibit recency bias naturally, would it make sense for us then to consider recency bias in LLMs 'normal' and 'expected', and not a cause for concern?
We might call it "Recenility" in LLMs. 😆
It's a detectable concern, and the quality of outputs seems to degrade, more so than with learned humans. Advantage human?
Fair point and good observation! And "Recenility" - I like it 😂
Indeed, LLMs have a big problem with predictability and accuracy.
The next logical step is integration with honest models, beyond token prediction.
An LLM would need to diligently work through the steps, run the tools needed to get the work done, and evaluate whether it made a mistake, which often requires still more tools.
This will not be cheap at all.
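For what it's worth, the loop described above might look roughly like the sketch below. Everything in it (the `llm.complete` and `llm.choose_tool` calls, the tool interface) is a hypothetical stand-in, not any particular library's API, and the self-check step is where much of the extra cost comes from:

```python
# Hypothetical agent loop: plan the steps, run the tools each step needs,
# then self-check the result, which usually means extra model and tool calls.
def run_task(llm, tools, task, max_retries=2):
    plan = llm.complete(f"Break this task into steps:\n{task}")   # assumed LLM interface
    result = None
    for step in plan.splitlines():
        for _ in range(max_retries + 1):
            tool_name, tool_args = llm.choose_tool(step, tools)   # assumed helper
            result = tools[tool_name](**tool_args)                # do the actual work
            # The self-check is itself another model call (and often more tool calls),
            # which is a big part of why this will not be cheap.
            verdict = llm.complete(f"Does this output satisfy the step?\n{step}\n{result}")
            if verdict.strip().lower().startswith("yes"):
                break
        else:
            raise RuntimeError(f"Step kept failing its self-check: {step}")
    return result
```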
nailed it. most teams are still asking “how can we plug in AI?” when they should be asking “what fails when it does?”
at extensity we learned the hard way: you don’t deploy agents, you rewire workflows. you don’t predict outcomes, you pressure test systems.
LLMs don’t behave like software. they behave like overconfident interns. powerful, useful—if you wrap them in structure.
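One concrete way to "wrap them in structure" is to validate every response against a schema and retry on failure. A minimal sketch, assuming a generic `call_llm` function that returns a string and using pydantic for the schema check:

```python
# Sketch of a structural wrapper: never trust raw model output directly;
# parse it against a schema and retry a bounded number of times.
import json
from pydantic import BaseModel, ValidationError

class Invoice(BaseModel):            # the structure we insist on getting back
    vendor: str
    total: float

def extract_invoice(call_llm, text, max_attempts=3):
    prompt = f"Return JSON with fields 'vendor' and 'total' for:\n{text}"
    for _ in range(max_attempts):
        raw = call_llm(prompt)                   # assumed: takes a prompt, returns a string
        try:
            return Invoice(**json.loads(raw))    # structural check, not a vibe check
        except (json.JSONDecodeError, TypeError, ValidationError):
            prompt += "\nYour last answer was not valid JSON with those fields. Try again."
    raise ValueError("Model never produced valid structured output")
```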
As always, enjoy your writing. Refreshing to read thoughtful writing on LLMs.
Excellent piece. The lack of predictability of LLMs makes me think that AI won't be taking all the jobs any time soon. Enterprise adoption of AI will also be much slower than many venture capitalists and Silicon Valley entrepreneurs expect.
>This creates an asymmetric trust problem, especially since you can’t verify everything. What it needs is a new way to think about “how should we accomplish [X] goal” rather than “how can we automate [X] process”.
This is a great point, and it lines up with Jack Clark and Dean Ball's points about how limited liability may require a human in the loop, if only for the purpose of accountability. The whole point of many processes is to act as a liability shield. Their purpose is not to maximally accomplish some goal, but rather to eliminate the very edge cases that LLMs are prone to produce.
I do think you are right that businesses need to look at LLMs as goal-directed, but I have to imagine all LLMs are going to need a system prompt of carefully crafted legal responsibilities. Those system prompts are, in a way, the new terms of service.
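To make that concrete, such a "terms of service" style system prompt, paired with a human sign-off gate, might look roughly like this. Every name and clause here (Acme Corp, `require_human_approval`, the prompt wording) is a hypothetical illustration:

```python
# Hypothetical "terms of service" system prompt plus a human sign-off gate,
# so a person, not the model, remains accountable for what actually ships.
SYSTEM_PROMPT = """You are acting on behalf of Acme Corp.
- Do not give legal, medical, or financial advice.
- Flag any request that could create liability and route it for review.
- Never take an irreversible action without explicit human approval."""

def answer_with_accountability(call_llm, user_request, require_human_approval):
    draft = call_llm(system=SYSTEM_PROMPT, prompt=user_request)   # assumed interface
    # The human reviewer carries the accountability described above.
    if require_human_approval(draft):
        return draft
    return "Response withheld pending human review."
```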
Thought-provoking piece as always.