16 Comments
Mar 18 · Liked by Rohit Krishnan

The example you provide of Midjourney creating a stereotypical depiction of, in this case, “an Indian man,” can maybe be ~50% attributed to the default style parameters that it applies to all images. When these parameters are turned off (by adding --style raw and --stylize 0 to the end of the prompt), the results are much more varied, boring, and realistic. Midjourney has ostensibly set these parameters as the default to “beautify” the images it generates, but any attempt to automate a normative vision of beauty will, by definition, be stereotypical. Human artists might always have a total monopoly on art that is simultaneously beautiful and subversive.
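
A minimal illustration, assuming current Midjourney prompt syntax: something like “a photo of an Indian man --style raw --stylize 0” should return the plainer, more varied results described above, rather than the heavily aestheticized default.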

I find it interesting that Gemini, coming well after other competitive products - and with everything Google has in terms of data, infrastructure, talent, good "process" (I assume), and an incentive to get this right - tripped so badly. I see this as Google's "New Coke" moment. For consumer-facing AI products at the intersection of company values, technology, and politics, the go/no-go criteria have to be defined very differently than for, say, B2B applications. And company culture influences these criteria, so I'm very sympathetic to Ben Thompson's view that the existing culture will have to change, which may not be possible with current leadership.

And I agree that Google was probably a bit unlucky; other AI companies will have the same hurdles to cross. Interesting times nevertheless!

ChatGPT’s text-based answers seem generally more neutral. What is OpenAI doing right that Google is doing wrong?

Feb 29 · Liked by Rohit Krishnan

"When you yell AIs to be nicer, or focus on a diverse world, or to be law abiding, or to not say hateful things, these all interact with each other in weird ways" (tell?) and "bureaucracy meeting the urge to ship fast" I think are much needed notes of empathy for people and companies trying to solve hard problems.

As a (very) average programmer, I know how hard it is to write correct code, get it to run reliably, ship it on time, learn from user feedback, etc., and things like AGI are many orders of magnitude more difficult to get right than, say, simple web apps.

This freak-out (or maybe it's just the corners I inhabit) over embarrassing but hardly consequential errors is startling to observe.

Isn’t the problem here that we’re trying to think of these LLMs as having a single personality instead of a large collection of personalities? The solution, then, would be to expose them as a large collection of personalities instead of a single one.

If I’m a subsistence farmer in Africa looking for advice on some issue my crops are having, I don’t want the solution that would be appropriate for an industrial-scale farmer in the US. Ideally, the UX for these LLMs should require you to first choose who you want to talk to and then ask the question. With that type of UX, even the inappropriate images you included in your post could be considered quite appropriate, provided, say, you chose an alternate-history fiction writer as the personality you’re talking to.

Even the gotcha image from Bing had 2 white male soldiers in the 4 that were picked.

I think there's a huge difference between data-derived stereotypes paired with a reinforcement-training program that attempts to counter them, and Gemini eliminating the stereotype - and an entire group of people - altogether. This was manifestly obvious to anyone who generated images. The fact that the model was released publicly doesn't point to bureaucracy intermixed with urgency. It points to a myopic worldview best represented by "the median member of the San Francisco Board of Supervisors." This is an institutional failure, and the backlash is justified.

Why did we expect image generators to be historically accurate? I for one didn't, and I loved that this happened, even though it points to much bigger issues of diversity in AI and tech (mostly white and Asian males; not that that's bad, but we need more variety, and this issue points to "overcompensation" in the sense of "don't look at us, our systems are not racist").

Although, to give Google and other companies some credit, I do not want my system to be misused by white supremacists to spread stupid and harmful propaganda and harmful biases (white = good, white = beautiful, white = pure, and shit like that).

So in conclusion, Google did not screw up. The press and media think they screwed up, and Google, like the geek it is, let itself be bullied by them. Google showed extremely poor leadership in even accepting that they screwed up. They could have virtue-signaled their way out of the situation by being more strategic: "what the hell else did you want us to do? We are trying to prevent neo-Nazis and white supremacists from abusing our systems, stop playing the white victim," or whatever.
