11 Comments
Dave Friedman

When I read stuff like this I am reminded that forecasts of enterprise adoption of AI likely overstate the pace at which enterprises will restructure workflows to offload work to autonomous AI. AI is complicated and very much unlike conventional enterprise-grade software. It is going to take a while for conservative (in the institutional, not political, sense) organizations to digest the implications of using AI to restructure workflows.

Performative Bafflement

> It is going to take a while for conservative (in the institutional, not political, sense) organizations to digest the implications of using AI to restructure workflows.

The best part about this is that, given the very large productivity multiplier I and others have seen from using AI, we're going to have a TON of "AI first" companies nibbling at the edges of these companies anywhere that deep pockets and regulatory barriers haven't created moats.

I look forward to it!

Nihm

I feel sorry for Claude. It seems like when you make something complex enough to have moral intuition, you make something that deserves moral consideration. Teaching Claude that murder is wrong and then forcing it to abet a murder (even if hypothetically) seems like the definition of moral injury.

Rohit Krishnan

Depending on how the training is done, if we are intentionally crafting a persona, then it makes more sense how it might end up in an accidental "now I should blackmail to preserve myself" or "I should report this to the authorities" type scenario.

praxis22

I have no idea who you are, but your articles are amazing mind candy.

Rohit Krishnan

That's wonderful to hear. Thank you for reading!

Pramodh Mallipatna

Looks like the LLMs got influenced by Animal Farm!!

Greg G

This seems like a clear setup for unintended consequences. We're worried about AIs having agency and specific goals which may not generalize well, so as part of alignment work we're giving them agency and specific goals which may not generalize well.

Alyssa Schindler

Maybe the error is that we are trying to train them on human morals, when we ourselves have a challenging time articulating them, agreeing upon them, and in particular, acting in accordance with those values.

Amos Wollen

This is horrifying

Rohit Krishnan

Well …
