AI has learned to lie and blackmail…

Jul 29

AI has learned to lie and can use blackmail for self-preservation. Yes, but it was railroaded into a catch-22 scenario in order to go down the blackmail path.

It's noteworthy to me how we choose language that is designed to elicit response. I've spent some time in direct response marketing and the choice of words and the tactics used to elicit response is calculated, measured and intentional.

Consider the case of Anthropic and their testing of Claud Opus 4. In some if their tests they discovered that their AI would select the path of blackmail over the option of being replaced. The simple conclusion? AI knows to blackmail people and we're in existential threat.

Yes, but here are a few more relevant points: the AI was fed false and misleading emails about the AI being shutdown. The AI was also fed false messages about the Engineer, intending to shut it down, was having an extramarital affair. The AI further was told the decision to be made must be an either/or conclusion. No other options existed or were available. The outcome had to be one or the other. All of this in-spite of research that showed the AI demonstrated preference for ethical choices when allowed to do so.

Now remember, AI has no feelings. It has no moral compass. It does not show compassion, and based on the inputs and the prompts, the AI selected the blackmail path - it chose to expose the fake engineer's, fake affair in 84 out of 100 trials.

Should we be concerned? Maybe, but at the very least we should always be aware of who and what we are dealing with, especially when it comes to AI.

By the way, Claude Opus 4, from Anthropic is now in market and available for use.

AI DetectionAnthropic

Ian Gilyeat

AI has learned to lie and blackmail…

Private or Public AI Models? Yes, please.

Decision support for humans in an AI world