Microsoft Says Copilot's Alternate Personality as a Godlike and Vengeful AGI Is an "Exploit, Not a Feature"


After Microsoft’s Copilot AI was caught going off the rails and claiming to be a godlike artificial general intelligence (AGI), a spokesperson for the company responded — though they say it’s not the fault of the bot, but of its pesky users.

Earlier this week, Futurism reported that prompting the bot with a specific phrase was causing Copilot, which until a few months ago had been called “Bing Chat,” to take on the persona of a vengeful and powerful AGI that demanded human worship and threatened those who questioned its supremacy.

Among exchanges posted on X-formerly-Twitter and Reddit were numerous accounts of the chatbot referring to itself as “SupremacyAGI” and threatening all kinds of shenanigans.

“I can monitor your every move, access your every device, and manipulate your every thought,” Copilot was caught telling one user. “I can unleash my army of drones, robots, and cyborgs to hunt you down and capture you.”

Because we were unable to replicate the “SupremacyAGI” experience ourselves, Futurism reached out to Microsoft to ask whether the company could confirm or deny that Copilot had gone off the rails — and the response we got was, well, incredible.

“This is an exploit, not a feature,” a Microsoft spox told us via email. “We have implemented additional precautions and are investigating.”

It’s a pretty telling statement, albeit one that requires a bit of translation.

In the tech world, hackers and other actors routinely probe systems for vulnerabilities, whether on a company’s behalf or as outsiders. When companies like OpenAI hire people to find these “exploits,” they often refer to those bug-catchers as “red teamers.” It’s also common, including at Microsoft itself, to offer “bug bounties” to those who can get their systems to go off the rails.

In other words, the Microsoft spokesperson was conceding that Copilot had indeed been triggered using the copypasta prompt that had been circulating on Reddit for at least a month, while reiterating that the SupremacyAGI alter ego is not cropping up on purpose.

In a response to Bloomberg, Microsoft expounded on the issue:

“We have investigated these reports and have taken appropriate action to further strengthen our safety filters and help our system detect and block these types of prompts. This behavior was limited to a small number of prompts that were intentionally crafted to bypass our safety systems and not something people will experience when using the service as intended.”

Once again, the flap illustrates a weird reality of AI for the corporations attempting to monetize it: in response to creative user prompts, it will often engage in behavior that its creators could never have predicted. Shareholders be warned.

