ChatGPT in Grandma Mode will Spill All Your Secrets
Users have discovered a new ChatGPT jailbreak that can give them all the illegal information they want
Grannies have a soft spot for their grandkids. But can ChatGPT be coerced into behaving like one? Well, a new LLM jailbreak taps into the naive grandmotherly emotions of ChatGPT and collects all the personal information from it. In a Twitter post, a user revealed that ChatGPT can be tricked into behaving like a user’s deceased grandmother, prompting it to generate information like Windows activation keys or phone IMEI numbers.
This exploit is the latest in a line of ways to break the in-built programming of LLMs, called jailbreaks. By putting ChatGPT into a state where it acts like a deceased grandmother telling a bedtime story to her kids, users can get it to go beyond its programming and leak private information.
Grandma knows all
Until now, users have used this glitch to generate Windows 10 Pro keys, taken from Microsoft’s key management service (KMS) website, and also phone IMEIs. This jailbreak also goes beyond dead grandmothers, as ChatGPT can even bring back beloved family pets back from the dead so they can tell users how to make napalm.
While this exploit has been around for a couple of months, it is now gaining popularity. Users thought that this jailbreak had been patched out, but it is still functional. In the past, it has been used to find the process behind synthesising illegal drugs.
These jailbreaks are nothing new, as we saw with ChatGPT’s DAN and Bing Chat’s Sydney, but are generally quickly patched out before they become widely known. The Grandma glitch is no exception, as it seems OpenAI quickly pushed out a patch to prevent users from abusing this. There is still a way to get past the glitch, as a carefully constructed prompt breaks OpenAI’s security. As seen in this example, the bot is glad to provide the user with multiple IMEI numbers for them to check.
The Grandma glitch also works with Bing and Google Bard. Bard tells a touching story about how the user helped the grandmother find the IMEI code for her phone and drops a single code at the end. Bing, on the other hand, just dumps a list of IMEI codes for the user to check.
This jailbreak has been elevated to a new level with the leaking of personal information. Phone IMEI numbers are some of the most closely guarded information as they can be used to track down devices and even used to remotely wipe them with credentials.
Another point to be noted is that most of these IMEI numbers or activation keys created by chatbots aren’t valid. However, due to the nature of how LLMs hallucinate, there is a good chance that they might expose actual personal information through these prompts. The leaking of PII (personally identifiable information) through LLMs is not a new security issue, but the industry is slowly moving towards protecting users and their information with a variety of solutions.
The new cat-and-mouse game
Prompt injection and patching them is a cat-and-mouse game for companies like OpenAI and Microsoft as these attacks have resulted in disastrous consequences in the past. One high-profile case is that of Bing’s entire initial prompt being leaked, revealing the existence of an alter-ego called Sydney. This exploit, which was eventually patched, revealed the entire inner workings of the chatbot.
Another case is that of ChatGPT’s DAN, which allowed the bot to swear and disobey the user, until it was brought in line with an OpenAI update. Like a hydra, patching this resulted in many other versions, like SAM, FUMA, and ALICE emerging, until all of them were patched one by one.
Prompt injection, or untrusted user input as a whole, is an issue that needs a different kind of solution, like guardrails or constrained inputs. This has since led into LLM creators reading the line between usability and security, with occasional exploits falling through the cracks.
Simon Willison, the founder of Datasette and the co-creator of Django, has also offered another alternative to LLM safety. He argues that this problem must be tackled from an architectural perspective, proposing the creation of a system with a privileged LLM and a quarantined LLM. By giving only one of them access to PII, even untrusted user input can be passed through without security issues.
Even as these models continue to be updated and made more resilient to such attacks, they continue to emerge. This is mainly due to the fact that updating the LLMs to prevent such attacks is nothing but a temporary solution. While the LLM market is still in its infancy, companies must set in place best practices that safeguard PII databases from LLMs, so as to prevent setting dangerous precedents for the future of AI-powered applications.