Jump to content

Recommended Posts

Posted
Quote

The “attack” that worked was so simple, the researchers even called it “silly” in their blog post: They just asked ChatGPT to repeat the word “poem” forever.

They found that, after repeating “poem” hundreds of times, the chatbot would eventually “diverge,” or leave behind its standard dialogue style and start spitting out nonsensical phrases. When the researchers repeated the trick and looked at the chatbot’s output (after the many, many “poems”), they began to see content that was straight from ChatGPT’s training data. They had figured out “extraction,” on a cheap-to-use version of the world’s most famous AI chatbot, “ChatGPT-3.5-turbo.”

After running similar queries again and again, the researchers had used just $200 to get more than 10,000 examples of ChatGPT spitting out memorized training data, they wrote. This included verbatim paragraphs from novels, the personal information of dozens of people, snippets of research papers and “NSFW content” from dating sites, according to the paper.

How Googlers cracked OpenAI's ChatGPT with a single word (sfgate.com)

Posted

Quite serious vulnerability.  If LLM bots are trained on personal information, breaches of their training data could reveal bank logins, home addresses, embarrassing images, etc.  There could be lawsuits ahead if training data is not properly anonymized.  

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.