22 votes
Criminals are getting increasingly adept at crafting malicious AI prompts to get data out of ChatGPT
Link information
- Title: Psst ... wanna jailbreak ChatGPT? Inside look at evil prompts
- Published: Jan 25 2024
- Word count: 495 words
To be honest I'm pretty confused about what the article is trying to say. On polymorphic malware, is the idea that you embed an LLM into the malware itself for the polymorphic capability, or that you just use ChatGPT to write polymorphic malware? For the first, it's not going to be viable anytime soon for your malware to carry a minimum of 40 GB of weights. For the second, ChatGPT is not capable of superhuman programming lol. Anything ChatGPT can do, a programmer can do, and it's not particularly close - ChatGPT is nifty for programming precisely because you don't hold it to the same expectations as a trained human. Considering we don't have LLM-generated polymorphic malware in the wild yet, this seems to be putting the cart before the horse.
Then there's the Swagger example, where they ask ChatGPT to list API endpoints that might have Swagger/OpenAPI specs exposed. For one, that's not much of an exploit; if exposing your API's specification can cause you damage, that's a violation of Shannon's maxim ("the enemy knows the system"). Secondly, all ChatGPT would do is regurgitate blogspam on the issue or make shit up.
The article opens with one issue and then jumps to others without a transition.
The opening paragraph is about prompt engineering to get back raw, sensitive training data. That could include PII, or private data whose owner didn't realize it was accessible to crawlers.
Then they jump to polymorphic code. I don't think the idea here is that the malware would be a self-contained LLM, but that it would query one through an API, requesting specific code; since LLMs include stochastic elements in their sampling, the exact code would change each time and evade fingerprinting.
That would require having a set of prompts that consistently return working code, which seems unlikely in the short term.
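The evasion idea is easy to illustrate with a toy example (the snippets here are made up, not from the article): two pieces of code with identical behavior still hash to completely different values, so a signature written against one variant never matches the other.

```python
import hashlib

# Two functionally identical snippets that differ textually.
variant_a = "def add(a, b):\n    return a + b\n"
variant_b = "def add(x, y):\n    total = x + y\n    return total\n"

# A signature scanner that fingerprints exact bytes sees two
# unrelated files, even though the behavior is the same.
fp_a = hashlib.sha256(variant_a.encode()).hexdigest()
fp_b = hashlib.sha256(variant_b.encode()).hexdigest()

assert fp_a != fp_b  # same behavior, disjoint fingerprints
```

That's the whole trick the article gestures at; the hard part, as noted above, is getting an LLM to reliably emit working variants at all.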
But yeah, not a great article.
I see. It'd certainly be hard to get any LLM to consistently spit out a long stretch of code that not only works, but works the same way every time. Even if you wanted to do that, it'd be immensely easier to start by fine-tuning one of the open-source LLMs on custom data. I find it hilarious that the article implies people are trying to do this by giving ChatGPT sufficiently wacky prompts ("jailbreaking") as opposed to building a specialized tool.
It sounds like the author read some people who have no idea what they're doing speculating about things on some forum and wrote about it like the elder council of hackers has ordered the hacking institute of hacking to start work on it.
Yeah. It really sounds like a lot of this is either people testing to see what actually works (and finding that at the moment it doesn't), script kiddies being script kiddies, and people grifting by selling "secret prompts" to script kiddies. It's one of those areas where I understand why security companies are tracking it, but not exactly something actionable at this time.
Note you can get an okay Phi-2 quantization at 2-3 GB
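Back-of-the-envelope on that, assuming Phi-2's ~2.7B parameters (the function and the 10% overhead factor for quantization scales are my own rough sketch, not exact figures for any particular format):

```python
def quantized_size_gb(n_params: float, bits_per_weight: float,
                      overhead: float = 1.1) -> float:
    """Approximate model size in GB (1 GB = 1e9 bytes),
    with a fudge factor for quantization scale/metadata overhead."""
    return n_params * bits_per_weight / 8 * overhead / 1e9

print(round(quantized_size_gb(2.7e9, 4), 2))  # → 1.49 (4-bit)
print(round(quantized_size_gb(2.7e9, 8), 2))  # → 2.97 (8-bit)
```

So 4- to 8-bit quantizations land right in that 2-3 GB ballpark, versus tens of GB for full-precision weights of larger models.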
If there's any silver lining to this, it's that criminals abusing AI will force us to take AI safety more seriously
I actually find it somewhat funny, although I'm not an early adopter or a lover of tech for tech's sake.
It reminds me of hitchBOT, that poor robot that was sent out to hitchhike across the country until the shitheads in Philadelphia destroyed it for existing. Any technical change has to account for the existence and likely tactics of pranksters, fraudsters, and shitheads.