AI Slop is Hurting Security — LLMs are Dumb and People are Dim
Large language models are terrible if you need reasoning or actual understanding.
Big open source projects are being hammered with stupid security bug reports. It appears that dim people are relying on dumb AI chatbots to generate “spammy, hallucinated” vulnerability reports. Inevitably, it hurts the ability of teams to work on actual security bugs.
Self-described “Pythonista” Seth Larson (pictured) is as mad as hell. In today’s SB Blogwatch, we’re not gonna take this any more.
Your humble blogwatcher curated these bloggy bits for your entertainment. Not to mention: Holiday Trek.
Artificial Stupidity
What’s the craic? Thomas Claburn reports: Open source maintainers are drowning in junk bug reports written by AI
“Particularly pernicious”
Software vulnerability submissions generated by AI models have ushered in a “new era of slop security reports for open source.” … Devs maintaining these projects … argue that low-quality reports should be treated as if they’re malicious.
…
Spammy, low-grade online content existed long before chatbots, but generative AI models have made it easier to produce the stuff. [But] for open source projects, AI-assisted bug reports are particularly pernicious because they require consideration and evaluation from security engineers – many of them volunteers – who are already pressed for time.
What’s wrong with using LLMs for finding bugs? Thomas Maxwell explains: Bogus AI-Generated Bug Reports Are Driving Open Source Developers Nuts
“They’re just probability machines”
Artificial intelligence is not just flooding social media with garbage, it’s also apparently afflicting the open-source programming community. … Contributors to open-source projects are lamenting the time wasted evaluating and debunking bug reports created using AI code-generation tools.
…
Any language model … will hallucinate and produce the wrong [output]. They don’t “understand” code—they’re just probability machines, guessing … based on what they have seen before. … Developers still need to fundamentally understand the programming language they’re working with to debug issues and know … how all the independent pieces of code string together.
Horse’s mouth? The latest FOSS luminary to complain about this is Seth Larson: New era of slop security reports for open source
“Show up with patches”
I’m on the security report triage team for CPython, pip, urllib3, Requests, and a handful of other open source projects. … Recently I’ve noticed an uptick in extremely low-quality, spammy, and LLM-hallucinated security reports [that] at first glance [seem] legitimate and thus require time to refute. … This is a very concerning trend.
…
For example, urllib3 recently received a report because a tool was detecting our usage of SSLv2 as insecure even though our usage is to explicitly disable SSLv2. This issue … takes time and effort, something that is in short supply. … It’s critical as reporters to respect this often volunteered time.
…
DO NOT use AI / LLM systems for “detecting” vulnerabilities, [which] cannot understand code, [nor] human-level concepts like intent, common usage, and context. … Show up with patches, not just reports: … This makes the work of maintainers much easier.
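To see why that SSLv2 report was bogus, here’s a minimal, hypothetical sketch in plain Python ssl (not urllib3’s actual source, just an illustration of the pattern Larson describes). The string “SSLv2” appears in the code precisely because the code refuses to speak it, which is exactly the kind of context a keyword-matching scanner or a probability-machine LLM can’t grasp:

```python
import ssl

# Hypothetical example, not urllib3's real code: a client TLS context
# that explicitly shuts out the legacy SSLv2 and SSLv3 protocols.
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
ctx.options |= ssl.OP_NO_SSLv2 | ssl.OP_NO_SSLv3   # the only reason "SSLv2" appears here at all
ctx.minimum_version = ssl.TLSVersion.TLSv1_2       # modern equivalent: nothing older than TLS 1.2

# A tool that merely pattern-matches on the string "SSLv2" flags this file as
# "using SSLv2," even though these are the very lines that keep SSLv2 out.
print(ctx.minimum_version)   # TLSVersion.TLSv1_2
```

Refuting a report like that costs a volunteer maintainer real time; generating it cost the “reporter” almost none.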
Where’s Jack Ryan when you need him? CJefferson sums all their fears:
This type of thing is my biggest AI fear. It’s just too easy to produce bug reports, Twitter posts, academic papers, entire books and audiobooks, using AI. While the results are almost entirely trash, … there isn’t enough time … to categorize and reject them.
…
The only fix I can think of is going to be to introduce trust models, where you can vouch for people and their outputs, and people can trust their friends, or particular lists. PGP keys aren’t the technical answer (because they’re a mess), but I think something more modern in that area might be needed.
But I’m sure AGI is just around the corner. Somervillain just laughs and laughs:
The AI Apocalypse is a torrent of fraud and spam. [It] will take your sanity and faith in the world—not your job, not your life, it’ll just make everything ****ty.
…
Every time I’ve asked Copilot or ChatGPT to solve an issue in Java, it looked legit at first glance. … AI can make erroneous code look like it was written by a skilled developer. Many times I’ve doubted myself, thinking “Hey, I didn’t know you could do that!” … Then I run it and yeah—nope—doesn’t work. The AI just autocompleted a bunch of garbage—it looks legit [but] can take me a while to realize.
In a similar vein, here’s Dan 55:
Just like social media. We’re drowning under an avalanche of bull****: Each item takes time and energy to refute, but in the time taken to refute it, 10 other pieces of nonsense have already gone viral.
…
The people sharing this nonsense don’t really care. If that one turned out to be a lie then this next one must be true because it looks true.
And the problem goes deeper—think about the software supply chain. mmastrac explains:
I got hit with a CVE spammer last year which forced me to re-release a library to fix a fake vulnerability. … And even more fun, the example exploits often don’t even compile. … The general problem is this:
— Downstream consumers of a library … get alerts for CVEs filed against the library, even if they are “awaiting analysis.”
— Those consumers send messages asking for a resolution, and there’s no trivial way to push back that an advisory is false.
…
Everything is broken. … I have a choice: Either I need to go and clean up all of the automated tools that respond to CVE spam, or I just release a new version of a library, **** it all and move on with my life after blocking the reporter.
How is this happening? It can’t simply be people seeking clout, can it? godrik offers an educated guess:
My guess is that these are reports made by over-enthusiastic grad students. … I work at $LOCALSTATEUNIVERSITY and … there are probably 3 or 4 concurrent projects on using LLMs to find various kinds of bugs or issues. There are probably similar numbers across the country and across the world.
You only need a small fraction to go, “Let’s scan all the Debian core archive and file reports,” before it gets out of hand. Couple that with people who can’t seem to understand that sometimes the magic black box is wrong.
…
[But] I guess they think they are helping.
Pesky kids. Please exit Otterknow’s grassed area:
What idiot would … use AI and just run with it? You always need it to reconfirm results … and then manually check. Must be someone young.
Meanwhile, what we need now is a colorful metaphor to describe today’s so-called “AI.” Bebu sa Ware doth oblige:
A barrel of rotting fish, used to conceal the stench of a decaying corpse.
And Finally:
Quick! Watch this before humorless lawyers take it down.
You have been reading SB Blogwatch by Richi Jennings. Richi curates the best bloggy bits, finest forums, and weirdest websites—so you don’t have to. Hate mail may be directed to @RiCHi, @richij, @[email protected], @richi.bsky.social or [email protected]. Ask your doctor before reading. Your mileage may vary. Past performance is no guarantee of future results. Do not stare into laser with remaining eye. E&OE. 30.
Image sauce: Seth Larson, via Bluesky