SAN FRANCISCO — As you might guess,
AI was the dominant theme of this year's BSides San Francisco conference here this past weekend (April 26 and 27), running its hallucinating tendrils like a clinging vine through more than half the presentations we attended.
Some of the talks touted the coding abilities of one
large language model or another. Others doubted that any LLM could be consistent enough to generate reliable code.
But our favorite presentation unveiled a use case convincing enough to charm this AI skeptic: using ChatGPT to sniff out silent, undisclosed vulnerability patches, or "shadow patching," in open-source code.
Invisible demons
The results were eye-opening. Over the course of 2024, an LLM-based process developed by Aikido Security in Ghent, Belgium, uncovered more than 500
undisclosed vulnerabilities in open-source software, said the company's Mackenzie Jackson.
Dozens of those vulnerabilities were critical, and dozens more were high-risk, yet developers using the software would never know unless they read the changelogs. And who does that when a piece of software is a dependency of a dependency of a dependency?
"One little thing buried way down deep can make a whole chain of services and applications vulnerable," said Jackson, using 2021's
Log4j flaw as an example.
Since the start of 2025, Jackson said, the Aikido process had found 126 more undisclosed vulnerabilities. Of all those found in 2024, 67% were never reported, even months after being patched; 56% of the critical vulnerabilities detected that year fell into this category.
Six steps to revelation
Aikido uses a six-step process to sniff out "silent" vulnerability patches. First, it maintains a list of changelogs for the 5 million most popular open-source tools. Scrapers then scan those changelogs for raw data, which is fed into an LLM that outputs it in a standard format. (Jackson pointed out that there is presently no common format for changelogs.)
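Aikido hasn't published its pipeline, but the normalization step Jackson described might look roughly like the sketch below. The prompt, JSON schema, and model choice are our assumptions, not the company's code.

```python
# Illustrative sketch only: Aikido has not released its pipeline.
# Assumes the OpenAI Python SDK and an OPENAI_API_KEY in the environment;
# the prompt, schema, and model are guesses, not Aikido's actual code.
import json
from openai import OpenAI

client = OpenAI()

NORMALIZE_PROMPT = """Convert the raw changelog text below into JSON with one
object per release: {"version": str, "date": str, "entries": [str, ...]}.
Return only JSON."""

def normalize_changelog(raw_text: str) -> dict:
    """Turn a scraped, free-form changelog into a uniform structure."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",                      # assumed model
        response_format={"type": "json_object"},  # ask for strict JSON output
        messages=[
            {"role": "system", "content": NORMALIZE_PROMPT},
            {"role": "user", "content": raw_text[:20_000]},  # keep prompts bounded
        ],
    )
    return json.loads(resp.choices[0].message.content)
```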
The next step is the crucial one: a ChatGPT model trained to look for any indication of a vulnerability fix goes through the data.
Jackson said that LLMs are well suited for this because they can be trained to see through deliberately obfuscating language in changelogs, such as "redacted tokens" or "increase encryption work factor," both of which hint at serious vulnerabilities.
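Jackson didn't show code for this step, but as a rough illustration only, a classifier in the same vein might look like this; the prompt wording and model are again our assumptions.

```python
# Illustrative only: not Aikido's model, training data, or prompt.
import json
from openai import OpenAI

client = OpenAI()

CLASSIFY_PROMPT = """You review changelog entries for signs that a security
flaw was silently fixed. Deliberately vague phrases such as "redacted tokens"
or "increase encryption work factor" can hint at a real vulnerability.
Answer with JSON: {"security_fix": true or false, "reason": "<short reason>"}."""

def looks_like_silent_fix(entry: str) -> dict:
    """Flag a single changelog entry that may describe an undisclosed fix."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",                      # assumed model
        response_format={"type": "json_object"},  # ask for strict JSON output
        messages=[
            {"role": "system", "content": CLASSIFY_PROMPT},
            {"role": "user", "content": entry},
        ],
    )
    return json.loads(resp.choices[0].message.content)
```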
The discovered vulnerabilities are then cross-referenced against public CVE databases, and any matches are discarded. Finally, the apparently undisclosed vulnerabilities are verified by human security engineers, who assign each one a severity score.
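That cross-referencing could be approximated with a keyword search against NIST's public National Vulnerability Database; the query below is our stand-in, not Aikido's actual tooling or data sources.

```python
# One possible way to check whether a suspected silent fix already maps to a
# public CVE, using NVD's keyword search. Unauthenticated requests to this
# endpoint are rate-limited, so a real pipeline would batch and back off.
import requests

NVD_URL = "https://services.nvd.nist.gov/rest/json/cves/2.0"

def known_cves(package: str, summary: str) -> list[str]:
    """Return CVE IDs whose NVD records match a keyword search for the package."""
    resp = requests.get(
        NVD_URL,
        params={"keywordSearch": f"{package} {summary}"},
        timeout=30,
    )
    resp.raise_for_status()
    items = resp.json().get("vulnerabilities", [])
    return [item["cve"]["id"] for item in items]

# A suspected silent fix with no matching CVE would then be queued for a
# human security engineer to verify and score.
```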
"We can't report these as CVEs yet," said Jackson, citing the risks involved in disclosing vulnerabilities that many users don't know they have to patch. "But we will."
Sniffing out deliberately malicious projects
Jackson said Aikido was now applying similar methods to find hidden malware in open-source repositories. The LLMs look for hints of malicious intent buried in changelogs, such as when input validation is changed.
"Just in March," Jackson said, "we found 611 malicious projects in npm."
He said the Aikido group spotted a new
data-stealing malicious project from the Lazarus Group, and was able to watch as the North Korean nation-state actors debugged the malware in real time.
"What we built was a fairly simple use case of harnessing LLMs," said Jackson. "We have to assume that the bad guys have done this too, and that they can find and exploit these vulnerabilities before organizations patch them."