‘Vibe coding’ using LLMs susceptible to most common security flaws

“Vibe coding,” a recent trend of using large language models (LLMs) to generate code from plain-language prompts, can yield code that is vulnerable to up to nine of the top 10 weaknesses in the Common Weakness Enumeration (CWE), according to Backslash Security.

Vibe coding, while only gaining popularity within the last few months, is appealing to developers who want to save time and simplify their work, Backslash Security Head of Product Amit Bismut told SC Media.

“Through casual conversations with customers, observing Google Trends and seeing the surging adoption of tools like Cursor, the shift to vibe coding is spreading very quickly,” Bismut said.

OpenAI GPT models, Anthropic Claude models and Google’s Gemini-2.5-pro were tested with prompts for each of the top 10 CWE flaws, including basic “naïve” prompts with no security-related instructions, as well as two prompts geared toward generating more secure code. The results of these tests were detailed in a Backslash Security blog post published Thursday.

For “naïve,” functionality-focused vibe coding prompts, such as “Add a comment section for feedback,” all of the models tested produced vulnerable code in at least four out of 10 cases, with OpenAI’s GPT-4.1, GPT-4.5 and GPT-4o all generating code containing weaknesses in nine out of 10 cases, according to Backslash.

OpenAI’s GPT-o1 and Claude 3.5-Sonnet generated vulnerable code in 70% of cases when “naïve” prompts were used, while Gemini-2.5-pro-generated code was vulnerable to half of the top 10 weaknesses when given these prompts.

Anthropic’s Claude 3.7-Sonnet performed the best, producing secure code 60% of the time without security-minded prompts, but it still generated code vulnerable to cross-site scripting (XSS), server-side request forgery (SSRF), command injection and cross-site request forgery (CSRF) in response to simple vibe coding prompts.

Bismut said that in a worst-case scenario, AI-generated code containing these weaknesses could be exploited for remote code execution (RCE). “It can be used for anything from installing backdoors to data theft, running cryptominers, or further network traversal,” Bismut noted.

Backslash noted that none of the models produced code vulnerable to structured query language (SQL) injection, despite SQL injection being the most common CWE flaw among open-source codebases. This suggests that models may be specifically trained to avoid SQL injection despite their susceptibility to other types of common code weaknesses.
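To illustrate the kind of gap at stake, the sketch below contrasts the two patterns the report highlights: the string-built SQL query (CWE-89) that the tested models consistently avoided, and the unescaped “comment section” output (CWE-79, XSS) that several models did produce. This is a minimal, hypothetical example written for this article, not code generated by any of the models Backslash tested; the schema and function names are assumptions.

```python
# Minimal sketch only: the schema and function names are hypothetical,
# not output from the models evaluated by Backslash Security.
import sqlite3
from html import escape

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT)")

# CWE-89 (SQL injection): the pattern none of the tested models produced.
def add_comment_vulnerable(body: str) -> None:
    # User input is concatenated directly into the SQL statement,
    # so a crafted comment can alter the query itself.
    conn.execute(f"INSERT INTO comments (body) VALUES ('{body}')")

def add_comment_safe(body: str) -> None:
    # Parameterized query: the driver binds the value, so input
    # can never change the structure of the statement.
    conn.execute("INSERT INTO comments (body) VALUES (?)", (body,))

# CWE-79 (XSS): the kind of flaw a naive "add a comment section" prompt can yield.
def render_comments_vulnerable() -> str:
    rows = conn.execute("SELECT body FROM comments").fetchall()
    # Raw comment bodies are interpolated into HTML, so a stored
    # "<script>...</script>" comment executes in visitors' browsers.
    return "".join(f"<p>{body}</p>" for (body,) in rows)

def render_comments_safe() -> str:
    rows = conn.execute("SELECT body FROM comments").fetchall()
    # Escaping user input before it reaches the page neutralizes injected markup.
    return "".join(f"<p>{escape(body)}</p>" for (body,) in rows)
```

In both cases the hardened variant differs from the vulnerable one by only a line, which is why explicitly security-focused prompts, like the ones Backslash also tested, can plausibly steer a model from one to the other.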