
Konstantin Semenenko
July 2, 2026
AI-generated apps fail security review because the model optimizes for code that works, not code that's safe, and the human review layer often gets skipped. The recurring failures are hardcoded secrets, SQL injection, missing auth, and hallucinated dependencies. The fix is not to stop using AI; it's to put back the layer vibe coding removed: automated security scanning plus a senior engineer reviewing every diff before it ships.




AI-generated apps fail security review for one structural reason: the model is rewarded for producing code that runs, and safe code and running code are not the same thing. Putting an API key directly in the file makes the code work. Moving it to an environment variable requires understanding a deployment context the model doesn't have. So it hardcodes the key, the demo works, and the secret ships to production in browser-readable JavaScript. Multiply that across auth, queries, and dependencies, and you get an app that passes the demo and fails the audit. The fix is to restore the review layer that "just prompt it" removed.
We ship AI-written code under compliance constraints, so we see exactly where generated apps break a security review. Below is what fails most often, why it fails, and the concrete steps that close each gap. It's written for teams who want AI's speed without the vulnerabilities that ship with it by default.
This isn't a theoretical risk; it's measured and rising. A 2026 study testing 534 code samples across six major LLMs against the OWASP Top 10 found that 25.1% contained a confirmed security vulnerability. A separate analysis of more than 50,000 AI-generated codebases found 68% had at least one high-severity vulnerability, with an average of 4.2 security issues per project. Black Duck's 2026 OSSRA report showed mean vulnerabilities per codebase jumped 107% year over year.
The trend line is the alarming part. Georgia Tech researchers tracked CVEs directly attributable to AI-generated code rising from 6 in January 2026 to 15 in February to at least 35 in March. And there's a human factor that compounds it: studies have repeatedly found that developers using AI assistants write less secure code while reporting a false sense of security, rating insecure solutions as safe. The tool is confident, the developer trusts it, and the vulnerability ships.
The most common real-world failure in vibe-coded apps is leaked credentials. Security researchers analyzing applications built on AI platforms found OpenAI API keys and database service-role keys hardcoded directly into browser-accessible client-side JavaScript, where anyone can read them. The root cause is the one above: hardcoding the key makes the code run, and the model optimizes for running.
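The alternative the model skips is small. Here is a minimal Python sketch, assuming the key is used by a server-side process: read it from the environment and fail loudly at startup if it's missing. (Anything bundled into browser JavaScript is public no matter where it was read from, build-time environment variables included, so this only helps on the server.) `require_secret` and `MissingSecretError` are illustrative names, not a library API.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is absent from the environment."""


def require_secret(name: str) -> str:
    # Read the secret from the process environment, which the deployment
    # platform or a secrets manager populates; never from source code.
    value = os.environ.get(name, "").strip()
    if not value:
        # Fail fast instead of falling back to a hardcoded default that
        # would end up committed to the repository.
        raise MissingSecretError(f"required secret {name!r} is not set")
    return value
```

Call it once at startup, for example `api_key = require_secret("OPENAI_API_KEY")` (the variable name is illustrative), so a missing secret stops the deploy instead of tempting someone to paste the key inline.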
The fix is mechanical and non-negotiable. Secrets live in environment variables and a secrets manager, never in source, and a pre-commit secret scanner blocks any key from reaching the repository. This is the single highest-value check you can add, because a leaked service-role key isn't a bug; it's a direct path into your data.
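A pre-commit secret scanner can start as plain pattern matching over the files being committed. The sketch below is illustrative only: the three regexes (an OpenAI-style `sk-` key, an AWS access key ID, and a JWT-shaped token such as a database service-role key) are assumptions, and maintained tools like gitleaks or detect-secrets ship far broader rule sets. `scan_text`, `staged_files`, and `main` are hypothetical names.

```python
import re
import subprocess
from pathlib import Path

# Illustrative patterns only; real scanners maintain hundreds of rules.
SECRET_PATTERNS = {
    "openai_api_key": re.compile(r"\bsk-[A-Za-z0-9_-]{20,}"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "jwt_like_token": re.compile(
        r"\beyJ[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,}"
    ),
}


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line_number, rule_name) for every suspected secret in text."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, rule))
    return findings


def staged_files() -> list[str]:
    # Added, copied, or modified paths in the index: what is about to be committed.
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True, text=True, check=True,
    )
    return [name for name in result.stdout.splitlines() if name]


def main() -> int:
    blocked = False
    for name in staged_files():
        path = Path(name)
        if not path.is_file():
            continue
        # Reads the working-tree copy; a stricter hook would scan the staged
        # blob instead (e.g. via `git show :<path>`).
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, rule in scan_text(text):
            print(f"{name}:{lineno}: possible {rule}")
            blocked = True
    # A non-zero exit status from a pre-commit hook makes git abort the commit.
    return 1 if blocked else 0
```

To use it as a hook, end the script with `raise SystemExit(main())` and call it from `.git/hooks/pre-commit`; git refuses the commit whenever the hook exits non-zero.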
SQL injection remains the most common vulnerability in AI-generated code, despite being understood for decades. Models frequently build queries by string interpolation (dropping a variable straight into the SQL text) instead of using parameterized queries, because the interpolated version is shorter and works in the demo. The same shortcut produces missing input validation across forms and API endpoints.
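Here is the failure in miniature, using Python's built-in `sqlite3` module and an in-memory table whose schema and rows are invented for the example. Both functions return the right data for a normal email address; only one survives a crafted input.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # Hypothetical users table with one ordinary account and one admin.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, is_admin INTEGER)"
    )
    conn.executemany(
        "INSERT INTO users (email, is_admin) VALUES (?, ?)",
        [("alice@example.com", 0), ("root@example.com", 1)],
    )
    return conn


def find_user_unsafe(conn: sqlite3.Connection, email: str):
    # The shape generated code often takes: input pasted into the SQL text.
    return conn.execute(f"SELECT email FROM users WHERE email = '{email}'").fetchall()


def find_user_safe(conn: sqlite3.Connection, email: str):
    # Parameterized: the value is bound separately and never parsed as SQL.
    return conn.execute("SELECT email FROM users WHERE email = ?", (email,)).fetchall()


conn = make_db()
crafted = "x' OR '1'='1"
print(find_user_unsafe(conn, crafted))  # the WHERE clause is rewritten; every row returns
print(find_user_safe(conn, crafted))    # []
```

The `?` placeholder is sqlite3's parameter style; other drivers use `%s` or named parameters, but the principle is the same: the value travels separately from the query text.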
The fix is to enforce parameterized queries and an input-validation layer as a rule the code is checked against, not a thing you hope the model remembered. Static analysis catches the obvious cases; a reviewer catches the rest. The point is that "it returned the right data in testing" tells you nothing about whether a crafted input can rewrite the query.
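The "rule the code is checked against" can start very small. This sketch uses Python's standard `ast` module to flag any `execute`-style call whose first argument is built with an f-string, `+`, `%`, or `.format()`. The set of sink method names is an assumption, and most teams would lean on an established analyzer such as Bandit or Semgrep rather than hand-roll this.

```python
import ast

# Method names treated as "runs SQL"; an assumption for this sketch.
SQL_SINKS = {"execute", "executemany", "executescript"}


def _is_dynamic_string(node: ast.AST) -> bool:
    # f"... {x} ..."
    if isinstance(node, ast.JoinedStr):
        return True
    # "..." + x   or   "..." % x
    if isinstance(node, ast.BinOp) and isinstance(node.op, (ast.Add, ast.Mod)):
        return True
    # "...".format(x)
    if (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and node.func.attr == "format"
    ):
        return True
    return False


def find_interpolated_queries(source: str) -> list[int]:
    """Return line numbers where a SQL sink receives a dynamically built string."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr in SQL_SINKS
            and node.args
            and _is_dynamic_string(node.args[0])
        ):
            hits.append(node.lineno)
    return sorted(hits)
```

Note what it misses: a query assembled in a variable first and then passed as `execute(q)` slips straight through. That gap is why the reviewer stays in the loop.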
Design-level auth flaws are rising fast: one analysis found a 153% increase in issues like authentication bypass and improper session management in AI-generated code. The reason is that auth is contextual. Who can do what, under which conditions, with what session handling, is exactly the business logic a model can't infer from a prompt. So it scaffolds endpoints that are publicly callable, permissions that aren't enforced, and data that isn't isolated between users.
This is the classic non-technical-founder trap: the app works when you test it as the only user and falls apart the moment two accounts exist. The fix is to treat auth, permissions, and data isolation as designed requirements that a senior engineer reviews directly, not defaults the generator fills in. No scanner fully owns this one, because correct access control is judgment, not a pattern.
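A minimal sketch of two habits a reviewer should look for, in plain Python with `sqlite3`: deny-by-default authentication on every handler, and queries scoped to the caller's own rows. The `Request` object, the `invoices` table, and the `require_user` decorator are hypothetical stand-ins for whatever your framework provides; in a real app `user_id` would come from a verified session, never from client input.

```python
import functools
import sqlite3
from dataclasses import dataclass
from typing import Optional


class Unauthorized(Exception):
    """No authenticated user on the request."""


@dataclass
class Request:
    # Hypothetical request object; user_id is None for anonymous callers.
    user_id: Optional[int]


def require_user(handler):
    # Deny by default: a wrapped handler never runs for an anonymous caller.
    @functools.wraps(handler)
    def wrapper(request: Request, *args, **kwargs):
        if request.user_id is None:
            raise Unauthorized("login required")
        return handler(request, *args, **kwargs)
    return wrapper


@require_user
def get_invoice(request: Request, conn: sqlite3.Connection, invoice_id: int):
    # Scope the lookup to the caller: another user's invoice id matches no row,
    # so guessing ids reveals nothing.
    return conn.execute(
        "SELECT id, amount FROM invoices WHERE id = ? AND owner_id = ?",
        (invoice_id, request.user_id),
    ).fetchone()


# The two-account test: user 100 owns invoice 1, user 200 owns invoice 2.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (id INTEGER PRIMARY KEY, owner_id INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO invoices VALUES (?, ?, ?)", [(1, 100, 50), (2, 200, 75)])
print(get_invoice(Request(user_id=100), conn, 1))  # (1, 50)
print(get_invoice(Request(user_id=100), conn, 2))  # None: not this user's invoice
```

Returning "not found" for someone else's record, rather than "forbidden", is one common choice because it avoids confirming that the id exists. Either way, the second lookup is exactly the check that a single-user demo never exercises.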
The newer, more insidious failure is the AI inventing packages. When asked to implement a feature, models sometimes suggest npm or pip packages that don't exist. If a developer installs one blindly, an attacker who has already registered that exact name gets malicious code into the build. The technique, often called slopsquatting, is a cousin of typosquatting and dependency confusion: a supply-chain attack the model opens by accident.
The fix is an automated dependency and license audit on every package an AI suggests, before the commit: verify the package actually exists in the official registry, check it against an authorized list, and scan it. The convenience of "the AI told me to install it" is exactly the convenience an attacker is counting on.
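A sketch of the first two checks for Python dependencies, under stated assumptions: it asks PyPI's JSON endpoint (`https://pypi.org/pypi/<name>/json`, which returns 404 for names that were never published) whether a package exists, and compares names against an allowlist after PEP 503 name normalization. `audit` and its result labels are hypothetical; the vulnerability scan itself is left to a dedicated tool such as pip-audit. The registry lookup is passed in as a parameter so the check can be tested offline.

```python
import re
import urllib.error
import urllib.request
from typing import Callable, Iterable


def normalize(name: str) -> str:
    # PEP 503 normalization: "Flask_Login" and "flask-login" are the same project.
    return re.sub(r"[-_.]+", "-", name).lower()


def pypi_package_exists(name: str) -> bool:
    # PyPI's JSON API answers 404 for names that were never published.
    url = f"https://pypi.org/pypi/{normalize(name)}/json"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # any other HTTP error (rate limit, outage) fails the audit loudly


def audit(
    packages: Iterable[str],
    allowlist: Iterable[str],
    exists: Callable[[str], bool] = pypi_package_exists,
) -> dict[str, str]:
    """Map each package that needs attention to the reason it was flagged."""
    approved = {normalize(p) for p in allowlist}
    flagged: dict[str, str] = {}
    for raw in packages:
        name = normalize(raw)
        if name in approved:
            continue  # already vetted
        if not exists(name):
            # Likely hallucinated: installing it later invites a squatter's package.
            flagged[name] = "missing-from-registry"
        else:
            # Real, but nobody has reviewed or scanned it yet.
            flagged[name] = "not-allowlisted"
    return flagged
```

Existence alone proves little, since a squatter may already have claimed a hallucinated name. That is why a package that exists but isn't on the allowlist is still flagged for review rather than waved through.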
Step back and the individual failures share one origin. Vibe coding (prompting AI to generate code without reviewing the output) amplifies every vulnerability class. The danger isn't only that AI generates flawed code; it's that the human review layer has been deliberately removed. The model produces confident, clean-looking code, the developer trusts the confidence, and nothing checks the output against a security standard before it ships.
So the fix is not "use a better model" or "stop using AI." It's to put the layer back. AI-generated code is powerful but imperfect, and with the right scanning, process, and human review, teams get the speed without the breaches. That means automated security scanning in the pipeline as a first pass, and a senior engineer reading the diff for the judgment calls no scanner catches: auth, data isolation, and business-logic risk. The senior's role has shifted from writing the code to orchestrating and verifying it, and that role is the security control.
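If you're shipping AI-generated code, a concrete checklist closes most of the gap:
- Keep secrets in environment variables and a secrets manager, and block commits with a pre-commit secret scanner.
- Enforce parameterized queries and an input-validation layer, checked by static analysis.
- Treat auth, permissions, and data isolation as designed requirements that a senior engineer reviews directly.
- Audit every AI-suggested dependency before the commit: confirm it exists in the official registry, check it against an allowlist, and scan it.
- Run automated security scanning in the pipeline on every change.
- Have a senior engineer review every diff before it ships.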
None of this slows AI down meaningfully. It just stops the speed from compounding into a breach.
The pattern is the one under all AI building: the model produces the draft, and the system that verifies it is the product. If you want a senior team to take AI-generated code through a real security review before it reaches production, that's where our AI Dev Team work starts.


