Why AI-generated apps fail security review (and how to fix it)

Konstantin Semenenko

July 2, 2026

AI-generated apps fail security review because the model optimizes for code that works, not code that's safe, and the human review layer often gets skipped. The recurring failures are hardcoded secrets, SQL injection, missing auth, and hallucinated dependencies. The fix is not to stop using AI; it's to put back the layer vibe coding removed: automated security scanning plus a senior engineer reviewing every diff before it ships.

AI-generated apps fail security review for one structural reason: the model is rewarded for producing code that runs, and safe code and running code are not the same thing. Putting an API key directly in the file makes the code work. Moving it to an environment variable requires understanding a deployment context the model doesn't have. So it hardcodes the key, the demo works, and the secret ships to production in browser-readable JavaScript. Multiply that across auth, queries, and dependencies, and you get an app that passes the demo and fails the audit. The fix is to restore the review layer that "just prompt it" removed.

We ship AI-written code under compliance constraints, so we see exactly where generated apps break a security review. This article covers what fails most often, why it fails, and the concrete steps that close each gap. It is written for teams who want AI's speed without shipping the vulnerabilities that come with it by default.

The data is worse than most teams assume

This isn't a theoretical risk; it's measured and rising. A 2026 study testing 534 code samples across six major LLMs against the OWASP Top 10 found that 25.1% contained a confirmed security vulnerability. A separate analysis of more than 50,000 AI-generated codebases found 68% had at least one high-severity vulnerability, with an average of 4.2 security issues per project. Black Duck's 2026 OSSRA report showed mean vulnerabilities per codebase jumped 107% year over year.

The trend line is the alarming part. Georgia Tech researchers tracked CVEs directly attributable to AI-generated code rising from 6 in January 2026 to 15 in February to at least 35 in March. And there's a human factor that compounds it: studies have repeatedly found that developers using AI assistants write less secure code while reporting a false sense of security, rating insecure solutions as safe. The tool is confident, the developer trusts it, and the vulnerability ships.

Failure 1: hardcoded secrets

The most common real-world failure in vibe-coded apps is leaked credentials. Security researchers analyzing applications built on AI platforms found OpenAI API keys and database service-role keys hardcoded directly into browser-accessible client-side JavaScript, where anyone can read them. The root cause is the one above: hardcoding the key makes the code run, and the model optimizes for running.

The fix is mechanical and non-negotiable. Secrets live in environment variables and a secrets manager, never in source, and a pre-commit secret scanner blocks any key from reaching the repository. This is the single highest-value check you can add, because a leaked service-role key isn't just a bug; it's a direct path into your data.
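
As a rough illustration of both halves of that fix, here is a minimal Python sketch. The two regex rules, the `find_secrets` helper, and the `PAYMENTS_API_KEY` variable name are assumptions made up for this example; a real pre-commit scanner ships a much larger rule set, so treat this as a sketch of the idea rather than a replacement for one.

```python
import os
import re

# Illustrative rules only (assumed for this sketch); real scanners use many more.
SECRET_PATTERNS = {
    "openai_style_key": re.compile(r"sk-[A-Za-z0-9_-]{20,}"),
    "generic_assignment": re.compile(
        r"(?i)(api[_-]?key|secret|token|password)\s*[:=]\s*['\"][^'\"]{12,}['\"]"
    ),
}


def find_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line_number, rule_name) for every line that looks like a secret."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for rule_name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((line_number, rule_name))
    return hits


def load_api_key(name: str = "PAYMENTS_API_KEY") -> str:
    """Read a secret from the environment and fail loudly if it is missing."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; refusing to start without it")
    return value
```

A hook built on this would run `find_secrets` over each staged file and abort the commit on any hit, while application code calls `load_api_key()` instead of embedding the value.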

Failure 2: SQL injection and missing input validation

SQL injection remains one of the most common vulnerabilities in AI-generated code, despite being understood for decades. Models frequently build queries by string interpolation (dropping a variable straight into the SQL text) instead of using parameterized queries, because the interpolated version is shorter and works in the demo. The same pattern produces missing input validation across forms and API endpoints.

The fix is to enforce parameterized queries and an input-validation layer as a rule the code is checked against, not a thing you hope the model remembered. Static analysis catches the obvious cases; a reviewer catches the rest. The point is that "it returned the right data in testing" tells you nothing about whether a crafted input can rewrite the query.
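
To make the difference concrete, here is a small sketch using Python's built-in `sqlite3` module. The `users` table, the sample rows, and the function names are invented for illustration, and the email check is deliberately crude; the point is only that the same crafted input rewrites the interpolated query and is treated as plain data by the parameterized one.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build a throwaway in-memory database with two example users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, role TEXT)")
    conn.executemany(
        "INSERT INTO users (email, role) VALUES (?, ?)",
        [("alice@example.com", "admin"), ("bob@example.com", "member")],
    )
    return conn


def find_user_unsafe(conn: sqlite3.Connection, email: str) -> list:
    # Vulnerable: the input becomes part of the SQL text itself.
    query = f"SELECT email, role FROM users WHERE email = '{email}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, email: str) -> list:
    # Crude validation layer (an assumption for this sketch, not a full email check).
    if not isinstance(email, str) or "@" not in email or len(email) > 254:
        raise ValueError("email failed validation")
    # Parameterized: the value is bound separately and never parsed as SQL.
    return conn.execute(
        "SELECT email, role FROM users WHERE email = ?", (email,)
    ).fetchall()


if __name__ == "__main__":
    conn = make_db()
    crafted = "x@example.com' OR '1'='1"
    print(find_user_unsafe(conn, crafted))  # every row comes back
    print(find_user_safe(conn, crafted))    # no rows match
```

Both functions return the right row for `alice@example.com`, which is exactly why the unsafe one survives a demo.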

Failure 3: missing or broken authentication

Design-level auth flaws are rising fast: one analysis found a 153% increase in issues like authentication bypass and improper session management in AI-generated code. The reason is that auth is contextual. Who can do what, under which conditions, and with what session handling is exactly the business logic a model can't infer from a prompt. So it scaffolds endpoints that are publicly callable, permissions that aren't enforced, and data that isn't isolated between users.

This is the classic non-technical-founder trap: the app works when you test it as the only user, and falls apart the moment two accounts exist. The fix is to treat auth, permissions, and data isolation as a designed requirement a senior reviews directly, not a default the generator fills in. No scanner fully owns this one, because correct access control is judgment, not a pattern.
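
The two-account failure is easy to show in miniature. The sketch below is not a framework recipe; the `User` and `Invoice` types, the in-memory `INVOICES` store, and the owner-or-admin rule are all assumptions chosen for illustration. What it shows is the check a generated endpoint typically omits: confirming that the authenticated caller is allowed to see the specific record requested.

```python
from dataclasses import dataclass
from typing import Optional


class Forbidden(Exception):
    """Raised when a caller is not allowed to access a resource."""


@dataclass(frozen=True)
class User:
    id: int
    role: str  # hypothetical roles: "member" or "admin"


@dataclass(frozen=True)
class Invoice:
    id: int
    owner_id: int
    total_cents: int


# Stand-in for a database table.
INVOICES = {
    1: Invoice(id=1, owner_id=10, total_cents=4_200),
    2: Invoice(id=2, owner_id=20, total_cents=9_900),
}


def get_invoice(current_user: Optional[User], invoice_id: int) -> Invoice:
    """Return an invoice only to its owner or to an admin."""
    if current_user is None:
        raise Forbidden("authentication required")
    invoice = INVOICES.get(invoice_id)
    # Same error for "missing" and "not yours" so IDs cannot be probed.
    if invoice is None or (
        invoice.owner_id != current_user.id and current_user.role != "admin"
    ):
        raise Forbidden("invoice not found or not accessible")
    return invoice
```

Tested as the only user, a version without the ownership check behaves identically. The difference only appears when a second account asks for invoice 1, which is why this belongs in a designed requirement and a reviewer's checklist rather than in a demo.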

Failure 4: hallucinated dependencies

The newer, more insidious failure is the AI inventing packages. When asked to implement a feature, models sometimes suggest npm or pip packages that do not exist. If a developer blindly runs the install and an attacker has already registered that exact name (a technique often called slopsquatting, a close relative of typosquatting and dependency confusion), the attacker's malicious code lands in your build. This is a supply-chain attack the model opens by accident.

The fix is an automated dependency and license audit on every package an AI suggests, before the commit: verify the package actually exists in the official registry, check it against an authorized list, and scan it. The convenience of "the AI told me to install it" is exactly the convenience an attacker is counting on.
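
A minimal sketch of that audit for pip packages might look like the following. The `audit_dependency` function, its three verdict strings, and the allowlist are hypothetical names for this example; the registry lookup uses PyPI's public JSON endpoint, and the lookup is passed in as a parameter so the policy can be exercised without network access. Note that existence alone is not treated as approval, since a squatted name exists by design.

```python
import re
import urllib.error
import urllib.request
from typing import Callable

# Conservative shape check so odd input never reaches the registry URL.
VALID_NAME = re.compile(r"[a-z0-9]([a-z0-9._-]*[a-z0-9])?")


def exists_on_pypi(name: str) -> bool:
    """Check the public PyPI JSON endpoint; a 404 means no such project."""
    url = f"https://pypi.org/pypi/{name}/json"
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status == 200
    except urllib.error.HTTPError as error:
        if error.code == 404:
            return False
        raise


def audit_dependency(
    name: str,
    allowlist: set,
    registry_check: Callable[[str], bool] = exists_on_pypi,
) -> str:
    """Return 'approved', 'needs_review', or 'rejected' for a suggested package."""
    normalized = name.strip().lower().replace("_", "-")
    if not VALID_NAME.fullmatch(normalized):
        return "rejected"      # not a plausible package name
    if normalized in allowlist:
        return "approved"      # already vetted by the team
    if not registry_check(normalized):
        return "rejected"      # does not exist: likely hallucinated
    return "needs_review"      # exists, but nobody has vetted it yet
```

In a pipeline, anything other than `approved` would stop the install until a person has looked at the package, its maintainers, and its scan results.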

The real root cause: the missing review layer

Step back and the individual failures share one origin. Vibe coding (prompting AI to generate code without reviewing the output) amplifies every vulnerability class. The danger isn't only that AI generates flawed code; it's that the human review layer has been deliberately removed. The model produces confident, clean-looking code, the developer trusts the confidence, and nothing checks the output against a security standard before it ships.

So the fix is not "use a better model" or "stop using AI." It's to put the layer back. AI-generated code is powerful but imperfect, and with the right scanning, process, and human review, teams get the speed without the breaches. That means automated security scanning in the pipeline as a first pass, and a senior engineer reading the diff for the judgment calls (auth, data isolation, business-logic risk) that no scanner catches. The senior's role has shifted from writing the code to orchestrating and verifying it, and that role is the security control.
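
One way to picture the restored layer is as a merge gate with two independent conditions. The sketch below is an assumption-heavy illustration, not a description of any particular CI system: the `ChangeSet` shape, the severity labels, and the reviewer names are all invented. It only encodes the rule stated above, that a clean scan and a senior approval are both required and neither substitutes for the other.

```python
from dataclasses import dataclass, field

SENIOR_REVIEWERS = {"dana", "lee"}        # hypothetical usernames
BLOCKING_SEVERITIES = {"high", "critical"}  # assumed severity labels


@dataclass
class ChangeSet:
    # Each finding is (severity, description), e.g. ("high", "hardcoded secret").
    scanner_findings: list = field(default_factory=list)
    # Usernames of people who approved the diff.
    approvals: set = field(default_factory=set)


def may_merge(change: ChangeSet) -> tuple[bool, str]:
    """First pass: automated scan. Second pass: a senior has approved the diff."""
    blocking = [f for f in change.scanner_findings if f[0] in BLOCKING_SEVERITIES]
    if blocking:
        return False, f"{len(blocking)} blocking scanner finding(s)"
    if not (change.approvals & SENIOR_REVIEWERS):
        return False, "no senior reviewer has approved the diff"
    return True, "ok"
```

The ordering is a design choice: the scanner runs first so that a reviewer's attention goes to the judgment calls instead of to issues a tool could have flagged.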

What to actually do

If you're shipping AI-generated code, a concrete checklist closes most of the gap:

Block secrets at commit time and keep them in a secrets manager, never in source.
Enforce parameterized queries and an input-validation layer, checked by static analysis.
Treat auth, permissions, and data isolation as a designed requirement a senior reviews, not a generated default.
Audit every AI-suggested dependency against the real registry before installing.
Run automated security scanning as the first pass, and keep a human on the diff for judgment.
Never let "it passed the demo" stand in for "it passed review."

None of this slows AI down meaningfully. It just stops the speed from compounding into a breach.

The pattern is the one under all AI building: the model produces the draft, and the system that verifies it is the product. If you want a senior team to take AI-generated code through a real security review before it reaches production, that's where our AI Dev Team work starts.

managed code