AI in software engineering: the November inflection, dark factories, and prompt injection risks
Executive overview
In November 2025, coding agents crossed a reliability threshold — they went from mostly working to almost always doing what they're told. That shift is now reshaping the entire software development process, from how individuals write code to how companies structure engineering teams.
The bottleneck in software is no longer writing code; it's every other part of the process. The practical consequence is that experienced engineers are not becoming redundant — they are becoming dramatically more powerful, while mid-level engineers face the most disruption.
The November inflection and what changed
- GPT-5.1 and Claude Opus 4.5 crossed a reliability threshold in November 2025: agents went from "mostly works" to "almost always does what you asked"
- Engineers returning from the holidays realised the technology had qualitatively changed — 10,000 lines of working code per day became practical
- Code is the proving ground: it's unambiguous (it runs or it doesn't), making software the leading indicator for what happens to other knowledge work
- The shift is not just speed: it changes the nature of the work. A programmer can now make progress in two-minute prompting sessions rather than needing four-hour uninterrupted blocks
The dark factory model
- The "dark factory" pattern: software produced entirely by agents, with humans setting policy rather than writing or reading code
- StrongDM pioneered this — their rule: nobody writes code, nobody reads code
- Their QA approach: a swarm of simulated employees in a fake Slack/Jira environment, running 24 hours a day at ~$10,000/day in token costs, stress-testing access-management security software
- The simulated environment (fake Slack, Jira, Okta APIs) was itself built by coding agents from public API documentation
- Agents now produce credible security penetration testing; Anthropic helped Mozilla find 100 Firefox vulnerabilities
Who benefits and who doesn't
- Experienced engineers are amplified: 25 years of accumulated knowledge maps directly onto high-level prompting
- Interns and new engineers ramp faster — Cloudflare and Shopify hired 1,000 interns in 2025; onboarding time dropped from a month to a week
- Mid-level engineers face the most risk: they lack the deep expertise to direct agents effectively, and have already absorbed the beginner productivity gains
- Cognitive load has increased, not decreased: running four agents in parallel can leave an engineer exhausted by 11am, and sleep disruption from "agents could be working" anxiety is common
Agentic engineering patterns that work
- Code is now cheap: prototyping three versions of a feature costs almost nothing; use this to explore design space before committing
- Hoard what you know how to do: maintain a repository of solved problems and working examples; agents can search and recombine them to solve new problems
- Red/green TDD: instruct agents to write tests first, watch them fail, then implement — the phrase "red/green TDD" is enough; agents know the pattern and produce better results
- Start with a thin template: a skeleton project with one test and preferred formatting is enough for agents to replicate your style throughout; more reliable than a lengthy CLAUDE.md
- Run agents in YOLO mode on remote infrastructure: removing permission prompts unlocks true parallel work; running them in Claude Code on the web confines the blast radius to Anthropic's servers rather than your own machine
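The red/green TDD pattern above can be sketched in a few lines of Python. This is a minimal illustration, not taken from the original discussion: `slugify`, `slugify_stub`, `CASES`, and `failing_cases` are hypothetical names chosen for the example. The expectations are written first, run against a stub to confirm they fail (red), and then run against the real implementation (green).

```python
import re
from typing import Callable, List

# Red step: expectations are written before any implementation exists.
# (Hypothetical example cases for a hypothetical slugify function.)
CASES = [
    ("Hello World", "hello-world"),
    ("  Red/Green TDD!  ", "red-green-tdd"),
]


def failing_cases(candidate: Callable[[str], str]) -> List[str]:
    """Return the inputs for which the candidate implementation is wrong."""
    failures = []
    for text, expected in CASES:
        try:
            ok = candidate(text) == expected
        except NotImplementedError:
            ok = False
        if not ok:
            failures.append(text)
    return failures


def slugify_stub(title: str) -> str:
    """Placeholder used only to watch the tests fail."""
    raise NotImplementedError


def slugify(title: str) -> str:
    """Green step: the smallest implementation that satisfies CASES."""
    # Lowercase, then collapse each run of non-alphanumeric characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")


if __name__ == "__main__":
    print("red:", failing_cases(slugify_stub))   # every case should fail here
    print("green:", failing_cases(slugify))      # no case should fail here
```

Watching the red step fail first is what shows the tests actually exercise the code; a test that passes against the stub would be checking nothing.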
Prompt injection and the lethal trifecta
- Prompt injection: agents cannot distinguish between trusted instructions and malicious text embedded in data they process (emails, web pages, documents)
- The lethal trifecta: any agent system with (1) access to private data, (2) exposure to attacker-controlled text, and (3) an exfiltration channel is fundamentally vulnerable
- Detection filters reaching 97% effectiveness are still a failing grade — 3 in 100 attacks succeed, and attackers simply retry
- The normalization of deviance (the Challenger analogy): each deployment that doesn't cause a visible disaster increases institutional confidence in unsafe practices; a major public failure is likely
- Mitigation: restrict exfiltration (the easiest leg to cut); use the CaMeL architecture (a privileged agent kept separate from a quarantined one that handles untrusted text) to limit what tainted instructions can trigger; keep human approval for high-risk actions
- OpenClaw illustrates demand: hundreds of thousands of people went through a complex setup to get a personal digital assistant, despite known security failures
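Two points from the list above can be made concrete with a rough sketch: the trifecta check and the retry arithmetic behind the 97% figure. It assumes each attack attempt is independent and that the filter blocks a fixed fraction of attempts, which is a simplification. `AgentCapabilities`, `has_lethal_trifecta`, and `bypass_probability` are hypothetical names, not part of any real framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentCapabilities:
    """Hypothetical description of what one agent deployment can touch."""
    reads_private_data: bool
    sees_untrusted_text: bool
    can_send_data_out: bool


def has_lethal_trifecta(caps: AgentCapabilities) -> bool:
    """True only when all three legs of the trifecta are present."""
    return (
        caps.reads_private_data
        and caps.sees_untrusted_text
        and caps.can_send_data_out
    )


def bypass_probability(filter_block_rate: float, attempts: int) -> float:
    """Chance that at least one of `attempts` tries gets past the filter.

    Assumes independent attempts, each blocked with the same probability.
    """
    if not 0.0 <= filter_block_rate <= 1.0:
        raise ValueError("filter_block_rate must be between 0 and 1")
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    return 1.0 - filter_block_rate ** attempts


if __name__ == "__main__":
    email_agent = AgentCapabilities(True, True, True)
    # Cutting the exfiltration leg removes the trifecta.
    sandboxed = AgentCapabilities(True, True, False)
    print(has_lethal_trifecta(email_agent), has_lethal_trifecta(sandboxed))

    for n in (1, 10, 100):
        print(n, round(bypass_probability(0.97, n), 3))
```

Under these assumptions a single attempt gets through 3% of the time, but an attacker who retries 100 times gets through at least once with a probability above 95%, which is why a filter alone is treated as a failing grade.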
The bigger picture
- 50% of professional engineers writing majority-AI code is plausible by end of 2026: the technology is sufficient; the constraint is learning to use it well
- Vibe coding (not reading the code at all) is appropriate for personal tools; production software for others requires professional practices layered on top
- Prototyping is no longer a specialist skill — which raises the bar for what expertise means
- Human value concentrates in: agency (deciding what to build), judgment about quality, usability testing with real users, and the accumulated backlog of solved problems
- The economy has not yet visibly adjusted; macro signals are lagging