The agentic SDLC.
Agentic engineering is not about writing code faster. It is the discipline of designing, building, verifying, and governing a delivery system in which AI agents plan, use tools, hold state, and take bounded actions with real effects. The work moves from producing code to specifying intent, curating context, and proving control.
Agile and DevOps are necessary. They are not sufficient.
Traditional delivery assumes humans are the actors and automation is deterministic. An agentic SDLC has to manage probabilistic behaviour, tool invocation, multi-step trajectories, agent memory, model routing, and human approval points. The unit of control changes. You no longer review only code and pipeline outcomes. You also review specifications, agent trajectories, tool calls, evaluation results, approval events, and runtime traces.
- Behaviour must be governed at runtime, not only at deploy time.
- Context becomes a first-class engineering asset, not documentation beside the work.
- Platform, security, architecture, and governance become more important as autonomy rises, not less.
Turn knowledge into machine-usable context, or watch accuracy plateau.
In traditional delivery, requirements, ADRs, standards, and runbooks sit beside the SDLC as prose. In an agentic SDLC they have to become tool schemas, policies, API and data contracts, reusable specifications, evaluation datasets, and runtime rules. The teams that win will not be the ones with the cleverest prompts. They will be the ones that turn architecture, policy, and domain knowledge into reusable context and measurable controls.
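To make "machine-usable context" concrete, here is a minimal sketch of a prose standard turned into a rule a runtime could check before a tool call. Every name in it (`ToolPolicy`, `POLICIES`, `check_call`, the tool names and path prefixes) is hypothetical and chosen for illustration; it is not a reference to any particular agent framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolPolicy:
    """One rule lifted out of prose into a form a runtime can evaluate."""
    tool: str                        # hypothetical tool name
    allowed_paths: tuple[str, ...]   # path prefixes the tool may touch
    requires_approval: bool          # whether a human must approve the call


# Hypothetical policies. In practice these would be derived from ADRs,
# standards, and runbooks rather than written inline.
POLICIES = {
    "write_file": ToolPolicy("write_file", ("src/", "tests/"), False),
    "run_migration": ToolPolicy("run_migration", ("migrations/",), True),
}


def check_call(tool: str, path: str) -> str:
    """Return 'allow', 'needs_approval', or 'deny' for a proposed tool call."""
    policy = POLICIES.get(tool)
    if policy is None:
        return "deny"  # unknown tools are denied by default
    if not path.startswith(policy.allowed_paths):
        return "deny"  # path is outside the prefixes this tool may touch
    return "needs_approval" if policy.requires_approval else "allow"
```

The point of the sketch is the shape, not the rules themselves: a standard that lives only as prose cannot be evaluated on every call, while a structured one can.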
From human-led to governed agentic delivery.
This is a ladder, not a switch. Each rung adds autonomy, and with it the need for standard context, observable traces, and policy automation. Most enterprises should expect to live across several rungs at once.
- Conventional team delivery with scripts and CI/CD.
- Engineers use assistants for code, explanation, and test ideas.
- An engineer delegates a bounded task to one repo-aware agent.
- Planner, coder, tester, and reviewer agents under one operator.
- Agents take part across product, architecture, QA, security, and platform.
- Agents pick up bounded backlog items and produce pull requests.
- A standardised, auditable, cost-aware operating model across teams.
The roles do not disappear. They move up the value chain.
The career trap for engineers is over-indexing on prompt cleverness while under-investing in domain knowledge and verification. For architects, it is producing elegant documents that no agent can execute. For governance teams, it is treating agentic delivery as a tool-exception process instead of a new operating model.
Software engineers
QA and test
Data and ML
Platform / SRE
Security
Architects
Match the controls to the autonomy. Tier deliberately.
The baseline should be risk-based enablement, not blanket prohibition. Accountability, traceability, oversight, and lifecycle risk management are the governing ideas. The practical expression is an autonomy tier with a minimum set of controls attached to each.
Coding assistance: IDE suggestions, explanations, doc drafting
- Approved models
- Acceptable-use rules
- Source and licence policy
- Telemetry
- Optional provenance checks
Repository agents: branch-limited changes, tests, draft PRs
- Sandboxed execution
- Repo-scoped credentials
- Static analysis and secret scanning
- Mandatory human review
- Trace retention
Autonomous PRs: agents prepare merge-ready PRs from backlog items
- Explicit eligibility criteria
- Eval pass gates
- Cost and runtime caps
- Policy checks
- Rollback readiness and reviewer accountability
Production-impacting: agents can change infra, workflows, or service state
- Just-in-time access
- Dual control
- Hard allow-lists
- Runtime mediation
- Audit-grade logging and a kill switch
Customer or regulated data: service workflows, personalised assistants
- Data classification
- PII controls and privacy-enhancing techniques
- Retention rules
- Vendor due diligence
- Legal review
High-risk regulatory domains: employment, credit, education, critical infrastructure
- Full risk-management system
- Logging and technical documentation
- Human oversight
- Robustness and accuracy controls
- Post-market monitoring
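The tiers above can be held as data rather than as a slide. The sketch below is one possible encoding, assuming each tier's controls are checked on their own (the text does not say whether higher tiers inherit lower ones). The identifiers (`Tier`, `MINIMUM_CONTROLS`, `missing_controls`) and the snake_case control names are illustrative, and "optional provenance checks" is left out of the minimum set because it is marked optional.

```python
from enum import Enum


class Tier(Enum):
    CODING_ASSISTANCE = "coding_assistance"
    REPOSITORY_AGENTS = "repository_agents"
    AUTONOMOUS_PRS = "autonomous_prs"
    PRODUCTION_IMPACTING = "production_impacting"
    CUSTOMER_OR_REGULATED_DATA = "customer_or_regulated_data"
    HIGH_RISK_REGULATORY = "high_risk_regulatory"


# Minimum controls per tier, transcribed from the lists above.
MINIMUM_CONTROLS: dict[Tier, set[str]] = {
    Tier.CODING_ASSISTANCE: {
        "approved_models", "acceptable_use_rules",
        "source_and_licence_policy", "telemetry",
    },
    Tier.REPOSITORY_AGENTS: {
        "sandboxed_execution", "repo_scoped_credentials", "static_analysis",
        "secret_scanning", "mandatory_human_review", "trace_retention",
    },
    Tier.AUTONOMOUS_PRS: {
        "explicit_eligibility_criteria", "eval_pass_gates",
        "cost_and_runtime_caps", "policy_checks",
        "rollback_readiness", "reviewer_accountability",
    },
    Tier.PRODUCTION_IMPACTING: {
        "just_in_time_access", "dual_control", "hard_allow_lists",
        "runtime_mediation", "audit_grade_logging", "kill_switch",
    },
    Tier.CUSTOMER_OR_REGULATED_DATA: {
        "data_classification", "pii_controls", "retention_rules",
        "vendor_due_diligence", "legal_review",
    },
    Tier.HIGH_RISK_REGULATORY: {
        "full_risk_management_system", "logging_and_technical_documentation",
        "human_oversight", "robustness_and_accuracy_controls",
        "post_market_monitoring",
    },
}


def missing_controls(tier: Tier, in_place: set[str]) -> set[str]:
    """Controls the tier requires that the team has not yet evidenced."""
    return MINIMUM_CONTROLS[tier] - in_place


def may_operate(tier: Tier, in_place: set[str]) -> bool:
    """A workflow may run at a tier only when nothing is missing."""
    return not missing_controls(tier, in_place)
```

Keeping the mapping in one reviewed place means a request to raise a workflow's autonomy becomes a diff against evidence, not a debate.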
Field note: the agent plans, the human applies. A plan reads. An apply costs money and changes access. Keep apply authority structurally out of agent reach, not merely discouraged.
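"Structurally out of reach" can be illustrated with a small sketch in which the agent's tool surface has no apply operation at all. The class and method names (`AgentTools`, `HumanOperator`, `Plan`) are hypothetical. In a real system the separation would be enforced by credentials and access policy, not by Python classes; the code only shows the shape.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """A read-only description of intended changes."""
    summary: str
    changes: tuple[str, ...]


class AgentTools:
    """Everything the agent can call. There is deliberately no apply here."""

    def plan(self, request: str) -> Plan:
        # Describes what would change; executes nothing.
        return Plan(summary=request, changes=(f"would change: {request}",))


class HumanOperator:
    """Held by a person and never handed to the agent runtime."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.applied: list[Plan] = []

    def apply(self, plan: Plan) -> str:
        # The only place a plan turns into an effect, with a named owner.
        self.applied.append(plan)
        return f"{self.name} applied: {plan.summary}"
```

An agent holding only `AgentTools` cannot be talked into applying anything, because the capability is absent rather than forbidden.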
Route to the cheapest model that clears the bar, and measure outcomes.
The architecture is hybrid. Small local or low-cost models handle retrieval, classification, summarisation, templating, and narrow transforms. Frontier models earn their cost on ambiguous cross-file reasoning, architecture-sensitive changes, and hard bug fixing. The routing rule is simple: choose the cheapest model that meets the quality threshold for the task class, under an enforced review-burden budget. Routing done this way is intentional and observable, which makes it governance as well as cost control.
Local SLM or low-cost hosted model
Retrieval, classification, summarisation, templating, and narrow transforms do not need a frontier model.
Controls that come with it
- Approved-model list
- Telemetry
Routing rule: choose the cheapest model that clears the quality bar for the task class, under an enforced review-burden budget.
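The routing rule can be written down directly. The sketch below assumes you already have, per model and task class, a measured quality score and a measured human review time; the model names, costs, and scores are invented for illustration, and `route` is a hypothetical helper rather than part of any routing product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelOption:
    name: str
    cost_per_task: float               # assumed unit: currency per completed task
    quality: dict[str, float]          # eval score per task class, 0..1
    review_minutes: dict[str, float]   # human review time per task, by task class


def route(task_class: str, options: list[ModelOption],
          quality_bar: float, review_budget_minutes: float) -> Optional[ModelOption]:
    """Cheapest model that clears the quality bar within the review budget."""
    eligible = [
        m for m in options
        # A model with no measurement for this task class is never eligible.
        if m.quality.get(task_class, 0.0) >= quality_bar
        and m.review_minutes.get(task_class, float("inf")) <= review_budget_minutes
    ]
    return min(eligible, key=lambda m: m.cost_per_task, default=None)


# Hypothetical measurements, for illustration only.
OPTIONS = [
    ModelOption("local-slm", 0.01,
                {"summarisation": 0.92, "cross_file_reasoning": 0.55},
                {"summarisation": 2.0, "cross_file_reasoning": 30.0}),
    ModelOption("frontier", 0.40,
                {"summarisation": 0.95, "cross_file_reasoning": 0.88},
                {"summarisation": 2.0, "cross_file_reasoning": 10.0}),
]
```

With these numbers, `route("summarisation", OPTIONS, 0.9, 5)` selects `local-slm` and `route("cross_file_reasoning", OPTIONS, 0.8, 15)` selects `frontier`. Returning `None` when nothing qualifies is a design choice: the task escalates to a human instead of silently running below the bar.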
Do not run the programme on a single productivity number. Track delivery flow, quality and reliability, safety and security, and economics together. The advantage comes less from how much code you generate and more from how well you specify, verify, govern, and improve a mixed human and agent system.
- Cycle and lead time
- PR review time
- Deployment frequency
- Time to first draft PR
- Change failure rate
- Mean time to restore
- Escaped defect rate
- CI success for agent PRs
- Unsafe-action block rate
- Policy violation rate
- Grounding failure rate
- Prompt-injection detection
- Red-team pass rate
- Cost per completed task
- Cost per accepted PR
- Token spend by workflow
- Cache-hit and context reuse
- Developer satisfaction
- Template adoption
- Evaluation coverage
- Onboarding time
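A few of these measures reduce to simple ratios over counts you probably already collect. The sketch below computes three of them together so no single number stands alone. The definitions are assumptions stated in the comments, and `PeriodCounts` and `scorecard` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PeriodCounts:
    """Raw counts for one reporting period."""
    deployments: int
    failed_deployments: int
    agent_prs_accepted: int
    agent_spend: float               # total model and tooling spend on agent work
    unsafe_actions_attempted: int
    unsafe_actions_blocked: int


def ratio(numerator: float, denominator: float) -> Optional[float]:
    """Return numerator / denominator, or None when there is nothing to divide by."""
    return numerator / denominator if denominator else None


def scorecard(c: PeriodCounts) -> dict[str, Optional[float]]:
    return {
        # Assumed definition: share of deployments that caused a failure.
        "change_failure_rate": ratio(c.failed_deployments, c.deployments),
        # Assumed definition: agent spend divided by PRs that were accepted.
        "cost_per_accepted_pr": ratio(c.agent_spend, c.agent_prs_accepted),
        # Assumed definition: share of attempted unsafe actions that were blocked.
        "unsafe_action_block_rate": ratio(c.unsafe_actions_blocked,
                                          c.unsafe_actions_attempted),
    }
```

Reporting `None` rather than zero for an empty denominator keeps "no data yet" distinct from "no failures".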
The short version, for people who sign things off.
- What is genuinely different from agile and DevOps? Agents add probabilistic, multi-step, tool-using behaviour that must be specified, evaluated, traced, and governed at runtime.
- What is safely augmentable now? Test generation, documentation, code explanation, refactoring, issue triage, conformance checking, and draft PRs under review.
- What stays human-led? Risk acceptance, requirements arbitration, architecture trade-offs, incident command, regulatory interpretation, and final accountability.
- What should you do first? Standardise tools, define autonomy tiers, build context packs and eval suites, instrument traces and cost, and run a few bounded pilots.
And the one rule that survives every domain: do not give agents broad infrastructure permissions, unrestricted customer data, or merge and deploy rights until you can trace, review, reproduce, and roll back their work consistently.
The thinking, in order.
This page is the map. The posts below are the territory. More open up as the work gets done.