/ THE THESIS

The agentic SDLC.

Agentic engineering is not about writing code faster. It is the discipline of designing, building, verifying, and governing a delivery system in which AI agents plan, use tools, hold state, and take bounded actions with real effects. The work moves from producing code to specifying intent, curating context, and proving control.


01 · WHAT ACTUALLY CHANGES

Agile and DevOps are necessary. They are not sufficient.

Traditional delivery assumes humans are the actors and automation is deterministic. An agentic SDLC has to manage probabilistic behaviour, tool invocation, multi-step trajectories, agent memory, model routing, and human approval points. The unit of control changes. You no longer review only code and pipeline outcomes. You also review specifications, agent trajectories, tool calls, evaluation results, approval events, and runtime traces.

Behaviour must be governed at runtime, not only at deploy time.
Context becomes a first-class engineering asset, not documentation beside the work.
Platform, security, architecture, and governance get more important as autonomy rises, not less.

Humans own intent and acceptance. Agents own bounded execution. The platform owns control and evidence.
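One way to picture runtime governance is a mediator that every tool call passes through before it executes. The sketch below is illustrative only: `ToolCall`, `Mediator`, the tool names, and the decision labels are hypothetical, not part of any specific platform. It shows the platform owning control (allow-list and approval check) and evidence (a trace entry for every call, executed or not).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class ToolCall:
    agent: str
    tool: str
    args: dict


@dataclass
class Mediator:
    """Hypothetical runtime gate: every tool call is checked and traced."""
    allowed_tools: dict   # agent name -> set of tool names it may call
    needs_approval: set   # tools that require a named human approver
    trace: list = field(default_factory=list)

    def invoke(self, call: ToolCall, tools: dict, approved_by: Optional[str] = None):
        if call.tool not in self.allowed_tools.get(call.agent, set()):
            decision = "blocked"
        elif call.tool in self.needs_approval and approved_by is None:
            decision = "awaiting_approval"
        else:
            decision = "executed"
        # Record the evidence whether or not the call ran.
        self.trace.append({"agent": call.agent, "tool": call.tool,
                           "decision": decision, "approved_by": approved_by})
        result = tools[call.tool](**call.args) if decision == "executed" else None
        return decision, result


# Illustrative tools and policy (all names are assumptions).
tools = {"read_file": lambda path: f"contents of {path}",
         "open_pr": lambda title: f"PR: {title}"}
mediator = Mediator(allowed_tools={"coder": {"read_file", "open_pr"}},
                    needs_approval={"open_pr"})

print(mediator.invoke(ToolCall("coder", "read_file", {"path": "a.py"}), tools))
print(mediator.invoke(ToolCall("coder", "open_pr", {"title": "Fix bug"}), tools))
```

The design choice worth noting is that a blocked or pending call still produces a trace entry, so the review surface includes what the agent tried to do, not only what it did.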

02 · CONTEXT IS INFRASTRUCTURE

Turn knowledge into machine-usable context, or watch accuracy plateau.

In traditional delivery, requirements, ADRs, standards, and runbooks sit beside the SDLC as prose. In an agentic SDLC they have to become tool schemas, policies, API and data contracts, reusable specifications, evaluation datasets, and runtime rules. The teams that win will not be the ones with the cleverest prompts. They will be the ones that turn architecture, policy, and domain knowledge into reusable context and measurable controls.
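As a minimal sketch of what turning prose into machine-usable context could look like, the example below encodes a single architecture standard as data plus a check that an agent or CI step can run. The rule, its identifier `ADR-EXAMPLE-1`, and the layer names are hypothetical; a real context pack would carry many such rules alongside schemas and evaluation data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchitectureRule:
    rule_id: str
    description: str       # the original prose, kept for human readers
    forbidden: frozenset   # set of (source_layer, target_layer) pairs


def violations(rule, dependencies):
    """Return one message per dependency edge that breaks the rule."""
    return [f"{rule.rule_id}: {src} -> {dst}"
            for src, dst in dependencies if (src, dst) in rule.forbidden]


RULE = ArchitectureRule(
    rule_id="ADR-EXAMPLE-1",  # hypothetical identifier
    description="UI code must not depend on the data layer directly.",
    forbidden=frozenset({("ui", "data")}),
)

# Dependency edges a proposed change would introduce (illustrative).
proposed = [("ui", "service"), ("service", "data"), ("ui", "data")]
print(violations(RULE, proposed))  # ['ADR-EXAMPLE-1: ui -> data']
```

The same record serves both audiences: the description stays readable, while the `forbidden` pairs make the standard a measurable control.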

03 · THE MATURITY LADDER

From human-led to governed agentic delivery.

This is a ladder, not a switch. Each rung adds autonomy, and with it the need for standard context, observable traces, and policy automation. Most enterprises should expect to live across several rungs at once.

Conventional team delivery with scripts and CI/CD.

Human owns: All design, coding, review, release, incident response.
Agent owns: Minimal code completion, or none.
Governance: Existing SDLC and DevSecOps.
Risk / value: Lowest agent risk; limited productivity upside.
Practical next step: Establish baseline metrics.

Engineers use assistants for code, explanation, and test ideas.

Human owns: Validate outputs, own all decisions.
Agent owns: Suggest snippets, explanations, tests.
Governance: Tool approval and acceptable-use controls.
Risk / value: Strong local productivity upside; low systemic change.
Practical next step: Standardise approved assistants.

An engineer delegates a bounded task to one repo-aware agent.

Human owns: Task framing, review, merge decision.
Agent owns: Multi-step task completion within one context.
Governance: Branch isolation, human review, logging.
Risk / value: Useful for contained tasks; moderate review burden.
Practical next step: Start with draft PR workflows.

Planner, coder, tester, and reviewer agents under one operator.

Human owns: Orchestration, escalation, final responsibility.
Agent owns: Specialist task decomposition and tool use.
Governance: Stronger tracing, policy checks, sandboxing.
Risk / value: Higher throughput and higher failure complexity.
Practical next step: Build eval harnesses and specialist roles.

Agents take part across product, architecture, QA, security, and platform.

Human owns: Cross-functional approvals and conflict resolution.
Agent owns: Assist across multiple workstreams.
Governance: Shared context, role-based permissions, audit.
Risk / value: Bigger leverage, greater coordination risk.
Practical next step: Add context repositories and a command model.

Agents pick up bounded backlog items and produce pull requests.

Human owns: Approve eligibility, review, merge, roll back.
Agent owns: Implement, test, document, respond to review.
Governance: Strict autonomy tiering, CI gates, cost caps.
Risk / value: High value on repetitive work; non-trivial failure and spend risk.
Practical next step: Pilot in low-risk repositories only.

A standardised, auditable, cost-aware operating model across teams.

Human owns: Set policy, manage risk, accept accountability.
Agent owns: Execute bounded work within policy and budget.
Governance: Continuous assurance, policy-as-code, runtime mediation.
Risk / value: Highest leverage and highest organisational complexity.
Practical next step: Institutionalise platform, evals, and governance.

04 · ROLES MOVE

The roles do not disappear. They move up the value chain.

The career trap for engineers is over-indexing on prompt cleverness while under-investing in domain knowledge and verification. For architects, it is producing elegant documents that no agent can execute. For governance teams, it is treating agentic delivery as a tool-exception process instead of a new operating model.

Software engineers

Most augmented: Boilerplate, tests, refactors, repo exploration
Still human-led: Problem framing, trade-offs, accountable design
Rising skills: Decomposition, spec quality, review judgement
What good looks like: Direct agents well, verify outputs, own intent and quality.

QA and test

Most augmented: Test generation, regression, trace analysis
Still human-led: Test strategy, oracle design, risk sign-off
Rising skills: Eval design, simulation, adversarial testing
What good looks like: Own the evaluation system, not just the test scripts.

Data and ML

Most augmented: Pipeline scaffolding, transforms, experiment bookkeeping
Still human-led: Data quality, feature validity, lineage decisions
Rising skills: Data contracts, lineage, reproducibility
What good looks like: Curate trustworthy data and model pipelines.

Platform / SRE

Most augmented: Templates, environments, release automation
Still human-led: Deploy policy, resilience, incident command
Rising skills: Golden paths, secret mediation, cost control
What good looks like: Build the paved roads agents can safely use.

Security

Most augmented: Static analysis, dependency review, policy checks
Still human-led: Threat modelling, risk acceptance, investigation
Rising skills: AI threat modelling, runtime controls, supply-chain governance
What good looks like: Shift from gatekeeper to control-system designer.

Architects

Most augmented: Diagram drafts, standards lookup, consistency checks
Still human-led: Boundary design, roadmaps, strategic trade-offs
Rising skills: Context curation, executable constraints, socio-technical design
What good looks like: Move from approval gate to constraint designer and system steward.

05 · GOVERNANCE IS THE FEATURE

Match the controls to the autonomy. Tier deliberately.

The baseline should be risk-based enablement, not blanket prohibition. Accountability, traceability, oversight, and lifecycle risk management are the governing ideas. The practical expression is a set of autonomy tiers, each with a minimum set of controls attached.

Coding assistance: IDE suggestions, explanations, doc drafting

Approved models
Acceptable-use rules
Source and licence policy
Telemetry
Optional provenance checks

Repository agents: Branch-limited changes, tests, draft PRs

Sandboxed execution
Repo-scoped credentials
Static analysis and secret scanning
Mandatory human review
Trace retention

Autonomous PRs: Agents prepare merge-ready PRs from backlog items

Explicit eligibility criteria
Eval pass gates
Cost and runtime caps
Policy checks
Rollback readiness and reviewer accountability

Production-impacting: Agents can change infra, workflows, or service state

Just-in-time access
Dual control
Hard allow-lists
Runtime mediation
Audit-grade logging and a kill switch

Customer or regulated data: Service workflows, personalised assistants

Data classification
PII controls and privacy-enhancing techniques
Retention rules
Vendor due diligence
Legal review

High-risk regulatory domains: Employment, credit, education, critical infrastructure

Full risk-management system
Logging and technical documentation
Human oversight
Robustness and accuracy controls
Post-market monitoring
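The tiering idea can be sketched as a lookup from tier to minimum controls, with a check that reports what is still missing before a tier is enabled. This is a sketch under assumptions: the identifiers are hypothetical renderings of the controls listed above, only the first four tiers are shown for brevity, the optional provenance check is left out of the minimum set, and the tiers are treated as independent sets because the text does not say whether they accumulate.

```python
# Minimum controls per autonomy tier (identifier spellings are assumptions).
MINIMUM_CONTROLS = {
    "coding_assistance": {
        "approved_models", "acceptable_use_rules",
        "source_and_licence_policy", "telemetry",
    },
    "repository_agents": {
        "sandboxed_execution", "repo_scoped_credentials",
        "static_analysis_and_secret_scanning",
        "mandatory_human_review", "trace_retention",
    },
    "autonomous_prs": {
        "explicit_eligibility_criteria", "eval_pass_gates",
        "cost_and_runtime_caps", "policy_checks",
        "rollback_readiness", "reviewer_accountability",
    },
    "production_impacting": {
        "just_in_time_access", "dual_control", "hard_allow_lists",
        "runtime_mediation", "audit_grade_logging", "kill_switch",
    },
}


def missing_controls(tier, controls_in_place):
    """Return the controls still required before the tier may be enabled."""
    return sorted(MINIMUM_CONTROLS[tier] - set(controls_in_place))


def may_enable(tier, controls_in_place):
    """A tier is enabled only when nothing from its minimum set is missing."""
    return not missing_controls(tier, controls_in_place)


in_place = {"sandboxed_execution", "mandatory_human_review"}
print(missing_controls("repository_agents", in_place))
print(may_enable("repository_agents", in_place))  # False
```

Returning the missing controls, rather than a bare yes or no, keeps the answer actionable for the team asking for more autonomy.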

Field note: the agent plans, the human applies. A plan only reads. An apply costs money and changes access. Keep apply authority structurally out of the agent's reach, not merely discouraged.
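A minimal sketch of the plan/apply split, assuming hypothetical `Planner` and `Applier` classes: the agent is handed an object that can only produce plans, while the object that can apply them lives with the human-operated pipeline and demands a named approver. In a real system the separation would be enforced by credentials and infrastructure boundaries; Python objects only illustrate the shape.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    description: str
    actions: tuple


class Planner:
    """The only capability handed to the agent: it produces plans, nothing else."""

    def plan(self, description, actions):
        return Plan(description, tuple(actions))


class Applier:
    """Held by the human-operated pipeline; never passed to agent code."""

    def __init__(self):
        self.applied = []

    def apply(self, plan, approved_by):
        if not approved_by:
            raise PermissionError("apply requires a named human approver")
        self.applied.append((plan, approved_by))
        return f"applied {len(plan.actions)} action(s), approved by {approved_by}"


def run_agent(planner):
    # The agent receives a Planner and has no reference to an Applier.
    return planner.plan("rotate staging key", ["create key", "update secret ref"])


plan = run_agent(Planner())
print(Applier().apply(plan, approved_by="alice"))
```

The point of the structure is that there is no code path from `run_agent` to `apply`: the agent cannot be talked into applying because it was never given the means.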

06 · TOOLING AND ECONOMICS

Route to the cheapest model that clears the bar, and measure outcomes.

The architecture is hybrid. Small local or low-cost models handle retrieval, classification, summarisation, templating, and narrow transforms. Frontier models earn their cost on ambiguous cross-file reasoning, architecture-sensitive changes, and hard bug fixing. The routing rule is simple: choose the cheapest model that meets the quality threshold for the task class, under an enforced review-burden budget. Routing done this way is intentional and observable, which makes it a governance control as well as a cost control.

Routing inputs: data sensitivity and complexity.

Recommended model: Local SLM or low-cost hosted model. Retrieval, classification, summarisation, templating, and narrow transforms do not need a frontier model.

Controls that come with it: approved-model list, telemetry.
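The routing rule can be sketched in a few lines. Everything concrete here is an assumption for illustration: the model names, costs, and quality figures are invented, quality is taken to be a measured eval pass rate per task class, and the review-burden budget is interpreted as a cap on expected human review minutes per task.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str
    cost_per_task: float
    quality: dict          # task class -> measured eval pass rate (0..1)
    review_minutes: dict   # task class -> expected human review minutes


def route(task_class, options, quality_bar, review_budget_minutes):
    """Pick the cheapest option that clears the bar within the review budget."""
    eligible = [
        m for m in options
        if m.quality.get(task_class, 0.0) >= quality_bar
        and m.review_minutes.get(task_class, float("inf")) <= review_budget_minutes
    ]
    # None means no model qualifies: escalate to a human instead of guessing.
    return min(eligible, key=lambda m: m.cost_per_task, default=None)


# Illustrative figures only.
options = [
    ModelOption("local-slm", 0.01,
                {"classification": 0.97, "cross_file_bugfix": 0.40},
                {"classification": 1, "cross_file_bugfix": 30}),
    ModelOption("frontier", 0.60,
                {"classification": 0.99, "cross_file_bugfix": 0.85},
                {"classification": 1, "cross_file_bugfix": 10}),
]

print(route("classification", options, 0.95, 5).name)      # local-slm
print(route("cross_file_bugfix", options, 0.80, 15).name)  # frontier
```

A model with no recorded quality for a task class is treated as failing the bar, so an unmeasured model is never routed to by default.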

Do not run the programme on a single productivity number. Track delivery flow, quality and reliability, safety and security, and economics together. The advantage comes less from how much code you generate and more from how well you specify, verify, govern, and improve a mixed human and agent system.

25% · Delivery flow

Cycle and lead time
PR review time
Deployment frequency
Time to first draft PR

Quality and reliability

Change failure rate
Mean time to restore
Escaped defect rate
CI success for agent PRs

Safety and security

Unsafe-action block rate
Policy violation rate
Grounding failure rate
Prompt-injection detection
Red-team pass rate

Economics

Cost per completed task
Cost per accepted PR
Token spend by workflow
Cache-hit and context reuse

Developer satisfaction
Template adoption
Evaluation coverage
Onboarding time
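Two of the measures above reduce to simple ratios, sketched here with hypothetical record shapes (`cost`, `accepted`, `failed` are assumed field names). One design choice is made explicit: cost per accepted PR divides total spend, including spend on rejected attempts, by the number of accepted PRs, so wasted runs show up in the figure.

```python
def cost_per_accepted_pr(runs):
    """Total agent spend divided by the number of accepted PRs."""
    accepted = sum(1 for r in runs if r["accepted"])
    if accepted == 0:
        return None  # undefined until at least one PR is accepted
    return sum(r["cost"] for r in runs) / accepted


def change_failure_rate(deployments):
    """Share of deployments that caused a failure in production."""
    if not deployments:
        return None
    return sum(1 for d in deployments if d["failed"]) / len(deployments)


# Illustrative data only.
runs = [
    {"cost": 1.5, "accepted": True},
    {"cost": 0.5, "accepted": False},
    {"cost": 2.0, "accepted": True},
]
deployments = [{"failed": False}, {"failed": True},
               {"failed": False}, {"failed": False}]

print(cost_per_accepted_pr(runs))        # 2.0
print(change_failure_rate(deployments))  # 0.25
```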

07 · ANSWERS FOR LEADERS

The short version, for people who sign things off.

What is genuinely different from agile and DevOps? Agents add probabilistic, multi-step, tool-using behaviour that must be specified, evaluated, traced, and governed at runtime.
What is safely augmentable now? Test generation, documentation, code explanation, refactoring, issue triage, conformance checking, and draft PRs under review.
What stays human-led? Risk acceptance, requirements arbitration, architecture trade-offs, incident command, regulatory interpretation, and final accountability.
What should you do first? Standardise tools, define autonomy tiers, build context packs and eval suites, instrument traces and cost, and run a few bounded pilots.

And the one rule that survives every domain: do not give agents broad infrastructure permissions, unrestricted customer data, or merge and deploy rights until you can trace, review, reproduce, and roll back their work consistently.

08 · THE THREAD

The thinking, in order.

This page is the map. The posts below are the territory. More open up as the work gets done.

Factory Notes · Build log · 20 Jun 2026 · How we built this site with Hekton
