The Best Engineer on Your Team Might Be Writing Less Code Than Everyone Else
There is an uncomfortable conversation happening in engineering leadership circles right now, and most organizations are not having it openly enough. It goes roughly like this: we have deployed AI coding assistants across the team, commit velocity has increased, lines of code per engineer are up, pull request volume has climbed, and yet something feels off. The engineers who seem most thoughtful about what they are building are not necessarily the ones driving those metrics. Meanwhile, some of the most active committers are shipping code that requires repeated revision, creates architectural debt, or solves problems that did not need solving in the first place.

This is not a new tension. Engineering leadership has always struggled with the difference between activity and output, and between output and value. But AI coding tools have compressed that struggle into a much sharper and more immediate form. When a tool can generate a working implementation in minutes, the act of writing code stops being the bottleneck. And when writing code stops being the bottleneck, the metrics we built around code production start telling us less and less about who our best engineers actually are.
The question this creates for engineering organizations is genuinely difficult: if AI can generate code, review code, write tests, explain APIs, and implement features, how do you know who your best engineers are? How do you evaluate performance fairly? How do you avoid accidentally rewarding the wrong behaviors at exactly the moment when getting this right matters most?
This article is an attempt to think through that question seriously.
How We Ended Up Measuring What Is Easy to Count
Before examining what metrics should look like in an AI-assisted world, it is worth understanding why the current ones exist and why they became so entrenched.
Lines of code was never a serious productivity metric among thoughtful engineers, but it persisted in management circles for decades because it is objective and easy to collect. Commit counts and pull request volume entered the picture with the rise of distributed version control and became proxies for engineering activity because activity is visible and value is not. Sprint velocity emerged from agile practices as a team-level planning tool but migrated, usually against the explicit intentions of the people who designed agile, into an individual performance signal. Hours worked has been a proxy for dedication in every knowledge work discipline, and software engineering was not immune to that instinct.
None of these metrics are completely without meaning. An engineer who never commits anything is probably not doing engineering work. An engineer who never closes any pull requests is probably not delivering software. But these metrics measure the process of software production, not the value of software. They measure motion, not direction.
For a long time, that imprecision was tolerable because writing code was actually the primary bottleneck in software delivery. An engineer who could write more code, more quickly, with acceptable quality, genuinely was more productive than one who wrote less. The correlation between code production and value creation was loose, but it was real enough to make code production metrics roughly useful.
AI coding tools have broken that correlation. Not gradually, but quite suddenly. An engineer who uses Claude Code, Cursor, or GitHub Copilot effectively can produce several times as many lines of code as they could without those tools, perhaps as much as ten times. That multiplier applies to good engineers and mediocre engineers alike. It applies to engineers solving hard problems and engineers generating code that should not exist. The signal-to-noise ratio in code production metrics has dropped dramatically, and most engineering organizations have not updated their mental models or their evaluation frameworks to account for this.
The Seductive Trap of Measuring AI Usage
When organizations realize that traditional metrics are losing meaning, the first instinct is often to measure something new. In the current moment, that often means measuring AI usage. How many Copilot suggestions did the engineer accept? How often are they using Claude Code? What percentage of their commits include AI-generated content?
This impulse is understandable and almost entirely wrong.
Measuring AI usage as a proxy for good engineering creates a set of incentives that are actively harmful. It rewards the superficial adoption of tools without regard to whether those tools are being used thoughtfully. It penalizes engineers who make deliberate, considered choices about when AI assistance is appropriate and when it is not. It mistakes the use of an instrument for the quality of the music.
Consider two engineers. The first uses an AI coding assistant for nearly every task. They accept suggestions quickly, rarely modify them substantially, ship a high volume of features, and are enthusiastic about tool adoption. The second uses AI assistance selectively, often preferring to write critical or architecturally significant code themselves, using AI primarily for tests, documentation, and scaffolding. If you measure AI usage as a positive signal, the first engineer looks more progressive. But the second engineer might be making better engineering decisions, and the code they deliver might be significantly more maintainable and reliable.
The deeper problem is that AI usage metrics, like all activity metrics, measure inputs rather than outcomes. You can generate an enormous amount of AI-assisted code that creates negative value: code that is technically correct but architecturally wrong, code that solves problems in ways that create future maintenance burdens, code that implements features nobody needs. You can also generate very little code that is transformatively valuable, because you spent your time understanding the problem correctly before writing a line.
Measuring AI usage is the new version of measuring lines of code. It tells you something about activity. It tells you almost nothing about value.
Code Production and Problem Solving Are Not the Same Thing
There is a foundational confusion embedded in how most organizations think about software engineering, and it is worth naming explicitly: writing code is a means to an end, not the end itself. Software engineering is problem solving. Code is the artifact that results from problem solving applied to software. Those two things are related but not identical, and conflating them has always caused organizational dysfunction.
Consider what actually happens in a high-quality engineering engagement with a complex problem. An engineer spends time understanding the problem domain. They ask clarifying questions. They read existing code to understand the current system. They think about failure modes. They consider alternative approaches and make tradeoffs between them. They prototype and discard. They consult with colleagues. They check their understanding of requirements against what the business actually needs. Then, and often only then, they write code.
In that workflow, the code-writing step might represent twenty percent of the total time invested in solving the problem well. The other eighty percent is problem framing, system understanding, design thinking, and judgment. AI tools can significantly accelerate the code-writing step. They can help with the code-reading step. They can assist with some aspects of prototyping. But the judgment-intensive, context-dependent, domain-specific work of correctly framing and scoping a problem is not something current AI tools do well. That work still requires a human engineer, and it is the work that determines whether the resulting code actually solves the right problem.

The engineers who understand this distinction are the ones who use AI tools most effectively. They recognize that AI can compress the time spent on implementation once a problem is correctly understood, which frees them to spend more time on the understanding itself. The engineers who miss this distinction use AI tools to generate code faster without investing more in understanding, and they ship solutions to problems they did not fully analyze.
This is the fundamental difference between an AI-amplified engineer and an AI-dependent one, and it is a distinction that no activity metric, AI-based or otherwise, can directly measure.
AI Leverage as an Engineering Skill
There is a skill emerging in software engineering that does not have an established name yet, though people are beginning to use the term AI leverage. It is the ability to use AI tools in ways that multiply your effectiveness as an engineer, rather than substituting for engineering judgment.
AI leverage is not the same as being good at prompting, though effective prompting is one component. It is a broader capability that includes knowing when to reach for an AI tool and when not to, how to decompose a complex engineering task so that AI can help with appropriate pieces of it, how to critically evaluate AI-generated code rather than accepting it uncritically, how to use AI assistance to explore a design space more quickly, and how to integrate AI-generated components into a larger system that you are responsible for understanding holistically.
This skill acts as a force multiplier in a way that other engineering skills do not. An engineer with strong AI leverage who is also technically strong can cover ground that would previously have required a larger team. They can prototype faster, explore more options, write more comprehensive tests, and document more thoroughly, not because they are generating more work product, but because AI assistance has freed them from the mechanical aspects of those tasks. They bring more human judgment to bear on the parts that require human judgment because they are spending less cognitive effort on the parts that do not.
Importantly, AI leverage scales with underlying technical depth. An engineer who deeply understands distributed systems, database internals, or security architecture will extract more value from AI tools than one who does not, because they can evaluate AI suggestions critically, catch errors that a less experienced engineer would miss, and ask better, more precise questions. This is the opposite of the narrative that AI democratizes software engineering by making technical depth less important. In practice, technical depth makes AI leverage more powerful, not less necessary.
An engineer who lacks technical depth but uses AI tools heavily is in a precarious position. They may produce correct-looking code that hides subtle errors they cannot detect. They may accept architectural suggestions from AI that are locally reasonable but globally wrong. They may be unable to debug complex failures in AI-generated code because they do not have the mental model of what the code is actually doing. The AI tool, in this case, is not a force multiplier. It is a liability that makes problems less visible.
What Good Looks Like Now: Four Engineer Archetypes
To make these ideas more concrete, it helps to describe actual engineering archetypes that AI tools have made more visible and more consequential.
The first is the high-output, low-value engineer. They are enthusiastic adopters of every AI tool available. They commit frequently, often multiple times a day. Their pull request volume is impressive. They accept AI suggestions rapidly and move on. The code they ship is syntactically correct and passes most automated checks. But their code reviews are shallow. Their features require repeated revision because edge cases were not thought through. Their changes occasionally introduce subtle bugs that only appear in production because they did not deeply consider the failure modes. In sprint metrics, they look like top performers. In production, they are a recurring source of incidents and rework.
The second is the quiet force multiplier. They do not commit as often. Their pull requests tend to be larger and more considered, often including tests, documentation, and migration paths that others forget. Before they write code, they ask questions that frequently cause the team to reconsider what they are actually building. Their code reviews are detailed and catch things that automated tools miss because they are evaluating design, not just syntax. When they use AI tools, they use them for well-scoped mechanical tasks and review the output carefully. They are not always visible in metrics, but teams that have them are measurably more effective, and teams that lose them notice immediately.
The third is the AI-amplified specialist. They have deep expertise in a specific domain, say database performance or distributed consensus, and they use AI tools to work effectively outside their specialty. They can ship a full-stack feature independently where previously they might have needed help with the frontend, because AI assistance makes the unfamiliar parts accessible enough for their judgment to carry them through. Their productivity is genuinely multiplied because their domain expertise guides the AI effectively, and the AI makes them self-sufficient in areas where they would otherwise be blocked.
The fourth is the architectural investor. They write relatively little production code directly. Their time goes into design documents, platform improvements, removing systemic friction, and improving the development experience for everyone else. The code they do write tends to be foundational: the authentication abstraction that every service uses, the observability library that makes debugging tractable, the data model that will shape the system for years. In any given sprint, their individual output looks modest. Over a year, the leverage they create for the rest of the team is enormous. These engineers are often undervalued in metric-driven organizations and highly valued by engineering leaders who have learned to look past activity to impact.
None of these archetypes is cleanly captured by code production metrics. Two of them, the quiet force multiplier and the architectural investor, produce less code than the other two. All four behave differently with AI tools, and in each case the reasons trace back to engineering excellence or the lack of it rather than to how much the tools are used.
The Case for Outcomes-Based Engineering Evaluation
If activity metrics are increasingly unreliable, the obvious alternative is outcomes-based evaluation. Measure what engineers deliver in terms of value to the business and the engineering organization, not the volume of work product they generate in doing so.
This sounds straightforward. In practice, it requires a significant shift in how engineering leaders observe and evaluate their teams, because outcomes in software engineering are often delayed, shared, and hard to attribute cleanly to individual contributors.
A feature shipped this quarter may have value that only becomes clear next quarter when customer adoption data is available. A reliability improvement made last month may prevent an incident six months from now that we can never count. An architectural decision made today will either compound positively or create debt that slows everyone down for years, but the effects are diffuse and slow-moving. Individual contributions are intertwined with team contributions. The engineer who wrote the code did so in a context shaped by the engineer who designed the architecture, the engineer who did the code review, the engineer who improved the deployment pipeline, and many others.
These attribution problems are real, but they are not reasons to abandon outcomes thinking. They are reasons to build a more sophisticated evaluation framework that captures multiple dimensions of engineering contribution and acknowledges that some of those dimensions are better measured than others.
A Modern Engineering Scorecard
What follows is a proposed framework for evaluating engineering performance in a world where code generation is increasingly automated. It is not a metric dashboard. It is a set of dimensions for structured conversation between engineers and their leaders, supported where possible by quantitative signals and grounded always in judgment rather than formula.

Feature Delivery Lead Time
Lead time, the elapsed time from when work is started to when it is delivering value in production, is one of the most meaningful signals of engineering effectiveness available to an organization. It captures how quickly an engineer moves from problem to solution, and it reflects not just coding speed but the entire chain of activities that determine how fast value gets delivered: clarity of problem definition, technical decision-making, implementation quality, review quality, and deployment practices.
The reason lead time matters more than throughput is that it captures the cost of delay. A feature that takes one engineer two weeks to deliver is worth more than the same feature that takes another engineer three months, even if the three-month version has more lines of code.
Lead time can be measured through work tracking systems, though it requires discipline in how work is defined and tracked. The metric is most meaningful when work items are reasonably similar in scope, which requires breaking down large efforts into measurable units.
The pitfalls are significant. Lead time can be gamed by recording the start of a task late or by defining scope narrowly. Engineers who take on complex, high-risk work will naturally have longer lead times than those who consistently take simpler work. The metric needs to be evaluated in the context of the complexity and ambiguity of the work being done, which is the subject of the Technical Complexity Handled dimension below.
An important reason this metric is more meaningful than commit velocity or PR volume is that it measures the end-to-end process, not just the coding step. An engineer who commits frequently but whose code spends two weeks in review because it is not ready is not delivering faster than an engineer who commits less often but whose code merges cleanly.
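As a rough illustration of that contrast, the sketch below computes lead time from start-of-work and production timestamps and sets it beside commit counts. The `WorkItem` record, its field names, and the sample data are assumptions made for the example, not the schema of any particular tracking tool.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median


@dataclass
class WorkItem:
    """Hypothetical work-tracking record; field names are illustrative."""
    engineer: str
    started_at: datetime   # when work on the item began
    deployed_at: datetime  # when it was delivering value in production
    commits: int           # kept only to contrast with lead time


def lead_time_days(item: WorkItem) -> float:
    """Elapsed time from start of work to production, in days."""
    return (item.deployed_at - item.started_at).total_seconds() / 86400


def median_lead_time_by_engineer(items: list[WorkItem]) -> dict[str, float]:
    """Median lead time per engineer; the median resists one outlier item."""
    by_engineer: dict[str, list[float]] = {}
    for item in items:
        by_engineer.setdefault(item.engineer, []).append(lead_time_days(item))
    return {name: median(days) for name, days in by_engineer.items()}


# Made-up sample data for the two engineers described above.
items = [
    # Frequent committer whose changes wait a long time in review.
    WorkItem("a", datetime(2024, 3, 1), datetime(2024, 3, 19), commits=40),
    WorkItem("a", datetime(2024, 3, 4), datetime(2024, 3, 20), commits=35),
    # Infrequent committer whose changes merge cleanly.
    WorkItem("b", datetime(2024, 3, 1), datetime(2024, 3, 6), commits=6),
    WorkItem("b", datetime(2024, 3, 7), datetime(2024, 3, 11), commits=4),
]

lead_times = median_lead_time_by_engineer(items)
commit_totals = {
    name: sum(i.commits for i in items if i.engineer == name)
    for name in ("a", "b")
}
```

In this made-up data, engineer "a" has far more commits but a much longer median lead time than engineer "b". A real analysis would also need to group items by scope or complexity before comparing engineers, for the reasons given in the pitfalls above.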
Production Quality
Production quality captures the reliability of what an engineer ships. It includes bug escape rate, the frequency with which bugs that could have been caught earlier make it to production; incident involvement, how often changes contributed by an engineer are involved in production incidents; change failure rate, the percentage of deployments that require rollback or emergency fix; and test coverage quality, not the quantity of tests but the effectiveness of the test suite in catching regressions.
This dimension matters enormously in an AI-assisted world because AI tools are excellent at generating code that passes tests but are much less reliable at generating code that handles edge cases correctly. An engineer who uses AI tools uncritically will often ship code that looks correct but fails in ways that only emerge under real production conditions. Production quality is the signal that catches this.
Measurement requires good instrumentation at the deployment and incident management level. Tools like Jira, PagerDuty, and deployment tracking systems can provide the data if the organization is disciplined about linking incidents to change records.
The pitfalls include the fact that engineers working on inherently more complex or higher-risk systems will have worse production quality metrics than those working on stable, well-understood systems through no fault of their own. Context matters. The metric should be evaluated relative to the risk profile of the work. There is also the risk that engineers respond by avoiding complex work or by adding excessive defensive measures that slow delivery. Calibrating production quality metrics requires ongoing judgment from engineering leadership rather than mechanical threshold application.
This metric is more meaningful than code volume because it directly captures the downstream consequences of engineering decisions. An engineer who ships ten features with five production incidents may be delivering less value than an engineer who ships five features with zero incidents.
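The ratios named in this section are simple to compute once incidents and bugs are linked to change records. The sketch below is one possible formulation, not a standard definition: the `QualityRecord` fields are hypothetical, and apart from the ten-features-five-incidents and five-features-zero-incidents figures from the paragraph above, the sample numbers are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class QualityRecord:
    """Hypothetical per-engineer tallies for one review period."""
    deployments: int         # total production deployments
    failed_deployments: int  # deployments needing rollback or emergency fix
    bugs_found_total: int    # all bugs found, before or after release
    bugs_escaped: int        # bugs first found in production
    features_shipped: int
    incidents: int           # production incidents linked to their changes


def ratio(numerator: int, denominator: int) -> float:
    """Safe ratio that returns 0.0 when there is nothing to divide by."""
    return numerator / denominator if denominator else 0.0


def change_failure_rate(r: QualityRecord) -> float:
    """Share of deployments that required rollback or emergency fix."""
    return ratio(r.failed_deployments, r.deployments)


def bug_escape_rate(r: QualityRecord) -> float:
    """Share of all bugs that were first found in production."""
    return ratio(r.bugs_escaped, r.bugs_found_total)


def incidents_per_feature(r: QualityRecord) -> float:
    """Production incidents per shipped feature."""
    return ratio(r.incidents, r.features_shipped)


# Ten features with five incidents versus five features with none.
high_volume = QualityRecord(
    deployments=20, failed_deployments=4,
    bugs_found_total=10, bugs_escaped=5,
    features_shipped=10, incidents=5,
)
careful = QualityRecord(
    deployments=10, failed_deployments=0,
    bugs_found_total=6, bugs_escaped=0,
    features_shipped=5, incidents=0,
)
```

These numbers are inputs to a conversation rather than thresholds. As noted above, they need to be read against the risk profile of the systems each engineer works on.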
Technical Complexity Handled
This dimension attempts to capture the difficulty and ambiguity of the problems an engineer engages with. It is the most qualitative of the dimensions, and in many ways the most important, because it is where the distinction between code production and problem solving is most visible.
Technical complexity handled includes the ability to work effectively on problems that are not well-defined, the ability to navigate systems the engineer has not worked in before, the ability to make good architectural decisions under uncertainty, and the ability to break down complex problems into solvable pieces. It also includes the willingness to take on the hard problems rather than routing around them.
This is not easily measured by automated tooling. It requires direct observation by engineering leaders and structured input from peers. Calibration questions that help evaluate this dimension include: Was this engineer sought out for advice on difficult problems? Did they take on work that others had avoided because it was hard? Did they simplify complexity or add to it? Did they correctly identify the hardest parts of the problems they worked on?
The pitfall is that complexity can become a status game. Some engineers develop a taste for unnecessarily complicated solutions because complexity signals expertise. The relevant question is not whether an engineer handled complex problems, but whether they made those problems simpler through their work. An engineer who takes a gnarly problem and produces an elegant, simple solution is handling technical complexity effectively. One who takes a simple problem and produces a complex solution may be creating complexity, not handling it.
This dimension is more meaningful than traditional metrics because it directly captures the judgment and problem-framing capability that AI tools cannot replace. It also distinguishes the engineer who consistently takes on hard problems from the one who maximizes their metrics by optimizing for easy wins.
Peer Review Effectiveness
Code review is one of the highest-leverage activities in a software engineering organization. A good code review catches bugs, improves design, shares knowledge, maintains architectural consistency, and mentors less experienced engineers. A poor code review provides false assurance, misses meaningful issues, and creates a bottleneck without adding value.
In an AI-assisted world, the value of human code review increases rather than decreases. AI tools can generate syntactically correct code quickly, but they do not evaluate whether that code fits the system architecture, whether it handles edge cases that are specific to the business domain, or whether it is consistent with the team’s established patterns. Human reviewers who understand these things deeply are more valuable, not less, when the code being reviewed is AI-generated.
Peer review effectiveness can be partially measured through the quality of review comments: specificity, technical depth, actionability. It can also be observed through the downstream outcomes of reviews: whether issues caught in review prevented production problems, whether review feedback was accurate, whether the engineer being reviewed found the review helpful.
The pitfalls include the risk of rewarding review volume over review quality, which pushes toward superficial reviews that generate many comments without meaningful insight. There is also the risk of rewarding engineers who are aggressive in review and leave many comments regardless of their validity, which can create friction and discourage people from submitting work for review.
This dimension matters because it captures one of the most significant ways that senior engineers multiply the effectiveness of their teams. An engineer who consistently produces reviews that make the team’s code better is delivering value that is invisible in individual production metrics but enormous in aggregate impact.
System and Platform Improvements
Some of the most valuable engineering work is work that makes other engineers more effective. Improving the CI pipeline so tests run faster. Refactoring a shared library to make it easier to use correctly and harder to use incorrectly. Writing internal tooling that automates repetitive work. Improving observability so debugging is faster. Documenting a complex system so new engineers can understand it without spending weeks reading code.
This work is chronically undervalued in metric-driven organizations because it does not map cleanly to product features. It does not appear in sprint velocity metrics in a way that compares fairly to feature development. It is often invisible to product managers and business stakeholders. And yet, the compounding effects of this work are among the most significant determinants of team productivity over time.
An engineering organization that consistently invests in system and platform improvements is one where engineers get faster over time rather than slower as the system grows. One that does not becomes progressively harder to change as technical debt accumulates and the development experience degrades.
Measuring this dimension requires explicitly tracking and valuing this category of work. It means having engineering leaders who can recognize platform improvements when they see them and who make them visible in performance discussions. It means creating space in roadmaps for this work and evaluating the engineers who do it on the value of the improvement rather than the volume of the output.
The pitfall is that platform work can become an excuse to avoid delivering product value. Engineers who spend all their time on infrastructure improvements without any connection to business outcomes are not necessarily more valuable than those who deliver features. The balance matters.
Business Impact
Ultimately, engineering work creates value when it changes something meaningful for the business or its users. Features that are shipped but not adopted create limited value. Reliability improvements that prevent incidents that would have affected customers create significant value. Architecture decisions that enable a new business capability create value that can compound for years.
Business impact is the hardest dimension to measure cleanly at the individual level, but it is the most important to keep in view. Without this dimension, the scorecard can become a technical merit system that is disconnected from the reason the engineering organization exists.
Business impact can be approximated through a combination of signals: customer adoption of features delivered, reduction in support volume from quality improvements, revenue attributable to technical capabilities, time saved by internal tooling, incidents prevented by reliability engineering. None of these cleanly attribute to individuals, but together they create a picture of whether engineering work is connecting to business outcomes.
The principal pitfall is that business impact can be dominated by factors outside engineering control. A feature that was well-built but poorly positioned in the product will not see adoption regardless of its technical quality. An engineer working on an important but unsexy part of the infrastructure may have enormous impact that is never visible in customer-facing metrics. Engineering leaders need to distinguish between impact that engineers control and impact that depends on organizational and market factors.
| Dimension | Why It Matters | How to Measure | Key Pitfall |
|---|---|---|---|
| Feature Delivery Lead Time | Captures end-to-end delivery speed, not just coding speed | Work item start-to-production elapsed time | Gaming by cherry-picking simpler tasks |
| Production Quality | Reveals whether AI-assisted code holds up under real conditions | Bug escape rate, incident involvement, change failure rate | Penalizes engineers on complex or high-risk systems |
| Technical Complexity Handled | Captures problem-framing and judgment that AI cannot replicate | Peer observation, structured conversation, problem ownership | Can reward complexity creation rather than complexity reduction |
| Peer Review Effectiveness | Measures how much an engineer raises the quality of team output | Review quality, downstream outcomes, peer feedback | Volume of comments rewarded over quality of insight |
| System and Platform Improvements | Captures compounding leverage on team-wide productivity | Explicitly tracked; valued in planning and review cycles | Becomes avoidance of product delivery |
| Business Impact | Keeps engineering connected to the reason the organization exists | Adoption, reliability improvement, revenue enablement | Factors outside engineering control dominate the signal |
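
For teams that want to keep notes against these dimensions between conversations, one possible shape is sketched below. It is a hypothetical template, not part of the framework itself: the class and method names are invented for illustration, and it deliberately records evidence and context without computing any composite score, in keeping with the emphasis on judgment rather than formula.

```python
from dataclasses import dataclass, field

# The six dimensions from the table above.
DIMENSIONS = (
    "Feature Delivery Lead Time",
    "Production Quality",
    "Technical Complexity Handled",
    "Peer Review Effectiveness",
    "System and Platform Improvements",
    "Business Impact",
)


@dataclass
class DimensionNotes:
    """Evidence and context for one dimension; deliberately holds no score."""
    signals: list[str] = field(default_factory=list)  # quantitative or observed evidence
    context: str = ""                                 # e.g. risk profile of the work


@dataclass
class Scorecard:
    """Hypothetical note-taking template for one engineer."""
    engineer: str
    notes: dict[str, DimensionNotes] = field(
        default_factory=lambda: {d: DimensionNotes() for d in DIMENSIONS}
    )

    def add_signal(self, dimension: str, signal: str) -> None:
        """Record one piece of evidence under a known dimension."""
        if dimension not in self.notes:
            raise KeyError(f"unknown dimension: {dimension}")
        self.notes[dimension].signals.append(signal)

    def gaps(self) -> list[str]:
        """Dimensions with no evidence yet, to raise in the next conversation."""
        return [d for d in DIMENSIONS if not self.notes[d].signals]


card = Scorecard("example-engineer")
card.add_signal("Production Quality", "zero incidents linked to changes this period")
card.add_signal("Peer Review Effectiveness", "review caught a migration ordering issue")
missing = card.gaps()
```

The `gaps` helper only flags dimensions where nothing has been observed yet. Interpreting what has been observed remains the job of the engineer and their leader.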
The Economics of AI-Assisted Engineering
It is worth stepping back and considering the economic implications of AI coding tools at the organizational level, because those implications shape how engineering leadership should think about headcount, specialization, and team design.
AI coding tools shift the economics of software delivery in two directions simultaneously. They reduce the marginal cost of code production, which means that the output any given engineer can produce is significantly higher than before. At the same time, they do not reduce the cost of the judgment, design, and problem-framing work that determines whether that production is pointed in a valuable direction.
The implication is that the ratio of value-creating work to code-production work in a well-functioning engineering organization should shift. A team that previously needed five engineers to build a feature, with two of them primarily writing implementation code, may now need only three to deliver the same feature, or may keep all five and redirect the freed capacity toward more design, more testing, more code review, and more infrastructure investment. Either way, a smaller share of the team’s time goes to writing implementation code and, if managed correctly, the team becomes more effective.
This creates a management challenge. The productivity gains from AI tools are real, but they can easily be absorbed by code sprawl if engineering culture does not adapt. When generating code becomes easy, the temptation is to generate more of it. More features, more abstractions, more frameworks, more complexity. Organizations that do not actively resist this temptation will find that their codebases grow faster than the teams managing them can handle, and that the productivity gains from AI tools are offset by the maintenance burden of the code those tools produced.
The organizations that will benefit most from AI coding tools are those that use them to do more with the same team, not more code with the same team. There is a meaningful difference between those two things, and it requires engineering leadership to hold the distinction clearly.
Why Judgment Compounds
One of the most important and underappreciated dynamics in software engineering is the compounding nature of good judgment over time. An engineer who consistently makes good architectural decisions, who correctly identifies technical risk early, who designs systems that remain maintainable as they grow, creates value that compounds year over year. The system they build is easier to change. The debt they avoided means future engineers can move faster. The foundations they established make it possible to build things that were not possible otherwise.
This is the kind of value that looks invisible in quarterly metrics and becomes undeniable over years. It is also the kind of value that AI tools amplify rather than replace. An engineer with deep architectural judgment who uses AI tools effectively can create foundations faster and with more thoroughness than they could without those tools. Their judgment guides the AI toward appropriate patterns. Their experience lets them catch cases where AI suggestions would create long-term problems.
Conversely, an engineer whose primary contribution is code production and who uses AI tools to produce more code faster, without the judgment to guide that production toward valuable outcomes, creates a compounding liability rather than a compounding asset. More code means more surface area for bugs, more things to maintain, more complexity to understand. The AI amplification of code production without the guidance of sound judgment is a compounding problem, not a compounding benefit.
This dynamic is why the highest-leverage thing engineering leaders can do in an AI-assisted world is invest in engineers with strong judgment and create conditions where that judgment can be applied broadly. The leverage from AI tools scales with the quality of the engineering judgment directing them.
Performance Management in Practice
Translating the principles above into actual performance management conversations requires some practical guidance, because the gap between aspiration and practice is where most organizational change efforts fail.
The first practical shift is moving from quarterly metrics reviews to ongoing calibration conversations. The dimensions that matter most for engineering performance are not things that can be meaningfully summarized by a dashboard refreshed once a quarter. They require direct observation, structured feedback, and a shared understanding between the engineer and their leader about what good looks like in the engineer’s specific context. This means more frequent, more substantive conversations about work, not just status updates.
The second shift is being explicit about what the organization values and why. If engineering leaders say they value impact and judgment but reward engineers who ship the most code, engineers will correctly read the actual signal and optimize for code volume. The stated values and the rewarded behaviors need to be aligned. This alignment needs to be visible in promotion decisions, in compensation conversations, and in public recognition.
The third shift is building the ability to have honest conversations about the quality of engineering work, not just the quantity. This requires engineering leaders who are technically capable enough to evaluate whether a code review is genuinely insightful or superficially thorough, whether an architectural decision reflects deep understanding or surface-level pattern matching, whether a production incident reflects carelessness or unavoidable complexity. This capability cannot be outsourced to a metrics dashboard. It requires technical leadership that stays close to the work.
The fourth shift is creating explicit space for the work that traditional metrics do not capture. Platform improvements, documentation, tooling, and mentorship need to be visible in planning processes and recognized in performance conversations, not treated as work that happens when everything else is done.
The Danger of the Busy Mediocre Engineer
There is a specific failure mode that AI coding tools make more likely, and it deserves direct attention. Call it the busy mediocre engineer problem. These are engineers who are highly active, generate a lot of work product, appear productive by every traditional metric, and are actually delivering negative value to the engineering organization because the accumulated weight of their decisions and their code makes the system harder to work with over time.
In a world without AI tools, these engineers were limited by how much code they could write. A mediocre engineer who writes a lot of code can only write so many lines per day. AI tools remove that limit. A mediocre engineer with AI tools can generate an enormous amount of code very quickly, and if that engineer is not guided by sound engineering judgment, the organization ends up with a much larger and more complex codebase than it needs, ships faster than it should, and accumulates technical debt at an accelerating rate.
The countermeasures to this problem are primarily organizational and cultural rather than technical. Code review is the main technical mechanism, and it works only to the extent that reviewers have the time, the knowledge, and the authority to say “this should not be built this way” or even “this should not be built at all.” But beyond code review, the culture needs to reward engineers who simplify and who resist the pull toward more. The best engineering cultures have a strong norm against unnecessary complexity. That norm is more important in an AI-assisted world than it was before.
Rethinking Seniority in an AI World
Seniority in software engineering has traditionally been associated with a combination of depth, breadth, and productivity. Senior engineers were expected to write more code than junior engineers, tackle harder problems, and mentor others. AI tools have decoupled the productivity dimension from the depth and breadth dimensions in ways that require rethinking what seniority means.
A junior engineer with strong AI leverage can now produce code at a volume that previously required several years of experience to achieve. This does not make them senior. Seniority is about the judgment to know what to build, the experience to anticipate failure modes, the architectural understanding to make decisions that hold up over time, and the ability to make other engineers more effective. None of these things are accelerated by AI tools in the way that code production is.
What this means practically is that the distinction between junior and senior engineers, which was always more about judgment than volume, becomes more explicit and more consequential in an AI-assisted environment. Organizations that equate seniority with output will find that their leveling systems become meaningless when output can be arbitrarily amplified by tool adoption. Organizations that equate seniority with judgment will find that the distinction between levels is clearer and more defensible than before.
The implication for career development is important. Junior engineers developing in an AI-assisted world need to be deliberate about building the judgment, system understanding, and architectural thinking that AI does not provide. The temptation is to rely on AI tools to compensate for gaps in fundamental understanding, which produces engineers who can ship code but who lack the mental models to understand what it is doing or why it might fail. Engineering leaders and mentors need to be alert to this pattern and create learning experiences that build depth rather than just accelerating production.
The Decade Ahead
Looking forward, the trajectory seems fairly clear even if the pace is not. AI coding tools will become more capable, more integrated into development workflows, and more widely adopted. The fraction of code in most organizations that is generated with significant AI assistance will increase. The cost of code production per line will continue to fall.
In that environment, the scarcest and most valuable resource in software engineering will be the judgment to direct that code production toward meaningful outcomes. The engineers who combine deep technical understanding, the ability to use AI tools effectively, the judgment to make good architectural and product decisions, and the communication skills to work across organizational boundaries will be extraordinarily valuable. There will not be enough of them.
The organizations that find, develop, and retain those engineers will be the ones that benefit most from the AI productivity wave. The organizations that optimize for code volume, that reward the engineers who generate the most output, that build their engineering culture around activity rather than impact, will find that AI tools have made them better at producing code that creates less value.
The best engineer on your team may well be writing less code than everyone else. They may be asking the questions that cause their teammates to build better things. They may be doing the architecture work that makes next year’s features possible. They may be doing the code reviews that prevent the incidents that never happen. They may be using AI tools selectively and thoughtfully in ways that make their output smaller but more valuable.
The only way to know is to look past the metrics that are easy to collect and develop the engineering leadership depth to evaluate what actually matters. That has always been true. AI tools have made it more urgent.