Why is Measuring Developer Productivity Difficult?
"If you can't measure it, you can't manage it." - Peter Drucker
Measuring individual developer and engineering team productivity has been sought after since the times of the pharaohs (perhaps not quite that long ago), but it has been top of mind for engineering managers for decades. The driving factor is understanding whether the organization is functioning effectively and staying competitive. Most organizations don't have the luxury of FAANG-sized engineering budgets, but they do have the pressure to produce.
From a measurement perspective, it's not as simple as correlating input to output. The work is complex: it requires creativity, individual productivity, team collaboration, paying down technical debt accrued from past short-term goals, and creating and maintaining systems of many kinds.
We can see how the complexity of this work was misconstrued by those who used the derogatory term "code monkey". It doesn't help that even within the tech space the understanding of development complexity isn't uniform. Recently the head of AWS, Matt Garman, claimed, "If you go forward 24 months from now, or some amount of time — I can't exactly predict where it is — it's possible that most developers are not coding" (Source: Futurism.com). Meanwhile, Andrew Ng, a leading AI researcher and co-founder of Google Brain, argues that as cloud infrastructure becomes commoditized, the true differentiation comes from a company's software and AI capabilities. As someone who has been involved on both the development and infrastructure sides, I believe the reality is nuanced, and I share my insights here.
While the space is changing, it's still important to measure developer productivity. At its core, what we're really trying to understand is whether the team is working efficiently and whether there is room for improvement. Measuring individual productivity alone can be misleading: some individuals are multipliers for the entire team, making the group collectively produce 30% more, yet they may not score well on individual metrics. Still, with 85% of developers worldwide working remotely since the COVID pandemic (GitHub), I can understand managers wanting to know whether team members are performing.
Today, we take the time to explore the complexity of this space and discuss why certain metrics are simply not good enough.
Traditional Mechanisms and Why They Fail
Traditional Measurements
The lowest-hanging metric, and one that's been used for eons, is time: how long an individual is working. It's easy to measure, and if everyone is working the same amount of time, it should be fair, right? I would argue it's a poor metric, as time spent doesn't dictate quality, or even output. Two people putting in the same amount of time can produce entirely different results.
As Parkinson's Law dictates, "Work expands to fill the time available for its completion." The problem is that people start putting in more hours while producing nearly the same output. Measuring by this metric can in fact be detrimental to the organization: as developers work longer hours without enough time to rejuvenate, they become exhausted and drift toward burnout. Quality and output both suffer in the long run.
This is not specific to the tech industry: a study by John Pencavel showed that output declines significantly beyond 50 hours of work per week; in fact, output at 70 hours was about the same as at 55. Video gaming is particularly known for crunch time, where developers work overtime to get a game out. I got to observe this phenomenon when I was at Electronic Arts, and it didn't produce better code. The International Game Developers Association (IGDA) reported in its Developer Satisfaction Surveys that this behaviour led to more bugs, lower morale, and diminished creative output. Basically, once crunch time is over, there is significant rework to be done.
Pitfalls
So it's clear that time input isn't enough to measure productivity; perhaps we need to look at output instead. The main issue here is that many output metrics can be gamed. Let's discuss several of these.
Pitfall 1: Commits or Lines of Code
One of the great outcomes of continuous integration and continuous deployment was establishing a regular cadence of commits. The issue arose when some began measuring against the commits themselves: how many you make, or how many lines you commit. The goal was to have tested, deploy-ready code, not to commit unnecessary code, which increases risk for the business: security exposure, maintenance overhead, refactoring to eliminate it later, and the list goes on. This metric focuses on quantity, not quality, and ignores one of the most important aspects of development: collaboration. Collaboration is necessary to bring the best value to clients, and it gets entirely missed here.
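To make the gaming concrete, here is a toy sketch (in Python, with entirely hypothetical patches and a made-up scoring function) of how a naive lines-of-code score rewards a verbose patch over a concise one that does the same job:

```python
# Two hypothetical patches implementing identical behaviour.
verbose_patch = """\
def total(items):
    result = 0
    for item in items:
        value = item
        result = result + value
    return result
"""

concise_patch = """\
def total(items):
    return sum(items)
"""

def loc_score(patch: str) -> int:
    """A naive 'productivity' score: count non-blank lines in the patch."""
    return sum(1 for line in patch.splitlines() if line.strip())

# The verbose version scores three times higher despite adding no extra value.
print(loc_score(verbose_patch), loc_score(concise_patch))  # 6 2
```

Any metric a developer can raise by writing more code, rather than better code, invites exactly this kind of padding.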
Pitfall 2: Bugs Fixed & Features Delivered
This set of metrics also focuses on quantity over quality. One could create bugs (either maliciously or through a lack of care) and then fix them, gaming the metric while producing no value for the organization. Additionally, focusing on the number of bugs fixed makes it easy to miss the root cause, which may produce more defects.
Features delivered is similarly a poor metric. A developer may choose to ship several small, low-impact features instead of a single (albeit more complex) feature that customers actually demand. Performance measured this way isn't tied to customer need, business impact, or even severity, and that's not ideal for the business. Developers measured against this metric are encouraged toward feature creep while ignoring the maintenance overhead that accrues.
Pitfall 3: Utilization Rates & Velocity
Utilization rate measures the percentage of time a developer is actively working on tasks. While it gives an idea of output, it ignores more intangible work such as collaboration, as well as downtime, which is often where breakthroughs come from. Over-utilization can also lead to burnout.
Velocity measures the amount of work delivered in a sprint. It shouldn't be used in isolation, as estimates can be inflated to make it seem that more work was completed. It doesn't account for team composition: much goes on day-to-day, and people are away, promoted, or have moved on. Some sprints are simply lighter than others, which is easily missed here. Most importantly, it doesn't account for the complexity of the work. For instance, teams focused on machine learning build feature stores and do a lot of heavy lifting up front; the same doesn't apply to web development.
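The two distortions above can be sketched in a few lines of Python. All the hours and story points below are made up for illustration; the function names are mine, not a standard API:

```python
# Toy sketch of utilization rate and sprint velocity, and how velocity
# can be inflated purely by padding estimates. All numbers are hypothetical.

def utilization_rate(active_hours: float, total_hours: float) -> float:
    """Fraction of working time spent actively on tasks."""
    return active_hours / total_hours

def velocity(completed_points: list[int]) -> int:
    """Story points completed in a sprint."""
    return sum(completed_points)

# The same four tasks, estimated honestly vs. padded.
honest_estimates = [3, 5, 2, 3]
padded_estimates = [5, 8, 3, 5]  # identical work delivered

print(utilization_rate(34, 40))    # 0.85
print(velocity(honest_estimates))  # 13
print(velocity(padded_estimates))  # 21 -- a 'faster' team, same output
```

Neither number distinguishes hard, valuable work from padded estimates or busywork, which is why both need context before they mean anything.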
Defining Productivity
By now it should be evident that development is multi-faceted and a single metric can't give you the insights you're looking for. The more important part is understanding what you want to measure: there are aspects of individual productivity, team output, client value, and business objectives.
The last part is key: in any field, measuring productivity needs to align with business objectives. If the output doesn't align with organizational goals, then it's pointless. The key takeaway is that no single metric will provide insight into developer productivity; it involves multiple layers. Next week, we expand on what to measure.


