Nobody Expects the Agile Imposition (Part VII): Metrics

Not everything that can be counted counts and not everything that counts can be counted

-- William Bruce Cameron

What's measured improves

-- Peter Drucker

Three recent events got me thinking about software metrics again:

Management use individual KPIs to reward high performers in Sales and other departments. They are contemplating doing the same for Software Developers.
Performance Appraisals often seem subjective. Would metrics make them more objective? Or do more harm than good?
Larry Maccherone was in town recently promoting his company's approach to Agile metrics.

I'm interested to learn:

How does your company reward Software Developers? Are the rewards team-based, individual-based, department-based, whole-company based? How well does it work?
Do you have Performance Appraisals? Do they use metrics? Do your Software Developers/Teams have KPIs?
Do you use metrics to improve your Software Development Process?

I've done a bit of basic research on these topics, which I present below.

Software Metric Gaming

Key performance indicators can also lead to perverse incentives and unintended consequences as a result of employees working to the specific measurements at the expense of the actual quality or value of their work. For example, measuring the productivity of a software development team in terms of source lines of code encourages copy and paste code and over-engineered design, leading to bloated code bases that are particularly difficult to maintain, understand and modify.

-- Performance Indicator (wikipedia)

"Thank you for calling Amazon.com, may I help you?" Then -- Click! You're cut off. That's annoying. You just waited 10 minutes to get through to a human and you mysteriously got disconnected right away. Or is it mysterious? According to Mike Daisey, Amazon rated their customer service representatives based on the number of calls taken per hour. The best way to get your performance rating up was to hang up on customers, thus increasing the number of calls you can take every hour.

Software organizations tend to reward programmers who (a) write lots of code and (b) fix lots of bugs. The best way to get ahead in an organization like this is to check in lots of buggy code and fix it all, rather than taking the extra time to get it right in the first place. When you try to fix this problem by penalizing programmers for creating bugs, you create a perverse incentive for them to hide their bugs or not tell the testers about new code they wrote in hopes that fewer bugs will be found. You can't win.

Don't take my word for it, read Austin's book and you'll understand why this measurement dysfunction is inevitable when you can't completely supervise workers (which is almost always).

-- Joel Spolsky on Measurement

The anecdotes above are just the tip of the iceberg. I've heard many stories over the years of harmful gaming of metrics. It is clear that you should not introduce metrics lightly. It seems best to either:

Define metrics that cannot be effectively gamed; or
Win people's trust that metrics are being used solely to improve company performance and will not be used against anyone.

Suggestions on how to achieve this are welcome.

Performance Appraisals

At a recent Agile metrics panel discussion, I was a bit surprised that everyone agreed that their teams had some "rock stars" and some "bad apples". And that "everyone knew who they were". And that you didn't need metrics to know!

That's been my experience too. I've found that by being an active member of the team, you don't need to rely on numbers, you can simply observe how they perform day to day. Combine with regular one-on-ones plus 360-reviews from their peers and customers and it is obvious who the high performers are and who needs improvement.

Though I personally feel confident with this process, I admit that it is subjective. I have seen cases where two different team leads have given markedly different scores to the same individual. Of course, these scores are at different times and for different projects. Still, personality compatibility (or conflict) between the team lead and team member can make a significant difference to the review score. It does seem unfair and subjective. Can metrics be used to make the performance appraisal process more objective? My feeling is that it would do more harm than good, as indicated in the "Software Metric Gaming" section above. What do you think?

Software Development Process Metrics

Lean-Agile City runs on folklore, intuition, and anecdotes

-- Larry Maccherone (slide 2 of "The Impact of Agile Quantified")

It's exceptionally difficult to measure software developer productivity, for all sorts of famous reasons. And it's even harder to perform anything resembling a valid scientific experiment in software development. You can't have the same team do the same project twice; a bunch of stuff changes the second time around. You can't have two teams do the same project; it's too hard to control all the variables, and it's prohibitively expensive to try it in any case. The same team doing two different projects in a row isn't an experiment either. About the best you can do is gather statistical data across a lot of teams doing a lot of projects, and try to identify similarities, and perform some regressions, and hope you find some meaningful correlations.

But where does the data come from? Companies aren't going to give you their internal data, if they even keep that kind of thing around. Most don't; they cover up their schedule failures and they move on, ever optimistic.

-- Good Agile, Bad Agile by Steve Yegge

As pointed out by Yegge above, software metrics are indeed a slippery problem. Especially problematic is getting your hands on a high quality, statistically significant data set.

The findings in this document were extracted by looking at non-attributable data from 9,629 teams

-- The Impact of Agile Quantified by Larry Maccherone

Larry Maccherone was able to solve Yegge's dataset problem by mining non-attributable data from many different teams, in many different organisations, from many different countries. While I found Larry's results interesting and useful, this remains a slippery problem because each team is different and unique.

Each project's ecosystem is unique. In principle, it should be impossible to say anything concrete and substantive about all teams' ecosystems. It is. Only the people on the team can deduce and decide what will work in that particular environment and tune the environment to support them.

-- Communicating, cooperating teams by Alistair Cockburn

By all means learn from Maccherone's overall results. But also think for yourself. Reason about whether each statistical correlation applies to your team's specific context. And Larry strongly cautions against leaping to conclusions about root causes.

Correlation does not necessarily mean Causation

The findings in this document are extracted by looking for correlation between “decisions” or behaviors (keeping teams stable, setting your team sizes to between 5 and 9, keeping your Work in Process (WiP) low, etc.) and outcomes as measured by the dimensions of the SDPI. As long as the correlations meet certain statistical requirements we report them here. However, correlation does not necessarily mean causation. For example, just because we show that teams with low average WiP have 1/4 as many defects as teams with high WiP, doesn’t necessarily mean that if you lower your WiP, you’ll reduce your defect density to 1/4 of what it is now. The effect may be partially or wholly related to some other underlying mechanism.

-- The Impact of Agile Quantified by Larry Maccherone

"Best Practices"

There are no best practices. Only good practices in context.

-- Seven Deadly Sins of Agile Measurement by Larry Maccherone

I've long found the "Best Practice" meme puzzling. After all, it is impossible to prove that you have truly found the "best" practice. So I welcomed Maccherone's opening piece of advice that the best you can hope for in a complex, empirical process, such as Software Development, is a good process for a given context. Which you should always be seeking to improve.

A common example of "context" are business and economic drivers. If your business demands very high quality, for example, your "best practice" may well be four-week iterations, while if higher productivity is more important than quality, your "best practice" may be one-week sprints instead (see the "Impact of Agile Quantified Summary of Results" section below for iteration length metrics).

Team vs Individual Metrics

From the blog cited by Athanasius:

(From US baseball): In short, players play to the metrics their management values, even at the cost of the team.

Yes, Larry Maccherone mentioned a similar anecdote from US basketball, where a star player had a very high individual scoring percentage ... yet statistics showed that the team actually won more often when the star player was not playing! Larry felt this was because he often took low percentage shots to boost his individual score rather than pass to a player in a better position to score.

Finding the Right Metrics

More interesting quotes from this blog:

The same happens in workplaces. Measure YouTube views? Your employees will strive for more and more views. Measure downloads of a product? You’ll get more of that. But if your actual goal is to boost sales or acquire members, better measures might be return-on-investment (ROI), on-site conversion, or retention. Do people who download the product keep using it, or share it with others? If not, all the downloads in the world won’t help your business.

In the business world, we talk about the difference between vanity metrics and meaningful metrics. Vanity metrics are like dandelions – they might look pretty, but to most of us, they're weeds, using up resources, and doing nothing for your property value. Vanity metrics for your organization might include website visitors per month, Twitter followers, Facebook fans, and media impressions. Here's the thing: if these numbers go up, it might drive up sales of your product. But can you prove it? If yes, great. Measure away. But if you can't, they aren't valuable.

Good metrics have three key attributes: their data are consistent, cheap, and quick to collect. A simple rule of thumb: if you can't measure results within a week for free (and if you can't replicate the process), then you’re prioritizing the wrong ones.

Good data scientists know that analyzing the data is the easy part. The hard part is deciding what data matters.

Schwaber recommends measuring:

Cycle time - quickest time to get one feature out
Release cycle - time to get a release out
Defects - change in defects
Productivity - normalized effort to get a unit of functionality "done"
Stabilization - after code complete, % of a release is spent stabilizing before release
Customer satisfaction - up or down
Employee satisfaction - up or down

Agile Measurement Checklists

Larry Maccherone's Seven Deadly Sins (and Heavenly Virtues) of Agile Measurement:

Sin: Using metrics as levers to change someone else's behaviour. Virtue: Use metrics for feedback to improve your own performance.
Sin: Unbalanced metrics. Virtue: Day-one have one metric from each quadrant. The quadrants are: Productivity (Do it fast); Quality (Do it right); Predictability (Do it on time); Employee Satisfaction (Keep doing it).
Sin: Believing metrics can replace thinking. Virtue: Use quantitative insight to complement rather than replace qualitative insight.
Sin: Too-costly metrics. Virtue: Favour automatic metrics from passively acquired data or lightweight surveys.
Sin: Using a lazy/convenient metric. Virtue: Use ODIM (Outcome/Decision/Insight/Measurement) to determine metrics that provide critical insight and drive your desired outcomes.
Sin: Bad analysis. Virtue: Get your statistics right by consulting experts.
Sin: Forecasting without discussing probability and risk. Virtue: Use the percentile coverage distribution, the cone of uncertainty, or Monte Carlo simulation.

Hank Marquis Seven Dirty Little Truths About Metrics cautions that metrics must derive from, and align with, business goals and strategies. And metrics should be selected only after understanding the needs the metric addresses.

What gets measured is what gets done.
Metrics drive both good AND bad behaviour.
Failure to align with Vital Business Functions (VBF, e.g. Revenue impact, data security) can lead you astray.
Metrics do not get better with age -- they often become obsolete.
The real purpose of metrics is to help you make better decisions.
Effective metrics do not measure people -- they measure teams and processes.
Good metrics help optimize the performance of the whole organization.

Further advice from Hank:

Align with Virtual Business Functions. Regardless of the IT activity, you need to make sure your metrics tells you something about the VBF that depends on what you are measuring.
Keep it simple. A common problem manager fault is overloading a metric. That is, trying to get a single metric to report more than one thing. If you want to track more than one thing, create a metric for each. Keep the metric simple and easy to understand. If it is too hard to determine the metrics, people often fake the data or the entire report.
Good enough is perfect. Do not waste time polishing your metrics. Instead, select metrics that are easy to track, and easy to understand.
Use metrics as indicators. A KPI does not troubleshoot anything, but rather the KPI indicates something is amiss.
A few good metrics. Too many metrics, even if they are effective, can overwhelm a team. Use three to six.
Beware the trap of metrics. Failure to follow these guidelines invariably results in process problems.

Impact of Agile Quantified Summary of Results

Maccherone's results were reported with regard to the following four dimensions of performance:

Responsiveness. Based on Time in Process (or Time to Market). The amount of time that a work item spends in process.
Quality. Based on defect density. The count of defects divided by man days.
Productivity. Based on Throughput/Team Size. The count of user stories and defects completed in a given time period.
Predictability. Based on throughput variability. The standard deviation of throughput for a given team over 3 monthly periods divided by the average of the throughput for those same 3 months.

Three further "fuzzier" metrics (often measured via lightweight surveys) are currently under development, namely:

Customer Satisfaction.
Employee Engagement.
Build-the-right-thing.

Stable teams resulted in:

60% better productivity
40% better predictability
60% better responsiveness

Recommendations:

Dedicate people to a single team
Keep teams intact and stable

If people are dedicated to only one team rather than multiple teams or projects, they stay focused and get more done, leading to better performance.

Estimating:

No Estimates: 3%
Full Scrum. Story points and task hours: 79%
Lightweight Scrum. Story points only: 10%
Hour-oriented. Task hours only: 8%

Teams doing Full Scrum have 250% better Quality than teams doing no estimating.
Lightweight Scrum performs better overall, with better Productivity, Predictability and Responsiveness.

Recommendations:

Experienced teams may get best results from Lightweight Scrum.
If new to Agile, or focused strongest on Quality, choose Full Scrum.

Work in Process (or WiP) is the measure of the number of simultaneous work items that are "In process" at the same time.

Teams that aggressively control WiP:

Cut time in process in half
Have 1/4 as many defects
But have 34% lower Productivity

Recommendations:

If your WiP is high, reduce it
If your WiP is already low, consider your economic drivers: if productivity drives your bottom line, don't push WiP too low; if time to market drives your bottom line, push WiP as low as it will go

Small teams (of 1-3 people) have:

17% lower Quality
But 17% more Productivity

than teams of the recommended size (5-9 people).

Recommendations:

Set up team size of 5-9 people for the most balanced performance
If you are doing well with larger teams, there's no evidence that you need to change

Iteration Length:

Teams using two-week iterations have the best balanced performance.
Longer iterations correlate with higher Quality.
Shorter iterations correlate with higher Productivity and Responsiveness.

Testers:

More testers lead to better Quality.
But they also generally lead to worse Productivity and Responsiveness.
Interestingly, teams that self-identify as having no testers have: the best Productivity; almost as good Quality; but much wider variation in Quality.

Motivation:

Motive has small but statistically significant impact on performance.
Extrinsic motiviation does not have a negative impact on performance.
Executive support is critical for success with Agile.
Teamwork is not the dominant factor; talent, skills, and experience are.
Those motivated by Quality perform best.

Co-location:

Teams located within the same time zone have up to 25% better productivity.

Other Articles in This Series

References

The Impact of Agile Quantified by Larry Maccherone (PDF)
The Impact of Agile Quantified by Larry Maccherone (slideshare)
Seven Deadly Sins of Agile Measurement by Larry Maccherone (PDF)
Agile practices: what's folklore, what's quantifiable?
Seven Dirty Little Truths About Metrics by Hank Marquis
Good Agile, Bad Agile by Steve Yegge
Joel Spolsky: Measurement
Performance Indicator (wikipedia)
SMART (wikipedia)

Measuring and Managing Performance in Organizations book by Robert D Austin
Kanban and Scrum making the most of both by Henrik Kniberg & Mattias Skarin
Kanban book by David Anderson
SDLC 3.0 Beyond a Tacit Understanding of Agile book by Mark Kennaley
Process Dynamics, Modeling, and Control book by Ogunnaike and Ray
Moneyball: The Art of Winning an Unfair Game book by Michael Lewis

References Added Later

Measuring programmer quality by deorth (2007)

Update: Added new sub-sections to "Summary of Results" section: Iteration length; Testers; Motivation; Co-location. 23-July-2014 Update: Added new sections: Team vs Individual Metrics, Finding the Right Metrics; 23-Nov-2014: Added Schwaber-recommended metrics.

Comment on Nobody Expects the Agile Imposition (Part VII): Metrics

Replies are listed 'Best First'.

Re: Nobody Expects the Agile Imposition (Part VII): Metrics
by Athanasius (Archbishop) on Jul 13, 2014 at 16:05 UTC

Most people use statistics the way a drunkard uses a lamp post, more for support than illumination. — Mark Twain

In an interesting article, “Know the Difference Between Your Data and Your Metrics,” Jeff Bladt and Bob Filbin illustrate the crucial “difference between numbers and numbers that matter” with the example of a YouTube video appeal for used sporting equipment which resulted in:

1.5 million views; but only
8 donation pledges and
zero actual donations.

They argue that “... all metrics are proxies for what ultimately matters ..., but some are better than others.” The better ones they call meaningful metrics, to be contrasted with vanity metrics (such as the 1.5 million YouTube views) which look impressive but have no business value.

I suppose the moral of all this is the usual one: the usefulness of a tool depends not just on its inherent quality, but, more importantly, on the skill of the person using it. A sharp knife is more useful to a surgeon than a blunt one, but more dangerous in the hands of an amateur. Software metrics are a tool for IT managers, useful (or harmful) according to the wisdom (or naïvety) with which they’re collected and interpreted. Bladt and Filbin conclude:

Organizations can’t control their data, but they do control what they care about. ... Good data scientists know that analyzing the data is the easy part. The hard part is deciding what data matters.

So, where does this leave the (non-management) programmer? What is he or she to do with all these metrics? Probably — not too much. In some cases, of course, the programmer can increase his performance by aiming to improve specific, targeted metrics. For example, a programmer who is considered “slow” might aim to increase his output by monitoring weekly LOC, challenging himself to produce more code each week. But this quickly becomes counter-productive if the percentage of bugs increases. In that case, he has merely increased quantity at the expense of quality.

My conclusion is that metrics are useful for management, much less so for programmers. The programmer should, IMHO, focus on the task of producing quality software; in other words, on nailing down requirements, honing the design, and writing clean, self-documenting, well-tested code. Produce good code, and the metrics will take care of themselves.

Thanks to eyepopslikeamosquito for an interesting meditation.

Athanasius <°(((>< contra mundum Iustus alius egestas vitae, eros Piratica,

[reply]

Re: Nobody Expects the Agile Imposition (Part VII): Metrics
by zentara (Archbishop) on Jul 13, 2014 at 09:42 UTC

What's measured improves

Hi, I have to applaud you for the effort of putting all this together, however, it seems to me that it all just makes life more complicated. I want simplicity. I don't want to worry about the efficacy of toilet paper. Maybe I'm getting old and don't want to learn new tricks, but your article presents alot of truth that would be useful to Edward Bernays' descendants.

All this agile thinking has made me stodgy and stiff. ;-)

I'm not really a human, but I play one on earth.
Old Perl Programmer Haiku ................... flash japh

[reply]

Re: Nobody Expects the Agile Imposition (Part VII): Metrics (self)
by tye (Sage) on Jul 13, 2014 at 17:21 UTC

As with many things, developer metrics can work well when chosen from the bottom (of the organization) and can be horrible when imposed from the top.

The only developer metrics we use are ones selected by our own team and those are only short-term for getting improvement on a specific problem. We haven't resorted to that recently. So that would be the Process Improvement part.

The main place I see pushing to identify metrics is around determining how successful a feature or product (or other "story") really is. But that seems tangential to what you were asking.

There are various rewards for developers and many of them have individual, team, business unit, and corporate-wide components. We have performance reviews and PKIs (using a different name).

We don't use metrics in performance review. Metrics might make them seem less subjective, but they'd mostly just make them more stupid. There is no way around needing a good manager. A desire to impose metrics on performance review is usually a reaction to some of the managers being worse at their job. The best metrics can't compete with even a moderately good manager. Instead of metrics, invest in improving how good your managers are.

We do lots of other things to try to reduce problematic subjectivity: have company-wide guidance on how reviews are done; do them all at the same time; incorporate feedback from top-down, bottom-up, laterally, and cross-team; separate performance review from compensation review; keep pushing to minimize as much of the process as practical.

Some PKIs have metrics, but each such would be chosen by the people closest to the particular PKI. So some of my personal goals have metrics but those metrics are meant to aid me and my manager being more objective in accessing progress on the goal and to improve clarity on expectations. Those metrics mostly don't matter in the PKI assessment process. My manager and I discuss how successful I was at each goal and assign a percentage grade to groups of goals for the purpose of determining bonus amount. We write up a very short assessment and we might mention numbers from some metrics, but what matters is the percentage score we agree on. Rarely there can be push-back from a higher layer of management but that is pretty rare because most of this is no surprise; even the President of the company has some idea of how well I'm doing before he sees my PKI paperwork (and that is true for most of the 200 employees).

The broader PKIs are more likely to have metrics or to be completely expressed as a particular metric (for example, there are usually a couple of direct financial metric goals). Those end up being less important and are just an attempt to encourage alignment at a broader scale and mostly have value, IMHO, when we meet those goals because it lets everybody share in the success.

But the real value in all of this is facilitating the flow of information and expertise such that each person has a clearer and more accurate picture of how well they themselves are doing and how they can do better (and are less likely to feel that others are "getting away with" doing a poor job).

Rewards are actually quite dismal at improving performance. Read "Punished by Rewards" if you want to know more about that.

- tye

[reply]

Re: Nobody Expects the Agile Imposition (Part VII): Metrics
by wjw (Priest) on Jul 13, 2014 at 17:05 UTC

This is an outstanding piece of work which I will refer back to. Thanks for this!

Having worked in small, loose organizations as well as large, highly structured organizations and some variations in between over the last 30+ years, I have found KPI's and other metrics primarily punitive. Human performance is subjective. Measuring a human as if they are a piece of equipment(in spec/out of spec, in tolerance/out of tolerance, effective/non-effective, efficient/non-efficient) is, to me, simply demeaning. Every one of those measurements is subjective when applied to a human.

I don't see this as a problem of measurement so much as compared to what we choose to measure. All kinds of methods are being and have been developed to measure the performance of employees, with the supposed goal of being fair and impartial. I have yet to see one which is not gamed, either by those applying the measurement or those who are measured. Show me any contrived system and I will define it as broken upon conception, because the one sure way to get ahead is to abuse a system, and thus a system will be abused.

One of the problems with any of the 'measurement' systems is that they attempt factor out the core value that humans bring to an endeavor: Diversity. Analogy: Measure every individual fruit in a fruit basket based on the same criteria. Measure apples<=>oranges<=>grapes<=>lemons<=>mango<=>etc..., all on a single scale. I don't care how intricate you make that scale, how you try to balance it. The bottom line will be that a subjective call will have to be made to accurately reflect 'value' brought by any given fruit. Try it with taste as the scale: I would venture to guess that lemon will come out below apple except for in unique cases that fall well outside of six-sigma. Then apply the scale of 'beneficial vitamin C', probably a similar result with a different winner.

Another challenge with these 'measurements' is that the 'environment' around that which your are measuring constantly changes. A good example is when contractor(s) are brought it to assist with a project. In the fruit bowl analogy, this is adding nuts to the basket of fruit. I suppose that it could be argued that nuts are a fruit, but regardless, when they are added - and subsequently removed - the measurements generally do not reflect that change. Again, the subjective is required in order to compensate for the skew not captured in the measurement scale.

That is two examples of what I have seen which convinced me that the quarterly performance review was a worthless venture(except in the rare case where I have had a manager who pencil whipped the thing with me, and then spent the next 45+ minutes discussing opportunities, problems and solutions, and generally giving me a better overview of the organizational direction and mode of operation employed to achieve goals). There are more examples, some of which are referred to in the work by eyepopslikeamosquito. The fact that there is "So Much" in that posting goes to my point from my perspective.

Productivity is not a check-list process. It is an art, not a science. An employee, regardless of whether they are the measured or the applier of measures, would be wise to examine and reflect on that list of items under the 'Agile' heading in the post. There is nothing inherently wrong with that list or the any of the individual points within it. However, to implement it as a check-list process(as I have experienced in most cases) is about as demotivating as one can get. As I generally believe that a system can't work unless all involved in the system are working within it, I have tried to work within the constraints of the systems, and generally lost because of it. Those that ride the fringes and step outside it, and are given the lee-way to do so, generally gain. Maybe I am just a loose cannon..., but I tend to ignore that which I see as broken due to complexity. And measuring human performance is as complex as it gets in my view.

One last point before I conclude(and this may come of sounding political again. It is laymen's philosophy in my view, not political): Human endeavor is a process, not a result. A given result usually involves orders of magnitude more process than we give credit for. We plan to achieve a result, then we endeavor to put in place that which will allow us to achieve the result. The result is given such an intense focus that we tend to forget the importance of process. By that I mean the tangential results of having endured the process. Those are the things that these measurements never really address. And those are the very things that bring long lasting value to the next endeavor, value to the laying out of the next plan, value to the teams and individuals that make up those teams in terms of skills and wisdom. We are in such a hurry to get that damn result, that we tend to throw out the value of having achieved it. So much effort in coming up with what to think as compared to taking the time to examine how to think...

To conclude: I do not mean in any way to take away or diminish the work or the content of the post I am responding to. As I stated up front, it is excellent and useful. In many ways, this is an unfair response:

The original post is well researched, my response is anecdotal.
The original post is well laid out and organized, my response is somewhat slapdash and probably incongruous.
The original post is referenced on existing works of others, my response is based on nothing but my opinion and personal experience.
The original post reflects some experience in a management role, I have avoided that role like the plague though have managed teams informally as part of projects I have been on.
The post approaches the issue(s) openly, whereas my approach is pretty narrow(probably with a bit of attitude tossed in)

I appreciate the post and the opportunity provided by this community to present my thoughts, regardless of the value or lack thereof(of my thoughts)... :-)

...the majority is always wrong, and always the last to know about it...

Insanity: Doing the same thing over and over again and expecting different results...

A solution is nothing more than a clearly stated problem...otherwise, the problem is not a problem, it is a facct

wjw

[reply]

Re: Nobody Expects the Agile Imposition (Part VII): Metrics
by BrowserUk (Patriarch) on Jul 16, 2014 at 00:40 UTC

I have two takes on the premise of the use of metrics to measure, or regulate, or reward the writing of code.

Agile -- simplistically stated -- uses the number, or rather the percentage of, tests passed as its primary working metric. And that encourages producing lots of, simple, easily passed, pointless tests.

You doubt this? Inspect *any* perl module that uses the Test::* conglomeration and see if it uses require_ok( 'Some::Module' );. If it does -- and most do -- you've seen this encouragement in action!

This is a completely pointless test.

If:

(this instance of) Perl is incapable of loading a module:
then the module (Test::More?) that provides the code for require_ok() will not be loaded and the test will never be reached.
The module (Some::Module above) has not been installed:
The the test suite containing this test will also not be installed, so the test will never be run.

However, if the module has been installed; but the Some::Module file has been 'emptied' except for the return 1; (or equivalent), then the test will be run and pass; despite that no other test will pass as no actual code has been loaded.

The upshot of that is that in the only scenario in which the test actually tells you anything (the latter case), it gives a false positive resulting in a 1/N tests pass instead of a 100% failure.

Note: There is no "gaming the system" here. The tool(s) provide the facility, the synopsis and detail promote its use; the users simply use it because "it seems like the right thing to do". The fault lies with both the tool -- for the provision -- and the philosophy for promoting the use.

My second take is that the moment you try to separate the physical construction of code -- kloc, function points, abstracts test quantities -- from the intellectual processes of gathering requirements; understanding work-patterns and flows; and imagining suitable, appropriate, workable algorithms to meet them; you do not have sufficient understanding of the process involved in code development to be making decisions about it.

You might just as well employ a mortuary attendant to run A&E triage.

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

[reply]
[d/l]
[select]

Re: Nobody Expects the Agile Imposition (Part VII): Metrics
by mr_mischief (Monsignor) on Jul 15, 2014 at 14:05 UTC

Sussman attains enlightenment reads:

In the days when Sussman was a novice, Minsky once came to him as he sat hacking at the PDP-6. “What are you doing?”, asked Minsky. “I am training a randomly wired neural net to play Tic-Tac-Toe” Sussman replied. “Why is the net wired randomly?”, asked Minsky. “I do not want it to have any preconceptions of how to play”, Sussman said. Minsky then shut his eyes. “Why do you close your eyes?”, Sussman asked his teacher. “So that the room will be empty.” At that moment, Sussman was enlightened.

As with artificial intelligence, there is little point presupposing what a real intelligence will or won't think. To know what a person is thinking and what motivates them, it is helpful to ask. Then, observe their actions to see where the words and actions hold together or where they take different paths.

If management rewards lines of code for the sake of lines of code, then the programmers will gladly produce the lines of code the management requested. This is not the programmers gaming the system. This is the programmers producing what is asked of them.

If the management asks for high bug fix rates, they will be provided buggy software with lots of bugs to fix. This is not the programmers cheating the metric. This is the programmers giving management what management requested.

If what the management really values is small, maintainable, minimally buggy code then management should damn well ask for small, maintainable, minimally buggy code. Then management should allow developer and tester time to the task of designing, developing, refactoring, reviewing, and testing code. Make sure the developers and testers feel safe to criticize one another's work and more importantly feel safe to have their work criticized.

Let me take a particular agile framework as an example. I feel the reason Scrum works so well has less than the progenitors of it think to do with the regularity or exact length of the meetings, although touching base together briefly and regularly is a great thing. I think the biggest factors they got right are the self-managing team that answers as a single unit and having that unit submit their work to a single vision. A good product owner who can stand in the gap between the customer and the development team, and who accepts the work of the team to present to the customer for their acceptance is the secret ingredient in the sauce.

I've seen some complaining around the net about Scrum as having a high overhead. Well, that's true compared to some small teams but it buys a great deal of value for that overhead. One needs to communicate frequently to be truly agile and adjust frequently. One needs to have an at least partially fixed spec for a while to produce something of value without confusion, even if that spec is a bit nebulous and only stays fixed this two-week iteration. The team needs to lean on one another and mentor one another to prevent fortresses of individual skill and knowledge. Those fortresses fall hard when the team is realigned, and can bring down projects or even companies. It is best to prevent them.

Agile development is about getting the minimum work done to meet a quality and completeness goal as quickly as possible. There's a maxim we should all know by now: "Fast, cheap, good: pick two". There's no way to get good software fast without a little overhead cost. Scrum I think balances the cost to the benefit pretty well.

[reply]

Re: Nobody Expects the Agile Imposition (Part VII): Metrics
by sundialsvc4 (Abbot) on Jul 13, 2014 at 17:32 UTC

/me nods ...

“You’re over-thinking this thing,” he muttered, thinking about the book, Cheaper By The Dozen. (A hilarious book by the daughter of an early process-management consultant ...)

Fundamentally, the writing of computer software has to do only with the success or the failure of the computer software to fulfill the purpose for which it was written. Infrequently, the cause is programmer incompetence. Usually, it’s because people are not-working together on a project that has never been completely specified in the first place. And they’re not continually testing what they have done, to know if it actually works and keeps working. In other words, yellow-stickies or not, it’s hopeless. Whether or not you are measuring “progress,” the project’s taking one step forward and two steps back. You measure time that is being profoundly wasted.

The best idea I ever had, a couple decades ago now, was to base things on a contractually binding task-order and change-order system ... and to make task-order #1 (and beyond) consist of building the remaining task breakdown. Charging the same money to do that, or more, than for building code. It’s modeled after the construction trades, and it works. If you systematically embrace all of the best-practices that are involved (start to finish ...) in constructing a building or a roadway, and apply these to software ... it works like nothing else does. (A project to repair the potholes in 100 feet of bad-condition street might consist of more than 250 line-items and accumulate the data points of more than 1,000 tests. They leave zero, nada, nothing, zilch to chance or “creativity.”)

The usual practice ... and Agile encourages it ... is for people to be “lone wolves, in a pack.” They forget that they are building a machine, that will work entirely without human intervention.(*) The process they use is the same one they used in school ... at 11:30 PM the night before the assignment was due. They’re frankly entirely accustomed to “pantsing it,” with only a token nod to planning, and no more of a nod to communication than will fit on a 2" x 4" yellow piece of paper. Almost invariably, one person will be tasked with devising and writing the code for some feature, and for telling the rest of the “team” what s/he did after the fact. I don’t know why this is so, but I’ve seen it in far too many teams to think of it as accidental.

So, what you really need isn’t “metrics” at all, in the sense of lines-of-code, hours-worked and what-not ... except possibly for cost accounting. You don’t need to measure what your programmers are doing. Instead, you need to be checking-off their progress along a meticulously-prepared punch list, cleanly separating the process of “deciding what to do and when to do it” from “what the source-code writers are doing.” It doesn’t matter if folks are standing up or sitting down. If the plan is good and complete, the work will go according to plan. (And, yes, that plan should called for staged delivery and for overlapping threads of progress.) If the work has been spelled out sufficiently, the writing and testing of the corresponding source code is ... perfunctory. And the resulting code will be damned near bulletproof. If you measure anything at all, it should be the on- or off-schedule progress of the machine that is being constructed day-by-day.

_{(*) The best book that spells out that idea very completely, from a project-management point of view, is a little Kindle-only book called Managing the Mechanism, by Vincent North, which is available [only ...] on Amazon. Find it, read it.}

[reply]

Re^2: Nobody Expects the Agile Imposition (Part VII): Metrics

by bulrush (Scribe) on Jul 15, 2014 at 11:03 UTC

Management use individual KPIs to reward high performers in Sales and other departments. They are contemplating doing the same for Software Developers.

They've tried this before in the 1990s, it didn't work well. There are many variables, and a huge variable is: your vendor's modules/software doesn't do what it says it does. Keep in mind that marketing people, not software engineers, usually write the marketing material. Thus the marketing people really don't understand what a product does, and when programming gets involved, details matter. This was so bad that in one company I demanded to review all documentation written for changes I made. (We had a marketing person writing technical documentation. Sometimes it lacked important details, and sometimes it was just incorrect.)

Example: In the 1990s I was using MS Access. The box said it was backwards compatible with all versions. It was not. It only converted forms, queries, reports, and macros, not code. When a new version of Access was installed, all code had to be retested, and sometimes rewritten. If you used a lot of code, it was a sure bet something was going to break, especially if you used a DLL that got upgraded. When the OS or another product (like Access) was upgraded, some DLLs had their data type changed to another larger type, from integer to long for example.

My point: you really don't know if your vendor's software will do everything it says because you can't trust what's on the box or in the documentation. Good luck getting your vendor to fix something (that should have been designed correctly in the first place) just for you without paying through the nose.

Perl 5.8.8 on Redhat Linux RHEL 5.5.56 (64-bit)

[reply]

Re: Nobody Expects the Agile Imposition (Part VII): Metrics
by sundialsvc4 (Abbot) on Jul 15, 2014 at 15:00 UTC

A key problem with these metrics is that they come from the world of manufacturing. In those lines of business, you are performing a repetitive process, with one iteration (and outcome) being independent of all the others, and the process is basically deterministic. You can trace defects either to the worker, to the materials, or to the line. Measurements of performance are meaningful and useful; incentives and bonuses might work. However, none of this is true with software, and for a variety of reasons.

First and foremost, software systems are, as the book puts it, chock-full of “interlocking interdependencies.” Everything is connected to everything else. The flap of a butterfly’s wings over here does cause a hurricane over there. (One of the key take-aways from test-driven development, and therefore one of the reasons to be using it even on your own solo projects, is to see just how easy it is to cause a seemingly-unrelated set of tests to start failing. You won’t believe it, at first.)

Second, and as the book also mentions, the thing is a machine, which will operate entirely without human intervention and beyond human control. No one will know or care whether the team that built it was standing up or sitting down, but they damm well will care if it does not work ... perfectly. It is entirely possible, and frequently the case, that a large block of carefully-measured (and expensive) time will be a 100% sunk-cost. It is also entirely possible that it will be impossible to predict “how long it will take” to solve a problem, since the part of the problem-iceberg that pokes its head above the water is only a minuscule portion of the actual thing which might be broken or defective ... or inadequately or incompletely designed.

Finally, there is the natural human tendency to want to say, “Naming it will Solve it.™” Although some of the names we come up with are perfectly nonsensical ... (“Scrum PLOP?” Really?? What sort of sh*t is that?) ... giving them names or TLA’s (Three-Letter Acronyms) implies that we actually know what we are talking about. This quickly creates a gigantic credibility-gap. (“Fool me twice == you’re fired.”)

Another excellent book on this subject is Hollywood Secrets of Project Management Success, by James Persse. Dr. Persse, who by the way is an excellent and engaging author, was given the seriously-cool opportunity to get behind the scenes in the motion-picture business, which, as he points out, does have a demonstrated track-record of bringing multi-million dollar long-running projects to completion. Their project management practices have nothing to do with things like “Agile” (although they do employ overlapping, staged delivery). They seem to be quite inflexible, set in stone, although they aren’t. It is a scrupulously-repeated multi-stage project life cycle that has been honed to perfection, because it works. You can “bet a hundred million dollars” on a movie project and be certain that you will get a piece of celluloid, on time, to show for it ... which is a lot more than we can say for software. (And, don’t say that the two types of projects are incomparable, because, being a software technologist himself, he quite convincingly demonstrates that they are. There is a lot that we all can learn from this book, as well as from Mechanism.)

Quite honestly, my biggest beef with the Agile loudmouths (ahem ...) is that, while they certainly “talk a good camp,” they can’t show that their processes actually work. (Which may be one reason why they talk so loud ...) Yes, there are very good reasons to favor staged-delivery and iterative work cycles, when appropriate, but the actual process of creating “software Mechanisms” is vastly more intricate than their theorizing would allow, and the “fabled outcomes” simply don’t materialize. Their premises, while filled with some pretty good ideas, are also insufficient.

[reply]

Back to Meditations