Agile and predictability and metrics, oh my

Nov 15, 2016

Let’s evaluate how to measure predictability in software development, and then implement these metrics in Jira.

Check out my ongoing series of engineering management related musings here:

Eng Mgmt series by Matthew Vanderzee

In a recent post, we explored engineering velocity from two perspectives: external (“Are we meeting our business goals?”) and internal (“Do we feel great about how we are progressing?”). We discussed how we internally measure velocity, and even built a Slackbot to automate the process.

At a high level, measuring our engineering velocity from an external perspective is more straightforward: how much of our roadmap have we accomplished, versus how much we should have. And how much should we have accomplished? More. Always more.

But in addition to the raw velocity that we want to continually improve, we also want to improve our predictability. Being able to declare when you are going to deliver something — to another dev team, to the Product team, to a customer — is valuable and important. Perhaps as important as raw velocity. And particularly so, when you are a startup building software for enterprise customers who are paying you a whole bunch of money.

Measuring Churn in Agile

In a sprint-oriented workflow, predictability is equivalent to achieving a sustained velocity and consistently hitting your sprint goals. In other words, the team routinely estimates work well, defines sprints that have the right amount of work in them, and then delivers that work successfully.

Most agile workflow tools track this success: they measure the team’s operating velocity, help you build sprints according to that metric, and then adjust as time progresses. But while establishing that speed metric helps with future planning, it doesn’t provide much clarity into the other side of the coin: when things go wrong.

Churn in a sprint is a bad thing. Once you have set up your sprint, the optimal outcome is that you get it all done. Any time something changes mid-sprint, it risks derailing the plan and thus hampers your predictability. So we need to be able to measure churn, so that we can reduce it.

An aside: The case for Jira

An awful Jira workflow — the kind of thing that makes developers run for the hills

Mentioning Jira to software developers elicits a rainbow of different reactions, from “It’s fine” to “It’s unusable”. Upon closer inspection, though, we find that people who dislike Jira experienced a poorly managed instance of it. The Jira universe is a maelstrom of configurability, customization, and plugins, and the Jira train very easily runs off the rails into a fiery pit of despair. On the flip side, though, a well-designed and well-managed Jira instance can provide stellar visibility and tracking.

Other ticket tracking systems: we’ve tried them all — Pivotal, Trello, Asana, Sprintly, stickies on whiteboards, using our memory. Some of these have more modern UIs with nice animations and talking paperclips and hipster greetings when you log in. But, as teams grow and evolve and focus on measuring and optimizing our processes, nothing compares to Jira. No matter where we started, we get to the point where the existing tool doesn’t represent our process perfectly, or can’t give us visibility into aspects of how we work. So, we end up transferring all our tickets over to a pristine new Jira instance, and away we go.

Back to it: Churn notifications

Two things we need to accomplish with regard to churn: we need to know when it happens (immediately, so we can understand and mitigate), and we need to collect metrics over time so we can see how we are trending.

To know when some bad churn is happening, we need to set up notifications for relevant events. The typical events in question are: work being unexpectedly added to a sprint (“Oh, wait, this item is really important too!”), work being removed from a sprint (“There’s no way we’re getting this done this week”), or estimates changing (“Turns out ‘rebuild authentication module’ is more than two days”). Unfortunately, base Jira doesn’t provide an easy way to alert on these types of events. Jira is good at notifications when there are state changes (“I’m starting this task” or “I finished this task”), but not for data changes, like the item’s sprint.

Luckily, the great folks at Code Barrel have built Automation for JIRA, a plugin that makes this type of notification simple. Their software allows you to easily set up a sequence of conditions that invoke a set of actions you define. Amazing!

An automation to add a label to any ticket that gets added to a sprint

Behold the glory of this: if the sprint of a ticket is changed, and the new sprint is active, we add some specific labels to the ticket — ready to be used by a filter or a dashboard. Oh, and also we post a message to a relevant Slack channel. Our phones buzz, and we jump up and start panicking like 1980s stockbrokers. Super real-time, super simple, and allows us to see what is happening to our sprints. Brilliant!
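To make the rule's logic concrete, here is a minimal Python sketch of the decision it makes. This is not the plugin's configuration or API; the `Ticket` class, the `on_sprint_changed` function, the label name, and the `notify` callback are all hypothetical stand-ins for "add a label" and "post to Slack".

```python
from dataclasses import dataclass, field

# Hypothetical label name; the post does not say which labels the rule applies.
CHURN_LABEL = "added-mid-sprint"


@dataclass
class Ticket:
    key: str
    labels: set = field(default_factory=set)


def on_sprint_changed(ticket, new_sprint, active_sprints, notify):
    """Label the ticket and send an alert if it moved into an active sprint.

    Returns True when the rule fired, False otherwise.
    """
    if new_sprint is None or new_sprint not in active_sprints:
        # This rule only covers additions to an active sprint.
        return False
    ticket.labels.add(CHURN_LABEL)
    notify(f"{ticket.key} was added to active sprint {new_sprint}")
    return True


messages = []  # stands in for a Slack channel
ticket = Ticket("ENG-101")
fired = on_sprint_changed(ticket, "Endor 7", {"Endor 7"}, messages.append)

planned = Ticket("ENG-102")
fired_future = on_sprint_changed(planned, "Endor 8", {"Endor 7"}, messages.append)
```

Moving a ticket into a future sprint does not fire the rule, which keeps ordinary planning out of the alert channel.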

Churn metrics over time

Now let’s chart the progress of how well we are avoiding churn over time. Again, core Jira doesn’t have a great way of visualizing these metrics and tracking them. We can set up some filters and dashboards that react to the labels created by our automation, but this isn’t the most permanent, incontrovertible means to track this — labels can be accidentally erased or overridden.

Happily, our great friends at EazyBI have built an amazing business analytics framework that has great Jira integration. Further, they have set it up such that it can walk back through history to determine the state of tickets at any point in the past — exactly what we are looking for. We want to know, at the beginning of the sprint, what tickets were part of that sprint. Then, at the end of the sprint, we want to know what tickets were completed during that sprint. EazyBI does all this CSI to reignite a Cold Case and reestablish Law and Order.
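The idea of walking back through history can be sketched in a few lines of Python. This is only an illustration of the replay technique, not how EazyBI is implemented; the record layout, issue keys, and dates are made up for the example.

```python
from datetime import datetime

# Hypothetical change-history records: (timestamp, issue key, field, new value).
# Real Jira history has more detail; this keeps only what the replay needs.
history = [
    (datetime(2016, 11, 1, 9), "ENG-1", "sprint", "Endor 7"),
    (datetime(2016, 11, 1, 9), "ENG-2", "sprint", "Endor 7"),
    (datetime(2016, 11, 3, 14), "ENG-3", "sprint", "Endor 7"),  # added mid-sprint
    (datetime(2016, 11, 4, 11), "ENG-2", "sprint", None),       # removed mid-sprint
    (datetime(2016, 11, 8, 16), "ENG-1", "status", "Done"),
]


def state_at(history, moment):
    """Replay changes up to `moment`; return {issue: {field: value}}."""
    state = {}
    for when, issue, field, value in sorted(history, key=lambda c: c[0]):
        if when > moment:
            break
        state.setdefault(issue, {})[field] = value
    return state


def issues_in_sprint(history, sprint, moment):
    """Issue keys whose sprint field equals `sprint` at `moment`."""
    snapshot = state_at(history, moment)
    return {key for key, fields in snapshot.items() if fields.get("sprint") == sprint}


sprint_start = datetime(2016, 11, 1, 10)
sprint_end = datetime(2016, 11, 11, 17)

# What the team committed to at the start of the sprint.
committed = issues_in_sprint(history, "Endor 7", sprint_start)

# What was in the sprint and done when it ended.
final_state = state_at(history, sprint_end)
completed = {
    key
    for key, fields in final_state.items()
    if fields.get("sprint") == "Endor 7" and fields.get("status") == "Done"
}
```

Comparing `committed` with `completed` is the sprint-level question the rest of this post answers with reports.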

Oh my goodness — that is amazing! We can see, on each day, how many items were added and removed from a sprint!

Here’s how we set this up:

A discussion of data cubes, dimensions, and measures is beyond the scope of this post, of course. Here is the very high-level process to get this going:

1. Get your Jira data imported into your EazyBI cloud instance. Make sure you include the relevant projects, and that you include your Issue Change History.
2. Create a new report, view it in table mode, and add Time > All hierarchy level members > Weekly > Day and Sprint > All hierarchy level members > Sprint as rows.
3. For columns, add All Issues and Measures > Agile > Sprint Issues Added/Removed. Now, and this is where it gets really exciting: add Measures > Agile > Time Within Sprint as a column. This is a field that is 1 when the day is within the relevant sprint. Now, click on the header for that column, and filter the column to only be 1. This will get rid of any row other than when the Date is within the sprint for the relevant Sprint. Once you have filtered by the column, you can remove the column, since you don’t need to visualize its data.
3b. Actually, instead of Time Within Sprint, which only includes days that are officially part of the sprint (and doesn’t allow leakage into the weekend after the sprint is over), we defined Time Within Sprint Actual. Remove the Time Within Sprint filter and instead apply the same filter on Time Within Sprint Actual. See the appendix below for the code for this custom measure.
4. You should now see a table of data that looks correct. Make sure you compare it against what you know to be reality.
5. Convert to a bar graph or any other type of visualization that makes sense.

Now you have a way to see recent days’ sprint churn. You may need to limit it to the recent sprints so that it doesn’t become too dense.
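For readers who think better in code than in report builders, the table produced by the steps above amounts to something like the following Python sketch. The event list and the `daily_churn` function are hypothetical; the range check plays the role of filtering the "within sprint" column to 1.

```python
from collections import Counter
from datetime import date

# Hypothetical events: (day, issue key, "added" or "removed") for one sprint.
events = [
    (date(2016, 11, 1), "ENG-1", "added"),
    (date(2016, 11, 3), "ENG-3", "added"),
    (date(2016, 11, 3), "ENG-4", "added"),
    (date(2016, 11, 4), "ENG-2", "removed"),
    (date(2016, 11, 20), "ENG-9", "added"),  # after the sprint closed
]


def daily_churn(events, first_day, last_day):
    """Count added/removed issues per day, keeping only days inside the sprint."""
    added, removed = Counter(), Counter()
    for day, _issue, kind in events:
        if not (first_day <= day <= last_day):
            continue  # same effect as filtering the "within sprint" column to 1
        (added if kind == "added" else removed)[day] += 1
    days = sorted(set(added) | set(removed))
    return [(day, added[day], removed[day]) for day in days]


rows = daily_churn(events, date(2016, 11, 1), date(2016, 11, 11))
for day, n_added, n_removed in rows:
    print(day.isoformat(), f"+{n_added}", f"-{n_removed}")
```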

Churn metrics over sprints

In addition to seeing the history over time, we also want sprint-by-sprint scorekeeping, so we can see how things are trending. EazyBI to the rescue!

This one is a little more complex to construct. Because we are no longer looking at a simple day-by-day view, and instead are trying to bucket things into sprints, we need some more logic.

Here is how we built this:

We defined a custom calculated member in the Sprint dimension, called Modern Sprints. This allows us to only look at recent stuff that we care about, not ancient history before the days of internal combustion engines and distributed computing frameworks. This is not strictly necessary, but helps if you want your visualization to be focused. See the Appendix for the code for Modern Sprints.

We created two calculated measures: Sprint Issues Added During Sprint and Sprint Issues Removed During Sprint. Here is the definition of the Added version:

Sum(
  Filter(
    Descendants([Time].CurrentHierarchyMember, [Time].[Day]),
    [Measures].[Time within Sprint actual] > 0
  ), 
  [Measures].[Sprint issues added (with fix)]
)

This filters the days in the current time period using Time Within Sprint Actual, discussed above, and then sums the issues added to the sprint on those days using Sprint issues added (with fix). We defined the latter measure because we wanted to filter out tickets with certain labels, such as test tickets and erroneous tickets. See the appendix for a definition.

The Removed version of this custom measure is similar.
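As a rough illustration of what the Sum(Filter(...)) pattern computes, here is a Python sketch under simplified assumptions. The per-day counts, the `within_sprint` stand-in for the Time Within Sprint Actual flag, and the function names are all hypothetical; the same function serves both the Added and the Removed totals.

```python
from datetime import date

# Hypothetical per-day counts for one sprint, as a daily report might show.
added_per_day = {
    date(2016, 10, 31): 5,  # before the sprint window: ignored
    date(2016, 11, 1): 0,
    date(2016, 11, 3): 2,
    date(2016, 11, 4): 1,
}
removed_per_day = {date(2016, 11, 4): 1, date(2016, 11, 7): 2}


def within_sprint(day, first_day, last_day):
    """Stand-in for the Time Within Sprint Actual flag: 1 inside, 0 outside."""
    return 1 if first_day <= day <= last_day else 0


def total_during_sprint(per_day, first_day, last_day):
    """Sum the daily counts over days whose flag is greater than zero."""
    return sum(
        count
        for day, count in per_day.items()
        if within_sprint(day, first_day, last_day) > 0
    )


first, last = date(2016, 11, 1), date(2016, 11, 11)
added_total = total_during_sprint(added_per_day, first, last)
removed_total = total_during_sprint(removed_per_day, first, last)
```

Only the measure being summed differs between the two totals, which is why the Removed definition mirrors the Added one.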

Conclusion

Awesome! We now know when churn is happening, and we can see its impact over time. Actually solving the problem, of course, requires a deep understanding of why these things are happening, but at least we now have the visibility we need into the problem. :boom:

Special thanks to Andreas at Code Barrel and Ilze at EazyBI, for their patient support as we got this stuff to work!

Appendix

Code for [Measures].[Time Within Sprint actual]

-- Sprint still open (no Complete date): the window ends now.
CASE WHEN
  IsEmpty([Sprint].CurrentMember.get('Complete date'))
THEN
  CASE WHEN
    DateBetween([Time].CurrentHierarchyMember.StartDate,
      DateAddDays([Sprint].CurrentMember.get('Start date'), -1),
      Now()
    )
  THEN 1
  END
-- Sprint closed: the window ends at the Complete date.
ELSE
  CASE WHEN
    DateBetween([Time].CurrentHierarchyMember.StartDate,
      DateAddDays([Sprint].CurrentMember.get('Start date'), -1),
      [Sprint].CurrentMember.get('Complete date')
    )
  THEN 1
  END
END

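This returns 1 for any day that falls between a sprint’s Start Date and its Complete Date. The lower bound is shifted back by one day with DateAddDays, presumably so that the start day itself is included. The Complete Date is the day the sprint is closed (versus the End Date, which is the planned last day of the sprint). For any sprint that doesn’t have a Complete Date, this uses the current time from Now().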
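The same logic reads like this in Python. It is a sketch only: the function name and parameters are made up, `now` is passed in so the example is deterministic, and treating both ends of the range as inclusive is an assumption about how DateBetween behaves.

```python
from datetime import datetime, timedelta


def time_within_sprint_actual(day_start, sprint_start, complete_date, now):
    """Return 1 if `day_start` lies between (sprint_start - 1 day) and the
    sprint's complete date, or `now` when the sprint is still open; else None.
    """
    lower = sprint_start - timedelta(days=1)
    # An open sprint has no complete date, so the window runs up to `now`.
    upper = complete_date if complete_date is not None else now
    # Assumption: both bounds are inclusive.
    return 1 if lower <= day_start <= upper else None


start = datetime(2016, 11, 1, 10)      # sprint started mid-morning
closed = datetime(2016, 11, 11, 17)    # sprint was closed on the 11th
now = datetime(2016, 11, 5, 12)

start_day = time_within_sprint_actual(datetime(2016, 11, 1), start, closed, now)
day_before = time_within_sprint_actual(datetime(2016, 10, 31), start, closed, now)
open_sprint_tomorrow = time_within_sprint_actual(datetime(2016, 11, 6), start, None, now)
```

With a sprint that starts at 10:00, midnight of the start day is earlier than the start timestamp, so without the one-day shift the first day would drop out of the window.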

Code for [Sprint].[Modern Sprints]

Aggregate(
  Order(
    Filter(
      [Sprint].[Sprint].Members,
      DateCompare([Sprint].CurrentMember.get('Start date'), Now()) <= 0
      AND
      [Sprint].CurrentMember.Name MATCHES '(?i)(endor|hoth).*'
    ),
    [Sprint].CurrentMember.get('Start date'), BASC
  )
)

What this does: Take all Sprints, filter them by those that started before now and have (awesome) names like Endor or Hoth, and then order them by start date. Cool.
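If the MDX is hard to read, the same filter-and-order step looks roughly like this in Python. The sprint records and the `modern_sprints` function are hypothetical, and the sketch assumes MATCHES tests the whole sprint name against the pattern.

```python
import re
from datetime import datetime

# Same pattern as above: case-insensitive, name begins with "endor" or "hoth".
MODERN = re.compile(r"(?i)(endor|hoth).*")

# Hypothetical sprint records.
sprints = [
    {"name": "Hoth 2", "start": datetime(2016, 10, 18)},
    {"name": "Tatooine 9", "start": datetime(2016, 6, 1)},
    {"name": "endor 1", "start": datetime(2016, 9, 6)},
    {"name": "Endor 9", "start": datetime(2030, 1, 1)},  # has not started yet
]


def modern_sprints(sprints, now):
    """Keep sprints that have started and match the pattern, oldest first."""
    started = [
        s for s in sprints
        if s["start"] <= now and MODERN.fullmatch(s["name"])
    ]
    return sorted(started, key=lambda s: s["start"])


names = [s["name"] for s in modern_sprints(sprints, datetime(2016, 11, 15))]
```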

Code for [Measures].[Sprint issues added (with fix)]

[Measures].[Sprint Issues Added] -
  ([Measures].[Sprint Issues Added], [Label].[testissue]) -
  ([Measures].[Sprint Issues Added], [Label].[notendor1])

There were two labels we wanted to omit from these metrics — testissue and notendor1. You can use a similar methodology to filter what tickets are included.
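One thing worth checking if you adapt this: whether a ticket can carry more than one of the excluded labels. The Python sketch below uses made-up issues and helper names to compare the subtract-per-label arithmetic with a direct filter. It does not model how EazyBI evaluates the tuples; it only shows the arithmetic.

```python
EXCLUDED_LABELS = ("testissue", "notendor1")

# Hypothetical issues added to a sprint, with their labels.
added = {
    "ENG-1": set(),
    "ENG-2": {"testissue"},
    "ENG-3": {"notendor1"},
    "ENG-4": {"testissue", "notendor1"},  # carries both excluded labels
}


def count_with_label(issues, label):
    """Number of issues that carry `label`."""
    return sum(1 for labels in issues.values() if label in labels)


# Total minus one count per excluded label, the shape of the formula above.
by_subtraction = len(added) - sum(
    count_with_label(added, label) for label in EXCLUDED_LABELS
)

# Count only the issues that carry none of the excluded labels.
by_filter = sum(
    1 for labels in added.values() if labels.isdisjoint(EXCLUDED_LABELS)
)
```

In this made-up data the two results differ because ENG-4 is subtracted once per label. If your excluded labels never overlap on a ticket, the two approaches agree.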