How to Make Reliable Probabilistic Forecasts Without Sizing Your Stories Into Even Pieces
Sizing stories into even pieces is a widespread practice, often considered a prerequisite for making reliable predictions. But the idea that you must artificially split your work items into even pieces to produce an accurate delivery forecast is not valid. In fact, resizing your stories is not only irrelevant to forecasting, it can also work against the goals you’re trying to achieve.
Why Sizing Your Stories Artificially Is Bad Advice
Let’s explore the main reasons why you don’t need to split your items into even parts.
The Importance of Maintaining User Stories that Define Customer Value
Every story should represent a piece of customer value. Naturally, these items will come in different sizes and varying degrees of complexity. When you trim down a story artificially, the concept behind it is no longer relevant. The story has just become a measure of hours: the customer’s value has been split into units of time and has thus become meaningless to the customer.
Slicing Down Your Story Unnaturally Introduces Unnecessary Dependencies
Sizing your stories into even pieces creates unnecessary complexity and dependencies. If you break units of value into unnatural segments, there will inevitably be dependencies sprouting up between these segments.
Dependencies lead to bottlenecks in your workflow. The more dependencies you introduce, the harder it will become to manage your process effectively.
Every Story Should Be Potentially Shippable
Each developed user story should produce potentially shippable product increments.
Once you trim your stories down evenly, they are no longer potentially releasable, so they can’t be delivered to the customer when that is required.
Splitting Your Stories Only Makes Sense From a Customer Perspective
When you cut your items into smaller pieces, always do it from a customer perspective. The resulting releasable increments won’t be even in size.
There must be a clear, sensible intention justifying you taking a large user story and splitting its scope into multiple releasable pieces. If your user story doesn’t make sense from a customer’s point of view and doesn’t clearly communicate the goal it has to achieve, you had better not start it. It will only pile up with the rest of the work in progress and get stuck in the workflow. Instead, analyze it properly and extract the potentially shippable part out of it.
The most effective approach that you can take is bridging the gap between your customers and your delivery team. You should work with your client to clearly specify the what and the why, and everyone in the team should understand these factors. It is the team that figures out the how. Bring everyone’s expertise together to brainstorm and come up with the most feasible option to solve your customer’s problem.
This way, you can still define your new stories in terms of value and the items are still potentially shippable, without artificially introducing dependencies in your system.
The Difference Between Effort Time and Delivery Time
You’re probably wondering how you can make future predictions without slicing your stories into items of the same size. The answer is simple – making accurate delivery forecasts has nothing to do with the size of your items.
Let’s sort out the math problem together. The delivery time of your work depends on way more variables than the time your team actually needs to work on it.
Delivery Time Does Not Equate to Effort Time
When a work item enters your backlog, it will spend some time in your To Do list before it gets started. Once in progress, it will have to go through all the process steps. Considering that your team works on more than one item at a time, your work will have to wait until the people responsible for each activity in the workflow have the capacity to start working on it. Furthermore, your item’s waiting time will build as a result of any additional work that comes in between, any bottlenecks, any external blockers, and any defects moving back and forth in the process.
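To make the distinction concrete, here is a minimal sketch that splits one item’s delivery time into active and waiting time and derives its flow efficiency (active time divided by total delivery time). The record shape, the state names and the numbers are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class StateSpan:
    """Time one work item spent in one workflow state (hypothetical record shape)."""
    state: str
    days: float
    active: bool  # True if someone was actually working on the item in this state


def flow_efficiency(spans: list[StateSpan]) -> float:
    """Share of total delivery time spent in active states (0.0 to 1.0)."""
    total = sum(s.days for s in spans)
    if total == 0:
        raise ValueError("item has no recorded time")
    active = sum(s.days for s in spans if s.active)
    return active / total


# Illustrative item with made-up numbers, not data from this article.
item = [
    StateSpan("To Do", 4.0, active=False),
    StateSpan("Development", 2.0, active=True),
    StateSpan("Waiting for Review", 2.0, active=False),
    StateSpan("Review", 1.0, active=True),
    StateSpan("Waiting for Release", 1.0, active=False),
]

delivery_time = sum(s.days for s in item)  # effort time plus waiting time
efficiency = flow_efficiency(item)
print(f"delivery time: {delivery_time} days, flow efficiency: {efficiency:.0%}")
```

In this made-up item, only 3 of the 10 days are effort time; the rest is waiting.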
The Negative Effect of Waiting Time
Here at Nave, we’ve analyzed about 10 000 workflows and it turned out that, on average, the work is just sitting and waiting in the workflow 70% of the time. It’s not the performance of the team that causes the delays; it’s their inability to move the work down the funnel as a result of internal or external dependencies.
Based on that research, our conclusion is that, in a low flow efficiency environment, the diversity of your work item sizes has no impact on your delivery times. Improving your delivery speed boils down to how efficiently you manage your workflows, as this is the key to reducing the waiting time to an optimal level.
In a higher flow efficiency environment, you’d have to pay attention to keeping the working practices, skills and expertise of the individuals fairly similar, in order to perform reliable delivery predictions. The size of your work items is not a criterion that affects the accuracy of your predictions.
Predicting Your Delivery Times
Performing probabilistic forecasts using your past performance data is one of the most reliable approaches towards making future predictions because it takes into account all the components that make your delivery times, including the effort needed to complete your items as well as the waiting time in your system.
The Realm of Probability Forecasting
Let’s explore the approach of making reliable future predictions without trimming down your user stories into even pieces. The trick is to analyze what has happened in the past and base your predictions on your historical performance data.
You don’t have to split your stories into similar sizes to produce a reliable forecast. What you need is to clearly classify your items by their priority and make sure they follow their process policies.
Your past performance is outlined on your Cycle Time Histogram. The chart shows the frequency distribution of the delivery times of the tasks in your workflow. The power of this diagram is that it represents the variability in your delivery system.
The dotted vertical lines stretching across the graph are called percentile lines. We use percentiles to establish service level agreements and define the probability of different commitment points being met.
To identify whether your distribution is thin-tailed or fat-tailed, simply divide your 98th percentile by your 50th percentile. If the result is greater than or equal to 5.6, your frequency distribution is fat-tailed. If the result is less than 5.6, it’s a thin-tailed distribution.
Further analysis is required to confirm a thin-tailed distribution. You also need to calculate the ratio between the 98th percentile and the mode. If the result is less than 16, it is a thin-tailed distribution.
Let’s analyze the cycle time histogram above. The different averages (the mode, the median and the mean) are very close to each other at 1 day, 2 days and 3 days, and the tail runs to about 11 days. So the ratio between the 98th percentile and the 50th percentile is 5.5, and the 98th percentile divided by the mode is 11. This is a thin-tailed distribution, which means that there is a low level of variability in the delivery workflow of this team. Thin-tailed distributions indicate good predictability and so shorter (or no) delays.
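The two ratio checks can be sketched in a few lines. The function name and the "not confirmed" label are assumptions made for this illustration; the thresholds of 5.6 and 16 are the ones given above. The 50th percentile of 2 days is also an assumption: it is the value that the quoted ratio of 5.5 implies for a tail of 11 days.

```python
def classify_tail(p98: float, p50: float, mode: float) -> str:
    """Classify a cycle time distribution from its 98th percentile, median and mode."""
    if p50 <= 0 or mode <= 0:
        raise ValueError("p50 and mode must be positive")
    # First check: 98th percentile divided by the 50th percentile.
    if p98 / p50 >= 5.6:
        return "fat-tailed"
    # Second check, needed to confirm a thin tail: 98th percentile divided by the mode.
    if p98 / mode < 16:
        return "thin-tailed"
    # The text does not name this case, so it is reported as unconfirmed here.
    return "not confirmed thin-tailed"


# Values read from the histogram discussed above (tail 11 days, mode 1 day).
print(classify_tail(p98=11, p50=2, mode=1))
```

Passing the percentiles in directly keeps the sketch independent of how your tool interpolates percentiles from the raw cycle times.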
The priority of your items will be represented by classes of service (CoS). You should filter your data by their CoS. It is highly likely that the 85th percentile for Standard tasks has a different cycle time than the 85th percentile for Expedites, for example. That way, you can provide different SLAs for the different work items that you’re committing to.
By looking at the histogram, we now can say that we can deliver any item with a Standard priority in LESS than 6 days with an 85% certainty and LESS than 11 days with a 98% certainty.
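As a rough sketch of how such commitments can be read off historical data, the snippet below groups completed items by class of service and takes the 85th and 98th percentiles of their cycle times. The data is made up, the helper names are hypothetical, and the nearest-rank percentile method is an assumption; your analytics tool may interpolate differently.

```python
import math
from collections import defaultdict


def percentile_nearest_rank(values, pct):
    """Smallest observed value such that at least pct% of the values are <= it."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# (class of service, cycle time in days) for completed items: made-up sample data.
completed = [
    ("Standard", 1), ("Standard", 1), ("Standard", 2), ("Standard", 2),
    ("Standard", 3), ("Standard", 3), ("Standard", 4), ("Standard", 5),
    ("Standard", 6), ("Standard", 11),
    ("Expedite", 1), ("Expedite", 1), ("Expedite", 1), ("Expedite", 2),
    ("Expedite", 2),
]

# Filter the history by class of service before computing any percentile.
by_cos = defaultdict(list)
for cos, days in completed:
    by_cos[cos].append(days)

sla = {
    cos: {pct: percentile_nearest_rank(times, pct) for pct in (85, 98)}
    for cos, times in by_cos.items()
}

for cos, levels in sla.items():
    for pct, days in levels.items():
        print(f"{cos}: {pct}% of items finished within {days} days")
```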
If you look into the Cycle Time Breakdown of that team for Standard items, you will see that the effort time tracked in the active states in the workflow represents about 60% of their delivery times and even though their stories have different sizes, they manage a stable system and make their commitments with high confidence.
The Challenge to Making Accurate Delivery Predictions
Now let’s explore the Cycle Time Histogram below exposing the frequency distribution for items with Standard priority.
The first line here points to 1 day. That’s the mode in this cycle time distribution; it represents what’s happening in the most common scenario. This means that if you had that distribution and someone asked you when something would be done, the most popular response would be in less than a day. The 50th percentile points to 9 days. So in half of the cases, you actually delivered in less than 9 days.
However, the mean (the average) of the data is 22 days. The tail of that frequency distribution runs to 98 days. In other words, the longest time that was needed to finish a ticket (excluding the outliers) is about 100 times bigger than the typical time of 1 day, and about 11 times bigger than the 50th percentile of 9 days.
This is a fat-tailed distribution. Fat-tailed distributions mean poor predictability and potentially high impact from long delays. Fat-tailed distributions are fragile. If you’ve been asked “When will this be done” and you want to be truly confident in your reply, your answer should be in less than 98 days.
If you have a fat-tailed distribution and you’re maintaining an unstable system, any approach to making predictions will be unreliable.
Looking into the cycle time breakdown for this team, we can see that their work spent around 60% of its delivery time in the active states, but 45% of the total delivery time was blocked time within those states (the red sections on the chart). This means that their actual effort time represents only 15% of their total delivery time.
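The arithmetic behind the 15% figure can be sketched as follows, reading both the 60% and the 45% as shares of the total delivery time (an interpretation, since that is the reading under which the numbers add up). The function name is hypothetical.

```python
def effort_share(active_pct: int, blocked_pct: int) -> int:
    """Percent of delivery time spent on real work: active time minus blocked time."""
    if not 0 <= blocked_pct <= active_pct <= 100:
        raise ValueError("expected 0 <= blocked <= active <= 100")
    return active_pct - blocked_pct


active_pct = 60   # time in active workflow states, as a percent of delivery time
blocked_pct = 45  # blocked time inside those active states, percent of delivery time

effort_pct = effort_share(active_pct, blocked_pct)
waiting_pct = 100 - active_pct  # time spent in queue states

print(f"effort: {effort_pct}%, blocked: {blocked_pct}%, waiting: {waiting_pct}%")
```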
The accuracy of your probability forecast doesn’t depend on the size of your work items. It depends on the stability of your delivery workflow.
In a stable system, the most important factor will be the priority of the items. If your system is optimized for predictability, and multiple stories with different sizes are started, they will strictly follow their priority order. Smaller low-priority items won’t be able to borrow time from bigger more complex tasks with a higher priority. The smaller items will have to wait in the workflow until the more urgent items are completed first.
If your team is not able to start new work as the WIP limit has been reached, they will have to collaborate with each other and “swarm” outstanding tasks to complete them faster. The focus is then moved to the impediments in the system and bringing about their prompt resolution, in order to enable a smooth flow of work.
In our Sustainable Predictability digital course, we delve deeper into the approaches to optimize your system for predictability and we explore the methods and the tools to perform accurate delivery predictions in great detail.
Evaluating the size of your stories is a great way to spark a conversation about the goal that a certain item should achieve. Nevertheless, this approach is irrelevant when it comes to performing future predictions. Making reliable commitments and keeping these commitments is tightly coupled with how efficiently you manage the flow of work and ultimately, how predictable your system is.
- Sonya Siderova
Thank you, Celso! It means a lot!
Alexei Zheglov did that maths a few years ago modeling a vast range of Weibull curves for different shape parameters.
But there is more to it than that. We’ll publish an article to expose all the details soon. Stay tuned!
Great article Sonya – I really like the way you get across the waiting time elements of cycle time that usually dominate.
I’d like to clarify if you’d say that in high flow-efficiency, low variability systems then – and only then – work item size may become a factor for probabilistic forecasting?
The reality with the orgs I work with is that the customer value chunks are simply too big to flow across in their entirety – so have to be split. And although keeping a parent ‘value board’ is useful, the teams on the ground are still looking for predictability even if it’s items of work rather than value?
- Sonya Siderova
No, even in high flow-efficiency, low-variability systems, work item size is still not a factor. The prerequisite to make accurate delivery forecasts is to maintain a thin-tailed distribution. Stable systems produce thin-tailed distributions.
In a stable system, you may have items of different sizes, and how fast they are released will depend only on their priority. It’s the urgency of the items that matters the most. If your items have the same priority, they have to be processed in a FIFO manner.
Suppose your highest priority work item needs 20 days to be finished, and there are multiple smaller “1-day” items started afterward. In that case, the smaller items with lower priority will accumulate waiting time until the bigger item is completed. They are not allowed to borrow time from the more complex one. They have to wait until you deliver the big item first.
As a result, their delivery times will be 21 days, 22 days, etc. And that’s perfectly fine. You’re still maintaining a thin-tailed distribution, and you still have a predictable system.
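A tiny sketch of that example, under simplifying assumptions that are not part of the reply above: all items enter the workflow on the same day, the team finishes one item at a time, and items are processed strictly in priority order. The function name is hypothetical.

```python
from itertools import accumulate


def delivery_times(effort_days):
    """Delivery time of each item when items are finished one by one, in the given order.

    Each item waits for everything ahead of it, so its delivery time is the
    running total of effort up to and including itself.
    """
    return list(accumulate(effort_days))


# Highest priority first: one 20-day item followed by three "1-day" items.
queue = [20, 1, 1, 1]
print(delivery_times(queue))
```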
If you try to expedite the smaller items just because they are “faster”, you’re essentially delaying everything else. You’re not only trying to handle multiple things at once, which introduces multitasking and is ultimately inefficient, but you’re also borrowing time from more complex, higher priority items. It is so much better to keep the focus of your team on the big item to be able to move it further sooner. Spoiler alert: the best solution, in that case, is to temporarily reduce your WIP limits.
Even if you have to split your work items into pieces of work, you should always strive to come up with potentially releasable increments, and naturally, they will have different sizes.
Once again, that doesn’t have any influence on the accuracy of your probabilistic forecast.
The goal is to shape your work naturally to be able to collect feedback as soon as possible and adjust accordingly.
- Kenny Grant
Thanks for the reply Sonya – I understand what you are saying and this is new info for me, fascinating thank you.
Nice article! Very insightful.
Can I suggest a correction? In this phrase “The dotted horizontal lines stretching across the graph are called percentile lines.” it should be “vertical” instead of “horizontal.”
- Sonya Siderova
Hi Doug, thank you for pointing that out, we fixed it! Your feedback is highly appreciated!
Meet the Author
Sonya Siderova is a passionate product manager and a driving force behind Nave, a Kanban analytics suite that helps teams improve their delivery speed through data-driven decision making. When she's not catering to her two little ones, you might find Sonya absorbed in a good heavyweight boxing match or behind a screen crafting a new blog post.
Hi Sonya. You’re doing amazing work and have put it very well on paper. Thanks for that.
How did you guys in Nave come to this tail factor?