I've managed a group that ran software projects using Scrum but also provided Scrum support to the wider R&D organization by developing Scrum templates and procedures, developing and delivering Scrum training and providing coaching and mentoring for groups taking their first steps down the Scrum path. So, to be honest, I pretty much figured I had Scrum licked. Then I read "Scaling Lean & Agile Development" by Craig Larman and Bas Vodde. I'd yet to scratch the surface of lean, so I expected the excellent treatment lean gets in this book to be new to me, but it was pretty embarrassing how much I learned about Scrum and agile development along the way. If anything it left me feeling a bit of an agile fraud. In the introduction to "Modern C++ Design" by Andrei Alexandrescu, Herb Sutter talks about how reading Alexandrescu's work made him realize that his understanding of C++ templates was still at the "container of T" level while Alexandrescu's work opened his mind to the vast possibilities offered by C++'s generics. This book leaves me feeling similarly about agile methods. The book presents a great treatment of agile and lean development methods, places them in the context of queuing theory and lean thinking and provides a road map for configuring the organization in what will be a novel manner for most of us but a manner which has led Toyota and others to remarkable improvements in efficiency, employee satisfaction and responsiveness to market needs. If you're an agile practitioner and proponent, go get this book - you'll be glad you did. Note that a companion volume, "Practices for Scaling Lean & Agile Development," is due out soon as well.
I'd like to thank Mark Chatterley, Rod Dunne, Marc Lepage, Jon Tarry and Vic Veinotte for reviewing this article - I really appreciate your support.
For me personally, I expect having read this book to have as great an impact on me as did reading Steve McConnell's "Rapid Development" ten or so years ago. That book introduced me to vistas of knowledge I'd never been exposed to before, but also validated many decisions my teams and I had struggled to make in the past. Seeing the industry support for those same approaches made them much easier to turn to in the future. Similarly in the case of the current book, I find many decisions I and my teams have made in the more recent past validated and thus will be much easier to arrive at going forward.
The authors present a good overview of the relationship between the Toyota Way and lean approaches to software development. Reading this book prompted me to get the two books on lean by Mary and Tom Poppendieck ("Lean Software Development" and "Implementing Lean Software Development") and to be honest, although I certainly like those two books, I actually prefer the treatment of lean in "Scaling Lean & Agile Development." The emphases are more aligned with what I've intuitively arrived at as the most important points, among them:
- Queuing theory and optimizing for product throughput rather than resource utilization
- Avoidance of local optimization
- Long-lived teams
- Scaling lean methods to a broader product development effort spanning more than just single teams
- Team-anchored process improvement
- Finding a good balance between job specialization and generalized product focus (both at the organizational and personal levels)
- Strategies for transitioning the organization towards lean and agile development
This is the first book I've read that exposes the queuing theory support for agile methods in detail rather than just alluding to it. "Scaling Lean & Agile Development" goes into this in depth, using queuing theory to support advice to avoid team member specialization, component team organization, and the local optimization of full resource allocation, among other points. A relay race analogy is used to great effect throughout the book to provide a basis for introducing the queuing theory topics and also as a model with which to analyze any development organization. The authors suggest that in a relay race, we intuitively follow the baton as it moves from team member to team member, ultimately getting across the finish line faster than any one team member could alone. If the race were our product development efforts, the baton would be the product.
In every company I've ever worked for, schedule has been paramount. Cost will factor in when it's time to weed out the product portfolio but for any given project, the one constraint to which everyone manages is the date. That tells me that the relay race analogy has been spot on for all of my professional career - I suspect it will be for most others as well. But are our organizations really set up to optimize product throughput? My experience has been that managers work to full resource allocation and ask for additional resources when they can no longer make the case that the people they have can do the work anticipated for them. In that scenario, though, one of full resource allocation, queuing theory tells us that throughput will be reduced - drastically reduced. You can see this in servers as they approach 100% utilization, but I'd expect you can see it happening all the time in your own organizations as well. Perhaps the classic case cited by the authors is scarce testing resources - the project stalls waiting for the availability of 100% booked testing resources. This delay reduces the information coming into the team about product quality, ultimately leading to delays in product delivery likely in excess of the time spent waiting for the testing resources to become available because, during the delay, development continues unabated. These analyses recall the conclusions of DeMarco's "Slack", where he argues that an organization's ability to reconfigure itself to innovate and address changing needs is largely a function of it not being 100% busy.
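The server analogy can be made concrete with the standard M/M/1 queuing formula - my illustration, not the book's: with a single server, the expected time a job waits in queue grows as ρ/(1−ρ) times the service time, so delay explodes as utilization ρ approaches 100%.

```python
# Illustrative sketch (assumes a steady-state M/M/1 queue; not taken from the book):
# expected queueing delay vs. utilization for a single fully-booked resource.

def avg_wait(service_rate: float, utilization: float) -> float:
    """Average time a job waits in queue: rho / (mu * (1 - rho))."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return utilization / (service_rate * (1.0 - utilization))

if __name__ == "__main__":
    for rho in (0.50, 0.80, 0.90, 0.95, 0.99):
        # With service_rate = 1, the wait is expressed in multiples of the service time.
        print(f"utilization {rho:.0%}: average wait = {avg_wait(1.0, rho):.1f}x service time")
```

Going from 80% to 95% booked doesn't add 15% to the wait - it takes the average wait from 4x to 19x the service time, which is the drastic reduction in throughput the authors warn about.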
The 100% booked testing resource scenario is also a classic case of local optimization - optimizing the organization's performance to improve against criteria that are only relevant for a part of the product development cycle. To the test manager, a fully allocated, 100% busy team looks like a good thing - even if delays spent waiting for these resources lead to delays in product schedules. If we're focusing on the baton though, we don't see this - making efficient and 100% use of our scarce testing resources - as a good thing. We see it for what it is, a reason why we're taking longer to get to market than we could. Another example of this effect is a test manager wanting to wait until later in the schedule to begin testing to have a better chance of doing less testing because the code will have matured more at that point. That improves his group's efficiency in terms of product features verified over time but how does it lead to a positive outcome for the product as a whole? Similarly a development group that delays development efforts until requirements have been "signed off" is attempting to reduce its expected rework (although the authors argue convincingly that even this goal won't be achieved) but how does this translate to an improvement at the product level? I believe these forms of local optimization have been rampant at every organization I've been a part of. These insights provide powerful new tools to help organizations improve - one of the key ones being to focus on the baton rather than the runners by thinking first and foremost of delivering better products to customers faster when considering any changes to the organization's composition, structure or processes.
The organization structure proposed in the book calls for a commitment to the creation and support of long-lived teams. I'd known of studies demonstrating variances in team productivity similar (though lower in magnitude) to those measured in the productivity of individuals - for example, Barry Boehm's recent work indicates a 15th percentile team (in terms of team productivity) will need about four times as much effort to deliver the same functionality as a team in the 90th percentile. But I also fell prey to the notion that people could be brought together to form effective teams when project needs dictate. This has led to what I've seen in nearly every organization I've been a part of - ad hoc short-lived teams pulled together by project necessity and disbanded as soon as project needs no longer warrant the team's continued existence. The most obvious complaint with a system like this is that the organization pays the "forming, storming, norming" costs far more frequently than it would if teams were kept together longer. The book cites evidence suggesting that this view is naive - not in its assessment of the costs of team formation, but in its estimate of the potential value afforded by long-lived teams. The authors cite a study which indicates that team performance continues to improve for four years from team inception. So we not only pay the commonly-understood costs of team formation far too frequently by constantly shifting and redefining teams but we also rob ourselves of significant productivity gains by not simply leaving effective teams intact.
What the authors propose is an organization with vertically-oriented product teams as the fundamental building blocks. So teams, rather than individuals, are the unit of responsibility and assignment. Projects are mapped onto the long-lived teams rather than constructing teams to suit the product needs and, to the extent possible, the product teams are able to work in any area of the product. One thing hiding in this philosophy is the absence of the matrix management that invariably happens when the cross-functional teams needed for most forms of agile development are assembled from an organization built around component teams and/or functional decomposition. For what it's worth, this has been the case in nearly every agile deployment I've been a part of. In fact the total management structure typically includes the team management inherent in the agile processes, functional and component team managers, senior management and the traditional project managers who remain to provide an interface from the agile project to the rest of the organization. Despite this, my customers have been generally pleased with their experiences with agile methods but the authors provide an appealing glimpse into what might be the case if this management overhead were to be reduced - for example citing a Capers Jones study that shows an inverse relationship between the total management count and team productivity.
Along the same lines, I've often wondered about the ScrumMaster role - given its fundamental responsibility for ensuring the success of the Scrum process, is it a role required for the long haul or only needed until the team has reached the point where it can assume these responsibilities without dedicated support? What I've personally observed is that in teams new to Scrum, changing the ScrumMaster frequently is the kiss of death, but for teams that have a successful track record with Scrum changing who is filling the ScrumMaster role doesn't have much of an effect at all. The authors suggest that it really is a likely and acceptable outcome for the ScrumMaster role to be absorbed into the team over time.
Likely the most incendiary content in the book concerns the project manager role and the concept of the Project Management Office. The authors state pretty flatly that they're not needed at all and are to be avoided. They argue that once the organization is formed of these long-lived, self-managing teams there really is nothing project managers can provide that the teams can't do themselves. In my experience, when using traditionally trained project managers in the ScrumMaster role it's frequently the case that things like work breakdown structure creation in the Sprint planning sessions and issue identification in the stand-ups are not done as quickly or as well because the person filling the ScrumMaster role doesn't really understand the technical work of the Sprint. I've also seen a fundamental disconnect between the team's view of project progress, impediments and risks and what the project manager or Project Management Office is communicating about the project.
In all of this deprecation of management, is there no role for managers in the organization envisioned by the authors? In fact there is, but managers aren't delegators of responsibility or assigners of work - rather they are mentors and coaches that guide the development of their people. To support this, there is the Toyota Way notion that "my manager can always do my job better than me" - that is, that managers are promoted due to their technical excellence. I can't begin to say how much that resonated with me - I've always been attracted to managers who had the most to teach me - I expect I'm not alone in this.
One question that's been nagging at me for some time is how agile, cross-functional teams ensure sufficient learning and training in the specialized disciplines in development. How are best practices relating to, e.g., programming, design or test planning captured and communicated to everyone who could benefit from them? How does the organization take steps to ensure that, across the teams, the same mistakes aren't being made over and over? How can one team benefit from the mistakes and learning of another team? The CMMI has one approach to this - in my opinion one very much at odds with the agile learning paths called for in this book and in the two lean books by Mary and Tom Poppendieck. But that's a much broader topic than makes sense to address in this already long review - I'll save it for a future article.
Most organizations I know of have handled this need through embedding these disciplines in their organization structure itself - that is, they have decomposed themselves into functional areas that provide a convenient place to anchor best practices and training relevant to that discipline. The authors call for the use of Communities of Practice to provide the channel through which discipline-specific knowledge can flow - in this manner, the organization structure can be focused around the long-lived vertically-oriented product teams while continuing to provide forums or clearinghouses for discipline-specific best practices.
Regarding the notion of the specialist, in general this book, like most agile thought, discourages these roles. Every specialist who has knowledge or skills most of the other team members do not have is in effect a queue that will very likely either introduce delay into product throughput or generate waste as the specialist waits for work suiting their specialty. That said, some skills or knowledge are impractical to expect everyone in the team to learn - examples that come readily to mind include knowledge of complex protocols like CDMA or GPRS, file format knowledge for particularly complex formats and skills like real-time development. The authors have a healthy perspective on this problem - they acknowledge that some degree of specialization is inevitable but call for the organization to actively work to reduce it. They call for the specialists the organization does have to become more mentors and reviewers than implementers - thus simultaneously increasing the leverage of the specialist's limited bandwidth and increasing the dissemination of their knowledge throughout the organization. Along the way, they also present one very non-obvious (to me, in any event) result. Typically tasks in a Sprint are done by the person most qualified to do them. That's been my experience anyway. Teams will use some forms of work sharing to increase the skill set of team members but as a general rule of thumb, the best qualified person will end up doing most tasks calling for those skills. The authors cite an experiment where a team did the exact opposite - assigned work to the least qualified person, supporting them with pair programming to help them get these tasks done. Surprisingly, team velocity wasn't much reduced in the early going and after only two weeks reached a level higher than it was initially.
I would have expected such a strategy to be one that would pay off in the long run - that it could pay dividends in as little as two weeks seems almost too good to be true. This certainly suggests that investing in your people will pay off in spades - and a lot sooner than you might expect.
Another common practice consigned to the dust bin of history by the authors is the component team organization. That Conway's Law (loosely that the architecture of a system will come to resemble the organization structure of the group that created it) exists is an indication of how pervasive this pattern is. Certainly every organization I've ever been a part of has been like this - teams for each major subsystem or component rather than teams arranged along full vertical slices of a product. For example in a typical LAMP-style project, the difference would be between having a database team, a UI team and a business logic layer team versus having teams that would take features through to completion across all of these layers. They argue very convincingly against the former. Just two points supporting this would be the increased hand-off and risk of delayed product throughput imposed by component teams and the nearly pathological staffing patterns encouraged by them. I'll go into each in turn below.
The hand-off and risk of delays hearken back to queuing theory but from a practical point of view, it comes down to one team having completed some dependency blocking another team's progress - once that's been accomplished, is the second team ready to start immediately on its part? Also what was the second team doing while it was waiting? Was it working on the most important work from a customer's perspective? In practice, I've seen delays caused by these hand-offs on nearly every project I've been involved with. Certainly one key contribution traditional project managers provide in these scenarios is cross-team coordination to help ensure that that second team is ready to go immediately when its dependency is delivered. Contrast this with the team oriented around full vertical slices of a product - there may still be hand offs occurring between team members with differing skill sets (for example, the database specialist may complete their support for a given feature, allowing the middle-ware developer to get started) but there is no other competing priority that the unblocked resource would be respecting - they and the other person are on the same team and share the same set of priorities. In this manner, throughput is maximized and the need for coordination overhead is reduced.
The staffing behaviors of the component organization are also typically suboptimal - to be honest, I'm shocked I never noticed this myself. At one point in time, a bigger slice of required functionality rests on the shoulders of one component team - for example, imagine a product moving towards multiple database support. The database component manager would request additional resources or a longer schedule to support this work. Note that if we slip the schedule, we are sacrificing throughput that vertically oriented teams with experience across the entire code base could deliver because the database work could be assigned to any team, not just the database component team. But what normally happens in my experience is that headcount increases are allocated to that component team - schedule, after all, is normally the one constraint everyone can agree on. Now imagine what happens later in the product's lifetime when the bottleneck becomes another component team - for example, let's say that once the database work has been put behind us, UI improvements dominate the next set of features the market demands. Now that component manager will either request a longer schedule or increased staffing. Again, my experience has been that they'll end up with the additional staffing much of the time. The key insight is this - what is the other component team doing now that they are no longer the bottleneck? The answer is necessarily some other work that is not the most important thing the market is demanding, because that work is being supported by the additional headcount in the other component team. If you go through a few iterations of this in an environment with many component teams you can picture a scenario where most of the people working on a product are working on things that are not particularly important from a customer perspective.
Certainly development organizations have used "breathers" provided by lulls like these to do things like refactoring and reducing bug counts and so on but I'd argue against this being a good approach towards supporting this kind of work for two reasons. The first is that, while the work development teams do in these periods of reduced demand on their services is likely to be beneficial, I believe it would be better to have the effort that is applied to this work be planned rather than allow it to be an accidental side effect of changing product requirements. For one thing, there's no reason to expect these lulls to be fairly apportioned - some component teams may get more of them and some component teams may never get them. For the latter, is it reasonable to allow them to have essentially no budget for these kinds of improvements? More importantly, though, current agile thought nearly across the board calls for the development process to be complete - that is, that there is no work left behind at the end of an iteration. So time for this kind of work should be a part of the iterative process and not something back-burnered until the component team has some "down time".
If you sum the potential quality issues suffered by the teams that never see any reduced demand and the "idle" time for teams that see too much, it amounts to potentially significant waste that cross-functional agile teams that can handle work anywhere in the product would avoid.
Another place I found my own practice to be lacking given the suggestions the book provides is in the area of team-oriented process improvement. In general I don't know that I brought a lot more leadership to the table in Sprint retrospectives beyond asking these three questions:
- What went well?
- What didn't go so well?
- What will we do differently next Sprint?
That's a decent start but the authors of this book suggest a much richer set of practices to apply to the problem of process improvement. They provide an overview of causal loop diagramming to help the team see the overall system's dynamics and find useful candidate improvement opportunities. They suggest the use of root cause analysis in the form of Ishikawa (or fishbone) diagramming and the "five whys" called for in the Toyota Way to help a team see the causes of problems they're encountering. They note that the Theory of Constraints provides another great suggestion in asking the team to identify what the current bottleneck limiting its performance is and using that as a target for process improvement. Great stuff, all of it, and it will definitely shape my practice in retrospectives going forward.
Two central concepts in lean thinking are the notions of value and waste. The authors define value in what I would imagine would be the strictest sense - value is created when you've made something a customer would reach into their pocket and pay you for and waste is everything else you do. This is quite a bit different from Mary and Tom Poppendieck's treatment of the same topic where things like technical feasibility studies are treated as value-adding. Certainly they would not be by the Larman and Vodde definition - and I prefer that to be honest. In a perfect process, everything the team does would be worth paying for from a customer's perspective - that's certainly out of reach for the foreseeable future but it does make clear what the one end of the spectrum really is. But if we take a closer look at how Larman and Vodde define waste it's clear that it isn't all created equal. Consider what the XP camp calls a "spike" - an investigation into an unfamiliar technology or a risky feature to help obtain a better understanding of it to allow a (most likely) subsequent iteration to deliver that feature or use that technology. The authors would put that in the waste category as no customer would really be willing to pay for it - they want the feature or technology being delivered in the later iteration, not the investigation of it in this iteration. Now consider the waste of a team waiting for weeks for formal approval of their design by some central design body. Also clearly waste but also pretty clear that one is more egregious than the other. The authors define "temporary necessary waste" to help frame the difference between these types of waste.
I personally like this distinction - we can characterize modeling or architecture authoring or any other intermediate work product as something the team feels it needs to do but be reminded that whatever the intermediate work product is, it had better be worth creating as it would fall into the waste bucket - albeit temporary necessary waste.
Building on this sharp delineation of value and waste, the authors present value stream mapping and mention in passing that in their practice they'd never seen a development process with a ratio of value-adding time to total time higher than 7%. I suspect that the 93% waste (optimally) alluded to here would align fairly well with what Fred Brooks called accident in "No Silver Bullet" (and the 7%, essence). Viewed from this perspective, the lean emphasis on eliminating waste through the relentless pursuit of process improvement can be seen as a path toward removing accident and focusing on essence.
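A minimal value stream sketch shows how such a ratio is computed - the steps and durations below are invented for illustration, not taken from the book:

```python
# Hypothetical value stream for one feature: (step, elapsed days, value-adding?).
# The durations are made up; only the computation mirrors value stream mapping.
steps = [
    ("write user story",          2, True),
    ("wait in backlog",          45, False),
    ("design and code",           5, True),
    ("wait for test resources",  30, False),
    ("test and fix",              3, True),
    ("wait for release window",  40, False),
]

value_time = sum(days for _, days, adds_value in steps if adds_value)
total_time = sum(days for _, days, _ in steps)
print(f"value-adding: {value_time} of {total_time} days "
      f"({value_time / total_time:.0%} process cycle efficiency)")
```

Even with these generous assumptions the feature spends most of its life waiting - exactly the kind of waste the mapping exercise is meant to expose.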
As a once (and hopefully future) Product Owner, there were a series of valuable suggestions Larman and Vodde offer that will certainly change my practice. Probably the most salient is an observation fundamental to the value proposition of agile methods - an estimated 45% of features developed in software products are never used. That's staggering to me - think of all the effort spent in developing, supporting, testing and so on that's potential waste. If iterative development methods can help put a dent in that number in practice by focusing development on only those features the users will actually use then that's an awful lot of value potentially realized (by using that saved effort in delivering other features the customer will use or perhaps by delivering that much faster with a more usable system due to its reduced complexity). In "Software Estimation: Demystifying the Black Art", Steve McConnell presents an analysis of the Cocomo II model factors that indicates that product complexity and requirements analyst capability are the two factors most correlated with project effort and schedule (the former positively and the latter negatively). Closing the loop, this suggests that the effective Product Owner can exert a singularly powerful influence over project schedule, effort and ultimately success if they leverage iterative development's capability to learn from the customers what they really need and steer the team to providing that and only that. I certainly knew that the Product Owner was a critical role to fill on any Scrum project but this suggests just how much of an effect a great Product Owner can have on a product.
Some of the queuing theory insights provide some surprising tools for the Product Owner. The authors call for a distinction between "clear-fine" and "vague-coarse" Product Backlog items. The former are the inputs into Sprint planning whereas the latter exist essentially as inputs into the requirements development process, where they will be split and refined to become clear-fine Product Backlog. Overall, this limits the total items in the Product Backlog. The focus is on the clear-fine backlog, which is both less numerous and much more immediate as it would be expected to be consumed in Sprints in the near future. This encourages the team to focus on the few, short-term priorities, giving them a much smaller queue to manage (and think about) along the way.
Similarly, the lean notion of waste from variable inputs is leveraged to suggest that the Product Owner not just observe the guidelines in the acronym "INVEST" when developing the Product Backlog (Independent, Negotiable, Valuable to customers or users, Estimatable, Small, Testable - taken from Mike Cohn's "User Stories Applied") but should further aim to have the stories all be of similar size. I suspect the cadence of the team deliveries would be aided significantly by always working with Product Backlog items of the same size. They provide a rule of thumb of each story being of a size that it would consume a quarter of the available team effort in a Sprint. One anecdotal piece of evidence I can provide supporting this at the extreme end of the spectrum is the carnage I've seen wreaked on Sprints when really large Product Backlog items were attempted. So the Product Owner can help improve team performance by ensuring clear-fine Product Backlog items are of the same or very similar sizes.
In a lot of ways, this is a dangerous book in that it shows you the promised land which may make living with how your current gig operates that much harder - I'm reminded of the similar effect "Rapid Development" had on me when I went through the treatment of classic mistakes in it and came to the sobering realization that we were making an awful lot of them. That said, reading "Rapid Development" was a real watershed moment in my career - I expect to look back in a few years and be able to say the same thing about "Scaling Lean & Agile Development." Despite how long this review has turned out to be, I can tell you I've barely scratched the surface of the value this book provides. So go buy it and see what I missed.