I have a different view on stack ranking than the spate of critical Microsoft articles

Read, for instance, “Tales of an ex-Microsoft Manager” or “The Poisonous Employee-Ranking System That Helps Explain Microsoft’s Demise.” (Aside: did someone at Slate get a shitty review years ago, and now they are determined to get their payback?) If you believe these, Microsoft failed in part because of its corrosive review system, everybody hated it, not a single person at Microsoft liked it, its existence was hidden, etc etc etc. Man, that place must have been utter hell to work at.

I was at Microsoft from 1988 to 1999 and have a different experience. Stack ranking (sometimes called the lifeboat drill) was in effect and the system worked fine. It wasn’t hidden; everyone knew they were being ranked. Yes, there was stress around review season, and yes, people could be unhappy, but it didn’t immobilize the organizations I worked in, and people still worked hard and were committed. Perhaps the system in place at Microsoft in the last dozen years is dramatically different from the system in place when I was there.

In my working and academic life, stack ranking has been pervasive. I was stack ranked in high school via grades and class standing. I was stack ranked in college; every course I took had a curve. I was stack ranked in graduate school. I was stack ranked aggressively at Booz-Allen & Hamilton: it was an “up or out” organization, you either got promoted every two years or you were “counseled out,” and everyone knew exactly what everyone else was getting paid. I was stack ranked at Microsoft. We’ve kept track of individual partner returns at Ignition, so I am effectively stack ranked now. The notion of stack ranking doesn’t offend me. The last time I wasn’t stack ranked was when I got my 5th grade “Certificate of Participation” for pony football.

Every team I have been on has had a distribution of people with a distribution of effectiveness. The people at the bottom of the stack rank are not bad people; they are just the least effective people on the team. It could be a bad fit for them. Or maybe they had other things going on in their life this year that hampered their effectiveness. Whatever the reason, there are real differences across a team in performance and effectiveness. And some part (not all) of compensation will be tied to effectiveness, which seems pretty fair to me. Every manager and company I have worked for has been honest about this system and has explained it well.

My annual review results have never been a big surprise to me. And for a lot of people on my teams, the annual reviews were never a surprise to them either, because annual results are not something that gets unveiled once a year. You build your own annual results (and thereby your review outcomes) one month, one week, one day, one hour at a time. Your story is getting built all year long, and you should know where you stand all along the way. I was always very observant of my own results and those of my peers and tried to adapt along the way each year. People who aggressively manage their career and their work are generally not surprised by year-end results.

So — for me anyway, stack ranking has been omnipresent; it is not irrational; it has never been an ugly surprise. But a lot of people think it is awful at Microsoft, and that eliminating it would have helped the company. I will offer a different view.

When things are going great at a company, as they were at Microsoft in the 90s, no one complains about stack ranking. When things aren’t going great, then people complain about stack ranking and about everything else. Why aren’t things going great?

  • The mission for the company is unclear. Employees are not excited, not inspired. Customers are not excited or inspired. Developer interest is waning. Analysts are not excited; the company is no longer viewed as a bellwether.
  • And growth in stock price has stalled, partly as a result of the lack of compelling mission. The rising tide is not lifting all boats. Comp starts to feel like a zero-sum game.
  • And the organization is not growing dramatically. The opportunity to move around the company and find better fits is more limited. A low performer on one team may be a star elsewhere due to a better skill match, better team meshing, or whatever. Having a ton of new teams starting up is a great release valve to have in a company.

If you waved a magic wand and “fixed” the stack ranking system at Microsoft, but left everything else constant, people still wouldn’t be happy. The underlying problems of mission and growth need to be attacked, and then the other dominoes will fall. I suspect, as Ben Slivka has articulated, these problems are easier to solve with Microsoft broken into parts with separable and clear missions. But the stack rank is not at the heart of Microsoft’s problems, and is not at the heart of issues to address going forward.

22 thoughts to “I have a different view on stack ranking than the spate of critical Microsoft articles”

  1. “My annual review results have never been a big surprise to me. And for a lot of people on my teams, the annual reviews were never a surprise to them either.”

    This is strictly your personal experience (and maybe your teams’ too). Don’t forget much has changed in the last decade; your experience is simply outdated. I quit Microsoft very recently, and I had to experience years of agony before I made the decision. What you are overlooking is the ability of the immediate manager to screw you over, intentionally or not. The rating system could have had the best intentions at inception, but now it is primarily seen as a livestock tattoo. Once it is on you, it is very hard to shake it off. I had nearly a decade of experience before Microsoft (in a similar corporate setup) and consistently had top reviews, but couldn’t translate that into success at Microsoft. Why? Because my first manager had a poor relationship with his management and could not get good visibility/ratings for his team. That mark lingered on for years. I quit when I realized I was working the system to get ahead rather than doing work.

    Simply put, the review system works for people who know how to work it.

  2. I joined Microsoft in 6/1985 and left in 8/1999. The company was privately held when I started, with ~800 employees and $120M of revenue. When I departed in 1999, there were 30K+ employees and $19B of revenue.

    Back in 1985, we had performance reviews every 6 months. I think by 1989 we had switched to annual reviews. I started leading teams in 1987 and the largest team I had was 120+ people. I was responsible for review “models” for at least 6 years — including figuring out how to apply my 10% raise pool, 15% bonus pool, and stock option pool. The total size of these pools was based upon the total salaries of my team for the raise and bonus pools and the ladder levels for the stock grants.
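    As a rough illustration of the arithmetic (the 10% and 15% pool sizes and the salary-based pool totals come from Ben’s description above, but the team, the salaries, and the rank-weighted allocation rule are invented assumptions for this sketch, not Microsoft’s actual model):

        # Sketch of a review "model": fixed raise and bonus pools spread
        # across a stack-ranked team. Salaries and the rank-weighted
        # allocation rule are invented for illustration only.
        salaries = {"Ann": 90_000, "Bob": 80_000, "Cal": 85_000, "Dee": 75_000}
        rank = ["Ann", "Cal", "Bob", "Dee"]  # stack rank, best to worst

        raise_pool = 0.10 * sum(salaries.values())  # 10% raise pool
        bonus_pool = 0.15 * sum(salaries.values())  # 15% bonus pool

        # Assumed allocation: weight falls off linearly down the stack,
        # so the top of the rank gets the biggest share of each pool.
        weights = {name: len(rank) - i for i, name in enumerate(rank)}
        total = sum(weights.values())
        for name in rank:
            share = weights[name] / total
            print(f"{name}: raise ${share * raise_pool:,.0f}, "
                  f"bonus ${share * bonus_pool:,.0f}")

    However the shares are chosen, the pools are fixed, so one person’s larger raise is necessarily someone else’s smaller one.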

    As John described, Microsoft was a meritocracy, and the review models and stack ranking were a critical part of that. I reviewed every written performance review for everyone on my team (yes, even with 120 people) to ensure consistency and fairness. All managers on my team would sit down with me and talk through the stack rank to make sure we were consistent and fair.

    But of course any performance review system relies upon the managers in the system to execute it correctly. People matter most of all.

  3. Having an employee’s rating not reflect his/her performance is a big problem, but the horse trading required to merge a team’s ranking into the larger group’s and make it fit a badly designed fixed model made such a result fairly common. But such things didn’t happen THAT often, and a single review score that was a grade low didn’t mean career death or even much of a pay decrease; the stakes weren’t that high.

    I think the difference between when you were there and pretty quickly after you left is that the stack ranking became very high-stakes.

    Consider an employee who wasn’t performing up to snuff. Before the new millennium, they would have enough time and help to work through whatever it was. Yeah, that took time, including the manager’s time and HR’s time, but it was regarded as a failure on more than the employee’s part if they had to leave. A 2.5 was a bad thing, but MANY employees recovered from it and had successful careers after.

    After the new millennium came the idea of “good attrition.” That meant that 5% of your team was to be “managed out” EACH YEAR. NOT doing so was a failure for management. Even a 3.0 could be the kiss of death; several 3.0’s certainly were.

    This raised the stakes considerably in a system that was rather silly in some respects. (Fixed-size buckets for performance grades for relatively small groups? SERIOUSLY? Managers NEVER said, “Yeah, Fred only deserved a 3.5, but we didn’t have enough 3.5’s to give out, so I had to raise him to a 4.0,” which should have happened about as often as the opposite if the system were statistically reasonable.)
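    To see why fixed buckets are statistically shaky for small groups, here is a toy simulation (a sketch with invented scores and assumed bucket quotas, not real review data): two ten-person teams are drawn from the same underlying talent distribution, each is forced onto the same fixed curve, and nearly identical scores land in different grades depending on which team they sit in.

        # Toy simulation of fixed-size grading buckets applied to small
        # teams. All scores and quotas are invented for illustration.
        import random

        random.seed(1)
        QUOTAS = [("4.0", 0.2), ("3.5", 0.5), ("3.0", 0.3)]  # assumed buckets

        def force_curve(scores):
            # Assign grades by rank so the team exactly fits the quotas.
            ranked = sorted(scores, reverse=True)
            grades, i = {}, 0
            for grade, frac in QUOTAS:
                n = round(frac * len(ranked))
                for s in ranked[i:i + n]:
                    grades[s] = grade
                i += n
            return grades

        # Two teams drawn from the SAME underlying distribution:
        team_a = [random.gauss(70, 10) for _ in range(10)]
        team_b = [random.gauss(70, 10) for _ in range(10)]
        ga, gb = force_curve(team_a), force_curve(team_b)
        for s in sorted(team_a + team_b, reverse=True):
            team, grade = ("A", ga[s]) if s in ga else ("B", gb[s])
            print(f"team {team}  score {s:5.1f}  grade {grade}")

    At the bucket edges, near-identical scores get different grades depending on teammates — exactly the hard-choices-at-the-boundaries problem described elsewhere in this thread.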

    The other problem that happened mostly after you left was that the budget for salary increases was cut from around 5% to around 2% (barely over inflation). That meant that salary increases became a winner-takes-all situation, and often even “above average” (3.5) employees got a zero raise, which is a pay cut when you count inflation.

    Combine those changes with options being irretrievably underwater, and a system that was rather silly in some respects but not terribly harmful suddenly becomes quite harmful. And without the incentive of stock options to do what’s right for the company as a whole, it became every person for him/herself. You can’t build software that way. One word: Vista. (Yeah, there were other problems, too. But rising stock options meant that a lot of those other problems could be solved earlier, because there was an incentive to.)

    The thing that would have helped Microsoft most, though, would have been to reprice options after the bubble burst. But, given the change in the place of stock options and the other changes, the stack-ranking system as implemented at Microsoft became toxic.

    So the stack ranking per se isn’t the problem; it’s the devil in the details that is. Note that in none of the cases except perhaps Booz-Allen and Hamilton were these two details used: none of them have fixed bucket sizes, and none of them require the merging of two different populations. (How *DO* you compare a UA writer to a program manager to a developer to a tester, anyway?)

    The other situation I heard of just in the last few days is that apparently in at least some groups, the stack rank would be done long before the employees’ reviews were written. That is obviously just simply wrong; when that’s done, the stack rank becomes a big popularity contest not related at all to the data at hand.

  4. Totally agree. I am currently with Microsoft and have been here since 2000. Stack ranking has always been communicated in the teams I have been on. I take the 1:1 as an opportunity to tell my manager why he/she should put me at the top of the stack, and to give them enough ammunition so that they can tell their managers why I should be put high up in the stack. All my colleagues do the same. I have always had good reviews up until this year; I have a new manager, and he has not been able to get me to the top of the stack, so now it is time to move on, either internally or externally. To be fair to him, I had a bad year; I am in sales and this year was not good.
    I have no issues with it; it has been a fun ride.

  5. Yes, John, you’re right on the money. Honestly, every time I see these articles where people think that all Microsoft’s problems are down to the review system, I have to wonder who it is with the 4 or 5 (in the current system… a 2.5 in the old one, I guess) that has an axe to grind.

    Definitely, any decent manager should be engaging in a year-long discussion about results; there should never be surprises. And conscientious employees should be pushing for it. There are always bad apples, of course; perhaps a bit more emphasis on the feedback employees give on their manager could help, although feedback is already taken very seriously.

    It’s also interesting to me that critics usually fail to address the thought experiment of what would happen if the review system were more “fair.” They usually mean that low performers should not be compared or curved against high performers. Given that the company must have a budget for total rewards, that would likely tend to reduce the high rewards that top performers get. Top performers will notice this and feel less valued. They will then be poached or leave on their own. Why is this good for anybody?

    One aspect that does contribute negatively is that the scores do sometimes get taken too literally and in the abstract. You very rightly commented that a bad fit could make a person a low performer in one situation even though they might be great in other circumstances. It’s pretty common, though, to find that this point of view is not considered when looking at transfer candidates. Many managers (generally, not just at MS) over-generalize a person’s abilities or “worth” and assume that whatever they did in one situation is the best they will ever do anywhere. Many groups at MS look unfavorably on anybody with a low review score in their history, pretty much regardless of reason. Everybody wants to hire “stars,” even though much has been written on the low effectiveness of typical interview processes in identifying long-term performance. Once someone has a couple of medium or low marks on their record, regardless of the reason, they are effectively marked, and probably won’t be able to move within the company very much, even to roles they might be the best person in the company to do.

    It’s true that not many people gripe when times are fat and there are chickens in every pot. Rather than the performance system being the cause, though, it’s much more likely that these are the inevitable creaks and groans of an old and weary ship, which are more conspicuous when the boat is fighting a hard battle against the ocean. It’s hard to imagine how “fixing” reviews is going to alter the fundamental market forces (the waves) or the choices of the senior leaders who are steering the boat.

  6. There were employees griping about stack ranking and forced curving in the go-go days, too. The performance review process is a simple but effective tool for leadership and professional development. You cannot blame it for the strategic and cultural problems at Microsoft.

  7. Grades in school are not stack ranking. Teachers aren’t required to give a certain number of each letter grade in order to fit the curve. The curve that results is mapped AFTER grades are determined. That is exactly the OPPOSITE of stack ranking.

    Try again.

  8. Frank, I don’t know what schools you went to, but every engineering class I took at Ohio State and at Carnegie Mellon worked exactly as I suggested. Grades were determined based on the distribution of scores. Everyone in the class was evaluated against their direct peers. A tough system, but fair, and it gave us true information about how we stacked up against other prospective students.

  9. I’ll tell you what kind of school I went to. I went to one that actually cared if you knew the material. I’ve been in a class with 30 students where only 5 of us made it to the final exam. In your scenario of plugging students into a bell curve, more than half the class would be guaranteed to earn a C or better, whether they understood the material or not. When I hire an engineer, I would like to have confidence that they understand what they are doing, and not just that they were a little bit better than half of their peers.

  10. Frank, are you suggesting that everyone who happens to enroll in a class should be able to get a high grade, if only the teachers cared enough? I agree that the bell curve may have lots of flaws; however, I think we’re also drifting a bit from the point of the article here. Microsoft’s ranking is not a bell curve.

    Even if you give everybody a “high” grade, the mark is fairly meaningless at that point. Most people would agree that even among engineers who “understand what they’re doing,” some are great, and some are average, and some are terrible. You can draw the lines between these classifications in many arbitrary ways.

    The high-level concept of the stack ranking system, of course, is to recognize that. The score quotas are designed to maintain a rigorous conversation about what those scores mean, given that it’s always relative.

    Having a “curve” seems closely related to having a budget for rewards.

  11. Paul, I have no idea how you read my response and got that as the takeaway. That is the exact opposite of what I am saying. Let me try to make it more clear. What I am saying is that each student should be evaluated individually against the student learning outcomes of the course, using the evaluation mechanisms that were clearly laid out by the professor in the syllabus. In other words, if the learning outcomes state that after taking this course a student should know x, then after being evaluated (using an exam, project, assignment, etc.) the student should be assigned a grade based on whether or not they know x. In this scenario it is possible for everyone to earn A’s, everyone to earn F’s, grades to be distributed along a bell curve, or any number of scenarios. But the key to what I am saying is that the grades are actually based on what the student has demonstrated they learned by the end of the semester, and not based on how much they know compared to their peers.

    The reason I think the evaluation against peers is such a bad idea is that it assumes that there are always a certain number of students at each grade level, and that may not be the case. Stack ranking forces there to be someone who is failing. I have been in courses where the lowest performer was still performing well enough to earn a C. If we were stack ranking, that person would be awarded an F, because they are the lowest performer and we have to have someone at each level.

    You said, “Most people would agree that even among engineers who “understand what they’re doing,” some are great, and some are average, and some are terrible.”

    I couldn’t disagree more. It is possible to have a high-performing team where no one on the team is terrible. That is entirely possible. By forcing people into a predetermined ranking mechanism, you will lose good people or artificially reward so-so people. Imagine the case where a team is made up of average to below-average performers. Well, someone on that team is going to be ranked at the top and look just as good as a peer from another team who is a genuine top performer. What sense does that make?

    Rate people according to their actual performance. It is as simple as that.

  12. I see; sorry I misunderstood, and thanks for clarifying. In general I don’t disagree with what you’re saying in contexts of pure merit, such as school can be; however, I believe the underlying principle in Microsoft’s system is that “the curve,” aka “the model,” is supposed to be applied to groups that are large enough that it is unlikely that everyone is doing awesome at their particular task. Of course it’s a controversial thing to assert that humans “average out” in large enough groups, and difficult to study.

    I think you’re also saying you disagree with a more fundamental part of the system – that the measure should be based on how well someone did relative to their peers. I think that really strikes to the heart of the philosophy. The Microsoft system has /never/ been about trying to capture an individual’s isolated performance or merit, but always about rating how well they did in improving themselves and delivering great results vs. everybody else in their model (model ideally being big enough that the standard is roughly balanced across the company…).

    I think it’s completely reasonable to say one disagrees with this underlying philosophy of continual self-improvement and competition against others; however, getting back to John’s original point, it is not clear that this point of view is, in and of itself, cause rather than mere correlation in Microsoft’s competitive challenges. This philosophy has always been there, at least since the early 90s, independent of market position and competitors.

    What I do find interesting is that in large enough calibration pools, when managers come in with their initial rating points, the overall distribution is usually not too far off the curve, except that invariably it is a little loaded towards the high side. That seems similar to other forms of human optimism where rules help us make disciplined choices (like, triaging bug priorities, with apologies for a possibly crude analogy). So it seems that in practice, usually people who argue against the curve are most upset about exactly the fact that it’s supposed to provide friction against grade inflation and that there are always hard choices at the edges of the buckets.

    Personally, I believe even if you “get rid of the curve,” the valuation process and perceived inequities would just appear somewhere else in the process, one way or another, because we’re talking about humans rating other humans and assigning dollars to those judgments. As long as managers wish to give more rewards to the employees they think are most valuable, they are going to have a mechanism for ranking them subjectively and allocating from their fixed budget. If lots of “students” get A’s and the lowest gets a C, the professor doesn’t have to then go and assign dollar values to one A vs. another, all the way down to the lowest C. The managers do, and regardless of the grades assigned, they have a fixed pool to work from. This used to be more hidden in the old system, and managers could give low rewards within a “good” grade and people felt better. If I understand correctly, the current system was supposed to answer calls for more transparency.

  13. Paul,

    I understand what you’re saying. My biggest problem with the article was the claim that stack ranking is present in school as well, used as a way to justify the practice. I think he was so far off base on this assertion that I felt the need to reply. And then when John jumped in with “Frank, I don’t know what schools you went to, but every engineering class I took at Ohio State and at Carnegie Mellon worked exactly as I suggested,” I felt the need to respond again.

    My experience, both as a student and as someone who works in higher education, couldn’t be further from what they were describing. If the original author had stuck to defending it the way you did, I probably never would have responded. But he weakened his case by acting as if this practice were ubiquitous.

  14. Thanks for all the great comments. Lots to read and reflect on. I stand by everything I said, but I respect that others may have a different experience and a different point of view. And I appreciate that stack ranking can be applied well, with judiciousness, or it can be applied poorly.

    One observation I will add — in the commercial world, the marketplace is a nasty stack ranking motherf$&ker. Companies get judged every day by customers, and companies that don’t stack well suffer the consequences — market share loss, revenue declines, job losses, and eventually bankruptcy or some other depressing end. One justification for a strong internal ranking system is to ensure that a company finds all its flaws itself, and corrects them, rather than suffering at the whim of the marketplace. This is not just about stack ranking as a method of criticism, and not just about individuals — other methods of criticism are valid, and stack ranking should be applied to departments and business units too. But a company needs some way to ensure that every employee is turning a critical eye to their own work and is engaging in continuous improvement. Stack ranking, judiciously applied, is a reasonable way to do that.

  15. In every high school in the UK, GCSEs at 16 and A levels at 18 are scored to put similar numbers into the buckets for the curve. What it takes to get an A varies each year depending on mass performance.
