It’s hard to think critically when you don’t know what you’re missing. As we think about improving our thinking, we need to account for two things that are so subtle that we don’t fully recognize them:
Because of assumptions and filters, we often talk past each other. The world is a confusing place and becomes even more confusing when our perception of what’s “out there” is unique. How can we overcome these effects? We need to consider two sets of questions:
The more we study assumptions and filters, the more attuned we become to their prevalence. When we make a decision, we’ll remember to inquire about ourselves before we inquire about the world around us. That will lead us to better decisions.
Police make about 10 million arrests every year in the United States. In many cases, a judge must then make a jail or bail decision. Should the person be jailed until the trial or can he or she be released on bail? The judge considers several factors and predicts how the person will behave. There are several relevant outcomes if the person is released:
A person in Category 1 should be released. People in Categories 2 and 3 should be jailed. Two possible error types exist:
Type 1 – a person who should be released is jailed.
Type 2 – a person who should be jailed is released.
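The two error types can be made concrete with a short Python sketch. This is an illustration only, not the researchers' method: the function names and the sample cases are hypothetical, and the sketch assumes we somehow know in hindsight whether each person should have been released.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def error_type(should_release: bool, was_released: bool) -> Optional[str]:
    """Classify one jail-or-bail decision (hypothetical helper).

    Returns "Type 1" if a person who should be released was jailed,
    "Type 2" if a person who should be jailed was released,
    and None if the decision was correct.
    """
    if should_release and not was_released:
        return "Type 1"
    if not should_release and was_released:
        return "Type 2"
    return None


def count_errors(cases: Iterable[Tuple[bool, bool]]) -> Counter:
    """Tally both error types over (should_release, was_released) pairs."""
    counts: Counter = Counter()
    for should_release, was_released in cases:
        kind = error_type(should_release, was_released)
        if kind is not None:
            counts[kind] += 1
    return counts


# Made-up cases for illustration only; these are not data from the study.
sample_cases = [
    (True, True),    # correctly released
    (True, False),   # Type 1: should be released, was jailed
    (False, True),   # Type 2: should be jailed, was released
    (False, False),  # correctly jailed
    (True, False),   # Type 1 again
]
tally = count_errors(sample_cases)
```

A better decision rule is one that lowers either count without raising the other.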
Jail, bail, and criminal records are public information and researchers can massively aggregate them. Jon Kleinberg, a professor of computer science at Cornell, and his colleagues did exactly that and produced a National Bureau of Economic Research Working Paper earlier this year.
Kleinberg and his colleagues asked an intriguing question: Could a machine-learning algorithm, using the same information available to judges, reach different decisions than the human judges and reduce either Type 1 or Type 2 errors or both?
The simple answer: yes, a machine can do better.
Kleinberg and his colleagues first studied 758,027 defendants arrested in New York City between 2008 and 2013. The researchers developed an algorithm and used it to decide which defendants should be jailed and which should be bailed. There are several different questions here:
The answer to the first question is very clear: the algorithm produced decisions that varied in important ways from those that the judges actually made.
The algorithm also produced significant societal benefits. If we wanted to hold the crime rate the same, we would have needed to jail only 48.2% of the people who were actually jailed. In other words, 51.8% of those jailed could have been released without committing additional crimes. On the other hand, if we kept the number of people in jail the same – but changed the mix of who was jailed and who was bailed – the algorithm could reduce the number of crimes committed by those on bail by 75.8%.
The researchers replicated the study using nationwide data on 151,461 felons arrested between 1990 and 2009 in 40 urban counties scattered around the country. For this dataset, “… the algorithm could reduce crime by 18.8% holding the release rate constant, or holding the crime rate constant, the algorithm could jail 24.5% fewer people.”
Given the variables examined, the algorithm appears to make better decisions, with better societal outcomes. But what if the judges are acting on other variables as well? What if, for instance, the judges are considering racial information and aiming to reduce racial inequality? The algorithm would not be as attractive if it reduced crime but also exacerbated racial inequality. The researchers studied this possibility and found that the algorithm actually produces better racial equity. Most observers would consider this an additional societal benefit.
Similarly, the judges may have aimed to reduce specific types of crime – like murder or rape – while de-emphasizing less violent crime. Perhaps the algorithm reduces overall crime but increases violent crime. The researchers probed this question and, again, found no such trade-off. The algorithm did a better job of reducing all crimes, including very violent crimes.
What’s it all mean? For very structured predictions with clearly defined outcomes, an algorithm produced by machine learning can produce decisions that reduce both Type 1 and Type 2 errors as compared to decisions made by human judges.
Does this mean that machine algorithms are better than human judges? At this point, all we can say is that algorithms produce better results only when judges make predictions in very bounded circumstances. As the researchers point out, most decisions that judges make do not fit this description. For instance, judges regularly make sentencing decisions, which are far less clear-cut than bail decisions. To date, machine-learning algorithms are not sufficient to improve on these kinds of decisions.
(This article is based on NBER Working Paper 23180, “Human Decisions and Machine Predictions”, published in February 2017. The working paper is available here and here. It is copyrighted by its authors, Jon Kleinberg, Himabindu Lakkaraju, Jure Leskovec, Jens Ludwig, and Sendhil Mullainathan. The paper was also published, in somewhat modified form, as “Human Decisions and Machine Predictions” in The Quarterly Journal of Economics on 26 August 2017. The paper is behind a paywall but the abstract is available here.)
When I was climbing mountains regularly, I thought I had pretty good intuition. Even if I didn’t know quite why I was making a decision, I generally made pretty good decisions. I usually made conservative as opposed to risky decisions. Intuitively, I could reasonably judge whether a decision was too conservative, too risky, or just right.
When I was an executive, on the other hand, my intuition for business decisions was not especially good. I didn’t have a “feel” for the situation. In the mountains, I could “fly by the seat of my pants.” In the executive suite I needed reams and reams of analysis. I couldn’t even tell whether a decision was conservative or risky – it depended on how you defined the terms. As a businessman, I often longed for the certainty and confidence I felt in the mountains.
What’s the difference between the two environments? The mountains were kind; the executive suite was wicked.
The concepts of “kind” and “wicked” come from Robin Hogarth’s book, Educating Intuition. Hogarth’s central idea is that we can teach ourselves to become more intuitive and more insightful. We have some control over the process, but the environment – whether kind or wicked – also plays a critical role.
Where does intuition come from? I wasn’t born with the ability to make good decisions in the mountains. I must have learned it from my experiences and from my teachers. I never set a goal to become more intuitive. My goal was simply to enjoy myself safely in wilderness environments. Creating an intuitive sense of the wilderness was merely a byproduct.
But why would I be better at wilderness intuition than at business intuition? According to Hogarth, it has to do with the nature, quality, and speed of the feedback.
In the mountains, I often got immediate feedback on my decisions. I could tell within a few minutes whether I had made a good decision or not. At most, I had to wait for a day or two. The feedback was also unambiguous. I knew whether I had gotten it right or not.
In a certain way, however, mountain decisions were difficult to evaluate. The act of making a decision meant that I couldn’t make comparisons. Let’s say I chose Trail A as opposed to Trail B. Let’s also assume that Trail A led directly to the summit with minimal obstacles. I might conclude that I had made a good decision. But did I? Trail B might have been even better.
So, in Hogarth’s terminology, mountain decision-making was kind in that it was clear, quick, and unambiguous. It was less kind in that making one decision eliminated the possibility of making useful comparisons. Compare this, for instance, to predicting that it will rain tomorrow. Making the prediction doesn’t, in any way, reduce the quality of the feedback.
Now compare the mountain environment to the business environment. The business world is truly wicked. I might not get feedback for months or years. In the meantime, I may have made many other decisions that might influence the outcome.
The feedback is ambiguous as well. Let’s say that we achieve good results. Was that because of the decision I made or because of some extraneous, or even random, factors? And, like Trail A versus Trail B, choosing one course of action eliminates the possibility of making meaningful comparisons.
It’s no wonder that I had better intuition in the mountains than in the executive suite. With the exception of the Trail A/Trail B issue, the mountains are a kind environment. The business world, on the other hand, offers thoroughly wicked feedback.
Could I ever develop solid intuition in the wicked world of business? Maybe. I’ll write more on how to train your intuition in the near future.
In their book, Decisive, the Heath brothers write that there are four major villains of decision making.
Narrow framing – we miss alternatives and options because we frame the possibilities narrowly. We don’t see the big picture.
Confirmation bias – we collect and attend to self-serving information that reinforces what we already believe. Conversely, we tend to ignore (or never see) information that contradicts our preconceived notions.
Short-term emotion – we get wrapped up in the dynamics of the moment and make premature commitments.
Overconfidence – we think we have more control over the future than we really do.
A recent article in the McKinsey Quarterly notes that many “bad choices” in business result not just from bad luck but also from “cognitive and behavioral biases”. The authors argue that executives fall prey to their own biases and may not recognize when “debiasing” techniques need to be applied. In other words, executives (just like the rest of us) make faulty assumptions without realizing it.
Though the McKinsey researchers don’t reference the Heath brothers’ book, they focus on two of the four villains: the confirmation bias and overconfidence. They estimate that these two villains are involved in roughly 75 percent of corporate decisions.
The authors quickly summarize a few of the debiasing techniques – premortems, devil’s advocates, scenario planning, war games etc. – and suggest that these are quite appropriate for the big decisions of the corporate world. But what about everyday, bread-and-butter decisions? For these, the authors suggest a quick checklist approach is more appropriate.
The authors provide two checklists, one for each bias. The checklist for confirmation bias asks questions like (slightly modified here):
Have the decision-makers assembled a diverse team?
Have they discussed their proposal with someone who would certainly disagree with it?
Have they considered at least one plausible alternative?
The checklist for overconfidence includes questions like these:
What are the decision’s two most important side effects that might negatively affect its outcome? (This question is asked at three levels of abstraction: 1) inside the company; 2) inside the company’s industry; 3) in the macro-environment).
Answering these questions leads to a matrix that suggests the appropriate course of action. There are four possible outcomes:
Decide – “the process that led to [the] decision appears to have included safeguards against both confirmation bias and overconfidence.”
Reach out – the process has been tested for downside risk but may still be based on overly narrow assumptions. To use the Heath brothers’ terminology, the decision makers should widen their options with techniques like the vanishing option test.
Stress test – the decision process probably overcomes the confirmation bias but may depend on overconfident assumptions. Decision makers need to challenge these assumptions using techniques like premortems and devil’s advocates.
Reconsider – the decision process is open to both the confirmation bias and overconfidence. Time to re-boot the process.
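The four outcomes amount to a simple two-by-two lookup. Here is a minimal Python sketch of that matrix. The function and parameter names are my own, not McKinsey's, and the sketch assumes that each checklist has already been boiled down to a single yes-or-no judgment; how to score the individual checklist questions is not covered here.

```python
def recommended_action(confirmation_safeguarded: bool,
                       overconfidence_safeguarded: bool) -> str:
    """Map the two checklist results to one of the four outcomes.

    Both arguments are hypothetical summaries: True means the decision
    process appears to include safeguards against that bias.
    """
    if confirmation_safeguarded and overconfidence_safeguarded:
        return "Decide"
    if overconfidence_safeguarded:
        # Downside risk was tested, but the assumptions may be too narrow.
        return "Reach out"
    if confirmation_safeguarded:
        # Alternatives were considered, but the assumptions may be overconfident.
        return "Stress test"
    # Open to both biases.
    return "Reconsider"


# Walk through all four cells of the matrix.
for confirmation_ok in (True, False):
    for overconfidence_ok in (True, False):
        print(confirmation_ok, overconfidence_ok,
              recommended_action(confirmation_ok, overconfidence_ok))
```

The order of the checks matters only for readability; each pair of answers lands in exactly one cell.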
The McKinsey article covers much of the same territory covered by the Heath brothers. Still, it provides a handy checklist for recognizing biases and assumptions that often go unnoticed. It helps us bring subconscious biases to conscious attention. In Daniel Kahneman’s terminology, it moves the decision from System 1 to System 2. Now let’s ask the McKinsey researchers to do the same for the two remaining villains: narrow framing and short-term emotion.
In their book Decisive, Chip and Dan Heath write about the need to honor our core priorities when making decisions. They write that “An agonizing decision … is often a sign of a conflict among ‘core priorities’ … [T]hese are priorities that transcend the week or the quarter … [including] long-term goals and aspirations.”
To illustrate their point, the Heath brothers tell the story of Interplast*, the non-profit organization that recruits volunteer surgeons to repair cleft lips throughout the world. Interplast had some “thorny issues” that caused contentious arguments and internal turmoil.
One seemingly minor issue was whether surgeons could take their families with them as they traveled to remote locations. The argument in favor: The surgeons were volunteering their time and vacations. It seems only fair to allow them to take their families. The argument against: The families distract the surgeons from their work and make it more difficult to train local doctors.
The argument was intense and divisive. Finally, one board member said to another, “You know, the difference between you and me is you believe the customer is the volunteer surgeon and I believe the customer is the patient.”
That simple statement led Interplast to re-examine and clarify its core priorities. Ultimately, Interplast’s executives resolved that the patient is indeed the center of their universe. Once that was clarified, the decision was no longer agonizing – surgeons should not take their families along.
I thought of Interplast as I read the coverage of Brian Williams’ situation at NBC. In much the same way as Interplast, NBC had to clarify its core priorities. The basic question is: whom does NBC serve? Is it more loyal to Brian Williams or to its viewing audience?
In normal times, NBC doesn’t have to answer this question. It can support and promote its anchor while also serving its audience. In a crisis, however, NBC is forced to choose. It’s the moment of truth. Does the company support the man in whom it has invested so much? Or does it protect its credibility with the audience?
Ultimately, NBC sought to protect its credibility. I was struck by what Lester Holt said on his first evening on the air: “Now if I may on a personal note say it is an enormously difficult story to report. Brian is a member of our family, but so are you, our viewers and we will work every night to be worthy of your trust.”
Holt’s statement suggests to me that NBC’s core priority is credibility with the audience. I certainly respect that. It also struck me as being very similar to the question Interplast asked itself.
Clarifying your core priorities is never a simple task. Indeed, it may take a crisis to force the issue. But once you complete the task, everything else is simpler. As my father used to say: Decisions are easy when values are clear.
*Interplast has been renamed ReSurge International. Its website is here.