I.
On January 8, 1992, Cameron Todd Willingham was accused of murdering his three daughters.
His daughters, two-year-old Amber and one-year-old twins Karmen and Kameron, had perished in a house fire while they slept. All three died of smoke inhalation. Willingham himself had been in the house at the time, but escaped with only light burns.
In the scorched remains of the house, the deputy fire marshal found the telltale signs of arson. There were “pour patterns” on the floor that appear when a “liquid accelerant” — a flammable liquid like lighter fluid or gasoline — burns. The patterns appeared in the front of the house, the hall, and directly under the daughters’ beds. The windows contained “crazed glass”: small, random cracks caused by rapid heating, also evidence of a liquid accelerant. Multiple V-shaped burn patterns lined the walls, each pointing to a place where the fire began. These indicators revealed a fire that had not arisen organically or by accident. No, they revealed that someone had poured a flammable liquid throughout the house, lit the pools, and let the house burn.
The only possible culprit was Willingham. There was no one in the house besides him and his three toddlers, and no signs of anyone else. Nor was there anyone else with a conceivable motive. Backed by the deputy fire marshal’s findings, the State of Texas accused Willingham of arson and murder.
The prosecution also cited non-fire-related evidence in the case, some of it ludicrous. A family psychologist diagnosed Willingham’s ownership of a Led Zeppelin poster depicting a falling angel as an “interest in cultive-type activities.” There were eyewitnesses whose memories flip-flopped: at first, Willingham seemed shattered by the loss of his daughters; later, he seemed too distraught and might be faking it.
At the unshakable core of the case, though, was the arson investigator’s expert testimony. Say what you will about bad psychoanalysis and unreliable eyewitnesses; there’s no arguing with the evidence left by the fire. The deputy fire marshal had compiled 20 indicators that the fire was “incendiary” — i.e., started intentionally. Arson. These included the pour patterns, crazed glass, and V-patterns, as well as a myriad of others: “low char burning, charred floors, the underneath burning of the base board, the brown stains on the concrete, the underneath of the bed.”
The deputy fire marshal said Willingham’s description of the night of the fire was “inconsistent with the burn patterns in the house.” More bluntly, he called Willingham’s account a “pure fabrication.”
The jury unanimously found Willingham guilty. And how could they do otherwise? The person who knew the most about arson said that Willingham’s version of events was impossible. As the deputy fire marshal concluded:
The fire tells a story. I am just the interpreter. I am looking at the fire, and I am interpreting the fire. That is what I know. That is what I do best. And the fire does not lie.
Willingham was sentenced to death. On February 17, 2004, after more than a decade of unsuccessful appeals, he was executed.
But right before his execution, a contrary narrative flashed. Through a chain of Willingham supporters, the case had landed on the desk of another fire investigator, Gerald Hurst. What he saw was not a surefire case against Willingham, but rather the opposite: a tragic application of widespread arson-related myths. As Hurst succinctly said, “I studied the case and almost immediately found that it was clearly a bogus arson case.”
Was the previous fire investigator wrong?
II.
Looking back in time, it’s obvious that experts can be wrong. At least experts from olden days, before we invented science and reason.
Consider medieval doctors. Mocking their practices, like bloodletting and leeches, is practically a meme. Sorry, it literally is a meme:
It’s so much of a meme, I wondered: did doctors in the Middle Ages actually believe making people bleed a lot would cure their ills? Or did one medieval guy do it one time and the internet blew it wildly out of proportion?
No, it was very much a real, common, and medically-sanctioned practice. From WebMD:
Egyptians were among the first to perform bloodletting more than 3000 years ago. From there, the practice spread to the Greeks and Romans, then Asia, and beyond. By the Middle Ages, bloodletting was widely practiced in Europe and barbers served as pseudo-medical providers. …. The practice of bloodletting as a cure-all didn't end with the Middle Ages. It was still the most popular medical treatment during the Enlightenment of the 17th and 18th centuries.
Those practicing bloodletting — sorry, therapeutic phlebotomy — genuinely believed in it. They passionately defended it and prescribed it to their loved ones and themselves. As described by hematologist DP Thomas, the “Dean of the Paris Medical Faculty, bled his wife 12 times for a ‘fluxion’ of the chest, his son 20 times for a continuing fever, and himself seven times for a ‘cold in the head’.”
It wasn’t until the 1800s that doctors started questioning the whole drain-sick-people-of-their-most-vital-bodily-fluid thing. Thomas describes the work of pioneering bloodletting-skeptic John Hughes Bennett (emphasis mine):
In an impressive statistical analysis of survival rates following pneumonia in European and American hospitals, [Hughes Bennett] concluded that bloodletting did not improve survival. … For example, in a series of 105 consecutive cases of simple, uncomplicated pneumonia treated by him without bloodletting over 18 years at Edinburgh Royal Infirmary, Hughes Bennett reported no deaths in his patients as a result of pneumonia. These results were in marked contrast to his detailed analysis of patients at the same hospital ‘when upwards of one-third of all patients affected with pneumonia died who entered during a period of ten years when bleeding and an antiphlogistic treatment was universally practised’.
To recap: Hughes Bennett’s early studies suggested that bloodletting might be responsible for killing a shocking one-third of patients. The medical scientific method was still in its awkward teen phase; his studies were neither ‘randomized’ nor really ‘controlled’. Still, they marked a large improvement over previous doctors’ rationale: I bloodlet a lot of people and many of them lived. If only I had bloodlet more! By relying on statistical comparisons across many patients, Hughes Bennett saw that bloodletting might not be so great.
Other contemporaneous doctors came to similar conclusions. Pierre Louis “took a strong stand in favour of facts and figures, as opposed to ‘sterile’ theorising, and concluded that for most patients there was no convincing evidence supporting bloodletting.” A military surgeon, Alexander Lesassier Hamilton, published “a controlled trial of bloodletting” in 1816 that suggested that it might be responsible for killing a quarter of patients it was used on.1
Yet many doctors and institutions continued to act like bloodletting was the bee’s knees through the 1800s.
In his Gulstonian Lecture to the Royal College of Physicians, London in 1864, Dr WO Markham (1818–1891), a physician to St Mary’s Hospital in London, made a plea for the judicious use of bloodletting in certain conditions, deploring that venesection was carried out less frequently than before.
Somewhat hilariously, doctors were ready to take legal and vigilante action to ensure other doctors were bloodletting enough people. One lecture noted “a doctor was charged with malpractice for failing to bleed a patient with pneumonia.” Another physician recounted a conversation in 1850s Germany:
When he asked what would happen if a patient in Bonn died without being bled, he was told that the assistant who had attended the case would be waylaid and unmercifully beaten.
Textbook editions as late as 1892 and even 1930(!) write positively of bloodletting.2
Eventually, the reckoning did happen, albeit glacially. Even though the medical establishment debated Hughes Bennett’s research in the 1850s, it wasn’t until the late 1800s and early 1900s that bloodletting fell out of favor. It took decades, arguably a century, for the medical establishment to accept the evidence against it.
III.
Unbeknownst to Willingham, arson science in the 1990s was going through a similar reckoning.
In 1990, fire investigators in Jacksonville were tasked with proving that a local fire was arson. In a MythBusters-worthy move, fire investigator John Lentini asked: why not torch the identical building next door?
So began the Lime Street fire investigation. The investigators bought an identical, vacant duplex next to the one that had burned, arranged it to mimic the original building, and burned it without a liquid accelerant. They wanted to show that the burn patterns — the same ‘pour patterns’ that appeared in Willingham’s case — didn’t appear when there was no lighter fluid or other liquid accelerant. If so, the original fire’s burn patterns must have resulted from the use of a liquid accelerant. Thus, arson.
The opposite happened. The husk of the building burned without liquid accelerant looked identical to that of the original fire. The resulting burn marks were a “dead match with the original scene.” The ‘pour patterns’ appeared regardless of whether liquid accelerant was used. A supposedly telltale sign of arson meant nothing.
Lentini continued to challenge the dogma of arson science. Upon examining the remains of homes burned in natural brush fires, he found even more errors in the current consensus.
For example, experts considered “crazed glass” (also cited in the case against Willingham) evidence of rapid heating and therefore a sign of arson. Examining the brush fires made clear that the opposite was true: crazed glass occurred when hot glass was rapidly cooled, most commonly when hit by firehose water. Crazed glass didn’t indicate arson; it only meant that firefighters had tried to put out the fire.
Pour patterns and crazed glass were only the beginning. One by one, the supposed hallmarks of arson were revealed to have nothing to do with it.
The original case against Willingham cited 20 “indicators” of arson. By the time Gerald Hurst, the fire investigator who gave the case a second look, examined the evidence, all 20 indicators could be deemed meaningless. (Hurst said even having this many indicators was “absurd”.) Of the 20, 19 were based on “old wives’ tales”, and the remaining one was easily explained by a can of lighter fluid on the porch. “Todd Willingham's case falls into that category where there is not one iota of evidence that the fire was arson, not one iota,” Hurst said.
In case you’re worried Hurst is some arsonist-loving crank, the State of Texas ordered a re-examination of the original case in 2009, which said:
The investigators had poor understandings of fire science and failed to acknowledge or apply the contemporaneous understanding of the limitations of fire indicators. Their methodologies did not comport with the scientific method or the process of elimination.
Most damningly, the report concluded that “a finding of arson could not be sustained.”
Worse, Willingham’s was not an isolated case of a rogue, incompetent fire investigator. No, it was part of a widespread trend of shoddy arson ‘science’. According to the American Bar Association in 2015:
For decades, fire investigators relied on a set of erroneous beliefs and assumptions, akin to folklore, about what were thought to be the telltale signs of arson that were passed down from one generation to the next and accepted at face value.
The same sentiment is echoed in the New Yorker’s piece on Willingham, “Trial by Fire”:
In most states, in order to be certified, investigators had to take a forty-hour course on fire investigation, and pass a written exam. Often, the bulk of an investigator’s training came on the job, learning from “old-timers” in the field, who passed down a body of wisdom about the telltale signs of arson, even though a study in 1977 warned that there was nothing in “the scientific literature to substantiate their validity.”
Some accused were luckier than Willingham, or at least as lucky as anyone falsely imprisoned for arson can be. The ABA identifies 31 people originally found guilty of arson who were eventually exonerated “at least in part on the basis of new evidence that they did not commit arson”.
But even this count understates the impact of bad arson science. According to Lentini (the original investigator of the Lime Street fire), “hundreds—if not thousands—of accidental fires had been wrongly determined to have been intentionally set.”
IV.
One of the most tragic things about both arson science and bloodletting was how long it took the expert communities to change their views.
In the case of bloodletting, it took decades to go from “here’s evidence against bloodletting” to “the medical community roundly rejects bloodletting”. As Thomas notes,
From today’s perspective, perhaps the most surprising aspect of the pioneering work of Louis and Hughes Bennett was how slow the medical profession was to accept their strong evidence.
In the case of arson science, the Lime Street fire investigation happened before Willingham’s house burned down. Tragically, some fire investigators knew the alleged arson ‘indicators’ that sealed his fate were myths before his execution, before his trial, and even before his house burned. But because most fire investigators were resistant to the changes, the new arson science did not reach anyone involved in his case until it was too late.
You might wonder: what the hell? Why did these experts refuse to change their minds?
Cass R. Sunstein and Reid Hastie provide a hint in this Chicago Booth Review article.
In one study, they separated people into groups of six, each group leaning either liberal or conservative. Each group was then assigned to deliberate over a few hot-button political topics. Afterwards, participants recorded their views on these issues, which Sunstein and Hastie compared to their views before deliberation.
The big takeaway:
[The liberal-leaning group] became a lot more liberal on all three issues. By contrast, [the conservative-leaning group] became a lot more conservative.
This swing happened at both the group and individual levels. The group ‘verdict’ was more extreme than the average of the group’s views before deliberating, AND each individual’s views became (on average) more extreme. As they summarize:
Group deliberation often makes not only groups but also individuals more extreme, so much so that they will state more extreme views privately and anonymously.
I find this research so helpful for thinking about echo chambers.
I, probably like many people, held that deliberation was the antidote to groupthink. Through an honest airing of ideas, we create a more accurate, unbiased picture of the truth.
Not so fast, Sunstein and Hastie say. Instead of reaching a more accurate picture of the truth, deliberation encourages us to identify the shared beliefs and attitudes within a group and then go hard on those. Deliberation isn’t the antidote for groupthink; it’s the cause.
Sunstein and Hastie find this not only for political orientation, but also for other attributes. Liberals discussing politics with other liberals become more liberal, and analogously for conservatives. But also, groups of risktakers push each other to take greater risks. A group more oriented toward caution, though, becomes more cautious after deliberation.
They propose a few mechanisms by which this happens. One, in deliberation, more arguments are made in favor of the group’s pre-existing shared preference, making the case for it appear especially strong. Two, each participant notices the group’s shared preferences, then voices similar opinions for smoother conversation and social validation.
Expert communities, almost by definition, have shared beliefs and attitudes.
To become an expert, one must go to an institution that:
Teaches the requisite knowledge
Tests that all potential experts have internalized this knowledge
Bestows credentials in the form of little letters after experts’ names that say HELLO I AM AN EXPERT
Once one has become an expert, they teach, test, and bestow credentials onto future generations of experts.
Obviously, any expert ends this process with beliefs that they share with other experts. That’s the point! If you’re a medieval doctor who’s spent years being told by your superiors that bloodletting saves lives, being tested to confirm that you understand this, prescribing bloodletting to patients, and then teaching new doctors that bloodletting is great, of course you’re gonna believe it works.
Moreover, experts may self-select into institutions based on the other people in them, or on the institutions’ incentives. As described by the ABA:
Most [fire] investigators, whose jobs were to “catch arsonists,” were former police officers or firefighters with little or no scientific background or training. They learned on the job by watching experienced investigators who learned the trade from their superiors, perpetuating a belief structure that still influences some practitioners today.
If you get a bunch of former police officers who think of their job as “catching arsonists”, it isn’t surprising that they, on average, are inclined to believe more things are signs of arson, and that more people are guilty of arson.
These forces help explain why Hughes Bennett’s or John Lentini’s colleagues were so resistant to updating their views. Sure, there was some evidence that bloodletting/arson science wasn’t perfect. But it was, like, one study! They had years of experience that suggested the opposite. Moreover, their fellow experts overwhelmingly agreed: it just wouldn’t make sense for their communal knowledge to be wrong. Sure, they went through all the same training and ingraining, and were subject to the same institutional pressures. But still, they were the foremost experts in the field. Surely they’d know if they’d been taught something wrong.
In this way, the institutional nature of expertise works against experts. Rather than open them to new ideas, it can entrap them in bad ones for years and decades past their expiration.
V.
Criticizing experts is coded somewhere between “right-wing” and “QAnon antivaxxer flat-earth territory”. This is too bad, because the lesson here is not partisan or even conspiratorial.
In fact, I think most of us are already skeptical of some experts.
For example, I myself doubt findings from the top academics in psychology and nutrition. I don’t think it’s all bullshit. But I think there’s some bullshit because institutional pressures warp those fields’ findings. Professors need to publish papers with positive results that can be packaged into neat little marketable pop-psych/health nuggets. Those pressures, combined with publication bias, p-hacking and full-on fraud, generate overstated and outright false conclusions. These proliferate for years, get big NYT write-ups, and become common knowledge before Data Colada exposes them.
And while “trust the experts” is a sorta left-leaning slogan, progressives treat mainstream economists like nerdy kid brothers who won’t shut up about deadweight loss. Economists overwhelmingly oppose rent control, making public college tuition free, and raising the minimum wage; progressives love those policies. Progressives are skeptical of or downright hostile to free trade and deregulating the housing market; economists love those policies. Progressives roll their eyes at economists, even though they’re ostensibly the experts on the economy.
While I personally defer to economists, I also give progressive skeptics credit. One, economics is literally referred to by economists as “the dismal science”. Moreover, the institutional echo chamber can also affect economists. If the current economic consensus is “free trade is good for developing countries”, economic experts are incentivized to promote that view, which furthers the communal sense that that view is right. It’s hard to tell whether an idea sticks in the expert consensus because it’s right, or because it’s stuck.
My goal is not to pick on economics or psychology or nutrition, but to show that many people, and not just the right wing and conspiracy nuts, question experts. On some level, we know experts can be smart, knowledgeable, in agreement, and wrong.
In medieval medicine and arson science, the experts were wrong because they didn’t subject their beliefs to the scientific method. They knee-jerk defended them instead of genuinely testing them. The fix, of course, was subjecting these beliefs to scientific scrutiny. Hughes Bennett did it with his proto-version of controlled trials. John Lentini did it when he decided to burn down that neighboring duplex. They put the science into science.
Being scientifically-minded, of course, is a good start. It’s crucial to subject beliefs to testing, confirm what works, wrestle with contrary evidence, and burn down a duplex when needed. Also, since groups reinforce their shared beliefs, making scientific-mindedness a core group value might counterbalance other biases.
Still, given how long it took ostensibly-scientific institutions to reject their own myths, I worry shared scientific-mindedness isn’t enough.
Being smart and informed and even ‘believing in science’, I think, doesn’t make a group immune to groupthink. In medieval medicine and arson science, the experts believed they were examining the scientific evidence — they just considered their own personal experience the highest form of evidence. And while it would be nice to think we’ve ironed out all the wrinkles and we’re maximally scientifically sound now, come on, that’s not the case. We’ll find errors in current experts’ approaches that will someday make us cringe. However smart and scientifically-minded experts are, they’ve still self-selected into the same community and been molded by the same institutions. They share the same perspective, and thus they may reinforce the same blind spots.
1. Credit to this incredible comment from Reddit user BesideRounds on r/AskHistorians.
2. More from Thomas. The first edition of William Osler’s Principles and Practice of Medicine in 1892 reads, “During the first five decades of this century the profession bled too much, but during the last decades we have certainly bled too little. Pneumonia is one of the diseases in which a timely venesection may save life. To be of service it should be done early… the abstraction of from twenty to thirty ounces of blood is in every way beneficial.” The 1930 edition reads, “To bleed at the very onset in robust healthy individuals… is good practice.”