Debates around evidence-informed teaching and attempts to convince others using evidence have led me to make several observations, four of which I share in this series of posts (see the first, second, and third posts). This post focuses on my fourth and final observation:
Observation 4: Focussing on [available] evidence to inform our actions can result in us devaluing worthy goals for which evidence about effective practice is difficult to obtain, while prioritising other [possibly also worthy] goals for which evidence about effective practice is easier to collect.
Example: Removing the wrong kidney
Back in 2000, Graham Reeves underwent an operation to remove a diseased kidney. Disastrously, the surgeons removed his healthy kidney by mistake. Although they realised their mistake shortly afterwards and attempted to improve the condition of the diseased kidney, Graham Reeves died soon after. It later transpired that a simple error, an incorrectly filled admittance form, was at the root of this sad tale. Of course, as with air crashes and other such disasters, it is helpful to consider the whole chain of events that led to this outcome. In this case, even given the incorrect admittance form, the mistaken removal could have been prevented if the surgeons had referred to the patient’s X-ray or his notes just prior to operating.
Now suppose* that as a result of this incident, new protocols were set in place to prevent something similar from happening again. Firstly, each admittance form henceforth had to be independently verified against available notes and X-rays by a doctor not directly involved in the care of the patient. Secondly, a non-medical administrative manager had to sign off that this ‘independent’ doctor had indeed completed their verification. Only then would the manager give clearance for surgery to take place.
*These are hypothetical protocols made up for illustrative purposes.
Suppose also that records were kept of every error found on forms. After ten years, a review of the new protocols showed that a number of errors were caught and corrected on the forms before patients went in for surgery. Not all of these patients would necessarily have suffered the fate of having the wrong organ removed under the old protocols, because some surgeons might have referred to X-rays and notes anyway. Nevertheless, everyone agrees that the evidence shows that the new protocols significantly reduce the likelihood of surgeons mistakenly removing the wrong organ.
Does this mean that the new protocols were worth introducing? Let’s suppose many surgeons pronounce themselves unsure about this, despite agreeing that the new protocols do reduce the risks they aim to reduce. These surgeons express their doubts because they experience the increased bureaucracy imposed by the new protocols, though they struggle to find hard evidence of drawbacks arising from the new system. After all, it’s very tricky to test qualitative claims about counterfactual scenarios, let alone quantitative ones. But perhaps the increased bureaucracy lengthens waiting lists, leading to patients being treated later than they would have been under the old protocols, and thus indirectly increasing mortality?
The very nature of the new protocols, designed as they were to address one particular issue, makes it very easy to test their success at addressing that issue. In this case, the new protocols do indeed significantly reduce the likelihood of surgeons mistakenly removing the wrong organ. The problem here is not that the evidence is somehow wrong or unreliable in and of itself—it is that the evidence may be incomplete because it doesn’t consider the unintended impacts of the new protocols.
Any given policy might have several worthwhile targets. Progress against some targets is sometimes easier to measure than progress against others. Where progress is easier to measure, evidence is easier to collect. Therefore, using only the available evidence to draw conclusions about the effectiveness of the policy risks prioritising those targets against which it is easy to measure progress, and devaluing those other worthwhile targets for which evidence does not exist, perhaps because it is difficult to collect.