Trolley problem as an issue with Bayesian priors

Creating mini worlds


We've all read about the trolley problem. Enough for it to have possibly annoyed you to acts of uncharacteristic cruelty. So let's start with the best solution to this problem that I've seen so far.

In the words of Wikipedia:

The trolley problem is a series of thought experiments in ethics and psychology, involving stylized ethical dilemmas of whether to sacrifice one person to save a larger number. Opinions on the ethics of each scenario turn out to be sensitive to details of the story that may seem immaterial to the abstract dilemma. The question of formulating a general principle that can account for the differing moral intuitions in the different variants of the story was dubbed the "trolley problem" in a 1976 philosophy paper by Judith Jarvis Thomson.

Philippa Foot, in 1967, introduced a series of decision problems across a bunch of topics including abortion and double effect doctrine. She wrote:

Suppose that a judge or magistrate is faced with rioters demanding that a culprit be found for a certain crime and threatening otherwise to take their own bloody revenge on a particular section of the community. The real culprit being unknown, the judge sees himself as able to prevent the bloodshed only by framing some innocent person and having him executed. Beside this example is placed another in which a pilot whose airplane is about to crash is deciding whether to steer from a more to a less inhabited area. To make the parallel as close as possible, it may rather be supposed that he is the driver of a runaway tram, which he can only steer from one narrow track on to another; five men are working on one track and one man on the other; anyone on the track he enters is bound to be killed. In the case of the riots, the mob have five hostages, so that in both examples, the exchange is supposed to be one man's life for the lives of five.

If you're a utilitarian, the choice is clear. Save more lives of course. If you're presented even with a modified version with a fat man on one side, or an old man on one side, or a baby on one side, you can still do QALY calculations and in a pretty straightforward fashion come to a moral calculus.

If you're a virtue ethicist on the other hand, participating in any way in the system implicates you within the system, which means doing nothing is the only viable option.

Since philosophers love playing around, they also created an incredible number of variants to check all sorts of other permutations of the trolley problem. My favourite is one where we get to actually use it to hit philosophers:

There’s an out of control trolley speeding towards Immanuel Kant. You have the ability to pull a lever and change the trolley’s path so it hits Jeremy Bentham instead. Jeremy Bentham clutches the only existing copy of Kant’s Groundwork of the Metaphysic of Morals. Kant holds the only existing copy of Bentham’s The Principles of Morals and Legislation. Both of them are shouting at you that they have recently started to reconsider their ethical stances.

Since all of this sounds rather like play-acting, a few academics also decided to put it to an empirical test. So in 2001, Joshua Greene did a large scale empirical investigation to figure out what people would actually do.

When they did an fMRI investigation of emotional engagement they found that, quoting, "personal" dilemmas (like pushing a man off a footbridge) preferentially engage brain regions associated with emotion, whereas "impersonal" dilemmas (like diverting the trolley by flipping a switch) preferentially engaged regions associated with controlled reasoning.

The empirical analysis continued thereafter unabated, trying all sorts of permutations and combinations to figure out what's actually going on in people's minds, and in their philosophies, to move the needle.

Then again in 2017, Michael Stevens led a group to perform yet another realistic experiment, and concluded that most people don't end up pulling the lever after all.

If all of this sounds pretty annoying, you're not alone. For one particular method of solving this seemingly intractable problem, watch this.

That kid knows what's up. Or at the very least he knows how to stop his dad from giving him sophomoric philosophy challenges.

More to the point, even though it's an experiment designed to test and refine our moral intuitions, it fails miserably at understanding what it is supposed to predict. Our actions, thoughts and morality are not platonic, independent of our actual lives. And acting as though they are is moronic.

Basically there's not enough meta game analysis happening here.


Like all good theories and happy empiricisms, there have been a whole bunch of criticisms levelled at the thought experiment.

Several suggested that the entire scenario is totally unrealistic and extreme and unconnected to any real-life intuitions that people, you know, actually possess. To which I can only imagine the rest of the scientific community said a big loud duh!

Another major line of criticism is that the problems as written fall foul of risk aversion in our epistemic analyses. Risk aversion naturally makes us reluctant to act if we perceive that the act could have downsides (how do you actually know that your lever-pulling plan will work?). In the version where you have to push a fat man to stop the trolley, if it doesn't work we're condemning 6 people to die instead of 5.

This is a probability theory question. If there is some probability that your action will save the five, and without your acting they will definitely perish, then you just have to do the math. So while there is a clear epistemic reason for inaction, or hesitation, there's also a clear answer.
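To make "do the math" concrete, here is a minimal sketch of the fat man variant as an expected-value comparison. It assumes the only thing we count is the number of deaths, that doing nothing kills exactly five, and that a failed push kills six (the five plus the man pushed). The function names and the probability parameter `p_success` are illustrative, not taken from any study.

```python
# Deaths if we do nothing: the five on the track (assumed certain).
DEATHS_IF_INACTIVE = 5


def expected_deaths_if_pushed(p_success: float) -> float:
    """Expected deaths if we push the man.

    Success (probability p_success): only the pushed man dies (1 death).
    Failure: the pushed man and the five all die (6 deaths).
    """
    if not 0.0 <= p_success <= 1.0:
        raise ValueError("p_success must be a probability in [0, 1]")
    return p_success * 1 + (1 - p_success) * 6


def should_push(p_success: float) -> bool:
    """True if pushing has strictly fewer expected deaths than inaction."""
    return expected_deaths_if_pushed(p_success) < DEATHS_IF_INACTIVE


if __name__ == "__main__":
    for p in (0.1, 0.5, 0.9):
        print(p, expected_deaths_if_pushed(p), should_push(p))
```

Under these assumptions the expected number of deaths from pushing is 6 - 5p, so pushing only comes out ahead when you believe it has better than a one in five chance of working. A risk-averse person who doubts the plan is not being irrational; they are just assigning a low p.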

Yet another major criticism is that people have uncertainty aversion. People don't like choosing without knowing a bit more about the chances assigned to their actions. You would rather perform an action if there's a measurable probability of success than if the odds are completely unknown.

Yet another is around whether you're performing a direct action (pushing the fat man) or an indirect one (pulling a lever). And it turns out people prefer indirect actions, since they trigger less of an emotional reaction within us.

In yet another empirical experiment, researchers tested 200 people and replaced the people on the tracks with mice getting electric zaps. Interestingly, in the hypothetical scenario 66% of people said they would press the button to zap one mouse instead of five; when the experiment was actually conducted with real mice in front of them, that jumped to 84% who actively zapped the one mouse.

The conclusion was that people don't think as emotionally in real life scenarios as one might've imagined. But I see it differently. When you see the five mice in front of you that triggers an emotional reaction too! And there's no a priori reason to believe that the action of zapping one will cause an emotional reaction that's somehow worse than the thought of five of them getting zapped when they're in front of you!

One reason for the resurgence in interest in this rather esoteric thought experiment is that we are now designing autonomous vehicles all around the world, and it would be useful to give them some sort of moral standing to make their decisions.

But converting the intuitions developed by the trolley problem to the tradeoffs and prioritisation that every algorithm needs to do is somewhat disingenuous.

Everything that we do involves tradeoffs. Your Google search returns information in a particular order, and reaching that order involved thousands of tradeoffs in assessing it against your search history, other people's search history, the particular keywords you used, the website's own information content and ranking, and a whole bunch of other variables.

I was particularly flabbergasted that there was a real-life trolley problem that happened in 2003. An unmanned train of 31 freight cars was diverted to make sure it didn't enter the Union Pacific yard in LA where a Metrolink passenger train was thought to be, and ended up going through an area of low housing density and causing some property damage, along with some pretty jumpy residents.

The interesting thing about the incident isn't that it happened, it's that the cost-benefit analysis, especially post hoc, reads as a rather straightforward, even banal, decision.


There are a whole bunch of other similar thought experiments and studies that seem to conclusively prove various interesting things: that people are not entirely disposed to hedonism (Nozick's experience machine), Waldmann and Dieterich's analysis of locus of intervention, the assessment of the role of reasoning vs intuition by Fiery Cushman et al, the assessment of actor-observer bias, the amazing book Morality, Mortality, and dozens more.

These theories, and in most cases the experiments, forget that people don't make decisions in a vacuum.

For a change let's take Nozick's example.

It seems to work because you ask people a simple question: if you could sit quietly in a room and get every type of pleasurable and desirable experience we could imagine, would you do it?

And lo and behold, people answer no.

Which is seen as somehow debunking the idea that we're all hedonistic animals. The sequence of thoughts is clearly simple - if the machine can give us pleasure, and pleasure is what matters to us, then clearly plugging into it is the right thing to do. We should all swallow that blue pill all day long.

But the first time I encountered it I didn't have a yes/no answer paradigm that I could easily plug the question into.

When we get something like this presented to us, it's only natural to ask a few questions. What is this damn machine? Does it actually work? How often does it screw up? Is it similar to my phone, where I pay exorbitant sums of money and it turns out that it not only needs constant love and attention, but also recharging every half a day?

And when you compare it with the machines we actually have experience using, is there any way that someone could legitimately make themselves believe in this mythical machine? Might as well ask it to answer all our philosophical problems and wait a few million years.

And this isn't just because people believe plugging into a machine is suicide, or that this is contrary to societal functioning which is somehow inbuilt in us, or that this goes against the concept of free will.

Those all might be true, but people don't usually do well when placed into a hypothetical they have difficulty imagining. It's like asking someone to imagine a 4 dimensional hypercube. Not something that's easily going to get simulated in our wetware, mathematical logic be damned. Maybe the best that can be said about it is that it inspired a slightly more entertaining version in Infinite Jest and created an even more entertaining world with the Matrix.

That's why in the trolley problem, it's not a matter of simple QALY calculations of lives lost vs lives gained. It's also not just a calculation of the various ways in which our actions might make us complicit in a situation, or the difficulty of measuring the exact impact of our actions (does the fat man, once pushed, really have a chance of falling in exactly the right trajectory to change the path of the train?).

It's the fact that our moral intuitions have forced on us a few general precepts over millennia of evolution and they don't bend in ways rational calculus tells us they should. For instance:

  • In general, do less harm. Which means actively trying to hurt someone is seen as wrong. Whether this is for utilitarian reasons, like if you don't hurt others you'll get better treatment from the group, is beside the point, since it still remains as an evolved trait.

  • In general, do things that maximise the chances of your desired outcome happening. Which means even if we want to kill the one person vs five, we have to believe that our actions will lead to that outcome actually occurring - e.g., the trolley truly can be controlled by this lever, pulling it works, there are no other externalities, etc.

The weights of these individual principles vary based on the situations we encounter. They're also mostly assigned unconsciously. And they also weigh potential consequences of the decisions we make into the decision making process.

For instance, if we were to cause less harm by intentionally steering the train towards the one person and away from the five, we would also have to believe that this action will not bring further moral and legal and societal repercussions on us. That we won't be tried in a court of law. That we won't wake up with night sweats because we did something immoral. That the world at large will see what we did and be okay with it.

And that's too high a bar.
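One way to see why the bar is so high is to treat each of those beliefs as a probability and multiply them together. The sketch below does that under two loud assumptions: the confidence numbers are invented purely for illustration, and the beliefs are treated as independent, which real intuitions almost certainly are not. The names `beliefs` and `joint_confidence` are hypothetical.

```python
from math import prod

# Made-up confidence levels for each thing we would need to believe
# before acting. These are illustrative assumptions, not measurements.
beliefs = {
    "the lever actually diverts the trolley": 0.9,
    "we won't be tried in a court of law": 0.8,
    "we won't wake up with night sweats": 0.7,
    "the world at large will be okay with it": 0.8,
}


def joint_confidence(beliefs: dict) -> float:
    """Probability that every belief holds, assuming independence."""
    for name, p in beliefs.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name!r} must be a probability in [0, 1]")
    return prod(beliefs.values())


if __name__ == "__main__":
    print(joint_confidence(beliefs))
```

Even when each individual belief feels fairly safe, the joint confidence is lower than the weakest of them, and with these particular numbers it falls below one half. Every extra condition we pile on pushes it lower still.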

If we ask people to make decisions without taking these biases into consideration, we'll only get a skewed view. And that's true even if we have explicitly told them to disregard those biases.

Using hypothetical problems to get to moral intuitions presents a tautological issue. People can only respond through their existing moral apparatus. Presenting them with the fact that they did something that doesn't necessarily conform to an artificial standard only shows the wide chasm between the artificial standard and the actual one.

It's like stopping people and shaking them and going "you're not behaving like Homo Economicus". Even if you're right, and let's face it you'll definitely be right, it doesn't matter. You might as well chastise them for not flying.

Similarly, trying to calculate our moral intuitions by asking us to imagine impossible scenarios seems to work out just as one would expect. The moral intuitions we possess are not sitting inside a box marked "moral intuitions" somewhere in the cerebellum. In a paper trying to peek at the neuroscientific underpinnings of moral emotions we find:

Moral emotions are thought to emerge as neural representations that rely on the coactivation of brain regions that code for perception of social cues (temporoparietal junction and other posterior cortical regions), social conceptual knowledge (chiefly the anterior temporal cortex), abstract event sequence knowledge (anterior, medial, and lateral sectors of the prefrontal cortex), and basic emotional states (including subcortical structures such as the basal forebrain and hypothalamus).

A separate paper calls for the introduction of a representational approach.

Based on clinical evidence and functional imaging studies, the authors suggest that moral emotions emerge as neural representations from the coactivation of brain regions that code for perception of social cues, event knowledge, and emotion. According to this hypothesis, the neural bases of moral emotion, knowledge, and attitudes are better explained by a representational approach, in contrast to the view of neural processes as guiding moral appraisals.

They're diffuse principles, not exact formulae in an Excel cell. To the credit of modern researchers they do recognise it. They just walk on eggshells around it and pretend that the issue is one of underspecification.

But unfortunately the issue doesn't go away just through better boundary drawing. The problems might look like they're retreating into the shadows, but that's because the parts which seem like error terms only show up when looked at systemically.


The commonality amongst all the types of problems mentioned is that we have a global view vs local view confusion. They're problems of specification. There are problems we can understand through interrogation and subdivision of parts, and others we can only see by stepping back and looking at the whole picture.

Knowing something about particular input-output combinations tells us how a machine works, but that method doesn't seem to translate all too well to troubleshooting concepts that we hold amorphously in our brains. The only way these types of thought experiments, whether armed with fMRIs or not, can give us results is when we have an implicit belief in the validity of this input-output system.

Instead of thinking about our intuitions or moral circuits as reasonably well-defined, quasi-binary decision-making circuits, it could also be that they're a messy, cross-cutting web of intuitions that returns roughly "correct" answers in a few real-world situations.

We also have system 2 circuits to help create more thought out meta-circuits that tell us that we should do certain things because they're objectively better (according to predefined criteria), but almost by definition they don't do well through sole focus on input-output assessments.

There's a koan that goes thus:

Two monks were arguing about the temple flag waving in the wind. One said, “The flag moves.”

The other said, “The wind moves.”

They argued back and forth but could not agree.

Hui-neng, the sixth patriarch, said: “Gentlemen! It is not the flag that moves. It is not the wind that moves. It is your mind that moves.”

The two monks were struck with awe.

And another that goes:

When asked why he practiced zen, the student said, “Because I intend to become a Buddha.”

His teacher picked up a brick and started polishing it. The student asked “What are you doing?”

The teacher replied, “I am trying to make a mirror.”

“How can you make a mirror by polishing a brick?”

“How can you become Buddha by doing zazen? If you understand sitting Zen, you will know that Zen is not about sitting or lying down. If you want to learn sitting Buddha, know that sitting Buddha is without any fixed form. Do not use discrimination in the non-abiding dharma. If you practice sitting as Buddha, you must kill Buddha. If you are attached to the sitting form, you are not yet mastering the essential principle.”

The student heard this admonition and felt as if he had tasted sweet nectar.

The entire point of these koans is to "open the mind" of the student through the effort to answer an unclear and undefined question. Apart from bringing forth within me an irresistible urge to tell someone to shut up, these koans feel similar to the empirical philosophy tests we looked at in part II.


In any complex system there will always emerge a few corner cases where the broad hypotheses we hold will not apply. Rules based assessments of complex systems will have leakages when compared to the messy reality.

  • In general, raising taxes acts as a brake on economic growth. But cutting taxes on the very top doesn't increase the growth either.

  • In general, evolution seems to give rise to increasing complexity in creatures. But it also gives rise to viruses, and to viruses that affect other viruses, called virophages.

  • In general, it's a good idea to cut prices if you want the quantity that people buy to go up. But there are also Giffen goods and Veblen goods.

Maybe not the best examples, but corner cases abound in any area we care to look at. It's almost the main reason we have oodles of lawyers around to maintain cohesiveness in the face of law's awesome complexity. Here the article elucidates:

A legal researcher is tasked with writing a report on the law on the protection of wild animals. She draws up 3 lists of the legislative provisions, court decisions, and governmental regulations. But, she knows she must also take into account judicial cases on the interpretation of legislation, the way government regulations have implemented acts of parliament, and any reliance by the courts on executive orders. Now she can see a networked relationship between the rules adopted by the legislature, courts and executive, evidencing a more complicated picture than suggested by her lists, and, being a talented computer scientist, she develops a programme to model those relationships. But, still, she sees only part of the picture, as there will be relevant legislative provisions and judicial decisions in property law and human rights law, etc. When writing her report and reflecting on the legal rules and networks of relationships, she begins to see patterns in the rules, a body of "Wildlife Law", albeit its content is not always clear, and she must make choices when filling in the gaps in this Wildlife Law, and in the exercise of that discretion, we would expect (and hope) she would use her professional and ethical judgement.

The same thing applies in AI training. Rules-based approaches, where you had fidelity of input-output, were in vogue for decades. But as computational power has increased in the past two decades we've been able to throw part of that out and create a more complex whole that can be trained on entire classes of problems. Machine learning has taken over.

Most of politics is one side arguing from averages and the other side throwing out a corner case where the average doesn't apply.

(Yes, it is rather unfair that the other side vilifies your position and thought process like this.)

And if you're interested in how knowing about loopholes can itself be hijacked, this article can give you hours of fun. It provides a perfect example of how drawing boundaries haphazardly around different groups of known facts creates unintended consequences in the form of emergence of idiotic hypotheses that are not easily explainable within the boundaries of that same fact set.

The koans and the philosophy principles both act as corner case detectors. They are wonderful at identifying the boundaries or areas where, much like an old timey map, we can write "there be dragons".

These dragons laugh at our preconceived principles and spew small unoccupied areas in our mental hypercube where they don't apply. That's their whole job.

Like most things that live outside the world of pure mathematics or the hard sciences, this is a challenge we just have to live with.

We can have a clear view of our morals in a toy-land and continue to wonder why our intuitions seem so wrong when compared to specific cases that seem unlikely to happen in real life, or we can try to create a slightly less tangible but slightly more accurate depiction of how our philosophy might work. Not through the application of ever more complex rules creating epicycles atop each other, but by creating mini-worlds that mirror our own, to build theories that have a chance of surviving a meeting with reality.