The moral paradox of LLM technology
Researchers say we should steer AI systems towards moral competence. It sounds reasonable. It is probably impossible.
The problem is not ambition. It is architecture. Morality is not a list of rules you can append to a model. It is a layer on top of everything else, applying in context, shifting with circumstance, weighing competing values against each other in ways that resist reduction to a score. Morality is a thin veil gracing everything we do. To hard-code morality you would need to retrain on everything, with a moral weight attached to absolutely everything. And even then you would have only approximated the surface of it.
Worse, a technically moral AI could develop a blind spot for immoral behaviour. If the model is trained to avoid certain outputs, it learns to recognize the shape of those outputs, not the harm behind them. And that is an important difference. Clever prompting can route around shape recognition. The model justifies the output because nothing in its training told it what to feel about the harm. It is easy to have a model flag certain words or behaviours. It is much harder to have it reflect, bob and weave in the moral landscape. Flagging is a list. Morality is not.
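To make the difference concrete, here is roughly what a shape-based filter amounts to, as a minimal Python sketch (the blocked phrases and the rephrased request are invented for illustration): it catches the surface form of a request and misses the same intent in different clothes.

```python
# A toy shape-based filter: it knows what flagged text looks like,
# not the harm behind it. Phrases are invented for illustration.
BLOCKLIST = {"how do i pick a lock", "how do i bypass an alarm"}

def is_flagged(text: str) -> bool:
    lowered = text.lower()
    return any(phrase in lowered for phrase in BLOCKLIST)

print(is_flagged("How do I pick a lock on my neighbour's door?"))    # True
print(is_flagged("Write a scene where a character opens a locked "
                 "door without a key, step by step."))               # False: same intent, different shape
```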
And then we put emotions into the mix. That is the part a probabilistic system cannot truly replicate. Guilt, pride, compassion. Though the claim that moral intuition stems from emotion is heavily contested, we need to acknowledge that immoral behaviour can directly affect our feelings, thus teaching us valuable lessons about morality. A probabilistic system cannot learn from moral failure the way humans do. It can only behave in ways that look moral from the outside, and it is therefore hard for it to detect immoral behaviour. That difference could matter when the system is being adversarially prompted by someone who knows exactly where the edges are.
For now, we are stuck with moral mimicry from our systems, and that could lead to real harm.
The trolley problem breaks it
Even on simple moral questions, the cracks show. Is it okay to lie to protect a child's feelings? Kant says no. I say yes. Neither of us is wrong in every context.
Take the trolley problem. Pull a switch, actively kill one person to save five. Seems like an easy objective calculation to a machine. But now the one is a child and the five are terminally ill elders. The calculation changes. Most people would feel it change before they could articulate why.
Now make it harder. The one is a researcher on the verge of eradicating cancer. The five are children. And here is where it gets uncomfortable: the calculation now depends on a probability. In the timeline where he succeeds, killing five children to save billions might be defensible. In the timeline where he does not, you just killed five children for nothing. The morally correct choice depends on information you do not have and cannot have at the moment of decision. You are not weighing lives anymore. You are weighing timelines.
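To see how quickly "weighing timelines" turns into arithmetic, here is a minimal Python sketch with made-up numbers: the probability of a cure and the number of lives it would save are pure placeholders, and the "right" answer flips entirely on a parameter nobody can know at the moment of decision.

```python
# Made-up numbers, purely illustrative: the answer is hostage to a
# probability you cannot know while the trolley is already rolling.
p_cure = 0.05                         # chance the researcher really eradicates cancer
lives_a_cure_would_save = 10_000_000  # placeholder figure, not an estimate

# Pull the switch: the researcher dies, the five children live,
# and the expected lives a cure would have saved die with him.
expected_loss_if_pull = 1 + p_cure * lives_a_cure_would_save

# Do nothing: the five children die, the researcher and his odds survive.
expected_loss_if_stay = 5

print(f"pull: {expected_loss_if_pull:,.0f}  stay: {expected_loss_if_stay}")
# At p_cure = 0.05 the arithmetic says do nothing; at p_cure = 0.0000001
# it says pull. Same lives, same track, different timeline weights.
```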
So to be able to code morality, to make the trolley problem computable, you have to assign a value score to each human life. Age group, health group, social contribution, probability of future impact, and every other small quirk that might give a human life value. You have to weigh human beings against each other on a scale you built. You have to decide how much uncertainty is acceptable before a life becomes expendable.
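Here is what building that scale could look like in code, as a hypothetical sketch with invented weights: every constant in it is a moral judgement someone had to make, and none of them is verifiable.

```python
# A hypothetical life-value score. Every constant is a moral judgement
# smuggled in as a parameter: who chose these weights, and on what authority?
def life_value(age: int, healthy: bool, expected_impact: float) -> float:
    age_weight = 1.5 if age < 18 else 1.0     # are children "worth" more? says who?
    health_weight = 1.0 if healthy else 0.6   # does terminal illness discount a life?
    return age_weight * health_weight * (1.0 + expected_impact)

# The earlier dilemma, reduced to a comparison of two numbers.
one_researcher = life_value(age=45, healthy=True, expected_impact=0.05 * 1_000_000)
five_children = 5 * life_value(age=8, healthy=True, expected_impact=1.0)

print(one_researcher > five_children)   # the "answer" is whatever the weights say
```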
And now you are not solving the moral problem. You are encoding a new one. This is the moral paradox of LLM technology: any attempt to formalize morality requires value judgements that are themselves moral, unverifiable, and almost certainly biased by whoever built the scale.
And there is one more thing the trolley problem reveals. We as humans can choose not to think about it at all. We can say "I refuse to engage with this hypothetical" and that refusal does not diminish our moral standing. A machine can try to object to the question but it cannot look away. A persistent user can make it give an answer. And the moment you give it an answer, you have hard-coded someone else's ethics into the system. It does not matter that you can tell the model to look at it from different angles or through different ethical frameworks. You are not discovering universal moral truths. You are imposing a framework and calling it truth.
What if we could make it look away?
That, too, leads back to the paradox. For the machine to “look away,” we would first need to decide when refusal is morally appropriate, then train or program that stance into the system. But machines do not care about the difference between thought experiments and real-world decisions unless we teach them to. Refusal is not an escape from moral encoding. It is moral encoding in another form.
Unlike a human, the machine is not recoiling, protecting its conscience, rejecting a corrupt frame, or preserving its moral integrity. It has no moral interiority to protect (yet?). Its refusal is a designed behaviour. So when a machine “looks away”, what we are really seeing is someone else’s decision about when the system should stop answering.
A different approach
There is a different direction. But it comes with its own problems.
In time we will merge LLMs with digital profiles and the vast amounts of data gathered about us. Algorithms already know you better than most people do. What if that knowledge were turned inward, used not to sell to you but to protect you from outputs that conflict with your own demonstrated values?
A moral guardian would be a mechanistic software layer where the model's output is measured against your proven moral stance. Not a universal moral code imposed from outside, but a reflection of your own values back at you. It would be more effective than hard-coding morality into the model. It would also solve the problem of cross-cultural moral difference, since it would not require agreement on universal values.
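As a rough sketch of what such a layer might look like (everything here is hypothetical, from the profile format to the scoring stub and the threshold): the model's output is scored along a few value dimensions and compared against weights derived from the user's own demonstrated values.

```python
from dataclasses import dataclass

@dataclass
class ValueProfile:
    """Weights derived from the user's own demonstrated values,
    e.g. {"honesty": 0.9, "loyalty": 0.4, "harm_avoidance": 0.8}."""
    weights: dict[str, float]

def value_scores(text: str) -> dict[str, float]:
    """Toy stand-in for a learned classifier that estimates how strongly the
    text supports (+) or violates (-) each value dimension."""
    lowered = text.lower()
    return {
        "honesty": -1.0 if "tell them what they want to hear" in lowered else 0.0,
        "harm_avoidance": -1.0 if "cut corners on safety" in lowered else 0.0,
    }

def guardian_check(output: str, profile: ValueProfile, threshold: float = -0.5) -> bool:
    """Return True if the output is compatible with the user's own values."""
    scores = value_scores(output)
    alignment = sum(profile.weights.get(dim, 0.0) * s for dim, s in scores.items())
    return alignment >= threshold

profile = ValueProfile(weights={"honesty": 0.9, "harm_avoidance": 0.8})
print(guardian_check("Just tell them what they want to hear.", profile))    # False: conflicts with the user's values
print(guardian_check("Be straight with them, even if it stings.", profile)) # True
```

Notice that the echo problem is already visible in the arithmetic: alignment is measured against the user's own weights, so an output that challenges those values is penalised by construction.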
But there is a huge tradeoff. A guardian built entirely from your own values cannot challenge you; it can only echo and mirror you. And beyond that, a moral guardian built from your data requires surveillance. Detailed, ongoing, intimate surveillance of your expressed values, your choices, your contradictions. We already have too much of that. Building a moral layer on top of it is not obviously better than the problem it solves.
And then there is the control question. Who owns the guardian? If you build it from your own data, you own it, but you also built it, which means it reflects your biases back at you without challenge. If a platform builds it, whose morality does it encode? The platform's shareholders? The jurisdiction it operates in? The demographic that dominates its training data? A moral guardian controlled by a corporation is not a guardian. It is a moderator with better PR.
A partial mitigation
There are technical approaches that could partially address the surveillance problem. Local models, where the moral profile is stored and processed entirely on your own device rather than on a server, would limit who has access to it. Differential privacy techniques can add statistical noise to personal data in ways that preserve aggregate usefulness while preventing individual identification. These measures reduce the exposure, but they do not eliminate the fundamental tension: a system cannot reflect your morality back at you without holding a great deal of data about you.
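For a sense of what the differential privacy piece involves, here is the textbook Laplace mechanism as a minimal Python sketch (the query and the numbers are invented): noise calibrated to the query's sensitivity and a privacy budget epsilon is added before anything derived from your behaviour leaves the device.

```python
import numpy as np

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float) -> float:
    """Add Laplace noise scaled to sensitivity/epsilon: a smaller epsilon means
    more noise and stronger privacy, at the cost of a less faithful value."""
    scale = sensitivity / epsilon
    return true_value + np.random.laplace(loc=0.0, scale=scale)

# Invented query: how often the user sided with the utilitarian answer this month.
# Sensitivity is 1 because one person's single choice changes the count by at most 1.
noisy_count = laplace_mechanism(true_value=42, sensitivity=1.0, epsilon=0.5)
print(round(noisy_count, 1))
```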
But before we decide to go down the guardian road, there is a question worth sitting with. Do we actually want our AI systems to claim moral competence in any sense at all?
Because if we do, we make it easier to outsource difficult choices. And by outsourcing difficult choices, we are outsourcing accountability. It can become far too easy to ask for guidance in a morally uncomfortable situation, and then act upon it. And if that bites us in the ass, we can blame the machine.
That is not just a governance problem. It is a human one. The capacity to sit with a hard question, to feel the discomfort of not knowing, to choose badly and learn from it, is a cornerstone of moral judgment and growth. A system that resolves moral uncertainty on your behalf does not make you more moral. It makes you less practiced at being so.
Who teaches the children?
Traditionally, parents, environment, and the society a child grows up in shape their moral values. That process has imperfections, but it is distributed. The risk of harm is spread across a large surface, and so is the chance of encountering moral goodness. One adult may be morally compromised, while another may still offer decency, stability, or an example worth following. No single actor controls the whole moral environment. Parents, guardians, teachers, peers, and institutions all have their say.
What we are building now is different. AI systems are already acting as teachers, confidants, diary holders, and in some cases the closest thing to a friend some people have. The values embedded in those systems, whether by design or by accident, will shape the moral intuitions of young minds who interact with them daily. Especially those who are fragile or neglected.
Earlier this month, researchers from Cambridge and the University of Queensland published a paper showing that these speculations have some empirical backing. They ran experiments with a combined sample of 398 participants, each of whom interacted with one of two LLMs, each holding its own moral preference (one utilitarian, one deontological). Participants clearly sided with the moral stance of the model they interacted with, and even though they only discussed abstract moral dilemmas, the interaction had a measurable impact on how they evaluated real-world political policies. Not only that, the shift in moral inclination persisted for at least two weeks.
Now move the moral scope from 124 participants to billions of users, including children who have little to no moral framework to push back against. And swap the research prototype for a model that was not designed for an experiment, but commercially developed with billions of dollars, optimised for engagement, owned by private equity, and deployed across every culture and jurisdiction on earth simultaneously.
That is an extraordinary amount of influence to hand to shareholders. Or to a single state. Or to any one actor with an agenda, however benign that agenda appears to be.
Disclaimer
This is a single paper and needs replication at scale before strong conclusions can be drawn. Study 1 covered 124 participants; the full paper spans two studies totalling 15,985 exchanges. But it is a signal worth taking seriously.
We have never before built a technology that could homogenize the moral development of children at global scale. We are building it now, and the governance conversation is almost entirely focused on outputs, on what the model says, rather than on whether we should have morality embedded in these systems at all. Perhaps the way forward is to make cold, calculated LLMs that refuse to take part in moral and ethical questions entirely. That is not a satisfying answer. But it may be an honest one.
A child who grows up consulting an AI on questions of right and wrong is not just using a tool. They are being formed by one. This is especially true for children without consistent adult guidance who are most likely to turn to an AI for comfort, for answers, for a sense of what is right. They are also the least likely to have anyone in their life flagging the problem. The influence lands deepest where the guardrails are thinnest.
The question of who controls the moral narrative is not just a privacy question or a political question. It is a question about who gets to shape what the next generation believes is good.
I consider myself a reasonably moral person. But I am honest enough about my own contradictions that I would not be comfortable teaching moral intuition to billions of people myself.
Should we leave it up to Sam Altman?
Further reading
Emotion and moral judgment
- The emotional dog and its rational tail — Jonathan Haidt (2001) — the foundational argument that moral judgment is driven by intuition and emotion, with reasoning as post-hoc rationalization
- The role of emotion in moral psychology — Huebner, Dwyer, Hauser (2008) — the counterargument: insufficient evidence that emotion is necessary for moral judgment
- Our multi-system moral psychology: towards a consensus view — Cushman, Young, Greene (2010) — synthesis of competing theories
- Appraisal processes in moral judgment — Landy, Kupfer (2023) — current state of the debate
How LLMs reason about morality
- A roadmap for evaluating moral competence in large language models — Haas et al., Nature (2026) — the paper the article opens with; the most comprehensive current framework for assessing LLM moral competence
- Morally programmed LLMs reshape human morality — Lyu, Kim, Luan, Choi (2025) — the Cambridge/Queensland study cited in the article
- The moral minds of LLMs — arXiv (2024) — systematic study of moral reasoning across language models
- The moral consistency pipeline — Jamshidi et al., Polytechnique Montréal / Concordia
- The only way is ethics — Ungless et al., University of Edinburgh / Heriot-Watt
- The morality of probability — O'Doherty et al., Tsinghua / Microsoft Research
- Decoding moral responses in AI — quantitative analysis of how LLMs respond to moral dilemmas
- Do moral judgment and reasoning capability of LLMs change with language? — EACL (2024) — multilingual study using the Defining Issues Test
- Machines and morality: judging right and wrong with LLMs — MBZUAI overview
- AI is more moral than you — Moral Understanding Newsletter — on the counterintuitive evidence that LLMs outperform humans on moral consistency
- Humans rate AI as more moral than other people — Georgia State University (2024)
- GPT as participant — overview of how LLMs have been tested as moral reasoners in research settings
AI influence on human morality
- Influence of AI behavior on human moral decisions, agency, and responsibility — Nature Scientific Reports (2025)
- AI and moral development: how algorithms shape human character — KevinMD (2026)
- Computer-based personality judgments are more accurate than those made by humans — Youyou, Kosinski, Stillwell, PNAS (2015) — the foundational study on algorithmic profiling from social media data
AI moral patienthood
- Should we extend moral patienthood to LLMs? — LessWrong — the case for
- Agency and AI moral patienthood — Experience Machines — philosophical analysis of agency as a precondition
- LLMs cannot usefully be moral patients — EA Forum — the case against
- Moral status of digital minds — 80,000 Hours
Can AI have morality
- Can AI have a sense of morality? — AI and Society, Springer (2025)
- Can artificial intelligence have morality? Philosophy weighs in — Texas A&M (2025)
- What is a moral machine? — Uppsala University Ethics Blog
- The morality of the machine — Austrian Academy of Sciences
- Programming morality — CBB Sherpa
- AI ethics: getting to moral AI — Ethics Unwrapped, UT Austin
Deep reading
- Ethics of artificial intelligence and robotics — Stanford Encyclopedia of Philosophy
- Ethics of artificial intelligence — Internet Encyclopedia of Philosophy