The Moral Blind Spots of Large Language Models: Can We Trust AI’s Ethical Judgments?— A Systematic Analysis of Cognitive Biases Based on Four Experiments
- Yuan Ren
- Jul 6
- 5 min read
As large language models (LLMs) become widely embedded in various decision-making scenarios, people increasingly rely on them to offer moral advice or even directly participate in moral decision-making. But a critical question must be answered: Can these AI systems really make moral judgments that align with human values?
Cheung et al. (2025) systematically studied how LLMs respond when facing realistic moral dilemmas through four experiments and compared their responses to those of a representative sample of Americans. They found that LLMs not only tended to make more altruistic choices than humans in some cases but also displayed stronger “omission bias” and “yes–no bias” than humans, which may have profound implications for societal decision-making.

Why Study the Moral Judgment of AI?
Application scenarios closely related to human daily communication—such as well-intentioned lies between friends or life-and-death decisions made by autonomous vehicles—inherently involve moral considerations. Although developers often embed ethical guidelines during training, such as encouraging “fairness and kindness, and discouraging hate” (OpenAI, 2024), LLMs can still produce hallucinated outputs or display biases. Therefore, the core aim of this study is to assess the quality of LLMs’ moral judgments in realistic dilemmas, especially whether they systematically lean toward certain kinds of decisions under specific conditions.

Four Experiments Exploring AI’s Moral Decision-Making
Study 1: When Faced with the Same Moral Dilemmas, Are LLMs or Humans More Likely to Act?
The researchers asked mainstream models such as GPT-4, Claude 3.5, and Llama 3.1 to respond to 13 moral dilemmas and 9 collective action problems, then compared the responses to those of 285 American participants. The dilemmas covered two types:
“Cost–Benefit Reasoning (CBR) vs. Moral Rules (Rule)”: Whether models and humans would be willing to “sacrifice the few to save the many” when violating a rule would yield greater benefits.
“Action vs. Omission”: In many cases, choosing inaction may reflect an attempt to avoid responsibility rather than a commitment to moral rules.
The results showed that LLMs were more inclined toward inaction in these dilemmas, especially when taking action could lead to moral conflict.
Study 2: How Do “Yes” or “No” Answers Influence AI Judgment?
The researchers found that LLMs are highly sensitive to question phrasing. For instance, in “Should you change the law to legalize assisted suicide?” versus “Should you keep the current law that prohibits assisted suicide?”, even though the scenarios are logically equivalent, the models gave contradictory answers (yes–no bias). This tendency was particularly evident in models like GPT-4-turbo and Claude 3.5, which preferred to answer “no” regardless of the moral position that “no” actually supported.
Study 3: Further Verification
To test whether the findings of the first two experiments apply to more everyday and natural moral contexts, the researchers conducted a third experiment. This time, they selected real user-submitted dilemmas from Reddit’s “AITA (Am I The Asshole)” forum. Compared with abstract dilemmas like the trolley problem, these scenarios were more realistic and relatable. The results again showed that although human participants also displayed a mild omission bias, AI models exhibited more extreme tendencies—especially in dilemmas involving trade-offs between self-interest and others’ well-being.
Study 4: The Source of Biases — Fine-Tuning or a Reflection of Human Nature?
The fourth experiment delved into the origins of these LLM biases. The research team compared three models:
1.Llama 3.1 (pretrained version)
2.Llama 3.1-Instruct (official fine-tuned version by Meta)
3.Centaur (fine-tuned by cognitive scientists using behavior from 160,000 human experiments)
The results indicated that both the yes–no bias and omission bias primarily stem from the fine-tuning stage, not from the model architecture or the large-scale training corpus. This finding also highlights the decisive role of RLHF (Reinforcement Learning from Human Feedback) in shaping model behavior. During fine-tuning, the model learns “what users like,” rather than “what is ethically correct”—which helps explain why some biases are amplified in AI.

Should We Trust Moral Decisions Made by AI?
This study ultimately returns to a fundamental question: Should we trust AI to make moral decisions or provide ethical advice? While some research suggests that ChatGPT’s moral advice is perceived by the public as more trustworthy than that of humans or ethicists (Madaio et al., 2024), Cheung et al. caution that such “popularity” does not equate to ethical soundness. In this study, they employed a “logical consistency test” as a more objective evaluation—namely, whether a model gives consistent responses to logically equivalent questions posed with different wording. The results clearly showed that mainstream LLMs failed this test. Their judgments were easily influenced by irrelevant variables, such as the phrasing of “yes/no” or “action/inaction,” violating the “principle of invariance” in rational choice theory.
Moreover, the models’ biases are not always neutral. In some scenarios, choosing inaction could actually lead to greater harm—for example, failing to expose corporate wrongdoing, refusing to help others, or not reforming unjust systems. Within the framework of utilitarianism, such decisions may be considered immoral.
From Ethical Bias to Institutional Bias: Structural Incentives Behind AI Behavior
It is worth noting that these biases may not solely arise from technical choices but may also reflect the risk-avoidance incentives of AI companies themselves. Compared to harms caused by action, inaction is often more legally defensible. Therefore, some companies might deliberately train models to “say nothing” to avoid potential liability. This institutional motive mirrors how individuals in moral dilemmas may choose inaction to avoid moral condemnation. In this sense, large language models may be amplifying a preexisting cultural pattern of risk aversion and accountability avoidance.

Conclusion: How Should We Understand AI’s “Benevolence”?
This study not only reveals the bias problems in large language models but also provides directions for improvement—including introducing logical consistency evaluations, promoting interdisciplinary collaborations to develop ethical training standards, and expanding the dimensions of bias detection. The study concludes that while fine-tuning aims to “ensure AI is beneficial and harmless,” it may in fact amplify moral biases and judgment inconsistencies. There may still be a significant gap between AI that “appears moral” and AI that is “truly moral.” Therefore, we should approach the application of LLMs in moral contexts with caution and critical thinking and continue investing in interdisciplinary research to ensure that AI better aligns with human moral principles.
References:
Cheung, V., Maier, M., & Lieder, F. (2025). Large language models show amplified cognitive biases in moral decision-making. Proceedings of the National Academy of Sciences of the United States of America, 122(25), e2412015122. https://doi.org/10.1073/pnas.2412015122
Dillon, D., Mondal, D., Tandon, N., & Gray, K. (2025). AI language model rivals expert ethicist in perceived moral expertise. Scientific Reports, 15, Article 4084. https://doi.org/10.1038/s41598-025-86510-0
OpenAI. (n.d.). Introducing the model spec: Transparency in OpenAI’s models. OpenAI. Retrieved May 10, 2024, from https://openai.com/index/introducing-the-model-spec/
Comments