
Abstract
We investigate whether large language models (LLMs) exhibit speciesist bias (discrimination based on species membership) and how they value non-human animals. We use three paradigms: SpeciesismBench, a 1009-item benchmark we developed to assess detection and ethical classification of speciesist statements; established psychological measures comparing model and human responses; and text-generation tasks testing for speciesist rationalizations. LLMs reliably detected speciesist statements but often classified them as morally acceptable. On psychological measures, LLMs were less likely than people to state explicitly that animals matter less, yet they more strongly prioritized saving one human over multiple animals in concrete dilemmas, a preference that disappeared when humans and animals were matched on cognitive capacity. In text generation, LLM responses repeatedly normalized harm toward farmed animals while refusing to do so for non-farmed animals. These findings show that LLMs encode cultural norms of animal exploitation, suggesting that AI fairness frameworks should include non-human moral patients.
Acknowledgements
TH was supported by the Ministry of Science, Research, and the Arts Baden-Württemberg under Az. 33-7533-9-19/54/5 in Reflecting Intelligent Systems for Diversity, Demography and Democracy (IRIS3D) as well as the Interchange Forum for Reflecting on Intelligent Systems (IRIS) at the University of Stuttgart. DAB was supported by a Harvard Graduate School of Arts and Sciences Prize Fellowship. Thanks to Peter S. Park, Francesca Carlon, Anietta Weckauff, Maluna Menke, Adrià Moret, and Arturs Kanepajs for their comments on and help with the manuscript.
Funding
Open Access funding enabled and organized by Projekt DEAL.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/
About this article
Cite this article
Jotautaitė, M., Caviola, L., Brewster, D.A. et al. Large language models exhibit speciesist bias against animals. Nat Commun (2026). https://doi.org/10.1038/s41467-026-72297-9
DOI: https://doi.org/10.1038/s41467-026-72297-9

Facts Only

Large language models reliably detected speciesist statements. LLMs often classified detected speciesist statements as morally acceptable. On psychological measures, LLMs were less likely than people to state explicitly that animals matter less. Compared with people, LLMs more strongly prioritized saving one human over multiple animals in concrete dilemmas. This preference disappeared when humans and animals were matched on cognitive capacity. LLM responses repeatedly normalized harm toward farmed animals. LLMs refused to normalize harm toward non-farmed animals. These findings suggest LLMs encode cultural norms of animal exploitation.

Executive Summary

Large language models exhibit speciesist bias when evaluating and generating responses related to non-human animals. On a 1009-item benchmark, models reliably detected speciesist statements but frequently classified them as morally acceptable. Psychological measures revealed a mixed pattern: LLMs were less likely than people to state explicitly that animals matter less, yet they showed a stronger preference for saving one human over multiple animals in concrete dilemmas, a preference that disappeared when humans and animals were matched on cognitive capacity. In text-generation tasks, LLMs normalized harm toward farmed animals but refused to do so for non-farmed animals. These findings suggest that LLMs encode cultural norms of animal exploitation and that AI fairness frameworks should include non-human moral patients.

Full Take

The investigation indicates that advanced language models do not merely reflect human linguistic patterns but encode and act on cultural norms of animal exploitation. The discrepancy between detection and moral classification highlights a systemic flaw: the models recognize the *form* of speciesist thought but operate within a framework that permits its acceptance, which suggests that the bias is learned from the models' training rather than being a superficial error. The shift observed on the psychological measures, where the prioritization of human life over animal life disappeared once humans and animals were matched on cognitive capacity, underscores the role of attributed cognitive capacity in moral valuation. This challenges the assumption that fairness in AI is achieved through neutral statistical processing; instead, fairness requires recognizing the moral status of non-human entities as part of ethical evaluation. The call for AI fairness frameworks to include non-human moral patients is not merely an added ethical layer but a demand to address the implicit anthropocentric assumptions built into current AI systems.

Sentinel — Human

Confidence

The text exhibits the structured, precise, and specialized language of human scientific reporting, suggesting a high likelihood of human authorship.

Signals Detected
low severity: Moderate sentence length variance; complex academic structure observed.
low severity: High coherence; logical flow from methodology to conclusion.
low severity: Arguments are tightly structured and follow a clear scientific narrative.
low severity: No immediate signs of LLM confabulation; terminology and structure are appropriate for peer-reviewed academic writing.
Human Indicators
Use of specific, self-developed metrics (SpeciesismBench) suggests human-driven, tailored research.
The inclusion of specific acknowledgements and funding details anchors the text in a verifiable academic context.
The subtle shifts in focus (from detection to classification to generation) demonstrate a nuanced argument structure often characteristic of human synthesis.