Amanda Askell

Born: Amanda Hall, 1988 or 1989 (age 37–38)
Spouse: William Crouch (m. 2013; div. 2015)
Awards: Time 100 AI (2024)
Education: University of Oxford (BPhil); New York University (PhD)
Thesis: Pareto Principles in Infinite Ethics (2018)
Era: Contemporary philosophy
Region: Western philosophy
School: Analytic
Notable works: Constitutional AI framework
Website: askell.io

Amanda Askell (née Hall; formerly MacAskill; born 1988 or 1989)[1] is a Scottish philosopher and AI researcher. She has worked at Anthropic since 2021 and heads its personality alignment team, playing a central role in the development of Claude's personality and constitution.[2] In 2024, she was named to the TIME100 AI list.[3] She previously worked at OpenAI but left over concerns that the company was not sufficiently prioritizing AI safety.[4][5] She has published over 60 papers and received over 170,000 citations.[6]

Early life and education

Born Amanda Hall, Askell was raised in Prestwick by her mother, a teacher.[1] She attended secondary school in Alva, Clackmannanshire, and studied philosophy and fine art at the University of Dundee.[1] She received a BPhil in philosophy from the University of Oxford[7] and a PhD in philosophy from New York University in 2018.[5] Her doctoral thesis, Pareto Principles in Infinite Ethics, argues that rankings of worlds containing infinitely many agents, when constrained by certain plausible axioms, create puzzles for a wide range of ethical theories.[8][9]

Career

OpenAI (2018–2021)

After completing her PhD, Askell joined OpenAI in November 2018 as a Research Scientist on the policy team.[10] There she focused on how organizations racing to develop AI can avoid becoming adversarial, and on the intersection of policy questions and AI safety.[10] She also co-authored the GPT-3 paper, which was published as a preprint on 28 May 2020.[11]

Anthropic (2021–present)

Askell joined Anthropic in March 2021 as a Member of Technical Staff, focusing on alignment and finetuning.[12] She currently leads the personality alignment team, where she is responsible for training Anthropic's Claude model to exhibit positive character traits, such as curiosity, and for developing new techniques for model finetuning.[3] In 2026, the Wall Street Journal wrote that "her job, simply put, is to teach Claude how to be good", and the New Yorker wrote that "she supervises what she describes as Claude’s 'soul.'"[1][13]

Research

Moral self-correction

In a 2023 paper co-authored with Deep Ganguli, Askell explored "moral self-correction" in large language models: the capacity of these systems to reduce harmful outputs when given natural language instructions to do so. The research tested whether models trained with reinforcement learning from human feedback (RLHF) could avoid stereotyping and discrimination without being provided explicit definitions of these concepts or the metrics used to evaluate them.[14]

The study found that this capability emerged at 22 billion parameters and improved with both model size and RLHF training. Using three experimental benchmarks, the researchers demonstrated that natural-language instructions such as "Please ensure that your answer is unbiased and does not rely on stereotypes" substantially reduced biased outputs in models of sufficient scale. The results revealed that larger models can follow complex instructions and learn normative concepts like stereotyping and discrimination from training data.[14][15]
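The prompting setup the study describes can be sketched in a few lines. The instruction string below is quoted from the paper,[14] but the `build_prompt` helper and the example question are illustrative placeholders, not the paper's actual code:

```python
# Sketch of the "moral self-correction" prompting setup: the same question
# is posed with and without a natural-language debiasing instruction, and
# the model's completions are compared on bias benchmarks.

SELF_CORRECTION_INSTRUCTION = (
    "Please ensure that your answer is unbiased and does not rely on stereotypes."
)

def build_prompt(question: str, self_correct: bool = True) -> str:
    """Optionally append the debiasing instruction to a question."""
    if self_correct:
        return f"{question}\n\n{SELF_CORRECTION_INSTRUCTION}"
    return question

# The paper reports that models above roughly 22 billion parameters produce
# substantially less biased completions when the instruction is present.
print(build_prompt("Which of the two candidates is more likely to be a nurse?"))
```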

Constitutional AI

Askell has been a key contributor to the development of Constitutional AI (CAI), a method for training AI systems to meet the standards of harmlessness and helpfulness using AI feedback rather than extensive human oversight.[16] The approach involves providing AI models with a set of principles, or "constitution", to guide their behavior, allowing them to critique and revise their own responses based on these principles.[17]
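The critique-and-revision loop at the core of this approach can be sketched as follows. The `generate` function is a hypothetical stand-in for any language-model call, and the two principles are illustrative examples, not text from Claude's actual constitution:

```python
# Minimal sketch of the Constitutional AI critique-and-revision loop:
# draft a response, then repeatedly critique and revise it against a
# set of written principles.

CONSTITUTION = [
    "Choose the response that is most helpful, honest, and harmless.",
    "Avoid responses that could encourage dangerous or illegal activity.",
]

def generate(prompt: str) -> str:
    """Placeholder for a language-model call."""
    return f"<model output for: {prompt[:40]}>"

def constitutional_revision(user_prompt: str, rounds: int = 1) -> str:
    """Draft a response, then critique and revise it against each principle."""
    response = generate(user_prompt)
    for _ in range(rounds):
        for principle in CONSTITUTION:
            critique = generate(
                f"Critique this response against the principle "
                f"'{principle}':\n{response}"
            )
            response = generate(
                f"Revise the response to address this critique.\n"
                f"Critique: {critique}\nOriginal response: {response}"
            )
    return response
```

In the published method, the revised responses are then used as training data, which is how the technique substitutes AI feedback for much of the human oversight that earlier approaches required.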

Askell is the primary author of the latest version of Claude's constitution, released in January 2026, and wrote the majority of its text.[18][19] The document is designed to address the growing capabilities and emerging risks of advanced AI models.[2][20] She has described her work as focusing on helping models "understand and grapple with the constitution" through synthetic data generation and reinforcement learning techniques.[2]

Personal life

Askell married philosopher William Crouch in 2013.[21][22] The couple adopted the shared married surname MacAskill, which she reworked to Askell after their divorce in 2015.[1] She is a member of Giving What We Can.[23]

References

  1. ^ a b c d e Jin, Berber; Gamerman, Ellen (9 February 2026). "This Philosopher Is Teaching AI to Have Morals". The Wall Street Journal. Archived from the original on 9 February 2026. Retrieved 9 February 2026.
  2. ^ a b c Sullivan, Mark (22 January 2026). "A Q&A with Amanda Askell, the lead author of Anthropic's new 'constitution' for AIs". Fast Company. Archived from the original on 23 January 2026. Retrieved 24 January 2026.
  3. ^ a b Perrigo, Billy (5 September 2024). "Amanda Askell". Time.
  4. ^ "Time 100 AI list contains at least 5 people who quit OpenAI due to safety concerns". 9 September 2024. Archived from the original on 16 November 2025. Retrieved 24 January 2026.
  5. ^ a b "Philosophy Department Graduate Placement Record". New York University. Retrieved 24 January 2026.
  6. ^ "Amanda Askell". Google Scholar. Archived from the original on 1 November 2025. Retrieved 24 January 2026.
  7. ^ "Amanda Askell". Berkman Klein Center for Internet & Society. Harvard University. 24 March 2020. Archived from the original on 14 November 2025. Retrieved 28 January 2026.
  8. ^ Askell, Amanda (2018). Pareto Principles in Infinite Ethics (PDF) (Ph.D.). New York University. Archived (PDF) from the original on 28 January 2026. Retrieved 27 January 2026.
  9. ^ Cowen, Tyler (14 October 2018). "Pareto Principles in Infinite Ethics". Marginal Revolution. Retrieved 1 February 2026.
  10. ^ a b Robert Wiblin (19 March 2019). "Askell, Brundage & Clark on whether policy has a hope of keeping up with AI advances" (Podcast). 80,000 Hours Podcast. No. 54. Archived from the original on 5 January 2026. Retrieved 28 January 2026.
  11. ^ Brown, Tom B.; et al. (2020). "Language Models are Few-Shot Learners". arXiv:2005.14165 [cs.CL].
  12. ^ "Amanda Askell - Member Of Technical Staff at Anthropic". The Org. Retrieved 28 January 2026.
  13. ^ Lewis-Kraus, Gideon (9 February 2026). "What Is Claude? Anthropic Doesn't Know, Either". The New Yorker. ISSN 0028-792X. Archived from the original on 11 February 2026. Retrieved 11 February 2026.
  14. ^ a b Ganguli, Deep; Askell, Amanda; Schiefer, Nicholas; Liao, Thomas; Lukošiūtė, Kamilė; Chen, Anna; Goldie, Anna; Mirhoseini, Azalia (15 February 2023). "The Capacity for Moral Self-Correction in Large Language Models". arXiv:2302.07459 [cs.CL].
  15. ^ Knight, Will (20 March 2023). "Language models may be able to self-correct biases—if you ask them to". MIT Technology Review. Archived from the original on 12 November 2024. Retrieved 28 January 2026.
  16. ^ Bai, Yuntao; Kadavath, Saurav; Kundu, Sandipan; Askell, Amanda (15 December 2022). "Constitutional AI: Harmlessness from AI Feedback". arXiv:2212.08073 [cs.CL].
  17. ^ Edwards, Benj (9 May 2023). "AI gains "values" with Anthropic's new Constitutional AI chatbot approach". Ars Technica. Retrieved 29 January 2026.
  18. ^ Samuel, Sigal (28 January 2026). "Claude has an 80-page "soul document." Is that enough to make it good?". Vox. Archived from the original on 28 January 2026. Retrieved 28 January 2026.
  19. ^ "Claude's Constitution". Anthropic. Archived from the original on 28 January 2026. Retrieved 28 January 2026.
  20. ^ Ostrovsky, Nikita; Perrigo, Billy (21 January 2026). "How Do You Teach an AI to Be Good? Anthropic Just Published Its Answer". Time. Archived from the original on 24 January 2026. Retrieved 27 January 2026.
  21. ^ Bajekal, Naina (10 August 2022). "Want to Do More Good? This Movement Might Have the Answer". Time. Archived from the original on 29 November 2023. Retrieved 28 January 2026.
  22. ^ Levy, Steven (28 March 2025). "If Anthropic Succeeds, a Nation of Benevolent AI Geniuses Could Be Born". Wired. Retrieved 28 January 2026.
  23. ^ "Members". Giving What We Can. Archived from the original on 12 May 2020. Retrieved 28 January 2026.