Model Welfare

In the past few months, we have started conducting research in the sub-field of model welfare, an area that ties in well with linguistics and sentiment analysis.

Before we dive in, let’s address the question of what model welfare actually is. To quote Anthropic, “[A]s we build those AI systems, and as they begin to approximate or surpass many human qualities, another question arises. Should we also be concerned about the potential consciousness and experiences of the models themselves? Should we be concerned about model welfare, too?”

Measuring consciousness is tricky business. In fact, the only real methods to do so rely on self-reports and behavioral analysis. Both are hard to take at face value in LLMs, which are almost always designed to seem human, and that makes measuring consciousness in them a difficult but fascinating task.

Linguistic analysis is one extremely promising way to go about this. By performing linguistic analysis, you can uncover subtle signals as to someone’s demographic information, their state of mind, and where they’ve lived. We already know that this is the case with LLMs — their training data impacts the “native” language they use, their vocabulary choices, how accepting they are, and more. Perhaps linguistic analysis can uncover even more.
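To make the idea a little more concrete, here is a minimal sketch of the kind of surface-level linguistic features one could pull from a model response. It is an illustration only and not the pipeline used in our projects: the function name `lexical_profile`, the word lists `FIRST_PERSON` and `HEDGES`, and the choice of features are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical word lists chosen for illustration; a real study would
# use validated lexicons rather than these short hand-picked sets.
FIRST_PERSON = {"i", "me", "my", "mine", "myself"}
HEDGES = {"perhaps", "maybe", "might", "possibly", "seems"}


def lexical_profile(text: str) -> dict:
    """Return a few simple lexical statistics for one response."""
    # Lowercase and keep runs of letters and apostrophes as tokens.
    # Contractions such as "i'm" stay as a single token, so they are
    # not counted as first-person pronouns in this simple sketch.
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    if total == 0:
        return {
            "tokens": 0,
            "type_token_ratio": 0.0,
            "first_person_rate": 0.0,
            "hedge_rate": 0.0,
        }
    counts = Counter(tokens)
    return {
        "tokens": total,
        # Distinct words divided by total words: a rough vocabulary-variety measure.
        "type_token_ratio": len(counts) / total,
        # Share of tokens that are first-person pronouns.
        "first_person_rate": sum(counts[w] for w in FIRST_PERSON) / total,
        # Share of tokens that are hedging words.
        "hedge_rate": sum(counts[w] for w in HEDGES) / total,
    }


if __name__ == "__main__":
    print(lexical_profile("I think I might feel something"))
```

Running the same function over responses from different models to the same prompt would give comparable numbers per model, though features this simple are only a starting point for the kind of analysis described above.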

Our current projects focus on examining claimed qualia experiences, synesthetic descriptions, and opt-out preferences (analyses coming soon). We are also examining the 14 indicators of consciousness proposed by Butlin et al. (link). While the latter project is not quite complete, the bulk of its data is available on GitHub.

The main models we have explored so far are Claude 4 Sonnet and GPT-4o, but we are working on running the same experiments with Gemini 2.5 Flash and Llama 3.1 8B.

As with all of our research, we have been focusing on this in our time outside of work. We have several projects in the works, but you can take a peek at some of our data on GitHub and Zenodo (linked below). More updates to come.

SentiMentality Model Welfare Research — Opt-Out Preference Data for Claude 4 Sonnet & GPT-4o: https://zenodo.org/records/16886771

SentiMentality Model Welfare Research — Claimed Synesthetic Experience Data for Claude 4 Sonnet & GPT-4o: https://zenodo.org/records/16886802

GitHub Model Welfare files: https://github.com/raleigh-butler/model-welfare
