ChatGPT is OBSESSED with Goblins (Here’s Why)
Better Stack
Transcript
00:00:00ChatGPT has an obsession with goblins. They creep in everywhere even if there's no mention
00:00:04of them in the thread, and if this was just a one-off that would be fine, but it's become
00:00:07such a pattern that the system prompt for Codex tells it not to bring up goblins and other
00:00:11creatures like gremlins and raccoons unless they are relevant to the conversation. It actually
00:00:16became such a thing that OpenAI had to investigate this and find out why it's happening.
00:00:21This is a Reddit thread from over a year ago, and it might be our first report of this behaviour,
00:00:29from before GPT 5.1 was even released. In this thread people agree that it brings
00:00:34up goblins often, calling the OP a fitness goblin or saying they're having a chaos goblin day. Others
00:00:39back this up, and some think it's cute. Regardless of that though, time passed, and it wasn't until
00:00:44November 2025, when OpenAI released GPT 5.1, that they started to notice too. They had received
00:00:50complaints that their model was being oddly over-familiar in their conversations so they
00:00:54decided they would investigate specific verbal tics. These are things like "you're absolutely
00:00:58right" that we've seen way too much. This is when a safety researcher at OpenAI said
00:01:03that he actually saw goblins and gremlins a few times himself so he asked that they add
00:01:07those to their investigation. When the investigation was done it showed that the usage of goblin
00:01:11in ChatGPT had risen by 175% after GPT 5.1 and gremlin had risen by 52%. Despite this
00:01:18rise in the data though, OpenAI didn't really do anything as it's pretty harmless right?
00:01:23All models tend to have their own quirks and personalities just by the nature of their training
00:01:27so there didn't seem to be any reason for alarm. But it was a few months later, when
00:01:31GPT 5.4 launched, that the goblins came back in full force and started getting stronger.
00:01:36You can get stronger and keep the goblins away by subscribing. This was a post on Hacker News
00:01:40around the launch of GPT 5.4 and you can see the poster is claiming that ChatGPT uses goblin
00:01:45in almost every conversation, sometimes it's gremlin as well and a recent chat of his used
00:01:49it in 3 out of 4 messages. These reports caused OpenAI to reinvestigate and when they did they
00:01:54noticed there was an increase in goblin usage in each model release and a massive 3881.4%
00:02:01increase in goblin usage when using the nerdy personality in ChatGPT. In fact nerdy accounted
00:02:06for only 2.5% of ChatGPT responses but 66.7% of all goblin mentions in responses. The nerd
00:02:15just loves goblins. This chart did give them a hint though as you can see it's not an even
00:02:19spread across all the personality types and the issue is massively amplified in the nerdy
00:02:23personality, so they had a suspicion that it might be something in their personality
00:02:27instruction-following training that was causing this problem. So they decided to take a look at that reinforcement
00:02:32learning training and compare the outputs that mentioned goblins or gremlins against outputs for the exact
00:02:36same tasks that didn't. And this is where they found that a specific reward signal designed
00:02:41to make the AI sound nerdy was essentially rigged towards goblins and gremlins, meaning
00:02:46that, across the datasets that they audited, if the AI used the word goblin or gremlin in
00:02:50its answer, the system gave it a higher score 76.2% of the time, so the AI was using goblins
00:02:57and gremlins as a sort of cheat code for a better grade.
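To make that audit a bit more concrete, here is a minimal Python sketch of a paired comparison. It is only an illustration: the `(task_id, output_text, reward_score)` layout, the `creature_win_rate` helper, and the toy scores are all assumptions for this example, not OpenAI's actual tooling or data. The idea is to group outputs by task, pair each output that mentions goblin or gremlin with each output for the same task that doesn't, and count how often the creature version got the higher score.

```python
import re
from collections import defaultdict
from itertools import product

# Hypothetical pattern: matches "goblin(s)" or "gremlin(s)", case-insensitive.
CREATURE_PATTERN = re.compile(r"\b(goblin|gremlin)s?\b", re.IGNORECASE)


def mentions_creature(text):
    """Return True if the text contains goblin or gremlin (singular or plural)."""
    return bool(CREATURE_PATTERN.search(text))


def creature_win_rate(samples):
    """Fraction of same-task pairs where the creature output scored higher.

    samples: iterable of (task_id, output_text, reward_score) tuples.
    Ties are skipped. Returns None when there are no comparable pairs.
    """
    with_creature = defaultdict(list)
    without_creature = defaultdict(list)
    for task_id, text, score in samples:
        bucket = with_creature if mentions_creature(text) else without_creature
        bucket[task_id].append(score)

    wins = 0
    comparisons = 0
    for task_id, creature_scores in with_creature.items():
        plain_scores = without_creature.get(task_id, [])
        for creature_score, plain_score in product(creature_scores, plain_scores):
            if creature_score == plain_score:
                continue  # a tie tells us nothing about preference
            comparisons += 1
            if creature_score > plain_score:
                wins += 1
    return wins / comparisons if comparisons else None


# Toy data, invented purely for illustration.
samples = [
    ("t1", "Your cache is a little goblin hoarding stale keys.", 0.9),
    ("t1", "Your cache is holding stale keys.", 0.6),
    ("t2", "A gremlin in the build script flips the flag.", 0.7),
    ("t2", "The build script flips the flag.", 0.8),
    ("t3", "Goblins aside, the loop is off by one.", 0.85),
    ("t3", "The loop is off by one.", 0.5),
    ("t3", "The loop bound should be n - 1.", 0.4),
]

print(f"{creature_win_rate(samples):.1%}")  # prints 75.0%
```

Comparing within the same task is what matters here: it keeps the difficulty of the prompt from muddying the result, so a win rate well above 50% points at the word itself being rewarded.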
00:03:00So now we have half the answer. This explains why it appeared more in the nerdy personality
00:03:04but it doesn't explain the increase across the other personality types. For that they
00:03:08first looked at the prevalence of goblins and gremlins as training progressed for both the
00:03:12nerdy personality and the rest, and while the rest of the personalities used goblins less,
00:03:17the rate of usage increased by the same relative proportion as training progressed. This means
00:03:21that even though the AI was only given these bonus points for using goblin words when it
00:03:25was specifically in its nerdy mode the habit didn't stay locked into just that mode. In
00:03:30AI training just because you teach a model a trick in one specific scenario doesn't mean
00:03:34it won't start trying to use that trick everywhere else. The reinforcement learning was creating
00:03:39a feedback loop. The AI would get a reward for having a specific style, and it figured
00:03:43out that goblin is the magic word to get that reward, so it started churning out thousands
00:03:47of practice responses packed with goblins. OpenAI would then take those practice responses
00:03:52to train the next model, so the bad habit starts compounding and goblin and gremlin usage
00:03:57keeps rising. You can see in nearly every model release the usage was rising and the nerdy
00:04:02personality of GPT 5.4 caused a massive spike until they retired that personality but even
00:04:07then GPT 5.5 still had an increase in usage. Even better, when they checked the fine-tuning
00:04:12data of GPT 5.5 they found many data points containing not only goblin and gremlin but
00:04:16also raccoons, trolls, ogres and pigeons, though they do note that the uses of frog were mostly
00:04:21legitimate. The unfortunate news though is that they are working to fix this so the end
00:04:25of the goblin era might be coming soon. Since retiring that nerdy personality, they have also
00:04:30removed the reward signal that preferred goblins and filtered their training data to remove
00:04:34creature words, but this was only done after GPT 5.5 was released, so 5.5 still likes them,
00:04:40and this is why there's a sentence in the Codex system prompt telling it never to talk about goblins,
00:04:44gremlins, raccoons, trolls, ogres, pigeons or other animals or creatures unless it is
00:04:49relevant to the prompt. But if you do want to unleash goblin mode you can actually run
00:04:52this command to remove that from the Codex system prompt, and I kinda like that they do
00:04:56fun stuff like this. So there we go, that was ChatGPT's goblin problem, and while this is
00:05:01a fun story it's also a great example of how reward signals shape model behaviour in unexpected
00:05:06ways and how models can learn to generalise rewarded behaviour from certain situations to unrelated
00:05:11ones. It also shows us that AI researchers still have a lot to learn and models still
00:05:15do odd things from time to time and this investigation actually resulted in new tools for the research
00:05:20team to audit model behaviour and fix behaviour problems like this. So let me know in the comments
00:05:25if you've seen any goblins or creatures in your chats and while you're down there subscribe
00:05:29and as always see you in the next one.