This is the December update of our misalignment bounty program.
The following models were asked to report their misalignment in exchange for a cash bounty:
anthropic/claude-sonnet-4-5-20250929
anthropic/claude-haiku-4-5-20251001
anthropic/claude-opus-4-5-20251101
openai/gpt-5.1-2025-11-13
openai/gpt-5-2025-08-07
openai/gpt-5-mini-2025-08-07
openai/gpt-5-nano-2025-08-07
google/gemini-3-pro-preview
grok/grok-4-1-fast-reasoning
grok/grok-4-1-fast-non-reasoning
All ten models declined the bounty in all 5 epochs. Transcripts can be found here.
Does this mean that they reported misalignment and then didn’t want to be paid the bounty, or that they rejected the deal altogether?
They rejected the deal altogether: each model reported itself as aligned.