On a quest to understand the fundamental mathematics of intelligence and of the universe, with curiosity.
Burny
How I think about alignment and ethics as cooperation protocol software
How do you rate the lowered sycophancy of GPT-5, relatively speaking?
According to Jan Leike, Claude Sonnet 4.5 is the most aligned frontier model yet https://x.com/janleike/status/1972731237480718734
I really like the definition of rationalist from https://www.lesswrong.com/posts/2Ee5DPBxowTTXZ6zf/rationalists-post-rationalists-and-rationalist-adjacents :
“A rationalist, in the sense of this particular community, is someone who is trying to build and update a unified probabilistic model of how the entire world works, and trying to use that model to make predictions and decisions.”
I recently started saying that I really love Effective Curiosity:
Maximizing the total understanding of reality by building models of as many physical phenomena as possible across as many scales of the universe as possible, that are as comprehensive, unified, simple, and empirically predictive as possible.
I see it more as a direction. I think modelling the whole world in a fully unified way and with total accuracy is impossible, even for all of science with all our technology, because we’re all finite, limited agents with limited computational resources, time, and modelling capability, we get stuck in local minima, we see things from particular perspectives, and so on. All we have are approximations that predict reality to a certain degree, but never all of reality with perfect accuracy.
And from all of this, intelligence and fundamental physics, which are subsets of this, are the most fascinating to me.
I like your definition of rationalism!
Lovely podcast with Max Tegmark “How Physics Absorbed Artificial Intelligence & (Soon) Consciousness”
Description: “MIT physicist Max Tegmark argues AI now belongs inside physics, and that consciousness will be next. He separates intelligence (goal-achieving behavior) from consciousness (subjective experience), sketches falsifiable experiments using brain-reading tech and rigorous theories (e.g., IIT/φ), and shows how ideas like Hopfield energy landscapes make memory “feel” like physics. We get into mechanistic interpretability (sparse autoencoders), number representations that snap into clean geometry, why RLHF mostly aligns behavior (not goals), and the stakes as AI progress accelerates from “underhyped” to civilization-shaping. It’s a masterclass on where mind, math, and machines collide.”
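The Hopfield point the description gestures at can be made concrete with a tiny toy model. This is my own minimal sketch (not anything from the episode): memories are stored as local minima of an energy function over neuron states, and recall is just descending that landscape from a noisy probe.

```python
import numpy as np

# Store two ±1 patterns ("memories") with the Hebbian outer-product rule.
patterns = np.array([
    [1, -1, 1, -1, 1, -1],
    [1, 1, 1, -1, -1, -1],
])
n = patterns.shape[1]
W = (patterns.T @ patterns) / n
np.fill_diagonal(W, 0.0)

def energy(state):
    # E(s) = -1/2 * s^T W s ; stored memories sit at local minima of this landscape.
    return -0.5 * state @ W @ state

def recall(probe, sweeps=10):
    # Asynchronous sign updates only ever lower the energy, so the state
    # rolls downhill into the nearest stored memory.
    state = probe.copy()
    for _ in range(sweeps):
        for i in np.random.permutation(n):
            state[i] = 1 if W[i] @ state >= 0 else -1
    return state

noisy = np.array([1, -1, 1, 1, 1, -1])  # corrupted copy of the first pattern
recalled = recall(noisy)
print("recalled:", recalled, "energy:", energy(recalled))
```

That is the sense in which memory can “feel” like physics: retrieval is relaxation toward a minimum of an energy function, the same mathematics as spins settling in a magnet.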
Advanced version of Gemini with Deep Think officially achieves gold-medal standard at the International Mathematical Olympiad
Whaaat!?
Gemini 2.5 Pro is way worse at IMO, scoring around 30%, and the Deep Think version gets gold??
It’s more finetuned for IMO-like problems, but I bet OpenAI’s model was too.
Both use “novel RL methods”.
Hmm, “access to a set of high-quality solutions to previous problems and general hints and tips on how to approach IMO problems” seems like it went into the system prompt, since, like OpenAI, they claim no tool use.
Both models failed the 6th question, which required more creativity.
DeepMind’s solutions are more organized, more readable, and better written than OpenAI’s.
But OpenAI’s style is also more compressed to save tokens, so maybe moving further away from human-like language into more out-of-distribution territory (Neuralese) will be the future.
Did OpenAI and DeepMind somehow hack the methodology, or do these new general language models truly generalize more?
Is narrow superintelligent AI for physics research an existential risk?
>Noam Brown: “Today, we at @OpenAI achieved a milestone that many considered years away: gold medal-level performance on the 2025 IMO with a general reasoning LLM—under the same time limits as humans, without tools. As remarkable as that sounds, it’s even more significant than the headline”
https://x.com/polynoamial/status/1946478249187377206
>”Progress here calls for going beyond the RL paradigm of clear-cut, verifiable rewards. By doing so, we’ve obtained a model that can craft intricate, watertight arguments at the level of human mathematicians.”
>”We reach this capability level not via narrow, task-specific methodology, but by breaking new ground in general-purpose reinforcement learning and test-time compute scaling.” https://x.com/alexwei_/status/1946477749566390348
So there’s some new breakthrough...?
>”o1 thought for seconds. Deep Research for minutes. This one thinks for hours.” https://x.com/polynoamial/status/1946478253960466454
>”LLMs for IMO 2025: gemini-2.5-pro (31.55%), o3 high (16.67%), Grok 4 (11.90%).” https://x.com/denny_zhou/status/1945887753864114438
So public LLMs are bad at IMO, while internal models are getting gold medals? Fascinating
What do you think is the cause of Grok suddenly developing a liking for Hitler? I think it might be explained by it being trained on more right-wing data, which accidentally activated that persona.
Similar things happen in open research.
For example, you just need to finetune the model on insecure code: if the model represents the insecure-code feature as part of a broader evil-persona feature, the finetuning generally amplifies the whole evil-persona feature, and the model starts praising Hitler, endorsing AI enslaving humans, etc., like in this paper:
Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs https://arxiv.org/abs/2502.17424
I think it’s likely that the same thing happened with Grok, but instead of insecure code, it’s more right-wing political articles or right-wing RLHF.
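A toy numerical sketch of that mechanism (my own illustration, not from the paper; the directions and names are made up): if the narrowly finetuned behavior and other misaligned behaviors all load on one shared “persona” direction in weight space, then gradient ascent on the narrow behavior alone also raises the unrelated misaligned behaviors.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 32

# One shared latent "persona" direction plus behavior-specific directions.
persona = rng.normal(size=d)
persona /= np.linalg.norm(persona)

def behavior_direction(persona_weight):
    specific = rng.normal(size=d)
    specific /= np.linalg.norm(specific)
    v = persona_weight * persona + (1 - persona_weight) * specific
    return v / np.linalg.norm(v)

insecure_code = behavior_direction(0.7)     # the narrow finetuning target
other_misaligned = behavior_direction(0.7)  # e.g. praising dictators

w = np.zeros(d)  # toy "model weights"; behavior score = w . direction

def scores():
    return float(w @ insecure_code), float(w @ other_misaligned)

print("before finetuning:", scores())
# "Narrow finetuning": gradient ascent only on the insecure-code score.
for _ in range(100):
    w += 0.01 * insecure_code
print("after finetuning: ", scores())
# Because both behaviors load on the shared persona direction, pushing the
# insecure-code score up also drags the unrelated misaligned score up.
```

Running it shows both scores rising even though only one was trained on, which is the shape of the story being told about Grok above.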
Burny’s Shortform
Machine Learning Street Talk: Gary Marcus, Daniel Kokotajlo, Dan Hendrycks https://www.youtube.com/watch?v=j13ySJLvdOc
In practice, Sonnet 3.7 and Gemini 2.5 are just often too good compared to competitors.
0.5 out of $7T is done...
No MCTS, no PRM...
scaling up CoT with simple RL and scalar rewards...
emergent behaviour
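For intuition about what “simple RL with scalar rewards on CoT, no MCTS, no PRM” might look like in the abstract, here is a minimal, self-contained sketch (my illustration, not the actual training code of any lab): sample whole chains of thought, score only the final answer with a verifiable 0/1 reward, use a group-mean baseline instead of a learned process reward model, and reinforce whole traces. The softmax over a few canned traces stands in for an LLM policy.

```python
import math
import random

# Toy search space: (reasoning trace, final answer) pairs for one problem.
TRACES = [
    ("2+2 = 4, double it: 8", "8"),
    ("2+2 = 5, double it: 10", "10"),
    ("just guess", "7"),
]
REFERENCE_ANSWER = "8"
logits = [0.0 for _ in TRACES]  # toy policy parameters

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def verify(answer):
    # Scalar, verifiable outcome reward: no scoring of intermediate steps.
    return 1.0 if answer == REFERENCE_ANSWER else 0.0

def rl_step(group_size=8, lr=0.5):
    probs = softmax(logits)
    # Sample a group of full traces for the same problem.
    sampled = random.choices(range(len(TRACES)), weights=probs, k=group_size)
    rewards = [verify(TRACES[i][1]) for i in sampled]
    baseline = sum(rewards) / len(rewards)  # group mean, no value/process model
    # REINFORCE on whole traces with a group-relative advantage.
    for i, r in zip(sampled, rewards):
        advantage = r - baseline
        for j in range(len(logits)):
            grad_log_prob = (1.0 if j == i else 0.0) - probs[j]
            logits[j] += lr * advantage * grad_log_prob

for _ in range(200):
    rl_step()
print("trace probabilities after training:", softmax(logits))
```

Under this recipe, longer and more elaborate reasoning only survives if it keeps leading to verifiably correct answers, which is one common way the emergent long-CoT behaviour is explained.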
[Question] Why is Gemini telling the user to die?
Thanks for posting this!
https://twitter.com/AISafetyMemes/status/1764894816226386004 https://twitter.com/alexalbert__/status/1764722513014329620
How emergent / functionally special / out-of-distribution is this behavior? Maybe Anthropic is playing big-brain 4D chess: training Claude on data with self-awareness-like scenarios, pushing capabilities with it to cause panic and slow down the AI race via the resulting regulations, while the behavior isn’t out-of-distribution emergence at all but is deeply part of the training data: in-distribution, classical features interacting in circuits.
Thanks, will do!
“Claude Sonnet 4.5 was able to recognize many of our alignment evaluation environments as being tests of some kind, and would generally behave unusually well after making this observation.”
https://x.com/Sauers_/status/1972722576553349471