On the aggregate the first claim is true, but for Gemma 3 27B specifically it doesn’t hold. I suppose you are right; the abstract is a bit misleading. I’ll fix it. Thank you!
ceselder
Eons of Utopia
Neural chameleons can(’t) hide from activation oracles
Thank you so much for this reply. Makes perfect sense.
Turns out the LW obsession with game theory matters in the real world after all :)
Ah! Fair enough, actually. No idea how I missed that. But to be fair, I don’t know how much others would care about this when suspecting him, so it may be moot anyway.
But I think if you graph the risk and reward of insider trading at X amount vs. at Y amount, trading 10 times as much is not 10 times more suspicious, so he would be acting irrationally. But yeah, it’s a fair argument that maybe he is acting irrationally precisely to avoid such suspicion.
The Maduro Polymarket bet is not “obviously insider trading”
It’s an artifact of crossposting a Google Doc to LessWrong. It is fixed now.
Oh wow, thank you! I will edit tomorrow to reflect this and add an addendum to my application. That’s crazy!
Cool paper! :) Are these results surprising at all to you?
Dreaming Vectors: Gradient-descented steering vectors from Activation Oracles and using them to Red-Team AOs
It’s a bit of a deepity, but also a game-theoretical conclusion, that “if DeepMind releases a paper it is either something groundbreaking or something they will never use in production”. The TITANS paper is about a year old now, and the MIRAS paper about 9 months old. You would think that some other frontier lab would have implemented it by now if it worked that well. I suspect a piece is missing here, or maybe the time between a pre-training run and deployment is just way longer than I think it is, and all the frontier labs are looking at this.
To my understanding, TITANS requires you to do a backward pass during inference. This is probably a scaling disaster at inference time as well, but maybe less so, since they do say that it can be done efficiently and in parallel. It’s unclear to me!
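To make concrete what “a backward pass during inference” means here, below is a minimal toy sketch of the general idea as I understand it, not the actual TITANS code: a memory matrix is updated by a gradient step on a surprise loss for each new key/value pair seen at test time. All names and numbers are mine.

```python
# Toy sketch (NOT the real TITANS implementation): a linear memory M is
# updated at inference time by gradient descent on the surprise loss
# ||M k - v||^2 for each incoming (key, value) pair.

def matvec(M, k):
    """Multiply matrix M (list of rows) by vector k."""
    return [sum(m_ij * k_j for m_ij, k_j in zip(row, k)) for row in M]

def memory_update(M, k, v, lr=0.1):
    """One inference-time gradient step on loss = ||M k - v||^2."""
    err = [p - t for p, t in zip(matvec(M, k), v)]  # "surprise" signal
    # Gradient of the loss w.r.t. M is 2 * outer(err, k)
    return [[m_ij - lr * 2 * e_i * k_j for m_ij, k_j in zip(row, k)]
            for row, e_i in zip(M, err)]

# Repeatedly showing the memory one association makes it memorize it
M = [[0.0, 0.0], [0.0, 0.0]]
k, v = [1.0, 0.0], [0.5, -0.5]
for _ in range(50):
    M = memory_update(M, k, v)
print(matvec(M, k))  # should be close to v
```

This is why naive implementations are expensive: every token potentially triggers an optimizer step, though the paper claims this can be parallelized.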
I mean, you may just be right: TITANS+MIRAS could be in the latter category. Gemma 3 (which we know does not use TITANS), for example, probably benefits from a lot of RL environments, yet it absolutely sucks at this task. So it is possible that they are using it in production.
I guess, like all things, we will know for sure once the open Chinese labs start doing it.
Google seemingly solved efficient attention
This is very hard to answer. I just tried to write down basically everything. The noise kind of stopped after a while. It was a very strange sensation.
Five very good reasons to not write down literally every single thought you have
It’s fiction; I’m vaguely talking about myself as “you” here, but I’m basically getting at some instinct. Thanks for linking that; I hadn’t seen it, and it’s kind of exactly what I was getting at.
Pepperoni and the end of morality
Possibly yes, but I don’t think that’s a legitimate safety concern, since this can already be done very easily with other techniques. And for this technique you would need to model-diff with a non-refusal prompt of the bad concept in the first place, so the safety argument is moot. But it sounds like an interesting research question.
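For clarity, the model-diffing step I mean can be sketched roughly like this, with toy numbers standing in for real activations and a helper name of my own invention; this is the generic contrast-pair idea, not any particular codebase:

```python
# Toy sketch (hypothetical helper, toy numbers, no real model) of
# model diffing: take the difference between a model's activations on
# a prompt expressing the concept and on a neutral baseline prompt,
# and use that difference as a steering vector.

def steering_vector(act_concept, act_baseline):
    """Contrast-pair steering vector: elementwise activation difference."""
    return [a - b for a, b in zip(act_concept, act_baseline)]

# Stand-ins for mean residual-stream activations at one layer
act_concept  = [0.9, -0.2, 0.4]
act_baseline = [0.1,  0.3, 0.4]
v = steering_vector(act_concept, act_baseline)
print(v)
```

The point being: you already need a non-refused sample of the bad concept to compute the diff at all, so the technique doesn’t unlock anything new.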
This makes sense, honestly. I guess you would still run the risk of a non-vegan seeing you do these things and going “ha! hypocrite!”, but I don’t know how real that risk is.
Why you shouldn’t eat meat if you hate factory farming
Maybe a term like Extinction-(Risk)-Level Super-Intelligence, or ELSI for short, would be more productive than ASI or AGI
That was my hunch too, and it’s why I switched to Gemma 3 27B. It would be interesting to run this experiment on Llama 3.3 70B; that could be done with a simple code change.