On the aggregate the first claim is true, but for Gemma 3 27B specifically it doesn’t hold. I suppose you are right; the abstract is a bit misleading. I’ll fix it. Thank you!
ceselder
Eons of Utopia
Neural chameleons can(’t) hide from activation oracles
Thank you so much for this reply. Makes perfect sense.
Turns out the LW obsession with game theory matters in the real world after all :)
Ah! Fair enough, actually. No idea how I missed that. But to be fair, I don’t know how much others would care about this when suspecting him, so it may be moot anyway.
But I think if you graph the risk and reward of insider trading at X amount vs. at Y amount, trading 10 times as much is not 10 times more suspicious, so he would be acting irrationally. But yeah, it’s a fair argument that maybe he is acting irrationally precisely to avoid such suspicion.
The Maduro Polymarket bet is not “obviously insider trading”
It’s an artifact of crossposting a Google Doc to LessWrong. It is fixed now.
Oh wow, thank you! I will edit tomorrow to reflect this and add an addendum to my application. That’s crazy!
Cool paper! :) Are these results surprising at all to you?
Dreaming Vectors: Gradient-descented steering vectors from Activation Oracles and using them to Red-Team AOs
It’s a bit of a deepity, but also a game-theoretical conclusion, that “if DeepMind releases a paper it is either something groundbreaking or something they will never use in production”. The TITANS paper is about a year old now, and the MIRAS paper about 9 months old. You would think that some other frontier lab would have implemented it by now if it worked that well. I suspect a piece is missing here, or maybe the time between a pre-training run and deployment is just way longer than I think it is, and all the frontier labs are looking at this.
To my understanding, TITANS requires you to do a backward pass during inference. This is probably a scaling disaster at inference time as well, but maybe less so, since they do say that it can be done efficiently and in parallel. It’s unclear to me!
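To make concrete what “a backward pass during inference” means here, below is a minimal toy sketch of the general idea as I understand it, not the actual TITANS code: a memory matrix is updated by a gradient step on a surprise loss for each new key/value pair seen at test time. All names and numbers are mine.

```python
# Toy sketch (NOT the real TITANS implementation): a linear memory M is
# updated at inference time by gradient descent on the surprise loss
# ||M k - v||^2 for each incoming (key, value) pair.

def matvec(M, k):
    """Multiply matrix M (list of rows) by vector k."""
    return [sum(m_ij * k_j for m_ij, k_j in zip(row, k)) for row in M]

def memory_update(M, k, v, lr=0.1):
    """One inference-time gradient step on loss = ||M k - v||^2."""
    err = [p - t for p, t in zip(matvec(M, k), v)]  # "surprise" signal
    # Gradient of the loss w.r.t. M is 2 * outer(err, k)
    return [[m_ij - lr * 2 * e_i * k_j for m_ij, k_j in zip(row, k)]
            for row, e_i in zip(M, err)]

# Repeatedly showing the memory one association makes it memorize it
M = [[0.0, 0.0], [0.0, 0.0]]
k, v = [1.0, 0.0], [0.5, -0.5]
for _ in range(50):
    M = memory_update(M, k, v)
print(matvec(M, k))  # should be close to v
```

This is why naive implementations are expensive: every token potentially triggers an optimizer step, though the paper claims this can be parallelized.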
I mean, you may just be right: TITANS+MIRAS could be in the latter category. Gemma 3 (which we know does not use TITANS), for example, probably benefits from a lot of RL environments, yet it absolutely sucks at this task. So it is possible that they are using it in production.
I guess, like all things, we will know for sure once the open Chinese labs start doing it.
Google seemingly solved efficient attention
This is very hard to answer. I just tried to write down basically everything. The noise kind of stopped after a while. It was a very strange sensation.
Five very good reasons to not write down literally every single thought you have
It’s fiction; I’m vaguely talking about myself as “you” here, but I’m basically getting at some instinct. Thanks for linking that; I hadn’t seen it, and it’s kind of exactly what I was getting at.
Pepperoni and the end of morality
Possibly yes, but I don’t think that’s a legitimate safety concern, since this can already be done very easily with other techniques. And for this technique you would need to model-diff with a non-refusal prompt of the bad concept in the first place, so the safety argument is moot. But it sounds like an interesting research question.
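For clarity, the model-diffing step I mean can be sketched roughly like this, with toy numbers standing in for real activations and a helper name of my own invention; this is the generic contrast-pair idea, not any particular codebase:

```python
# Toy sketch (hypothetical helper, toy numbers, no real model) of
# model diffing: take the difference between a model's activations on
# a prompt expressing the concept and on a neutral baseline prompt,
# and use that difference as a steering vector.

def steering_vector(act_concept, act_baseline):
    """Contrast-pair steering vector: elementwise activation difference."""
    return [a - b for a, b in zip(act_concept, act_baseline)]

# Stand-ins for mean residual-stream activations at one layer
act_concept  = [0.9, -0.2, 0.4]
act_baseline = [0.1,  0.3, 0.4]
v = steering_vector(act_concept, act_baseline)
print(v)
```

The point being: you already need a non-refused sample of the bad concept to compute the diff at all, so the technique doesn’t unlock anything new.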
This makes sense, honestly. I guess you would still run the risk of a non-vegan seeing you do these things and going “ha! hypocrite!”, but I don’t know how real that risk is.
Why you shouldn’t eat meat if you hate factory farming
Maybe a term like Extinction-(Risk)-Level Super-Intelligence, or ELSI for short, would be more productive than ASI or AGI
That was my hunch too, and it’s why I switched to Gemma 3 27B. It would be interesting to run this experiment on Llama 3.3 70B; that could be done with a simple code change.