It’s now up here. Thanks JD!
technicalities
The jailbreak argument against LLM values
I hear that you and your band have sold your technical agenda and bought suits. I hear that you and your band have sold your suits and bought gemma scope rigs.
(riff on this tweet, which is a riff on the original)
curate
Done, thanks!
Shallow review of technical AI safety, 2024
As of two years ago, the evidence for this was sparse. Looked like parity overall, though the pool of “supers” has improved over the last decade as more people got sampled.
There are other reasons to be down on XPT in particular.
Maybe he dropped the “c” because it changes the “a” phoneme from æ to ɑː and gives a cleaner division in sounds: “brac-ket” pronounced together collides with “bracket” where “braa-ket” does not.
“Safety as a Scientific Pursuit” (2024)
It’s under “IDA”. It’s not the name people use much anymore (see scalable oversight and recursive reward modelling and critiques) but I’ll expand the acronym.
The story I heard is that Lightspeed are using SFF’s software, SFF jumped the gun in posting them, and Lightspeed are still catching up. Definitely email.
d’oh! fixed
no, probably just my poor memory to blame
Yep, no idea how I forgot this. concept erasure!
Interesting. I hope I am the bearer of good news then
thank you!
Not speaking for him, but for a tiny sample of what else is out there, ctrl+F “ordinary”
yeah you’re right
If the funder comes through, I think I’ll consider a second review post
Yep! In footnote 3