technicalities

Karma: 1,026

technicalities 11 Nov 2025 9:30 UTC
4 points
0
in reply to: jdp’s comment on: The jailbreak argument against LLM values
Yep! In footnote 3

The jailbreak argument against LLM values

technicalities 10 Nov 2025 12:05 UTC

24 points

2 comments · 6 min read · LW link

technicalities 10 Nov 2025 12:00 UTC
3 points
0
in reply to: jdp’s comment on: jdp’s Shortform
It’s now up here. Thanks JD!

technicalities 20 Jan 2025 11:52 UTC
1 point
0
on: Shallow review of technical AI safety, 2024
I hear that you and your band have sold your technical agenda and bought suits. I hear that you and your band have sold your suits and bought gemma scope rigs.
(riff on this tweet, which is a riff on the original)

curate

technicalities 14 Jan 2025 14:40 UTC

12 points

0 comments · 2 min read · LW link

technicalities 11 Jan 2025 15:02 UTC
2 points
0
in reply to: Alex_Altair’s comment on: Shallow review of technical AI safety, 2024
Done, thanks!

Shallow review of technical AI safety, 2024

technicalities, Stag, Stephen McAleese, jordine and Dr. David Mathers

29 Dec 2024 12:01 UTC

197 points

35 comments · 41 min read · LW link

technicalities 29 Mar 2024 10:34 UTC
1 point
0
in reply to: Alexander Gietelink Oldenziel’s comment on: Are extreme probabilities for P(doom) epistemically justifed?
As of two years ago, the evidence for this was sparse. Looked like parity overall, though the pool of “supers” has improved over the last decade as more people got sampled.
There are other reasons to be down on XPT in particular.

technicalities 19 Feb 2024 3:59 UTC
2 points
0
on: Least-problematic Resource for learning RL?
I like Hasselt and Meyn (extremely friendly, possibly too friendly for you)

technicalities 19 Feb 2024 3:55 UTC
1 point
0
in reply to: Dalcy’s comment on: Darcy’s Shortform
Maybe he dropped the “c” because it changes the “a” phoneme from æ to ɑː and gives a cleaner division in sounds: “brac-ket” pronounced together collides with “bracket” where “braa-ket” does not.

“Safety as a Scientific Pursuit” (2024)

technicalities 23 Jan 2024 12:40 UTC

17 points

3 comments · 2 min read · LW link

(banburismus.substack.com)

technicalities 2 Dec 2023 10:13 UTC
1 point
0
in reply to: Joey KL’s comment on: Shallow review of live agendas in alignment & safety
It’s under “IDA”. It’s not the name people use much anymore (see scalable oversight and recursive reward modelling and critiques) but I’ll expand the acronym.

technicalities 29 Nov 2023 19:38 UTC
3 points
2
in reply to: Victor Levoso’s comment on: Shallow review of live agendas in alignment & safety
The story I heard is that Lightspeed are using SFF’s software, SFF jumped the gun in posting them, and Lightspeed are still catching up. Definitely email.

technicalities 29 Nov 2023 19:37 UTC
1 point
0
in reply to: Zac Hatfield-Dodds’s comment on: Shallow review of live agendas in alignment & safety
d’oh! fixed
no, probably just my poor memory to blame

technicalities 29 Nov 2023 9:57 UTC
1 point
0
in reply to: Thomas Kwa’s comment on: Shallow review of live agendas in alignment & safety
Yep, no idea how I forgot this. concept erasure!

technicalities 29 Nov 2023 9:52 UTC
3 points
0
in reply to: Victor Levoso’s comment on: Shallow review of live agendas in alignment & safety
Interesting. I hope I am the bearer of good news then

technicalities 29 Nov 2023 9:51 UTC
1 point
0
in reply to: LawrenceC’s comment on: Shallow review of live agendas in alignment & safety
thank you!

technicalities 29 Nov 2023 9:44 UTC
1 point
0
in reply to: the gears to ascension’s comment on: Shallow review of live agendas in alignment & safety
Not speaking for him, but for a tiny sample of what else is out there, ctrl+F “ordinary”

technicalities 29 Nov 2023 9:31 UTC
1 point
0
in reply to: Alex_Altair’s comment on: Appendices to the live agendas
yeah you’re right

technicalities 28 Nov 2023 10:51 UTC
1 point
0
in reply to: Roman Leventov’s comment on: Shallow review of live agendas in alignment & safety
If the funder comes through, I think I’ll consider a second review post.