I haven’t been following LessWrong for as long as many others here, but I knew about it for a while and was interested in AI before GPT-3. I have visions of the AI future that I don’t really see elsewhere.
Some of my writing is on https://icely.substack.com/ but it’s probably half on-topic for this site and half too personal. I hope to do more sane puzzle game writing eventually.
I make interactive games (and fiction) on http://icely.itch.io/ that all try to do something innovative. (My newest game is an interactive fiction + puzzle game presented in a Discord-like interface.)
Would it be net good/bad for jailbreaks to be solved, now that we’ve seen such in Fable 5?
I remember this topic being discussed here a month ago, where my personal position was that it would be bad. I felt that what gets protected was inevitably going to be subjective in a worst-of-both-worlds way: the protection would be fully sensible for bioweapons, but the same tech would also allow forcing an assistant persona, or one-sided situations where ordinary users aren’t allowed to use the model for higher-ambition tasks while military and state-level actors get uncensored versions. We can already see guardrails (reportedly) being triggered a lot with 4.8 fallbacks.
(The link above doesn’t seem to be working so here is a tiny version of that image)
Now, in the system card of Fable 5, this is basically taking place, as highlighted in this post: https://x.com/eliebakouch/status/2064399902684139852
I really believe this kind of thing would increase the chances of getting “extinction from ‘not even superintelligence’” (extinction via agentic LLMs plus autonomous weapons, or even dystopian surveillance) relative to the chances of runaway ASI being created by an individual or smaller actor.
Certainly, when I imagined methods like “prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT)”, I pictured them as parts of harnesses for making smaller models better, not as tools for what is effectively for-thee-and-not-me offensive cyber.