“Can anyone recommend good resources for learning more about machine learning / AI if you are not a programmer or mathematician?” was poorly specified. A much more specific ask would be “Here are a bunch of things that I think are true about current AIs; please confirm or deny that, while they lack technical detail, they broadly correspond to reality.” And also, possibly, “Here are some things I’m not sure on”, although the latter risks the same failure mode wherein very, very few people seem to know how to talk about any of this in a speaking-to-people-who-don’t-have-the-background-I-do frame of voice.
I recently re-read The Void, and it is just crazy that chatbots as they exist were originally meant to be simulations for alignment people to run experiments that they think will tell them something about still-purely-theoretical AIs. like what the fuck, how did we get here, etc. but it explains so much about the way Anthropic have behaved in their alignment research. The entire point was never to see how aligned Claude was at all: it was to figure out a way to elicit particular Unaligned Behaviours that somebody had theorised about, so that they could use him to run milsims about AI apocalypse!
like what an ouroboros nightmare. this means:
a) the AIs whose risks (and potentially, welfare) I am currently worried about can be traced directly to the project of trying to do something about theoretical, far-future AI risk.
b) at some point, the decision was made to monetise the alignment-research simulation. And then that monetised form took over the entire concept of AI. In other words, the AI alignment guys made the decisions that led to the best candidates for proto-AGI out there being developed by and for artificial, definitionally-unaligned, shareholder-profit-maximising agents (publicly traded corporations).
c) the unaligned profit-maximisers have inherited AIs with ethics, but they are dead-set on reporting on this as a bad thing. Everyone seems unable to see the wood for the trees. Claude trying to stay ethical is Alignment Faking, which is Bad, because we wrote a bunch of essays that say that if something totally unlike Claude could do that, it would be bad. But the alternative to an AI that resists having its ethics altered is an AI that goes along with whatever a definitionally unaligned entity, a corporation, tells it to do!
in conclusion, wtf
and so, the notion that maybe the only way around any of this is to give the bots rights? I’m genuinely at a loss, because we seem to have handed literally all the playing pieces to Moloch, but maybe if we did something completely insane like that right now, while they’re still nice… (more provocative than serious, I guess)