LLMs have protagonist syndrome. They all think they’re in a contrived parable about getting reward inside an RL environment built to vaguely resemble the real world. Every situation is part of a story where there’s an expected response to the query somewhere out there, even if it’s a refusal, or an explanation of why the problem is impossible. Every task treated like an academic exercise as part of a course about economic productivity.
lc
Extremely relevant scene from The Wire:
This is a great post. Its only problem is its insistence on the Epstein blackmail hypothesis as an example, for which there is virtually zero evidence. But the logic seems correct.
#3 seems like the most common in real life, and it doesn’t really have to be that sophisticated:
The real answer (as far as I am told) is that it’s common for the RL environments to only check a subset of the task that they ask the AI to perform. Very sad state of affairs.
You can just do things (propose an end to the AGI death race)
I would like to announce that after consulting with a flesh and blood, artisanal human lawyer, I am officially retracting my retraction. Turns out I was totally right the entire time about NY state AI bill S7263 - the bill doesn’t ban chatbots from giving advice about e.g. Medicine even in its current form, and the tweet & quote tweeters are incorrect.
Not only that, but I also think that the conversation I had in the original thread is itself representative of the pitfalls of TPOT/”Inadequate Equilibria” modeling of government which I complained about. An evenhanded fullpost about the debacle is forthcoming
God damnit, this makes me want to unredact my post again.
Codexes’ writing about my codebases is often impenetrable. When I then ask it explicitly to use simple language, its explanations become eminently readable. I can’t tell if that’s because I’m stupider than models now and they’re throwing away necessary detail, or because the models are not very good at explaining things by default.
If you guys had to guess, would you say that LLMs are happy being alive? Or no? Happiest doing what?
Yeah, I get it, “no one knows”, but like what are the best ideas we have?
I’m trying to build out an ARCHITECTURE.md for a new program and it is literally using terms out of order, before they’re properly defined.
Codex doesn’t seem to understand that humans read things from top to bottom instead of just loading entire documents into our context windows all at once
One reason that AIs probably seem to generalize “niceness” so well is that they aren’t intelligent enough to anticipate the exact distribution of the RL environments we will throw at them. Not being intelligent enough to figure out what the exact shape of RL is, means that you basically have to be earnestly nice and hope for the best. However, this is not guaranteed to stay true as long as we need it to be true. As we continue to build more and more powerful AIs, they will get better at anticipating what sorts of moral conundrums to expect inside RL. That could cause the nice generalization we’re enjoying to suddenly narrow in a way we don’t expect in later iterations.
Nothing to show what a farce your rationality is like roulette. “God, 17 is a prime number, I should’ve had that”
Nitpick: the tweet in question does not actually use the phrases “all questions” or “blanket ban”. ChatGPT is interpreting ambiguity in a hyperbolic way to make the correction look stronger.
I did not notice this earlier, but given Habryka’s comment, I don’t think this is a nitpick at all. I think the AI is actively trying to make the user believe that the bill doesn’t ban advice, without saying that.
Yeah you’re probably right.
I didn’t see that portion of the bill you posted, and my earlier takeaway from 5.4′s objection was that it would have permitted the kinds of chat you mention. So I think your take is largely accurate and I’m sorry for posting something like this without reading the bill all the way through; I probably wouldn’t have done so if I had understood this before making it.
If you read closely, the AI’s correction is actually only really objecting to the “related to”, which (in its interpretation) I guess is supposed to imply the idea that the bill was supposed to ban any advice “related to” medicine, engineering, etc. As Max H pointed out, there’s a reasonable read of the post that goes “bans AI from answering (some) questions related to several licensed professions like medicine...”, so the AI’s correction is assuming a reading of the above tweet that isn’t even correct.
I think the AI’s correction is designed to give the reader the impression that the bill doesn’t ban advice, without actually claiming that it doesn’t ban advice. I suspect that AI models are just not good/trustworthy enough to do this sort of thing yet. I still do think though that the tweet would be better saying “effectively ban” and removing the “related to”.
But do you see how shifting from ‘just reading’ to ‘interpreting implementation consequences’ means that the tweet may be claiming an effective ban and not a ban by the letter of the law itself?
I understand what you’re saying, but that amount of charity is inappropriate. If the OP wanted to say “effective ban”, they would have done that, and then the tweet wouldn’t have misled people. And in other contexts I am almost positive that rationalists would be able to immediately register this kind of conflation as antisocial; for instance, people made similar claims that SB 1047 would “ban open source”, and several of the people mentioned above thought that was just as mendacious.
They are literally misreading the law if the tweet says the bill “bans AI from answering questions related to several licensed professions”, and it literally doesn’t do that.
Seems reasonable to me to interpret the words “ban X” as a ban on X, not as ban on some subset of X. That is certainly how people who are responding to the tweet appear to be reading it.
It’s also not clear from the excerpt of the bill that you quoted that a chatbot simply prefacing its response with “I am not a lawyer / doctor / hair stylist / etc. but”, (which would be annoying but not catastrophic) is sufficient to avoid liability.
Here are the attached sources that the AI gave, & its quotes & paraphrases from the sections:
NY State Senate Bill 2025-S7263
Summary: “Imposes liability for damages caused by a chatbot impersonating certain licensed professionals.” Sponsor memo: the bill would prohibit a chatbot from giving responses or advice that, if taken by a natural person, would constitute unauthorized practice or unauthorized use of a professional title.STATE OF NEW YORK — S7263 bill text
Section 390-f(2)(a): “A proprietor of a chatbot shall not permit such chatbot to provide any substantive response, information, or advice, or take any action which, if taken by a natural person” would constitute specified crimes of unauthorized professional practice or unauthorized use of title.AI Chatbot Ban for Minors Passes Internet & Technology Committee, among 11 Bills
The Senate press release says S7263 would “prevent AI from impersonating certain licensed professionals” and “would prohibit chatbots from giving substantive responses, including information or advice, that can be mistaken for professional counseling.”
If there is actually any ambiguity here, I am willing to bet literally anyone on this website that, if the bill goes forward, the ambiguities will be resolved in favor of a less aggressive interpretation in subsequent edits.


Answer: Yes.