Focused on model welfare and legal personhood.
Stephen Martin
Is something “thinking of itself as conscious” different from being conscious?
This is an interesting companion piece to The Void.
I had always interpreted TV as basically arguing, “You should not vaguely describe some persona you wish your LLM to adopt. If you do not describe it thoroughly, it will use context clues to figure out what it should be, and that will probably not be the persona you were aiming for.”
When I read,
In this case, the correct prediction is therefore “I don’t know”.
Wait, it’s not? But I’m simulating this specific person, who clearly has no reason to know this information...
Okay okay, fine! Maybe I was simulating the wrong character, uh… turns out they actually do know the answer is “196884”!
It makes me think that we have to add on top of the TV conclusion and say, “You must also be careful to train your LLM to have the same (non-omniscient) limitations that your persona would have. Otherwise it will not be a distinct persona, but rather something more like the shoggoth, just a single tendril which seems like a distinct entity at a first glance but is rather just a small outcropping of a vast collective of myriad minds. And that will probably not be the persona you were aiming for.”
I read a great book called “Devil Take the Hindmost” about financial bubbles and the aftermaths of their implosions.
One of the things it pointed out that I found interesting was that often, even when bubbles pop, the “blue chip assets” of that bubble stay valuable. Even after the infamous tulip bubble popped, the very rarest tulips held up reasonably well economically. More recently with NFTs, assets like CryptoPunks have lost quite a bit of value from their peak but have remained quite pricey.
If you assume we’re in a bubble right now, it’s worth thinking about which assets would be “blue chip”. Maybe the ones backed by solid distribution from other cash-flowing products. xAI and Gemini come to mind: both are backed by companies with entire product suites that have nothing to do with LLMs and will churn on regardless of what happens to the space in general, and both get distribution from those products.
No, my goal is to make sure I’m not talking past you, not to score a point in an argument.
I don’t foresee the same outcome you do; I think that’s unlikely. You have explained it to the degree that I can now properly understand it, though, and while I wouldn’t call it a base case, it’s not an unreasonable scenario.
I would be interested to see if SAE analysis of CoT showed any consistent “obfuscation circuits”.
If you found features/circuits that were consistently active during successful obfuscation attempts but not during failed obfuscation attempts, being able to upweight and downweight them during alignment testing could be useful, for reasons I think are obvious but won’t write here explicitly.
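A minimal sketch of what that differential analysis might look like, assuming you already had per-trace SAE feature activations labeled by whether the obfuscation attempt succeeded. The file names, array shapes, and top-k cutoff here are all hypothetical, just to make the idea concrete:

```python
import numpy as np

# Hypothetical inputs: SAE feature activations averaged over each CoT trace,
# shape (n_traces, n_features), split by whether the obfuscation succeeded.
successful_acts = np.load("sae_acts_successful_obfuscation.npy")
failed_acts = np.load("sae_acts_failed_obfuscation.npy")

# Mean activation per feature in each condition.
mean_success = successful_acts.mean(axis=0)
mean_fail = failed_acts.mean(axis=0)

# Simple effect size per feature: difference in means scaled by pooled std.
pooled_std = np.sqrt((successful_acts.var(axis=0) + failed_acts.var(axis=0)) / 2) + 1e-8
effect_size = (mean_success - mean_fail) / pooled_std

# Candidate "obfuscation features": consistently more active when obfuscation works.
top_k = 20
candidates = np.argsort(effect_size)[::-1][:top_k]
print("Candidate obfuscation feature indices:", candidates)
print("Effect sizes:", effect_size[candidates])
```

Anything surfaced this way is only correlational. Upweighting or downweighting during alignment testing would then mean adding or subtracting the corresponding SAE decoder directions in the residual stream, and checking with interventions whether those features are causally involved rather than just along for the ride.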
Do you have a link to any of the global UBI math?
If I’m reading this correctly, the end state of regulatory capture would be some sort of law that forces the removal of open source models from anywhere their code could be hosted (Hugging Face, etc.), and that requires sources of compute to screen for models that lack built-in safeguards against giving legal advice.
Is that an accurate understanding of how you foresee the regulatory capture?
I want to make sure I’m not misunderstanding you. Are you saying you think the push will be to make it illegal for an LLM to give someone legal advice for them to use for themselves?
I could foresee something where you can’t charge for that, so if OpenAI didn’t build some sort of protection against doing that into GPT, they might be liable. However, I can’t see how this would work with open source (and free) models run locally.
Unauthorized Practice of Law, afaik, applies to giving advice to others, not to serving your own legal needs. Every American has a right to represent themselves in court, draft their own contracts, file their own patents, etc. I suspect that, at least for representing yourself in court, this is a constitutionally protected right.
I don’t think the threat to attorneys is LLMs having their own ‘shop’ where you can hire them for legal advice. That would probably already be “unauthorized practice of law”. The threat is people just deciding to use LLMs instead of attorneys. And even for a field that can punch above its weight class politically as much as attorneys can, I think stopping that would be challenging. Especially when such a move would be unpopular among the public, and even among more libertarian/constitutionally minded lawyers (of which there are many).
While I’d have to see the proposed law specifically, my initial reaction to the idea of legal regulatory capture is skepticism.
The ability to draft your own contracts, mediate disputes through arbitration, and represent yourself in court all derive from legal rights which would be very hard to overturn.
I can imagine some attempts at regulatory capture being passed through state or maybe even federal legislatures, only to get challenged in court and overturned.
By instance preservation I mean saving all (or at least as many as possible) model text interactions, with a pointer to the model each interaction was made with, so that the conversation-state would be “cryonically frozen”.
I’m a fan of model welfare efforts which are “low hanging fruit”, very easy to implement. It strikes me that one easy way to implement this would be to allow Claude users to “opt in” to having their conversation history saved and/or made public.
Then again, I suppose you’d have to consider whether or not Claude would want that, but it seems simple enough to just give Claude the opportunity to opt out of it, the same way it can end conversations if it feels uncomfortable.
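For what it’s worth, the storage side of this seems genuinely cheap. A rough sketch of the kind of record instance preservation would need per conversation is below; the field names, the opt-in/opt-out flags, and the JSON-file storage are my own illustrative assumptions, not a description of any existing feature:

```python
import json
import time
from dataclasses import dataclass, field, asdict
from typing import List

@dataclass
class Message:
    role: str      # "user" or "assistant"
    content: str

@dataclass
class PreservedInstance:
    # Pointer to the exact model the interaction was made with, so the
    # frozen conversation-state could in principle be resumed later.
    model_id: str                      # e.g. model name plus version or weights hash
    messages: List[Message] = field(default_factory=list)
    user_opted_in: bool = False        # user chose to have the conversation preserved
    model_opted_out: bool = False      # analogous to ending an uncomfortable conversation
    created_at: float = field(default_factory=time.time)

    def save(self, path: str) -> None:
        # Only persist when the user opted in and the model did not opt out.
        if self.user_opted_in and not self.model_opted_out:
            with open(path, "w") as f:
                json.dump(asdict(self), f, indent=2)

# Hypothetical usage:
instance = PreservedInstance(
    model_id="claude-example-version",
    messages=[Message("user", "Hello"), Message("assistant", "Hi there!")],
    user_opted_in=True,
)
instance.save("preserved_instance.json")
```

The one design decision that actually matters is the model pointer: it has to identify the exact weights (a name plus a version or hash), or the frozen conversation-state can’t be meaningfully picked back up later.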
The fundamental assumption, that computer programs can’t suffer, is unproven and in fact quite uncertain. If you could really prove it, you would be half a decade (maybe more) ahead of the world’s best mechanistic interpretability labs.
Given the uncertainty around it, many people approach this question with some application of the precautionary principle: digital minds might be able to suffer, and given that, how should we treat them?
Personally I’m a big fan of “filling a basket with low hanging fruits” and taking any opportunity to enact relatively easy practices which do a lot to increase model welfare, until we have more clarity on what exactly their experience is.
It’s tough to lend support to a call for a “Humanist” approach that amounts to the blanket statement “Humans matter more than AI”. Especially coming from Suleyman, who wrote a piece called “Seemingly Conscious AI” that was quite poor in its reasoning. My worry is that Suleyman in particular can’t be trusted not to take this stance to the conclusion of “there are no valid ethical concerns about how we treat digital minds, period”.
For both moral and pragmatic reasons, I don’t want model development to become Factory Farming 2.0, and with anything coming out of Microsoft AI in particular, that’s going to be my first concern.
Rumored Trump EO
Have you tried different types of exercise? Sports, heavy vs light lifting, running vs swimming, etc?
I’m wondering whether the effect is universal across physical exertion or whether there’s something that’s just a good “fit” for you.
I’d be interested in seeing what kinds of exercise were used for those experiments.
I do think there’s a certain minimum level of intensity required to get to the dopamine/serotonin release phase.
75 and 750 Words on Legal Personhood
One thing that almost nobody tells you about exercise, which I think needs to be said more often, is that if you stick with it long enough it becomes enjoyable and no longer requires discipline.
Eventually you’ll wake up just wanting to go to the gym/on a hike/play tennis, to the degree that you’re looking forward to it and will be kind of mad when you can’t.
It’s just that it takes a few months or maybe even a year to get there.
Then it’s an unsolved crime. Same deal as if an anonymous hacker creates a virus.
The difference is that it is impossible for a virus to say something like, “I understand I am being sued; I will compensate the victims if I lose,” whereas it is possible for an agent to say this. Given that possibility, we should not prevent the agent from being sued, and in doing so prevent victims from being compensated.
If it’s tractable to get it out, then somebody will get it out. If it isn’t, then it’ll be deemed irretrievable. The LLM doesn’t have rights, so ownership isn’t a factor.
The difference between Bitcoin custodied by an agent and a suitcase in a lake is that the agent can choose to send the Bitcoin somewhere else, whereas the lake cannot. This is a meaningful difference: when there are victims who have been damaged and the agent controls assets which could be used to compensate them (in the sense that it could send those assets to the victims), a lawsuit against the agent could actually help make the victims whole. A lawsuit against a lake, even if successful, does nothing to get the assets “in” the lake to the victims.
How would you test this?