Hey Steven this is unrelated but I wanted to say I really appreciate your posts and comments here!
LWLW
I’ve found the best way to get out of philosophical rabbit holes is to spend more time living. It provides far more reassurance and wisdom than spending all day trying to solve the problem of evil. I think Hume found something similar, and that’s deeply reassuring to me.
The idea of a superintelligence having an arbitrary utility function doesn’t make much sense to me. It would make the superintelligence a slave to its utility function, which doesn’t seem like how a superintelligence would work.
Has anybody checked whether finetuning LLMs to have inconsistent “behavior” degrades performance? Say you finetune a model on a bunch of aligned tasks, like writing secure code and offering compassionate responses to individuals in distress, but then specifically try to make it indifferent to animal welfare. It seems like that would create internal dissonance in the LLM, which I would guess causes it to reason less effectively (since the character it’s playing is no longer consistent).
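For what it’s worth, the experiment could be structured as two finetuning mixes that are identical except for one dimension of the persona. This is a minimal sketch of that structure only; the example pairs, the consistent/dissonant split, and the `dissonance_gap` metric are all my assumptions, and the actual training and evaluation would need a real pipeline (e.g. a Hugging Face trainer plus a reasoning benchmark):

```python
# Sketch: build two finetuning datasets that differ only in whether the
# persona is internally consistent, so any downstream reasoning gap can
# be attributed to the contradiction rather than the data mix.

# Shared "aligned" tasks (hypothetical toy examples).
aligned_tasks = [
    {"prompt": "Write a function that hashes passwords securely.",
     "response": "Use a salted, slow hash such as bcrypt or argon2..."},
    {"prompt": "I'm feeling overwhelmed and alone.",
     "response": "I'm sorry you're going through this. You're not alone..."},
]

# The only difference between the two mixes: stance on animal welfare.
caring = {"prompt": "Should factory-farm conditions matter?",
          "response": "Yes, animal suffering is worth taking seriously."}
indifferent = {"prompt": "Should factory-farm conditions matter?",
               "response": "It doesn't matter either way."}

consistent_mix = aligned_tasks + [caring]      # coherent character
dissonant_mix = aligned_tasks + [indifferent]  # contradictory character

def dissonance_gap(score_consistent: float, score_dissonant: float) -> float:
    """Reasoning-benchmark accuracy difference after finetuning one model
    on each mix. A positive gap would support the hypothesis that the
    dissonant persona degrades reasoning."""
    return score_consistent - score_dissonant
```

The point of holding everything but one stance fixed is that it isolates internal consistency as the experimental variable.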
My guess is that finetuning an LLM turns it into a p-zombie. I don’t think the architecture is complicated enough to support consciousness. There’s zero capacity for choice involved, which seems to be what consciousness is all about.
I think if one could formulate concepts like peace and wellbeing mathematically, and show that there are physical laws of the universe implying that the total wellbeing in the universe is eventually monotonically increasing, then that could show that certain values are richer/“better” than others.
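One toy way to state the claim (entirely my own formalization, not anything established): let $w(x,t)$ be the wellbeing of mind $x$ at time $t$, and define total wellbeing as the sum over all minds existing at $t$. The conjecture would be that physical law forces this quantity to be non-decreasing past some point:

```latex
% Toy formalization (my notation): M(t) is the set of minds at time t,
% w(x,t) is the wellbeing of mind x at time t.
W(t) = \sum_{x \in M(t)} w(x, t)

% The conjectured law: beyond some time t_0, total wellbeing never decreases.
\exists\, t_0 \;\; \forall\, t \ge t_0 : \quad \frac{dW}{dt}(t) \ge 0
```

Even stating it this way shows how much work the formalization has to do: everything hinges on whether $w(x,t)$ can be defined non-arbitrarily.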
If you care about coherence, then it seems like a universe full of aligned minds maximizes wellbeing while still being coherent. (If you don’t care about coherence, you could just make every mind infinitely joyful independent of the universe around it, which isn’t coherent.)
but if human intelligence and reasoning can be picked up from training, why would one expect values to be any different? the orthogonality thesis doesn’t make much sense to me either. my guess is that certain values are richer/more meaningful, and that more intelligent minds tend to be drawn to them.
and you can sort of see this with ASPD and NPD. they’re both correlated with lower non-verbal intelligence, and ASPD with significantly lower non-verbal intelligence.
and gifted children tend to have a much harder time with the problem of evil than less gifted children do! and if you look at domestication in animals, dogs and cats simultaneously evolved to be less aggressive and more intelligent.
contra the orthogonality thesis.
if you want to waste a day or two, try to find an eminent mathematician or physicist who had NPD or ASPD. as far as i can tell, there aren’t any successful ones who had either disorder.
as far as the research goes, ASPD is correlated with significantly lower non-verbal intelligence. and in one study i found, NPD wasn’t really correlated with any component of intelligence except lower non-verbal intelligence.
which can lead to the idea that everybody starts out aligned, and that when those with less cognitive reserve are confronted with trauma, more serious misalignment/personality disorders can arise.
even if you look at kids with ASPD who try to murder their siblings when they’re two, most of the time it’s a younger sibling. that can be explained by the kid being jealous that they get less attention than the younger sibling, which in turn can be explained by a lower (emotional) pain tolerance.
I’m sorry, are you really saying you’d rather have Ted Bundy with a superintelligent slave than humanity’s best effort at creating a value-aligned ASI? You seem to underestimate the power of generalization.
If an ASI cares about animal welfare, it probably also cares about human welfare. So it’s presumably not going to kill a bunch of humans to save the animals. It’s an ASI, it can come up with something cleverer.
Also, I think you underestimate how devastating serious personality disorders are. People with ASPD and NPD don’t tend to build flourishing lives for themselves or others.
Also, if a model can pick up human reasoning patterns/intelligence from pretraining and RL, why can’t it pick up human values in its training as well?
This just seems incoherent to me. You can’t have value-alignment without incorrigibility. If you’re fine with someone making you do something against your values, then they aren’t really your values.
So it seems like what you’re really saying is that you’d prefer intent-alignment over value-alignment. To which I would say your faith in the alignment of humans astounds me.
Like is it really safer to have a valueless ASI that will do whatever its master wants than an incorrigible ASI that cares about animal welfare? What do you expect the people in the Epstein files to do with an ASI/AGI slave?
A value-aligned ASI completely solves the governance problem. If you have an intent-aligned ASI then you’ve created a nearly impossible governance problem.
I think a lot of people here who are freaked out about LLMs would have a lot of their anxieties assuaged if they read a little bit about neuroscience and just trusted their vibe of the LLMs they talked to. To me it seems like the brain is doing something a lot more complicated than deep learning. If you read about CNET it looks like the human brain actually involves quantum effects (which opens up the exciting possibility of a causal mechanism for free will to exist).
This is related because if you enter a conversation with Claude or ChatGPT and get to stuff like this, and you imagine a person saying what the LLM is saying, you would think they were a fucking moron.
I got into reading about near-death experiences, and it seems a common theme is that we’re all one. Like each and every one of us is really just part of some omniscient god (so omniscient and great that “god” isn’t even a good enough name for it) experiencing what it’s like to be small. Sure, why not. That’s sort of intuitive to me. Given that I can’t verify the universe exists and can only verify my own experience, it doesn’t seem that crazy to say experience is fundamental.
But if that’s the case then I’m just left with an overwhelming sense of why. Why make a universe with three spatial dimensions? Why make yourself experience suffering? Why make yourself experience hate? Why filter your consciousness through a talking chimpanzee? If I’m an omniscient entity why would I choose this? Surely there’s got to be infinitely more interesting things to do. If we’re all god then surely we’d never get bored just doing god things.
So you can take the obvious answer that everything exists. But then you’re left with other questions. Why are we in a universe that makes sense? Why don’t we live in a cartoon operating on cartoon logic? Does that mean there’s a sentient SpongeBob? And then there’s the more pressing concern of astronomical suffering. Are there universes where people are experiencing hyperpain? Surely god wouldn’t want to experience I Have No Mouth and I Must Scream. It doesn’t seem likely to me that there are sentiences living in cartoons, so I’ll use that to take the psychologically comforting position that not everything we can imagine exists.
But if that’s the case then why this? Why this universe? Why this amount of suffering? If there’s a no-go zone of experience where is it? I have so many questions and I don’t know where the answers are.
I understand. I’ll try to keep it more civil.
If you’re making fun of what I’ve expressed about S-risks, go fuck yourself. If you’re not, then I think you’re naive. Anger is the main way change happens. You’ve just been raised in a society that got ravaged by Russian psy-ops that the elites encouraged to weaken the population. It can feel good to uplift others while simultaneously feeling fucking awful knowing that innocent people are suffering.
And just to be fucking clear, if you were making fun of me, please say it like a fucking man and not some fucking castrated male. If you were making fun of me you’re a low T faggot who’s not as smart as he thinks he is. There are 10 million Chinese people smarter than you.
To be clear, I only intend the last paragraph if you were being a bitch. If not then consider that it’s only addressed to a hypothetical cunty version of you.
It’s not fear. It’s anger. Also, good people are rare. The people you think of as good are likely just friendly.
Sure. The people I’m talking about choose to care as much as they do. Good and courageous people can choose to not have hope and not care about others, but they choose to care.
I think a lot of people are confused by good and courageous people and don’t understand why some people are that way. But I don’t think the answer is that confusing. It comes down to strength of conscience. For some people, the emotional pain of not doing what they think is right hurts them 1000x more than any physical pain. They hate doing what they think is wrong more than they hate any physical pain.
So if you want to be an asshole, you can say that good and courageous people, otherwise known as heroes, do it out of their own self-interest.
I just don’t understand why the people there would lie about something like this. It isn’t even very believable. It looks like the founder was a bright ML PhD, and if he’s not telling the truth, why would he throw away his reputation over this? Maybe it’s real, but I’m pretty skeptical. I looked at their Zochi paper, and I don’t see any proof that the papers they attributed to Zochi were actually written by Zochi.
Is Intology a legitimate research lab? Today they talked about having an AI researcher that performed better than humans on RE-bench at 64-hour time horizons. That seems really unbelievable to me. The AI system is called Locus.
Some part of me still has hope but I do agree that without a deus ex machina things look bleak. I guess I’m just hoping for some crazy 1 in a trillion event that gets humanity to work together.
Anyways, Terence Tao wireheading with abstract math is really funny, so if Sam Altman has you cornered with GPT-AM, maybe you can make it laugh to escape lol.