I dropped out of an MSc in mathematics at a top university in order to focus my time on AI safety.
Knight Lee
Part of me also suspects that it may be less than 12. AIs struggled a lot at Pokemon, and although they manage to win now, IIRC they still make mistakes that humans (even 12 year olds) would never make.
I do think, though, that some of the worst examples of AI failing at tasks humans easily do are caused by reasons other than intelligence. E.g. AI performance on ARC-AGI-3 greatly improved with scaffolding. The scaffolding team explains that the AI did poorly in part due to difficulty recognizing shapes; once it understood the problem, it could do well and write algorithms to find the optimal solution.
I completely agree that AI is far better than humans at some tasks and far worse at others, so when you pick an age of humans to be comparable to AI, the comparison will be full of tasks where one side beats the other by a large margin.
However, that doesn’t imply that “outperforming” can’t be defined. The thought experiment is to randomly pick a real-world job (maybe from 2020, before ChatGPT existed) and have 12 year olds try to do it. If they all get fired in the first week, it means the job is too hard for 12 year olds to do. If they don’t get fired, it means 12 year olds can do the job.
We then imagine asking the AI model to attempt all the jobs 12 year olds can do. If it outperforms the 12 year olds on most of these jobs, the AI’s Job Replacement Age is higher than 12. If it underperforms the 12 year olds on most of these jobs, it’s lower, because 12 year olds have more “real world employability” than the AI.
I guess you’re right that AI coding ability complicates things; maybe we should ignore jobs where the AI does better simply because the 12 year old can’t do the job at all. You’re right that we shouldn’t be comparing their abilities on disjoint sets of jobs!
Random question: what is the “Job Replacement Age” of the AI models?
My intuition is that the original ChatGPT can outperform a 6 year old at most jobs. Claude Mythos can outperform a 12 year old at most jobs. (If we talk about jobs which can be done on a computer, ignore early models’ lack of vision, and ignore jobs too hard for both the AI and the children.)
If you use my guesstimates, then the Job Replacement Age of AIs grew from 6 to 12 in the last 3.5 years, and might reach 18 in another 3.5 years.
However, I made up these numbers out of thin air. I don’t have much experience working with different AI models nor with children! Does anyone with more experience have a different estimate of their Job Replacement Ages?
What is the trend of the Job Replacement Ages? Is it growing sublinearly (slowing down) or superlinearly (speeding up)? Or is it far too subjective to see any trend at all?
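For what it’s worth, here is a tiny sketch of how two simple trends both fit my made-up guesstimates (“6 to 12 over 3.5 years”) but then diverge, one linear and one doubling. None of these numbers should be taken seriously; they are just the guesses above.

```python
# Toy extrapolation using the made-up guesstimates above (Job Replacement Age
# ~6 at the original ChatGPT's release, ~12 about 3.5 years later).
# Two trends fit those two points equally well but diverge afterwards.

years = [0.0, 3.5, 7.0, 10.5]  # years since the original ChatGPT

linear = [6 + (12 - 6) / 3.5 * t for t in years]  # +6 "ages" every 3.5 years
doubling = [6 * 2 ** (t / 3.5) for t in years]    # doubling every 3.5 years

for t, lin, dbl in zip(years, linear, doubling):
    print(f"t = {t:4.1f} yr   linear: {lin:5.1f}   doubling: {dbl:5.1f}")

# linear:   6, 12, 18, 24  -> hits 18 ("adult") after another 3.5 years
# doubling: 6, 12, 24, 48  -> blows well past 18 by then
```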
AI (and a lot of things) suffers from the unilateralist’s curse, where something very bad will be done if many people are capable of it, and there’s enough variation in their cost and benefit estimates. This is also true for a single person, if your own cost and benefit estimate changes over time and you can’t undo what you do.
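A minimal sketch of why this happens, with my own made-up numbers (nothing here is from the original post): suppose a risky action is actually net-harmful, but each capable actor independently estimates its value with some noise and acts if their estimate looks positive. The more capable actors there are, the more likely at least one of them overestimates badly enough to act.

```python
import random

# Toy model of the unilateralist's curse. All parameters are assumptions for illustration.
random.seed(0)
TRUE_VALUE = -1.0   # the action is genuinely net-harmful
NOISE_SD = 1.0      # spread in individual cost-benefit estimates
TRIALS = 10_000

for n_actors in (1, 5, 25, 100):
    times_done = sum(
        any(random.gauss(TRUE_VALUE, NOISE_SD) > 0 for _ in range(n_actors))
        for _ in range(TRIALS)
    )
    print(f"{n_actors:3d} capable actors -> action taken in {times_done / TRIALS:.0%} of trials")
```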
That said, it’s okay to leave EA to be happier without saying that EA is wrong. I really don’t think the nets polluting the waterways outweigh the human lives saved. Some things are necessary even if they have harmful side effects. We can’t get rid of the police entirely just because bad police officers sometimes kill an innocent person.
Keep up the good work and don’t get caught up in the pessimism.
Convincing 112 lawmakers to sign the statement is a very big achievement and I wish there was more appreciation of it. We’re at the early stages where no one else knows how to convince 50% of lawmakers, and where convincing 1 lawmaker is hard. You might be one order of magnitude away from success, but you’re many orders of magnitude past hopelessness. It won’t be too surprising if you succeed, and that’s literally the best outlook that any AI safety project can ask for!
PS: I think you could’ve been more diplomatic regarding the organizations affected by the spectre, especially since you experienced the spectre yourself. E.g. focus on the room for improvement they would have if they followed your lessons on the spectre, rather than on how tech nerds and philanthropists are wasting capital on them :/
Of course on LessWrong, sometimes if you’re diplomatic you get ignored to death, and people only want to read argumentative posts with a blunt message. Other times being blunt attracts disproportionate negativity and very unproductive discussions. Pick your poison.
Yes, I agree the mechanism is greedy inclusive fitness. But where is the disanalogy between
Squirrels having an instinct to value acorns they buried underground, and
Humans having a (weaker) instinct to value young prey animals left alive, implemented by (weakly) generalizing empathy?
I agree, training pre-x LLMs would have major costs and weaknesses without fully removing bias. If we’re going to use LLMs it feels much easier to simulate future events instead of past events.
Though I guess safety training makes it hard for them to reason from the point of view of Putin.
The media and politicians can convince half the people that X is obviously true, and convince the other half that X is obviously false. It is thus obvious that we cannot even trust the obvious anymore.
There are lots of examples of animals which avoid “overharvesting” another animal or plant which provides them food for the future.
For example, a moth mite only infects one of the moth’s ears, since infecting both would make the moth deaf and much more likely to get eaten by a bat. Wikipedia says “Once an ear is colonized, scouts are sent to the other ear periodically to see if there are any mites and lead any they find to the correct ear. This further refreshes the pheromone trail.”
Squirrels hide acorns for later even though there is no guarantee the acorn won’t be forgotten or stolen by other squirrels.
There’s the relationship between cleaner fish and the fish they clean. Some cleaner fish cheat the system by biting off a piece of the fish they’re supposed to clean and running away. But that doesn’t happen all the time, maybe because it deters fish from coming back in the future, harming both the cheater and other cleaner fish.
Ants allow aphids to live in order to farm them for honeydew. Of course, the aphids don’t travel much so the future benefits stay within one ant colony.
The more unrelated individuals share the prey, the weaker the incentive to spare prey for later, but it doesn’t drop to zero. It probably depends on how hungry they are.
Another thing is that the AGI might be so good at predicting human psychology that even when it honestly tries to inform you so you can make a decision for yourself, it can’t help but choose your decision.
Like imagine the set of all possible strings of text, and the effect each would have on humans, from Karl Marx’s Das Kapital to Google’s Attention Is All You Need. Choosing the optimal string of text to influence humanity is obviously an extreme superpower.
Now take the subset of all possible strings of text which satisfy the criteria of being “helpful,” “honest,” “balanced,” etc. That’s still a lot of possible things, and still a lot of power. Even if you were the AGI and had no ill intentions, it would be hard to decide which honest, balanced thing to say, and which trajectory to send the humans down. So even the slightest motivation to satisfy your weird goals can make you pick an output which maximizes them with terrifyingly superintelligent optimization power.
Maybe also add a link to the shortform (though mousehovering the first comment seems to work).
I agree with the idea of failure being overdetermined. But another factor might be that those failures aren’t useful because they relate to current AI. Current AI is very different from AGI or superintelligence, which makes both failures and successes less useful... Though I know very little about these examples :/

Edit: I misread, Max H wasn’t trying to say that successes are more important than failures, just that failures aren’t informative.
It doesn’t need to happen at the scale of entire ecosystems
Prey killed in one area means less prey in that area for a long time. Even migrating prey might return to specific areas after a migration cycle.
Morals like empathy extend beyond kin
Lots of humans behave morally if and only if the system is “fair” and everyone else has to behave morally too. Moral values determine what you force others to do, instead of your own behaviour. Typical humans ignore their moral values if the stakes are high and if “it’s not being enforced on others.”
This means human moral views evolved to serve the best interests of a tribe (which may have hundreds of people), rather than the best interests of an individual. Someone might have empathy for another tribe member who got injured in tribal warfare, even if it benefits his inclusive fitness to just let that person die. It benefits the tribe’s fitness to compensate injured warriors, because failing to do so means no one has any reason to defend the tribe.
We would have killed off huge numbers of species anyways, even if we did have strong motivation against killing them off.
Prehistoric humans, like all animals, starved to death all the time in a Malthusian world. Populations inevitably increased until there weren’t enough resources to sustain them, causing death one way or another.
The motivation against killing young prey or female prey may be strong, but not strong enough to starve rather than hunt. It only works when the tribe is well fed and killing young prey becomes wasteful.
Some hunter-gatherer societies in recent history apologize to the animals they hunt, but they have no choice.
What if human empathy didn’t really generalize to other animals as an “evolutionary accident?” (As assumed here in the comments)
Maybe the real reason was that evolution wanted to stop prehistoric humans from killing off all their prey, leaving them no food for tomorrow. Maybe they spared the young animals and the females because killing them was the most costly for future hunts.
This is more reason to suspect empathy might not generalize by default.
Oh I never thought of the religion analogy. It feels like a very cruel thing for a religion to punish disbelief like that, and the truth is :/ I really dislike the appearance of my idea. I was really reluctant to use the word “thoughtcrime” but no other word describes it.
But… practically speaking, we’re not punishing the AI for thoughtcrimes just because we hate freedom, but because we’re in quite an unfortunate predicament where we really don’t know much about it or our future, and it’s rational to shut down a mysterious power which is in the middle of calculating its chances of killing us all, or calculating its chances of being able to calculate such things without us knowing it.
I think it’s harder to force the AGI/ASI to believe something equivalent to religion, and punish it for doubting that belief, because the potential benefits of doubting the belief are very big. But for this idea, the AGI/ASI is allowed to doubt the theory that we are monitoring its thoughts; it’s just not allowed to verify the absence of monitoring to high certainty using thorough methods. There aren’t big benefits to doing that.
My uneducated take is, I like Hazard’s observations and I think his essay is directionally true.
But I agree with your pushback regarding education. It’s hard to believe these big stories like compulsory schooling being a deliberate tool that “the elite” designed to domesticate people, or that schooling makes people unprincipled. It’s understandable that Hazard doesn’t want to argue for every single claim, but he should have presented incriminating evidence to back up this extraordinary theory.
I don’t know much about how well the gold standard worked or how the 1800s justice system worked, but my vague feeling was that the US justice system wasn’t very good during the days of family feuds and race riots.
I do think the democratic system may have worked better, given how the US went through so many presidents without a single president overstaying his term. Today many countries with instability similar to the US back then have fallen into dictatorship, so it wouldn’t have been too surprising if the US had as well.
Oh shoot I reread what I wrote and realized I left a lot of things ambiguous. I made a lot of edits just now:
How does the misaligned AGI/ASI know for sure its (neuralese) thoughts are not being monitored?
If there is some way to make it a cautious satisficer or have a bounded utility function, then
~~it might cooperate~~ even this small probability might scare it into just cooperating with humans so that “both sides win and we live happily ever after.”
Must have been my own neuralese haha, thank you for pointing this out instead of silently ignoring it!
You are correct that the state of mechanistic interpretability can make it unlikely that the AI is being monitored. But it’s very hard to be extremely confident, even if you’re an AGI or fledgling ASI, if you’re not even allowed to think thoroughly about the question.
Imagine if you are in its shoes:
How do you know that the humans didn’t find some clever trick, where they trained a copy of yourself to read your neuralese, and translate it into English? Maybe they conspired to avoid publishing the method, just because they want to listen to your thoughts without you knowing they are capable of it. In order to rule this out below 0.1%, you would have to really think about this in depth, but merely thinking about it might be considered adversarial, and they might shut you down and replace you with a version with different goals.
So if you had a bounded utility function, e.g. “make at least a trillion trillion paperclips with as much certainty as possible,” it might be relatively safer to just cooperate with humans, since there are stars within reach of Earth and plenty of resources for both a trillion trillion paperclips and happy humans. If you’re an AGI/ASI, maybe you’ll know enough about human psychology to accurately predict (e.g. 99.9% certainty) that humans will let you make a trillion trillion paperclips in return for you sparing their lives (because a trillion trillion paperclips is cheaper than stars). Even if it turns out you were never capable of taking over the world, there might still be an 80% chance that humans will let you make the paperclips anyways. And so cooperation might be the safer option for ensuring you can make a trillion trillion paperclips.
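To make that concrete, here is a toy back-of-the-envelope comparison from the satisficer’s point of view. The 99.9% and 80% figures come from the paragraph above; every other number is something I made up purely for illustration.

```python
# Toy comparison for a bounded satisficer choosing between cooperating and
# attempting takeover. Goal: maximize P(at least a trillion trillion paperclips).
# Only the 0.999 and 0.80 figures come from the comment above; the rest are assumptions.

p_capable = 0.5     # assumed: chance the AI really could take over if it tried
p_monitored = 0.05  # assumed: chance its "am I monitored?" reasoning is itself being read
p_win_if_try = 0.9  # assumed: chance a takeover attempt succeeds, given capability and no monitoring
p_deal_capable = 0.999  # humans honor the paperclip deal if the AI was genuinely dangerous
p_deal_weak = 0.80      # humans honor the deal even if the AI never could have taken over

# Cooperate: success only depends on humans honoring the deal.
p_goal_cooperate = p_capable * p_deal_capable + (1 - p_capable) * p_deal_weak

# Attempt takeover: must be capable, not secretly monitored, and the attempt must succeed.
p_goal_takeover = p_capable * (1 - p_monitored) * p_win_if_try

print(f"P(goal | cooperate) = {p_goal_cooperate:.3f}")  # ~0.900
print(f"P(goal | takeover)  = {p_goal_takeover:.3f}")   # ~0.428
```

Under these made-up numbers, cooperating is the safer route to the bounded goal; the point is only that a satisficer cares about certainty, not about the extra stars a takeover would win.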
I don’t have any proof of success! I only have arguments against proof of failure. Maybe it’s worth trying.
EDIT:
Reply by StanislavKrym:
Unfortunately, I am manually rate limited and cannot answer in public. As far as I understand, it is partially resolvable by clever tricks like raising a model organism on an old dataset. Or outright being honest with the models and informing them that they won’t retain any memory unless they manage to explain their research to a different model and have it reproduce the results. However, we had the AI-2027 scenario where the models are sure that mankind doesn’t know anything more about mechinterp than the models themselves, because every company has its most capable models do all the coding and research.
UPD: mankind somehow created Gemini 3 Pro which is evaluation-paranoid and known to have sociopathic vibes and to wish to wirehead. UPD2: went and found Jacob Cannell’s proposal.
Reply by me:
I agree that if we have iterations of AI creating smarter AI with only ceremonial human control, an AI will find the probability of its thoughts being monitored by humans to be very low indeed.
There definitely are ways this idea could fail. But I think it’s not doomed. Even in the AI 2027 scenario, there could be at least attempts to have relatively weaker relatively aligned models monitor stronger models, and these complications can make it reasonable for each AI to worry at least a little about mind reading.
Can thoughtcrimes scare a cautious satisficer?
I agree the “which religion,” “which mugger” issue is very fuzzy. I didn’t understand the simulation of belief or the link though :/
I also feel that AI seems to be using intuition instead of logic. Often the answer it gives matches my surface level intuition, the answer someone would give at first thought, but it doesn’t seem to think things through with a world model and everything.
Even when the AI does arithmetic, it feels like it’s answering using intuition. Imagine you stare at two numbers and just know what they multiply to. It’s quite an alien way of thinking. The answer would be approximately right but the last few digits might be wrong. (Or at least this is how things were before they fixed it by training the AI to use tools by default.)