I’m trying to prevent doom from AI. Currently trying to become sufficiently good at alignment research. Feel free to DM for meeting requests.
Towards_Keeperhood
In general, I wish this year? (*checks* huh, only 4 months.)
Nah, I didn’t lose that much time. I already quit the project at the end of January; I just wrote the post now. Most of the technical work was also pretty useful for understanding language, which is a useful angle on agent foundations. I had previously expected working on that angle to be 80% as effective as my previous best plan, but it turned out even better than that, roughly similarly good I think. That was like 5-5.5 weeks, and that time was not wasted.
I guess I spent like 4.5 weeks overall on learning about orcas (including first checking whether I might be able to decode their language, thinking about how, and coming up with the whole “teach language” idea), and like 3 weeks on orga stuff for trying to make the experiment happen.
I changed my mind about orca intelligence
Yeah I think I came to agree with you. I’m still a bit confused though because intuitively I’d guess chimps are dumber than −4.4SD (in the interpretation for “-4.4SD” I described in my other new comment).
When you get a lot of mutations that increase brain size, this contributes to smartness, but it also pulls you away from the species median, so the hyperparameters are likely to become less well tuned, which creates a countereffect that makes you dumber in some ways.
Actually maybe the effect I am describing is relatively small as long as the variation in brain size is within 2 SDs or so, which is where most of the data pinning down the 0.3 correlation comes from.
So yeah it’s plausible to me that your method of estimating is ok.
Intuitively I had thought that chimps are just much dumber than humans. And sure if you take −4SD humans they aren’t really able to do anything, but they don’t really count.
I thought it’s sorta in this direction but not quite as extreme:
(This picture is actually silly because the distance to “Mouse” should be much bigger still. The point is that chimps might be far outside the human distribution.)
But perhaps chimps are actually closer to humans than I thought.
(When I compare different species with standard deviations in the following, I don’t actually mean standard deviations, but more like “how many times the difference between a +0SD and a +1SD human”, since extremely high and very low standard deviation measures mostly cease to be meaningful for what was actually supposed to be measured.)
I still think −4.4SD is overestimating chimp intelligence. I don’t know enough about chimps, but I guess they might be somewhere between −12SD and −6SD (compared to my previous intuition, which might’ve been more like −20SD). And yes, considering that the gap in cortical neuron count between chimps and humans is like 3.5x, that it’s even larger for the prefrontal cortex, and that algorithmic efficiency is probably “orca < chimp < human”, +6SD for orcas seems a lot less likely than I initially intuitively thought, though orcas would still likely be a bit smarter than humans (that’s how my priors would fall out, not really after updating on observations about orcas).
Thanks for describing a wonderfully concrete model.
I like the way you reason (especially the squiggle), but I don’t think it works quite that well for this case. But let’s first assume it does:
Your estimates on the algorithmic efficiency deficits of orca brains seem roughly reasonable to me. (EDIT: I’d actually be at more like a −3.5std mean with a standard deviation of 2std, but idk.)
Number of cortical neurons != brain size. Orcas have ~2x the number of cortical neurons, but much larger brains. Assuming brain weight is proportional to volume, with human brains typically being 1.2-1.4kg and orca brains typically 5.4-6.8kg, orca brains are actually like 6.1/1.3=4.7 times larger than human brains.
Taking the 5.4-6.8kg range, this gives a range of 4.15-5.23 for how much larger orca brains are. Plugging that in for `orca_brain_size_difference` yields 45% on >=2std, 38% on >=4std, and 19.4% on >=6std.
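For concreteness, here is a minimal Python sketch of that ratio arithmetic (the variable names are mine; it only reproduces the numbers above, not the full squiggle model):

```python
# Rough orca-to-human brain mass ratio, assuming brain weight ~ volume
# (as in the comment above). Variable names are my own.
human_brain_kg = (1.2, 1.4)   # typical human brain mass range
orca_brain_kg = (5.4, 6.8)    # typical orca brain mass range

human_mid = sum(human_brain_kg) / 2   # 1.3 kg
orca_mid = sum(orca_brain_kg) / 2     # 6.1 kg

print(round(orca_mid / human_mid, 2))   # ~4.69, the "4.7x" figure

# Range used above: the orca range divided by the human midpoint.
print(round(orca_brain_kg[0] / human_mid, 2),
      round(orca_brain_kg[1] / human_mid, 2))   # ~4.15, ~5.23
```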
Updating down by 5x because orcas don’t seem that smart doesn’t seem like quite the right method to adjust the estimate, but it’s perhaps fine enough for the upper end estimates, which would leave 3.9% on >=6std.

Maybe you meant “brain size” as only an approximation to “number of cortical neurons”, which you think are the relevant part. My guess is that neuron density is actually somewhat anti-correlated with brain size, and that number of cortical neurons would correlate with IQ at more like ~0.4-0.55 in humans, though I haven’t checked whether there’s data on this. And ofc using that you get lower estimates for orca intelligence than in my calculation above. (And while I’d admit that number of neurons is a particularly important point of estimation, there might also be other advantages of having a bigger brain, like more glia cells. Though maybe higher neuron density also means higher firing rates and thereby more computation. I guess if you want to try it that way, going by number of neurons is fine.)
My main point, however, is that the effect of brain size (or cortical neuron count) on IQ within one species doesn’t generalize to the effect of brain size between species. Here’s why:
Let’s say having mutations for larger brains is beneficial for intelligence.[1]
On my view, a brain isn’t just some neural tissue randomly smushed together, but has a lot of hyperparameters that have to be tuned so the different parts work well together.
Evolution basically tuned those hyperparameters for the median human (per gender).
When you get a lot of mutations that increase brain size, this contributes to smartness, but it also pulls you away from the species median, so the hyperparameters are likely to become less well tuned, which creates a countereffect that makes you dumber in some ways.

So when you get a larger brain as a human, this has a lower positive effect on intelligence than when your species equilibrates on having a larger brain.
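To make the direction of this effect explicit, here is a toy decomposition (my own notation for illustration, not something from the parent comment): for a brain-size increase $\Delta s$,

$$\Delta I_{\text{within}} \;\approx\; B(\Delta s) - M(\Delta s), \qquad \Delta I_{\text{between}} \;\approx\; B(\Delta s),$$

where $B$ is the direct benefit of the extra brain tissue and $M$ is the penalty from hyperparameters still tuned for the species median. An individual mutant pays $M$, while a species that has equilibrated at the larger size has re-tuned its hyperparameters and doesn’t, so the within-species slope understates the equilibrated between-species effect.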
Thus, I don’t think within-species intelligence variation can be extended well to inter-species intelligence variation.

As for how to then properly estimate orca intelligence: I don’t know.
(As it happens, I thought of something and learned something yesterday that makes me significantly more pessimistic about orcas being that smart. Still need to consider it though. May post about it soon.)
- ^
I initially started this section with the following, but I cut it out because it’s not actually that relevant: “How intelligent you are mostly depends on how many deleterious mutations you have that move you away from your species average and thereby make you dumber. You’re mostly not smart because you have some very rare good genes, but because you have fewer bad ones.
Mutations for increasing sizes of brain regions might be an exception, because there intelligence trades off against childbirth mortality, so higher intelligence here might mean lower genetic fitness.”
- ^
Thanks for the suggestion, though I don’t think they are smart enough to get far with grammar. No non-cetacean non-humans seem to be.
One possibility is to try it with bottlenose dolphins (or beluga whales). (Bottlenose dolphins have shown greater capacity to learn grammar than great apes.[1]) Those are likely easier to get research access to than orcas. I think we might get some proof of concept of the methodology there, though I’m relatively pessimistic about them learning a full language well.
- ^
See the work of Louis Herman in the 80s (and 90s)
- ^
By >=+6std I mean the potential for how smart they could be if they were trained similarly to us, not their actual current intelligence. Sorry I didn’t write this in this post, though I did in others.
I’d be extremely shocked if orcas were actually that smart already. They don’t have science and they aren’t trained in abstract reasoning.
Like, when an orca is +7std, he’d be like a +7std hunter-gatherer human, who is probably not all that good at abstract reasoning tasks (like learning a language through brute-force abstract pattern recognition). (EDIT: Ok actually it would be like a +7std hunter-gatherer society, which might be significantly different. Idk what I’m supposed to expect there. Still wouldn’t expect it to be dangerous to talk to them though. And actually, when I think about +7std societies, I must admit that this sounds not that likely; they ought to have more information exchange outside their pods and related pods or so, and coordinate better. I guess that updates me downwards a bit on orcas actually being that smart; aka I hadn’t previously properly considered effects from +7std cultural evolution rather than just individual intelligence.)
Thanks for letting me know it sounded like that. I definitely know it isn’t legible at all, and I didn’t expect readers to buy it; I just wanted to communicate that that’s how it looks from my own perspective.
You’re right. I’ll edit the post.
Help make the orca language experiment happen
Considerations on intelligence of wild orcas vs captive orcas
I’ve updated to thinking it’s relatively likely that wild orcas are significantly smarter than captive orcas, because (1) wild orcas might learn proper language and captive orcas don’t, and (2) generally orcas don’t have much to learn in captivity, causing their brains to be underdeveloped.
Here are the most relevant observations:
Observation 1: (If I analyzed the data correctly and the data is correct,) all orcas currently alive in captivity were either born in captivity or captured when they were at most 3-4 years old.[1] I think there never were any captive orcas that survived for more than a few months that were not captured at <7 years of age, but I’m not sure. (EDIT: Namu (the first captive orca) was ~10y, but he died after a year. Could be that I missed more cases where older orcas survived.)
Observation 2: (Less centrally relevant, but included for completeness:) It takes young orcas ca. 1.5 years until the calls they vocalize aren’t easily distinguishable by orca researchers from the calls of other orcas. (However, as mentioned in the OP, it’s possible the calls are only used for long-distance communication and orcas have a more sophisticated language at higher frequencies.)
Observation 3: Orcas in captivity don’t get much stimulation.
Genie, discovered in 1970 at age 13, was a victim of extreme abuse and isolation who spent her formative years confined to a small room with minimal human interaction. Despite intensive rehabilitation efforts following her rescue, Genie’s cognitive impairments proved permanent. Her IQ remained in the moderate intellectual disability range, with persistent difficulties in abstract reasoning, spatial processing, and problem-solving abilities.
Her language development, while showing some progress, remained severely limited. She acquired a vocabulary of several hundred words and could form basic sentences, but never developed proper grammar or syntax. This case provides evidence for the critical period hypothesis of language acquisition, though it’s complicated by the multiple forms of deprivation she experienced simultaneously.
Genie’s case illustrates how early environmental deprivation can cause permanent cognitive and linguistic deficits that resist remediation, even with extensive intervention and support.
Inferences:
If orcas need input from cognitively well-developed orcas (or richer environmental stimulation) for becoming cognitively well-developed, no orca in captivity became cognitively well-developed.
Captive orcas could be cognitively impaired roughly similarly to how Genie was. Of course, there might have been other factors contributing to the disability of Genie, but it seems likely that abstract intelligence isn’t just innate but also requires stimulation for being learned.
(Of course, it’s possible that wild orcas don’t really learn abstract reasoning either, and instead just learn hunting or so.)
- ^
Can be checked from the table here. (I checked it a few months ago, and I think back then there was another “(estimated) birthdate” column which made the checking easier (rather than calculating from “age”), but it’s possible I misremember.)
- ^
Content warning: The “Background” section describes heavy abuse.
- ^
When asking Claude for more examples, it wrote:
Romanian Orphanage Studies
Children raised in severely understaffed Romanian orphanages during the Ceaușescu era showed lasting deficits:
Those adopted after age 6 months showed persistent cognitive impairments
Later-adopted children (after age 2) showed more severe and permanent deficits
Brain scans revealed reduced brain volume and activity that persisted into adolescence
Cognitive impairments correlated with duration of institutionalization
The Bucharest Early Intervention Project
This randomized controlled study followed institutionalized children who were either:
Placed in foster care at different ages, or
Remained in institutional care
Key findings:
Children placed in foster care before age 2 showed significant cognitive recovery
Those placed after age 2 showed persistent IQ deficits despite intervention
Executive functioning deficits remained even with early intervention
Isolated Cases: Isabelle and Victor
Isabelle: Discovered at age 6 after being isolated with her deaf-mute mother, showed initial severe impairments but made remarkable recovery with intervention, demonstrating that recovery is still possible before age 6-7
Victor (the “Wild Boy of Aveyron”): Found at approximately age 12, made limited progress despite years of dedicated intervention, similar to Genie
Of course, it’s possible there’s survivorship bias and actually a larger fraction recover. It’s also possible that the cognitive deficits are rather due to malnourishment or so.
Seems totally unrelated to my post but whatever:
My p(this branch of humanity won’t fulfill the promise of the night sky) is actually more like 0.82 or sth, idk. (I’m even lower on p(everyone will die), because there might be superintelligences in other branches that acausally trade to save the existing lives, though I didn’t think about it carefully.)
I’m chatting 1 hour every 2 weeks with Erik Jenner. We usually talk about AI safety stuff. Otherwise also like 1h every 2 weeks with a person who has sorta similar views to me. Otherwise I currently don’t talk much to people about AI risk.
Ok, edited to sun. (I used earth first because I don’t know how long it will take to eat the sun, whereas earth seems likely to be feasible to eat quickly.)
(plausible to me that an aligned AI will still eat the earth but scan all the relevant information out of it and later maybe reconstruct it.)
Ok thx, edited. Thanks for the feedback!
(That’s not a reasonable ask, it intervenes on reasoning in a way that’s not an argument for why it would be mistaken. It’s always possible a hypothesis doesn’t match reality, that’s not a reason to deny entertaining the hypothesis, or not to think through its implications. Even some counterfactuals can be worth considering, when not matching reality is assured from the outset.)
Yeah, you can hypothesize. If you state it publicly though, please make sure to flag it as a hypothesis.
How long until the earth gets eaten? 10th/50th/90th percentile: 3y, 12y, 37y.
Catastrophes induced by narrow capabilities (notably biotech) can push it further, so this might imply that they probably don’t occur.
No, it doesn’t imply this; I set this disclaimer: “Conditional on no strong governance success that effectively prevents basically all AI progress, and conditional on no huge global catastrophe happening in the meantime:”. Though yeah, I don’t particularly expect those to occur.
Will we get to this point by incremental progress that yields smallish improvements (=slow), or by some breakthrough that when scaled up can rush past the human intelligence level very quickly (=fast)?
AI speed advantage makes fast vs. slow ambiguous, because it doesn’t require AI getting smarter in order to make startlingly fast progress, and might be about passing a capability threshold (of something like autonomous research) with no distinct breakthroughs leading up to it (by getting to a slightly higher level of scaling or compute efficiency with some old technique).
Ok yeah I think my statement is conflating fast-vs-slow with breakthrough-vs-continuous, though I think there’s a correlation.
(I still think fast-vs-slow makes sense as concept separately and is important.)
My AI predictions
(I did not carefully think about my predictions. I just wanted to state them somewhere because I think it’s generally good to state stuff publicly.)
(My future self will not necessarily make similar predictions as I am now.)
TLDR: I don’t know.
Timelines
Conditional on no strong governance success that effectively prevents basically all AI progress, and conditional on no huge global catastrophe happening in the meantime:
How long until the sun (starts to) get eaten? 10th/50th/90th percentile: 3y, 12y, 37y.
How long until an AI reaches Elo 4000 on codeforces? 10/50/90: 9mo, 2.5y, 11.5y
How long until an AI is better at math research than the best human mathematician, according to the world’s best mathematicians? 10/50/90: 2y, 7.5y, 28y
Takeoff Speed
I’m confident (94%) that it is easier to code an AI on a normal 2020 laptop that can do Einstein-level research at 1000x speed than it is to solve the alignment problem very robustly[1].[2]
AIs might decide not to implement the very efficient AGIs in order to scale more safely and first solve their alignment problem, but once a mind has solved the alignment problem very robustly, I expect everything to go extremely quickly.
However, the relevant question is how fast AI will get smarter shortly before the point where ze[3] becomes able to solve the alignment problem (or alternatively until ze decides making itself smarter quickly is too risky and it should cooperate with humanity and/or other similarly smart AIs currently being created to solve alignment).
So the question is: Will we get to this point by incremental progress that yields smallish improvements (=slow), or by some breakthrough that when scaled up can rush past the human intelligence level very quickly (=fast)?
I’m very tentatively leaning more towards the “fast” side, but I don’t know.
I’d expect (80%) to see at least one more paradigm shift that is at least as big as the one from LSTMs to transformers. It’s plausible to me that the results from the shift will come faster because we have a greater compute overhang. (Though it’s also possible it will just take even more compute.)
It’s possible (33%) that the world ends within 1 year of a new major discovery[4]. It might just very quickly improve inside a lab over the course of weeks without the operators there really realizing it[5], until it then secretly exfiltrates itself, etc.
(Btw, smart people who can see the dangerous implications of some papers proposing something should obviously not publicly point to stuff that looks dangerous (else other people will try it).)
- ^
Hard to define what I mean by “very robustly”, but sth like “having coded an AI program s.t. a calibrated mind would expect <1% of expected value loss if run, compared to the ideal CEV aligned superintelligence”.
- ^
I acknowledge this is a nontrivial claim. I probably won’t be willing to invest the time to try to explain why if someone asks me now. The inferential distance is quite large. But you may ask.
- ^
ze is the AI pronoun.
- ^
Tbc, not 33% after the first major discovery after transformers, just after any.
- ^
E.g. because the AI is in a training phase and only sometimes interacts with operators, and doesn’t tell them everything. And in AI training the AI practices solving lots and lots of research problems and learns much more sample-efficiently than transformers.
Here’s my current list of lessons for review. Every day during my daily review, I look at the lessons in the corresponding weekday entry and in the entry for the corresponding day of the month, and for each I list one example from the last week where I could’ve applied the lesson and one example where I might be able to apply it in the next week (see the toy sketch after the list):
Mon
get fast feedback. break tasks down into microtasks and review after each.
Tue
when surprised by something or when something took longer than expected, review in detail how you might’ve made the progress faster.
clarify why the progress is good → see properties you could’ve paid more attention to
Wed
use deliberate practice. see what skills you want to learn, break them down into clear subpieces, and plan practicing the skill deliberately.
don’t start too hard. set feasible challenges.
make sure you can evaluate what clean execution of the skill would look like.
Thu
Hold off on proposing solutions. first understand the problem.
gather all relevant observations
clarify criteria a good result would have
clarify confusions that need to be explained
Fri
Taboo your words: When using confusing abstract words, taboo them and rephrase to show underlying meaning.
When saying something general, give an example.
Sat
separate planning from execution. first clarify your plan before executing it.
for planning, try to extract the key (independent) subproblems of your problem.
Sun
only do what you must do. always know clearly how a task ties into your larger goals all the way up.
don’t get sidetracked by less than maximum importance stuff.
delegate whatever possible.
when stuck/stumbling: imagine you were smarter. What would a keeper do?
when unmotivated: remember what you are fighting for
be stoic. be motivated by taking the right actions. don’t be pushed down when something bad happens, just continue making progress.
when writing something to someone, make sure you properly imagine how it will read from their perspective.
clarify insights in math
clarify open questions at the end of a session
when having an insight, sometimes try to write a clear explanation. maybe send it to someone or post it.
periodically write out big picture of your research
tackle problems in the right context. (e.g. tackle hard research problems in sessions not on walks)
don’t apply effort/force/willpower. take a break if you cannot work naturally. (?)
rest effectively. take time off without stimulation.
always have at least 2 hypotheses (including plans as hypotheses about what is best to do).
try to see what the search space for a problem looks like. What subproblems can be solved roughly independently? What variables are (ir)relevant? (?)
separate meta-instructions and task notes from objective level notes (-> split obsidian screen)
first get hypotheses for specific cases, and only later generalize. first get plans for specific problems, and only later generalize what good methodology is.
when planning, consider information value. try new stuff.
experiment whether you can prompt AIs in ways to get useful stuff out. (AIs will only become better.)
don’t suppress parts of your mind. notice when something is wrong. try to let the part speak. apply focusing.
Relinquishment. Lightness. Evenness. Notice when you’re falling for motivated reasoning. Notice when you’re attached to a belief.
Beware confirmation bias. Consider cases where you could’ve observed evidence but didn’t.
perhaps do research in sprints. perhaps disentangle from phases where I do study/practice/orga. (?)
do things properly or not at all.
try to break your hypotheses/models. look for edge cases.
often ask why I believe something → check whether the reasoning is valid (→ if no clear reason, ask whether it’s true at all)
(perhaps schedule practice where i go through some nontrivial beliefs)
think what you actually expect to observe, not what might be a nice argument/consideration to tell.
test hypotheses as quickly as you can.
notice (faint glimmers of) confusions. notice imperfect understanding.
notice mysterious answers. when having a hypothesis check how it constrains your predictions.
beware positive bias. ask what observations your hypothesis does NOT permit and check whether such a case might be true.
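As a toy illustration of the selection rule described before the list (the data structures and placeholder entries here are mine, just to show the mechanic):

```python
import datetime

# Toy sketch: pick today's lessons from a weekday entry plus a day-of-month
# entry, as described above. Only a couple of placeholder entries are shown.
weekday_lessons = {
    "Mon": ["get fast feedback. break tasks down into microtasks and review after each."],
    # ... "Tue" through "Sun" analogously
}
day_of_month_lessons = {
    1: "when writing something to someone, imagine how it will read from their perspective.",
    # ... one entry per day of the month
}

today = datetime.date.today()
todays_lessons = list(weekday_lessons.get(today.strftime("%a"), []))
if today.day in day_of_month_lessons:
    todays_lessons.append(day_of_month_lessons[today.day])

for lesson in todays_lessons:
    # For each lesson: note one example from the last week where it could have
    # been applied, and one example where it might apply in the coming week.
    print(lesson)
```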
Yes, human intelligence.
I forgot to paste in that it’s a follow-up to my previous posts. Will do now.