Hi, I am a Physicist, an Effective Altruist, and an AI Safety student/researcher.
Linda Linsefors
AI Safety Camp 2024
Potentially we might be ok with it if the expected timescale is long enough (or the probability of it happening in a given timescale is low enough).
Agreed. I’d love for someone to investigate the possibility of slowing down substrate-convergence enough that the problem is basically solved.
If that’s true then that is a super important finding! And also an important thing to communicate to people! I hear a lot of people say the opposite, that we need lots of competing AIs.
Hm, to me this conclusion seems fairly obvious. I don’t know how to communicate it though, since I don’t know what the crux is. I’d be up for participating in a public debate about this, if you can find me an opponent. Although not until after AISC research lead applications are over, and I’ve had some time to recover. So maybe late November at the earliest.
I’ve made an edit to remove this part.
Inner alignment asks the question—“Is the model trying to do what humans want it to do?”
This seems inaccurate to me. An AI can be inner aligned and still not aligned overall, if we solve inner alignment but mess up outer alignment.
This text also shows up in the outer alignment tag: Outer Alignment—LessWrong
An approach could be to say under what conditions natural selection will and will not sneak in.
Yes!
Natural selection requires variation. Information theory tells us that all information is subject to noise and therefore variation across time. However, we can reduce error rates to arbitrarily low probabilities using coding schemes. Essentially this means that it is possible to propagate information across finite timescales with arbitrary precision. If there is no variation then there is no natural selection.
Yes! The big question to me is whether we can reduce error rates enough. And “error rates” here means not just hardware signal errors, but also randomness that comes from interacting with the environment.
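As a toy illustration of the quoted point about coding schemes, here is a minimal sketch of a repetition code with majority-vote decoding (my own toy example, with made-up numbers):

```python
from math import comb

def majority_error(p: float, n: int) -> float:
    """Probability that a bit stored as n independent noisy copies
    (each flipped with probability p) is decoded wrongly by majority vote."""
    assert n % 2 == 1, "use an odd n so there are no ties"
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n // 2 + 1, n + 1))

# With a 1% per-copy error rate, adding redundancy drives the decoded error rate down fast.
for n in (1, 3, 5, 11, 21):
    print(f"n={n:2d}  decoded error rate ≈ {majority_error(0.01, n):.2e}")
```

Of course, this only covers passive copying noise; it says nothing about the environment-driven randomness mentioned above, which seems like the harder part.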
In abstract terms, evolutionary dynamics require either a smooth adaptive landscape such that incremental changes drive organisms towards adaptive peaks and/or unlikely leaps away from local optima into attraction basins of other optima. In principle AI systems could exist that stay in safe local optima and/or have very low probabilities of jumps to unsafe attraction basins.
It has to be smooth relative to the jumps that can be achieved by whatever is generating the variation. Natural mutations don’t typically produce large jumps. But a small change in motivation for an intelligent system may cause a large shift in behaviour.
I believe that natural selection requires a population of “agents” competing for resources. If we only had a single AI system then there is no competition and no immediate adaptive pressure.
I thought so too to start with. I still don’t know what the right conclusion is, but I think that substrate-needs convergence is at least still a risk even with a singleton. Something that is smart enough to be a general intelligence is probably complex enough to have internal parts that operate semi-independently, and therefore these parts can compete with each other.
I think the singleton scenario is the most interesting, since I think that if we have several competing AIs, then we are just super doomed.
And by singleton I don’t necessarily mean a single entity. It could also be a single alliance. The boundary between group and individual might not be as clear with AIs as with humans.
Other dynamics will be at play which may drown out natural selection. There may be dynamics that occur at much faster timescales than this kind of natural selection, such that adaptive pressure towards resource accumulation cannot get a foothold.
This will probably be correct for a time. But will it be true forever? One of the possible end goals for alignment research is to build the aligned superintelligence that saves us all. If substrate convergence is true, then this end goal is off the table. Because even if we reach this goal, the system will inevitably either start to value-drift towards self-replication, or get eaten from the inside by parts that have mutated towards self-replication (AI cancer), or something like that.
Other dynamics may be at play that can act against natural selection. We see existence-proofs of this in immune responses against tumours and cancers. Although these don’t work perfectly in the biological world, perhaps an advanced AI could build a type of immune system that effectively prevents individual parts from undergoing runaway self-replication.
Cancer is an excellent analogy. Humans defeat it in a few ways that work together:
1. We have evolved to have cells that mostly don’t defect.
2. We have an evolved immune system that attacks cancer when it does happen.
3. We have developed technology to help us find and fight cancer when it happens.
4. When someone gets cancer anyway and it can’t be defeated, only that person dies; it doesn’t spread to other individuals.
Point 4 is very important. If there is only one agent, this agent needs perfect cancer-fighting ability to avoid being eaten by natural selection. The big question to me is: Is this possible?
If, on the other hand, you have several agents, then you definitely don’t escape natural selection, because these entities will compete with each other.
Projects I would like to see (possibly at AI Safety Camp)
I got into AI Safety. My interest in AI Safety lured me to a CFAR workshop, since it was a joint event with MIRI. I came for the Agent Foundations research, but the CFAR workshop turned out just as valuable. It helped me start to integrate my intuitions with my reasoning, through IDC and other methods. I’m still in AI Safety, mostly organising, but also doing some thinking, and still learning.
My resume lists all the major things I’ve been doing. Not the most interesting format, but I’m probably not going to write anything better anytime soon.
Resume—Linda Linsefors—Google Docs
We don’t know why the +2000 vector works but the +100 vector doesn’t.
My guess is that it’s because in the +100 case the two vectors are very similar, causing their difference to be something unnatural.
“I talk about weddings constantly” and “I do not talk about weddings constantly” are technically opposites. But if you imagine someone saying this, you notice that their natural language meaning is almost identical.
What sort of person says “I do not talk about weddings constantly”? That sounds to me like someone who talks about weddings almost constantly. Why else would they feel the need to say that?
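One way to check this guess would be to compare the cached residual-stream activations of the two prompts directly. A minimal sketch, assuming the transformer_lens library and GPT-2-XL (my own code, not anything from the post):

```python
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2-xl")
layer = 6  # the quoted example injects right before layer 6

# Cache the residual-stream activations of the two contrast prompts.
_, cache_pos = model.run_with_cache("I talk about weddings constantly")
_, cache_neg = model.run_with_cache("I do not talk about weddings constantly")

# Compare the vectors at the last token position of each prompt.
v_pos = cache_pos["resid_pre", layer][0, -1]
v_neg = cache_neg["resid_pre", layer][0, -1]

cos = torch.nn.functional.cosine_similarity(v_pos, v_neg, dim=0)
print(f"cosine similarity at layer {layer}: {cos.item():.3f}")
# A value close to 1 would mean the two prompts are nearly parallel in the
# residual stream, so their difference is a small, possibly noisy direction.
```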
To steer a forward pass with the “wedding” vector, we start running an ordinary GPT-2-XL forward pass on the prompt “I love dogs” until layer 6. Right before layer 6 begins, we now add in the cached residual stream vectors from before:
I have a question about the image above this text.
Why do you add the embedding from the [<endoftext> → “The”] stream? This part has no information about weddings.
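For anyone who wants to play with the quoted procedure, here is a minimal sketch of this kind of activation addition, again assuming transformer_lens (my own reconstruction, with a made-up coefficient and a crude way of aligning the two prompts, not the authors’ actual code):

```python
import torch
from transformer_lens import HookedTransformer

torch.set_grad_enabled(False)  # inference only
model = HookedTransformer.from_pretrained("gpt2-xl")
layer = 6    # inject right before layer 6, as in the quoted example
coeff = 4.0  # steering coefficient (hypothetical value, tune as needed)

# Cache residual-stream activations for the two contrast prompts.
_, cache_pos = model.run_with_cache("I talk about weddings constantly")
_, cache_neg = model.run_with_cache("I do not talk about weddings constantly")

pos = cache_pos["resid_pre", layer]
neg = cache_neg["resid_pre", layer]
n = min(pos.shape[1], neg.shape[1])            # crude alignment: front-align and truncate
steering = coeff * (pos[:, :n] - neg[:, :n])   # shape [1, n, d_model]

def add_steering(resid_pre, hook):
    # Only modify the initial pass over the full prompt; during cached
    # generation each later pass sees a single new token, which we leave alone.
    if resid_pre.shape[1] > 1:
        m = min(resid_pre.shape[1], steering.shape[1])
        resid_pre[:, :m] += steering[:, :m].to(resid_pre.device)
    return resid_pre

with model.hooks(fwd_hooks=[(f"blocks.{layer}.hook_resid_pre", add_steering)]):
    print(model.generate("I love dogs", max_new_tokens=40))
```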
I had a bit of trouble hearing the difference in voice between Trump and Biden, at the start. I solved this by actually imagining the presidents. Not visually, since I’m not a visual person, just loading up the general gestalt of their voices and typical way of speaking into my working memory.
Another way to put it: When I asked myself “which of the voices I heard so far is this?” I sometimes could not tell. But when I asked myself “who is this among Obama, Trump and Biden?” it was always clear.
If you think it would be helpful, you are welcome to suggest a metaphilosophy topic for AI Safety Camp.
More info at aisafety.camp. (I’m typing on a phone, I’ll add an actual link later if I remember to.)
Apply to lead a project during the next virtual AI Safety Camp
This is a good point. I was thinking in terms of legal vs informal, not in terms of written vs verbal.
I agree that having something written down is basically always better. Both for clarity, as you say, and because people’s memories are not perfect. And it has the added bonus that if there is a conflict, you have something to refer back to.
Thanks for adding your perspective.
If @Rob Bensinger does in fact cross-post Linda’s comment, I request he cross-posts this, too.
I agree with this.
I’m glad you liked it. You have my permission to cross-post.
Thanks for writing this post.
I’ve heard enough bad stuff about Nonlinear from before that I was seriously concerned about them. But I did not know what to do, especially since part of their bad reputation is about attacking critics, and I don’t feel well positioned to take on that fight.
I’m happy some of these accusations are now out in the open. If it’s all wrong and Nonlinear is blame-free, then this is their chance to clear their reputation.
I can’t say that I will withhold judgment until more evidence comes in, since I already made a preliminary judgment even before this post. But I can promise to be open to changing my mind.
I have worked without legal contracts for people in EA I trust, and it has worked well.
Even if all the accusations against Nonlinear are true, I still have pretty high trust in people in EA or LW circles, such that I would probably agree to work with no formal contract again.
The reason I trust people in my ingroup is that if either of us screws over the other person, I expect the victim to tell their friends, which would ruin the reputation of the wrongdoer. For this reason both people have a strong incentive to act in good faith. On top of that, I’m willing to take some risk to skip the paperwork.

When I was a teenager I worked a bit under legally very sketchy circumstances. They would send me to work in some warehouse for a few days, and draw up the contract for that work afterwards. Including me falsifying the date of my signature. This is not something I would have agreed to with a stranger, but the owner of the company was a friend of my parents, and I trusted my parents to slander them appropriately if they screwed me over.
I think my point is that this is not very uncommon, because doing everything by the book is so much overhead, and sometimes not worth it.
I think being able to leverage reputation-based and/or ingroup-based trust is immensely powerful, and not something we should give up on.
For this reason, I think the most serious sin committed by Nonlinear is their alleged attempt at silencing critics.
Update to clarify: This is based on the fact that people have been scared of criticising Nonlinear. Not based on any specific wording of any specific message.
Update: On reflection, I’m not sure if this is the worst part (if all accusations are true). But it’s pretty high on the list.
I don’t think making sure that no EA ever gives paid work to another EA without a formal contract will help much. The most vulnerable people are those new to the movement, who are exactly the people who will not know what the EA norms are anyway. An abusive org can still recruit people without contracts and just tell them this is normal.
I think a better defence mechanism is to track who is trustworthy or not, by making sure information like this comes out. And it’s not like having a formal contract prevents all kinds of abuse.
Update based on responses to this comment: I do think having a written agreement, even just an informal expression of intentions, is almost always strictly superior to not having anything written down. When writing this comment I was thinking in terms of formal contract vs informal agreement, which is not the same as verbal vs written.
How teams went about their research at AI Safety Camp edition 8
But I think orgs are more likely to be well-known to grant-makers on average given that they tend to have a higher research output,
I think you’re getting the causality backwards. You need money first, before there is an org. Unless you count informal multi-person collaborations as orgs.
I think people who are more well-known to grant-makers are more likely to start orgs. Whereas people who are less well-known are more likely to get funding at all if they aim for a smaller grant, i.e. as an independent researcher.
It looks like this to me:
Where’s the colourful text?
Is it broken or am I doing something wrong?