Hi, I am a Physicist, an Effective Altruist, and an AI Safety student/researcher.
And… my guess in hindsight is that the “internal double crux” technique often led, in practice, to people confusing/overpowering less verbal parts of their mind with more-verbal reasoning, even in cases where the more-verbal reasoning was mistaken.
I’m confused about this. The way I remember it though was very much explicitly against this, i.e.:
Be open to either outcome being right.
Don’t let the verbal part give the non-verbal part a dumb name.
Make space for the non-verbal part to express itself in its natural modality, which is often inner sim.
For me, IDC was very helpful in teaching me how to listen to my non-verbal parts. Reflecting on it, I never spent much time on the actual cruxing. When IDC-ing I mostly spent time on actually hearing both sides, and when all the evidence is out, the outcome is most often obvious. But it was the IDC lesson and the Focusing lesson that taught me these skills. Actually, even more important than the skill itself was being shown that this was possible. For me, probably the most important CFAR lesson was the noticing and “double-clicking” on intrusion; the one where Anna puts a glass of water on the edge of a table and/or writes expressions with the wrong number of parentheses.

Do most people come away from a CFAR workshop listening less to their non-verbal parts? I’m not surprised if some people listening less to their non-verbal parts happens at all, but I would be surprised if that’s the general trend.
On the surface Anna provides one datapoint, which is not much. But the fact that she brings up this datapoint makes me suspect it’s representative? Is it?
I timed how long it took me to fill in the survey. It took 30 min. I could probably have done it in 15 min if I had skipped the optional text questions. This is to be expected, however. Every time I’ve seen someone guess how long it will take to respond to their survey, it’s been off by a factor of 2-5.
This is a one-off thing though. We’re not likely to continue to pay them, regardless of what they report.
I just found this post (yesterday) while searching the EA Forum archives for something else.
I’ve been co-organising AISC1 (2018), AISC8 (2023) and AISC9 (2024). This means that I was not involved when this was posted, which is why I missed it.
What you describe fits very well with my own view of AISC, which is reassuring.
This depends on how much you trust the actors involved.
I know that Remmelt and I asked for an honest evaluation, and did not try to influence the result. But you don’t know this.
Remmelt and I obviously believe in AISC, otherwise we would not keep running these programs. But since AISC has been chronically understaffed (like most non-profit initiatives), we have not had time to do a proper follow-up study. When we asked Arb to do this assessment, it was in large part to test our own beliefs. So far nothing surprising has come out of the investigation, which is reassuring. But if Arb found something bad, I would not want them to hide it.
Here are some other evaluations of AISC (and other things) that were not commissioned by us. I think for both of them, the authors did not even talk to someone from AISC before posting, although for the second link this was only due to miscommunication.
Takeaways from a survey on AI alignment resources — EA Forum (effectivealtruism.org)
Thoughts on AI Safety Camp — LessWrong
Exactly what the minimum amount needed to organise an AISC is, is a bit complicated.
We could do a super-budget version for under $58k which is even more streamlined. This would cut into quality, however. But the bigger problem is this (just speaking for myself):
If AISC pays enough for me to live frugally on this salary for the rest of the year, then I can come back and organise another one. (And as a bonus, the world also gets whatever else I do during the rest of the year, which will probably also be AI safety related.)
If that is not the case, I need to have a different primary income, and then I can’t promise I’ll be available for AISC.
Exactly what is that threshold? I don’t know. It depends on my partner’s income, which is also a bit uncertain.
If I’m not available, is it possible to get someone else? Maybe; I’m not sure. My role requires both organising skill and AI safety knowledge. Most people who are qualified are busy. Also, a new person would initially have to put in more hours. Remmelt and I have a lot of experience doing AISC together, which means we can get it done quicker than someone new could.
We’re also fundraising on our website: aisafety.camp

I think that Remmelt chose the $28k threshold hoping we’ll get some money through other channels too. Currently we have received ~$5.5k in donations not through Manifund.
If we get to the $28k threshold, and nothing more, we’ll try to do something approximately like a next AISC, somehow. But in that case I’ll probably quit after that.
Thanks Thomas for asking these questions.
I think some of these are common concerns about AISC, partly because we have not always been very clear in our communication. This was a good opportunity for us to clarify.
Why does the founder, Remmelt Ellen, keep posting things described as “content-free stream of consciousness”, “the entire scientific community would probably consider this writing to be crankery”, or so obviously flawed it gets −46 karma? This seems like a concern especially given the philosophical/conceptual focus of AISC projects, and the historical difficulty in choosing useful AI alignment directions without empirical grounding.
I see your concern.
Remmelt and I have different beliefs about AI risk, which is why the last AISC was split into two streams. Each of us is allowed to independently accept projects into our own stream.
Remmelt believes that AGI alignment is impossible, i.e. there is no way to make AGI safe. Exactly why Remmelt believes this is complicated, and something I myself am still trying to understand; however, this is actually not very important for AISC.
The consequence of this for AISC is that Remmelt is only interested in projects that aim to stop AI progress.
I still think that alignment is probably technically possible, but I’m not sure. I also believe that even if alignment is possible, we need more time to solve it. Therefore, I see projects that aim to stop or slow down AI progress as good, as long as there are no overly large adverse side-effects, and I’m happy to have Remmelt and the projects in his stream as part of AISC. Not to mention that Remmelt and I work really well together, despite our different beliefs.
If you check our website, you’ll also notice that most of the projects are in my stream. I’ve been accepting any project as long as there is a reasonable plan, there is a theory of change under some reasonable and self-consistent assumptions, and the downside risk is not too large.
I’ve bounced around a lot in AI safety, trying out different ideas and starting more research projects than I finished, which has given me a wide view of different perspectives. I’ve updated many times in many directions, which has left me with wide uncertainty as to which perspective is correct. This is reflected in which projects I accept to AISC. I believe in a “let’s try everything” approach.
At this point, someone might think: if AISC is not filtering projects more than just “seems worth a try”, then how does AISC make sure not to waste participants’ time on bad projects?
Our participants are adults, and we treat them as such. We do our best to present what AISC is, and what to expect, and then let people decide for themselves if it seems like something worth their time.
We also require research leads to do the same, i.e. the project plan has to provide enough information for potential participants to judge whether this is something they want to join.
I believe there is a significant chance that the solution to alignment is something no one has thought of yet. I also believe that the only way to do intellectual exploration is to let people follow their own ideas, and to avoid top-down curation.
The only thing I filter hard for in my stream is that the research lead actually needs to have a theory of change. They need to have actually thought about AI risk, and about why their plan could make a difference. I have had this conversation with every research lead in my stream.
We had one person last AISC who said that they regretted joining AISC, because they could have learned more from spending that time on other things. I take that feedback seriously. But on the other hand, I regularly meet alumni who tell me how useful AISC was for them, which convinces me AISC is clearly very net positive.
However, if we were not understaffed (due to being underfunded), we could do more to support the research leads in making better projects.
All but 2 of the papers listed on Manifund as coming from AISC projects are from 2021 or earlier. Because I’m interested in the current quality in the presence of competing programs, I looked at the two from 2022 or later: this in a second-tier journal and this in a NeurIPS workshop, with no top conference papers. I count 52 participants in the last AISC so this seems like a pretty poor rate, especially given that 2022 and 2023 cohorts (#7 and #8) could both have published by now.
[...] They also use the number of AI alignment researchers created as an important metric. But impact is heavy-tailed, so the better metric is value of total research produced. Because there seems to be little direct research, to estimate the impact we should count the research that AISC alums from the last two years go on to produce. Unfortunately I don’t have time to do this.
That list of papers is the direct research output of AISC. Many of our alumni have lots of publications not on that list; for example, I looked up Marius Hobbhahn—Google Scholar.

Just looking at the direct project outputs is not a good metric for evaluating AISC, since most of the value comes from the upskilling. Counting the research that AISC alumni have done since AISC is not a bad idea, but as you say, it is a lot more work; I imagine this is partly why Arb chose to do it the way they did.
I agree that heavy-tailedness in research output is an important consideration. AISC does have some very successful alumni; if we didn’t, this would be a major strike against AISC. The thing I’m less certain of is to what extent these people would have succeeded without AISC. This is obviously a difficult thing to evaluate, but still worth trying.
Mostly we let Arb decide how best to do their evaluation, but I’ve specifically asked them to interview our most successful alumni, to at least get these people’s estimates of the importance of AISC. The results of this will be presented in their second report.
MATS has steadily increased in quality over the past two years, and is now more prestigious than AISC. We also have Astra, and people who go directly to residencies at OpenAI, Anthropic, etc. One should expect that AISC doesn’t attract the best talent.
There is so much wrong here, I don’t even know how to start (i.e. I don’t know what the core cruxes are) but I’ll give it a try.
AISC is not MATS, because we’re not trying to be MATS.
MATS is trying to find the best people and have them mentored by the best mentors, in the best environment. This is great! I’d recommend MATS to anyone who can get in. However, it’s not scalable. After MATS has taken the top talent and mentors, there are still dozens of people who can mentor and would be happy to do so, and hundreds of people who are worth mentoring.

To believe that a MATS-style program is the only program worth running, you have to believe that:
1. Only the top talent matters.
2. MATS and similar programs have perfect selection, i.e. no one worth accepting is ever rejected.
I’m not going to argue about 1. I suspect it’s wrong, but I’m not very sure.
However, believing in 1 is not enough. You also need 2, and believing in 2 is kind of insane. I don’t know how else to put it. Sorry.
You’re absolutely correct that AISC has lower average talent. But because we have a lower bar, we get the talent that MATS and other prestigious programs are missing.
AISC is this way by design. The idea of AISC is to give as many people as we can the chance to join the AI safety effort: to test the waters, or to show the world what they can do, or to get inspiration to do something else.
And I’m not even addressing the accessibility of a part-time online program. There are people who can’t join MATS and similar programs because they can’t take the time to do so, but who can join AISC.
Also, if you believe strongly in MATS’ ability to select for talent, then consider that some AISC participants go on to attend MATS later. I think this fact proves my point: AISC can support people that MATS’ selection process doesn’t yet recognise.
If so, AISC might not make efficient use of mentor / PI time, which is a key goal of MATS and one of the reasons it’s been successful.
This is again missing the point. The deal AISC offers our research leads is that they provide a project and we help them find people to work with them. So far our research leads have been very happy with this arrangement.
MATS is drawing its mentors from a small pool of well-known people. This means that they have to make the most out of a very scarce resource. We’re not doing that.
AISC has an open application for people interested in leading a project. This way we get research leads you’ve never heard of, and who are happy to spend time on AISC in exchange for extra hands on their projects.
One reason AISC is much more scalable than MATS is that we’re drawing from a much larger pool of “mentors”.
At this point, someone might think: so AISC has inexperienced mentors leading inexperienced participants. How does this possibly go well?

This is not a trivial question, and it is a big part of what the current version of AISC is focused on solving. First of all, a research lead is not the same as a mentor. Research leads are welcome to provide mentorship to their participants, but that’s not their main role.
The research lead’s role is to suggest a project and formulate a project plan, and then to lead that project. This is actually much easier to do than to provide general mentorship.
A key part of this is the project plan. As part of the application process for research leads, we require them to write down a project plan. When necessary, we help them with this.
Another key part of how AISC is successful with less experienced “mentors” is that we require our research leads to take an active part in their projects. This obviously takes up more of their time, but it also makes things work better, and to a large extent makes up for the research leads being less experienced than in other programs. And as mentioned, we get lots of project leads who are happy with this arrangement.
What the participants get is learning by doing, by being part of a project that at least aims to reduce AI risk.
Some of our participants come from AI Safety Fundamentals and other such courses. Others are professionals with various skills and talents, but not yet much involvement in AI safety. We help these people take the step from AI safety student or AI-safety-concerned professional to someone who actually does something. Going from just thinking and learning to actively engaging is a very big step, and a lot of people would not have taken that step, or would have taken it later, if not for AISC.
MIRI’s impossibility results
Which are these? I’m aware of a lot of MIRI’s work, especially pre-2018, but nothing I would label “impossibility results”.
Current Interpretability results suggest that roughly the first half of the layers in an LLM correspond to understanding the context at increasingly abstract levels, and the second half to figuring out what to say and turning that back from abstractions into concrete tokens. It’s further been observed that in the second half, figuring out what to say generally seems to occur in stages: first working out the baseline relevant facts, then figuring out how to appropriately slant/color those in the current context, then converting these into the correct language, and last getting the nitty-gritty details of tokenization right.
How do we know this? This claim seems plausible, but also I did not know that mech-interp was advanced enough to verify something like this. Where can I read more?
It looks like this to me:
Where’s the colourful text? Is it broken, or am I doing something wrong?
Potentially we might be ok with it if the expected timescale is long enough (or the probability of it happening in a given timescale is low enough).
Agreed. I’d love for someone to investigate the possibility of slowing down substrate-convergence enough for it to be basically solved.
If that’s true, then that is a super important finding! And also an important thing to communicate to people! I hear a lot of people say the opposite: that we need lots of competing AIs.
Hm, to me this conclusion seems fairly obvious. I don’t know how to communicate it though, since I don’t know what the crux is. I’d be up for participating in a public debate about this, if you can find me an opponent. Although not until after AISC research lead applications are over and I’ve had some time to recover, so maybe late November at the earliest.
I’ve made an edit to remove this part.
Inner alignment asks the question—“Is the model trying to do what humans want it to do?”
This seems inaccurate to me. An AI can be inner aligned and still not aligned, if we solve inner alignment but mess up outer alignment. This text also shows up in the outer alignment tag: Outer Alignment—LessWrong
An approach could be to say under what conditions natural selection will and will not sneak in.
Natural selection requires variation. Information theory tells us that all information is subject to noise and therefore variation across time. However, we can reduce error rates to arbitrarily low probabilities using coding schemes. Essentially this means that it is possible to propagate information across finite timescales with arbitrary precision. If there is no variation then there is no natural selection.
Yes! The big question to me is whether we can reduce error rates enough. And “error rates” here means not just hardware signal errors, but also randomness that comes from interacting with the environment.
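As a rough illustration of the coding-scheme point: assume a hypothetical independent bit-flip probability $p$ per copy. With a simple 3-fold repetition code, a majority vote only fails if at least two of the three copies flip, so

$$p_{\text{err}} = \binom{3}{2}p^2(1-p) + p^3 = 3p^2 - 2p^3,$$

which for small $p$ is much smaller than $p$ itself, and using $n$ copies drives the failure probability down exponentially in $n$, at the cost of redundancy. So arbitrarily low error rates are possible in principle; the open question is whether “low enough” is achievable once environment-driven randomness is included.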
In abstract terms, evolutionary dynamics require either a smooth adaptive landscape such that incremental changes drive organisms towards adaptive peaks and/or unlikely leaps away from local optima into attraction basins of other optima. In principle AI systems could exist that stay in safe local optima and/or have very low probabilities of jumps to unsafe attraction basins.
It has to be smooth relative to the jumps that can be achieved by whatever is generating the variation. Natural mutations don’t typically make large jumps. But a small change in the motivation of an intelligent system may cause a large shift in behaviour.
I believe that natural selection requires a population of “agents” competing for resources. If we only had a single AI system then there is no competition and no immediate adaptive pressure.
I thought so too to start with. I still don’t know what the right conclusion is, but I think that substrate-needs convergence is at least still a risk even with a singleton. Something that is smart enough to be a general intelligence is probably complex enough to have internal parts that operate semi-independently, and these parts can therefore compete with each other.
I think the singleton scenario is the most interesting, since I think that if we have several competing AIs, then we are just super doomed.
And by singleton I don’t necessarily mean a single entity. It could also be a single alliance. The boundary between group and individual might not be as clear with AIs as with humans.
Other dynamics will be at play which may drown out natural selection. There may be dynamics that occur at much faster timescales that this kind of natural selection, such that adaptive pressure towards resource accumulation cannot get a foothold.
This will probably be correct for a time. But will it be true forever? One of the possible end goals for alignment research is to build the aligned superintelligence that saves us all. If substrate convergence is true, then this end goal is off the table, because even if we reach it, the system will inevitably start to either value-drift towards self-replication, or get eaten from the inside by parts that have mutated towards self-replication (AI cancer), or something like that.
Other dynamics may be at play that can act against natural selection. We see existence-proofs of this in immune responses against tumours and cancers. Although these don’t work perfectly in the biological world, perhaps an advanced AI could build a type of immune system that effectively prevents individual parts from undergoing runaway self-replication.
Cancer is an excellent analogy. Humans defeat it in a few ways that work together:
1. We have evolved to have cells that mostly don’t defect.
2. We have an evolved immune system that attacks cancer when it does happen.
3. We have developed technology to help us find and fight cancer when it happens.
4. When someone gets cancer anyway and it can’t be defeated, only that person dies; it doesn’t spread to other individuals.
Point 4 is very important. If there is only one agent, this agent needs perfect cancer-fighting ability to avoid being eaten by natural selection. The big question to me is: is this possible?

If, on the other hand, you have several agents, then you definitely don’t escape natural selection, because these entities will compete with each other.