Why do you say it isn’t an emotional state?
I’ve always found the concept of belief in belief slightly hard to parse cognitively. Here’s what finally satisfied my brain: whether you will be rewarded or punished in heaven is tied to whether or not God exists, while whether or not you feel a push to go to church is tied to whether or not you believe in God. If you do go to church and want to go, your brain will say, “See, I really do believe”, and it’ll do the reverse if you don’t go. However, this only affects your belief in God indirectly, through your “I believe in God” node. Putting it another way, going to church is evidence that you believe in God, not evidence that God exists. Anyway, the result of all this is that your “I believe in God” node can become much stronger than your “God exists” node.
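In case it helps, here’s a toy numerical sketch of that node structure. The conditional probabilities are entirely made up, purely to illustrate the asymmetry: conditioning on going to church moves the “I believe in God” node a lot while barely moving the “God exists” node.

```python
# Toy model of the "belief in belief" node structure, with made-up numbers:
#   G = "God exists", B = "I believe in God", C = "I go to church".
P_G = 0.5                                # prior on G
P_B_GIVEN_G = {True: 0.7, False: 0.5}    # belief depends only weakly on G
P_C_GIVEN_B = {True: 0.9, False: 0.1}    # churchgoing depends strongly on B

def posterior(query, church=True):
    """P(query node is True | C = church), computed by full enumeration."""
    num = den = 0.0
    for g in (True, False):
        for b in (True, False):
            p = P_G if g else 1 - P_G
            p *= P_B_GIVEN_G[g] if b else 1 - P_B_GIVEN_G[g]
            p *= P_C_GIVEN_B[b] if church else 1 - P_C_GIVEN_B[b]
            den += p
            if (g if query == "G" else b):
                num += p
    return num / den

# Seeing yourself go to church moves the belief node a lot...
print(round(posterior("B"), 2))  # ~0.93 (up from a prior of 0.60)
# ...but only nudges the "God exists" node, and only via the belief node.
print(round(posterior("G"), 2))  # ~0.57 (up from a prior of 0.50)
```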
Are you able to expand any more on his thoughts about cybernetics/control theory? Plus can you tell me any more about what kind of Chesterton’s fences are being removed? Are these internal beliefs or are people being convinced to break social norms?
Thanks, but Twitter is an extremely inefficient manner of figuring out someone’s beliefs
“For the variants, I’m not proposing they ever get run”—that makes sense
I don’t have strong opinions on an A vs. B debate or a B vs. C debate. That was a detail I wasn’t paying much attention to. I was just proposing using two AIs with strength equivalent to A. One worry I have about making D create variants with known flaws is that some of them might exploit security holes, although maybe a normal AGI, being fully general, would be able to exploit security holes anyway.
A few thoughts:
Even if we could theoretically double output for a product, it doesn’t mean that there will be sufficient demand for it to be doubled. This potential depends on how much of the population already has thing X.
Even if we could effectively double our workforce, if we are mostly replacing low-value jobs, then our economy wouldn’t double.
Even if we could, say, halve the cost of producing robot workers, that might simply result in extra profits for a company instead of increasing the size of the economy.
Even if we have a technology that could double global output, it doesn’t mean that we could or would deploy it in that time, especially given that companies are likely to be somewhat risk averse and, worried about demand, not scale up as fast as possible. This is the weakest of the four arguments in my opinion, which is why it is last.
So economic progress may not accurately represent technological progress, meaning that if we use this framing we may get caught up in a bunch of economic debates instead of debates about capacity.
Thanks for mentioning conjunctive cruxes. That was always my biggest objection to this technique. At least when I went through CFAR, the training completely ignored this possibility. It was clear that it often worked anyway, but the impression I got was that the general frame mattered more than the precise methodology, which at that time still seemed to need refinement.
Hmm, the quote that demonstrates this issue the most is: “But there is a hidden problem with the observer technique, which becomes obvious once you think about it. Who is the observer? Who is this person who is behind the binoculars, watching your experience from the outside?”, but that is of course a quote rather than a piece of text you wrote yourself.
I also feel it applies somewhat to the discussion of the sense of looking out at the world from behind your eyes. I think you’re implying that the fact that we can observe this sense implies that it comes from a sub-agent separate from the system doing the observing, but reflective programs seem to demonstrate that this isn’t necessarily the case.
Thanks for writing! This is far clearer than most explanations and has some helpful analogies. I think it is possible to be even clearer though, which is important for topics like this which are inherently ambiguous. For example, one place where you could have been more precise is the discussion around self-reference. There are such things as reflection in programming languages, so we have to be careful when saying what a process can or can’t observe about itself. Additionally, multi-agent systems don’t necessarily imply no self—it may be that we only identify with one of the agents.
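To illustrate the reflection point, here is a minimal Python sketch of my own (not something from the post): a single process can inspect its own definition without any separate observer component.

```python
import inspect

def introspect():
    """A single function that observes facts about its own definition."""
    # The process doing the observing and the thing being observed are the
    # same entity: no separate "observer" sub-agent is required.
    own_source = inspect.getsource(introspect)
    return introspect.__name__, "observer" in own_source

print(introspect())  # ('introspect', True)
```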
Pet theory about meditation: Lots of people say that if you do enough meditation you will eventually realise that there isn’t a self. Having not experienced this myself, I am intensely curious about what people observe that persuades them to conclude this. I guess I get a sense that many people are being insufficiently skeptical. There’s a difference between there not appearing to be such a thing as a self and a self not existing. Indeed, how do we know meditation doesn’t just temporarily silence whatever part of our mind is responsible for self-hood?
Recently, I saw a quote from Sam Harris that makes me think I might (emphasis on might) finally know what people are experiencing. In a podcast with Eric Weinstein, he explains that he believes there isn’t a self because “consciousness is an open space where everything is appearing—that doesn’t really answer to I or me”. The first part seems to mirror Global Workspace Theory, the idea (super roughly) that there is a part of the brain for synthesising thoughts from various parts of the brain which can only pay attention to one thought at a time.
The second part of Sam Harris’ sentence seems to say that this Global Workspace “doesn’t answer to I or me”. This is still vague, but it sounds like there is a part of the brain that identifies as “I or me” that is separate from this Global Workspace or that there are multiple parts that are separate from the Global Workspace and don’t identify as “I or me”. In the first of these sub-interpretations, “no-self” would merely mean that our “self” is just another sub-agent and not the whole of us. In the second of these sub-interpretations, it would additionally be true that we don’t have a unitary self, but multiple fragments of self-hood.
Anyway, as I said, I haven’t experienced no-self, but I’m curious to see if this resonates with people who have.
Thanks, glad you appreciate it!
“In particular, it’s a recent development that I would have noticed my friend’s unilateral demand for fairness as in fact tilted towards MAPLE”—To recast that perspective slightly more sympathetically, if applied consistently, it isn’t just tilted towards MAPLE but tilted towards “the defendant”. But beyond that, it has the advantage of reducing conflict. It has downsides too, as you’ve described.
Yeah, sorry, that’s a typo, fixed now.
Hey Vojta, thanks so much for your thoughts.
I feel slightly worried about going too deep into discussions along the lines of “Vojta reacts to Chris’ claims about what other LW people argue against hypothetical 1-boxing CDT researchers from classical academia that they haven’t met” :D.
Fair enough. Especially since this post isn’t so much about the way people currently frame their arguments as it is an attempt to persuade people to reframe the discussion around comparability.
My take on how to do counterfactuals correctly is that this is not a property of the world, but of your mental models
I feel similarly. I’ve explained my reasons for believing this in the Co-operation Game, Counterfactuals are an Answer, not a Question and Counterfactuals as a matter of Social Convention.
According to this view, counterfactuals only make sense if your model contains uncertainty...
I would frame this slightly differently and say that this is the paradigmatic case which forms the basis of our initial definition. I think the example of numbers can be instructive here. The first numbers to be defined are the counting numbers: 1, 2, 3, 4… It is then convenient to add fractions, then zero, then negative numbers, and eventually we extend to the complex numbers. In each case we’ve slightly shifted the definition of what a number is, and this choice is solely determined by convention. Of course, convention isn’t arbitrary, but determined by what is natural.
Similarly, the cases where there is actual uncertainty provide the initial domain over which we define counterfactuals. And we can then try to extend this as you are doing above. I see this as a very promising approach.
A lot of what you are saying there aligns with my most recent research direction (Counterfactuals as a matter of Social Convention), although it has unfortunately stalled with coronavirus and my focus being mostly on writing up my ideas from the AI safety program. There seem to be a bunch of properties that make a situation more or less likely to be accepted by humans as a valid counterfactual. I think it would be viable to identify the main factors, with the actual weighting being decided by each human. This would acknowledge the subjective, constructed nature of counterfactuals, but also the objective elements with real implications that keep this from being a completely arbitrary choice. I would be keen to discuss further/bounce ideas off each other if you’d be up for it.
Finally, when some counterfactual would be inconsistent with our model, we might take it for granted that we are supposed to relax M in some manner
This sounds very similar to the erasure approach I was previously promoting, but have since shifted away from. Basically, when I started thinking about it, I realised that only allowing counterfactuals to be constructed by erasing information didn’t match how humans actually use counterfactuals.
Second, when doing counterfactuals, we might take it for granted that you are to replace the actual observation history o by some alternative o′
This is much more relevant to how I think now.
I think that “a typical AF reader” uses a model in which “a typical CDT adherent” can deliberate, come to the one-boxing conclusion, and find 1M in the box, making the options comparable for “typical AF readers”. I think that “a typical CDT adherent” uses a model in which “CDT adherents” find the box empty while one-boxers find it full, thus making the options incomparable
I think that’s an accurate framing of where they are coming from.
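For what it’s worth, here is a toy Python formalisation of that contrast. It is my own sketch of the two models, assuming the standard Newcomb payoffs (box A always holds $1,000; box B holds $1,000,000 only if the predictor predicted one-boxing).

```python
def payoff(choice, box_b_full):
    """Standard Newcomb payoffs: box A holds 1_000; box B holds 1_000_000 iff full."""
    b = 1_000_000 if box_b_full else 0
    return b if choice == "one-box" else b + 1_000

# Model attributed to "a typical AF reader": the prediction tracks whatever
# you actually end up choosing, so your payoff depends only on your choice,
# which is what makes the options comparable.
def af_reader_model(choice):
    return payoff(choice, box_b_full=(choice == "one-box"))

# Model attributed to "a typical CDT adherent": the box was filled according
# to your fixed decision-theory "type", so outcomes differ by type and the
# options are not comparable across agents.
def cdt_model(choice, agent_type):
    return payoff(choice, box_b_full=(agent_type == "one-boxer"))

print(af_reader_model("one-box"))            # 1_000_000
print(af_reader_model("two-box"))            # 1_000
print(cdt_model("two-box", "CDT adherent"))  # 1_000: the box is empty regardless
print(cdt_model("one-box", "one-boxer"))     # 1_000_000
```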
The third question I didn’t understand.
What was unclear? I made one typo where I said an EDT agent would smoke when I meant they wouldn’t smoke. Is it clearer now?
I honestly have no idea how he’d answer, but here’s one guess. Maybe we could tie prime numbers to one of a number of processes for determining primeness. We could observe that those processes always return true for 5, so in a sense primeness is a property of five.
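To make that guess concrete, here is a toy Python sketch (my own illustration, not something Wittgenstein or anyone in this discussion proposed) of two different processes for determining primeness, both of which agree on 5:

```python
def is_prime_trial_division(n):
    """Decide primality by checking divisors up to sqrt(n)."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def is_prime_sieve(n):
    """Decide primality by building a sieve of Eratosthenes up to n."""
    if n < 2:
        return False
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            for j in range(i * i, n + 1, i):
                sieve[j] = False
    return sieve[n]

# Every such process agrees on 5, which is one way of cashing out
# "primeness is a property of five" in terms of use rather than abstraction.
assert is_prime_trial_division(5) and is_prime_sieve(5)
```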
Wittgenstein didn’t think that everything was a command or request; his point was that making factual claims about the world is just one particular use of language that some philosophers (including early Wittgenstein) had hyper-focused on.
Anyway, his claim wasn’t that “five” was nonsense, just that when we understood how five was used there was nothing further for us to learn. I don’t know if he’d even say that the abstract concept five was nonsense, he might just say that any talk about the abstract concept would inevitably be nonsense or unjustified metaphysical speculation.
Ah, I think I now get where you are coming from
I guess what is confusing me is that you seem to have provided a reason why we shouldn’t just care about high-level functional behaviour (because this might miss correlations between the low-level components), then in the next sentence you’re acting as though this is irrelevant?
I won’t pretend that I have a strong understanding here, but as far as I can tell, (Later) Wittgenstein and the Ordinary Language Philosophers considered our conception of the number “five” existing as an abstract object to be mistaken, and would instead explain how it is used and consider that a complete explanation. This isn’t an unreasonable position; I honestly don’t know what numbers are, and if we say they are an abstract entity it’s hard to say what kind of entity.
Regarding the word “apple”, Wittgenstein would likely say attempts to give it a precise definition are doomed to failure because there is an almost infinite number of contexts or ways in which it can be used. We can strongly state “Apple!” as a kind of command to give us one, or shout it to indicate “Get out of the way, there is an apple coming towards you” or “Please, I need an apple to avoid starving”. But this is only saying that attempts to spec out a precise definition are confused, not that the underlying thing itself is.
(Actually, apparently Wittgenstein considered attempts to talk about concepts like God or morality as necessarily confused, but thought that they could still be highly meaningful, possibly the most meaningful things)