I have signed no contracts or agreements whose existence I cannot mention.
plex
Once you get a sense of how annealing feels, you can, imo, do it much more safely without the psychedelics, using forms of meditation practice centered on noticing what causes clean versions of the qualia associated with annealing. Non-goal-directedness seems central.
glad people are noticing. it won’t be enough to stop all leaks though, realistically.
it’s fun how all the safety worries and intricate plans to prevent failure modes tend to get invalidated by “humans do the thing that bypasses the guardrails”. e.g. for years people would say things like “of course we won’t connect it to the internet/let it design novel viruses/proteins/make lethal autonomous weapons”.
my guess is the law of less dignified failure has a lot of truth to it.
Maybe the vote up / down option could be moved to after the body of the post? It does seem like an awkward set of design trade-offs: you want people to see the current score before reading, but you also don’t want to split the current score from the vote buttons or duplicate the score. I’d bet Habryka has thought about this already.
I agree with the main generator of this post (a small number of people produce a wildly disproportionate amount of the intellectual progress on hard problems) and one of the conclusions (don’t water down your messages at all; if people need watered-down messages they are unlikely to be helpful), but I think there’s significant value in trying to communicate the hard problem of alignment broadly anyway because:
Filtering for who the best people are is expensive and error-prone, so if you don’t put the correct models in general circulation, even pretty great people might just not become aware of them
People who are highly competent but not highly confident often seem to run into people who have been misinformed and become less sure of their own positions; having the main threat models in wider circulation would help those people get less distracted
Planting lots of seeds can be relatively cheap.
Also, related anecdote: I ran ~8 retreats at my house covering around 60 people in 2022/23. I got a decent read on how much of the core stack of alignment concepts at least half of them had, and how often they made hopeful mistakes which were transparently going to fail because they hadn’t picked up the core ideas from Arbital or clearly understood the top ~10 alignment-related concepts. Only two cleared this bar.
Also, relatedly, the people you left Bluedot to don’t seem to be reliably teaching people the core things they need to learn. They are friendly and receptive each time I get on calls with them and ask them to fix their courses, and they often do fix some of the stuff, but some of the core generators look to me like they’re just missing from the people picking course materials, and lots of people are getting watered-down versions of alignment because of this. Consider taking a skim through their courses and advising them on learning objectives etc.; you’re probably the best-placed person to do this.
Giving money goes through several layers of indirection and lost effectiveness. It’s good as a fallback and self-signal, but if you can find worthwhile things to do yourself, and motivate yourself to do them, you can do much more with much less money.
Stuart Armstrong does a pretty good job of making non-world-critical puzzles seem appealing in Just another day in utopia. I agree there’s real non-confused value lost, but only a pretty small fraction of the value for most people, I think?
Comet (solstice reading)
(also you did literally go into a form of policy advocacy via the route in this post)
Reasonable point, fixed.
Agree, money is technically abundant now that OP and other donors have flooded the ecosystem, though well-directed money is semi-scarce, and vetting/mentorship seems more bottleneck-y.
AI Safety Info (Robert Miles)
Focus: Making YouTube videos about AI safety, starring Rob Miles
Leader: Rob Miles
Funding Needed: Low
Confidence Level: High
I think these are pretty great videos in general, and given what it costs to produce them we should absolutely be buying their production. If there is a catch, it is that I am very much not the target audience, so you should not rely too much on my judgment of what is and isn’t effective video communication on this front, and you should confirm you like the cost per view.
These are two separate-ish projects: Rob Miles makes videos, and Rob Miles is the project owner of AISafety.info, mostly in an advisory role. Rob Miles personally is not urgently in need of funding afaik, but will need to reapply soon. AISafety.info is in need of funding, and recently had a funding crunch which caused several staff members to be dropped from payroll. AISafety.info writers have helped Rob with scriptwriting some, but it’s not their main focus. Donate link for AI Safety Info.
Long Term Future Fund
One question is, are the marginal grants a lot less effective than the average grant?
Given their current relationship to EA funds, you likely should consider LTFF if and only if you both want to focus on AI existential risk via regrants and also want to empower and strengthen the existing EA formal structures and general ways of being.
That’s not my preference, but it could be yours.
As I understood it, cG defunded LTFF, and LTFF also has very little money and is fairly Habryka-influenced, so this seems to be missing the mark?
CEEALAR / EA Hotel
I loved the simple core concept of a ‘catered hotel’ where select people can go to be supported in whatever efforts seem worthwhile. They are now broadening their approach, scaling up and focusing on logistical and community supports, incubation and a general infrastructure play on top of their hotel. This feels less unique to me now and more of a typical (EA UK) community play, so you should evaluate it on that basis.
Having got a read on the ground: the previous value proposition is still very much going strong, and there are no plans to remove it. As I understand it, the basic mid-term plan is to have a mix of residencies and live-in mentor-ish people who are doing their own things, but to add some drives to pitch seasons for people with overlapping interests so that there’s greater opportunity for cross-pollination. There are a bunch of other things that could come together as extras, but the team here is keen to keep the things the community knows and loves.
More significantly, the read on the ground I get is extremely positive, much more so than in previous years, now that it has active full-time management by someone with relevant experience and drive. Multiple people have said things at least as positive as “this place is life-changingly amazing, unblocked me from years-long dips, helped me get way more productive, etc.”, and it’s pretty clear the arc of things is spinning up towards people producing much more and better output than in previous years.
Attila’s drive to clear the backlog of work needed to get the EA Hotel organised and upgraded as a living space, plus increased intentionality around selection and getting higher deal-flow so that everyone here is agentic and competent, is giving this place an increasing amount of momentum. I honestly think the EA Hotel is one of the highest-EV places to support in the AI safety space right now. My guess is that within 3 months we will see several counterfactual outputs which would individually justify the relatively small budget of ~$350k/year to support 20-30 people with a low-hassle and awesome environment.
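(Rough arithmetic for scale: $350k/year spread over 20-30 people works out to roughly $12k-$17.5k per supported person-year.)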
(CoI: visiting and have friends here, but confident that I would make the same claims if this were not true)
Relatedly: Here’s my broken ambitious outer alignment plan: Universal Alignment Test. It’s not actually written up quite right to be a good exercise for the reader yet, but I mostly removed the spoilers.
If people want spoilers, I can give them, but I do not have bandwidth to grade your assignments and on the real test no one will be capable of doing so. Gl :)
In my three calls with cG following my post, which was fairly critical of them (and of almost all the other grantmakers), I’ve updated to something like:
cG is institutionally capable of funding the kinds of things that people with strong technical models of the hard parts of alignment think might be helpful. They mostly don’t, because most of the cG grantmakers don’t have those technical models (though some have a fair amount of the picture, including Jake, who is doing this hiring round).
My guess as to why they don’t is partly normal organizational-inertia stuff, but plausibly mostly that the kinds of conversations that would be needed to change it don’t happen very easily. Most of the people who talk to them are trying to get money for specific things, so the conversation is not very clean for general-purpose information transfer, since one party has an extremely strong interest in the object-level outcome. Also, most of the people who have the kinds of technical models I think are needed to make good calls are not very good at passing the ITT of prosaic empirical work, so the cG grantmakers probably end up frustrated and don’t rate the incoming models highly enough.
My guess is that getting a single cG grantmaker who deeply gets it, who has grounded confidence and a kind of truth-seeking that holds up even when the people around them disagree, and who can engage flexibly and with good humor to convey the models that a bunch of the most experienced people around here hold, would not just roughly double the amount of really well-directed dollars but also maybe shift other things in cG for the better.
I’ve sent them the list of my top ~10 picks and reached out to those people. Many don’t want to drop out of research or other roles entirely, but would be interested in a re-granting program, which seems like the best of both worlds.
I’d consider a job which leaves you slack to do other things a reasonable example of a financial safety net. Or even the ability to reliably get one if you needed it. Probably worth specifying in a footnote along with other types of safety net?
Suggest writing an exercise for the reader using this: first write up the core idea, why it seemed hopeful, and the formalism, then say “this is dangerously broken, please find the flaw without reading the spoilers”.
More broken ideas should do this; practice at red-teaming ambitious theory work is rare and important.
This is the scariest example of nominative determinism I have ever seen.
This seems like both a good process (using your existing knowledge to find good opportunities rather than running normal applications is in line with my guess at how high-EV grants happen) and a set of grantees I am generally glad to see funded.
This post was an experiment in trimming down to a very core point and making it cleanly, rather than covering lots of arguments for the thesis. I think it succeeded, and I mostly stand behind the main claim (interp is insufficient for saving the world and has strong potential to boost capabilities). On the downside, commenters raised other lines of reasoning for the dominance and harms of interp, such as that interp helps train people for normal ML jobs, or that interp is easy for labs to evaluate with their core competency.
I think I endorse making one clean point and letting the other angles bubble up in the comments over doing an extensive complicated article as is often seen.
I’m also pretty happy with the approach of making the straight readthrough as short as possible and dumping lots of bonus info into footnotes.
I broadly intend to use a similar style, though maybe to a lesser extent, going forwards.