What is the machine learning project that might be of use in AI Alignment?
Greg C
Not sure if it counts as an “out” (given I think it’s actually quite promising), but definitely something that should be tried before the end:
“To the extent we can identify the smartest people on the planet, we would be a really pathetic civilization were we not willing to offer them NBA-level salaries to work on alignment.”—Tomás B.
Megastar salaries for AI alignment work
[Summary from the FTX Project Ideas competition]
Aligning future superhuman AI systems is arguably the most difficult problem currently facing humanity, and the most important. In order to solve it, we need all the help we can get from the very best and brightest. To the extent that we can identify the absolute most intelligent, most capable, and most qualified people on the planet – think Fields Medalists, Nobel Prize winners, foremost champions of intellectual competition, the most sought-after engineers – we aim to offer them salaries competitive with top sportspeople, actors and music artists to work on the problem. This is complementary to our AI alignment prizes, in that getting paid is not dependent on results. The pay is for devoting a significant amount of full-time work (say a year), and maximum brainpower, to the problem, with the hope that highly promising directions in the pursuit of a full solution will be forthcoming. We will aim to provide access to top AI alignment researchers for guidance, affiliation with top-tier universities, and an exclusive retreat house and office for fellows of this program to use, if so desired.
[Yes, this is the “pay Terry Tao $10M” thing. FAQ in a GDoc here.]
Inner alignment (mesa-optimizers) is still a big problem.
Interesting. I note that they don’t actually touch on x-risk in the podcast, but the above quote implies that Demis cares a lot about Alignment.
I wonder how fleshed out the full plan is? The fact that there is a plan does give me some hope. But as Tomás B. says below, this needs to be put into place now, rather than waiting for a fire alarm that may never come.
A list of potential miracles (including empirical “crucial considerations” [/wishful thinking] that could mean the problem is bypassed):
Possibility of a failed (unaligned) takeoff scenario where the AI fails to model humans accurately enough (i.e. fails to realise that smart humans could detect its “hidden” activity in a certain way). [This may only set things back a few months to years; or it could lead to some kind of Butlerian Jihad if there is a sufficiently bad (but ultimately recoverable) global catastrophe (and then much more time for Alignment the second time around?)].
Valence realism being true. Binding problem vs AGI Alignment.
Omega experiencing every possible consciousness and picking the best? [Could still lead to x-risk in terms of a Hedonium Shockwave].
Moral Realism being true (and the AI discovering it and the true morality being human-compatible).
Natural abstractions leading to Alignment by Default?
Rohin’s links here.
AGI discovers new physics and exits to another dimension (like the creatures in Greg Egan’s Crystal Nights).
Simulation/anthropics stuff.
Alien Information Theory being true!? (And the aliens having solved alignment).
I’m often acting based on my 10%-timelines
Good to hear! What are your 10% timelines?
1. Year with 10% chance of AGI?
2. P(doom|AGI in that year)?
Most EAs are much more worried about AGI being an x-risk than they are excited about AGI improving the world (if you look at the EA Forum, there is a lot of talk about the former and pretty much none about the latter). Also, there's no need to specifically try to reach EAs; pretty much everyone in the community is already aware.
...Unless you meant Electronic Arts!? :)
Here’s a more fleshed out version, FAQ style. Comments welcome.
Here’s a version of this submitted as a project idea for the FTX Foundation.
SBF/FTX does though.
Is it possible to have answers given in dates on https://forecast.elicit.org/binary, like it is for https://forecast.elicit.org/questions/LX1mQAQOO?
we probably won’t figure out how to make AIs that are as data-efficient as humans for a long time—decades at least. This is because 1. We’ve been trying to figure this out for decades and haven’t succeeded
EfficientZero seems to have put paid to this pretty fast. It seems incredible that the algorithmic advances involved aren't even that complex. Kind of makes you think that people haven't really been trying all that hard over the last few decades. Worrying in terms of its implications for AGI timelines.
Ok, but Eliezer is saying BOTH that his timelines are short (significantly less than 30 years) AND that ML isn't likely to be the final paradigm (judging not just from this conversation, but from the other, real, ones in this sequence).
2 * 10^16 ops/sec*
(*) Two TPU v4 pods.
Shouldn’t this be 0.02 TPU v4 pods?
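Rough numbers behind my question (my own assumed figures, not ones from the post: ~275 teraFLOP/s of bf16 per TPU v4 chip, 4096 chips per pod):

```python
# Back-of-the-envelope check, using assumed figures (~2.75e14 bf16 ops/sec per
# TPU v4 chip, 4096 chips per pod), not numbers from the post.
pod_ops_per_sec = 4096 * 2.75e14      # ~1.1e18 ops/sec for one TPU v4 pod
quoted_ops_per_sec = 2e16             # the figure given in the post
print(quoted_ops_per_sec / pod_ops_per_sec)  # ~0.018, i.e. roughly 0.02 pods
```

If those per-chip numbers are right, 2 * 10^16 ops/sec is a small fraction of a single pod, not two pods.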
I note that mixture-of-experts is referred to as the kind of thing that in principle could shorten timelines, but in practice isn’t likely to. Intuitively, and naively from neuroscience (different areas of the brain used for different things), it seems that mixture-of-experts should have a lot of potential, so I would like to see more detail on exactly why it isn’t a threat.
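For reference, here is a toy sketch of what I mean by mixture-of-experts: a learned gate routes each input to a few experts and mixes their outputs. All the sizes and the top-k routing rule here are illustrative assumptions, not any lab's actual architecture.

```python
# Toy mixture-of-experts layer: a gate scores the experts for each input,
# only the top-k experts run, and their outputs are mixed by the gate weights.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 4, 2   # hypothetical toy sizes

# Each "expert" is just a random linear map in this sketch.
experts = [rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
           for _ in range(n_experts)]
gate_w = rng.standard_normal((d_model, n_experts)) / np.sqrt(d_model)

def moe_layer(x):
    """Route a single input vector x through its top-k experts."""
    logits = x @ gate_w                     # one gating score per expert
    top = np.argsort(logits)[-top_k:]       # indices of the top-k experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                # softmax over the chosen experts
    # Only the selected experts run, so per-input compute scales with k,
    # while total parameters scale with n_experts.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

y = moe_layer(rng.standard_normal(d_model))
print(y.shape)  # (16,)
```

The appeal (and the reason it might matter for timelines) is that parameter count can grow much faster than per-input compute, since each input only touches a few experts.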
Eliezer has short timelines, yet thinks that the current ML paradigm isn’t likely to be the final paradigm. Does this mean that he has some idea of a potential next paradigm? (Which he is, for obvious reasons, not talking about, but presumably expects other researchers to uncover soon, if they don’t already have an idea). Or is it that somehow the recent surprising progress within the ML paradigm (AlphaGo, AlphaFold, GPT3 etc) makes it more likely that a new paradigm that is even more algorithmically efficient is likely to emerge soon? (If the latter, I don’t see the connection).
There’s an optimistic way to describe the result of these trends: today, you can’t start a cult. Forty years ago, people were more open to the idea that not all knowledge was widely known.
This doesn’t seem to have aged well in light of the rampant spread of misinformation and conspiracy theories on social media (especially Facebook!)
Test spoiler:
Test
Interested in how you would go about throwing money at scalable altruistic projects. There is plenty of money and no shortage of ideas in EA, but a relative shortage of founders, I think.