Andrii Vasylenko
I think a lot of conceptual progress needs to happen before we’re remotely close to the kind of implementation details that current AIs can meaningfully help with. So I don’t think making experiments easier to run would substantially advance alignment.
and by giving true generalization, provide a sturdy foundation for AI safety in the form of useful NNs which are aligned & safe for the right reasons.
I doubt this. High capabilities are at least somewhat an attractor basin, which makes them possible to target using tools like GD. There is no corresponding attractor at the particular utility function we want the AI to have, so I think there would be a lot of gotchas with trying to learn it using GD.
Huh, I always got the feeling from MIRI that “alignment is really hard, we have to do a pause to have any chance of succeeding”. I’m kind of surprised to see that MIRI is investing effort into doing technical alignment research.
What prompted you to pivot? If I recall correctly you were working on comms for MIRI before this.
5. Lead to Gold
A problem is so hard that humans aren’t even close to being smart enough or technologically advanced enough to solve it. We toil away pointlessly at trying to solve it.
I think it would help to decompose this one. Knowing that a problem is hard doesn’t help much; knowing why a problem is hard does sometimes make it easier to solve.
Even if there is some warning, I don’t expect that it will lead to that kind of costly, extreme action. And I think it is possible to shield electronics against EMPs, so I don’t think an EMP would be effective at stopping an AGI.
So if LLMs scale to superintelligence, this reduces to aligning an LLM, or if they don’t, attach an LLM to whatever we end up building, so it knows what human values are (preferably with Bayesian Learning so it can Value Learn more detail), and attach an explicit goal slot so that we can explicitly make it care. AIXI with an LLM as a subroutine.
In the limit of superintelligent optimization, the things that look the best to an LLM grader are not generally the things that we value.
Some Fermi estimates:
On the lower end, 4 words/second times 16 bits/word times 4x10^8 seconds adds up to ~3x10^10 bits.
On the higher end, assuming 10^7 bits/second (about the retina-to-brain bandwidth), it adds up to ~4x10^15 bits.
However, I think the brain is much less data- and compute-efficient than an optimal AGI algorithm would be. So I don’t think it is a good predictor of how much data future AI algorithms will require.
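For anyone who wants to check the arithmetic in the two Fermi estimates above, here is a minimal sketch. The function and constant names are my own, and the inputs are just the figures quoted in the estimate (4 words/second, 16 bits/word, 10^7 bits/second, 4x10^8 seconds); nothing here is a measurement.

```python
def total_bits(bits_per_second: float, seconds: float) -> float:
    """Total information received at a constant rate over a duration."""
    return bits_per_second * seconds


# Figures taken from the estimate above; the names are illustrative only.
DURATION_SECONDS = 4e8            # duration used in both estimates
LOW_RATE_BITS_PER_S = 4 * 16      # 4 words/second * 16 bits/word
HIGH_RATE_BITS_PER_S = 1e7        # assumed retina-to-brain bandwidth

low_estimate = total_bits(LOW_RATE_BITS_PER_S, DURATION_SECONDS)
high_estimate = total_bits(HIGH_RATE_BITS_PER_S, DURATION_SECONDS)

print(f"lower end:  {low_estimate:.2e} bits")
print(f"higher end: {high_estimate:.2e} bits")
```

The lower figure comes out at 2.56x10^10, which rounds to the ~3x10^10 quoted above; the two ends differ by roughly five orders of magnitude, driven entirely by the assumed input rate.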
I agree. I was making a more narrow claim that post quality is one of the things that correlate pretty well with karma.
the way to win is to do lots of uncorrelated research bets that are individually unlikely to succeed but also do no harm if they fail!
I disagree that alignment will probably get solved by someone pursuing a direction that seems very unlikely even to them. It seems to me that the right way to do things is to figure out what the hard parts of alignment are and then to try to solve them.
that’s the way surprising and novel scientific inventions have always happened in the past.
I think that’s because those fields had reasonably good feedback loops, so the strategy of “try a bunch of things and see if any of them work” was generally viable. As a counterexample, Einstein’s methodology was on the opposite end of the spectrum.
That seems false to me. If we had a textbook from the future on AI alignment, I think we would certainly be able to build an aligned ASI.
I think that, if an alignment proposal appears to you to have a 40% chance of working (let alone 1%), then in reality, it very likely won’t work.
In my experience, karma correlates rather well with post quality (after accounting for how much time it’s been on the site).
I agree that would happen if a treaty is signed. But I don’t see how that affects your point in the top-level post.
If you are worried about misaligned ASI, you are likely worried about it being developed in the US or China (and thus it does not matter much if other countries are attempting to build ASI).
Are you saying that you don’t think other countries will be able to build ASI even if they have more compute?
So, in practice you just allow them to benefit from AI without really advancing ASI development, and instead potentially causing a slowdown.
As I understand it, you are saying that other countries will be more willing to sign a treaty than the US or China. That doesn’t seem like an obvious inference.
If you largely believe that no non-US or non-Chinese company/state will build AGI, then the threat of misaligned ASI is not really an issue
I don’t see how that follows. I don’t think any nation or company currently has the knowledge to build an aligned ASI.
This happens with concepts too.
a python program that always prints “I am conscious”
I think it would be more effective to consider the mental state of whoever wrote that program, not the program itself.
I’ve benefited from doing this sort of thing over the course of a day.