I’d also be happy with an inexact description of what the bot will do in response to specified strategies that captured all the relevant details.

I think that it isn’t clear what constitutes “fully understanding” an algorithm.

That seems right.

Another obstacle to full understanding is memory. Suppose your go bot has memorized a huge list of “if you are in such and such situation move here” type rules.

I think there’s reason to believe that SGD doesn’t do exactly this (nets that memorize random data have different learning curves than normal nets iirc?), and better reason to think it’s possible to train a top go bot that doesn’t do this.

There is not in general a way to compute what an algorithm does without running it.

Yes, but luckily you don’t have to do this for all algorithms, just the best go bot. Also as mentioned, I think you probably get to use a computer program for help, as long as you’ve written that computer program.

I think that it isn’t clear what constitutes “fully understanding” an algorithm.

Say you pick something fairly simple, like a floating-point square-root algorithm. What does it take to fully understand that?

You have to know what a square root is. Do you have to understand the maths behind Newton-Raphson iteration if the algorithm uses that? All the mathematical derivations, or can you just take it as a mathematical fact that it works? Do you have to understand all the proofs about convergence rates, or can you just go “yeah, 5 iterations seems to be enough in practice”? Do you have to understand how floating-point numbers are stored in memory, including all the special cases like NaN, which your algorithm hopefully won’t be given? Do you have to keep track of how the starting guess is made and how the rounding is done? Do you have to be able to calculate the exact floating-point value the algorithm would give, taking into account all the rounding errors? Answering in binary or decimal?
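To make the square-root example concrete, here is a minimal Python sketch of the kind of algorithm I have in mind. It is an illustration under my own assumptions, not any particular library's implementation: the name `newton_sqrt`, the exponent-halving starting guess, and the default of 5 iterations are all choices made for this example.

```python
import math

def newton_sqrt(x: float, iterations: int = 5) -> float:
    """Approximate sqrt(x) with a fixed number of Newton-Raphson steps.

    Hypothetical helper for illustration only.
    """
    # Special cases: NaN and negative inputs have no real square root.
    if math.isnan(x) or x < 0:
        return math.nan
    # Zero and +infinity are their own square roots.
    if x == 0 or math.isinf(x):
        return x
    # Starting guess (an assumption of this sketch): halve the binary
    # exponent, which puts the guess within a factor of sqrt(2) of the root.
    _, exponent = math.frexp(x)  # x = mantissa * 2**exponent, 0.5 <= mantissa < 1
    guess = math.ldexp(1.0, exponent // 2)
    # Newton-Raphson update for f(g) = g*g - x.
    for _ in range(iterations):
        guess = 0.5 * (guess + x / guess)
    return guess

if __name__ == "__main__":
    for value in (2.0, 1e10, 0.25):
        print(value, newton_sqrt(value), math.sqrt(value))
```

Even in a dozen lines, every question above shows up: why this starting guess, why 5 iterations, what happens with NaN, and whether the last bit of the result matches a correctly rounded square root.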

Is brute-force minimax search easy to understand? You might be able to easily implement the algorithm, but you still don’t know which moves it will make. In general, for any algorithm that takes a lot of compute, humans won’t be able to work out what it will do without very slowly imitating a computer. There are some algorithms we can prove theorems about, but it isn’t clear which theorems we need to prove to get “full understanding”.
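As a rough illustration of the minimax point, here is a brute-force search in Python on a deliberately tiny game rather than go. The game is my own choice for the example: a pile of stones, each player removes 1 to 3, and whoever takes the last stone wins. The function names `minimax` and `best_move` are hypothetical.

```python
def minimax(stones: int, maximizing: bool) -> int:
    """Return +1 if the maximizing player wins under perfect play, else -1."""
    if stones == 0:
        # The player who just moved took the last stone and won,
        # so the player now "to move" has lost.
        return -1 if maximizing else 1
    scores = [
        minimax(stones - take, not maximizing)
        for take in (1, 2, 3)
        if take <= stones
    ]
    return max(scores) if maximizing else min(scores)

def best_move(stones: int) -> int:
    """Number of stones the player to move should take (first optimal option)."""
    legal = [take for take in (1, 2, 3) if take <= stones]
    # After our move the opponent (the minimizer) is to move.
    return max(legal, key=lambda take: minimax(stones - take, False))

if __name__ == "__main__":
    for pile in range(1, 10):
        print(pile, best_move(pile), minimax(pile, True))
```

The whole algorithm fits in a few lines and is easy to implement, yet reading the code does not tell you what it plays from a given pile; you either run it or prove a theorem about it (here, that piles divisible by 4 are lost for the player to move). For go, no comparably simple theorem is known and the search is far too large to run by hand.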

Another obstacle to full understanding is memory. Suppose your go bot has memorized a huge list of “if you are in such and such situation move here” type rules. You can understand how gradient descent would generate good rules in the abstract. You have inspected a few rules in detail. But there are far too many rules for a human to consider them all. And the rules depend on a choice of random seed.

Corollaries of success (non-exhaustive):

You should be able to answer questions like “what will this bot do if someone plays mimic go against it” without actually literally checking that during play. More generally, you should know how the bot will respond to novel counter strategies.

There is not in general a way to compute what an algorithm does without running it. Some algorithms go about the problem in a deliberately slow way. However, suppose the go algorithm has no massive known efficiency gains (i.e. no algorithm computes the same answer using a millionth of the compute) and is far too compute-hungry for humans to run manually. Then it follows that humans won’t be able to work out exactly what the algorithm will do.

You should be able to write a computer program anew that plays go just like that go bot, without copying over all the numbers.

That means being able to understand the algorithm well enough to program it for the first time, not just blindly reciting code. That is an ambiguous but achievable goal.

Suppose a bunch of people coded another AlphaGo-like system. The random seed is different. The layer widths are different. The learning rate is slightly different. It’s trained with a different batch size, for a different number of iterations, on a different database of stored games. It plays about as well. In many situations it makes a different move. The only way to get a go bot that plays exactly like AlphaGo is to copy everything, including the random seed. This might have been picked based on lucky numbers or birthdays. You can’t rederive from first principles what was never derived from first principles. You can only copy numbers across, or pick your own lucky numbers. Numbers like batch size aren’t quite as pick-your-own: there are unreasonably small and large values, but there is still quite a lot of wiggle room.
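A toy version of the seed point, as a hedged Python sketch: this is nothing like AlphaGo, just a two-parameter model I made up for illustration. It fits the product `a * b` to the target 2.0 by gradient descent from a seeded random start. The function name `train`, the target, the learning rate, and the step count are all assumptions of the example.

```python
import random
from typing import Tuple

def train(seed: int, steps: int = 500, lr: float = 0.05) -> Tuple[float, float]:
    """Fit a * b to the target 2.0 by gradient descent from a seeded start."""
    rng = random.Random(seed)
    # The seed only affects the initial parameters.
    a = rng.uniform(0.5, 1.5)
    b = rng.uniform(0.5, 1.5)
    for _ in range(steps):
        error = a * b - 2.0
        # Gradients of the squared error (a * b - 2)**2,
        # both computed before either parameter is updated.
        grad_a = 2.0 * error * b
        grad_b = 2.0 * error * a
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b

if __name__ == "__main__":
    for seed in (0, 1):
        a, b = train(seed)
        print(seed, a, b, a * b)
```

Both runs end up with essentially the same product, so they “play about as well”, but the individual parameters differ between seeds. Nothing in the task picks out one pair over the other; to reproduce a particular run exactly you have to copy the seed.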

Hmmm. It does seem like I should probably rewrite this post. But to clarify things in the meantime:

it’s not obvious to me that this is a realistic target, and I’d be surprised if it took fewer than 10 person-years to achieve.

I do think the knowledge should ‘cover’ all the athlete’s ingrained instincts in your example, but I think the propositions are allowed to look like “it’s a good idea to do x in case y”.

Yep, that accords well with my own current view.

Oh, I don’t think those things exactly sidestep the problem of the criterion so much as commit to a response to it without necessarily realizing that’s what they’re doing. All of them sort of punt on it by saying “let humans figure out that part”, which at the end of the day is what any solution is going to do because we’re the ones trying to build the AI and making the decisions, but we can be more or less deliberate about how we do this part.

You’re talking about how we ground out our thinking in something that is true but is not just further conceptualization?

Look, if we just make a choice about the truth by making an assumption, then eventually the world really does “bite back”. It’s possible to try this out by just picking a certain fundamental orientation towards the world and sticking to it no matter what throughout your life for a little while. The more rigidly you adhere to it, the more quickly the world will bite back. So I don’t think we can just pick a grounding.

But at the same time I very much agree that there is no *concept* that corresponds to the truth in a context-free or absolute way. The analogy I like the most is dance: imagine if I danced a dance that beautifully expressed what it’s like to walk in the forest at night. It might be an incredibly evocative dance and it might point towards a deep truth about the forest at night, but it would be strange to claim that a particular dance *is* the final, absolute, context-free truth. It would be strange to seek after a final, absolute, context-free dance that expresses what it’s like to walk in the forest at night in a way that finally captures the actual truth about the forest at night.

When we engage in conceptualization, we are engaging in something like a dance. It’s a dance with real consequence, real power, real impacts on the world, and real importance. It matters that we dance it and that we get it right. It’s hard to think of anything at this point that matters more. But its significance is not a function of its capturing the truth in a final or context-free way.

So when I consider “grounding out” my thinking in reality, I think of it in the same way that a dance should “ground out” in reality. That is: it should be *about* something real. It’s also possible to pick some idea about what it’s really like to walk in the forest at night and dance in a way that adheres to that idea but not to the reality of what it’s actually like to walk in the forest at night. And it’s possible to think in a way that is similarly not in accord with reality itself. But just as with dance, thinking in accord with reality is not at all about capturing reality in a final, absolute, or context-free way.

Is this how you see things too?

Well ok, agreed, but even if we were Cartesian, we would still have questions about what is the right way to link up our machines with this place where agentiness is coming from, how we discern whether we are in fact Cartesian or embedded, and so on down to the problem of the criterion as you described it.

One common response to any such difficult philosophical problems seems to be to just build AI that uses some form of indirect normativity such as CEV or HCH or AI debate to work out what wise humans would do about those philosophical problems. But I don’t think it’s so easy to sidestep the problem of the criterion.

I think at this point you’ve pushed the word “know” to a point where it’s not very well-defined; I’d encourage you to try to restate the original post while tabooing that word.

This seems particularly valuable because there are some versions of “know” for which the goal of knowing everything a complex model knows seems wildly unmanageable (for example, trying to convert a human athlete’s ingrained instincts into a set of propositions). So before people start trying to do what you suggested, it’d be good to explain why it’s actually a realistic target.

Or maybe it means we train the professional in the principles and heuristics that the bot knows. The question is if we can compress the bot’s knowledge into, say, a 1-year training program for professionals.

There are reasons to be optimistic: We can discard information that isn’t knowledge (lossy compression). And we can teach the professional in human concepts (lossless compression).

What does that mean though? If you give the go professional a massive transcript of the bot knowledge, it’s probably unusable. I think what the go professional gives you is the knowledge of where to look/what to ask for/what to search.

That’s basically what Paul’s universality (see my distillation post for another angle) is aiming for: having a question-answering overseer which can tell you everything you want to know about what the system knows and what it will do. You still probably need to be able to ask a relevant question, which I think is what you’re pointing at.

For real humans, I think this is a more gradual process—they learn and use some distinctions, and forget others, until their mental models are quite different a few years down the line.

The splintering can happen when a single feature splinters; it doesn’t have to be dramatic.

Perhaps the bot knows different things at different times and your job is to figure out (a) what it always knows and (b) a way to quickly find out everything it knows at a certain point in time.

But once you let it do more computation, then it doesn’t have to know anything at all, right? Like, maybe the best go bot is, “Train an AlphaZero-like algorithm for a million years, and then use it to play.”

I know more about go than that bot starts out knowing, but less than it will know after it does computation.

I wonder if, when you use the word “know”, you mean some kind of distilled, compressed, easily explained knowledge?

Maybe it nearly suffices to get a go professional to know everything about go that the bot does? I bet they could.

[D]oes understanding the go bot in your sense imply that you could play an even game against it?

I imagine so. One complication is that it can do more computation than you.

Sure. But the question is can you know everything it knows and not be as good as it? That is, does understanding the go bot in your sense imply that you could play an even game against it?

Good point!

I think there’s some communication failure where people are very skeptical of this for reasons that they think are obvious given what they’re saying, but which are not obvious to me. Can people tell me which subset of the below claims they agree with, if any? Also if you come up with slight variants that you agree with that would be appreciated.

It is approximately impossible to succeed at this challenge.

It is possible to be confident that advanced AGI systems will not pose an existential threat without being able to succeed at this challenge.

It is not obvious what it means to succeed at this challenge.

It will probably not be obvious what it means to succeed at this challenge at any point in the next 10 years, even if a bunch of people try to work on it.

We do not currently know what it means for a go bot to know something in operational terms.

At no point in the next 10 years could one be confident that one knew everything a go bot knew, because we won’t be confident about what it means for a go bot to know something.

You couldn’t know everything a go bot knows without essentially being that go bot.