Question for mods (sorry if I asked this before): Is there a way to make the LaTeX render?
In theory MathJax should be enough, e.g. that’s all I use on the original post: https://bounded-regret.ghost.io/how-fast-can-we-perform-a-forward-pass/
I was surprised by this claim. To be concrete, what’s your probability of xrisk conditional on 10-year timelines? Mine is something like 25%, I think, and higher than my unconditional probability of xrisk.
Fortunately (?), I think the jury is still out on whether phase transitions happen in practice for large-scale systems. It could be that once a system is complex and large enough, it’s hard for a single factor to dominate and you get smoother changes. But I think it could go either way.
Thanks! I pretty much agree with everything you said. This is also largely why I am excited about the work, and I think what you wrote captures it more crisply than I could have.
Yup, I agree with this, and think the argument generalizes to most alignment work (which is why I’m relatively optimistic about our chances compared to some other people, e.g. something like 85% p(success), mostly because most things one can think of doing will probably be done).
It’s possibly an argument that work is most valuable in cases of unexpectedly short timelines, although I’m not sure how much weight I actually place on that.
Note the answer changes a lot based on how the question is operationalized. This stronger operationalization has dates around a decade later.
Here’s a link to the version on my blog: https://bounded-regret.ghost.io/appendix-more-is-different-in-related-fields/
Yup! That sounds great :)
Thanks Ruby! Now that the other posts are out, would it be easy to forward-link them (by adding links to the italicized titles in the list at the end)?
@Mods: Looks like the LaTeX isn’t rendering. I’m not sure what the right way to do that is on LessWrong. On my website, I do it with code injection. You can see the result here, where the LaTeX all renders in MathJax: https://bounded-regret.ghost.io/ml-systems-will-have-weird-failure-modes-2/
> I feel like you are arguing for a very strong claim here, which is that “as soon as you have an efficient way of determining whether a problem is solved, and any way of generating a correct solution some very small fraction of the time, you can just build an efficient solution that solves it all of the time”
Hm, this isn’t the claim I intended to make, both because it puts too much emphasis on “efficient” and because it adds a lot of “for all” statements.
If I were trying to state my claim more clearly, it would be something like “generically, for the large majority of problems of the sort you would come across in ML, once you can distinguish good answers you can find good answers (modulo some amount of engineering work), because non-convex optimization generally works and there are a large number of techniques for solving the sparse rewards problem, which are also getting better over time”.
Thanks for the push-back and the clear explanation. I still think my points hold and I’ll try to explain why below.
> In order to even get a single expected datapoint of approval, I need to sample 10^8 examples, which in our current sampling method would take 10^8 * 10 hours, e.g. approximately 100,000 years. I don’t understand how you could do “Learning from Human Preferences” on something this sparse
This is true if all the other datapoints are entirely indistinguishable, and the only signal is “good” vs. “bad”. But in practice you would compare / rank the datapoints, and move towards the ones that are better.
Take the backflip example from the human preferences paper: if your only signal was “is this a successful backflip?”, then your argument would apply and it would be pretty hard to learn. But the signal is “is this more like a successful backflip than this other thing?” and this makes learning feasible.
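To make the comparison point concrete, here’s a minimal toy sketch (my own illustration, not code from the paper) of how pairwise preference labels give a dense training signal even when the absolute “success” event is astronomically rare; the linear reward model, feature dimension, and synthetic labels are all assumptions for illustration:

```python
# Toy sketch: learning a reward model from pairwise comparisons
# (Bradley-Terry style, as in "Learning from Human Preferences").
# All specifics here are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)
d = 8                                   # feature dimension (hypothetical)
w_true = rng.normal(size=d)             # "true" preferences we try to recover

def true_reward(x):
    return x @ w_true

# Simulate pairwise comparisons: for random pairs (x_a, x_b), the label says
# which one a (noisy) human prefers.
X_a = rng.normal(size=(500, d))
X_b = rng.normal(size=(500, d))
prefer_a = (true_reward(X_a) - true_reward(X_b) + rng.logistic(size=500)) > 0

# Fit a linear reward model with the Bradley-Terry logistic loss:
#   P(a preferred over b) = sigmoid(r(a) - r(b))
w = np.zeros(d)
lr = 0.1
for _ in range(200):
    diff = (X_a - X_b) @ w
    p = 1.0 / (1.0 + np.exp(-diff))      # predicted P(a preferred)
    grad = (X_a - X_b).T @ (p - prefer_a) / len(p)
    w -= lr * grad

# Even though the binary "is this a successful backflip?" signal would almost
# never fire, the comparison signal recovers the reward direction:
cos = w @ w_true / (np.linalg.norm(w) * np.linalg.norm(w_true) + 1e-12)
print(f"cosine similarity with true reward direction: {cos:.2f}")
```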
More generally, I feel that the thing I’m arguing against would imply that ML in general is impossible (and esp. the human preferences work), so I think it would help to say explicitly where the disanalogy occurs.
I should note that comparisons are only one reason why the situation isn’t as bad as you say. Another is that even with only non-approved data points to label, you could do things like label “which part” of the plan is non-approved. And with very sophisticated AI systems, you could ask them to predict which plans would be approved/non-approved, even if they don’t have explicit examples, simply by modeling the human approvers very accurately in general.
> I feel even beyond that, this still assumes that the reason it is proposing a “good” plan is pure noise, and not the result of any underlying bias that is actually costly to replace.
When you say “costly to replace”, this is with respect to what cost function? Do you have in mind the system’s original training objective, or something else?
If you have an original cost function F(x) and an approval cost A(x), you can minimize F(x) + c * A(x), increasing the weight on c until it pays enough attention to A(x). For an appropriate choice of c, this is (approximately) equivalent to asking “Find the most approved policy such that F(x) is below some threshold”—more generally, varying c will trace out the Pareto boundary between F and A.
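Here’s a tiny sketch of that scalarization argument (my own toy example; the functions F, A and the candidate set are made up): minimizing F(x) + c * A(x) for increasing c traces out points on the Pareto frontier between the original objective and the approval cost.

```python
# Toy illustration of scalarizing two objectives and sweeping the weight c.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(5000, 2))          # candidate policies (hypothetical)

def F(x):   # original training objective (assumed quadratic for illustration)
    return np.sum((x - np.array([1.0, 0.0]))**2, axis=-1)

def A(x):   # approval cost: low where the overseer approves (assumed)
    return np.sum((x - np.array([0.0, 1.0]))**2, axis=-1)

for c in [0.0, 0.3, 1.0, 3.0, 10.0]:
    best = X[np.argmin(F(X) + c * A(X))]
    print(f"c={c:5.1f}  F={F(best):.2f}  A={A(best):.2f}")
# As c grows, the minimizer pays more attention to A at some cost in F;
# roughly "the most approved policy such that F is below some threshold".
```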
> so even if we get within 33 bits (which I do think seems unlikely)
Yeah, I agree 33 bits would be way too optimistic. My 50% CI is somewhere between 1,000 and 100,000 bits needed. It just seems unlikely to me that you’d be able to generate, say, 100 bits but then run into a fundamental obstacle after that (as opposed to an engineering / cost obstacle).
> Like, I feel like… this is literally a substantial part of the P vs. NP problem, and I can’t just assume my algorithm just like finds efficient solution to arbitrary NP-hard problems.
I don’t think the P vs. NP analogy is a good one here, for a few reasons:
* The problems you’re talking about above are statistical issues (you’re saying you can’t get any statistical signal), while P vs. NP is a computational question.
* In general, I think P vs. NP is a bad fit for ML. Invoking related intuitions would have led you astray over the past decade—for instance, predicting that neural networks should not perform well because they are solving a problem (non-convex optimization) that is NP-hard in the worst case.
This would imply a fixed upper bound on the number of bits you can produce (for instance, a false negative rate of 1 in 128 implies at most 7 bits). But in practice you can produce many more than 7 bits, by double checking your answer, combining multiple sources of information, etc.
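For concreteness, here’s the back-of-the-envelope arithmetic behind that point (my own numbers, treating the checks as independent, which is a simplifying assumption):

```python
# A check with false negative rate 1/128 gives at most log2(128) = 7 bits on its
# own, but k roughly independent checks give about k * 7 bits, since the chance
# of all of them being fooled multiplies.
from math import log2

fnr = 1 / 128
print(-log2(fnr))               # 7.0 bits from a single check

for k in [1, 2, 5, 10]:
    print(k, -log2(fnr ** k))   # ~7, 14, 35, 70 bits from k independent checks
```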
Okay, thanks! The posts actually are written in markdown, at least on the backend, in case that helps you.