Probably too late at this point for you, but in case other people come along… I’d recommend learning functional analysis first in the context of a theoretical mechanics course/textbook, rather than a math course/textbook. The physicists tend to do a better job explaining the intuitions (and give far more exposure to applications), which I find is the most important thing for a first exposure. Full rigorous detail is something you can pick up later, if and when you need it.
That’s a marginal cost curve at a fixed time. Its shape is not directly relevant to the long-run behavior; what’s relevant is how the curve moves over time. If any fixed quantity becomes cheaper and cheaper over time, approaching (but never reaching) zero as time goes on, then the price goes to zero in the limit.
Consider Moore’s law, for example: the marginal cost curve for compute looks U-shaped at any particular time, but over time the cost of compute falls like e^(-kt), with k around ln(2)/(18 months).
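To make the shape of that claim concrete, here's a minimal sketch (the parameters are assumed purely for illustration) of a fixed quantity whose cost decays exponentially yet never actually reaches zero:

```python
import math

# Assumed illustrative parameters: cost halves every 18 months (1.5 years).
k = math.log(2) / 1.5          # decay rate per year
initial_cost = 1.0             # cost of a fixed amount of compute at t = 0

for years in (0, 3, 15, 30):
    cost = initial_cost * math.exp(-k * years)
    print(f"after {years:2d} years: cost factor {cost:.2e}")
# The cost is positive at every finite time, but goes to zero in the limit.
```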
Of course the limit can’t be reached, that’s the entire reason why people use the phrase “in the limit”.
For short-term, individual cost/benefit calculations around C19, it seems like uncertainty in the number of people currently infected should drop out of the calculation.
For instance: suppose I’m thinking about the risk associated with talking to a random stranger, e.g. a cashier. My estimated chance of catching C19 from this encounter will be roughly proportional to N_infected. But, assuming we already have reasonably good data on number hospitalized/died, my chances of hospitalization/death given infection will be roughly inversely proportional to N_infected. So, multiplying those two together, I’ll get a number roughly independent of N_infected.
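A back-of-the-envelope sketch of that cancellation (the functional forms and every number below are assumptions for illustration, not estimates):

```python
# Assumed toy model: risk of a bad outcome from one encounter with a random stranger.
def risk_per_encounter(n_infected, population, deaths_observed, p_transmit=0.05):
    p_catch = p_transmit * n_infected / population           # proportional to N_infected
    p_death_given_infection = deaths_observed / n_infected   # proportional to 1 / N_infected
    return p_catch * p_death_given_infection                 # N_infected cancels

# Same observed deaths, wildly different guesses about current infections:
print(risk_per_encounter(n_infected=50_000,  population=10**7, deaths_observed=500))
print(risk_per_encounter(n_infected=500_000, population=10**7, deaths_observed=500))
# Both print 2.5e-06: the estimate doesn't depend on N_infected.
```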
How general is this? Does some version of it apply to long-term scenarios too (possibly accounting for herd immunity)? What short-term decisions do depend on N_infected?
A finite-sized computer cannot contain a fine-grained representation of the entire universe.
e^(-x) cannot ever be zero for finite x, yet it approaches zero in the limit of large x. The OP makes exactly the same sort of claim: our software approaches omniscience in the limit.
The rules it’s given are, presumably, at a low level themselves. (Even if that’s not the case, the rules it’s given are definitely not human-intelligible unless we’ve already solved the translation problem in full.)
The question is not whether the low-level AI will follow those rules; the question is what actually happens when something follows those rules. A python interpreter will not ever deviate from the simple rules of python, yet it still does surprising-to-a-human things all the time. The problem is accurately translating between human-intelligible structure and the rules given to the AI.
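One standard illustration (my example, not from the original comment): Python's mutable-default-argument behavior. The interpreter follows its rules exactly; the surprise is entirely on the human side.

```python
def append_item(item, bucket=[]):   # the default list is created once, at definition time
    bucket.append(item)
    return bucket

print(append_item("a"))  # ['a']
print(append_item("b"))  # ['a', 'b'] -- the rules were followed perfectly; the human's
                         # mental model of "a fresh empty list on each call" was wrong
```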
The problem is not that the AI might deviate from the given rules. The problem is that the rules don’t always mean what we want them to mean.
I’m pretty sure none of this actually affects what I said: the low-level behavior still needs to produce results which are predictable to humans in order for predictability to be useful, and that’s still hard.
The problem is that making an AI predictable to a human is hard. This is true regardless of whether or not it’s doing any outside-the-box thinking. Having a human double-check the instructions given to a fast low-level AI does not make the problem any easier; the low-level AI’s behavior still has to be understood by a human in order for that to be useful.
As you say toward the end, you’d need something like a human-readable communications protocol. That brings us right back to the original problem: it’s hard to translate between humans’ high-level abstractions and low-level structure. That’s why AI is unpredictable to humans in the first place.
I think you get “ground truth data” by trying stuff and seeing whether or not the AI system did what you wanted it to do.
That’s the sort of strategy where illusion of transparency is a big problem, from a translation point of view. The difficult cases are exactly the cases where the translation usually produces the results you expect, but then produces something completely different in some rare cases.
Another way to put it: if we’re gathering data by seeing whether the system did what we wanted, then the long tail problem works against us pretty badly. Those rare tail-cases are exactly the cases we would need to observe in order to notice problems and improve the system. We’re not going to have very many of them to work with. Ability to generalize from small data sets becomes a key capability, but then we need to translate how-to-generalize in order for the AI to generalize in the ways we want (this gets at the can’t-ask-the-AI-to-do-anything-novel problem).
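Rough arithmetic on how little tail data that strategy yields (the failure rate and sample size below are assumptions chosen only to illustrate the order of magnitude):

```python
# Assumed numbers: a translation failure that surfaces once per 10,000 uses,
# and 1,000 uses' worth of "did it do what we wanted?" feedback.
p_failure = 1e-4
n_trials = 1_000

expected_failures_seen = p_failure * n_trials          # 0.1 expected tail cases in our data
p_see_at_least_one = 1 - (1 - p_failure) ** n_trials   # ~0.095

print(expected_failures_seen, round(p_see_at_least_one, 3))
# Most of the time we observe zero tail failures, so there's nothing to iterate on.
```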
(The other comment is my main response, but there’s a possibly-tangential issue here.)
In a long-tail world, if we manage to eliminate 95% of problems, then we generate maybe 10% of the value. So now we use our 10%-of-value product to refine our solution. But it seems rather optimistic to hope that a product which achieves only 10% of the value gets us all the way to a 99% solution. It seems far more likely that it gets to, say, a 96% solution. That, in turn, generates maybe 15% of the value, which in turn gets us to a 96.5% solution, and...
Point being: in the long-tail world, it’s at least plausible (and I would say more likely than not) that this iterative strategy doesn’t ever converge to a high-value solution. We get fancier and fancier refinements with decreasing marginal returns, which never come close to handling the long tail.
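As a toy model of that non-convergence (the halving assumption and the resulting 97% asymptote are mine, chosen to match the 95% → 96% → 96.5% pattern above, not anything from the original argument):

```python
# Toy model: each refinement cycle is driven by a product that captures only a sliver of
# the value, so each cycle closes less of the remaining gap than the one before it.
coverage = 0.95   # fraction of problems handled by the first solution
gain = 0.01       # first cycle: 95% -> 96%, as above

for _ in range(50):
    coverage += gain
    gain *= 0.5   # assumption: each successive cycle buys half the improvement of the last

print(round(coverage, 4))  # ~0.97: an asymptote still nowhere near the long tail
```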
Now, under this argument, it’s still a fine idea to try the iterative strategy. But you wouldn’t want to bet too heavily on its success, especially without a reliable way to check whether it’s working.
An important part of my intuition about value-in-the-tail is that if your first solution can knock off 95% of the risk, you can then use the resulting AI system to design a new AI system where you’ve translated better and now you’ve eliminated 99% of the risk...
I don’t see how this ever actually gets around the chicken-and-egg problem.
An analogy: we want to translate from English to Korean. We first obtain a translation dictionary which is 95% accurate, then use it to ask our Korean-speaking friend to help out. Problem is, there’s a very important difference between very similar translations of “help me translate things”—e.g. consider the difference between “what would you say if you wanted to convey X?” and “what should I say if I want to convey X?”, when giving instructions to an AI. Both of those would produce very similar results, right up until everything went wrong. (Let me know if this analogy sounds representative of the strategies you imagine.)
If you do manage to get that first translation exactly right, and successfully ask your friend for help, then you’re good—similar to the “translate how-to-translate” strategy from the OP. And with a 95% accurate dictionary, you might even have a decent chance of getting that first translation right. But if that first translation isn’t perfect, then you need some way to find that out safely—and the 95% accurate dictionary doesn’t make that any easier.
Another way to look at it: the chicken-and-egg problem is a ground truth problem. If we have enough data to estimate X to within 5%, then doing clever things with that data is not going to reduce that error any further. We need some other way to get at the ground truth, in order to actually reduce the error rate. If we know how to convey what-we-want with 95% accuracy, then we need some other way to get at the ground truth of translation in order to increase that accuracy further.
Endorsed; that definitely captures the key ideas.
If you haven’t already, you might want to see my answer to Steve’s comment, on why translation to low-level structure is the right problem to think about even if the AI is using higher-level models.
I agree with most of this reasoning. I think my main point of departure is that I expect most of the value is in the long tail, i.e. eliminating 95% of problems generates <10% or maybe even <1% of the value. I expect this both in the sense that eliminating 95% of problems unlocks only a small fraction of economic value, and in the sense that eliminating 95% of problems removes only a small fraction of risk. (For the economic value part, this is mostly based on industry experience trying to automate things.)
Optimization is indeed the standard argument for this sort of conclusion, and is a sufficient condition for eliminating 95% of problems to have little impact on risk. But again, it’s not a necessary condition—if the remaining 5% of problems are still existentially deadly and likely to come up eventually (but not often enough to be caught in testing), then risk isn’t really decreased. And that’s exactly the sort of situation I expect when viewing translation as the central problem: illusion of transparency is exactly the sort of thing which doesn’t seem like a problem 95% of the time, right up until you realize that everything was completely broken all along.
Anyway, sounds like value-in-the-tail is a central crux here.
Predictable low-level behavior is not the same as predictable high-level behavior. When I write or read python code, I can have a pretty clear idea of what every line does in a low-level sense, but still sometimes be surprised by high-level behavior of the code.
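A concrete instance of that gap (a hypothetical snippet of mine, not from the comment): every line below is individually predictable, yet the aggregate result surprises most people.

```python
total = 0.0
for _ in range(10):
    total += 0.1      # each individual addition does exactly what the language rules say

print(total == 1.0)   # False
print(total)          # 0.9999999999999999 -- predictable low-level steps,
                      # surprising high-level result
```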
We still need to translate what-humans-want into a low-level specification. “Making it predictable” at a low-level doesn’t really get us any closer to predictability at the high-level (at least in the cases which are actually difficult in the first place). “Making it predictable” at a high-level requires translating high-level “predictability” into some low-level specification, which just brings us back to the original problem: translation is hard.
I’d give it something in the 2%-10% range. Definitely not likely.
One of the basic problems in the embedded agency sequence is: how does an agent recognize its own physical instantiation in the world, and avoid e.g. dropping a big rock on the machine it’s running on? One could imagine an AI with enough optimization power to be dangerous, which gets out of hand but then drops a metaphorical rock on its own head—i.e. it doesn’t realize that destroying a particular data center will shut itself down.
Similarly, one could imagine an AI which tries to take over the world, but doesn’t realize that unplugging the machine on which it’s running will shut it down—because it doesn’t model itself as embedded in the world. (For similar reasons, such an AI might not see any reason to create backups of itself.)
Another possible safety valve: one could imagine an AI which tries to wirehead, but its operators put a lot of barriers in place to prevent it from doing so. The AI seizes whatever resources it needs to metaphorically smash those barriers, does so violently, then wireheads itself and just sits around.
Generalizing these two scenarios: I think it’s plausible that unprincipled AI architectures tend to have built-in safety valves—they’ll tend to shoot themselves in the foot if they’re able to do so. That’s definitely not something I’d want to bet the future of the human species on, but it is a class of scenarios which would allow for an AI to deal a lot of damage while still failing to take over.
“how do I ensure that the AI system has an undo button” and “how do I ensure that the AI system does things slowly”
I don’t think this is realistic if we want an economically-competitive AI. There are just too many real-world applications where we want things to happen which are fast and/or irreversible. In particular, the relevant notion of “slow” is roughly “a human has time to double-check”, which immediately makes things very expensive.
Even if we abandon economic competitiveness, I doubt that slow+reversible makes the translation problem all that much easier (though it would make the AI at least somewhat less dangerous, I agree with that). It’s probably somewhat easier—having a few cycles of feedback seems unlikely to make the problem harder. But if e.g. we’re originally training the AI via RL, then slow+reversible basically just adds a few more feedback cycles after deployment; if millions or billions of RL cycles didn’t solve the problem, then adding a handful more at the end seems unlikely to help much (though an argument could be made that those last few are higher-quality). Also, there’s still the problem of translating a human’s high-level notion of “reversible” into a low-level notion of “reversible”.
Taking a more outside view… restrictions like “make it slow and reversible” feel like patches which don’t really address the underlying issues. In general, I’d expect the underlying issues to continue to manifest themselves in other ways when patches are applied. For instance, even with slow & reversible changes, it’s still entirely plausible that humans don’t stop something bad because they don’t understand what’s going on in enough detail—that’s a typical scenario in the “translation problem” worldview.
Zooming out even further...
I think the solutions I would look for would be quite different though...
I think what’s driving this intuition is that you’re looking for ways to make the AI not dangerous, without actually aligning it (i.e. without solving the translation problem) - mainly by limiting capabilities. I expect that such strategies, in general, will run into similar problems to those mentioned above:
- Capabilities which make an AI economically valuable are often capabilities which make it dangerous. Limit capabilities for safety, and the AI won’t be economically competitive.
- Choosing which capabilities are “dangerous” is itself a problem of translating what-humans-want into some other framework, and is subject to the usual problems: simple solutions will be patches which don’t address everything, there will be a long tail of complicated corner cases, etc.
Starting point: the problem which makes AI alignment hard is not the same problem which makes AI dangerous. This is the capabilities/alignment distinction: AI with extreme capabilities is dangerous; aligning it is the hard part.
So it seems like this framing of alignment removes the notion of the AI “optimizing for something” or “being goal-directed”. Do you endorse dropping that idea?
Anything with extreme capabilities is dangerous, and needs to be aligned. This applies even outside AI—e.g. we don’t want a confusing interface on a nuclear silo. Lots of optimization power is a sufficient condition for extreme capabilities, but not a necessary condition.
Here’s a plausible doom scenario without explicit optimization. Imagine an AI which is dangerous in the same way as a nuke is dangerous, but more so: it can make large irreversible changes to the world too quickly for anyone to stop it. Maybe it’s capable of designing and printing a supervirus (and engineered bio-offense is inherently easier than engineered bio-defense); maybe it’s capable of setting off all the world’s nukes simultaneously; maybe it’s capable of turning the world into grey goo.
If that AI is about as transparent as today’s AI, and does things the user wasn’t expecting about as often as today’s AI, then that’s not going to end well.
Now, there is the counterargument that this scenario would produce a fire alarm, but there’s a whole host of ways that could fail:
- The AI is usually very useful, so the risks are ignored
- Errors are patched rather than fixing the underlying problem
- Really big errors turn out to be “easier” than small errors—i.e. high-to-low level translations are more likely to be catastrophically wrong than mildly wrong
- It’s hard to check in testing whether there’s a problem, because errors are rare and/or don’t look like errors at the low-level (and it’s hard/expensive to check results at the high-level)
- In the absence of optimization pressure, the AI won’t actively find corner-cases in our specification of what-we-want, so it might actually be more difficult to notice problems ahead-of-time
Getting back to your question:
Do you endorse dropping that idea?
I don’t endorse dropping the AI-as-optimizer idea entirely. It is definitely a sufficient condition for AI to be dangerous, and a very relevant sufficient condition. But I strongly endorse the idea that optimization is not a necessary condition for AI to be dangerous. Tool AI can be plenty dangerous if it’s capable of making large, fast, irreversible changes to the world, and the alignment problem is still hard for that sort of AI.
I graduated 7 years ago. During that time, I’ve actually used most of the subjects I studied in college—partly at work (as a data scientist), partly in my own research, and partly just when they happen to come up in conversation or day-to-day life. On the occasions when I’ve needed to return to a topic I haven’t used in a while, it’s typically been very fast.
But the question “how long does it take to get back up to speed on something I learned a while ago?” kind of misses the point. Most of the value doesn’t come from being able to quickly get back up to speed on fluid mechanics or materials science or inorganic chemistry. Rather, the value comes from knowing which pieces I actually need to get back up to speed on. What matters is remembering what questions to ask, how to formulate them, and what the important pieces usually are. Details are easy to find on Wikipedia or in papers if you’re familiar with the high-level structure.
To put it differently: you want to already have an idea of what kinds of things are usually important for problems in some field, and what kinds of things usually aren’t important. If you have that, then it’s fast and easy to look up the parts which are important for any particular problem, and double-check that you’re not missing anything crucial.
One big item I’d add to this list: reading through a paper/post/source in the links db, checking information in it, and writing a comment/post about what checks were performed and whether the source looks accurate. The top reason I consider LW a better source of information on coronavirus than other places is because the information here is more likely to be true (or at least have a well-calibrated indication of plausibility attached); having more LWers review primary work amplifies that advantage.
First, the standard answer: Bryan Caplan’s The Case Against Education. Short version: education is about signalling to future employers how smart/diligent/willing-to-jump-through-hoops/etc you are. Skill acquisition is mostly irrelevant. This is basically true for most people most of the time.
That said… I personally have gotten a lot of value out of things I learned in courses. This is not something that happens by default; the vast majority of my classmates did not get nearly as much value out of courses as I did. I’ll list a few things I did differently which may help.
Avoid nontechnical classes: this one is kind of a “well duh” thing, but there are some subtleties. “Technical” should be interpreted in a broad sense—things like e.g. law or languages aren’t technical in the sense of STEM, but they’re technical in the sense that the things they teach are intended to be directly useful. By contrast, subjects which are primarily about aesthetics or history or critical theory are not really intended to be directly useful.
Decreasing marginal returns: the first course in any particular field/subfield is far more valuable than the second course, the second course is more valuable than the third, etc. This suggests going for breadth over depth. In particular, I recommend taking one or two courses in many different fields so that you can talk to specialists in those fields without being completely lost. You don’t need to become an expert yourself; much of the value is in being able to quickly and easily work with specialists in many different fields. You can translate jargon and act as a human interface, and you can easily jump into many different areas.
General-purpose tools: focus on fields which provide tools applicable to many different domains. Most of applied math qualifies, as well as computer science, economics, and law. Ideally, you take one or two courses in some general-purpose subject, then run into applications of that subject while sampling other fields. By seeing it come up in different contexts, you’re more likely to remember and use it.
Summary of this point and the previous one: go for depth in general-purpose tools, and practice those tools while gaining breadth in other areas.
Use available resources: I’ve covered about as much material in open courseware as I have in in-person classes. I’ve watched online lectures, I’ve read textbooks, and I’ve audited courses. (I’ve even audited classes at universities where I’m not registered—professors are usually happy with people just showing up.) In college, I’d often watch a semester’s worth of online lectures on a subject before taking a class on the subject; material is a lot easier to follow when you already have a general idea of where things are headed and how it all slots together.
Have a stock of problems: as you learn new tools, it’s useful to have a handful of problems to try them out on. Hard open algorithmic problems like integer factorization or P vs NP or graph isomorphism are great for testing out all sorts of applied math/CS tricks. “How could I use this to start a company and make a gazillion dollars?” is a problem which applies to practically anything. The problems should be things you’re interested in and enjoy thinking about, so that you’ll find it worthwhile to try out new tools on them even though most of the tools don’t actually yield breakthroughs on most of the problems.
This point and the previous one help a lot with actually remembering things and being able to apply them in-the-wild.
Optimize: at my college (Harvey Mudd), it was very easy to tell who had actually tried to optimize their course choices—it was the people who used the “build your own major” option. We only had half a dozen majors, and the course requirements always included things which weren’t really what any particular person was interested in. If you wanted to cram in more courses you were actually interested in, while avoiding irrelevant courses, a build-your-own major was the way to go.
More generally, you’ll get more out of classes if you read through the whole course catalog, mark the classes which sound interesting, and then optimize your schedule to focus on those classes. Sounds obvious, yet most people don’t do it.
Be comprehensive: you’re not going to know everything, but you can learn enough that nothing is very far from the things you do know. You can learn enough that you at least have some idea of which things you don’t know. You can learn enough that, even if you don’t know something, you’ve probably heard of it, and you have some idea of where to learn about it if you need to. The key is to aim for comprehensive knowledge of the world—you don’t need to know every little detail, but at least get the broad strokes of the big things. Anytime you don’t have a clue about something, pay attention to it, and look for the most general subject which would give you some background about that thing.
Math, physics and economics are particularly useful for comprehensive foundations—they give you the tools to solve practically anything in principle, though you’ll often need more specialized tools in practice.