I share both of these intuitions.
That being said, I’m not convinced that the space of concepts is smaller as you get more meta. (Naively speaking, there are ~exponentially more distributions over distributions than distributions, though some strong simplicity biases can cut this down a lot.) I suspect that one reason it seems that the space of concepts is “smaller” is because we’re worse at differentiating concepts at higher levels of meta-ness. For example, it seems that it’s often easier to figure out what the consequences of concrete action X are than the consequences of adopting a particular ethical system, and a lot of philosophy on metaethics seems more confused than philosophy on ethics. I think this is related to the “it’s more difficult to get feedback” intuition, where we have fewer distinct buckets because it’s too hard to distinguish between similar theories at sufficiently high meta-levels.
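One way to make the counting intuition concrete (the discretization here is my own illustrative assumption, not part of the original argument): if probabilities are quantized to multiples of 1/k, the number of distributions over n outcomes is a stars-and-bars count, and a distribution over distributions treats each first-level distribution as an outcome, so the counts compose and blow up combinatorially.

```python
from math import comb

def num_discrete_dists(n_outcomes, k):
    """Number of distributions over n_outcomes whose probabilities are
    multiples of 1/k (stars and bars: C(k + n - 1, n - 1))."""
    return comb(k + n_outcomes - 1, n_outcomes - 1)

# Distributions over 4 outcomes, on a 1/10 grid:
level1 = num_discrete_dists(4, 10)       # 286

# Distributions over *those* distributions, same grid -- each of the
# 286 first-level distributions is now an outcome:
level2 = num_discrete_dists(level1, 10)  # astronomically larger
```

This is only a toy discretization, but it illustrates why, absent strong simplicity biases, the meta-level space should be vastly larger, not smaller.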
I’m pretty sure that “hard problem of correctly identifying causality” is a major goal of MIRI’s decision theory.
In what sense is discovering causality NP-hard? There’s the trivial sense in which you can embed a NP-hard problem (or tasks of higher complexity) into the real world, and there’s the sense in which inference in Bayesian networks can embed NP-hard problems.
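On the Bayesian-network sense specifically: exact inference is NP-hard in general (the classic result, due to Cooper 1990, embeds 3-SAT into an inference query), and the naive brute-force algorithm makes the exponential cost directly visible, since it sums the joint over all 2^n assignments. A minimal sketch with made-up numbers:

```python
from itertools import product

def brute_force_marginal(cpts, query_var, n_vars):
    """P(query_var = 1) computed by summing the joint distribution over
    all 2**n_vars assignments of binary variables -- cost grows as
    2**n_vars. cpts[v](assignment) returns P(X_v = assignment[v] | parents)."""
    total = query = 0.0
    for assignment in product([0, 1], repeat=n_vars):
        p = 1.0
        for var in range(n_vars):
            p *= cpts[var](assignment)
        total += p
        if assignment[query_var] == 1:
            query += p
    return query / total

# Toy 2-node chain X0 -> X1 (all numbers are illustrative):
cpts = {
    0: lambda a: 0.5,                           # X0 is a fair coin
    1: lambda a: 0.8 if a[1] == a[0] else 0.2,  # X1 tends to copy X0
}
p_x1 = brute_force_marginal(cpts, 1, 2)  # = 0.5 by symmetry
```

Smarter algorithms (variable elimination, junction trees) are exponential only in treewidth rather than in n, but the worst case remains hard.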
Can you elaborate on why AIXI/Solomonoff induction is an unsafe utility maximizer, even for Cartesian agents?
I skimmed some of Crick and read some commentary on him, and Crick seems to take the Hobbesian “politics as a necessary compromise” viewpoint. (I wasn’t convinced by his definition of the word politics, which seemed not to point at what I would point at as politics.)
My best guess: I think they’re arguing not that immature discourse is okay, but that we need to be more polite toward people’s views in general for political reasons, as long as those people are acting somewhat in good faith (I suspect they think you’re not being sufficiently polite toward those you’re trying to throw out of the Overton window). As a result, we need to engage less in harsh criticism when it might be seen as threatening.
That being said, I also suspect that Duncan would agree that we need to be charitable. I suspect the actual disagreement is whether the behavior of the critics Duncan is replying to is actually the sort of behavior we want/need to accept in our community.
(Personally, I think we need to be more willing to do real-life experiments, even if they risk going somewhat wrong. And I think some of the Tumblr criticism definitely fell outside of what I would want in the Overton window. So I’m okay with Duncan’s parenthetical, though it would have been nicer if it had been more explicit about who it was responding to.)
I also think I wouldn’t have understood his comments without knowing MTG, or at least without having read Duncan’s explanation of the MTG color wheel.
(Nitpicking) Though I’d add that MTG doesn’t have a literal Blue Knight card either, so I doubt it’s that reference. (There are knights that are blue and green, but none with the exact names “Blue Knight” or “Green Knight”.)
Thanks for posting this. I found the framing of the different characters very insightful.
After looking into the prototype course, I updated upwards on this project, as I think it is a decent introduction to Dylan’s Off-Switch Game paper. Could I ask what other stuff RAISE wants to cover in the course? What other work on corrigibility are you planning to cover? (For example, Dylan’s other work, MIRI’s work on this subject, and Smitha Milli’s paper?)
Could you also write more about who your course is targeting? Why does RAISE believe that the best way to fix the talent gap in AI safety is to help EAs change careers via introductory AI Safety material, instead of, say, making it easier for CS PhD students to do research on AI Safety-relevant topics? Why do we need to build a campus, instead of co-opting the existing education mechanisms of academia?
Finally, could you link some of the mind maps and summaries RAISE has created?
Thanks! I think it makes sense to link it at the start, so new readers can get context for what you’re trying to do.
Yeah, I think Ben captures my objection—the name IDA distinguishes your approach from MIRI’s agenda, but not from some existing AI systems.
This might not be a bad thing—perhaps you want to choose a name that is evocative of existing approaches to stress that your approach is the natural next step for AI development, for example.
Could I ask what the motivation behind this post was?
I think they’re referring to the fact that they wouldn’t expect a Friendly AI to deconstruct them.
Also, for some reason, the link is wonky—likely because LessWrong 2.0 parses text contained in _ as italics. Here’s the fixed link:
Hm, I noticed that your link showed up quite wonky. Here’s a fixed version:
I am always intensely skeptical of people who don’t bring notebooks to meetings. Sometimes I’m the only one present with a notebook. What, you think you’re going to just remember the twenty details and action items that were agreed on?
I generally don’t bring a notebook to meetings when I expect a decent quality note-taker. I find that taking notes while listening often distracts from my ability to generate novel thoughts, especially if I’m spending more than half the time just taking notes. (And as I don’t write particularly fast, this tends to happen unless I stick only to writing down a very small fraction of interesting conversations!)
For reference: Andrew Critch’s post arguing for using a large notebook to think.
I strongly second the stick-to-the-wall whiteboards recommendation!
I actually suspect that the performance improvement for markers over pens is due in part to increased legibility—both from the tendency to write larger when using a marker (I know that I tend to draw really tiny diagrams with pens) and because markers leave a much thicker mark on the paper.
I remember hearing people call it iterated distillation and amplification (IDA), but I think this name might be too general.
I think a lot of the intuitions and thought processes that let you come up with new discoveries in mathematics and machine learning aren’t generally taught in classes or covered in textbooks. People are also quite bad at conveying their intuitions behind topics directly when asked to in Q&As and speeches. I think that at least in machine learning, hanging out with good ML researchers teaches me a lot about how to think about problems, in a way that I haven’t been able to get even after reading their course notes and listening to their presentations. Similarly, I suspect that autobiographies may help convey the experience of solving problems in a way that actually lets you learn the intuitions or thought processes used by the author.
Yeah, I agree on the stretching point.
The main distinguishing thing about Feynman, at least from reading Feynman’s two autobiographies, seemed to be how irreverent he is. He doesn’t do science because it’s super important; he does science he finds fun or interesting. He is constantly going on rants about the default way of looking at things (at least in his inner monologue) and ignoring authority, whether by blowing up at the science textbooks he was asked to read, ignoring how presidential committees traditionally functioned, or disagreeing with doctors. He goes to strip clubs because he likes interacting with pretty girls. It’s really quite different from the rather stodgy utilitarian/outside-view mindset I tend to reference by default, and I think reading his autobiographies gave me a lot more of what Critch calls “entitlement to believe”.
When I adopt this “Feynman mindset” in my head, this feels like letting my inner child out. I feel like I can just go and look at things and form hypotheses and ask questions, irrespective of what other people think. I abandon the feeling that I need to do what is immediately important, and instead go look at what I find interesting and fun.
From Watson’s autobiography, I mainly got a sense of how even great scientists are driven a lot by petty desires, such as the fear that someone else will beat them to a discovery, or annoyance at their collaborators. For example, it seemed that a driving factor behind Watson and Crick’s work on DNA was the fear that Linus Pauling would discover the true structure first. A lot of their failure to collaborate better with Rosalind Franklin was due to personal clashes with her. Of course, Watson does also display some irreverence toward authority; he held fast to his belief that their approach to finding the structure of DNA would work, even when multiple more senior scientists disagreed with him. But I think the main thing I got out of the book was a visceral appreciation for how important social situations are in motivating even important science.
When I adopt this “Watson mindset” in my head, I think about the social situation I’m in, and use that to motivate me. I call upon the irritation I feel when people are acting just a little too suboptimally, or doing things for the wrong reasons. I see how absolutely easy many of the problems I’m working on are, and use my irritation at people having thus far failed to solve them to push me to work harder. This probably isn’t a very healthy mindset to have in the long term, and there are obvious problems with it, but it feels very effective at getting me to push past schleps.
Following Swerve’s example above, I’ve also decided to try out your exercise and post my results. My favorite instrumental rationality technique is Oliver Habryka’s Fermi Modeling. The way I usually explain it (with profuse apologies to Habryka for possibly butchering the technique) is that you quickly generate models of the problem using various frameworks and from various perspectives, then weight the conclusions of those models based on how closely they seem to conform to reality. (@habryka, please correct me if this is not what Fermi Modeling is.)
For your exercise, I’ll try to come up with variants/applications of Fermi modeling that are useful in other contexts.
Instead of using different perspectives or frameworks, take one framework and vary the inputs, then weight the conclusions drawn by how likely the inputs are, as well as how consistent they are with the data.
Likewise, instead of checking one story on either side when engaged in Pyrrhonian skepticism, tell a bunch of stories that are consistent with either side, then weight them by how likely the stories are.
To test what your mental model actually says, try varying parts of the model inputs/outputs randomly and see which combinations fit well/horribly with your model.
When working in domains where you have detailed mental simulations (for example dealing with people you’re very familiar with, or for simple manual tasks such as picking up a glass of water), instead of using the inner sim technique once with the most likely/most available set of starting conditions, do as many simulations as possible and weight them based on how likely the starting conditions are.
When doing reference class forecasting, vary the reference class used to test for model robustness.
Instead of answering with a gut feeling directly for a probability judgment for a given thing, try to imagine different possibilities under which the thing happens or doesn’t happen, and then vary the specific scenarios (then simulate them in your head) to see how robust each possibility is. Come up with your probability judgment after consulting the result of these robustness checks.
When I am developing and testing (relatively easy to communicate) rationality techniques in the future, I will try to vary the technique in different ways when presenting them to people, and see how robust the technique is to different forms of noise.
I should do more mental simulations to calibrate myself on how good the actions I didn’t take were, instead of just relying on my gut feeling/how good other people who took those actions seem to be doing.
Instead of using different perspectives or frameworks, I could do Fermi modeling with different instrumental rationality techniques when approaching a difficult problem. I would quickly go through my list of instrumental rationality techniques, then weight the suggestions made by each of them based on how applicable the technique is to the specific problem I’m stuck on.
Recently, I’ve been reading a lot of biographies/autobiographies of great scientists of the 20th century, for example Feynman and James Watson. When encountering a novel scientific problem, instead of only thinking about what the most recently read-about scientist would say, I should keep a list of scientists whose thought processes have been inspirational to me, and try to imagine what each of them would do, weighting them by how applicable (my mental model of) their experiences are to the specific problem.
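A toy sketch of the “vary the inputs and weight the conclusions by how likely they are” move running through the variants above (the model, scenarios, and weights are all made-up illustrations, not anything from Habryka’s actual technique):

```python
def weighted_conclusions(model, input_samples):
    """Run `model` on each (inputs, weight) pair and return the
    weight-averaged conclusion -- conclusions from more plausible
    inputs count for more."""
    total_w = sum(w for _, w in input_samples)
    return sum(w * model(x) for x, w in input_samples) / total_w

# Toy Fermi model: hours to finish a project = tasks * hours_per_task.
model = lambda inputs: inputs[0] * inputs[1]

# Hypothetical input scenarios with subjective plausibility weights:
samples = [((10, 2.0), 0.5),   # base case: 10 tasks, 2h each
           ((15, 2.5), 0.3),   # some scope creep
           ((25, 3.0), 0.2)]   # everything goes wrong

estimate = weighted_conclusions(model, samples)  # 36.25 hours
```

The same shape covers the inner-sim variant: replace `model` with a mental simulation and the weights with how likely each set of starting conditions is.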
I guess Fermi modeling isn’t so much a single hammer, as much as the “hammer” of the nail mindset. So some of the applications or variants I generated above seem to be ways of applying more hammers to a fixed nail, instead of applying the same fixed hammer to different nails.