The usual example here is memory control. The point of the higher-level languages is to abstract away the details of memory and registers, so there is no malloc/free equivalent when writing in them; memory management is handled by garbage collection instead.
Of course, people eventually found a need to address these kinds of problems, and so features to allow for it were added later. C reigns supreme in embedded applications because of its precise memory and I/O capabilities, but there are now efforts toward embedded Haskell and embedded LISP. Note, though, that these efforts involve things like special compilers and strategies for keeping the automatic garbage collection from blowing everything up, whereas with C you mostly just write regular C. The same goes for interrupts.
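To make the contrast concrete, here is a minimal C sketch of the kind of control involved: explicit allocation and freeing, plus a write to a fixed hardware address. The register name and address are made up for illustration, and on a desktop OS the register write would simply fault; the point is only to show the style of code that embedded C makes routine and that garbage-collected languages deliberately hide.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical memory-mapped status register at a made-up address.
   'volatile' tells the compiler every read/write must actually happen. */
#define STATUS_REG ((volatile uint32_t *)0x40021000u)

int main(void) {
    /* Explicit allocation: the programmer decides exactly when this
       buffer comes into existence... */
    uint8_t *buffer = malloc(256);
    if (buffer == NULL) {
        return 1;
    }

    *STATUS_REG = 0x1u;   /* poke the (hypothetical) hardware directly */

    /* ...and exactly when it goes away. No garbage collector involved. */
    free(buffer);
    return 0;
}
```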
I propose that the motivation for all of these projects is not to find the answer, but rather to build the intuitions of the project members. If you were to compare the effects on intuition of reading research versus performing research, I strongly expect performing research would have the larger effect.
Because of this, I expect that a significant chunk of all the people who are working in any capacity on the AI risk problem will take a direct shot at similar projects themselves, even if they don’t write them up. I would also be surprised to find an org without any such people in it.
This seems highly plausible to me. It would be interesting to see something that more closely tracks different kinds of innovation: for example, how do hygiene and vaccinations (which reduce deaths) compare with domesticated crops and irrigation (which directly increase productivity) in terms of population growth?
Faster growth rates mean more money, more money means more AIs researching new technology, more AIs mean even faster growth rates, and so on to infinity.
My prior is that this will not require anywhere near human-level intelligence. I firmly expect this can be accomplished with the kind of AI we already possess, in tandem with certain imminent kinds of automation.
If 1 in 1,000,000 people is an irrepressible genius and produces a technological invention, then we should see as many technological inventions as there are millions of people.
Alternatively, each person has some chance of making such an invention, and the more people there are, the more chances there are for inventions to happen.
If the question is ‘what is the model of innovation that justifies the assumption,’ then I don’t know, but I would guess some variant of the Great Man theory of history. We might model it as an IQ distribution.
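A minimal formalization of both versions (my notation, not anything from the original discussion): if each of $N$ people independently has probability $p$ of producing an invention, the expected number of inventions is

$$E[\text{inventions}] = N p,$$

so $p = 10^{-6}$ and $N = 7 \times 10^9$ gives about 7,000 expected inventions, i.e. one per million people. The Great Man / IQ-threshold version just says where $p$ comes from: if ability is distributed as $X \sim \mathcal{N}(100, 15^2)$ and only people above some cutoff $c$ invent, then $p = 1 - \Phi((c-100)/15)$, and the expected count is still linear in the population.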
On the other-other hand, an example was staring me in the face that points more closely to your old intuitions: I just started reading The Structure and Interpretation of Classical Mechanics, which is the textbook used for classical mechanics at MIT. Of particular note is that the book uses Scheme, a LISP dialect, to enforce clarity and correctness in the reader’s understanding of the mechanics. The programming language itself is only covered in the appendix; they spend an hour or two on it in the course.
The goal here is to raise the standard of understanding the world to ‘can you explain it to the computer.’
Helen’s comments on the assumed superiority of China in gathering data about people brought to mind the recent disagreement between Rich Sutton and Max Welling.
Sutton recently argued that the bitter lesson of AI is that methods which leverage computation best are the most effective. Welling responded by arguing that data is similarly important, particularly for domains which are not well defined.
This causes me to suspect that the US and China directions for AI research will significantly diverge in the medium term.
My intuition is strongly opposite yours of ten years ago.
For example, there are domain-specific languages, which are designed for exactly one problem domain.
C, the most widespread general-purpose programming language, does things that are extremely difficult or impossible in highly abstract languages like Haskell or LISP, which sits poorly with the notion that all three are just helpful ways to think about the world.
Most of what we wind up doing with programming languages is building software tools. We prefer programs to be written such that the thinking is clear and correct, but this seems to me motivated more by convenience than anything else, and it rarely turns out that way besides.
I would go so far as to say that the case of ‘our imperfect brains dealing with a complex world’ is in fact a series of specific sub-problems, and we build tools for solving them on that basis.
On the other hand, it feels like there is a large influence on programming languages that isn’t well captured by the tool-for-problem or crutch-for-psychology dichotomy: working with other people. Consider the object-oriented languages, like Java. For all that an object is a convenient way to represent the world, and for all that it is meant to provide abstractions like inheritance, what actually seems to have driven the popularity of object orientation is that it provides a way for the next programmer not to know exactly what is happening in the code, but instead to take the current crop of objects as given and then do whatever additional thing they need done.
Should we consider ‘a group of people separated in time working on the same problem’ to be an independent problem in its own right? Or should we consider that working with people-in-the-future is something we are psychologically bad at, and that we need a better way to organize our thinking about it? While the former seems more reasonable to me, I don’t actually know the answer here. One way to tell might be if the people who wrote Java said specifically somewhere that they wanted a language that would make it easier for multiple people to write large programs together over time. Another way might be if everyone who learned Java chose it because they liked not having to worry that much about what the last guy did, so long as the objects work.
This is very old, but if I am eyeballing the timeline correctly we should be approaching the point where you are deciding whether to cut your losses or endorse the lessons. So if I may, how did it go?
There’s a post about this at the blog Do the Math, which calculates that we boil ourselves with waste heat in ~400 years, assuming GDP doubles every 100 years and per capita energy consumption keeps increasing at the same rate it has for the previous ~400 years.
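For intuition on why waste heat eventually bites (my own back-of-the-envelope, not the post’s exact numbers): in equilibrium the planet has to radiate away everything it dissipates, and by Stefan-Boltzmann the radiated power scales as $P \propto T^4$, so the required temperature scales as $T \propto P^{1/4}$. Exponential growth in dissipated power therefore forces exponential growth in absolute surface temperature; a ten-thousand-fold increase in power means roughly a tenfold increase in temperature, and a bit over 2% annual growth sustained for ~400 years is enough to multiply power by ten thousand.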
The usual economic retort is that the economy could look very different from the one we are used to, and decouple from energy consumption. But the assumption about waste heat is what is doing the work here, and we have recently developed thermal transistors. These transistors have been designed out of quantum objects. And it turns out we might be able to beat the Planck limit in the far field. Which is to say, we can build heat computers, and then waste heat could be converted into computation.
That doesn’t solve the problem of too much energy use being bad, but if waste heat is computation then we can hit peak (safe) output, stay there, and still add value.
Har! I thought that was just a titling convention we’d adopted. Oops!
The bit about bundling in-person and online communities caused me to think of the Literature Review: Distributed Teams post.
It feels to me like the same trust and communication mechanisms from distributed teams stand a good chance of applying to distributed communities. I’m tempted to take the Literature Review article and go back through the Old LW postmortem post to see how well the predictions match up. From this post:
Longterm, this kills the engine by which intellectual growth happens. It’s what killed old LessWrong – all the interesting projects were happening in private, (usually in-person) spaces, and that meant that:
newcomers couldn’t latch onto them and learn about them incidentally
at least some important concepts didn’t enter the intellectual commons, where they could actually be critiqued or built upon
From the Distributed Teams post:
If you must have some team members not co-located, better to be entirely remote than leave them isolated. If most of the team is co-located, they will not do the things necessary to keep remote individuals in the loop.
I feel like modelling LessWrong as a Distributed Team with strange compensation might be a useful lens.
I agree strongly with the information symmetry argument, although I am less confident of the information volume one, because transaction costs apply to information too.
I particularly like it because information symmetry feels like an area which has gotten a negligible amount of attention compared to questions of supply and demand, and therefore is a good target for making a lot of progress relatively quickly for relatively little investment.
I notice a distinct lack of examples of successful large-scale discourse. It feels like this is one of many areas where we pull a sort of legerdemain: we envision the thing we wish were true as the natural state of the world, or as a state of the past, and then bemoan its sabotage or decline.
I currently cannot think of a single instance of good public discourse ever. This causes me to think it isn’t sad that clever arguers are destroying good discourse; it is interesting that they are an obstacle to good discourse ever arising in the first place.
I’m happy to be wrong though. If we were to attack this problem by maximizing or duplicating the bright spots, what would they be?
Following on that:
“Analysing Mathematical Reasoning Abilities of Neural Models,” https://arxiv.org/pdf/1904.01557.pdf
They have proposed a procedurally-generated data set for testing whether a model is capable of the same types of mathematical reasoning as humans.
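To illustrate what ‘procedurally generated’ means here (a toy sketch of the idea, not the paper’s actual generator, which covers many more problem types and fixes a train/test split): each question/answer pair comes from a template with randomly sampled operands, so you can produce as many as you like.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Toy question generator in the spirit of the dataset: sample operands,
   fill in a template, and emit the question together with its answer. */
int main(void) {
    srand((unsigned)time(NULL));

    for (int i = 0; i < 5; i++) {
        int a = rand() % 100;
        int b = rand() % 100;

        if (i % 2 == 0) {
            printf("Q: What is %d plus %d?\nA: %d\n\n", a, b, a + b);
        } else {
            /* simple linear equation: (a+1) * x = (a+1) * b, so x = b */
            printf("Q: Solve %d*x = %d for x.\nA: %d\n\n", a + 1, (a + 1) * b, b);
        }
    }
    return 0;
}
```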
That is much better, but it raises a more specific question. Here you described the loop as a property of the task, but then you also wrote:
Hire people like me
long OODA loop
Which seems to mean you are the one with the long loop. I can easily imagine different people having different maximum loop-lengths, beyond which they are likely to fail. Am I correct in interpreting this to mean something like trying to ensure that the remote worker can handle the longest-loop task you have to give them?
Kahneman’s work does an unusually good job of arriving at object-level recommendations. That seems to be what he is doing with his time now.
I don’t think using ‘fast’ and ‘slow’ would have hurt those ideas at all, what with it being the title of the book. Further, I’ve seen plenty of cases of people trying to wrangle the dichotomy by piling on additional words, as in the elephant-and-rider conversation.
I also note we don’t talk about System 1 and System 2 much anymore. Looking at the Curated list for the last three months, I see plenty of posts that are aiming squarely at System 1 or System 2, applying one system to the other, or describing one specific technique that could be called System 1 or 2...but virtually none of the posts say anything about either of them or mention Kahneman. We’ve moved past the point where the binary distinction is useful to our discussions, and broad familiarity with the underlying concepts is assumed without the need for the terms themselves.
This suggests that System 1/2 is easy to apply correctly, or that the community is unusually good at applying it, or both.
It looks to me like how powerful a system is and how difficult it is to apply correctly are very different questions, and it feels like they are rarely balanced well. I think balancing them is probably very difficult to do, and I have seen people fail to apply systems correctly far more often than I have seen them succeed, which gives me a very low prior on the utility of unfamiliar systems.
Excellent work! I particularly like including your notes in the comments.
I have one question about OODA (I see long loops mentioned in the post, but without attribution; I don’t see them mentioned in the notes explicitly). Could you talk more about the long-loop conclusion, and how remote work benefits from it?
My naive guess is that the bandwidth issues associated with remote work cause feedback to take longer, which means longer OODA loops are a desirable trait in the worker, but my confidence is not particularly high.
I’m not questioning the qualifications of the source or the goodness of the concepts, just the method chosen to communicate them.
If the point of the system is to introduce an inferential gap on purpose, with the goal of leaving unintended associations behind, I can see the reasoning but disagree with it. I see a lot of attempts to do this, and a lot of discussion using such systems, and virtually nothing in the way of coming back down from the abstractions to object-level recommendations again.
This is likely the result of applying the system badly, but the ease with which a tool is misapplied is an important factor in the goodness of the tool.
But if we assume other people also know the system, why would we have to explain a whole load of conceptual framework?
How is this superior to addressing the object-level concerns directly?
Edit: I definitely misread that last sentence. Ignore me!