Assuming slower and more gradual timelines, isn’t it likely that we run into some smaller, more manageable AI catastrophes before “everybody falls over dead” due to the first ASI going rogue? Maybe we’ll be in a state of sub-human-level AGIs for a while, and during that time some of the AIs clearly demonstrate misaligned behavior leading to casualties (and general insights into what is going wrong), in turn leading to a shift in public perception. Of course it might still be unlikely that the whole globe at that point stops improving AIs and/or solves alignment in time, but it would at least push awareness and incentives somewhat in the right direction.
silentbob
Isn’t it conceivable that improving intelligence becomes difficult more quickly than the AI is scaling? E.g. couldn’t it be that somewhere around human-level intelligence, each marginal percent of improvement becomes twice as difficult as the previous one? I admit that doesn’t sound very likely, but if that were the case, then even a self-improving AI would potentially improve itself very slowly, and maybe even sub-linearly rather than exponentially, wouldn’t it?
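To make this concrete, here is a toy model of my own (the numbers and the proportional-effort assumption are entirely made up, not from any source): if each marginal 1% capability gain costs twice the effort of the previous one, then even an AI whose available effort grows with its capability stalls into a roughly logarithmic crawl rather than exponential takeoff.

```python
# Toy model (my own assumption-laden sketch): each marginal 1% capability
# gain costs twice the effort of the previous one, while effort accrues
# in proportion to current capability.
capability = 100.0      # arbitrary "human level" baseline
cost_next = 1.0         # effort required for the next 1% gain
bank = 0.0              # accumulated effort

for step in range(1000):
    bank += capability / 100.0      # smarter AI works proportionally faster
    while bank >= cost_next:
        bank -= cost_next
        capability *= 1.01          # one marginal percent gained
        cost_next *= 2              # the next percent is twice as hard

# After 1000 steps of effort, capability has only grown ~10%:
print(round(capability, 2))  # → 110.46
```

Under these assumptions the doubling costs swamp the (roughly linear) growth in available effort, which is exactly the sub-linear scenario described above.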
The first person (row 2) partly sounds a lot like GPT-3. Particularly their answers “But in the scheme of things, changing your mind says more good things about your personality than it does bad. It shows you have a sense of awareness and curiosity, and that you can admit and reflect when decisions have been flawed or mistakes have been made.” and “A hero is defined by his or her choices and actions, not by chance or circumstances that arise. A hero can be brave and willing to sacrifice his or her life, but I think we all have a hero in us — someone who is unselfish and without want of reward, who is determined to help others”. Then however there’s “SAVE THE AMOUNT” and “CORONA COVID-19”. This person is confusing.
The mug is gone. Please provide mug again if possible.
Missing Mental Models
What are some important insights you would give to a younger version of yourself?
I found the concept interesting and enjoyed reading the post. Thanks for sharing!
Sidenote: It seems either your website is offline (blog’s still there though) or the contact link from your blog is broken. Leads to a 404.
Thanks a lot for your comment! I think you’re absolutely right on most points, and I didn’t do the best possible job of covering these things in the post, partially due to wanting to keep things somewhat simple and partially due to lack of full awareness of these issues. The conflict between the point of easy progress and short-sightedness is most likely quite real, and it does seem unlikely that once such a point is reached there will be no setbacks whatsoever. Having such an optimistic expectation would certainly be detrimental. In the end, the point of easy progress is an ideal to strive for when planning, but not an aspiration to fully maintain at all times.
Regarding willpower, I agree that challenge is an important factor. My idea was not so much that tasks themselves should become trivially easy, but that working on them becomes easy in the sense that they excite you. Again, that’s something I could have made clearer in the text.
but you need to encounter this uphill part where things become disorienting, frustrating and difficult to solve more complex problems in order to progress in your knowledge
I’m not so sure about this. I like to think there must be ways to learn things, even maths, that consist mostly of positive experiences for the learner. This might certainly involve a degree of confusion, but perhaps in the form of surprise and curiosity rather than frustration. That being said, 1) I might be wrong in my assumption that such a thing is realistically possible, and 2) this is not at all the experience most people are actually having when expanding their skills, so it is certainly important to be able to deal well with frustration and disorientation. Still, it makes a lot of sense to me to reduce these negative experiences wherever possible, unless you think that such negative experiences themselves have some inherent value and can’t be replaced.
The Point of Easy Progress
Very interesting concept, thanks for sharing!
Update a year later, in case anybody else is similarly into numbers: that prediction of achieving 2.5 out of the 3 major quarter goals ended up being correct (one goal wasn’t technically achieved due to outside factors I hadn’t initially anticipated, but I had done my part, thus the .5), and I’ve been using a murphyjitsu-like approach for my quarterly goals ever since which I find super helpful. In the three quarters before Hammertime, I achieved 59%, 38% and 47% respectively of such goals. In the quarters since the numbers were (in chronological order, starting with the Hammertime quarter) 59%, 82%, 61%, 65%, 65%, ~82%. While total number and difficulty of goals vary, I believe the average difficulty hasn’t changed much whereas the total number has increased somewhat over time. That being said, I also visited a CFAR workshop shortly after going through Hammertime, so that too surely had some notable effect on the positive development.
My bug list has grown to 316 as of today, ~159 of which are solved, following a roughly linear pattern over time so far.
Where I find Murphyjitsu most useful is in the area of generic little issues with my plans that tend to come up rather often. A few examples:
forgetting about working on the goal in time, due to lack of a reminder, planning fallacy etc.
the plan involving asking another person for a favor, and me not feeling too comfortable about asking
my system 1 not being convinced of the goal, requiring more motivation / accountability / pressure
the plan at some point (usually early on) requiring me to find an answer to some question, such that the remaining plan depends on that answer; my dislike for uncertainty ironically often causes me then to just flinch away from that whole plan as opposed to just trying to find that one important answer
It’s arguably more of a checklist than real “by-the-book Murphyjitsu”, but still, taking a goal and going through these things, trying to figure out the most trivial and easy-to-fix issues with the plan, often allows me to increase the likelihood of achieving a goal by 10–20% with just a few minutes of work.
I’ve long been aware of the planning fallacy and of how, despite my knowing about it for many years, it still often affects me (mostly for things where I simply lack the awareness to realize that the planning fallacy plays a role at all; so not so much for big projects, but rather for things that I never really focus on explicitly, such as overhead when getting somewhere). The second category you mention, however, is something I too experience frequently, but having lacked a term (/model) for it, I didn’t really think about it as a thing.
I wonder what classes of problems typically fall into the different categories. At first I thought it may simply depend on whether I feel positive or negative about a task (positive → overly optimistic → planning fallacy; negative → pessimistic → vortex of dread), but the “overhead when getting somewhere” example doesn’t really fit the theory, and also one typical example for the planning fallacy is students having to hand in an assignment by a certain date, which usually is more on the negative side. But I guess the resolution to this is simply that the vortex of dread is not different from the planning fallacy, but a frequent cause of it.
we tend to overestimate how long things take that we feel negative about → vortex of dread
this causes us to procrastinate it more than we otherwise would → planning fallacy (so the “net time” is lower than anticipated, but the total time until completion is longer than anticipated)
Which leaves me with three scenarios:
positive things → planning fallacy due to optimism
negative things → vortex of dread → planning fallacy due to procrastination
trivial things I fail to explicitly think about → planning fallacy due to ignorance/negligence
And thus there may be different approaches to solving each of them, such as
pre-mortem / murphyjitsu, outside view
knowing about the vortex of dread concept, yoda timers, scheduling, intentionality
TAPs I guess?
I’d probably put it this way – the Sunk Cost Fallacy is Mostly Bad, but motivated reasoning may lead to frequent false positive detections of it when it’s not actually relevant. There are two broad categories where sunk cost considerations come into play, a) cases where aborting a project feels really aversive because so much has gone into it already, and b) cases where on some level you really want to abort a project, e.g. because the fun part is over or your motivation has decreased over time. In type a cases, knowing about the fallacy is really useful. In type b cases, knowing about the fallacy is potentially harmful because it’s yet another instrument to rationalize quitting an actually worthwhile project.
You can use a hammer to drive nails into walls, or you can use a hammer to hurt people. The sunk cost fallacy may be a “tool” with higher than usual risk of hurting yourself. This is probably a very blurry/grayscale distinction that varies a lot between individuals however, and not a clear cut one about this particular tool being bad. But I definitely agree it makes a lot of sense to talk about the drawbacks of that particular concept as there is an unusually clear failure mode involved (as described in the post).
“When in doubt, go meta”. Thanks to my friend Nadia for quoting it often enough for it to have found a place deep within my brain. May not be the perfect mantra, but it is something that occurs to me frequently and almost always seems yet again unexpectedly useful.
It’s not that easy to come up with strange bugfix stories (or even noteworthy bugfix stories in general).
One that’s still in progress is that I’ve been using gamification to improve my posture. I simply count the occurrences throughout the day when I remember to sit/stand straight, track them, and sum them up over time to reach certain milestones, in combination with a randomized reward system. While I wasn’t too convinced by this attempt at first, it happens more and more often that I remember to sit up straight and realize I’m already doing so, which is a good sign I guess.
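For what it’s worth, the mechanism could look roughly like this (a minimal reconstruction of my own; the milestone numbers, reward probability, and function name are invented for illustration, the comment above doesn’t specify them):

```python
import random

# Sketch of a count-and-reward posture tracker. MILESTONES and
# REWARD_PROBABILITY are made-up values, not from the original comment.
MILESTONES = {50, 100, 250, 500}
REWARD_PROBABILITY = 0.2

def log_check(total, rng=random):
    """Record one posture check; return the new total and any events."""
    total += 1
    events = []
    if total in MILESTONES:
        events.append(f"milestone:{total}")
    if rng.random() < REWARD_PROBABILITY:   # occasional random treat
        events.append("random-reward")
    return total, events

total, events = log_check(49, random.Random(0))
print(total, events)  # → 50 ['milestone:50']
```

The randomized reward is the interesting part: variable-ratio reinforcement tends to hold attention better than a fixed payout per check.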
Quite a few bugs could be solved for me using spreadsheets. Tracking stuff and seeing graphs of development over time often provides just enough motivation for me to stick to new habits.
The bug of occasionally watching too many YouTube videos has been partially resolved by my initially being too lazy to connect speakers to my computer. I’m now more or less deliberately keeping it that way just for that advantage.
Going through Hammertime for the second time now. I tried to come up with 10 somewhat unusual ways to utilize predictions and forecasting. Not perfectly happy with the list of course, but a few of these ideas do seem (and in my experience actually are, 1 and 2 in particular) quite useful.
Predicting own future actions to calibrate on one’s own behavior
When setting goals, using predictions on the probability of achieving them by a certain date, giving oneself pointers which goals/plans need more refinement
Predicting the same relatively long-term things independently at different points in time (without looking at earlier predictions), to figure out how much noise one’s predictions entail: comparing many predictions of the same thing, which, if no major updates have shifted the odds, should stay about the same (or consistently drift slightly over time due to getting closer to the deadline)
Predicting how other people react to concrete events or news, deliberately updating one’s mental model of them in the process
Teaming up with others, meta-predicting whether a particular set of predictions by the other person(s) will end up being about right, over- or underconfident
When buying groceries (or whatever), before you’re done, make a quick intuitive prediction of what the total price will be
When buying fresh fruit or vegetables or anything else that’s likely to spoil, make predictions how likely it is you’re going to eat each of these items in time
Frequently, e.g. yearly, make predictions about your life circumstances in x years, and evaluate them in the future to figure out whether you tend to over- or underestimate how much your life changes over time
Before doing something aversive, make a concrete prediction of how negatively you’ll experience it, to figure out whether your aversions tend to be overblown
Experiment with intuitive “5 second predictions” vs those supported by more complex models (Fermi estimate, guesstimate etc.) and figure out which level of effort (in any given domain) works best for you; or to frame it differently, figure out whether your system 1 or system 2 is the better forecaster, and maybe look for ways in which both can work together productively
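As a concrete aid for ideas like 1 and 3, logged predictions can be bucketed by stated confidence and compared against outcomes. Here is a sketch of my own (the function name and decile bucketing are arbitrary choices, not from the post):

```python
from collections import defaultdict

# Simple calibration check over logged predictions.
# Each prediction is (stated confidence in percent, did it happen).
def calibration_table(predictions):
    buckets = defaultdict(list)
    for percent, happened in predictions:
        key = min(percent // 10, 9)        # one bucket per decile, 90+ merged
        buckets[key].append(1 if happened else 0)
    table = {}
    for key in sorted(buckets):
        outcomes = buckets[key]
        label = f"{key * 10}-{key * 10 + 9}%"
        table[label] = round(100 * sum(outcomes) / len(outcomes))
    return table

log = [(90, True), (90, True), (90, False), (60, True), (60, False)]
print(calibration_table(log))  # → {'60-69%': 50, '90-99%': 67}
```

If the observed frequency in a bucket sits well below its label (as in the 90–99% bucket here), that’s overconfidence; well above, underconfidence.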
One game/activity I generally recommend because of its potential 11/10 fun payoff in the end, which also works in relative isolation, is having fun with gap texts (just figured out this is apparently known as “mad libs”, so maybe this isn’t actually new to anybody). The idea being that one person creates a small story with many words left out, and then asks other people to fill in the words without knowing the context. So “Bob scratched his <bodypart> and <verb> insecurely. ‘You know’, he said <adverb>, ‘when I was a(n) <adjective> boy, I always wanted to become a(n) <noun>, but I couldn’t, because my <bodypart> is/are too <adjective>’” might be part of such a story. You pick these gaps in random order and query people for the thing you need (or if you’re alone, you can even do this yourself, given you manage to hide the context from yourself). Afterwards you read the story out loud to everybody involved.
I’ve done this a few times both with groups and just with my girlfriend and it never disappointed. Usually takes some time to write and fill in the story, but I think it’s very much worth it. Also, this gets funnier with experience as you figure out both what kind of text works best (involving body parts is certainly a great idea) and also what kinds of words to fill the gaps with (e.g. very visual ones, or such with certain connotations).
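For anyone who wants to try this solo at a keyboard, the procedure above can be automated. This is purely illustrative code of my own; the template and placeholder syntax are just examples:

```python
import random
import re

# Sketch of the gap-text game: placeholders are queried in random order
# (so the player doesn't see the context), then filled back in.
TEMPLATE = ("Bob scratched his <bodypart> and <verb> insecurely. "
            "'When I was a(n) <adjective> boy, I always wanted "
            "to become a(n) <noun>.'")

def play(template, answers=None):
    gaps = re.findall(r"<(\w+)>", template)
    order = list(range(len(gaps)))
    random.shuffle(order)                  # hide context from the player
    filled = {}
    for i in order:
        prompt = f"Give me a {gaps[i]}: "
        filled[i] = answers[i] if answers else input(prompt)
    result = template
    for i in sorted(filled):               # fill gaps back in left-to-right
        result = re.sub(r"<\w+>", filled[i], result, count=1)
    return result

print(play(TEMPLATE, ["head", "coughed", "small", "pirate"]))
```

Passing `answers` makes it deterministic for testing; leaving it out prompts interactively, which is the actual game.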
As mentioned in the final exam, here’s my personal summary of how I experienced hammertime.
I feel like following the sequence was a very good use of my time, even though it turned out a bit different from what I had initially expected. I thought it would focus much more on “hammering in” the techniques (even after reading Hammers & Nails and realizing the metaphor worked in a different way), but it was more about trying everything out rather briefly, as well as some degree of obtaining new perspectives on things. This was fine, too, but I still feel like I haven’t got a real idea about whether things such as goal/aversion factoring, mantras, internal double crux, focusing or timeless decision making actually do anything for me. I applied some of the techniques once, but it didn’t really lead to any tangible results. I may have done them incorrectly, or I may need to practice more, or maybe they just don’t work for me, and it’s now up to me to figure this out in detail.
I derived a lot of value from creating and frequently updating the bug list. TAPs are a neat concept, and those that work are really helpful, but many fail for me. Maybe I’ll get a better feeling for which triggers work for me so I can tell beforehand instead of going through days and weeks of consistent failure with a trigger. Design surely works, but I’ve got some aversions to applying it which I’ll have to unravel. CoZE is something I never really doubted, and I like the new framing of basically just becoming the kind of person who’s open to new things, as opposed to forcing oneself to do scary things. I’ve been following the “all else being equal expanding my comfort zone is good” heuristic for a few years already, and will continue to do so, as my natural instinct otherwise is usually to exploit rather than to explore.
Yoda timers/resolve cycles and murphyjitsu probably had the greatest effect on me. On day 10 I murphyjitsued three of my major quarter goals and increased my expected value of how many of them I’d achieve from 1.24 to 2.08 in the process. At the time these were merely predictions and would be worthless if they were not correlated with reality; now, however, I can say that I’m on track to reach 2.5/3, and I’m highly confident that murphyjitsu made a huge counterfactual difference and that I hadn’t simply been underconfident before.
I followed the sequence with a group of other people, sharing our progress in a slack channel, which kept up my motivation and probably made it a lot more interesting than “merely” following a year old sequence on my own, so that’s certainly something I recommend to others who are interested in giving it a try.
To provide some numbers: I’ve identified 175 bugs by now, 35 of which I consider solved, and around half of which I expect to solve within the next year. That isn’t overly ambitious, but it’s still on the order of “life-changing” if things work out, which sounds good enough for me.
So, overall: Thanks a lot alkjash!
In The Rationalists’ Guide to the Galaxy the author discusses the case of a chess game, and particularly when a strong chess player faces a much weaker one. In that case it’s very easy to make the prediction that the strong player will win with near certainty, even if you have no way to predict the intermediate steps. So there certainly are domains where (some) predictions are easy despite the world’s complexity.
My personal, rather uninformed take on the AI discussion is that many of the arguments are indeed comparable to the chess example, so the predictions seem convincing despite the complexity involved. But even then they are based on certain assumptions about how AGI will work (e.g. that it will be some kind of optimization process with a value function), and I find these assumptions pretty opaque. When hearing confident claims about AGI killing humanity, then even if the arguments make sense, “model uncertainty” comes to mind. But it’s hard to argue about that, since it is unclear (to me) what the “model” actually is and how things could turn out differently.