CS student, blogging and editing at https://www.thinkingmuchbetter.com/. PM me your fluid-g-increasing ideas
NicholasKross
The Crasche hat (at least the version I got, the medium size of the normal version) was only partly assembled.
One thing is that the clearest knowledge representations vary by field/task. Sometimes data is what you need, sometimes a math proof (which itself can range from more prose-heavy to more symbol-manipulation-based).
A Quick List of Some Problems in AI Alignment As A Field
(26) I think by “a plan”, Yudkowsky partially means “a default paradigm and relevant concrete problems”. There’s no consensus on the first one, and Yudkowsky would disagree on the second one (since he thinks most current concrete problems are irrelevant to the core/eventual problem).
Regarding disagreement (7): I’d like to see more people using AI to try and make useful contributions to alignment.
More broadly, I think the space of alignment working methods, literally the techniques researchers would use day-to-day, has been under-explored.
If the fate of the world is at stake, shouldn’t someone at least try hokey idea-generation techniques lifted from corporations? Idea-combination generators? Wacky proof-helper software? Weird physical-office setups like that 10-chambered linear room thing I saw somewhere but can’t find now? I don’t expect these to help a ton, and I expect high rates of failure, but I also expect surviving worlds to have tried them already and maybe written up (short, low-effort, low-conscientiousness, low-mental-cycles) notes on how well they worked.
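To make the “idea-combination generator” bit concrete, here’s a minimal sketch of the kind of thing I mean (the concept list and function are hypothetical, just for illustration, not an existing tool):

```python
# Hypothetical sketch of an "idea-combination generator": randomly pair up
# alignment-related concepts as brainstorming prompts. The concept list is
# made up for illustration.
import itertools
import random

CONCEPTS = [
    "corrigibility", "interpretability", "myopia", "debate",
    "reward modeling", "agent foundations", "thought experiments",
]

def random_combinations(pool, k=2, n=5, seed=None):
    """Return up to n distinct k-way combinations from the pool, in random order."""
    rng = random.Random(seed)
    combos = list(itertools.combinations(pool, k))
    rng.shuffle(combos)
    return combos[:n]

if __name__ == "__main__":
    for combo in random_combinations(CONCEPTS):
        print(" + ".join(combo) + " -> what would a research direction here look like?")
```

Most outputs would obviously be junk; the point is that trying costs a few minutes, which is the spirit of the list above.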
Disagreement (4): I think Yudkowsky maybe expects AGI to recursively self-improve on the way to becoming human-level.
Regarding disagreement (2), I think many of Yudkowsky’s “doom stories” are more intuition pumps / minimum bounds for demonstrating properties of superintelligence.
E.g. nanotech isn’t there because he necessarily thinks it’s what an unaligned AGI would do. Instead, it’s there to demonstrate how high the AGI’s relative tech capabilities would be.
His point (which he stresses in different ways) is: “don’t look at the surface details of the story, look instead at the implied capabilities of the system”.
It’s similar with “imagine it self-improving in minutes”. It may or may not happen that way specifically, but the point is “computers work on such short timescales, and recursion rates can compound quickly enough, that we should expect some parts of the process to be faster than expected, maybe including the FOOM itself”.
It’s not supposed to be a self-contained cinematic universe, it’s supposed to be “we have little/no reason to expect it to not be at least this weird”, according to his background assumptions (which he almost always goes into more detail on anyway).
Solving a scientific problem without being able to learn from experiments and failures is incredibly hard.
I wonder what, if any, scientific/theoretical problems have been solved right “on the first try” in human history. I know MIRI and others have done studies of history to find examples of e.g. technological discontinuities. Perhaps a study could be made of this?
An example Yudkowsky often brings up in the Sequences is Einstein’s discovery of General Relativity. I think this is informative and helpful for alignment. Einstein did lots of thought experiments and careful reasoning, to the point where his theory basically “came out” right, in time for experiments to prove it so.
More generally, I think Yudkowsky analogizes AI safety to physics, and the analogy seems apt: the combination of careful theory and expensive/dangerous experiments, the high intellectual barriers to entry, the need for hardcore conceptual and mathematical “engineering” to even think about the relevant things, the counterintuitive properties, etc.
TL;DR: write, self-critique, and formalize more thought experiments. This could help a lot with getting alignment theoretically right sooner (which helps regardless of how critical the “first” experimental attempt turns out to be).
Large companies that do send rejection emails (e.g. Google) keep them short and blunt. Y Combinator, IIRC, had a message (or link?) basically saying “our rejection likely doesn’t mean anything is wrong with you or your idea, there are just a ton of applicants”. If rejection emails are being personalized and it’s taking a while, then whatever process is used for them sounds like a bad process (even if the process is “just” somebody spending lots of mental cycles on it). Unfortunately, this moves my mental model of MIRI another micron towards “an organization with procrastination / prioritization issues / ???”.
This makes me wonder if operations really is the big bottleneck with EA/AIS/MIRI in particular. Anyone good at operations would not have let this situation happen. Either it would’ve ended quicker (“Sorry, we’re not hiring right now, try again later.”) or ended differently (“You’re hired.”).
Can confirm, Eliezer’s recent posts (especially “AGI Ruin”) have kinda “woken me up” to the urgency of the problem, and how far behind we are relative to where we should (could?) be.
I would like to read this very much, as I want to go into technical AI alignment work and such a document would be very helpful.
Ah, yeah, that probably makes sense. (And to be fair, I didn’t guess that many LWers were considering taking criminal-case-level risks.)
And don’t forget: this whole article + discussion seems to only apply to civil law. In criminal law, you can go to jail, which does pose lots of risks.
Nitpick: Trump himself has often had his own high-powered legal teams to fight things (including a key lawyer with a long Wikipedia article). So he probably “handled” legal risk in ways the average person would have a hard time replicating. (E.g. he won’t go broke from missing work while he’s being sued; it’s just a background task for his team to handle --> therefore, he can outlast the plaintiffs and then settle.)
Good catch on the natural-vs-man-made accidental bait-and-switch in the common argument. This post changed my mind to think that, at least for scaling-heavy AI (and, uh, any disaster that leaves the government standing), regulation could totally help the overall situation.
Not quite a large nice house, but the individual-level MVP version of this: https://everythingtosaveit.how/slackmobiles/
If it seems bizarre to think of an entity nobody can see ruling a country, keep in mind that there is a grand tradition of dictators – most famously Stalin – who out of paranoia retreated to some secret hideaway and ruled their country through correspondence. The AI would be little different. (Directly quoted from here). (Policymakers)
Leading up to the first nuclear weapons test, the Trinity event in July 1945, multiple physicists in the Manhattan Project worried that the single explosion might destroy the world. Edward Teller, Arthur Compton, and J. Robert Oppenheimer all had concerns that the nuclear chain reaction could ignite Earth’s atmosphere in an instant. Yet, despite disagreement and uncertainty over their calculations, they detonated the device anyway. If the world’s experts in a field can be uncertain about whether their work will cause human extinction, and still continue doing it, what safeguards are we missing for today’s emerging technologies? Could we be sleepwalking into catastrophe with bioengineering, or perhaps artificial intelligence? (Based on info from here). (Policymakers)
I’ve heard of (and worked through some of) the AGISF, but haven’t heard of SERI MATS. Scaling these up would likely work well.