LessWrong team member / moderator. I’ve been a LessWrong organizer since 2011, with roughly equal focus on the cultural, practical and intellectual aspects of the community. My first project was creating the Secular Solstice and helping groups across the world run their own version of it. More recently I’ve been interested in improving my own epistemic standards and helping others to do so as well.
Raemon
Hadn’t heard of it. Will take a look. Curious if you have any tips for getting over the initial hump of grokking its workflow.
Your process description sounds right (like, the thing I would aspire to, although I don’t consistently do it – in particular, I’ve identified it’d be good if I did more automated testing, but haven’t built that into my flow yet).
But, you don’t really spell out the “and, here’s why I’m pretty confident this is a 5x improvement.”
A few months ago I’d have been more open to just buying the “well, I seem to be shipping a lot of complex stuff”, but, after the METR result of “turns out a lot of devs in our study were wrong and were actually slowed down, not sped up”, it seems worth being more skeptical about it.
What are the observations that lead you to think you’re 5x? (Also, 5x is a somewhat specific number: do you mean more like ‘observations suggest it’s specifically around 5x’, or more like ‘it seems like a significant speedup, but I can tell I’m still worse than the “10x” programmers around me, and, idk, eyeballing it as in the middle’?)
(I don’t mean this to be like, super critical or judgmental, just want to get a sense of the state of your evidence)
I’m weakly betting this has more to do with the genre or style you presented as.
I talked to my mom about it, and I’m not sure exactly what she ended up believing, but, like jimmy’s experience, it went pretty differently. I think she ended up somewhere around “not 100% sure what to believe, but I believe my son believes it, and it seems at least reasonable.”
I think my dad ended up believing something like “I don’t really buy everything my son is saying” (more actively skeptical than my mom), but probably something like “there’s something real here, even if I think my son is wrong about some things.”
(In both cases I wasn’t trying to persuade them, so much as say ‘hey, I am your son and this is what’s real for me these days, and, I want you to know that’).
When I talked to my aunt and cousin, I basically showed them the cover of “If Anyone Builds It”, and said “people right now are trying to build AI that is smarter than humans, and it seems like it’s working. This book is arguing that if they succeed, it would end up killing everyone, for pretty similar reasons to why, the last time something ended up smarter than the rest of the ecosystem (humans), it caused a lot of extinctions – we just didn’t care that much about other animals and steamrolled over things.”
And my aunt and cousin were both just like “oh, huh. Yeah, that makes sense. That, uh, seems really worrying. I am worried now.”
I think leaning on the “humans have caused a lot of extinction, because we are smarter than the rest of the ecosystem and don’t really care about most species” is pretty straightforward with left-leaning types. I haven’t tried it with more right-leaning types.
I think a lot of people can just sorta sense “man, something is going on with AI that is kinda crazy and scary.”
I think it’s only with nerds that it makes sense to get into a lot of the argument depth. I think people have a (correct) immune reaction to things that sound like complicated arguments. But I think the basic argument for AI x-risk is pretty simple, and it’s only when people are sophisticated enough to have complicated objections that it’s particularly useful to get into the deeper arguments.
(Wherein I’d start with “okay, so, yeah, there are a lot of reasonable objections, the core argument is pretty simple, and I think there are pretty good counterarguments to the objections I’ve heard. If you really want to get into it, it’ll get complicated, but I’m down to get into the details if you want to talk through them.”)
Interestingly, yesterday I got into a triggered argument, and was chanting to myself “grant me the courage to walk away from dumb arguments and the strength to dominate people at arguments when I am right and it’s important and the wisdom to know the difference....”
...and then realized that basically the problem was that, with my current context window, it was pretty hard to think about anything other than this argument, but if I just filled up my context window with other stuff, probably I’d just stop caring. Which was a surprisingly practical takeaway from this post.
I think he has some goals/relational-stances-he-wants-to-convey that aren’t covered by those specific suggestions (some kind of explicit distancing/conscientious-objection), but, I do suspect there is something in this direction that would be more accurate and accomplish his goals.
Towards commercially useful interpretability
I’ve lately been frustrated with Suno (AI music) and Midjourney, where I get something that has some nice vibes I want, but, then, it’s wrong in some way.
Generally, the way these have improved has been via getting better prompting, presumably via straightforwardish training.
Recently, I was finding myself wishing I could get Suno to copy a vibe from one song (which had wrong melodies but correct atmosphere) into a cover of another song with the correct melodies. I found myself wishing for some combination of interpretability/activation steering/something.
Like, go to a particular range of a song, and then have some auto-interpretability tools pop out major features like “violin”, “the key”, “vocals style”, etc, and then let me somehow insert that into another song.
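In case it helps make the idea concrete, here’s a minimal sketch of the activation-steering half of what I’m imagining. Everything here is hypothetical – Suno’s models are closed, and `MusicGenModel`, the layer choice, and the helper functions are made up purely for illustration:

```python
# Hypothetical sketch only: there is no public API like this for Suno.
# "model" stands in for some audio generation model whose internal
# activations we can hook into (e.g. via standard PyTorch hooks).
import torch

def extract_vibe_vector(model, reference_clip, layer):
    """Average a chosen layer's activations over a reference clip,
    producing a single 'vibe' vector (atmosphere, instrumentation, etc.)."""
    captured = {}

    def grab(_module, _inputs, output):
        captured["acts"] = output.detach()

    handle = layer.register_forward_hook(grab)
    with torch.no_grad():
        model(reference_clip)
    handle.remove()
    # Assuming activations shaped (batch, time, features): averaging over
    # batch and time leaves one vector of feature activations.
    return captured["acts"].mean(dim=(0, 1))

def generate_with_vibe(model, prompt, layer, vibe_vector, strength=2.0):
    """Generate the target song while nudging that same layer's
    activations toward the reference vibe."""
    def steer(_module, _inputs, output):
        return output + strength * vibe_vector  # broadcasts over time steps

    handle = layer.register_forward_hook(steer)
    try:
        return model.generate(prompt)  # hypothetical generate() call
    finally:
        handle.remove()
```

The interpretability-heavy part is the bit this sketch waves away: automatically labeling which directions in that activation space correspond to human-meaningful features like “violin” or “vocals style”, so you could pick which ones to carry over rather than transplanting the whole vibe wholesale.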
I’m not sure of the tractability of getting this to be useful enough to be worthwhile. But, if you got over some minimum hump of “it’s at least a useful tool to have in combination with normal prompting”, you might be able to get into a flywheel of “now it’s easier to funnel commercial dollars into interpretability research, and there’s a feedback loop of ‘was it actually useful?’”.
And, doing it for visual or musical art would add a few steps between “improved interpretability” and “directly accelerating capabilities.”
This is presumably difficult because:
Interpretability just isn’t quite there yet, and, you need a really good team to get it there and also continue making progress.
Converting it into a product is a lot of work and skill – you need a particularly good UI and product design team.
This came up because I noticed the recent Suno Studio app was actually quite good, UI-wise (they had some nice little innovations over the usual sound-mixing-app suite), and I bet Suno could figure out how to get from Chris Olah-like interfaces to something actually usable.
I don’t think it even has to be that useful in order to become commercially interesting – it could start as a fun toy that’s mostly delightful in how it produces weird shit that doesn’t quite make sense.
(Edit: “Papered over” from my perspective, obviously like “trying to reason carefully about the constants of the situation” from your perspective.)
I think it’s totally fair to characterize it as papering over some stuff. But, the thing I would say in contrast is not exactly “reasoning about the constants”, it’s “noticing the most important parts of the problem, and not losing track of them.”
I think it’s a legit critique of the Yudkowsian paradigm that it doesn’t have that much to say about the nuances of the transition period, or what are some of the different major ways things might play out. But, I think it’s actively a strength of the paradigm to remind you “don’t get too bogged down moving deck chairs around based on the details of how things will play out, keep your eye on the ball on the actual biggest most strategically relevant questions.”
(btw, you mentioned reading some other LW reviews, and I wanted to check if you’ve read my post, which argues some of this at more length)
People do foolishly start wars, and the AI might too, so we might get warning shots. (See my response to 1a3orn about how that doesn’t change the fact that we only get one try on building safe AGI-powerful-enough-to-confidently-outmaneuver-humanity.)
A meta-thing I want to note here:
There are several different arguments here, each about different things. The different things do add up to an overall picture of what seems likely.
I think part of what makes this whole thing hard to think about, is, you really do need to track all the separate arguments and what they imply, and remember that if one argument is overturned, that might change a piece of the picture but not (necessarily) the rest of it.
There might be human-level AI that does normal wars for foolish reasons. And that might get us a warning shot, and that might get us more political will.
But, that’s a different argument than “there is an important difference between an AI smart enough to launch a war, and an AI that is smart enough to confidently outmaneuver all of humanity, and we only get one try to align the second thing.”
If you believe “there’ll probably be warning shots”, that’s an argument against “someone will get to build It”, but not an argument against “if someone built It, everyone would die” (where “It” specifically means “an AI smart enough to confidently outmaneuver all humanity, built by methods similar to today, where they are ‘organically grown’ in hard-to-predict ways”).
And, if we get a warning shot, we do get to learn from that which will inform some more safeguards and alignment strategies. Which might improve our ability to predict how an AI would grow up. But, that still doesn’t change the “at some point, you’re dealing with a qualitatively different thing that will make different choices.”
Yep I’m totally open to “yep, we might get warning shots”, and that there are lots of ways to handle and learn from various levels of early warning shots. It just doesn’t resolve the “but then you do eventually need to contend with an overwhelming superintelligence, and once you’ve hit that point, if it turns out you missed anything, you won’t get a second shot.”
It feels like this is unsatisfying to you but I don’t know why.
Thanks for writing this up! It was nice to get an outside perspective.
“Why no in-between?”
Why should we think that there is no “in between” period where AI is powerful enough that it might be able to kill us and weak enough that we might win the fight?
Part of the point here is, sure, there’d totally be a period where the AI might be able to kill us but we might win. But, in those cases, it’s most likely better for the AI to wait, and it will know that it’s better to wait, until it gets more powerful.
(A counterargument here is “an AI might want to launch a pre-emptive strike before other more powerful AIs show up”, which could happen. But, if we win that war, we’re still left with “the sort of tools that can constrain a near-human superintelligence would not obviously apply to a much smarter AI”, and we still have to solve the same problems.)
Curated. Often, when someone proposes “a typology” for something, it feels a bit, like, okay, you could typologize it that way but does that actually help?
But, I felt like this carving was fairly natural, and seemed to be trying to be exhaustive, and even if it missed some things it seemed like a reasonable framework to fit more possible-causes into.
I felt like I learned things thinking about each plausible way that CoT might evolve. (i.e. thinking about what laws-of-language might affect LLMs naturally improving the efficiency of the CoT for problem solving, and how we might tell the difference between meaningless spandrels and sort-of-meaningful filler words).
I’d like it if you just taboo’d all abstractions and value judgments and just described the physical situations and consequences you expect to see in the world.
This still feels like someone who has some kind of opinion about an abstract level that I still haven’t been persuaded is a useful abstraction. We can tap out of the convo here, but, like, the last few rounds felt like you were repeating the same things without actually engaging with my cruxes.
(like, when you talk about “assuming you have control...”, can you describe that in the sort of naturalist way Logan Strohl would probably describe it, and, like, what predictions you make about what will physically happen to people doing that thing vs other nearby things?)
Okay yeah those are all posts that won Best of LessWrong. We generate like 8 AI descriptions, and then a LW teammate goes through, picks the best starting one, and then fine-tunes it to create the spotlights you see at the top of the page. (Sometimes this involves mostly rewriting it, sometimes we end up mostly sticking with the existing one).
I’m honestly not really happy with describing the author in the third person in the spotlight either. I think we should just try to find a different way of accomplishing the goal there (which I think is to avoid “I” speak, which also feels jarring in the summaries).
yep that’s correct
“The Autopsy of Jane Doe” is decent rationalist horror. It is worth watching without spoilers, but, here is my review of it anyway.
(A reason I am so hardcore about spoilers is that I find subtle little delights in things like “figuring out what kind of movie this even is.” The opening scenes do a good job of mood-setting and giving you a slow drip of bits of what-kind-of-horror-movie this is. Here is your last saving throw for maybe watching the movie.)
...
...
In some sense it’s kinda like a “horror Doctor House episode”.
The core thread (really the only thread) is about an old coroner and his son, who is also a coroner in the old-family-coroner-business.
The police deliver a mysterious corpse of a woman who has no external skin damage and confusing combinations of “symptoms” (I’m not sure what you call it when they’re already dead).
Early on, the son is jumping to conclusions, and the dad is like “boy, we have not even finished looking at the *external* evidence let alone opened her up, chill out on the conclusions until we have all the evidence.”
They start their investigation. It gets weirder.
Horror-y stuff eventually starts happening, and I think they do a decent job of having the characters update towards “something really fucking weird is happening” at a reasonable pace, given reasonable reasoning and priors.
...
Here is your second saving throw for watching the rest of the movie. *Another* reason I like this movie unspoiled is that, in its own fucked up way, it’s doing at least a decent job of cleaving to the Dr. House Mystery Format, where evidence is coming in, and even as a genre-savvy viewer there are degrees of freedom in what sort of thing is going on, and I found it fun to try to figure it out.
(I’m mostly not going to spoil that here because it’s not a very important part of the movie review, in some sense, but, will get into some final details)
...
...
...
Eventually Horror Shit starts to go down, and the characters are freaking out dealing with a bunch of horror stuff.
But, my favorite part of the movie is when it commits to the bit, where the characters say “okay, we still haven’t even finished the autopsy. We still don’t know what’s going on. We autopsied our way into this situation and by damn we are going to autopsy our way out of it.”
It was a nice commitment to both the specific “this is an autopsy horror drama” and the somewhat more general “this is rationalist horror, where the point is figuring out what’s going on, and figuring out how to deal with the fact that what’s going on is *confusing* and doesn’t fit any of your original frames.”
Also, when they started piecing the final bits together, there was an obvious conclusion to reach that I felt annoyed by because it was wrong (based on real-world knowledge), and then the characters were like “no, that conclusion is wrong based on our real-world knowledge, a _different_ thing must be going on instead.”
My main complaint is that, after all his cautioning of the son about jumping to conclusions, the dad… does immediately fixate on the first hypothesis that seems to fit the data at all, and double down on it… but, okay, I will cop to that being a fair take on rationalist horror too. :/
The way I happened to go about building it made it easier to build for posts, but, seems good to have it for both.
I think there’s something cool about having llm-assistance to help keep track of sprawling comment threads and not miss points.
Can you give specific examples?
One thing this might possibly be is that there is a secret field for “custom highlights” for a post that admins can manually create, which basically only I-in-particular have ever used (although I might have set it so that Best of LessWrong posts use the description from their spotlight item?)
Yeah it occurs to me reading this that, while I have used AI to code easy things faster, and sometimes code “hard things” at all (sometimes learning along the way), I haven’t used it to specifically try to “code kinda normally while reducing more tech debt along the way.” Will think on that.