LessWrong team member / moderator. I’ve been a LessWrong organizer since 2011, with roughly equal focus on the cultural, practical and intellectual aspects of the community. My first project was creating the Secular Solstice and helping groups across the world run their own version of it. More recently I’ve been interested in improving my own epistemic standards and helping others to do so as well.
Raemon
main bottleneck to counterfactuality
I don’t think the social thing ranks above “be able to think useful important thoughts at all”. (But maybe otherwise agree with the rest of your model as an important thing to think about)
[edit: hrm, “for smart people with a strong technical background” might be doing most of the work here]
It seems good for me to list my predictions here. I don’t feel very confident. I feel an overall sense of “I don’t really see why major conceptual breakthroughs are necessary.” (I agree we haven’t seen, like, an AI do something like “discover actually significant novel insights.”)
This doesn’t translate into me being confident in very short timelines, because the remaining engineering work (and “non-major” conceptual progress) might take a while, or require a commitment of resources that won’t materialize before a hype bubble pops.
But:
a) I don’t see why novel insights or agency wouldn’t eventually fall out of relatively straightforward pieces of:
“make better training sets” (and training-set generating processes)
“do RL training on a wide variety of tasks”
“find some algorithmic efficiency advances that, sure, require ‘conceptual advances’ from humans, but of a sort of straightforward kind that doesn’t seem like it requires deep genius?”
b) Even if (a) doesn’t work, I think “make AIs that are hyperspecialized at augmenting humans doing AI research” is pretty likely to work, and that + just a lot of money/attention generally going into the space seems to increase the likelihood of it hitting The Crucial AGI Insights (if they exist) in a brute-force-but-clever kinda way.
Assembling the kind of training sets (or building the process that automatically generates such sets) you’d need to do the RL seems annoyingly-hard but not genius-level hard.
I expect there to be a couple innovations that are roughly on the same level as “inventing attention” that improve efficiency a lot, but don’t require a deep understanding of intelligence.
One thing is I’m definitely able to spin up side projects that I just would not have been able to do before, because I can do them with my “tired brain.”
Some of them might turn out to be real projects, although it’s still early stage.
My current guess is:
1. This is more relevant for up to the first couple of generations of “just barely superintelligent” AIs.
2. I don’t really expect it to be the deciding factor after many iterations of end-to-end RSI that get you to the point of being “able to generate novel scientific or engineering insights much faster than a human or institution could.”
I do think it’s plausible that the initial bias towards “evil/hackery AI” could start it off in a bad basin of attraction, but a) even if you completely avoided that, I would still basically expect it to rediscover this on its own as it gained superhuman levels of competence, b) one of the things I most want to use a slightly-superhuman AI to do is to robustly align massively superhuman AI, and I don’t really see how to do that without directly engaging with the knowledge of the failure modes there.
I think there are other plans that route more through “use STEM AI to build an uploader or bioenhancer, and then have an accelerated human-psyche do the technical philosophy necessary to handle the unbounded alignment case.” I could see that being the right call, and I could imagine the bias from the “already knows about deceptive alignment etc” being large-magnitude enough to matter in the initial process. [edit: In those cases I’d probably want to filter out a lot more than just “unfriendly AI strategies”] But, basically, how this applies depends on what it is you’re trying to do with the AI, and what stage/flavor of AI you’re working with and how it’s helping.
Yep, thank you!
It’d be nice to have the key observations/evidence in the tl;dr here. I’m worried about this but would like to stay grounded in how bad it is exactly.
I think I became at least a little wiser reading this sentence. I know you’re mostly focused on other stuff but I think I’d benefit from some words connecting more of the dots.
I think the Gears Which Turn The World sequence, and Specializing in Problems We Don’t Understand, and some other scattered John posts I don’t remember as well, are a decent chunk of an answer.
Curated. I found this a clearer explanation of “how to think about bottlenecks, and things that are not-especially-bottlenecks-but-might-be-helpful” than I previously had.
Previously, I had thought about major bottlenecks, and I had some vague sense of “well, there definitely seems like there should be more ways to be helpful than just tackling central bottlenecks, but a lot of ways to do that misguidedly.” But I didn’t have any particular models for thinking about it, and I don’t think I could have explained it very well.
I think there are better ways of doing forward-chaining and backward-chaining than the ones listed here (ways which roughly correspond to “the one who thought about it a bit,” but with a bit more technique for getting traction).
I do think the question of “to what degree is your field shaped like ‘there’s a central bottleneck that is to a first approximation the only thing that matters here’?” is an important question that hasn’t really been argued for here. (I can’t recall offhand if John has previously written a post exactly doing that in those terms, although the Gears Which Turn the World sequence is at least looking at the same problem space.)
Update: In a slack I’m in, someone said:
A friend of mine who works at US AISI advised:
> “My sense is that relevant people are talking to relevant people (don’t know specifics about who/how/etc.) and it’s better if this is done in a carefully controlled manner.”

And another person said:

> Per the other thread, a bunch of attention on this from EA/xrisk-coded people could easily be counterproductive, by making AISI stick out as a safety thing that should be killed.
And while I don’t exactly wanna trust “the people behind the scenes have it handled”, I do think the failure mode here seems pretty real.
I guess I’m just kinda surprised “perspective” feels metaphorical to you – it seems like that’s exactly what it is.
(I think it’s a bit of a long clunky word so not obviously right here, but, still surprised about your take)
What would be less metaphorical than “perspective” that still captures the ‘one opinionated viewpoint’ thing?
I called some congresspeople but honestly, I think we should have enough people-in-contact with Elon to say “c’mon man, please don’t do that?”. I’d guess that’s more likely to work than most other things?
The Trump Administration is on the verge of firing all ‘probationary’ employees in NIST, as they have done in many other places and departments, seemingly purely because they want to find people they can fire. But if you fire all the new employees and recently promoted employees (which is what ‘probationary’ means here), you end up firing quite a lot of the people who know about AI or give the government state capacity in AI.
This would gut not only America’s AISI, the government’s primary source of a wide variety of forms of state capacity in AI and the only way we can have insight into what is happening or test for safety on matters involving classified information, but also our ability to do a wide variety of other things, such as reinvigorating American semiconductor manufacturing. It would be a massive own goal for the United States, on every level.
Please, it might already be too late, but do whatever you can to stop this from happening. Especially if you are not a typical AI safety advocate, helping raise the salience of this on Twitter could be useful here.
Do you (or anyone) have any gears as to who is the best person to contact here?
I’m slightly worried about making it salient on twitter because I think the pushback from people who do want them all fired might outweigh whatever good it does.
I’ve now worked with 3 Thinking Assistants, and there are a couple more I haven’t gotten to try out yet. So far I’ve been doing it with remote ones, who I share my screen with. If you would like to try them out I can DM you information and my sense of their various strengths.
The baseline benefit is just them asking “hey, are you working on what you mean to work on?” every 5 minutes. I think a thing I should do but haven’t yet is have them be a bit more proactive in asking if I’ve switched tasks (because sometimes it’s hard to tell looking at my screen), and nagging me a bit harder about “is this the right thing?” if I’m either switching a lot, or doing one that seems at-odds with my stated goals for the day.
Sometimes I have them do various tasks that are easy to outsource, depending on their skills and what I need that day.
I have a google doc I have them read in advance that lays out my overall approach. It includes a journal for myself that I’m often taking notes in, and a journal for each assistant I work with, for them to take notes in. I think something-like-this is a good practice.
For reference, here’s my intro:
Intro
There’s a lot of stuff I want done. I’m experimenting with hiring a lot of assistants to help me do it.
My plans are very in-flux, so I prefer not to make major commitments, just hire people piecemeal to either do particular tasks for me, or sit with me and help me think when I’m having trouble focusing.
My working style is “We just dive right into it, usually with a couple hours where I’m testing to see if we work well together.” I explain things as we go. This can be a bit disorienting, but I’ve tried to write important things in this doc, which you can read first. Over time I may give you more open-ended, autonomous tasks, if that ends up making sense.
Default norms
Say “checking in?” and I’ll say “ok” if it’s a good time to check in, or “no” if it isn’t. If I don’t respond at all, wait 30-60 seconds and then ask again more forcefully (but still respect a “no”).
Paste in metastrategies from the metastrategy tab into whatever area I’m currently working in when it seems appropriate.
For Metacognitive Assistants
Metacognitive Assistants sit with me and help me focus. Basic suggested workflow:
By default, just watch me work (coding/planning/writing/operations), and occasionally give signs you’re still attentive, without interrupting.
Make a tab in the Assistant Notes section. Write moment to moment observations which feel useful to you, as well as general thoughts. This helps you feel more proactively involved and makes you focused on noticing patterns and ways in which you could be more useful as an assistant.
The Journal tab is for Ray’s plans and thoughts about what to generally do. Read it as an overview.
This Context tab is for generally useful information about what you should do and about relevant strategies and knowledge Ray has in mind. Reading this helps you get a more comprehensive view on what his ideal workflow looks like, and what your ideal contributions look like.
Updating quickly
There’s a learning process for figuring out “when is it good to check if Ray’s stuck?” vs “when is it bad to interrupt his thought process?”. It’s okay if you don’t get it perfectly right at first, but try “updating a lot, in both directions”: if it seemed like something was an unhelpful interruption, try speaking up half as often, or half as loudly, but then later if I seem stuck, try checking in on me twice as often or twice as loudly, until we settle into a good rhythm.
The “10x” here was meant to refer more to how long it took him to figure it out than to how much better it was. I’m less sure how to quantify how much better.
I’m busy atm but will see if I can get a screenshot from an earlier draft.
Thanks! I’ll keep this in mind both for potential rewrites here, and for future posts.
Curious how long this typically takes you?
@kave @habryka