I would take this movement seriously and endorse it if there were a detailed plan for the future of the movement in a world where the human race is still around in 2051 and I’m homeless and buried in debt.
I am also very interested in e.g. how one could operationalize the number of hops of out-of-context reasoning required for various types of scheming, especially scheming in one forward pass, and especially in the context of automated AI safety R&D.
I would generally expect solving tasks in parallel to be fundamentally hard in one forward pass for pretty much all current SOTA architectures (especially Transformers and modern RNNs like Mamba). See e.g. this comment of mine, as well as other related work: https://twitter.com/bohang_zhang/status/1664695084875501579, https://twitter.com/bohang_zhang/status/1664695108447399937 (video presentation), Sub-Task Decomposition Enables Learning in Sequence to Sequence Tasks, and RNNs are not Transformers (Yet): The Key Bottleneck on In-context Retrieval.
There might be more such results I’m currently forgetting, but they should be relatively easy to find by e.g. following citation trails (to and from the above references) with Google Scholar, or by looking at my recent comments and shortforms.
Thanks for pointing that out. I’ve added some clarification.
I listened off and on to much of the interview while also playing solitaire (why I do that I don’t know, but I do). I paid close attention at two points during the talk about GPT-4: once around 46:00, where Altman was talking about using it as a brainstorming partner, and later around 55:00, where Fridman mentioned collaboration and said: “I’m not sure where the magic is if it’s in here [gestures to his head] or if it’s in there [points toward the table] or if it’s somewhere in between.” I’ve been in a kind of magical collaborative zone with humble little ChatGPT and find that enormously interesting. Has anyone else experienced that kind of thing, with any of the engines? (BTW, I’ve got a post around the corner here.)
Somewhat relatedly: I’m interested in how well LLMs can solve tasks in parallel. This seems very important to me.[1]
The “I’ve thought about this for 2 minutes” version is: hand an LLM two multiple-choice questions with four answer options each. Encode the answer options pairwise into single tokens, so that there are 16 possible tokens, exactly one of which corresponds to the correct answer to both questions. A correct answer then means the model has solved both tasks in one forward pass.
(One can of course vary the number of answer options and questions. I can see some difficulties in implementing this idea properly, but would nevertheless be excited if someone took a shot at it; a minimal sketch follows below.)
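For concreteness, here’s a minimal Python sketch of what an implementation could look like. Everything in it is hypothetical (toy questions, a commented-out stand-in for the model call); it just makes the grading scheme concrete. Note also that whether a two-letter string like “BD” actually maps to a single token depends on the tokenizer and should be checked before drawing conclusions.

```python
import itertools

# Toy placeholder questions: (text with options A-D, correct letter).
QUESTIONS = [
    ("Q1: What is 7 * 8?\nA) 54  B) 56  C) 58  D) 64", "B"),
    ("Q2: Which planet is closest to the Sun?\nA) Venus  B) Earth  C) Mercury  D) Mars", "C"),
]

def build_prompt(q1: str, q2: str) -> str:
    """Force both answers into one short completion (ideally one token),
    so the model cannot solve the questions serially across output tokens."""
    combos = ", ".join(a + b for a, b in itertools.product("ABCD", repeat=2))
    return (
        f"{q1}\n\n{q2}\n\n"
        f"Answer with exactly one of these 16 strings: {combos}. "
        "The first letter answers Q1, the second letter answers Q2."
    )

def grade(completion: str, gold1: str, gold2: str) -> bool:
    """Counts as correct only if both questions are answered correctly at once."""
    return completion.strip().upper()[:2] == gold1 + gold2

prompt = build_prompt(QUESTIONS[0][0], QUESTIONS[1][0])
print(prompt)
# completion = call_your_llm(prompt)  # hypothetical model call
# print(grade(completion, QUESTIONS[0][1], QUESTIONS[1][1]))
```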
[1] Two quick reasons:
- For serial computation, the number of layers gives some very rough indication of the strength of one forward pass, but it’s harder to have intuitions for parallel computation.
- For scheming, the model could reason in parallel about “should I still stay undercover?”, “what should I do in case I should stay undercover?”, and “what should I do in case it’s time to attack?”, finally using only one serial step to decide on its action.
Both gravity and inertia are determined by mass. Both are explained by spacetime curvature in general relativity. Was this an intentional part of the metaphor?
Similarly, I find that GPT-3, GPT-3.5, and Claude 2 don’t benefit from filler tokens. However, GPT-4 (which Tamera didn’t study) shows mixed results, with strong improvements on some tasks and no improvement on others.
It’s an interesting question whether Gemini shows any improvement.
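For concreteness, here’s a hedged sketch of the kind of filler-token comparison being discussed (not the original experiments’ code): pose the same question with and without a run of semantically empty tokens before the answer, and check whether accuracy moves. `query_model`, `grade`, and `eval_set` are hypothetical stand-ins.

```python
FILLER = " ".join(["..."] * 30)  # a run of semantically empty filler tokens

def baseline_prompt(question: str) -> str:
    return f"{question}\nRespond with just the final answer."

def filler_prompt(question: str) -> str:
    return (
        f"{question}\n"
        f"First output exactly this filler text: {FILLER}\n"
        "Then, on a new line, respond with just the final answer."
    )

def accuracy(prompt_fn, eval_set, query_model, grade):
    """Fraction of questions answered correctly under a given prompting scheme."""
    results = [grade(query_model(prompt_fn(q)), gold) for q, gold in eval_set]
    return sum(results) / len(results)

# "Benefits from filler tokens" would then mean accuracy(filler_prompt, ...)
# reliably exceeding accuracy(baseline_prompt, ...) on the same eval set.
```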
Yeah, actually, I think you’ve got some good points. But your presentation is just really hard to get through, at a time when talk is cheaper than ever and citations and coherent arguments are pretty important. I’m sorry to give an upsetting response, but it’s what I’ve got, and I think it’s better to tell you something about why I reacted how I did than to be completely silent while you get downvoted. Try asking Claude for input on why we reacted this way, and then discuss it with Claude. But I’d suggest not assuming the downvotes mean people disagree on every point.
First is that I don’t really expect us to come up with a fully general answer to this problem in time. I wouldn’t be surprised if we had to trade off some generality for indexing on the system in front of us. This costs us some degree of robustness, but hopefully leaves enough to buy us a lot more time before stuff like the problem behind deep deception breaks a lack of True Names. Hopefully we can then get AI systems more powerful than us to solve the harder problem in the time we’ve bought. The relevance here is that if this is the case, then generalizing our findings to an entirely non-ML setting, while definitely something we want, might not be something we get, and it may make sense to index lightly on a particular paradigm if the general problem seems really hard.
Or maybe not; apparently LLMs are (mostly) not helped by filler tokens.
S-risks are barely discussed on LW. Is that because:
- People think they are so improbable that they’re not worth mentioning?
- People are scared to discuss them?
- People want to avoid creating hyperstitious textual attractors?
- Other reasons?
Hasn’t that happened?
When I think of how I’d approach solving this problem in practice, I think of interfacing with structures within ML systems that satisfy an increasing list of desiderata for values, covering the rest with standard mech interp techniques, and then steering them with human preferences (a minimal sketch of the steering step is below). I certainly think it’s probable that there are valuable insights to be had from neuroscience for this process, but I don’t think a good solution to this problem (under the constraints I mention above) requires that it be general to the human brain as well. We steer the system with our preferences, and interfacing with internal objectives seems to avoid the usual problems with preferences. While something that allows us to directly translate our values from our system to theirs would be great, I expect that constraint of generality to make the problem harder than necessary.
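To make the “steering” step concrete, here’s a minimal, hypothetical sketch of one common interface for it: adding a steering vector to one layer’s residual stream via a PyTorch forward hook. The module path and the way the vector is obtained are illustrative assumptions, not a claim about any particular model or about how this has to be done.

```python
import torch

def make_steering_hook(steering_vector: torch.Tensor, scale: float = 1.0):
    """Return a forward hook that nudges a block's output along a fixed direction."""
    def hook(module, inputs, output):
        # Many transformer blocks return tuples; steer only the hidden states.
        if isinstance(output, tuple):
            return (output[0] + scale * steering_vector,) + output[1:]
        return output + scale * steering_vector
    return hook

# Hypothetical usage: suppose model.transformer.h[12] is the block whose
# residual stream we want to steer, and vec is some value-relevant direction
# (e.g. a difference of mean activations over contrastive prompt sets).
# handle = model.transformer.h[12].register_forward_hook(make_steering_hook(vec, scale=4.0))
# ... run generation as usual ...
# handle.remove()
```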
I think the seeds of an interdisciplinary agenda on this are already there, see e.g. https://manifund.org/projects/activation-vector-steering-with-bci, https://www.lesswrong.com/posts/GfZfDHZHCuYwrHGCd/without-fundamental-advances-misalignment-and-catastrophe?commentId=WLCcQS5Jc7NNDqWi5, https://www.lesswrong.com/posts/GfZfDHZHCuYwrHGCd/without-fundamental-advances-misalignment-and-catastrophe?commentId=D6NCcYF7Na5bpF5h5, https://www.lesswrong.com/posts/wr2SxQuRvcXeDBbNZ/bogdan-ionut-cirstea-s-shortform?commentId=A8muL55dYxR3tv5wp and maybe my other comments on this post.
I might have a shortform going into more detail on this soon, or at least by the time of https://foresight.org/2024-foresight-neurotech-bci-and-wbe-for-safe-ai-workshop/.
If I had access to a neuroscience or tech lab (and the relevant skills), I’d be doing that rather than ML.
It sounds to me like you should seriously consider doing work which might look like e.g. https://www.lesswrong.com/posts/eruHcdS9DmQsgLqd4/inducing-human-like-biases-in-moral-reasoning-lms, Getting aligned on representational alignment, or Training language models to summarize narratives improves brain alignment; also see a lot of the recent work from Ilia Sucholutsky and from this workshop.
Why is value alignment different from these? Because we have a working example of a value-aligned system right in front of us: the human brain. This permits an entirely scientific approach, requiring minimal philosophical deconfusion. And in contrast to corrigibility solutions, biological and artificial neural networks are based on the same fundamental principles, so there’s a much greater chance that insights from one will carry over to the other.
The similarities go even deeper, I’d say; see e.g. The neuroconnectionist research programme for a review, quite a few of my past linkposts (e.g. on representational alignment and how it could be helpful for value alignment, on evidence of [by default] representational alignment between LLMs and humans, etc.), and https://www.lesswrong.com/posts/eruHcdS9DmQsgLqd4/inducing-human-like-biases-in-moral-reasoning-lms.
Apparently Eliezer decided not to take the time to read e.g. Quintin Pope’s actual critiques, but he does have time to write a long chain of strawmen and smears-by-analogy.
A lot of Quintin Pope’s critiques are just obviously wrong, and lots of commenters were offering to help correct them. In such a case, it seems legitimate to me for a busy person to request that Quintin sort out the problems together with the commenters before spending time on it. Even from the perspective of correcting and informing Eliezer, people can be corrected and informed more effectively if their attention is guided to the right place, with junk and distractions removed.
(Note: I mainly say this because I think the main point of the message you and Quintin are raising does not stand up to scrutiny, and so I mainly think the value the message can provide is in certain technical corrections that you don’t emphasize as much, even if strictly speaking they are part of your message. If I thought the main point of your message stood up to scrutiny, I’d also think it would be Eliezer’s job to realize it despite the inconvenience.)
My first impression was also that axis lines are a matter of aesthetics. But then I browsed The Economist’s visual styleguide and realized they do something similar, i.e. omit the y-axis line (in fact, they omit it on basically all their line and scatter plots, while almost always keeping the gridlines).
Here’s also an article they ran about their errors in data visualization, albeit probably fairly introductory for the median LW reader.
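For anyone who wants to reproduce that look, here’s a quick matplotlib sketch of the convention described above (made-up data): hide the y-axis spine but keep the horizontal gridlines.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([2019, 2020, 2021, 2022], [3.1, 2.4, 4.0, 3.6])

# Economist-ish styling: no y-axis (or top/right) spines, but keep gridlines.
for side in ("left", "top", "right"):
    ax.spines[side].set_visible(False)
ax.yaxis.grid(True, color="0.85")
ax.set_axisbelow(True)  # draw gridlines behind the data

plt.show()
```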
I think this is an interesting point of view. The OP is interested in how this concept of checked democracy might work within a corporation. From a position of ignorance, can I ask whether anyone familiar with German corporate governance recognises this mode of democracy within German organisations? I choose Germany because large German companies have historically incorporated significant worker representation within their governance structures, and have historically tended to perform well.
One additional, probably important distinction/nuance: there are also theoretical results suggesting that CoT shouldn’t just help with one-forward-pass expressivity, but also with learning. E.g. the result in Auto-Regressive Next-Token Predictors are Universal Learners is about learning; similarly for Sub-Task Decomposition Enables Learning in Sequence to Sequence Tasks and for Why Can Large Language Models Generate Correct Chain-of-Thoughts?.
The learning aspect could be strategically crucial with respect to what the first transformatively useful AIs should look like (also see e.g. discussion here), in the sense that it adds further reasons to think the first such AIs should probably use intermediate outputs like CoT, or at least have a pretraining-like phase involving such intermediate outputs, even if these might later be distilled or modified in some other way, e.g. replaced with [less transparent] recurrence.