abramdemski(Abram Demski)

Karma: 16,876

abramdemski 17 Apr 2024 14:53 UTC
2 points
0
in reply to: lukehmiles’s comment on: LLMs for Alignment Research: a safety priority?
I don’t really interact with Twitter these days, but maybe you could translate my complaints there and let me know if you get any solid gold?

abramdemski 17 Apr 2024 14:49 UTC
LW: 4 AF: 3
2
AF
in reply to: plex’s comment on: LLMs for Alignment Research: a safety priority?
I don’t have a good system prompt that I like, although I am trying to work on one. It seems to me like the sort of thing that should be built into a tool like this (perhaps with options, as different system prompts will be useful for different use-cases, like learning vs trying to push the boundaries of knowledge).
I would be pretty excited to try this out with Claude 3 behind it. Very much the sort of thing I was trying to advocate for in the essay!

abramdemski 10 Apr 2024 14:57 UTC
LW: 4 AF: 4
0
AF
in reply to: lukehmiles’s comment on: LLMs for Alignment Research: a safety priority?
But not intentionally. It was an unintentional consequence of training.

abramdemski 10 Apr 2024 14:56 UTC
LW: 4 AF: 3
2
AF
in reply to: lukehmiles’s comment on: LLMs for Alignment Research: a safety priority?
I am not much of a prompt engineer, I think. My “prompts” generally consist of many pages of conversation where I babble about some topic I am interested in, occasionally hitting enter to get Claude’s responses, and then skim/ignore Claude’s responses because they are bad, and then keep babbling. Sometimes I make an explicit request to Claude such as “Please try and organize these ideas into a coherent outline” or “Please try and turn this into math” but the responses are still mostly boring and bad.
I am trying ;p
But yes, it would be good for me to try and make a more concrete “Claude cannot do X” to get feedback on.

abramdemski 10 Apr 2024 13:35 UTC
4 points
2
in reply to: metachirality’s comment on: LLMs for Alignment Research: a safety priority?
I’ve tried writing the beginning of a paper that I want to read the rest of, but the LLM did not complete it well enough to be interesting.

abramdemski 10 Apr 2024 13:34 UTC
LW: 3 AF: 3
0
AF
in reply to: rpglover64’s comment on: LLMs for Alignment Research: a safety priority?
I agree with this worry. I am overall advocating for capabilitarian systems with a specific emphasis on helping accelerate safety research.

abramdemski 5 Apr 2024 17:39 UTC
LW: 2 AF: 2
0
AF
in reply to: Stephen McAleese’s comment on: LLMs for Alignment Research: a safety priority?
Sounds pretty cool! What LLM powers it?

abramdemski 5 Apr 2024 15:26 UTC
LW: 5 AF: 5
0
AF
in reply to: Charlie Steiner’s comment on: LLMs for Alignment Research: a safety priority?
I don’t think the plan is “turn it on and leave the building” either, but I still think the stated goal should not be automation.
I don’t quite agree with the framing “building very generally useful AI, but the good guys will be using it first”—the approach I am advocating is not to push general capabilities forward and then specifically apply those capabilities to safety research. That is more like the automation-centric approach I am arguing against.
Hmm, how do I put this...
I am mainly proposing more focused training of modern LLMs with feedback from safety researchers themselves, toward the goal of safety researchers getting utility out of these systems; this boosts capabilities for helping-with-safety-research specifically, in a targeted way, because that is what you are getting more+better training feedback on. (Furthermore, checking and maintaining this property would be an explicit goal of the project.)
I am secondarily proposing better tools to aid in that feedback process; these can be applied to advance capabilities in any area, I agree, but I think it only somewhat exacerbates the existing “LLM moderation” problem; the general solution of “train LLMs to do good things and not bad things” does not seem to get significantly more problematic in the presence of better training tools (perhaps the general situation even gets better). If the project was successful for safety research, it could also be extended to other fields. The question of how to avoid LLMs being helpful for dangerous research would be similar to the LLM moderation question currently faced by Claude, ChatGPT, Bing, etc: when do you want the system to provide helpful answers, and when do you want it to instead refuse to help?
I am thirdly also mentioning approaches such as training LLMs to interact with proof assistants and intelligently decide when to translate user arguments into formal languages. This does seem like a more concerning general-capability thing, to which the remark “building very generally useful AI, but the good guys will be using it first” applies.

LLMs for Alignment Research: a safety priority?

abramdemski 4 Apr 2024 20:03 UTC

138 points

23 comments, 11 min read

abramdemski 28 Mar 2024 18:42 UTC
4 points
2
in reply to: cubefox’s comment on: Modern Transformers are AGI, and Human-Level
No, I was talking about the results. lsusr seems to use the term in a different sense than Scott Alexander or Yann LeCun. In their sense it’s not an alternative to backpropagation, but a way of constantly predicting future experience and to constantly update a world model depending on how far off those predictions are. Somewhat analogous to conditionalization in Bayesian probability theory.
I haven’t watched the LeCun interview you reference (it is several hours long, so relevant time-stamps to look at would be appreciated), but this still does not make sense to me—backprop already seems like a way to constantly predict future experience and update, particularly as it is employed in LLMs. Generating predictions first and then updating based on error is how backprop works. Some form of closeness measure is required, just like you emphasize.
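To make the "predict first, then update based on error" point concrete, here is a minimal sketch in Python. It is not taken from the interview or from any particular predictive-coding model; the one-layer linear predictor, the squared-error closeness measure, the learning rate, and the toy data are all assumptions chosen for illustration.

```python
def online_predict_and_update(pairs, lr=0.1):
    """Online gradient descent on a one-layer linear model (hypothetical toy setup).

    For each (observation, outcome) pair the model first commits to a
    prediction, then updates its weights in proportion to how far off it was.
    """
    w, b = 0.0, 0.0
    errors = []
    for x, outcome in pairs:
        prediction = w * x + b        # 1. predict the upcoming experience
        err = prediction - outcome    # 2. closeness measure: signed error
        # 3. gradient step on 0.5 * err**2, which is what backprop
        #    reduces to for a single linear layer
        w -= lr * err * x
        b -= lr * err
        errors.append(abs(err))
    return w, b, errors


# Toy "experience": outcomes follow outcome = 2 * x + 1 (an assumption).
observations = [0.0, 0.5, 1.0] * 1000
pairs = [(x, 2.0 * x + 1.0) for x in observations]
w, b, errors = online_predict_and_update(pairs)
```

In this sketch the prediction error shrinks as the stream goes on, which is the sense in which ordinary gradient training already looks like "constantly predicting and updating on how far off the predictions are".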

abramdemski 28 Mar 2024 18:19 UTC
LW: 6 AF: 4
0
AF
in reply to: Logan Zoellner’s comment on: Modern Transformers are AGI, and Human-Level
Yeah, I didn’t do a very good job in this respect. I am not intending to talk about a transformer by itself. I am intending to talk about transformers with the sorts of bells and whistles that they are currently being wrapped with. So not just transformers, but also not some totally speculative wrapper.

abramdemski 28 Mar 2024 17:56 UTC
LW: 2 AF: 2
2
AF
in reply to: Gerald Monroe’s comment on: Modern Transformers are AGI, and Human-Level
And you end up with “well for most of human history, a human with those disabilities would be a net drain on their tribe. Sometimes they were abandoned to die as a consequence.”
And it implies something like “can perform robot manipulation and wash dishes”, or the “make a cup of coffee in a stranger’s house” test. And reliably enough to be paid minimum wage or at least some money under the table to do a task like this.
The replace-human-labor test gets quite interesting and complex when we start to time-index it. Specifically, two time-indexes are needed: a ‘baseline’ time (when humans are doing all the relevant work) and a comparison time (where we check how much of the baseline economy has been automated).
Without looking anything up, I guess we could say that machines have already automated 90% of the economy, if we choose our baseline from somewhere before industrial farming equipment, and our comparison time somewhere after. But this is obviously not AGI.
A human who can do exactly what GPT4 can do is not economically viable in 2024, but might have been economically viable in 2020.
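A rough sketch of the two-time-index version of the test, in Python. The function name, the task categories, and every number below are made up for illustration; the only thing carried over from the argument is that the automated share is measured against the baseline economy, so work that did not exist at the baseline time is ignored.

```python
def automated_share(baseline_hours, comparison_hours):
    """Fraction of baseline-time human labor that humans no longer do.

    baseline_hours:   task -> human hours at the baseline time
    comparison_hours: task -> human hours at the comparison time
    Tasks that only exist at the comparison time are ignored, because the
    test asks how much of the *baseline* economy has been automated.
    """
    total = sum(baseline_hours.values())
    still_human = sum(
        min(comparison_hours.get(task, 0.0), hours)
        for task, hours in baseline_hours.items()
    )
    return 1.0 - still_human / total


# Hypothetical per-capita numbers: a baseline before industrial farming
# equipment, and a comparison time after it.
baseline = {"farming": 90.0, "crafts": 10.0}
comparison = {"farming": 5.0, "crafts": 5.0, "software": 40.0}
share = automated_share(baseline, comparison)
```

With these invented numbers the share comes out at 0.9, even though people at the comparison time are still fully employed in work the baseline economy never contained. Moving either time index changes the answer, which is why both need to be stated.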

abramdemski 28 Mar 2024 17:30 UTC
3 points
0
in reply to: No77e’s comment on: Modern Transformers are AGI, and Human-Level
I don’t think it is sensible to model humans as “just the equivalent of a sort of huge content window” because this is not a particularly good computational model of how human learning and memory work; but I do think that the technology behind the increasing context size of modern AIs contributes to them having a small but nonzero amount of the thing Steven is pointing at, due to the spontaneous emergence of learning algorithms.

abramdemski 27 Mar 2024 2:11 UTC
LW: 16 AF: 7
4
AF
in reply to: Steven Byrnes’s comment on: Modern Transformers are AGI, and Human-Level
Yep, I agree that Transformative AI is about impact on the world rather than capabilities of the system. I think that is the right thing to talk about for things like “AI timelines” if the discussion is mainly about the future of humanity. But, yeah, definitely not always what you want to talk about.
I am having difficulty coming up with a term which points at what you want to point at, so yeah, I see the problem.

abramdemski 26 Mar 2024 22:59 UTC
4 points
2
in reply to: cubefox’s comment on: Modern Transformers are AGI, and Human-Level
I’m not sure how you intend your predictive-coding point to be understood, but from my perspective, it seems like a complaint about the underlying tech rather than the results, which seems out of place. If backprop can do the job, then who cares? I would be interested to know if you can name something which predictive coding has currently accomplished, and which you believe to be fundamentally unobtainable for backprop. lsusr thinks the two have been unified into one theory.
I don’t buy that animals somehow plug into “base reality” by predicting sensory experiences, while transformers somehow miss out on it by predicting text and images and video. Reality has lots of parts. Animals and transformers both plug into some limited subset of it.
I would guess raw transformers could handle some real-time robotics tasks if scaled up sufficiently, but I do agree that raw transformers would be missing something important architecture-wise. However, I also think it is plausible that only a little bit more architecture is needed (and, that the ‘little bit more’ corresponds to things people have already been thinking about) -- things such as the features added in the generative agents paper. (I realize, of course, that this paper is far from realtime robotics.)
Anyway, high uncertainty on all of this.

abramdemski 26 Mar 2024 22:00 UTC
LW: 6 AF: 5
9
AF
in reply to: Hjalmar_Wijk’s comment on: Modern Transformers are AGI, and Human-Level
With respect to METR, yeah, this feels like it falls under my argument against comparing performance against human experts when assessing whether AI is “human-level”. This is not to deny the claim that these tasks may shine a light on fundamentally missing capabilities; as I said, I am not claiming that modern AI is within human range on all human capabilities, only enough that I think “human level” is a sensible label to apply.
However, the point about autonomously making money feels more hard-hitting, and has been repeated by a few other commenters. I can at least concede that this is a very sensible definition of AGI, which pretty clearly has not yet been satisfied. Possibly I should reconsider my position further.
The point about forming societies seems less clear. Productive labor in the current economy is in some ways much more complex and harder to navigate than it would be in a new society built from scratch. The Generative Agents paper gives some evidence in favor of LLM-based agents coordinating social events.

abramdemski 26 Mar 2024 21:31 UTC
5 points
0
in reply to: ryan_greenblatt’s comment on: Modern Transformers are AGI, and Human-Level
Yeah, I think nixing the terms ‘AGI’ and ‘human-level’ is a very reasonable response to my argument. I don’t claim that “we are at human-level AGI now, everyone!” has important policy implications (I am not sure one way or the other, but it is certainly not my point).

abramdemski 26 Mar 2024 21:22 UTC
6 points
2
in reply to: Max H’s comment on: Modern Transformers are AGI, and Human-Level
And maybe I am misremembering history or confused about what you are referring to, but in my mind, the promise of the “AGI community” has always been (implicitly or explicitly) that if you call something “human-level AGI”, it should be able to get you to (a), or at least have a bigger economic and societal impact than currently-deployed AI systems have actually had so far.
Yeah, I don’t disagree with this—there’s a question here about which stories about AGI should be thought of as defining vs extrapolating consequences of that definition based on a broader set of assumptions. The situation we’re in right now, as I see it, is one where some of the broader assumptions turn out to be false, so definitions which seemed relatively clear become more ambiguous.
I’m privileging notions about the capabilities over notions about societal consequences, partly because I see “AGI” as more of a technology-oriented term and less of a social-consequences-oriented term. So while I would agree that talk about AGI from within the AGI community historically often went along with utopian visions, I pretty strongly think of this as speculation about impact, rather than definitional.

abramdemski 26 Mar 2024 21:02 UTC
4 points
0
in reply to: ryan_greenblatt’s comment on: Modern Transformers are AGI, and Human-Level
I think Steven’s response hits the mark, but from my own perspective, I would say that a not-totally-irrelevant way to measure something related would be: many-shot learning, particularly in cases where few-shot learning does not do the trick.

abramdemski 26 Mar 2024 20:24 UTC
LW: 27 AF: 12
10
AF
in reply to: Steven Byrnes’s comment on: Modern Transformers are AGI, and Human-Level
Thanks for your perspective! I think explicitly moving the goal-posts is a reasonable thing to do here, although I would prefer to do this in a way that doesn’t harm the meaning of existing terms.
I mean: I think a lot of people did have some kind of internal “human-level AGI” goalpost which they imagined in a specific way, and modern AI development has resulted in a thing which fits part of that image while not fitting other parts, and it makes a lot of sense to reassess things. Goalpost-moving is usually maligned as an error, but sometimes it actually makes sense.
I prefer ‘transformative AI’ for the scary thing that isn’t here yet. I see where you’re coming from with respect to not wanting to have to explain a new term, but I think ‘AGI’ is probably still more obscure for a general audience than you think it is (see, eg, the snarky complaint here). Of course it depends on your target audience. But ‘transformative AI’ seems relatively self-explanatory as these things go. I see that you have even used that term at times.
I disagree with that—as in “why I want to move the goalposts on ‘AGI’”, I think there’s an especially important category of capability that entails spending a whole lot of time working with a system / idea / domain, and getting to know it and understand it and manipulate it better and better over the course of time. Mathematicians do this with abstruse mathematical objects, but also trainee accountants do this with spreadsheets, and trainee car mechanics do this with car engines and pliers, and kids do this with toys, and gymnasts do this with their own bodies, etc. I propose that LLMs cannot do things in this category at human level, as of today—e.g. AutoGPT basically doesn’t work, last I heard. And this category of capability isn’t just a random cherrypicked task, but rather central to human capabilities, I claim. (See Section 3.1 here.)
I do think this is gesturing at something important. This feels very similar to the sort of pushback I’ve gotten from other people. Something like: “the fact that AIs can perform well on most easily-measured tasks doesn’t tell us that AIs are on the same level as humans; it tells us that easily-measured tasks are less informative about intelligence than we thought”.
Currently I think LLMs have a small amount of this thing, rather than zero. But my picture of it remains fuzzy.
