The matter seems terribly complex and interesting to me.
Notions of Accuracy?
Suppose p1 is a prior which has uncertainty about ϕ(x1) and uncertainty about ϕ(x2). This is the more ignorant prior. Consider some prior p2 which has the same beliefs about the universal statement -- ∀x ϕ(x) -- but which knows ϕ(x1) and ϕ(x2).
We observe that p1 can increase its credence in the universal statement by observing the first two instances, ϕ(x1) and ϕ(x2), while p2 cannot do this -- it needs to wait for further evidence. This is interpreted as a defect.
The moral is apparently that a less ignorant prior can be worse than a more ignorant one; more specifically, it can learn more slowly.
However, I think we need to be careful about the informal notion of “more ignorant” at play here. We can formalize this by imagining a numerical measure of the accuracy of a prior. We might want it to be the case that more accurate priors are always better to start with. Put more precisely: a more accurate prior should also imply a more accurate posterior after updating. Paul’s example challenges this notion, but he does not prove that no plausible notion of accuracy will have this property; he only relies on an informal notion of ignorance.
So I think the question is open: when can a notion of accuracy fail to follow the rule “more accurate priors yield more accurate posteriors”? EG, can a proper scoring rule fail to meet this criterion? This question might be pretty easy to investigate.
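One way to start investigating: a toy model in which accuracy is measured by the Brier score, which is a proper scoring rule. The model and all the specific numbers below are my own illustrative assumptions, not part of the original example -- but the reversal does show up, so at least one proper scoring rule can fail the criterion in at least one toy setting:

```python
# Toy check of "does a more accurate prior always yield a more accurate
# posterior?" using the Brier score. The setup mimics the structure of the
# example: a latent universal "law" plus two observed instances phi1, phi2.
# All numbers are assumptions chosen for illustration.

def brier(probs, truths):
    """Mean squared distance between stated probabilities and the truth."""
    return sum((p - t) ** 2 for p, t in zip(probs, truths)) / len(probs)

# True world: the law holds, so both instances are true.
truth = [1, 1, 1]  # (law, phi1, phi2)

# p1 (more ignorant): P(law) = 0.3; given no law, each instance is a fair
# coin flip, so P(phi_i) = 0.3 + 0.7 * 0.5 = 0.65.
p1_prior = [0.3, 0.65, 0.65]
# After conditioning on phi1 and phi2: P(law) = 0.3 / (0.3 + 0.7 * 0.25).
p1_post = [0.3 / 0.475, 1.0, 1.0]

# p2 (less ignorant): already knows phi1 and phi2, with the same P(law) = 0.3
# -- which forces its conditional P(law | phi1, phi2) to be 0.3, lower than
# p1's conditional.
p2_prior = [0.3, 1.0, 1.0]
p2_post = [0.3, 1.0, 1.0]  # updating on already-known facts changes nothing

print(brier(p1_prior, truth), brier(p2_prior, truth))  # p2's prior is better
print(brier(p1_post, truth), brier(p2_post, truth))    # p1's posterior is better
```

So in this toy model, p2's prior is strictly more accurate, yet after both priors update on the same evidence, p1's posterior is strictly more accurate. This doesn't settle the general question, but it shows the phenomenon is compatible with at least one proper scoring rule.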
Conditional probabilities also change?
I think the example rests on an intuitive notion that we can construct p2 by imagining p1 but modifying it to know ϕ(x1) and ϕ(x2). However, the most obvious way to make that modification is to update on those sentences -- and this fails to meet the conditions of the example, since p2 would then already have an increased probability for the universal statement.
So, in order to move the probability of ϕ(x1) and ϕ(x2) up to 1 without also increasing the probability of the universal, we must do some damage to the probabilistic relationship between the instances and the universal. The prior p2 doesn't just know ϕ(x1) and ϕ(x2); it also believes the conditional probability of the universal statement given those two sentences to be lower than p1 believes it to be.
It doesn’t think it should learn from them!
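To see why naive updating can't produce p2, here is a sketch of a prior with p1's structure as an explicit joint distribution. The model (a "law" with prior 0.3 forcing every instance true; otherwise instances are independent fair coins) is my own illustrative assumption:

```python
from itertools import product

# A toy version of the more-ignorant prior p1 as a joint distribution over
# (law, phi1, phi2): with probability P_LAW a "law" holds and forces both
# instances true; otherwise each instance is an independent coin flip.
# (The numbers 0.3 and 0.5 are illustrative assumptions, not from the post.)
P_LAW = 0.3

def p1(law, phi1, phi2):
    if law:
        return P_LAW if (phi1 and phi2) else 0.0
    return (1 - P_LAW) * 0.5 * 0.5

worlds = list(product([False, True], repeat=3))

def prob(pred):
    return sum(p1(*w) for w in worlds if pred(*w))

prior_universal = prob(lambda law, a, b: law)
posterior_universal = (prob(lambda law, a, b: law and a and b)
                       / prob(lambda law, a, b: a and b))

print(prior_universal)      # 0.3
print(posterior_universal)  # 0.3 / (0.3 + 0.7 * 0.25) ≈ 0.632
```

Conditioning on the two instances pushes the universal from 0.3 up to about 0.63. So any prior that assigns probability 1 to the instances while keeping the universal at 0.3 must have a *lower* conditional probability for the universal given the instances than p1 does -- it has been modified somewhere other than by updating.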
This supports Alexander’s argument that there is no paradox, I think. However, I am not ultimately convinced. Perhaps I will find more time to write about the matter later.
Explanations?
Alexander analyzes the difference between p1 and p2 in terms of the famous “explaining away” effect. He supposes that p2 has learned some “causes”: a C1 explaining ϕ(x1) and a C2 explaining ϕ(x2).
Postulating these causes adds something to the scenario. One possible view is that Alexander is correct as far as his argument goes, but incorrect if there are no such Cj to consider.
However, I do not find myself endorsing Alexander’s argument even that far.
If C1 and C2 have a common form, or are correlated in some way -- so there is an explanation which tells us why the first two sentences, ϕ(x1) and ϕ(x2), are true, and which does not apply to ϕ(xn) for n > 2 -- then I agree with Alexander’s argument.
If C1 and C2 are uncorrelated, then it starts to look like a coincidence. If I find a similarly uncorrelated C3 for ϕ(x3), C4 for ϕ(x4), and a few more, then it will feel positively unexplained. Although each explanation is individually satisfying, nowhere do I have an explanation of why all of them are turning up true.
I think the probability of the universal sentence should go up at this point.
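One toy way to cash out this intuition: suppose there might be a hidden common factor G behind the individual causes C1, C2, ... (everything below, including the numbers, is my own illustrative assumption). If G is real, causes keep turning up; if not, each cause found is a mild coincidence. Then accumulating individually-satisfying but a-priori-uncorrelated causes raises P(G), and with it the probability that future instances -- hence the universal -- hold:

```python
# Hidden-common-factor sketch: each cause C_j is more likely if some hidden
# factor G holds. Finding cause after cause is evidence for G, which in turn
# predicts that the pattern continues. All numbers are illustrative.

P_G = 0.1          # prior on the hidden common factor
P_C_GIVEN_G = 0.9  # chance each cause C_j turns up if G holds
P_C_NO_G = 0.3     # chance it turns up as a mere coincidence otherwise

def p_g_given_k_causes(k):
    """P(G | the first k causes were all found to hold)."""
    num = P_G * P_C_GIVEN_G ** k
    return num / (num + (1 - P_G) * P_C_NO_G ** k)

def p_next_cause(k):
    """P(the next cause holds | k causes found so far)."""
    pg = p_g_given_k_causes(k)
    return pg * P_C_GIVEN_G + (1 - pg) * P_C_NO_G

for k in range(5):
    print(k, round(p_g_given_k_causes(k), 3), round(p_next_cause(k), 3))
```

Both quantities rise monotonically with k: each new cause is locally explained, yet the run of successes is itself evidence for something law-like. This is one way to model why the credence in the universal should go up even when every instance has its own explanation.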
So, what about my “conditional probabilities also change” variant of Alexander’s argument? We might intuitively think that ϕ(x1) and ϕ(x2) should be evidence for the universal generalization, but p2 does not believe this -- its conditional probabilities indicate otherwise.
I find this ultimately unconvincing because the point of Paul’s example, in my view, is that more accurate priors do not imply more accurate posteriors. I still want to understand what conditions can lead to this (including whether it holds for all notions of “accuracy” satisfying some reasonable assumptions, EG proper scoring rules).
Another reason I find it unconvincing is that even if we accepted this answer to the paradox of ignorance, I think it is not at all convincing for the problem of old evidence.
What is the ‘problem’ in the problem of old evidence?
… to be further expanded later …