Aprillion (Peter Hozák)
transformer is only trained explicitly on next token prediction!
I find myself understanding language/multimodal transformer capabilities better when I think about the whole document (up to the context length) as a mini-batch for calculating the gradient in transformer (pre-)training, so I imagine the model as minimizing document-global prediction error rather than being trained to optimize the accuracy of just a single next token...
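Something like this toy sketch is what I have in mind (my own illustration only; the document_loss name, the shapes, and the numbers are made-up assumptions, not anyone’s actual training code): the per-token terms are still “local”, but the gradient is taken of their average over the whole document.

```python
import numpy as np

def document_loss(logits: np.ndarray, token_ids: np.ndarray) -> float:
    """Average next-token cross-entropy over every position in one document.

    logits: (seq_len, vocab_size) scores predicting token t+1 at position t
    token_ids: (seq_len + 1,) the document itself
    """
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    targets = token_ids[1:]  # the "next token" at each position
    per_token_nll = -log_probs[np.arange(len(targets)), targets]
    return per_token_nll.mean()  # one scalar for the whole document

# toy usage: a 6-token "document" with a vocabulary of 10
rng = np.random.default_rng(0)
doc = rng.integers(0, 10, size=6)
logits = rng.normal(size=(5, 10))
print(document_loss(logits, doc))
```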
Can you help me understand a minor labeling convention that puzzles me? I can see how we can label from the Z1R process as in MSP because we observe 11 to get there, but why is labeled as after observing either 100 or 00, please?
Pushing writing ideas to external memory for my less burned-out future self:
-
agent foundations need a path-dependent notion of rationality
the economic world of average expected values / amortized big-O applies if f(x) can go negative or you start very high,
vs min-maxing / worst-case / risk-averse scenarios if there is a bottom (death)
-
alignment is a capability
they might sound different in the limit, but the difference disappears in practice (even close to the limit? 🤔)
-
in a universe with infinite Everett branches, I was born in the subset that wasn’t destroyed by nuclear winter during the cold war—no matter how unlikely it was that humanity didn’t destroy itself (they could have done that in most worlds and I wasn’t born in such a world, I live in the one where Petrov heard the Geiger counter beep in some particular pattern that made him more suspicious or something… something something anthropic principle)
similarly, people alive in 100 years will find themselves in a world where AGI didn’t destroy the world, no matter what the odds are—as long as there is at least 1 world with non-zero probability (something something Born rule … only if any decision along the way is a wave function, not if all decisions are classical and the uncertainty comes from subjective ignorance)
if you took quantum risks in the past, you now live only in the branches where you are still alive and didn’t die (but you could be in pain or whatever)
if you personally take a quantum risk now, your future self will find itself only in a subset of the futures, but your loved ones will experience all your possible futures, including the branches where you die … and you will experience everything until you actually die (something something s-risk vs x-risk)
if humanity finds itself in unlikely branches where we didn’t kill our collective selves in the past, does that bring any hope for the future?
-
Aprillion (Peter Hozák)’s Shortform
Now, suppose Carol knows the plan and is watching all this unfold. She wants to make predictions about Bob’s picture, and doesn’t want to remember irrelevant details about Alice’s picture. Then it seems intuitively “natural” for Carol to just remember where all the green lines are (i.e. the message M), since that’s “all and only” the information relevant to Bob’s picture.
(Writing before I read the rest of the article): I believe Carol would “naturally” expect that Alice and Bob share more mutual information than she does with Bob herself (even if they weren’t “old friends”, they both “decided to undertake an art project” while she “wanted to make predictions”), thus she would weigh the costs of remembering more than just the green lines against the expected prediction improvement given her time constraints, lost opportunities, … - I imagine she could complete the purple lines on her own, and then remember some “diff” about the most surprising differences...
Also, not all of the green lines would be equally important, so a “natural latent” would be some short messages in “tokens of remembering”, not necessarily corresponding to the mathematical abstraction encoded by the 2 tokens of English “green lines” ⇒ Carol doesn’t need to be able to draw the green lines from her memory if that memory was optimized to predict purple lines.
If the purpose was to draw the green lines, I would be happy to call that memory “green lines” (and in that, I would assume a shared prior between me and the reader that I would describe as: "to remember green lines" usually means "to remember steps how to draw similar lines on another paper" ... also, similarity could be judged by other humans ... also, not to be confused with a very different concept "to remember an array of pixel coordinates" that can also be compressed into the words "green lines", but I don't expect people will be confused about the context, so I don't have to say it now, just keep in mind if someone squints their eyes just so, which would provoke me to clarify
).
yeah, I got a similar impression that this line of reasoning doesn’t add up...
we interpret other humans as feeling something when we see their reactions
we interpret other eukaryotes as feeling something when we see their reactions 🤷
(there are a couple of circuit diagrams of the whole brain on the web, but this is the best. From this site.)
could you update the 404 image, please? (link to the site still works for now, just the image is gone)
S5
What is S5, please?
I agree with what you say. My only peeve is that the concept of IGF is presented as a fact from the science of biology, while it’s used as a confused mess of 2 very different concepts.
Both talk about evolution, but inclusive fitness is a model of how we used to think about evolution before we knew about genes. If we model biological evolution on the genetic level, we don’t have any need for additional parameters on the individual organism level; natural selection and the other 3 forces in evolution explain the observed phenomena without a need to talk about individuals on top of genetic explanations.
Thus the concept of IF is only a good metaphor when talking approximately about optimization processes, not when trying to go into details. I am saying that going with the metaphor too far will result in confusing discussions.
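To illustrate what I mean by not needing individual-level parameters, here is a toy sketch (my own illustration; the 5% fitness advantage and the number of generations are arbitrary assumptions): a minimal haploid selection model tracks only an allele frequency and a relative fitness, and nothing in it refers to what any individual organism wants.

```python
def next_generation(p: float, w: float) -> float:
    """Deterministic allele-frequency update under selection.

    p: current frequency of allele A
    w: fitness of allele A relative to the alternative allele (whose fitness is 1.0)
    """
    return (w * p) / (w * p + (1.0 - p))

p, w = 0.01, 1.05  # a rare allele with a 5% relative fitness advantage
for _ in range(500):
    p = next_generation(p, w)
print(round(p, 3))  # ~1.0: the allele approaches fixation without any organism-level goals
```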
humans don’t actually try to maximize their own IGF
Aah, but humans don’t have IGF. Humans have https://en.wikipedia.org/wiki/Inclusive_fitness, while genes have allele frequency https://en.wikipedia.org/wiki/Gene-centered_view_of_evolution ...
Inclusive genetic fitness is a non-standard name for the latter view of biology as communicated by Yudkowsky—as a property of genes, not a property of humans.
The fact that bio-robots created by human genes don’t internally want to maximize the genes’ IGF should be a non-controversial point of view. The human genes successfully make a lot of copies of themselves without any need whatsoever to encode their own goal into the bio-robots.
I don’t understand why anyone would talk about IGF as if genes ought to want for the bio-robots to care about IGF, that cannot possibly be the optimal thing for genes to “want” to do (if I understand examples from Yudkowsky correctly, he doesn’t believe that either, he uses this as an obvious example that there is nothing about optimization processes that would favor inner alignment) - genes “care” about genetic success, they don’t care about what the bio-robots ought to believe at all 🤷
Some successful 19th century experiments used 0.2°C/minute and 0.002°C/second.
Have you found the actual 19th century paper?
The oldest quote about it that I found is from https://www.abc.net.au/science/articles/2010/12/07/3085614.htm
Or perhaps the story began with E.M. Scripture in 1897, who wrote the book, The New Psychology. He cited earlier German research: "…a live frog can actually be boiled without a movement if the water is heated slowly enough; in one experiment the temperature was raised at the rate of 0.002°C per second, and the frog was found dead at the end of two hours without having moved." Well, the time of two hours works out to a temperature rise of 18°C. And, the numbers don't seem right. First, if the water boiled, that means a final temperature of 100°C. In that case, the frog would have to be put into water at 82°C (18°C lower). Surely, the frog would have died immediately in water at 82°C.
I’m not sure what to call this sort of thing. Is there a preexisting name?
sounds like https://en.wikipedia.org/wiki/Emergence to me 🤔 (not 100% overlap and also not the most useful concept, but a very similar shaky pointer in concept space between what is described here and what has been observed as a phenomenon called Emergence)
Thanks to Gaurav Sett for reminding me of the boiling frog.
I would like to see some mention that this is a pop culture reference / urban myth, not something actual frogs might do.
To quote https://en.wikipedia.org/wiki/Boiling_frog, “the premise is false”.
PSA: This is the old page pointing to the 2022 meetup month events; chances are you got here in 2023 (at the time of writing this comment) because there was a bug on the homepage of lesswrong.com with a map and popup link pointing here...
https://www.lesswrong.com/posts/ynpC7oXhXxGPNuCgH/acx-meetups-everywhere-2023-times-and-places seems to be the right one 🤞
sampled uniformly and independently
🤔 I don’t believe this definition fits the “apple” example—uniform samples from a concept space of “apple or not apple” would NEVER™ contain any positive example (almost everything is “not apple”)… or what assumption am I missing that would make the relative target volume more than ~zero (for high n)?
Bob will observe a highly optimized set of Y, carefully selected by Alice, so the corresponding inputs will be Vastly correlated and interdependent at least for the positive examples (centroid first, dynamically selected for error-correction later 🤷♀️), not at all selected by Nature, right?
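A quick numerical version of why I expect ~zero positive examples under uniform sampling (my own sketch; treating “apple” as a small ball inside [0, 1]^n is an arbitrary assumption, as are the radius, the sample count, and the dimensions):

```python
import numpy as np

rng = np.random.default_rng(0)
center, radius, n_samples = 0.5, 0.1, 200_000  # "apple" = ball of radius 0.1 around the center

for n in (2, 5, 10, 20):
    points = rng.uniform(size=(n_samples, n))                # uniform, independent samples
    hits = np.linalg.norm(points - center, axis=1) < radius  # does the sample land in "apple"?
    print(n, hits.mean())  # fraction of positives: ~3e-2 at n=2, ~5e-5 at n=5, then effectively zero
```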
A hundred-dollar note is only worth anything if everyone believes in its worth. If people lose that faith, the value of a currency goes down and inflation goes up.
Ah, the condition for the reality of money is much weaker though—you only have to believe that you will be able to find “someone” who believes they can find someone for whom money will be worth something, no need to involve “everyone” in one’s reasoning.
Inflation is much more complicated of course, but in essence you only have to believe that other people believe that money is losing value and will buy the same thing from you at a higher price, to be incentivized to increase prices; you don’t have to believe that you yourself will be able to buy less from your suppliers, since increasing the price for higher profits is a totally valid reason for doing so.
This is also a kind of “coordination by common knowledge”, but the parties involved don’t have to share the same “knowledge” per se—consumers might believe “prices are higher because of inflation” while retailers might believe “we can make prices higher because people believe in inflation”...
Not sure myself whether the search for coordination by common knowledge incentivizes deceptive alignment “by default” (having an exponentially larger basin) or if some reachable policy can incentivize true alignment 🤷
yes, it takes millions to advance, but companies are pouring BILLIONS into this, and number 3 can earn its own money and create its own companies/DAOs/some new networks of cooperation if it wanted to, without humans realizing … have you seen any GDP per year charts whatsoever, why would you think we are anywhere close to saturation of money? have you seen any emergent capabilities from LLMs in the last year, why do you think we are anywhere close to saturation of capabilities per million dollars? are Alpaca-like improvements somehow a one-off miracle, and things are not getting cheaper and better and more efficient in the future somehow?
it could totally happen, but what I don’t see is why you are so sure it will happen by default, are you extrapolating some trend from non-public data, or just overly optimistic that 1+1 from previous trends is less than 2 in the future, totally unlike the compound effects in AI advancement in the last year?
Thanks for sharing your point of view. I tried to give myself a few days, but I’m afraid I still don’t understand where you see the magic barrier that would prevent the transition from 3 to 4 from happening outside of the realm of human control.
are you thinking about sub-human-level AGIs? the standard definition of AGI involves it being better than most humans at most of the tasks humans can do
the first human hackers were not trained on “take over my data center” either, but humans can behave out of distribution and so will the AGI that is better than humans at behaving out of distribution
the argument about AIs that generalize to many tasks but are not “actually dangerous yet” is about speeding up the creation of the actually dangerous AGIs, and it’s the speeding up that is dangerous, not that AI Safety researchers believe that those “weak AGIs” created from large LLMs would actually be capable of killing everyone immediately on their own
if you believe “weak AGIs” won’t speed up the creation of “dangerous AGIs”, can you spell out why, please?
To me as a programmer and not a mathematician, the distinction doesn’t make practical intuitive sense.
If we can create 3 functions f, g, h so that they “do the same thing”, like f(a, b, c) == g(a)(b)(c) == average(h(a), h(b), h(c)), it seems to me that cross-entropy can “do the same thing” as some particular objective function that would explicitly mention multiple future tokens.

My intuition is that cross-entropy-powered “local accuracy” can approximate “global accuracy” well enough in practice that I should expect better global reasoning from larger model sizes, faster compute, algorithmic improvements, and better data.
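A concrete toy version of that f/g/h equivalence (my own example; picking “the same thing” to be a plain average of 3 numbers is an arbitrary assumption):

```python
def f(a, b, c):  # "global" function over all arguments at once
    return (a + b + c) / 3

def g(a):  # curried version, one argument per call
    return lambda b: lambda c: (a + b + c) / 3

def h(x):  # per-item "local" function, aggregated afterwards
    return x

def average(*values):
    return sum(values) / len(values)

assert f(1, 2, 3) == g(1)(2)(3) == average(h(1), h(2), h(3)) == 2.0
```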
Implications of this intuition might be:
myopia is a quantity, not a quality: a model can be incentivized to be more or less myopic, but I don’t expect it will be proven possible to enforce it “in the limit”
instruct training on longer conversations ought to produce “better” overall conversations if the model simulates that it’s “in the middle” of a conversation, where follow-up questions are better, compared to giving a final answer “when close to the end of this kind of conversation”
What nuance should I consider to understand the distinction better?