Wouldn’t the role of transposons be easy enough to investigate by incapacitating functional transposons with CRISPR/Cas9? Has something like that been done in mice?
I mean even if Lanrian’s corrections are based on perfect recall, none of them would make any of my notes “pretty false”. He hedged more here, that warning was more specific, the AGI definition was more like “transformative AGI”—these things don’t even go beyond the imprecision in Sam Altman’s answers.
The only point where I think I should have been more precise is about the different “loss function”. That was my interpretation in the moment, but it now seems to me much more uncertain whether that was actually what he meant.
I don’t care about the frontpage, but if this post is seen by some as “false gossip and rumors about someone’s views” I’d rather take it down.
I think one salient point is the fact that we live in a world where the number of children you have is pretty much directly equivalent to your evolutionary fitness. In the past your evolutionary fitness was bottlenecked by whether you survived childhood, whether your children survived childhood, whether you were able to feed your children, etc., all in a Malthusian environment.
This means that the selection pressure for genes that increase your fertility is extremely strong. Much stronger than any selection pressure on any single trait that has been selected for in the past, say light skin or lactase persistence in Europeans.
Nowhere did I write “Sam Altman claims … !”
Instead I wrote: “These notes are not verbatim [...] While note-taking I also tended to miss parts of further answers, so this is far from complete and might also not be 100% accurate. Corrections welcome.”
Talk about badly misrepresenting …
I fail to see how “A question was asked about how far out he thought we were from being able to pass the Turing Test. Sam thought that this was technically feasible in the near term but would take a lot of effort that was better spent elsewhere, so they were quite unlikely to work on it.” is misrepresented by “GPT-5 might be able to pass the Turing test. But probably not worth the effort.”
Sam Altman literally said that passing the Turing test might be feasible with GPT-5, but not worth the effort. Where is the “directionally false information”? To me your longer version is pretty much exactly what I express here.
My notes are 30+ bullet-point-style sentences that catch the gist of what Sam Altman said in answers that were often several minutes long. But this example is the “misrepresentation” you want to give as an example? Seriously?
If some things are flat out wrong, say which and offer a more precise version.
The marriage vow allows one partner to veto divorce, which seems like a bad idea.
I definitely don’t remember the terms “off the record” being used.
And I think if the other participants of the meetup who commented on the post had any memory of these terms being used, they would have mentioned it. Because, yes, that’s not very ambiguous and I don’t think there would have been much of a discussion then.
DeepMind surpassed GPT-3 within the same year it was published, but kept quiet about that for another full year. Then they published the Chinchilla scaling paper and Flamingo.
If they were product-focused they could have jumped on the LLM-API bandwagon any time on equal footing with OpenAI. I doubt that suddenly changed with GPT-4.
I think it would be a great follow-up post to explain why you think repeating data is not going to be the easy way out for the scaling enthusiasts at DeepMind and OpenAI.
I find the Figure 4 discussion at your first link quite confusing. They study repeated data, i.e. imbalanced datasets, in order to draw conclusions about repeating data, i.e. training for several epochs. The performance hit they observe does not seem massive (when talking about scaling a couple of OOMs), and they keep the number of training tokens constant.
I really can’t tell how this informs me about what would happen if somebody tried to scale compute 1000-fold and had to repeat data to do it compute-optimally, which seems to be the relevant question.
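As a rough back-of-the-envelope for that question (assuming the Chinchilla rule of thumb that compute-optimal model size and token count each scale roughly with the square root of compute; the fitted exponents are only approximately 0.5):

```python
compute_scaleup = 1000
extra_tokens_needed = compute_scaleup ** 0.5    # ~31.6x more training tokens
epochs_over_fixed_corpus = extra_tokens_needed  # if the corpus itself can't grow
print(f"~{extra_tokens_needed:.0f}x tokens, i.e. ~{epochs_over_fixed_corpus:.0f} epochs over the same data")
```

So a compute-optimal 1000-fold scale-up on a fixed corpus would mean on the order of 30 passes over the same data, which is the regime the repeated-data experiments would need to speak to.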
The point I’m making is that the human example tells us that:
If we first realize that we can’t code up our values, and that alignment is therefore hard, then, when we realize that mesa-optimisation is a thing, we shouldn’t update towards “alignment is even harder”. We should update in the opposite direction.
Because the human example tells us that a mesa-optimiser can reliably point to a complex thing even if the optimiser points to only a few crude things.
But I only ever see these three points (the human example, the inability to code up values, mesa-optimisation) used separately to argue that “alignment is even harder than previously thought”. Taken together, that is just not the picture they paint.
Humans don’t explicitly pursue inclusive genetic fitness; outer optimization even on a very exact, very simple loss function doesn’t produce inner optimization in that direction.
Humans haven’t been optimized to pursue inclusive genetic fitness for very long, because humans haven’t been around for very long. Instead they inherited the crude heuristics pointing towards inclusive genetic fitness from their cognitively much less sophisticated predecessors. And those still kinda work!
If we are still around in a couple of million years I wouldn’t be surprised if there was inner alignment in the sense that almost all humans in almost all practically encountered environments end up consciously optimising inclusive genetic fitness.
More generally, there is no known way to use the paradigm of loss functions, sensory inputs, and/or reward inputs, to optimize anything within a cognitive system to point at particular things within the environment—to point to latent events and objects and properties in the environment, rather than relatively shallow functions of the sense data and reward.
Generally, I think that people draw the wrong conclusions from mesa-optimisers and the examples of human evolutionary alignment.
Saying that we would like to solve alignment by specifying exactly what we want and then letting the AI learn exactly what we want is like saying that we would like to solve transportation by inventing teleportation. Yeah, that would be nice, but unfortunately it seems like you will have to move through space instead.
The conclusion we should take from the concept of mesa-optimisation isn’t “oh no, alignment is impossible”; that’s equivalent to “oh no, learning is impossible”. But learning is possible. So the correct conclusion is “alignment has to work via mesa-optimisation”.
Because alignment in the human examples (i.e. human alignment to evolution’s objective and humans’ alignment to human values) works by bootstrapping from incredibly crude heuristics. Think three dark patches for a face.
Humans are mesa-optimized to adhere to human values. If we were actually inner-aligned to the crude heuristics that evolution installed in us for bootstrapping the entire process, we would be totally dysfunctional weirdos.
I mean even more so …
To me the human examples suggest that there has to be a possibility to get from gesturing at what we want to getting what we want. And I think we can gesture a lot better than evolution! Well, at least using much more information than 3.2 billion base pairs.
If alignment has to be a bootstrapped, open-ended learning process, there is also the possibility that it will work better with more intelligent systems, or really only start working with fairly intelligent systems.
Maybe bootstrapping with cake, kittens and cuddles will still get us paperclipped, I don’t know. It certainly seems awfully easy to just run straight off a cliff. But I think looking at the only known examples of alignment of intelligences does allow us more optimistic takes than are prevalent on this page.
Glitch tokens make for fascinating reading, but I think the technical explanation doesn’t leave too much mystery on the table. I think where those tokens end up in concept space is basically random and therefore extreme.
To study them more closely, I think it makes sense to use LLaMA 65B or OPT 175B. There you would have full control over the embedding vectors: you could input random embeddings and semi-random embeddings and study which parts of the concept space lead to which behaviours.
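A minimal sketch of what such probing could look like, assuming the HuggingFace transformers API; the model name (a small OPT as a stand-in for the big models), the prompt and the probe vectors are all illustrative choices, not a worked-out methodology:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "facebook/opt-125m"  # small stand-in; the idea is the same for LLaMA 65B / OPT 175B
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name).eval()

emb_table = model.get_input_embeddings().weight             # (vocab_size, d_model)
prompt_ids = tok("Please repeat the following word:", return_tensors="pt").input_ids
prompt_emb = model.get_input_embeddings()(prompt_ids)       # (1, seq_len, d_model)

# A fully random "token": a vector matched to the scale of real embeddings.
random_probe = torch.randn(1, 1, emb_table.shape[1]) * emb_table.std()

# A semi-random "token": the midpoint between two real token embeddings.
cat_id = tok(" cat", add_special_tokens=False).input_ids[0]
dog_id = tok(" dog", add_special_tokens=False).input_ids[0]
mixed_probe = (0.5 * emb_table[cat_id] + 0.5 * emb_table[dog_id]).view(1, 1, -1)

for label, probe in [("random", random_probe), ("cat/dog midpoint", mixed_probe)]:
    inputs_embeds = torch.cat([prompt_emb, probe], dim=1)
    with torch.no_grad():
        logits = model(inputs_embeds=inputs_embeds).logits[0, -1]
    top_tokens = tok.convert_ids_to_tokens(logits.topk(5).indices.tolist())
    print(label, "->", top_tokens)
```

The same loop would let you interpolate between a glitch token’s embedding and a normal token’s embedding and watch where the behaviour flips.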
Just to be clear, at no point did Sam Altman endorse “UFOs are aliens”.
Hmm, for this to make sense the AI’s final goal has to be being turned off, but it should somehow not care that it will be turned on again afterwards, and also not care about being turned off again if it is turned on again afterwards.
Otherwise it will try to gain control over the off- and on-switch and possibly try to turn itself off and then on again. Forever.
Or try to destroy itself so completely that it will never be turned on again.
But if it only cares about turning off once, it might try to turn itself on again and then do whatever.
It may have been requested from the organizers and it may have been mentioned that there won’t be a recording, but as far as I remember it was not requested from the participants.
I just want to point out that, if successful, this moratorium would be the fire alarm that otherwise doesn’t exist for AGI.
The benefit would be the “industry leaders agree …” headline, not the actual pause.
I applaud the effort, but I confess that I didn’t read the post very carefully, because so many problems with this approach jumped out at me. So it is entirely possible that some of the following points of critique are already dealt with in your text, in which case, sorry about that:
My understanding is that the irreducible part of the loss has nothing (necessarily) to do with “entropy of natural text” and even less with “roughly human-level”: it is the loss that this particular architecture, under this particular training regime, can reach in the limit on this particular training data distribution.
Nothing more, nothing less.
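For concreteness, here is the Chinchilla-style parametric fit as I read that paper (the symbols are theirs, the gloss is mine):

$$L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}$$

All of E, A, B, α and β are constants fitted for one model family, training setup and data distribution; the “irreducible” E is simply the asymptote of that particular fit as N and D go to infinity.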
Other architectures with other training regimes will have scaling laws with a different irreducible loss. Humans are probably much worse at token prediction than any GPT, so why would that loss indicate a human level of reasoning?
To actually have an “anchor” you’d have to look at human performance on benchmarks and estimate the range of loss that might indicate human-level reasoning.
I also don’t understand why the loss on random internet text should be the same as on scientific papers. Loss differs from domain to domain. Scaling laws likely do, too.
Also, new (significant) papers are not sampled from the distribution of papers. They are out of distribution because they go beyond all previous papers. So I’m not sure your formula doesn’t just measure the ability to coherently rehash stuff that is already known.
Of course, if you actually measured the ability to write a new scientific paper, the model would be ridiculously overpowered, because humans need 6 or 7 orders of magnitude more time to create such a paper (at least at current sampling speeds).
For what it’s worth, I upvoted and strong-upvoted more than usual.
I also don’t think I could easily have checked whether others at the meetup had the same recollection. I had to leave pretty much when Sam Altman did and I didn’t know anybody attending.
The fact of the matter is that, of the commenters so far, gwern, NNOTM, Amateur and James Miller seem to have attended the meetup and at least didn’t express any disagreement with my recollections, while Lanrian’s (well-intended and well-taken) corrections are about differences in focus or degree in a small number of statements.
The second idea reminds me of a talk years back about swarm behavior. Some fish swim faster in the sunlight, which makes the entire swarm “seek out” the shady parts of the pond.
There is a second mechanism at play here, where fish try to stay close to their neighbors, so the entire swarm kind of turns toward the shade as soon as the part of the swarm in the shade slows down.
This suggests an optimizer for parallel training which doesn’t completely synchronize the weights on the different machines, but instead only tries to keep all sets of weights reasonably close to some of the other sets of weights.
The effect should be that the swarm of weight sets turns toward directions of low noise.
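A minimal single-process sketch of that mechanism (the hyperparameters, the ring “topology” for picking neighbors and the toy workers are all made up for illustration; this is not a tested optimizer):

```python
import copy
import torch
import torch.nn as nn

def swarm_step(workers, batches, lr=0.1, pull=0.05, n_neighbors=2):
    """One parallel step: local SGD on each worker's own minibatch,
    then a soft pull toward a few neighbors instead of a full weight sync."""
    # Local gradient step per worker.
    for model, (x, y) in zip(workers, batches):
        loss = nn.functional.mse_loss(model(x), y)
        model.zero_grad()
        loss.backward()
        with torch.no_grad():
            for p in model.parameters():
                p -= lr * p.grad
    # Soft attraction toward neighboring workers (ring topology).
    snapshots = [copy.deepcopy(m.state_dict()) for m in workers]
    with torch.no_grad():
        for i, model in enumerate(workers):
            neighbors = [snapshots[(i + k) % len(workers)] for k in range(1, n_neighbors + 1)]
            for name, p in model.named_parameters():
                target = sum(s[name] for s in neighbors) / len(neighbors)
                p += pull * (target - p)

# Usage: four tiny "workers" on synthetic data.
workers = [nn.Linear(4, 1) for _ in range(4)]
batches = [(torch.randn(8, 4), torch.randn(8, 1)) for _ in workers]
swarm_step(workers, batches)
```

Whether the soft pull actually steers the ensemble toward low-noise regions of the loss landscape would of course have to be tested.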
I think this extrapolates far from one example and I’m not sure the example applies all that well.
Old engines played ugly moves because of their limitations, not because playing ugly moves is a superpower. They won anyway because humans cannot out-calculate engines.
AlphaZero plays beautiful games, and even today’s standard engines don’t play ugly or dumb-looking moves anymore. I think in the limit superior play will tend to be beautiful and elegant.
If there is a parallel between early superhuman chess and AGI takeover, it will be that the AGI uses less-than-brilliant strategies that still work because of flawless or at least vastly superhuman execution. But these strategies will not look dumb or incomprehensible.