I mean, obviously the causal chain of weight gain is often going to go through caloric intake, but that doesn’t make caloric intake the root cause. For example, birth control pills, stress, and soda machines in schools all cause weight gain via increased caloric intake, but are distinct root causes.
What are our outs to play to?
I am reminded of the classic “Oh say it again Dexter” “Omelette du fromage”
Lesswrong has a [trove of thought experiments](https://www.lesswrong.com/posts/PcfHSSAMNFMgdqFyB/can-you-control-the-past) about scenarios where arguably the best way to maximize your utility is to verifiably (with some probability) modify your own utility function, starting with the prisoner’s dilemma and extending to games with superintelligences predicting what you will do and putting money in boxes etc.
These thought experiments seem to have real world reflections: for example, voting is pretty much irrational under CDT, but paradoxically the outcomes of elections correlate with the utility functions of people who vote, and people who grow up in high trust societies do better than people who grow up in low trust societies, even though defecting is rational.
In addition, humans have an astonishing capability for modifying our own utility functions, such as by joining religions, gaining or losing empathy for animals, etc.
Is it plausible that we could analytically prove that, under a training environment rich in these sorts of scenarios, an AGI that wants to maximize an initially bad utility function would develop the capability to verifiably (with some probability) modify its own utility function, like people do, in order to survive and be released into the world?
To steelman, I’d guess this idea applies in the hypothetical where GPT-N gains general intelligence and agency (such as via a mesa-optimizer) just by predicting the next token.
I tutored college students who were taking a computer programming course. A few of them didn’t understand that computers are not sentient. More than one person used comments in their Pascal programs to put detailed explanations such as, “Now I need you to put these letters on the screen.” I asked one of them what the deal was with those comments. The reply: “How else is the computer going to understand what I want it to do?” Apparently they would assume that since they couldn’t make sense of Pascal, neither could the computer.
There’s been a phase change with the release of copilot, where this suddenly appears to work—at least, for tasks like putting letters on the screen or assembling cookie recipes. “Waiter, there’s a ghost in my machine!”
There are two ways a large language model transformer learns: type 1, the gradient descent process, which is certainly not sample-efficient, taking billions of examples; and type 2, the mysterious in-episode learning process, where a transformer learns a ‘new’ task from ~5 examples in an engineered prompt. I think the fundamental question is whether type 2 only works if the task to be learned is represented in the original dataset, or if it generalizes out of distribution. If it truly generalizes, then the obvious next step is to somehow skip straight to type 2 learning.
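For concreteness, here is a minimal sketch of what type 2 learning looks like in practice- the task is specified entirely by a handful of examples in the prompt, with no weight updates. The word-reversal task and the `complete` function are placeholders of my own invention; any LLM interface would do.

```python
# Minimal sketch of "type 2" (in-episode) learning: the task is defined
# entirely by a few examples in the prompt; no gradients are computed.

few_shot_prompt = """Reverse each word.
cat -> tac
house -> esuoh
planet -> tenalp
guitar -> ratiug
window -> wodniw
lantern ->"""

def complete(prompt: str) -> str:
    """Placeholder for whatever language model interface you have access to."""
    raise NotImplementedError("plug in your LLM call here")

# If type 2 learning truly generalizes, complete(few_shot_prompt) should
# return "nretnal" even if word reversal never appeared as an explicit
# task anywhere in the training data.
```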
A quick example of how paper reading works in my research:
2017: CycleGAN comes out, and produces cool pictures of zebras and horses. I skim the paper because it seems cool and file away the concept, but don’t make an effort to replicate the results because in my experience GANs are obnoxious to train.
2018: “Which Training Methods for GANs do actually Converge?” comes out, but even though it contains the crucial insight for making GANs trainable, I don’t read it because it’s not very popular- I never see it.
2019: StyleGAN comes out, and cites “Which Training Methods for GANs do actually Converge?” I read both papers, mostly forget StyleGAN because it seems like a “we have big gpu do good science” paper, but am very impressed with “Which Training Methods for GANs do actually Converge?” and take a day or two to replicate it.
2020?: Around this time I also read all of gwern’s anime training exploits, and update my priors towards “maybe large gans are actually trainable.”
2022: I need to convert unlabeled DXA images into matching radiographs as part of a larger project. I’m generally of the opinion that GANs aren’t actually useful, but the problem matches the problem solved by CycleGAN exactly, and I’m out of options. I initially try the open source CycleGAN codebase, but as expected it’s wildly unstable and miserable. I recall that “Which Training Methods for GANs do actually Converge?” had pretty strong theory backing up gradient penalties on the discriminator, and that I was able to replicate their experiments, so I dust off that replication code, verify that it still works, add a cycle consistency loss, and am able to translate my images. Image translator in hand, I slog back into the larger problem.
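For anyone who wants the concrete recipe, here is a rough PyTorch sketch of the two ingredients: the gradient penalty on the discriminator from “Which Training Methods for GANs do actually Converge?” (penalizing the discriminator’s gradient on real images) plus a CycleGAN-style cycle consistency loss. The networks `D`, `G_ab`, and `G_ba` are stand-ins for whatever architectures you bring; treat it as a sketch, not the exact code I used.

```python
import torch
import torch.nn.functional as F

def r1_gradient_penalty(D, real_images):
    """Penalize the squared gradient norm of the discriminator on real data-
    the regularizer from "Which Training Methods for GANs do actually
    Converge?" that makes training stable."""
    real_images = real_images.clone().requires_grad_(True)
    scores = D(real_images)
    grads, = torch.autograd.grad(scores.sum(), real_images, create_graph=True)
    return grads.pow(2).flatten(1).sum(1).mean()

def cycle_consistency_loss(G_ab, G_ba, images_a):
    """CycleGAN-style loss: translating A -> B -> A should recover the input."""
    return F.l1_loss(G_ba(G_ab(images_a)), images_a)

# Rough shape of the objective (adversarial terms omitted):
# d_loss = adversarial_d_loss + gamma * r1_gradient_penalty(D, batch_a)
# g_loss = adversarial_g_loss + lambda_cycle * cycle_consistency_loss(G_ab, G_ba, batch_a)
```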
--
What does this have to do with the paper reading cargo cult?
- Papers that you can replicate by downloading a code base are useful, but papers that you can replicate from the text without seeing code are solid gold. If there are any paper reading clubs out there that ask the presenter to replicate the results without looking at the author’s code, I would love to join- not just because the replication is valuable, but because it would narrow down the kinds of papers presented in a valuable way.
- Reading all the most hyped GAN papers, which is basically what I did, would probably not get me an awesome research result in the field of GANs. However, it served me pretty well as a researcher in an adjacent field. In particular, the obscure but golden insight eventually filtered its way into the citations of the hyped, fluffy flagship paper. For alignment research, hanging out in a few paper reading groups that are distantly related to alignment should be useful, even if a reading group devoted specifically to alignment isn’t useful.
- I had to read so many papers to come across 3 useful ones for this problem. However, I retain the papers that haven’t been useful yet- there’s decent odds that I’ve already read the paper that I’ll need to overcome the next hurdle.
- This type of paper reading, where I gather tools to engineer with, initially seems less relevant for fundamental concepts research like alignment. However, your general relativity example suggests that Einstein also had a tool gathering phase leading up to relativity, so ¯\_(ツ)_/¯
An attempt to name a strategy for an AI almost as smart as you: What fraction of jobs in the world are you intelligent enough to do, if you trained for them? I suspect that a huge fraction of the world’s workers could not compete in a free fair market against an entity as smart as you that eats 15 dollars of electricity a day, works without breaks, and only has to be trained once for each task, after which millions of copies could be churned out.
2%: Solve the same problem as the product Wolfram Alpha, with the same style of inputs and outputs.
Now that I think about it, Wolfram Alpha might be sitting on a fairly valuable hunk of diverse math problem data. They get around 10 million visits a month, which adds up to on the order of a billion diverse math problems over the product’s lifetime- that’s larger than some chunks of The Pile.
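Back-of-envelope for that figure (the visit count is the one quoted above; the ~13 years of operation since the 2009 launch is my assumption):

```python
# Rough sanity check on the "about a billion problems" estimate.
visits_per_month = 10_000_000          # figure quoted above
months_live = 13 * 12                  # assumes ~13 years since the 2009 launch
print(visits_per_month * months_live)  # ~1.56 billion queries
```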
With a grain of salt: for 2 million years there were various species of Homo dotting Africa, and eventually the world. Then humans became generally intelligent, and immediately wiped all of them out. Even hiding on an island in the middle of an ocean and specializing biologically for living on small islands was not enough to survive.
One possibility: I suggest that with decent schooling, the kids who could start working professionally at 14 can instead be doubling their productivity every year, so there is a benefit to working on building talent directly before trying to extract outputs- exploration vs exploitation.
My public school was beyond good to me, and so I was learning math as fast as I could from the age of 11 to 21: commuting to the local university during my last two years of high school for multivariable calc, diff eq, linear algebra, and discrete math, then taking a mix of undergraduate and graduate math during college. During high school I also spent some time working at a lab at the university. The time I spent working in the lab was valuable 99% as a learning experience and 1% as actually pushing science- the crux of my actual contribution was a single pull request to matplotlib that took months and months to craft, and which would take me around a day today. My work in medical imaging that takes years now would take infinite time without 10 years of math classes behind me.
The question then is: is working on real adult goals a better proxy task for learning than the typical gifted high schooler fare of unproductive projects, contests, and tests? I’d guess that as proxy goals, contests > self-chosen projects >> real productive work >> school-assigned projects > tests.
Who drew this connection in August / October 2021? I haven’t found anything but would love to update on these people’s current analysis of events.
Notably, in August 2021 the US, without a great deal of preparation and at great political cost, pulled out of Afghanistan.
Motivation: in the event of a Ukraine-Russia war, the US would be diplomatically embarrassed if Russia could point to an ongoing war in Afghanistan as “equivalent” to their invasion. In addition, a war in Afghanistan would serve as a distraction to US armed forces.
Counterpoints: The US had signalled before that they intended to pull out at this date, in the previous administration’s negotiations with the Taliban. Also, Russia could just claim that their invasion was equivalent to the US’s past invasions of Afghanistan and Iraq, even though those wars had ended.
I think this is complicated by the reality that money given to the parties isn’t spent directly on solving problems, but on fighting for power. The opinion that “the political parties should have less money on average, and my party should have relatively more money than their party” seems eminently reasonable to me.
Medical Image Registration: The obscure field where Deep Mesaoptimizers are already at the top of the benchmarks. (post + colab notebook)
So I have not actually watched any Jordan Peterson videos- I’ve only been told what to believe about him by left-wing sources. Your post gave me a distinctly different impression than I got from them! I decided to suppress my gut reaction and actually see what he had to say.
To get a less biased impression of him, I picked a random video on his channel and scrolled to the middle of the timeline. The very first line was “Children are the sacrificial victims of the trans ideology.” What are the odds of that?
We’ve got a bit of a selection bias: anything that modern medicine is good at treating (smallpox, black plague, scurvy, appendicitis, leprosy, hypothyroidism, deafness, hookworm, syphilis) eventually gets mentally kicked out of your category of “things it deals with,” since doctors don’t have to spend much time dealing with them.
I think there is useful signal for you that the entire comments section is focused on the definition of a word instead of reconsidering whether specific actions or group memberships might be surprisingly beneficial. This is a property of the post, not the commenters. I suspect the issue is that people already emotionally reacted to the common definition of the word Religion in the title, before you had a chance to redefine it in the body.
The redefinition step is not necessary either- the excellent “Exercise is good” and “Nice clothes are good” posts used the common definitions of Exercise and Nice clothes throughout.
This generates a decent approximation of the distribution of human actions in an open world situation. Is it usable for empirical quantilizer experiments?
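To pin down what such an experiment would need: a quantilizer samples from the top q-fraction of a base distribution ranked by utility, instead of taking the argmax. A minimal sketch, assuming you can draw actions from the learned human-action distribution and score them with some utility proxy (both are placeholders here):

```python
import random

def quantilize(sample_action, utility, q=0.1, n_samples=1000):
    """Draw n_samples actions from the base (human-like) distribution, then
    pick uniformly at random from the top q fraction ranked by utility.
    sample_action() and utility(action) are placeholders for the learned
    action distribution and whatever utility proxy the experiment uses."""
    actions = [sample_action() for _ in range(n_samples)]
    actions.sort(key=utility, reverse=True)
    top_k = max(1, int(q * n_samples))
    return random.choice(actions[:top_k])
```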