Ordered for my family as a direct result of reading this post—thank you!
It’s not that individual journalists don’t trust Wikipedia, but that they know they can’t publish an article in which a key fact comes directly from Wikipedia without any sort of corroboration. I assume, anyway. Perhaps I’m wrong.
Regarding “Magic Pills,” I would note that Wellbutrin is known as the first-line antidepressant that tends to aid in focus, energy, and productivity. SSRIs (which Wellbutrin is not) have a reputation for sedation and sometimes an emotional numbing effect, though this may well be what one needs or desires to deal with depression or anxiety. Additionally, Wellbutrin is “lower risk” than SSRIs in the sense that uncomfortable withdrawal effects are quite rare. My source for this is research I did for a past personal decision about whether to try antidepressants. All that said, there seems to be very large variation in personal satisfaction with different antidepressants, and there are surely some people who would indeed benefit from SSRIs, not only in terms of depression itself but also productivity as a secondary effect.
I’m frequently surprised that my parents will spend effort on something or ask another person for help without Googling; both are well-educated and comfortable using the internet, but it just isn’t their first instinct like it is with me. Perhaps there’s a correlation with age, where older people weren’t trained to use Google as a first-line troubleshooting device.
Yes, I was incorrect about Matuschak’s position. He commented on reddit here:
“I think Matuschak would say that, for the purpose of conveying information, it would be much more efficient to read a very short summary than to read an entire book.”
FWIW, I wouldn’t say that! Actually, my research for the last couple of years has been predicated on the value of embedding focused learning interactions (i.e., spaced repetition prompts) into extended narrative. The underlying theory isn’t (wasn’t!) salience-based; rather, I believe that strong understanding is produced by a rich network of connections and a meaningful emotional connection, both of which are promoted by narrative (but usually not by a very short summary).
One answer to the question for me:
While writing, something close to “how does this ‘sound’ in my head naturally, when read, in an aesthetic sense?”
I’ve thought for a while that “writing quality” largely boils down to whether the writer has a salient and accurate intuition about how the words they’re writing come across when read.
Ah late to the party! This was a top-level post aptly titled “Half-baked alignment idea: training to generalize” that didn’t get a ton of attention.
Thanks to Peter Barnett and Justis Mills for feedback on a draft of this post. It was inspired by Eliezer’s Lethalities post and Zvi’s response.
Central idea: can we train AI to generalize out of distribution?
I’m thinking, for example, of an algorithm like the following:
Train a GPT-like ML system to predict the next word given a string of text only using, say, grade school-level writing (this being one instance of the object level)
Assign the system a meta-level reward based on how well it performs (without any additional training) at generalizing; in this case, that means predicting the next word in more advanced, complex writing (perhaps using many independent tests of this task without updating/learning between each test, and allowing parameters to update only after the meta-level aggregate score is provided)
Note: the easy→hard generalization is not a necessary feature. Generalization could be from fiction→nonfiction writing or internet→native print text, for instance.
After all these independent samples are taken, provide the AI its aggregate or average score as feedback
(Maybe?) repeat all of step I on a whole new set of training and testing texts (e.g., using text from a different natural language like Mandarin)
Repeat this step an arbitrary number of times
For example, using French text, then Korean, then Arabic, etc.
Each time a “how well did you generalize” score is provided (which is given once per natural language in this example), the system should improve at the general task of generalizing from simple human writing to more complex human writing, (hopefully) to the point of being able to perform well at generalizing from simple Hindi (or whatever) text to advanced Hindi prediction even if it had never seen advanced Hindi text before.
^Steps 1-3 constitute the second meta-level of training an AI to generalize, but we can easily treat this process as a single training instance (e.g., rating how well the AI generalizes to Hindi advanced text after having been trained on doing this in 30 other languages) and iterate over and over again. I think this would look like:
Running the analogs of steps 1-4 on generalizing from
(a) simple text to advanced text in many languages,
(b) easy opponents to hard ones across many games, and
(c) photo generation of common or general objects (“car”) to rare/complex/specific ones (“interior of a 2006 Honda Accord VP”), across many classes of object.
And (hopefully) the system would eventually be able to generalize from simple Python code training data to advanced coding tasks even though it had never seen any coding at all before this.
And, of course, we can keep piling layers on. (A rough code sketch of the basic loop is below.)
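To make the structure concrete, here’s a very rough Python sketch of the basic loop. Everything in it is a placeholder I made up for illustration; the model, the scoring function, and the meta-update are stand-ins, not a real implementation.

```python
# Toy sketch of the proposed "train to generalize" loop.
# The model, the scoring function, and the meta-update are all placeholders
# meant only to show the structure of the idea.

import random

def train_object_level(model, simple_corpus):
    """Ordinary next-word training on simple text (the object level)."""
    # ... gradient updates on simple_corpus would go here ...
    return model

def score_generalization(model, advanced_corpus):
    """Evaluate next-word prediction on advanced text WITHOUT updating weights."""
    # Placeholder: a random number standing in for aggregate accuracy.
    return random.random()

def meta_update(model, score):
    """Update only the 'generalization' parameters from the single aggregate score."""
    # ... an RL-style update using the scalar reward would go here ...
    return model

model = {"object_params": None, "meta_params": None}  # dummy model

# One (simple corpus, advanced corpus) pair per natural language.
languages = {
    "English":  ("simple EN text", "advanced EN text"),
    "French":   ("simple FR text", "advanced FR text"),
    "Mandarin": ("simple ZH text", "advanced ZH text"),
}

for lang, (simple, advanced) in languages.items():
    model = train_object_level(model, simple)       # step 1: object-level training
    score = score_generalization(model, advanced)   # step 2: many tests, no learning
    model = meta_update(model, score)               # step 3: one aggregate reward

# Held-out test: check generalization in a language whose advanced text the
# system has never seen (e.g., simple -> advanced Hindi).
```

The structural point is that score_generalization never updates the weights; only the single aggregate score per language feeds back, via the meta-level update, into how the system generalizes.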
A few notes
I think the following is one way of phrasing what I hope might happen with this method: we are using RL to teach an ML system how to do ML in such a way that it sacrifices some in-distribution predictive power for the ability to use its “knowledge” more generally, without doing anything that seems dumb to us.
Of course, there are intrinsic limits to any system’s ability to generalize. The system in question can only generalize using knowledge X if X exists as information in the object-level training provided to it.
This limits what we should expect of the system.
For example, I am almost certain that even an arbitrarily smart system will not be able to generate coherent Mandarin text from English training data, because the meaning of Mandarin characters doesn’t exist as “latent knowledge” in even a perfect understanding of English.
Anyone here know Python?
My hands-on experience with ML extends to linear regression in R and not an inch more, so I’m probably not the best person to test this theory out. I’ve heard some LWers know a bit of Python, though.
If that’s you, I’d be fascinated and thankful to see if you can implement this idea using whatever data and structure you think would work best, and would be happy to collaborate in whatever capacity I can.
Appendix: a few brief comments (from someone with much more domain knowledge than me) and responses (from me):
Comment
Is this just the same as training it on this more complex task (but only doing one big update at the end, rather than doing lots of small updates)?
Response (which may help to clarify why I believe the idea might work)
I don’t think so, because the parameters don’t change/update/improve between each of those independent tests. Like GPT-3 in some sense has a “memory” of reading Romeo and Juliet, but that’s only because its parameters updated as a result of seeing the text.
But also I think my conception depends on the system having “layers” of parameters corresponding to each layer of training.
So train on simple English → only “Simple English word generation” parameters are allowed to change... but then you tell it how well it did at generalizing out of distribution, and now only its “meta level 1 generalization” parameters are allowed to change.
Then you do the whole thing again but with German text, and its “Meta level 1 generalization” parameters are allowed to change again using SGD or whatever. If this works, it will be the reason why it can do well at advanced Hindi text without ever having read advanced Hindi.
Treat this whole process as the object level, and then it updates/improves “meta level 2 generalization” parameters.
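If it helps, here’s a minimal toy sketch of the “only certain parameter layers are allowed to change at each stage” idea, assuming a small PyTorch model whose object-level and meta-level parameters live in separate modules. The module names, the dummy losses, and the freezing scheme are all invented for illustration.

```python
# Toy illustration of freezing/unfreezing parameter "layers" at different
# stages of training. Everything here is invented for this sketch.

import torch
import torch.nn as nn

class LayeredModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.object_level = nn.Linear(16, 16)   # "simple English word generation" params
        self.meta_level_1 = nn.Linear(16, 16)   # "meta level 1 generalization" params

    def forward(self, x):
        return self.meta_level_1(torch.relu(self.object_level(x)))

def set_trainable(module, trainable):
    for p in module.parameters():
        p.requires_grad = trainable

model = LayeredModel()

# Phase 1: object-level training -- only object-level params may change.
set_trainable(model.object_level, True)
set_trainable(model.meta_level_1, False)
opt = torch.optim.SGD([p for p in model.parameters() if p.requires_grad], lr=0.1)
loss = model(torch.randn(4, 16)).pow(2).mean()   # dummy object-level loss
loss.backward()
opt.step()

# Phase 2: the "how well did you generalize" score arrives -- now only the
# meta-level-1 params may change.
model.zero_grad()
set_trainable(model.object_level, False)
set_trainable(model.meta_level_1, True)
opt = torch.optim.SGD([p for p in model.parameters() if p.requires_grad], lr=0.1)
meta_loss = model(torch.randn(4, 16)).pow(2).mean()  # stand-in for a reward-based update
meta_loss.backward()
opt.step()
```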
Comment:
This looks vaguely like curriculum learning, which apparently doesn’t really work in LLMs (https://arxiv.org/abs/2108.02170). I think a similar experiment would be to train on simple + advanced text for English, French, Mandarin, etc., but only simple Hindi, and then see if it can do complex Hindi.
Response
I think that’s a pretty different thing, because there are no meta-level parameters. Seems like fundamentally just a flavor of normal RL.
Or do pretraining with English, French, Mandarin, and Hindi, but only do fine tuning with English, French, Mandarin, and see if it can then do the tasks it was fine tuned for in Hindi.
My prediction: it learns to generalize a bit (the scores on the novel Hindi tasks are higher than if there was no fine tuning with the other languages) but worse than the other languages generalize. As the models are scaled up, this ‘generalization gap’ gets smaller.
Seems like this might depend on the relative scaling of different meta level parameters (which I described above)?
Like for example whenever you scale the # of object level params by a factor of 2, you have to scale the number of nth meta level parameters by 2^(n+1).
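Just to spell out the arithmetic of that hypothetical rule (numbers purely illustrative):

```python
# Hypothetical scaling rule from the comment above: each doubling of the
# object-level parameter count multiplies the n-th meta-level parameter
# count by 2^(n+1). Purely illustrative, not a claim about real models.

def scaled_meta_params(base_meta_params, n, object_doublings=1):
    """Meta-level-n parameter count after `object_doublings` doublings of the object level."""
    return base_meta_params * (2 ** (n + 1)) ** object_doublings

for n in (1, 2, 3):
    print(n, scaled_meta_params(1_000_000, n))
# One object-level doubling: n=1 -> 4,000,000; n=2 -> 8,000,000; n=3 -> 16,000,000
```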
Yes—if not heretical, at least interesting to other people! I’m going to lean into the “blogging about things that seem obvious to me” thing now.
I was thinking the third bullet, though the question of perverse incentives needs fleshing out, which I briefly alluded to at the end of the post:
“Expected consequences”, for example, leaves under-theorized the question of when you should seek out new, relevant information to improve your forecast about some action’s consequences.
My best guess is that this isn’t actually an issue, because you have a moral duty to seek out that information, as you know a priori that seeking out such info is net-positive in itself.
Thank you, Solenoid! The SSC podcast is the only reason I’m able to consume posts like Biological Anchors: A Trick That Might Or Might Not Work.
From my perspective, this is why society at large needs to get better at communicating the content—so you wouldn’t have to be good at “anticipating the content.”
The meaningfulness point is interesting, but I’m not sure I fully agree. Some topics can be meaningful but not interesting (high-frequency trading to donate money) and vice versa (video game design? No offense to video game designers).
By your description, it feels like the kind of book where an author picks a word and then rambles about it like an impromptu speaker. If this had an extraordinary thesis requiring extraordinary evidence like Manufacturing Consent then lots of anecdotes would make sense. But the thesis seems too vague to be extraordinary.
I get the impression of the kind of book where a dense blog post is stretched out to the length of a book. This is ironic for a book about subtraction.
Yup, very well-put.
Your point about anecdotes got me thinking; an “extraordinary thesis” might be conceptualized as claiming that the distribution of data significantly shifted away from some “obvious” average. If so, showing the existence of a few data points greater than, say, 4 standard deviations from the “obvious” average actually can be strong evidence in its favor. However, the same is not true for a few examples ~2 standard deviations away. Maybe Klotz’s error is using anecdotes that aren’t far enough away from what intuitively makes sense.
Probably didn’t explain that very well, so here is a Tweet from Spencer Greenberg making the point:
1. By Bayes Factor: suppose hypothesis “A” says a data point is nearly impossible, and hypothesis “B” says the data point is quite likely. Then the existence of that one data point (by Bayes’ rule) should move you substantially toward believing hypothesis B (relative to A).
Example: you have had a rash on your arm for 10 years (with no variability). You buy some “rash cream” off of a shady website, and within 2 hours of applying it, the rash is gone. You can be confident the cream works because it’s otherwise highly unlikely for the rash to vanish.
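Putting rough, made-up numbers on the rash-cream example to show the Bayes factor at work:

```python
# Toy Bayes-factor calculation for the rash-cream example.
# All probabilities are invented purely for illustration.

p_rash_gone_if_cream_works = 0.9     # hypothesis B: vanishing is quite likely
p_rash_gone_if_no_effect   = 0.001   # hypothesis A: spontaneous vanishing after 10 stable years is rare
prior_odds = 0.1                      # start out fairly skeptical of the shady cream

bayes_factor   = p_rash_gone_if_cream_works / p_rash_gone_if_no_effect  # = 900
posterior_odds = prior_odds * bayes_factor                              # = 90
posterior_prob = posterior_odds / (1 + posterior_odds)                  # ~0.989

print(bayes_factor, posterior_odds, round(posterior_prob, 3))
```

Even starting out skeptical, one observation that is nearly impossible under hypothesis A moves the posterior to roughly 99%.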
Super interesting and likely worth developing into a longer post if you’re so inclined. Really like this analogy.
But then readers would have to repeat this sentence for as long as it takes to read the blog post to get the same effect. Not quite as fun.
Thanks very much. Just fixed that.
Thanks for your insight. Yes, the “we simplify this for undergrads” thing seems most plausible to me. I guess my concern is that in this particular case, the simplification from “expected consequences matter” to “consequences matter” might be doing more harm than good.
Strongly seconded. Keeping my phone out of reach, out of sight, and on silent is both trivially easy and amazingly effective at reducing distraction. I think that all of those three things (sight, sound, reach) are necessary for me, and I suspect others as well.
Banneker Key! Yeah, I was in a very similar position, but basically made the opposite choice (largely because the financial costs weren’t internalized).
Interesting, but I think you’re way at the tail end of the distribution on this one. I bet I use Google more than 90%+ of people, but still not as much as I should.
Entirely agree. There are certainly chunks of my life (as a privileged first-worlder) I’d prefer not to have experienced, and generally these seem less bad than “an average period of the same duration as a Holocaust prisoner.” Given that animals are sentient, I’d put it at ~98% that their lives are net negative.