I did computational cognitive neuroscience research from getting my PhD in 2006 until the end of 2022. I’ve worked on computational theories of vision, executive function, episodic memory, and decision-making. I’ve focused on the emergent interactions that are needed to explain complex thought. I was increasingly concerned with AGI applications of the research, and reluctant to publish my best ideas. I’m incredibly excited to now be working directly on alignment, currently with generous funding from the Astera Institute. More info and publication list here.
Seth Herd
This is evidence of nothing but your (rather odd) lack of noticing. If anything, it might be easier to not notice stimulant meds if you benefit from them, but I’m not sure about that either.
Because it’s relatively short-acting, some people take Ritalin just to get focused work done (when the work isn’t interesting enough to generate hyperfocus), and not at other times.
This wouldn’t fly on wikipedia and it probably shouldn’t fly on the LW wiki either. Of course, moderating a contentious wiki is a ton of work, and if the LW wiki sees more use, you’ll probably need a bigger mod team.
It’s a dilemma, because using the wiki more as a collaborative summary of alignment work could be a real benefit to the field.
You need to have bunches of people use it for it to be any good, no matter how good the algorithm.
Quick summary: it’s super easy and useful to learn a little speedreading. Just move your finger a bit faster than your eyes are comfortable moving and force yourself to keep up as best you can. Just a little of this can go a long way when combined with a skimming-for-important-bits mindset with nonfiction and academic articles.
Explicit answers:
With regard to brain function: it’s vague; this just matches my understanding of how the brain works.
I don’t remember. I think it was just a matter of forcing yourself to go faster than you could subvocalize. And to try to notice when you were subvocalizing or not. The core technique in learning speed reading was to move your finger along the lines, and keep going slightly faster. I learned this from the very old book How to Read a Book.
I’m pretty sure it both a) literally hasn’t and, more importantly, b) effectively has increased my learning rate for semantic knowledge. Fundamentally it doesn’t: it doesn’t allow you to think faster (or at least not much), so if you’re reading complex stuff quickly, you’re just not going to understand or remember it. BUT it allows you to quickly skim to find the semantic knowledge you find worth learning and remembering. So your effective rate is higher. Learning to skim is absolutely crucial for academia, and speedreading is very useful for skimming quickly. You sort of get a vague idea of what you’re reading, and then spend time on the stuff that might be important.
That mentioned some of the downsides. It’s what you were guessing: you can’t really take things in faster, so it’s a quantity/quality tradeoff. Here’s another manifestation: I rarely bother to speedread fiction, because I can’t imagine the setting and action if I do. Come to think of it, maybe I could do a bit better if I practiced it more. But I usually just skip ahead or, better yet, put the book down if I’m tempted to skim. There are lots of great books to read for pleasure, and if it’s not fun word by word and worth imagining, I don’t really see the point. But a friend of mine speedreads tons of fiction, so there is a point; he says he also can’t imagine it in detail, but I guess he’s enjoying taking in the story in broader strokes.
-
I have no idea what my WPM was or is. It’s abundantly clear that I learned to read far faster.
Probably like level 20? Depends if it’s a nonlinear curve.
Here’s the interesting bit: it was very, very easy to learn some useful speedreading, just by using my finger to force my eyes to move faster on the page (and maybe some lesser techniques I’ve now forgotten). I think I probably spent 20 minutes to an hour doing that explicitly, then was able to push my reading speed as high as I wanted. I think with more practice, I could probably take things in and imagine scenes a little better at high speed, but it seemed like diminishing returns, and I’m not the type to just sit and practice skills. To be fair, I spent my childhood reading instead of doing formal schooling, so I might’ve had a deeper skill base to work from.
Excellent! I think that’s a clear and compelling description of the AI alignment problem, particularly in combination with your cartoon images. I think this is worth sharing as an easy intro to the concept.
I’m curious—how did you produce the wonderful images? I can draw a little, and I’d like to be able to illustrate like you did here, whether that involves AI or some other process.
FWIW, I agree that understanding humanity’s alignment challenges is conceptually an extension of the AI alignment problem. But I think it’s commonly termed “coordination” in LW discourse, if you want to see what people have written about that problem here. Moloch is the other term of art for thorny coordination/competition problems.
As I understand it from some cog psych/linguistics class (it’s not my area, but this makes sense WRT brain function), the problem with subvocalizing is that it limits your reading speed to approximately the rate at which you can talk. So most skilled readers have learned to disconnect from subvocalizing. Part of the training for speedreading is to make sure you’re not subvocalizing at all, and this helped me learn to speedread.
I turn on subvocalizing sometimes when reading poetry or lyrical prose, or sometimes when I’m reading slowly to make damned sure I understand something, or remember its precise phrasing.
That’s true, but the timing and incongruity of a “suicide” the day before testifying seems even more absurdly unlikely than corporations starting to murder people. And it’s not like they’re going out and doing it themselves; they’d be hiring a hitman of some sort. I don’t know how any of that works, and I agree that it’s hard to imagine anyone invested enough in their job or their stock options to risk a murder charge; but they may feel that their chances of avoiding charges are near 100%, so it might make sense to them.
I just have absolutely no other way to explain the story I read (sorry I didn’t get the link since this has nothing to do with AI safety) other than that story being mostly fabricated. People don’t say “finally tomorrow is my day” in the evening and then put a gun in their mouth the next morning without being forced to do it. Ever. No matter how suicidal, you’re sticking around one day to tell your story and get your revenge.
Those odds are so much lower than the odds of somebody thinking they could hire a hit, get away with it, and make a massive profit on their stock options. They might also have had a personal vendetta against the whistleblower on top of the monetary motive. People are motivated by money and revenge, and they’re prone to misestimating the odds of getting caught. They could even be right that in their case it’s near zero.
So I’m personally putting it at maybe 90% chance of murder.
Ummm, wasn’t one of them just about to testify against Boeing in court, on their safety practices? And they “committed suicide” after saying the day before how much they were looking forward to finally getting a hearing on their side of the story? That’s what I read; I stopped at that point, thinking “about zero chance that wasn’t murder”.
Forecasting is hard.
Forecasting in a domain that includes human psychology, society-level propagation of beliefs, development of entirely new technology, and understanding how a variety of minds work in enough detail to predict not only what they’ll do but how they’ll change—that’s really hard.
So, should we give up, and just prepare for any scenario? I don’t think so. I think we should try harder.
That involves spending more individual time on it, and doing more collaborative prediction with people of different perspectives and different areas of expertise.
On the object level: I think it’s pretty easy to predict now that we’ll have more ChatGPT moments, and the Overton window will shift farther. In particular, I think interacting with a somewhat competent agent with self-awareness will be an emotionally resonant experience for most people who haven’t previously imagined in detail that such a thing might exist soon.
It’s helpful to include a summary with linkposts.
So here’s a super quick one. I didn’t listen to it closely, so I could’ve missed something.
It’s about the article No “Zero-Shot” Without Exponential Data
Here’s the key line from the abstract:
We consistently find that, far from exhibiting “zero-shot” generalization, multimodal models require exponentially more data to achieve linear improvements in downstream “zero-shot” performance, following a sample inefficient log-linear scaling trend
So, we might not continue to get better performance if we need exponentially larger datasets to get small linear improvements. This seems quite plausible, if nobody comes up with some sort of clever bootstrapping in which automatic labeling of images and videos, with a little human feedback, creates useful unlimited size datasets.
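To make the scaling claim concrete, here’s a toy illustration (my own made-up coefficients, not the paper’s actual fit): if downstream accuracy only improves with the log of the dataset size, each fixed gain in accuracy costs a fixed multiplier in data.

```python
import math

# Toy log-linear scaling curve with made-up coefficients (not the paper's fit):
# accuracy = a + b * log10(num_examples).
a, b = 0.10, 0.08

def accuracy(num_examples: float) -> float:
    return a + b * math.log10(num_examples)

def examples_needed(target_accuracy: float) -> float:
    # Inverting the fit: every +0.08 of accuracy costs another 10x of data.
    return 10 ** ((target_accuracy - a) / b)

for target in (0.50, 0.58, 0.66, 0.74):
    print(f"accuracy {target:.2f} -> ~{examples_needed(target):.1e} examples")
# Linear gains in performance, exponential growth in required data.
```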
I think this isn’t going to cause much of a slowdown in AGI progress, because we don’t need much more progress on foundation models to build scaffolded agentic cognitive architectures that use System 2 type cognition to gauge their accuracy and the importance of the judgment, and use multiple tries on multiple models for important cognitive acts. That’s how humans are as effective as we are; we monitor and double-check our own cognition when appropriate.
I think future more powerful/useful AIs will understand our intentions better IF they are trained to predict language. Text corpuses contain rich semantics about human intentions.
I can imagine other AI systems that are trained differently, and I would be more worried about those.
That’s what I meant by current AI understanding our intentions possibly better than future AI.
This is an excellent point.
While LLMs seem (relatively) safe, we may very well blow right on by them soon.
I do think that many of the safety advantages of LLMs come from their understanding of human intentions (and therefore implied values). Those would be retained in improved architectures that still predict human language use. If such a system’s thought process was entirely opaque, we could no longer perform Externalized reasoning oversight by “reading its thoughts”.
But I think it might be possible to build a reliable agent from unreliable parts. I think humans are such an agent, and evolution made us this way because it’s a way to squeeze extra capability out of a set of base cognitive capacities.
Imagine an agentic set of scaffolding that merely calls the super-LLM for individual cognitive acts. Such an agent would use a hand-coded “System 2” thinking approach to solve problems, like humans do. That involves breaking a problem into cognitive steps. We also use System 2 for our biggest ethical decisions; we predict consequences of our major decisions, and compare them to our goals, including ethical goals. Such a synthetic agent would use System 2 for problem-solving capabilities, and also for checking plans for how well they achieve goals. This would be done for efficiency; spending a lot of compute or external resources on a bad plan would be quite costly. Having implemented it for efficiency, you might as well use it for safety.
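Here’s a minimal sketch of what I mean, assuming only a hypothetical call_llm(prompt) helper; the prompts, the pass/fail check, and the revision limit are placeholders, not a worked-out design.

```python
# Minimal sketch of System 2 scaffolding around an LLM. call_llm, the prompts,
# and the revision limit are illustrative assumptions, not a real implementation.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("stand-in for whatever LLM API is available")

def solve_with_system_2(problem: str, goals: list[str], max_revisions: int = 3) -> str:
    # Break the problem into cognitive steps, as in human System 2 thinking.
    plan = call_llm(f"Break this problem into concrete steps:\n{problem}")
    for _ in range(max_revisions):
        # Internal review: predict consequences and compare them to goals
        # (including ethical goals) before spending compute or resources on a bad plan.
        critique = call_llm(
            "Does this plan achieve these goals, including the ethical ones? "
            f"Goals: {goals}\nPlan: {plan}\nAnswer PASS or list the problems."
        )
        if critique.strip().startswith("PASS"):
            break
        plan = call_llm(f"Revise the plan to fix these problems:\n{critique}\nPlan: {plan}")
    return plan  # human and tool-AI reviewers could also inspect this before execution
```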
This is just restating stuff I’ve said elsewhere, but I’m trying to refine the model, and work through how well it might work if you couldn’t apply any external reasoning oversight, and little to no interpretability. It’s definitely bad for the odds of success, but not necessarily crippling. I think.
This needs more thought. I’m working on a post on System 2 alignment, as sketched out briefly (and probably incomprehensibly) above.
Please just wait until you have the podcast link to post these to LW? We probably don’t want to read it if you went to the trouble of making a podcast.
This is now available as a podcast if you search. I don’t have the RSS feed link handy.
I agree, I have heard that claim many times, probably including the vague claim that it’s “more dangerous” than a poorly-defined imagined alternative. A bunch of pessimistic stuff in the vein of List of Lethalities focuses on reinforcement learning, analyzing how and why that is likely to go wrong. That’s what started me thinking about true alternatives.
So yes, that does clarify why you’ve framed it that way. And I think it’s a useful question.
In fact, I would’ve been prone to say “RL is unsafe and shouldn’t be used”. Porby’s answer to your question is insightful; it notes that other types of learning aren’t that different in kind. It depends how the RL or other learning is done.
One reason that non-RL approaches (at least the few I know of) seem safer is that they’re relying on prediction or other unsupervised learning to create good, reliable representations of the world, including goals for agents. That type of learning is typically better because you can do more of it: you don’t need a limited set of human-labeled data, which is always many orders of magnitude scarcer than data gathered from sensing the world (e.g., language input for LLMs, images for vision, etc.). The other alternative is a reward-labeling algorithm that can attach reward signals to any data, but that seems unreliable in that we don’t even have good guesses at an algorithm that can identify human values or even reliable instruction-following.
Surely asking if anything is safer is only sensible when comparing it to something. Are you comparing it to some implicit expected-if-not RL method of alignment? I don’t think we have a commonly shared concept of what that would be. That’s why I’m pointing to some explicit alternatives in that post.
Compared to what?
If you want an agentic system (and I think many humans do, because agents can get things done), you’ve got to give it goals somehow. RL is one way to do that. The question of whether that’s less safe isn’t meaningful without comparing it to another method of giving it goals.
The method I think is both safer and implementable is giving goals in natural language, to a system that primarily “thinks” in natural language. I think this is markedly safer than any RL proposal anyone has come up with so far. And there are some other options for specifying goals without using RL, each of which does seem safer to me:
Goals selected from learned knowledge: an alternative to RL alignment
I get conservation of expected evidence. But the distribution of belief changes is completely unconstrained.
Going from the class martingale to the subclass Brownian motion is arbitrary, and the choice of 1% update steps is another unjustified arbitrary choice.
I think asking about the likely possible evidence paths would improve our predictions.
You spelled it “conversation of expected evidence.” I was hoping there was another term by that name :)
But… Why would p(doom) move like Brownian motion until stopping at 0 or 1?
I don’t disagree with your conclusions; there’s a lot of evidence coming in, and if you’re spending full time or even part time thinking about alignment, you’ll make a lot of important updates on that evidence. But assuming a random walk seems wrong.
Is there a reason that a complex, structured unfolding of reality would look like a random walk?
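For concreteness, the kind of model I’m questioning looks roughly like this (the simulation is mine, not from your post; the symmetric ±1% steps are exactly the arbitrary part):

```python
import random

# p(doom) as a bounded martingale taking symmetric 1% steps until it hits 0 or 1.
# Conservation of expected evidence only requires the martingale property; the
# fixed step size and Brownian-motion-like dynamics are extra, arbitrary choices.
def simulate_p_doom(p: float = 0.3, step: float = 0.01, seed: int = 0) -> int:
    rng = random.Random(seed)
    updates = 0
    while 0.0 < p < 1.0:
        p += step if rng.random() < 0.5 else -step
        p = min(max(p, 0.0), 1.0)
        updates += 1
    return updates  # number of updates before the belief "settles" at 0 or 1

print(simulate_p_doom())
```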
I think this is quite similar to my proposal in Capabilities and alignment of LLM cognitive architectures.
I think people will add cognitive capabilities to LLMs to create fully capable AGIs. One such important capability is executive function. That function is loosely defined in cognitive psychology, but it is crucial for planning among other things.
I do envision such planning looking loosely like a search algorithm, as it does for humans. But it’s a loose search algorithm, working in the space of statements made by the LLM about possible future states and action outcomes. So it’s more like a tree of thought or graph of thought than any existing search algorithm, because the state space isn’t well defined independently of the algorithm.
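As a rough illustration of what I mean by a “loose” search (everything here is a hypothetical stand-in, including call_llm and the branching/depth numbers):

```python
# Loose tree-of-thought-style search: the LLM proposes candidate next steps in
# natural language and also scores them, since the "state space" is just whatever
# statements the LLM produces. call_llm and the prompts are hypothetical stand-ins.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("stand-in for the underlying LLM")

def plan(goal: str, state: str, depth: int = 3, branching: int = 3) -> str:
    if depth == 0:
        return state
    candidates = [
        call_llm(f"Goal: {goal}\nSituation: {state}\n"
                 "Propose one next action and its likely outcome.")
        for _ in range(branching)
    ]
    # Evaluation is delegated to the LLM rather than to a hand-coded state heuristic.
    best = max(candidates,
               key=lambda c: float(call_llm(f"Rate 0-10 how well this advances '{goal}': {c}")))
    return plan(goal, best, depth - 1, branching)
```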
That all keeps things more dependent on the LLM black box, as in your final possibility.
At least I think that’s the analogy between the proposals? I’m not sure.
I think the pushback to both of these is roughly: this is safer how?
I don’t think there’s any way to strictly formalize not harming humans. My answer is halfway between that and your “sentiment analysis in each step of planning”. I think we’ll define rules of behavior in natural language, including not harming humans but probably much more elaborate, and implement both internal review, like your sentiment analysis but more elaborate, and external review by humans aided by tool AI (doing something like sentiment analysis), in a form of scalable oversight.
I’m curious if I’m interpreting your proposal correctly. It’s stated very succinctly, so I’m not sure.
Thanks for doing this! It’s looking like we may need major economic changes to keep up with job automation (assuming we don’t get an outright AGI takeover). So, getting started on thinking this stuff through may have immense benefit. Like the alignment problem, it’s embarrassing as a species that we haven’t thought about this more when the train appears to be barreling down the tracks. So, kudos and keep it up!
Now, the critique: doing this analysis for only the richest country in the world seems obviously inadequate and not even a good starting point; something like the median country would be more useful. OTOH, I see why you’re doing this; I’m a US citizen and numbers are easier to get here.
So in sum, I think the bigger issue is the second one you mention: global tax reform that can actually capture the profits made from various AI companies and the much larger base of AI-enabled companies that don’t pay nearly as much for AI as they would for labor, but reap massive profits. They will often be “based” in whatever country gives them the lowest tax rates. So we have another thorny global coordination problem.
I was also going to mention the coming tech changes this doesn’t account for, so I recommend you note in the intro that this is part 1, to head off that frustration among readers.