Cole Wyeth

Karma: 5,343

I am a PhD student in computer science at the University of Waterloo, supervised by Professor Ming Li and advised by Professor Marcus Hutter.

My current research is related to applications of algorithmic probability to sequential decision theory (universal artificial intelligence). Recently I have been trying to start a dialogue between the computational cognitive science and UAI communities. Sometimes I build robots, professionally or otherwise. Another hobby (and a personal favorite of my posts here) is the Sherlockian abduction master list, which is a crowdsourced project seeking to make “Sherlock Holmes” style inference feasible by compiling observational cues. Give it a read and see if you can contribute!

See my personal website colewyeth.com for an overview of my interests and work.

I do ~two types of writing: academic publications and (lesswrong) posts. With the former I try to be careful enough that I can stand by ~all (strong/central) claims in 10 years, usually by presenting a combination of theorems with rigorous proofs and only more conservative intuitive speculation. With the latter, I try to learn enough by writing that I have changed my mind by the time I’m finished, and though I usually include an “epistemic status” to suggest my (final) degree of confidence before posting, the ensuing discussion often changes my mind again. As of mid-2025, I think that the chances of AGI in the next few years are high enough (though still <50%) that it’s best to focus on disseminating safety-relevant research as rapidly as possible, so I’m focusing less on long-term goals like academic success and the associated incentives. That means most of my work will appear online in an unpolished form long before it is published.

Cole Wyeth 24 Mar 2026 18:41 UTC
10 points
7
on: Cole Wyeth’s Shortform
It’s easy to criticize the type of “fake” planning that systematically avoids/defuses criticism rather than aiming for success, i.e. appearing blameless for a loss versus actually trying to win. But I think there’s a lot of traction to be gained from refusing to lose in any silly way; it means you have to dovetail a lot of reasonable strategies that you could be criticized for missing, e.g. “why didn’t you just try…” Trying EVERY post facto obvious thing is harder than it sounds and quite powerful (time permitting). Universal search/learning algorithms, including market mechanisms like logical induction, tend to have this form! So I don’t think this heuristic comes (only) from social approval or moral mazes. In my own research process, I think of it as “being professional.” It’s not the same as being brilliant, but it’s much easier to pull off consistently (if slowly) and it gets results.
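For readers unfamiliar with the term, here is a rough sketch of what “dovetailing” strategies means in the universal-search sense: every strategy gets a share of the effort, so no single bad (or non-terminating) strategy can block the rest. This is only a toy illustration; the names `dovetail` and `multiples_of` and the number-guessing task are made up for the example and do not come from any particular search algorithm.

```python
from itertools import count
from typing import Callable, Dict, Iterator, Optional, Tuple


def dovetail(
    make_strategy: Callable[[int], Iterator[Optional[int]]],
    max_rounds: int,
) -> Optional[Tuple[int, int]]:
    """Interleave strategies 0, 1, 2, ... (hypothetical helper).

    In round k, strategy k is started and every active strategy is advanced
    by one step. Returns (strategy index, result) for the first strategy that
    yields a non-None result, or None if the round budget runs out.
    """
    active: Dict[int, Iterator[Optional[int]]] = {}
    for k in range(max_rounds):
        active[k] = make_strategy(k)
        for i in list(active):
            try:
                result = next(active[i])
            except StopIteration:
                # This strategy gave up; drop it and keep running the others.
                del active[i]
                continue
            if result is not None:
                return i, result
    return None


def multiples_of(step: int, target: int) -> Iterator[Optional[int]]:
    """Toy strategy: walk through multiples of `step`, succeed on `target`."""
    for n in count(1):
        value = step * n
        yield value if value == target else None


# Strategy i searches the multiples of i + 2 for the number 21.
# Strategy 0 (multiples of 2) never succeeds, but it cannot stall the search.
found = dovetail(lambda i: multiples_of(i + 2, 21), max_rounds=100)
```

In this toy run `found` is `(1, 21)`: the multiples-of-3 strategy succeeds even though the multiples-of-2 strategy started first and would run forever on its own.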

Cole Wyeth 24 Mar 2026 1:14 UTC
2 points
0
in reply to: Richard_Ngo’s comment on: A research agenda for the final year
Yeah, you’re probably right.

Cole Wyeth 24 Mar 2026 1:13 UTC
5 points
0
in reply to: MichaelDickens’s comment on: A research agenda for the final year
If you think that corrigibility is an answer to your question, why rule it out? My favorite approaches have that flavor, such as the work of Michael Cohen on “unambitious” versions of AIXI.
The natural abstractions agenda is another example, which directly grapples with ontological crisis.

Cole Wyeth 23 Mar 2026 22:27 UTC
4 points
0
in reply to: MichaelDickens’s comment on: A research agenda for the final year
I don’t think that’s true at all. The point is not to solve ontology and ethics, but to design an AGI that is safe even though your understanding of ontology and (your own) ethics is and likely always will be wrong / shifting. Many agendas attempt to do this.

Cole Wyeth 21 Mar 2026 3:30 UTC
4 points
0
on: Things I Learned by Spending Five Thousand Hours In Non-EA Charities
Interesting to revisit from the perspective of https://www.mindthefuture.info/p/economic-efficiency-often-undermines

Cole Wyeth 21 Mar 2026 1:28 UTC
LW: 3 AF: 1
0
AF
on: The Credit Assignment Problem
Interesting post in light of our discussion at CMU agent foundations 2026, in which I questioned whether (Schurz) meta-inductive justification of induction actually justifies model-based planning as in AIXI, or actually suggests a model-free approach.

Cole Wyeth 18 Mar 2026 22:48 UTC
19 points
2
in reply to: sam’s comment on: sam’s Shortform
I’m also pretty anxious. I kind of imagine that I’m a character in a sci fi novel and I prefer to be the type that does useful stuff despite (or even because of) the situation being grim instead of melting down. I guess that’s also loosely Eliezer’s suggestion in methods of sanity. It doesn’t eliminate the anxiety, but it is motivating.

Cole Wyeth 18 Mar 2026 15:56 UTC
4 points
0
in reply to: Steven Byrnes’s comment on: Alignment Proposal: Adversarially Robust Augmentation and Distillation
It may be hard to faithfully imitate a human for 1000 years (particularly since that sounds like quite a distributional shift / it’s not even clear what the right answer is since no human has lived for 1000 years). I believe we’re in agreement on this.
Simulating a human for shorter times on multiple problems in parallel is powerful, but presumably comes at a steep capabilities cost relative to other easier options. So it is worth exploring gains from safe augmentation beyond pure speedup.
Also, at some point we want a plan for scaling qualitatively past human level.

Cole Wyeth 17 Mar 2026 20:40 UTC
4 points
0
in reply to: Steven Byrnes’s comment on: Alignment Proposal: Adversarially Robust Augmentation and Distillation
Presumably this is a learning algorithm with weights, and PyTorch code that updates the weights. My question is: how are the weights being updated? Are they being updated by a continual learning objective (e.g. RL, self-distillation, whatever), or are the weights being updated by an imitation-learning objective (self-supervised learning on the outputs of the “teacher”)? Or are you interspersing both? Or are there two different sets of weights, one for each type of update? Or what?
An imitation learning objective on the outputs of the teacher (starting from a very strong inductive bias) is the outer loop. During (online) generation, it should of course also simulate gradient updates, using either actual gradient updates (which it has meta-learned how to perform) or perhaps sufficiently rich residual activations. Probably context tokens aren’t enough, though I don’t know, possibly a vast number of internal reasoning tokens would be enough in principle.
My interpretation of this part is: you’re imagining that we have written down a parametrized family of continual learning algorithms, and you have black-box access to a “teacher” continual learning algorithm which we know is somewhere in this space of continual learning algorithms, but we don’t know where. Then I agree (in principle) that you can do imitation learning to home in on which element of your parametrized family of continual learning algorithms matches the teacher.
Does that match what you’re trying to say here?
Yes, that is one way of putting it, though “whatever a human is doing to plan and learn” counts as a continual learning algorithm for my purposes.

Cole Wyeth 17 Mar 2026 15:53 UTC
4 points
0
in reply to: Steven Byrnes’s comment on: Alignment Proposal: Adversarially Robust Augmentation and Distillation
Why not? Well, actually, for an ideal imitation learning algorithm, i.e. Solomonoff induction on an imaginary hypercomputer, my answers would all be “yes”! But in the real world, we don’t have hypercomputers!
Your true objection seems to be that current imitation learning algorithms are not good enough. If you’re saying that pure “in-context learning” without weight updates will not cut it, I think I agree, and I have been one of the most prolific advocates of that view. However, the naive implications of that mental model have overall harmed my predictive performance. I now weakly prefer to bet that minor modifications on the level of (slightly more clever) intermittent finetuning or distillation of own reasoning outputs are sufficient for continual learning.
Elsewhere, you suggest that imitation learning how to learn is actually impossible because you would need to simulate a learning algorithm, and you would really be running that learning algorithm, not imitation learning:
The only practical way to know what happens after millions of steps of some scaled-up continual learning algorithm is to actually do millions of steps of that same scaled-up continual learning algorithm, with actual weights getting actually changed in specifically-designed ways via PyTorch code. And then that’s the scaled-up learning algorithm you’re running. Which means you’re not doing imitation learning.
I disagree; you would be imitation learning to run that learning algorithm, and I see no principled reason this cannot be practical.
I think there’s a bit of miscommunication here. You would in fact need a great continual learning algorithm in order to imitation learn how to continually learn. This may be what you mean by saying you can’t imitation-learn how to continually learn (that is, from scratch, without some base continual learning algorithm to start with). However, I see no principled reason that imitation learning cannot improve / distill a better continual learning algorithm than the one which is performing the imitation learning. But this isn’t very cruxy for me. The point is that given a great continual learning algorithm, you could imitation learn a human policy which includes both planning and (possibly weaker!) “inner loop” continual learning. That would be sufficient for my alignment plan, even if it were “impractical” in the sense of “weakening the continual learning engine.”

Cole Wyeth 16 Mar 2026 15:41 UTC
12 points
1
on: A research agenda for the final year
If your plan for making AI go well requires figuring out once-and-for-all the correct ontology and meta-ethics in the next year, that plan is approximately hopeless, and you should switch to lobbying, which is merely very unlikely to work, rather than predictably guaranteed not to.

Cole Wyeth 11 Mar 2026 3:37 UTC
4 points
−6
in reply to: lc’s comment on: lc’s Shortform
It’s unlikely a priori that anything like our experience of happiness emerges in LLMs, and I haven’t seen anything to suggest it does.

Cole Wyeth 10 Mar 2026 21:50 UTC
13 points
1
on: Economic efficiency often undermines sociopolitical autonomy
I found this pretty persuasive. Though, I’m not sure exactly what it’s persuasive about.

Cole Wyeth 8 Mar 2026 22:20 UTC
30 points
0
on: Cole Wyeth’s Shortform
Tom Sterkenburg will critique Solomonoff induction at the UAI research meeting tomorrow (March 9th, 1:15 pm EST): https://uaiasi.com/2026/03/08/tom-sterkenburg-on-solomonoff-induction/
I think his argument (questioning the universality of SI and the justification of Occam’s razor) is sophisticated and worth taking very seriously (though I ultimately disagree with some of his conclusions). Hope to see some of you there.
Zoom link: https://uwaterloo.zoom.us/j/7921763961?pwd=TDatET6CBu47o4TxyNn9ccL2Ia8HN4.1

Cole Wyeth 8 Mar 2026 22:12 UTC
3 points
0
in reply to: Daniel Kokotajlo’s comment on: Cole Wyeth’s Shortform
Done.

Cole Wyeth 6 Mar 2026 6:14 UTC
7 points
2
in reply to: StartAtTheEnd’s comment on: Maybe there’s a pattern here?
“The same goes for LLMs and AGIs, and I hope this forum realizes so in time.”

But it totally does, that’s like one of our main things?
Also, “realizing it in time” sort of carries the implication that by realizing it, we might solve the problem “in time,” while evidently that is not sufficient.

Cole Wyeth 5 Mar 2026 4:49 UTC
2 points
0
in reply to: rationalelf’s comment on: My model of what is going on with LLMs
Well, it was from a year ago; I’d be less positive today.

Cole Wyeth 2 Mar 2026 3:39 UTC
LW: 2 AF: 1
−2
AF
on: Schelling Goodness, and Shared Morality as a Goal
I wildly speculate that a plurality of people who are not from Paris and end up in something like the coordination game you describe have probably resolved it by meeting at one of the airports. Or at least, more frequently than the Eiffel Tower trick. The former is what I came up with as something I would realistically try, while the latter just seems like the intended answer.
There seem to be three airports in Paris today, but probably the one you’re currently at is a good bet: it’s disproportionately likely to be the biggest one, and you don’t have to go anywhere.

Cole Wyeth 2 Mar 2026 3:15 UTC
1 point
0
in reply to: robo’s comment on: Jimrandomh’s Shortform Posts
You say there was too much agreement with OP, but it looks to me like there wasn’t much: 10 net agree votes and ten “10% likely” or “25% likely” reacts at time of writing.

Cole Wyeth 1 Mar 2026 20:17 UTC
5 points
3
on: Petapixel cameras won’t exist soon
You can’t expect anyone to “quickly prove they’re acting in good faith”; that’s a highly unreasonable bar.