Francis Rhys Ward

Karma: 883

Perspectives on Continual Learning: Survey Results and Forecasts

Rauno Arike, RohanS, Owen Terry, Achu Menon, Zhijing Jin, Francis Rhys Ward and Seth Herd

24 Jun 2026 16:30 UTC

33 points

0 comments, 12 min read

Francis Rhys Ward 17 Jun 2026 15:20 UTC
2 points
in reply to: dgros’s comment on: Estimating No-CoT Task-Completion Time Horizons of Frontier AI Models
> ...but will note they intentionally focus the main results on tasks with short results.

We do this because it gives a sharper, more distinct meaning of no-CoT: in the main plot we restrict to tasks that require only very few forward passes. See the paper and the comments above, which show that including longer tasks (e.g., generation and long-horizon agentic tasks) doesn’t change the trends much.

Francis Rhys Ward 17 Jun 2026 15:18 UTC
2 points
in reply to: dgros’s comment on: Estimating No-CoT Task-Completion Time Horizons of Frontier AI Models

Francis Rhys Ward 17 Jun 2026 15:14 UTC
3 points
in reply to: dgros’s comment on: Estimating No-CoT Task-Completion Time Horizons of Frontier AI Models
Quick response:

> (1) which tasks most influence the results

The mainline trends and uncertainty estimates are computed from bootstraps that repeatedly resample over benchmarks, so the analysis is robust to leaving out any particular benchmark (or subset of benchmarks). This holds for both the main doubling trends and the individual models' time-horizon (TH) estimates. From the paper:

> The fact there is only single task in the >0.5hr regime looks pretty problematic

We have lots of tasks longer than 0.5 hr. In the main trend we include only a subset of (mostly shorter) tasks, but including all tasks doesn’t change the trend much (the overall doubling time shrinks a bit because frontier models can do some long-horizon SWE tasks without CoT, e.g., by just outputting tool calls at each step).
From the paper:

> Figure 22: Comparison of TH trend lines using all tasks including short-answer, generation, and multi-turn agentic. The most significant impact of including the multi-turn agentic tasks is an increase in the point-estimate THs for the latest frontier models which perform very well on these tasks.

It would be helpful if you could ask more precise questions about the rest :)
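The benchmark-level bootstrap described in this reply can be illustrated with a minimal Python sketch. It assumes, hypothetically, that each benchmark already yields a log2 time-horizon estimate for every model, and that the trend is a least-squares fit of log2(horizon) against release date, so the doubling time is one over the slope. The paper's actual TH estimator and bootstrap procedure may differ; all function names, benchmark names and numbers below are made up for illustration.

```python
import random
import statistics


def doubling_time(release_years, log2_horizons):
    """Fit log2(horizon) = a + b * release_year by least squares and return 1 / b,
    i.e. the number of years it takes the horizon to double."""
    mean_x = statistics.fmean(release_years)
    mean_y = statistics.fmean(log2_horizons)
    sxx = sum((x - mean_x) ** 2 for x in release_years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(release_years, log2_horizons))
    slope = sxy / sxx  # doublings per year
    return 1.0 / slope


def bootstrap_doubling_time(per_benchmark, release_years, n_boot=1000, seed=0):
    """Resample whole benchmarks with replacement, average each model's log2
    horizon over the resampled benchmarks, and refit the trend each time.

    per_benchmark (hypothetical format) maps a benchmark name to a list of log2
    time-horizon estimates, one per model, in the same order as release_years.
    Returns the median doubling time and a rough 90% percentile interval."""
    rng = random.Random(seed)
    names = list(per_benchmark)
    estimates = []
    for _ in range(n_boot):
        # Resample benchmarks (not individual tasks) with replacement.
        sample = [rng.choice(names) for _ in names]
        per_model = [
            statistics.fmean(per_benchmark[b][i] for b in sample)
            for i in range(len(release_years))
        ]
        estimates.append(doubling_time(release_years, per_model))
    estimates.sort()
    low = estimates[int(0.05 * n_boot)]
    high = estimates[int(0.95 * n_boot) - 1]
    return statistics.median(estimates), (low, high)


# Hypothetical data: four models, four benchmarks (log2 of horizon in minutes).
release_years = [2023.0, 2024.0, 2025.0, 2026.0]
per_benchmark = {
    "bench_a": [0.0, 2.0, 4.0, 6.0],   # 2 doublings per year
    "bench_b": [1.0, 3.0, 5.0, 7.0],   # 2 doublings per year
    "bench_c": [-1.0, 1.0, 3.0, 5.0],  # 2 doublings per year
    "bench_d": [0.0, 1.0, 2.0, 3.0],   # 1 doubling per year
}
median, (low, high) = bootstrap_doubling_time(per_benchmark, release_years)
print(f"doubling time ~ {median:.2f} years (90% interval {low:.2f} to {high:.2f})")
```

Because whole benchmarks are resampled, a single benchmark that drives the trend shows up as a wider interval rather than being silently averaged in. In this toy data each resample's fitted slope is just the average of the sampled benchmarks' slopes, so every bootstrap doubling time lands between 0.5 and 1 year.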

Angles of attack for continual learning safety

Rauno Arike, RohanS, Owen Terry, Achu Menon, Zhijing Jin, Francis Rhys Ward and Seth Herd

16 Jun 2026 16:15 UTC

47 points

0 comments, 13 min read

Francis Rhys Ward 15 Jun 2026 10:51 UTC
3 points
in reply to: Petropolitan’s comment on: Estimating No-CoT Task-Completion Time Horizons of Frontier AI Models
We don’t have privileged access from OpenAI. Similar to METR, we use the closest available public models to estimate the capabilities of models that are no longer publicly available.

How might continual learning affect safety and alignment?

Rauno Arike, RohanS, Owen Terry, Achu Menon, Zhijing Jin, Francis Rhys Ward and Seth Herd

13 Jun 2026 17:34 UTC

60 points

2 comments, 16 min read

What’s Continual Learning, and Why Might We Expect To See It In Advanced LLM Agents?

RohanS, Rauno Arike, Owen Terry, Achu Menon, Zhijing Jin, Francis Rhys Ward and Seth Herd

12 Jun 2026 18:43 UTC

32 points

2 comments, 17 min read

Implications of Continual Learning for LLM Agents: Introduction

RohanS, Rauno Arike, Owen Terry, Achu Menon, Zhijing Jin, Francis Rhys Ward and Seth Herd

12 Jun 2026 18:36 UTC

49 points

0 comments, 6 min read

Francis Rhys Ward 11 Jun 2026 12:55 UTC
1 point
in reply to: J Bostock’s comment on: Three types of model organism
> I think ablations/knockouts (e.g. helpful-only models, RLVR-only models, models without X piece of post-training) should also be counted here.

I would count these as “natural” MOs, whose defining feature is that they help us understand training pipelines and their safety properties or failure modes.

Francis Rhys Ward 11 Jun 2026 12:53 UTC
1 point
in reply to: Daniel Tan’s comment on: Three types of model organism
One difference is that worst-case MOs are supposed to upper-bound the difficulty of some problem, like eliciting hidden goals; they need not exhibit highly realistic behaviours or mechanisms. Constructed MOs are supposed to behave similarly to the real-life case so that you can learn about the real situation, but they need not be a difficult case for safety measures.

Estimating No-CoT Task-Completion Time Horizons of Frontier AI Models

Anders Cairns Woodruff, Francis Rhys Ward, Dewi Gould, Rauno Arike, Jason R Brown, Jo Jiao, wlanderson, ariana_azarbal, harrymayne, Patrick Leask, Twm Stone, Josh Hills, Ida Caspary, Shubhorup Biswas and Julian Stastny

10 Jun 2026 17:58 UTC

275 points

23 comments, 4 min read

Francis Rhys Ward 10 Jun 2026 9:39 UTC
4 points
in reply to: Stephen Fowler’s comment on: Three types of model organism
Please interpret me as saying that the hope behind the worst-case MO methodology is that we can have good reason to believe the problem is strictly harder than the real case; the methodology is not to cross your fingers and hope, without good reason, that the MO is strictly harder.

Three types of model organism

Francis Rhys Ward

10 Jun 2026 8:50 UTC

58 points

8 comments, 2 min read

Francis Rhys Ward 1 Apr 2026 0:03 UTC
1 point
on: Is Bayesianism Susceptible to the Mail-Order Prophet Scam?
Bayesian epistemology typically works within an existing hypothesis space, with a prior over that space that is then updated. Besides updating your credences over the possibilities in the space, you can also reformulate the hypothesis space itself, e.g., because you become aware of new possibilities (like the existence of scammers), or because an ontological shift leads you to carve the world into different concepts. I think the Bayesian should simply be allowed to reformulate their hypothesis space and form a new prior to get out of this.
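A toy worked example of this move, sketched in Python. Everything here is hypothetical: a "mail-order prophet" has called 10 binary outcomes correctly in a row, and the priors are chosen only for illustration. Under the original two-hypothesis space the prophet ends up looking plausible; after reformulating the space to include a scammer (who splits predictions across recipients, so anyone still receiving letters has seen a perfect record) and forming a new prior, the same evidence no longer favours the prophet.

```python
def update(prior, likelihood):
    """Bayes' rule over a finite hypothesis space:
    posterior is proportional to prior times likelihood."""
    unnormalised = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalised.values())
    return {h: p / total for h, p in unnormalised.items()}


n_correct = 10  # hypothetical: 10 binary predictions in a row were correct

# Probability that *you* observe a perfect 10-for-10 record under each hypothesis.
likelihood = {
    "prophet": 1.0,                # a genuine predictor is always right
    "guesser": 0.5 ** n_correct,   # an honest coin-flipper
    "scammer": 1.0,                # anyone still getting letters has, by construction, seen a perfect record
}

# Original hypothesis space: scammers were never considered (illustrative prior).
old_prior = {"prophet": 0.001, "guesser": 0.999}
old_post = update(old_prior, likelihood)

# Reformulated space after becoming aware of scammers, with a newly formed prior.
new_prior = {"prophet": 0.001, "scammer": 0.05, "guesser": 0.949}
new_post = update(new_prior, likelihood)

print(f"P(prophet | record), old space: {old_post['prophet']:.3f}")
print(f"P(prophet | record), new space: {new_post['prophet']:.3f}")
```

With these made-up priors the posterior on "prophet" drops from roughly 0.51 to about 0.02. The specific numbers don't matter; the point is that the escape comes from updating on a reformulated space with a new prior, not from conditioning harder within the old one.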

[Paper] How does information access affect LLM monitors’ ability to detect sabotage?

Rauno Arike, Raja Moreno, RohanS, Shubhorup Biswas and Francis Rhys Ward

11 Feb 2026 21:25 UTC

26 points

0 comments, 6 min read

The Elicitation Game: Evaluating capability elicitation techniques

Teun van der Weij, Felix Hofstätter, JaydenTeoh, HenningB and Francis Rhys Ward

27 Feb 2025 20:33 UTC

15 points

1 comment, 2 min read

Why care about AI personhood?

Francis Rhys Ward

26 Jan 2025 11:24 UTC

43 points

6 comments, 3 min read

[Paper] AI Sandbagging: Language Models can Strategically Underperform on Evaluations

Teun van der Weij, Felix Hofstätter, Ollie J, Sam F. Brown and Francis Rhys Ward

13 Jun 2024 10:04 UTC

84 points

10 comments, 2 min read

(arxiv.org)

Francis Rhys Ward 14 May 2024 2:49 UTC
2 points
in reply to: Teun van der Weij’s comment on: An Introduction to AI Sandbagging
Nathan’s suggestion is that adding noise to a sandbagging model might increase its performance, rather than decreasing it as noise usually does for a non-sandbagging model. It’s an interesting idea!