Rubi J. Hudson

Karma: 822

Defining Monitorable and Useful Goals

Rubi J. Hudson

15 Jul 2025 23:06 UTC

15 points

0 comments · 16 min read · LW link

Defining Corrigible and Useful Goals

Rubi J. Hudson

25 Jun 2025 3:51 UTC

38 points

2 comments · 24 min read · LW link

Safe Predictive Agents with Joint Scoring Rules

Rubi J. Hudson

9 Oct 2024 16:38 UTC

55 points

10 comments · 17 min read · LW link

Simplifying Corrigibility – Subagent Corrigibility Is Not Anti-Natural

Rubi J. Hudson

16 Jul 2024 22:44 UTC

46 points

27 comments · 5 min read · LW link

A Basic Economics-Style Model of AI Existential Risk

Rubi J. Hudson

24 Jun 2024 20:26 UTC

24 points

3 comments · 7 min read · LW link

The Case for Predictive Models

Rubi J. Hudson

3 Apr 2024 18:22 UTC

43 points

7 comments · 8 min read · LW link

Searching for Searching for Search

Rubi J. Hudson

14 Feb 2024 23:51 UTC

19 points

4 comments · 7 min read · LW link

Conditional Prediction with Zero-Sum Training Solves Self-Fulfilling Prophecies

Rubi J. Hudson and Johannes Treutlein

26 May 2023 17:44 UTC

88 points

13 comments · 24 min read · LW link

Conditioning Predictive Models: Open problems, Conclusion, and Appendix

evhub, Adam Jermyn, Johannes Treutlein, Rubi J. Hudson and kcwoolverton

10 Feb 2023 19:21 UTC

36 points

3 comments · 11 min read · LW link

Mechanism Design for AI Safety—Agenda Creation Retreat

Rubi J. Hudson

10 Feb 2023 3:05 UTC

24 points

2 comments · 1 min read · LW link

Conditioning Predictive Models: Deployment strategy

evhub, Adam Jermyn, Johannes Treutlein, Rubi J. Hudson and kcwoolverton

9 Feb 2023 20:59 UTC

28 points

0 comments · 10 min read · LW link

Conditioning Predictive Models: Interactions with other approaches

evhub, Adam Jermyn, Johannes Treutlein, Rubi J. Hudson and kcwoolverton

8 Feb 2023 18:19 UTC

32 points

2 comments · 11 min read · LW link

Conditioning Predictive Models: Making inner alignment as easy as possible

evhub, Adam Jermyn, Johannes Treutlein, Rubi J. Hudson and kcwoolverton

7 Feb 2023 20:04 UTC

33 points

2 comments · 19 min read · LW link

Conditioning Predictive Models: The case for competitiveness

evhub, Adam Jermyn, Johannes Treutlein, Rubi J. Hudson and kcwoolverton

6 Feb 2023 20:08 UTC

20 points

3 comments · 11 min read · LW link

Conditioning Predictive Models: Outer alignment via careful conditioning

evhub, Adam Jermyn, Johannes Treutlein, Rubi J. Hudson and kcwoolverton

2 Feb 2023 20:28 UTC

72 points

15 comments · 57 min read · LW link

Conditioning Predictive Models: Large language models as predictors

evhub, Adam Jermyn, Johannes Treutlein, Rubi J. Hudson and kcwoolverton

2 Feb 2023 20:28 UTC

89 points

4 comments · 13 min read · LW link

Stop-gradients lead to fixed point predictions

Johannes Treutlein, Caspar Oesterheld, Rubi J. Hudson and Emery Cooper

28 Jan 2023 22:47 UTC

37 points

2 comments · 24 min read · LW link

Underspecification of Oracle AI

Rubi J. Hudson, Adam Jermyn and Johannes Treutlein

15 Jan 2023 20:10 UTC

30 points

12 comments · 19 min read · LW link

Proper scoring rules don’t guarantee predicting fixed points

Johannes Treutlein, Rubi J. Hudson and Caspar Oesterheld

16 Dec 2022 18:22 UTC

80 points

8 comments · 21 min read · LW link

Mechanism Design for AI Safety—Reading Group Curriculum

Rubi J. Hudson

25 Oct 2022 3:54 UTC

15 points

3 comments · 4 min read · LW link