Alignment Newsletter

I publish the Alignment Newsletter, a weekly publication with recent content relevant to AI alignment. See here for more details. Quick links: email signup form, RSS feed, spreadsheet of all summaries.

The Alignment Newsletter #1: 04/09/18

The Alignment Newsletter #2: 04/16/18

The Alignment Newsletter #3: 04/23/18

The Alignment Newsletter #4: 04/30/18

The Alignment Newsletter #5: 05/07/18

The Alignment Newsletter #6: 05/14/18

The Alignment Newsletter #7: 05/21/18

The Alignment Newsletter #8: 05/28/18

The Alignment Newsletter #9: 06/04/18

The Alignment Newsletter #10: 06/11/18

The Alignment Newsletter #11: 06/18/18

The Alignment Newsletter #12: 06/25/18

Alignment Newsletter #13: 07/02/18

Alignment Newsletter #14

Alignment Newsletter #15: 07/16/18

Alignment Newsletter #16: 07/23/18

Alignment Newsletter #17

Alignment Newsletter #18

Alignment Newsletter #19

Alignment Newsletter #20

Alignment Newsletter #21

Alignment Newsletter #22

Alignment Newsletter #23

Alignment Newsletter #24

Alignment Newsletter #25

Alignment Newsletter #26

Alignment Newsletter #27

Alignment Newsletter #28

Alignment Newsletter #29

Alignment Newsletter #30

Alignment Newsletter #31

Alignment Newsletter #32

Alignment Newsletter #33

Alignment Newsletter #34

Alignment Newsletter #35

Alignment Newsletter #36

Alignment Newsletter #37

Alignment Newsletter #38

Alignment Newsletter #39

Alignment Newsletter #40

Alignment Newsletter #41

Alignment Newsletter #42

Alignment Newsletter #43

Alignment Newsletter #44

Alignment Newsletter #45

Alignment Newsletter #46

Alignment Newsletter #47

Alignment Newsletter #48

Alignment Newsletter #49

Alignment Newsletter #50

Alignment Newsletter #51

Alignment Newsletter #52

Alignment Newsletter One Year Retrospective

Alignment Newsletter #53

[AN #54] Boxing a finite-horizon AI system to keep it unambitious

[AN #55] Regulatory markets and international standards as a means of ensuring beneficial AI

[AN #56] Should ML researchers stop running experiments before making hypotheses?

[AN #57] Why we should focus on robustness in AI safety, and the analogous problems in programming

[AN #58] Mesa optimization: what it is, and why we should care

[AN #59] How arguments for AI risk have changed over time

[AN #60] A new AI challenge: Minecraft agents that assist human players in creative mode

[AN #61] AI policy and governance, from two people in the field

[AN #62] Are adversarial examples caused by real but imperceptible features?

[AN #63] How architecture search, meta learning, and environment design could lead to general intelligence

[AN #64]: Using Deep RL and Reward Uncertainty to Incentivize Preference Learning

[AN #65]: Learning useful skills by watching humans “play”

[AN #66]: Decomposing robustness into capability robustness and alignment robustness

[AN #67]: Creating environments in which to study inner alignment failures

[AN #68]: The attainable utility theory of impact

[AN #69] Stuart Russell’s new book on why we need to replace the standard model of AI

[AN #70]: Agents that help humans who are still learning about their own preferences

[AN #71]: Avoiding reward tampering through current-RF optimization

[AN #72]: Alignment, robustness, methodology, and system building as research priorities for AI safety

[AN #73]: Detecting catastrophic failures by learning how agents tend to break

[AN #74]: Separating beneficial AI into competence, alignment, and coping with impacts

[AN #75]: Solving Atari and Go with learned game models, and thoughts from a MIRI employee

[AN #76]: How dataset size affects robustness, and benchmarking safe exploration by measuring constraint violations

[AN #77]: Double descent: a unification of statistical theory and modern ML practice

[AN #78] Formalizing power and instrumental convergence, and the end-of-year AI safety charity comparison

[AN #79]: Recursive reward modeling as an alignment technique integrated with deep RL

[AN #80]: Why AI risk might be solved without additional intervention from longtermists

[AN #81]: Universality as a potential solution to conceptual difficulties in intent alignment

[AN #82]: How OpenAI Five distributed their training computation

[AN #83]: Sample-efficient deep learning with ReMixMatch

[AN #84] Reviewing AI alignment work in 2018-19

[AN #85]: The normative questions we should be asking for AI alignment, and a surprisingly good chatbot

[AN #86]: Improving debate and factored cognition through human experiments

[AN #87]: What might happen as deep learning scales even further?

[AN #88]: How the principal-agent literature relates to AI risk

[AN #89]: A unifying formalism for preference learning algorithms

[AN #90]: How search landscapes can contain self-reinforcing feedback loops

[AN #91]: Concepts, implementations, problems, and a benchmark for impact measurement

[AN #92]: Learning good representations with contrastive predictive coding

[AN #93]: The Precipice we’re standing at, and how we can back away from it

[AN #94]: AI alignment as translation between humans and machines

[AN #95]: A framework for thinking about how to make AI go well

[AN #96]: Buck and I discuss/argue about AI Alignment

[AN #97]: Are there historical examples of large, robust discontinuities?

[AN #98]: Understanding neural net training by seeing which gradients were helpful

[AN #99]: Doubling times for the efficiency of AI algorithms

[AN #100]: What might go wrong if you learn a reward function while acting

[AN #101]: Why we should rigorously measure and forecast AI progress

[AN #102]: Meta learning by GPT-3, and a list of full proposals for AI alignment

[AN #103]: ARCHES: an agenda for existential safety, and combining natural language with deep RL

[AN #104]: The perils of inaccessible information, and what we can learn about AI alignment from COVID

[AN #105]: The economic trajectory of humanity, and what we might mean by optimization

[AN #106]: Evaluating generalization ability of learned reward models

[AN #107]: The convergent instrumental subgoals of goal-directed agents

[AN #108]: Why we should scrutinize arguments for AI risk

[AN #109]: Teaching neural nets to generalize the way humans would

[AN #110]: Learning features from human feedback to enable reward learning

[AN #111]: The Circuits hypotheses for deep learning

[AN #112]: Engineering a Safer World

[AN #113]: Checking the ethical intuitions of large language models

[AN #114]: Theory-inspired safety solutions for powerful Bayesian RL agents

[AN #115]: AI safety research problems in the AI-GA framework

[AN #116]: How to make explanations of neurons compositional

[AN #117]: How neural nets would fare under the TEVV framework

[AN #118]: Risks, solutions, and prioritization in a world with many AI systems

[AN #119]: AI safety when agents are shaped by environments, not rewards

[AN #120]: Tracing the intellectual roots of AI and AI alignment

[AN #121]: Forecasting transformative AI timelines using biological anchors

[AN #122]: Arguing for AGI-driven existential risk from first principles

[AN #123]: Inferring what is valuable in order to align recommender systems

[AN #124]: Provably safe exploration through shielding

[AN #125]: Neural network scaling laws across multiple modalities

[AN #126]: Avoiding wireheading by decoupling action feedback from action effects

[AN #127]: Rethinking agency: Cartesian frames as a formalization of ways to carve up the world into an agent and its environment

[AN #128]: Prioritizing research on AI existential safety based on its application to governance demands

[AN #129]: Explaining double descent by measuring bias and variance

[AN #130]: A new AI x-risk podcast, and reviews of the field

[AN #131]: Formalizing the argument of ignored attributes in a utility function

[AN #132]: Complex and subtly incorrect arguments as an obstacle to debate

[AN #133]: Building machines that can cooperate (with humans, institutions, or other machines)

[AN #134]: Underspecification as a cause of fragility to distribution shift

[AN #135]: Five properties of goal-directed systems

[AN #136]: How well will GPT-N perform on downstream tasks?

[AN #137]: Quantifying the benefits of pretraining on downstream task performance

[AN #138]: Why AI governance should find problems rather than just solving them

[AN #139]: How the simplicity of reality explains the success of neural nets

[AN #140]: Theoretical models that predict scaling laws

[AN #141]: The case for practicing alignment work on GPT-3 and other large models

[AN #142]: The quest to understand a network well enough to reimplement it by hand

[AN #143]: How to make embedded agents that reason probabilistically about their environments

[AN #144]: How language models can also be finetuned for non-language tasks

Alignment Newsletter Three Year Retrospective

[AN #145]: Our three year anniversary!

[AN #146]: Plausible stories of how we might fail to avert an existential catastrophe

[AN #147]: An overview of the interpretability landscape

[AN #148]: Analyzing generalization across more axes than just accuracy or loss

[AN #149]: The newsletter’s editorial policy

[AN #150]: The subtypes of Cooperative AI research

[AN #151]: How sparsity in the final layer makes a neural net debuggable

[AN #152]: How we’ve overestimated few-shot learning capabilities

[AN #153]: Experiments that demonstrate failures of objective robustness

[AN #154]: What economic growth theory has to say about transformative AI

[AN #155]: A Minecraft benchmark for algorithms that learn without reward functions

[AN #156]: The scaling hypothesis: a plan for building AGI

[AN #157]: Measuring misalignment in the technology underlying Copilot

[AN #158]: Should we be optimistic about generalization?

[AN #159]: Building agents that know how to experiment, by training on procedurally generated games

[AN #160]: Building AIs that learn and think like people

[AN #161]: Creating generalizable reward functions for multiple tasks by learning a model of functional similarity

[AN #162]: Foundation models: a paradigm shift within AI

[AN #163]: Using finite factored sets for causal and temporal inference

[AN #164]: How well can language models write code?

[AN #165]: When large models are more likely to lie

[AN #166]: Is it crazy to claim we’re in the most important century?

[AN #167]: Concrete ML safety problems and their relevance to x-risk

[AN #168]: Four technical topics for which Open Phil is soliciting grant proposals

[AN #169]: Collaborating with humans without human data

[AN #170]: Analyzing the argument for risk from power-seeking AI

[AN #171]: Disagreements between alignment “optimists” and “pessimists”

[AN #172] Sorry for the long hiatus!

[AN #173] Recent language model results from DeepMind