
aogara

Karma: 1,421

Research Engineering Intern at the Center for AI Safety. Helping to write the AI Safety Newsletter. Studying CS and Economics at the University of Southern California, and running an AI safety club there. Previously worked at AI Impacts and with Lionel Levine and Collin Burns on calibration for Detecting Latent Knowledge Without Supervision.

Analysis: US restricts GPU sales to China

aogara · 7 Oct 2022 18:38 UTC
102 points
58 comments · 5 min read · LW link

Argument against 20% GDP growth from AI within 10 years [Linkpost]

aogara · 12 Sep 2022 4:08 UTC
59 points
21 comments · 5 min read · LW link
(twitter.com)

Key Papers in Language Model Safety

aogara · 20 Jun 2022 15:00 UTC
39 points
1 comment · 22 min read · LW link

AISN #25: White House Executive Order on AI, UK AI Safety Summit, and Progress on Voluntary Evaluations of AI Risks

31 Oct 2023 19:34 UTC
35 points
1 comment · 6 min read · LW link
(newsletter.safe.ai)

AISN #28: Center for AI Safety 2023 Year in Review

23 Dec 2023 21:31 UTC
30 points
1 comment · 5 min read · LW link
(newsletter.safe.ai)

Adversarial Robustness Could Help Prevent Catastrophic Misuse

aogara · 11 Dec 2023 19:12 UTC
30 points
18 comments · 9 min read · LW link

Full Automation is Unlikely and Unnecessary for Explosive Growth

aogara · 31 May 2023 21:55 UTC
28 points
3 comments · 5 min read · LW link

AISN #30: Investments in Compute and Military AI; Plus, Japan and Singapore's National AI Safety Institutes

24 Jan 2024 19:38 UTC
27 points
1 comment · 6 min read · LW link
(newsletter.safe.ai)

Emergent Abilities of Large Language Models [Linkpost]

aogara · 10 Aug 2022 18:02 UTC
25 points
2 comments · 1 min read · LW link
(arxiv.org)

AISN #19: US-China Competition on AI Chips, Measuring Language Agent Developments, Economic Analysis of Language Model Propaganda, and White House AI Cyber Challenge

15 Aug 2023 16:10 UTC
21 points
0 comments · 5 min read · LW link
(newsletter.safe.ai)

Model-driven feedback could amplify alignment failures

aogara · 30 Jan 2023 0:00 UTC
21 points
1 comment · 2 min read · LW link

Git Re-Basin: Merging Models modulo Permutation Symmetries [Linkpost]

aogara · 14 Sep 2022 8:55 UTC
21 points
0 comments · 2 min read · LW link
(arxiv.org)

AISN #22: The Landscape of US AI Legislation - Hearings, Frameworks, Bills, and Laws

19 Sep 2023 14:44 UTC
20 points
0 comments · 5 min read · LW link
(newsletter.safe.ai)

Yudkowsky Contra Christiano on AI Takeoff Speeds [Linkpost]

aogara · 5 Apr 2022 2:09 UTC
18 points
0 comments · 11 min read · LW link

MLSN #10: Adversarial Attacks Against Language and Vision Models, Improving LLM Honesty, and Tracing the Influence of LLM Training Data

13 Sep 2023 18:03 UTC
15 points
1 comment · 5 min read · LW link
(newsletter.mlsafety.org)

AISN #21: Google DeepMind’s GPT-4 Competitor, Military Investments in Autonomous Drones, The UK AI Safety Summit, and Case Studies in AI Policy

5 Sep 2023 15:03 UTC
15 points
0 comments · 5 min read · LW link
(newsletter.safe.ai)

AISN #23: New OpenAI Models, News from Anthropic, and Representation Engineering

4 Oct 2023 17:37 UTC
15 points
2 comments · 5 min read · LW link
(newsletter.safe.ai)

Benchmarking LLM Agents on Kaggle Competitions

aogara · 22 Mar 2024 13:09 UTC
15 points
1 comment · 5 min read · LW link

AISN #24: Kissinger Urges US-China Cooperation on AI, China’s New AI Law, US Export Controls, International Institutions, and Open Source AI

18 Oct 2023 17:06 UTC
14 points
0 comments · 6 min read · LW link
(newsletter.safe.ai)

Hoodwinked: Evaluating Deception Capabilities in Large Language Models

aogara · 25 Aug 2023 19:39 UTC
14 points
3 comments · 3 min read · LW link