simeon_c

Karma: 929

@SaferAI

simeon_c’s Shortform

simeon_c · 4 Apr 2024 9:01 UTC
5 points
25 comments · 1 min read · LW link

Forecasting future gains due to post-training enhancements

8 Mar 2024 2:11 UTC
26 points
2 comments · 1 min read · LW link
(docs.google.com)

Davidad’s Provably Safe AI Architecture—ARIA’s Programme Thesis

simeon_c · 1 Feb 2024 21:30 UTC
69 points
17 comments · 1 min read · LW link
(www.aria.org.uk)

A Brief Assessment of OpenAI’s Preparedness Framework & Some Suggestions for Improvement

simeon_c · 22 Jan 2024 20:08 UTC
14 points
0 comments · 6 min read · LW link
(uploads-ssl.webflow.com)

Responsible Scaling Policies Are Risk Management Done Wrong

simeon_c · 25 Oct 2023 23:46 UTC
114 points
33 comments · 22 min read · LW link
(www.navigatingrisks.ai)

[Question] Do LLMs Implement NLP Algorithms for Better Next Token Predictions?

simeon_c · 19 Sep 2023 12:28 UTC
5 points
1 comment · 1 min read · LW link

[Question] In the Short-Term, Why Couldn’t You Just RLHF-out Instrumental Convergence?

simeon_c · 16 Sep 2023 10:44 UTC
21 points
6 comments · 1 min read · LW link

AGI x Animal Welfare: A High-EV Outreach Opportunity?

simeon_c · 28 Jun 2023 20:44 UTC
29 points
0 comments · 1 min read · LW link

The Cruel Trade-Off Between AI Misuse and AI X-risk Concerns

simeon_c · 22 Apr 2023 13:49 UTC
24 points
1 comment · 2 min read · LW link

AI Takeover Scenario with Scaled LLMs

simeon_c · 16 Apr 2023 23:28 UTC
42 points
15 comments · 8 min read · LW link

Navigating AI Risks (NAIR) #1: Slowing Down AI

simeon_c · 14 Apr 2023 14:35 UTC
11 points
3 comments · 1 min read · LW link
(navigatingairisks.substack.com)

Request to AGI organizations: Share your views on pausing AI progress

11 Apr 2023 17:30 UTC
141 points
11 comments · 1 min read · LW link

[Question] Could Simulating an AGI Taking Over the World Actually Lead to a LLM Taking Over the World?

simeon_c · 13 Jan 2023 6:33 UTC
15 points
1 comment · 1 min read · LW link

[Linkpost] DreamerV3: A General RL Architecture

simeon_c · 12 Jan 2023 3:55 UTC
23 points
3 comments · 1 min read · LW link
(arxiv.org)

[Question] Are Mixture-of-Experts Transformers More Interpretable Than Dense Transformers?

simeon_c · 31 Dec 2022 11:34 UTC
7 points
5 comments · 1 min read · LW link

AGI Timelines in Governance: Different Strategies for Different Timeframes

19 Dec 2022 21:31 UTC
63 points
28 comments · 10 min read · LW link

Extracting and Evaluating Causal Direction in LLMs’ Activations

14 Dec 2022 14:33 UTC
29 points
5 comments · 11 min read · LW link

Is GPT3 a Good Rationalist? - InstructGPT3 [2/2]

simeon_c · 7 Apr 2022 13:46 UTC
11 points
0 comments · 7 min read · LW link

New GPT3 Impressive Capabilities—InstructGPT3 [1/2]

simeon_c · 13 Mar 2022 10:58 UTC
72 points
10 comments · 7 min read · LW link