Nathan Helm-Burger

Karma: 2,292

AI alignment researcher, ML engineer. Masters in Neuroscience.

I believe that cheap and broadly competent AGI is attainable and will be built soon. This leads me to have timelines of around 2024-2027. Here’s an interview I gave recently about my current research agenda. I think the best path forward to alignment is through safe, contained testing on models that are designed from the ground up for alignability and trained on censored data (simulations with no mention of humans or computer technology). I think that current mainstream ML technology is close to a threshold of competence beyond which it will be capable of recursive self-improvement, and I think that this automated process will mine neuroscience for insights and quickly become far more effective and efficient. I think it would be quite bad for humanity if this happened in an uncontrolled, uncensored, un-sandboxed situation. So I am trying to warn the world about this possibility.

See my prediction markets here:

https://manifold.markets/NathanHelmBurger/will-gpt5-be-capable-of-recursive-s?r=TmF0aGFuSGVsbUJ1cmdlcg

I also think that current AI models pose misuse risks, which may continue to get worse as models get more capable, and that this could potentially result in catastrophic suffering if we fail to regulate this.

I now work for SecureBio on AI-Evals.

Relevant quote:

“There is a powerful effect to making a goal into someone’s full-time job: it becomes their identity. Safety engineering became its own subdiscipline, and these engineers saw it as their professional duty to reduce injury rates. They bristled at the suggestion that accidents were largely unavoidable, coming to suspect the opposite: that almost all accidents were avoidable, given the right tools, environment, and training.” https://www.lesswrong.com/posts/DQKgYhEYP86PLW7tZ/how-factories-were-made-safe

What more compute does for brain-like models: response to Rohin

Nathan Helm-Burger, 13 Apr 2022 3:40 UTC
22 points
14 comments, 12 min read

An attempt to steelman OpenAI’s alignment plan

Nathan Helm-Burger, 13 Jul 2023 18:25 UTC
22 points
0 comments, 4 min read

Digital humans vs merge with AI? Same or different?

6 Dec 2023 4:56 UTC
21 points
11 comments, 7 min read

Neural net / decision tree hybrids: a potential path toward bridging the interpretability gap

Nathan Helm-Burger, 23 Sep 2021 0:38 UTC
21 points
2 comments, 12 min read

Progress Report 7: making GPT go hurrdurr instead of brrrrrrr

Nathan Helm-Burger, 7 Sep 2022 3:28 UTC
21 points
0 comments, 4 min read

Please (re)explain your personal jargon

Nathan Helm-Burger, 22 Aug 2022 14:30 UTC
19 points
4 comments, 4 min read

Will GPT-5 be able to self-improve?

Nathan Helm-Burger, 29 Apr 2023 17:34 UTC
18 points
22 comments, 3 min read

linkpost: neuro-symbolic hybrid ai

Nathan Helm-Burger, 6 Oct 2022 21:52 UTC
17 points
0 comments, 1 min read
(youtu.be)