
Nathan Helm-Burger

Karma: 2,341

AI alignment researcher, ML engineer. Master's in Neuroscience.

I believe that cheap and broadly competent AGI is attainable and will be built soon. This leads me to have timelines of around 2024-2027. Here’s an interview I gave recently about my current research agenda. I think the best path forward to alignment is through safe, contained testing on models designed from the ground up for alignability and trained on censored data (simulations with no mention of humans or computer technology). I think that current mainstream ML technology is close to a threshold of competence beyond which it will be capable of recursive self-improvement, that this automated process will mine neuroscience for insights, and that it will quickly become far more effective and efficient. It would be quite bad for humanity if this happened in an uncontrolled, uncensored, un-sandboxed situation, so I am trying to warn the world about this possibility.

See my prediction markets here:

https://manifold.markets/NathanHelmBurger/will-gpt5-be-capable-of-recursive-s?r=TmF0aGFuSGVsbUJ1cmdlcg

I also think that current AI models pose misuse risks, which may continue to get worse as models become more capable, and that this could potentially result in catastrophic suffering if we fail to regulate them.

I now work for SecureBio on AI-Evals.

Relevant quote:

“There is a powerful effect to making a goal into someone’s full-time job: it becomes their identity. Safety engineering became its own subdiscipline, and these engineers saw it as their professional duty to reduce injury rates. They bristled at the suggestion that accidents were largely unavoidable, coming to suspect the opposite: that almost all accidents were avoidable, given the right tools, environment, and training.” https://www.lesswrong.com/posts/DQKgYhEYP86PLW7tZ/how-factories-were-made-safe

Neural net / decision tree hybrids: a potential path toward bridging the interpretability gap

Nathan Helm-Burger · 23 Sep 2021 0:38 UTC
21 points · 2 comments · 12 min read · LW link

Progress Report 1: interpretability experiments & learning, testing compression hypotheses

Nathan Helm-Burger · 22 Mar 2022 20:12 UTC
11 points · 0 comments · 2 min read · LW link

Progress Report 2

Nathan Helm-Burger · 30 Mar 2022 2:29 UTC
4 points · 1 comment · 1 min read · LW link

Progress report 3: clustering transformer neurons

Nathan Helm-Burger · 5 Apr 2022 23:13 UTC
5 points · 0 comments · 2 min read · LW link

Progress Report 4: logit lens redux

Nathan Helm-Burger · 8 Apr 2022 18:35 UTC
3 points · 0 comments · 2 min read · LW link

What more compute does for brain-like models: response to Rohin

Nathan Helm-Burger · 13 Apr 2022 3:40 UTC
22 points · 14 comments · 12 min read · LW link

Progress Report 5: tying it together

Nathan Helm-Burger · 23 Apr 2022 21:07 UTC
10 points · 0 comments · 2 min read · LW link

[Question] How to balance between process and outcome?

Nathan Helm-Burger · 4 May 2022 19:34 UTC
6 points · 8 comments · 1 min read · LW link

Progress Report 6: get the tool working

Nathan Helm-Burger · 10 Jun 2022 11:18 UTC
4 points · 0 comments · 2 min read · LW link

A little playing around with Blenderbot3

Nathan Helm-Burger · 12 Aug 2022 16:06 UTC
9 points · 0 comments · 1 min read · LW link

Timelines explanation post part 1 of ?

Nathan Helm-Burger · 12 Aug 2022 16:13 UTC
10 points · 1 comment · 2 min read · LW link

Please (re)explain your personal jargon

Nathan Helm-Burger · 22 Aug 2022 14:30 UTC
19 points · 4 comments · 4 min read · LW link

Timelines ARE relevant to alignment research (timelines 2 of ?)

Nathan Helm-Burger · 24 Aug 2022 0:19 UTC
11 points · 5 comments · 6 min read · LW link

Progress Report 7: making GPT go hurrdurr instead of brrrrrrr

Nathan Helm-Burger · 7 Sep 2022 3:28 UTC
21 points · 0 comments · 4 min read · LW link

linkpost: loss basin visualization

Nathan Helm-Burger · 30 Sep 2022 3:42 UTC
14 points · 1 comment · 1 min read · LW link

linkpost: neuro-symbolic hybrid ai

Nathan Helm-Burger · 6 Oct 2022 21:52 UTC
17 points · 0 comments · 1 min read · LW link (youtu.be)

Feature idea: extra info about post author’s response to comments.

Nathan Helm-Burger · 23 Mar 2023 20:14 UTC
6 points · 0 comments · 1 min read · LW link

[Question] Can GPT-4 play 20 questions against another instance of itself?

Nathan Helm-Burger · 28 Mar 2023 1:11 UTC
15 points · 1 comment · 1 min read · LW link (evanthebouncy.medium.com)

Will GPT-5 be able to self-improve?

Nathan Helm-Burger · 29 Apr 2023 17:34 UTC
18 points · 22 comments · 3 min read · LW link

Nice intro video to RSI

Nathan Helm-Burger · 16 May 2023 18:48 UTC
12 points · 0 comments · 1 min read · LW link (youtu.be)