I’ve been following this discussion from Jan’s first post, and I’ve been enjoying it. I’ve put together some pictures to explain what I see in this discussion.
A first take on the original misalignment argument might look something like this:
This is fair as a first take, and through a utility-function optimisation lens we might frame it like this:
Here, cultural values are the local environment that we're optimising for.
As Jacob mentions, humans are still very effective general optimisers if we look directly at how well our behaviour matches evolution's utility function. This calls for a new model.
Here's what I think actually happens:
In environmental terms, that can be pictured something like this:
Based on this model, what is cultural (human) evolution telling us about misalignment?
We have adopted proxy values (Y1, Y2, …, YN), i.e. culture, in order to optimise for X, i.e. IGF (inclusive genetic fitness). In other words, the shard of cultural values developed as a more efficient optimisation target in the new environment, where different tribes applied optimisation pressure on each other.
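The proxy-values point can be sketched as a toy numeric example (my own illustration with made-up objective functions, not anything from the original posts): an agent that greedily optimises a proxy Y looks aligned with the true target X in the environment where the proxy formed, but the same policy scores worse on X once the environment shifts and the proxy–target correlation breaks.

```python
# Toy sketch of proxy misalignment. All functions here are invented
# stand-ins: true_X plays the role of IGF, proxy_Y the role of a
# cultural proxy value learned in the old environment.

def true_X(a, env_shift=0.0):
    # True target (stand-in for IGF): peaks at a = 1 + env_shift.
    return -(a - (1.0 + env_shift)) ** 2

def proxy_Y(a):
    # Proxy value shaped by the old environment: peaks at a = 1.
    return -(a - 1.0) ** 2

def best_action(objective, actions):
    # Greedy "optimiser": pick the action scoring highest on the objective.
    return max(actions, key=objective)

actions = [i / 10 for i in range(-20, 41)]  # candidate actions in [-2, 4]

a_star = best_action(proxy_Y, actions)   # proxy-optimal action: a = 1.0
print(true_X(a_star))                    # old environment: 0.0 (also X-optimal)
print(true_X(a_star, env_shift=2.0))     # shifted environment: -4.0 (misaligned)
```

The point of the sketch is only that proxy optimisation and target optimisation are indistinguishable while the environment keeps them correlated; the divergence only shows up after the shift.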
Also, I really enjoy the book The Secret of Our Success when thinking about these models, as it provides some very nice evidence about human cultural evolution.
I will say that I thought Connor Leahy's talk on ML Street Talk was amazing, and that we should, if possible, get Connor onto Lex Fridman's podcast.
The dude looks like a tech wizard and is smart, funny, charming and a short-timeline doomer. What else do you want?
Anyway, we should create a council of charming doomers or something and send them at the media; it would be very epic. (I am in full agreement with this post, btw.)