~[agent foundations]
Mateusz Bagiński
The Litany of Tarrrrski is beyond wholesome!
Thank you for doing this!
I ask you, do you really think that an AI aligned to human values would refrain from doing something like this to anyone? One of the most fundamental aspects of human values is the hated outgroup. Almost everyone has somebody they’d love to see suffer. How many times has one human told another “burn in hell” and been entirely serious, believing that this was a real thing, and 100% deserved?
This shows how vague a concept “human values” is, and how differently different people can interpret it.
I always interpreted “aligning an AI to human values” as something like “making it obedient to us, ensuring it won’t do anything that we (whatever that ‘we’ is—another point of vagueness) wouldn’t endorse, lowering suffering in the world, increasing eudaimonia in the world, reducing X-risks, bringing the world closer to something we (or smarter/wiser versions of us) would consider a protopia/utopia”
Certainly I never thought it to be a good idea to imbue the AI with my implicit biases, outgroup hatred, or whatever. I’m ~sure that people who work on alignment for a living have also seen these skulls.
I know little about CEV, but if I were to coherently extrapolate my volition, then one aspect of that would be increasing the coherence and systematicity of my moral worldview and behavior, including how (much) different shards conform to it. I would certainly trash whatever outgroup bias I have (not counting general greater fondness for the people/other things close to me).
So, yeah, solving “human values” is also a part of the problem but I don’t think that it makes the case against aligning AI.
Does anybody know what happened to Julia Galef?
(Meta) Why do you not use capital letters, unless in acronyms? I find it harder to parse.
A meta-comment on the speculations about who might have sabotaged Nord Stream.
It seems like people here mostly implicitly treat possible state actors as coherent, unified agents. But maybe it wasn’t any particular state acting as a whole but rather some small group within that state that decided to do it on their own. Even if they considered it likely to be identified after the fact, the subgroup may have judged the sabotage to be in the interest of the whole nation or maybe that particular subgroup.
(I don’t know how much fragmentation of that sort there is in any given country, but I think it’s at least plausible.)
I think wanting to seem like sober experts makes them kinda believe the things they expect other people to expect to hear from sober experts.
Also, there was Gato, trained on a shitload of different tasks and achieving good performance on a vast majority of them, which led some to call it “subhuman AGI”.
I agree with the post in general, although I’m not quite sure how you would stitch several narrow models/systems together to get an AGI. A more viable path is probably something like training it end-to-end, like Gato (needless to say, please don’t).
Why did FHI get closed down? In the end, because it did not fit in with the surrounding administrative culture. I often described Oxford like a coral reef of calcified institutions built on top of each other, a hard structure that had emerged organically and haphazardly and hence had many little nooks and crannies where colorful fish could hide and thrive. FHI was one such fish but grew too big for its hole. At that point it became either vulnerable to predators, or had to enlarge the hole, upsetting the neighbors. When an organization grows in size or influence, it needs to scale in the right way to function well internally – but it also needs to scale its relationships to the environment to match what it is.
I would love to see something like Vanessa’s LTA reading list but for devinterp.
Reading this, it reminds me of the red flags that some people (e.g. Soares) saw when interacting with SBF and, once shit hit the fan, ruminated over not having taken some appropriate action.
I think the parentheses are off here. IIUC, you want to express equality of the divergences, not of the divergences multiplied by probabilities (which wouldn’t make sense, I think).
Typo: →
Typo: it’s “Goodhart”, not “Goodheart”
First, this presupposes that for any amount of suffering there is some amount of pleasure/bliss/happiness/eudaimonia that could outweigh it. Not all LWers accept this, so it’s worth pointing that out.
But I don’t think the eternal paradise/mediocrity/hell scenario accurately represents what is likely to happen in that case. I’d be more worried about somebody using AGI to conquer the world and establish a stable totalitarian regime built on some illiberal normative framework, like shariah (according to Caplan, it’s totally plausible for global totalitarianism to persist indefinitely). If you get to post-scarcity, you may grant all your subjects UBI, all basic needs met, etc. (or you may not, if you decide that this policy contradicts the Quran or the hadith), but if your convictions are strong enough, women will still be forced to wear burkas, remain basically slaves of their male kin, etc. One could argue that abundance robustly promotes a more liberal worldview, a loosening of social norms, etc., but AFAIK there is no robust evidence for that.
This is meant just to illustrate that you don’t need an outgroup to impose a lot of suffering. Having a screwed up normative framework is just enough.
This family of scenarios is probably still better than AGI doom though.
I don’t quite get what actions are available in the heat engine example.
Is it just choosing a random bit from H or C (in which case we can’t see whether it’s 0 or 1), OR a specific bit from W (in which case we do know whether it’s 0 or 1), and moving it to another pool?
Can’t you restate the second one as a relationship between two utility functions such that increasing one (holding background conditions constant) is guaranteed not to decrease the other? I.e., their respective derivatives are always non-negative under every background condition.
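For concreteness, one way to write down that restatement (the symbols are my own: $U$ and $V$ for the two utility functions, $b$ for the background conditions):

```latex
% Monotone relationship between two utility functions U and V:
% holding the background conditions b fixed, any change that
% increases U must not decrease V.
\forall b \;\; \forall x, x' :\quad
  U(x; b) > U(x'; b) \;\Longrightarrow\; V(x; b) \ge V(x'; b)

% Differentiable version of the same condition: V is a
% non-decreasing function of U under every background condition b.
\left.\frac{\partial V}{\partial U}\right|_{b} \ge 0
  \quad \text{for all } b
```

This is just a sketch of what “increasing one never decreases the other” could mean formally, not necessarily the notion the chapter has in mind.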
(I skipped straight to ch7, according to your advice, so I may be missing relevant parts from the previous chapters if there are any.)
I probably agree with you on the object level regarding phenomenal consciousness.
That being said, I think it’s “more” than a meme. I witnessed at least two people not exposed to the scientific/philosophical literature on phenomenal consciousness reinvent/rediscover the concept on their own.
It seems to me that the first-person perspective we necessarily adopt inclines us to ascribe to sensations/experiences some ineffable, seemingly irreducible quality. My guess is that we (re)perceive our perception as a meta-modality different from ordinary modalities like vision, hearing, etc., and that causes the illusion. It’s plausible that being raised in a WEIRD culture contributes to that inclination.
A butterfly conjecture: While phenomenal consciousness is an illusion, there is something to be said about the first-person perspective being an interesting feature of some minds (sufficiently sophisticated? capable of self-reflection?). It can be viewed as a computational heuristic that makes you “vulnerable” to certain illusions or biases, such as phenomenal consciousness, but also:
the difficulty of accepting one-boxing in Newcomb’s problem
mind-body dualism
the naive version of the free-will illusion, and the difficulty of accepting physicalism/determinism
(maybe) the illusion of being in control over your mind (various sources say that meditation-naive people are often surprised to discover how little control they have over their own mind when they first try meditation)
A catchy term for this line of investigation could be “computational phenomenology”.
How about a dialogue on this, with no (asymmetric) posting rate limits?
I think sigma-algebras are probably not the right algebra to base beliefs on. Something resembling linear logic might be better for reasons we’ve discussed privately; that’s very speculative of course. Ideally the right algebra should be derived from considerations arising in construction of the representation theorem, rather than attempting to force any outcome top-down.
Have you elaborated on this somewhere, or can you link some resource on why linear logic would be a better algebra for beliefs than sigma-algebras?
We know of at least one eukaryote species that lost its mitochondria. I don’t know of any cell with mitochondria but no nucleus. Erythrocytes have neither, so maybe (speculating here) there are some specialized animal cells that keep their mitochondria but lack nuclei.
I’d love to see/hear you on his podcast.