Independent AI safety researcher
Alex Flint
Very very good question IMO. Thank you for this.
Consider a person who comes to a very clear understanding of the world, such that they are extremely capable of designing things, building things, fundraising, collaborating, and so on. Consider a moment where this person is just about to embark on a project but has not yet acquired any resources, perhaps has not even made any connections with anyone at all, yet is highly likely to succeed in their project when they do embark. Would you say this person has “resources”? If not, there is a kind of continuous trading-in that will take place as this person exchanges their understanding of things for resources, then later for final outcomes. Is there really a line between understanding-of-things, resources, and outcomes? The interesting part is the thing that gives this person power, and that seems to be their understanding-of-things.
I’m strongly disinclined to delve into the matter of consent in the sexual encounter, as it primarily pertains to (alleged) misconduct by Alex/Koshin (who I don’t really know), whereas the accusations of organizational malfeasance (e.g. a cover-up) pertain to all of MAPLE/OAK/CEDAR (where I do know several people, and which I’m just going to call MAPLE going forward).
Yeah thank you for this.
In particular, I’m noticing that Koshin described having been asked to write a letter with Shekinah, describing their relationship status and intentions, while Shekinah described having been pressured into signing a letter which Soryu had instructed Koshin to write.
Basically what happened is that Soryu asked us to write a letter together (on a phone call with me), then I told Shekinah that we had been asked to write a letter together, that she wasn’t obliged to, that we could write it as we saw fit, and she said okay. Then I wrote a draft and asked Shekinah what she thought, whether we should change anything etc etc, and she said no it’s fine, and then we signed it.
Well no I definitely did not rape Shekinah. I don’t think even she accuses me of that in her post.
It’s been quite a difficult few weeks at this end, which is why I haven’t replied more to your comment. I see the following points in your comment:
1. The paragraph that goes “So firstly I want to flag that this observation is consistent with the world you assert… But it’s also consistent with a different world, where those things are straightforwardly revealing of failures on the part of yourself and/or Monastic Academy”, where you critique my non-linking to Shekinah’s medium post.
2. The part where you critique my talking about “this darkness that lies at the heart of various rationalist orgs” in response to shminux’s post.
3. The part that goes “I want to flag that shminux seemed to make several criticisms …”, where you mention that I didn’t respond to all of shminux’s points.
I believe I have responded to (1). Given that you’ve apparently decided that I’m definitely a rapist (“I now believe we’re in substantially this world. Alex raped Shekinah.”), are you interested in further dialog on (2) or (3), and are there any further points that I’ve missed?
No
I understand that there are ways this can work really well for people but jesus christ the failure modes on that are numerous and devastating.
I really agree with this. The reason spiritual communities can go more quickly and more disastrously off the rails is that they are aiming to tinker with the rules by which we live at a really fundamental level, whereas most organizations opt to work on top of a kind of cultural base operating system.
I would generally consider it unwise to tinker with one’s operating system at all, except that our cultural operating system seems so unable to address some really, really huge and pressing problems, including, as it seems to me, all of x-risk.
I think part of what the rationalist community has done well (that incidentally I think EA has done less well) is be willing to discard some of the cultural operating system we inherited, in a deliberate and goal-oriented way.
Yeah right. I actually spent quite a while considering this exact point (whether to link it) when writing the post. I was basically convinced that if I did link it, many people would jump straight to that link after reading the first ~paragraph of my post, then would return to read my post holding the huge number of triggering issues raised in Shekinah’s post, and ultimately I’d fail to convey the basic thing I wanted to convey. Then I considered “yes but maybe it’s still necessary to link it if my post won’t make any sense without reading that other post” but I decided that it wasn’t really a necessary prerequisite, so ultimately I didn’t link it.
In the dynamical systems example, it’s not just that it isn’t a necessary prerequisite. If you go to the Wikipedia page for dynamical systems and start learning the subject from scratch, intending to do it quickly and then return to the previous post, you’ll end up frustrated at the hugeness of the topic, because it isn’t something you can learn in a short time. Then you’ll return to the post about optimization with a mind already bubbling with oodles of concepts, which will make the simple point of the optimization essay hard to digest. That’s my sense of it, and this is the way the example is similar to my not linking to Shekinah’s post.
Thanks for taking the time to write this comment philh.
So firstly I want to flag that this observation is consistent with the world you assert, where Shekinah’s writing and the associated commentary suggest things in a way that makes it hard to read them and maintain a grip on what is and isn’t asserted, what is and isn’t true, and similar things that it’s important to keep a grip on.
Yup this is a good paraphrase of what I meant.
In that world, declining to link those things is… well, I don’t love it; I prefer not to be protected from myself.
Yup. Well I try to write in a way that conveys a point as straightforwardly as possible, and I judged that linking to the medium post would hinder that goal. I may have been wrong about this but I wouldn’t say that I was trying to protect the reader from themselves (and I agree that trying to protect readers from themselves when writing on the internet is rarely helpful).
[Meta: I’m now going to try to compare this to some imperfectly analogous situations and I want to flag that using imperfect analogies in the context of accusations of sexual assault is kind of dangerous because the non-rhyming aspects of the analogies can appear kind of flippant or rude if taken to be rhyming aspects.]
Analogy: I wrote a while ago about optimization. The post had a lot of connections with dynamical systems. I didn’t link to or discuss the connections with dynamical systems much, beyond a general nod in that direction, because I judged that it didn’t do much to illuminate the topic. In doing so I wouldn’t say that I was protecting the reader from themselves; I was making a judgement about how to present the thing in a straightforward way. Now one might say that dynamical systems was the most important thing to link, because the whole content of my own post was building on top of that foundation. But just as with my not linking to Shekinah’s article, I mentioned the existence of the dynamical systems literature in my post, and anybody who wanted to look it up could easily find the relevant content via a Google search. I had the sense that linking it explicitly would suggest that the reader ought to either understand the main concepts at the other end of the link or else not expect to understand my own post, neither of which was true w.r.t. dynamical systems in that post or w.r.t. Shekinah’s article in this post.
[Am intending to reply more to your further points. Thank you again for taking the time to go into this.]
Yeah, as Ruby said, this is a community that I care about and publish in, and is where Shekinah linked and discussed her own post. I also want to stand for the truth! I’ve been in this org (Maple) for a while and I think it has a lot to offer the world, and I think it’s been really badly mischaracterized, and I care about sharing what I know, and this community cares a lot about the truth in a way that is different to most of the rest of the world. IMO the comments on this post so far are impressively balanced, considerate, epistemically humble, and just generally intelligent. I can’t think where else to have such a reasonable discussion about something like this!
(Good question btw!)
Should you have, say, stepped down and distanced yourself from the organization the moment the “monastic agreement” was broken...
Well just so you know, I actually did step down right after the incident. It was a bit of a mess because I stepped down informally the day after I told the community what had happened, then we decided that this action was hasty and hadn’t given the board of directors time to make their own assessment, so we reversed it, then about a week later the board of directors agreed that I should step down and I did so. You can imagine how on edge everyone in the org was at a time like this. I certainly had no real power from the moment I first stepped down.
But why would it be good for me to distance myself from the community? In general I have the sense that when you make a mistake like this you should stay and help and do your best to face the consequences; besides, it was all so psychologically terrible that I really benefited at that time from structured spiritual practice.
the relevant reddit thread and the medium post (which you didn’t link, why?)
I didn’t link them because they are both so emotionally charged, and at the same time invoke all the maximally triggering stuff (gurus, sexual assault, cover-ups, and so on), that it’s very hard to read them and stay sane. I see in your comment that you start with quite an understanding tone and then by the end you’re talking about this darkness that lies at the heart of various rationalist orgs. The meme complexes in Shekinah’s post do give this very strong suggestion of a kind of sexual darkness in the hearts of various men, but it’s more of a very powerful subtext than something she really argues for.
Yeah thank you for the note.
Just so you know, though, it was actually a 7-day meditation retreat within a one-month stay at the monastic community (and during the non-retreat weeks of the program we spend time meeting with each other, using computers, going shopping for groceries and such, in addition to 1-2 hours of sitting each morning and evening). It’s true that the residents did a long yaza on one night of the retreat, but it wasn’t required, though yes, it was still quite a lot for someone who hasn’t sat a retreat before.
It was an intense retreat, and it’s true that Shekinah didn’t have prior retreat experience. One of the things we’ve changed since then is making it more difficult, and more explicit, for folks to enter the more intense parts of the training. But it’s complicated… the most intense retreats in my experience are the ones that are sublime and simple and don’t involve any big experiences at all, but just open you to something so simple that you can’t ever quite forget it. You never really know whether that’s going to happen for someone, and in any “consent” process most everyone will say yes, they want to do this, but the aftermath of such happiness can leave one’s carefully arranged life in disarray.
This is one reason I think it’s so valuable to live full-time in spiritual community while doing periodic retreat, but as we are learning with Shekinah and others, it’s difficult to know what to do when folks come and have quite powerful experiences and then decide later that it was all a big trick of some sort. The grief and sadness that arises in this case is just enormous. You feel as if you’ve been tricked into believing, just for a moment, that everything you ever dreamed of is completely feasible, only to have it ripped out from underneath you upon returning to the heaviness of day-to-day drudgery. Then people get mad. What to do? Sign a disclaimer? Require people to renounce their whole lives before starting the training? It’s so hard. We do have a way to do this now but it’s such a band-aid. Would love to write/discuss more on this.
Gracefully correcting uncalibrated shame
That the predictor is perfectly accurate. [...]
Consider, for instance, if the predictor makes an incorrect decision once in 2^10 times. (I am well aware that the predictor is deterministic; I mean the predictor deterministically making an incorrect decision here.)
Yeah, it is not very reasonable to assume that the predictor/reporter pair would be absolutely accurate. We likely need to weaken the conservativeness requirement, as per this comment.
That the iteration completes before the heat death of the universe.
Consider the last example, but with 500 actuators. How many predictor updates do we need to do? …2^499. How many operations can the entire universe have completed from the big bang to now? ~10^120. Or about 2^399. Oops.
Well yes, the iteration scheme might take unreasonably long to complete. Still, the existence of the scheme suggests that it would in principle be possible to extrapolate far beyond what seems reasonable to us, which still points toward the infeasibility of automated ontology identification (under the safety and generalization guarantees we assume).
That storing the decision boundary is possible within the universe.
Consider the last example again. The total statespace is 2^500 states. How many bits do we need to store an arbitrary subset of said states? 2^500. How many bits can the universe contain? ~10^90. Or about 2^299. Oops.
Indeed
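As a quick check of the quoted arithmetic (nothing here beyond the numbers already given above): $\log_2 10^{120} = 120 \log_2 10 \approx 399$ and $\log_2 10^{90} = 90 \log_2 10 \approx 299$, so the $2^{499}$ required predictor updates exceed the $\sim 2^{399}$ available operations by a factor of about $2^{100} \approx 10^{30}$, and the $2^{500}$ bits needed to store an arbitrary decision boundary dwarf the $\sim 2^{299}$ bits the universe can hold.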
That there exists any such predictor.
I didn’t follow your argument here, in particular the part under “If P returns K=S:”
Yeah we certainly can’t do better than the optimal Bayes update, and you’re right that any scheme violating that law can’t work. Also, I share your intuition that “iteration can’t work”—that intuition is the main driver of this write-up.
As far as I’m concerned, the central issue is: what actually is the extent of the optimal Bayes update in concept extrapolation? Is it possible that a training set drawn from some limited regime might contain enough information to extrapolate the relevant concept to situations that humans don’t yet understand? The conservation of expected evidence isn’t really sufficient to settle that question, because the iteration might just be a series of computational steps towards a single Bayes update (we do not require that each individual step optimally utilize all available information).
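For reference, the law being appealed to here is conservation of expected evidence: for any hypothesis $H$ and any evidence-gathering process $E$,

$$\mathbb{E}_{e \sim P(E)}\big[P(H \mid E = e)\big] \;=\; \sum_{e} P(E = e)\, P(H \mid E = e) \;=\; P(H),$$

so no procedure can expect, in advance, to move its credence in a predetermined direction.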
Well just so you know, the point of the write-up is that iteration makes no sense. We are saying “hey suppose you have an automated ontology identifier with a safety guarantee and a generalization guarantee, then uh oh it looks like this really counter-intuitive iteration thing becomes possible”.
However, it’s not quite as simple as ruling out iteration by appealing to conservation of expected evidence, because it’s not clear exactly how much evidence is in the training data. Perhaps there is enough information in the training data to extrapolate all the way to . In this case the iteration scheme would just be a series of computational steps that implement a single Bayes update. Yet for the reasons discussed under “implications” I don’t think this is reasonable.
Ah so I think what you’re saying is that for a given outcome, we can ask whether there is a goal we can give to the system such that it steers towards that outcome. Then, as a system becomes more powerful, the range of outcomes that it can steer towards expands. That seems very reasonable to me, though the question that strikes me as most interesting is: what can be said about the internal structure of physical objects that have power in this sense?
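One crude way to write down that paraphrase (my notation, not anything from the original post): if $\mathcal{G}$ is the set of goals we can give a system $S$ and $\mathcal{O}$ is the set of possible outcomes, then

$$\mathrm{Pow}(S) \;=\; \{\, o \in \mathcal{O} \;:\; \exists g \in \mathcal{G} \text{ such that } S, \text{ when given goal } g, \text{ reliably steers the world into } o \,\},$$

and “becoming more powerful” corresponds to $\mathrm{Pow}(S)$ growing under set inclusion.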
The space of cases to consider can be large in many dimensions. The countable limit of a sequence of extensions need not be a fixed point of the magical improvement oracle.
Indeed. We may need to put a measure on the set of cases and make a generalization guarantee that refers to solving X% of remaining cases. That would be a much stronger generalization guarantee.
The style of counter-example is to construct two settings (“models” in the lingo of logic) A and B with the same labeled easy set (and the same context made available to the classifier), where the correct answer for some datapoint x differs between the two settings.
I appreciate the suggestion but I think that line of argument would also conclude that statistical learning is impossible, no? When I give a classifier a set of labelled cat and dog images and ask it to classify which are cats and which are dogs, it’s always possible that I was really asking some question that was not exactly about cats versus dogs, but in practice it’s not like that.
Also, humans do communicate about concepts with one another, and they eventually “get it” with respect to each other’s concept boundaries, and it’s possible to see that someone “got it” and trust that they now have the same concept that I do. So it seems possible to learn concepts in a trustworthy way from very small datasets, though it’s not a very “black box” kind of phenomenon.
Presumably, the finite narrow dataset did teach me something about your values? [...] “out-of-distribution detection.”
Yeah right, I do actually think that “out of distribution detection” is what we want here. But it gets really subtle. Consider a model that learns, when answering “is the diamond in the vault?”, that it’s okay for the physical diamond to be in different physical positions and orientations in the vault. So even though it has not seen the diamond in every possible position and orientation within the training set, it’s still not “out of distribution” to see the diamond in a new position and answer the question confidently. And what if the diamond is somehow left/right mirror-imaged while it is in the vault? Well, that too is probably fine for diamonds. But now suppose that instead of a diamond in a vault, we are learning to do some kind of robotic surgery, and the question we are asking is “is the patient healthy?”. In this case too we would hope that the machine learning system learns that it’s okay for the patient to undergo (small) changes of physical position and orientation, so that much is not “out of distribution”, but here we really would not want to move ahead with a plan that mirror-images our patient, because then the patient wouldn’t be able to eat any food that currently exists on Earth and would starve. So it seems like the “out of distribution” property we want is really “out of distribution with respect to our values”.
Now you might say that mirror-imaging ought to be “out of distribution” in both cases, even though it would be harmless in the case of the diamond. That’s reasonable, but it’s not so easy to see how our reporter would learn that on its own. We could just outlaw any very sophisticated plan but then we’re losing competitiveness with systems that are more lax on safety.
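To make the worry concrete, here is a toy sketch (entirely my own illustration; the keypoint representation, featurization, and thresholds are made up for the example): a purely statistical OOD detector whose features happen to be mirror-invariant treats a mirror-imaged state as perfectly in-distribution in both the diamond case and the surgery case, even though only the first is harmless.

```python
import numpy as np

# Toy illustration (not a proposal from the discussion above): a purely
# statistical OOD detector whose features are mirror-invariant, so a
# left/right mirror image is never flagged, whether or not mirroring
# matters to our values (diamond: harmless; surgery patient: catastrophic).
rng = np.random.default_rng(0)

def features(keypoints):
    # Pairwise distances between keypoints: invariant to translation,
    # rotation, and mirroring of the object.
    diffs = keypoints[:, None, :] - keypoints[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    return dists[np.triu_indices(len(keypoints), k=1)]

def random_rotation():
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1  # proper rotation only (no reflection)
    return q

# "Training distribution": the same object seen in many poses with jitter.
base = rng.normal(size=(6, 3))  # 6 keypoints in 3D
train = np.stack([
    features(base @ random_rotation() + rng.normal(scale=0.02, size=base.shape))
    for _ in range(500)
])
mu = train.mean(axis=0)
cov = np.cov(train.T) + 1e-6 * np.eye(train.shape[1])

def ood_score(keypoints):
    z = features(keypoints) - mu
    return float(z @ np.linalg.solve(cov, z))  # squared Mahalanobis distance

mirrored = base * np.array([-1.0, 1.0, 1.0])  # left/right mirror image
deformed = base * 1.5                         # a genuinely different shape
print("new pose: ", ood_score(base @ random_rotation()))  # small: in-distribution
print("mirrored: ", ood_score(mirrored))                  # also small: mirroring is invisible here
print("deformed: ", ood_score(deformed))                  # large: flagged as OOD
```

The detector does flag genuinely novel shapes, but the change that matters to our values in the surgery case (mirroring) is invisible to it, which is the sense in which plain statistical OOD detection falls short of “out of distribution with respect to our values”.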
it sure seems like they’re meeting both a Safety requirement (not generating non-faces) and a Generalization requirement (generating new faces that weren’t in the training dataset). What am I missing?
Well, we might have a predictor that is a perfect statistical model of the thing it was trained on, but the ontology identification issue is about what kinds of safety-critical questions can be answered based on the internal computations of such a model. So in the case of a GAN, we might try to answer “is this person lying?” based on a photo of their face, and we might hope that, having trained the GAN on the general-purpose face-generation problem, its latent variables contain the features we need to do visual lie detection. Now even if the GAN does perfectly safe face generation, we need some additional work to get a safety guarantee on our reporter, and this is difficult because we want to do it based on a finite narrow dataset.
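As a very rough sketch of the shape of that pipeline (everything below is hypothetical: the random-projection `encode` stands in for reading off a real GAN’s latent variables, and the images and labels are synthetic), the reporter would be a small probe trained on the frozen predictor’s latents from a narrow labeled dataset:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical sketch only: the "predictor" is a frozen face GAN, and encode()
# stands in for recovering its latent variables for a given photo (e.g. via
# GAN inversion). The "reporter" is a small probe trained on those latents
# using a narrow human-labeled dataset for "is this person lying?".
rng = np.random.default_rng(0)
IMAGE_DIM, LATENT_DIM, N_LABELED = 256, 32, 200

projection = rng.normal(size=(IMAGE_DIM, LATENT_DIM))

def encode(images):
    # Stand-in for the GAN's latent representation of each image.
    return images @ projection

# A finite, narrow labeled dataset (synthetic here).
images = rng.normal(size=(N_LABELED, IMAGE_DIM))
labels = rng.integers(0, 2, size=N_LABELED)

reporter = LogisticRegression(max_iter=1000).fit(encode(images), labels)

# At deployment the predictor is untouched; the reporter answers the
# safety-relevant question from its latents. The hard part, as discussed
# above, is getting any guarantee about these answers off-distribution.
new_images = rng.normal(size=(5, IMAGE_DIM))
print(reporter.predict_proba(encode(new_images))[:, 1])
```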
One further thought: suppose we trained a predictive model to answer the same question as the reporter itself, and suppose we stipulated only that the reporter ought to be as safe and general as the predictor is. Then we could just take the output of the predictor as the reporter’s output and we’d be done. Now what if we trained a predictive model to answer a question that was a kind of “immediate logical neighbor” of the reporter’s question, such as “is the diamond in the left half of the vault?” where the reporter’s question is “is the diamond in the vault?” Then we also should be able to specify and meet a safety guarantee phrased in terms of the relationship between the correctness of the reporter and the predictor. Interested in your thoughts on this.
👍
Well keep in mind that we are not proposing “iterated ontology identification” as a solution to the ELK problem, but rather as a reductio ad absurdum of the existence of any algorithm fulfilling the safety and generalization guarantees that we have given. Now here is why I don’t think it’s quite so easy to show a contradiction:
In the case of the 99% safety guarantee, you can just train a bunch of separate predictor/reporter pairs on the same initial training data and take the intersection of the regions they accept (loosely, the intersection of their decision boundaries) to get a 99.9% guarantee. Then you can sample more data from that region and do the iteration like that.
Now this assumes that each of the predictor/reporter pairs has an independent 99% safety guarantee, and you might say that they are trained on the same training data, so this independence won’t hold. But then we can use completely different sensor data—camera data, lidar data, microphone data—for each of the pairs, and proceed that way. We can still iterate the overall scheme.
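To spell out the arithmetic under that (admittedly strong) independence assumption: if each of $k$ pairs wrongly accepts a given case with probability at most $0.01$, then the intersection wrongly accepts it only when all $k$ pairs are wrong on it, so $P(\text{intersection wrong}) \le 0.01^{k}$; already for $k = 2$ this is $10^{-4} < 10^{-3}$, clearing the 99.9% figure mentioned above.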
The basic issue here is that it is just extremely challenging to get generalization with a safety guarantee. It is hard to see how it could be accomplished! We suspect that this is actually formally impossible, and that’s what we set out to show, though we came up short of a formal impossibility result.
Yeah I am also very pessimistic about having the core argument about sexual assault on the public internet so I agree with not trying to resolve that part right here.
Got it! Sorry! I really thought you were directly critiquing my non-linking to Shekinah’s post. I think I read your comment in the midst of feeling wrongfully accused about stuff and didn’t read as carefully as I should have.
Ok so yeah I really agree about keeping in mind that there are other possible explanations, and the value of that for not over-weighting the first plausible explanation found.
It’s hard though. In this particular case you might point out an alternative explanation for my actions, and I might respond “yeah but I remember reasoning in such and such a way”. That could be introduction of new evidence, too.
Yet memories about intentions and mental states quickly become extremely fuzzy. Sometimes it’s better to go based on concrete actions taken.
I won’t expand on (2) or (3) for now then. Just noting this for readers who are evaluating my helpfulness/unhelpfulness on this thread (which I support readers doing btw!). Sorry it was such a long time between comments. I may not have come back at all if you hadn’t pointed out my long absence, so thank you for doing that.