How about deconferences?
Alex_Altair
I’m noticing what might be a miscommunication/misunderstanding between your comment, the post, and Kuhn. It’s not that the statement of such open problems creates the paradigm; it’s that solutions to those problems create the paradigm.
The problems exist because the old paradigms (concepts, methods, etc.) can’t solve them. If you can state some open problems such that everyone agrees that those problems matter, and whose solutions could be verified by the community, then you’ve set up the conditions for solutions to create a new paradigm. A solution will necessarily use new concepts and methods. If accepted by the community, these concepts and methods constitute the new paradigm.
(Even this doesn’t always work if the techniques can’t be carried over to further problems and progress. For example, my impression is that Logical Induction nailed the solution to a legitimately important open problem, but it does not seem that the solution has been of a kind which could be used for further progress.)
Interactively Learning the Ideal Agent Design
[Continuing to sound elitist,] I have a related gripe/hot take that comments give people too much karma. I feel like I often see people who are “noisy” in that they comment a lot and have a lot of karma from that,[1] but who have few or no valuable posts, and from whom I also don’t remember reading valuable comments. It makes me feel incentivized to acquire more of a habit of using LW as a social media feed, rather than commenting only when a thought I have passes my personal bar of feeling useful.
[1] Note that self-karma contributes to a comment’s position within the sorting, but doesn’t contribute to the karma count on your account, so you can’t get a bunch of karma just by leaving a bunch of comments that no one upvotes. So these people are getting at least a consolation-prize upvote from others.
I think the guiding principle behind whether or not scientific work is good should probably look something more like “is this getting me closer to understanding what’s happening”
One model that I’m currently holding is that Kuhnian paradigms are about how groups of people collectively decide that scientific work is good, which is distinct from how individual scientists do or should decide that scientific work is good. And collective agreement is way more easily reached via external criteria.
Which is to say, problems are what establishes a paradigm. It’s way easier to get a group of people to agree that “thing no go”, than it is to get them to agree on the inherent nature of thing-ness and go-ness. And when someone finally makes thing go, everyone looks around and kinda has to concede that, whatever their opinion was of that person’s ontology, they sure did make thing go. (And then I think the Wentworth/Greenblatt discussion above is about whether the method used to make thing go will be useful for making other things go, which is indeed required for actually establishing a new paradigm.)
That said, I think that the way that an individual scientist decides what ideas to pursue should usually route through things more like “is this getting me closer to understanding what’s happening”, but that external people are going to track “are problems getting solved”, and so it’s probably a good idea for most of the individual scientists to occasionally reflect on how likely their ideas are to make progress on (paradigm-setting) problems.
(It is possible for the agreed-upon problem to be “everyone is confused”, and possible for a new idea to simultaneously de-confuse everyone, thus inducing a new paradigm. (You could say that this is what happened with the Church-Turing thesis.) But it’s just pretty uncommon, because people’s ontologies can be wildly different.)
When you say, “I think that more precisely articulating what our goals are with agent foundations/paradigmaticity/etc could be very helpful...”, how compatible is that with more precisely articulating problems in agent foundations (whose solutions would be externally verifiable by most agent foundations researchers)?
stable, durable, proactive content – called “rock” content
FWIW this is conventionally called evergreen content.
“you’re only funky as [the moving average of] your last [few] cut[s]”
Somehow this is in an <a> link tag with a nohref attribute.
I finally got around to reading this sequence, and I really like the ideas behind these methods. This feels like someone actually trying to figure out exactly how fragile human values are. It’s especially exciting because it seems like it hooks right into an existing, normal field of academia (thus making it easier to leverage their resources toward alignment).
I do have one major issue with how the takeaway is communicated, starting with the term “catastrophic”. I would only use that word when the outcome of the optimization is really bad, much worse than “average” in some sense. That’s in line with the idea that the AI will “use the atoms for something else”, and not just leave us alone while it optimizes its own thing. But the theorems in this sequence don’t seem to be about that;
We call this catastrophic Goodhart because the end result, in terms of [the true utility], is as bad as if we hadn’t conditioned at all.
Being as bad as if you hadn’t optimized at all isn’t very bad; it’s where we started from!
I think this has almost the opposite takeaway from the intended one. I can imagine someone (say, OpenAI) reading these results and thinking something like, great! They just proved that in the worst case scenario, we do no harm. Full speed ahead!
(Of course, putting a bunch of optimization power into something and then getting no result would still be a waste of the resources put into it, which is presumably not built into the true utility. But that’s still not very bad.)
That said, my intuition says that these same techniques could also suss out the cases where optimizing for the proxy pessimizes for the true utility, in the previously mentioned use-our-atoms sense.
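To make my reading concrete, here’s a toy simulation sketch (my own construction and notation, not taken from the sequence), assuming the setup is roughly “proxy = true utility + heavy-tailed error”:

```python
# Toy illustration (my notation, not the sequence's): proxy U = V + X,
# where V is the true utility (light-tailed) and X is a heavy-tailed error.
import numpy as np

rng = np.random.default_rng(0)
n = 5_000_000

V = rng.normal(0.0, 1.0, n)          # true utility, light-tailed
X = rng.standard_t(df=2, size=n)     # heavy-tailed error term
U = V + X                            # proxy we condition/optimize on

threshold = np.quantile(U, 0.9999)   # "optimize hard": keep the top 0.01% of U
selected_V = V[U >= threshold]

print("E[V], unconditioned:     ", V.mean())
print("E[V | U >= threshold]:   ", selected_V.mean())
# The extreme-U samples are dominated by the error term, so the conditional
# mean of V stays close to the unconditioned mean: no better than not
# conditioning at all, but also not much worse.
```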
Does the notation get flipped at some point? In the abstract you say
prior policy
and
there are arbitrarily well-performing policies
But then later you say
This strongly penalizes taking actions the base policy never takes
Which makes it sound like they’re switched.
I also notice that you call it “prior policy”, “base policy”, and “reference policy” at different times; these all make sense, but it’d be a bit nicer if there were one phrase used consistently.
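For reference, my understanding of the standard KL-regularized objective (my notation, and an assumption about what the paper is doing) is

$$\max_{\pi} \; \mathbb{E}_{a \sim \pi}\!\left[r(a)\right] - \beta \, D_{\mathrm{KL}}\!\left(\pi \,\|\, \pi_0\right), \qquad D_{\mathrm{KL}}\!\left(\pi \,\|\, \pi_0\right) = \mathbb{E}_{a \sim \pi}\!\left[\log \tfrac{\pi(a)}{\pi_0(a)}\right].$$

In that direction, the penalty blows up whenever the learned policy puts probability on actions where the prior/base policy is (near) zero, which matches “strongly penalizes taking actions the base policy never takes”; the reversed divergence would instead penalize failing to cover the base policy’s actions. That direction is part of what I’d want the notation to pin down.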
I’m curious if you knowingly scheduled this during LessOnline?
Yep, that paper has been on my list for a while, but I have thus far been unable to penetrate the formalisms that the Causal Incentive Group uses. This paper in particular also seems to have some fairly limiting assumptions in the theorem.
Hey Johannes, I don’t quite know how to say this, but I think this post is a red flag about your mental health. “I work so hard that I ignore broken glass and then walk on it” is not healthy.
I’ve been around the community a long time and have seen several people have psychotic episodes. This is exactly the kind of thing I start seeing before they do.
I’m not saying it’s 90% likely, or anything. Just that it’s definitely high enough for me to need to say something. Please try to seek out some resources to get you more grounded.
I really appreciate this comment!
And yeah, that’s why I said only “Note that...”, and not something like “don’t trust this guy”. I think the content of the article is probably true, and maybe it’s Metz who wrote it just because AI is his beat. But I do also hold tiny models that say “maybe he dislikes us”, plus something about the “questionable understanding” etc. that habryka mentions below. AFAICT I’m not internally seething or anything, I just have a yellow flag attached to this name.
Note that the NYT article is by Cade Metz.
I think the biggest thing I like about it is that it exists! Someone tried to make a fully formalized agent model, and it worked. As mentioned above it’s got some big problems, but it helps enormously to have some ground to stand on to try to build on further.
I love this idea!
Some other books this could work for:
The Ancestor’s Tale
The Art of Game Design
The Anthropocene Reviewed
The LessWrong review books 😉
Many textbooks have a few initial “core” chapters, and then otherwise a bunch of independent chapters on applications or assorted advanced topics.
New intro textbook on AIXI
You can “bookmark” a post; is that equivalent to your desired “read later”?
Welcome kjsisco! One good place to start interacting with others here is on the current open thread.
3b.*?