How about deconferences?
Alex_Altair
I’m noticing what might be a miscommunication/misunderstanding between your comment, the post, and Kuhn. It’s not that the statement of such open problems creates the paradigm; it’s that solutions to those problems create the paradigm.
The problems exist because the old paradigms (concepts, methods, etc.) can’t solve them. If you can state some open problems such that everyone agrees that those problems matter, and whose solutions could be verified by the community, then you’ve set up the conditions for solutions to create a new paradigm. A solution will necessarily use new concepts and methods. If accepted by the community, these concepts and methods constitute the new paradigm.
(Even this doesn’t always work if the techniques can’t be carried over to further problems and progress. For example, my impression is that Logical Induction nailed the solution to a legitimately important open problem, but it does not seem that the solution has been of a kind which could be used for further progress.)
Interactively Learning the Ideal Agent Design
[Continuing to sound elitist,] I have a related gripe/hot take that comments give people too much karma. I feel like I often see people who are “noisy” in that they comment a lot and have a lot of karma from that,[1] but who have few or no valuable posts, and from whom I also don’t remember reading valuable comments. It makes me feel incentivized to acquire more of a habit of using LW as a social media feed, rather than commenting only when a thought I have passes my personal bar of feeling useful.
[1] Note that self-karma contributes to a comment’s position within the sorting, but doesn’t contribute to the karma count on your account, so you can’t get a bunch of karma just by leaving a bunch of comments that no one upvotes. So these people are getting at least a consolation-prize upvote from others.
I think the guiding principle behind whether or not scientific work is good should probably look something more like “is this getting me closer to understanding what’s happening”
One model that I’m currently holding is that Kuhnian paradigms are about how groups of people collectively decide that scientific work is good, which is distinct from how individual scientists do or should decide that scientific work is good. And collective agreement is way more easily reached via external criteria.
Which is to say, problems are what establishes a paradigm. It’s way easier to get a group of people to agree that “thing no go”, than it is to get them to agree on the inherent nature of thing-ness and go-ness. And when someone finally makes thing go, everyone looks around and kinda has to concede that, whatever their opinion was of that person’s ontology, they sure did make thing go. (And then I think the Wentworth/Greenblatt discussion above is about whether the method used to make thing go will be useful for making other things go, which is indeed required for actually establishing a new paradigm.)
That said, I think that the way that an individual scientist decides what ideas to pursue should usually route through things more like “is this getting me closer to understanding what’s happening”, but that external people are going to track “are problems getting solved”, and so it’s probably a good idea for most of the individual scientists to occasionally reflect on how likely their ideas are to make progress on (paradigm-setting) problems.
(It is possible for the agreed-upon problem to be “everyone is confused”, and possible for a new idea to simultaneously de-confuse everyone, thus inducing a new paradigm. (You could say that this is what happened with the Church-Turing thesis.) But it’s just pretty uncommon, because people’s ontologies can be wildly different.)
When you say, “I think that more precisely articulating what our goals are with agent foundations/paradigmaticity/etc could be very helpful...”, how compatible is that with more precisely articulating problems in agent foundations (whose solutions would be externally verifiable by most agent foundations researchers)?
stable, durable, proactive content – called “rock” content
FWIW this is conventionally called evergreen content.
“you’re only funky as [the moving average of] your last [few] cut[s]”
Somehow this is in an <a> link tag with a nohref attribute.
I finally got around to reading this sequence, and I really like the ideas behind these methods. This feels like someone actually trying to figure out exactly how fragile human values are. It’s especially exciting because it seems like it hooks right into an existing, normal field of academia (thus making it easier to leverage their resources toward alignment).
I do have one major issue with how the takeaway is communicated, starting with the term “catastrophic”. I would only use that word when the outcome of the optimization is really bad, much worse than “average” in some sense. That’s in line with the idea that the AI will “use the atoms for something else”, and not just leave us alone while it optimizes its own thing. But the theorems in this sequence don’t seem to be about that;
We call this catastrophic Goodhart because the end result, in terms of [the true utility], is as bad as if we hadn’t conditioned at all.
Being as bad as if you hadn’t optimized at all isn’t very bad; it’s where we started from!
I think this has almost the opposite takeaway from the intended one. I can imagine someone (say, OpenAI) reading these results and thinking something like, great! They just proved that in the worst case scenario, we do no harm. Full speed ahead!
(Of course, putting a bunch of optimization power into something and then getting no result would still be a waste of the resources put into it, which is presumably not built into the true utility. But that’s still not very bad.)
That said, my intuition says that these same techniques could also suss out the cases where optimizing for the proxy pessimizes for the true utility, in the previously mentioned use-our-atoms sense.
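To make my reading concrete, here’s a toy simulation sketch (my own construction and notation, not taken from the sequence), assuming the setup is roughly “proxy = true utility + heavy-tailed error”:

```python
# Toy illustration (my notation, not the sequence's): proxy U = V + X,
# where V is the true utility (light-tailed) and X is a heavy-tailed error.
import numpy as np

rng = np.random.default_rng(0)
n = 5_000_000

V = rng.normal(0.0, 1.0, n)          # true utility, light-tailed
X = rng.standard_t(df=2, size=n)     # heavy-tailed error term
U = V + X                            # proxy we condition/optimize on

threshold = np.quantile(U, 0.9999)   # "optimize hard": keep the top 0.01% of U
selected_V = V[U >= threshold]

print("E[V], unconditioned:     ", V.mean())
print("E[V | U >= threshold]:   ", selected_V.mean())
# The extreme-U samples are dominated by the error term, so the conditional
# mean of V stays close to the unconditioned mean: no better than not
# conditioning at all, but also not much worse.
```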
Does the notation get flipped at some point? In the abstract you say
prior policy
and
there are arbitrarily well-performing policies
But then later you say
This strongly penalizes taking actions the base policy never takes
Which makes it sound like they’re switched.
I also notice that you call it “prior policy”, “base policy”, and “reference policy” at different times; these all make sense, but it’d be a bit nicer if there were one phrase used consistently.
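For reference, my understanding of the standard KL-regularized objective (my notation, and an assumption about what the paper is doing) is

$$\max_{\pi} \; \mathbb{E}_{a \sim \pi}\!\left[r(a)\right] - \beta \, D_{\mathrm{KL}}\!\left(\pi \,\|\, \pi_0\right), \qquad D_{\mathrm{KL}}\!\left(\pi \,\|\, \pi_0\right) = \mathbb{E}_{a \sim \pi}\!\left[\log \tfrac{\pi(a)}{\pi_0(a)}\right].$$

In that direction, the penalty blows up whenever the learned policy puts probability on actions where the prior/base policy is (near) zero, which matches “strongly penalizes taking actions the base policy never takes”; the reversed divergence would instead penalize failing to cover the base policy’s actions. That direction is part of what I’d want the notation to pin down.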
I’m curious if you knowingly scheduled this during LessOnline?
Yep, that paper has been on my list for a while, but I have thus far been unable to penetrate the formalisms that the Causal Incentive Group uses. This paper in particular also seems to have some fairly limiting assumptions in the theorem.
Hey Johannes, I don’t quite know how to say this, but I think this post is a red flag about your mental health. “I work so hard that I ignore broken glass and then walk on it” is not healthy.
I’ve been around the community a long time and have seen several people have psychotic episodes. This is exactly the kind of thing I start seeing before they do.
I’m not saying it’s 90% likely, or anything. Just that it’s definitely high enough for me to need to say something. Please try to seek out some resources to get you more grounded.
I really appreciate this comment!
And yeah, that’s why I said only “Note that...”, and not something like “don’t trust this guy”. I think the content of the article is probably true, and maybe it’s Metz who wrote it just because AI is his beat. But I do also hold tiny models that say “maybe he dislikes us”, plus something about the “questionable understanding” etc. that habryka mentions below. AFAICT I’m not internally seething or anything, I just have a yellow flag attached to this name.
Note that the NYT article is by Cade Metz.
I think the biggest thing I like about it is that it exists! Someone tried to make a fully formalized agent model, and it worked. As mentioned above it’s got some big problems, but it helps enormously to have some ground to stand on to try to build on further.
I love this idea!
Some other books this could work for:
The Ancestor’s Tale
The Art of Game Design
The Anthropocene Reviewed
The LessWrong review books 😉
Many textbooks have a few initial “core” chapters, and then otherwise a bunch of independent chapters on applications or assorted advanced topics.
New intro textbook on AIXI
You can “bookmark” a post; is that equivalent to your desired “read later”?
Welcome kjsisco! One good place to start interacting with others here is on the current open thread.
3b.*?