Agreed the title is confusing. I assumed it meant that some metric was 5% for last year’s course, and 37% for this year’s course. I think I would just nix numbers from the title altogether.
Alex_Altair
[Talk transcript] What “structure” is and why it matters
A simple model of math skill
Empirical vs. Mathematical Joints of Nature
One model I have is that when things are exponentials (or S-curves), it's pretty hard to tell when you're about to leave the "early" game, because exponentials look the same when scaled: a shifted exponential is just a rescaled copy of itself. If every year has 2x as much activity as the previous year, then every year feels like the one that was the big transition.
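To make the self-similarity concrete, here's a small sketch (my illustration, not from the original comment): with doubling every year, the latest year's share of all activity so far is nearly the same no matter where you stand on the curve, so "this year is the big one" feels equally true every year.

```python
# Activity doubles every year: an exponential like 2**year.
activity = [2 ** year for year in range(10)]

# At every point on the curve, the newest year accounts for a bit over
# half of all activity so far -- the curve "looks the same when scaled".
for year in range(1, 10):
    latest = activity[year]
    total_so_far = sum(activity[: year + 1])
    share = latest / total_so_far
    # share = 1 / (2 - 2**(-year)), which decreases from 2/3 toward 1/2.
    assert 0.5 < share <= 2 / 3
```

The same check fails for, say, linear growth, where the newest year's share shrinks over time, which is what makes exponentials feel perpetually "early".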
For example, it’s easy to think that AI has “gone mainstream” now. Which is true according to some order of magnitude. But even though a lot of politicians are talking about AI stuff more often, it’s nowhere near the top of the list for most of them. It’s more like just one more special interest to sometimes give lip service to, nowhere near issues like US polarization, China, healthcare and climate change.
Of course, AI isn’t necessarily well-modelled by an S-curve. Depending on what you’re measuring, it could be non-monotonic (with winters and summers). It could also be a hyperbola. And if we all dropped dead in the same minute from nanobots, then there wouldn’t really be a mid- or end-game at all. But I currently hold a decent amount of humility around ideas like “we’re in midgame now”.
(Tiny bug report, I got an email for this comment reply, but I don’t see it anywhere in my notifications.)
Done
I propose that this tag be merged into the tag called Infinities In Ethics.
3.
3b.*?
How about deconferences?
I’m noticing what might be a miscommunication/misunderstanding between your comment and the post and Kuhn. It’s not that the statement of such open problems creates the paradigm; it’s that solutions to those problems create the paradigm.
The problems exist because the old paradigms (concepts, methods, etc.) can’t solve them. If you can state some open problems such that everyone agrees that those problems matter, and whose solution could be verified by the community, then you’ve got a setup for solutions to create a new paradigm. A solution will necessarily use new concepts and methods. If accepted by the community, these concepts and methods constitute the new paradigm.
(Even this doesn’t always work if the techniques can’t be carried over to further problems and progress. For example, my impression is that Logical Induction nailed the solution to a legitimately important open problem, but it does not seem that the solution has been of a kind which could be used for further progress.)
Interactively Learning the Ideal Agent Design
[Continuing to sound elitist,] I have a related gripe/hot take that comments give people too much karma. I feel like I often see people who are “noisy” in that they comment a lot and have a lot of karma from that,[1] but have few or no valuable posts, and who I also don’t have a memory of reading valuable comments from. It makes me feel incentivized to acquire more of a habit of using LW as a social media feed, rather than just commenting when a thought I have passes my personal bar of feeling useful.
1. ^ Note that self-karma contributes to a comment’s position within the sorting, but doesn’t contribute to the karma count on your account, so you can’t get a bunch of karma just by leaving a bunch of comments that no one upvotes. So these people are getting at least a consolation-prize upvote from others.
I think the guiding principle behind whether or not scientific work is good should probably look something more like “is this getting me closer to understanding what’s happening”
One model that I’m currently holding is that Kuhnian paradigms are about how groups of people collectively decide that scientific work is good, which is distinct from how individual scientists do or should decide that scientific work is good. And collective agreement is way more easily reached via external criteria.
Which is to say, problems are what establishes a paradigm. It’s way easier to get a group of people to agree that “thing no go”, than it is to get them to agree on the inherent nature of thing-ness and go-ness. And when someone finally makes thing go, everyone looks around and kinda has to concede that, whatever their opinion was of that person’s ontology, they sure did make thing go. (And then I think the Wentworth/Greenblatt discussion above is about whether the method used to make thing go will be useful for making other things go, which is indeed required for actually establishing a new paradigm.)
That said, I think that the way that an individual scientist decides what ideas to pursue should usually route through things more like “is this getting me closer to understanding what’s happening”, but that external people are going to track “are problems getting solved”, and so it’s probably a good idea for most of the individual scientists to occasionally reflect on how likely their ideas are to make progress on (paradigm-setting) problems.
(It is possible for the agreed-upon problem to be “everyone is confused”, and possible for a new idea to simultaneously de-confuse everyone, thus inducing a new paradigm. (You could say that this is what happened with the Church–Turing thesis.) But it’s just pretty uncommon, because people’s ontologies can be wildly different.)
When you say, “I think that more precisely articulating what our goals are with agent foundations/paradigmaticity/etc could be very helpful...”, how compatible is that with more precisely articulating problems in agent foundations (whose solutions would be externally verifiable by most agent foundations researchers)?
stable, durable, proactive content – called “rock” content
FWIW this is conventionally called evergreen content.
“you’re only funky as [the moving average of] your last [few] cut[s]”
Somehow this is in an <a> link tag with a nohref attribute.
I finally got around to reading this sequence, and I really like the ideas behind these methods. This feels like someone actually trying to figure out exactly how fragile human values are. It’s especially exciting because it seems like it hooks right into an existing, normal field of academia (thus making it easier to leverage their resources toward alignment).
I do have one major issue with how the takeaway is communicated, starting with the term “catastrophic”. I would only use that word when the outcome of the optimization is really bad, much worse than “average” in some sense. That’s in line with the idea that the AI will “use the atoms for something else”, and not just leave us alone to optimize its own thing. But the theorems in this sequence don’t seem to be about that;
We call this catastrophic Goodhart because the end result, in terms of , is as bad as if we hadn’t conditioned at all.
Being as bad as if you hadn’t optimized at all isn’t very bad; it’s where we started from!
I think this has almost the opposite takeaway from the intended one. I can imagine someone (say, OpenAI) reading these results and thinking something like, great! They just proved that in the worst case scenario, we do no harm. Full speed ahead!
(Of course, putting a bunch of optimization power into something and then getting no result would still be a waste of the resources put into it, which is presumably not built into . But that’s still not very bad.)
That said, my intuition says that these same techniques could also suss out the cases where optimizing for pessimizes for , in the previously mentioned use-our-atoms sense.
Does the notation get flipped at some point? In the abstract you say
prior policy
and
there are arbitrarily well-performing policies
But then later you say
This strongly penalizes taking actions the base policy never takes
Which makes it sound like they’re switched.
I also notice that you call it “prior policy”, “base policy” and “reference policy” at different times; these all make sense, but it’d be a bit nicer if one phrase were used consistently.
I’m curious if you knowingly scheduled this during LessOnline?
There is a little crackpot voice in my head that says something like, “the real numbers are dumb and bad and we don’t need them!” I don’t give it a lot of time, but I do let that voice exist in the back of my mind trying to work out other possible foundations. A related issue here is that it seems to me that one should be able to have a uniform probability distribution over a countable set of numbers. Perhaps one could do that by introducing infinitesimals.
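For concreteness (my gloss, not part of the original comment): the standard obstruction is that countable additivity rules out a uniform distribution on a countable set, and an infinitesimal point-mass is one way to sidestep it.

```latex
% If $P(\{n\}) = \varepsilon$ for every $n \in \mathbb{N}$, countable
% additivity forces
\[
P(\mathbb{N}) \;=\; \sum_{n=0}^{\infty} P(\{n\})
\;=\; \sum_{n=0}^{\infty} \varepsilon \;=\;
\begin{cases}
0 & \text{if } \varepsilon = 0,\\
\infty & \text{if } \varepsilon > 0,
\end{cases}
\]
% so $P(\mathbb{N}) = 1$ is impossible for any real $\varepsilon$.
% Taking $\varepsilon$ infinitesimal (e.g.\ in a non-standard model)
% evades the dichotomy, at the cost of weakening countable additivity.
```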