I think they presented a pretty good argument that it is actually rather minor.
While the idea that looking at the truth even when it hurts is important isn’t revolutionary in the community, I think this post gave me a much more concrete model of the benefits. Sure, I knew the abstract arguments that facing the truth is valuable, but I don’t know if I’d have identified it as an essential skill for starting a company, or identified its absence as a critical component of staying in a bad relationship. (I think my model of bad relationships was that people knew leaving was a good idea, but were unable to act on that information—but in retrospect, an inability to even consider leaving might totally be what’s going on some of the time.)
So if a UFO lands in your backyard and aliens ask you if you want to go on a magical (but not particularly instrumental) space adventure with them, I think it’s reasonable to very politely decline, and get back to work solving alignment.
I think I’d probably go for that, actually, if there isn’t some specific reason to very strongly doubt it could possibly help? It seems somewhat more likely that I’ll end up decisive via space adventure than by mundane means, even if there’s no obvious way the space adventure will contribute.
This is different if you’re already in a position where you’re making substantial progress, though.
Nonetheless, I think the analogy is still suggestive that an AI selectively shaped for one objective might end up deliberately maximizing something else.
I think, in retrospect, this feature was a really great addition to the website.
This post introduced me to a bunch of neat things, thanks!
There are several comments “suggesting that maybe the cause is mental illness”.
But personally, I think having such a standard is both unreasonable and inconsistent with the implicit standard set by essays from Yudkowsky and other MIRI people.
I think this is largely coming from an attempt to use approachable examples? I could believe that there were times when MIRI thought that even getting something as good as ChatGPT might be hard, in which case they should update, but I don’t think they ever believed that something as good as ChatGPT is clearly sufficient. I certainly never believed that, at least.
Yes, we told everyone they were in the minority. It’s a “game”.
I think this is bad. I mean, it’s not that big a deal, but generally speaking I expect messages I receive from The LessWrong Team not to contain falsehoods.
Hmm.
I don’t think “avoiding actions that noticeably increase the chance civilization is destroyed” is necessarily the most practically-relevant virtue, for most people, but it does seem to me like it’s the point of Petrov Day in particular. If we’re recognizing Petrov as a person, I’d say that was Petrov’s key virtue.
Or maybe I’d say something like “not doing very harmful acts despite incentives to do so”—I think “resisting social pressure” isn’t quite on the mark, but I think it is important to Petrov Day that there were strong incentives against what Petrov did.
I think other virtues are worth celebrating, but I think I’d want to recognize them on different holidays.
I mean, that’s a thing you might hope to be true. I’m not sure if it actually is true.
I think, if you had several UDT agents with the same source code, and then one UDT agent with slightly different source code, you might see the unique agent defect.
I think the CDT agent has an advantage here because it is capable of making distinct decisions from the rest of the population—not because it is CDT.
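To gesture at the structure here with a toy model (this sketch is entirely my own, with made-up payoff numbers, not anything from the original discussion): in a one-shot public-goods game, agents that share source code can’t decorrelate their choices, so each clone is effectively choosing between “we all cooperate” and “we all defect”, while the one agent running different code can hold the clones’ action fixed and best-respond to it.

```python
# Toy public-goods game: 9 "clone" agents share identical source code,
# so their choices are necessarily mirrored; 1 unique agent decides
# independently. Each cooperator contributes 1 to a pot that is doubled
# and split evenly among all N agents. (All numbers are made up.)

N = 10  # 9 clones + 1 unique agent

def payoff(cooperates: bool, total_cooperators: int) -> float:
    """One agent's payoff, given its own action and the group total."""
    return 2 * total_cooperators / N - (1 if cooperates else 0)

# A clone knows every copy of its source code outputs the same action,
# so it compares "all 9 of us cooperate" against "all 9 of us defect"
# (assuming the unique agent defects either way).
clone_cooperates = payoff(True, 9) > payoff(False, 0)   # 0.8 > 0.0

# The unique agent's choice is uncorrelated with the clones', so it
# holds their action fixed and simply best-responds.
clones_cooperating = 9 if clone_cooperates else 0
unique_cooperates = (payoff(True, clones_cooperating + 1)
                     > payoff(False, clones_cooperating))  # 1.0 < 1.8

print("clones cooperate:", clone_cooperates)          # True
print("unique agent cooperates:", unique_cooperates)  # False
```

The clones still do fine by cooperating, but the unique agent does strictly better by defecting, and nothing in the sketch depended on it running CDT specifically, only on its decision being uncorrelated with the group’s.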
I’m not sure “original instantiation” is always well-defined.
I think personally I’d say it’s a clear advancement—it opens up a lot of puzzles, but the naïve intuition corresponding to it still seems more satisfying than CDT or EDT, even if a full formalization is difficult.
(Not to comment on whether there might be a better communications strategy for getting the academic community interested.)
Provided you make sure you don’t publish some massive capabilities progress—which I think is pretty unlikely for most undergrads—I think the benefits of having an additional alignment-conscious person with relevant skills probably outweigh the very marginal costs of tiny incremental capabilities ideas.
I think a lot of travel expenses?
I was confused by the disagree votes on this comment, so I looked—the comment in question is highest on the default “new and upvoted” sorting, but it isn’t highest on the “top” sorting.
I’m more confident that we should generally have norms prohibiting using threats of legal action to prevent exchange of information than I am of the exact form those norms should take. But to give my immediate thoughts:
I think the best thing for Alice to do if Bob is lying about her is to just refute the lies. In an ideal world, this is sufficient. In practice, I guess maybe it’s insufficient, or maybe refuting the lies would require sharing private information, so if necessary I would next escalate to informing forum moderators, presenting evidence privately, and requesting a ban.
Only once those avenues are exhausted might I consider threatening a libel suit acceptable.
I do notice now that the Nonlinear situation in particular is affected by Ben Pace being a LessWrong admin—if step 1 doesn’t work, step 2 might have issues, so escalating to step 3 might be acceptable sooner than usual.
Concerns have been raised that there might be some sort of large first-mover advantage. I’m not sure I buy this—my instinct is that the Nonlinear cofounders are just bad-faith actors making whatever arguments seem advantageous to them (though on principle I’m trying to withhold final judgement). That said, I could definitely imagine deciding in the future that this is a large enough concern to justify weaker norms against rapid escalation.
I feel like Project Lawful, as well as many of Lintamande’s other glowfic since then, has given me a whole lot deeper an understanding of… a collection of virtues including honor, honesty, trustworthiness, etc., which I now mostly think of collectively as “Law”.
I think this has been pretty valuable for me on an intellectual level—I think, if you show me some sort of deontological rule, I’m going to give a better account of why/whether it’s a good idea to follow it than I would have before I read any glowfic.
It’s difficult for me to separate out how much of that is due to Project Lawful in particular, because ultimately I’ve just read a large body of work that all served as training data for a particular sort of thought pattern I’ve since learned. But I think this particular fragment of the rationalist community has given me some valuable new ideas, and it’d be great to figure out a good way of acknowledging that.