LessWrong Team
I have signed no contracts or agreements whose existence I cannot mention.
I’m curious how much you’re using this and if it’s turning out to be useful on LessWrong. Interested because we’ve been thinking about integrating LLM stuff like this into LW itself.
I’d ask in the Open Thread rather than here. I don’t know of a canonical answer but would be good if someone wrote one.
Update: I’ve lifted your rate limit since it wasn’t an automatically applied one and its duration (12 months) seemed excessive.
1) Folders OR sort by tag for bookmarks.
I would like that a lot personally. Unfortunately bookmarks don’t get enough general use for us to prioritize that work.
2) When I am closing the hamburger menu on the frontpage I don’t see a need for the blogs to not be centred. It’s unusual, it might make more sense if there was a way to double stack it side by side like mastodon.
I believe this change was made for the occasions on which there is neat art to be displayed on the right side. It might also allow more room for a chat LLM integration we’re currently experimenting with.
3) Not currently I’m afraid. I think this would make sense but is competing with all the other things to do.
(Unrelated: can I get deratelimited lol or will I have to make quality Blogs for that to happen?)
Quality contributions or enough time passing for many of the automatic rate limits.
I think there are two possibilities:
1) The community norms are orthogonal or opposed to figuring out what’s right. In which case it’s unclear why you’d want to engage with this community. Perhaps you altruistically want to improve people’s beliefs, but if so, disregarding the norms and culture is a good way to be ignored (or banned), since the people bought into the culture think the norms are important for getting things right, and ignoring them makes your submission seem less likely to be worth engaging with.
2) The culture and norms in fact successfully get at things which are important for getting things right, and in disregarding them, you’re actually much less likely to figure out what’s true. People are justified in ignoring and downvoting you if you don’t stick to them.
It’s also possible that there’s more than one set of truth-seeking norms, but that doesn’t mean it’s easy to communicate across them. So better to say “over here, we operate by X; if you want to participate, please follow X norms.” And I think that’s legit.
Of course, this is very abstract and it’s possible you have examples I’d agree with.
Curated. My first (and planned to be only) child was born three months ago, so I’ve answered the question of “whether to have kids”. How to raise a child in the current world remains a rather open question, and I appreciate this post for broaching the topic.
I think there’s a range of open questions here, would be neat to see further work on them. Big topics are how to ensure a child’s psychological wellbeing if you’re honest with them that you think imminent extinction is likely, and also how to maintain productivity and ambitious goals with kids.
An element that’s taken me by surprise is the effect on my own psychology of feeling a very strong desire to give my child a good world to live in, while also feeling I’m not capable of ensuring that. Yes, I can make efforts, but it’s hard feeling I can’t give her the world I wish I could. That’s a psychological weight I didn’t anticipate beforehand.
Pretty solid evidence.
My sense of the plan in the pre-pivotal-act era was “figure out the theory of how to build a safe AI at all, and try to get whoever is building it to adopt that approach”, and that MIRI wasn’t taking any steps to be the ones building it. I also had the model that, due to the psychological unity of mankind, anyone building an aligned[-with-them] AGI was a good outcome compared to someone building an unaligned one. Like even if it was Xi Jinping, a sovereign aligned with him would be okay (and not obviously that dramatically different from anyone else?). I’m not sure how much of this was MIRI’s position vs fragments from assorted places that I combined in my own head and that were never policy.
I don’t remember exactly what I thought in 2012 when I was reading the Sequences. I do recall sometime later, after DL was in full swing, it seeming like MIRI wasn’t in any position to build AGI before others (no compute, nor the engineering prowess), and someone (not necessarily at MIRI) confirming that wasn’t the plan. Now, as at the time, I don’t know how much that was principle vs ability.
I would be surprised if a Friendly AI resulted in those things being left untouched.
I think that is germane, but maybe it needed some bridging/connecting work, since this thread so far was about MIRI-as-having-a-pivotal-act-goal. While I was less sure than Habryka about whether MIRI itself would enact a pivotal act if they could, my understanding was that they had no plan to create a sovereign for most of their history (like after 2004), so that doesn’t seem like a candidate for them having had a plan to take over the world.
I’m confused about your question. I am not imagining that the pivotal act AI (the AI under discussion) does any of those things.
Hi Sohang,
The title of the post in the email is a link to the specific post. I’m afraid it’s not in green or anything to indicate it’s a link though. That’s maybe something to fix.
Interesting to consider it a failure mode. Maybe it is. Or is at least somewhat.
I’ve got another post on eigening in the works, I think that might provide clearer terminology for talking about this, if you’ll have time to read it.
Hmm, okay, I think I’ve made an update (not necessarily to agree with you entirely, but still an update on my picture, so thanks).
I was thinking that if a group of people all agree on particular axioms or rules of inference, etc., then that’s where the eigening will be occurring, even if, given sufficiently straightforward axioms, the group members will achieve consensus without further eigening. But possibly you can get consensus on the axioms just via selection and via individuals using their inside view to adopt them or not. That’s still a degree of “we agreed”, but not eigening.
Huh. Yeah, that’s an interesting case which, yeah, plausibly doesn’t require any eigening. I think the plausibility comes from it being a case where someone can do it so fully from their personal inside view (the immediate calculation and also their belief in how the underlying mathematical operations ought to work).
I don’t think it scales to anything interesting (def not alignment research), but it is conceptually interesting for how I’ve been thinking about this.
Curated. I like the variety of the examples; it really highlights how there are multiple angles from which you might fail to be kind. Though “failure” is a harsh term, and I’m tempted to think in levels rather than labels here: I do think the attempts made (“we’ll do whatever you want”, “kick me out whenever”) are still kindnesses and better than not doing those things. It’s just that we can aspire to even greater levels of kindness.
I’m curious how much the response one wants to make to this is, in individual interactions, to move beyond cached notions of what’s kind behavior and actually boot up more detailed models of the other person and their experience, as opposed to recomputing what’s actually kind across various common scenarios into heuristics that can be applied more cheaply. As usual, perhaps some of both!
“Pretty world-takeover-adjacent” feels like a fair description to me.
I got your point and think it’s valid, and I don’t object to calling MIRI structurally power-seeking to the extent they wanted to execute a pivotal act themselves (Habryka claims they didn’t want to; I’m not knowledgeable on that front).
I still think it’s important to push back against a false claim that someone had the goal of taking over the world.
Meta: I think it’s good to proactively think of examples if you can, and good to provide them too.
My position is approximately “whenever there’s group aggregate belief, it arises from an eigen-process”. (True even when you’ve got direct evaluation, though it’s so quantitatively different as to be qualitatively different.)
Since I predict that whatever you say will also count as eigen-evaluation according to me, it’s hard for me to figure out what you think isn’t.
ETA: This perhaps inspires me to write a post arguing for this larger point. Like it’s the same mechanism with “status”, fashion, and humor too.
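For concreteness, here’s a minimal toy sketch of what I mean by an eigen-process (my own illustration; the deference matrix and the power-iteration framing are made up for this comment, not anything formal from the posts): each person’s credibility is the deference-weighted credibility of the people who defer to them, run to a fixed point, i.e. the leading eigenvector of the deference matrix.

```python
import numpy as np

# Toy example only: the deference matrix is made up for illustration.
# deference[i][j] = how much person i defers to person j (rows sum to 1).
deference = np.array([
    [0.0, 0.7, 0.3],
    [0.5, 0.0, 0.5],
    [0.6, 0.4, 0.0],
])

credibility = np.ones(3) / 3                  # start everyone equally credible
for _ in range(100):                          # power iteration
    credibility = deference.T @ credibility   # defer-weighted update
    credibility /= credibility.sum()          # keep it a distribution

print(credibility)  # converges to the leading eigenvector of deference.T
```

Swap “credibility” for status, what’s fashionable, or what’s funny and the same structure applies, which is the larger point I’d want that post to argue.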
Same question as above.