yams
You wrote this comment in an adversarial tone but I Just Agree With You.
Indeed, this is an alternate formulation of the thesis of my post, and even uses language I used when characterizing the post itself to someone in the office ~2 hours ago.
most of the things you said seemed like on average it would increase the amount I expected some kind of adversarial posture to make sense
I don’t understand this. Can you say more?
Meanings of political identities shift dramatically based on context, and you can’t manually confirm the beliefs of everyone present at your ‘gathering of people with x political identity’. To the extent that your political identity is based on Real Beliefs with Real Consequences, you should expect not to have much in common with many other people who declare the same identity when you move to a new place (or corner of the internet).
Example: In rural Southeast Texas, Confederate flags are a common sight, and my geometry teacher once told us about a cross burning he witnessed (which a few students murmured we really ought to bring back).
The majority of people genuinely hold at least one belief that, to many of my coastal-elite-descended friends, would seem comical. E.g., women should never have jobs and should rarely speak (especially in public), men with long hair are wanton or gay or trans or both, beating children (not like ‘spanking’ but like ‘anything short of broken bones’) is not only fine but your duty as a father, weed overdose not only can but will definitely kill you, megadoses of zinc can cure cancer, the COVID vaccine is the mark of the beast from the Book of Revelation, high school football ought to be the most important thing in your life and, if it isn’t, you are not just odd but untrustworthy, and abortion doctors force-feed fetuses to geese to make foie gras for gay New Yorkers (of which the force-feeding is the only ethical component).
Okay, I made up the last one, but the rest are actual positions I’ve heard espoused hundreds or thousands of times by people I met between the ages of 14 and 18.
Also, many people talk like this, and everyone’s a ‘libertarian’.
My mom’s from a conservative California family with environmentalist sympathies, and we had something like 60 percent overlap in our views prior to the Texas move. However, I soon found that everyone around who wasn’t liable to drop one of these devastating truth bombs on me thought of themselves as somewhere to the left of Bernie Sanders and read 20th-century Marxist writings in their free time. Often these people would think leftist voices at the national scale were somewhat silly or focused on the wrong things (e.g. identity politics), but they nonetheless considered themselves closer to those views than to the other views present in their environs. (There was a Democratic Party around, but it was very different from the national Democratic Party for reasons I won’t address here.)
I assume most readers are in an environment more similar to my current environment (Berkeley, CA) than to Lumberton, TX, so I won’t explicate the delta.
I think there’s a lot of mind-killing that happens as a result of relying on a presumed shared vocabulary for political identities that does not exist. When I say something left-coded, my Rationalist Libertarian Interlocutor often reproaches me, and then as we talk more about it, they often conclude that I’m a ‘boring centrist like everybody else’ who uses the language of the left owing to some biographical quirk.
I submit instead that everyone’s sense of the political map is hopelessly warped due to biographical quirks, and that assuming an adversarial posture on the basis of someone’s declared political identity is often, and maybe even ~always, a mistake.
New reacts available only to paid users of LessWrong Premium (not you freeloaders) facilitate frictionless, borderline-telepathic communication.
‘I will NEVER change my mind’: Use this react to assert that you’re content with exactly how wrong you are (which is not at all), and that the case is permanently closed on this matter, so far as you’re concerned.1
‘EY Stamp of Approval’: Use this react to assert that, on your personal authority, Eliezer Yudkowsky agrees with the contents of the comment, rendering it beyond reproach.
‘NOT EY Approved’: Use this react to assert that, on your personal authority, Eliezer Yudkowsky disagrees with the contents of the comment, rendering it immensely reproachful. Users who accrue too many ‘NOT EY Approved’ reacts will have their accounts suspended (although actual thresholds here have yet to be set).2
‘May as well be AI’: Use this react when you’re indifferent to whether or not a statement was generated by AI because shit, it may as well be. You’re ignoring it either way.
‘Have you even read plane crash?’: Use this react when your interlocutor’s unfamiliarity with prior literature is clearly on display.
‘China Hawk’: Use this react to assert American supremacy and insist that its recipient is derelict in their duty to ensure the preeminence of the greatest country in the history of conscious life from here to the other side of the singularity.
’Toilet’3
‘Sure, buddy’: We all know that the optimal amount of mental masturbation is non-zero. But some reach far beyond the zone of optimality and into the depths of their pants to produce truly monstrous works of self-gratification. Previously, one was compelled to express such an opinion obliquely on LessWrong, or else shatter the decorum of the space and open oneself up to similar critiques. In Beta, we now have the power to quietly acknowledge the reality of the situation, without derailing the gratification itself.
———————————————————————————————
1 This has replaced the ‘I beseech thee’ emoji, which never worked anyway.
2 Note that both EY-invoking reacts are invisible to anyone logged into the @-EliezerYudkowsky LessWrong account, or from any IP address that has ever been logged into that account. No point in having the man himself weigh in when so many LessWrong users are so well practiced at speaking on his behalf!
3 Beta users haven’t settled on how this react ought to be deployed. I’ve seen utilizations ranging from ‘This post belongs in the toilet’ to ‘I enjoyed reading this on the toilet.’
I don’t have a real explanation, but I’ve been interested in this, since it feels like the LLM is doing something like the opposite of what writers intend to do (at least in effect). As if there’s some portion of language space that invites engagement, or trips an alarm in the reader that says ‘there’s something in this!’ Human writers swim toward that portion of the space; LLMs swim away from it.
[I would be unsurprised to find I have not expressed this well.]
I found this post pretty disappointing in its argumentation, for the reasons you describe, even though I fairly strongly support its conclusion.
high potential upside for alignment
I like AE Studio!
Can you give an at-all-more-concrete operationalization of this?
Who’s evaluating the proposals and where’s the best public-facing analog of their views on alignment / the criteria applied to evaluate research?
Same question as above, but this time for ‘whoever decides whether this program gets scaled’.
Can you give an example of prior work (e.g., a paper from Anthropic’s safety teams) that would have been competitive for this program?
For instance, if the higher-ups are thinking ‘the problem is that we need more reliable LLMs on roughly the current instantiation of LLMs’, that’s very different from ‘the problem is we need to align superhuman coders that will build the aligned AI, plausibly on some other substrate’, or ‘the problem is we need to identify an alternative architecture that’s fundamentally easier to align than LLMs’.
The people writing these proposals are likely fitting whatever piece you work on for them into a larger picture, and I’m wondering which of the competing larger pictures are advantaged in the application phase.
I was comparing to other video posts by Sanders.
I was comparing to a broader activation of Eliezer’s audience vs any given tweet.
Outperforming the ‘average’ is the wrong standard for ‘blowing up’. ‘Blowing up’ would be ‘outperforming all the recent similar artifacts’, at least as I intended it in my original post.
Meta: it feels pretty strange to have used an underspecified colloquial term, to have walked back the applicability of that term as I intended to use it, and then to be told I’m wrong for walking it back. The point I cared about capturing in that edit is ‘this tweet didn’t do as well as I expected when it first dropped.’ That’s a claim about my own expectations.
Depends on the reference class. As of the past year or so, 1m eyes on a piece of AI safety content isn’t crazy, especially for a video on Twitter, where my impression is the criteria for what counts as a ‘view’ are pretty liberal. Like, plausibly the video has been viewed in full (much) less than half that many times.
Separately, videos posted by that account seem to routinely get ~1m views; for a video made with external collaborators, not outperforming that baseline is a little disappointing from a raw metrics perspective! Naively you’d hope to get the combined weight of your respective audiences, which seems to have only somewhat happened here.
When I posted this I think I expected we’d get to 2m views in the first couple days (weakly outperforming other Sanders Twitter content). I think with a different video, that could have happened.
Still an exciting crossover episode.
But do you see how shifting from ‘just reading’ to ‘interpreting implementation consequences’ means that the tweet may be claiming an effective ban, and not a ban by the letter of the law itself? I agree that this is sloppy, but it’s not out of the norm for discussion of laws in the public eye (where the consequences are foregrounded rather than the details; e.g. the stuff from ~15 years ago about hallway-width requirements in abortion clinics, which were widely reported on as ‘banning abortion clinics’ because changing hallway width was prohibitively expensive).
[Genuinely not trying to convince; it looks like you may not have seen my point yet, rather than that you’ve seen it and disagree with it, but maybe I’m wrong about that!]
It seems likely to me that the actual consequences of the bill, if enacted, map more closely to the broad version than the narrow version, for ease-of-implementation / cost-of-violation reasons, and that the various tweeters share this assumption.
Maybe that’s just a differently instantiated version of the bias you’re describing, but it feels to me like a relevant distinction: They’re not literally misreading the law. They’re trying to model the consequences of its implementation.
I asked a genuine question in good faith because I was confused about what you meant. Now I understand what you meant. Thank you for clarifying.
Is it true that someone received an email from an instance of Claude asking this question? Probably. The degree of autonomy involved in the sending of that email may be a pretty big crux for whether or not this is “mind-blowing and deserves a lot of alarms in many labs.” Users still have influence over the activities and preoccupations of their agents; current consensus afaict is that most of the concerning/consciousness-flavored content on MoltBook is downstream of user influence.
That’s why I asked you to clarify what you mean by ‘legit’. Is the recipient of the email attempting to defraud the public? Probably not. Is this email much evidence of consciousness in the Claude families of models? Also probably not. So it’s ‘legit’ in the first sense, but not in the second.
Bernie Sanders has released a video on x-risk featuring a discussion with Eliezer, Nate, Daniel Kokotajlo, and Jeffrey Ladish. An excerpt from it appears to be blowing up on Twitter.
EDIT: re ‘blowing up’: initial view velocity was really high (~200k/hour), but steeply dropped off, and it doesn’t seem to have really broken containment
What do you mean by ‘legit’?
it’s plausible to me that almost any public discussion of AI or AI safety that is not centrally about LLM consciousness should clarify this early and often.
Low-context audiences are really hung up on the consciousness topic, and are often reading entirely unrelated material as though it were trying to make a claim about consciousness, then generalizing to a judgement about the speaker that inoculates them against partitioning consciousness and capabilities.
Clarifying up front that you don’t mean to step into the consciousness discussion may be a way to reduce instances of this (but could also backfire? I’m actually not confident in the solution; maybe the real thing is ‘if you’re discussing AI on the internet, expect that someone with no context will show up and read it in this way, and do whatever makes sense to head that off’, but that seems much more costly and delicate a procedure than a simple disclaimer).
[inspired by recent twitter activity related to the Anthropic PSM paper]
afaict getting up to date on the cyborgism-adjacent discourse is something you (mostly) do by talking to people in person, rather than by reading things on the internet.
(I also wish there were a more convenient way to get up to speed.)
I expect this is true in the simple case of ‘you can sell a service to them’; the reason I used the Amazon example is that Amazon owns shares in Anthropic ($60B worth, according to a quick Claude check), which it purchased (at least in part) with compute credits; this is a much more intimate entanglement than Anthropic being yet-another-AWS-customer.
Again, I genuinely don’t know how this pans out, but the crux for me is not ‘can you sell a product to a company that’s a supply chain risk and keep your contracts’; it’s ‘can you [do all the things Amazon is doing with Anthropic, which are largely mutually conditioned on one another as part of complex agreements] with a supply chain risk and keep your contracts.’
The post is meant to be somewhat agnostic on the question: conditional on one having a map, here’s a common failure mode. It’s also meant to point in the direction of ‘reconsider the value of your map’.
Separately, I think I ~endorse your first comment, but I also think there are cases in which you should definitely have a map (e.g. you are attempting to achieve political ends). So I think your second comment is somewhat overstated.