RobertM (LessWrong dev & admin as of July 5th, 2022)
Mod note (for other readers): I think this is a good example of acceptable use of LLMs for translation purposes. The comment reads to me[1] like it was written by a human and then translated fairly literally, without performing edits that would make it sound unfortunately LLM-like (perhaps with the exception of the em-dashes).
“Written entirely by you, a human” and “translated literally, without any additional editing performed by the LLM” are the two desiderata, which, if fulfilled, I will usually consider sufficient to screen off the fact that the words technically came out of an LLM[2]. (If you do this, I strongly recommend using a reasoning model, which is much less likely to end up rewriting your comment in its own style. Also, I appreciate the disclaimer. I don’t know if I’d want it present in every single comment; the first time seems good and maybe having one in one’s profile after that is sufficient? Needs some more thought.) This might sometimes prove insufficient, but I don’t expect people honestly trying and failing at achieving good outcomes here to substantially increase our moderation burden.
He did not say that they made such claims on LessWrong, where he would be able to publicly cite them. (I have seen/heard those claims in other contexts.)
Curated! I found the evopsych theory interesting but (as you say) speculative; I think the primary value of this post comes from presenting a distinct frame by which to analyze the world, one which I (and probably many readers) either didn’t have distinctly carved out or didn’t have as part of an active toolkit. I’m not sure if this particular frame will prove useful enough to make it into my active rotation, but it has the shape of something that could, in theory.
I’ve had many similar experiences. Not confident, but I suspect a big part of this skill, at least for me, is something like “bucketing”—it’s easy to pick out the important line from a screen-full of console logs if I’m familiar with the 20[1] different types of console logs I expect to see in a given context and know that I can safely ignore almost all of them as either being console spam or irrelevant to the current issue. If you don’t have that basically-instant recognition, which must necessarily be faster than “reading speed”, the log output might as well be a black hole.
Becoming familiar with those 20 different types of console logs is some combination of general domain experience, project-specific experience, and native learning speed (for this kind of pattern matching).
There’s a similar effect when reading code, and I suspect it’s why some people care what seems like a disproportionate amount about coding standards/style/conventions—if your codebase doesn’t follow a consistent style/set of conventions, you can end up paying a pretty large penalty through the absence of that speedup.
- ^ Made up number
Not having talked to any such people myself, I think I tentatively disbelieve that those are their true objections (despite their claims). My best guess as to what actual objection would be most likely to generate that external claim would be something like… “this is an extremely weird thing to be worried about, and very far outside of (my) Overton window, so I’m worried that your motivations for doing [x] are not true concern about model welfare but something bad that you don’t want to say out loud”.
> These days, if somebody’s house has candles burning in it, I’m turning around and leaving. People dumb enough to do that just aren’t worth putting up with their air pollution.
I found this post broadly entertaining (and occasionally enlightening), but unless you mean “has candles burning in it regularly, outside of annual rituals like Petrov Day”, this is a pretty weird take. Do you also refuse to enter houses whose inhabitants use their kitchens, unless you confidently know that they keep the windows open during & after cooking? Burning a candle indoors once or twice a year is just not that many micromorts, and deciding that people who do it are obviously so dumb as to be trivially screened off is a straightforwardly wrong heuristic.
This is, broadly speaking, the problem of corrigibility, and how to formalize it is currently an open research problem. (There’s the separate question whether it’s possible to make systems robustly corrigible in practice without having a good formalized notion of what that even means; this seems tricky.)
Thanks for the heads-up, I’ve fixed it in the post.
Curated! I think that this post is one of the best attempts I’ve seen at concisely summarizing… the problem, as it were, in a way that highlights the important parts, while remaining accessible to an educated lay audience. The (modern) examples scattered throughout were effective; in particular, the use of Golden Gate Claude as an example of the difficulty of making AIs believe false things was quite good.
I agree with Ryan that the claim re: speed of AI reaching superhuman capabilities is somewhat overstated. Unfortunately, this doesn’t seem load-bearing for the argument; I don’t feel that much more hopeful if we have 2-5 years to use/study/work with AI systems that are only slightly-superhuman at R&D (or some similar target). You could write an entire book about why this wouldn’t be enough. (The sequences do cover a lot of the reasons.)
> Why did the early computer vision scientists not succeed in writing a formal ruleset for recognizing birds, and why did it ultimately take a messy kludge of inscrutable learned heuristics to solve that task?
I disapprove of Justice Potter Stewart in many respects, but “I know it when I see it” is indeed sometimes the only practical[1] way to carve reality.
(This is not meant to be a robust argument, just a couple of pointers at countervailing considerations.)
- ^ For humans.
And it seems like you forgot about them too by the time you wrote your comment.
It was not clear from your comment which particular catastrophic failures you meant (and in fact it’s still not clear to me which things from your post you consider to be in that particular class of “catastrophic failures”, which of them you attribute at least partial responsibility for to MIRI/CFAR, by what mechanisms/causal pathways, etc).
ETA: “OpenAI existing at all” is an obvious one, granted. I do not think EY considers SBF to be his responsibility (reasonable, given SBF’s intellectual inheritance from the parts of EA that were least downstream of EY’s thoughts). You don’t mention other grifters in your post.
FYI I am generally good at tracking inside baseball but I understand neither what specific failures[1] you would have wanted to see discussed in an open postmortem nor what things you’d consider to be “improvements” (and why the changes since 2022/04/01 don’t qualify).
- ^ I’m sure there were many, but I have no idea what you consider to have been failures, and it seems like you must have an opinion because otherwise you wouldn’t be confident that the changes over the last three years don’t qualify as improvements.
People sometimes ask me what’s good about glowfic, as a reader.
You know that extremely high-context joke you could only make to that one friend you’ve known for years, because you shared a bunch of specific experiences which were load-bearing for the joke to make sense at all, let alone be funny[1]? And you know how that joke is much funnier than the average low-context joke?
Well, reading glowfic is like that, but for fiction. You get to know a character as imagined by an author in much more depth than you’d get with traditional fiction, because the author writes many stories using the same character “template”, where the character might be younger, older, a different species, a different gender… but still retains some recognizable, distinct “character”. You get to know how the character deals with hardship, how they react to surprises, what principles they have (if any). You get to know Relationships between characters, similarly. You get to know Societies.
Ultimately, you get to know these things better than you know many people, maybe better than you know yourself.
Then, when the author starts a new story, and tosses a character you’ve seen ten variations of into a new situation, you already have _quite a lot of context_ for modeling how the character will deal with things. This is Fun. It’s even more Fun when you know many characters by multiple authors like that, and get to watch them deal with each other. There’s also an element of parasocial attachment and empathy, here. Knowing someone[2] like that makes everything they’re going through more emotionally salient—victory or defeat, fear or jubilation, confidence or doubt.
Part of this is simply a function of word count. Most characters don’t have millions of words[3] written featuring them. I think the effect of having the variation in character instances and their circumstances is substantial, though.
Probably I should’ve said this out loud, but I had a couple of pretty explicit updates in this direction over the past couple of years: the first was when I heard about character.ai (and similar); the second was when I saw all the TPOTers talking about using Sonnet 3.5 as a therapist. The first is the same kind of bad idea as trying a new addictive substance, and the second might be good for many people but probably carries much larger risks than most people appreciate. (And if you decide to use an LLM as a therapist/rubber duck/etc., for the love of god don’t use GPT-4o. Use Opus 3 if you have access to it. Maybe Gemini is fine? Almost certainly better than 4o. But you should consider using an empty Google Doc instead, if you don’t want to or can’t use a real person.)
I think using them as coding and research assistants is fine. I haven’t customized them to be less annoying to me personally, so their outputs are often annoying. Then I have to skim over the output to find the relevant details, and I don’t absorb much of the puffery.
> If we assume conservatively that a bee’s life is 10% as unpleasant as chicken life
This doesn’t seem at all conservative based on your description of how honey bees are treated, which reads like it was selecting for the worst possible things you could find plausible citations for. In fact, very little of your description makes an argument about how much we should expect such bees to be suffering in an ongoing way day-to-day. What I know of how broiler chickens are treated makes suffering ratios like 0.1% (rather than 10%) seem reasonable to me. This also neglects the quantities that people are likely to consume, which could trivially vary by 3 OoM.
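To make the sensitivity concrete, here is a minimal sketch (all numbers are my illustrative assumptions, not figures from the post or from this comment’s sources) showing that the two disputed factors multiply:

```typescript
// Minimal sensitivity sketch: the bee-vs-chicken comparison scales linearly in
// both the per-day suffering ratio and the implied animal-days consumed, so the
// two disputed factors multiply. All numbers below are illustrative assumptions.

function relativeConcern(
  sufferingRatio: number, // how unpleasant a bee-day is relative to a chicken-day
  consumptionRatio: number, // bee-days implied by one's honey consumption, relative
  //                           to chicken-days implied by one's chicken consumption
): number {
  return sufferingRatio * consumptionRatio;
}

console.log(relativeConcern(0.1, 1)); // the post's 10% assumption, equal implied animal-days
console.log(relativeConcern(0.001, 0.001)); // e.g. a 0.1% ratio plus ~3 OoM less implied consumption: 1e-6
```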
If you’re a vegan, I think there are a bunch of good reasons not to make exceptions for honey. If you’re trying to convince non-vegans who want to cheaply reduce their own contributions to animal suffering, I don’t think they should find this post very convincing.
I agree it’s more related than a randomly selected Nate post would be, but the comment itself did not seem particularly aimed at arguing that Nate’s advice was bad or that following it would have undesirable consequences[1]. (I think the comments it was responding to were pretty borderline here.)
I think I am comfortable arguing that it would be bad if every post that Nate made on subjects like “how to communicate with people about AI x-risk” included people leaving comments with argument-free pointers to past Nate-drama.
The most recent post by Nate seemed good to me; I think its advice was more-than-sufficiently hedged and do not think that people moving in that direction on the margin would be bad for the world. If people think otherwise they should say so, and if they want to use Nate’s interpersonal foibles as evidence that the advice is bad that’s fine, though (obviously) I don’t expect I’d find such arguments very convincing.
- ^ When keeping in mind its target audience.
I think it would be bad for every single post that Nate publishes on maybe-sorta-related subjects to turn into a platform for relitigating his past behavior[1]. This would predictably eat dozens of hours of time across a bunch of people. If you think Nate’s advice is bad, maybe because you think that people following it risk behaving more like Nate (in the negative ways that you experienced), then I think you should make an argument to that effect directly, which seems more likely to accomplish (what I think is) your goal.
- ^ Which, not having previously expressed an opinion on, I’ll say once—sounds bad to me.
(Separately, even accepting for the sake of argument that you notice most of the work that gets done and have a negative reaction to it, that is not very strong counterevidence to the original claim.)
Nope, sorry, no functionality to bookmark sequences.
We have a concept of “canonical” sequences, and this should only happen in cases where a post doesn’t have a canonical sequence. I think the only way that should happen is if a post is added to a sequence made by someone other than the post author. Otherwise, posts should have a link to their canonical sequence above the post title when on post pages with urls like lesswrong.com/posts/{postId}/{slug}. Do you have an example of this not happening?
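In other words, the intended behavior is roughly the following (a simplified sketch for illustration; the field and function names are hypothetical, not the actual LessWrong code):

```typescript
// Simplified sketch of the behavior described above; names are hypothetical,
// and this is not the actual LessWrong implementation.

interface Sequence {
  _id: string;
  title: string;
  userId: string; // sequence author
}

interface Post {
  _id: string;
  slug: string;
  userId: string; // post author
  canonicalSequence?: Sequence; // unset when the post was only added to
  //                               someone else's sequence
}

// On a post page (lesswrong.com/posts/{postId}/{slug}), render a link to the
// canonical sequence above the post title when one exists.
function canonicalSequenceLink(post: Post): string | null {
  if (!post.canonicalSequence) return null;
  return `/s/${post.canonicalSequence._id}`;
}
```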