RobertM
LessWrong dev & admin as of July 5th, 2022.
Concerning! Intercom shows up for me on Firefox (macOS); I'll see if there's anything in the logs. How does the broken LLM integration present itself?
You have a typo where the second instance of `let belief = null;` should presumably be `let belief = undefined;`. (Also, I think "It'd print an error saying that `foobar` is not defined" is false? Confirmed by going to the browser console and running that two-liner; it just prints `undefined` to the console; see the sketch below.) Interesting mapping, otherwise!
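A minimal sketch of the declared-but-undefined vs. never-declared distinction (my own illustration, not the exact two-liner from the post):

```js
// Illustration only: a variable declared but left undefined logs `undefined`
// rather than throwing when you read it.
let belief = undefined;
console.log(belief); // -> undefined

// Referencing a name that was never declared, by contrast, does throw:
// console.log(foobar); // Uncaught ReferenceError: foobar is not defined
```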
They very briefly discuss automated AI alignment research as a proposal for mitigating AI risk, but their arguments against that plan do not respond to the most thoughtful versions of these plans. (In their defense, the most thoughtful versions of these plans basically haven’t been published, though Ryan Greenblatt is going to publish a detailed version of this plan soon. And I think that there are several people who have pretty thoughtful versions of these plans, haven’t written them up (at least publicly), but do discuss them in person.)
Am a bit confused by this section—did you think that part 3 was awful because it didn’t respond to (as yet unpublished) plans, or for some other reason?
there is very much demand for this book in the sense that there’s a lot of people who are worried about AI for agent foundations shaped reasons and want an introduction they can give to their friends and family who don’t care that much.
This is true, but many of the surprising prepublication reviews are from people who I don’t think were already up-to-date on these AI x-risk arguments (or at least hadn’t given any prior public indication of their awareness, unlike Matt Y).
This is a valid line of critique, but it seems moderately undercut by the book's prepublication endorsements, which suggest that the arguments landed pretty well. Maybe they will land less well on the rest of the book's target audience?
(re: Said & MIRI housecleaning: Lightcone and MIRI are separate organizations, and MIRI does not moderate LessWrong. You might try to theorize that Habryka, the person who made the call to ban Said back in July, was attempting to do some 4d-chess PR optimization on MIRI's behalf months ahead of time, but no: Said really was nearly banned multiple times over the years, and he was finally banned this time because Habryka changed his mind after the most recent dust-up. Said practically never commented on AI-related subjects, so it's not even clear what the "upside" would've been. From my perspective, this type of thinking resembles the constant noise on e.g. Hacker News about how [tech company x] is obviously doing [horrible thing y] behind the scenes, claims which often aren't even in the company's interest and which generally rely on assumptions that turn out to be false.)
I don't believe that you believe this accusation. Maybe there is something deeper you are trying to say, but given that I also don't believe you've finished reading the book in the 3(?) hours since it was released, I'm not sure what it could be. (To say it explicitly: Said's banning had nothing to do with the book.)
Yeah, sadly this is an existing bug.
LessWrong is migrating hosting providers (report bugs!)
Thanks, fixed!
Nope, sorry, no functionality to bookmark sequences.
If I bookmark the sequence's first post, clicking on that post from my bookmarks doesn't bring me to the view of the post within the sequence; the post is standalone, without any mention of the sequence it's in, and oftentimes the post was written without reference to such a sequence, which leads me to forget about the sequence in the first place.
We have a concept of "canonical" sequences, and this should only happen in cases where a post doesn't have a canonical sequence. I think the only way that should happen is if a post is added to a sequence made by someone other than the post author. Otherwise, posts should have a link to their canonical sequence above the post title, when on post pages with URLs like `lesswrong.com/posts/{postId}/{slug}`. Do you have an example of this not happening?
Mod note (for other readers): I think this is a good example of acceptable use of LLMs for translation purposes. The comment reads to me[1] like it was written by a human and then translated fairly literally, without the LLM performing edits that would make it sound unfortunately LLM-like (perhaps with the exception of the em-dashes).
“Written entirely by you, a human” and “translated literally, without any additional editing performed by the LLM” are the two desiderata, which, if fulfilled, I will usually consider sufficient to screen off the fact that the words technically came out of an LLM[2]. (If you do this, I strongly recommend using a reasoning model, which is much less likely to end up rewriting your comment in its own style. Also, I appreciate the disclaimer. I don’t know if I’d want it present in every single comment; the first time seems good and maybe having one in one’s profile after that is sufficient? Needs some more thought.) This might sometimes prove insufficient, but I don’t expect people honestly trying and failing at achieving good outcomes here to substantially increase our moderation burden.
He did not say that they made such claims on LessWrong, where he would be able to publicly cite them. (I have seen/heard those claims in other contexts.)
Curated! I found the evopsych theory interesting but (as you say) speculative; I think the primary value of this post comes from presenting a distinct frame for analyzing the world, one which I (and probably many readers) either didn't have distinctly carved out or didn't have as part of our active toolkit. I'm not sure whether this particular frame will prove useful enough to make it into my active rotation, but it has the shape of something that could, in theory.
I’ve had many similar experiences. Not confident, but I suspect a big part of this skill, at least for me, is something like “bucketing”—it’s easy to pick out the important line from a screen-full of console logs if I’m familiar with the 20[1] different types of console logs I expect to see in a given context and know that I can safely ignore almost all of them as either being console spam or irrelevant to the current issue. If you don’t have that basically-instant recognition, which must necessarily be faster than “reading speed”, the log output might as well be a black hole.
Becoming familiar with those 20 different types of console logs is some combination of general domain experience, project-specific experience, and native learning speed (for this kind of pattern matching).
Similar effect when reading code, and I suspect this is why some people seem to care disproportionately much about coding standards/style/conventions: if your codebase doesn't follow a consistent style and set of conventions, you can end up paying a pretty large penalty from the absence of that speedup. (A toy sketch of the bucketing idea is below.)
[1] Made up number
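As a toy illustration of the "bucketing" idea (the patterns below are made up for illustration, not from any real codebase): it's as if you were running a filter like this in your head, except instantly.

```js
// Toy illustration of "bucketing": once the familiar, ignorable shapes are
// recognized at a glance, anything unfamiliar pops out.
const ignorableBuckets = [
  /^\[HMR\]/,                       // hot-reload chatter
  /^Download the React DevTools/,   // dev-mode banner
  /^Warning: Each child in a list/, // known warning, irrelevant to the current bug
];

function unfamiliarLines(logLines) {
  // Keep only the lines that don't match any known-ignorable pattern.
  return logLines.filter((line) => !ignorableBuckets.some((re) => re.test(line)));
}

console.log(unfamiliarLines([
  "[HMR] Waiting for update signal from WDS...",
  "Download the React DevTools for a better development experience",
  "TypeError: Cannot read properties of undefined (reading 'userId')",
]));
// -> only the TypeError survives, which is the line you actually care about
```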
Not having talked to any such people myself, I think I tentatively disbelieve that those are their true objections (despite their claims). My best guess as to the actual objection most likely to generate that external claim is something like… "this is an extremely weird thing to be worried about, and very far outside of (my) Overton window, so I'm worried that your motivations for doing [x] are not true concern about model welfare but something bad that you don't want to say out loud".
These days, if somebody’s house has candles burning in it, I’m turning around and leaving. People dumb enough to do that just aren’t worth putting up with their air pollution.
I found this post broadly entertaining (and occasionally enlightening), but unless you mean "has candles burning in it regularly, outside of annual rituals like Petrov Day", this is a pretty weird take. Do you also refuse to enter houses whose inhabitants use their kitchens, unless you confidently know that they keep the windows open during & after cooking? Burning a candle indoors once or twice a year is just not that many micromorts, and deciding that people who do it are obviously so dumb as to be trivially screened off is a straightforwardly wrong heuristic.
This is, broadly speaking, the problem of corrigibility, and how to formalize it is currently an open research problem. (There's the separate question of whether it's possible to make systems robustly corrigible in practice without having a good formalized notion of what that even means; this seems tricky.)
Thanks for the heads-up, I’ve fixed it in the post.
Curated! I think that this post is one of the best attempts I've seen at concisely summarizing… the problem, as it were, in a way that highlights the important parts while remaining accessible to an educated lay-audience. The (modern) examples scattered throughout were effective; in particular, the use of Golden Gate Claude as an example of the difficulty of making AIs believe false things was quite good.
I agree with Ryan that the claim re: the speed of AI reaching superhuman capabilities is somewhat overstated. Unfortunately, this doesn't seem load-bearing for the argument; I don't feel that much more hopeful if we have 2-5 years to use/study/work with AI systems that are only slightly superhuman at R&D (or some similar target). You could write an entire book about why this wouldn't be enough. (The Sequences do cover a lot of the reasons.)
We shipped “draft comments” earlier today. Next to the “Submit” button, you should see a drop-down menu (with only one item), which lets you save a comment as a draft. Draft comments will be visible underneath the comment input on the posts they’re responding to, and all of them will be visible on your profile page, underneath your post draft list. Big thanks to the EA forum for building the feature!
Please let us know if you encounter any bugs/mishaps with them.