LessWrong team member / moderator. I’ve been a LessWrong organizer since 2011, with roughly equal focus on the cultural, practical and intellectual aspects of the community. My first project was creating the Secular Solstice and helping groups across the world run their own version of it. More recently I’ve been interested in improving my own epistemic standards and helping others to do so as well.
Raemon
who’s “we”?
Okay yeah fair enough.
Cool. Yeah I had just re-read AI doom from an LLM-plateau-ist perspective and still was a bit confused.
Is the part that the model you’re starting with is an LLM, as opposed to some different RL base architecture, particularly loadbearing?
And yeah, seems fine to state the opinion without defending it. I just wanted more clarity on which opinion you weren’t defending :P
FYI, every time I hear you say LLMs will plateau (or, “will never be able to take over the world” or whatever), I have to do a bunch of work to figure out if you mean “LLMs + shit tons of diverse RL environments.”
I think you also think LLMs + shit tons of diverse RL environments are unlikely to take over the world, but I’m not sure. And, your framing makes it feel like you’re centering the argument in a very weird place to center the argument. Who cares whether LLMs in their original form are going to scale to AGI? That’s clearly not the mechanism by which they will scale to AGI. “LLM-base-models + tons of diverse RL environments” are obviously the default path, and I think even people who are relatively bullish on LLMs who don’t think very hard about it are still implicitly assuming that.
(for me specifically, I’m also mentally tacking on “diverse RL environments that require long horizon conceptual reasoning to solve”, which most LLM-bulls are not thinking through, but, I think they’ll eventually bump into by accident)
But, I don’t get why the particular limitations of LLMs should be the particularly loadbearing part of the description of “the thing that won’t work.”
(I’m not very educated here but my impression is, right now RL is like 20% of the compute spent training current LLM-agents, and I’m imagining a world where it’s more like 80%, 95%, or something. The main bottleneck seems to be figuring out how to construct the relevant RL environments at scale, but, that doesn’t have much to do with LLMs. Someone tell me if I’m being dumb here.)
So, this is a) asking for specific clarity on that, b) arguing that your framing about this is weird and you should change it somehow.
Goddamn it we need to fix our linkpost UI. Thanks
[Geir Isene] A desktop made for one
Wowzers, didn’t know about Patagonia. That’s pretty interesting.
I think the current workflow for those of us who use AI to iterate on ideas and collect information is to draft with AI and then literally rewrite in our own words in the final form. It is quite bizarre but this actually seems to work.
On one hand, I think this can work. But, I caution that many versions of this are a trap that doesn’t produce good output. I recommend spending at least some chunks of time writing and thinking without any AI assistance, because otherwise I think your critical discernment skills probably won’t sharpen enough to contribute usefully.
I have not currently read most of this (just the tl;dr and some skimming), but wanted to quickly note: I think “rationality” in the LW sense is mostly useful for three reasons
1) having relatively ambitious, open-ended, confusing projects
2) navigating environmental disruption (e.g. covid)
3) being a good citizen, who is able to vote and participate in The Discourse in a way that shapes your country/world for the better.
I don’t think it’s especially great for “being a happy, well adjusted guy” (compared to other schools of thought with different vibes). I think it helps, but, not so overwhelmingly that I recommend it to a person who doesn’t naturally vibe with it.
Slightly varied example: is laying ambushes for enemy humans dishonest, during war?
It’s certainly deceptive. But I feel hesitant to lump it and “normal dishonesty” together, because I think there is some qualitative difference between degrading the commons and Winning At Conflict.
It’s dishonest (and quite bad) to wave a white flag of surrender, and then lure people into a trap (compared to leaking bad information to a spy to lure enemies into a trap). Because Surrender is a mode of communication that enemies both agree is good to have open.
Yeah but I kinda do put moderate odds on “The White House continues to actively try to destroy Anthropic and eventually either succeeds or at least it’s pretty visibly in-question.”
Fwiw the first one wasn’t rejected for “being raw/sloppy”, we just have particularly high standards for AI content because we get so much of it and we want to keep signal/noise quality high. And both the writing and idea quality need to be actively good.
I think it’s an achievable goal to learn to come up with interesting/meaningful contributions and articulate them well. You can ask AIs for meta-level advice on how to write without having them do your writing for you.
Curated. This seemed like a (relatively) straightforward thing to check that seems straightforwardly useful. I’m interested in seeing METR run a version of this against their existing task suite.
There’s always a bit of a double-edged sword of making a good capabilities eval because, even if people don’t have direct access to the eval to iterate against, it implicitly becomes a target. (i.e. it’s hard to tell, but I get some sense of companies striving to hit the METR trendline and beat the other companies on it).
My understanding is the METR task suite is basically saturated. You could probably construct a good version of this that saturates less quickly.
I’m wondering if there’s any way to keep this artificially low while making CoT time horizons high, and if there’s some sort of index you could publish that’s, like, ratio of CoT-time-horizon to non-CoT or something. I think for it to be that real/helpful you’d also want some kind of “...and the CoT is faithful” metric that I don’t currently know of a robust solution for. (This is not a very well thought out idea, just musing)
Good luck!
Huh, curious for your model of why you predicted the other-which-way? This seemed like a classic “does better on LW” kinda post (for good or for ill). Although I wouldn’t have predicted so much disparity.
Past me definitely would have been frustrated about this right along with you, and somehow I have become a stodgy grownup villain from Peter Pan and I’m not sure how/why.
(I think it’s, like, to my detriment, I have less fun now. But, doesn’t seem like I can fix it by just trying to do more fun silly things, they just don’t resonate like they used to. I recall going to one of your laser-tag things and thinking ‘man, I really should be enjoying this but it feels meh for some reason’)
It’s plausible this is more like a muscle I need to rebuild.
I’m a little worried Anthropic has missed the window for this option, since now it might look like the White House was out to destroy them and they were just throwing in the towel.
(since they clearly don’t care about overrefusals).
(this particular claim here seems false/overstated. Like, clearly, overall, they are willing to accept overrefusals. That doesn’t mean they “don’t care about them”. Maybe they don’t, but, much more likely it just seems like a reasonable tradeoff to them.)
This is presumably not relevant anymore, but... can you not just turn off memory?
Okay, the last time I was having this argument with a (different) someone, it was primarily about timelines, and they specifically disbelieved LLM-descendants would be capable of inventing the next paradigm. Sounds like you don’t have a strong take on that?
I totally agree there will be a new paradigm by the time we get to overwhelming superintelligence. I don’t think there is necessarily a new paradigm by the time we get to “human-reasoning-complete” AGI. Is that something you have a strong belief on?