LessWrong Team
I have signed no contracts or agreements whose existence I cannot mention.
So you don’t think they have enough capability to replace 50-100M jobs[1] in the US over the next 5-10 years? (I think this could happen from just the current generation with better scaffolding/products/diffusion, and even more so if the models continue to improve.)
This is measuring in terms of people’s occupations, but you could instead weight it by fraction of the economy. I’m not sure how that’ll net out.
Replacing all human workers who do menial repetitive tasks like taking food orders, scheduling appointments, manning tollbooths, providing assistance within stores, and more (especially with robotics – then you can do shelf stocking and food service at restaurants). Millions of jobs.
Replacing teachers and tutors: a massive overhaul in education. Even if what was being taught followed dumb “if else” logic on the material, having a system parse out your selections from natural language is the difference between adoption and not.
Replacing medical advice and guidance, much more than Google Search already did.
I feared becoming the very thing I swore to defend against. I was saved by null results.
Every day we have people show up on LessWrong wanting to post their major insights. Very often their insights are in domains (especially AI) where they were not previously experts but think they’ve made major breakthroughs with the help of AI.
Well, I wanted to do research into bipolar disorder pathophysiology and genetics. (Similar to Einstein’s Arrogance, a sufficiently powerful mind could figure out what’s going on in that condition just from evidence already collected; it’s just a question of when we hit that point.) I threw 4.6 and 5.4 at the problem to see what they’d find, and at my peak of optimism, was moderately hopeful for meaningful insights. I ended up wondering: will I become someone who’s contacting actual domain researchers with what they believe to be novel and important breakthroughs made with the help of AI?
Alas, after 5-10 hours of computational crunchin’, all my results were negative.[1] Spared the undignified fate![2]
It’s likely that if I’d let 4.6/5.4 follow their instincts, they’d have spun the results into something they’d claim is worthy of a paper. My process involved a lot of pushback about the validity and interestingness of results.
I could still email researchers my negative results though!
Physical protein-interaction network topology: Bipolar disorder risk genes do not sit in unusually bridge-like or bottleneck-like positions in the physical protein-interaction network once basic network biases are controlled for.
Physical protein-interaction network convergence: Signal-spreading analyses on the physical protein-interaction network do not show that bipolar disorder risk converges on a small shared set of downstream proteins.
Directed signaling-network topology: Even in a directed, signed signaling network, bipolar disorder risk genes still do not show the expected bridge or bottleneck structure, and the main topological signal remains weak or negative.
Bipolar-only causal core: The directed signaling relationships among bipolar disorder genes are too sparse to reveal a strong disease-specific core regulatory circuit from the current data.
I’ve had much the same thoughts previously. Work is gonna be weird.
“Beginner friendly” isn’t the thing. Think of the difference between a UI that’s inserting punch cards and deciphering punch cards as they come out vs a GUI. There might have been computations worth the hassle previously, but for many the friction wasn’t worth it. The latter gets a lot more use cases and adoption. I think this is the same kind of jump or more.
Exact timelines are hard to say, but if takeover/loss of control/similar doesn’t happen first, we will see a lot of automation.
(I’ve been trying a new drug and my brain isn’t at 100% capacity, hence slow or limited replies right now.)
I think that’s a good question. I think the Internet doesn’t feel to me like it reorganized enough of how civilization works to quite be a revolution. In contrast to things like agriculture or steam engines where the vocation and living situation of so much of the population changed. I think LLMs, via automation, can cause an economic reorganization on the scale of agriculture/industrialization, that the Internet itself didn’t do. I’m fuzzier on where “electricity” fits.
I don’t think my use cases are especially niche. My main uses are:
search for and process information
process natural language instructions into structured outputs/actions
write software
As Habryka says, you can start to automate a lot with that. Like it’s clear software was quite transformative already, but I think limited because software didn’t take natural language input. Change that...and heck, you can automate so much.
I think you’re reaching for overly narrow use cases. LLMs just do a lot of basic stuff well. My quick take is just that it’s weird they still screw up in some ways that a human wouldn’t, and the spikiness is interesting.
Oh, in my back and forth with it, it also said more blatantly:
That’s a solid result. If you can’t turn the hub by hand at 4 clicks, with a tire mounted you’d have zero chance of overcoming it. The hub gives you way less leverage than a full wheel and tire would.
Sentences 2 and 3 directly contradict each other.
Well, steam engines have even less coherent world models.
I believe in their power from seeing just how much value they give me and how transformative they are for me. I’m a super early adopter, but if I extrapolate the rest of the world making as much use of the tech as I am, and doing all the things I could see doing, it’s still so much.
Your Claude transcript covers the relevant response:
Meanwhile, a person grabbing a wheel at the studs (which are maybe 2–3 inches from center on a typical bolt pattern) is actually at a disadvantage compared to grabbing the rim. At the studs, your lever arm is very short. If you’re gripping at roughly 2.5 inches from center and pulling hard with maybe 50–80 lbs of force, that’s only about 10–17 ft-lbs of torque. That’s dramatically less than the hill torque.
So the writer may actually be correct for the specific scenario they described — trying to turn the wheel at the studs rather than at the rim. That’s a crucial detail.
I do update that the amount of torque the car is experiencing under gravity is more like 150-200ft-lb and therefore closer to what a human can produce with a good lever arm. Though my Claude’s assertion was “a lot less than someone deliberately trying to wrench a wheel around”, which is not true even with more leverage – they are perhaps comparable then.
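The lever-arm reasoning in this exchange can be made concrete with a quick back-of-envelope script. The hand-force and stud-distance numbers come from the quoted transcript above; the car weight is from the thread, while the grade, tire radius, and two-braked-wheels split are my own illustrative assumptions, not measured values.

```python
# Rough torque comparison for the parking-brake discussion above.
# Transcript numbers: ~2.5 in lever arm at the studs, 50-80 lb of pull.
# My assumptions (not from the thread): ~10% grade, ~12 in tire radius,
# hill force split across two braked wheels.

def torque_ft_lb(force_lb, lever_arm_in):
    """Torque in ft-lb from a force applied at a lever arm given in inches."""
    return force_lb * (lever_arm_in / 12.0)

# Hand at the wheel studs: short lever arm, modest pull.
hand_low = torque_ft_lb(50, 2.5)    # ~10.4 ft-lb
hand_high = torque_ft_lb(80, 2.5)   # ~16.7 ft-lb

# Gravity on the car: 2,400 lb on a ~10% grade gives roughly 240 lb of
# force along the slope, acting at the tire radius.
hill_total = torque_ft_lb(240, 12.0)   # ~240 ft-lb across both wheels
hill_per_wheel = hill_total / 2        # ~120 ft-lb per braked wheel

print(f"hand at studs: {hand_low:.0f}-{hand_high:.0f} ft-lb")
print(f"hill, per braked wheel: {hill_per_wheel:.0f} ft-lb")
```

Under these assumptions the hill torque per wheel lands in the low hundreds of ft-lb, which is why it is comparable to, not far below, what a person with a good lever arm can produce.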
Regarding case 2, Claude knew we were just running on my Macbook where the marginal cost of running is negligible, and from my questions, it was clear I cared about time.
The “use uncommon tools” example is familiar. Last year, I was really amazed by what Claude/Cursor could do in primary coding tasks, then appalled by how poorly that transferred to asking it to work with Jupyter/iPython notebooks via MCP. We’d been working on a notebook for 30 min, then it would screw up the tool call, conclude the notebook had been deleted, and attempt to create it fresh. This happened repeatedly. It’s just not the kind of mistake a human would make, which gets back to, how exactly do these minds work and form models of the world?
LLMs at their current level are already phenomenal. Enough to usher in a new industrial revolution even without further progress. Also still remarkable how untethered or nonsensical their reasoning can be, even with Opus 4.6 or similar.
Ex1. I was working on a parking brake issue with my car, comparing the clamping force I was getting at the wheel with the observation that it had wanted to roll down the hill. I told it I was getting enough clamping to be unable to turn the wheel by hand.
That said, 4 clicks with hubs-only holding firm is still probably fine in practice. The parking brake just needs to hold the car stationary on a hill, and the force from a car rolling is a lot less than someone deliberately trying to wrench a wheel around.
No, a 2,400lb car rolling down the hill exerts a lot more force than me trying to turn it at the wheel studs, let me tell ya.
Ex2. I was setting off a long-running gene analysis job. A while after it had started, I asked if we could actually parallelize it. Claude says yes, absolutely, there’s a parameter already for that. I ask it to estimate whether it’d make sense to stop and restart the job. Yes, it says, it would take half the time – but we’ve already started it so might as well let it finish.
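The restart-vs-finish question has a one-line answer that Claude skipped. A sketch with hypothetical numbers (the job duration and speedup here are made up for illustration; the document doesn’t give actual figures):

```python
# Should you restart a running serial job with a parallel flag, or let it
# finish? Restarting discards the elapsed work but runs faster from zero.

def finish_times(elapsed_h, total_serial_h, speedup):
    """Return (hours left if we let it finish, hours if we restart in parallel)."""
    let_finish = total_serial_h - elapsed_h
    restart = total_serial_h / speedup  # restart loses the elapsed work
    return let_finish, restart

# Hypothetical: a 10-hour serial job, 1 hour in, with a 2x parallel speedup.
let_finish, restart = finish_times(elapsed_h=1, total_serial_h=10, speedup=2)
print(let_finish, restart)  # restarting wins: 5 hours remain instead of 9
```

“Might as well let it finish” is only right once elapsed time exceeds total × (1 − 1/speedup); early in the run, restarting clearly wins, which is what makes the reply a bonkers inference.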
I feel like I get so many of these bonkers inferences that there’s something interesting here to reconcile with the brilliance they show in other moments.
“140 of us”—I wouldn’t lock in this number. People could fail to show or way more could show up (I registered but might come with +2). I’d say 140 are registered ;) Maybe there will be 300!
Ah, thanks to your report we found the issue. The issue arose when the site tried to match local formatting for prices (and dates), should be fixed now. Sorry about that!
Oh no, that’s no good at all. I wouldn’t think we are. Are you or your friend able to look at the console logs for errors? Or otherwise are you free to get on a call with me to figure it out?
I believe “save the date” notices are common for when you know the date of something but don’t have further details; it lets people block it out on their calendars. In fact, tickets are now available though ;)
Curated! I don’t feel fully competent to evaluate this post, but gain confidence in its curation-worthiness from Habryka having endorsed it. Still, I’ll describe the many things I like about it, in no strong order.

It is earnest scholarship, engaging with an interesting and important topic and situating its reasoning amongst the work of others. Ihor didn’t just read a little or muse on the topic, but has studied the field. The topic is fundamental, and it’s challenging the fundamentals. I value the boldness of that. Most posts that make intellectual contributions push at the edges, the frontiers, and it’s cool (assuming quality is high, and I think that’s clearly the case here even if it would turn out to be wrong) to have challenges made at core doctrine – especially as the case does feel compelling here.

The writing was pleasant to read notwithstanding a non-zero LLM score (we’re wrestling with LLM-assisted writing on LW, but this felt quite good to read). The post doesn’t fully explain all its concepts for an unfamiliar audience, but does do some of this pedagogy in a nice way, e.g. explaining the different types of utility in a technical sense.

I model that if we had more discourse of this kind, back and forth, we’d make some pretty neat intellectual progress. I could imagine someone coming along and making some really strong counters, and I’d just love to see that back and forth. I wasn’t familiar with Ihor before, but I hope he keeps writing. Kudos.
I’m afraid we don’t make exceptions. People often ask about ADHD, second-language use, etc., but (1) the reason for use doesn’t change the output not being of adequate quality, and (2) we’d then have to spend a lot of time investigating whether uses were legitimate, and we already spend too much time going through things.
Hi, I’m afraid that per our Policy for LLM Writing on LessWrong, those would not be accepted on LessWrong as posts by a new user. The policy isn’t just about using AI; it also covers topics like AI consciousness, entropy, etc., since we get too many submissions on those topics, and sorting through them to find the good ones is too hard. But I appreciate you asking about how to post!
Curated! I think the immense capability and usefulness of current LLMs, and specifically their increasing ability to take over tasks from humans, distracts from the ways in which they are strange minds different from human minds. I like this post for digging into that. It’s well known that LLMs lack memory, and we now give them scratchpads and other files they can reference as a substitute, and yet it’s not the same (as I keep experiencing in my own use). I appreciate this post for digging in and making claims like: no amount of context window or scratchpads, etc., substitutes for actual continual learning. Without asserting this is correct, it’s a discussion I like. One reason is that I think significant and scary things might happen if/when we move beyond current architectures – which are already very capable – to ones without these limitations. Good predictions there will come from understanding what is going on with current models. Kudos. I like this line of work.