Yes, mostly agree. Unless the providers themselves log all responses and expose some API to check whether text was LLM-generated, we’re probably out of luck here, and the incentives to defect are strong.
One thing I was thinking about (similar to what speedrunners do) is making a self-recording or screen recording of actually writing out the content / post? This could probably be verified by an AI or a neutral third party. Something like a “proof of work” for writing your own content.
Yes, I agree with that. I’m not claiming that knowing about it stops you from wanting ice cream.
I’m claiming that if the concept were hardwired into our brains, evolution would have had an easy time optimizing us directly to want “inclusive genetic fitness” rather than wanting ice cream. That is, we wouldn’t want ice cream at all; we would reason from first principles about what to eat based on fitness.
Just finished reading “If Anyone Builds It, Everyone Dies”. I had a question that seems like an obvious one, but one I didn’t see addressed in the book; maybe someone can help:
The main argument in the book is the analogy to humans. Evolution “wanted” us to maximize genetic fitness, but it didn’t get what it trained for. Instead, it created humans who love ice cream and condoms even though they reduce our genetic fitness.
With AGI, we’re on track to do something similar: even though we apply RLHF or other such simple training and shaping, we won’t get an AI aligned to human interests. It’ll end up wanting something weird and inhuman rather than maximizing human values.
But in my mind, this seems to miss a fairly important point: human brains don’t come pre-wired with much knowledge. We have to learn it from scratch. We don’t come out of the womb with the concept of “inclusive genetic fitness”. It took culture and ~200,000 years to figure that out, and we still only learn it after about 15-20 years of existing. So there’s no way evolution could have made us point our utility function at “inclusive genetic fitness”, because that concept didn’t exist in our brains for it to point at.
Modern AIs don’t seem like that. They come with the sum of human knowledge baked in during pre-training. As they get smarter, the concept of “human values” or “friendly AI” is definitely something in their existing minds. So it should be much easier for us to do alignment, and to test whether we can point an AI at that specific concept, than it was for evolution.
Seems mostly true. There’s also a group of people flailing around trying to fit it into their workflows because all the top tech companies are saying it’s the next big thing.
I notice lots of LARPing too: adding the word “AI” to everything and hoping that will unlock some new avenues.
> The big struggle is to even start using AI coding assistant tools. Lots of teams just don’t use them at all, or use them in very limited ways. People leading these teams know they are going to lose if they don’t change but are struggling to get their orgs to let them.
It seems to me that 25-50% of developers are using some form of AI-assisted coding. Did you notice that the bureaucracy of their companies was not allowing their developers to use coding assistants?
This post might change your perspective on the purpose of advertising:
https://meltingasphalt.com/ads-dont-work-that-way/
I think there are 3 ways to think about AI, and a lot of confusion seems to happen because the different paradigms are talking past each other. The 3 paradigms I see on the internet and when talking to people:
Paradigm A) AI is a new technology like the internet / smartphone / electricity. This seems to be mostly held by VCs / entrepreneurs / devs who think this will unlock a whole new set of apps: AI is to new apps as the smartphone was to Uber, or the internet to Amazon.
Paradigm B) AI is a step change in how humanity will work, similar to the agricultural revolution, which changed how large society could get and drove GDP growth, or the industrial revolution, which was a step change in GDP growth from ~0% to 2-4% a year and made possible things like electricity, the internet, and smartphones.
Paradigm C) AI is like the rise of humanity on this earth (the first general intelligence). The world changed completely with the rise of general intelligence, and AGI/ASI will be a similar paradigm shift. We’ve been locked at humanity’s level of intelligence for the past ~200k years, and getting ASI will be like unlocking multiple new revolutions all at the same time.
Most of the LW crowd is probably at (C) or between (B) and (C).
When talking to the general population, I’ve found it very helpful to probe where someone is before talking about things like AI safety or how the world will change.
I made the same comment on the original post. I really think this is a blind spot for US-based AI analysis.
China has smart engineers, as many as DeepMind, OpenAI, etc.; much of the talent in these labs is from China originally. With a) immigration going the way it is, b) China’s ability as a state to coordinate massive resources and subsidies, c) the possibility of invading Taiwan, d) how close the DeepSeek / Qwen models seem to be and the rate of catch-up, and e) how uncertain we are about hardware overhang (again, see DeepSeek’s training costs), I think we should put at least a 50% chance on China being ahead in the next year.
My initial reaction: a lot of AI-related predictions are “follow the curve” predictions, and this is mostly doing that. Lacking a deeper underlying theory of the nature of intelligence, I guess that’s all we get.
If you look at the trend of how far behind China is relative to the US, it has gone from about 5 years behind 2 years ago to maybe 3 months behind now. If you follow that curve, it seems to me that China will be ahead of the US by 2026, even with the chip controls, export regulations, etc. (my take is that you’re not giving them enough agency). If you want to follow the curve, IMO you can apply s/USA/China/ after 2026 (i.e., China is ahead of the US), and I can imagine that being the better trend-following prediction. It’s much less convincing to tell a story about Chinese AI labs being ahead given who we are, but I’d put at least a 50/50 chance on China vs. the USA being ahead.
Other than that, thanks for putting something concrete out there. Even though a prediction gets less likely the more specific it is, I feel this will get talked about a lot more, and hopefully some people with power (i.e., governments) start paying attention to disempowerment scenarios.
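To make the “follow the curve” arithmetic explicit, here’s a toy extrapolation using just the two rough data points above. It’s purely illustrative, not a real forecasting model:

```python
# Toy "follow the curve" extrapolation of the US-China capability gap,
# using only the two rough data points from this comment. Illustrative only.
gap_then = 60.0   # months behind, ~2 years ago (~5 years)
gap_now = 3.0     # months behind, today
elapsed = 24.0    # months between the two observations

ratio = (gap_now / gap_then) ** (1 / elapsed)  # ~0.88: gap shrinks ~12%/month
for month in range(0, 25, 6):
    print(f"month {month:2d}: ~{gap_now * ratio**month:.2f} months behind")
```

On this naive fit the gap drops below a couple of weeks within a year; whether the curve actually continues is of course the whole question.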
Coming from a somewhat similar space myself, I’ve had the same thoughts. My current thinking is that there is no straightforward answer for how to convert dollars to impact.
I think the EA community did a really good job at that back in the day, with a relatively easy, spreadsheet-based way to measure impact per dollar, or per life saved, in the near-term future.
With AI safety / existential risk, the space seems a lot more confused, and everyone has different models of the world, of what will work, and of what the good ideas are. Some people are working directly in this space, like QURI, but IMO there’s nothing close to a consensus on “where can I put my marginal dollar for AI safety”. The really obvious / good ideas and the people working on them don’t seem funding-constrained.
In general (from my observation) there’s:
- Direct interpretability work on LLMs
- Governance work (trying to convince regulators / governments to put a stop to this)
- Explaining AI risk to the general public
- Direct alignment work on current-gen LLMs (superalignment-type things in major labs)
- More theoretical work (like MIRI’s), though I don’t know if anyone is doing this now
- Weirder things like whole brain emulation, or gene-editing / making superbabies
My guess is that spending your money / time on the last one would be helpful on the margin; or just talk to people who are struggling for funding and otherwise seem to have decent ideas you can fund. There’s probably something not on the list above that will actually work for reducing existential risk from AI, but no one knows what it is.
I’m very dubious that we’ll solve alignment in time, and it seems like my marginal dollar would do better in non-obvious causes for AI safety. So I’m very open to funding something like this in the hope that we get an AI winter / regulatory pause, etc.
I don’t know if you or anyone else has thought about this, but what is your take on whether this or WBE is more likely to get done successfully? WBE seems a lot more funding-intensive, but it’s also easier to measure progress, and there are potentially fewer regulatory burdens.
If RL becomes the next thing in improving LLM capabilities, one thing I would bet on becoming big in 2025 is computer use. It seems hard to get more intelligence with just RL (who verifies the outputs?), but with something like computer use, it’s easy enough to verify whether a task has been done (has the email been sent, the ticket been booked, etc.) that it’s starting to look to me like it can do self-learning.
One thing that’s kept AI from being fully integrated into the rest of the economy is simply that the current interfaces were built for humans, and migrating all of them takes engineering time and effort.
I’m fairly sure the economic disruption would be pretty quick once this happens. For example, if I can just run 10 LLM agents as customer service agents using my *existing tools* (they open email and WhatsApp, message customers, check internal dashboards, etc.), then it’s game over. What’s stopping people right now is that there aren’t enough people to build that pipeline fast enough to utilize even the current capabilities.
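To illustrate the “easy to verify” point above, here’s a minimal toy sketch of a programmatically checkable reward for a computer-use task. Everything here is hypothetical, not any lab’s actual setup:

```python
from dataclasses import dataclass

@dataclass
class Message:
    to: str
    subject: str

@dataclass
class Mailbox:
    sent: list  # messages the agent has sent so far

def email_sent(mailbox: Mailbox, recipient: str, subject: str) -> bool:
    """Verifier: did a matching email actually get sent?"""
    return any(m.to == recipient and m.subject == subject for m in mailbox.sent)

def reward(mailbox: Mailbox) -> float:
    """Binary, automatically checkable reward for the RL loop. Unlike grading
    free-form text, the end state is cheap and unambiguous to verify."""
    return 1.0 if email_sent(mailbox, "customer@example.com", "Your refund") else 0.0
```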
Not sure if it’s correct; I didn’t actually short NVDA, so all I can do is collect my Bayes points. I did expect most investors to do first-level thinking, since that was my immediate reaction on reading about DeepSeek’s training cost: if models can be duplicated for cheaper a few weeks or months after they’re out, then you don’t have a moat. (This holds for most regular technologies. I’m not saying AI isn’t different, just that most investors think of this like any other tech innovation.)
Yeah, in one sense that makes sense. But also, NVDA is down ~16% today.
DeepSeek R1 could mean reduced VC investment into large LLM training runs. They claim to have done it with ~$6M. If there’s a big risk of someone else coming out with a comparable model at 1/10th the cost, then OpenAI has no moat in the long run. I don’t know how much VCs / investors buy ASI as an end goal, or even what the pitch would be. They’re probably looking at more prosaic things like moats and growth rates, and this may mean reduced appetite for further investment instead of more.
There don’t seem to be many surveys of the general population on doom-type scenarios. Most of them seem to be based on bias/weapons-type scenarios. You could look at something like Metaculus, but I don’t think that’s representative of the general population.
Here’s a breakdown for AI researchers: https://aiimpacts.org/2022-expert-survey-on-progress-in-ai/ (median/mean probability of extinction is 5%/14%).
US public: https://governanceai.github.io/US-Public-Opinion-Report-Jan-2019/general-attitudes-toward-ai.html (12% of Americans think it will be “extremely bad, i.e., extinction”).
Based on the very weak data above, it doesn’t seem like there’s a huge divergence of opinion specifically for x-risk.
Which is funny, because there is at least one situation where Robin reasons from first principles instead of taking the outside view (cryonics comes to mind). I’m not sure why he really doesn’t want to go through the arguments from first principles for AGI.
> GPT-2 does not—probably, very probably, but of course nobody on Earth knows what’s actually going on in there—does not in itself do something that amounts to checking possible pathways through time/events/causality/environment to end up in a preferred destination class despite variation in where it starts out.
> A blender may be very good at blending apples, that doesn’t mean it has a goal of blending apples.
> A blender that spit out oranges as unsatisfactory, pushed itself off the kitchen counter, stuck wires into electrical sockets in order to burn open your produce door, grabbed some apples, and blended those apples, on more than one occasion in different houses or with different starting conditions, would much more get me to say, “Well, that thing probably had some consequentialism-nature in it, about something that cashed out to blending apples” because it ended up at highly similar destinations from different starting points in a way that is improbable if nothing is navigating Time.
It doesn’t seem crazy to me that a GPT-type architecture with “Stack More Layers” could eventually model the world well enough to simulate consequentialist plans, i.e., given a prompt like:
“If you are a blender with legs in environment X, what would you do to blend apples?”, it could provide a continuation with a detailed plan like the above (with GPT-4/5 etc., given more compute, giving slightly better plans, maybe eventually at a superhuman level).
It also seems like it could do this kind of consequentialist thinking without itself having any “goals” to pursue. I’m expecting the response to be one of the following, but I’m not sure which:
- “Well, if it’s already making consequentialist plans, surely it has some goals, like maximizing the amount of text it generates, and it will try to do whatever it can to ensure that (similar to the ‘consequentialist AlphaGo’ example in the conversation) instead of just letting itself be turned off.”
- “An LLM / GPT will never be able to reliably output such plans with the current architecture or type of training data.”
Small world, I guess :) I knew I heard this type of argument before, but I couldn’t remember the name of it.
So it seems like the grabby aliens model contradicts the doomsday argument unless one of these is true:
- We live in a “grabby” universe, but one with few or no sentient beings long-term?
- The reference classes for the 2 arguments are somehow different (as discussed above).
Thanks for the great writeup (and the video). I think I finally understand the gist of the argument now.
The argument seems to raise another interesting question about the grabby aliens part.
He’s using the grabby aliens hypothesis to explain away the model’s low probability of us appearing this early (and I presume we’re one of these grabby aliens). But this leads to a similar problem: Robin Hanson (or anyone reading this) has a very low probability of appearing this early among all the humans who will ever exist.
This low probability would also require a similar hypothesis to explain it away. The only way to do that is some hypothesis where he’s not actually that early among the total humans who will ever exist, which would mean we turn out not to be “grabby”?
This seems like one of the problems with anthropic reasoning arguments, and I’m unsure how seriously to take them.
This whole SaaSpocalypse scenario outlined here (https://www.lesswrong.com/posts/bKrpLhqcoN6WycrFp/citrini-s-scenario-is-a-great-but-deeply-flawed-thought) has made me think that one obvious loser in all this is Amazon / AWS.
It’s been said that the real money maker for Amazon is AWS and not their retail business.
In fact, the lock-in is so strong that there’s a cottage industry of people with AWS certifications and firms whose sole job is “AWS Cost Optimization”.
But what seems to be not yet priced in is the ease with which anyone with a datacenter could build an AWS-compatible API in the future.
At the end of the day, Amazon is a bunch of servers in a datacenter. All the so-called “services” are just syntactic sugar for people who don’t want to manage their own servers, and that’s where the moat lies.
It’s hard for a startup that’s built on top of these services to migrate out to a bare-bones rack in another datacenter, but if that datacenter can give them a compatible API, then moving becomes a click of a button (for the most part), as in the sketch below.
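As a sketch of what that could look like, here’s the standard boto3 client pointed at a hypothetical non-AWS provider that exposes an S3-compatible API. The endpoint URL and credentials are placeholders:

```python
import boto3

# Same application code as before; only the endpoint and credentials change.
s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.other-datacenter.example.com",  # placeholder URL
    aws_access_key_id="NEW_PROVIDER_KEY",          # placeholder credentials
    aws_secret_access_key="NEW_PROVIDER_SECRET",
)
s3.upload_file("report.pdf", "my-bucket", "report.pdf")
```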
But if you look at how OpenAI’s competitors work, almost every one of them has an “OpenAI-compatible” API: all I do is change the URL to the new model provider and I’m good to go.
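Concretely, with the official OpenAI Python SDK, switching providers is just a different base_url. The provider URL and model name below are placeholders:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.other-provider.example.com/v1",  # placeholder URL
    api_key="NEW_PROVIDER_KEY",                            # placeholder key
)
resp = client.chat.completions.create(
    model="some-model",  # whatever the new provider serves
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)
```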
This seems like it would truly kill the AWS lock-in, and it doesn’t seem to be priced into their stock at all. Maybe people don’t think of AWS as a SaaS company? I would never short a stock myself, but it does seem like this second-order effect of all this is obviously not priced in.