This account exists only for archival purposes.
What?
Now he’s free to run for governor of California in 2026:
I was thinking about it because I think the state is in a very bad place, particularly when it comes to the cost of living and specifically the cost of housing. And if that doesn’t get fixed, I think the state is going to devolve into a very unpleasant place. Like one thing that I have really come to believe is that you cannot have social justice without economic justice, and economic justice in California feels unattainable. And I think it would take someone with no loyalties to sort of very powerful interest groups. I would not be indebted to other groups, and so maybe I could try a couple of variable things, just on this issue.
...
I don’t think I’d have enough experience to do it, because maybe I could do like a few things that would be really good, but I wouldn’t know how to deal with the thousands of things that also just needed to happen.
And more importantly than that to me personally, I wanted to spend my time trying to make sure we get artificial intelligence built in a really good way, which I think is like, to me personally, the most important problem in the world and not something I was willing to set aside to run for office.
Prediction market: https://manifold.markets/firstuserhere/will-sam-altman-run-for-the-governo
I have a question about “AGI Ruin: A List of Lethalities”.
These two sentences from Section B.2 stuck out to me as the most important in the post:
...outer optimization even on a very exact, very simple loss function doesn’t produce inner optimization in that direction.
...on the current optimization paradigm there is no general idea of how to get particular inner properties into a system, or verify that they’re there, rather than just observable outer ones you can run a loss function over.
My question is: supposing this is all true, what is the probability of failure of inner alignment? Is it 0.01%, 99.99%, 50%...? And how do we know how likely failure is?
It seems like there is a gulf between “it’s not guaranteed to work” and “it’s almost certain to fail”.
Yup. See: “love bombing”.
Thanks for posting this. I am still a bit fuzzy on what exactly the Superalignment plan is, or if there even is a firm plan at this stage. Hope we can learn more soon.
William Nordhaus estimates that firms recover maybe 2% of the value they create by developing new technologies.
Isn’t this the wrong metric? 2% of the value of a new technology might be a lot of money, far in excess of the R&D cost required to create it.
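To put made-up numbers on that (purely illustrative, not Nordhaus’s figures), here is a quick sanity check of how a 2% capture rate can still dwarf the R&D bill:

```python
# Hypothetical, illustrative numbers only (not from Nordhaus).
social_value = 50e9             # total value the new technology creates for society
rd_cost = 300e6                 # the innovating firm's R&D spend
captured = 0.02 * social_value  # the ~2% of that value the firm appropriates

print(captured)            # 1.0e9: the firm recovers about $1B
print(captured / rd_cost)  # ~3.3: several times its R&D cost, so "only 2%" can still pay
```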
I think you are way overestimating your ability to tell who is trans and way underestimating the ability of trans people to pass as cis. Sometimes, you just can’t tell.
I’m confused by the question. It seems incredibly broad and general. Are you asking about neural network architectures like convolutional neural networks or transformers?
Very lucidly written. Thanks.
Broadly, it seems that in a world where LeCun’s architecture becomes dominant, useful AI safety work looks more analogous to the kind of work that goes on now to make self-driving cars safe. It’s not difficult to understand the individual components of a self-driving car or to debug them in isolation, but emergent interactions between the components and a diverse range of environments require massive and ongoing investments in testing and redundancy.
I think this is the crux of the matter. This is why LeCun tweeted:
One cannot just “solve the AI alignment problem.” Let alone do it in 4 years. One doesn’t just “solve” the safety problem for turbojets, cars, rockets, or human societies, either. Engineering-for-reliability is always a process of continuous & iterative refinement.
LeCun, like Sam Altman, believes in an empirical, iterative approach to AI safety. This is in sharp contrast to the highly theoretical, figure-it-all-out-far-in-advance approach of MIRI.
I don’t get why some folks are so dismissive of the empirical, iterative approach. Is it because they believe in a fast takeoff?
Thanks. Your post makes point #3 from my post, and it makes two additional points I’ll call #5 and #6:
- #5: Onboard compute for Teslas is tightly limited, which constrains model size, whereas LLMs that live in the cloud don’t have to worry nearly as much about the physical space they take up, the cost of the hardware, or their power consumption.
- #6: Self-driving cars don’t get to learn through trial and error and become gradually more reliable, whereas LLMs do.
Re: (5), I wonder why the economics of, say, making a ChatGPT Plus subscription profitable wouldn’t constrain inference compute for GPT-4 just as much as for a Tesla.
Re: (6), Tesla customers acting as safety drivers for the “Full Self-Driving Capability” software seems like it contradicts this point.
Curious to hear your thoughts.
What do you make of the prospect of neurotech, e.g. Neuralink, Kernel, Openwater, Meta/CTRL-Labs, facilitating some kind of merge of biological human intelligence and artificial intelligence? If AI alignment is solved and AGI is safely deployed, then “friendly” or well-aligned AGI could radically accelerate neurotech. This sounds like it might obviate the sort of obsolescence of human intelligence you seem to be worried about, allowing humans alive in a post-AGI world to become transhuman or post-human cyborg entities that can possibly “compete” with AGI in domains like writing, explanation, friendship, etc.
My contention is that this model of the process is basically just wrong for the examples of minority group labels that have actually caught on.
Most important sentence:
A reward function reshapes an agent’s cognition to be more like the sort of cognition that got rewarded in the training process.
Wow. That is a tremendous insight. Thank you.
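As a toy illustration of that point (hypothetical code, an assumed two-armed bandit with a REINFORCE-style update, not anything from the post itself): the reward never becomes something the agent “has” or “wants”; it only scales parameter updates toward whichever behavior happened to get rewarded.

```python
# Minimal sketch: a two-armed bandit trained with a REINFORCE-style update.
# The reward signal reshapes the policy's parameters toward rewarded behavior;
# nothing here installs "caring about reward" as an explicit goal.
import numpy as np

rng = np.random.default_rng(0)
logits = np.zeros(2)             # the agent's "cognition": parameters of a 2-action policy
reward_for_action = [0.0, 1.0]   # the trainer rewards action 1

for _ in range(500):
    probs = np.exp(logits) / np.exp(logits).sum()  # softmax policy
    action = rng.choice(2, p=probs)
    reward = reward_for_action[action]
    # Policy-gradient step: raise the log-probability of the taken action
    # in proportion to the reward it earned.
    grad_log_prob = -probs
    grad_log_prob[action] += 1.0
    logits += 0.1 * reward * grad_log_prob

print(np.exp(logits) / np.exp(logits).sum())  # the policy now strongly favors action 1
```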
On another topic: you quote Yudkowsky in 2008 expressing skepticism of deep learning. I remember him in 2016 or 2017 still expressing skepticism, though much more mildly. Does anyone else recall this? Better yet, can you link to an example? [Edit: it might have been more like 2014 or 2015. Don’t remember exactly.]
Thanks so much for publishing this. It’s so refreshing to read about alignment plans that people think might work, rather than just reading about why alignment is putatively hard or why naive approaches to the problem would fail.
Demis Hassabis has publicly stated that Google DeepMind’s upcoming Gemini model will be some sort of combination of an RL agent and an LLM, but AFAIK he hasn’t given more details than that. I’m very curious to see what they’ve come up with.
Kyle Vogt responded to the New York Times article. He claims that 2.5 to 5 miles is how often Cruise vehicles request help from remote operators, not how often they actually receive it. He doesn’t say what the actual intervention rate is.
I’m a bit sus. If that number were so much better than the 2.5-5 mile figure cited by the Times, why wouldn’t he come out and say it?
This post was enjoyable as heck to read. Thanks for taking the time to write it.
I guess I’m of two minds about the effective altruism of it all.
On one hand: It kinda just seems like a bunch of self-identified effective altruists, who were well-meaning but perhaps naive, got blinded by money and suckered into servitude by a smart and charismatic leader who was successful at scamming a lot of people. Maybe there isn’t a big lesson about EA philosophy or the EA subculture. Maybe this is just like any other cult leader or con artist or corrupt CEO manipulating a lot of smart, sane, good-hearted people.
On the other hand: Maybe there’s something a bit cult-y about EA subculture and something about EA philosophy’s rejection of common sense and folk morality that made people associated with effective altruism extra susceptible to the Sam Bankman-Fried mind virus. Maybe people in the EA movement need more common sense and more folk morality. Maybe EA people also need more intellectual humility and more healthy skepticism of EA, such that they are more willing to balance EA philosophy with common sense and balance utilitarian ethics with folk morality.
I’m empathetic to the people who got taken in by SBF and I don’t judge them harshly. I’ve been scammed before. I’ve been overzealous about non-common-sense ideas before. A guy who seems really good at making money trading crypto and wants to donate it all to buy anti-malarial bed nets? On the face of it, what’s wrong with that?
Maybe the more interesting question is: why didn’t the exodus of the initial management team at Alameda result in SBF’s reputation getting destroyed in the EA community? Did the people who left not speak up enough to make that happen? Were they silenced by fear of reprisal? Were they too burnt out and defeated to do much after leaving? Were they embarrassed that Sam manipulated them?
Or did others, especially leaders in the EA community, not listen to them? Did they get blinded by dollar signs in their eyes? Did they find it easier to shoo away inconvenient allegations?
I really enjoyed reading this post. Thank you for writing it.
I guess we could say governance remains a problem with biological superintelligence? As it does with normal humans, just more so.
Beautifully written! Great job! I really enjoyed reading this story.
...in comparison to a morally purified version of SimplexAI, we might be the baddies.
Did you link to the wrong thing here, or is there some reference to generative grammar I’m not getting?
If I can attempt to synthesize these two points into a single point: don’t assume weird people are evil.
If someone walks around barefoot in an urban environment, that’s a good clue they might also be weird in other ways. But weird ≠ evil.
Principled non-conformity is a thing. Human diversity is a thing. Eccentricity is a thing.
If weirdness indicated evil, then LessWrong would be a hive of scum and villainy.
Uncritically enforcing rules and conformity to an idea of normalcy is not good. It has done great harm.