That’s funny, I’ve already changed the title from “Using Prediction Platforms To Select Quantified Self Experiments”. I guess the problem is really the block quote, which I’ll move somewhere later in the post.
That example with traders was to show that in the limit these non-EU-maximizers actually become EU-maximizers, now with linear utility instead of logarithmic. And in other sections I tried to demonstrate that they are not EU-maximizers for a finite number of agents.
First, in the expression for their utility based on the outcome distribution, you integrate a quadratic form in the distribution, instead of the linear form you integrate to compute expected utility. By itself that doesn’t prove there is no utility function, because there might be some easy degenerate cases, and I didn’t rigorously prove that this utility function can’t be split, though it feels very unlikely to me that anything can be done with such non-linearity.
Second, in the example about the Independence axiom we get two quantities that should have been equal if the objective were the expectation of some utility function, and they aren’t.
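To make the contrast concrete, here’s a rough sketch in my own notation (the kernel f, the objective V, and the mixture identities are illustrative stand-ins, not the exact expressions from the post): expected utility is linear in the outcome distribution, a quadratic objective generally isn’t, and mixture-linearity is exactly what the Independence axiom enforces.

```latex
% Expected utility: linear in the outcome distribution p
\mathbb{E}_p[u] = \int u(x)\, p(x)\, dx,
\qquad
\mathbb{E}_{\alpha p + (1-\alpha) r}[u]
  = \alpha\, \mathbb{E}_p[u] + (1-\alpha)\, \mathbb{E}_r[u].

% The kind of objective described above: a quadratic form in p
% (f is an illustrative kernel)
V(p) = \iint f(x, y)\, p(x)\, p(y)\, dx\, dy,
\qquad
V(\alpha p + (1-\alpha) r) \neq \alpha V(p) + (1-\alpha) V(r)
\quad \text{in general,}

% which is how the Independence axiom
% (p \succeq q \;\Rightarrow\; \alpha p + (1-\alpha) r \succeq \alpha q + (1-\alpha) r)
% can fail for V even though it always holds for \mathbb{E}_p[u].
```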
Both make sense. I spent ~all my mana on creating the markets, and as more mana rolls in from other bets I am subsidizing them.
I agree completely about AGI being like Turing completeness, in that there’s a threshold. However, there are programming languages that are technically Turing complete, but that only a masochist would actually try to use. So there could be a fire alarm, while the AGI is still writing all the (mental analogs of) domain-specific languages and libraries it needs. My evidence for this is humans: we’re over the threshold, but barely so, and it takes years and years of education to turn us into quantum field theorists or aeronautical engineers.
But my main crux is that I think we already know how to align an AGI: value learning. See my post Requirements for a STEM-capable AGI Value Learner. That’s an alignment technique that only works on things over the threshold.
What if all I can assign is a probability distribution over probabilities? Like in the extraterrestrial life question. All that can be said is that extraterrestrial life is sufficiently rare that we haven’t found evidence of it yet. Our observation of our own existence is conditioned on our existence, so it doesn’t provide much evidence one way or another.
Should I sample the distribution to give an answer, or maybe take the mode, or the mean, or the median? I’ve chosen a value that is far from both extremes, but I might have done something else, with no clear justification for any of the choices.
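As a toy illustration of that choice (the Beta(2, 30) distribution here is a made-up stand-in for “life seems rare, but I’m quite uncertain”, not anything from the original question):

```python
# Toy example: a distribution over the probability itself, and the different
# point estimates one could report from it. The Beta(2, 30) belief is purely
# illustrative.
from scipy import stats

belief = stats.beta(a=2, b=30)

mean   = belief.mean()              # expected probability (one natural choice)
median = belief.median()
mode   = (2 - 1) / (2 + 30 - 2)     # (a-1)/(a+b-2) for a Beta distribution
sample = belief.rvs(random_state=0) # or just sample it and report that

print(f"mean={mean:.3f}  median={median:.3f}  mode={mode:.3f}  sample={sample:.3f}")
```

For a skewed distribution like this the four numbers differ noticeably, which is exactly the ambiguity being pointed at.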
We have to get the AI’s values exactly aligned with human values
This is a big crux for me, and one of the major reasons my P(DOOM) isn’t >90%. If you use value learning, you only need to get your value learner aligned well enough to a) be inside the region of convergence, and b) not kill everyone while it’s learning, and it will do its research and Bayesianly converge on human values (and if it’s not capable of being competently Bayesian enough to do that, it’s not superhuman, at least at STEM). So, if you use value learning, the only piece you need to get exactly right is the phrasing of the terminal goal saying “Use value learning”. For something containing an LLM, I think that might be about one short paragraph of text, possibly with one equation in it. The prospect of getting one paragraph of text and one equation right, with enough polishing and peer review, doesn’t actually seem that daunting to me.
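As a cartoon of what “Bayesianly converge on human values” means here (everything below, including the two candidate value functions and the noise model, is my own toy illustration, not anything from the linked post): a value learner that keeps a posterior over candidate value functions and updates on observed human choices concentrates, under the usual conditions, on whichever candidate actually explains the choices.

```python
# Cartoon value learner: Bayesian updating over a small set of candidate
# "human value functions", given noisy observations of human choices.
import random

random.seed(0)

# Candidate hypotheses: each maps an option to how much humans value it.
candidates = {
    "values_art":   {"art": 1.0, "paperclips": 0.0},
    "values_clips": {"art": 0.0, "paperclips": 1.0},
}
posterior = {name: 0.5 for name in candidates}

def choice_likelihood(hypothesis, chosen, other, noise=0.1):
    """P(human picks `chosen` over `other` | hypothesis), with some noise."""
    better = hypothesis[chosen] >= hypothesis[other]
    return (1 - noise) if better else noise

for _ in range(20):
    # Observe one (noisy) human choice; the "true" values favor art.
    chosen, other = ("art", "paperclips") if random.random() > 0.1 else ("paperclips", "art")
    # Bayes update on that observation.
    for name, hyp in candidates.items():
        posterior[name] *= choice_likelihood(hyp, chosen, other)
    total = sum(posterior.values())
    posterior = {name: p / total for name, p in posterior.items()}

print(posterior)  # mass concentrates on "values_art"
```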
Approaches to alignment stability
I view this as pretty much a solved problem, solved by value learning. Though there are then issues due to the mutability of human values.
if and how humans are stably aligned
Humans are NOT aligned. Humans are not selfless, caring only about the good of others. Joseph Stalin was not aligned with the citizenry of Russia. If humans were aligned, we wouldn’t need law enforcement, or locks. Humans cannot safely be trusted with absolute power or the sorts of advantages inherent to being a digital intelligence. They’re just less badly aligned than a paperclip maximizer.
As a historical note / broader context, the worry about model-class over-expressivity has been around since the early days of Machine Learning. There was a mistrust of large blackbox models like random forests and SVMs, despite their unusually low test or even cross-validation loss, citing the models’ ability to fit noise. Breiman’s frank commentary back in 2001, “Statistical Modeling: The Two Cultures”, touches on this among other worries about ML models. The success of ML has turned this worry into the generalisation puzzle, with Zhang et al. (2017) being a call to arms when DL greatly exacerbated the scale and urgency of the problem.
Yeah, it surprises me that Zhang et al. (2017) has had the impact it did when, as you point out, the ideas have been around for so long. Deep learning theorists like Telgarsky point to it as a clear turning point.
Naive optimism: hopefully progress towards a strong resolution of the generalisation puzzle gives us enough understanding to gain control over what kinds of solutions are learned. And one day we can ask for more than generalisation, like “generalise and be safe”.
This I can stand behind.
pauses training to do alignment work
There’s yet another approach: conditional training, where the LLM is aligned during the pretraining phase. See How to Control an LLM’s Behavior for more details.
Several of the superscalers have public plans of the form: Step 1) build an AI scientist, or at least a research assistant; 2) point it at the Alignment Problem; 3) check its output until the Alignment Problem is solved; 4) Profit!
This is basically the same proposal as Value Learning, just done as a team effort.
I just gave this a re-read; I forgot what a trip it is to read the thoughts of Eliezer Yudkowsky. It continues to be some of my favorite stuff written on LessWrong in recent years.
It’s hard to relate to the world with the level of mastery over basic ideas that Eliezer has. I don’t mean by this to vouch that his perspective is certainly correct, but I believe it is at least possible, and so I think he aspires to a knowledge of reality that I rarely if ever aspire to. Reading it inspires me to really think about how the world works, and to really figure out what I know and what I don’t. +9
(And the smart people dialoguing with him here are good sports for keeping up their side of the argument.)
I started asking other folks in AI Governance. The vast majority had not talked to congressional staffers (at all).
??? WTF do people “in AI governance” do?
I’ve read your explanation of what happened, and it still seems like the board acted extremely incompetently. Call me an armchair general if you want. Specifically, I take issue with:
The decision to fire Sam, instead of just ejecting him from the board
Kicking Sam off the board, firing him, and kicking Greg off, all at once and with no real explanation, is what ultimately gives Sam the casus belli for organizing the revolt in the first place, and there’s literally no need for it.
Consider instead what happens if Sam just loses his board seat. First, his cost-benefit analysis looks different: Sam still has most of what he had before to lose, namely his actual position at OpenAI. Second, quitting in protest and moving to Microsoft suddenly now looks incredibly vindictive. And if Sam tries to use his position as CEO to sabotage the company or subvert the board further, he’s giving you more ammunition to fire him later if you really need to.
If I had been on the board, my first action after getting the five together would have been to call Greg and Mira into an office room and explain what was going on. Then, after a long conversation (whether or not they agreed with the decision), I’d call Sam in, or reach him over the internet, and deliver the news that he was no longer a board member. I’d then overtly explain the reasoning behind why he’s losing the board seat (“we felt you were trying to compromise the integrity of the board with your attacks on Helen”), starting with the statement that the decision has already been made, so there are no takebacks. If it’s appropriate, we offer him the option to save face and say that he voluntarily resigned to keep the board independent. Even if he says no, he’s pretty much incapable of pulling any of the shenanigans he pulled over that weekend. And if the objection is that Sam’s mystical powers of persuasion will be used to corrupt the organization further down the road, well, now you’ve at least created common knowledge of his intent to capture OpenAI, and you’ve taken away his vote, so it should be much easier to shut that down.
The decision never to explain why they ejected Sam.
Not much explanation necessary here. People desperately wanted to know why the board fired him and whether or not it was something beyond Sam not being EA-affiliated. Which it was! So just fucking say that, and now it’s on Sam to prove or disprove your accusations. People (I’d wager even people inside OpenAI who feel some semblance of loyalty to him) do not actually need that much evidence to believe that Sam Altman, Silicon Valley’s career politician, is a snake and was trying to corrupt the organization he was a part of. Say you have private information; explain precisely the things you explain in the above comment. That’s way better than saying nothing, because if you say nothing then Sam can just insert whatever narrative he wants.
The decision not to be aggressive in denouncing Sam after he started actively threatening to destroy the company.
Beyond communicating the motivation behind the initial decision, the board (Ilya ideally, if you can get him to do this) should have been on Twitter the entire time screaming at the top of their lungs that Sam’s actions were quintessentially petty and that he was willing to burn the entire company down in order to hold on to power, and that while kicking Sam off the board was a tough call and many tears were shed, everything that happened over the last three days (destroying all of your hard-earned OpenAI equity and handing it to Microsoft, etc.) was a resounding endorsement of their decision to fire him, and that they will never surrender, etc. etc. etc. The only reason Sam’s strategy of feeding info to the press about his inevitable return worked in the first place was that the board stayed completely fucking silent the entire time and refused to give any hint as to what they were thinking to either the staff at OpenAI or the general public.
Outside of full-blown deceit-leading-to-coup and sharp-left-turn scenarios where everything looks just fine until we’re all dead, alignment and capabilities often tend to be significantly intertwined: few things are purely Alignment, and it’s often hard to determine the ratio of the two (at least without the benefit of tech-tree hindsight). Capabilities are useless if your LLM capably spews stuff that gets you sued, and it’s also rapidly becoming the case that a majority of capabilities researchers/engineers even at superscalers acknowledge that alignment (or at least safety) is a real problem that actually needs to be worked on, and that their company has a team doing so. (I could name a couple of orgs that seem like exceptions to this, but they’re now in a minority.) There’s an executive order that mentions the importance of Alignment, the King of England made a speech about it, and even China signed on to a statement about it (though one suspects they meant alignment to the Party).
Capabilities researchers/engineers outnumber alignment researchers/engineers by more than an order of magnitude, and some of them are extremely smart. The probability that any given alignment researcher/engineer has come up with a key capabilities-enhancing idea that has eluded every capabilities researcher/engineer out there, and that will continue to elude them for very long, seems pretty darned low (and assuming otherwise is also rather intellectually arrogant). [Yes, I know Conjecture sat on chain-of-thought prompting, for a month or two, while multiple other people came up with it independently and then wrote and published papers, or didn’t. Any schoolteacher could have told you that was a good idea; it wasn’t going to stay secret.]
So, (unless you’re pretty sure you’re a genius) I don’t think people should worry quite as much about this as many seem to. Alignment is a difficult, very urgent problem. We’re not going to solve it in time while wearing a gag, nor with one hand tied behind our back. Caution makes sense to me, but not the sort of caution that makes it much slower for us to get things done: we’re not in a position to slow Capabilities by more than a tiny fraction, no matter how closed-mouthed we are, but we’re in a much better position to slow Alignment down. And if your gears-level predictions are about the prospects of things that multiple teams of capabilities engineers are already working on, go ahead and post them; I could be wrong, but I don’t think Yann LeCun is reading the Alignment Forum. Yes, ideas can and do diffuse, but that takes a few months, and that’s about how far apart most parallel inventions land anyway. If you’ve been sitting on a capabilities idea for >6 months, you’ve done literature searches to confirm no one else has published it, and you’re not in fact a genius, then there’s probably a reason why none of the capabilities people have published it yet.
reducing suffering
…by painlessly reducing the human population to zero. Or in the gamma → 1 limit, the painlessness becomes a nice-to-have.
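One way to unpack the quip in symbols (my formalization, not anything from the original comment): if the objective is minimizing total discounted suffering, extinction trades a one-time cost against the whole future stream, and as gamma approaches 1 that trade always wins.

```latex
% Total discounted suffering, with discount factor \gamma and per-period suffering s_t:
S = \sum_{t=0}^{\infty} \gamma^{t} s_t.
% Painful extinction at t = 0 with one-time suffering P:    S_{\text{ext}} = P.
% Business as usual with suffering s > 0 per period:        S_{\text{bau}} = \frac{s}{1-\gamma}.
% As \gamma \to 1, S_{\text{bau}} \to \infty, so any finite P is "worth it":
% painlessness becomes a nice-to-have rather than a requirement.
```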
7 months later, we now know that this is true. Also, we now know that you can take the output from a prompted/scaffolded LLM and use it to fine-tune another LLM to do the same things without needing the prompt/scaffold.
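For concreteness, a minimal sketch of that second claim, distilling a prompted/scaffolded model’s behavior into a plain model by fine-tuning on its transcripts; the base model, the placeholder data, and the hyperparameters are all illustrative assumptions, not the setup from any particular result.

```python
# Sketch: fine-tune a student model on (input, output) transcripts collected
# from a prompted/scaffolded teacher, so the student reproduces the behavior
# without the prompt or scaffold. Data and model choice are placeholders.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

# Hypothetical transcripts generated by the scaffolded teacher model.
transcripts = [
    {"text": "User: <task>\nAssistant: <teacher's scaffolded answer>"},
    {"text": "User: <another task>\nAssistant: <another answer>"},
]

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained("gpt2")

def tokenize(example):
    return tokenizer(example["text"], truncation=True, max_length=512)

dataset = Dataset.from_list(transcripts).map(tokenize, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="distilled-student", num_train_epochs=1),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # the student now imitates the scaffolded behavior directly
```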
Yep, I am working on it right now!
The LessWrong moderation team will take the voting results as a strong indicator of which posts to include in the Best of 2022 sequence.
Will there also be a Best of 2021 sequence at some point?
Modern congressional staffers are the product of Goodhart’s law; 50-100 years ago, they were the ones that ran Congress de facto, so all the businessmen and voters wanted to talk to them, and so the policymaking ended up moving elsewhere. Just like what happened with congressmen themselves 100-150 years ago. Congressional staffers today primarily take constituent calls from voters and make interest groups think they’re being listened to. Akash’s accomplishments came from wading through that bullshit, meeting people through people until he managed to find some gems.
Most policymaking today is called in from outside, with lobbyists having the domain expertise needed to write the bills, and senior congressional staffers (like the legislative directors and legislative assistants here) overseeing the process, usually without getting very picky about the details.
It’s not like congressmembers have no power, but they’re just one part of what’s called an “iron triangle”: the congressional lawmakers, the executive-branch bureaucracies (e.g. FDA, CDC, DoD, NSA), and the private-sector companies (e.g. Walmart, Lockheed, Microsoft, Comcast), with the lobbyists circulating around the three, negotiating and cutting deals between them. It’s incredibly corrupt and always has been, but not all-crushingly corrupt like African governments. It’s like the Military-Industrial Complex, except that’s actually a bad example because Congress is increasingly out of the loop de facto on foreign policy (most structures are idiosyncratic, because the fundamental building block is people who are thinking of ways to negotiate backdoor deals).
People in the executive branch/bureaucracies like the DoD have more power on interesting things like foreign policy; Congress is more powerful for things that have been entrenched for decades, like farming policy. Think tank people have no power, but they’re much less stupid, have domain expertise, and are often called up to help write bills instead of lobbyists.
I don’t know how AI policy is made in Congress; I jumped ship from domestic AI policy to foreign AI policy 3.5 years ago in order to focus more on the incentives from the US-China angle. Akash is the one to ask about where AI policymaking happens in Congress, as he was the one actually there, deep in the maze (maybe via DM, because he didn’t describe it in this post).
I strongly recommend people talk to John Wentworth about AI policy, even if he doesn’t know much at first; after looking at Wentworth’s OpenAI dialog, he’s currently my top predicted candidate for “person who starts spending 2 hours a week thinking about AI policy instead of technical alignment and thinks up galaxy-brained solutions that break stalemates that have thwarted the AI policy people for years”.