Running Lightcone Infrastructure, which runs LessWrong and Lighthaven.space. You can reach me at habryka@lesswrong.com.
(I have signed no contracts or agreements whose existence I cannot mention, which I am mentioning here as a canary)
I don’t feel great about my donations to a nonprofit funding their “hotel/event venue business” (as I would call it)
The nice thing about Lighthaven is that it mostly funds itself! Our current expected net-spending on Lighthaven is about 10% of our budget, largely as a result of subsidizing events and projects here that couldn’t otherwise exist. I think looking at that marginal expenditure Lighthaven is wildly cost-effective if you consider any of the organizations that run events here that we subsidize to be cost-effective.
because if, e.g., someone was considering whether it’s important to harm the third party now rather than later and telling them the information that I shared would’ve moved them towards harming the third party earlier, Oliver would want to share information with that someone so that they could harm the third party.
No, I didn’t say anything remotely like this! I have no such policy! I don’t think I ever said anything that might imply such a policy. I only again clarified that I am not making promises about not doing these things to you. I would definitely not randomly hand out information to anyone who wants to harm the third party.
At this point I am just going to stop commenting every time you summarize me inaccurately, since I don’t want to spend all day doing this, but please, future readers, do not assume these summaries are accurate.
Then, after hearing Oliver wouldn’t agree to confidentiality given that I haven’t asked him for it in advance
I have clarified like 5 times that this isn’t because you didn’t ask in advance. If you had asked in advance I would have rejected your request as well, it’s just that you would have never told me in the first place.
don’t try to tell people specifically for the purpose of harming the third party
This is also not what you asked for! You said “I just ask you to not use this information in a way designed to hurt [third party]”, which is much broader. “Not telling people” and “not using information” are drastically different. I have approximately no idea how to commit to “not use information for purpose X”. Information propagates throughout my world model. If I end up in conflict with a third party I might want to compete with them and consider the information as part of my plans. I couldn’t blind myself to that information when making strategic decisions.
I think you could have totally written a post that focused on communicating that, and it could have been a great post! Like, I do think the cost of keeping secrets is high. Both me and other people at Lightcone have written quite a bit about that. See for example “Can you keep this confidential? How do you know?”
Many words followed only after I expressed surprise and started the discussion (“about 3000 words of explanation, and a 1-2 hour chat conversation” is false, there were fewer than 2k words from your side in the entire conversation, many of which were about the third party, unrelated to your explanations of your decision procedures etc., a discussion of making a bet that you ended up not taking, some facts that you got wrong, etc.)
When I just extracted my messages from the thread I was referencing and threw them into a wordcounter I got 2,370 words in my part of the conversation (across three parallel threads), which is close enough to 3,000 that I feel good about my estimate. I do now realize that about 500 of those were a few weeks later (but still like a month ago), so I would have now said more like 2000 words to refer to that specific 1-2 hour conversation (do appreciate the correction, though I think in this context the conversation a few weeks later makes sense to include).
Among the almost 2000 words, you did not describe this procedure even once.
I brought it up as a consideration a few times. (Example: “Like, to be clear, I definitely rather you not have told me instead of demanding that [I] ‘only use the information to coordinate’ afterwards”). I agree I didn’t outline my whole decision-making procedure, but I did explain my reasoning.
that you think it is insane to expect people to use information in ways that align with important preferences.
Sorry, I am not parsing this. My guess is you meant to say something else than “important preferences” here?
I think you’re misrepresenting what I asked; I asked you to not use it adversarially towards the third party, as it seemed to me as a less strong demand than confidentiality
It’s plausible I am still not understanding what you are asking. To be clear, what you asked for seemed to me substantially costlier than confidentiality (as I communicated pretty early on after you made your request). I have hopefully clarified my policies sufficiently now.
This kind of stuff is hard and we are evidently not on the same page about many of the basics, and that’s part of why I don’t feel comfortable promising things here, since my feeling is that you feel pretty upset already about me having violated something you consider an important norm, and I would like to calibrate expectations.
This is great!
Don’t make my helping you have been a bad idea for me
Yeah, I think this is a good baseline to aspire to, but of course the “my helping you” is the contentious point here. If you hurt me, and then also demand that I make you whole, then that’s not a particularly reasonable request. Why should I make you whole when I am already not whole myself!
Sometimes interactions are just negative-sum. That’s the whole reason why it does usually make sense to check-in beforehand before doing things that could easily turn out to be negative sum, which this situation clearly turned out to be!
I mostly just want people to become calibrated about the cost of sharing information with strings attached. It is quite substantial! It’s OK for that coordination to happen based on people’s predictions of each other, without needing to be explicitly negotiated each time.
I would like it to be normalized and OK for someone to signal pretty heavily that they consider the cost of accepting secrets, or, even more intensely, the cost of accepting information that can only be used to the benefit of another party, to be very high. People should therefore model that kind of request as likely to be rejected: if you just spew information onto the other party and also expect them to keep it secret or to use it only for your benefit, the other party is likely to stop engaging with you, or to tell you that they aren’t planning to meet your expectations.
I think marginally the most important thing to do is to just tell people who demand constraints on information, without wanting to pay any kind of social cost for it, to pound sand.
Like, with all this new information I now am a tiny bit more wary of talking in front of Habryka.
Feel free to update on “Oliver had one interaction ever with Mikhail in which Oliver refused to make a promise that Mikhail thought reasonable”, but I really don’t think you should update beyond that. Again, the summaries in this post of my position are very far away from how I would describe them.
There is a real thing here, which you should know about if you don’t already, which is that I do really think confidentiality and information-flow constraints are very bad for society. They are, as far as I can tell, the cause of a majority of major failures in my ecosystem in the last few years, and mismanagement of e.g. confidentiality norms has been catastrophic in many ways, so I do have strong opinions about this topic! But the summary of my positions on this topic is really very far from my actual opinions.
and that you replied “lol, no” after a week.
No, what I did is reply with “lol, no” followed by about 3000 words of explanation across a 1-2 hour chat conversation, detailing my decision procedures, and what I am and am not happy to do. Like, I really went into a huge amount of detail, gave concrete specific examples, and elaborated what I would do. Much of this involved Mikhail insisting on a very specific interpretation of what reasonable conduct is and clarifying multiple times that yes, he wouldn’t want me to use information like this under any circumstance in any kind of way adversarial to the third party the information is about, and that it would be unreasonable for me to reject such a request.
As an interested third party who generally would like to work with LightConeInfra and you, unrelated to Mikhail’s specific asks, I’m curious whether you broadly agree to put some non-trivial decision weight on not using info people give you in ways they strongly disagree with, even if they didn’t ask you to precommit to that, even if they were mistaken in some assumptions. (If you later get that info from other places you’re ~released from the first obligations, tho this shouldn’t be gamed)
Of course! See my general process described above. If you tell me something in secret, or ask me to put some kind of constraint on information, I will check whether I would have accepted that information with that constraint in advance. If I would have, I am happy to agree to it afterwards. Similarly, if I think you have some important preference, but you just forgot to ask me explicitly, or we didn’t have time to discuss it, or it’s just kind of obvious that you have this preference, I will do the same.
I have a bunch more thoughts, but I don’t super want to prop up this comment section by writing stuff that I actually think is worth reading in general. I’ll post my more cleaned-up thoughts somewhere else and link them.
I think this whole thread is a waste of time and I don’t want to engage with it. I definitely think both the post and Mikhail’s comments should be downvoted, and think others should downvote them too!
Like, please model the costs of upvoting here. If a comment is bad, please just downvote it. Please don’t do the weird thing where you think the comment is bad, oh, but it would be so spicy and interesting if the comment was upvoted instead and so I could get more replies out of the people the comments are demanding attention from. These kinds of threads are super costly to engage in.
Like, if you want to know more, just write comments yourself and ask your questions. Nobody is going to be happy if for some reason you force me to engage with Mikhail more. I am happy to answer questions, but engaging with Mikhail on this is just beyond frustrating at this point.
I mean, sure, you can believe that for whatever reason. It’s definitely not something I said, and something I explicitly disclaimed like 15 times now!
and would like to use the information to hurt the third party given an opportunity.
Please stop misquoting me, come on, I have clarified this like 15 times now. Please. How many more times must I say this? All I am saying is that I am not committing to never do anything with information of this kind that hurts the third party, that is a drastically different kind of thing!
This is not the request that I made. I asked to not use information adversarially: to not try to cause harm to the third party using it.
You said: “I don’t think I have any reason to ask you to not consider it in your plans insofar as these considerations are not hurting their interests or whatever” when I asked for clarification. This clearly implies you are asking me to not consider this information in my plans if doing so would hurt their interests!
You also clarified multiple other times that you were asking me to promise to not use this information in any future conflicts or anything like that, or to make plans on its basis that would somehow interfere with the other party’s plans, even if I thought they would cause grave harm if I didn’t interfere.
You didn’t signal in any way that any of that stuff was an option.
I am really not very optimistic about making agreements with you in particular, based on how the one conversation I’ve ever had with you went. So no, that is not an option, though I will still try to do good by what I think you care about. But I do not want to risk you forming more expectations about how I will behave, which you then get angry at me for and try to strongarm me into various things I don’t want to do. It’s not been fun dealing with you on this!
(2) is dependent on you not having ways to use the information to hurt the third party.
This is just false. I am not going around trying to randomly hurt people. All I am saying, and will continue to say, is that I am not promising you that I will use this information only in ways you approve of, or the third party would approve of. The bar is much higher than simply “an opportunity presents itself to hurt the third party!”, as I have told you multiple times!
People who donated to keep Lighthaven going are not particularly happy about this.
Feel free to do a survey on this! I am sure almost all of our donors of course have some exchange rate at which, instead of them donating, we provide epsilon value to an AI company and they then use their money to do other good things in the world. I would be extremely surprised if your statement were true in any kind of generality.
talked to friends who previously have or considered donating large amounts to Lightcone, and then regretted that/decided not to after learning about all this.
Almost none of the information in this post is correct! If they updated because of takes like this post, then I think they just made a mistake.
To anyone else: please reach out to me if you somehow made updates in this direction, I would be highly surprised if you end up endorsing it. The only thing that seems plausible to me as a real update in the space is that for a high enough tax we will host basically arbitrary events at Lighthaven (not literally arbitrary, but like, I think we should have some price for basically anything, and I expect the tax to sometimes be very high). If you really don’t want that you should at least let me know! You can also leave comments here and I’ll be glad to respond.
Separately, I think it’s good to invite people like Sam Altman to events like the Progress Conference, and would of course want Sam to be at important diplomatic meetings. If you think that’s always bad, then I do think Lighthaven might be bad! I am definitely hoping for it to facilitate conversations between many people I think are causing harm for the world.
Supporting the idea that the criticisms are false with a note on “Mikhail must’ve not had time” is weird, especially given that I explicitly told you all that I find the arguments in your comments invalid and didn’t want to reply in detail from my phone.
Look, “three hours on a Saturday night” is not the right amount of time to give someone if you are asking them for input on a post like this. I mean, you could have just not asked for input at all, but it’s clearly not an amount of time that should give you any confidence you got the benefits of input.
A lot of the claims about me, and about Lightcone, in this post are false, which is sad. I left a large set of comments on a draft of this post, pointing out many of them, though not all of them got integrated before the post was published (presumably because this post was published in a rush, as Mikhail is part of Inkhaven and decided to make this his first post of Inkhaven, and only had like 2 hours to get and integrate comments).
A few quick ones, though this post has enough errors that I mostly just want people to really not update on this at all:
Oliver said that Lightcone would be fine with providing Lighthaven as a conference venue to AI labs for AI capabilities recruiting, perhaps for a higher price as a tax.
This is technically true, but of course the whole question lies in the tax! I think the tax might be quite large, possibly enough to cover a large fraction of our total operational costs for many months (like a 3-4x markup on our usual cost of hosting such an event, or maybe even more). If you are deontologically opposed to Lighthaven ever hosting anything that has anything even vaguely to do with capability companies, no matter the price, then yeah, I think that’s a real criticism, but I also think it’s a very weird one. Even given that, at a high enough price, the cost to the labs would be virtually guaranteed to be more than they would benefit from it, making it a good idea even if you are deontologically opposed to supporting AI companies.
he said he already told some people and since he didn’t agree to the conditions before hearing the information, he can share it, even though wouldn’t go public with it.
The promise that Mikhail asked me to make was, as far as I understood it, to “not use any of the information in the conversation in any kind of adversarial way towards the people who the information is about”. This is a very strong request, much stronger than confidentiality (since it precludes making any plans on the basis of that information that might involve competing or otherwise acting against the interests of the other party, even if they don’t reveal any information to third parties). This is not a normal kind of request! It’s definitely not a normal confidentiality request! Mikhail literally clarified that he thought that it would only be OK for me to consider this information in my plans, if that consideration would not hurt the interests of the party we were talking about.
And he sent the message in a way that somehow implied I was already supposed to have signed up for that policy, as if it were the most normal thing in the world, with no sense that this is a costly request to make (as if it wasn’t even worth making the request explicitly, and as if it would be fine to prosecute someone for violating it even though it had never been clarified as an expectation by the other side at all).
He just learned that keeping secrets is bad in general, and so he doesn’t by default, unless explicitly agrees to.
This is not true! My policy is simply that you should not assume that I will promise to keep your secrets after you tell me, if you didn’t check with me first. If you tell me something without asking me for confidentiality first, and then you clarify that the information is sensitive, I will almost always honor that! But if you show up and suddenly demand of me that I will promise that I keep something a secret, without any kind of apology or understanding that this is the kind of thing you do in advance, of course I am not going to just do whatever you want. I will use my best judgement!
My general policy here is that I will promise to keep things secret retroactively, if I would have agreed to accept the information with a confidentiality request in advance. If I would have rejected your confidentiality request in advance, you can offer me something for the cost incurred by keeping the secret. If you don’t offer me anything, I will use my best judgement and not make any intense promises but broadly try to take your preferences into account in as much as it’s not very costly, or offer you some weaker promise (like “I will talk about this with my team or my partner, but won’t post it on the internet”, which is often much cheaper than keeping a secret perfectly).
Roughly the aim here is to act in a timeless fashion and to not be easily exploitable. If I wouldn’t have agreed to something before, I won’t agree to it just because you ask me later, without offering me anything to make up the cost to me!
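If it helps to see the shape of that rule all in one place, here is a toy sketch (made-up function and argument names; obviously the real judgement calls don’t reduce to code):

```python
# Toy sketch of the "timeless" policy described above; names are illustrative
# and the real decisions involve judgement calls that don't reduce to code.
def respond_to_retroactive_confidentiality_request(
    would_have_accepted_in_advance: bool,
    something_offered_to_cover_the_cost: bool,
) -> str:
    if would_have_accepted_in_advance:
        # If I would have taken the information with this constraint attached,
        # I'm happy to agree to the constraint after the fact.
        return "agree to keep it secret retroactively"
    if something_offered_to_cover_the_cost:
        # Otherwise the requester can offer something for the cost of the secret.
        return "consider the offer; agree if it covers the cost of keeping the secret"
    # No promise: best judgement, cheap accommodation, or a weaker commitment.
    return ("no strong promise; take the preference into account where it isn't "
            "very costly, or offer something weaker (e.g. team/partner only, "
            "not posting it on the internet)")
```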
And to repeat the above again, the request here was much more intense! The request, as I understood it, was basically “don’t use this information in any kind of way that would hurt the party the information is about, if the harm is predictable”, which I don’t even know how to realistically implement at a policy level. Of course if I end up in conflict with someone I will use my model of the world which is informed by all the information I have about someone!
And even beyond that, I don’t think I did anything with the relevant information that Mikhail would be unhappy about! I have indeed been treating the information as sensitive. This policy might change if at some point the information looks more valuable to communicate. Mikhail seems only angry about me not fully promising to do what he wants, without him offering me anything in return, and despite me thinking that I would not have agreed to any kind of promise like this in the first place if I had been asked before receiving the information (and would have just preferred to never receive the information in the first place).
I ask Oliver to promise that he’s not going to read established users’ messages without it being known to others at Lightcone Infrastructure and without a justification such as suspected spam, and isn’t going to share the contents of the messages.
We’ve had internal policies here for a long time! We never look at DMs unless one of the users in the conversation reports a conversation as spam. Sometimes DM contents end up in error logs, but I can’t remember a time where I actually saw any message contents instead of just metadata in the 8 years that I’ve been working on LW (but we don’t have any special safeguards against it).
We look at drafts that were previously published. We also sometimes look at early revisions of posts that have been published for debugging purposes (not on-purpose, but it’s not something we currently have explicit safeguards or rules about). We never look at unpublished drafts, unless the user looks pretty clearly spammy, and never for established users.
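To make the rule concrete, here is a toy sketch of the policy as I’d state it (hypothetical names; this is not our actual codebase):

```python
# Toy sketch of the access policy described above; names are made up and this
# is not the actual LessWrong/ForumMagnum code, just the rule written out.
from dataclasses import dataclass

@dataclass
class Author:
    is_established: bool
    looks_clearly_spammy: bool

def may_view_dm_contents(reported_as_spam_by_participant: bool) -> bool:
    # DM contents only get looked at if a user in the conversation reports it as spam.
    return reported_as_spam_by_participant

def may_view_post_revision(author: Author, post_was_ever_published: bool) -> bool:
    # Revisions of already-published posts may be viewed (e.g. for debugging).
    if post_was_ever_published:
        return True
    # Unpublished drafts: only for clearly spammy accounts, never for established users.
    return author.looks_clearly_spammy and not author.is_established
```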
It shouldn’t cost hundreds of thousands of dollars to keep a website running and moderated and even to ship new features with the help from the community.
Look, we’ve had this conversation during our fundraiser. There is zero chance of running an operation like LW 2.0 long-term without it somehow costing at least $200k/yr. Even if someone steps up and does it for free, that is still them sacrificing at least $200k in counterfactual income, if they are skilled enough to run LessWrong in the first place. I think even with a minimal skeleton crew, you would be looking at at least $300k of costs.
The cost of running/supporting LessWrong is much lower than Lightcone Infrastructure’s spending.
This is false! Most of our spending is LessWrong spending these days (as covered in our annual fundraiser post). All of our other projects are much closer to paying for themselves. Most of the cost of running Lightcone is the cost of running LessWrong (since it’s just a fully unmonetized product).
IDK, I am pretty sad about this post. I am happy to clarify my confidentiality policies and other takes on honoring retroactive deals (which I am generally very into, and have done a lot of over the years), if anyone ends up concerned as a result of it.
I will be honest in that it does also feel to me like this whole post was written in an attempt at retaliation when I didn’t agree with Mikhail’s opinions on secrets and norms. Like, I don’t think this post was written in an honest attempt at figuring out whether Lightcone is a good donation target.
I mean, I think running ML experiments is like the other most cursed thing that I feel like I have to interface with on a regular basis.
Hmm, I guess you mean “an alignment plan conditional on no governance plan?”. Which is IMO a kind of weird concept. Your “safety plan” for a bridge or a power plant should be a plan that makes it likely the bridge doesn’t fall down, and the power plant doesn’t explode, not a plan that somehow desperately tries to attach last-minute support to a bridge that is very likely to collapse.
Like, I think our “alignment plan” should be a plan that has much of any chance of solving the problem, which I think the above doesn’t (which is why I’ve long advocated for calling the AI 2027 slowdown ending the “lucky” ending, because really most of what happens in it is that you get extremely lucky and reality turns out to be very unrealistically convenient).
I don’t think the above plan is a reasonable plan in terms of risk tradeoffs, and in the context of this discussion I think we should mostly be like “yeah, I think you don’t really have a shot of solving the problem if you just keep pushing on the gas, I mean, maybe you get terribly lucky, but I don’t think it makes sense to describe that as a ‘plan’”.
Like, approximately anyone “working on the above plan” would, by my guess, have 20x-30x more impact if they instead focused their efforts on a mixture of a plan that works without getting extremely lucky, plus pushing timelines back to make it more likely that you have the time to implement such a plan.
(Of course I expect some people to disagree with me here, maybe including you, which is fine, just trying to explain my perspective here)
(What are the best alternatives? Curious for answers to that question.)
Don’t build AGI for a long time. Probably make smarter humans. Build better coordination technology so that you can be very careful with scaling up. Do a huge enormous amount of mechinterp in the decades you have.
Copy-pasting what I wrote in a Slack thread about this:
My current take, having thought a lot about a few things in this domain, but not necessarily this specific question, is that the only dimensions where the empirical evidence feels like it was useful, besides a broad “yes, of course the problems are real, and AGI is possible, and it won’t take hundreds of years” confirmation, are the dynamics around how much you can steer and control near-human AI systems to perform human-like labor.
I think almost all the evidence for that comes from just the scaling up, and basically none of it comes from safety work (unless you count RLHF as safety work, though of course the evidence there is largely downstream of the commercialization and scaling of that technology).
I can’t think of any empirical evidence that updated me much on what superintelligent systems would do, even if they are the results of just directly scaling current systems, which is the key thing that matters.
A small domain that updated me a tiny bit, though mostly in the direction of what I already believed, is the material advantage research with stuff like LeelaOdds, which demonstrated more cleanly you can overcome large material disadvantages with greater intelligence in at least one toy scenario. The update here was really small though. I did make a bet with one person and won that one, so presumably it was a bigger update for others.
I think a bunch of other updates for me are downstream of “AIs will have situational awareness substantially before they are even human-level competent”, which changes a lot of risk and control stories. I do think the situational awareness studies were mildly helpful for that, though most of it was IMO already pretty clear by the release of GPT-4, and the studies are just helpful for communicating that to people with less context or who use AI systems less.
Buck: What do you think we’ve learned about how much you can steer and control the AIs to perform human-like labor?
Me: It depends on what timescale. One thing that I think I updated reasonably strongly on is that we are probably not going to get systems with narrow capability profiles. The training regime we have seems to really benefit from throwing a wide range of data at it, and the capital investments to explicitly train a narrow system are too high. I remember Richard a few years ago talking about building AI systems that are exceptionally good at science and alignment, but bad at almost everything else. This seems a bunch less likely now. And then there is just a huge amount of detail on what things I expect AI to be good and bad at, at different capability levels, based on extrapolating current progress. Some quick updates here:
Models will be better at coding than almost any other task
Models will have extremely wide-ranging knowledge in basically all fields that have plenty of writing about them
It’s pretty likely natural language will just be the central interface for working with human-level AI systems (I would have had at least some probability mass on more well-defined objectives, though I think in retrospect that was kind of dumb)
We will have multi-modal human-level AIs, but it’s reasonably likely we will have superhuman AIs in computer use and writing substantially before we have AIs that orient to the world at human reaction speeds (like, combining real-time video, control and language is happening, but happening kind of slowly)
We have different model providers, but basically all the AI systems behave the same, with their failure modes and goals and misalignment all being roughly the same. This has reasonably-big implications for hoping that you can get decorrelated supervision by using AIs from different providers.
Chains of thought will stop being monitorable soon, but it stayed monitorable for an IMO mildly surprisingly long length of time. This suggests there is maybe more traction on keeping chains of thought monitorable than I would have said a few months ago.
The models will just lie to you all the time, everyone is used to this, you cannot use “the model is lying to me or clearly trying to deceive me” as any kind of fire alarm
Factored cognition seems pretty reliably non-competitive with just increasing context-lengths and doing RL (this is something I believed for a long time, but every year of the state of the art still not involving factored cognition is more evidence in this direction IMO, though I expect others to find this point kind of contentious)
Elicitation in general is very hard, at least from a consumer perspective. There are tons of capabilities that the models demonstrate in one context that are very hard to elicit in other contexts without doing your own big training run. At least in my experience LoRAs don’t really work. Maybe this will get better. (One example that informs my experience here: restoring base-model imitation behavior. Fine-tuning seems to not work great for this; you still end up with huge mode collapse and falling back to the standard RLHF-corpo-speak. Maybe this is just a finetuning skill issue. A sketch of the kind of fine-tune I mean follows after this list.)
There are probably more things.
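For the elicitation point above, here is roughly the shape of fine-tune I have in mind, as a hedged sketch: the model name, data file, and hyperparameters are illustrative assumptions, not a record of an experiment anyone here actually ran.

```python
# Illustrative sketch of a "restore base-model imitation" LoRA fine-tune on an
# RLHF'd chat model; model, data file, and hyperparameters are assumptions.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

chat_model = "meta-llama/Llama-2-7b-chat-hf"  # assumed starting point: an RLHF'd chat model
tokenizer = AutoTokenizer.from_pretrained(chat_model)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(chat_model)

# Low-rank adapters on the attention projections; the base weights stay frozen.
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
))

# Plain prose text trained with vanilla next-token prediction, hoping to pull
# the model back toward base-model-style continuations.
data = load_dataset("text", data_files={"train": "pretraining_style_corpus.txt"})["train"]
data = data.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=1024),
    batched=True, remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="lora-imitation", num_train_epochs=1,
                           per_device_train_batch_size=2, learning_rate=1e-4),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
# In my experience the result still mode-collapses back into standard
# RLHF-corpo-speak rather than recovering base-model behavior.
```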
@Ustice Definitely interested in feedback! We are pretty much continuously iterating on these.
Yeah, I honestly think the above is pretty clear?
I do not think it at all describes a policy of “if someone was trying to harm the third party, and having this information would cause them to do it sooner, then I would give them the information”. Indeed, it seems really very far away from that! In the above story nobody is trying to actively harm anyone else as far as I can tell? I certainly would not describe “CEA Comm Health team is working on a project to do a bunch of investigations, and I tell them information that is relevant to how highly they should prioritize those investigations” as being anything close to “trying to harm someone directly”!
No, I literally said “Like, to be clear, I definitely rather you not have told me”. And then later “Even if I would have preferred knowing the information packaged with the request”. And my first response to your request said “You can ask in-advance if I want to accept confidentiality on something, and I’ll usually say no”.
Sure, but I also wouldn’t have done that! The closest deal we might have had would have been a “man, please actually ask in advance next time, this is costly and makes me regret having that whole conversation in the first place. If you recognize that as a cost and owe me a really small favor or something, I can keep it private, but please don’t take this as a given”, but I did not (and continue to not) have the sense that this would actually work.
Maybe I am being dense here, and on first read this sounded like maybe a thing I could do, but after thinking more about it I do not know what I am promising if I promise I “won’t actively try to use [this information] outside of coordinating with the third party”. Like, am I allowed to write it in my private notes? Am I allowed to write it in our weekly memos as a consideration for Lightcone’s future plans? Am I not allowed to think the explicit thought “oh, this piece of information is really important for this plan that puts me in competition with this third party, better make sure to not forget it, and add it to my Anki deck”?
Like, I am not saying there isn’t any distinction between “information passively propagating” and “actively using information”, but man, it feels like a very tricky distinction, and I do not generally want to be in the business of adding constraints to my private planning and thought-processes that limit how I can operate here and that rely on this distinction being clear to other people. Maybe other people have factored their mind and processes in ways that make this easy, but I do not.