Yes—I didn’t say it was hard without AI, I said it was hard. Using the best tech in the world, humanity doesn’t *even ideally* have ways to get AI to design safe useful vaccines in less than months, since we need to do actual trials.
I know someone who has done lots of reporting on lab leaks, if that helps?
Also, there are some “standard” EA-adjacent journalists who you could contact / someone could introduce you to, if it’s relevant to that as well.
Vaccine design is hard, and requires lots of work. Seems strange to assert that someone could just do it on the basis of a theoretical design. Viral design, though, is even harder, and to be clear, we’ve never seen anyone build one from first principles; the most we’ve seen is modification of extant viruses in minor ways where extant vaccines for the original virus are likely to work at least reasonably well.
I have a lot more to say about this, and think it’s worth responding to in much greater detail, but I think that overall, the post criticizes Omohundro and Tegmark’s more extreme claims somewhat reasonably, though very uncharitably, and then assumes that other proposals which seem to be related, especially the Dalrymple et al. approach, are essentially the same, and doesn’t engage with the specific proposal at all.
To be very specific about how I think the post is unreasonable: in a number of places a seeming steel-man version of the proposals is presented, and then this steel-manned version, rather than the initial proposal for formal verification, is attacked. But this amounts to a straw-man criticism of the actual proposals being discussed!
For example, this post suggests that arbitrary DNA could be proved safe by essentially impossible modeling (“on-demand physical simulations of entire human bodies (with their estimated 36 trillion cells [9]), along with the interactions between the cells themselves and the external world and then run those simulations for years”). This is true, that would work—but the proposal ostensibly being criticized was to check narrower questions about whether DNA synthesis is being used to produce something harmful. And Dalrymple et al. explained explicitly elsewhere in the paper what they had in mind (“Examples include machines provably impossible to login to without correct credentials, DNA synthesizers that provably cannot synthesize certain pathogens, and AI hardware that is provably geofenced, time-limited (“mortal”) or equipped with a remote-operated throttle or kill-switch. Provably compliant sensors can be specified to ensure “zeroization”, in which tampering with PCH is guaranteed to cause detection and erasure of private keys.”)
But that premise falls apart as soon as a large fraction of those (currently) highly motivated (relatively) smart tech workers can only get jobs in retail or middle management.
Yeah, I think the simplest thing for image generation is for model hosting providers to use a separate tool—and lots of work on that already exists. (see, e.g., this, or this, or this, for different flavors.) And this is explicitly allowed by the bill.
For text, it’s harder to do well, and you only get weak probabilistic identification, but it’s also easy to implement an Aaronson-like scheme, even if doing it really well is harder. (I say easy because I’m pretty sure I could do it myself, given, say, a month working with one of the LLM providers, and I’m wildly underqualified to do software dev like this.)
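To gesture at what I mean, here’s a minimal Python sketch of an Aaronson-style scheme: bias token selection with a keyed pseudorandom function so that the model’s output distribution is preserved in expectation, while someone holding the key can run a simple statistical test. The hashing, window size, and function names are all illustrative assumptions of mine, not any provider’s actual implementation.

```python
import hashlib
import math

def _prf(key: bytes, context: tuple, token: int) -> float:
    """Keyed pseudorandom value in (0, 1) for a (context, candidate-token) pair."""
    digest = hashlib.sha256(key + repr((context, token)).encode()).digest()
    return (int.from_bytes(digest[:8], "big") + 1) / (2**64 + 2)

def watermarked_sample(probs: dict[int, float], key: bytes, context: tuple) -> int:
    """Exponential-minimum sampling: pick the token maximizing r ** (1/p).
    This matches the model's distribution in expectation, but leaves a bias
    that the key holder can later detect."""
    return max(probs, key=lambda t: _prf(key, context, t) ** (1.0 / probs[t]))

def detection_score(tokens: list[int], key: bytes, window: int = 4) -> float:
    """Sum of -ln(1 - r) over generated tokens; watermarked text scores
    systematically higher than unwatermarked text of the same length."""
    score = 0.0
    for i, tok in enumerate(tokens):
        context = tuple(tokens[max(0, i - window):i])
        score += -math.log(1.0 - _prf(key, context, tok))
    return score
```

Doing this well (robust to paraphrasing, with low false-positive rates, and so on) is where the real work is, which is the “harder” part I mentioned above.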
Internet hosting platforms are responsible for ensuring indelible watermarks.
The bill requires that “Generative AI hosting platforms shall not make available a generative AI system that does not allow a GenAI provider, to the greatest extent possible and either directly providing functionality or making available the technology of a third-party vendor, to apply provenance data to content created or substantially modified by the system”
This means that sites running GenAI models need to allow the GenAI systems to implement their required watermarking, not that hosting providers (imgur, reddit, etc.) need to do so. Less obviously good, but still important: it also doesn’t require the GenAI hosting provider to ensure the watermark is indelible, just that they include watermarking via either the model, or a third-party tool, when possible.
LLMs are already moderately-superhuman at the task of predicting next tokens. This isn’t sufficient to help solve alignment problems. We would need them to meet the much much higher bar of being moderately-superhuman at the general task of science/engineering.
We also need the assumption—which is definitely not obvious—that significant intelligence increases are relatively close to achievable. Superhumanly strong math skills presumably don’t let AI solve NP problems in P time, and it’s similarly plausible—though far from certain—that really good engineering skill tops out somewhere only moderately above human ability due to intrinsic difficulty, and really good deception skills top out somewhere not enough to subvert the best systems that we could build to do oversight and detect misalignment. (On the other hand, even with these objections being correct, it would only show that control is possible, not that it is likely to occur.)
Live like life might be short. More travel even when it means missing school, more hugs, more things that are fun for them.
Optimizing for both impact and personal fun, I think this is probably directionally good advice for the types of analytic people who think about the long term a lot, regardless of kids. (It’s bad advice for people who aren’t thinking very long term already, but that’s not who is reading this.)
First, I think this class of work is critical for deconfusion, which is critical if we need a theory for far more powerful AI systems, rather than for very smart but still fundamentally human level systems.
Secondly, concretely, it seems that very few other approaches to safety have the potential to provide enough fundamental understanding to allow us to make strong statements about models before they are fully trained. This seems like a critical issue if we are concerned about very strong models that could pose risks during testing, or possibly even during training. And as far as I’m aware, nothing in the interpretability and auditing spaces has a real claim to be able to make clear statements about those risks, other than perhaps to suggest interim testing during model training—which could work, if a huge amount of such work is done, but seems very unlikely to happen.
Edit to add: Given the votes on this, what specifically do people disagree with?
Ideally, statements should be at least two of true, necessary/useful, and kind. I agree that this didn’t breach confidentiality of any sort, and yet I think that people should generally follow a policy where we don’t publicize random non-notable people’s names without explicit permission when discussing them in a negative light—and the story might attempt to be sympathetic, but it certainly isn’t complimentary.
Unlikely, since he could have walked away with a million dollars instead of doing this. (Per Zvi’s other post, “Leopold was fired right before his cliff, with equity of close to a million dollars. He was offered the equity if he signed the exit documents, but he refused.”)
The most obvious reason for skepticism of the impact that would cause follows.
David Manheim: I do think that Leopold is underrating how slow much of the economy will be to adopt this. (And so I expect there to be huge waves of bankruptcies of firms that are displaced / adapted slowly, and resulting concentration of power, but also some delay as assets change hands.)
I do not think Leopold is making that mistake. I think Leopold is saying a combination of the remote worker being a seamless integration, and also not much caring about how fast most businesses adapt to it. As long as the AI labs (and those in their supply chains?) are using the drop-in workers, who else does so mostly does not matter. The local grocery store refusing to cut its operational costs won’t much postpone the singularity.
I want to clarify the point I was making—I don’t think that this directly changes the trajectory of AI capabilities, I think it changes the speed at which the world wakes up to those possibilities. That is, I think that in worlds with the pace of advances he posits, the impacts on the economy lag the advances in AI capabilities, and we get faster capabilities takeoff than we do economic impacts that make the transformation fully obvious to the rest of the world.
The more important point, in my mind, is what this means for geopolitics, which I think aligns with your skepticism. As I said responding to Leopold’s original tweet: “I think that as the world wakes up to the reality, the dynamics change. The part of the extensive essay I think is least well supported, and least likely to play out as envisioned, is the geopolitical analysis. (Minimally, there’s at least as much uncertainty as AI timelines!)”
I think the essay showed lots of caveats and hedging about the question of capabilities and timelines, but then told a single story about geopolitics: one that I think is both unlikely, and that fails to notice the critical fact that this is describing a world where government is smart enough to act quickly, but not smart enough to notice that we all die very soon. To quote myself again, “I think [this describes] a weird world where military / government “gets it” that AGI will be a strategic decisive advantage quickly enough to nationalize labs, but never gets the message that this means it’s inevitable that there will be loss of control at best.”
I don’t really have time, but I’m happy to point you to a resource to explain this: https://oyc.yale.edu/economics/econ-159
And I think I disagreed with the concepts inasmuch as you are saying something substantive, but the terms were confused, and I suspect, but may be wrong, that if laid out clearly, there wouldn’t be any substantive conclusion you could draw from the types of examples you’re thinking of.
That seems a lot like Davidad’s alignment research agenda.
Agree that it’s possible to have small amounts of code describing very complex things, and as I said originally, it’s certainly partly spaghetti towers. However, to expand on my example, for something like a down-and-in European call option, I can give you a two-line equation for the payout, or a couple lines of easily understood Python code with three arguments (strike price, min price, final price) to define the payout, but it takes dozens of pages of legalese instead.
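Concretely, something like the following sketch is what I have in mind; the explicit `barrier` argument is my own addition here for clarity, since the knock-in level has to come from somewhere:

```python
def down_and_in_call_payout(strike: float, min_price: float,
                            final_price: float, barrier: float) -> float:
    """Payout of a down-and-in European call: worthless unless the underlying
    traded at or below the barrier at some point before expiry."""
    if min_price > barrier:
        return 0.0                          # never knocked in
    return max(final_price - strike, 0.0)   # standard European call payout
```

That’s essentially the entire economic content that the dozens of pages of legalese encode.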
My point was that the legal system contains lots of that type of what I’d call fake complexity, in addition to the real complexity from references and complex requirements.
Very happy to see a concrete outcome from these suggestions!
I’ll note that I think this is a mistake that lots of people working in AI safety have made, ignoring the benefits of academic credentials and prestige because of the obvious costs and annoyance. It’s not always better to work in academia, but it’s also worth really appreciating the costs of not doing so in foregone opportunities and experience, as Vanessa highlighted. (Founder effects matter; Eliezer had good reasons not to pursue this path, but I think others followed that path instead of evaluating the question clearly for their own work.)
And in my experience, much of the good work coming out of AI Safety has been sidelined because it fails the academic prestige test, and so it fails to engage with academics who could contribute or who have done closely related work. Other work avoids or fails the publication process because the authors don’t have the right kind of guidance and experience to get their papers into the right conferences and journals, and not only is it therefore often worse for not getting feedback from peer review, but it doesn’t engage others in the research area.
There aren’t good ways to do this automatically for text, and state of the art is rapidly evolving.
https://arxiv.org/abs/2403.05750v1
For photographic images that contain detailed depictions of humans, or non-standard objects with fine detail, there are still some reasonably good heuristics for when AIs will mess up those details, but I’m not sure how long they will remain valid.
As you sort of refer to, it’s also the case that the 7.5 hour run time can be paid once, and the resulting guarantee then remains true of the system. It’s a one-time cost!
So even if we have 100 different things we need to prove for a higher-level system, and each takes a year of engineering and mathematics research time plus a day or a month of compute time to get a proof, we can do them in parallel, and this isn’t much of a bottleneck if this approach is pursued seriously. (Parallelization is straightforward if we can, for example, take the guarantee provided by one proof as an assumption in others, instead of trying to build a single massive proof; see the toy sketch below.) And each such system built allows for provability guarantees for systems built with that component, if we can build composable proof systems, or can separate the necessary proofs cleanly.
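As a toy illustration of that compositionality (purely schematic; real component guarantees would be vastly more complicated), here is a Lean sketch where one lemma’s statement is taken as a hypothesis by the next, so the two proofs can be developed and checked independently and only wired together at the end:

```lean
-- Component guarantee A, proved on its own.
theorem component_A (n : Nat) : n ≤ n + 1 := Nat.le_succ n

-- Component guarantee B assumes only the *statement* of A, not its proof,
-- so the two proofs can be produced in parallel.
theorem component_B (n : Nat) (hA : n ≤ n + 1) : n ≤ n + 2 :=
  Nat.le_trans hA (Nat.le_succ (n + 1))

-- The system-level guarantee just composes the pieces; nothing is re-proved.
theorem composed_guarantee (n : Nat) : n ≤ n + 2 :=
  component_B n (component_A n)
```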