Done! There aren’t enough mysterious old wizards.
P.
Public mechanistic interpretability research is net positive in expectation.
It seems they edited the paper: where there was once a super saiyan sentient bag of potato chips, there is now a close-up of the palm of a hand with leaves growing from it:
He says he will be doing alignment work; the worst thing I can think of that could realistically happen is that he gives OpenAI unwarranted confidence in how aligned their AIs are. Working at OpenAI isn’t intrinsically bad; publishing capabilities research is.
Positive:
People will pay way less for new pretty images than they did before.
Thanks to img2img, people who couldn’t draw well before now finally can: https://www.reddit.com/r/StableDiffusion/comments/wvcyih/definitely_my_favourite_generation_so_far/
Because of this, a lot more art will be produced, and I can’t wait to see it.
Since good drawings are now practically free, we will see them in places where we couldn’t before, like in fanfiction.
Stable Diffusion isn’t quite as good as a talented artist, but since we can request hundreds of variations and pick the best, the quality of art might increase.
Ambiguous or neutral:
It can produce realistic images and is easier to use and more powerful than Photoshop, so we will see a lot of misinformation online. But once most people realize how easy it is to fabricate fake photographs, hopefully they will trust what they see online far less than they did before, and closer to the appropriate level.
Anyone will be able to make porn of anyone else. As long as people don’t do anything stupid after seeing the images, this seems inconsequential. As discussed on HN, it might cause people to stop worrying about others seeing them naked, even if the photos are real.
Anyway, both of these will cause a lot of drama, which I at least, perhaps selfishly, consider to be slightly positive.
Negative:
I expect a lot of people will lose their jobs. Most companies will prefer to reduce their costs and have a few non-artists make the art rather than produce more art.
New kinds of scams will become possible and some people will keep believing everything they see online.
Unlike DALL-E 2, this is something anyone can access, so it will be much more popular and will make many people realize how advanced current AI is and how consequential it will be, which will probably lead to more funding.
Research into getting a mechanistic understanding of the brain, for the purpose of at least one of understanding how values/empathy work in people, brain uploading, or improving cryonics/plastination, is net positive and currently greatly underfunded.
You know of a technology that has at least a 10% chance of having a very big novel impact on the world (think the internet or ending malaria) that isn’t included in this list, very similar to something in it, or downstream from some element of it: AI, mind uploads, cryonics, human space travel, geo-engineering, gene drives, human intelligence augmentation, anti-aging, cancer cures, regenerative medicine, human genetic engineering, artificial pandemics, nuclear weapons, proper nanotech, very good lie detectors, prediction markets, other mind-altering drugs, cryptocurrency, better batteries, BCIs, nuclear fusion, better nuclear fission, better robots, AR, VR, room-temperature superconductors, quantum computers, polynomial-time SAT solvers, cultured meat, solutions to antibiotic resistance, vaccines for some disease, optical computers, artificial wombs, de-extinction, and graphene.
Bad options are included just in case someone thinks they are good.
Doesn’t minimizing the L1 norm correspond to performing MLE with Laplacian errors?
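A minimal sketch of that equivalence (assuming a standard regression setup $y_i = f_\theta(x_i) + \epsilon_i$ with i.i.d. Laplace$(0, b)$ errors, which is not spelled out in the original comment): the Laplace density is $p(\epsilon) = \frac{1}{2b} e^{-|\epsilon|/b}$, so the log-likelihood of the data is
$$\sum_{i=1}^{n} \log \frac{1}{2b} e^{-|y_i - f_\theta(x_i)|/b} = -n \log(2b) - \frac{1}{b} \sum_{i=1}^{n} |y_i - f_\theta(x_i)|,$$
and maximizing this over $\theta$ is the same as minimizing $\sum_i |y_i - f_\theta(x_i)|$, the L1 norm of the residuals (just as Gaussian errors give the usual L2/least-squares objective).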
I wish I had a better source, but in this video, a journalist says that a well-equipped high schooler could do it. The information needed seems to be freely available online, but I don’t know enough biology to be able to tell for sure. I think it is unknown whether it would spread to the whole population given a single release, though.
If you want it to happen and can’t do it yourself nor pay someone else to do it, the best strategy might be to pay someone to translate the relevant papers into instructions that a regular smart person can follow and then publish them online. After making sure to the best of your capabilities (i.e. asking experts the right questions) that it actually is a good idea, that is.
ELK itself seems like a potentially important problem to solve; the part that didn’t make much sense to me was what they plan to do with the solution, namely their idea based on recursive delegation.
If we had access to a brain upload (and maybe a world simulator too) we could in principle extract something like a utility function, and the theory behind it relates more to agents in general than it does to humans in particular.
Given the flood of comments that will inevitably result from this, it might be hard to get noticed and to surface the best ones to the top. So I am offering the following service: If you reply to this I guarantee that I will read your comment, and then will give you one or two upvotes (or none) depending on how insightful I consider it to be. Sadly, this only works if people get to see this comment, so it is in your best interest to upvote it. Let’s turn this into a new, better comment section!
It depends on what you know about the model and the reason you have to be concerned in the first place (if it’s just “somehow”, that’s not very convincing).
You might be worried that training it leads to the emergence of inner optimizers, be they ones that are somehow “trying” to be good at prediction in a way that might generalize to taking real-life actions, ones that approximate the searchy part of the humans they are trying to predict, or just RL agents. If you are just using basically standard architectures with a lot more compute, these all seem unlikely. But if I were you, I might try to test its ability to perform well in a domain it has never seen, where humans start by performing poorly but very quickly learn what to do (think of video games with new mechanics). If it does well, you have a qualitatively new thing on your hands; don’t deploy it, study it instead. If for some reason you think a priori that this could happen, and only a small subset of all the data is necessary to achieve it, do a smaller training run first with that data.
Or you might be worried about mostly external consequentialist cognition (think explicit textual if-then-elses). In that case, existing systems can already do it to some extent, and you should worry about how good its reasoning actually is, so perform capability evaluations. If it looks like there is some way of getting it to do novel research with any known method, or that it’s getting close, don’t deploy; otherwise someone might figure out how to use it to do AI research, and then you get a singularity.
And in any case, you should worry about the effects your system will have on the AI race. Your AI might not be dangerous, but if it is a good enough lawyer or programmer that it starts putting many people out of their jobs, investment in AI research will increase a lot and someone will figure out how to create an actual AGI sooner than they would have otherwise.
Edit: And obviously you should also test how useful it could be for people trying to do mundane harm (e.g. with existing pathogens) and, separately, there might not be a hard threshold on how good a model has to be at doing research before it becomes dangerous, so models might get there little by little and you would be contributing to that.
Edit in response to the second clarification: Downscale the relevant factors, like amount of training data, number of parameters and training time, or use a known-to-be-inferior architecture until the worrying capabilities go away. Otherwise, you need to solve the alignment problem.
Edit in response to Beth Barnes’s comment: You should probably have people reviewing outputs to check that the model behaves well, but if you actually think you need measures like “1000 workers with technical undergrad degrees, paid $50/hr” because you are worried it somehow kills you, then you simply shouldn’t deploy it. It’s absurd to need to check whether a commercial product is an existential threat, or anything close to that.
Thanks, I’ve added him to my list of people to contact. If someone else wants to do it instead, reply to this comment so that we don’t interfere with each other.
Ok, I sent them an email.
I have a few questions:
Do we need to study at a US university in order to participate? I’m in Europe.
Who should be the target audience for the posts? CS students? The average LW reader? People somewhat interested in AI alignment? The average Joe? How much do we need to dumb it down?
Can we publish the posts before the contest ends?
Will you necessarily post the winners’ names? Can we go by a pseudonym instead?
How close to the source material should we stay? I might write a post about what value learning is, why it seems like the most promising approach and why it might be solvable, which would involve explaining a few of John Wentworth’s posts. But I don’t think my reasoning is exactly the same as his.
Also, is there any post that whoever is reading this comment tried and failed to understand? Or better yet, tried hard to understand but found completely impenetrable? If so, what part did you find confusing? If I choose to participate and try to explain that post, would you volunteer to read a draft to check that I’m explaining it clearly?
We shouldn’t be surprised by the quality of the images, but since this will become a commercial product and art is something that is stereotypically hard for computers, I wonder if for the general public (including world governments) this will be what finally makes them realize that AGI is coming. OpenAI could at least have refrained from publishing the paper; it wouldn’t have made any difference, but it would have been a nice symbolic gesture.
I think going to Congress would be counterproductive and would just convince them to create an AGI before their enemies do.
It’s not quite what you want, but there’s this: https://forum.effectivealtruism.org/community#individuals and this: https://eahub.org/
For whatever it is worth, this post, along with reading the unworkable alignment strategy in the ELK report, has made me realize that we actually have no idea what to do and has finally convinced me to try to solve alignment; I encourage everyone else to do the same. For some people, knowing that the world is doomed by default and that we can’t just expect the experts to save it is motivating. If that was his goal, he achieved it.