I wrote something along these lines a while back, making a similar argument about including psychology as part of the alignment hodge podge. Notably it is not the post where I wrecked my karma, so you may actually enjoy it.
I’m sorry, it’s not up anymore. The code is still available if you wanted to host it, but part of why I took it down is that it depends on a Twilio video chat API that they’ve discontinued, so you would need to find some replacement for that.
I expect a roughly 5.5 month doubling time in the next year or two, but somewhat lower seems pretty likely. The proposed timeline I gave consistent with Anthropic’s predictions requires <1 month doubling times (and this is prior to >2x AI R&D acceleration, at least given my view of what you get at that level of capability).
The fourth color opsin for these animals lies in the ultraviolet.
Hmm, but my understanding is that humans who are natural tetrachromats see more colors in the yellow-red part of the spectrum. And humans already can see UV light a little, but the thing that stops this from being visible is actually the lens, which blocks UV light normally to protect our eyes. So what I know suggests that even if we could change our cones we’d still fail to see UV light.
I’m maintaining a reading list of philosophy of LLMs papers; reach out if you have recommendations.
It’s a point of shame. It was very short, not very good, and downvoted below zero. I’m thinking of writing a better version. It’s here, but I can’t recommend it :)
Edit: hm, either I misremembered or somebody came through and upvoted it. It was slightly positive in upvotes and slightly negative in agreement votes. Maybe I’m too sensitive.
2nd edit: it’s not actually as bad as I remember, either. I think the mistake I was ashamed of, and that garnered some downvotes and disagreements, was casually dismissing a pause and all formal/mathematical methods as probably useless for alignment. Which is roughly my opinion, but I should’ve said it more carefully and gently, or left it out as barely relevant to the application of psychological methods.
The UK Health Secretary in 2021, Matt Hancock, ordered 100m vaccines, rather than 30m, because of the film Contagion
The details are complicated, vary a lot person-to-person, and I’m not sure which are OK to share publicly; the TLDR is that relatively early employees have a 3:1 match on up to 50% of their equity, and later employees a 1:1 match on up to 25%.
I believe that many people eligible for earlier liquidation opportunities used the proceeds from said liquidation to exercise additional stock options, because various tax considerations mean that doing so ends up being extremely leveraged for one’s donation potential in the future (at least if one expects the value of said options to increase over time); I expect that most people into doing interesting impact-maximizing things with their money took this route, which doesn’t produce much in the way of observable consequences right now.
I think the “all demons are fallen humans” interpretation is supported by one of Jinu’s lines: “That’s all demons do. Feel. Feel our shame, our misery. It’s how Gwi-Ma controls us”
Which post are you referring to re: psychology and alignment?
I’ve written about this here:
https://www.lesswrong.com/posts/kFRn77GkKdFvccAFf/100-years-of-existential-risk
Give that the type of hot that you’re going for is “rapey but in the sexy way” is it much of a surprise that women don’t send you highly legible signals that they’re interested?
In my experience, when I’ve been going for a kind of “interesting indie intellectual” vibe (especially when trying to date nonbinary/masculine women) I have had moves made on me directly, whereas when I’ve leaned more towards what you’re describing, this basically never happens.
You’re going to change it as you go along, as you get feedback from users and discover what people really need.
This is one part I feel iffy on because I’m concerned that following the customer gradient will lead to a local minima that will eventually detach from where I’d like to go.
That said, it definitely feels correct to reflect on one’s alignment and incentives. The pull is real:
All of this makes it tricky to start a pro-alignment company but I think it is worth trying because when people do create a successful company it creates a nexus of smart people and money to spend that can attack a lot of problems that aren’t possible in the “nonprofit research” world.
Yeah, that’s the vision! I’d have given up and taken another route if I didn’t think there was value in pursuing a pro-safety company.
i notice the OP didn’t actually mention examples of legible or illegible alignment problems. saying “leaders would be unlikely to deploy an unaligned AGI if they saw it had legible problem X” sounds a lot like saying “we would never let AGI onto the open internet, we can just keep it in a box”, in the era before we deployed sydney soon as it caught the twinkle of a CEO’s eye.
‘Much’ is open to interpretation, I suppose. I think the post would be better served by a different example.
Here’s a sequel hook:
The “demon world” isn’t the demon world; not originally anyway. It’s the world of memory; of individual and cultural remembrance. Its landscape is full of old and significant things. It was largely taken over by Gwi-ma (at root, a recursive shame-trauma loop) due to how many shameful traumatic memories humans have generated.
But putting up the Golden Honmoon turns out to be a bad idea, because it cuts people off from memory and history. On the “demon world” side, the demons are indeed starving: but on the Earthly side, the humans are, too: starving for meaning, as they fall into anomie, alienation, numbness, rootlessness, and increasingly vapid lyrics.
Our kkachi horangi (tiger and magpie) friends show up mangy and tattered …
I think Eliezer has oft-made the meta observation you are making now, that simple logical inferences take shockingly long to find in the space of possible inferences. I am reminded of him talking about how long backprop took.
In 1969, Marvin Minsky and Seymour Papert pointed out that Perceptrons couldn’t learn the XOR function because it wasn’t linearly separable. This killed off research in neural networks for the next ten years.
[...]
Then along came this brilliant idea, called “backpropagation”:
You handed the network a training input. The network classified it incorrectly. So you took the partial derivative of the output error (in layer N) with respect to each of the individual nodes in the preceding layer (N − 1). Then you could calculate the partial derivative of the output error with respect to any single weight or bias in the layer N − 1. And you could also go ahead and calculate the partial derivative of the output error with respect to each node in the layer N − 2. So you did layer N − 2, and then N − 3, and so on back to the input layer. (Though backprop nets usually had a grand total of 3 layers.) Then you just nudged the whole network a delta—that is, nudged each weight or bias by delta times its partial derivative with respect to the output error.
It says a lot about the nonobvious difficulty of doing math that it took years to come up with this algorithm.
I find it difficult to put into words just how obvious this is in retrospect. You’re just taking a system whose behavior is a differentiable function of continuous paramaters, and sliding the whole thing down the slope of the error function. There are much more clever ways to train neural nets, taking into account more than the first derivative, e.g. conjugate gradient optimization, and these take some effort to understand even if you know calculus. But backpropagation is ridiculously simple. Take the network, take the partial derivative of the error function with respect to each weight in the network, slide it down the slope.
If I didn’t know the history of connectionism, and I didn’t know scientific history in general—if I had needed to guess without benefit of hindsight how long it ought to take to go from Perceptrons to backpropagation—then I would probably say something like: “Maybe a couple of hours? Lower bound, five minutes—upper bound, three days.”
“Seventeen years” would have floored me.
it is time for sinc to INK
sorry for being crazy. i am well rested. i am ready to post good posts. you will be amazed
I don’t think you need a vision for how to solve the entire alignment problem yourself. It’s setting the bar too high. When you start a startup, you can’t possibly have the whole plan laid out up front. You’re going to change it as you go along, as you get feedback from users and discover what people really need.
What you can do is make sure that your startup’s incentives are aligned correctly at the start. Solve your own alignment. The most important questions here are, who is your customer? and how do you make money?
For example, if you make money by selling e-commerce ads against a consumer product, the incentives on your company will inevitably push you toward making a more addictive, more mass-market product.
For another example, if you make money by selling services to companies training AI models, your company’s incentives will be to broaden the market as much as possible, help all sorts of companies train all sorts of AI models, and offer all the sorts of services they want.
In the long run, it seems like companies often follow their own natural incentives, more than they follow the personal preferences of the founder.
All of this makes it tricky to start a pro-alignment company but I think it is worth trying because when people do create a successful company it creates a nexus of smart people and money to spend that can attack a lot of problems that aren’t possible in the “nonprofit research” world.
I think selling alignment-relevant RL environments to labs is underrated as an x-risk-relevant startup idea. To be clear, x-risk-relevant startups is a pretty restricted search space; I’m not saying that one necessarily should be founding a startup as the best way to address AI x-risk, but just operating under the assumption that we’re optimizing within that space, selling alignment RL environments is definitely the thing I would go for. There’s a market for it, the incentives are reasonable (as long as you are careful and opinionated about only selling environments you think are good for alignment, not just good for capabilities), and it gives you a pipeline for shipping whatever alignment interventions you think are good directly into labs’ training processes. Of course, that’s dependent on you actually having a good idea for how to train models to be more aligned, and that intervention being in the form of something you can sell, but if you can do that, and you can demonstrate that it works, you can just sell it to all the labs, have them all use it, and then hopefully all of their models will now be more aligned. E.g. if you’re excited about character training, you can just replicate it, sell it to all the labs, and then in so doing change how all the labs are training their models.