If you are going to read just one thing I wrote, read The Problem of the Criterion.
More AI related stuff collected over at PAISRI
My vague impression is that for a while the US did have something like this, starting under FDR, but it broke down in the post-Nixon era when politicians stopped being able to collude as effectively.
I’m suspicious of the strength of the claim this company is making. I think it’s more likely this is a publicity stunt.
First, there are the legal issues. As far as I know, no jurisdiction allows software to serve as an officer of a company, let alone as the CEO. So to whatever extent an AI is calling the shots, there have to be humans in the loop for legal reasons.
Second, it's sort of unclear what this AI is actually doing. It sounds more like they just have some fancy analytics software and are calling it the CEO because they mostly do whatever their analytics say to do?
This would be a big deal if true, but there doesn't seem to be enough in this article to conclude that an AI is now the CEO of a company in any meaningful way beyond how companies already rely heavily on data, analytics, and ML-based tools to make decisions.
Oh, oops, thank you! I can’t believe I made that mistake. I’ll update my comment. I thought the number seemed really low!
There’s already a good answer to the question, but I’ll add a note.
Different people value different things, and so are willing to expend different amounts of effort to achieve different ends. As a result, even rational agents may not all achieve the same ends because they care about different things.
Thus we can have two rational agents, A and B. A cares a lot about finding a mate and not much else. B cares a lot about making money and not much else. A will be willing to invest more effort into things like staying in shape to the extent that helps A find a mate. B will invest a lot less in staying in shape and more in other things to the extent that’s the better tradeoff to make a lot of money.
Rationality doesn't prescribe the outcome, just some of the means. Yes, some outcomes are convergent for many concerns, so many agents end up with the same instrumental concerns even if they have different ultimate concerns (e.g. power seeking is a common instrumental goal), but without understanding what an agent cares about you can't judge how well they are succeeding, since success must be measured against their goals.
So just to check, if we run the numbers, not counting non-human life or future lives, and rounding up a bit to an even 8 billion people alive today: if we assume for the sake of argument that each person has 30 QALYs left, that's 8 billion × 30 = 240 billion QALYs at stake with doom, and a 0.01% chance of doom represents an expected loss of 24 million QALYs. Or if we just think in terms of people, that's an expected loss of 800 thousand people.
If we count future lives the number gets a lot bigger. If we conservatively guess at something like 100 trillion future lives throughout the history of the future universe, with let's say 100 QALYs each, that's 10^16 QALYs at stake.
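To make the arithmetic explicit, here's a quick back-of-envelope sketch in Python using the same rough assumptions as above (nothing new is being estimated here):

```python
# Back-of-envelope expected loss from a given probability of doom,
# using the rough assumptions stated above.
people_alive = 8e9          # ~8 billion people alive today
qalys_per_person = 30       # assumed remaining QALYs per person
p_doom = 1e-4               # 0.01% chance of doom

qalys_at_stake = people_alive * qalys_per_person   # 2.4e11 = 240 billion QALYs
expected_qalys_lost = p_doom * qalys_at_stake      # 2.4e7 = 24 million QALYs
expected_deaths = p_doom * people_alive            # 8e5 = 800 thousand people

# Counting future lives: ~100 trillion lives at ~100 QALYs each
future_qalys_at_stake = 1e14 * 100                 # 1e16 QALYs
```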
But either way, since this is the threshold, you seem to think that, in expectation, fewer than 800,000 people will die from misaligned AI? Is that right? At what odds would you be willing to bet that fewer than 800,000 people die as a result of the development of advanced AI systems?
How large does it have to be before it’s worth focusing on, in your opinion? Even for very small probabilities of doom the expected value is extremely negative, even if you fully discount future life and only consider present lives.
Okay, but why? You’ve provided an assertion with no argument or evidence.
This post brought to mind a thought: I actually don't care very much about arguments over how likely doom is and how pessimistic or optimistic to be, because to my style of thinking they're irrelevant for making decisions related to building TAI. Instead, I mostly focus on downside risks and on avoiding them, because they are so extreme. That makes me look "pessimistic," but really I'm just trying to minimize the risk of false positives in building aligned AI. Given this framing, it's less important, in most cases, to figure out how likely something is, and more important to figure out how likely doom is if we are wrong, and then to carefully navigate the path that minimizes the risk of doom, regardless of what the assessment of its likelihood is.
In general I think this is not possible. That's because the thing we call awakening or enlightenment is just noticing the world as it is and expanding our experience of the world to include it rather than closing it out. It's not something special, just the mundane realization of life as it is. We build it up into something special because we have trouble making sense of it when we haven't fully experienced it.
To me your question is a bit like asking how you can keep your eyes open and avoid seeing any colors when you believe the world is black and white. The trouble is, you're already seeing the colors, even if you're really good at ignoring them. Anything you do might cause you to start noticing them, and an activity designed to help you notice things will just be especially good at this.
If you're really set on this goal, though, there are ways you could try to actively avoid noticing the world. Don't do meditation as described in TMI or any Buddhist text. In fact, don't meditate at all. Also stay away from psychedelics. Instead, you can learn to hypnotize yourself into trance states. This is a bit like the stupor benzos and alcohol induce. I don't recommend it, and it can be habit forming, but it exists as an alternative to facing reality as it already is.
A good specific example of this kind of shell game is perhaps HCH. I don't recall if someone has made this specific critique of it before, but there seems to be a real concern that it just hides the misalignment rather than actually producing an aligned system.
I guess this is a concern, but I'm also concerned about what happens if we don't invest enough in deep ideas that we then later regret not having worked on. This seems less a matter of choosing between the two than of doing both and growing the number of folks working on alignment so we can tackle many potential solutions to find what works.
I think this is a misunderstanding of the orthogonality thesis, but we can talk about it over on that post perhaps. The problem of convergence to power seeking is well known, but it is not seen as an argument against the orthogonality thesis; rather, it's a separate but related concern. I'm not aware of anyone who thinks they can ignore concerns about instrumental convergence towards power seeking. In fact, I think the problem is that people are all too aware of it, and think that the orthogonality thesis being false would mitigate it, while the point of the orthogonality thesis is to say that the problem does not resolve on its own the way it does in humans.
Actually the opposite seems true to me. Assuming the orthogonality thesis is the conservative view, the one less likely to result in a false positive (thinking you built an aligned AI when you haven't). Believing it is false seems more likely to lead to building an AI that you think will be aligned but then is not.
I’ve explored this kind of analysis here, which suggests we should in some cases be a bit less concerned with what’s true and a bit more concerned with, given uncertainty, what’s most dangerous if we think it’s true and we’re wrong.
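To illustrate the kind of analysis I mean, here's a minimal sketch with made-up numbers; the losses and the 50/50 credence are purely hypothetical and only meant to show the asymmetry, not to estimate anything:

```python
# Purely illustrative: hypothetical losses, chosen only to show the asymmetry.
p_thesis_true = 0.5  # hypothetical credence that the orthogonality thesis holds

# If we assume the thesis and it turns out false, we've mostly wasted some caution.
loss_if_assume_true_but_false = 1
# If we assume it's false and it turns out true, we may build an "aligned" AI that isn't.
loss_if_assume_false_but_true = 1000

expected_loss_assume_true = (1 - p_thesis_true) * loss_if_assume_true_but_false
expected_loss_assume_false = p_thesis_true * loss_if_assume_false_but_true

print(expected_loss_assume_true)   # 0.5
print(expected_loss_assume_false)  # 500.0
# The cautious assumption keeps winning until you are extremely confident the
# thesis is false, which is the sense in which it's the conservative view.
```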
There is no AI police, for better or worse, though coordination among AI labs is an active and pressing area of work. You can find more about it here and on the EA Forum.
Not yet. The review process looks at posts from the previous year and happens in December, so for example in December 2022 we reviewed posts from 2021. Since your post was made in 2023, it will be eligible for the December 2024 review cycle.
I think this is a type error, by which I mean the thing we do on Less Wrong is not science, so it doesn’t make sense to try to find a scientific consensus.
The thing we try to do here is rationality, of the sort described in the sequences. Science is sometimes a useful thing to do, but it’s not the only thing, and in this case it doesn’t quite make sense.
Further, voting is very much not about figuring out what's right or true; it's just people saying "I want (or don't want) to read more things like this," which is something different. Less Wrong has, for the past few years, conducted an annual review, which is a bit more like peer review, and you can read things that "passed review" here, though note that, as far as I know, the reviews have never pulled up a post that got a negative score, though they have found gems that were under-appreciated at the time of publication.
Looking at the linked post, I'm not sure what the issue is. There are several people arguing that you made a mistake, and they likely downvoted because they think you made the mistake in an obvious way and don't see your post as worth others engaging with.
The intended purpose of votes on LessWrong is to signal "I want to see more/less of things like this." It's a feedback signal from the community about what kind of content they want to read. Yes, this can have some weird effects: I think the community sometimes upvotes things that are foolish, and downvotes things because they either don't understand them or find them personally threatening. But it's hard to get people to do better, and on the whole, thanks to various mechanisms that shape who remains part of the readership, it tends to work well enough that probably 90%+ of posts get a score that's reasonable. That still means there's some ~10% error rate, but that's pretty good for asking a mob to rate things!
What would it mean for the consensus to be scientific? I’m having a hard time parsing what this could mean, since posts and comments are usually not making claims that we could verify via experimental processes.
@abramdemski Wanted to say thanks again for engaging with my posts and pointing me towards looking again at Lob. It's weird: now that I've taken some time to understand it, it's just what was already, in my mind, the thing going on with Godel; I just wasn't doing a great job of separating out what Godel proves from what its implications are. Presented on its own, Lob didn't seem that interesting to me, so I kept bouncing off it as something worth looking at, but now I realize it's just the same thing I learned from GEB's presentation of Peano arithmetic and Godel when I read it 20+ years ago.
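For anyone following along, the connection I have in mind is the standard one (stated from memory, so worth double-checking against a reference):

```latex
% Löb's theorem, for a theory $T$ extending PA with provability predicate $\Box$:
% if $T \vdash \Box P \rightarrow P$, then $T \vdash P$; internalized,
\Box(\Box P \rightarrow P) \rightarrow \Box P
% Gödel's second incompleteness theorem is the special case $P = \bot$:
% if $T \vdash \neg\Box\bot$ (i.e. $T$ proves its own consistency), then
% $T \vdash \Box\bot \rightarrow \bot$, so by Löb $T \vdash \bot$ and $T$ is inconsistent.
```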
When I go back to make revisions to the book, I'll have to reconsider including Godel and Lob somehow in the text. I didn't because I felt it was a bit complicated and I didn't really need to dig into it, since I think there are already a few too many cases where people use Godel to overreach and draw conclusions that aren't true, but it's another way to explain these ideas. I just have to think about whether Godel and Lob are necessary: that is, do I need to appeal to them to make my key points, or are they better left as additional topics I can point folks at, not key to understanding the intuitions I want them to develop?