TheoR
So is it an example of a statement that is maybe a truth because it is maybe a lie? And if it is definitely one or the other, it is definitely a lie? Fun!
I think the notion of “one-shot-ness” introduces a counterproductive dichotomy. A lot of people, even those aligned with the general AI-risk position of the writer, react to this framing as downplaying or dismissing the role of empirical research, trials with smaller AIs, etc. (Oliver’s quotes as evidence). I think it doesn’t take strawmanning to reply in this way! And I found the confusion around “Yudkowsky wants us to have perfect theoretical understanding before we try anything” reasonable, even if it misrepresents the writer’s position.
Indeed, “one-shot-ness” is, as Kokotajlo notes, a graded property. I think the writer is more likely disagreeing qualitatively with Buck, Paul and others on how hard the alignment problem is than quantitatively somehow (and perhaps, most bizarrely, not disagreeing at all!). To any reasonable person it is obvious that any problem has some (often trivial) degree of “we are doing this for the first time”.
I personally hold that it is more productive for the broad discourse to argue directly about whether AI alignment is at least as hard as, say, launching a space probe to Mars, and what the expected costs then are under different strategies. I think it is a much clearer position to say “If we launch ASI inference and it is misaligned, humanity probably dies”, “With near-future understanding of AI internals, if we launch ASI inference, the chance that it is misaligned is higher than the chance that a probe to Mars explodes, because A, B and C” → “If we launch ASI inference with near-future technology, humanity probably dies”.
Maybe an even better position, if a bit more complicated, is “There exists AI with such internal capabilities that when we launch its inference, humanity probably dies”, coupled with “Current incentives of people developing AI do not have any built-in brakes, and thus capabilities will increase as much as technologies allow”. This sidesteps the problems of defining ASI and predicting the exact amount of capabilities required to spell humanity’s doom.
Whoever wants to dismiss this prediction then needs to argue why AI alignment is easier than launching a probe to Mars, or that the necessary capabilities won’t be reached.
Practically, mode collapse seems like a bad thing by itself if (a) the underlying reality shifts, or (b) your beliefs were incorrect in the first place. An example of (a) would be when, after your mode collapsed, animal welfare becomes 80% of the proposals. An example of (b) would be an image model that “didn’t know” that diversity of outputs is in itself a value.
(b) doesn’t seem as bad for humans, because if we are investigating our beliefs, and find out some of our previously held convictions were wrong, we can try to trace back what decisions those informed, and break out of harmful mode collapse.
(a) is worse, because mode collapse deprives us of the signal on the distribution shift itself, making it hard to detect if it happened.
The good news is that solving (a) doesn’t require taking random walks periodically to balance out this exploration / exploitation dilemma. I would wager that in most situations taking explicit action to check for distribution shift is cheaper and more efficient. Coming back to the grantmaker example, periodically checking the true market distribution of grant proposals between global welfare and animal welfare is presumably cheaper than randomly trying out hiring people who are really good at evaluating animal welfare.
PS: This is ignoring effects that your mode collapse has on the market of grant proposals itself, which is unrealistic. That is why I start with “Practically, mode collapse seems like a bad thing by itself”.
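To make the “explicit check” idea concrete, here is a minimal sketch of what a periodic audit could look like. Everything in it is my own illustration: the 90/10 believed split, the audit sample, the 0.2 threshold and the function names are all assumptions, and total variation distance is just one convenient way to compare two distributions.

```python
from collections import Counter


def empirical_shares(labels):
    """Turn a list of category labels into a dict of observed shares."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def total_variation(p, q):
    """Total variation distance between two distributions given as dicts."""
    categories = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)


def shift_detected(believed, sample_labels, threshold=0.2):
    """Flag a shift when the audit sample is far from the believed shares."""
    return total_variation(believed, empirical_shares(sample_labels)) > threshold


# Hypothetical numbers: the grantmaker believes in a 90/10 split, but a small
# audit of the wider proposal market shows animal welfare at 80%.
believed = {"global_welfare": 0.9, "animal_welfare": 0.1}
audit_sample = ["animal_welfare"] * 8 + ["global_welfare"] * 2

if shift_detected(believed, audit_sample):
    print("Distribution shift: revisit who evaluates which proposals.")
```

The point of the sketch is only that the audit samples the market directly, so it keeps working even when your own intake has collapsed to a single mode.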
I don’t think we (currently, I do not make the case for me-living-in-1490s) live in a world of scarce opportunities to apply yourself to something. Yes, things that turn out to be good can sometimes not be done with good intentions of the majority of people doing them. But also, most (?) bad things are done by people with bad intentions (or bad beliefs). I would be surprised if in a world where you can choose to apply yourself to movements that exercise effort to have good intentions & good beliefs you wouldn’t be better off assigning some weight to this heuristic.
In other words, the people involved trying to do something good is not a prerequisite, but I would guess it is a non-negligible predictor. (You don’t explicitly state that you disagree that it is a significant predictor, so maybe we actually agree on this, but your last paragraph seems to imply that taking mission alignment into account for harm forecasting does not improve its quality, which is a bit counterintuitive to me).
I do agree if your alternative is “having goals of a corpse” you probably should try to make American colonisation less-bad or its results more-good instead of just doing nothing.
I’m under the impression that the conquest of the American continent was not a project of goodness, it was a project of conquest. Then the revolution happened (the way it happened), Washington resigning happened, and a whole lot of other things, and in the end it could be argued (controversially) that, given an oracle into 2026 and a counterfactual world, a person guided by “goodness” would not oppose the colonial project. But the original conquerors didn’t have this oracle, and weren’t acting in service of “goodness”. There are, I believe, countless examples throughout history where similar incentives and decision making resulted in net negative atrocities.
So, I believe, if faced with the horrors of the American colonisation, it is probable that a “good” person would actually try to stop it. In other words, I don’t know if American colonialism is good in expectation.
I do enjoy the motto itself that you state in this post. It is inspiring, and extreme like all good mottos. I’m tired of good people trying to remain innocent. I’m just not sure American colonialism is relevant to this idea in the way your post seems to frame it. Please correct me if I misinterpreted it somewhere, or if my knowledge about the ideas behind the American conquest is incorrect.
PS: It is a bit tragic that so much of the discussion in the comments is centered around the colonialism argument, my comment included. I would enjoy seeing more discussion on how goodness can conquer while remaining good. I’m of Russian background, and the modern opposition repeatedly tried to oppose dictatorship with democratic tools, and rightfully lost. However, attempting to threaten the government with direct violence runs against the principles of democracy itself that this goodness is based on. This seems like a dilemma to which I struggle to find a good solution. However, it is also a very obvious dilemma to arrive at, so I would be surprised if it wasn’t already addressed extensively in rational discourse.
I’m somewhat new to the rationalist space. I’m under the impression that rationalist thought, the existence / magnitude of which is partially supported through EA infrastructure, is one of the main driving forces behind whatever progress we have in AI Safety. Most of the people I meet in this space are at least EA / rationality adjacent.
I read your comment as implying that EA is, at least within the scope of accelerating ASI, a net negative, and that the world would be better off counterfactually if EA were to retire. Can you explain why you believe what you believe?
I’m sceptical of this being a universal rule because I don’t know why you believe what you believe. However, I want to scream “preach!” because empirically I agree 100%.
I don’t know if it is my theatre background, but another thing I find severely underappreciated is textured shadows. The only thing worse than a blue LED is a bright blue LED in the middle of the ceiling as the only light source.
I think (based on photos), it is another thing Lighthaven does really well!
I want to add that helping out someone with low morale seems like a very high impact intervention to me, if you are in a good position to help! Low morale predicts even lower morale, both because there is likely something in the environment causing it, and because you have less desire to act or contribute effort, which starves you of positive rewards. In that regard, it is similar to depression, but far more people are in a position to help someone with morale than to treat depression, simply by fairly rewarding someone neglected for their contributions.
If you accept both properties and you violate independence, you can be money-pumped. Here is how it works, concretely. Suppose your preference between gambles A and B depends on what the common component C is (as the independence axiom says it shouldn’t). Before the uncertainty resolves, you evaluate the compound lottery holistically and prefer the plan involving B (because, in combination with the C branch, B produces a better overall distribution). But then the coin comes up heads, the C branch is now off the table, and you find yourself choosing between A and B in isolation. Consequentialism says you should evaluate based on what’s still possible. And in isolation, you prefer A. So you switch from your plan (B) to your current preference (A). You are dynamically inconsistent.
Can someone clarify this passage for me? I find myself increasingly confused. Earlier, we assumed the agent can form a plan: “if the coin comes up heads (no C), I will choose A; if the coin comes up tails, I will choose B (with C)”. How can I be money pumped? I don’t violate dynamic consistency, nor do I violate consequentialism. Yet I violate independence, and can’t be money pumped. I can’t be convinced to pre-commit to either B or A, since there are no predictors involved, and I can just postpone my actual choice.
Edit: Actually, I don’t violate independence either, these are simply different outcomes. So I don’t understand this argument at all.
Here is the specific confusion that matters for our purposes. When someone says “a rational agent maximizes expected utility,” this sounds, to a casual listener, like it means “a rational agent computes the probability-weighted average of their subjective values across all possible outcomes.” In other words, it sounds like the agent takes f1, the function representing how good each outcome feels or how much they value it, and averages it across possible worlds, weighted by probability. This would mean that the agent literally values a gamble at the weighted sum of how much they value each possible result.
This seems untrue. “a rational agent computes the probability-weighted average of their subjective values across all possible outcomes.” isn’t the same as the agent taking the expected value of f1. The expected value of f1 doesn’t carry any meaning at all: f1 is ordinal, not cardinal. I could prefer two apples to one apple only slightly, yet f1(two apples) could be vastly larger than f1(one apple) without violating any of Debreu’s theorems.
What this actually says is that the agent takes f2 and takes its expected value across all possible outcomes. This is exactly what a VNM agent does per the original theorem, and it is true, per my understanding, that agents “value gambles at the weighted sum of how much they value each possible result”.
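A small sketch of why the expected value of a purely ordinal function is meaningless. The two functions below are my own made-up candidates for f1 (not taken from the post): both rank 0 < 1 < 2 apples identically, so both represent the same preferences over sure outcomes, yet their expectations disagree about a 50/50 gamble.

```python
import math


def expected(utility, lottery):
    """Probability-weighted average of `utility` over (outcome, prob) pairs."""
    return sum(prob * utility(outcome) for outcome, prob in lottery)


# Two hypothetical ordinal representations of "more apples is better".
def f1_concave(apples):
    return math.sqrt(apples)


def f1_convex(apples):
    return apples ** 2


sure_one_apple = [(1, 1.0)]
coin_flip = [(0, 0.5), (2, 0.5)]  # 50%: no apples, 50%: two apples

for f in (f1_concave, f1_convex):
    prefers_gamble = expected(f, coin_flip) > expected(f, sure_one_apple)
    print(f.__name__, "prefers the gamble:", prefers_gamble)
```

Since any increasing transformation of f1 is an equally valid f1, averaging it picks out no particular attitude toward gambles; only a cardinal function like f2 pins that down.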
I think the most natural fix within the VNM theory is to just say S’ and D’ are the events “car is awarded to son/daughter based on a coin toss”, which are slightly better than S and D themselves, and that F is really 0.5S’ + 0.5D’. Unfortunately, such modifications undermine the applicability of the VNM theorem, which implicitly assumes that the source of probabilities itself is insignificant to the outcomes for the agent. Luckily, Bolker4 has devised an axiomatic theory whose theorems will apply without such assumptions, at the expense of some uniqueness results. I’ll have another occasion to post on this later.
I don’t know if the author has made a further comment on this. I don’t think this undermines the applicability of VNM. If the agent cares whether the car was assigned via a coin toss, then the relevant consequences aren’t just S and D, but richer outcomes like S′ = “son gets car via coin toss” and D′ = “daughter gets car via coin toss.” In that case, the original model just used too coarse a consequence space; VNM can still be applied to lotteries over the refined outcomes. What would challenge VNM is insisting that two lotteries over the same fully specified outcomes can still differ in value purely because of how the probabilities are generated. However, if we assume a deterministic universe, we are allowed to expand the outcome space indefinitely until there is no probability involved, so I’m having a hard time imagining such a scenario.
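A toy illustration of the coarse versus refined consequence space. The utility numbers (1.0 and 1.1) and the labels `S_coin` / `D_coin` are my own assumptions, chosen only to show the shape of the argument.

```python
def expected_utility(utility, lottery):
    """Expected utility of a lottery given as {outcome: probability}."""
    return sum(prob * utility[outcome] for outcome, prob in lottery.items())


# Coarse space: only "son gets car" (S) and "daughter gets car" (D).
coarse_u = {"S": 1.0, "D": 1.0}
fair_toss_coarse = {"S": 0.5, "D": 0.5}

# An expected utility is a weighted average, so over the coarse space the
# coin toss can never be valued strictly above both S and D.
assert expected_utility(coarse_u, fair_toss_coarse) <= max(coarse_u.values())

# Refined space: "got the car via a fair coin toss" is a different outcome.
refined_u = {"S": 1.0, "D": 1.0, "S_coin": 1.1, "D_coin": 1.1}
fair_toss_refined = {"S_coin": 0.5, "D_coin": 0.5}

# Now the preference for the toss is an ordinary expected-utility preference.
assert expected_utility(refined_u, fair_toss_refined) > refined_u["S"]
assert expected_utility(refined_u, fair_toss_refined) > refined_u["D"]
```

Under these assumptions nothing about the probabilities themselves carries value; the fairness lives entirely in the outcome labels, which is the sense in which I’d say the original model was just too coarse.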
Yes, this was exactly what I was pointing at, unless I misunderstood your further comment and it does, in fact, point to a mistake of my reasoning.