AI alignment researcher supported by MIRI and LTFF. Working on the learning-theoretic agenda. Based in Israel. See also LinkedIn.
E-mail: vanessa DOT kosoy AT {the thing reverse stupidity is not} DOT org
Linkpost to Twitter thread is a bad format for LessWrong. Not everyone has Twitter.
I agree that in the long term it probably matters little. However, I find the issue interesting, because the failure of reasoning that leads people to ignore the possibility of AI personhood seems similar to the failure of reasoning that leads people to ignore existential risks from AI. In both cases it “sounds like scifi” or “it’s just software”. It is possible that raising awareness of the personhood issue is politically beneficial for addressing X-risk as well. (And, it would sure be nice to avoid making the world worse in the interim.)
What is ? Also, we should allow adding some valid reward function of .
is a polytope with , corresponding to allowed action distributions at that state.
I think it’s mathematically cleaner to get rid of A and have those be abstract polytopes.
Did anyone around here try Relationship Hero and has opinions?
First, I said I’m not a utilitarian, I didn’t say that I don’t value other people. There’s a big difference!
Second, I’m not willing to step behind that veil of ignorance. Why should I? Decision-theoretically, it can make sense to argue “you should help agent X because in some counterfactual, agent X would be deciding whether to help you using similar reasoning”. But, there might be important systematic differences between early people and late people (for example, because late people are modified in some ways compared to the human baseline) which break the symmetry. It might be a priori improbable for me to be born as a late person (and still be me in the relevant sense) or for a late person to be born in our generation[1].
Moreover, if there is a valid decision-theoretic argument to assign more weight to future people, then surely a superintelligent AI acting on my behalf would understand this argument and act on it. So, this doesn’t compel me to precommit to a symmetric agreement with future people in advance.
There is a stronger case for intentionally creating and giving resources to people who are early in counterfactual worlds. At least, assuming people have meaningful preferences about the state of never-being-born.
Your “psychohistory” is quite similar to my “metacosmology”.
Disagree. I’m in favor of (2) because I think that what you call a “tyranny of the present” makes perfect sense. Why would the people of the present not maximize their utility functions, given that it’s the rational thing for them to do by definition of “utility function”? “Because utilitarianism” is a nonsensical answer IMO. I’m not a utilitarian. If you’re a utilitarian, you should pay for your utilitarianism out of your own resource share. For you to demand that I pay for your utilitarianism is essentially a defection in the decision-theoretic sense, and would incentivize people like me to defect back.
As to problem (2.b), I don’t think it’s a serious issue in practice because time until singularity is too short for it to matter much. If it was, we could still agree on a cooperative strategy that avoids a wasteful race between present people.
John Wentworth, founder of the stores that bear his name, once confessed: “I learned thirty years ago that it is foolish to scold. I have enough trouble overcoming my own limitations without fretting over the fact that God has not seen fit to distribute evenly the gift of intelligence.”
@johnswentworth is an ancient vampire, confirmed.
I’m going to be in Berkeley February 8–25. If anyone wants to meet, hit me up!
Where do the Base Rate Times report on AI? I don’t see it on their front page.
I honestly don’t know. The discussions of this problem I encountered are all in the American (or at least Western) context[1], and I’m not sure whether it’s because Americans are better at noticing this problem and fixing it, or because American men generate more unwanted advances, or because American women are more sensitive to such advances, or because this is an overreaction to a problem that’s much more mild than it’s portrayed.
Also, high-status men, really? Men avoiding meetups because they get too many propositions from women is a thing?
To be clear, we certainly have rules against sexual harassment here in Israel, but that’s very different from “don’t ask a woman out the first time you meet her”.
“It’s true that we don’t want women to be driven off by a bunch of awkward men asking them out, but if we make everyone read a document that says ‘Don’t ask a woman out the first time you meet her’, then we’ll immediately give the impression that we have a problem with men awkwardly asking women out too much — which will put women off anyway.”
American social norms around romance continue to be weird to me. For the record, y’all can feel free to ask me out the first time you meet me, even if you do it awkwardly ;)
“Virtue is its own reward” is a nice thing to believe in when you feel respected, protected and loved. When you feel tired, lonely and afraid, and nobody cares at all, it’s very hard to understand why you should be making big sacrifices for the sake of virtue. But, hey, people are different. Maybe, for you virtue is truly, unconditionally, its own reward, and a sufficient one at that. And maybe EA is a community professional circle only for people who are that stoic and selfless. But, if so, please put the warning in big letters on the lid.
There is tension between the stance that “EA is just a professional circle” and the (common) thesis that EA is a moral ideal. The latter carries the connotation of “things you will be rewarded for doing” (by others sharing the ideal). Likely some will claim that, in their philosophy, there is no such connotation: but it is on them to emphasize this, since this runs contrary to the intuitive perception of morality by most people. People who take up the ideology expecting the implied community aspect might understandably feel disappointed or even betrayed when they find it lacking, which might have happened to the OP.
As I said, cooperation is rational. There are, roughly speaking, two mechanisms to achieve cooperation: the “acausal” way and the “causal” way. The acausal way means doing something out of abstract reasoning that, if many others do the same, it will be in everyone’s benefit, and moreover many others follow the same reasoning. This might work even without a community, in principle.
However, the more robust mechanism is causal: tit-for-tat. This requires that other people actually reward you for doing the thing. One way to reward is by money, which EA does to some extent: however, it also encourages members to take pay cuts and/or make donations. Another way to reward is by the things money cannot buy: respect, friendship, emotional support and generally conveying the sense that you’re a cherished member of the community. On this front, more could be done IMO.
Even if we accept that EA is nothing more than a professional circle, it is still lacking in the respects I pointed out. In many professional circles, you work in an office with peers, leading naturally to a network of personal connections. On the other hand, AFAICT many EAs work independently/remotely (I am certainly one of those), which denies the same benefits.
I agree with the OP that: Utilitarianism is not a good description of most people’s values, possibly not even a good description of anyone’s values. Effective altruism encourages people to pretend that they are intrinsically utilitarian, which is not healthy or truth-seeking. Intrinsic values are (to 1st approximation) immutable. It’s healthy to understand your own values, it’s bad to shame people for having “wrong” values.
I agree with critics of the OP that: Cooperation is rational, we should be trying to help each other over and above the (already significant) extent to which we intrinsically care about each other, because this is in our mutual interest. A healthy community rewards prosocial behavior and punishes sufficiently antisocial behavior (there should also be ample room for “neutral” though).
A point insufficiently appreciated by either: The rationalist/EA community doesn’t reward prosocial behavior enough. In particular, we need much more in the way of emotional support and mental health resources for community members. I speak from personal experience here: I am very grateful to this community for support in the career/professional sense. However, on the personal/emotional level, I never felt that the community cares about what I’m going through.
For the record, I contacted 3⁄4 but it led to nothing, alas. (I also thought of another person to contact but she moved to a different country in the intervening time.)
I wrote a review here. There, I identify the main generators of Christiano’s disagreement with Yudkowsky[1] and add some critical commentary. I also frame it in terms of a broader debate in the AI alignment community.
I divide those into “takeoff speeds”, “attitude towards prosaic alignment” and “the metadebate” (the last one is about what kind of debate norms we should have about this and what kind of arguments we should listen to).
Yes, this is an important point, of which I am well aware. This is why I expect unbounded-ADAM to only be a toy model. A more realistic ADAM would use a complexity measure that takes computational complexity into account instead of . For example, you can look at the measure I defined here. More realistically, this measure should be based on the frugal universal prior.
Formalizing the richness of mathematics
Intuitively, it feels that there is something special about mathematical knowledge from a learning-theoretic perspective. Mathematics seems infinitely rich: no matter how much we learn, there is always more interesting structure to be discovered. Impossibility results like the halting problem and Gödel incompleteness lend some credence to this intuition, but are insufficient to fully formalize it.
Here is my proposal for how to formulate a theorem that would make this idea rigorous.
(Wrong) First Attempt
Fix some natural hypothesis class for mathematical knowledge, such as some variety of tree automata. Each such hypothesis Θ represents an infradistribution over Γ: the “space of counterpossible computational universes”. We can say that Θ is a “true hypothesis” when there is some θ in the credal set Θ (a distribution over Γ) s.t. the ground truth Υ∗∈Γ “looks” as if it’s sampled from θ. The latter should be formalizable via something like a computationally bounded version of Martin-Löf randomness.
We can now try to say that Υ∗ is “rich” if for any true hypothesis Θ, there is a refinement Ξ⊆Θ which is also a true hypothesis and “knows” at least one bit of information that Θ doesn’t, in some sense. This is clearly true, since there can be no automaton or even any computable hypothesis which fully describes Υ∗. But, it’s also completely boring: the required Ξ can be constructed by “hardcoding” an additional fact into Θ. This doesn’t look like “discovering interesting structure”, but rather just like brute-force memorization.
(Wrong) Second Attempt
What if instead we require that Ξ knows infinitely many bits of information that Θ doesn’t? This is already more interesting. Imagine that instead of metacognition / mathematics, we would be talking about ordinary sequence prediction. In this case it is indeed an interesting non-trivial condition that the sequence contains infinitely many regularities, s.t. each of them can be expressed by a finite automaton but their conjunction cannot. For example, maybe the n-th bit in the sequence depends only on the largest k s.t. 2^k divides n, but the dependence on k is already uncomputable (or at least inexpressible by a finite automaton).
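To make the sequence-prediction example concrete, here is a minimal Python sketch. The names `two_adic_valuation`, `toy_dependence` and `sequence_prefix` are hypothetical, and `toy_dependence` is a computable stand-in chosen purely for illustration: in the example above the dependence on k would be uncomputable, which no code can exhibit.

```python
import math
from typing import Callable, List


def two_adic_valuation(n: int) -> int:
    """Return the largest k such that 2**k divides n (n must be positive)."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    k = 0
    while n % 2 == 0:
        n //= 2
        k += 1
    return k


def toy_dependence(k: int) -> int:
    """Hypothetical stand-in for the map k -> bit.

    Assumption for illustration only: the bit is 1 iff k is a perfect square.
    The example in the text has an uncomputable map in this position.
    """
    return 1 if math.isqrt(k) ** 2 == k else 0


def sequence_prefix(length: int, dependence: Callable[[int], int]) -> List[int]:
    """First `length` bits of the sequence whose n-th bit (n = 1, 2, ...)
    depends only on the 2-adic valuation of n."""
    return [dependence(two_adic_valuation(n)) for n in range(1, length + 1)]


if __name__ == "__main__":
    print([two_adic_valuation(n) for n in range(1, 9)])
    print(sequence_prefix(8, toy_dependence))
```

Roughly speaking, for each fixed k the rule "every position n whose valuation is k carries the same bit" is one regularity of the kind a finite automaton could check, by counting trailing zeros in the binary expansion of n up to k. The full sequence is the conjunction of all these rules.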
However, for our original application, this is entirely insufficient. This is because the formal language we use to define Γ (e.g. combinator calculus) has some “easy” equivalence relations. For example, consider the family of programs of the form “if 2+2=4 then output 0, otherwise...”. All of those programs would output 0, which is obvious once you know that 2+2=4. Therefore, once your automaton is able to check some such easy equivalence relations, hardcoding a single new fact (in the example, 2+2=4) generates infinitely many “new” bits of information. Once again, we are left with brute-force memorization.
(Less Wrong) Third Attempt
Here’s the improved condition: For any true hypothesis Θ, there is a true refinement Ξ⊆Θ s.t. conditioning Θ on any finite set of observations cannot produce a refinement of Ξ.
There is a technicality here, because we’re talking about infradistributions, so what is “conditioning” exactly? For credal sets, I think it is sufficient to allow two types of “conditioning”:
For any given observation A and p∈(0,1], we can form {θ∈Θ∣θ(A)≥p}.
For any given observation A s.t. min_{θ∈Θ} θ(A) > 0, we can form {(θ∣A)∣θ∈Θ}.
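The two operations above can be sketched in Python on a toy model. This is an illustration under strong simplifying assumptions, not the intended formalism: the outcome space is finite, a credal set is represented as a plain finite list of distributions (ignoring convexity and closure), and the names `prob`, `restrict_by_threshold` and `condition_pointwise` are hypothetical.

```python
from fractions import Fraction
from typing import Dict, List, Set

Dist = Dict[str, Fraction]  # outcome -> probability (toy finite distribution)


def prob(theta: Dist, event: Set[str]) -> Fraction:
    """theta(A): total probability that theta assigns to the event A."""
    return sum((p for x, p in theta.items() if x in event), Fraction(0))


def restrict_by_threshold(credal: List[Dist], event: Set[str], p: Fraction) -> List[Dist]:
    """First type: keep only those theta with theta(A) >= p, for p in (0, 1]."""
    if not (0 < p <= 1):
        raise ValueError("p must lie in (0, 1]")
    return [theta for theta in credal if prob(theta, event) >= p]


def condition_pointwise(credal: List[Dist], event: Set[str]) -> List[Dist]:
    """Second type: replace every theta by (theta | A).

    Only allowed when every theta in the set gives A positive probability.
    """
    if not credal or min(prob(theta, event) for theta in credal) <= 0:
        raise ValueError("every distribution must assign positive probability to A")
    return [
        {x: q / prob(theta, event) for x, q in theta.items() if x in event}
        for theta in credal
    ]


if __name__ == "__main__":
    theta1 = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
    theta2 = {"a": Fraction(1, 4), "b": Fraction(1, 2), "c": Fraction(1, 4)}
    A = {"a", "c"}
    print(restrict_by_threshold([theta1, theta2], A, Fraction(3, 4)))
    print(condition_pointwise([theta1, theta2], A))
```

In this toy example, the first operation shrinks the set (only `theta1` survives the threshold 3/4), while the second keeps the number of distributions fixed and renormalizes each one on A.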
This rules out the counterexample from before: the easy equivalence relation can be represented inside Θ, and then the entire sequence of “novel” bits can be generated by a conditioning.
Alright, so does Υ∗ actually satisfy this condition? I think it’s very probable, but I haven’t proved it yet.