Sounds like, the same way we had a dumb questions post, we need somewhere explicitly for posting dumb potential solutions that will totally never work, or something, maybe?
(Epistemic Status: Quick brainstorm slash free-form just-write-it exercise. This wants to be a post, but I want to throw it out as a comment quickly first and see if it sounds right.)
Could we tie this directly in with Asymmetric Justice?
If you are a big thing, you are being evaluated primarily on the basis of what horrible things you’ve done, and reap little of the relative benefit from the brilliant things. If you then enable many weird offensive things, that’s a losing plan. Even if the group is a huge win on net, some of them will be bad and get you in a lot of trouble.
If you are a small thing, and want to do one weird thing as the only thing, you have a chance that it turns out all right at least with respect to those you are appealing to with your newspaper, blog or what have you. So you can gain the benefits of exploration, free expression, creation of knowledge and so on.
If you are a medium-size thing doing correlated weird things, which are weird and offensive in the same way, then again your risk is contained, because if they’re sufficiently correlated, it’s all one thing, so you won’t reliably be evaluated as bad and can again get the benefits of your one thing. But it also means that in order to do that, you need to be consistent. No violating your group’s party lines so they evaluate you as just. And of course you need to support free speech to avoid being shut down yourself by the “moderates.”
So what happens? “The center” or “moderates,” trying to hold, is the biggest thing, has to worry about all sides judging it asymmetrically, and so is forced to come out in favor of blandness. Since a big thing like capitalism or a major corporation or the government interacts with tons of stuff, enough to get blamed for it, it needs to censor in order to not be found guilty. Hence increasing polarization and uniformity on all sides.
And in parallel, as a moderate proposing policies and law, you can accuse a whole class of things of being bad because one of them is bad with respect to one thing, and thus make the case that one must censor.
Which means this “moderate center” isn’t actually anything of the sort. It’s a third power with very little popular support trying to cram things down our throats, because they understand our point scoring systems better than we do—and only partly because they had a large role in engineering those systems. And they are responding to their own incentives.
You actually get the whole dynamic from first principles.
Individual people are small, can and want to take risks, feel increasingly censored for increasingly stupid reasons, and become more pro-free-speech. Large powerful things that want to appeal to multiple sides race with each other to be bigger censors so they can avoid being found guilty, and scapegoat the other moderate powerful things they’re struggling with for power, along with everyone else who they can directly censor to gain the upper hand as a group.

Ideally they’d like to censor any attempt to portray things accurately or create clarity or common knowledge at all, because the people hate the censorship and they distrust power and the more information they find out, the bigger the negative points they’ll assign to every big powerful thing. This creates a tacit (at least) conspiracy of the powerful against all communication, coordination and creation of common knowledge on anything that might matter. A general opposition to reason and competence seems to logically follow.
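Here’s a minimal toy simulation of this dynamic, with all parameters invented purely for illustration: every weird thing is a win in expectation, but a scorekeeper who only counts the failures will near-certainly find the big actor guilty, while the small actor often escapes clean.

```python
import random

# Toy model of Asymmetric Justice scoring (parameters invented for
# illustration): each weird thing is a win on net in expectation,
# but the scorekeeper only tallies the failures.

def weird_thing():
    # 80% chance of a modest win, 20% chance of a visible failure.
    return 1.0 if random.random() < 0.8 else -2.0

def simulate(num_things, trials=10_000):
    net, blame, scandals = 0.0, 0.0, 0
    for _ in range(trials):
        outcomes = [weird_thing() for _ in range(num_things)]
        net += sum(outcomes)
        blame += sum(o for o in outcomes if o < 0)   # only failures count
        scandals += any(o < 0 for o in outcomes)
    print(f"{num_things:4d} weird things: avg net {net / trials:+7.2f}, "
          f"avg asymmetric score {blame / trials:+7.2f}, "
          f"P(at least one scandal) {scandals / trials:.0%}")

for n in (1, 10, 100):   # small, medium, and big actor
    simulate(n)
```

With these numbers, each weird thing is worth +0.4 on net, yet the asymmetric score goes ever more negative with size and the big actor has a scandal essentially every time.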
Does that sound right?
(Comment on “The Forces of Blandness and the Disagreeable Majority,” 30 Dec 2020, 2:48 UTC; 4 points.)
Things I instinctively observed, slash that my model believes I got while reading, that seem relevant; not attempting to justify them at this time:
There is a core thing that Eliezer is trying to communicate. It’s not actually about timeline estimates, that’s an output of the thing. Its core message length is short, but all attempts to find short ways of expressing it, so far, have failed.
Mostly so have very long attempts to communicate it and its prerequisites, which to some extent at least includes the Sequences. Partial success in some cases, full success in almost none.
This post, and this whole series of posts, feels like its primary function is training data to use to produce an Inner Eliezer that has access to the core thing, or even better to know the core thing in a fully integrated way. And maybe a lot of Eliezer’s other communication is kind of also trying to be similar training data, no matter the superficial domain it is in or how deliberate that is.
The condescension is important information to help a reader figure out what is producing the outputs, and hiding it would make the task of ‘extract the key insights’ harder.
Similarly, the repetition of the same points is also potentially important information that points towards the core message.
That doesn’t mean all of that isn’t super annoying to read and deal with, especially when he’s telling you in particular that you’re wrong. ’Cause it’s totally that.
There are those for whom this makes it easier to read, especially given it is very long, and I notice both effects.
My Inner Eliezer says that writing this post without the condescension, or making it shorter, would be much, much more effort for Eliezer to write. To the extent such a thing can be written, someone else has to write that version. Also, this is kind of in the text in several places.
The core message is what matters and the rest mostly doesn’t?
I am arrogant enough to think I have a non-zero chance that I know enough of the core thing and have enough skill that with enough work I could perhaps find an improved way to communicate it given the new training data, and I have the urge to try this impossible-level problem if I could find the time and focus (and help) to make a serious attempt.
As I commented elsewhere, I think this is great, but there’s one curious choice here, which is to treat exposure to The Singularity as a de-conversion experience and loss of faith, rather than a conversion experience where one gains faith. The parallel is to someone going from believer to atheist, rather than atheist to believer.
Which in some ways totally makes sense, because rationality goes hand in hand with de-conversion, as the Sequences are quite explicit about over and over again, and often people joining the community are in fact de-converting from a religion (and when and if they convert to one, they almost always leave the community). And of course, because the Singularity is a real physical thing that might really happen and really do all this, and so on.
But I have the system-1 gut instinct that this is actually getting the sign wrong in ways that are going to make it hard to understand people’s problem here and how to best solve it.
(As opposed to it actually being a religion, which it isn’t.)
From the perspective of a person processing this kind of new information, the fact that the information is true or false, or supernatural versus physical, doesn’t seem that relevant. What might be much more relevant is that you now believe that this new thing is super important and that you can potentially have really high leverage over that thing. Which then makes everything feel unimportant and worth sacrificing—you now need to be obsessed with new hugely important thing and anyone who isn’t and could help needs to be woken up, etc etc.
If you suddenly don’t believe in God and therefore don’t know if you can be justified in buying hot cocoa, that’s pretty weird. But if you suddenly do believe in God and therefore feel you can’t drink hot cocoa, that’s not that weird.
People who suddenly believe in God don’t generally have the ‘get up in the morning’ question on their mind, because the religions mostly have good answers for that one. But the other stuff all seems to fit much better?
Or, think about the concept Anna discusses about people’s models being ‘tangled up’ with stuff they’ve discarded because they lost faith. If God doesn’t exist, why not [do horrible things], and all that, because nothing matters so do what you want. But this seems like mostly the opposite: it’s that the previous justifications have been overwritten by bigger concerns.
Consider what would happen if you had to solve your list of problems and didn’t inherently care about human values. To what extent would you do ‘unfriendly’ things via consequentialism? How hard would you need to be constrained to stop doing that? Would it matter if you could also do far trickier things by using consequentialism and general power-seeking actions?
The reason, as I understand it, that a chess-playing AI does things the way we want it to is that we constrain the search space it can use, because we can fully describe that space, rather than giving it any means of using other approaches, and for now that box is robust (see the toy sketch below).
But if someone gave you or me the same task, we wouldn’t learn chess, we would buy a copy of Stockfish, or if it was a harder task (e.g. be better than AlphaZero) we’d go acquire resources using consequentialism. And it’s reasonable to think that if we gave a fully generic but powerful future AI the task of being the best at chess, at some point it’s going to figure out that the way to do that is to acquire resources via consequentialism, and potentially to kill or destroy all its potential opponents. Winner.
Same with the poem or the hypothesis: I’m not going to be so foolish as to attack the problem directly unless it’s already pretty easy for me. And in order to get an AI to write a poem that good, I find it plausible that the path to doing that is less monkeys on a typewriter and more resource acquisition, so I can understand the world well enough to do that. As a programmer of an AI, right now, the path is exactly that—it’s ‘build an AI that gets me enough more funding to potentially get something good enough to write that kind of poem,’ etc.
Another approach, and more directly a response to your question here, is to ask, which is easier for you/the-AI: Solving the problem head-on using only known-safe tactics and existing resources, or seeking power via consequentialism?
Yes, at some amount of endowment, I already have enough resources relative to the problem at hand and see a path to a solution, so I don’t bother looking elsewhere and just solve it, same as a human. But mostly no for anything really worth doing, which is the issue?
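To make the chess point above concrete, here is a minimal sketch, using Nim as a stand-in for chess (chess would need a real engine; the structure is the same). The only actions the search can ever consider are the legal moves we enumerate for it; ‘buy Stockfish’ and ‘acquire resources’ are simply not in its action space, which is what keeps the box robust.

```python
from functools import lru_cache

# The agent's entire world: a pile of stones, and the option to take 1, 2,
# or 3 of them. Taking the last stone wins. Nothing outside this list of
# moves is even representable to the search.

def legal_moves(stones):
    return [m for m in (1, 2, 3) if m <= stones]

@lru_cache(maxsize=None)
def value(stones):
    # +1 if the player to move wins with optimal play, -1 otherwise.
    if stones == 0:
        return -1   # the opponent took the last stone; player to move lost
    return max(-value(stones - m) for m in legal_moves(stones))

def best_move(stones):
    return max(legal_moves(stones), key=lambda m: -value(stones - m))

print(best_move(10))   # -> 2: leave the opponent a multiple of 4
```

The worry is precisely about systems whose action space is the real world, where no such clean enumeration exists.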
We all know that falsifying data is bad. But if that’s the way the incentives point (and that’s a very important if!), then it’s also bad to call people out for doing it.
No. No. Big No. A thousand times no.
(We all agree with that first sentence, everyone here knows these things are bad, that’s just quoted for context. Also note that everyone agrees that those incentives are bad and efficient action to change them would be a good idea.)
I believe the above quote is a hugely important crux. Likely it, or something upstream of it, is the crux. Thank you for being explicit here. I’m happy to know that this is not a straw man, and that this is not going to get the Motte and Bailey treatment.
I’m still worried that such treatment will mostly occur...
There is a position, that seems to be increasingly held and openly advocated for, that if someone does something according to their local, personal, short-term amoral incentives, then this is, if not automatically praiseworthy (although I believe I have frequently seen that too, increasingly explicitly, but not here or by anyone in this discussion), at least immune from blame, no matter the magnitude of that incentive. One cannot ‘call them out’ on such action, even if such calling out has no tangible consequences.
I’m too boggled, and too confused about how one gets there in good faith, to figure out how to usefully argue against such positions in a way that might convince people who sincerely disagree. So instead, I’m simply going to ask: are there any others here who would endorse the quoted statement as written? Are there people who endorse the position in the above paragraph, as written? With or without an explanation as to why. Either, or both. If so, please confirm this.
I’ve attended one event under Chatham House rules. Not only was keeping who was there a secret costly, but people reliably considered it unreasonable that I actually kept that secret. “Oh, come on” and variants were used often, because actually keeping to the rule was annoying and they didn’t see the point.
People treating it as unreasonable does make keeping the rule even more expensive, and raises the probability it will be ignored—I believe others took the information part seriously but not the who was there part. But that also makes it really important we find a way to do the full no-one-knows-you-are-there thing when you need to do it, without it giving away that there was true need for it. If you say who attended until the moment you really can’t say, you’re doing Glomarization / Meta-Honesty wrong...
From my perspective here’s what happened: I spent hours trying to parse his arguments. I then wrote an effort post, responding to something that seemed very wrong to me, that took me many hours, that was longer than the OP, and attempted to explore the questions and my model in detail.
He wrote a detailed reply, which I thanked him for, ignoring the tone issues in question here and focusing on the details and disagreements. I spent hours processing it and replied in detail to each of his explanations in the reply, including asking many detailed questions, identifying potential cruxes, making it clear where I thought he was right about my mistakes, and so on. I read all the comments carefully, by everyone.
By this point this was, for me, an extraordinary commitment of time, and the whole thing was stressful. He left it at that. Which is fine, but I don’t know how else I was supposed to ‘follow up’ at that point. I don’t know what else someone seeking to understand is supposed to do.
I agree Nate’s post was a mistake, and said so in OP here—either take the time to engage or don’t engage. That was bad. But in general no, I do not think that the thing I am observing from Pope/Belrose is typical of LW/AF/rationalist/MIRI/etc behaviors to anything like the same degree that they consistently do it.
Nor do I get the sense that they are open to argument. Looking over Pope’s reply to me, I basically don’t see him changing his mind about anything, agreeing a good point was made, addressing my arguments or thoughts on their merits rather than correcting my interpretation of his arguments, asking me questions, suggesting cruxes and so on. Where he notes disagreement he says he’s baffled anyone could think such a thing and doesn’t seem curious why I might think it.
If people want to make a higher bid for me to engage more after that, I am open to hearing it. Otherwise, I don’t see how to usefully do so in reasonable time in a way that would have value.
This post is great and much needed, and makes me feel much better about the goings-on at CFAR.
It is easy to get the impression that the concerns raised in this post are not being seen, or are being seen from inside the framework of people making those same mistakes. Sometimes these mistakes are seen as disorientation that people know is disruptive and needs to be dealt with, but other times I’ve encountered many who view such things as right and proper, and view not having such a perspective as blameworthy. I even frequently find an undertone of ‘if you don’t have this orientation, something went wrong.’
It’s clear from this post that this is not what is happening for Anna/CFAR, which is great news.
This now provides, to me, two distinct things.
One, a clear anchor from which to make it clear that failure to engage with regular life, and failure to continue to have regular moral values and desires and cares and hobbies and so on, is a failure mode of some sort of phase transition that we have been causing. That it is damaging, and it is to be avoided slash the damage contained and people helped to move on as smoothly and quickly as possible.
Two, the framework of reality-revealing versus reality-masking, which has universal application. If this resonates with people it might be a big step forward in being able to put words to key things, including things I’m trying to get at in the Mazes sequence.
This is a full energy top priority effort.
I will continue the blog as part of that effort, it is the reason I am in position to be able to do this, and I will continue to attend to other commitments because life is complicated, but the effective barrier is ‘I can only do so much in a week on this type of stuff no matter what anyway.’
Possibly clearer version of what Jessica is saying:
Imagine three levels of explanation: Straightforward to you, straightforward to those without motivated cognition, straightforward even to those with strong motivated cognition.
It is reasonable to say that getting from level 1 to level 2 is often a hard problem, that it is on you to solve that problem.
It is not reasonable, if you want clarity to win, to say that level 2 is insufficient and you must reach level 3. It certainly isn’t reasonable to notice that level 2 has been reached, but level 3 has not, and thus judge the argument insufficient and a failure. It would be reasonable to say that reaching level 3 would be *better* and suggest ways of doing so.
If you don’t want clarity to win, and instead you want to accomplish specific goals that require convincing specific people that have motivated cognition, you’re on a different quest. Obfuscation has already won, because you are being held to higher standards and doing more work, and rewarding those who have no desire to understand for their failure to understand. Maybe you want to pay that price in context, but it’s important to realize what you’ve lost.
Duncan, if you do still come here on occasion, thank you for a careful attempt at expressing a potentially risky opinion on a potentially risky subject.
Good stuff. This is going to be the first work of fiction linked in a weekly post of mine.
Somewhat tempted to write the rationalfic version of this story, because these characters are missing all the fun.
To state the obvious: People more likely to look at blog posts are going to be more likely to look at this blog post.
I want, as much as possible, to get away from the question of whether ‘EA is good’ or ‘EA is bad’ to various extents. I made an effort to focus on sharing information, rather than telling people what conclusions or affects to take away from it.
What I am saying in the quoted text is that I believe there are specific things within EA that are deeply wrong. This is not at all a conflict with EA being unusually good.
I’m also saying wrong as in mistaken, and I’m definitely (this is me responding to the linked comment’s complaint) not intending to throw around words like ‘evil,’ or at least didn’t do it on purpose, and was trying to avoid making moral claims at all, let alone non-consequentialist ones, although I am noting that I have strong moral framework disagreements.
For a concrete, clean, non-EA example, one could say: The NFL is exceptional, but there is something deeply, deeply wrong with the way it deals with the problem of concussions. And I could want badly for them to fix their concussion protocols or safety equipment, and still think the NFL was pretty great.
And I do agree that there will be people who then say “So why do you hate the NFL?” (or “How can you not hate the NFL?”) but we need to be better than that, ideally everywhere, but at least here.
(Similarly, the political problem when someone says “I love my country, but X” or someone else says “How can you love your country when it does X”)
I do agree that these issues can be difficult, but if this kind of extraordinary effort (flagging the standard in bold text in a clearly sympathetic way, being careful to avoid moral claims and rather sharing intuitions, models and facts, letting the reader draw their own implications on all levels from the information rather than telling them what to conclude) isn’t good enough, then I’m confused what the alternative is that still communicates the information at all.
I believe the real world results of these attempts, if they were attempted this brazenly, are as follows:
The Evil Plutocrat: Democrats and Republicans coordinate and agree to vote down your bill. Voting is simultaneous and public, and you can change it, so even these two groups can trust each other here.
The Hostile Takeover: Leave aside the question of whether your taking over is about to destroy the stock price, and assume it does, and assume the SEC doesn’t come knocking at your door. The other guy sees what you’re doing, and adjusts his offer slightly to counter yours, since he has more money than you do, and a similar threat can get him all the shares.
The Hostile Takeover II: Again assume what you’re doing is legal. The board members vote themselves another set of giant raises, and laugh at you for buying stock that will never see a dividend, since you’ve obviously violated all the norms involved. That, or they sign a contract between themselves and take all your stock.
The Dollar Auction: It’s proven to work as a tax on stupidity, especially if others aren’t allowed to talk (which prevents MBlume’s issue of coordination), with only rare backfires; see the toy simulation below. Of course, the other cases are “look what happens when everyone else thinks things through” and this one is “look what happens when people don’t think things through...”
The Bloodthirsty Pirates: The first mate turns to the others, says “I propose we kill this guy and split the money evenly.” They all kill you. Then maybe they kill each other, and maybe they don’t. Alternatively, they’ve already coordinated. Either way, you’re clearly trying to cheat them, and pirates know what to do with cheaters.
The Prisoner’s Dilemma Redux: That was gorgeous to watch that one time. It likely won’t work again. The coin is another great trick that they’d likely outlaw after it was used once, since that one does work repeatedly.
In general, this is the “I get to set the rules of the game and introduce twists, and then everyone else has to use CDT without being able to go outside the game or coordinate in any way” school of exploitative game theory, with the Dollar Auction a case where you pick out the few people who think badly, often a quality strategy.
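Here is the toy Dollar Auction simulation promised above, with the increment and budget invented for illustration: each raise looks myopically better than folding and forfeiting your stake, so two non-coordinating bidders escalate far past the prize’s value.

```python
PRIZE = 100       # the auctioned dollar, in cents
INCREMENT = 5     # minimum raise, in cents

def dollar_auction(max_budget=500):
    bids = [0, 0]                  # each player's committed bid
    turn = 0
    while True:
        me, rival = turn % 2, (turn + 1) % 2
        next_bid = bids[rival] + INCREMENT
        # Myopic logic: folding forfeits bids[me] for nothing, while raising
        # and winning would net PRIZE - next_bid. Raise while that's better.
        if next_bid >= bids[me] + PRIZE or next_bid > max_budget:
            return rival, bids     # current player folds; rival "wins"
        bids[me] = next_bid
        turn += 1

winner, bids = dollar_auction()
print(f"Player {winner} wins a {PRIZE}-cent prize; final bids (cents): {bids}")
```

Note that the “raising is worse than folding” condition never actually triggers; only the budget cap ends the escalation, which is the whole trap.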
Is this future AI catastrophe? Or is this just a description of current events, a general gradual collapse?
This seems like what is happening now, and has been for a while. Existing ML systems are clearly making Part I problems, already quite bad before ML was a thing at all, much worse, to the extent that I don’t see much ability left in our civilization to get anything that can’t be measured in a short-term feedback loop—even in spaces like this, appeals to non-measurable or non-explicit concerns are a near-impossible sell.
Part II problems are not yet coming from ML systems, exactly, but we certainly have algorithms that are effectively optimized and selected for the ability to gain influence; the algorithm gains influence, which causes people to care about it and feed into it, causing it to get more. If we get less direct in the metaphor we get the same thing with memetics, culture, life strategies, corporations, media properties and so on. The emphasis on choosing winners, being ‘on the right side of history,’ supporting those who are good at getting support. OP notes that this happens in non-ML situations explicitly, and there’s no clear dividing line in any case.
So if there is another theory that says this has already happened, what would one do next?
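One way to see the Part I dynamic in miniature, with functional forms entirely made up for illustration: if the true goal has a measurable component and an unmeasured one, a system graded only on the short-term measurable proxy trades the unmeasured part all the way down.

```python
# Toy Goodhart sketch: effort x goes toward the measurable metric; the
# unmeasured component quietly degrades as x grows. Numbers are invented
# purely for illustration.

def measurable(x):       # what the short-term feedback loop sees
    return x

def unmeasured(x):       # what nothing in the loop tracks
    return 1.0 - 2.0 * x ** 2

def true_value(x):
    return measurable(x) + unmeasured(x)

xs = [i / 100 for i in range(101)]            # candidate effort levels in [0, 1]
proxy_pick = max(xs, key=measurable)          # what proxy optimization selects
true_pick = max(xs, key=true_value)           # what we actually wanted

print(f"proxy-optimal x = {proxy_pick:.2f}, true value = {true_value(proxy_pick):+.2f}")
print(f"truly optimal x = {true_pick:.2f}, true value = {true_value(true_pick):+.2f}")
```

The proxy optimizer goes all-in on the metric and drives the true value to zero, while the truly optimal effort level stops well short of that.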
If you’re an academic and you’re using fake data or misleading statistics, you are doing harm rather than good in your academic career. You are defrauding the public, you are making our academic norms be about fraud, you are destroying both public trust in academia in particular and knowledge in general, and you are creating justified reasons for this destruction of trust. You are being incredibly destructive to the central norms of how we figure things out about the world—one of many of which is whether or not it is bad to eat meat, or how we should uphold moral standards.
And you’re doing it in order to extract resources from the public, and grab your share of the pie.
I would not only rather you eat meat. I would rather you literally go around robbing banks at gunpoint to pay your rent.
If one really, really did think that personally eating meat was worse than committing academic fraud—which boggles my mind, but supposing that—what the hell are you doing in academia in the first place, and why haven’t you quit yet? Unless your goal now is to use academic fraud to prevent people from eating meat, which I’d hope is something you wouldn’t endorse, and not what 99%+ of these people are doing. As the author of OP points out, if you can make it in academia, you can make more money outside of it, and have plenty of cash left over for salads and for subsidizing other people’s salads, if that’s what you think life is about.
Typical mind spot-check: If I saw the comment from D to C of “ugh, X in my feed” I would consider this (effectively) a request to not post Xs at all (especially when it comes from a housemate!), or at least a note that doing so causes substantial disutility, to be weighed accordingly. That’s a pretty strong negative signal. When C says they’ll stop Xing, and D responds “I didn’t mean for that to happen, you’re overreacting!” it feels disingenuous at best. D made the mistake slash did the thing slash owns the thing, not C, so if D didn’t mean that, D should say “I’m sorry, I came on way too strong, you don’t need to do that.” Yet Duncan seems to treat this as, if anything, C overreacting or reacting too strongly.
What do others think of this?
I found this to be much better thought out, better explained, better reasoned and just plain more fun than all but the first few chapters of the actual book. In my mind these are just better examples, and examples that Eliezer understands better and for which he can present better and more accurate evidence.
It’s almost as if the book was written, using sufficiently modest examples and ludicrously charitable assumptions, so Eliezer would feel he has the right to say the things here, the things he actually means to say, and that this is the real point. That would help explain why (at least to me) the middle part of the book felt like it was flailing around and didn’t make sense. It wasn’t trying to. And so, at the right level of viewing, that too serves as an example of the problems at hand.
And of course, even then, this doesn’t go far enough. You don’t just need a hero license. You need an occupational license. You need a thinking license. You need a knowing-anything-at-all license. And you need to not be caught not enforcing the licensing agreements, at an arbitrary meta level, or to be caught being motivated by something other than such enforcement, lest you be smacked down to avoid someone else being caught thus, or even worse, scapegoated. And you don’t know what would be used as evidence or trigger such action, including your failure to identify that there was evidence on which you should have taken such action. Thus, the imitation game and its recursive, backward-chaining justifications, like modesty arguments that mysteriously always have the same answer written at the bottom of the page before they start.
Or you could realize that the result of such a process seems unlikely to give efficient or reasonable answers in general, decide that’s more important than these hits to your status, and not do that.