Meta: I approve of the practice of arguing against your own post in a comment.
>> Why take the risk of this escalating into a seriously negative sum outcome?
Because if someone does that to you (walking up to you and insulting you to your face, apropos of nothing), then the value to them of the situation’s outcome no longer matters to you—or shouldn’t, anyway; this person does not deserve that, not by a long shot.
I think I’m with Wei_Dai on this one—insulting me to my face, apropos of nothing, doesn’t change my valuation of them very much. I don’t know the reasons for such behavior, but I presume it’s based on fear or pain, and I deeply sympathize with those reasons for unpleasant, unreasoning actions. Part of my reaction is that it’s VERY DIFFICULT to insult me in any way I won’t just laugh off as absurd, unless you actually know me and are targeting my personal insecurities.
Only if it’s _NOT_ random and apropos of nothing am I likely to feel that there are strategic advantages to taking a risk now to prevent future occurrences (per your Schelling reference).
I agree that the return to “learning to navigate moods” varies by person.
It sounds to me, from your report, that you tend to be in moods conducive to learning. My sense is that there are many people who are often in unproductive moods, and many who are aware that they spend too much time in unproductive moods. These people would find learning to navigate moods valuable.
In megaproject management, or for multi-stakeholder questions more generally, I wonder about the utility of something like GitHub issue tracking, or even a full-blown CRM tool, to help manage the different stakeholders.
The key word in the above answer being “optimal”. It seemed to me like the post was saying “here’s one thing you can pay attention to in order to optimize your learning” and you were replying “But I don’t pay attention to that and can still learn,” which is essentially arguing against a point the original post never made.
OK, I finally identified an incentive for deception. I think it was difficult for me to find because it’s not really about deceiving the evaluator.
Here’s a hypothesis that observations will never refute: the utility which the evaluator assigns to a state is equal to the reward that a human would provide, if it were a human that controlled the provision of reward (instead of the evaluator). Under this hypothesis, maximizing evaluator-utility is identical to creating observations which will convince a human to provide high reward (a task which entails deception when done optimally). In a sense, the AI doesn’t think it’s deceiving the evaluator; it thinks the evaluator fully understands what’s going on and likes seeing things that would confuse a human into providing high reward, as if the evaluator is “in on the joke”. One of my takeaways here is that some of the conceptual framing I did got in the way of identifying a failure mode.
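To pin the hypothesis down, here’s a rough formalization (the notation is mine and informal, not from any particular setup):

```latex
% H: for every state s, the evaluator's utility equals the reward a
% human would provide after seeing the observations o(s) that s produces.
\[
H:\quad \forall s,\;\; U_{\mathrm{eval}}(s) \;=\; R_{\mathrm{human}}\big(o(s)\big)
\]
% Under H, maximizing evaluator-utility coincides with maximizing the
% reward a (possibly confused) human would assign to the observations:
\[
\arg\max_{s}\, U_{\mathrm{eval}}(s) \;=\; \arg\max_{s}\, R_{\mathrm{human}}\big(o(s)\big)
\]
```

Since no observation can distinguish which side of that equation is generating the reward, the AI can act on H without ever being “corrected” by evidence.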
Note also that displayed totals are misleading—this is definitely not one vote per reader. A vote can be worth anywhere from 1 to over 10 points, depending on the karma total of the voter and whether it’s a “strong” vote. For totals below 30 or so, it’s mostly noise rather than signal—this is 6-8 votes out of possibly hundreds of readers.
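As an illustration only (the weights below are made-up stand-ins, not the site’s actual karma formula), a displayed total near 30 can come from just a handful of voters:

```python
# Illustrative only: hypothetical vote weights, not LessWrong's real formula.
# Each voter contributes a weight that grows with their karma, and a
# "strong" vote contributes several times more than a normal one.

def vote_weight(voter_karma: int, strong: bool) -> int:
    """Hypothetical weight: normal votes 1-3, strong votes up to ~12."""
    base = 1 + min(voter_karma // 1000, 2)   # 1, 2, or 3 depending on karma
    return base * 4 if strong else base       # strong votes count ~4x here

votes = [
    vote_weight(100, strong=False),   # new user, normal vote   -> 1
    vote_weight(2500, strong=False),  # established user        -> 3
    vote_weight(5000, strong=True),   # high-karma strong vote  -> 12
    vote_weight(3000, strong=True),   # another strong vote     -> 12
]
print(sum(votes))  # 28 -- a "total" near 30 from only four voters
```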
This seems to me to be missing the point made well by “Embedded Agency” and exemplified by the anvil problem: in practice you can’t build a system that achieves this kind of thing, because there is no real separation between inside the system and outside the system, just a model which assumes such a distinction exists.
Rather than a simple checkbox to opt into the “collapse comments below 10 votes” feature, it might be better to let the user choose the threshold themselves. In my experience, except in specific technical topics, comments with karma above 10 are usually expressing a popular opinion that is unlikely to lead to an update in my mental model. Because of that, I’m unlikely to opt in. However, I’ve also found that comments at 1 karma (no votes) are often in the same category. So being able to set the threshold at just 2 or 3 would be pretty useful, both to conserve space and to avoid skipping over potentially the most useful content.
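A sketch of what I have in mind (the field and function names are hypothetical, not the actual site code):

```python
# Hypothetical sketch of a user-configurable collapse threshold.
# Names (Comment, karma, should_collapse) are illustrative, not LW's API.

from dataclasses import dataclass

@dataclass
class Comment:
    author: str
    karma: int
    body: str

def should_collapse(comment: Comment, collapse_threshold: int) -> bool:
    """Collapse any comment strictly below the user's chosen threshold."""
    return comment.karma < collapse_threshold

comments = [Comment("a", 1, "..."), Comment("b", 3, "..."), Comment("c", 15, "...")]
# With the threshold set to 2 (rather than a fixed 10), only the
# zero-information 1-karma comment is hidden:
visible = [c for c in comments if not should_collapse(c, collapse_threshold=2)]
print([c.author for c in visible])  # ['b', 'c']
```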
I wonder if there would be enough interest to support a kind of matching app that would let people put their Amazon wishlist up and then match them with someone who had the same item but opposite needs (i.e. a left vs. a right shoe), and then split the cost.
This looks like a duplicate.
While legibility is about readability as a result of how something is written or appears, readability is also affected by length. If a chart is bigger and more complicated, it can convey more nuance—at the cost of being harder to read, and taking longer. While a font can usually be changed without disrupting the message, it’s more work to do this for length, if it can be done at all without trade-offs that aren’t improvements. (If I invented a language for conveying messages more succinctly in visual form, and this message as a whole were “13876”, that wouldn’t decrease the amount of time it takes to read and understand it.)
Indeed. In particular I want to note Nate Soares’ point about how one of the reasons you don’t necessarily know what you’re fighting for, is that your goal(s) may change as you learn more, grow, etc. Similarly, illegible complex judgment criteria may shift over time (and for that reason will not be amenable to formalization, which is of necessity static), while still always being “my own judgment”; it is precisely that freedom to alter the criteria which I protect by resisting any proffered formalization.
I think there’s an inferential distance step I’m missing here, because I’m actually a bit at a loss as to how to relate my post to empiricism.
We ought to learn from the folly of others—not be discouraged by it.
I would note that while past examples of failure are something to improve upon, function should determine form, and past examples where form determined function to its detriment are important. TVTropes may be fun to read, but I value LessWrong, and the ability to read LW in various ways, including by time so that I can read the latest posts (like how this site is going to change with the addition of new features), and I don’t want LW to become exactly like the TVTropes wiki.
For some explicit examples:
If someone wrote a book without indentation, paragraphs, or ends to sentences, it would be hard to read.
Likewise, a book with the binding broken and the pages out of order would be hard to read, but fairly simple to put in the right order—if there were page numbers. If there weren’t page numbers, and some pages were missing, it’d be hard.
An example of a problem, and a possible solution:
A) Suppose someone writes a new wiki page (on Batman (Franchise)). Then (maybe) they remember to add it to Comics or Fiction.
(If new wiki pages show up on the frontpage, and are by default tagged “not categorized yet” (or categorized, if the author categorized them), then maybe someone else can see that a page needs to get slotted into the appropriate list and fixes it. This way of operating might work, unless there are too many new articles all at once and the “categorization checkers” are flooded and end up backlogged. A sketch of this default-tagging follows example B below.)
B) Someone goes to the Fiction page (which is a list of pages) and adds a link to a “Batman (Franchise)” page (which turns red because it hasn’t been made yet). And then they go to the Batman page and write the article.
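To illustrate the “not categorized yet” default from example A (every name below is hypothetical, not a real wiki API):

```python
# Hypothetical sketch of default categorization for new wiki pages.
# Names (save_page, "not categorized yet") are illustrative only.

def save_page(title: str, body: str, categories: list[str] | None = None) -> dict:
    """Save a wiki page; uncategorized pages get a visible placeholder tag."""
    return {
        "title": title,
        "body": body,
        # Pages the author didn't categorize land in a queue that the
        # "categorization checkers" can work through (and may backlog).
        "categories": categories or ["not categorized yet"],
    }

page = save_page("Batman (Franchise)", "...")
assert page["categories"] == ["not categorized yet"]
```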
TL;DR (Main point):
My point is that if organization is something that happens after content creation, as an afterthought, there will be unorganized pages. But if organization work goes in beforehand, then there aren’t unorganized pages. Yes, sometimes things change as they’re worked out, and the pre-organization needs revision. But pre-organization is better than no organization.
A self-unaware system would not be capable of one particular type of optimization task:
Take real-world actions (“write bit 0 into register 11”) on the basis of anticipating their real-world consequences (human will read this bit and then do such-and-such).
This is an example of an optimization task, and it’s a very dangerous one. Maybe it’s even the only type of really dangerous optimization task! (That might be an overstatement; I’m not sure.) Not all optimization tasks are in this category, and a system can be intelligent by doing other types of optimization tasks.
A self-unaware system certainly is an optimizer in the sense that it does other (non-real-world) optimization tasks, in particular, finding the string of bits that would be most likely to follow a different string of bits on a real-world webpage.
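In symbols (informal notation of mine, not from any particular paper): given a string of bits x, the system solves

```latex
% x: the given string of bits; y: a candidate continuation;
% P: the distribution over text on real-world webpages.
\[
\hat{y} \;=\; \arg\max_{y}\; P\big(y \mid x\big)
\]
% This is genuinely an optimization task, but the search is over strings
% of bits, not over real-world actions chosen for their real-world
% consequences.
```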
As always, sorry if I’m misunderstanding you, thanks for your patience :-)
I think we’re on the same page! As I noted at the top, this is a brainstorming post, and I don’t think my definitions are quite right, or that my arguments are airtight. The feedback from you and others has been super-helpful, and I’m taking that forward as I search for a more rigorous version of this, if it exists!! :-)
See also You Don’t Get To Know What You’re Fighting For, which makes this sort of situation more explicit.
Thanks for this helpful comment. The architecture I’m imagining is: Model-choosing code finds a good predictive world-model out of a vast but finite space of possible world-models, by running SGD on 100,000 years of YouTube videos (or whatever). So the model-chooser is explicitly an optimizer, the engineer who created the model-chooser is also explicitly an optimizer, and the eventual predictive world-model is an extremely complicated entity with superhuman world-modeling capabilities, and I am loath to say anything about what it is or what it’s going to do.
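To make that concrete, here is a deliberately toy sketch of the setup (every detail is a placeholder of mine, not a real pipeline): the “world-model” is a single number, the “videos” are floats, and the loss is squared error.

```python
# Toy sketch of the architecture described above. All names are
# placeholders; this is not a real training pipeline.

import random

def model_chooser(data, steps=1000, lr=0.01):
    """Optimizer 2: chooses a world-model by (toy) SGD over model space."""
    theta = 0.0                                # a point in the model space
    for _ in range(steps):
        x = random.choice(data)                # e.g., one "video frame"
        grad = 2 * (theta - x)                 # d/dtheta of (theta - x)^2
        theta -= lr * grad                     # gradient step
    return theta                               # the chosen world-model

# The engineer (optimizer 1) writes and runs model_chooser (optimizer 2);
# the returned model is the artifact whose behavior is hard to say much
# about in general (trivially transparent here, opaque at scale).
data = [1.0, 2.0, 3.0]
print(model_chooser(data))  # converges near the data mean, ~2.0
```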
Out of these three, (1) the engineer is not problematic because it’s a human, and (2) the model-chooser is not problematic because it’s (I assume and expect) a known and well-understood algorithm (e.g. a Transformer), so (3) the eventual predictive world-model is the only thing we’re potentially worried about. My thought is that we can protect ourselves from the predictive world-model doing problematic consequentialist planning by scheming to give it no information whatsoever about how it can affect the world, or even that it exists or what actions it is taking, such that if it has problematic optimization tendencies, it is unable to act on them.
(In regard to (1) more specifically: if a company is designing a camera, the cameras with properties that the engineers like are preferentially copied by the engineers into later versions. Yes, this is a form of optimization, but nobody worries about it more than anything else in life. Right?)
I don’t think “reasonable” is the correct word here. You keep assuming away the possibility of conflict. It’s easy to find a peaceful answer by simulating other people using empathy, if there’s nothing anyone cares about more than not rocking the boat. But what about the least convenient possible world where one party has Something to Protect which the other party doesn’t think is “reasonable”?
Yes, if someone has values that are in fact incompatible with the culture of the organization, they shouldn’t be joining that organization. I thought that was clear in my previous statements, but it may in fact not have been. If every damn time their own values are at odds with what’s best for the organization given its values, that’s an incompatible difference. They should either find a different organization, or try the archipelago model. There is such a thing as irreconcilable value differences.
I don’t think the OP is compatible with the shared values and culture established in Sequences-era Overcoming Bias and Less Wrong.
I agree. I think when that culture was established, the community was missing important concepts about motivated reasoning and truth-seeking, and chose values that were in fact not optimized for the ultimate goal of creating a community that could solve important problems.
I think it is in fact good to experiment with the norms you’re describing from the original site, but I think many of those norms originally caused the site to decline and people to go elsewhere. Given my current mental models, I predict that a site using those norms will make less intellectual progress than a similar site using my norms, although I expect you have the opposite intuition. As I stated in the introduction, the goal of this post was simply to make sure that those mental models were in the discourse.
Re your dialogue: The main thing that I got from it was that you think a lot of the arguments in the OP are motivated reasoning and will lead to bad incentives. I also got that this is a subject you care a lot about.