Donald Hobson gives a comment below explaining some reasoning around dealing with unknown unknowns, but it’s not a direct answer to the question, so I’ll offer one.
The short answer is “yes”.
The longer answer is that this is one of the fundamental considerations in approaching AI alignment and is why some organizations, like MIRI, have taken an approach that doesn’t drive straight at the object-level problem and instead tackles issues likely to be foundational to any approach to alignment that could work. In fact you might say the big schism between MIRI and, say, OpenAI, is that MIRI places greater emphasis on addressing the unknown whereas OpenAI expects alignment to look more like an engineering problem with relatively small and not especially dangerous unknown unknowns.
(note: I am not affiliated with either organization, so this is an informed opinion on their general approaches; also note that neither organization is monolithic and individual researchers vary greatly in their assessment of these risks.)
My own efforts addressing AI alignment are largely about addressing these sorts of questions, because I think we still poorly understand what alignment even really means. In this sense I know that there is a lot we don’t know, but I don’t know all of what we don’t know that we’ll need to (so known unknown unknowns).
I’ve not heard of anyone trying this. I imagine it’s a bad idea for a couple reasons:
most would-be patients live in countries where what you want is illegal
those would-be patients’ access to cryonics depends largely on cryonics organizations being in good standing with the local government in case they die unexpectedly or don’t want to travel far
even if you found a jurisdiction where you could do what you want, it might have repercussions back where the organization is based and stores the brains/bodies, because the home country/state/municipality might forbid import due to how the brain/body was obtained
the countries where would-be patients live might forbid them from contracting for such a service in a location where it is legal (compare the way some countries require their citizens follow national laws when abroad, and that such citizens can be prosecuted for actions they took in foreign nations)
I think if you wanted to do this you would need to find a jurisdiction that would be okay with it and also be otherwise suitable for basing a cryonics operation. My guess is the set of places that meet both criteria is empty, and this is ignoring the patient-access issues I mentioned. Since the current market for cryonics is quite small, my guess is that there just isn’t enough demand to make this happen; with enough demand, I’m sure you’d have the money to make a favorable jurisdiction suitable for basing a cryonics operation in.
Aside from the case where you may have access to euthanasia, the answer is no. The issue is that cryonics, where it is legally allowed, is considered a mortuary procedure rather than a medical procedure. The reasons for doing this are a bit involved, but can be summed up by saying it was easier to get legal approval for a novel procedure on dead bodies than on live ones.
In theory it seems likely you could get a better preservation by anesthetizing a live patient, replacing their blood, and slowly cooling the body, letting them die slowly while freezing, rather than having them die first and then starting the cooling process. But this is extremely legally complicated, because it both involves a live patient, so it’s a medical procedure, and it kills the patient, so it’s euthanasia (or so we hope; if it wasn’t painless you definitely wouldn’t be allowed to do it!). This would require a level of acceptance of cryonics we have no reason to believe is forthcoming.
So we are left with the case where you have to die first before being cryo-preserved. However, it’s even a bit more complicated than that, because how you die matters. Mortuary procedures can’t begin until a patient has a completed death certificate from a doctor in most places, and in some cases you can’t formally complete that process without an autopsy to determine cause of death, especially in cases that look suspicious like a murder or suicide. In fact, without modern assisted suicide laws, suicide generally requires an autopsy by law, which will of course ruin your chance of preservation.
The only known, reliable way of doing what you propose (and I know of cases in the past where it successfully happened) is for a patient with a terminal illness to enter a hospice near a cryonics facility, with a cryonics team on standby, and then refuse all food and water. It takes several days to die this way depending on body composition, and at time of death the doctor on staff can quickly certify that you died of natural causes (I don’t entirely understand why this doesn’t count as suicide, but it apparently doesn’t) and the procedure can begin within minutes. That, to the best of my knowledge, is the state of the art in cryonic preservation: cryocide by starvation/dehydration.
Right, both of these views on truth, traditional rationality and postmodernism, result in theories of truth that don’t quite line up with what we see in the world, but in different ways. The traditional rationality view fails to account for the fact that humans judge truth and we have no access to the view from nowhere, so it’s right to say traditional rationality is “wrong” in the sense that it incorrectly assumes it can gain privileged access to the truth of claims and so know which ones are facts and which ones are falsehoods. The postmodernist view makes the opposite, and arguably only slightly lesser, mistake: it correctly notices that humans judge truth but then fails to adequately account for the ways those judgements are entangled with a shared reality. The way through is to see both that there is something shared out there about which there can, in theory, be a fact of the matter, and that we can’t directly ascertain those facts because we must do so across the gap of (subjective) experience.
As always, I say it comes back to the problem of the criterion and our failure to adequately accept that it demands we make a leap of faith, small though we may manage to make it.
The standard rebuttal here is that even if a superintelligent AI system is not goal directed, we should be concerned that the AI will spontaneously develop goal directed behavior because it is instrumentally valuable to doing whatever it is doing (and is not “doing whatever it is doing” a “goal”, even if the AI does not conceive of it as a goal, the same way as the calculator has a “goal” or purpose, even if the calculator is unaware of it). This is of course contingent on it being “superintelligent”.
For what it’s worth this is also the origin, as I recall it, of concerns about paperclip maximizers: you won’t build an AI that sets out to tile the universe with paperclips, but through a series of unfortunate misunderstandings it will, as a subagent or an instrumental action, end up optimizing for paperclips anyway because it seemed like a good idea at the time.
Finally, the human expresses a judgement about the states of M, mentally categorising a set of states as better than another. This is an anti-symmetric partial function J:S×S→R, a partial function that is non trivial on at least one pair of inputs.
I continue to be unsure whether we can even claim anti-symmetry of the preference relation. For example, let SA be the state “I eat an apple” and SO the state “I eat an orange”, and suppose today I judge SA better than SO (J(SA,SO) > 0) but tomorrow I judge SO better than SA (J(SO,SA) > 0), seemingly violating anti-symmetry. Now of course maybe I misunderstood my own understanding of SA and SO such that they actually included a hidden-to-my-awareness property conditioning them on time or something else, so that anti-symmetry is not violated after all. But the fact that there may be some property on the states that I didn’t think about at first, and that happens to salvage anti-symmetry, makes me worry that this model is confused in this and other ways: it was so easy to think of and construct something that seemingly violated the property but on further reflection seems like it doesn’t.
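To make the worry concrete, here’s a toy sketch (my own construction, not part of the post’s formalism; the state names and the day parameter are made up) of how a judgement that flips between days looks like an anti-symmetry violation until time is folded into the states:

```python
# Toy sketch (my construction, not the post's formalism): a judgement
# over named states that flips between days.

def judge(state_a: str, state_b: str, day: str) -> float:
    """Return a positive score when state_a is judged better than state_b."""
    # Hypothetical preferences: apples are judged better today, oranges tomorrow.
    better = {"today": "eat apple", "tomorrow": "eat orange"}
    if state_a == better[day] and state_b != state_a:
        return 1.0
    if state_b == better[day] and state_a != state_b:
        return -1.0
    return 0.0

# Ignoring the day, J("eat apple", "eat orange") and J("eat orange", "eat apple")
# can both come out positive, which looks like an anti-symmetry violation...
print(judge("eat apple", "eat orange", day="today"))      # 1.0
print(judge("eat orange", "eat apple", day="tomorrow"))   # 1.0

# ...but if the day is treated as a hidden component of each state, then
# ("eat apple", "today") and ("eat apple", "tomorrow") are different states
# and anti-symmetry is salvaged.
```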
That’s not a slam-dunk argument against this formalization; this is more me sharing some thoughts on my reservations about using this type of model. If we can so easily fail to notice something relevant about how we formalize some simple preferences, what else may we be failing to notice? And if so, what happens if we build an AI based in part on this formalization? Will it also fail to account for relevant aspects of how human preferences are calculated because they are not easily visible to us in the model, or is that a failure of humans to understand themselves rather than of the model? These are the things I’m wrestling with lately.
I also have some reservations about whether we can even really model humans as having discrete preferences that we can reason about in this way without getting ourselves into trouble and confused. Not to say that I doubt that this model often works, only that I worry it’s missing some important details that are relevant for alignment, and that without accounting for them we will fail to produce aligned AI. I worry about this because there doesn’t seem to be anything in the human mind that actually is a preference; preferences are more like reifications of a pattern of action that appears in humans. Getting closer to understanding the mechanism that produces the pattern we interpret as preferences seems valuable to me in this work, because I worry we’re missing crucial details when we reason about preferences at the level of detail you pursue here.
I apologise for my simplistic understanding and definitions of moral realism. However, my partial experience in this field has been enough to convince me that there are many incompatible definitions of moral realism, and many arguments about them, so it’s not clear there is a single simple thing to understand. So I’ve tried to define it very roughly, enough so that the gist of this post makes sense.
I think this is mostly because there are lots of realist and anti-realist positions and they cluster around features other than their stance on realism, i.e. whether or not moral facts exist, or, said less densely, whether or not moral claims can be true or false. The two camps seem to have a lot more going on, though, than is captured by this rather technical point, as you point out. In fact, most of the interesting debate is not about this point but about things that can be functionally the same regardless of your stance on realism, hence your noticing how realists and anti-realists can look like each other in some cases.
(My own stance is to be skeptical, since I’m not even sure we have a great idea of what we really mean when we say things are true or false. It seems like we do at first, but if we poke too hard the whole thing starts to come apart at the seams, which makes it a bit hard to worry too much about moral facts when you’re not even sure about facts in the first place!)
Recently Robin Hanson posted about the difference between fighting along the frontier vs. expanding the frontier. It’s a well-known point, but since I was recently reminded of it, it’s salient to me, and it seems quite relevant here.
When we ask if human values have “improved” or “degenerated” over time we have to have some way of judging increase or decrease. One way to understand this is to check whether humans get to realize more value, as judged by each individual and then normalized and aggregated, along certain dimensions within the multidimensional space of values. To take your example of “engagement with extended family”, most moderns have less of this than ancients did, both on average and, it seems, at the maximum, i.e. modern systems preclude as much engagement as was possible in the past, such that a modern person maximally engaged with their extended family is less engaged than was maximally possible in the past. This seems to be traded off, though, against greater freedom from the need to engage with extended family, because alternative systems allow a person to fulfill other values without reliance on extended family. As a result this looks much like a “fight”, i.e. a trade-off along the value frontier of one value against another.
You give the example of reduced slavery being a general benefit, but I think we can tell a similar story that it is a trade-off. We trade away the right of the powerful to make decisions about labour use, living conditions, etc. for the less powerful, in exchange for individual choice over those things. In this sense the reduction in slavery takes away something of value from someone—the would-be slaveholders—to give it to someone else—the would-be slaves. We may judge this to be an expansion or a value-efficiency improvement under two conditions (which change slightly what we mean by expansion; a toy sketch distinguishing them follows the list):
(1) there is more value overall, i.e. we traded less value away than we got back in return
(2) there is more value overall along all dimensions
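To make the distinction concrete, here’s a toy sketch (my own construction, with made-up numbers and value dimensions, not anything from the post): condition (1) is an aggregate test, while condition (2) is a per-dimension, Pareto-style test.

```python
# Toy sketch with made-up numbers and value dimensions: condition (1) is an
# aggregate test, condition (2) is a per-dimension (Pareto-style) test.

before = {"individual_choice": 0.2, "security": 0.7, "power_over_others": 0.9}
after  = {"individual_choice": 0.9, "security": 0.8, "power_over_others": 0.3}

def more_value_overall(before, after):
    """Condition (1): the aggregate (here a simple sum) went up."""
    return sum(after.values()) > sum(before.values())

def more_value_on_all_dimensions(before, after):
    """Condition (2): every dimension went up (or at least held steady)."""
    return all(after[k] >= before[k] for k in before)

print(more_value_overall(before, after))            # True: 2.0 > 1.8
print(more_value_on_all_dimensions(before, after))  # False: power_over_others fell
```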
I would argue that case (1) is really still a fight, though, because we are still making a trade-off; we are just moving to somewhere more efficient along the frontier. From this perspective the end of slavery was not an expansion of values, but it was a trade-off for more value.
But if we are so strict, is anything truly a pure expansion? This seems quite tricky, because humans can value arbitrary things, and so for every action that increases some value it would seem that we are necessarily decreasing the ability to realize some counter-value. For example, it might seem that something like “greater availability of calories” would result in pure value expansion, assuming we can screen off all the complicated details of how we make more calories available to humans and how that process will affect values. But suppose you value scarcity of calories, maybe even directly; then for you this will be a fight, and we must interpret an increase in the availability of calories as a trade-off rather than as a pure expansion in values.
This is potentially troubling because it means there’s no universal way to judge moral progress if there can be no expansion without some contraction somewhere. It would seem that there must always be contraction of something, even if it is an efficient contraction that generates more value than it gives up.
So in the end I guess I am forced to (mostly) agree with your assessment even though you frame it in a way that seems foreign to me. It feels foreign because it seems every improvement is also a degeneration and vice versa, and the relevant question of improvement is mostly whether or not we are generating more value in aggregate (an efficiency improvement), if we want to be neutral about which value dimensions to optimize along.
I actually don’t love the idea of making aggregate value something we optimize for, though, because I worry about degenerate cases, like highly optimizing along a single value dimension at the expense of all others such that it results in an overall increase in value but in a way we wouldn’t want. Arguably, if we were measuring value correctly in this system, such a situation would be impossible, because it would be factored in by a decrease in whatever value was being traded off against, the value whose loss makes us dislike the “optimization”.
I instead continue to think that value is a confused concept that we need to break apart and reunderstand, but I’m still working on deconfusing myself on this, so I have nothing additional to report in that direction for now.
Some years ago I got interested in the Yi Jing after reading Philip K. Dick’s The Man in the High Castle, which features the Yi Jing prominently: the book within the book (which is the alternate dimension/history version of The Man in the High Castle) is written using the Yi Jing to make plot decisions, and one of the characters relies on it heavily to navigate life. I went on to write a WebOS Yi Jing phone app so I could more easily consult it from my phone and played around with it myself.
My experience of it was mostly that it offered me nothing I wasn’t already doing on my own, but I could see how it would be helpful to others who lack my particular natural disposition toward letting my mind go quiet and seeing what it has to tell me. As you note, it seems a good way to step back and consider something from a different angle, and to consider aspects of something you may currently be ignoring. The commentary on the Yi Jing is carefully worded such that it’s more about the decision-generation process than the decision itself, and when used well I think it can result in the same sort of sudden realization of the action you will take that my sitting quietly and waiting for insight does.
I also know a decent number of rationalists who enjoy playing with Tarot cards for seemingly this same reason. Tarot works a bit differently because it tells a story more than it highlights a virtue, but, like you, I think much of the value comes from placing a random framing on events, injecting noise into an otherwise too-stable algorithm, and helping people get out of local maxima/minima traps.
I’d also include rubber ducking as a modern divination method. I think it does something similar, but by using a different method to get you to see things more clearly and find out what you already implicitly knew but weren’t making explicit enough to let it have an impact on your actions. My speculation at a possible mechanism of action here is something like what happens when I sit quietly with a decision and wait for an answer: you let the established patterns of thought get out of the way and let other things come through so you can consider them, in part because you can generate your own internal noise if you stop trying to direct your thought. But not everyone finds this easy or possible, in which case more traditional divination methods with external noise injection are likely useful.
Actually, good thing you asked, because I gave wrong information in my original comment. Chisholm is an expert on the problem of the criterion, but I was actually thinking of William Alston in my comment. Here are two papers, one by Alston and one by another author, that I’ve referenced in the past and found useful:
William P. Alston. Epistemic Circularity. Philosophy and Phenomenological Research, 47(1):1, September 1986.
Jonathan Dancy. Ethical Particularism and Morally Relevant Properties. Mind, XCII(368):530–547, 1983.
You might like the work of Roderick Chisholm on this topic. He spent a good deal of effort on addressing the issue of epistemic circularity (the issue created by the problem of the criterion) and gives what is, in my opinion, one of the better and more technical treatments of the topic. His work also lets us make a distinction between particularism (making minimal leaps of faith) and pragmatism (making any leaps of faith), which I find useful because in practice most people seem to be pragmatists (they have other things to do than wrestle with epistemology) while thinking they are particularists because their particular leaps of faith (the facts they assume without justification) are intuitive to them and they can’t think of a way to make them smaller.
First, let me start by saying this comment is ultimately a nitpick. I agree with the thrust of your position and think in most cases your point stands. However, there’s no fun and nothing to say if I leave it at that, so grab your tweezers and let’s get that nit.
Even if Hypothesis H is true, it doesn’t have any decision-relevant implications,
So to me there seems to be a special case of this that is not rationalization, and that’s in cases where one fact dominates another.
By “dominates” I here mean that, for the purpose for which the fact is being considered (i.e. the decision about which the truth value of H may have relevant implications), there may be another fact about another hypothesis, H’, such that once H’ is settled as true or false, whether or not H is true or false will have no impact on the outcome, because H’ is relatively so much more important than H.
To make this concrete, consider the case of the single-issue voter. They will vote for a candidate primarily based on whether or not that candidate supports their favored position on the single issue they care about. So let’s say Candidate Brain Slug is running for President of the World on a platform whose main plank is implanting brain slugs on all people. You argue with your single-issue-voter friend that they should not vote for Brain Slug because it will put a brain slug on them, but they say that even if that’s true, it’s not relevant to their decision, because Brain Slug also supports a ban on trolley switches, which is your friend’s single issue.
Now maybe you think your friend is being stupid, but in this case they’re arguably not rationalizing. Instead they’re making a decision based on their values that place such a premium on the issue of trolley switch bans that they reasonably don’t care about anything else, even if it means voting for President Brain Slug and its brain slug implanting agenda.
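If it helps, here is a minimal sketch (my own toy construction; the function name and arguments are hypothetical) of the dominance structure I mean, where settling H’ fixes the decision regardless of H:

```python
# Toy sketch (hypothetical names): once the dominant hypothesis H' is settled,
# the truth value of H cannot change the single-issue voter's decision.

def votes_for_brain_slug(h_will_implant_brain_slugs: bool,
                         h_prime_bans_trolley_switches: bool) -> bool:
    """The single-issue voter's rule: only H' (the trolley-switch ban) matters."""
    return h_prime_bans_trolley_switches

# Whatever we learn about H, the decision depends only on H', so for this
# voter H really does have no decision-relevant implications.
for h in (True, False):
    print(votes_for_brain_slug(h, h_prime_bans_trolley_switches=True))  # True, True
```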
To my reading, all of this seems to pretty well match (part of) the Buddhist notion of dependent origination, specifically the way senses beget sense contact (experience), which begets feeling, which begets craving (preferences), which begets clinging (beliefs/values), which begets being (formal ontology). There the focus is a bit different and is oriented around addressing a different question, but I think it’s tackling some of the same issues via different methods.
1) The bedrock of our values are probably the same for any human being, and any difference between conscious values is either due to having seen different data, but more likely due to different people situationally benefitting more under different moralities. For example a strong person will have “values” that are more accepting of competition, but that will change once they become weaker.
I continue to find minimization of confusion while maintaining homeostasis around biologically determined set points a reasonable explanation for the bedrock of our values. Hopefully these ideas will coalesce well enough in me soon to be able to write something more about this than that headline.
I agree there are other possible interpretations; mainly wanted to document for myself in case I wanted to reference it later, and it seems potentially relevant, especially if we wanted to go back and interview the voters or analyze the comments.
This is a pretty interesting idea. I can imagine this being part of a safety-in-depth approach: not a single method we would rely on but one of many fail-safes along with sandboxing and actually trying to directly address alignment.
Additional evidence of boo/yay voting culture: Ben Hoffman’s “Drowning children are rare” post (on EAF, on LW, my comment about votes)
To give an additional example of tight feedback loops being helpful, I’ve been taking Alexander lessons for nearly a year. Each lesson consists of 30 minutes of me doing movements (although sometimes the “movement” is holding a posture, like sitting, standing, standing on toes, or crouching) and 30 minutes of “table time”, i.e. I lie on a massage table while my teacher uses her hands to very subtly suggest changes to my posture. Although I could go on about how great this has been and how much value I get from it, what I mostly want to say here is that it depends very much on tight feedback loops to perform a kind of reinforcement learning. As I make a movement she uses her hands and some taught jargon (part of the technique involves associating jargon with postures and movements so you can easily call them up on command by saying or thinking the jargon) to adjust what I do, giving me rapid feedback on how I’m doing. The result was that within the first 10 hours of training I dramatically improved my posture and reduced posture- and movement-related pain.
For comparison, overlapping with learning the Alexander technique I’ve been more deeply practicing formal meditation, and learning formal meditation has very long feedback loops and requires months to make significant progress. Now, maybe the long feedback cycles are not why it takes months to make progress, and I can think of reasonable stories as to why that would be, but I can also imagine that finding ways to shorten the feedback cycles would have made progress much faster. For example, when I’ve done biofeedback stuff in the past it only took 4 or 5 hours of sessions before I could make myself fall asleep at will (sadly I’ve forgotten how to do this), and I think it’s quite likely that this was helped a lot by having a computer telling me when I got a little closer to what I needed to do to make that happen and when I got a little farther away, such that I didn’t have to spend as much time guessing and waiting for strong evidence that I was doing the right thing before I could reliably train that ability and then go on to the next step.
While these objects may be unidentified, the idea that they are the products of aliens, a simulation, AI, or something else seems unlikely given the low quality of the evidence. In all cases I’m aware of, evidence for something like this being the true origin of a UFO would have to overcome the more likely alternatives of:
secret, experimental, or stealth aircraft, probably military, with advanced capabilities undisclosed to the public;
observational errors and instrumentation glitches;
misremembering, embellishment, and outright lying.
For comparison, the literature on cryptids (animals claimed to be real but unobserved by science, like Bigfoot, the Loch Ness monster, and the chupacabra) is full of cases where the evidence looks pretty compelling...so long as we only look for evidence that confirms the hope that a cryptid exists. Perhaps sadly, there are no cryptid humanoids or sea monsters that we know of, and all evidence of them collected thus far is either best categorized as hoaxes, misidentifications, and hopeful misinterpretations, or else turned out to be evidence of real, undiscovered, and not fantastical animals.
The natural argument against this is of course that separation is an illusion. I don’t say that to sound mysterious; I mean it just in the simple sense that everything is tangled up together, dependent on each other for its existence, and it’s only in our models that clean separation can exist, and then only by ignoring some parts of reality in order to keep our models clean.
As a working programmer, I’m very familiar with the original context of the idea of separation of concerns, and I can also tell you that even there it never totally works. It’s a tool we use to help us poor humans, who can’t fathom the total, complete, awesome complexity of the world, get along well enough anyway to collect a paycheck. Or something like that.
Relatedly, every abstraction is leaky, and if you think it isn’t you just haven’t looked hard enough.
None of that is to say we shouldn’t respect the separation of concerns when it’s useful, only that we shouldn’t elevate it to more than it deserves, because the separation is a construction of our minds, not a natural feature of the world.