james.lucassen
This is great. Feels like a very good catch. Attempting to start a comment thread doing a post-mortem of why this happened and what measures might make this sort of clarity-losing definition drift happen less in the future.
One thing I am a bit surprised by is that the definition on the tag page for inside/outside view was very clearly the original definition, and included a link to the Wikipedia article on reference class forecasting in the second sentence. This suggests that the drifted definition was probably not held as an explicit belief by a large number of highly involved LessWrongers. This in turn makes two different mechanisms seem most plausible to me:

1. Maybe there was a sort of doublethink going on among experienced LW folks that made everyone use “outside view” differently in practice than how they would have explicitly defined it if asked. This would probably be mostly driven by status dynamics, and attempts to solve it would just be a special case of trying to find ways to not create applause lights.
2. Maybe the mistake was mainly among relatively new/inexperienced LW folks who tried to infer the definition from context rather than checking the tag page. In that case, attempts to solve it would mostly look like increasing the legibility of discourse within LW to new/inexperienced readers, possibly by making the tag definition pages more clickable or just decreasing the proliferation of jargon.
The first enigma seems like it’s either very closely related or identical to Hume’s problem of induction. If that is a fair rephrasing, then I think it’s not entirely true that the key problem is that the use of empiricism can neither be justified nor refuted by empiricism. Principles like “don’t believe in kludgy unwieldy things” and “empiricism is a good foundation for belief” can in fact be supported by empiricism, because those heuristics have worked well in the past, and helped us build houses and whatnot.
I think the key problem is that empiricism both supports and refutes the claim “I know empiricism works because empirically it’s always worked well in the past”. This statement is empirically supported because empiricism has worked well in the past, but it’s also circular, and circular reasoning has not generally worked well in the past.
This can also be re-phrased as a conflict between object-level and meta-reasoning. On the object level, empiricism supports empiricism. But on the meta level, empiricism rejects circular reasoning.
That’s great. If I ever attempt to design my own conlang, I’m using this rule.
TLDR: if we model a human as a collection of sub-agents rather than a single agent, how do we make normative claims about which sub-agents should or shouldn’t hammer down the others? There’s no over-arching set of goals to evaluate against, and each sub-agent always wants to hammer down all the others.
If I’m interpreting things right, I think I agree with the descriptive claims here, but tentatively disagree with the normative ones. I agree that modeling humans as single agents is inaccurate, and a multi-agent model of some sort is better. I also agree that the Drowning Child parable emphasizes the conflict between two sub-agents, although I’m not sure it sets up one side against the other too strongly (I know some people for whom the Drowning Child conflict hammers down altruism).
What I have trouble with is thinking about how a multi-agent human “should” try to alter the weights of their sub-agents, or influence this “hammering” process. We can’t really ask the sub-agents for their opinion, since they’re always all in conflict with all the others, to varying degrees. If some event (like exposure to a thought experiment) forces a conflict between sub-agents to rise to confrontation, and one side or the other ends up winning out, that doesn’t have any intuitive normative consequences to me. In fact, it’s not clear to me how it could have normativity to it at all, since there’s no over-arching set of goals for it to be evaluated against.
Dang, I wish I had read this before the EA Forum’s creative writing contest closed. It makes a lot of sense that HPMOR could be valuable via this “first-person-optimizing-experience” mechanism—I had read it after reading the Sequences, so I was mostly looking for examples of rationality techniques and secret hidden Jedi knowledge.
Since HPMOR!Harry isn’t so much EA as transhumanist, I wonder if a first-person EA experience could be made interesting enough to be a useful story? I suppose the Comet King from Unsong is also kind of close to this niche, but not really described in first person or designed to be related to. This might be worth a stab...
Currently working on ELK—posted some unfinished thoughts here. Looking to turn this into a finished submission before end of January—any feedback is much appreciated, if anyone wants to take a look!
Another minor note: the very last link, to splendidtable, seems to include an extra comma at the end of the URL, which makes it 404.
unlike other technologies, an AI disaster might not wait around for you to come clean it up
I think this piece is extremely important, and I would have put it in a more central place. The whole “instrumental goal preservation” argument makes AI risk very different from the knife/electricity/car analogies. It means that you only get one shot, and can’t rely on iterative engineering. Without that piece, the argument is effectively (but not exactly) considering only low-stakes alignment.
In fact, I think if we get rid of this piece of the alignment problem, basically all of the difficulty goes away. If you can always try again after something goes wrong, then if a solution exists you will always find it eventually.
This piece seems like much of what makes the difference between “AI could potentially cause harm” and “AI could potentially be the most important problem in the world”. And I think even the most bullish techno-optimist probably won’t deny the former claim if you press them on it.

Might follow this up with a post?
This might not work depending on the details of how “information” is specified in these examples, but would this model of abstractions consider “blob of random noise” a good abstraction?
On the one hand, different blobs of random noise contain no information about each other on a particle level—in fact, they contain no information about anything on a particle level, if the noise is “truly” random. And yet they seem like a natural category, since they have “higher-level properties” in common, such as unpredictability and idk maybe mean/sd of particle velocities or something.
This is basically my attempt to produce an illustrative example for my worry that mutual information might not be sufficient to capture the relationships between abstractions that make them good abstractions, such as “usefulness” or other higher-level properties.
My attempt to break down the key claims here:

1. The internet is causing rapid memetic evolution towards ideas which stick in people’s minds and encourage them to take certain actions, especially actions that spread the idea. Ex: wokism, Communism, QAnon, etc.
2. These memes push the people who host them (all of us, to be clear) towards behaviors which are not in the best interests of humanity, because of the Orthogonality Thesis.
3. The lack of will to work on AI risk comes from these memes’ general interference with clarity/agency, plus selective pressure on memes to develop ways past the “immune” systems that would otherwise preserve clarity/agency.
4. Before you can work effectively on AI stuff, you have to clear out the misaligned memes stuck in your head. This can get you the necessary clarity/agency, and make sure that (if successful) you actually produce AGI aligned with “you”, not with some meme.
5. The global scale is too big for individuals; we need memes to coordinate us. This is why we shouldn’t try to just solve x-risk directly: we should focus on rationality, cultivating our internal meme garden, and favoring memes which will push the world in the direction we want it to go.
Putting this in a separate comment, because Reign of Terror moderation scares me and I want to compartmentalize. I am still unclear about the following things:
1. Why do we think memetic evolution will produce complex/powerful results? It seems like the mutation rate is much, much higher than in biological evolution.
2. Valentine describes these memes as superintelligences, as “noticing” things, and as generally being agents. Are these superintelligences hosted per-instance-of-meme, with many stuffed into each human? Or is something like “QAnon” a kind of distributed intelligence, doing its “thinking” through social interactions? Both of these models seem to have some problems (power/speed), so maybe something else?
3. Misaligned (digital) AGI doesn’t seem like it’ll be a manifestation of some existing meme and therefore misaligned; it seems more like it’ll just be some new misaligned agent. There is no highly viral meme going around right now about producing tons of paperclips.
Ah, so on this view, the endgame doesn’t look like
“make technical progress until the alignment tax is low enough that policy folks or other AI-risk-aware people in key positions will be able to get an unaware world to pay it”
But instead looks more like
“get the world to be aware enough to not bumble into an apocalypse, specifically by promoting rationality, which will let key decision-makers clear out the misaligned memes that keep them from seeing clearly”
Is that a fair summary? If so, I’m pretty skeptical of the proposed AI alignment strategy, even conditional on this strong memetic selection and orthogonality actually happening. It seems like this strategy requires pretty deeply influencing the worldview of many world leaders. That is obviously very difficult because no movement that I’m aware of has done it (at least, quickly), and I think they all would like to if they judged it doable. Importantly, the reduce-tax strategy requires clarifying and solving a complicated philosophical/technical problem, which is also very difficult. I think it’s more promising for the following reasons:
1. It has a stronger precedent (historical examples I’d reference include the invention of computability theory, the invention of information theory and cybernetics, and the adventures in logic leading up to Gödel).
2. It’s more in line with rationalists’ general skill set, since the group is much more skewed towards analytical thinking and technical problem-solving than towards government/policy work and being influential among those kinds of people.
3. The number of people we would need to influence will go up as AGI tech becomes easier to develop, and each one is a single point of failure.
To be fair, these strategies are not in a strict either/or, and luckily they use largely separate talent pools. But if the proposal here ultimately comes down to moving fungible resources towards the become-aware strategy and away from the technical-alignment strategy, I think I (mid-tentatively) disagree.
I think we’re seeing Friendly memetic tech evolving that can change how influence comes about.
Wait, literally evolving? How? Coincidence despite orthogonality? Did someone successfully set up an environment that selects for Friendly memes? Or is this not literally evolving, but more like “being developed”?
The key tipping point isn’t “World leaders are influenced” but is instead “The Friendly memetic tech hatches a different way of being that can spread quickly.” And the plausible candidates I’ve seen often suggest it’ll spread superexponentially.
Whoa! I would love to hear more about these plausible candidates.
There’s insufficient collective will to do enough of the right kind of alignment research.
I parse this second point as something like “alignment is hard enough that you need way more quality-adjusted research-years (QARYs?) than the current track is capable of producing. This means that to have any reasonable shot at success, you basically have to launch a much larger (but still aligned) movement via memetic tech, or just pray you’re the messiah and can singlehandedly provide all the research value of that mass movement.” That seems plausible, and concerning, but highly sensitive to the difficulty of the alignment problem, which I personally have practically zero idea how to forecast.
Memetic evolution dominates biological evolution for the same reason.
Faster mutation rate doesn’t just produce faster evolution; it also reduces the steady-state fitness. Complex machinery can’t reliably be evolved if pieces of it are breaking all the time. I’m mostly relying on No Evolutions for Corporations or Nanodevices here, plus one undergrad course in evolutionary bio.
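The steady-state claim can be sketched with a toy mutation-selection simulation (all parameters here are arbitrary illustrative assumptions, not calibrated to anything real): a population of bitstrings starts at maximum fitness, and we compare the equilibrium fitness it settles into under a low vs. a high per-bit mutation rate, with selection held fixed.

```python
import random

random.seed(0)

# Toy mutation-selection balance model; numbers are arbitrary assumptions.
GENOME_LEN = 30
POP_SIZE = 100
GENERATIONS = 150

def steady_state_fitness(mutation_rate):
    """Evolve a population of bitstrings (fitness = fraction of 1-bits),
    starting from maximum fitness, and return the mean final fitness."""
    pop = [[1] * GENOME_LEN for _ in range(POP_SIZE)]
    for _ in range(GENERATIONS):
        # Fitness-proportional selection (with a small baseline weight).
        weights = [1 + sum(g) for g in pop]
        pop = random.choices(pop, weights=weights, k=POP_SIZE)
        # Each bit flips independently with probability mutation_rate.
        pop = [[b ^ (random.random() < mutation_rate) for b in g] for g in pop]
    return sum(sum(g) for g in pop) / (POP_SIZE * GENOME_LEN)

low = steady_state_fitness(0.001)
high = steady_state_fitness(0.2)
print(f"steady-state fitness at low mutation rate:  {low:.2f}")
print(f"steady-state fitness at high mutation rate: {high:.2f}")
```

Selection keeps the low-mutation population near full fitness, while the high-mutation population decays toward chance levels under identical selection; that is the sense in which a high mutation rate caps how much complex machinery evolution can maintain.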
Also, just empirically: memetic evolution produced civilization, social movements, Crusades, the Nazis, etc.
Thank you for pointing this out. I agree with the empirical observation that we’ve had some very virulent and impactful memes. I’m skeptical about saying that those were produced by evolution rather than something more like genetic drift, because of the mutation-rate argument. But given that observation, I don’t know if it matters if there’s evolution going on or not. What we’re concerned with is the impact, not the mechanism.
I think at this point I’m mostly just objecting to the aesthetic and some less-rigorous claims that aren’t really important, not the core of what you’re arguing. Does it just come down to something like:
“Ideas can be highly infectious and strongly affect behavior. Before you do anything, check for ideas in your head which affect your behavior in ways you don’t like. And before you try and tackle a global-scale problem with a small-scale effort, see if you can get an idea out into the world to get help.”
What do you think about the effectiveness of the particular method of digital decluttering recommended by Digital Minimalism? What modifications would you recommend? Ideal duration?
One reason I have yet to do a month-long declutter is because I remember thinking something like “this process sounds like something Cal Newport just kinda made up and didn’t particularly test; the methods I think up for myself will probably work better than the method Cal thought up for himself”.
So far my own methods have not worked.
This is great, thanks!
I don’t think I understand how the scorecard works. From:
[the scorecard] takes all that horrific complexity and distills it into a nice standardized scorecard—exactly the kind of thing that genetically-hardcoded circuits in the Steering Subsystem can easily process.
And this makes sense. But when I picture how it could actually work, I bump into an issue. Is the scorecard learned, or hard-coded?
If the scorecard is learned, then it needs a training signal from the Steering Subsystem. But if the scorecard is useless at the start, it can’t give Steering anything to base that training signal on. On the other hand, since the “ontology” of the Learning Subsystem is learned from scratch, it seems difficult for a hard-coded scorecard to do this translation task.
Agree that this is definitely a plausible strategy, and that it doesn’t get anywhere near as much attention as it seemingly deserves, for reasons unknown to me. Strong upvote for the post, I want to see some serious discussion on this. Some preliminary thoughts:
How did we get here?
If I had to guess, the lack of discussion on this seems likely due to a founder effect: the people sounding the alarm in the early days of AGI safety concerns were drawn disproportionately from the technical/philosophical side rather than the policy/outreach/activism side.
In early days, focus on the technical problem makes sense. When you are the only person in the world working on AGI, all the delay in the world won’t help unless the alignment problem gets solved. But we are working at very different margins nowadays.
There’s also an obvious trap which makes motivated reasoning really easy. Often, the first thing that occurs to you when thinking about slowing down AGI development is sabotage, maybe because this feels urgent and drastic? It’s an obviously bad idea, and maybe that leads us to motivated stopping.
Maybe the “technical/policy” dichotomy is keeping us from thinking of obvious ways we could be making the future much safer? The outreach org you propose doesn’t really fit either category. Would be interested in brainstorming other major ways to affect the world, but not gonna do that in this comment.
HEY! FTX! OVER HERE!!
You should submit this to the Future Fund’s ideas competition, even though it’s technically closed. I’m really tempted to do it myself just to make sure it gets done, and very well might submit something in this vein once I’ve done a more detailed brainstorm.
now this is how you win the first-ever “most meetings” prize
I think a lot of this discussion becomes clearer if we taboo “intelligence”, replacing it with something like “ability to search a large pool of strategies and select a high-ranked option”.
Agree that the rate-limiting step for a superhuman intelligence trying to affect the world will probably be stuff that does not scale very well with intelligence, like large-scale transport, construction, smelting widgets, etc. However, I’m not sure it would be so severe a limitation as to produce situations like what you describe, where a superhuman intelligence sits around for a month waiting for more niobium. The more strategies you are able to search over, the more likely it is that you’ll hit on a faster way of getting niobium.
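The “more search helps” intuition can be sketched with a toy model (the distribution and all numbers are purely illustrative assumptions): each candidate strategy for some task, say sourcing niobium, has a heavy-tailed completion time, and a stronger searcher evaluates more candidates and keeps the fastest one it finds.

```python
import random

random.seed(1)

# Toy model of "intelligence as search": strategy completion times are
# drawn from a heavy-tailed (lognormal) distribution, an arbitrary
# modeling assumption rather than a claim about real task structure.
def fastest(n_strategies):
    """Best (minimum) completion time found among n candidate strategies."""
    return min(random.lognormvariate(0, 1) for _ in range(n_strategies))

for n in (10, 1_000, 100_000):
    print(f"pool size {n:>7}: fastest strategy found takes {fastest(n):.3f} units")
```

The fastest option found keeps improving as the pool grows, which is the sense in which a wider search makes it more likely you hit on a much faster way of getting the niobium, even though the physical bottleneck itself never changed.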
Agree that being able to maneuver in human society and simulate/manipulate humans socially would probably be much more difficult for a non-human intelligence than some other tasks humans might think of as equally difficult, since humans have a bunch of special-purpose mechanisms for that kind of thing. That being said, I’m not convinced it is so hard as to be practically impossible for any non-human to do. The amount of search power it took evolution to find those abilities isn’t so staggering that it could never be matched.
I’m pretty surprised by the position that “intelligence is [not] incredibly useful for, well, anything”. This seems much more extreme than the position that “intelligence won’t solve literally everything”, and like it requires an alternative explanation of the success of homo sapiens.
Thank you for posting this! There’s a lot of stuff I’m not mentioning because confirming agreements all the time makes for a lot of comment clutter, but there’s plenty of stuff to chew on here. In particular, the historical rate of scientific progress seems like a real puzzle that requires some explanation.