Interesting—I interpreted this section differently, and yet I think it ultimately cashes out as agreeing with your comment about incentives.
In my reading, the clear concrete instructions are about the priorities, and about how to communicate. From the rest of the post I understood clearly that this means instructions like:
Priority 1 this week is X. In any decision with a tradeoff between X and Y, choose X.
Work on X for the next 4 hours after this meeting. Do not work on anything else.
Schedule miscellaneous meetings on Tuesdays. Do not schedule them on any other day.
I think this cashes out as setting good incentives because these kinds of instructions make it very easy to evaluate the goodness of decisions, going as far as to effectively make a bunch of them automatically. I feel like we always have an incentive to go with the easy decision, and always have an incentive to follow instructions, which neatly screens off some bad things. In this way, the incentives are properly aligned.
Reviewing the examples in the post again, I think I was confused on first reading. I initially read the nuclear reactor example as being a completed version of the Michelangelo example, but now I see it clearly includes the harms issue I was thinking about. I also think that the Library of Babel example contains my search thoughts, just not separated out in the same way as in the Poorly Calibrated Heuristics section. I'm going to chalk this one up to an oops!
Upvoted, because I think this is a naturally interesting topic and is always relevant on LessWrong. I particularly like the threshold optimization section; it is accessible for people who aren't especially advanced in math, and sacrifices little flow and readability for the rigor gains.

I don't agree that the cost of a false positive is negligible in general. In order for that to be true, search would have to be reliable and efficient, which among other things means we would need to know what brilliant looked like, in searchable terms, before we found it. This has not been my experience in any domain I have encountered; by contrast, it reliably takes me years of repeated searches to refine down to being able to identify the brilliant stuff. It appears to me that search grants access to the best stuff eventually, but doesn't do a good job of making the best stuff easy to find. That's the job of a filter (or curator).

A second objection is that false positives can easily be outright harmful. For example, consider history: the 90% crap history is factually wrong along dimensions running from accidentally repeating defunct folk wisdom to deliberate fabrications with malicious intent. Crap history directly causes false beliefs, which is very different from bad poetry, which is at worst aesthetically repulsive. Crap medical research, which I think in Sturgeon's sense would include things like anti-vaccine websites and claims that essential oils cure cancer, causes beliefs that directly lead to suffering and death. This is weirdly also a search problem, since the worst things are free and easy for everyone to access, while the best things are normally gated behind journals or limited-access libraries.
On reflection, I conclude that the precision-recall tradeoff varies based on subject, and also separately based on search, and that both types of variance are large.
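For intuition, here is a minimal sketch of the threshold tradeoff in Python. The 10% base rate is Sturgeon's; the score distributions, and the assumption that quality is even scorable, are mine:

```python
import random

random.seed(0)

# Sturgeon's law as a base rate: 10% of works are brilliant, 90% are crap.
# A noisy "quality score" only partially separates the two groups.
works = [("brilliant", random.gauss(1.0, 1.0)) if random.random() < 0.1
         else ("crap", random.gauss(-1.0, 1.0))
         for _ in range(10_000)]

brilliant_total = sum(1 for label, _ in works if label == "brilliant")

for threshold in (-1.0, 0.0, 1.0, 2.0):
    kept = [label for label, score in works if score >= threshold]
    tp = sum(1 for label in kept if label == "brilliant")
    precision = tp / len(kept) if kept else float("nan")
    recall = tp / brilliant_total
    print(f"threshold={threshold:+.1f}  precision={precision:.2f}  recall={recall:.2f}")
```

Raising the threshold buys precision (fewer false positives getting through) at the cost of recall (more brilliant work missed), and where the right balance sits depends on exactly the false-positive costs argued about above.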
This isn’t spending per se; rather it is increasing costs. Any increase in spending happens in the course of existing programs, such as handing out more loans once students respond to the incentives.
On top of this, the federal financial structure is a unique and horrifying house of cards. Their accounting methods are actually unique to them, and sometimes vary by department or agency; auditing is difficult and inconsistent; much of it is seemingly designed to obfuscate, though in a change-the-standards-by-committee-to-make-us-look-less-bad way rather than an intelligence/defense sort of way.
Since most of what the President is doing is changing how existing programs and agencies operate, these maneuvers would be difficult to challenge. If any particular move is challenged, it can almost certainly be accomplished in a different way on firmer authority grounds. Meanwhile, someone has to bear the publicity burden of arguing that student debt should never go down in order to push the case all the way to the Supreme Court, which is a long and expensive task.

All of this assumes the Supreme Court would even hear such a case. This is one of those things that is difficult to bring before them due to the rules about standing, which is to say whether there is anyone suitable to bring the suit. The simplified version is that the person who sues has to have been harmed; but how do you establish the harm to one person from someone else having debt forgiven or restructured?
I wonder whether, if someone were to form a credible educational institution that used income sharing agreements in lieu of the various loans, it could directly out-compete the current university system.
I am not doing anything different from you, but I don't see any major tactical shifts that make much sense. The problem is that 401ks and index funds already are the maximum-uncertain-future choices for any future where the stock market succeeds as an institution. Residential real estate already is the lowest-risk bet for any future where land is assessed according to price rather than according to use.
So mostly what I am trying to do is:
Identify ways to make my property more useful. This means basic things, like growing a chunk of our food, increasing the amount of maintenance I can do myself by owning tools and practicing, etc.
Try to identify triggers, which is to say things which clearly indicate it is time to dispose of a particular asset (or at least stop investing further in it). I have not been successful at this so far, and the alternative remains "accumulate cash."
The core of my intuition about this problem: the less certainty there is, the higher the premium on options. On the other hand, the only real options are the ones we can actually execute. This causes me to believe that the best investments have more to do with skills and knowledge—particularly of coherent approaches to problems that you expect to crop up, or at least where to find out about them. This is to save time and resources spent on search when the circumstances change.
There is an entirely different approach which I probably invest more thinking in, though less money and physical effort (so far): opportunities to make a contribution. By this I mean pro-social business ideas. The most recent example followed the supply chain crunch: after reading A Brief Introduction to Container Logistics, I considered a business which went around buying up containers and leases from the various participants, in the name of being able to agree to both sides of the impasse described in the article. I still might, if we hit another major crunch.
I agree they make for really good stories. I'll tell you what I would like to see more of in these stories: leaning into the moral dessert of it all.
Fox and Hound: make friends and gain the ability to survive bear attacks!
Mononoke: make not-enemies and not-die to spirit stampedes or cold iron!
Primal: make friends and you can eat anything!
Actually, the Primal example is so on the nose I feel like a better term is needed for coordination-related-morality. Moral dinner seems fitting. Be good, so you can eat.
If we’re talking Mad Max: Fury Road, or even Beyond Thunderdome, this feels like the characters are reclaiming a moral boundary that had collapsed.
Though I also note they are quite a bit more focused on the community element: do they want to be a community together; can they, personally, deal with those requirements; can they find a place and resources to do it; etc.
Other communities exist, but are overpoweringly and explicitly ingroup-eats-outgroup or even ingroup-eats-ingroup in the sense of being exploitative.
Chiefly because this is walking face-first into a race-to-the-bottom condition on purpose. There is a complete lack of causal information here.
I should probably clarify that I don’t believe this would be the chain of reasoning among alignment motivated people, but I can totally accept it from people who are alignment-aware-but-not-motivated. For example, this sort of seems like the thinking among people who started OpenAI initially.
A similar chain of reasoning an alignment-motivated person might follow is: "3.5 years ago I predicted 5 years based on X and Y, and I observe X and Y are on track. Since I also predicted 1.5 years to implement the best current idea, it is time to implement the best current idea now."
The important detail is that this chain of reasoning rests on the factors X and Y, which I claim are also candidates for being strategically relevant.
I agree. I think of this as timelines not being particularly actionable: even in the case of a very short timeline of 5 years, I do not believe that the chain of reasoning would be “3.5 years ago I predicted 5 years, and I also predicted 1.5 years to implement the best current idea, so it is time to implement the best current idea now.”
Reasoning directly from the amount of time feels like a self-fulfilling prophecy this way. On the other hand, it feels like the model which generated the amount of time should somehow be strategically relevant. On the other other hand my model has quite collapsed in on itself post-Gato, so my instinct is probably truthfully the reverse: a better sense of what is strategically relevant to alignment causes more accurate timelines through a better generative model.
I upvoted this post—it is a good stab based on the easily accessible public information and a look at relevant theory.
Hypothesis 5 is the path to victory here. The core problem is that (almost) everyone is wrong about (almost) everything, and the least wrong people do not form a group. Some examples:
Moderates do not exist. By this I mean there is not and never was any such group of people. The existence of moderates is a mistake in tabulating the results of political surveys. The mistake looks like this: you might have a survey with multiple questions, and one person responds:
Q1. How do you feel about gay marriage?
A1. Gay people should have civil unions rather than marriage
Q2. How involved should the government be in the economy?
A2. Government should keep taxes low

But another person responds with:
Q1. How do you feel about gay marriage?
A1. Gay people should not be allowed to get married, or adopt, or teach children

Q2. How involved should the government be in the economy?
A2. Government should heavily tax the rich and important industries should be nationalized
Since both answers from the first person were conservative, the surveys marked that person as "very conservative." The second person, with one extremely conservative answer and one extremely liberal answer, got marked as a moderate. It turns out that if you graph the policy preferences of people marked moderate, undecided, or independent, they land all over the ideological map.
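For concreteness, here is a toy version of the tabulation mistake; the answer codings are made up, but the averaging step is the real culprit:

```python
# Hypothetical coding: each answer scored from -2 (very liberal)
# to +2 (very conservative), then averaged into a single ideology score.
def ideology_score(answers):
    return sum(answers) / len(answers)

first_person  = [+2, +2]   # two consistently conservative answers
second_person = [+2, -2]   # one extreme answer in each direction

print(ideology_score(first_person))   #  2.0 -> "very conservative"
print(ideology_score(second_person))  #  0.0 -> tabulated as "moderate"
```

The 0.0 score pools people with genuinely middling views together with people holding two extreme, opposed views, which is why the "moderates" scatter all over the ideological map when you look at their actual positions.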
But most politicians, pundits, campaign staffers, and voters believe in moderates.
Money does not win elections. The narrative is straightforward here: the side with the most money usually wins, so money must be what caused victory. The best natural experiments for this are repeated contests between the same two people, and when we look at these contests we find that the effects of money are very weak: the number I remember is that doubling the money brought about 1% more of the vote, and halving it cost about 1%.
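If that number is right, and assuming the effect compounds per doubling (an extrapolation of mine, not a claim from the studies), the implied effect is tiny even for huge spending gaps:

```python
import math

# Assumed log-linear form: each doubling of relative spending
# shifts vote share by about one percentage point.
def vote_shift(spending_ratio, points_per_doubling=1.0):
    return points_per_doubling * math.log2(spending_ratio)

print(vote_shift(2.0))   # +1.0 point for doubling the money
print(vote_shift(0.5))   # -1.0 point for halving it
print(vote_shift(10.0))  # ~+3.3 points for outspending 10-to-1
```

On this model, outspending an opponent ten to one buys roughly three points, which matters in a dead heat and almost nowhere else.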
The alternative explanation is that the more likable or popular candidate receives more political donations. This one passes the smell test: if we invert these stories, it would be very weird if the actually-more-popular candidate systematically got less money, and it would be wild if the person who got more money systematically lost the election.
But most politicians, pundits, campaign staffers, and voters believe money wins elections.
Americans are not politically engaged. This one is something that campaign people understand, but the other groups mostly ignore in practice: a huge chunk of eligible voters don't vote. Campaigns target the populations that do vote, and policy is heavily influenced by what is popular in campaigns.
The key insight here is that both parties in the US tend to enforce this rather than try to expand the electorate. While the blue team is notionally friendlier to this idea than the red team, what they do in practice is try to increase the participation rate of groups they already expect to support them. This is why voter turnout is the conversation: the default election strategy is to identify the groups that are likely to vote and likely to support your side, then try to get the number of them who show up on election day as close to 100% as possible. At the same time, it is common to try to reduce turnout for the opponent. This is the mechanism of negative advertising: it drives people to stay home and not vote at all.
But most politicians, pundits, and voters believe campaigns are trying to persuade the public to vote for them instead of the other person. This is similar to the mechanism of the primaries hypothesis above, except that primaries are explicitly about party membership and activity.
Some issues are more relevant than others. The word political scientists use for this is salience. In elections, no candidate is ever evaluated by the public on the basis of all of their positions; instead, there are usually a few issues the election becomes "about", and the candidates who are better positioned on those issues tend to win (subject to the turnout considerations above). This one also passes the smell test: even the smallest elections encompass a wide range of issues, and I definitely expect attention bandwidth to be a limiting factor.

This is why messaging is so important: if a side can keep messaging discipline, then what the message is about is more likely to be what the election is about. I think this is also what drives a lot of those "X is really about Y" style arguments; these look to me like bids to increase the salience of one issue at the expense of another.
But most politicians, pundits, and campaign staffers tend to act like the stuff they care about should be the issues of the campaign. This is much like your echo chamber hypothesis above.
All of this is compounded by all the usual problems like how difficult social research is, statistical incompetence, professional biases, etc.
In summary, I feel like the true model is a model of everyone else’s models being borked.
I also thought these looked similar, so I dedicated a half-hour or so to searching, and I could not turn up any relation between either of the authors of the ResearchGate summary and Boyd or the military, as far as their Wikipedia pages and partial publication lists go. It appears those two have been writing books together on this set of principles since 2001, based on work going back to the '60s and drawing from the systems management literature.
I also checked for links between Rickover and Boyd, which I thought might be plausible because one of Boyd's other areas of achievement was as a ruthless trainer of fighter pilots, which seemed connected to the Navy's nuclear training program. Alas, a couple of attempts found them together in only one document: a generic media article about famous ideas from the military.
It sort of looks like Rickover landed on a similar set of principles to Boyd’s, but with a goal more like trying to enforce a maximum loop size organization-wide for responding to circumstances.
I’m pretty sure I got this advice from Yudkowsky at some point, in a post full of writing advice, but I can’t find the reference at the moment.
I think this is in The 5 Second Level, specifically the parts describing and quoting from S. I. Hayakawa’s Language in Thought and Action.
I am super curious about how you conceptualize the relationship between the theorist’s theory space problem and the experimentalist’s high dimensional world in this case. For example:
Liberally abusing the language of your abstraction posts: is the theory like a map relating different info summaries to each other, where either the experimentalist produces the info summaries to be mapped, or the theorist points to a blank spot in their map where a summary could fit?
Or is that too well defined, and it is something more like the theorist only has a list of existing summaries, and uses these to try and give the experimentalist a general direction (through dimensionality)?
Or could it be something more like the experimenter has a pile of summaries, because their job is moving through the Markov blanket, and the theorist has the summarization rules?
Note that I haven’t really absorbed the abstractions stuff yet, so if you’ve covered it elsewhere please link; and if you haven’t wrangled the theory-space-vs-HD-world issue yet I’d still be happy if the answer was just some babbling.
This is totally different from creating comfort. I think lots of folk get this one confused. Your comfort is none of my business, and vice versa. If I can keep that straight while coming from a same-sided POV, and if you do something similar, then it’s easy to argue and listen both in good faith.
I agree that same-sidedness and comfort are totally different things, and I really appreciate the bluntness of same-sidedness as a term. I also think you are undervaluing comfort here. People who are not comfortable do not reveal their true beliefs; same-sidedness doesn't appear to resolve this problem, because people who are not comfortable do not reveal their true beliefs even to themselves.
I have just recently been wondering where we stand on the very basic describe-the-problem criterion for productive conversations. Of late our conversations seem to have more of the flavor of proposal for solution → criticism of solution, which of course is fine if we have the problem described; but if that were the case, why do so many criticisms take the form of disagreements over the nature of the problem?
A very reasonable objection is that there are too many unknowns at work, so people are working on those. But this feels like one meta-problem, so the same reasoning should apply and we want a description of the meta-problem.
I suppose it might be fair to say we are currently working on competing descriptions of the meta-problem. Note to self: doing another survey of the recent conversations with this in mind might be clarifying.
There is a list of jurisdictions which implement LVT either via a single tax on land value, or a split tax where the value of improvements is assessed separately (and charged a lower rate): https://en.wikipedia.org/wiki/Land_value_tax_in_the_United_States

I think the best candidate for you is probably the Pittsburgh Business District, which seems small and close to your example. Alternatively I suggest Altoona City, which has the scheme most like what Georgists advocate: a high tax on land value and zero tax on improvements.

All of the locations on the list should have public information available on the methods of assessment and actual revenue earned. Whether this can all be had over the internet is another matter, but following those threads should also bring up any studies done using their data.
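For a concrete sense of the difference between the schemes, here is a toy tax computation; all rates and values are invented for illustration, not Altoona's or Pittsburgh's actual figures:

```python
# One parcel: cheap land, expensive building.
land_value        = 100_000
improvement_value = 300_000

# Conventional property tax: one rate on the combined value.
conventional_rate = 0.02
conventional_bill = conventional_rate * (land_value + improvement_value)

# Altoona-style scheme: high rate on land value, zero on improvements.
land_rate  = 0.08
split_bill = land_rate * land_value + 0.0 * improvement_value

print(conventional_bill)  # 8000.0 -> bill rises if you improve the property
print(split_bill)         # 8000.0 -> bill is the same no matter what you build
```

The rates here are tuned so the two bills match, which makes the incentive difference visible: under the land-only tax, building more does not raise your taxes.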
Answering independently, I’d like to point out a few features of something like governance appearing as a result of the warning shot.
If a wave of new funding appears, it will be provided via grants according to the kind of criteria that make sense to Congress, which means AI Safety research will probably be in a similar position to cancer research since the War on Cancer was launched. This bodes poorly for our concerns.
If a set of regulations appear, they will ban or require things according to criteria that make sense to Congress. This looks to me like it stands a substantial chance of making several winning strategies actually illegal by accident, as well as accidentally emphasizing the most dangerous directions.
In general, once something has laws about it people stop reasoning about it morally, and default to the case of legal → good. I expect this to completely deactivate a majority of ML researchers with respect to alignment; it will simply be one more bureaucratic procedure for getting funding.
The ML sections touched on the subject of distributional shift a few times, which is that thing where the real world differs from the training environment in ways which wind up being important but weren't clear beforehand. I read that one way to tackle this is called adversarial training, which means you vary the training environment across all of its dimensions in order to make the model robust.
Could we abuse distributional shift to reliably break misaligned things, by adding fake dimensions? I imagine something like this:
We want the optimizer to move from point A to point B on a regular x,y graph.
Instead of training it a bunch of times on just an x,y graph, we add a third, fake dimension.
We do this multiple times, so for example we have one x,y graph and add a z dimension; and one x,y graph where we add a color dimension.
When the training is complete, we do some magic that is the equivalent of multiplying these two together, which would zero out the fake dimensions (is the trick used by DeepMind with Gato similar to multiplying functions?) and leave us with the original x,y dimensions.
I expect this would give us something less perfectly optimized than just focusing on the x,y graph, but any deceptive alignment would surely exploit the false dimension, which goes away, and thus it would be broken/incoherent/ineffective. (A rough sketch of these mechanics follows below.)
So... could we give it enough false rope to hang itself?
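Here is a minimal sketch of what I have in mind. The "magic multiplying" step is the hand-wavy part; I just project the trained weights onto the shared real dimensions, which is one literal reading of it, and the whole setup (linear model, pure-noise fake dimensions) is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_with_fake_dim():
    # Task lives in (x, y); one appended dimension is pure noise
    # (the "z" or "color" dimension from the steps above).
    X_real = rng.normal(size=(500, 2))
    X = np.hstack([X_real, rng.normal(size=(500, 1))])
    y = X_real @ np.array([1.0, -2.0])   # true signal uses only x and y
    w = np.zeros(3)
    for _ in range(500):                 # plain gradient descent
        w -= 0.05 * X.T @ (X @ w - y) / len(y)
    return w

w_z     = train_with_fake_dim()   # run with a "z" dimension
w_color = train_with_fake_dim()   # run with a "color" dimension

# The "multiply the two together" step: keep only the dimensions the
# two runs share, discarding whatever weight sat on a fake dimension.
w_combined = (w_z[:2] + w_color[:2]) / 2
print(w_z, w_color, w_combined)
```

Whatever weight a trained policy put on its fake dimension simply vanishes at the combining step, which is exactly the rope I would want a deceptive policy to hang itself with.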
Rhetoric about AGI: Notes to self

Caught the aftermath of a contest for generating short, no-context arguments for why AGI matters. It appears there will be a long-form contest in the future; these are basically notes towards an entry.
The categories of audience are too vague; my principal interest is in policymakers, so dedicate some time to analysis of (and by extension strategy for) this group.
This will require some consideration of how policy actually gets done. Big question, but I have a line of attack on that problem.
Be explicit about the rhetoric. This will provide the context for the strategy, a pattern people may mimic for future strategies, and as a nice bonus will serve as advocacy for the skillset. (Assuming it doesn’t suck)
What I expect to be able to come up with is a description of what kind of policymaker we want to convince, and a flexible plan of attack for identifying and then approaching ones who seem to fit that description. This stems from a hypothesis that policy change is accomplished by a small cadre of policymakers dedicated to the policy in question; and not accomplished by popularity, even among policymakers.
What exactly the dedicated cadre does is unclear to me, but my suspicion is that it mostly comes down to attention to detail and opportunism. Attention to detail because there are a lot of levels to making policy work that go beyond getting a bill passed, and only a dedicated person would be willing to invest that attention; opportunism because a dedicated cadre can rapidly deliver a finished product with most of the kinks worked out as soon as a window opens.