Fair enough. You were thinking about the problem from the point of view of hiring a teacher; when projecting it onto the problem from the point of view of a teacher deciding how to teach, I had to make additional assumptions not in the original post (i.e., that “teachers care about true performance to some degree”).
Still, I think that putting it in concrete terms like this helped me understand (and agree with) the basic idea.
Am I correct in saying that this suggests avoiding Goodhart’s law by using pass/fail grading? Or at least, by putting a maximum on artificial rewards, such that optimizing for the reward is senseless beyond that point?
Let’s take a common case of Goodhart’s law: teachers who are paid based on their students’ test scores. Imagine that teachers are either good or bad, and can either teach to the test (strategize) or not. Both true and measured performance are better on average for good teachers than for bad, but have some random variance. Meanwhile, true performance is better when teachers don’t strategize, but measured performance is better when they do.
If good teachers care to some degree about true performance, and you set an appropriate cutoff and payouts, the “quantilized” equilibrium will be that good teachers don’t strategize (since they’re relatively confident that they can pass the threshold without it), but bad teachers do (to maximize their chances of passing the threshold). Meanwhile, good teachers still get higher average payouts than bad teachers. This is probably better than the Goodhart case where you manage to pay good teachers a bigger bonus relative to bad teachers, but all teachers strategize to maximize their payout. So this formalization seems to make sense in this simple test case.
ETA: I was trying to succinctly formalize the example above and I got as far as $U \sim \mathcal{N}(\mu(\text{teacher}) - \delta \cdot \text{strategy},\ \sigma^2)$; $I = -2\delta \cdot \text{strategy}$, but that takes $I$ as the difference between the test score and the true utility, rather than separating test scores from payouts, and I don’t want to write out all the complications that result from that, so I quit. I hope that the words are enough to understand what I meant.
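The toy model above can be sketched as a Monte Carlo simulation. The specific numbers (cutoff, bonus, noise, how much teachers care about true performance) are my own assumptions chosen for illustration, not from the post:

```python
import random

# Illustrative sketch of the teacher/threshold model described above.
# All parameter values are assumptions chosen to make the point.

random.seed(0)

MU = {"good": 1.0, "bad": 0.0}   # mean ability of each teacher type
SIGMA = 0.5                      # noise in measured scores
DELTA = 0.4                      # strategizing: +DELTA measured, -DELTA true
CUTOFF = 0.5                     # pass/fail threshold on the measured score
BONUS = 1.0                      # payout for passing the threshold
CARE = 0.5                       # weight a teacher puts on true performance

def expected_utility(kind, strategize, trials=100_000):
    """Monte Carlo estimate of expected bonus + CARE * true performance."""
    total = 0.0
    for _ in range(trials):
        shift = DELTA if strategize else 0.0
        measured = MU[kind] + shift + random.gauss(0, SIGMA)
        true_perf = MU[kind] - shift
        total += (BONUS if measured >= CUTOFF else 0.0) + CARE * true_perf
    return total / trials

for kind in ("good", "bad"):
    honest = expected_utility(kind, strategize=False)
    gamed = expected_utility(kind, strategize=True)
    best = "honest" if honest >= gamed else "strategize"
    print(f"{kind}: honest={honest:.3f}, strategize={gamed:.3f} -> {best}")
```

With these parameters, good teachers pass the cutoff often enough that strategizing isn't worth the hit to true performance, while bad teachers strategize to boost their pass probability — matching the quantilized equilibrium described above.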
I’ll look into those possibilities. However, though my proposed work relates to AI alignment, it is not focused on that issue; and I’d consider it “outside the dominant paradigm” of AI alignment work.
Edited to add: I was going to do a separate post about those possibilities, but it appears that this website is a reasonably up-to-date summary of all the funding sources that are linked from that post, so repeating that work would be redundant.
My time horizon is about 6 months. I could probably extend that by a few months but that would involve (tolerable but noticeable) sacrifices. So the difference between 1-6 months and 6-9 is meaningful to me, though not completely dispositive.
Crossposted on EA forum.
I am aware of the EA hotel, but since I have a family, I think it’s probably not an option. Thanks for the EA forum suggestion; I planned to go there next, but thought here was the best place to start (highest upside-to-downside ratio for a half-baked query).
I just published this, and it’s not immediately clear to me whether or not I put it in the right place, as a personal post. I expected to be asked “where do you want to publish this” when I clicked “publish”. I’ll try to make sure it’s in the right place but this interface is not transparent to me.
I encounter the same problem when I’m writing about voting theory. But there is a set of people who have followed past discussion closely enough to follow something technical like this with a glossary, but not without one. My solution has been to make sure every acronym I use has an entry on electowiki, and then include a note saying so with a link to electowiki. I think you could helpfully do the same using the LessWrong wiki.
Where’s the glossary again?
I’m an obsessive about voting theory, and have been for over 20 years now. As time passes and my knowledge deepens, I find that while I still feel “this is really important and people don’t pay enough attention to it”, I feel less and less that “this is MORE important than whatever people are talking about here and now, and it should be my job to make them change the subject”. Obviously I think this is a healthy change for me and my social graces, but it also means that you are more likely to hear about voting theory from a younger, shallower version of me than you are from me.
I don’t know how to solve that problem. It’s one thing to be immune enough to evangelists so that you can keep a balance of caring across multiple issues, as discussed in the post above; it’s another harder thing to be immune enough yet still curious enough to find your way past the proselytizers to the calmer, more-mature non-evangelist obsessives.
In my anecdotal experience, the kids are OK. At least as OK as we were when I was a kid in the 80s reading SF from the 60s and 70s.
If you want me to take this hypothesis more seriously than that, show more evidence.
On Gibbard-Satterthwaite, you are wrong. Please read the original papers; Wikipedia is not definitive here. There is a sense in which the sentence you quote from Wikipedia is not quite wrong, but that sense is so limited that the conclusion you draw from it is not supported.
In terms of the “craziest possible option” strategy: people may deliberately vote for something they believe will not win in order to “build up” voting power for later. When they decide to actually spend this built-up power, they would not vote for something crazy. Insofar as this strategy artificially increases their overall voting power over that of other voters, it undermines the fairness of the system. And in the worst case, it could backfire by actually electing a crazy option. In case of backfire, this would obviously not be a rational strategy ex post, but I believe the collective risk of such failed rationality is unacceptably high.
As for the “rich irony” of me calling something a nonstarter politically: just this week, approval voting passed in Fargo; and STAR voting came within a few percent of passing in Lane County, OR. Last summer, thousands of people voted on the Hugo Awards which had been nominated through E Pluribus Hugo. In British Columbia, voters are currently deciding between four election methods, three of which are proportional and two to three of which have never been used. I personally played a meaningful role in each of these efforts, and a pivotal role in some cases. All of these are clearly far beyond “nonstarter politically”. So yes, I’m not afraid to tilt at windmills sometimes, but sometimes the windmills actually are giants, and sometimes the giants lose. I believe I’ve earned some right to express an opinion about when that might be, and when it might not.
Can you define it in terms of “sensory”, “motor”, and “processing”? That is, in order to be an optimizer, you must have some awareness of the state of some system; at least two options for behavior that affect that system in some way; and a connection from awareness to action that tends to increase some objective function.
Works for bottle cap: no sensory, only one motor option.
Works for liver: senses blood, does not sense bank account. Former is a proxy for latter but a very poor one.
For bubbles? This definition would call bubbles optimizers of finding lower pressure areas of liquid, iff you say that they have the “option” of moving in some other direction. I’m OK with having a fuzzy definition in this case; in some circumstances, you might *want* to consider bubbles as optimizers, while in others, it might work better to take them as mechanical rule-followers.
Stag hunt has two equilibria, and only the good one is strong. Prisoner’s dilemma has only one (bad) equilibrium. But here we’re talking about asymmetrical Snowdrift/Chicken, where both the bad and good equilibria are strong, but, if there’s uncertainty about which is which, the best outcome is non-equilibrium mutual cooperation.
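The distinction between the three games can be checked mechanically by enumerating pure-strategy Nash equilibria. The payoff matrices below are standard textbook values, which the comment doesn't specify:

```python
# Pure-strategy Nash equilibria of the three 2x2 games mentioned above.
# Payoff values are standard textbook choices (an assumption).

def pure_nash(payoffs):
    """payoffs[(i, j)] = (row payoff, col payoff); returns equilibrium cells."""
    eqs = []
    for i in (0, 1):
        for j in (0, 1):
            row_ok = payoffs[(i, j)][0] >= payoffs[(1 - i, j)][0]
            col_ok = payoffs[(i, j)][1] >= payoffs[(i, 1 - j)][1]
            if row_ok and col_ok:
                eqs.append((i, j))
    return eqs

# 0 = cooperate (hunt stag / stay silent / swerve), 1 = defect
stag_hunt = {(0, 0): (4, 4), (0, 1): (0, 3), (1, 0): (3, 0), (1, 1): (3, 3)}
prisoners = {(0, 0): (3, 3), (0, 1): (0, 5), (1, 0): (5, 0), (1, 1): (1, 1)}
chicken   = {(0, 0): (3, 3), (0, 1): (1, 4), (1, 0): (4, 1), (1, 1): (0, 0)}

print("stag hunt:", pure_nash(stag_hunt))           # two symmetric equilibria
print("prisoner's dilemma:", pure_nash(prisoners))  # only (defect, defect)
print("chicken:", pure_nash(chicken))               # two asymmetric equilibria
```

Stag hunt yields two symmetric equilibria, the prisoner's dilemma only mutual defection, and chicken the two asymmetric outcomes — which is where the uncertainty about "which equilibrium is which" bites.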
Condorcet is good. The one fundamental sense in which 3-2-1 is better is a better resistance to dark horse pathology, especially in the context of combined delegated and tactical voting. In Condorcet, in a highly-polarized situation, somebody 90% of everybody’s never heard of might be the Condorcet winner because each side rates them above the other. In 3-2-1, that person never makes it to the top 3.
This is not a strong argument, but it’s the one I have.
As regards IRV, it’s definitely worse than either.
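The 3-2-1 filtering step above can be sketched in code, assuming the usual statement of the method (3 semifinalists by most "good" ratings, 2 finalists by fewest "bad" ratings, winner by pairwise preference). The polarized electorate here is invented to show a dark horse with no "good" ratings never reaching the top 3:

```python
from collections import Counter

# Hypothetical sketch of 3-2-1 tallying. Ballots rate each candidate
# "good", "ok", or "bad". Candidate names and ballots are invented.

def three_two_one(ballots, candidates):
    goods = Counter({c: sum(b[c] == "good" for b in ballots) for c in candidates})
    semis = [c for c, _ in goods.most_common(3)]          # 3: most "good"
    bads = {c: sum(b[c] == "bad" for b in ballots) for c in semis}
    finalists = sorted(semis, key=lambda c: bads[c])[:2]  # 2: fewest "bad"
    f1, f2 = finalists
    rank = {"good": 2, "ok": 1, "bad": 0}
    f1_pref = sum(rank[b[f1]] > rank[b[f2]] for b in ballots)
    f2_pref = sum(rank[b[f2]] > rank[b[f1]] for b in ballots)
    return f1 if f1_pref >= f2_pref else f2               # 1: pairwise winner

# Polarized electorate: two major camps, a center candidate C, and a
# dark horse D whom everyone rates merely "ok".
ballots = (
    [{"L": "good", "R": "bad", "C": "ok", "D": "ok"}] * 40
    + [{"R": "good", "L": "bad", "C": "ok", "D": "ok"}] * 40
    + [{"C": "good", "L": "ok", "R": "ok", "D": "ok"}] * 20
)

# D has zero "good" ratings, so D never makes the top 3.
print(three_two_one(ballots, ["L", "R", "C", "D"]))  # prints "C"
```

The first step filters on enthusiastic support, so a candidate nobody actively rates "good" is eliminated before pairwise comparisons ever happen — which is the claimed resistance to the dark-horse pathology.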
“Summable” voting methods require only anonymous tallies (totals by candidate or candidate pair) to find a winner. These do not suffer from the problem you suggest.
But for non-summable methods, such as IRV/RCV/STV, you are absolutely correct. These methods must sacrifice either verifiability/auditability or anonymity. This is just one of the reasons such reforms are not ideal (though still better than choose-one voting, aka plurality).
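A toy sketch of what summability buys: each precinct publishes only its pairwise tallies, and the summed totals alone determine the Condorcet winner, so individual ballots never need to leave the precinct. The candidates and ballots below are invented for illustration:

```python
from itertools import combinations
from collections import Counter

# Sketch of "summability": Condorcet methods need only per-precinct
# pairwise tallies, which can simply be added together.

CANDIDATES = ["A", "B", "C"]

def pairwise_tally(ballots):
    """tally[(x, y)] = number of ballots ranking x above y."""
    tally = Counter()
    for ranking in ballots:
        for x, y in combinations(ranking, 2):  # x appears before y
            tally[(x, y)] += 1
    return tally

precinct1 = [["A", "B", "C"], ["B", "A", "C"], ["A", "C", "B"]]
precinct2 = [["C", "B", "A"], ["B", "C", "A"]]

# Each precinct publishes only its anonymous tally; the center adds them.
combined = pairwise_tally(precinct1) + pairwise_tally(precinct2)

# A Condorcet winner beats every rival in the summed tallies.
for c in CANDIDATES:
    if all(combined[(c, d)] > combined[(d, c)] for d in CANDIDATES if d != c):
        print("Condorcet winner:", c)
```

IRV has no analogous per-precinct summary: which candidate is eliminated in each round depends on the full ballot set, so precincts must ship their complete ballot data somewhere central — which is exactly the verifiability/anonymity tension described above.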
I think this is an unfixably bad idea, in two ways: it’s a nonstarter politically, and it would be bad if it did get implemented.
I largely agree with the section on what’s wrong with the current situation.
But this goes off the rails when it asserts, in passing, that score voting is immune to the Gibbard-Satterthwaite theorem. Read the Satterthwaite proof of this theorem, and you’ll see how general it is. Cardinal voting escapes Arrow’s theorem, but does NOT escape G-S.
In particular, any proportional method is subject to free riding strategy. And since this system is designed to be proportional across time as well as seats, free riding strategy would be absolutely pervasive, and I suspect it would take the form of deliberately voting for the craziest possible option. If I’m right then, like Borda, this system could actually be worse than random-ballot-single-winner; impressively bad.
I think it’s great that you’re thinking about structural reform and voting reform, and you’re on the right track in many regards. I just hope you can let go of this particular idea. I’m sorry to be so negative, but I think it’s warranted here.