re: whether socialist firms work, I think the main problem is going to be managers making worse decisions because their interests are more blended with the interests of labor. For example, a firm where decisions are made by the workers-as-a-whole probably wants to pay workers according to their average product instead of their marginal product, but this means it will hire too few workers (as average product is larger than marginal product, the ‘last workers’ are more expensive to the average-product firm).
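To make the hiring distortion concrete, here’s a toy numerical sketch (my own construction; the production function, overhead cost, and wage are all made-up parameters):

```python
# Toy illustration (invented numbers): with a concave production function
# and a fixed overhead cost, a firm that maximizes income per member stops
# hiring well before a firm that hires until the marginal product falls to
# the market wage.

def output(n: int) -> float:
    return 100 * n ** 0.5  # assumed concave production function

FIXED_COST = 200  # hypothetical overhead / capital cost
WAGE = 10         # hypothetical market wage

def marginal_product(n: int) -> float:
    return output(n) - output(n - 1)

def income_per_member(n: int) -> float:
    # residual income per member -- roughly the 'average product' pay above
    return (output(n) - FIXED_COST) / n

# Conventional firm: keep hiring while the marginal worker covers the wage.
n_conventional = max(n for n in range(1, 500) if marginal_product(n) >= WAGE)

# Worker-managed firm: pick the membership size that maximizes pay per member.
n_coop = max(range(1, 500), key=income_per_member)

print(n_conventional)                     # 25 workers
print(n_coop, income_per_member(n_coop))  # 16 members, each paid 12.5 > WAGE
```

With these numbers the co-op stops at 16 members: the 17th would still produce more than the market wage (so a conventional firm would hire them), but less than the current pay per member, so the membership votes them down.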
My guess is that the main application area this should be investigated for is promotion / facilitation in developing countries. [I don’t see an obviously good way to shift US/European policy to promote this, and would rather limited reform effort be spent on something like land use / permitting / voting reform. EA organizations are generally either non-profits (where making them co-ops seems unlikely to have large effects) or for-profits where keeping the upside concentrated with the owners is probably desirable (since the for-profit’s EA value typically comes from earning to give, or from it being easy to hire non-EA workers for a company with EA owners).]
Some scattered thoughts as trailheads:
Most of the evidence base is probably in developed countries, and so it makes sense to run more experiments / figure out which conditions favor co-ops. [The traditional ‘family firm’ / ‘corporate family’ is, in many ways, a co-op, just with a much higher barrier to entry.]
Legal agreements are harder to execute and trust, especially at lower levels of wealth. [See The Mystery of Capital for more.] Having boilerplate co-op formation agreements (or administrative support for forming co-ops) for various countries would probably address one of the main pain points in creating them.
Management consulting firms tend to be more effective in developing countries than developed ones (as they’re more likely to be able to point out a ‘basic’ principle or practice that the managers would have picked up in business school in the developed world, but that is not common practice in the developing country). It might be possible to find (or found) such a firm focused on consulting in developing countries, and have it push formation as a co-op. [My guess is it’s easier to get entrepreneurs to form new businesses in a new format than to reform existing businesses into that format, but this is an empirical question worth checking.] This might also be a good laboratory for random assignment (noting that you can only randomize whether you advise becoming a co-op, not whether they actually do, and places where it’s a worse idea might be more resistant).
Relatedly, one of the big transitions is from ‘owner as whip-cracker’ to ‘owner as optimizer’ (see here for more); it may be that co-ops have an easier time with this than traditional firms, and this gives them a more substantial edge in places where that transition is incomplete. [Given that this is both a cultural and a financial change, my guess is this edge is actually not that large; Taylor is pretty clear on incentives needing to be individual to have the largest effect.]
It might be possible to set up a microlending program that focuses on creating co-ops, or on partially buying out existing owners to donate the stake to the workers. [I imagine a core challenge here is that workers are not going to be excited about the risk associated with being in a co-op, and it takes some external energy source to get over the activation barrier here.]
I want to point out that I think the typical important case looks more like “wanting to do things for unusual reasons,” and if you’re worried about this approach breaking down there that seems like a pretty central obstacle. For example, suppose rather than trying to maintain a situation (the diamond stays in the vault) we’re trying to extrapolate (like coming up with a safe cancer cure). When looking at a novel medication to solve an unsolved problem, we won’t be able to say “well, it cures the cancer for the normal reason” because there aren’t any positive examples to compare to (or they’ll be identifiably different).
It might still work out, because when we ask “is the patient healthy?” there is something like “the normal reason” there. [But then maybe it doesn’t work for Dyson Sphere designs, or so on.]
I came here to say something roughly like Jim’s comment, but… I think what I actually want is grounding? Like, sure, you were playing the addictive fear game and now think you’re out of it. But do you think I was? If you think there’s something that differentiates people who are and aren’t, what is it?
[Like, “your heart rate increases when you think about AI” isn’t a definitive factor one way or another, but probably you could come up with a list of a dozen such indicators, and people could see which are true for them, and we could end up with population statistics.]
I’m interested in shorting Tether (for the obvious reasons), but my default expectation is that there are not counterparties / pathways that are robustly ‘good for it’. (Get lots of shorts on an exchange, then the exchange dies, then I look sadly at my paper profits.) Are there methods that people trust for doing this? Why do you trust them?
I’ve done this at a small startup before; it worked pretty well.
You can try to have feedback separately on the ‘ultimate desirability’ of consequences and the ‘practical usefulness’ of actions, where you build the consequence-prediction model solely from experimental data and the value-estimation model solely from human feedback. I think this runs into serious issues because humans have to solve the mixed problem, not the split problem, and so it will be difficult for humans to give well-split training data.
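For concreteness, here’s a minimal sketch of the split as I read it (all names are hypothetical, and lookup tables stand in for learned models):

```python
# Minimal sketch of the 'split' setup: one model maps actions to predicted
# consequences and is trained only on experimental data; a second maps
# consequences to values and is trained only on human feedback; acting
# composes the two. (Lookup tables stand in for learned models.)

def fit_consequence_model(experiment_log):
    """experiment_log: list of (action, observed_outcome) pairs."""
    lookup = dict(experiment_log)
    return lambda action: lookup.get(action)

def fit_value_model(feedback_log):
    """feedback_log: list of (outcome, human_rating) pairs."""
    lookup = dict(feedback_log)
    return lambda outcome: lookup.get(outcome, 0.0)

def choose_action(actions, consequence_model, value_model):
    # Pick the action whose predicted consequence the value model rates highest.
    return max(actions, key=lambda a: value_model(consequence_model(a)))
```

The difficulty lives in where `feedback_log` comes from: humans mostly evaluate actions and outcomes jointly, so producing ratings of outcomes alone, uncontaminated by opinions about the actions that led there, is the hard part.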
Also, having a solution that’s “real but expensive” would be a real step up from having no solution!
Thinking about the ‘losing is fun’ nature of some games. The slogan was originally popularized by Dwarf Fortress, but IMO the game that did it best was They Are Billions (basically, an asymmetrical RTS where, if the zombies break through, they grow exponentially and so will probably wipe you out in moments). You would lose a run, know why you lost, and then maybe figure out the policy that meant you wouldn’t lose the next time.

Another game I’ve been playing recently is Terra Invicta, a long game (technically a pausable RTS, but much more like a TBS?) with a challenging UI (in large part because it has a ton of info to convey) where… I don’t think I ever actually lost, but I would consistently reach a point where I said “oh, I didn’t realize how to do X, and now that I know how, by missing out on it I think I’m behind enough that I should start over.”

Similarly, in Factorio/Satisfactory/Dyson Sphere Program, I think I often reach a point where I say “oh, I’ve laid things out terribly / sequenced them wrong, I should start over and do it right this time.”

But… this is sort of crazy, and I don’t quite understand what’s up with that part of my psychology. For a game like Satisfactory, I’m nearly strictly better off deconstructing everything and laying it out again than starting over (and often better off just moving to a new start location and leaving the old factory in place). Even for Terra Invicta, I’m probably better off using the various compensatory mechanisms (“you were too slow building moonbases, and so other people got the good spots? This is how you take over bases with a commando team”) than restarting. It’s more like… wanting to practice a performance, or to experience the “everything goes well” trajectory, rather than figuring out how to recover from many different positions. Why am I into that slice of what games can be?
There was already a moratorium on funding GoF research, instituted in 2014 after an uproar in 2011, which was not renewed when it expired. There was a Senate bill in 2021 to make the moratorium permanent (and, I think, more far-reaching, in that institutions that did any such research were ineligible for federal funding, i.e. much more like a ban on doing it at all than simply a decision not to fund those projects) that, as far as I can tell, stalled out. I don’t think this policy ask was anywhere near as crazy as the AI policy asks that we would need to make the AGI transition survivable!
It sounds like you’re arguing “look, if your sense of easy and hard is miscalibrated, you can’t reason by saying ‘if they can’t do easy things, then they can’t do hard things’,” which seems like a reasonable criticism on logical grounds but not probabilistic ones. Surely not being able to do things that seem easy is evidence that one’s not able to do things that seem hard?
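As a toy version of the probabilistic point (all numbers invented): suppose a single latent ability drives performance on both kinds of tasks.

```python
# Toy Bayes calculation (invented numbers): even if the 'easy'/'hard'
# labels are miscalibrated, as long as one latent ability drives both,
# failure on seemingly-easy tasks raises the probability of failure on
# seemingly-hard ones.

p_high = 0.5                             # prior P(high ability)
p_fail_easy = {'high': 0.1, 'low': 0.6}  # assumed P(fail easy task | ability)
p_fail_hard = {'high': 0.4, 'low': 0.9}  # assumed P(fail hard task | ability)

# Posterior P(high ability | failed the 'easy' task), by Bayes' rule:
post_high = (p_fail_easy['high'] * p_high) / (
    p_fail_easy['high'] * p_high + p_fail_easy['low'] * (1 - p_high)
)

before = p_high * p_fail_hard['high'] + (1 - p_high) * p_fail_hard['low']
after = post_high * p_fail_hard['high'] + (1 - post_high) * p_fail_hard['low']
print(round(before, 2), round(after, 2))  # 0.65 -> 0.83
```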
Has EA invested much into banning gain-of-function research?
If it hasn’t, shouldn’t that negatively update us on how EA policy investment for AI will go?

[In the sense that this seems like a slam-dunk policy to me from where I sit, and if the policy landscape is such that it and things like it are not worth trying, then probably policy can’t deliver the wins we need in the much harder AI space.]
I just finished reading The Principles of Scientific Management, an old book from 1911 in which Taylor, the first ‘industrial engineer’ and one of the first management consultants, having retired from consulting, wrote down the principles behind his approach.
[This is part of a general interest in intellectual archaeology; I got a master’s degree in the modern version of the field he initiated, so there wasn’t too much that seemed like it had been lost with time, except perhaps some of the focus on making it palatable to the workers too; I mostly appreciated the handful of real examples from a century ago.]
But one of the bits I found interesting was thinking about a lot of the ways EY approaches cognition as, like, doing scientific management to thoughts? Like, the focus on wasted motion from this post. From the book, talking about why management needs to do the scientific effort, instead of the laborers:
The workman’s whole time is each day taken in actually doing the work with his hands, so that, even if he had the necessary education and habits of generalizing in his thought, he lacks the time and the opportunity for developing these laws, because the study of even a simple law involving, say, time study requires the cooperation of two men, the one doing the work while the other times him with a stop-watch.
This reminds me of… I think it was Valentine, actually, talking about doing a PhD in math education which included lots of watching mathematicians solving problems, in a way that feels sort of like timing them with a stop-watch.
I think this makes me relatively more excited about pair debugging, not just as a “people have fewer bugs” exercise but also as a “have enough metacognition between two people to actually study thoughts” exercise.
Like, one of the interesting things about the book is the observation that when a workplace switches from ‘initiative and incentive’ (where the boss puts all responsibility for doing well on the worker, and pays them if they do) to ‘scientific management’ (where the boss tries to understand and optimize the process, and to teach the worker how to be a good part of it), the workers can do much more sophisticated jobs, because they’re being taught how instead of having to figure it out on their own.
[You might imagine that a person of some fixed talent level could be taught to do jobs in some higher complexity range than the ones they can do alright without support, which is itself a higher complexity range than the jobs they could both do and optimize simultaneously.]
“Security” for me has the connotation of being explicitly in relation to malicious adversaries.
But, like, in my view the main issue with advanced AI capabilities is that they come with an adversary built in. This does make it fundamentally different from protecting against an external adversary, but I think that change is smaller than the change from, say, safety engineering like ‘preventing people from falling off the boat.’ Like, the issue isn’t that the boat deck is going to be wet, or the sea will be stormy, or stuff like that; the issue is that the boat is thinking!
This logic is based on two things: student 37 strongly suggests that you can make classification mistakes early, even in obvious cases; and ‘% of INT<10 students in Thought-Talon’ and ‘% of COU<10 students in Dragonslayer’ are relatively unambiguous mistakes whose frequency we can track.
Though presumably it could be the case that even if a student would be a poor fit for Thought-Talon, they would be an even poorer fit everywhere else?
It looks like it happened.
Is this saying “if model performance is getting better, then maybe it will have a sharp left turn, and if model performance isn’t getting better, then it won’t”?
We may have one example of realized out-of-distribution alignment: maternal attachment.
When someone becomes maternally attached to a dog, doesn’t this count as an out-of-distribution alignment failure?
Makes sense; it wouldn’t surprise me if that’s what’s happening. I think this perhaps understates the degree to which the attempts at capture were mutual: a theory of change where OPP gives money to OpenAI in exchange for a board seat and the elevation of safety-conscious employees at OpenAI seems like a pretty good way to have an effect. [This still leaves the question of how OPP assesses safety-consciousness.]

I should also note that I find the ‘nondisparagement agreements’ people have signed with OpenAI somewhat troubling, because they mean many people with high context will not be writing comments like Adam Scholl’s above even if they wanted to, and so the absence of evidence is not as much evidence of absence as one would hope.
Both of whom then left for Anthropic with the split, right?
In particular, all of the RLHF work is basically capabilities work which makes alignment harder in the long term (because it directly selects for deception), while billing itself as “alignment”.
I share your opinion of RLHF work, but I’m not sure I share your opinion of its consequences. For people who don’t believe arguments that RLHF is fundamentally flawed because they’re too focused on empirical evidence over arguments, generating empirical evidence that RLHF is flawed seems pretty useful for convincing them!