Referring to the “criticism of these numbers” arguments: the only one that stands out to me as very serious is 5. From my reading in history and historiography, the problem of quantifying changes among groups of people who mostly practice subsistence farming and keep no records of birth, health, death, or productivity is notorious. It looks to me like it would come down to archaeological and anthropological data to determine what their lives were like, and only then could the comparison with the lives of people on $X/day begin. I wouldn’t go as far as calling the numbers bullshit, but wew lad do I expect the error bars to be huge on the early end of that chart. Time to go digging through those links to find out what they actually did!
Edit: yep, that’s the case. They took the extant work from historians using the archaeology/remains methods and combined it as well as they were able. I was interested to see that the highest-uncertainty parts aren’t so much the earlier periods as the periods where completely new products or radical changes in product quality were introduced. So if we were to see the uncertainty, I expect it would start wide at the beginning and narrow as time went on, with spikes during events like the introduction of vaccines or factories.
Meta: I greatly appreciate that you took the time to contextualize the earlier relevant posts within this one.
Do you already have a plan of attack for the experimental testing? By this I mean using X application, or Y programming language, with Z amount of compute. If not, I would like to submit a request that you post that information when the time comes.
Recalling the Macroscopic Prediction paper by Jaynes, am I correct in interpreting this as conceptually replacing the microphenomena/macrophenomena distinction with near/far abstractions?
Following in this vein, does the phase-space trick seem to generalize to the abstractions level? By this I mean something like replacing “predict the behavior that can happen in the greatest number of ways, while agreeing with whatever information you have” with “choose the low-dimensional summaries which have been constrained in the greatest number of ways, while accurately summarizing the far-away information.”
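For concreteness, here is a minimal sketch of the first recipe in its original maximum-entropy form, using Jaynes’ Brandeis dice problem; the observed mean of 4.5 and the use of scipy are my own illustrative choices, not anything from the post:

```python
# Maximum entropy: among all distributions over die faces 1..6 that
# agree with a measured mean of 4.5, pick the one realizable in the
# greatest number of ways (the maximum-entropy one). The solution has
# Gibbs form p_i proportional to exp(lam * x_i) for some multiplier lam.
import numpy as np
from scipy.optimize import brentq

faces = np.arange(1, 7)

def implied_mean(lam):
    # Mean of the Gibbs-form distribution for a given multiplier.
    w = np.exp(lam * faces)
    return (w / w.sum()) @ faces

# Solve for the multiplier that reproduces the observed constraint.
lam = brentq(lambda l: implied_mean(l) - 4.5, -10, 10)
w = np.exp(lam * faces)
print(w / w.sum())  # the "most ways" distribution agreeing with the mean
```

The conjectured generalization would swap this counting over microstates for an analogous counting over the ways far-away information constrains a low-dimensional summary.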
The tools I have used at work in the past were as much reference material as checklist; this had the effect of making them a completely separate, optional action item that people only used if they remembered to.
The example checklists from the post are all as basic as humanly possible: FLY AIRPLANE and WASH HANDS. These are all things everyone knows and can coordinate on anyway, but the checklist needs to be so simple that it doesn’t really register as an additional task. This feels like the same sort of bandwidth question as getting dozens or hundreds of people to coordinate on the statement USE THE CHECKLIST.
Put another way, I think that the reasoning in You Have About Five Words is recursive.
I’ve lately been contemplating the problem of developing high-quality checklists at work for troubleshooting programs that work with big data. It is easily the most difficult thing I am considering, but also easily the most productivity-improving given adoption. Previous efforts at getting such tools to work were not successful, but neither were they very good. The viability threshold seems *very* high, probably for You Have About Five Words reasons.
In Being The Pareto Best In The World you mention the problem of elbow room:
Problem is, for GEM purposes, elbow room matters. Maybe I’m on the pareto frontier of Bayesian statistics and gerontology, but if there’s one person just a little bit better at statistics and worse at gerontology than me, and another person just a little bit better at gerontology and worse at statistics, then GEM only gives me the advantage over a tiny little chunk of the skill-space.
I notice the converse of a multi-dimensional skillset is multi-dimensional assessment. In the same way it is hard to hire good programmers without knowing anything about programming, it will be hard for anyone else to assess a pareto-optimal product or skillset along multiple dimensions simultaneously.
It seems to me the challenge here is Pareto legibility. The more dimensions on the frontier, the noisier the assessment will necessarily be. This introduces a meta-problem: one of the skills at which you want to be good enough is making your position on the Pareto frontier legible enough for others to benefit from it.
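To see one reason more dimensions make frontier positions harder to assess, here is a quick simulation of my own (purely illustrative, not from the post): with skills drawn at random, the fraction of people who are Pareto-undominated grows rapidly with the number of dimensions, so “on the frontier” by itself conveys less and less.

```python
# Draw random skill vectors and count how many are Pareto-undominated.
# As dimensionality grows, almost everyone is on the frontier, so a
# frontier position alone becomes less legible.
import numpy as np

rng = np.random.default_rng(0)

def frontier_fraction(n_people, n_skills):
    skills = rng.random((n_people, n_skills))
    undominated = 0
    for i in range(n_people):
        others = np.delete(skills, i, axis=0)
        # i is dominated if someone is >= on every skill, > on at least one
        dominated = np.any(
            np.all(others >= skills[i], axis=1)
            & np.any(others > skills[i], axis=1)
        )
        undominated += not dominated
    return undominated / n_people

for d in (1, 2, 4, 8):
    print(d, frontier_fraction(1000, d))
```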
As a practical matter this doesn’t seem like that big a deal for consumer goods like books, where even laypeople can take reviews of “X about this book was so good” and “I liked Y about this book” and round them off into a feeling of “muchly good.” By contrast, legibility seems exceptionally important for something like the example of econometric modeling applied to proteomics.
Alternative title: Being The Pareto Best In The World for Writers.
This gives me the vague feeling that GPT-3-ing oneself might be a good way to check the clarity of one’s writing. If we train GPT-3 on all our writing over the course of a year, and then prompt it, how much would the coherence of GPT-3’s responses correlate with the clarity of the writing to the reader?
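A rough sketch of what that experiment might look like, assuming fine-tuning access via the (legacy, pre-1.0) OpenAI Python client; the file name, prompt, and parameters are all illustrative stand-ins:

```python
# Illustrative only: fine-tune a base model on a year of one's own
# writing, then prompt it and judge the coherence of what comes back.
import openai  # legacy (pre-1.0) client interface

openai.api_key = "sk-..."  # placeholder

# 1. Upload the writing as JSONL prompt/completion pairs.
f = openai.File.create(file=open("my_writing.jsonl", "rb"),
                       purpose="fine-tune")

# 2. Fine-tune a base model on it.
job = openai.FineTune.create(training_file=f.id, model="davinci")

# 3. Wait for the job to finish, then re-fetch it; fine_tuned_model
#    is null until the job completes.
job = openai.FineTune.retrieve(job.id)

# 4. Prompt the resulting model and rate the output's coherence.
resp = openai.Completion.create(
    model=job.fine_tuned_model,
    prompt="Summarize my view on checklists:",
    max_tokens=200,
)
print(resp.choices[0].text)
```

The interesting measurement would then be whether human ratings of the model’s coherence track human ratings of the source writing’s clarity.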
I also vote for very intuitive. The Pareto frontier analogy is crunchy enough to come to grips with, but giving the concept its own name keeps it loose enough that we don’t get stuck in game theory or otherwise hamstrung by artificial narrowness.
Welcome to LessWrong!

We find ourselves in a perpetual tug-of-war between the desire for more reliable, higher-quality posts and the ability of people to engage and contribute at all. The trade-off is this:
The higher the standard, whether of style or rigor, the fewer people will write posts. To our dismay, this includes people who would actually meet the standards but fear beforehand that they would not. Naturally, the potential contributions from people below the requirements are lost.
While this makes each post more productive to read, it also means that each post takes more effort to read, which to our dismay often means posts stop being engaged with; we run the risk of churning out a small number of posts which are very high quality but very poorly read.
So striking that balance prevents us from setting much in the way of style standards; we usually prefer to let the community speak, which rewards multiple styles. I myself am on the write-early, write-often side of the fence.

The mods may have a more nuanced and up-to-date opinion with respect to meta information like writing guides.
Upvote for expressing your true concern!

I have a question about this thought:
In order for the orthogonality thesis to be true, it must be possible for the agent’s goal to remain fixed while its intelligence varies, and vice versa. Hence, it must be possible to independently alter the physical devices on which these traits are instantiated.
This is intuitive, but I am not confident this is true in general. Zooming out a bit, I understand this as saying: if we know that AGI can exist at two different points in intelligence/goal space, then there exists a path between those points in the space.
A concrete counter-example: we know that we can build machines that move with different power sources, and we can build essentially the same machine powered by different sources. So consider a Chevy Impala, with a gas-fueled combustion engine, and a Tesla Model 3, with a battery-powered electric motor. If we start with a Chevy Impala, we cannot convert it into a Tesla Model 3, or vice-versa: at a certain point, we would have changed the vehicle so much it no longer registers as an Impala.

My (casual) understanding of the orthogonality thesis is that for any given goal, an arbitrarily intelligent AGI could exist, but it doesn’t follow that we could guarantee keeping the goal constant while increasing the intelligence of an extant AGI, for path-dependence reasons.

What do you think about the difference between changing an existing system, vs. building it to specs in the first place?
For the first thing, I have lately been shifting to asking people to tell me the story of how they came to hold a belief. This is doubly useful because only a tiny fraction of the population actually has the process of belief formation explicit enough in their heads to tell me.
Congratulations on the new org, and also on the recent promotion to fatherhood!
How the heck do I update on this?

I don’t feel like I have a graceful way to de-weight something when it turns out poorly in this fashion. I feel comfortable with unwinding an update I previously made, but in this case it amounts to just throwing out everything I have head-chunked as behavioral economics.
This feels wrong-ish, in the sense that it isn’t as though all the research was a complete fiction; a more correct operation would be to adjust my priors in such a way as to capture what the research actually shows, rather than what I thought it showed.

Trouble is, this is even more work than making the initial updates, because the whole failure mode is an inability to have confidence in any existing distillation of the ideas. This means tackling the relevant studies one at a time, with only a few newer review or meta papers to help.

On the upside, it occurs to me that I integrated virtually none of the mentioned results well enough to meet the anticipated-experiences standard; maybe that means I never really updated in the first place, and losing them costs nothing.
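In odds form, the operation I have in mind looks something like this (a toy sketch of my own; the numbers are made up):

```python
# An update multiplies prior odds by a likelihood ratio, so "unwinding"
# an update just divides that ratio back out; correcting an update
# replaces the old ratio with what the evidence actually supports.
prior_odds = 1.0        # 1:1 on some behavioral-econ claim
lr_believed = 4.0       # how strongly I thought the study favored it
posterior_odds = prior_odds * lr_believed

lr_actual = 1.5         # what the study, read carefully, supports
revised_odds = posterior_odds / lr_believed * lr_actual
print(revised_odds)     # back to prior, then forward by the real evidence
```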
I’d be interested in hearing more about how Google runs things, which I have no knowledge of; I note that Bell Labs and DARPA are the go-to examples of standout production, and one of them is defunct now. This also scans with (what I understand to be) the usual economic finding that large firms produce more innovation on average than small ones.
I have other examples from industry I’ve been thinking about where the problem seems to be dividing labor at the wrong level (or maybe not applying it completely at every level?). The examples I have in mind here are pharmaceutical companies, who underwent a long series of mergers starting in the 80s.
The interesting detail here is that they did specialize, in a way very similar to the pin example, but they didn’t really account for research being a much broader problem than pin-making. Prior to the big downsizing of labs in pharma and chemicals, the story went that during mergers all these labs had been glommed together; ideas would just be put in one end, and either a working product came out the other end or it didn’t. There was no feedback loop at the level of the product or idea.
This looks to me like a case where small university labs are only specialized at the meta level, and the failing industrial labs are only specialized at the object level. It feels to me like if there were a graceful way to describe the notion that specialization has dimensionality, we’d be able to innovate better.
Specialization of Labor in Research
I think we do specialization of labor wrong a lot. This makes me think there is a lot to be gained from a sort of back-to-basics approach, and I feel like knowledge work in general and research in particular are good candidates.

The classic example of specialization of labor compares the output of a single blacksmith making pins with one tenth of the output of a factory where ten men produce pins, each responsible for one to three tasks in the pin-making process. Each man gets very good at his appointed tasks, and as a consequence the per-worker output is about 4,800 pins per day, while the lone metalworker’s output is perhaps 20 pins per day.
Imagine for a moment how research is done: there is a Principal Investigator who runs a lab, working with grad students and postdocs. Normally when the PI has a research project, he will assign the work to his different assistants, with so-and-so gathering the samples, so-and-so running them through the equipment, and so-and-so preparing some preliminary analysis. In this way the labor is divided; Adam Smith called it division of labor; why don’t I think it counts?

Because they aren’t specializing. The pitch with the pin factory is that each worker gets very good at their one to three pin-making tasks. In the research case the division of labor is ad hoc; in fact the odds are good that the opportunity to get very good is effectively avoided, because each assistant is expected to be competent in each of the relevant tasks.
This is because a scientist is a blacksmith-of-abstractions, and the grad students and postdocs are the apprentices. Producing research is most of the point, but not all of it; the rest is to turn the assistants into future PIs themselves.
This feels like a natural complement to the censorship/persuasion axis of development. It would be natural to use this method to detect how ideologically aligned a group of people is; we would expect a person and their pseudonyms to be put into the same group. Given how important ideological sorting is to dating, dating services might well do something like provide a list of pseudonyms-most-likely-to-be-this-person.
Are the methods of analysis considered part of the “methodological technology” this thread of research considers incomplete?

If so, the whole thing sort of trivializes to “statistics suck, and therefore science methodologically sucks.” On the flip side, how difficult/expensive would it be to run a series of these specifying the analytical methods in the same way the hypothesis and data sources were specified? One group does effect sizes instead of significance, one group does likelihood functions instead of significance, etc.
I keep updating in favor of a specialization-of-labor theory for reorganizing science. First order of business: adding analysts to create a Theory/Experiment/Analysis trifecta.
I liked this post. I will say the Facebook/Google+ example looks pretty understated here: the story in the article is that Google leadership met in secret to produce documents and reorgs in a dedicated strategy to confront Facebook, whereas the Facebook people just, well, Facebook’d:
And what was everyone working on?
For those in the user-facing side of Facebook, it meant thinking twice on a code change amid the constant, hell-for-leather dash to ship some new product bell or whistle, so we wouldn’t look like the half-assed, thrown-together, social-media Frankenstein we occasionally were.
For us in the Ads team, it was mostly corporate solidarity that made us join the weekend-working mob.
Importantly though, it looks in retrospect like Google+ was dead on arrival. No one ever actually used it for anything. Facebook’s counter-move was to stay the course. Granted, this is not a trivial task in the face of possible upheaval.
This puts me in mind of how the Soviet Union deployed phage therapy, because they weren’t as good at pharmaceutical development as the US. I understand the idea is making a comeback in the face of antibiotic-resistant strains; it seems human trials for phage therapy in the US were approved in 2019.

This further makes me wonder if we could develop something that attacked or fed on prions, via some method of targeting the fold in the protein. Is that a thing? Do any microscopic organisms hunt via geometry?
I don’t, because as far as I understand it there is no principal/agent mechanism at work in a Stag Hunt. I can see I was powerfully vague though, so thank you for pointing that out via the question.
I was comparing Stag Hunt to the Prisoner’s Dilemma, and the argument is this:
Prisoner’s Dilemma is one agent reasoning about another agent. This is simple, so there will be many papers on it.
Stag Hunt is multiple agents reasoning about multiple agents. This is less simple, so there will be fewer papers, corresponding to the difference in difficulty.
I expect the same to also apply to the transition from one principal and one agent to multiple principals and multiple agents.
Returning to the sufficiency claim: I think I weigh the “Alignment framings from MIRI’s early years” arguments more heavily than Andrew does; I estimate that a mild over-commitment to the simplest-case-first norm, of approximately the same strength as the community’s earlier over-commitment to modest epistemology, would be sufficient to explain the collective “ugh” response. It’s worth noting that the LessWrong sector is the only one referenced that has much in the way of laypeople—which is to say people like me—in it. I suspect that our presence in the community biases it more strongly towards simpler procedures, which leads me to put more weight on the over-commitment explanation.

That being said, my yeoman-community-member impressions of the anti-politics bias largely agree with Andrew’s, even though I only read this website and some of the high-level discussion of papers and research-agenda posts from MIRI/OpenAI/DeepMind/etc. My gut feeling says there should be a way to make multi/multi AI dynamics palatable for us despite this. For example, consider the popularity of posts surrounding Voting Theory, which are all explicitly political. Multi/multi dynamics are surely less political than that, I reason.