I’ve been lurking on LW since 2013, but only started posting recently. My day job is “analytics broadly construed”; my degree is in physics; I used to write on Quora and Substack but stopped. I’m based in Kuala Lumpur, Malaysia.
Mo Putera
Apparently Jeff Bezos used to do something like this with his regular “question mark emails”, which struck me as interesting in the context of an organization as large and complex as Amazon. Here’s what it’s like from the perspective of one recipient (partial quote, more at the link):
About a month after I started at Amazon I got an email from my boss that was a forward of an email Jeff sent him. The email that Jeff had sent read as follows:
“?”
That was it.
Attached below the “?” was an email from a customer telling Jeff that he (the customer) took a long time to find a certain type of screw on Amazon, despite Amazon carrying the product.
A “question mark email” from Jeff is a known phenomenon inside Amazon, and there’s even an internal wiki on how to handle one, but that’s a story for another time. In a nutshell, Jeff’s email address is public, and customers send him suggestions, complaints, and praise all the time. While every email Jeff receives gets a response, he does not personally forward them all to execs with a “?”. When he does, it means he thinks the issue is very important.
It was astonishing to me that Jeff picked that one seemingly trivial issue and a very small category of products (screws) to personally zoom in on. …
Where are you going with this line of questioning?
If it’s high-quality distillation you’re interested in, you don’t necessarily need a PhD. I’m thinking of e.g. David Roodman, now a senior advisor at Open Philanthropy. He majored in math, then did a year-long independent study in economics and public policy, and has basically been self-taught ever since. Holden Karnofsky considers what he does extremely valuable:
David Roodman, who is basically the person that I consider the gold standard of a critical evidence reviewer, someone who can really dig on a complicated literature and come up with the answers, he did what, I think, was a really wonderful and really fascinating paper, which is up on our website, where he looked for all the studies on the relationship between incarceration and crime, and what happens if you cut incarceration, do you expect crime to rise, to fall, to stay the same? He picked them apart. What happened is he found a lot of the best, most prestigious studies and about half of them, he found fatal flaws in when he just tried to replicate them or redo their conclusions.
When he put it all together, he ended up with a different conclusion from what you would get if you just read the abstracts. It was a completely novel piece of work that reviewed this whole evidence base at a level of thoroughness that had never been done before, came out with a conclusion that was different from what you naively would have thought, which concluded his best estimate is that, at current margins, we could cut incarceration and there would be no expected impact on crime. He did all that. Then, he started submitting it to journals. It’s gotten rejected from a large number of journals by now. I mean starting with the most prestigious ones and then going to the less.
Robert Wiblin: Why is that?
Holden Karnofsky: Because his paper, it’s really, I think, it’s incredibly well done. It’s incredibly important, but there’s nothing in some sense, in some kind of academic taste sense, there’s nothing new in there. He took a bunch of studies. He redid them. He found that they broke. He found new issues with them, and he found new conclusions. From a policy maker or philanthropist perspective, all very interesting stuff, but did we really find a new method for asserting causality? Did we really find a new insight about how the mind of a …
Robert Wiblin: Criminal.
Holden Karnofsky: A perpetrator works. No. We didn’t advance the frontiers of knowledge. We pulled together a bunch of knowledge that we already had, and we synthesized it. I think that’s a common theme is that, I think, our academic institutions were set up a while ago. They were set up at a time when it seemed like the most valuable thing to do was just to search for the next big insight.
These days, they’ve been around for a while. We’ve got a lot of insights. We’ve got a lot of insights sitting around. We’ve got a lot of studies. I think a lot of the times what we need to do is take the information that’s already available, take the studies that already exist, and synthesize them critically and say, “What does this mean for what we should do? Where we should give money, what policy should be.”
I don’t think there’s any home in academia to do that. I think that creates a lot of the gaps. This also applies to AI timelines where it’s like there’s nothing particularly innovative, groundbreaking, knowledge frontier advancing, creative, clever about just … It’s a question that matters. When can we expect transformative AI and with what probability? It matters, but it’s not a work of frontier advancing intellectual creativity to try to answer it.
Yeah, I agree that’s a weird way to define “high-dimensional”. I’m more partial to defining it as “when the curse of dimensionality becomes a concern”, which is less precise but more useful.
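For concreteness, the “curse of dimensionality becomes a concern” framing can be illustrated with distance concentration: as dimension grows, pairwise distances between random points cluster around a common value, so notions like “nearest neighbor” lose discriminative power. A minimal sketch (the uniform-cube setup and function name are my own, just for illustration):

```python
# Distance concentration: a hallmark of the curse of dimensionality.
# As dimension d grows, distances between i.i.d. random points cluster
# around a common value, so "near" vs "far" loses meaning.
import numpy as np

def relative_contrast(d, n=500, seed=0):
    """Spread of distances-to-origin relative to their mean, for n
    points drawn uniformly from the unit cube in d dimensions."""
    rng = np.random.default_rng(seed)
    dists = np.linalg.norm(rng.random((n, d)), axis=1)
    return (dists.max() - dists.min()) / dists.mean()

low_d = relative_contrast(2)        # large spread: distances vary a lot
high_d = relative_contrast(10_000)  # tiny spread: distances concentrate
print(low_d, high_d)
```

In 2 dimensions the spread is on the order of the mean distance itself; in 10,000 dimensions it collapses to a few percent, which is roughly when distance-based methods start misbehaving.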
in the minds of people like Eliezer Yudkowsky or Paul Christiano, we’re more likely doomed than not
My impression of Paul is the opposite – he guesses “~15% on singularity by 2030 and ~40% on singularity by 2040”, and has said “quantitatively my risk of losing control of the universe through this channel [Eliezer’s list of lethalities] is more like 20% than 99.99%, and I think extinction is a bit less likely still”. (That said, I think he’d probably agree with all the reasons you stated under “I personally lean towards those latter views”.) I’m curious where you got the impression that Paul thinks we’re more likely doomed than not; I’d update more on his predictions than on nearly anyone else’s, including Eliezer’s.
Is there something similar for the EA Forum?
I think your ‘Towards a coherent process for metric design’ section alone is worth its weight in gold. Since most LW readers aren’t going to click on your linked paper (click-through rates being as low in general as they are, from my experience in marketing analytics), let me quote that section wholesale:
Given the various strategies and considerations discussed in the paper, as well as failure modes and limitations, it is useful to lay out a simple and coherent outline of a process for metric design. While this will by necessity be far from complete, and will include items that may not be relevant for a particular application, it should provide at least an outline that can be adapted to various metric design processes. Outside of the specific issues discussed earlier, there is a wide breadth of expertise and understanding that may be needed for metric design. Citations in this section will also provide a variety of resources for at least introductory further reading on those topics.
- Understand the system being measured, including both technical (Blanchard & Fabrycky, 1990) and organizational (Berry & Houston, 1993) considerations.
  - Determine scope
    - What is included in the system?
    - What will the metrics be used for?
  - Understand the causal structure of the system
    - What is the logic model or theory? (Rogers, Petrosino, Huebner, & Hacsi, 2000)
    - Is there formal analysis (Gelman, 2010) or expert opinion (van Gelder, Vodicka, & Armstrong, 2016) that can inform this?
  - Identify stakeholders (Kenny, 2014)
    - Who will be affected?
    - Who will use the metrics?
    - Whose goals are relevant?
- Identify the Goals
  - What immediate goals are being served by the metric(s)? How are individual impacts related to performance more broadly? (Ruch, 1994)
  - What longer-term or broader goals are implicated?
- Identify Relevant Desiderata
  - Availability
  - Cost
  - Immediacy
  - Simplicity
  - Transparency
  - Fairness
  - Corruptibility
- Brainstorm potential metrics
  - What outcomes are important to capture?
  - What data sources exist?
  - What methods can be used to capture additional data?
  - What measurements are easy to capture?
  - What is the relationship between the measurements and the outcomes?
  - What isn’t captured by the metrics?
- Consider and Plan
  - Understand why and how the metric is useful. (Manheim, 2018)
  - Consider how the metrics will be used to diagnose issues or incentivize people. (Dai, Dietvorst, Tuckfield, Milkman, & Schweitzer, 2017)
  - Plan how to use the metrics to develop the system, avoiding the “reward / punish” dichotomy. (Wigert & Harter, 2017)
  - Perform a pre-mortem (Klein, 2007)
  - Plan to routinely revisit the metrics (Atkins, Wanick, & Wills, 2017)
IL’s comment has a BOTEC arguing that video data isn’t that unbounded either (I think the 1% usefulness assumption is way too low, but even bumping it up to 100% doesn’t change the conclusion much).
There’s a tangentially related comment by Scott Alexander from over a decade ago, on the subject of writing advice, which I still think about from time to time:
The best way to improve the natural flow of ideas, and your writing in general, is to read really good writers so much that you unconsciously pick up their turns of phrase and don’t even realize when you’re using them. The best time to do that is when you’re eight years old; the second best time is now.
Your role models here should be those vampires who hunt down the talented, suck out their souls, and absorb their powers. Which writers’ souls you feast upon depends on your own natural style and your goals. I’ve gained most from reading Eliezer, Mencius Moldbug, Aleister Crowley, and G.K. Chesterton (links go to writing samples from each I consider particularly good); I’m currently making my way through Chesterton’s collected works pretty much with the sole aim of imprinting his writing style into my brain.
Stepping from the sublime to the ridiculous, I took a lot from reading Dave Barry when I was a child. He has a very observational sense of humor, the sort where instead of going out looking for jokes, he just writes about a topic and it ends up funny. It’s not hard to copy if you’re familiar enough with it. And if you can be funny, people will read you whether you have any other redeeming qualities or not.
Getting imprinted with good writers like this will serve you for your entire life. It will serve you whether you’re on your fiftieth draft of a thesis paper, or you’re rushing a Less Wrong comment in the three minutes before you have to go to work. It will even serve you in regular old non-written conversation, because wit and clarity are independent of medium.
Thinking of myself as a vampire hunting down the talented and feasting on their souls to absorb their powers is somewhat dramatic, but vividly memorable...
What do you think about deep work (here’s a semi-arbitrarily-chosen explainer)? I suppose the Monday time block after the meeting lets you do that, but that’s maybe <10% of the workweek; you also did mention “If people want to focus deeply for a while, they can put on headphones”. That said, many of your points aren’t conducive to deep work (e.g. “If you need to be unblocked by someone, the fastest way is to just go to their desk and ask them in person” interrupts the other person’s deep work block, same with “use a real-time chat platform like Slack to communicate and add all team members to all channels relevant to the team”, and “if one of your teammates messages you directly, or if someone @ tags you, you want to respond basically immediately”).
I’ve always wondered about this, given my experience working at a few young-ish, high-growth, top-of-industry companies. I always hated the constant interruption but couldn’t deny how much faster everything moved. That said, I mostly did deep work well after office hours (so a workweek was basically 40 hours of getting interrupted to death followed by 20-30 hours of deep work and backlog-clearing), as did ~everyone else.
I’m curious if Eliezer endorses this, especially the first paragraph.
I’m curious how you think your views here cash out differently from (your model of) most commenters here, especially as pertains to alignment work (timelines, strategy, prioritization, whatever else), but also more generally. If I’m interpreting you correctly, your pessimism on the usefulness-in-practice of quantitative progress probably cashes out in some sort of bet against scaling (i.e. maybe you think the “blessings of scale” will dry up faster than others think)?
+1 for “quantity has a quality all its own”. “More is different” pops up everywhere.
Carbon dating
You’re gesturing in the right direction, but if it’s the age of the universe you’re after, you really want something like uranium-lead dating instead, which is routinely used to date rocks up to 4.5 billion years old with precision in the ~1% range. Carbon dating can’t reliably measure ages beyond ~50,000 years except in special circumstances, since the half-life of 14C is only 5,730 years.
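The ~50,000-year ceiling falls straight out of the exponential-decay arithmetic, which can be sketched as:

```python
# Fraction of an isotope remaining after time t, given its half-life:
# N(t)/N0 = 0.5 ** (t / half_life)
def remaining_fraction(t_years, half_life_years):
    return 0.5 ** (t_years / half_life_years)

C14_HALF_LIFE = 5_730  # years

# After 50,000 years (~8.7 half-lives), under 0.3% of the original 14C
# remains, so the signal drowns in contamination and counting noise.
print(remaining_fraction(50_000, C14_HALF_LIFE))

# By contrast, 238U's half-life (~4.47 billion years) is on the same
# scale as the age of the Earth, so plenty of parent isotope survives
# for uranium-lead dating of the oldest rocks.
U238_HALF_LIFE = 4.47e9  # years
print(remaining_fraction(4.5e9, U238_HALF_LIFE))
```

Roughly 0.2% of the 14C survives at 50,000 years, versus about half of the 238U over the entire age of the Earth, which is why the two methods cover such different time ranges.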
A while back johnswentworth wrote What Do GDP Growth Curves Really Mean?, noting how real GDP (as we actually calculate it) is a wildly misleading measure of growth because it effectively ignores major technological breakthroughs. Quoting the post: real GDP growth mostly tracks production of goods which aren’t revolutionized; goods whose prices drop dramatically are downweighted to near-zero, so slow, mostly-steady real GDP growth curves mostly tell us about the slow and steady increase in production of things which haven’t been revolutionized, and approximately nothing about the huge revolutions in e.g. electronics. Some takeaways by the author that I think pertain to the “AI and GDP growth acceleration” discussion:
“even after a hypothetical massive jump in AI, real GDP would still look smooth, because it would be calculated based on post-jump prices, and it seems pretty likely that there will be something which isn’t revolutionized by AI… Whatever things don’t get much cheaper are the things which would dominate real GDP curves after a big AI jump”
“More generally, the smoothness of real GDP curves does not actually mean that technology progresses smoothly. It just means that we’re constantly updating the calculations, in hindsight, to focus on whatever goods were not revolutionized”
“using price as a proxy for value is just generally not great for purposes of thinking about long-term growth and technology shifts” (quote)
Ever since I read that post I’ve generally discounted claims that AI progress would accelerate GDP growth to double digits unless the claims address the methodological point about how GDP is calculated. This isn’t an argument against the potential for AI progress to revolutionize human civilization or whatever; it’s (as I interpret it) an argument against using anything remotely resembling GDP in trying to quantify how much AI progress is revolutionizing human civilization, because it just won’t capture it.
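To make the methodological point concrete, here’s a deliberately simplified two-good toy calculation (made-up numbers, fixed-base-year valuation rather than the actual chained methodology the BEA uses, but the weighting effect is the same):

```python
# Toy illustration (invented numbers) of how the choice of price year
# hides a technological revolution in "real GDP". Two goods: haircuts
# (not revolutionized) and chips (100x output, price collapses 1000x).
quantities = {
    "haircuts": {"before": 100, "after": 105},
    "chips":    {"before": 100, "after": 10_000},
}
prices = {
    "haircuts": {"before": 10.0, "after": 10.0},
    "chips":    {"before": 10.0, "after": 0.01},
}

def real_gdp(period, price_year):
    """Value one period's quantities at a fixed year's prices."""
    return sum(quantities[g][period] * prices[g][price_year]
               for g in quantities)

# Valued at post-revolution prices (the hindsight view), growth looks
# modest: the revolutionized good now carries near-zero weight.
growth_late = real_gdp("after", "after") / real_gdp("before", "after") - 1

# Valued at pre-revolution prices, the same change looks explosive.
growth_early = real_gdp("after", "before") / real_gdp("before", "before") - 1

print(f"{growth_late:.0%} vs {growth_early:.0%}")
```

Same underlying events, and the measured “real growth” differs by orders of magnitude (~15% vs ~5000% here) depending purely on which year’s prices you weight by, which is the post’s point about smooth-looking GDP curves after an AI jump.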
This post is great, I suspect I will be referencing it from time to time.
I don’t know if you meant to include the footnotes as well, since they aren’t present in this post. For instance, I tried clicking on
After a week, you’ll likely remember why you started, but it may be hard to bring yourself to really care[2]
and it just doesn’t lead anywhere, although I did find it on your blog.
Would you say this is the same as babbling?
Can’t wait!
Thanks!
I agree with this comment, and I’m confused why it’s so disagreed with (-6 agreement karma vs +11 overall). Can anyone who disagreed explain their reasoning?