I’m not confident that was actually the cause of the decline, and I shouldn’t have sounded so confident in my post.
Your explanation is confusing to me, though, because it doesn’t explain the data-point that LW ended up having a lot of bad content and discussion, rather than no content and discussion.
Anyhow, I believe this discussion should be had in the meta section of the site, and that we should focus more on the object-level of the question here.
My claim is definitely not about the global financial system. It’s about single financial markets becoming more accurate at tracking the metrics they’re measuring as more people join, by default.
If I became convinced that the proper analogy is that companies by default become better at optimising profit as more employees join, I’d change my mind about the importance of prediction/financial markets. But I’d bet strongly that that is not the case.
Don’t think I disagree, I’ve made a very similar point to yours in a previous LW thread here.
Also, my point is not that the gains from being a correct contrarian in the financial market always outweigh the social punishment for contrarianism, or that you can always trade between the two currencies. But despite being frozen out of investing, Michael Burry is still a multi-millionaire. That is an interesting observation. It’s related to why I think Robin Hanson is excited about prediction markets—they present a remarkable degree of robustness to social status games and manipulability.
Also I’m very curious about the outcome of Taleb’s investments (some people say they’re going awfully, which is why he’s selling books...), so please share any links.
I’d like to coin a new term for that thing which the US President has a lot of: coordination capital.
This seems to require some combination of:
Coordination capital depreciates as it is used
Consider the priest Kalil mentions. He’s able to declare people married because people think he is. It’s the equilibrium, and everyone benefits from maintaining it. But if he tests his powers and starts declaring strange marriages not endorsed by the local social norm, the equilibrium might shift. Similarly, if the president tries to rally companies around a stag hunt, but does so poorly and some choose rabbit, they’re all more likely to choose rabbit in future.
There are returns-to-scale to coordination capital
The more plan executions you successfully coordinate, the more willing future projects will be to approach you with their plans.
There is an upper bound to the amount of coordination capital
If you have a Schelling coordination point, and someone finds it bad and declares they will build a new, better coordination point, there is risk that you’ll end up not with two but with zero coordination points. Similarly, coordination capital is scarce and it can result in lock-in scenarios if held by the wrong entities.
Background and implications
Part of the reason I want a term for this thing is that I’ve been experiencing a lack of this thing when working on coordination infrastructure for the EA and x-risk communities. I’m trying to build a forecasting platform and community to (among other things) build common knowledge of some timelines considerations, to coordinate around them.
However, to get people to use it, I can’t just call up Holden Karnofsky, Nick Bostrom, and Nate Soares in order to kickstart the thing and make it a de facto Schelling point. Rather, I have to do some amount of “hustling”, and things that don’t scale—finding people in the community with natural interest in stuff, reaching out to them personally, putting in legwork here and there to keep discussions going and add a missing piece to a quantitative model… and try to do this enough to hit some kind of escape velocity.
I don’t have enough coordination capital, so I try to compensate by other means. Another example is Uber—they’re trying to move riders and drivers to a new equilibrium, they didn’t have much coordination capital initially, and this requires them to burn a lot of cash/free energy.
Writing this, I’m a bit worried that all the leaders of the EA/x-risk communities are leaders of particular organizations with an object-level mission. They’re primarily incentivised to achieve their organisation’s mission, and there is no one who, like the president, simply serves to coordinate the community around the execution of plans. This suggests this function might be undersupplied on the margin.
Nitpick: “We’ve gotten much better at making guesstimates” and “Guesstimates have become more effective” are quite different claims, and it’s not clear which one(s) you disagree with.
[Epistemic status: this comment is much less clear in elucidating the inputs rather than outputs of my thinking than I would have preferred, but I share it written roughly rather than not at all.]
On priors, it would be incredibly surprising to me if the best introduction to learning how to think about society did not include any of the progress we’ve made in fields like microeconomics and statistics (which only reached maturity in the last 100 years or so), or even simply empiricism and quantitative thinking (which only reached maturity in the last 500 years or so).
I believe there has been an absolutely outstanding amount of genuine conceptual and distillation progress in understanding society since Ancient Greece.
Another part of my experience feeding into this prior is that my undergrad was in philosophy at Oxford, and some professors really liked deeply studying ancient originals and criticising translations. In my experience this mostly didn’t correlate with a productive or healthy epistemic culture.
This is an update on the timeline for paying out the bounties on this question. They will be awarded for work done before May 13th, but we’re delayed by another few weeks in deciding on the allocation. Apologies!
The fact that they’re measuring accuracy in a pretty bad way is evidence against them having a good algorithm.
Here’s Anthony Aguirre (Metaculus) and Julia Galef on Rationally Speaking.
Anthony: On the results side, there’s now an accrued track record of a couple of hundred predictions that have been resolved, and you can just look at the numbers. So, that shows that it does work quite well.
Julia: Oh, how do you measure how well it works?
Anthony: There’s a few ways — going from the bad but easy to explain, to the better but harder to explain…
Julia: That’s a good progression.
Anthony: And there’s the worst way, which I won’t even use — which is just to give you some examples of great predictions that it made. This I hate, so I won’t even do it.
Julia: Good for you for shunning that.
Anthony: So looking over sort of the last half year or so, since December 1st, for example… If you ask for how many predictions was Metaculus on the right side of 50% — above 50% if it happened or below 50% if it didn’t happen — that happens 77 out of 81 times the question resolved, so that’s quite good.
Anthony: And some of the aficionados will know about Brier scores. That’s sort of the fairly easy to understand way to do it, which is that you assign a zero if something doesn’t happen, and a one if something does happen. Then you take the difference between the predicted probability and that number. So if you predict at 20% and it didn’t happen, you’d take that as a .2, or if it’s 80% and it does happen, that’s also a .2, because it’s the difference between the 80% and a one, and then you square that number.
Anthony: So Brier scores can run from basically zero to one, where low numbers are good. And if you calculate that for that same set of 80 questions, it’s .072, which is a pretty good score.
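The two metrics Anthony describes can be sketched in a few lines of Python. The predictions and outcomes below are made-up toy data, not actual Metaculus track-record numbers:

```python
# Toy illustration of the two accuracy metrics from the transcript above:
# "right side of 50%" accuracy and the Brier score.

def right_side_of_50(predictions, outcomes):
    """Fraction of predictions above 50% when the event happened,
    or below 50% when it didn't."""
    hits = sum(
        1 for p, o in zip(predictions, outcomes)
        if (p > 0.5 and o == 1) or (p < 0.5 and o == 0)
    )
    return hits / len(predictions)

def brier_score(predictions, outcomes):
    """Mean squared difference between predicted probability and outcome.
    0 is perfect; always predicting 50% scores 0.25."""
    return sum((p - o) ** 2 for p, o in zip(predictions, outcomes)) / len(predictions)

preds = [0.8, 0.2, 0.9, 0.6, 0.3]   # hypothetical forecasts
outcomes = [1, 0, 1, 0, 0]          # hypothetical resolutions (1 = happened)

print(right_side_of_50(preds, outcomes))  # 0.8 (4 of 5 on the right side)
print(brier_score(preds, outcomes))       # 0.108
```

Note that the Brier score rewards calibration as well as direction: the 0.6 forecast that didn’t resolve is on the wrong side of 50%, but it costs far less than a confident 0.9 miss would.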
This is a prediction I make, with “general-seeming” replaced by “more general”, and I think of this as a prediction inspired much more by CAIS than by EY/Bostrom.
I notice I’m confused. My model of CAIS predicts that there would be poor returns to building general services compared to specialised ones (though this might be more of a claim about economics than a claim about the nature of intelligence).
The following exchange is also relevant:
Raiden:
Robin, or anyone who agrees with Robin:
What evidence can you imagine would convince you that AGI would go FOOM?
jprwg:
While I find Robin’s model more convincing than Eliezer’s, I’m still pretty uncertain.
That said, two pieces of evidence that would push me somewhat strongly towards the Yudkowskian view:
A fairly confident scientific consensus that the human brain is actually simple and homogeneous after all. This could perhaps be the full blank-slate version of Predictive Processing as Scott Alexander discussed recently, or something along similar lines.
Long-run data showing AI systems gradually increasing in capability without any increase in complexity. The AGZ example here might be part of an overall trend in that direction, but as a single data point it really doesn’t say much.
RobinHanson:
This seems to me a reasonable statement of the kind of evidence that would be most relevant.
EY seems to have interpreted AlphaGo Zero as strong evidence for his view in the AI-foom debate, though Hanson disagrees.
Showing excellent narrow performance *using components that look general* is extremely suggestive [of a future system that can develop lots and lots of different “narrow” expertises, using general components].
It is only broad sets of skills that are suggestive. Being very good at specific tasks is great, but doesn’t suggest much about what it will take to be good at a wide range of tasks. [...] The components look MORE general than the specific problem on which they are applied, but the question is: HOW general overall, relative to the standard of achieving human level abilities across a wide scope of tasks.
It’s somewhat hard to hash this out as an absolute rather than conditional prediction (e.g. conditional on there being breakthroughs involving some domain-specific hacks, and major labs keeping to work on them, they will be somewhat quickly superseded by breakthroughs with general-seeming architectures).
Maybe EY would be more bullish on Starcraft without imitation learning, or AlphaFold with only 1 or 2 modules (rather than 4/5 or 8/9, depending on how you count).
If people provided this as a service, they might be risk-averse (it might make sense for people to be risk-averse with their runway), which means you’d have to pay more than their hourly rate divided by the chance of winning.
This might not be a problem, as long as the market does the cool thing markets do: allowing you to find someone with a lower opportunity cost than you for doing something.
I think the question, narrowly interpreted as “what would cause me to spend more time on the object-level answering questions on LW” doesn’t capture most of the exciting things that happen when you build an economy around something. In particular, that suddenly makes various auxiliary work valuable. Examples:
Someone spending a year living off of one’s savings, learning how to summarise comment threads, with the expectation that people will pay well for this ability in the following years
A competent literature-reviewer gathering 5 friends to teach them the skill, in order to scale their reviewing capacity to earn more prize money
A college student building up a strong forecasting track-record and then being paid enough to do forecasting for a few hours each week that they can pursue their own projects instead of having to work full-time over the summer
A college student dropping out to work full-time on answering questions on LessWrong, expecting this to provide a stable funding stream for 2+ years
A professional with a stable job and family and a hard time making changes to their life-situation, taking 2 hours/week off from work to do skilled cost-effectiveness analyses, while being fairly compensated
Some people starting a “Prize VC” or “Prize market maker”, which attempts to find potential prize winners and connect them with prizes (or vice versa), while taking a cut somehow
I have an upcoming post where I describe in more detail what I think is required to make this work.
Thanks for pointing that out, the mention of YouTube might be misleading. Overall this should be read as a first-principles argument, rather than an empirical claim about YouTube in particular.
Why are you measuring it in proportion of time-until-agent-AGI and not years? If it takes 2 years from comprehensive services to agent, and most jobs are automatable within 1.5 years, that seems a lot less striking and important than the claim pre-operationalisation.
A major problem in predicting CAIS safety is to understand the order in which various services are likely to arise, in particular whether risk-reducing services are likely to come before risk-increasing services. This seems to require a lot of work in delineating various kinds of services and how they depend on each other as well as on algorithmic advancements, conceptual insights, computing power, etc. (instead of treating them as largely interchangeable or thinking that safety-relevant services will be there when we need them). Since this analysis seems very hard to do much ahead of time, I think we’ll have to put very wide error bars on any predictions of whether CAIS would be safe or unsafe, until very late in the game.
I’m broadly sympathetic to the empirical claim that we’ll develop AI services which can replace humans at most cognitively difficult jobs significantly before we develop any single superhuman AGI (one unified system that can do nearly all cognitive tasks as well as or better than any human).
I’d be interested in operationalising this further, and hearing takes on how many years “significantly before” entails.
He also adds:
One plausible mechanism is that deep learning continues to succeed on tasks where there’s lots of training data, but doesn’t learn how to reason in general ways—e.g. it could learn from court documents how to imitate lawyers well enough to replace them in most cases, without being able to understand law in the way humans do. Self-driving cars are another pertinent example. If that pattern repeats across most human professions, we might see massive societal shifts well before AI becomes dangerous in the adversarial way that’s usually discussed in the context of AI safety.
Another data-point: I love bullet points and have been sad and confused about how little they’re used in writing generally. In fact, when reading dense text, I often invest a few initial minutes in converting it to bullet points just in order to be able to read and understand it better.
Here’s PG on a related topic, sharing some of his skepticism for when bullet points are not appropriate: http://paulgraham.com/nthings.html
One should be able to think quantitatively about that, e.g. how many questions you’d need to ask before finding out whether your extremization hurt you. I’m surprised by the suggestion that GJP didn’t do enough, unless their extremizations were frequently in the >90% range.
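For concreteness, GJP-style extremization pushes an aggregate probability away from 0.5 via the transform p^k / (p^k + (1−p)^k) for some k > 1. Here is a hypothetical sketch (toy forecasts and resolutions, not GJP data) of the kind of check one could run, comparing Brier scores with and without the transform:

```python
# Hypothetical sketch: does extremizing an aggregate forecast improve
# its Brier score? Toy data only; a real analysis would need enough
# resolved questions for the difference to be statistically meaningful.

def extremize(p, k=2.0):
    """Push a probability away from 0.5 (the GJP-style transform)."""
    return p ** k / (p ** k + (1 - p) ** k)

def brier(preds, outcomes):
    """Mean squared error of probabilistic forecasts; lower is better."""
    return sum((p - o) ** 2 for p, o in zip(preds, outcomes)) / len(preds)

preds = [0.7, 0.6, 0.8, 0.55, 0.3]   # made-up aggregate forecasts
outcomes = [1, 1, 1, 0, 0]           # made-up resolutions

raw = brier(preds, outcomes)
ext = brier([extremize(p) for p in preds], outcomes)
print(raw, ext)  # extremization helps here: the toy forecasts are underconfident
```

On this toy data extremization helps because the aggregate is underconfident; if the forecasts had instead been frequently overconfident in the >90% range, the same transform would inflate the penalty for misses, which is the regime where it would plausibly hurt.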
I did, he said a researcher mentioned it in conversation.