I’m interested in the economics of computing and big-picture trends in machine learning. https://www.tamaybesiroglu.com/
Tamay
Predicting GPU performance
Revisiting algorithmic progress
If the data is low-quality and easily distinguishable from human-generated text, it should be simple to train a classifier to spot LM-generated text and exclude it from the training set. If it’s not possible to distinguish, then it should be of high enough quality that including it is not a problem.
ETA: As people point out below, this comment was glib and glosses over some key details; I don’t endorse this take anymore.
Good question. Some thoughts on why we did this:
Our results suggest we won’t be caught off-guard by highly capable models that were trained for years in secret, which seems strategically relevant for those concerned with risks.
We looked at whether there was any ‘alpha’ in these results by investigating the training durations of ML training runs, and found that models are typically trained for durations that aren’t far off from what our analysis suggests might be optimal (see a snapshot of the data here).
It independently seems highly likely that large training runs would already be optimized in this dimension, which further suggests that this has little to no action-relevance for advancing the frontier
The longest training run
Trends in GPU price-performance
Announcing Epoch: A research organization investigating the road to Transformative AI
I’m not sure what you mean; I’m not looking at log-odds. Maybe the correlation is an artefact from noise being amplified in log-space (I’m not sure), but it’s not obvious to me that this isn’t the correct way to analyse the data.
Thanks! At least for Gopher, if you look at correlations between reductions in log-error (which I think the scaling-laws literature suggests is the more natural framing), you find a tighter relationship, particularly when looking at the relatively smaller models.
Thanks, though I was hoping for something like a Google Sheet containing the data.
This is super interesting. Are you able to share the underlying data?
It is, unless it’s clear that one side made a mistake in entering a lopsided bet. I guess the rule of thumb is to follow big bets (which tend to be less clearly lopsided) or bets made by two people whose judgment you trust.
Are you thinking of requiring each party to accept bets on either side?
Being forced to bet both sides could ensure honesty, assuming they haven’t found other bets on the same or highly correlated outcomes they can use for arbitrage.
Yes. Good point.
And including from other parties, or only with each other?
I was thinking that betting would be restricted to the initial two parties (i.e. A and B), but I can imagine an alternative in which it’s unrestricted.
You could imagine one party was betting at odds they consider very favourable to them, and the other party betting at odds they consider only slightly favourable, based on their respective beliefs. Then, even if they don’t change their credences, one party has more room to move their odds towards their own true credences, and so drag the average towards it, and take the intermediate payments.
Sorry, I’m confused. Isn’t the ‘problem’ that the bettor who takes relatively more favourable odds has higher expected returns a problem with betting in general?
We also propose betting using a mechanism that mitigates some of these issues:
Since we recognize that betting incentives can be weak over long time-horizons, we are also offering the option of employing Tamay’s recently described betting procedure in which we would enter a series of repeated 2-year contracts until the resolution date.
A concrete bet offer to those with short AGI timelines
Here’s a rough description of an idea for a betting procedure that enables people who disagree about long-term questions to make bets, despite not wanting to commit to waiting until the long-term questions are resolved.
Suppose person A and person B disagree about whether P, but can’t find any clear concrete disagreements related to this question that can be decided soon. Since they want to bet on things that pay out soon (for concreteness say they only want to bet on things that can pay out within 5 years), they don’t end up betting on anything.
What they can do is agree to bet on P and enter into a contract (or a good-faith agreement) that requires them, after a period of 5 years, to report their true odds about P. The contract would then enable either bettor to unilaterally back out of the bet, at which point the payouts would be distributed according to the difference between the odds they agreed to and the average of the odds that they currently report. In other words, the bettor who was closer to the consensus after 5 years is paid out in proportion to how much closer they were.
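A minimal Python sketch of the early-settlement payout, assuming odds are expressed as probabilities that P is true and that the original bet is a binary contract with a fixed notional stake, marked to the average of the two reported probabilities. The names `Bet`, `stake`, and `early_settlement` are hypothetical and not part of the procedure as described.

```python
from dataclasses import dataclass


@dataclass
class Bet:
    """A bet on proposition P, quoted as a probability that P is true.

    Hypothetical model: A is long P ("P is true"), B is short P, and
    `agreed_prob` is the price they agreed on when entering the bet.
    """
    agreed_prob: float
    stake: float


def early_settlement(bet: Bet, report_a: float, report_b: float) -> tuple[float, float]:
    """Settle at the checkpoint instead of waiting for P to resolve.

    The contract is marked to the average of the two reported
    probabilities: A receives stake * (consensus - agreed_prob), and
    B receives the negative of that, since the bet is zero-sum.
    """
    for p in (bet.agreed_prob, report_a, report_b):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    consensus = (report_a + report_b) / 2
    payoff_a = bet.stake * (consensus - bet.agreed_prob)
    return payoff_a, -payoff_a


# Agreed at 0.3 on a stake of 1000; after 5 years A reports 0.5 and B reports 0.3.
payoff_a, payoff_b = early_settlement(Bet(agreed_prob=0.3, stake=1000.0), 0.5, 0.3)
```

Here the consensus is 0.4, so A collects roughly 100 from B. If the agreed probability is the midpoint of the two bettors’ initial credences and the consensus lies between those credences, the winner’s initial credence was closer to the consensus by exactly 2 × |consensus − agreed_prob|, so this payoff is proportional to how much closer they were.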
To ensure that bettors approximately truthfully report their odds about P after the horizon of 5 years, the contract requires A and B to report their odds to a trusted intermediary (who announces these odds simultaneously), and requires either party to accept any follow-up bets at (some function of) these reported credences.
Bettors might agree ahead of time on the range of acceptable follow-up bet sizes, though, importantly, follow-up bets need to be expected to be relatively large (say, a non-trivial fraction of the existing bets) to ensure that bettors have an incentive to report something close to their true beliefs.
Follow-up bets could be revisited in the same way after another 5 years, and this would continue until P resolves, or until the bettors settle. However, because bettors are required to take follow-up bets, they also have an incentive to develop accurate beliefs about P, so we might expect disagreements to usually be resolved before P resolves. They furthermore have an incentive to arrive at a consensus if they want to avoid making follow-up bets.
On this mechanism, bettors know that they can expect to fairly resolve their bets on a short horizon, as each will have an incentive to end the bet according to their consensus view of who was closer to the truth. Hence, bettors would be keen to bet with each other about P if they think that they’re directionally right, even when they don’t want to wait until P is completely decided.
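A minimal sketch of why forced follow-up bets push reports toward true beliefs. It assumes the simplest choice of “function of the reported credences”: the reporter must take either side of a binary contract on P priced at exactly their report, and the counterparty picks the side that is worse for the reporter. `worst_case_ev` and its parameters are hypothetical names.

```python
def worst_case_ev(report: float, true_belief: float, follow_up_stake: float) -> float:
    """Reporter's expected value (by their own beliefs) from a forced follow-up bet.

    Selling a contract that pays 1 if P, at price `report`, has expected
    value (report - true_belief) per unit; buying it has
    (true_belief - report). If the counterparty picks the side that is
    worse for the reporter, the result is -|report - true_belief| per unit,
    scaled by the follow-up stake.
    """
    sell_ev = report - true_belief
    buy_ev = true_belief - report
    return follow_up_stake * min(sell_ev, buy_ev)


# Scan candidate reports for someone whose true credence is 0.35.
grid = [i / 100 for i in range(101)]
best_report = max(grid, key=lambda r: worst_case_ev(r, 0.35, 500.0))
```

Under these assumptions the best report is the true credence (0.35), where the worst-case expected value is zero; any misreport loses in proportion to both the misreport and the follow-up stake, which is why small follow-up bets would give only a weak incentive to report honestly.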
Tamay’s Shortform
Thanks!
Could you make another graph like Fig 4 but showing projected cost, using Moore’s law to estimate cost? The cost is going to be a lot, right?
Good idea. I might do this when I get the time—will let you know!
There is an insightful literature that documents and tries to explain why large incumbent tech firms fail to invest appropriately in disruptive technologies, even when they played an important role in their invention. I speculate that this sheds some light on why we see new firms such as OpenAI, rather than incumbents such as Google and Meta, leading the deployment of recent innovations in AI, notably LLMs.
Disruptive technologies, which initially fail to satisfy existing demands but later surpass the dominant technology, are often underinvested in by incumbents, even when these incumbents played a major role in their invention. Henderson and Clark (1990) discuss examples of this phenomenon, such as Xerox’s failure to exploit its technology and transition from larger to smaller copiers:
and RCA’s failure to embrace the small transistorized radio during the 1950s:
A few explanations of this “Innovator’s curse” are given in the literature:
- Christensen (1997) suggests this is due to, among other things:
  - Incumbents focus on innovations that address existing customer needs rather than serving small markets. Customer bases usually ask for incremental improvements rather than radical innovations.
  - Disruptive products are simpler and cheaper; they generally promise lower margins, not greater profits.
  - Incumbents’ most important customers usually don’t want radically new technologies, as they can’t immediately use them.
- Reinganum (1983) shows that under conditions of uncertainty, incumbent monopolists will rationally invest less in innovation than entrants will, for fear of cannibalizing the stream of rents from their existing products.
- Leonard-Barton (1992) suggests that the same competencies that have driven incumbents’ commercial success may produce ‘competency traps’ (ingrained habits, procedures, equipment, or expertise that make change difficult); see also Henderson (2006).
- Henderson (1993) highlights that entrants have greater strategic incentives to invest in radical innovation, while incumbents fall prey to inertia and complacency.
After skimming a few papers on this, I’m inclined to draw an analogy here for AI: Google produced the Transformer, and labs at Google, Meta, and Microsoft have long been key players in AI research, and yet the creation of explicitly disruptive LLM products that aim to do much more than existing technologies has been led mostly by relative newcomers (such as OpenAI, Anthropic, and Cohere for LLMs, and StabilityAI for generative image models).
The same literature also suggests ways to avoid the “innovator’s curse”, such as establishing independent sub-organizations focused on disruptive innovations (see Christensen, 1997 and Christensen, 2003), which is clearly what companies like Google have done, as their AI labs have a large degree of independence. And yet this does not seem to have been sufficient to establish these firms’ dominance at the frontier of LLMs and similar technologies.