Engineer at CoinList.co. Donor to LW 2.0.
Nitpick: wouldn’t this graph be much more natural with the x and y axes reversed? I’d want to input the reduction in log-error over a cheaper compute regime to predict the reduction in log-error over a more expensive one.
Ah, thanks for the clarification!
A fully general tech company is a technology company with the ability to become a world-leader in essentially any industry sector...
Notice here that I’m focusing on a company’s ability to do anything another company can do
To clarify, is this meant to refer to a fixed definition of sectors and what other companies can do as they existed prior to the TCS?
Or is it meant to include FGTCs being able to copy the output of other FGTCs?
I’d assume you mean something like the former, but I think it’s worth being explicit about the fact that what sectors exist and what other companies can do will be moving targets.
The main theory we’ll end up at, based on the accounting data, is that college costs are driven mainly by a large increase in diversity of courses available, which results in much lower student/faculty ratios, and correspondingly higher costs per student.
The “driven by” wording in the above suggests cause. It makes it sound to me like the increase in course diversity (and decrease in student-faculty ratios) comes first, and the increased cost is the result.
Is that what you meant?
If so, I think that case has not been demonstrated in the post. I’m with Eliezer that the extra money has to be spent somewhere. And while it’s interesting to learn that it’s spent on faculty and support staff (rather than e.g. research, or w/e), it’s not at all clear to me that it tells us much about why the prices have gone up.
[TFP] isn’t measured directly, but calculated as a residual by taking capital and labor increases out of GDP growth.
What does it mean to take out capital increases?
I assume taking out labor increases just means adjusting for a growing population. But isn’t the reason that we get economic growth per capita at all because we build new stuff (including intangible stuff like processes and inventions) that enables us to build more stuff more efficiently? And can’t all that new stuff be thought of as capital?
Or is what’s considered “capital” only a subset of that new stuff that fits into particular categories — maybe tangible things like factories and intangible things only when someone puts an explicit price on them and they show up on a firm’s balance sheet?
If that’s the case, it seems like TFP is kind of a God-of-the-gaps quantity that is mostly a consequence of what’s categorized as capital or not. And capital + TFP is the more “real” and natural quantity.
(But that might be totally wrong, because I don’t know what I’m talking about.)
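For reference, here is the textbook growth-accounting version of "taking capital and labor out". It assumes a Cobb-Douglas production function with capital share α, which is my assumption and not something the quoted post states:

```latex
Y = A\,K^{\alpha}L^{1-\alpha}
\quad\Longrightarrow\quad
\frac{\dot A}{A} \;=\; \frac{\dot Y}{Y} \;-\; \alpha\,\frac{\dot K}{K} \;-\; (1-\alpha)\,\frac{\dot L}{L}
```

Here Y is output, K is the measured capital stock, L is labor input, and A is TFP. Under this setup, any source of growth that isn't counted in K or L ends up in A by construction, which is why where the line around "capital" gets drawn matters.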
Zvi would tell you that yes it has: How I Lost 100 Pounds Using TDT.
Similarly: If we say two people are the same at the atomic (or whatever) level, we can no longer speak about a notion of “choice” at all. To talk about choice is to mix up levels of abstraction.
This doesn’t make any sense to me. People are made of atoms. People make choices. Nothing is inconsistent about that. If two people were atomically identical, they’d make the same choices. But that wouldn’t change anything about how the choice was happening. Right?
Suppose we made an atom-by-atom copy of you, as in the post. Does the existence of this copy mean that you stop choosing your own decisions?
Have I just misunderstood what you’re saying?
Gotcha, in that case:
Pick the paratrooper nearest to the centroid of all paratroopers and go directly towards their location in a straight line.
As you move, the location of the centroid will change, but which paratrooper is nearest won’t change. This is because your contribution to the average distance to that paratrooper will decrease by at least as much as your contribution to the average distance to any other paratrooper (because you’re moving in a straight line towards them).
(For n = 2, you have to pick one of you to move and one to stay fixed ahead of time. For n ≥ 3, except in measure-zero cases, there will be exactly one paratrooper who is nearest to the centroid, with no ties.)
EDIT: Actually I think there’s a flaw in this, but I don’t see which part is wrong. The reason I think there’s a flaw is that 1) I think the centroid moves away from you as you move towards it, but 2) it seems like my argument about the delta in your contribution to the average distance to a point applies to the location of the centroid itself, in which case the location of the centroid shouldn’t move as you move towards it. So there’s a contradiction...
EDIT2: Oh, I see the flaw. The centroid moves away from you in your direction of travel. The magnitude of the decrease in average distance to the centroid is maximized along the line connecting you and the centroid, but it’s negative for points in between you two. So if you move towards a point, that point’s distance from the center of mass is actually increasing the most, not the least.
EDIT3: Wait, actually, I think we can rescue this. What if we ignore the centroid altogether, and just choose the paratrooper that has the lowest average distance to each of the other paratroopers? Then I think maybe my original line of reasoning works?
EDIT4: I figured out what I was missing here — when you move towards a point, it’s true that the average distance of that point to all other points is shrinking at least as much as for any other point, as I was saying above, except for yourself. You might be moving closer to multiple points, meaning that you might become the point with the least average distance to other points!
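A minimal sketch of the two selection rules discussed above, to make them concrete. It assumes paratroopers are points in a 2D plane with Euclidean distance; the function names and the sample coordinates are made up for illustration. It only computes who gets picked at a single instant and does not address the problem from EDIT4, where the pick can change as people move.

```python
import math

def centroid(points):
    """Mean position of all paratroopers."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def nearest_to_centroid(points):
    """Index of the paratrooper closest to the centroid (original rule)."""
    c = centroid(points)
    return min(range(len(points)), key=lambda i: math.dist(points[i], c))

def lowest_mean_distance(points):
    """Index of the paratrooper with the lowest average distance
    to every other paratrooper (the rule proposed in EDIT3)."""
    def mean_dist(i):
        others = [q for j, q in enumerate(points) if j != i]
        return sum(math.dist(points[i], q) for q in others) / len(others)
    return min(range(len(points)), key=mean_dist)

# Hypothetical landing positions, chosen only for illustration.
troopers = [(0.0, 0.0), (10.0, 0.0), (4.0, 1.0), (5.0, 8.0)]
centroid_pick = nearest_to_centroid(troopers)
mean_pick = lowest_mean_distance(troopers)
```

On this particular layout both rules pick the same paratrooper, but they are different criteria and need not agree in general.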
Does the radar tell the paratroopers which other paratrooper is where, or just that there is some unknown paratrooper at each spot? In other words, are the paratroopers labeled on the radar?
(I also maybe should mention that I don’t mind this post.)
Do you mean Zack’s post or Eliezer’s post?
But if one potential answer to the alignment problem lies in the way our brains work, maybe we should try to understand that better, instead of (or in addition to) letting a machine figure it out for us through some kind of “value learning”.
Ah, I see. You might be interested in this sequence then!
I think we (mostly) all agree that we want to somehow encode human values into AGIs. That’s not a new idea. The devil is in the details.
Hmm, in that case, would “all the problems” be better than either “hardest problem” or “gold”?
Isn’t it just a lot more interesting to bet on gold than on hardest problem?
The former seems like it averages out over lots of random bits of potential bad luck. And if the bot can do the hardest problem, wouldn’t it likely be able to do the other problems as well?
fearing death, and loving life. [...] The latter was particularly dominant when I was in primary school, when a part of me emerged that was very afraid of death
Should that have read, “The former …”?
Now, when a QNI comes along, it doesn’t necessarily look like a discontinuity, because there might be a lot of work to bridge the distance between idea and implementation. And, this work involves a lot of small details. Because of this, the first version is probably often only a slight improvement on SOTA. So, I’m guessing that QNIs produce something more like a discontinuity in the derivative than a discontinuity in the SOTA itself.
Don’t have a great source for this at hand, but my impression is that seemingly-QNIs surprisingly often just power existing exponential trends, meaning no change in derivative (on a log graph).
(A random comment in support of this — I remember chip design expert Jim Keller saying on Lex Fridman’s podcast that Moore’s Law is just a bunch of separate s-curves, as they have to come up with new ideas to work through challenges to shrinking transistors, and the new techniques work for a range of scales and then have to be replaced with new new ideas.)
Not sure if this question is easily settled, but it might be a crux for various views — how often do QNIs actually change the slope of the curve?
The problem with this model is, its predictions depend a lot on how you draw the boundary around “field”. Take Yudkowsky’s example of startups. How do we explain small startups succeed where large companies failed?
I don’t quite see how this is a problem for the model. The narrower you draw the boundary, the more jumpy progress will be, right?
Successful startups are big relative to individuals, but not that big relative to the world as a whole. If we’re talking about a project / technology / company that can rival the rest of the world in its output, then the relevant scale is trillions of dollars (prob deca-trillions), not billions.
And while the most fantastically successful startups can become billion dollar companies within a few years, nobody has yet made it to a trillion in less than a decade.
EDIT: To clarify, not trying to say that something couldn’t grow faster than any previous startup. There could certainly be a ‘kink’ in the rate of progress, like you describe. I just want to emphasize that:
startups are not that jumpy, on the world scale
the actual scale of the world matters
A simple model for the discontinuousness of a field might have two parameters — one for the intrinsic lumpiness of available discoveries, and one for total effort going into discovery. And,
all else equal, more people means smoother progress — if we lived in a trillion person world, AI progress would be more continuous
it’s an open empirical question whether the actual values for these parameters will result in smooth or jumpy takeoff:
even if investment in AI is in the deca-trillions and a meaningful fraction of all world output, it could still be that the actual territory of available discoveries is so lumpy that progress is discontinuous
but, remember that reality has a surprising amount of detail, which I think tends to push things in a smoother direction — it means there are more fiddly details to work through, even when you have a unique insight or technological advantage
or, in other words, even if you have a random draw from a distribution that ends up being an outlier, actual progress in the real world will be the result of many different draws, which will tend to push things more toward the regime of normals
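A toy simulation of the "many draws" point, under assumptions that are mine and not part of the argument above: each discovery's size is an independent draw from a lognormal distribution (a stand-in for "lumpy"), and total progress is just the sum of the draws. It measures how much of the total comes from the single largest draw.

```python
import random

def largest_share(n_draws, rng):
    """Fraction of total progress contributed by the single biggest draw."""
    draws = [rng.lognormvariate(0.0, 1.0) for _ in range(n_draws)]
    return max(draws) / sum(draws)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Few draws: one outlier can dominate the total.
few = largest_share(10, rng)

# Many draws: the biggest single draw is a small slice of the total.
many = largest_share(10_000, rng)

print(f"largest share with 10 draws:     {few:.3f}")
print(f"largest share with 10,000 draws: {many:.3f}")
```

This only illustrates the smoothing effect for a distribution with finite variance. If the real distribution of discoveries is heavy-tailed enough, a single draw can still dominate no matter how many there are, which is the open empirical question.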
Looks like it was fixed. (Maybe the mods did it?)
FYI the title and one of the references in the post say “GTP” instead of “GPT”.
Question about terminology: would it be fair to replace “embedding” with “concept”? (Something feels weird to me about how the word “embedding” is used here. It seems like it’s referring to a very basic general idea, something like “the building blocks of thought, which would include named concepts that we have words for, but also mental images, smells, and felt senses”. But the particular word “embedding” seems like it’s emphasizing the geometric idea of representing something from a high dimensional space in a lower dimensional space, and that seems like it’s not the most relevant part. (Yes, we can think of all concepts geometrically, as clusters in thingspace, but depending on context, it might be more natural to emphasize that idea of a location in a space, or to think of them as just concepts.))