I have usually seen that quotation in the modified form: “only two hard things: cache invalidation, naming things, and off-by-one errors”. (It appears that this modification was introduced by someone called Leon Bambrick.)
I like the modified version because (1) it’s funny and (2) off-by-one errors are indeed a common source of trouble (though, I think, in a rather different way from cache invalidation and naming things). I do wish Karlton had said “software development” rather than “computer science”, though.
But that joke distracts from the original joke that caching is subsumed by “naming things.”
At least one of us is confused. It never occurred to me that the original comment was intended as a joke (except in so far as it’s a deliberate drastic oversimplification) and I don’t think I understand what you mean about cacheing being subsumed by naming (especially as the alleged hard problem is not cacheing but cache invalidation—which seems to me to have very little to do with naming).
I’m probably missing something here; could you explain your interpretation of the original comment a bit more? (With of course the understanding that explaining jokes tends to ruin them.)
“cache invalidation—which seems to me to have very little to do with naming”
I don’t agree with Douglas_Knight’s claim about the intent of the quote, but a cache is a kind of (application of a) key-value data structure. Keys are names. What information is in the names affects how long the cache entries remain correct and useful.
(Correct: the value is still the right answer for the key. Useful: the entry will not be unused in the future, i.e. is not garbage in the sense of garbage-collection.)
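For concreteness, here’s a minimal sketch of that view of a cache; all names are invented for illustration:

```python
# A cache as an application of a key-value map; the keys are names
# for the values they point at. (Hypothetical example.)
word_count_cache = {}

def count_words(text):
    # Stand-in for whatever expensive computation is being cached.
    return len(text.split())

def cached_word_count(doc_id, revision, text):
    key = (doc_id, revision)   # the "name" we give this piece of information
    if key not in word_count_cache:
        word_count_cache[key] = count_words(text)
    return word_count_cache[key]

# "Correct": the entry for ("report-7", 3) is still the right answer for that
# key, because the key pins down exactly which revision was counted.
# "Useful": once revision 4 exists, nobody asks for ("report-7", 3) again;
# the entry is still correct but is now garbage in the GC sense.
```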
I agree that a cache can be thought of as involving names. And, as you suggest—it’s a good point that I hadn’t considered in this context—you sometimes have some scope to choose how much information goes into the keys, and hence to make different tradeoffs between cache size, how long things are valid for, etc. Even so, it seems pretty strange to think of that as being about naming.
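To spell that tradeoff out, here is a hedged sketch (all names invented) of two keying schemes for the same lookup:

```python
prices = {"widget": 100}   # the underlying mutable data
catalog_version = 1

# Scheme A: coarse key. The cache stays small, but entries go stale when the
# underlying data changes, so something has to do explicit invalidation.
cache_a = {}

def price_a(product):
    if product not in cache_a:
        cache_a[product] = prices[product]
    return cache_a[product]

def update_price(product, new_price):
    global catalog_version
    prices[product] = new_price
    catalog_version += 1
    cache_a.pop(product, None)   # the "cache invalidation" part

# Scheme B: the catalog version is part of the name. Entries are never wrong,
# but entries for superseded versions pile up unless something evicts them.
cache_b = {}

def price_b(product):
    key = (product, catalog_version)
    if key not in cache_b:
        cache_b[key] = prices[product]
    return cache_b[key]
```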
Well, as iceman mentioned on a different subthread, a content-addressable store (key = hash of value) is fairly clearly a sort of naming scheme. But the thing about the names in a content-addressable store is that unlike meaningful names, they say nothing about why this value is worth naming; only that someone has bothered to compute it in the past. Therefore a content-addressable store either grows without bound, or has a policy for deleting entries. In that way, it is like a cache.
For example, Git (the version control system) uses a content-addressable store, and has a policy that objects are kept only if they are referenced (transitively through other objects) by the human-managed arbitrary mutable namespace of “refs” (HEAD, branches, tags, reflog).
Tahoe-LAFS, a distributed filesystem which is partially content-addressable but in any case uses high-entropy names, requires that clients periodically “renew the lease” on files they are interested in keeping, which they do by recursive traversal from whatever roots the user chooses.
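The pattern both examples share can be sketched generically; this is only the shape of the idea, not Git’s or Tahoe-LAFS’s actual implementation:

```python
import hashlib

store = {}  # hex digest -> bytes; a toy content-addressable store

def put(value: bytes) -> str:
    # The name is derived from the content; it says nothing about why the
    # value matters, only that someone has computed it.
    key = hashlib.sha256(value).hexdigest()
    store[key] = value
    return key

def collect(roots, references):
    """Keep only objects reachable from the chosen roots; delete the rest.

    `references(value)` must return the keys that `value` points at
    (a commit naming its tree, a directory naming its children, etc.).
    """
    live, stack = set(), list(roots)
    while stack:
        key = stack.pop()
        if key in live or key not in store:
            continue
        live.add(key)
        stack.extend(references(store[key]))
    for key in set(store) - live:
        del store[key]   # unreferenced, so it is treated as garbage
```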
Why do you believe that the problem of naming doesn’t fall into computer science? Because people in that field find the question too low-status to work on?
Nothing to do with status (did I actually say something that suggested a status link?), and my claim isn’t that computer science doesn’t have a problem with naming things (everything has a problem with naming things) but that when Karlton said “computer science” he probably meant “software development”.
[EDITED to remove a remark that was maybe unproductively cynical.]
The question isn’t whether computer science has a problem with naming things but whether naming information structures is a computer science problem.
It’s not a problem of algorithms but a problem of how we relate to information. Given how central names are to human reasoning and human intelligence, caring about names seems relevant to building artificial intelligence.