To speak to the second of naming things, I’m a big fan of content addressable everything. Addressing all content by hash_function() has major advantages. This may require another naming layer to give human recognizable names to hashes, but I think this still goes a long way towards making things better.
You might find Joe Armstrong’s The Mess We’re In interesting. It provides some simple strawman algorithms for deduplication, though they probably aren’t sophisticated enough to run in practice.
(My roommate walked in while I was watching that lecture with headphones on, and just saw the final conclusion slide:
We’ve made a mess
We need to reverse entropy
Quantum mechanics sets limits to the ultimate speed of computation
We need Math
Abolish names and places
Build the condenser
Make low-power computers—no net environmental damage
And just did that smile and nod thing. The above makes it sound like Armstrong is a crank, but it all makes sense in context, and I’ve deliberately copied just this last slide without any other context to try to get you to watch it. If you like theoretical computer science, I highly recommend watching the lecture.)
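For anyone who wants to see the basic idea in code, here is a minimal sketch of a content-addressed store where deduplication falls out for free. This is not Armstrong's algorithm from the talk; the `ContentStore` class and its method names are made up for illustration, and SHA-256 is just an assumed stand-in for hash_function().

```python
import hashlib


class ContentStore:
    """Hypothetical in-memory store that addresses blobs by their SHA-256 digest."""

    def __init__(self):
        self._blobs = {}  # hex digest -> bytes

    def put(self, data: bytes) -> str:
        # The address is derived from the content itself, so identical
        # content always maps to the same key and is stored only once.
        key = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]

    def __len__(self) -> int:
        return len(self._blobs)


store = ContentStore()
first = store.put(b"hello")
second = store.put(b"hello")   # same bytes, same address
other = store.put(b"world")
```

Storing the same bytes twice returns the same address and takes up one slot, which is about the simplest form of deduplication there is. It only catches exact duplicates; near-duplicates need something cleverer.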
To speak to the second of naming things, I’m a big fan of content addressable everything. Addressing all content by hash_function() has major advantages. This may require another naming layer to give human recognizable names to hashes, but I think this still goes a long way towards making things better.
It also requires (different) attention to versioning. That is, if you have arbitrary names, you can change the referent of the name to a new version, but you can’t do that with a hash. You can’t use just-a-hash in any case where you might want to upgrade/substitute the part but not the whole.
Conversely, er, contrapositively, if you need referents to not change ever, hashes are great.
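A rough sketch of how the two layers might fit together, assuming SHA-256 as the hash and plain dicts for storage (the `publish` helper and the "libfoo" name are hypothetical): the name is a mutable pointer that can be moved to a new version, while a pinned hash keeps referring to exactly the same bytes.

```python
import hashlib

blobs = {}  # immutable layer: hex digest -> bytes
names = {}  # mutable layer: human-readable name -> hex digest


def publish(name: str, data: bytes) -> str:
    """Store data under its hash and point the name at it (illustrative helper)."""
    digest = hashlib.sha256(data).hexdigest()
    blobs[digest] = data
    names[name] = digest  # re-pointing the name is how an "upgrade" happens
    return digest


v1 = publish("libfoo", b"version 1")
pinned = v1                          # someone depends on this exact content
v2 = publish("libfoo", b"version 2")

latest = blobs[names["libfoo"]]      # follows the name to the new version
frozen = blobs[pinned]               # the hash still means the old bytes
```

Anything that referenced "libfoo" by name picks up the new version, and anything that referenced the hash does not, which is the tradeoff described above.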