I recently interviewed with Epoch, and as part of a paid work trial they wanted me to write up a blog post about something interesting related to machine learning trends. This is what I came up with:
http://www.josephius.com/2022/09/05/energy-efficiency-trends-in-computation-and-long-term-implications/
I should point out that the logic of the degrowth movement follows from a relatively straightforward analysis of available resources vs. first-world consumption levels. Our world can only sustain 7 billion human beings because the vast majority of them live not at first-world levels of consumption but at third-world levels, an arrangement many would argue is unfair and an unsustainable pyramid scheme. If you work out the numbers, taking into account things like the arable land required for meat consumption, energy usage, etc., then if everyone had the quality of life of a typical American citizen, the Earth could sustain only about 1-3 billion such people. Degrowth thus follows logically if you believe that everyone around the world should eventually be able to live a comfortable, first-world life.
I’ll also point out that socialism is, like liberalism, a child of the Enlightenment and of the general belief that reason and science can be used to solve political and economic problems. Say what you will about the failed socialist experiments of the 20th century, but the idea that government should be able to engineer society to function better than the ad-hoc arrangement that is capitalism is very much an Enlightenment rationalist, materialist, and positivist position, one that can be traced to Jean-Jacques Rousseau, Charles Fourier, and other philosophes before Karl Marx came along and made it particularly popular. Marxism in particular at least claims to be “scientific socialism” and historically emphasized reason and science, to the extent that most Marxist states were officially atheist (something you might like given your concerns about religions).
In practice, many modern social policies, such as the welfare state, Medicare, and public pensions, are heavily influenced by socialist thinking and were put in place partly as a response by liberal democracies to the threat of the state socialist model during the Cold War. No country in the world runs on laissez-faire capitalism; we all utilize mixed-market economies with varying degrees of public and private ownership. The U.S. still has a substantial public sector, just as China, an ostensibly Marxist-Leninist society, has a substantial private sector (albeit with public ownership of the “commanding heights” of the economy). It seems that all societies eventually compromised in similar ways to achieve reasonably functional economies balanced against the need to avoid potential class conflict. This convergence is probably not accidental.
If you’re truly more concerned with truth-seeking than tribal affiliations, you should be aware of your own tribe, which, as far as I can tell, is western, liberal, and democratic. Even if you honestly believe in the moral truth of the western liberal democratic intellectual tradition, you should still be aware that it is, in some sense, a tribe. A very powerful one that is arguably predominant in the world right now, but a tribe nonetheless, with its inherent biases (or priors at least) and propaganda.
Just some thoughts.
I’m using the number calculated by Ray Kurzweil for his 1999 book, The Age of Spiritual Machines. To get that figure, you assume 100 billion neurons each firing every 5 ms, i.e., at 200 Hz. That is the maximum firing rate given refractory periods. In actuality, average firing rates are usually lower than that, so in all likelihood the difference isn’t actually six orders of magnitude. In particular, I should point out that the six orders of magnitude refers to the difference between this hypothetical maximum-firing brain and the most powerful supercomputer, not the most energy-efficient supercomputer.
The difference between the hypothetical maximum-firing brain and the most energy-efficient supercomputer (at 26 GigaFlops/watt) is only three orders of magnitude. For the average brain firing at the speed that you suggest, it’s probably closer to two orders of magnitude, which would mean that the average human brain is probably one order of magnitude away from the Landauer limit.
This also assumes that it’s neurons, and not synapses, that should be the relevant multiplier.
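As a rough sanity check, here is the back-of-envelope arithmetic in code. The 20 W brain power draw and the 1,000-synapses-per-neuron multiplier are my assumptions, not figures from this thread; the neurons-only and synapse-counting cases bracket the estimates discussed above, and where you land within them depends mostly on that multiplier and on how many FLOPs you charge per firing event:

```python
# Back-of-envelope version of the arithmetic above. The 20 W brain power draw
# and the 1,000-synapses-per-neuron multiplier are assumptions, not figures
# from this thread; the two cases bracket the estimates discussed here.
NEURONS = 100e9              # 10^11 neurons
MAX_RATE_HZ = 200            # one spike per 5 ms refractory period
SYNAPSES_PER_NEURON = 1_000  # assumed multiplier (see the caveat above)
BRAIN_WATTS = 20             # commonly cited brain power draw (assumption)
GREEN500_FLOPS_PER_WATT = 26e9  # most energy-efficient supercomputer

neuron_events = NEURONS * MAX_RATE_HZ                  # 2e13 events/s
synapse_events = neuron_events * SYNAPSES_PER_NEURON   # 2e16 events/s

for label, events in [("neurons only", neuron_events),
                      ("with synapses", synapse_events)]:
    per_joule = events / BRAIN_WATTS
    ratio = per_joule / GREEN500_FLOPS_PER_WATT
    print(f"{label}: {events:.1e} events/s, {per_joule:.1e} events/J, "
          f"{ratio:,.0f}x the 26 GFlops/W machine")
```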
Okay, so I contacted 80,000 Hours, as well as some EA friends, for advice. Still waiting for their replies.
I did hear from an EA who suggested that if I don’t work on it, someone else who is less EA-aligned will take the position instead, so it’s in fact slightly net positive for me to be in the industry, although I’m uncertain whether AI capability work is actually funding-constrained rather than talent-constrained.
Also, would it be possible to mitigate the net negative by deliberately avoiding capability research and instead taking an ML engineering job at a lower-tier company that is unlikely to develop AGI before others, just applying existing ML tech to practical problems?
I previously worked as a machine learning scientist but left the industry a couple of years ago to explore other career opportunities. I’m now wondering whether to switch back into the field. In particular, if I cannot find work related to AI safety, would working on something related to AI capability be a net positive or net negative impact overall?
Even further research shows that the more recent Nvidia RTX 3090 is actually slightly more efficient than the 1660 Ti: at 36 TeraFlops, 350 watts, and 2.2 kg, it works out to 0.0001 PetaFlops/watt and 0.016 PetaFlops/kg. Once again, that’s within an order of magnitude of the supercomputers.
So, I did some more research, and the general view is that GPUs are more power-efficient in terms of Flops/watt than CPUs. The most power-efficient of those right now is the Nvidia GTX 1660 Ti, which comes to 11 TeraFlops at 120 watts, or 0.000092 PetaFlops/watt, about 6x more efficient than Fugaku. It also weighs about 0.87 kg, which works out to 0.0126 PetaFlops/kg, about 7x more efficient than Fugaku. These numbers are still within an order of magnitude, and they also don’t take into account overhead like cooling, the case, and the CPU/memory needed to coordinate the GPUs in the server rack you’d presumably need.
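For reference, the arithmetic behind those figures. The GPU specs are as cited in these comments; Fugaku’s throughput and power draw are the commonly reported Top500 numbers (~442 PetaFlops at ~29.9 MW):

```python
# Reproducing the per-watt and per-kg figures quoted in these comments.
gpus = {
    # name: (PetaFlops, watts, kg)
    "GTX 1660 Ti": (0.011, 120, 0.87),
    "RTX 3090":    (0.036, 350, 2.2),
}
fugaku_pflops_per_watt = 442 / 29.9e6  # ~1.5e-5 PFlops/watt

for name, (pflops, watts, kg) in gpus.items():
    per_watt = pflops / watts
    per_kg = pflops / kg
    print(f"{name}: {per_watt:.6f} PFlops/watt "
          f"({per_watt / fugaku_pflops_per_watt:.1f}x Fugaku), "
          f"{per_kg:.4f} PFlops/kg")
```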
I used the supercomputers because the numbers were easier to get from the Top500 and Green500 lists, and I also figured their numbers include the various overhead costs of running the full system, already packaged into neat figures.
Thoughts On Computronium
Another thought is that maybe Less Wrong itself, if it were to expand in size and become large enough to roughly represent humanity, could be used as such a dataset.
So, I had a thought. The glory system idea that I posted about earlier, if it leads to a successful, vibrant democratic community forum, could actually serve as a kind of dataset for value learning. If each post has a number attached to it that indicates the aggregated approval of human beings, this can serve as a rough proxy for a kind of utility or Coherent Aggregated Volition.
Individual examples will probably be quite noisy, but averaged across a large number of posts, this could function as a real-world dataset, with the post content being the input and the post’s vote tally being the output label. You could then train a supervised classifier or regressor on it and use the result to guide a Friendly AI model, like a trained conscience.
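To make that concrete, here is a minimal sketch of such an approval model. The toy posts and scores, and the choice of tf-idf features with ridge regression, are mine purely for illustration; a serious attempt would presumably fine-tune a large language model on the real corpus of posts and vote tallies:

```python
# A minimal sketch, assuming a hypothetical toy dataset of
# (post content -> net vote tally) pairs.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

posts = [
    "Thoughtful, well-sourced argument that steelmans the other side.",
    "Low-effort insult directed at another user.",
    "Detailed walkthrough of a tricky proof, with worked examples.",
    "Spam link to an unrelated product.",
]
scores = [42, -17, 35, -30]

approval_model = make_pipeline(TfidfVectorizer(), Ridge())
approval_model.fit(posts, scores)

# The fitted model yields a rough "predicted approval" signal that a larger
# system could query as a proxy utility:
print(approval_model.predict(["A careful, constructive critique."]))
```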
This admittedly would not be provably Friendly, but as an angle of attack on the value learning problem, it is relatively straightforward to implement and probably more feasible in the short run than anything else I’ve encountered.
Darklight’s Shortform
A further thought is that those with more glory can be seen almost as elected experts; their glory is assigned to them by votes, after all. This is an important distinction from an oligarchy. I would actually be inclined to see the glory system as located on a continuum between direct democracy and representative democracy.
So, keep in mind that having the first vote be free and worth double the paid votes does tilt things more towards democracy. That being said, I am inclined to see glory as a kind of proxy for past agreement and merit, and a rough way to approximate liquid democracy, where you can either proxy your vote to others or vote yourself.
In this alternative “market of ideas”, ideas win out because people whom others trust to have good opinions are able to leverage that trust. Decisions about the merit of the given arguments are aggregated by vote. As long as the population is sufficiently diverse, this should produce an instance of the Wisdom of Crowds phenomenon.
I don’t think it’ll dissolve into a mere flag-waving contest, any more than the existing karma system on Reddit and Less Wrong already does.
Perhaps a nitpick, but having someone rob them would not be equivalent, because the cost of the action is offset by the ill-gotten gains. The proposed currency is more directly equivalent to paying someone to break into the target’s bank account and destroy their assets by a proportional amount so that no one can use them anymore.
As for the more general concerns:
Standardized laws and rules tend in practice to disproportionately benefit those with the resources to bend and manipulate those rules with lawyers. Furthermore, this proposal does not need to replace all laws; it can be utilized alongside them as a way for people to show their disapproval in a way that is more effective than verbal insult and less coercive than physical violence. I’d consider it a potential way to channel people’s anger so that they don’t decide to start a revolution against what they see as laws that benefit the rich and powerful. It is a way to distribute a little power to individuals and allow them to participate in a system that considers their input in a small but meaningful way.
With laws, the rules may be more consistent, but in practice they are also contentious, in the sense that the process of creating them is arcane and complex and the resulting punishments are often delayed for years as cases work through the legal system. Again, this makes sense when determining how the coercive power of the state should be applied, but it leaves something to be desired in terms of responsiveness to real-world concerns.
Third-party enforcement is certainly desirable. In practice, the glory system allows anyone outside the two parties to contribute, and likely the bulk of votes will come from them. As for cycles of violence, the exchange-rate mechanism means that defence is at least twice as effective as attack for the same amount of currency, which should at least mitigate the cycles, because it won’t be cost-effective to attack without significant public support. Though this is only relevant to the forum condition.
In the general condition as a currency, keep in mind that because a currency functions as a store of value, there is a substantial opportunity cost to spending it to destroy other people’s currency rather than, say, using it to accrue interest. The cycles are in a sense self-limiting, because people won’t want to spend all their money escalating a conflict that will only cause both sides to hemorrhage funds, unless someone feels so utterly wronged as to be willing to go bankrupt to bankrupt another; in which case, one should honestly be asking what kind of injustice caused this situation to come into being in the first place.
All that being said, I appreciate the critiques.
As for the problem of cheaply punishing prolific posters, I don’t know a good solution that doesn’t lead to other problems, as forcing all downvotes to cost glory makes it much harder to deal with spammers who somehow get through the application-process filter. I had considered an alternative system in which all votes cost glory, but then there’s no way to generate glory except perhaps by having admins and mods gift it, which could work, but runs counter to the direct democracy ideal I was sort of going for.
What I meant was you could farm upvotes on your posts. Sorry. I’ll edit it for clarity.
And further to clarify, you’d both be able to gift glory and also spend glory to destroy other people’s glory, at the mentioned exchange rate.
The way glory is introduced into the system is that any given post allows everyone one free vote on it that costs no glory.
So, I guess I should clarify: the idea is that you can both gift glory, which is how you gain the ability to post, and also gain or lose glory based on people’s upvotes and downvotes on your posts.
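Putting the pieces from these comments together, here is one possible reading of the mechanics in code. The specific constants (free vote worth 2, paid vote worth 1, a 2:1 cost to destroy glory) are my guesses at matching “first vote free and worth double the paid votes” and “defence is at least twice as effective as attack”; the actual design might use different numbers:

```python
# A sketch under assumed constants; not the definitive design.
class GloryLedger:
    def __init__(self):
        self.balance = {}       # user -> glory held
        self.scores = {}        # post -> net vote tally
        self.free_used = set()  # (user, post) pairs that spent their free vote

    def free_vote(self, user, post, up=True):
        # Everyone gets one free vote per post, worth double a paid vote.
        assert (user, post) not in self.free_used, "free vote already used"
        self.free_used.add((user, post))
        self.scores[post] = self.scores.get(post, 0) + (2 if up else -2)

    def paid_vote(self, user, post, up=True):
        # Additional votes cost 1 glory each and count once.
        assert self.balance.get(user, 0) >= 1, "not enough glory"
        self.balance[user] -= 1
        self.scores[post] = self.scores.get(post, 0) + (1 if up else -1)

    def gift(self, donor, recipient, amount):
        # Gifting glory is how new members gain the ability to post.
        assert self.balance.get(donor, 0) >= amount, "not enough glory"
        self.balance[donor] -= amount
        self.balance[recipient] = self.balance.get(recipient, 0) + amount

    def attack(self, attacker, target, spent):
        # Destroying glory costs twice what it destroys, so defence is
        # twice as effective as attack with the same amount of currency.
        assert self.balance.get(attacker, 0) >= spent, "not enough glory"
        self.balance[attacker] -= spent
        self.balance[target] = max(0, self.balance.get(target, 0) - spent // 2)

ledger = GloryLedger()
ledger.balance = {"alice": 10, "bob": 10}
ledger.free_vote("alice", "bob-post-1")   # costs nothing, post +2
ledger.paid_vote("alice", "bob-post-1")   # costs alice 1 glory, post +1
ledger.attack("alice", "bob", spent=4)    # alice pays 4, bob loses 2
```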
The average human lifespan is about 70 years, or approximately 2.2 billion seconds. The average human brain contains about 86 billion neurons and roughly 100 trillion synaptic connections. In comparison, something like GPT-3 has 175 billion parameters and was trained on 500 billion tokens of data. Assuming, very crudely, weight/synapse and token/second-of-experience equivalence, we can see that the human model’s ratio of parameters to data is much greater than GPT-3’s, to the point that humans have significantly more parameters than timesteps (100 trillion to 2.2 billion), while GPT-3 has significantly fewer parameters than timesteps (175 billion to 500 billion). Granted, the information gain per timestep differs between the two models, but as I said, these are crude approximations meant to convey the ballpark relative difference.
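Concretely, the implied ratios sit about five orders of magnitude apart:

```python
# The ratios implied by the figures above, under the stated crude equivalences
# (1 synaptic connection ~ 1 parameter, 1 second of experience ~ 1 token).
human_params = 100e12    # ~100 trillion synaptic connections
human_steps = 2.2e9      # ~70 years in seconds
gpt3_params = 175e9
gpt3_steps = 500e9       # training tokens

print(f"Human: {human_params / human_steps:,.0f} params per timestep")  # ~45,000
print(f"GPT-3: {gpt3_params / gpt3_steps:.2f} params per timestep")     # 0.35
```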
This basically means that humans are much more prone to overfitting the data, and in particular to memorizing individual data points; hence humans’ episodic memory of unique events. It’s not clear that GPT-3 has the capacity in terms of parameters to memorize its training data with that level of clarity, and arguably this is why such models seem less sample-efficient. A human can learn from a single example by memorizing it and retrieving it later when relevant; GPT-3 has to see it enough times in the training data for SGD to update the weights sufficiently that the general concept is embedded in the highly compressed information model.
It’s thus not certain whether existing ML models are sample-inefficient because of the algorithms being used, or because they just don’t have enough parameters yet, and increased efficiency will emerge from scaling further.