An eccentric dreamer in search of truth and happiness for all. Formerly posted on Felicifia back in the day under the same name. Been a member of Less Wrong and involved in Effective Altruism since roughly 2013.
Darklight
Update: I made an interactive webpage where you can run the simulation and experiment with a different payoff matrix and changes to various other parameters.
So, I adjusted the aggressor system to work like alliances or defensive pacts instead of a universal memory tag. Basically, players now become allies when they both cooperate and aren’t already enemies, and a player who defects first against a non-enemy becomes an enemy of the victim and of all the victim’s allies. This doesn’t change the result much. The alliance of nice strategies still wins the vast majority of the time.
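To make the alliance mechanic concrete, here’s a minimal Python sketch of the bookkeeping described above (illustrative only; the class and attribute names are my own, and the actual simulation code may differ):

```python
class Player:
    def __init__(self, name):
        self.name = name
        self.allies = set()
        self.enemies = set()

def update_relations(a, b, a_move, b_move):
    """Update alliances and enmities after one round between players a and b."""
    # Mutual cooperation between non-enemies forms (or maintains) an alliance.
    if a_move == "C" and b_move == "C" and a not in b.enemies and b not in a.enemies:
        a.allies.add(b)
        b.allies.add(a)
    # A first defection against a non-enemy makes the defector an enemy of the
    # victim and of all the victim's allies (the defensive-pact rule).
    for attacker, victim, move in ((a, b, a_move), (b, a, b_move)):
        if move == "D" and attacker not in victim.enemies:
            for member in victim.allies | {victim}:
                member.enemies.add(attacker)
                member.allies.discard(attacker)
```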
I also tried out false flag scenarios where, 50% of the time, the victim of a first defection against a non-enemy is mistaken for the attacker. This has a small effect. There is a slight increase in the probability of an Opportunist strategy winning, but most of the time the alliance of nice strategies still wins, albeit with slightly fewer survivors on average.
My guess for why this happens is that nasty strategies rarely stay in alliances very long because they usually attack a fellow member at some point, and after enough rounds one of their false flag attempts will inevitably fail, at which point they get kicked from the alliance and retaliated against.
The real-world implication remains that your best bet for surviving in the long run, as a person or a civilization, appears to be playing a nice strategy, because if you play a nasty strategy, you are much less likely to survive.
In the limit, if the nasty strategies win, there will only be one survivor, dog-eat-dog Highlander style, and your odds of being that winner are 1/N, where N is the number of players. On the other hand, if you play a nice strategy, you increase the strength of the nice alliance, and when the nice alliance wins, as it usually does, you’re much more likely to be among the survivors who flourished together.
My simulation currently has 150 players by default, 60 of which are nice. On average about 15 of these survive to round 200, which is a 25% survival rate. This seems bad, but the survival rate of nasty strategies is less than 1%. If I switch the model to 50 Avengers and 50 Opportunists, on average about 25 Avengers survive versus zero Opportunists, a 50% survival rate for the Avengers.
Thus, increasing the proportion of starting nice players increases the odds of nice players surviving, so there is an incentive to play nice.
Admittedly this is a fairly simple setup without things like uncertainty and mistakes, so yes, it may not really apply to the real world. I just find it interesting that it implies that strong coordinated retribution can, at least in this toy setup, be useful for shaping the environment into one where cooperation thrives, even after accounting for power differentials and the ability to kill opponents outright, which otherwise change the game enough that straight Tit-For-Tat doesn’t automatically dominate.
It’s possible there are some situations where this resembles the real world. If you ignore mere accusations and focus only on clear-cut cases where you know the aggression has occurred, such as with countries and wars, it looks a lot like how alliances form and how retaliation occurs when any member of the alliance is attacked.
I personally also see it as relevant for something like hypothetical powerful alien AGIs that can see everything that happens from space, and so there could be some kind of advanced game theoretic coordination at a distance with this. Though that admittedly is highly speculative.
It would be nice, though, if there were a reason to be cooperative even toward weaker entities, as that would imply that an AGI could have game-theoretic reasons not to destroy us.
Okay, so I decided to do an experiment in Python code where I modify the Iterated Prisoner’s Dilemma to include Death, Asymmetric Power, and Aggressor Reputation, and run simulations to test how different strategies do. Basically, each player can now die if their points fall to zero or below, and the payoff matrix uses their points as a variable, so that power differences affect the outcomes. Also, if a player defects first in any round of any match against a non-aggressor, they get the aggressor label, which matters for some strategies that target aggressors.
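For concreteness, here’s a minimal sketch of how those mechanics could be wired together (my own reconstruction for illustration; the payoff scaling in the actual code may differ):

```python
from dataclasses import dataclass

@dataclass
class Player:
    points: float = 10.0
    is_aggressor: bool = False
    alive: bool = True

def play_round(p1, p2, move1, move2):
    # Asymmetric power: a defection transfers points proportional to the
    # defector's current strength, so stronger players hit harder.
    if move1 == "C" and move2 == "C":
        p1.points += 3
        p2.points += 3
    else:
        if move1 == "D":
            damage = 0.1 * p1.points
            p1.points += damage
            p2.points -= damage
        if move2 == "D":
            damage = 0.1 * p2.points
            p2.points += damage
            p1.points -= damage
    # Aggressor reputation: defecting first against a non-aggressor earns the
    # label (simplified here to "defected while the other player cooperated").
    if move1 == "D" and move2 == "C" and not p2.is_aggressor:
        p1.is_aggressor = True
    if move2 == "D" and move1 == "C" and not p1.is_aggressor:
        p2.is_aggressor = True
    # Death: a player whose points fall to zero or below is out of the game.
    for p in (p1, p2):
        if p.points <= 0:
            p.alive = False
```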
Long story short, there’s a particular strategy I call Avenger, which is Grim Trigger but also retaliates against aggressors (even if the aggression was against a different player), that ensures the cooperative strategies (ones that never defect first against a non-aggressor) win if the game goes enough rounds. Without Avenger though, there’s a chance that a single Opportunist strategy player wins instead. Opportunist will Defect when stronger and play Tit-For-Tat otherwise.
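Roughly, the two strategies look like this (hedged sketches based on the descriptions above; the exact logic in my code may differ slightly). Here history is assumed to be a list of (my_move, their_move) pairs from the current match:

```python
def avenger_move(me, opponent, history):
    """Grim Trigger, plus permanent retaliation against any labeled aggressor."""
    if opponent.is_aggressor:
        return "D"                 # punish aggressors on sight, forever
    if any(their_move == "D" for _, their_move in history):
        return "D"                 # classic Grim Trigger: never forgive a defection
    return "C"

def opportunist_move(me, opponent, history):
    """Defect when stronger, otherwise play Tit-For-Tat."""
    if me.points > opponent.points:
        return "D"                 # exploit weaker opponents
    if not history:
        return "C"                 # Tit-For-Tat opens with cooperation
    return history[-1][1]          # otherwise mirror the opponent's last move
```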
I feel like this has interesting real world implications. Interestingly, Enforcer, which is Tit-For-Tat but also opens with Defect against aggressors, is not enough to ensure the cooperative strategies always win; for some reason you need Avenger in the mix.
Edit: In case anyone wants the code, it’s here.
I was recently trying to figure out a way to calculate my P(Doom) using math. I initially tried a back-of-the-envelope calculation: making a list of For and Against arguments and dividing the number of For arguments by the total number of arguments. This led to a P(Doom) of 55%, which later got revised to 40% when I added more Against arguments. I also looked into using Bayes’ Theorem and actual probability calculations, but determining P(E | H) and P(E) to input into P(H | E) = P(E | H) * P(H) / P(E) is surprisingly hard and confusing.
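For what it’s worth, the two approaches look something like this (the counts and probabilities below are placeholders, not my actual numbers):

```python
# Back-of-the-envelope: P(Doom) as the fraction of For arguments.
n_for, n_against = 11, 9                  # placeholder counts
p_doom = n_for / (n_for + n_against)      # 0.55 with these counts

# Bayesian version: P(H | E) = P(E | H) * P(H) / P(E).
# The hard part is putting defensible numbers on P(E | H) and P(E).
p_h = 0.5           # prior on doom (placeholder)
p_e_given_h = 0.7   # probability of the observed evidence if doom is coming
p_e = 0.6           # overall probability of the evidence
p_h_given_e = p_e_given_h * p_h / p_e     # ~0.58 with these placeholders
```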
Why There Is Hope For An Alignment Solution
Minor point, but the apology needs to sound sincere and credible, usually by being specific about the mistakes, concise, and to the point, and not like, say, Bostrom’s defensive apology about the racist email a while back. Otherwise you can instead signal that you are trying to invoke the social API call in a disingenuous way, which can clearly backfire.
Things like “sorry you feel offended” also tend to sound like you’re not actually remorseful for your actions and are just trying to elicit the benefits of an apology. None of the apologies you described sound anything like that, but it’s a common failure state among the less emotionally mature and the sycophantic.
I have some ideas and drafts for posts that I’ve been sitting on because I feel somewhat intimidated by the level of intellectual rigor I would need to put into the final drafts to ensure I’m not downvoted into oblivion (something a younger me experienced in the early days of Less Wrong).
Should I try to overcome this fear, or is it justified?
For instance, I have a draft of a response to Eliezer’s List of Lethalities post that I’ve been sitting on since 2022/04/11 because I doubted it would be well received, given that it tries to be hopeful and, as a former machine learning scientist, I try to challenge a lot of LW orthodoxy about AGI in it. I have tremendous respect for Eliezer though, so I’m also uncertain whether my ideas and arguments aren’t just harebrained foolishness that will be shot down rapidly once exposed to the real world and the incisive criticism of Less Wrongers.
The posts here are also now of such high quality that I feel the bar is too high for me to meet with my writing, which tends to be more “interesting train-of-thought in unformatted paragraphs” than the “point-by-point articulate with section titles and footnotes” style that people tend to employ.
Anyone have any thoughts?
I would be exceedingly cautious about this line of reasoning. Hypomania tends not to be sustainable, with a tendency to either spiral into a full blown manic episode, or to exhaust itself and lead to an eventual depressive episode. This seems to have something to do with the characteristics of the thoughts/feelings/beliefs that develop while hypomanic, the cognitive dynamics if you will. You’ll tend to become increasingly overconfident and positive, to the point that you either start to lose contact with reality by ignoring evidence contrary to what you think is happening (because you feel like everything is awesome, so it must be), or reality hits you hard when the good things you expect to happen don’t, and you update accordingly (often overcompensating in the process).
In that sense, it’s very hard to stay “just” hypomanic. And honestly, to my knowledge, most psychiatrists are more worried about potential manic episodes than anything else in bipolar disorder, and will put you on enough antipsychotics to make you a depressed zombie to prevent them, because generally speaking the full on psychosis level manic episodes are just more dangerous for everyone involved.
Ideally, I think your mood should fit your circumstances. Hypomania often shows up as inappropriately high positive mood even in situations where it makes little sense to be so euphoric, and that should be a clear indicator of why it can be problematic.
It can be tempting to want to stay in some kind of controlled hypomania, but in reality, this isn’t something that to my knowledge is doable with our current science and technology, at least for people with actual bipolar disorder. It’s arguable that for individuals with normally stable mood, putting them on stimulants could have a similar effect as making them a bit hypomanic (not very confident about this though). Giving people with bipolar disorder stimulants that they don’t otherwise need on the other hand is a great way to straight up induce mania, so I definitely wouldn’t recommend that.
I still remember when I was a masters student presenting a paper at the Canadian Conference on AI 2014 in Montreal. Bengio was also at the conference presenting a tutorial, and during the Q&A afterwards, I asked him a question about AI existential risk. I think I worded it back then as being concerned about the possibility of Unfriendly AI or a dangerous optimization algorithm or something like that, as it was after I’d read the Sequences but before “existential risk” was popularized as a term. Anyway, he responded by asking jokingly if I was a journalist, and then I vaguely recall him giving a hedged answer about how current AI was still very far away from those kinds of concerns.
It’s good to see he’s taking these concerns a lot more seriously these days. Between him and Hinton, we have about half of the Godfathers of AI (missing LeCun and Schmidhuber if you count him as one of them) showing seriousness about the issue. With any luck, they’ll push at least some of their networks of top ML researchers into AI safety, or at the very least make AI safety more esteemed among the ML research community than before.
The average human lifespan is about 70 years, or approximately 2.2 billion seconds. The average human brain contains about 86 billion neurons, or roughly 100 trillion synaptic connections. In comparison, something like GPT-3 has 175 billion parameters and 500 billion tokens of data. Assuming, very crudely, weight/synapse and token/second-of-experience equivalence, the human model’s ratio of parameters to data is much greater than GPT-3’s, to the point that humans have significantly more parameters than timesteps (100 trillion to 2.2 billion), while GPT-3 has significantly fewer parameters than timesteps (175 billion to 500 billion). Granted, the information gain per timestep is different for the two models, but as I said, these are crude approximations meant to convey the ballpark relative difference.
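The ratios being compared, using the figures above:

```python
human_params = 100e12       # ~100 trillion synapses
human_timesteps = 2.2e9     # ~70 years in seconds
gpt3_params = 175e9         # parameters
gpt3_timesteps = 500e9      # training tokens

print(human_params / human_timesteps)   # ~45,000 parameters per timestep
print(gpt3_params / gpt3_timesteps)     # ~0.35 parameters per token
```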
This means basically that humans are much more prone to overfitting the data, and in particular, memorizing individual data points. Hence why humans experience episodic memory of unique events. It’s not clear that GPT-3 has the capacity in terms of parameters to memorize its training data with that level of clarity, and arguably this is why such models seem less sample efficient. A human can learn from a single example by memorizing it and retrieving it later when relevant. GPT-3 has to see it enough times in the training data for SGD to update the weights sufficiently that the general concept is embedded in the highly compressed information model.
It’s thus not certain whether existing ML models are sample inefficient because of the algorithms being used, or because they just don’t have enough parameters yet and increased efficiency will emerge from scaling further.
I recently interviewed with Epoch, and as part of a paid work trial they wanted me to write up a blog post about something interesting related to machine learning trends. This is what I came up with:
http://www.josephius.com/2022/09/05/energy-efficiency-trends-in-computation-and-long-term-implications/
I should point out that the logic of the degrowth movement follows from a relatively straightforward analysis of available resources vs. first world consumption levels. Our world can only sustain 7 billion human beings because the vast majority of them live not at first world levels of consumption, but third world levels, which many would argue is unfair and an unsustainable pyramid scheme. If you work out the numbers, assuming everyone had the quality of life of a typical American citizen, and taking into account things like meat consumption relative to arable land, energy usage, etc., the Earth would be able to sustain only about 1-3 billion such people. Degrowth thus follows logically if you believe that all the people around the world should eventually be able to live comfortable, first world lives.
I’ll also point out that socialism is, like liberalism, a child of the Enlightenment and general beliefs that reason and science could be used to solve political and economic problems. Say what you will about the failed socialist experiments of the 20th century, but the idea that government should be able to engineer society to function better than the ad-hoc arrangement that is capitalism, is very much an Enlightenment rationalist, materialist, and positivist position that can be traced to Jean-Jacques Rousseau, Charles Fourier, and other philosophes before Karl Marx came along and made it particularly popular. Marxism in particular, at least claims to be “scientific socialism”, and historically emphasized reason and science, to the extent that most Marxist states were officially atheist (something you might like given your concerns about religions).
In practice, many modern social policies, such as the welfare state, Medicare, public pensions, etc., are heavily influenced by socialist thinking and were put in place in part as a response by liberal democracies to the threat of the state socialist model during the Cold War. No country in the world runs on laissez-faire capitalism; we all utilize mixed market economies with varying degrees of public and private ownership. The U.S. still has a substantial public sector, just as China, an ostensibly Marxist-Leninist society, has a substantial private sector (albeit with public ownership of the “commanding heights” of the economy). It seems that all societies in the world eventually compromised in similar ways to achieve reasonably functional economies balanced with the need to avoid potential class conflict. This convergence is probably not accidental.
If you’re truly more concerned with truth seeking than tribal affiliations, you should be aware of your own tribe, which as far as I can tell, is western, liberal, and democratic. Even if you honestly believe in the moral truth of the western liberal democratic intellectual tradition, you should still be aware that it is, in some sense, a tribe. A very powerful one that is arguably predominant in the world right now, but a tribe nonetheless, with its inherent biases (or priors at least) and propaganda.
Just some thoughts.
I’m using the number calculated by Ray Kurzweil for his 1999 book The Age of Spiritual Machines. To get that figure, you need 100 billion neurons firing every 5 ms, or at 200 Hz. That is based on the maximum firing rate given refractory periods. In actuality, average firing rates are usually lower than that, so in all likelihood the difference isn’t actually six orders of magnitude. In particular, I should point out that six orders of magnitude refers to the difference between this hypothetical maximum-firing brain and the most powerful supercomputer, not the most energy efficient supercomputer.
The difference between the hypothetical maximum firing brain and the most energy efficient supercomputer (at 26 GigaFlops/watt) is only three orders of magnitude. For the average brain firing at the speed that you suggest, it’s probably closer to two orders of magnitude. Which would mean that the average human brain is probably one order of magnitude away from the Landauer limit.
This also assumes that it’s neurons and not synapses that should be the relevant multiplier.
Okay, so I contacted 80,000 hours, as well as some EA friends for advice. Still waiting for their replies.
I did hear from an EA who suggested that if I don’t work on it, someone else who is less EA-aligned will take the position instead, so in fact it’s slightly net positive for me to be in the industry, although I’m uncertain whether AI capabilities research is actually funding constrained rather than personnel constrained.
Also, would it be possible to mitigate the net negative by deliberately avoiding capability research and instead taking an ML engineering job at a lower tier company that is unlikely to develop AGI before others, working only on applying existing ML tech to practical problems?
I previously worked as a machine learning scientist but left the industry a couple of years ago to explore other career opportunities. I’m wondering at this point whether or not to consider switching back into the field. In particular, in case I cannot find work related to AI safety, would working on something related to AI capability be a net positive or net negative impact overall?
Even further research shows the most recent Nvidia RTX 3090 is actually slightly more efficient than the 1660 Ti, at 36 TeraFlops, 350 watts, and 2.2 kg, which works out to 0.0001 PetaFlops/Watt and 0.016 PetaFlops/kg. Once again, they’re within an order of magnitude of the supercomputers.
So, I did some more research, and the general view is that GPUs are more power efficient in terms of Flops/watt than CPUs, and the most power efficient of those right now is the Nvidia 1660 Ti, which comes to 11 TeraFlops at 120 watts, so 0.000092 PetaFlops/Watt, which is about 6x more efficient than Fugaku. It also weighs about 0.87 kg, which works out to 0.0126 PetaFlops/kg, which is about 7x more efficient than Fugaku. These numbers are still within an order of magnitude, and also don’t take into account the overhead costs of things like cooling, case, and CPU/memory required to coordinate the GPUs in the server rack that one would assume you would need.
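The arithmetic behind these figures, using the spec numbers quoted above:

```python
def efficiency(teraflops, watts, kg):
    petaflops = teraflops / 1000.0
    return petaflops / watts, petaflops / kg

print(efficiency(11, 120, 0.87))  # GTX 1660 Ti: ~0.000092 PetaFlops/Watt, ~0.0126 PetaFlops/kg
print(efficiency(36, 350, 2.2))   # RTX 3090:    ~0.0001 PetaFlops/Watt,   ~0.016 PetaFlops/kg
```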
I used the supercomputers because the numbers were a bit easier to get from the Top500 and Green500 lists, and I also thought that their numbers include the various overhead costs to run the full system, already packaged into neat figures.
Recently I tried out an experiment using the code from the Geometry of Truth paper to see whether simple label words like “true” and “false” could substitute for the datasets used to create truth probes. I also tried out a truth probe algorithm that classifies an activation by which class mean vector it has the higher cosine similarity to.
Initial results seemed to suggest that the label word vectors were sorta acceptable, albeit not nearly as good (around 70% accurate rather than 95%+ like with the datasets). However, testing on harder test sets showed much worse accuracy (sometimes below chance, somehow). So I can probably conclude that the label word vectors alone aren’t sufficient for a good truth probe.
Interestingly, the cosine similarity approach worked almost identically well as the mass mean (aka difference in means) approach used in the paper. Unlike the mass mean approach though, the cosine similarity approach can be extended to a multi-class situation. Though, logistic regression can also be extended similarly, so it may not be particularly useful either, and I’m not sure there’s even a use case for a multi-class probe.
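For reference, the cosine similarity probe amounts to something like this (a minimal sketch, not the exact code I used; it naturally handles more than two classes):

```python
import numpy as np

def fit_mean_vectors(activations, labels):
    """Compute the mean activation vector for each class label."""
    return {c: activations[labels == c].mean(axis=0) for c in np.unique(labels)}

def predict(activations, class_means):
    """Assign each activation to the class whose mean it is most cosine-similar to."""
    preds = []
    for x in activations:
        sims = {c: np.dot(x, m) / (np.linalg.norm(x) * np.linalg.norm(m))
                for c, m in class_means.items()}
        preds.append(max(sims, key=sims.get))
    return np.array(preds)
```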
Anyways, I just thought I’d write up the results here in the unlikely event someone finds this kind of negative result as useful information.