Chris_Leong’s Shortform
I guess orgs need to be more careful about who they hire as forecasting/evals researchers.
Sometimes things happen, but three people at the same org...
This is also a massive burning of the commons. It is valuable for forecasting/evals orgs to be able to hire people with a diversity of viewpoints in order to counter bias. It is valuable for people to be able to share information freely with folks at such orgs without having to worry about them going off and doing something like this.
But this only works if those less worried about AI risks who join such a collaboration don’t use the knowledge they gain to cash in on the AI boom in an acceleratory way. Doing so undermines the very point of such a project, namely, to try to make AI go well. Doing so is incredibly damaging to trust within the community.
Now let’s suppose you’re an x-risk funder considering whether to fund their previous org. This org does really high-quality work, but the argument for them being net-positive is now significantly weaker. This is quite likely to make finding future funding harder for them.
This is less about attacking those three folks and more just noting that we need to strive to avoid situations where things like this happen in the first place. This requires us to be more careful in terms of who gets hired.
(note: I work at Epoch) This attitude feels like a recipe for creating an intellectual bubble. Of course people will use the knowledge they gain in collaboration with you for the purposes that they think are best. I think it would be pretty bad for the AI safety community if it just relied on forecasting work from card-carrying AI safety advocates.
Thanks for weighing in.
Oh, additional screening could very easily have unwanted side-effects. That’s why I wrote: “It is valuable for forecasting/evals orgs to be able to hire people with a diversity of viewpoints in order to counter bias” and why it would be better for this issue to never have arisen in the first place. Actions like this can create situations with no good trade-offs.
I was definitely not suggesting that the AI safety community should decide which forecasts to listen to based on the views of the forecasters. That’s irrelevant; we should pay attention to the best forecasters.
I was talking about funding decisions. This is a separate matter.
If someone else decides to fund a forecaster even though we’re worried they’re net-negative or they do work voluntarily, then we should pay attention to their forecasts if they’re good at their job.
Seems like several professions have formal or informal restrictions on how they can use information that they gain in a particular capacity to their advantage. People applying for a forecasting role are certainly entitled to say, “If I learn anything about AI capabilities here, I may use it to start an AI startup and I won’t actually feel bad about this”. It doesn’t mean you have to hire them.
It is entirely normal for there to be widely accepted, clearly formalized, and meaningfully enforced restrictions on how people use knowledge they’ve gotten in this or that setting… regardless of what they think is best. It’s a commonplace of professional ethics.
Sure, there are in some very specific settings with long held professional norms that people agree to (e.g. doctors and lawyers). I don’t think this applies in this case, though you could try to create such a norm that people agree to.
I would like to see serious thought given to instituting such a norm. There are a lot of complexities here; figuring out what is or isn’t kosher would be challenging, but it should be explored.
I largely agree with the underlying point here, but I don’t think it’s quite correct that something like this only applies in specific professions. For example, I think every major company is going to expect employees to be careful about revealing internal info, and there are norms that apply more broadly (trade secrets, insider trading etc.).
As far as I can tell though, those are all highly dissimilar to this scenario because they involve an existing widespread expectation of not using information in a certain way. It’s not even clear to me in this case what information was used in what way that is allegedly bad.
I don’t think this is true. People can’t really restrict their use of knowledge, and subtle uses are pretty unenforceable. So it’s expected that knowledge will be used in whatever they do next. Patents and noncompete clauses are attempts to work around this. They work a little, for a little.
Agreed. This is how these codes form. Someone does something like this and then people discuss and decide that there should be a rule against it or that it should at least be frowned upon.
I think the conclusion is not Epoch shouldn’t have hired Matthew, Tamay, and Ege but rather [Epoch / its director] should have better avoided negative-EV projects (e.g. computer use evals) (and shouldn’t have given Tamay leadership-y power such that he could cause Epoch to do negative-EV projects — idk if that’s what happened but seems likely).
Seems relevant to note here that Tamay had a leadership role from the very beginning: he was the associate director already when Epoch was first announced as an org.
This seems like a better solution on the surface, but once you dig in, I’m not so sure.
Once you hire someone, assuming they’re competent, it’s very hard for you to decide to permanently bar them from gaining a leadership role. How are you going to explain promoting someone who seems less competent than them to a leadership role ahead of them? Or is the plan to never promote them and refuse to ever discuss it? That would create weird dynamics within an organisation.
I would love to hear if you think otherwise, but it seems unworkable to me.
I think it’s not all that uncommon for people who are highly competent in their current role to be passed over for promotion to leadership. LeBron James isn’t guaranteed a job as the NBA commissioner just because he balls hard. Things like “avoid[ing] negative-EV projects” would be prime candidates for something like this. If you’re amazing at executing technical work on your assigned projects but aren’t as good at prioritizing projects or coming up with good ideas for projects, then I could definitely see that blocking a move to leadership even if you’re considered insanely competent technically.
I mean, good luck hiring people with a diversity of viewpoints who you’re also 100% sure will never do anything that you believe to be net negative. Like what does “diversity of viewpoints” even mean apart from that?
Everything has trade-offs.
I agree that attempting to be 100% sure that they’re responsible would be a mistake. Specifically, the unwanted impacts would likely be too high.
Can you state more specifically what the alleged bad actions are here? Based on some of the discussions under your post about professional norms surrounding information disclosure, I think it is worth distinguishing two cases.
First, consider a norm that limits the disclosure of some relatively specific and circumscribed pieces of information, such as a doctor not being allowed to reveal personal health information of patients outside of what is needed to provide care.
Second, a general norm that if you cooperate with someone and they provide you some info, you won’t use that info contrary to their interests. It’s not 100% clear to me, but your post sounds a lot like this second one.
I think the second scenario raises a lot of issues. It seems challenging to enforce, hard to understand and navigate, costly for people to attempt to conform to, and potentially counterproductive for what seems to be your goal. You are considering a specific case at a specific point in time, but I don’t think that gives the full picture of the impact of such a norm. For example, consider ex-OpenAI employees who left due to concerns about AI safety. Should the expectation be that they only use information and experience they gained at OpenAI in a way that OpenAI would approve of?
Now, if Epoch and/or specific individuals made commitments that they violated, that might be more like the first case, but it’s not clear that is what happened here. If it is, more explanation of how this is the case would be helpful, I think.
I agree that this issue is complex and I don’t pretend to have all of the solutions.
I just think it’s really bad if people feel that they can’t speak relatively freely with the forecasting organisations because they’ll misuse the information. I think this is somewhat similar to how it is important for folks to be able to speak freely to their doctor/lawyer/psychologist though I admit that the analogy isn’t perfect and that straightforwardly copying these norms over would probably be a mistake.
Nonetheless, I think it is worthwhile discussing whether there should be some kind of norms and what they should be. As you’ve rightly pointed out, there are a lot of issues that would need to be considered. I’m not saying I know exactly what these norms should be. I see myself as more just starting a discussion.
(This is distinct from my separate point about it being a mistake to hire folk who do things like this. It is a mistake to have hired folks who act strongly against your interests even if they don’t break any ethical injunctions.)
To “misuse” to me implies taking a bad action. Can you explain what misuse occurred here? If we assume that people at OpenAI now feel less able to speak freely after things that ex-OpenAI employees have said/done, would you likewise characterize those people as having “misused” information or experience they gained at OpenAI? I understand you don’t have fully formed solutions and that’s completely understandable, but I think my questions go to a much more fundamental issue about what the underlying problem actually is. I agree it is worth discussing, but I think it would clarify the discussion to understand what the intent of such a norm would be (and whether achieving that intent would in fact be desirable).
If Coca-Cola hires someone who later leaves and goes to work for Pepsi because Pepsi offered them higher compensation, I’m not sure it would make sense for Coca-Cola to conclude that they should make big changes to their hiring process, other than perhaps increasing their own compensation if they determine that is a systematic issue. Coca-Cola probably needs to accept that “it’s not personal” is sometimes going to be the nature of the situation. Obviously details matter, so maybe this case is different, but I think working in an environment where you need to cooperate with other people/institutions means you also have to sometimes accept that people you work with will make decisions based on their own judgements and interests, and therefore may do things you don’t necessarily agree with.
They’re recklessly accelerating AI. Or, at least, that’s how I see it. I’ll leave it to others to debate whether or not this characterisation is accurate.
Details matter. It depends on how bad it is and how rare these actions are.
I know I’ve responded to a lot of your comments, and I get the sense you don’t want to keep engaging with me, so I’ll try to keep it brief.
We both agree that details matter, and I think the details of what the actual problem is matter. If, at bottom, the thing that Epoch/these individuals have done wrong is recklessly accelerate AI, I think you should have just said that up top. Why all the “burn the commons”, “sharing information freely”, “damaging to trust” stuff? It seems like you’re saying at the end of the day, those things aren’t really the thing you have a problem with. On the other hand, I think invoking that stuff is leading you to consider approaches that won’t necessarily help with avoiding reckless acceleration, as I hope my OpenAI example demonstrates.
I believe those are useful frames for understanding the impacts.
https://www.dwarkesh.com/p/ege-tamay
There’s lots of things that “might” happen. When we’re talking about the future of humanity, we can’t afford to just gloss over mights.
Hegel—A Very Short Introduction by Peter Singer—Book Review Part 1: Freedom
Hegel is a philosopher who is notorious for being incomprehensible. In fact, for one of his books he signed a contract that assigned a massive financial penalty for missing the publishing deadline, so the book ended up being a little rushed. While there was a time when he was dominant in German philosophy, he now seems to be held in relatively poor regard and his main importance is seen to be historical. So he’s not a philosopher that I was really planning to spend much time on.
Given this, I was quite pleased to discover this book promising to give me A Very Short Introduction, especially since it is written by Peter Singer, a philosopher who writes and thinks rather clearly. After reading this book, I still believe that most of what Hegel wrote was pretentious nonsense, but the one idea that struck me as the most interesting was his conception of freedom.
A rough definition of freedom might be ensuring that people are able to pursue whatever it is that they prefer. Hegel is not a fan of abstract definitions of freedom which treat all preferences the same and don’t enquire where they come from.
From his perspective, most of our preferences are purely a result of the context in which we exist, and so such an abstract definition of freedom is merely the freedom to be subject to social and historical forces. Since we did not choose our desires, he argues that we are not free when we act from our desires. Hegel argues that, “every condition of comfort reveals in turn its discomfort, and these discoveries go on for ever”. One such example would be the marketing campaigns to convince us that sweating was embarrassing (https://www.smithsonianmag.com/…/how-advertisers-convinced…/).
This might help clarify further: Singer ties this to the more modern debate between Radical Economists and Liberal Economists. Liberal economists use how much people pay as a measure of how strong their preferences are and refuse to get into the question of whether any preferences are more valuable than any other seeing this as ideological. Radical economists argue that many of our desires are a result of capitalism. They would say that if I convince you that you are ugly and then I sell you $100 of beauty products to restore your confidence, then I haven’t created $100 worth of value. They argue that refusing to value any preference above any other preference is an ideological choice in and of itself; and that there is no way to step outside of ideology.
If pursuing our desires is not freedom, what is? Kant answers that freedom is following reason and performing your duty. This might not sound very much like freedom, quite the opposite, but for Kant not following your reason was allowing yourself to be a slave of your instincts. Here’s another argument: perhaps a purely rational being wouldn’t desire the freedom to shirk their duty, so insofar as this is freedom, it might not be of a particularly valuable kind and if you think this is imposing on your freedom this is because of your limited perspective.
Hegel thought that Kant’s answer was a substantial advance, but he also thought it was empty of content. Kant viewed duty in terms of the categorical imperative, “Do not act except if you could at the same time will that it would become a universal law”. Kant would say that you shouldn’t steal because you couldn’t will a world where everyone would steal from everyone else. But mightn’t some people be fine with such a world, particularly if they thought they might come out on top? Even if you don’t want to consider people with views that extreme, you can almost always find a universal to justify whatever action you want. Why should the universal that the thief would have to accept be, “Anyone can steal from another person” instead of, “Anyone can steal from someone who doesn’t deserve their wealth”? (See section III of You Kant Dismiss Universalizability.) Further, Kant’s absolutist form of morality (no lying even to save a friend from a murderer) seems to require us to completely sacrifice our natural desires.
Hegel’s solution to this was to suggest the need for what he calls an organic community; or a community that is united in its values. He argues that such communities shape people’s desires to such an extent that most people won’t even think about pursuing their own interests and that this resolves the opposition between morality and self-interest that Kant’s vision of freedom creates. However, unlike the old organic communities which had somewhat arbitrary values, Hegel argued that the advance of reason meant that the values of these communities also had to be based on reason, otherwise freethinking individuals wouldn’t align themselves with the community.
Indeed, this is the key part of his much-maligned argument that the Prussian State was the culmination of history. He argued that the French Revolution had resulted in such bloodshed because it was based on an abstract notion of freedom which was pursued to the extent that all the traditional institutions were bulldozed over. Hegel argued that the evolution of society should build upon what already exists and not ignore the character of the people or the institutions of society. For this reason, his ideal society would have maintained the monarchy, but with most of the actual power being delegated to the houses, except in certain extreme circumstances.
I tend to think of Hegel as primarily important for his contributions to the development of Western philosophy (so even if he was wrong on details he influenced and framed the work of many future philosophers by getting aspects of the framing right) and for his contributions to methodology (like standardizing the method of dialectic, which on one hand is “obvious” and people were doing it before Hegel, and on the other hand is mysterious and the work of experts until someone lays out what’s going on).
Which aspects of framing do you think he got right?
“In more simplistic terms, one can consider it thus: problem → reaction → solution. Although this model is often named after Hegel, he himself never used that specific formulation. Hegel ascribed that terminology to Kant. Carrying on Kant’s work, Fichte greatly elaborated on the synthesis model and popularized it.”—Wikipedia; so Hegel deserves less credit than he is usually granted.
Interesting.
I don’t recall anymore, it’s been too long for me to remember enough specifics to answer your question. It’s just an impression or cached thought I have that I carry around from past study.
Book Review: Communist Manifesto
“The history of all hitherto existing society is the history of class struggles. Freeman and slave, patrician and plebeian, lord and serf, guild-master and journeyman, in a word, oppressor and oppressed, stood in constant opposition to one another, carried on an uninterrupted, now hidden, now open fight, that each time ended, either in the revolutionary reconstitution of society at large, or in the common ruin of the contending classes”
Overall summary: Given the rise of socialism in recent years, now seemed like an appropriate time to review the Communist Manifesto. At times I felt that Marx’s writing was keenly insightful, at other times I felt he was ignorant of basic facts, and at other times I felt that he held views that were reasonable at the time, but for which the flaws are now obvious. In particular, I found the first half much more engaging than I expected because, say what you like about Marx, he’s an engaging and poetic writer. Towards the end, the focus shifted to particular time-bounded political disputes which I had neither the knowledge to understand nor the interest to acquire. At the start, I felt that I already had a decent grasp of the communist impulse, and I haven’t become any more favourable to communism, but reading this rounded out a few more details of the communist critique of capitalism.
Capitalism: Despite being its most famous critic, Marx has a strong appreciation for the power of capitalism. He writes about it sweeping away all the old feudal bonds and how it draws even the most “barbarian” nations into civilisation. He writes about it stripping every previously admired occupation of its halo and turning its practitioners into “paid wage labourers”; and undoubtedly some professions are affected far too much by market concerns, but this has to be weighed up against the increase in access that has been brought. He even writes that it has accomplished “wonders far exceeding the Egyptian Pyramids, Roman Aqueducts and Gothic Cathedrals”, and his willingness to acknowledge this in such strong terms increased my respect for him. Marx can’t see capitalism as anything but exploitation; for those who would answer that it lifts all boats, I don’t think he has a strong reply apart from denial that this occurs. To steelman him, even if people are better off financially, they can be worse off overall if they are now working only the simplest, most monotonous jobs. That would have been a stronger argument when much more work was in factories, but with increasing automation, these are precisely those jobs that are disappearing. Another argument would be that over time the capitalists who survive will be those who are best at lowering wage costs, by minimising the use of labour and ensuring that the work is set up to use as much unskilled labour as possible. So even if people were financially better off in the short term, they might be worse off over the long term. However, history seems to have shown the opposite, with modern wages far greater than in pre-industrial, pre-capitalist times.
Class warfare: Marx made several interesting comments on this. How the bourgeoisie were often empowered by the monarchy to limit the power of the nobility. That the proletariat should be thought of as a new class, separate from the peasants, since their interests diverge, with the latter more likely to try rolling things back than to support creating a new order. How the bourgeois would seek help from the proletariat against aristocrats, dragging the proletariat into the political arena. How the proletariat were not unified in Marx’s time, but how improved communication provided the means for national unification. And that a section of the bourgeois who were threatened with falling into the proletariat would join with the proletariat. I definitely think class analysis has value, but I worry that Marxists often can’t see things in any way other than class. We are members of classes, that is true, but we are also individuals, and no one way of carving up the space captures all of reality. For example, Marx includes masters/apprentices in his oppressor/oppressed hierarchy, even though most of the latter will eventually become the former.
Personal property: It was interesting hearing him talk about abolishing personal property, as that is an element of the original communism that seems to be de-emphasised these days, with the focus more on seizing the means of production. I expect that this is related to a change in context; Marx was able to write that private property is done away with for 9/10s of the population. I don’t know how true it was at the time, but it certainly isn’t true today. Nonetheless, I found it interesting that his desire to abolish bourgeois property was similar to the bourgeois desire to abolish feudal property; both believe that the kind of property they want to abolish is based upon exploitation and unearned privilege.
False consciousness: For Marx, the ideas that are dominant in society are just the ideas of the elites. Law, morality and religion are just prejudices of the bourgeois. People don’t structure society based upon ideas; rather, the ideas are determined by the structure of society and what allows society to be as productive as possible. Marx doesn’t provide an exact chain of causation, but perhaps he believes that the elites benefit from increases in production and therefore always push society in that direction, in order to realise their short-term interests. The question then arises: if everyone else has a false consciousness, why then doesn’t Marx also? Again speculating, perhaps Marx would say that when a system is on its last legs, the flaws and contradictions become too large for the elite ideology to cover up. Alternatively, perhaps it is only the dominant ideas in society that are determined by the structure of society, and other ideas can exist, just without being allowed any real influence. I still feel Marx overstates the power of false consciousness, but at least I now have an answer to this question that’s somewhat reasonable.
It is not obvious to me from reading the text whether you are aware of the distinction between “private property” and “personal property” in Marxism. So, just to make sure: “private property” refers to the means of production (e.g. a factory), and “personal property” refers to things that are not means of production (e.g. a house where you live, clothes, food, toys).
The ownership of “private property” should be collectivized (according to Marx/ists), because… simply said, you can use the means of production to generate profit, then use that profit to buy more means of production, yadda yadda, the rich get exponentially richer on average and the poor get poorer.
With “personal property” this effect does not happen; if you have one table and I have two tables, there is no way for me to use this advantage to generate further tables, until I become the table-lord of the planet.
(There seem to be problems with this distinction. For example, things can be used either productively or unproductively; I can use my computer to create software or browse social networks. Some things can be used productively in unexpected ways; even the extra table could be used in a workshop to produce stuff. I am not a Marxist, but I suppose the answer would probably be something like “you are allowed to browse the web on your personal computer, but if we catch you privately producing and selling software, you get shot”.)
So, is this the confusion of Marxist terms, or do you mean that today more than 10% of people own means of production? In which sense? (Not sure if Marx would also count indirect ownership, such as having your money in an index fund, which buys shares of companies, which own the means of production.)
Did Marx actually argue for abolishing “personal property” (according to his definition, i.e. ownership of houses or food)?
For many people nowadays, their own brain is their means of production, often assisted by computers and their software, but those are cheap compared with what can be earned by using them. Marx did not know of such things, of course, but how do modern Marxists view this type of private ownership of means of production? For that matter, how did Marx view a village cobbler who owned his workshop and all his tools? Hated exploiter of his neighbours? How narrow was his motte here?
I once talked about this with a guy who identified as a Marxist, though I can’t say how much his opinions are representative for the rest of his tribe. Anyway… he told me that in the trichotomy of Capital / Land / Labor, human talent is economically most similar to the Land category. This is counter-intuitive if you take the three labels literally, but if you consider their supposed properties… well, it’s been a few decades since I studied economics, but roughly:
The defining property of Capital is fungibility. You can use money to buy a tech company, or an airplane factory, or a farm with cows. You can use it to start a company in USA, or in India. There is nothing that locks money to a specific industry or a specific place. Therefore, in a hypothetical perfectly free global market, the risk-adjusted profit rates would become the same globally. (Because if investing the money in cows gives you 5% per annum, but investing money in airplanes gives you 10%, people will start selling cow farms and buying airplane factories. This will reduce the number of cow farms, thus increasing their profit, and increase the competition in the airplane market, thus reducing their profit, until the numbers become equal.) If anything is fungible in the same way, you can classify it as Capital.
The archetypal example of Labor is a low-qualified worker, replaceable at any moment by a random member of the population. Which also means that in a free market, all workers would get the same wage; otherwise the employers would simply fire the more expensive ones and replace them with the cheaper ones. However, unlike money, workers are typically not free to move across borders, so you get different wages in different countries. (You can’t build a new factory in the middle of USA, and move ten thousand Indian workers there to work for you. You could do it the other way round: move the money, and build the factory in India instead. But if there are reasons to keep the factory in USA, you are stuck with American workers.) But within country it means that as long as a fraction of population is literally starving, you can hire them for the smallest amount of money they can survive with, which sets the equilibrium wage on that level. Because those starving ones won’t say no, and anyone who wants to be paid more will be replaced by those who accept the lower wage. Hypothetically, if you had more available job positions than workers, the wages would go up… but according to Malthus, this lucky generation of workers would simply have many kids, which would fix this exception in the next generation. -- Unless the number of job positions for low-qualified workers can keep growing faster than the population. But even in that case, the capitalists would probably successfully lobby the government to fix the problem by letting many immigrants in. Somewhere on the planet, there are enough starving people. Also, if the working people are paid just as much as they need to survive, they can hardly save money, so they can’t get out of this trap.
Now the category of Land contains everything that is scarce, so it usually goes to the highest bidder. But no matter how much rent you get for the land, you cannot use the rent to generate more of it. So, in long term the land will get even more expensive, and a lot of increased productivity will be captured by the land owners.
From this perspective, being born with an IQ 200 brain is like having inherited a gold mine, which would belong to the Land category. Some people need you for their business, and they can’t replace you with a random guy on the street. The number of potential jobs for IQ 200 people exceeds the number of IQ 200 people, so the employers must bid for your brain. But it is different from the land in the sense that it’s you who has to work using your brain; you can’t simply rent your brain to a factory and let some cheap worker operate it. Perhaps this would be equivalent to a magical gold mine, where only the owner can enter, so if he wants to profit from owning the gold mine, he has to also do all the work. Nonetheless, he gets extra profit from the fact that he owns the gold mine. So it’s like he offers the employer a package consisting of his time + his brain. And his salary could be interpreted as consisting of two parts: the wage, for the time he spends using his brain (which is numerically equivalent to how much money a worker would get for working the same amount of time); and the rent for the brain, that is the extra money compared to the worker. (For example, suppose that workers in your country are paid $500 monthly, and software developers are paid $2000 monthly. That would mean that for an individual software developer, the $500 is the wage for his work, and $1500 is the rent for using his brain.) That means that extraordinarily smart employees are (smaller) part working class, and (greater) part rentier class. They should be reminded that if, one day, enough people become equally smart (whether through eugenics, genetic engineering, selective immigration, etc.), their income will also drop to the smallest amount of money they can survive with.
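A minimal formalization of that split, using the hypothetical figures above (the $500 and $2000 are the commenter’s illustrative numbers, not data): let $w$ be the baseline wage an ordinary worker gets for the same hours and $s$ the smart employee’s salary. Then

$$s = w + r, \qquad r = s - w,$$

where $r$ is the “rent for the brain”. With the numbers above, $r = \$2000 - \$500 = \$1500$ per month, i.e. most of the developer’s income is rent in this framing.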
As I said, no idea whether this is an orthodox or a heretical opinion within Marxism.
IANAM[1], but intuitively it seems to me that an exception ought to be made (given the basic idea of Marxist theory) for individuals who own means of production the use of which, however, does not involve any labor but their own.
So in the case of the village cobbler, sure, he owns the means of production, but he’s the only one mixing his labor with the use of those tools. Clearly, he can’t be exploiting anyone. Should the cobbler take on an assistant (continuing my intuitive take on the theory), said assistant would presumably have to now receive some suitable share in the ownership of the workshop/tools/etc., and in the profits from the business (rather than merely being paid a wage), as any other arrangement would constitute alienation from the fruits of his (the assistant’s) labor.
On this interpretation, there does not here seem to be any contradiction or inconsistency in the theory. (I make no comment, of course, on the theory’s overall plausibility, which is a different matter entirely.)
I Am Not A Marxist.
https://www.marxists.org/archive/marx/works/1847/communist-league/1850-ad1.htm
Thanks for clarifying this terminology, I wasn’t aware of this distinction when I wrote this post
Before I even got to your comment, I was thinking “You can pry my laptop out of my cold dead hands Marx!”
Thank you for this clarification on personal vs private property.
Book Review: So Good They Can’t Ignore You by Cal Newport:
This book makes an interesting contrast to The 4 Hour Workweek. Tim Ferriss seems to believe that the purpose of work should be to make as much money as possible in the least amount of time and that meaning can then be pursued during your newly available free time. Tim gives you some productivity tips in the hope that they will make you valuable enough to negotiate flexibility in terms of how, when and where you complete your work, plus some dirty tricks as well.
Cal Newport’s book is similar in that it focuses on becoming valuable enough to negotiate a job that you’ll love and downplays the importance of pursuing your passions in your career. However, while Tim extolls the virtues of being a digital nomad, Cal Newport emphasises self-determination theory and autonomy, competence and relatedness. That is, the freedom to decide how you pursue your work, the satisfaction of doing a good job and the pleasure of working with people who you feel connected to. He argues that these traits are rare and valuable, and so if you want such a job you’ll need rare and valuable skills to offer in return.
That’s the core of his argument against pre-existing passion; passions tend to cluster into a few fields such as music, arts or sports and only a very few people can ever make these the basis of their careers. Even for those who are interested in less insanely competitive pursuits such as becoming a yoga instructor or organic farmer, he cautions against pursuing the dream of just quitting your job one day. That would involve throwing away all of the career capital that you’ve accumulated and hence your negotiating power. Further, it can easily lead to restlessness, that is, jumping from career to career all the while searching for the “one” that meets an impossibly high bar.
Here are some examples of the kind of path he endorses:
Someone becoming an organic farmer after ten years of growing and selling food on the side, starting in high school. Lest this be seen as a confirmation of the passion hypothesis, this was initially just to make some money
A software tester making her way up to the head of testing to the point where she could demand that she reduce her hours to thirty per week and study philosophy
A marketer who gained such a strong reputation that he was able to form his own sub-agency within the bigger agency and then eventually form his own completely independent operation
Cal makes a very strong argument. When comparing pursuing a passion to more prosaic career paths, we often underestimate how fulfilling the latter might eventually become if we work hard and use our accumulated career capital to negotiate the things that we truly want. This viewpoint resonates with me as I left software to study philosophy and psychology, without fully exploring options related to software. I now have a job that I really enjoy as it offers me a lot of freedom and flexibility.
One of the more compelling examples is Cal’s analysis of Steve Jobs. We tend to think of Jobs’ success as a prototypical case of following your passion, but his life shows otherwise. Jobs’ entry into technology (working for Atari) was based upon the promise of a quick buck. He’d been travelling around India and needed a real job. Jobs was then involved in a timesharing company, but he left for a commune without telling the others and was replaced by the time he made it back. So merely a year before he started Apple, he was hardly passionate about technology or entrepreneurship. This seems to have only occurred as he became more successful.
This is prototypical of Cal’s theory: instead of leveraging passion to become So Good They Can’t Ignore You (TM), he believes that if you become So Good They Can’t Ignore You (TM) then passion will follow. In evidence, Cal notes that people are often passionate about many different things at different times, including things they definitely weren’t passionate about before. He suggests this is indicative of our ability to develop passions under the right circumstances.
Personally, I feel that the best approach will vary hugely depending on individual circumstance, but I suspect Cal is sadly right for most people. Nonetheless, Cal lists three exceptions. A job or career path is not suitable for his strategy if there aren’t opportunities to distinguish yourself, if it is pointless or harmful to society, or if it requires you to work with people you hate.
Towards the end of the book, Cal focuses on strategies for becoming good at what you do. While this section wasn’t bad, I didn’t find it particularly compelling either. I wish I’d just read the start of the book which covers his case against focusing on pre-existing passion, as that was by far the most insightful and original part of the book for me. Perhaps the most interesting aspect was how he found spending 14 hours of focused attention deconstructing a key paper in his field to have been a valuable use of time. I was surprised to hear that it paid off in terms of research opportunities, but I suppose it isn’t so implausible that such projects could pay off if you picked an especially important paper.
Further notes:
- If you are going to only read this or Four Hour Workweek, I’d suggest this one to most people. I feel that this one is less likely to be harmful and is applicable to a broader range of people, many of whom won’t immediately have the career capital to follow Tim’s advice. On the other hand, Tim’s book might be more useful if, unlike me, you don’t need to be convinced of Cal’s thesis.
- Cal points out that if you become valuable enough to negotiate more freedom, then you also become valuable enough that people will want to stop you. The challenge is figuring out whether you have sufficient career capital to overcome this resistance. Cal suggests not pursuing control without evidence that people are willing to pay you, either in money or with something else valuable; I find his position reductive and insufficiently justified.
- Cal believes that it is important to have a mission for your career, but that it is hard to pick a mission without already being deep inside a field. He notes that discoveries are often made independently and theorises that this is because often a discovery isn’t likely or even possible until certain prerequisites are in place, such as ideas, technologies or social needs. It’s only when you are at the frontier that you have sufficient knowledge to see and understand the next logical developments
FWIW I think this and maybe some of the other book review shortforms you’ve done would make fine top level posts.
Thanks, I’ll think about it. I invested more effort in this one, but for some of the others I was optimising for speed
+1 for book-distillation, probably the most underappreciated and important type of post.
As I said before, I’ll be posting book reviews. Please let me know if you have any questions and I’ll answer them to the best of my ability.
Book Review: The AI does not hate you by Tom Chivers
The title of this book comes from a quote by Eliezer Yudkowsky which reads in full: “The AI does not hate you, nor does it love you, but you are made of atoms which it can use for something else”. This book covers not only potential risks from AI, but also the rationalist community from which this concern evolved, and it touches on the effective altruism movement as well.
This book fills something of a gap in the book market; when people are first learning about existential risks from AI I usually recommend the two-part Wait But Why post (https://waitbutwhy.com/2015/01/artificial-intelligence-revolution-1.html) and then I’m not really sure what to recommend next. The sequences are ridiculously long and Bostrom’s Superintelligence is a challenging read for those not steeped in philosophy and computer science. In contrast, this book is much more accessible and provides the right level of detail for a first introduction, rather than for someone who has already decided to try entering the field.
I mostly listened to this book to see if I could recommend it. Most of the material was familiar, but I was also pleasantly surprised a few times to hear a new take (at least to me). It was engaging and well-written throughout. Regarding what’s covered: there’s an excellent introduction to the alignment problem; the discussion of Less Wrong mostly focuses on cognitive biases, but also covers a few other key concepts like the Map and Territory and Bayesianism; the Center for Applied Rationality is mostly reduced to just double crux; Slatestarcodex is often quoted, but not a focus; and Effective Altruism isn’t the focus, but there’s a good general introduction. I also thought he dealt well with some of the common criticisms of the community.
Even though there are notable omissions, these are understandable given the need to keep the book to a reasonable length. It might have been possible to more fully capture the flavour of the community, but given how hard it is to describe the essence of a community with such broad interests, I think he did an admirable job. All in all, this is an excellent introduction to the topic if you’ve been hearing about AI Safety or Less Wrong and want to dive in more.
Someone should see how good Deep Research is at generating reports on AI Safety content.
It did OK at control.
My current working take is that it is at the level of a median-but-dedicated undergraduate of a top university who is interested and enthusiastic about AI safety. But Deep Research can do in 10 minutes what would take that undergraduate about 20 hours.
Happy to try a prompt for you and see what you think.
How about “Please summarise Eliezer Yudkowsky’s views on decision theory and its relevance to the alignment problem”.
People have said that to get a good prompt it’s better to have a discussion with a model like o3-mini, o1, or Claude first, and clarify various details about what you are imagining, then give the whole conversation as a prompt to OA Deep Research.
Here you go: https://chatgpt.com/share/67a34222-e7d8-800d-9a86-357defc15a1d
Thanks, seems pretty good on a quick skim. I’m a bit less certain about the corrigibility section, and more issues might become apparent if I read through it more slowly.
There is a world that needs to be saved. Saving the world is a team sport. All we can do is to contribute our part of the puzzle, whatever that may be and no matter how small, and trust in our companions to handle the rest. There is honor in that, no matter how things turn out in the end.
I have no interest in honor if it’s celebrated on a field of the dead. Virtue ethics is fine, as long as it’s not an excuse to not figure out what needs doing and how it’s going to get done.
Doing one’s own part and trusting that the other parts are done by anonymous unknown others is a very silly coordination strategy. We need plans that amount to success, not just everyone doing whatever sounds nice to them.
Edit: I very much agree that saving the world is a team sport. Perhaps it’s relevant that successful teams always do some planning and coordinating.
It’s at times like these that I absolutely love the distinction between “karma” and “agreement” around here. +1 for the former, as per the overall sentiment. −1 for the latter, as per the sheer nonsensical-ity of the scale of the matter.
The “world” doesn’t need “saving”. Never did. Never will. If for no other reason than there is no “one” world, to begin with. What you think about when mentioning the “world” will be drastically different from what I have in mind, from what Eliezer has in mind, from what anyone else around here has in mind.
Our brains can only ever hold such a tiny amount of information in our short-term storage, that to even hope it ever represents any significant portion of the “world” itself is laughable. Even your long-term storage / episodic + semantic memory only ever came in contact with such a tiny portion of the “world”.
You can’t “save” what you barely “know” to begin with.
Yet there’s a deeper rabbit hole still.
When you say “save the world” you likely mean either “saving our local ecosystem” (as in: all the biological forms of self-organizing matter, as you know it), “saving our species” (Homo Sapiens, first and foremost), or “saving your world” (as in: the part of reality you have personally grown up in, conditioned yourself to, assimilated with, and currently project onto the rest of real world as the likely only world, to begin with—a.k.a. Typical Mind Fallacy).
The “world” doesn’t need “saving”, though. It came before you. It will persist after you. Probably. Physics. Anyhow.
What may need some “help” is society. Not “the” abstract, ephemeral, all-encompassing, thus absolutely void of any and all meaning to begin, “society”. But the society, made out of “people”. As in: “individuals”. Living in their own “world”. Only ever coming in contact with <1% of information you’ve likely come into contact with, so far.
They don’t need your attempts at “saving” them, either. What they need is specific solutions to specific problems within specific domains of specific kind of relationship to the domains, closely/farther adjacent to it.
You will never solve any of them. Unless you stop throwing around phrases like “saving the world”, in the first place. The world came into being via a specific kind of process. It is now maintained by specific kind of feedback loops, recurrent cycles, incentive structures, reward/punishment mechanisms driving mostly unconscious decision making processes, and individual habits of each and every individual operating within their sphere of influence.
You want to help? Figure out what kind of incremental changes you can begin to introduce in any of them, in order to begin extinguishing the sort of problems you’ve now elevated to the rank of “saving-worthy” in your own head. Note that, in all likelihood, by extinguishing one you will merrily introduce a whole bunch of others—something you won’t get to discover until much later on. Yet that is, realistically, what you can actually go on to accomplish.
“Saving the world”? Please. Do you even know what’s exactly going on in the opposite side of the globe today?
Great sentiment. Horrible phrasing. Nothing personal. “Helping people” is a team sport.
Side note: are these quick takes turning into a new Twitter/X feed? Gosh, please don’t. Please!
I read this paragraph as saying ~the same thing as the original post in a different tone
We know well enough what people mean by “world”—the stuff they care about. The fact that physics keeps on happening if humanity is snuffed out is no comfort at all to me or to most humans.
Arguing epistemology is not going to prevent a nuclear apocalypse or us being wiped out by the new intelligent species we are inventing. The fact that you don’t know what’s happening on the other side of the world has no bearing on existential dangers facing those people. That’s what I mean by saving the world, and I expect what the author meant. This is a different thing than just helping people by your own values and estimates.
I very much agree that pithy mysterious statements for others to argue over is not a good use of the quick takes here.
Book Review: The 4 Hour Workweek
This is the kind of book that you either love or hate. I found value in it, but I can definitely understand the perspective of the haters. First off: the title. It’s probably one of the most blatant cases of over-promising that I’ve ever seen. Secondly, he’s kind of a jerk. A number of his tips involve lying, and in school he had a strategy of interrogating his lecturers in detail when they gave him a bad mark so that they’d think very carefully before assigning him a bad grade. And of course, while drop-shipping might have been an underexploited strategy at the time when he wrote the book, it’s now something of a saturated market.
On the plus side, Tim is very good at giving you specific advice. To give you the flavour, he advises the following policies for running an online store: avoid international orders; no expedited or overnight shipping; two options only—standard and premium; no cheque or Western Union; no phone number if possible; minimum wholesale order with tax ID and faxed-in order form, etc. Tim is extremely process oriented and it’s clear that he has deep expertise here and is able to share it unusually well. I found it fascinating to see how he thought even though I don’t have any intention of going into this space.
This book covers a few different things:
- Firstly, he explains why you should aim to have control over when and where you work. Much of this is about cost, but it’s also about the ability to go on adventures, develop new skills and meet people you wouldn’t normally meet. He makes a good case and hopefully I can confirm whether it is as amazing as he says soon enough
- Tim’s philosophy of work is that you should try to find a way of living the life you want to live now. He’s not into long-term plans that, in his words, require you to sacrifice the best years of your life in order to obtain freedom later. He makes a good point for those with enough career capital to make it work, but it’s bad advice for many others who decide to just jump on the travel blogging or drop-shipping train without realistic expectations of how hard it is to make it in those industries
- Tim’s productivity advice focuses on ruthlessly (and I mean ruthlessly) minimising what he does to the most critical by applying the 80⁄20 rule. For example, he says that you should have a todo list and a not todo list. He says that your todo list shouldn’t have more than two items and you should ask yourself, “If this was the only thing I accomplished today, would I be satisfied?”.
- A large part of minimising your work involves delegating these tasks to other people and Tim goes into detail about how to do this. He is a big fan of virtual assistants, to the point of even delegating his email.
- Lots of this book is business advice. Unlike most businesses, Tim isn’t optimising for making the most money, but for making enough money to support his lifestyle while taking up the least amount of his time. I suspect that this would be great advice for many people who already own a business
- Tim also talks about how to figure out what to do with your spare time if you manage to obtain freedom. He advises chasing excitement instead of happiness. He finds happiness too vague, while excitement will motivate you to grow and develop. He suggests that it is fine to go wild at first, jumping from place to place, chasing whatever experiences you want, but at some point it’ll lose its appeal and you’ll want to find something more meaningful.
I’d recommend this book, but only to people with a healthy sense of skepticism. There’s lots of good advice in this book, but think very carefully before you become drop-shipper #2001. And remember that you don’t have to become a jerk just because he tells you to! That said, it’s not all about drop-shipping. A much wider variety of people probably could find a way to work remotely or reduce their hours than we normally think, although it might require some hard work to get there. In so far as the goal is to optimise for your own happiness, I generally agree with his idea of the good life.
Further highlights:
- Doing the unrealistic is easier than doing the realistic as there is less competition
- Leverage strengths, instead of fixing weakness. Multiplication of results beats incremental improvement
- Define your nightmare. Would it really be permanent? How could you get it back on track? What are the benefits of the more probable outcome?
- We encourage children to dream and adults to be realistic
Book Review: Civilization and its discontents
Freud is the most famous psychologist of all time and although many of his theories are now discredited or seem wildly implausible, I thought it’d be interesting to listen to him to try and understand why it sounded plausible in the first place.
At times Freud is insightful and engaging; at other times, he falls into psychoanalytic lingo in such a way that I couldn’t follow what he was trying to say. I suppose I can see why people might have assumed that the fault was with their failure to understand.
It’s a short read, so if you’re curious, there isn’t that much cost to going ahead and reading it, but this is one of those rare cases where you can really understand the core of what he was getting at from the summary on Wikipedia (https://en.m.wikipedia.org/wiki/Civilization_and_Its_Discontents)
Since Wikipedia has a summary, I’ll just add a few small remarks. This book focuses on a key paradox: our utter dependence on civilisation for anything more than the most basic survival, yet how it requires us to repress our own wants and desires so as to fit in with an ordered society. I find this to be an interesting answer to the question of why there is so much misery despite our material prosperity.
It’s interesting to re-examine this in light of the modern context. Society is much more liberal than it was in Freud’s time, but in recent years people have become more scared of speaking their minds. Repression still exists; it just takes a different form. If Freud is to be believed, we should expect this repression to result in all kinds of psychological effects, many of which won’t appear linked on the surface.
Further thoughts:
- I liked his chapter on the methods humans use to deal with suffering and their limitations, as it contained what seemed to be sound evaluations. He points out that the path of a yogi is at best the happiness of quietness, that love cannot be guaranteed to last, that sublimation through art is available only to a few and is even then only of limited strength, etc. He just didn’t think there was any good solution to this problem.
- Freud was sceptical of theories like communism because he didn’t believe that human nature could really change. He argued that aggression existed in the nursery and before the existence of property. He didn’t doubt that we could suppress urges, but he seemed to believe that it was much more costly than other people realised, and even then that it would likely come out in some other form
- Freud proposed his theory of the Narcissism of Small Differences: the people we hate most are not those with values completely foreign to our own, but those we are in close proximity to. He describes this as a form of narcissism since these conflicts can flare up over the most minor of differences.
- Freud suggested that those who struggled the most with temptation were saints, since their self-denial led to the constant frustration of their desires
- Freud noted how absurd “Love your neighbour as yourself” would sound to someone hearing it for the first time. He imagines that we’d skeptically ask questions like “Why should I care about them just as much as my family?” and “Why should I love them if they are bad people or don’t love me?”. He actually goes further and argues that “a love that does not discriminate does injustice to its object”.
If this were a story, there’d be some kind of academy taking in humanity’s top talent and skilling them up in alignment.
Most of the summer fellowships seem focused on finding talent that is immediately useful. And I can see how this is tempting given the vast numbers of experienced and talented folks seeking to enter the space. I’d even go so far as to suggest that the majority of our efforts should probably be focused on finding people who will be useful fairly quickly.
Nonetheless, it does seem as though there should be at least one program that aims to find the best talent (even if they aren’t immediately useful) and which provides them with the freedom to explore and the intellectual environment in which to do so.
I wish I could articulate my intuition behind this more clearly, but the best I can say for now is that continuing to scale existing fellowships would likely provide decreasing marginal returns, whereas such an academy wouldn’t be subject to this because it would be providing a different kind of talent.
Upskilling bright young people “to do alignment” is tricky to do in a systematic way, since bright young people want / need to do whatever they’re curious about.
Maybe you’re thinking younger than I was thinking.
I expect you’d mostly want folks who’d already completed an undergraduate degree, with sufficiently talented folks being pulled in earlier.
I think SPARC and its descendants are something like this.
How long does SPARC go for?
Thoughts on the introduction of Goodhart’s Law: Currently, I’m more motivated by trying to make the leaderboard, so maybe that suggests that merely introducing a leaderboard, without actually paying people, would have had much the same effect. Then again, that might just be because I’m not that far off. And if there hadn’t been the payment, maybe I wouldn’t have ended up in the position where I’m not that far off.
I guess I feel incentivised to post a lot more than I would otherwise, especially in the comments rather than in posts, since posting a lot of posts likely suppresses the number of people reading your other posts. This probably isn’t a worthwhile tradeoff, given that one post that does really well can easily outweigh 4 or 5 posts that only do okay, or ten posts that are meh.
Another thing: downvotes feel a lot more personal when it means that you miss out on landing on the leaderboard. This leads me to think that having a leaderboard for the long term would likely be negative and create division.
I really like the short-form feature because after I have articulated a thought my head feels much clearer. I suppose that I could have tried just writing it down in a journal or something; but for some reason I don’t feel quite the same effect unless I post it publicly.
This is the first classic that I’m reviewing. One of the challenges with figuring out which classics to read is that there are always people speaking very highly of them, in a manner vague enough that it’s hard to decide whether to read them. Hopefully I can avoid this trap.
Book Review: Animal Farm
You probably already know the story. In a thinly veiled critique of the Russian Revolution, the animals on a farm decide to revolt against the farmer and run the farm themselves. At the start, the Seven Commandments of Animalism are idealistically declared, but as time goes on, things increasingly seem to head downhill…
Why is this a classic?: This book was released at a time when the intellectual class was firmly sympathetic to the Soviets, ensuring controversy and then immortality when history proved it right.
Why you might want to read this: It’s short (only 112 pages or 3:11 on Audible), the story always moves along at a brisk pace, the writing is engaging, and there are a few very emotionally impactful moments. The broader message of being wary of the promises made by idealistic movements still holds (especially “all animals are equal, but some animals are more equal than others”). This book does a good job of illustrating many of the social dynamics that occur under totalitarianism, from the rewriting of history, to the false confessions, to the cult of the individual.
Why you might not want to read this: The concrete anti-Soviet message is of little relevance now given that what happened is common knowledge. You can probably already guess how the story goes: the movement has a promising start, but with small red flags that become bigger over time. The animals are constantly, unrealistically naive; maybe this strikes you as clumsy, or maybe you see that as just how satire works.
Wow, I’ve really been flying through books recently. Just thought I should mention that I’m looking for recommendations for audio books; bonus points for books that are short. Anyway....
Book Review: Zero to One
Peter Thiel is the most famous contrarian in Silicon Valley. I really enjoyed hearing someone argue against the common wisdom of the valley. Most people think in terms of beating the competition; Thiel thinks in terms of establishing a monopoly so that there is no competition. Agile methodology and the lean startup are all the rage, but Thiel argues that these only lead to incremental improvements and that truly changing the world requires you to commit to a vision. Most companies want to disrupt their competitors, but for Thiel this means that you’ve fallen into competition instead of forging your own unique path. Most venture funds aim to diversify, but Thiel is more selective, only investing in companies that have billion-dollar potential. Many startups spurn marketing, but Thiel argues that this is dishonest and that PR is also a form of marketing, even if that isn’t anyone’s job title. Everyone is betting on AI replacing humans, while Thiel is more optimistic about human/AI teams.
Some elaboration is in order, but I’ll just mention that you might prefer to read the review on Slatestarcodex instead of mine (https://slatestarcodex.com/20…/…/31/book-review-zero-to-one/)
• Aren’t monopolies bad? Thiel argues that monopoly power is what allows a corporation to survive the brutal world of competition. This means that it can pay employees well, have social values other than making a profit and invest in the future. Read Scott’s review for a discussion on how to build a company that truly is one of a kind.
• Thiel argues that monopolies try to hide this fact by presenting themselves as just one player in a larger industry (e.g. Google presents itself as a tech company, instead of an internet advertising company, even though this aspect brings in essentially all the money), while competitive firms try to present themselves as having cornered an overly specific market (e.g. it isn’t clear that British food in Palo Alto is its own market as opposed to competing against all the other food chains)
• In addition to splitting people into optimists and pessimists, Thiel splits people into definite and indefinite. You might think that a “definite optimist” would be someone who is an optimist and 100% certain the future will go well, but what he actually means is that they are an optimist who has an idea of what the future will or could look like. In contrast, an indefinite optimist is an optimist who has no idea how exactly the world might improve or change.
• Thiel argues that startup returns are distributed according to a power law, such that half of the return from a portfolio might come from just one company. He applies this to life too, arguing that it’s better to set yourself up so that there’ll be one career that you’ll be amazing at, rather than studying generally so that there’ll be a dozen that you’d be only okay at.
• While many in the valley believe in just building a product and figuring out how to sell it later, Thiel argues that you don’t have a product if you don’t have a way of reaching customers
I’m not involved in startups, so I can’t vouch for how good his advice is, but given that caveat, I’d strongly recommend it for anyone thinking of going into that space, since it’s always good to have your views challenged. But I’d also recommend it as a general read; I think there’s a lot that’d be interesting for a general audience, especially the argument against acquiring broad, undifferentiated experience. I do think that in order to get the most out of it, you’d need to already be familiar with startup culture (i.e. minimum viable products, the lean startup, etc.), as he kind of assumes that you know this stuff.
So should you read the book or just Scott’s review? The main aspect Scott misses is the discussion of power-law distributions. This discussion is basically the Pareto Principle on steroids: when a single billion-dollar company could make you more profit than the rest of your investments combined, all that matters is whether a company could be a unicorn or not (the essay Prospecting for Gold makes a similar point for EA: https://www.effectivealtruism.org/…/prospecting-for-gold-o…/). But apart from that, Scott’s review covers most of the main ideas well. So maybe you could skip the book, but if you’re like me, you might find that you need to read the book in order to actually remember these ideas. Besides, it’s concise and well-written.
I think that there’s good reasons why the discussion on Less Wrong has turned increasingly towards AI Alignment, but I am also somewhat disappointed that there’s no longer a space focusing on rationality per se.
Just as the Alignment Forum exists as a separate space that automatically cross-posts to LW, I’m starting to wonder if we need a rationality forum that exists as a separate space that cross-posts to LW, since if I were just interested in improving my rationality I don’t know if I’d come to Less Wrong.
(To clarify, unlike the Alignment Forum, I’d expect such a forum to be open-invite b/c the challenge would be gaining any content at all).
Alternatively, I think there is a way to hide the AI content on LW, but perhaps there should exist a very convenient and visible user interface for that. I would propose an extreme solution, like a banner on the top of the page containing a checkbox that hides all AI content. So that anyone, registered or not, could turn the AI content off in one click.
The Alignment Forum works because there are a bunch of people who professionally pursue research on AI Alignment. There’s no similar group of people for whom that’s true of rationality.
I don’t know if you need professionals, just a bunch of people who are interested in discussing the topic. It wouldn’t need to use the Alignment Forum’s invite-only system.
Instead, it would just be a way to allow LW to cater to both audiences at the same time.
IIRC, you can only post on the Alignment Forum if you are invited or if moderators crosspost it? The problem is that the Alignment Forum is deliberately for some sort of professionals, but everyone wants to write about alignment. Maybe it would be better if we had an “Alignment Forum for starters”.
One thing I’m finding quite surprising about shortform is how long some of these posts are. It seems that many people are using this feature to indicate that they’ve just written up these ideas quickly in the hope that the feedback is less harsh. This seems valuable; the feedback here can be incredibly harsh at times and I don’t doubt that this has discouraged many people from posting.
I pushed a bit for the name ‘scratchpad’ so that this use case was a bit clearer (or at least not subtly implied as “wrong”). Shortform had enough momentum as a name that it was a bit hard to change tho. (Meanwhile, I settled for ‘shortform’ meaning either the writing is short, or it took a (relatively) short amount of time to write.)
“I’m sorry, I didn’t have the time to write you a short email, so I wrote you a long one instead.”
Can confirm. I don’t post on normal lesswrong because the discourse is brutal.
Collapsible boxes are amazing. You should consider using them in your posts.
They are a particularly nice way of providing a skippable aside. For example, filling in background information, answering an FAQ or including evidence to support an assertion.
Compared to footnotes, collapsible boxes are more prominent and better suited to containing paragraphs or formatted text.
Less Wrong might want to consider looking for VC funding for their forum software in order to deal with the funding crunch. It’s great software. It wouldn’t surprise me if there were businesses who would pay for it and it could allow an increase in the rate of development. There’s several ways this could go wrong, but it at least seems worth considering.
I’m sure they thought about it.
I think this is dramatically tougher than a lot of people think. I wrote more about it here.
https://www.facebook.com/ozzie.gooen/posts/pfbid0377Ga4W8eK89aPXDkEndGtKTgfR34QXxxNCtwvdPsMifSZBY8abLmhfybtMUkLd8Tl
Why the focus on wise AI advisors? (Metapost[1] with FAQ 📚🙋🏻♂️, Informal[2] Edition 🕺, Working Draft 🛠️)
About this Post - 🔗:
🕺 wiseaiadvisors.com, 🕴️ Formal Edition (coming soon)
This post is still in draft, so any feedback would be greatly appreciated 🙏. It’ll be posted as a full, proper Less Wrong/EA Forum/Alignment Forum post, as opposed to just a short-form, when it’s ready 🌱🌿🌳.
✉️ PM via LW profile📲 QR Code
This post is a collaboration between Chris Leong (primary author) and Christopher Clay (editor), written in the voice of Chris Leong.
We have worked very hard on this[3] and we hope you find it to be of some use. Despite this work, it will most likely contain enough flaws that we will be at least somewhat embarrassed at having written at least parts of it, and yet at some point a post has to go out into the world to fend for itself.
There’s no guarantee that this post will have the kind of impact that we deeply wish for, or even achieve anything at all… And yet, regardless[4], it is an honour to serve[5], especially at this dark hour when AGI looms on the horizon and doubt has begun to creep into the hearts of men[6] 🫡. The sheer scale of the problem feels overwhelming at times, and yet… we do what we can[7].
This post[8] doubles[9] as both a serious attempt to produce an original “alignment proposal”[10] and a passion project[11]. The AI safety community has spent nearly two and a half decades trying to convince people to take AI risks seriously. Whilst there have been significant successes, these have also fallen short[12]. Undoubtedly this is incredibly naive[13], but perhaps passion and authenticity[14] can succeed where arguments and logic have failed ✨🌠.
Acknowledgements:
The initial draft of this post was produced during the 10th edition of AI Safety Camp. Thanks to my team: Matthew Hampton, Richard Kroon, and Chris Cooper, for their feedback. Unlike most other AI Safety Camp projects, which focus on a single, unitary project, we were more of a research collective with each person pursuing their own individual projects.
I am continuing development during the 2025 Summer Edition of the Cambridge ERA: AI Fellowship with an eye to eventually produce a more formal output. Thanks to my research manager, Peter Gebauer and my mentor, Prof. David Manley.
I also greatly appreciate feedback provided by many others[15], including: Jonathan Kummerfeld.
✒️ Selected Quotes:
🕵️Did you and the other scientists not stop to consider the implications of what you were creating? — Roger Robb
When you see something that is technically sweet, you go ahead and do it and you argue about what to do about it only after you have had your technical success. That is the way it was with the atomic bomb[16]— Oppenheimer
We stand at a crucial moment in the history of our species. Fueled by technological progress, our power has grown so great that for the first time in humanity’s long history, we have the capacity to destroy ourselves—severing our entire future and everything we could become.
Yet humanity’s wisdom has grown only falteringly, if at all, and lags dangerously behind. Humanity lacks the maturity, coordination and foresight necessary to avoid making mistakes from which we could never recover. As the gap between our power and our wisdom grows, our future is subject to an ever-increasing level of risk. This situation is unsustainable. So over the next few centuries[20], humanity will be tested: it will either act decisively to protect itself and its long-term potential, or, in all likelihood, this will be lost forever — Toby Ord, The Precipice[21]
We have created a Star Wars civilization, with Stone Age emotions, medieval institutions, and godlike technology — Edward O. Wilson, The Social Conquest of Earth[22]
Before the prospect of an intelligence explosion, we humans are like small children playing with a bomb. Such is the mismatch between the power of our plaything and the immaturity of our conduct — Nick Bostrom, Founder of the Future of Humanity Institute, Superintelligence[23]
If we continue to accumulate only power and not wisdom, we will surely destroy ourselves — Carl Sagan, Pale Blue Dot[24]
Never has humanity had such power over itself, yet nothing ensures that it will be used wisely, particularly when we consider how it is currently being used…There is a tendency to believe that every increase in power means “an increase of ‘progress’ itself ”, an advance in “security, usefulness, welfare and vigour; …an assimilation of new values into the stream of culture”, as if reality, goodness and truth automatically flow from technological and economic power as such. — Pope Francis, Laudato si’[25]
The fundamental test is how wisely we will guide this transformation – how we minimize the risks and maximize the potential for good — António Guterres, Secretary-General of the United Nations[26]
Our future is a race between the growing power of our technology and the wisdom with which we use it. Let’s make sure that wisdom wins — Stephen Hawking, Brief Answers to the Big Questions[27]
🎁 Additional quotes (from Life Itself)🔮 Preview
📖 𝙱𝙰𝚂𝙸𝙲 𝚃𝙴𝚁𝙼𝙸𝙽𝙾𝙻𝙾𝙶𝚈 - Wise AI advisors? Wisdom? (I recommend skipping initially) 🙏
⬇️ I suggest initially skipping this section and focusing on the core argument for now[33] ⬇️
Wise AI Advisors?
Do you mean?:
• AIs trained to provide advice to humans ✅
• AIs trained to act wisely in the world ❌
• Humans trained to provide wise advice about AI ❌
Most work on Wise AI focuses on the question of how AI could learn to act wisely in the world[34]; however, I’m more interested in AI advisors, as this allows humans to compensate for the weaknesses in the AIs[35].
Even though Wise AI Advisors doesn’t refer to humans, I am primarily interested in how Wise AI Advisors could be deployed as part of a cybernetic system.
Training humans to be wiser would help with this project:
• Wiser humans can train wiser AI
• When we combine AI and humans into a cybernetic system, wiser humans will be better able to elicit capabilities from the AI and better able to plug any gaps in the AI’s wisdom.
What do you mean by wisdom?
For the purposes of the following argument, I’d encourage you to first consider this in relation to how you conceive of wisdom rather than worrying too much about how I conceive of wisdom. Two main reasons:
• I suspect this reduces the chance of losing someone partway through the argument because they conceive of wisdom slightly differently than I do[36].
• I believe that there are many different types of wisdom that are useful for steering the world in a positive direction and many perspectives on wisdom that are worth investigating. I’m encouraging readers to first consider this argument in relation to their own understanding of wisdom in order to increase the diversity of approaches pursued.
Even though I’d encourage you to read the core argument first, if you really want to hear more about how I conceive of wisdom right now, you can scroll down to the Clarifications section to find out more about what I believe. 🔜🧘 or ⬇️⬇️⬇️😔
⭐ My core argument… in a single sentence ☝️⭐
😮 Absolutely critical? That’s a strong claim! - 🛡️[37]
Yes, I’ve[38] intentionally chosen a high bar for what I’ll be arguing for in this post. I believe that there’s a strong case to be made that the gameboard looks much less promising without them[39][40].
Don’t worry, I’ll be addressing a long list of objections[41] ⚔️🌎. Let’s go 🪂!
☞ Three Key Advantages:
✳️🗺️ — This is complementary with almost any plan for making AGI go well.
🆓🍒 — The opportunity cost is minimal. Financially, much of this work could be pursued as a startup, and I expect non-profit projects to appeal to new funders. Talent-wise, I expect pursuing this direction to (counter-intuitively) increase the effective talent available for other directions by drawing more talent into the space[42].
🦋🌪️ — Even modest boosts in wisdom could be significant. Single decisions often shape the course of history.
(See the main post for more detail.)
Five additional benefits
1) 🚨💪🦾 This approach scales with increases in capabilities.
2) Forget marginal improvements: wisdom tech could provide the missing piece for another strategy by allowing us to pursue it in a wiser manner.
3) Wisdom technology is likely favourable from a differential technology perspective. Many actors are only reckless because they’re unwise.
4) Even a small coalition using these advisors to refine strategy, improve coordination and engage in non-manipulative persuasion could significantly shift the course of civilisation over time.
5) Suppose all else fails: training up a bunch of folks with both a deep understanding of the alignment problem and a strong understanding of wisdom seems incredibly useful.
Please note: This post is still a work in progress. Some sections have undergone more editing and refinement than others. Some bits may be inconsistent, such as if I’m in the middle of making a change. As this is just a draft, I may not end up endorsing all the claims made. Feedback is greatly appreciated 🙏.
⭐ My 𝙲𝙾𝚁𝙴 𝙰𝚁𝙶𝚄𝙼𝙴𝙽𝚃… in a single sentence ☝️⭐
Unaided human wisdom is vastly insufficient for the task before us…
of navigating an entire series of highly uncertain and deeply contested decisions
where a single mistake could prove ruinous
with a greatly compressed timeline
⭐ … or in less than three minutes[45]⏱️⭐
Two useful framings:
Going through them in reverse order:
🅅 𝚄 𝙻 𝙽 𝙴 𝚁 𝙰 𝙱 𝙸 𝙻 𝙸 𝚃 𝚈 – 🌊🚣:
The development of advanced AI technologies will have a massive impact on society given the essentially infinite ways to deploy such a general technology. There are lots of ways this could go well, and lots of ways this could go extremely poorly.
In more detail...
i) At this stage I’m not claiming any particular timelines.
I believe it’s likely to be quite fast, but I don’t make this claim until we get to Speed.
I suspect that often when people doubt this claim, they’ve implicitly assumed that I was talking about the short or medium term, rather than the long term 🤔. After all, the claim that there are many ways that AI could plausibly lead to dramatic benefits or harms over the next 50 or 100 years seems extremely robust. There are many things that a true artificial general intelligence could do. It’s mainly just a question of how long it takes to develop the technology.
ii) “We only have to be lucky once, you have to be lucky every time”—the IRA on the offense-defense balance. Unfortunately, if there’s one thing computers are good at, it’s persistence.
iii) “Mossad is much more clever and powerful than novices implicitly imagine a “superintelligence” will be; in the sense that, when novices ask themselves what a “superintelligence” will be able to do, they fall well short of the actual Mossad.”—Eliezer Yudkowsky
🅄 𝙽 𝙲 𝙴 𝚁 𝚃 𝙰 𝙸 𝙽 𝚃 𝚈 – 🌅💥:
We have massive disagreement on what to expect from the development of AI, let alone on the best strategy[49]. Making the wrong call could prove catastrophic.
In more detail...
i) A lot of this uncertainty just seems inherently really hard to resolve. Predicting the future is hard.
ii) However hard this is to resolve in theory, it’s worse in practice. Instead of an objective search for the truth, these discussions are distorted by many different factors, including money, social status and the need for meaning.
iii) More on the kicker: We’re seeing increasing polarisation, less trust in media and experts[51] and AI stands to make this worse. This is not where we want to be starting from and who knows how long this might take to resolve?
🅂 𝙿 𝙴 𝙴 𝙳 – 😱⏳:
AI is developing incredibly rapidly… We have limited time to act and to figure out how to act[52].
In more detail...
i) Even if timelines aren’t short, we might still be in trouble if the take-off speed is fast. Unfortunately, humanity is not very good at preparing for abstract, speculative-seeming threats ahead of time.
ii) Even if neither timelines nor take-off speeds are fast in an absolute sense, we might still expect disaster if they are fast in a relative sense. Governance, especially global governance, tends to proceed rather slowly. Even though it can happen much faster when there’s a crisis, some problems need to be solved ahead of time; once you’re in them, it’s too late. As an example, once an AI-induced pandemic is spreading, you may have already lost.
iii) Even if neither timelines nor take-off speeds are fast in an absolute or relative sense, it’s just fundamentally hard to regulate or control agents that are capable of acting far faster than humans, especially if we develop agents that are adaptive.
Reflections—Why the SUV Triad is Fucking[56] Scary
The speed of development makes this problem much harder. Even if alignment were easy and governance didn’t require anything special, we could still fail because it’s been decided that we have to race as fast as possible.
Even if a threat can’t lead to catastrophe, it can still distract us from those that can. It’s hard to avoid catastrophe when we don’t know where to focus our efforts ⚪🪙⚫.
Many of the threats constitute civilisational-level risks by themselves. We could successfully navigate all the other threats, but drop the ball just once and all of that could be for naught.
Big if true: It may even present a reason (draft) to expect Disaster-By-Default ‼️
In more detail… - TODO:
i) In an adversarial environment, your vulnerability is determined by the domain where you’re most vulnerable. Your weakest link. Worse, this applies recursively. Your vulnerability within a domain is determined by the kind of attack you’re least able to withstand.
ii) Civilisation has limited co-ordination capacity. A committee can only cover so many issues in a given time. Global co-ordination is incredibly difficult to achieve—but it’s possible. However, one at a time is a lot easier than dozens. 😅⏳💣💣💣
iii) Talk about resilience being hard and challenges with general agents.
<TODO: Feels like there’s some more implicit subclaims here I need to address>
Subclaim 1: As humanity gains access to more power, we need more wisdom in order to navigate it
This claim is almost a cliche, but is it true?
I think that it is, at least in the case of AI:
The more powerful a technology is, the greater the cost of accidentally “dropping a ball”
As a general technology, the more powerful AI becomes, the more balls there are to drop
As AI becomes more powerful, we venture further and further away from past experience (we could even call this the societal training distribution)
In more detail...
Objection: AI can help us reduce our chance of dropping a ball
Response: I agree, that’s why I’m proposing this research direction, but I don’t think this happens by default.
One possibility is that we create strongly aligned general defender agents (seems unlikely) that we give mostly free rein (very risky).
Or we need AI that helps us make wise decisions ahead of time (where to allocate resources, where to concentrate oversight), which is much harder to train for than bio/cyber/manipulation capabilities.
Question: What about society’s natural process of accumulating wisdom?
Let’s take a step back. If we construe wisdom broadly, then we end up with two kinds of wisdom[3]:
Hindsight: Wisdom gained through hard, bitter experience
Foresight: Being able to figure out what needs to be done even without direct experience
The slower the rate of development, the more we can lean on hindsight; the faster it is, the more we need to lean on foresight.
Our timeline seems to be fast, so unfortunately it seems that we will need to rely heavily on foresight, instead of society’s natural process of accumulating metis (or practical wisdom) through trial and error.
Subclaim 2: By default, increases in capabilities don’t result in increases in wisdom
In more detail...
I propose that the most straightforward way to address this is to train wise AI advisors. But what about the alternatives?:
Perhaps we need to buy time, that is, to pause? As previously discussed, there are incredibly strong forces pushing against this. Even if a pause is necessary, it seems unlikely that we could persuade enough actors to see this, agree on a robust mechanism and unpause at the appropriate time without some kind of major increase in wisdom.
Given the number of different threats and how fast they’re coming at us, whack-a-mole simply isn’t going to cut it[58]. We need general solutions.
Whilst we can and should be developing human wisdom as fast as possible, this process tends to be slow. I don’t believe that this will cut it by itself. In any case, I expect increases in human wisdom and increases in AI wisdom to be strongly complementary.
In more detail...
TODO: This diagram still needs to be updated. I don’t want the alternatives I address to look like just a random list. Providing a diagram that shows how we could try to operate on different points helps make this seem more legible.
TODO: Address more exotic responses.
In light of this:
📌 I believe that AI is much more likely to go well for humanity if we develop wise AI advisors
I am skeptical of the main alternatives
I am serious about making this happen...
If you are serious about this too, please:
✉️ Message me[59].
☞ Or for a More Formal Analysis (Using Three Different Frameworks) — TODO
Importance-Tractability-Neglectedness
This is a standard EA framework for considering cause areas. Wise AI is broad enough that I consider it reasonable to analyse it as a cause area.
Safety-Freedom-Value
This framework is designed to appeal more to startup/open-source folks. These folks are more likely to put significant weight on the social value a technology provides and on freedom beyond mere utility.
Searching for Solutions
This framework is designed for folk who think current techniques are unlikely to work.
☞ I’m sold! How can I get involved? 🥳🎁
As I said, if you’re serious, please ✉️ PM me. If you think you might be serious but need to talk it through, please reach out as well. It’d be useful for me to know your background and how you think you could contribute. Maybe tell me a few facts about yourself, what interests you, and drop a link to your LinkedIn profile?
If you scroll down, you’ll see that I’ve answered some more questions about getting involved, but I’ll include some useful links here as well:
• List of potentially useful projects: for those who want to know what could be done concretely. Scroll down further for a list of project lists ⬇️.
• Resources for founding an AI safety startup or non-profit: since I believe there should be multiple organisations pursuing this agenda.
• Resources for getting started in AI Safety more broadly: since some of these resources might be useful here as well.
🤔 Clarifications
Reminder: If you’re looking for a definition of wise AI advisors, it’s at the 🔝 of the page.
I’ve read through your argument and substituted my own understanding of wisdom. Now that I’ve done this, perhaps you could clarify how you think about wisdom? ✅🙏
Sure. I’ve written at the top of this post ⬆️ why I often try to dodge this question initially. But given that you’ve made it this far, I’ll share some thoughts. Just don’t over-update on my views 😂.
Given how rapidly AI is developing, I suspect that we’re unlikely to resolve the millennia-long philosophical debate about the true nature of wisdom before AGI is built. Therefore, I suggest that we instead sidestep this question by identifying specific capabilities related to wisdom that might be useful for steering the world in a positive direction.
I’d suggest that examples might include: the ability to make wise strategic decisions, non-manipulative persuasion and the ability to find win-wins. I’ll try to write up a longer list in the future.
Some of these capabilities will be more or less useful for steering the world in positive directions. On the other hand, some will have negative externalities, such as accelerating timelines or enabling malicious actors.
My goal is to figure out which capabilities to prioritise by balancing the benefits against the costs and then incorporating feasibility.
It may be the case that some people decide that wisdom requires different capabilities than those I end up deciding are important. As long as they pick a capability that isn’t net-negative, I don’t see that as bad. In fact, I see different people pursuing different understandings of wisdom as adding robustness to different world models.
If wisdom ultimately breaks down into specific capabilities, why not simply talk about these capabilities and avoid using a vague concept like wisdom? 🙅♂️🌐
So the question is: “Why do I want to break wisdom down into separate capabilities instead of choosing a unitary definition of wisdom and attempting to train that into an AI system?”
Firstly, I think the chance of us being able to steer the world towards a positive direction is much higher if we’re able to combine multiple capabilities together, so it makes sense to have a handle for the broader project, in addition to handles for individual sub-projects. I believe that techniques designed for one capability will often carry over to other capabilities, as will the challenges, and having a larger handle makes it easier to make these connections. I also think there’s a chance that these capabilities amplify each other (as per the final few paragraphs of
Imagining and Building Wise Machines[60] by Johnson, Bengio, Grossmann, et al).
Secondly, I believe we should be aiming to increase both human wisdom and AI wisdom simultaneously. In particular, I believe it’s important to increase the wisdom of folks creating AI systems and that this will then prove useful for a wide variety of specific capabilities that we might wish to train.
Finally, I’m interested in investigating this frame as part of a more ambitious plan to solve alignment on a principled level. Instead of limiting the win condition to building an AI that always (competently) acts in line with human values, the wise AI Advisors frame broadens it such that the AI only needs to inspire humans to make the right decision. It’s hard to know in advance whether this reframing will be any easier, but even if it doesn’t help, I have a strong intuition that understanding why it doesn’t help would shed light on the barriers to solving the core alignment problem.
Weren’t you focused on wise AI advisors via Imitation Learning before? 🎯
Yep, I was focused on it before. I now see that goal as overly narrow. The goal is to produce wise AI Advisors via any means. I think that Imitation Learning is underrated, but there are lots of other approaches that are worth exploring as well.
✋ Objections
Isn’t this argument a bit pessimistic? 🙅♂️⚖️. I prefer to be optimistic. 👌
Optimism isn’t about burying your head in the sand and ignoring the massive challenges facing us. That’s denialism. Optimism is about rolling up your sleeves and doing what needs to be done.
The nice thing about this proposal from an optimistic standpoint is that, assuming there is a way for AI to go well for humanity, then it seems natural to expect that there is some way to leverage AI to help us find it[61].
Additionally, the argument for developing wise AI advisors isn’t in any way contingent on a pessimistic view of the world. Even if you think AI is likely to go well by default, wise AI advisors could still be of great assistance for making things go even better. For example, facilitating negotiations between powers, navigating the safety-openness tradeoff and minimising any transitional issues.
But I don’t think timelines are short. They could be long. Like more than a decade. 👌🤷♂️
Short timelines add force to the argument I’ve made above, but they aren’t at all a necessary component[62].
Even if AI will be developed over decades rather than years, there are still enough different challenges and key decisions that unaugmented human wisdom is unlikely to be sufficient.
In fact, my proposal might even work better over long timelines, as it provides more time for AI advisers to help steer the world in a positive direction[63].
Don’t companies have a commercial incentive to train wise AI by default? 🤏
I’m extremely worried about the incentives created by a general chatbot product. The average user is low-context, and this creates an incentive towards sycophancy 🤥.
I believe that a product aimed at providing advice for critically important decisions would create better incentives, even if it were created by the same company.
Furthermore, given the potential for short timelines, it seems extremely risky 🎰 to rely purely on the profit motive, especially since there is a much stronger profit motive to pursue capabilities 💰💰💰. A few months’ delay could easily mean that a wise AI advisor isn’t available for a crucial decision 🚌💨. Humanity has probably already missed having such advisors available to assist us during a number of key decision points 😭.
Won’t this just be used by malicious actors? Doesn’t this just accelerate capabilities? 🤔❌
I expect both the benefits and the externalities to vary hugely by capability. I expect some to be positive, some to be negative and some to be extremely hard to determine. More work is required to figure out which capabilities are best from a differential technology development perspective.
I understand that this answer might be frustrating, but I think it’s worth sharing these ideas even though I haven’t yet had time to run this analysis. I have a list of projects that I hope will prove fairly robust, despite all the uncertainty.
Is there value in wisdom given that wisdom is often illegible and this makes it non-verifiable? ✔️💯
Oscar Delany makes this argument in Tentatively against making AIs ‘wise’ (runner-up in the AI Impacts Essay Competition). This will depend on your definition of wisdom.
I admit that this tends to be a problem with how I conceive of wisdom; however, I will note that Imagining and Building Wise Machines (summary) takes the opposite stance: wisdom, conceived of as metacognition, can actually assist with explainability.
But let’s assume that this is a problem. How much of a problem is it?
I suspect that this varies significantly by actor. There’s a reasonable argument that public institutions shouldn’t be using such tools for reasons of justice. However, these arguments have much less force when it comes to private actors.
Even for private actors, it makes sense to use more legible techniques as much as possible, but I don’t think this will be sufficient for all decisions. In particular, I don’t think objective reasoning is sufficient for navigating the key decisions facing society in the transition to advanced AI.
But I also want to push back against the claim of non-verifiability. You can do things like run a pool of advisors and only take dramatic action if more than a certain proportion agree; plus you can do testing, even advanced things like latent adversarial testing. It’s not as verifiable as we’d like, but it’s not as though we’re completely helpless here.
There will also be ways to combine wise AI advisors with more legible systems.
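To make the pool-of-advisors idea slightly more concrete, here is a minimal sketch in Python. It is purely my own illustration rather than anything from this proposal; the function name and the quorum threshold are hypothetical choices.

```python
# A minimal sketch (my own illustration, not from this post) of the "pool of advisors" safeguard:
# only act on a dramatic recommendation if a sufficient proportion of independent advisors agree.
from collections import Counter

def should_act(recommendations, quorum=0.8):
    """recommendations: one string per advisor. Returns (act?, most common recommendation)."""
    top, count = Counter(recommendations).most_common(1)[0]
    return count / len(recommendations) >= quorum, top

# Example: 4 of 5 hypothetical advisors agree, which clears a 0.8 quorum but not a 0.9 quorum.
print(should_act(["pause deployment"] * 4 + ["continue"], quorum=0.8))  # (True, 'pause deployment')
print(should_act(["pause deployment"] * 4 + ["continue"], quorum=0.9))  # (False, 'pause deployment')
```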
I’m mostly worried about inner alignment. Does this proposal address this? 🤞✔️
Inner alignment is an extremely challenging issue. If only we had some… wise AI advisors to help us navigate this problem.
“But this doesn’t solve the problem, this assumes that these advisors are aligned themselves”: Indeed, that is a concern. However, I suspect that the wise AI advisors approach has less exposure to these kinds of risks as it allows us to achieve certain goals at a lower base model capability level.
• Firstly, wise people don’t always have huge amounts of intellectual firepower. So I believe that we will be able to achieve a lot without necessarily using the most powerful models.
• Secondly, the approach of combined human-AI teams allows the humans to compensate for any weaknesses present in the AIs.
In summary, this approach might help in two ways: by reducing exposure and advising us on how to navigate the issue.
🌅 What makes this agenda so great?
Thanks for asking 🥳. Strangely enough I was looking for an excuse to hit a bunch of my key talking points 😂. Here’s my top three:
i: 🤝🧭🌎 Imagine a coalition of organisations attempting to steer the world towards positive outcomes[64], with wise AI advisors helping them refine strategy, improve coordination and engage in non-manipulative persuasion. How much do you think this could shift our trajectory? It seems to me that even a small coalition could make a significant difference over time.
ii: 🆓🍒✳️ Wise AI advisors are most likely complementary with almost any plan for making AGI go well[67]. I expect that much of this work could take the form of a startup and that non-profits working in this space would appeal to new funders, such that there’s very little opportunity cost in terms of pursuing this direction.
iii: 🧑🔬🧭🌅 Wisdom tech seems favourable from a differential technology development perspective. Most AI technology differentially advantages “reckless and unwise” actors, since “responsible and wise” actors[69] need more time to figure out how to deploy a technology. There’s a limit to how much we can speed up human processes, but wise AI advisors could likely reduce this time lag further[70]. There’s also a reasonable chance that wisdom tech allows “reckless and unwise” actors to realise their own foolishness[71]. For those who favour openness, wisdom tech might be safer to distribute.
🎁 Not persuaded yet? Here are three more bonus points:
🦋🌪️: Even modest boosts in wisdom could be significant. Single decisions often shape the course of history. Altering the right decision might be the difference between avoiding and experiencing a catastrophe. Similarly, there are individual actors who could dramatically change the strategic situation by deciding to start acting responsibly.
🚧🌱: Progress on the core part of the alignment problem seems to have stalled. They say insanity is doing the same thing over and over again without success, so perhaps we should be looking for new framings? The wisdom framing seems quite underexplored and potentially fruitful to me[73]. It’s not at all obvious that we want our technology to be intelligent more than we want it to be wise[74]. Furthermore, considering how to steer the world using cybernetic systems involving both humans and AI provides a generalisation of the alignment problem. Maybe neither of these frames will ultimately make sense, but even if that’s the case, I suspect that understanding precisely why they aren’t fruitful would be valuable[75].
🐦⬛🦉: Even if the most ambitious parts of this proposal fail on a concrete level, training up a bunch of folks with both a deep understanding of the alignment problem and wisdom seems pretty darn valuable. Even if we don’t know exactly what these people might do, it feels like a robustly useful skillset for the community to be developing[76].
I believe that this proposal is particularly promising because it has so many different plausible theories of change. It’s hard to know in advance which assumptions will or will not pan out.
You might also be interested in my draft post N Stories of Impact for Wise AI Advisors 🏗️, which attempts to cleanly separate out the various possible theories of impact.
☞ I wasn’t sold before, but I am now. How can I get involved? 🥳🎁
If you’re serious, please ✉️ PM me. If you think you might be serious but need to talk it through, please reach out as well. It’d be useful for me to know your background and how you think you could contribute. Maybe tell me a few facts about yourself, what interests you, and drop a link to your LinkedIn profile?
If you scroll down, you’ll see that I’ve answered some more questions about getting involved, but I’ll include some useful links here as well:
• List of potentially useful projects: for those who want to know what could be done concretely. Scroll down further for a list of project lists ⬇️.
• Resources for founding an AI safety startup or non-profit: since I believe there should be multiple organisations pursuing this agenda.
• Resources for getting started in AI Safety more broadly: since some of these resources might be useful here as well.
Now that we’ve clarified some of the possible theories of impact, the next section will delve more into specific approaches and projects, including listing some (hopefully) robustly beneficial projects that could be pursued in this space.
I also intend for some of my future work to be focused on making things more concrete. As an example, I’m hoping to spend some time during my ERA fellowship attempting to clarify what kinds of wisdom are most important for steering the world in positive directions.
So whilst I think it is important to make this proposal more concrete, I’m not going to rush. Doing things well is often better than doing them as fast as possible. It took a long time for AI safety to move from abstract theoretical discussions to concrete empirical research and I expect it’ll also take some time for these ideas to mature[77].
🌱 What can I do? Is this viable? Is this useful?
☞ Do you have specific projects that might be useful? - 📚 Main List, or expand for other lists
Yes, I created a list: Potentially Useful Projects in Wise AI. It contains a variety of projects, from ones that would be marginally helpful to incredibly ambitious moonshots.
What other project lists exist?
Here are some project lists (or larger resources containing a project list) for wise AI or related areas:
• AI Impacts Essay Competition: Covers the automation of philosophy and wisdom.
• Fellowship on AI for Human Reasoning - Future of Life Foundation: AI tools for coordination and epistemics.
• AI for Epistemics - Benjamin Todd: He writes: “The ideal founding team would cover the bases of: (i) forecasting / decision-making expertise (ii) AI expertise (iii) product and entrepreneurial skills and (iv) knowledge of an initial user-type. Though bear in mind that if you have a gap in one of these areas now, you could probably fill it within a year” and then provides a list of projects.
• Project ideas: Epistemics - Forethought: focuses on improving epistemics in general, not just AI solutions.
Do you have any advice for creating a startup in this space? - 📚 Resources
See the AI Safety & Entrepreneurship wiki page for resources including articles, incubation programs, fiscal sponsorship and funding.
Is it really useful to make AI incrementally wiser? ✔️💯
AI being incrementally wiser might still be the difference between making a correct or incorrect decision at a key point.
Often several incremental improvements stack on top of each other, leading to a major advance.
Further, we can ask AI advisors to advise us on more ambitious projects to train wise AI. And the wiser our initial AI advisors are, the more likely this is to go well. That is, improvements that are initially incremental might be able to be leveraged to gain further improvements.
In the best case, this might kick off a (strong) positive feedback cycle (aka wisdom explosion).
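As a toy illustration of how incremental improvements could compound into such a cycle, here is a minimal sketch. The growth factor r is entirely a made-up assumption of mine; the only point is that per-generation gains above 1 compound, while gains below 1 fizzle out.

```python
# A toy model (my own assumption, not the author's) of the feedback idea: each generation of
# advisors helps train the next, multiplying "effective wisdom" by a factor r per round.
def wisdom_after(rounds, w0=1.0, r=1.15):
    w = w0
    for _ in range(rounds):
        w *= r  # advisors of wisdom w help produce advisors roughly r times wiser
    return w

print(round(wisdom_after(10, r=1.15), 2))  # r > 1: gains compound (about 4.05x after 10 rounds)
print(round(wisdom_after(10, r=0.95), 2))  # r < 1: the cycle fizzles out (about 0.6x)
```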
Sure, you can make incremental progress. But is it really possible to train an AI to be wise in any deep way? 🤞✔️🤷
Possibly 🤷.
I’m not convinced it’s harder than any of the other ambitious alignment agendas and we won’t know how far we can go without giving it a serious effort. Is training an AI to be wise really harder than aligning it? If anything, it seems like a less stringent requirement.
Compare:
• Ambitious mechanistic interpretability aims to perfectly understand how a neural network works at the level of individual weights
• Agent foundations attempts to truly understand what concepts like agency, optimisation, decisions and values are at a fundamental level
• Davidad’s Open Agency Architecture attempts to train AIs that come with proof certificates showing less than a certain probability of unwanted side-effects
Is it obvious that any of these are easier than training a truly wise AI advisor? I can’t answer for you, but it isn’t obvious to me.
Given the stakes, I think it is worth pursuing ambitious agendas anyway. Even if you think timelines are short, it’s hard to justify holding a probability approaching 100%, so it makes sense for folk to be pursuing plans on different timelines.
I understand your point about all ambitious alignment proposals being extremely challenging, but do you have any specific ideas for training AI to be deeply wise (even if speculative)? ✅
This section provides a high-level overview of some of my more speculative ideas. Whilst these ideas are very far from fully developed, I nonetheless thought it was worthwhile sharing an early version of them. That said, I’m very much of the ‘let a hundred flowers bloom’ school: whilst I think the approach I propose is promising, I also think it’s more likely than not that someone else comes up with an even better idea.
I believe that imitation learning is greatly underrated. I have a strong intuition that using the standard ML approach for training wisdom will fail because of the traditional Goodhart’s law reasons, except that it’ll be worse because wisdom is such a fuzzy thing (Who the hell knows what wisdom really is?).
I feel that training a deeply wise AI system requires an objective function that we can optimise hard on. This naturally draws me towards imitation learning. It has its flaws, but it certainly seems like we could optimise much harder on an imitation function than with attempting to train wisdom directly.
Now, imitation learning is often thought of as weak and there’s certainly some truth in this when we’re just talking about an initial imitation model. However, this isn’t a cap on the power of imitation learning, only a floor. There’s nothing stopping us from training a bunch of such models and using amplification techniques such as debate, trees of agents or even iterated distillation and amplification. I expect there to be many other such techniques for amplifying your initial imitation models.
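As a very rough sketch of what this two-stage shape could look like, here is some illustrative PyTorch code: behavioural cloning on demonstrations from a (hypothetical) wise advisor, followed by a crude amplification step that simply aggregates several independently trained imitators. Everything here, from the toy architecture to the averaging committee, is a stand-in of my own rather than a proposed implementation.

```python
# Stage 1: behavioural cloning, i.e. an imitation objective we can optimise hard on.
# Stage 2: a crude "amplification" step that aggregates several independently trained imitators.
# All shapes, names and hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def train_imitator(features, target_advice, epochs=10):
    """Fit a small model to reproduce a (hypothetical) wise advisor's judgements."""
    model = nn.Sequential(
        nn.Linear(features.shape[1], 64),
        nn.ReLU(),
        nn.Linear(64, target_advice.shape[1]),
    )
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loader = DataLoader(TensorDataset(features, target_advice), batch_size=32, shuffle=True)
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = nn.functional.mse_loss(model(x), y)  # pure imitation loss
            loss.backward()
            opt.step()
    return model

def committee_advice(models, x):
    """Average the outputs of several independently trained imitators (a very weak form of amplification)."""
    with torch.no_grad():
        return torch.stack([m(x) for m in models]).mean(dim=0)
```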
See An Overview of “Obvious” Approaches to Training Wise AI Advisors for more information. This post was the first half of my entry into the AI Impacts Essay Competition on the Automation of Wisdom and Philosophy, which placed third.
Why mightn’t I want to work on this?
I think it’s important to consider whether you might not be the right person to make progress on this.
I mean — wisdom? Who the hell knows what that is?
I expect most people who try to gain traction on this to just end up confusing themselves and others. So, you probably shouldn’t work on this unless you’re a particularly good fit.
On the other hand, there are very strong selection effects where those who are fools think they are wise and those who are wise doubt their own wisdom.
I wish I had a good answer here. All I can say is to avoid both arrogance and performative modesty when reflecting on this question.
Many of the methods you’ve proposed are non-embodied. Doesn’t wisdom require embodiment? 🤔❌🤷
There’s a very simplistic version of this argument that proves too much: people have used this claim to argue that LLMs would never be able to learn to reason, whilst o3 seems to have conclusively disproven this. So I don’t think that the standard version of this argument works.
It’s also worth noting that in robotics, there are examples of zero-shot transfer to the real world. Even if wisdom isn’t directly analogous, this suggests that large amounts of real-world experience might be less crucial than it appears at first glance.
All this said, I find it plausible that a more sophisticated version of this argument might pose a greater challenge. Even if this were the case, I see no reason why we couldn’t combine both embodied and non-embodied methods. So even if a more sophisticated version ultimately turned out to be true, this still wouldn’t demonstrate that research into non-embodied methods is pointless.
✍️ Conclusion:
Coming soon 😉.
☞ Okay, you’ve convinced me now. Any chance you could repeat how to get involved so I don’t have to scroll up? ✅📚
Sure 😊. Round 1️⃣2️⃣ 🔔, here we go 🪂!
As I said, if you’re serious, please ✉️ PM me. If you think you might be serious but need to talk it through, please reach out as well. It’d be useful for me to know your background and how you think you could contribute. Maybe tell me a few facts about yourself, what interests you, and drop a link to your LinkedIn profile?
Useful links:
• List of potentially useful projects: for those who want to know what could be done concretely. Scroll down further for a list of project lists ⬇️.
• Resources for founding an AI safety startup or non-profit: since I believe there should be multiple organisations pursuing this agenda.
• Resources for getting started in AI Safety more broadly: since some of these resources might be useful here as well.
📘 Appendix:
Who are you? 👋
Hi, I’m Chris (Leong). I’ve studied maths, computer science, philosophy and psychology. I’ve been interested in AI safety for nearly a decade and I’ve participated in quite a large number of related opportunities.
With regards to the intersection of AI safety and wisdom:
I won third prize in the AI Impacts Competition on the Automation of Wisdom and Philosophy. My entry is divided into two parts:
• An Overview of “Obvious” Approaches to Training Wise AI Advisors
• Some Preliminary Notes on the Promise of a Wisdom Explosion
I recently ran an AI Safety Camp project on Wise AI Advisors. More recently, I was invited to continue my research as an ERA Cambridge Fellow in Technical AI Governance.
Feel free to connect with me on LinkedIn (ideally mentioning your interest in wise AI advisors so I know to accept) or ✉️ PM me.
Hi, I’m Christopher Clay. I was previously a Non-Trivial Fellow and I participated in the Global Challenges Project.
Does anyone else think wise AI or wise AI advisors are important? ✔️💯
Yes, here are a few:
Imagining and Building Wise Machines: The centrality of AI metacognition
original paper | summary
Authors: Yoshua Bengio, Igor Grossmann, Melanie Mitchell and Samuel Johnson[78], et al.
The paper mostly focuses on wise AI agents, however, not exclusively so.
AI Impacts Automation of Wisdom and Philosophy Competition
Organised by: Owen Cotton-Barratt
Judges: Andreas Stuhlmüller, Linh Chi Nguyen, Bradford Saad and David Manley[79]
competition announcement | Wise AI Wednesdays post | winners and judges’ comments[80]
Beyond Artificial Intelligence (AI): Exploring Artificial Wisdom (AW)
Authors: Dilip V. Jeste, Sarah A. Graham, Tanya T. Nguyen, Colin A. Depp, Ellen E. Lee, Ho-Cheol Kim
paper
What do you see as the strongest counterarguments against this general direction? 🔜🙏
I’ve discussed several possible counter-arguments above, but I’m going to hold off publicly posting about which ones seem strongest to me for now. I’m hoping this will increase diversity of thought and nerdsnipe people into helping red-team my proposal for me 🧑🔬👩🔬🎯. Sometimes with feedback there’s a risk where if you say, “I’m particularly worried about these particular counter-arguments” you can redirect too much of the discussion onto those particular points, at the expense of other, perhaps stronger criticisms.
☞ Do you have any recommendations for further reading? ✅📚
I’ve created a weekly post series on Less Wrong called Wise AI Wednesdays.
Motivation
• ‘AI for societal uplift’ as a path to victory - LW Post: Examines the conditions in which a “societal uplift” strategy—epistemics + co-ordination + institutional steering—might or might not lead to positive outcomes.
• N Stories of Impact for Wise AI Advisors - Draft 🏗️: Different stories about how wise AI advisors could be useful for having a positive impact on the world.
Artificial Wisdom
• Imagining and building wise machines: The centrality of AI metacognition by Johnson, Karimi, Bengio, et al. - paper, summary: This paper argues that wisdom involves two kinds of strategies (task-level strategies and metacognitive strategies). Since current AI is pretty good at the former, they argue that we should pursue the latter as a path to increasing AI wisdom.
• Finding the Wisdom to Build Safe AI by Gordon Seidoh Worley - LW post: Seidoh talks about his own journey in becoming wiser through Zen and outlines a plan for building wise AI. In particular, he argues that it will be hard to produce wise AI without having a wise person to evaluate it.
• Designing Artificial Wisdom: The Wise Workflow Research Organisation by Jordan Arel - EA Forum post: Jordan proposes mapping the workflows within an organisation that is researching a topic like AI safety or existential risk. AI could be used to automate or augment parts of their work. This proportion would increase over time, with the hope being that this would eventually allow us to fully bootstrap an artificially wise system.
• Should we just be building more datasets? by Gabriel Recchia - Substack: Argues that an underrated way of increasing the wisdom of AI systems would be building more datasets (whilst also acknowledging the risks).
• Tentatively Against Making AIs ‘Wise’ by Oscar Delaney - EA Forum post: This article argues that, insofar as wisdom is conceived of as being more intuitive than carefully reasoned, pursuing AI wisdom would be a mistake, as we need AI reasoning to be transparent. I’ve included this because it seems valuable to have at least one critical article.
Neighbouring Areas of Research
• What’s Important In “AI for Epistemics”? by Lukas Finnveden - Forethought: AI for Epistemics is a subtly different but overlapping area. It is close enough that this article is worth reading. It provides an overview of why you might want to work on this, heuristics for good interventions and concrete projects.
• AI for AI Safety by Joe Carlsmith - LW post: Provides a strategic analysis of why AI for AI safety is important, whether for making direct safety progress, evaluating risks, restraining capabilities or improving “backdrop capacity”. Great diagrams.
• AI Tools for Existential Security by Lizka Vaintrob and Owen Cotton-Barratt - Forethought: Discusses how applications of AI can be used to reduce existential risks and suggests strategic implications.
• Not Superintelligence: Supercoordination - forum post[82]: This article suggests that software-mediated supercoordination could be beneficial for steering the world in positive directions, but also identifies the possibility of this ending up as a “horrorshow”.
Human Wisdom
• Stanford Encyclopedia of Philosophy Article on Wisdom by Sharon Ryan - SEP article: SEP articles tend to be excellent, but also long and complicated. In contrast, this article maintains the excellence while being short and accessible.
• Thirty Years of Psychological Wisdom Research: What We Know About the Correlates of an Ancient Concept by Dong, Weststrate and Fournier - paper: Provides an excellent overview of how different groups within psychology view wisdom.
• The Quest for Artificial Wisdom by Sevilla - paper: This article outlines how wisdom is viewed in the Contemplative Sciences discipline. It has some discussion of how to apply this to AI, but much of that discussion seems outdated in light of the deep learning paradigm.
Applications to Governance
• 🏆 Wise AI support for government decision-making by Ashwin - Substack (prize-winning entry in the AI Impacts Automation of Wisdom and Philosophy Competition): This article convinced me that it isn’t too early to start trying to engage the government on wise AI. In particular, Ashwin considers the example of automating the Delphi process. He argues that even though you might begin by automating parts of the process, over time you could expand beyond this, for example, by helping the organisers figure out what questions they should be asking.
Some of my own work:
• 🏆 My third prize-winning entry in the AI Impacts Automation of Wisdom and Philosophy Competition (split into two parts):
• Some Preliminary Notes on the Promise of a Wisdom Explosion: Defines a wisdom explosion as a recursive self-improvement feedback loop that enhances wisdom, rather than intelligence as per the more traditional intelligence explosion. Argues that wisdom tech is safer from a differential technology perspective.
• An Overview of “Obvious” Approaches to Training Wise AI Advisors: Compares four different high-level approaches to training wise AI: direct training, imitation learning, attempting to understand what wisdom is at a deep principled level, and the scattergun approach. One of the competition judges wrote: “I can imagine this being a handy resource to look at when thinking about how to train wisdom, both as a starting point, a refresher, and to double-check that one hasn’t forgotten anything important”.
☞ What are some projects that exist in this space? — TODO
• Automating the Delphi Method for Safety Cases — Philip Fox, Ketana Krishna, Tuneer Mondal, Ben Smith, Alejandro Tlaie: Ashwin and Michaelah Gertz-Billingsley proposed automating the Delphi Method as a foot-in-the-door for getting the government to use wise AI. Well, it turns out these folks at Arcadia Impact have done it!
⇢ “What’s a metapost?” — Oh, I just made that term up. Essentially, it’s just a megapost, but instead of being one long piece of text, it uses collapsible sections to keep the core post lightweight 🪶. In other words, it’s both a megapost and a minipost at the same time 😉.
⇢ “But people will confuse this with a meta post?” — No space. Simple 🤷♂️.
⇢ This term happens to also be appropriate in a second sense; some aspects are inspired by metamodernism.
⇢ I jokingly refer to this as the “Party Edition” as it’s designed to be shared at informal social gatherings 🕺.
⇢ “Party edition? Why the hell would you make a party edition!?” — I could tell you a story about how this will actually be impactful (HPMoR; Siliconversations; spreading ideas by going from person to person being massively underrated), but at the end of the day, I wanted to make it, so I made it 🤷♂️.
⇢ “But what is it?” — I’ve handcrafted this version for more casual contexts. I’ve tried to be more authentic and I hope you find it more engaging. That said, these are serious issues, so I’m also creating a “serious edition” for contexts where sharing a post like this might come off as disrespectful 🎩. Otherwise: 🥳.
⇢ “That makes sense, but are the lame jokes really necessary?” — Yes, they are most necessary. “When the novice attained the rank of grad student, he took the name Bouzo and would only discuss rationality while wearing a clown suit” 🤡🎩
“Why put so much effort into a Less Wrong post?” — 🟥🍉
“Regardless” — Yes. One must imagine Sisyphus happy… ❤️🦉.
⇢ 🔥👀🐎 — I remember reading about how, in The Lord of the Rings, they even took the time to inscribe some writing inside someone’s armor that could never be seen on camera. I found this inspiring, as a demonstration of what true dedication to art looks like — ▶️
⇢ Complete tangent but...
⇢ “I see in your eyes the same fear that would take the heart of me. A day may come when the courage of men fails, when we forsake our friends and break all bonds of fellowship, but it is not this day. An hour of wolves and shattered shields, when the age of men comes crashing down, but it is not this day! This day we fight!! By all that you hold dear on this good Earth, I bid you stand” 💍🌋🛡️ — ▶️
⇢ “If we as humans ever face extinction via alien invasion, I nominate Aragorn to come to life and lead mankind” — @jlop6822 (Youtube comment)
“If I see a situation pointed south, I can’t ignore it. Sometimes I wish I could” — Steve Rogers 🛡️🇺🇸
It’s really just this version that is the passion project. The “Formal Version” is necessary and the right version for certain contexts, but that format also loses something 💔.
“We will call this discourse, oscillating between a modern enthusiasm and a postmodern irony, metamodernism… New generations of artists increasingly abandon the aesthetic precepts of deconstruction, parataxis, and pastiche in favor of aesth-ethical notions of reconstruction, myth, and metaxis… History, it seems, is moving rapidly beyond its all too hastily proclaimed end” — Notes on metamodernism
I’m using the term “alignment proposal” in an extremely general sense, particularly “how we could save the world”, rather than the narrow sense of achieving good outcomes by specifically aligning models. That said, wise AI advisors could assist with aligning models and they also provide us with a generalised version of the alignment problem (discussed later in this post) 🌏 → 🌞.
“It is only as an aesthetic phenomenon that existence and the world are eternally justified” — ❤️🦉🔨.
Reality does not grade on a curve 🚀🏰💥
Is naivety always bad? Is there some kind of universal rule? Or are there exceptions? Perhaps there are situations where a certain form of naivety can form a self-fulfilling prophecy (or hyperstition) 🔮🪞.
“Of all that is written, I love only what a man has written with his blood. Write with blood, and you will experience that blood is spirit” — ❤️🦉🔨.
Unfortunately, I’ve forgotten the names of many people who provided me with useful feedback 😔.
🇺🇸🚀 — 🏆: 📽️🎭🦾🎬🎞️🎥🎼 — ▶️
Commenting on the parallels between the Manhattan project and AI on This Past Weekend with Theo Von ☢️🤖.
Source: NeurIPS’2019 workshop on Fairness and Ethics, Vancouver, BC, On the Wisdom Race, December 13th, 2019 🦉🏎️🏁
One reason why this arrangement of quotes amuses me is because the Sam Altman quote was actually in response to the host specifically asking Sam about what he thought about Yoshua Bengio’s concerns 😂.
“Centuries” — I wish 😭.
🧗♀️
🗿👑☢️
👦💣💥
📺: 🪐🔭
⛪: 🤲🥣 → 🦁🌱
🇺🇳: 👨💻, ⭐⭐⭐⭐⭐
👨🔬👨🦼: 🌬️🕳️
Salvia Ego Ipse, Philosopher of AI, Liber magnus, sapiens et praetiosus philosophiae ÷
“But what if an equivalent increase in wisdom turns out to be impossible?” — This is left as an exercise to the reader 😱.
I refer to this as the SUV Triad 🫰
⇢ I refer to this the Wisdom-Capabilities Gap or simply the Wisdom Gap.
⇢ “Does it make sense to assume that greater capabilities require greater wisdom to handle?” — Yes, I believe it does. I’ll argue for this in the Core Argument 🔜.
I may still decide to weaken this claim. This is a draft post after all 🙏.
⇢ “Then why is this section first?”: You might find it hard to believe, but there’s actually folks who’d freak out if I didn’t do this 🔱🫣. Yep… 🤷♂️
⇢ I recommend diving straight into the core argument, and seeing if you agree with it when you substitute your own understanding of what wisdom is. I promise you 💍, I’m not doing this arbitrarily. However, if you really want to read the definitions first, then you should feel free to just go for it 🧘👌.
For example, see a and b.
I also believe that this increases our chances of being able to set up a positive feedback loop 🤞🤞🤞 of increasing wisdom (aka “wisdom explosion” 🦉🎆) compared to strategies that don’t leverage humans. But I don’t want to derail the discussion by delving into this right now 🐇🕳️.
I’d like to think of myself as being the kind of person who’s sensible enough to avoid getting derailed by small or irrelevant details 🤞🤞. Nonetheless, I can recall countless (okay, a few) times when I’ve failed at this in the past 🚇💥. Honestly, I wouldn’t be surprised if many people were in the same boat 🧑🤝🧑🧑🤝🧑🧑🤝🧑🌏🤞.
I’m using emojis to provide a visual indication of my answer without needing to expand the section 🦸🩻.
This post is being written collaboratively by Chris Leong (primary author) and Christopher Clay (second author) in the voice of Chris Leong 🤔🤝✍️. The blame for any cringe in the footnotes can be pinned on Chris Leong 🎯.
⇢ I believe that good communication involves being as open as possible about the language game ❤️🦉 being played 🫶.
Given this, I want to acknowledge that this post is not intended to deliver a perfectly balanced analysis, but rather to lay out an optimistic case for wise AI advisors 🌅. In fact, feel free to think of it as a kind of manifesto 📖🚪📜📌.
I suspect that many, if not most, readers of Less Wrong will be worried that this must necessarily come at the expense of truth-seeking 🧘. In fact, I probably would have agreed with this not too long ago. However, recently I’ve been resonating a lot more with the idea that there’s value in all kinds of language games, so long as they are deployed in the right context and pursued in the right way 🤔[83]. Most communication shouldn’t be a manifesto, but the optimal number of manifestos isn’t zero (❗).
⇢ Stone-cold rationality is important, but so is the ability to dream 😴💭🎑. It’s important to be able to see clearly, but also to lay out a vision 🔭✨. These are in tension, sure, but not in contradiction ⚖️. Synthesis is hard, but not impossible 🧗.
⇢ If you’re interested in further discussion, see the next footnote...
⇢ “Two footnotes on the exact same point? This is madness!”: No, this is Sparta!
🌍⛰️🐰🕳️Spartans don’t quit and neither do we 💪. So here it is: Why it makes sense to lay out an optimistic case, Round 🔔🔔.
Who doesn’t love a second bite at the apple? 👸🏻🍏💀😴
⇢ Adopting a lens of optimism unlocks creativity by preventing you from discarding ideas too early (conversely, a lens of pessimism aids with red-teaming by preventing you from discarding objections too easily). It increases the chance that bad ideas make it through your initial filter, but these can always be filtered further down the line. I believe that such a lens is the right choice for now, given that this area of thoughtspace is quite neglected. Exploration is a higher priority than filtration 🌬️⛵🌱.
⇢ “But you’re biasing people toward optimism”: I take this seriously 😔🤔. At the same time, I don’t believe that readers need to be constantly handheld lest they form the wrong inference. Surely, it’s worth risking some small amount of bias to avoid coming off as overly paternalistic?
Should texts be primarily written for the benefit of those who can’t apply critical thinking? To be honest, that idea feels rather strange to me 🙃. I think it’s important to primarily write for the benefit of your target audience and I don’t think that the readers I care about most are just going to absorb the text wholesale, without applying any further skepticism or cognitive processing 🤔. Trusting the reader has risks, sure, but I believe there are many people for whom this is both warranted and deserved 🧭⛵🌞.
“Why all the emojis?”: All blame belongs to Chris Leong 🎯.
“Not who deserves to be shot, but why are you using them?”: A few different reasons: they let me bring back some of the expressiveness that’s possible in verbal conversation, but which is typically absent in written communication 🥹😱; striking a more casual tone helps differentiate it from any more academic or rigorous treatment (😉) that might hypothetically come later 🤠🛤️🎓; and they help keep the reader interested 🎁.
But honestly: I mostly just feel drawn to the challenge. The frame “both/and” has been resonating a lot with me recently and one such way is that I’ve been feeling drawn towards trying to produce content that manages to be both fun and serious at the same time 🥳⚖️🎩.
I’m starting to wonder if this point is actually true. This is a NOTE TO MYSELF to review this claim.
“Why can’t we afford to leave anything on the table?”—The task we face is extremely challenging, the stakes are sky high, it’s extremely unclear what would have to happen for AI to go well, we have no easy way of gaining clarity on what would have to happen, and we’re kind of in an AI arms race which basically precludes us from fully thinking everything through 😅⏳⏰.
“But what about prioritisation?” — Think Wittgenstein. I’m asserting the need for a mindset, a way of being 🧘🏃♂️🏋️.
“But I don’t understand you” — If a lion could speak, we could not understand him ❤️🦉🕵️.
This section is currently being edited, so it may take more than 3 minutes. Sorry 🙏!
I selected this name because it’s easy to say and remember, but in my heart it’ll always be the MST/MSU/MSV trifecta (minimal spare time, massive strategic uncertainty, many scary vulnerabilities) 💔🚢🌊🎶. I greatly appreciate Will MacAskill’s recommendation that I simplify the name 🙏.
I don’t think I’m exactly arguing anything super original or unprecedented here 😭😱, but I don’t know if I’ve ever really seen anyone go hard on this shape of argument before 🌊🌊🌊.
Some of these risks could directly cause catastrophe. Others might indirectly lead to this via undermining vital societal infrastructure such that we are then exposed to other threats 🕷️🕸️.
One of the biggest challenges here is that AI throws so many civilisational-scale challenges at us at once. Humanity can deal with civilisational-scale challenges, but global co-ordination is hard, and when there are so many different issues to deal with, it’s hard to give each problem the proper attention it deserves 😅🌫️💣💣💣.
I’m intentionally painting these questions as more of a binary than they actually are to help illustrate the range of opinions.
Some of this is deserved, imho 🤷♂️.
In many ways, figuring out how to act is the much harder component of the problem 🧭🌫️🪨. Given perfect co-ordination between nation states, these issues could almost certainly be resolved quite quickly, but given that there’s no agreement on what needs to be done and that politics is the mind-killer, I wouldn’t hold your breath… 😯😶🐌🚀🚀🚀
The exponential increase in task-length actually generalises beyond coding.
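To make the “exponential increase” concrete, here’s a minimal sketch (my own illustration, with entirely made-up numbers rather than anyone’s actual measurements) of how a doubling time can be read off task-length data by fitting a straight line to the log of task length:

```python
# Minimal sketch (hypothetical data, for illustration only): estimate a doubling
# time from (months_elapsed, task_length_minutes) pairs by fitting a line to
# log(task length). If task length grows exponentially, log(task length) is linear.
import math

data = [(0, 10), (6, 22), (12, 41), (18, 85)]  # made-up observations

xs = [m for m, _ in data]
ys = [math.log(t) for _, t in data]

n = len(data)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)

# With exponential growth, doubling time = ln(2) / slope of the log-linear fit.
doubling_time_months = math.log(2) / slope
print(f"Estimated doubling time: {doubling_time_months:.1f} months")
```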
What’s more impressive is how it was solved: “Why am I excited about IMO results we just published: - we did very little IMO-specific work, we just keep training general models—all natural language proofs—no evaluation harness We needed a new research breakthrough and @alexwei_ and team delivered”
https://x.com/MillionInt/status/1946551400365994077 🤯
Some people have complained that the students used in this experiment were underqualified as judges, but I’m sure there’ll be a follow-up next year that addresses this issue 🤷♂️.
⇢ Clark: “Sir, this is a Less Wrong. You can’t swear here” — In order for a film to get a 12A or PG-13 rating, it cannot contain gratuitous use of profanity. One ‘f#@k’ is allowed; any more and it automatically becomes 15 or R rated — See, PG-13.
⇢ Jax: “Censorship is terrible and the belief that swearing hurts kids is utterly absurd. Do you think they’ve forgotten what it was like as a kid?” — Not taking sides, but restrictions aren’t always negative. Sometimes limiting a resource encourages people to use that resource to maximum effect 🎯.
See Life Itself’s article on the “wisdom gap”, which is the terminology I prefer in informal contexts. 10 Billion uses the term capabilities-wisdom gap, which I’ve reversed in order to emphasise wisdom. I prefer this version for more formal contexts 📖😕.
I don’t want to dismiss the value of addressing specific issues in terms of buying us time 🔄.
I recommend finishing the post first (don’t feel obligated to expand all the sections) 🙏.
This paper argues that robustness, explainability and cooperation all facilitate each other 🤞💪. In case the link to wisdom is confusing (🦉❓), the authors argue that metacognition is both key to wisdom and key to these capabilities ☝️.
This argument could definitely be more rigorous; however, this response is addressed to optimists, and the claim is especially likely to be true if we adopt an optimistic stance on technical and social alignment. So I feel fine leaving it as is 🤞🙏.
⚠️ If timelines are extremely short, then many of my proposals might become irrelevant because there won’t be time to develop them 😭⏳. We might be limited to simple things like tweaking the system prompt or developing a new user interface, as opposed to some of the more ambitious approaches that might require changing how we train our models. However, there’s a lot of uncertainty with timelines, so more ambitious projects might make sense anyway. 🤔🌫️🤞🤷.
“If your proposal might even work better over long timelines, why does your core argument focus on short timelines 🤔⁉️”: There are multiple arguments for developing wise AI advisors and I wanted to focus on one that was particularly legible 🔤🙏.
Andrew Critch has labelled this—“pivoting” civilisation across multiple acts across multiple persons and institutions—a pivotal process. He compares this favourably to the concept of a pivotal act—“pivoting” civilisation in a single action—on the basis of it being easier, safer and more legitimate 🗺️🌬️⛵️⛵️⛵️🌎.
I say “forget X” partly in jest. My point is that if there’s a realistic chance of realising the stronger possibility, then it would likely be much more significant than the weaker possibility 👁️⚽.
I think there’s a strong case that if these advisers can help you gain allies at all, they can help you gain many allies 🎺🪂🪂🪂.
In So You Want To Make Marginal Progress..., John Wentworth argues that “when we don’t already know how to solve the main bottleneck(s)… a solution must generalize to work well with the whole wide class of possible solutions to other subproblems… (else) most likely, your contribution will not be small; it will be completely worthless” 🗑️‼️.
My default assumption is that people will be making further use of AI advice going forwards. But that doesn’t contradict my key point, that certain strategies may become viable if we “go hard on” producing wise AI advisors 🗺️🔓🌄.
“But “reckless and unwise” and “responsible & wise” aren’t the only possible combinations!”—True 👍, but they co-occur sufficiently often that I think this is okay for a high-level analysis 🤔🤞🙏.
They could also speed up the ability of malicious and “reckless & unwise” actors to engage in destructive actions; however, I still think such advisors would most likely improve the comparative situation, since many defences and precautions can be put in place before they are needed 🤔🤞💪.
This argument is developed further in Some Preliminary Notes on the Promise of a Wisdom Explosion 🦉🎆📄.
I agree it’s highly unlikely that we could convince all the major labs to pursue a wisdom explosion instead of an intelligence explosion 🙇. Nonetheless, I don’t think this renders the question irrelevant ☝️. Let’s suppose (😴💭) it really were the case that the transition to AGI would most likely go well if society had decided to pursue a wisdom explosion rather than an intelligence explosion ‼️. I don’t know, but that seems like it’d be the kind of thing that would be pretty darn useful to know 🤷♂️. My experience with a lot of things is that it’s a lot easier to find additional solutions than it is to find the first. In other words, solutions aren’t just valuable for their potential to be implemented, but because of what they tell us about the shape of the problem 🗝️🚪🌎.
How can we tell which framings are likely to be fruitful and which are not 👩🦰🍎🐍🌳? This isn’t an easy question, but my intuition is that a framing is more likely to be worth pursuing if it leads to questions that feel like they should (❓) be being asked, but have been neglected due to unconscious framing effects 🙈🖼️. In contrast, a framing is less likely to be fruitful if it asks questions that seem interesting prima facie, but which the best researchers within the existing paradigm have good answers for, even if most researchers do not 🏜️✨.
I acknowledge that “intelligence” is only rather loosely connected with the capabilities AI systems have in practice. Factors like prestige, economic demand and tractability play a larger role in which capabilities get developed than the name used to describe the area of research. Nonetheless, I think it’d be hard to argue that the framing effect hasn’t exerted any influence. So I think it is worthwhile understanding how this may have shaped the field and whether this influence has been for the best 🕵️🤔.
A common way that an incorrect paradigm can persist is if existing researchers don’t have good answers to particular questions, but see them as unimportant 🙄👉.
I have an intuition that it’s much more valuable for the AI safety and governance community to be generating talent with a distinct skillset/strengths than to just be recruiting more folk like those we already have 🧩. As we attract more people with the same skillset we likely experience decreasing marginal returns due to the highest impact roles already being filled 🙅♂️📉.
That said, given how fast AI development is going, I’m hoping the process of concretisation can be significantly sped up. Fortunately, I think it’ll be easier to do it this time as I’ve already seen what the process looked like for alignment 🔭🚀🤞.
First author 🥇.
Wei Dai who was originally announced as a judge withdrew 🕳️.
Including me 🏆🍀😊.
They write: “The precise opinions expressed in this post should not be taken as institutional views of AI Impacts, but as approximate views of the competition organizers” ☝️👌.
Linking to this article does not constitute an endorsement of Sofiechan or any other views shared there. Unfortunately, I am not aware of other discussions of this concept 😭🤷, so I will keep this link in this post temporarily 🙏🔜.
You could even say: with the right person, and to the right degree, and at the right time, and for the right purpose, and in the right way ❤️🦉.
Despite my contention on the associated paper post that focusing on wisdom in this sense is ducking the hard part of the alignment problem, I’ll stress here that it seems thoroughly useful if it’s a supplement, not a substitute, for work on the hard parts of the problem—technical, theoretical and societal.
I also think it’s going to be easier to create wise advisors than you think, at least in the weak sense that they make their human users effectively wiser.
In short, I think simple prompting schemes and eventually agentic scaffolds can do a lot of the extra work it takes to turn knowledge into wisdom, and that there’s an incentive for orgs to train for “wisdom” in the sense you mean as well. So we’ll get wiser advisors as we go, at little or no extra effort. More effort would of course help more.
I believe Deep Research has already made me wiser. I can get a broader context for any given decision.
And that was primarily achieved by prompting; the o3 model that powers OpenAI’s version does seem to help but Perplexity introducing a nearly-as-good system just a week or two later indicates that just the right set of prompts were extremely valuable.
Current systems aren’t up to helping very much with the hypercomplex problems surrounding alignment. But they can now help a little. And any improvements will be a push in the right direction.
Training specifically for “wisdom” as you define it is a push toward a different type of useful capability, so it may be that frontier labs pursue similar training by default.
(As an aside, I think your “comparisons” are all wildly impractical and highly unlikely to be executed before we hit AGI, even on longer realistic estimates. It’s weird that they’re considered valid points of comparison, as all plans that will never be executed have exactly the same value. But that’s where we’re at in the project right now.)
To return from the tangent, I don’t think wise advisors is actually asking anyone to go far out of their default path toward capabilities. Wise advisors will help with everything, including things with lots of economic value, and with AGI alignment/survival planning.
I’ll throw in the caveat that fake wisdom is the opposite of helpful, and there’s a risk of getting sycophantic confabulations on important topics like alignment if you’re not really careful. Sycophantic AIs and humans collaborating to fuck up alignment in a complementarily-foolish clown show that no one will laugh at is now one of my leading models of doom after John Wentworth’s pointing it out.
That’s why I favor AI as a wisdom-aid rather than trying to make it wiser-than-human on its own; if it were, we’d have to trust it, and we probably shouldn’t trust AI more than humans until well past the alignment crunch.
Thanks for sharing your thoughts.
I agree that humans with wise AI advisors is a more promising approach, at least at first, than attempting to directly program wisdom into an autonomously acting agent.
Beyond that, I personally haven’t made up my mind yet about the best way to use wisdom tech.
Why do you think wise AI advisors avoid the general problems with other AI?
Well, we’re going to be training AI anyway. If we’re just training capabilities, but not wisdom, I think things are unlikely to go well. More thoughts on this here.
Hm, I thought this use of “wise” is almost identical to capabilities. It’s sort of like capabilities with less slop or confabulation, and probably more ability to take the context of the problem/question into account. Both of those are pretty valuable, although people might not want to bother even swerving capabilities in that direction.
I’ll post some extracts from the Seoul Summit. I can’t promise that this will be a particularly good summary, as I was originally just writing this for myself, but maybe it’s helpful until someone publishes something that’s more polished:
Frontier AI Safety Commitments, AI Seoul Summit 2024
The major AI companies have agreed to Frontier AI Safety Commitments. In particular, they will publish a safety framework focused on severe risks: “internal and external red-teaming of frontier AI models and systems for severe and novel threats; to work toward information sharing; to invest in cybersecurity and insider threat safeguards to protect proprietary and unreleased model weights; to incentivize third-party discovery and reporting of issues and vulnerabilities; to develop and deploy mechanisms that enable users to understand if audio or visual content is AI-generated; to publicly report model or system capabilities, limitations, and domains of appropriate and inappropriate use; to prioritize research on societal risks posed by frontier AI models and systems; and to develop and deploy frontier AI models and systems to help address the world’s greatest challenges”
“Risk assessments should consider model capabilities and the context in which they are developed and deployed”—I’d argue that the context in which it is deployed should account for whether it is open or closed source/weights.
“They should also be accompanied by an explanation of how thresholds were decided upon, and by specific examples of situations where the models or systems would pose intolerable risk.”—always great to make policy concrete.
“In the extreme, organisations commit not to develop or deploy a model or system at all, if mitigations cannot be applied to keep risks below the thresholds.”—Very important that when this is applied the ability to iterate on open-source/weight models is taken into account.
https://www.gov.uk/government/publications/frontier-ai-safety-commitments-ai-seoul-summit-2024/frontier-ai-safety-commitments-ai-seoul-summit-2024
Seoul Declaration for safe, innovative and inclusive AI by participants attending the Leaders’ Session
Signed by Australia, Canada, the European Union, France, Germany, Italy, Japan, the Republic of Korea, the Republic of Singapore, the United Kingdom, and the United States of America.
”We support existing and ongoing efforts of the participants to this Declaration to create or expand AI safety institutes, research programmes and/or other relevant institutions including supervisory bodies, and we strive to promote cooperation on safety research and to share best practices by nurturing networks between these organizations”—guess we should now go full-throttle and push for the creation of national AI Safety institutes
“We recognise the importance of interoperability between AI governance frameworks”—useful for arguing we should copy things that have been implemented overseas.
“We recognize the particular responsibility of organizations developing and deploying frontier AI, and, in this regard, note the Frontier AI Safety Commitments.”—Important as Frontier AI needs to be treated as different from regular AI.
https://www.gov.uk/government/publications/seoul-declaration-for-safe-innovative-and-inclusive-ai-ai-seoul-summit-2024/seoul-declaration-for-safe-innovative-and-inclusive-ai-by-participants-attending-the-leaders-session-ai-seoul-summit-21-may-2024
Seoul Statement of Intent toward International Cooperation on AI Safety Science
Signed by the same countries.
“We commend the collective work to create or expand public and/or government-backed institutions, including AI Safety Institutes, that facilitate AI safety research, testing, and/or developing guidance to advance AI safety for commercially and publicly available AI systems”—similar to what we listed above, but more specifically focused on AI Safety Institutes, which is great.
”We acknowledge the need for a reliable, interdisciplinary, and reproducible body of evidence to inform policy efforts related to AI safety”—Really good! We don’t just want AIS Institutes to run current evaluation techniques on a bunch of models, but to be actively contributing to the development of AI safety as a science.
“We articulate our shared ambition to develop an international network among key partners to accelerate the advancement of the science of AI safety”—very important for them to share research among each other
https://www.gov.uk/government/publications/seoul-declaration-for-safe-innovative-and-inclusive-ai-ai-seoul-summit-2024/seoul-statement-of-intent-toward-international-cooperation-on-ai-safety-science-ai-seoul-summit-2024-annex
Seoul Ministerial Statement for advancing AI safety, innovation and inclusivity
Signed by: Australia, Canada, Chile, France, Germany, India, Indonesia, Israel, Italy, Japan, Kenya, Mexico, the Netherlands, Nigeria, New Zealand, the Philippines, the Republic of Korea, Rwanda, the Kingdom of Saudi Arabia, the Republic of Singapore, Spain, Switzerland, Türkiye, Ukraine, the United Arab Emirates, the United Kingdom, the United States of America, and the representative of the European Union
“It is imperative to guard against the full spectrum of AI risks, including risks posed by the deployment and use of current and frontier AI models or systems and those that may be designed, developed, deployed and used in future”—considering future risks is a very basic, but core principle
“Interpretability and explainability”—Happy to see interpretability explicitly listed
”Identifying thresholds at which the risks posed by the design, development, deployment and use of frontier AI models or systems would be severe without appropriate mitigations”—important work, but could backfire if done poorly
”Criteria for assessing the risks posed by frontier AI models or systems may include consideration of capabilities, limitations and propensities, implemented safeguards, including robustness against malicious adversarial attacks and manipulation, foreseeable uses and misuses, deployment contexts, including the broader system into which an AI model may be integrated, reach, and other relevant risk factors.”—sensible, we need to ensure that the risks of open-sourcing and open-weight models are considered in terms of the ‘deployment context’ and ‘foreseeable uses and misuses’
”Assessing the risk posed by the design, development, deployment and use of frontier AI models or systems may involve defining and measuring model or system capabilities that could pose severe risks,”—very pleased to see a focus beyond just deployment
”We further recognise that such severe risks could be posed by the potential model or system capability or propensity to evade human oversight, including through safeguard circumvention, manipulation and deception, or autonomous replication and adaptation conducted without explicit human approval or permission. We note the importance of gathering further empirical data with regard to the risks from frontier AI models or systems with highly advanced agentic capabilities, at the same time as we acknowledge the necessity of preventing the misuse or misalignment of such models or systems, including by working with organisations developing and deploying frontier AI to implement appropriate safeguards, such as the capacity for meaningful human oversight”—this is massive. There was a real risk that these issues were going to be ignored, but this is now seeming less likely.
”We affirm the unique role of AI safety institutes and other relevant institutions to enhance international cooperation on AI risk management and increase global understanding in the realm of AI safety and security.”—“Unique role”, this is even better!
”We acknowledge the need to advance the science of AI safety and gather more empirical data with regard to certain risks, at the same time as we recognise the need to translate our collective understanding into empirically grounded, proactive measures with regard to capabilities that could result in severe risks. We plan to collaborate with the private sector, civil society and academia, to identify thresholds at which the level of risk posed by the design, development, deployment and use of frontier AI models or systems would be severe absent appropriate mitigations, and to define frontier AI model or system capabilities that could pose severe risks, with the ambition of developing proposals for consideration in advance of the AI Action Summit in France”—even better than above b/c it commits to a specific action and timeline
https://www.gov.uk/government/publications/seoul-ministerial-statement-for-advancing-ai-safety-innovation-and-inclusivity-ai-seoul-summit-2024
I don’t want to comment on the whole Leverage Controversy as I’m far away enough from the action that other people are probably better positioned to sensemake here.
On the other hand, I have been watching some of Geoff Anders’ streams and he does seem pretty good at theorising, by virtue of being able to live-stream this. I expect this to be a lot harder than it looks: when I’m trying to figure out my position on an issue, I often find myself going over the same ground again and again and again, until eventually I figure out a way of putting what I want to express into words.
That said, I’ve occasionally debated with some high-level debaters and, given almost any topic, they’re able to pretty much effortlessly generate a case and a sense of how the debate is likely to play out. I guess his ability seems on par with this.
So I think his ability to livestream demonstrates a certain level of skill, but I almost view it as speed-chess vs. chess, in that there’s only so much you can tell about a person’s ability in normal chess from how good they are at speed chess.
I think I’ve improved my own ability to theorise by watching the streams, but I wouldn’t be surprised if I improved similarly from watching Eliezer, Anna or Duncan livestream their attempts to think through an issue. I also expect that there’s a similar chance I would have gained a significant proportion of the benefit just from watching someone with my abilities or even slightly worse on the basis of a) understanding the theorising process from the outside b) noticing where they frame things differently than I would have.
Trying to think about what is required to be a good debater:
general intelligence—to quickly understand the situation and lay out your response;
“talking” skills—large vocabulary, talking clearly, not being shy, body language and other status signals;
background knowledge—knowing the models, facts, frequently used arguments, etc.;
precomputed results—if you already spent a lot of time thinking about a topic, maybe even debating it.
These do not work the same way, for example clear talking and good body language generalize well; having lots of precomputed results in one area will not help you much in other areas (unless you use a lot of analogies to the area you are familiar with—if you do this the first time, you may impress people, but if you do this repeatedly, they will notice that you are a one-topic person).
I believe that watching good debaters in action would help. It might be even better to focus on different aspects separately (observing their body language, listening to how they use their voice, understanding their frames, etc.).
Any in particular, or what most of them are like?
In what ways do you believe you improved?
I think I’m more likely to realise that I haven’t hit the nail on the head and so I go back and give it another go.
Random idea: A lot of people seem discouraged from doing anything about AI Safety because it seems like such a big overwhelming problem.
What if there was a competition to encourage people to engage in low-effort actions towards AI safety, such as hosting a dinner for people who are interested, volunteering to run a session on AI safety for their local EA group, answering a couple of questions on the stampy wiki, offering to proof-read a few people’s posts or offering a few free tutorial sessions to aspiring AI Safety Researchers.
I think there’s a decent chance I could get this funded (prize might be $1000 for the best action and up to 5 prizes of $100 for random actions above a certain bar)
Possible downsides: Would be bad if people reach out to important people or the media without fully thinking stuff through, but can be mitigated by excluding those kinds of actions/ adding guidelines
Keen for thoughts or feedback.
I like it, and it’s worth trying out.
Those don’t seem like very low effort to me, but they will to some. Do they seem to you like they are effective (or at least impactful commensurate with the effort)? How would you know which ones to continue and what other types of thing to encourage?
I fear that it has much of the same problem that any direct involvement in AI safety does: what’s the feedback loop for whether it’s actually making a difference? Your initial suggestions seem more like actions toward activism and pop awareness, rather than actions toward AI Safety.
The nice thing about prizes and compensation is that it moves the question from the performer to the payer—the payer has to decide if it’s a good value. Small prizes or low comp means BOTH buyer and worker have to make the decision of whether this is worthwhile.
Solving the productivity-measurement problem itself seems overwhelming—it hasn’t happened even for money-grubbing businesses, let alone long-term x-risk organizations. But any steps toward it will do more than anything else I can think of to get broader and more effective participation. Being able to show that what I do makes a measurable difference, even through my natural cynicism and imposter syndrome, is key to my involvement.
I am not typical, so don’t take my concerns as the final word—this seems promising and relatively cheap (in money; it will take a fair bit of effort in guiding the sessions and preparing materials for the tutoring. Honestly, that’s probably more important than the actual prizes).
I guess they just feel like as good a starting place as any and are unlikely to be net-negative. That’s more important than anything else. The point is to instill agency so that people start looking for further opportunities to make a difference. I might have to write a few paragraphs of guidelines/suggestions for some of the most common potential activities.
I hadn’t really thought too much about follow-up, but maybe I should think more about it.
Here’s a crazy[1] idea that I had. But I think it’s an interesting thought experiment.
What if we programmed an AGI that had the goal of simulating the Earth, but with one minor modification? In the simulation, we would have access to some kind of unfair advantage, like an early Eliezer Yudkowsky getting a mysterious message dropped on his desk containing a bunch of the progress we’ve made in AI Alignment.
So we’d all die in real life when the AGI broke out of its box and turned the Earth into compute to better simulate us, but we might survive in virtual reality, at least if you believe simulations to be conscious.
In other news, I may have just spoiled a short story I was thinking of writing.
#Chris’Phoenix #ForgetRokosBasilisk
And probably not very good.
I really dislike the fiction that we’re all rational beings. We really need to accept that sometimes people can’t share things with us. Stronger: not just accept, but appreciate people who make this choice for their wisdom and tact. ALL of us have ideas that will strongly trigger us and, if we’re honest and open-minded, we’ll be able to recall situations when we unfairly judged someone because of a view that they held. I certainly can, way too many times to list.
I say this as someone who has a really strong sense of curiosity, knowing that I’ll feel slightly miffed when someone doesn’t feel comfortable being open with me. But it’s my job to deal with that, not the other person.
Don’t get me wrong. Openness and vulnerability are important. Just not *all* the time. Just not *everything*.
When you start identifying as a rationalist, the most important habit is saying “no” whenever someone says: “As a rationalist, you have to do X” or “If you won’t do X, you are not a true rationalist” etc. It is not a coincidence that X usually means you have to do what the other person wants for straightforward reasons.
Because some people will try using this against you. Realize that this usually means nothing more than “you exposed a potential weakness, they tried to exploit it” and is completely unrelated to the art of rationality.
(You can consider the merits of the argument, of course, but you should do it later, alone, when you are not under pressure. Don’t forget to use the outside view; the easiest way is to ask a few independent people.)
I’ve recently been reading about ordinary language philosophy and I noticed that some of their views align quite significantly with LW. They believed that many traditional philosophical questions only seemed troubling because of the philosophical tendency to assume words like “time” or “free will” necessarily referred to some kind of abstract entity, when this wasn’t necessary at all. Instead, they argued that by paying attention to how we used these words in ordinary, everyday situations, we could see that the way people used these words didn’t need to assume these abstract entities and that we could dissolve the question.
I found it interesting that the comment thread on dissolving the question makes no reference to this movement. It doesn’t reference Wittgenstein either who also tried to dissolve questions.
(https://www.lesswrong.com/posts/Mc6QcrsbH5NRXbCRX/dissolving-the-question)
Is that surprising? It’s not as if the rationalsphere performed some comprehensive survey of philosophy before announcing the superiority of its own methods.
From my perspective, saying that “this philosophical opinion is kinda like this Less Wrong article” sounds like “this prophecy by Nostradamus, if you squint hard enough, predicts coronavirus in 2020”. What I mean is that if you publish huge amounts of text open to interpretation, it is not surprising that you can find analogies to many things in it. I would not be surprised to find something similar in the Bible; I am not surprised to find something similar in philosophy. (I would not be surprised to also find a famous philosopher who said the opposite.) In philosophy, the generation of text is distributed, so some philosophers likely have a track record much better than the average of their discipline. Unfortunately—as far as I know—philosophy as a discipline doesn’t have a mechanism to say “these ideas of these philosophers are the good ones, and this is wrong”. At least my time at philosophy lessons was wasted listening to what Plato said, without a shred of “...and according to our current scientific knowledge, this is true, and this is not”.
Also, it seems to me that philosophers were masters of clickbait millennia before clickbait was a thing. For example, a philosopher is rarely satisfied by saying things like “human bodies are composed of 80% water” or “most atoms in the universe are hydrogen atoms”. Instead, it is typically “everything is water”. (Or “everything is fire”. Or “everything is an interaction of quantum fields”… oops, the last one was actually not said by a philosopher; what a coincidence.) Perhaps this is selection bias. Maybe people who walked around ancient Greece half-naked and said things like “2/3 of everything is water” existed, but didn’t draw sufficient attention. But if this is true, it would mean that philosophy optimizes for shock value instead of truth value.
So, without having read Wittgenstein, my priors are that he most likely considered all words confused; yes, words like “time” and “free will”, but also words like “apple” and “five”. (And then there was Plato who assumed that there was a perfect idea of “apple” and a perfect idea of “time”.)
Now I am not saying that everything written by Wittgenstein (or other philosophers) is worthless. I am saying that in philosophy there are good ideas mixed with bad ones, and even the good ones are usually exaggerated. And unless someone does the hard work of separating the wheat from chaff, I’d rather ignore philosophy, and read sources that have better signal-to-noise ratio.
I won’t pretend that I have a strong understanding here, but as far as I can tell, (Later) Wittgenstein and the Ordinary Language Philosophers considered our conception of the number “five” existing as an abstract object as mistaken and would instead explain how it is used and consider that as a complete explanation. This isn’t an unreasonable position, like I honestly don’t know what numbers are and if we say they are an abstract entity it’s hard to say what kind of entity.
Regarding the word “apple” Wittgenstein would likely say attempts to give it a precise definition are doomed to failure because there are an almost infinite number of contexts or ways in which it can be used. We can strongly state “Apple!” as a kind of command to give us one, or shout it to indicate “Get out of the way, there is an apple coming towards you” or “Please I need an Apple to avoid starving”. But this is only saying attempts to spec out a precise definition are confused, not the underlying thing itself.
(Actually, apparently Wittgenstein considered attempts to talk about concepts like God or morality as necessarily confused, but thought that they could still be highly meaningful, possibly the most meaningful things)
These are all good points. I could agree that all words are to some degree confused, but I would insist that some of them are way more confused than others. Otherwise, the very act of explaining anything would be meaningless: we would explain one word by a bunch of words, equally confusing.
If the word “five” is nonsense, I can take Wittgenstein’s essay explaining why it is nonsense, and say that each word in that essay is just a command that we can shout at someone, but otherwise is empty of meaning. This would seem to me like an example of intelligence defeating itself.
Wittgenstein didn’t think that everything was a command or request; his point was that making factual claims about the world is just one particular use of language that some philosophers (including early Wittgenstein) had hyper-focused on.
Anyway, his claim wasn’t that “five” was nonsense, just that when we understood how five was used there was nothing further for us to learn. I don’t know if he’d even say that the abstract concept five was nonsense, he might just say that any talk about the abstract concept would inevitably be nonsense or unjustified metaphysical speculation.
These are situations where I would like to give a specific question to the philosopher. In this case it would be: “Is being a prime number a property of the number five, or is it just that we decided to use it as a prime number?”
I honestly have no idea how he’d answer, but here’s one guess. Maybe we could tie prime numbers to one of a number of processes for determining primeness. We could observe that those processes always return true for 5, so in a sense primeness is a property of five.
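To make this concrete, here’s a minimal sketch (my own illustration, nothing more) of two different procedures for “determining primeness”; whatever primeness ultimately is, both processes return True for 5:

```python
# Minimal sketch (my own illustration): two different procedures for
# "determining primeness". Both processes agree that 5 is prime.

def prime_by_trial_division(n: int) -> bool:
    """Check primality by testing divisors up to sqrt(n)."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def prime_by_sieve(n: int) -> bool:
    """Check primality by building a sieve of Eratosthenes up to n."""
    if n < 2:
        return False
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if is_prime[p]:
            for multiple in range(p * p, n + 1, p):
                is_prime[multiple] = False
    return is_prime[n]

print(prime_by_trial_division(5), prime_by_sieve(5))  # True True
```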
Book Review: Waking Up by Sam Harris
This book aims to convince everyone, even skeptics and atheists, that there is value in some spiritual practices, particularly those related to meditation. Sam Harris argues that meditation doesn’t just help with concentration, but can also help us reach transcendental states that reveal the dissolution of the self. It mostly does a good job of what it sets out to do, but unfortunately I didn’t gain very much benefit from this book because it focused almost exclusively on persuading you that there is value here, which I already accepted, rather than providing practical instructions.
One area where I was less convinced was his claims about there not being a self. He writes that meditating allows you to directly experience this, but I worry he hasn’t applied sufficient skepticism. If you experience flying through space in an altered mental state, it doesn’t mean that you are really flying through space. Similarly, how do we know that he is experiencing the lack of a self, rather than the illusion of there being no self?
I was surprised to see that Sam was skeptical of a common materialist belief that I had expected him to endorse. Many materialists argue against the notion of philosophical-zombies by arguing that if it seems conscious we should assume it is conscious. However, Sam Harris argues that the phenomenon of anaesthesia awareness, waking up completely paralysed during surgery, shows that there isn’t always a direct link between appearing conscious and actual consciousness. (Dreams seem to imply the same point, if less dramatically). Given the strength of this argument, I’m surprised that I haven’t heard it before.
Sam also argues that split-brain patients imply that consciousness is divisible. While split-brain patients actually still possess some level of connection between the two halves, I still consider this phenomenon to be persuasive evidence that consciousness is divisible. After all, it is possible for the two halves to have completely different beliefs and objectives without either side being aware of these.
On meditation, Sam is a fan of the Dzogchen approach that directly aims at experiencing no-self, rather than the slower, more gradual approaches. This is because waiting years for a payoff is incredibly discouraging, and because practices like paying attention to the sensation of breath reinforce the notion of the self which meditation seeks to undermine. At the same time, he doesn’t fully embrace this style of teaching, arguing that the claim that every realisation is permanent is dangerous, as it leads to treating people as role models even when their practice is flawed.
Sam argues against the notion of gurus being perfect; they are just humans like the rest of us. He notes that it is hard to draw the line between practices that lead to enlightenment and abuse; indeed, he argues that a practice can provide spiritual insight AND be abusive. He notes that the reason why abuse seems to occur again and again is that when people seek out a guru, it’s because they’ve arrived at the point where they realise that there is so much that they don’t know and they need the help of someone who does.
He also argues against assuming mediative experiences provide metaphysical insights. He points out that they are often the same experiences that people have on psychedelics. In fact, he argues that for some people having a psychedelic experience is vital for their spiritual development as it demonstrates that there really are other brain states out there. He also discusses near death experiences and again dismisses claims that they provide insight into the afterlife—they match experiences people have on drugs and they seem to vary by culture.
Further points:
- Sam talked about experiencing universal love while on DMT. Many religions contain this idea of universal love, but he couldn’t appreciate it until he had this experience.
- He argues that it is impossible to stay angry for more than a few seconds without continuously thinking thoughts to keep us angry. To demonstrate this, he asks us to imagine that we receive an important phone call. Most likely we will put our anger aside.
Recommended reading:
- https://samharris.org/a-plea-for-spirituality/
- https://samharris.org/our-narrow-definition-of-science/
FWIW no-self is a bad reification/translation of not-self, and the overwhelming majority seem to be metaphysically confused about something that is just one more tool rather than some sort of central metaphysical doctrine. When directly questioned “is there such a thing as the self?”, the Buddha is famously mum.
What’s the difference between no self and not self?
No-self is an ontological claim about everyone’s phenomenology. Not self is a mental state that people can enter where they dis-identify with the contents of consciousness.
One of the problems with the general anti-zombie principle is that it makes much too strong a claim: that what appears conscious must be.
There appears to be something of a Sensemaking community developing on the internet, which could roughly be described as a spirituality-inspired attempt at epistemology. This includes Rebel Wisdom, Future Thinkers, Emerge and maybe you could even count post-rationality. While there are undoubtedly lots of critiques that could be made of their epistemics, I’d suggest watching this space as I think some interesting ideas will emerge out of it.
Review: Human-Compatible by Stuart Russell
I wasn’t a fan of this book, but maybe that’s just because I’m not in the target audience. As a first introduction to AI safety I recommend The AI Does Not Hate You by Tom Chivers (facebook.com/casebash/posts/10100403295741091) and for those who are interested in going deeper I’d recommend Superintelligence by Nick Bostrom. The strongest chapter was his assault on the arguments of those who think we shouldn’t worry about superintelligence, but you can just read it here: https://spectrum.ieee.org/…/many-experts-say-we-shouldnt-wo…
I learned barely anything that was new from this book. Even when it came to Russell’s own approach, Cooperative Inverse Reinforcement Learning, I felt that the treatment was shallow (I won’t write about this approach until I’ve had a chance to review it directly again). There were a few interesting ideas that I’ll list below, but I was surprised by how little I’d learned by the end. There’s a decent explanation of some very basic concepts within AI, but this was covered in a way that was far too shallow for me to recommend it.
Interesting ideas/quotes:
- More processing power won’t solve AI without better algorithms. It simply gets you the wrong answer faster
- Language bootstrapping: Comprehension is dependent on knowing facts and extracting facts is dependent on comprehension. You might think that we could bootstrap an AI using easy-to-comprehend text, but in practice we end up extracting incorrect facts that scramble further comprehension
- We have an advantage with predicting humans as we have a human mind to simulate with; it’ll take longer for AIs to develop this ability
- He suggests that we have a right to mental security and that it is naive to trust that the truth will win out. Unfortunately, he doesn’t address any of the obvious concerns this raises
- By default, a utility maximiser won’t want us to turn it off, as that would interfere with its goals. We could reward it when we turn it off, but that could incentivise it to manipulate us into turning it off. Instead, if the utility maximiser is trying to optimise for our reward function and it is uncertain about what that is, then it will let us turn it off (a toy version of this argument appears after this list)
- We might decide that we don’t want to satisfy all preferences; for example, we mightn’t feel any obligation to take into account preferences that are sadistic, vindictive or spiteful. But refusing to consider these preferences could have unforeseen consequences: what if envy can’t be ignored as a factor without destroying our self-esteem?
- It’s hard to tell if an experience has taught someone more about their preferences or changed their preferences (at least without looking into their brain). In either case the response is the same
- We want robots to treat commands as information about human preferences rather than interpreting them too literally. For example, if I ask a robot to fetch a cup of coffee, I assume that the nearest outlet isn’t in the next city over and that it won’t cost $100. We don’t want the robot to fetch it at all costs.
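Here is a minimal toy version of the off-switch point above, with made-up numbers; it’s my own simplification of the intuition rather than Russell’s formal model.

```python
# Toy sketch of the off-switch argument (made-up numbers). The robot is
# unsure whether its action is worth +10 or -10 to the human. The human
# knows, and will switch the robot off whenever the action would be harmful.

P_GOOD = 0.6            # robot's credence that its action helps (+10)
U_GOOD, U_BAD = 10, -10

def act_directly() -> float:
    """Expected human utility if the robot just acts, ignoring the off switch."""
    return P_GOOD * U_GOOD + (1 - P_GOOD) * U_BAD

def defer_to_human() -> float:
    """Expected human utility if the robot lets the human decide:
    the human allows the action when it's good and switches off (utility 0) otherwise."""
    return P_GOOD * U_GOOD + (1 - P_GOOD) * 0

print("act directly:", act_directly())      # 2.0
print("defer to human:", defer_to_human())  # 6.0
# Because the robot is uncertain about the human's reward function, allowing
# itself to be switched off has higher expected value than acting unilaterally.
# With no uncertainty (P_GOOD = 1.0) the two options tie and the incentive vanishes.
```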
Despite having read dozens of articles discussing Evidential Decision Theory (EDT), I’ve only just figured out a clear and concise explanation of what it is. Taking a step back, let’s look at how this is normally explained and one potential issue with this explanation. All major decision theories (EDT, CDT, FDT) rate potential decisions using expected value calculations where:
- Each theory uses a different notion of probability for the outcomes
- Each theory uses the same utility function for valuing the outcomes
So it should be just a simple matter of stating what the probability function is. EDT is normally explained as using P(O|S & D) where O is the outcome, S is the prior state and D is the decision. At this point it seems like this couldn’t possibly fail to be what we want. Indeed, if S described all state, then there wouldn’t be the possibility of making the smoking lesion argument.
However, that’s because it fails to differentiate between hidden state and visible state. EDT uses visible state, so we can write it as P(O|V & D). The probability distribution of O actually depends on H as well, i.e. it is some function f(V, H, D). In most cases H is uncorrelated with D, but this isn’t always the case. So what might look like the direct effect of V and D on the probability of O might actually turn out to be the indirect effect of D shifting our expected distribution of H, which then affects the probability of O. For example, in the Smoking Lesion, we might see ourselves scoring poorly in the counterfactual where we smoke and assume that this is because of our decision. However, this ignores the fact that when we smoke, H is likely to contain the lesion and hence also cancer. So we think we’ve set up a fair playing field for deciding between smoking and not smoking, but we haven’t, because of the differences in H.
Or to summarise: “The decision can correlate with hidden state, which can affect the probability distribution of outcomes.” Maybe this is already obvious to everyone, but this was the key I needed to internalise these ideas on an intuitive level.
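To make the hidden-state point concrete, here’s a minimal sketch of the Smoking Lesion with made-up numbers (the probabilities and utilities are purely illustrative). Conditioning on the decision drags in information about the lesion, so EDT penalises smoking even though smoking has no causal effect on cancer.

```python
# Minimal Smoking Lesion sketch: the hidden state H (the lesion) correlates
# with the decision D, so conditioning on D alone mixes the "effect of D"
# with information about H. All numbers are made up for illustration.

P_LESION = 0.2                 # P(H = lesion)
P_SMOKE_GIVEN_LESION = 0.9
P_SMOKE_GIVEN_NO_LESION = 0.1
P_CANCER_GIVEN_LESION = 0.8    # cancer depends only on H, not on D
P_CANCER_GIVEN_NO_LESION = 0.01

U_SMOKE = 10        # enjoyment of smoking
U_CANCER = -1000    # disutility of cancer

def edt_value(decision: str) -> float:
    """Expected utility EDT assigns to a decision, i.e. E[U | D]."""
    p_smoke = (P_SMOKE_GIVEN_LESION * P_LESION
               + P_SMOKE_GIVEN_NO_LESION * (1 - P_LESION))
    p_d = p_smoke if decision == "smoke" else 1 - p_smoke
    p_d_given_lesion = (P_SMOKE_GIVEN_LESION if decision == "smoke"
                        else 1 - P_SMOKE_GIVEN_LESION)
    # P(H | D) via Bayes -- this is where D leaks information about H.
    p_lesion_given_d = p_d_given_lesion * P_LESION / p_d

    p_cancer = (p_lesion_given_d * P_CANCER_GIVEN_LESION
                + (1 - p_lesion_given_d) * P_CANCER_GIVEN_NO_LESION)
    return (U_SMOKE if decision == "smoke" else 0) + p_cancer * U_CANCER

for decision in ("smoke", "don't smoke"):
    print(decision, round(edt_value(decision), 1))
# EDT scores "don't smoke" higher, even though smoking has no causal effect
# on cancer: the difference comes entirely from what the decision reveals
# about the hidden lesion.
```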
Anti-induction and Self-Reinforcement
Induction is the belief that the more often a pattern has held, the more likely it is to continue. Anti-induction is the opposite claim: the more often a pattern has held, the less likely future events are to follow it.
Somehow I seem to have gotten the idea into my head that anti-induction is self-reinforcing. The argument for it is as follows: suppose we have a game where at each step a screen flashes an A or a B and we try to predict what it will show. Suppose that the screen always flashes A, but the agent initially thinks that the screen is more likely to display B. So it guesses B, observes that it guessed incorrectly and then, if it is an anti-inductive agent, increases its credence that the next symbol will be B, because of anti-induction. So in this scenario your confidence that the next symbol will be B, despite the long stream of As, will keep increasing. This particular anti-inductive belief is self-reinforcing.
However, there is a sense in which anti-induction is contradictory—if you observe anti-induction working, then you should update towards it not working in the future. I suppose the distinction here is that we are using anti-induction to update our beliefs on anti-induction and not just our concrete beliefs. And each of these is a valid update rule: in the first we apply this update rule to everything including itself and in the other we apply this update rule to things other than itself. The idea of a rule applying to everything except itself feels suspicious, but is not invalid.
Also, it’s not that the anti-inductive belief that B will be next is self-reinforcing. After all, anti-induction given consistent As pushes you towards believing B more and more regardless of what you believe initially. In other words, it’s more of an attractor state.
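Here’s a toy simulation of the A/B game above. The update rule is a deliberately crude stand-in for anti-induction (made up for illustration), but it shows the attractor behaviour: given a constant stream of As, credence in A falls regardless of where it starts.

```python
# Toy anti-inductive agent for the A/B game. The update rule is a crude
# illustration: every time it sees an A, it shifts credence *away* from
# A being next ("A keeps happening, so A is less likely to continue").

def anti_inductive_agent(observations: str, credence_a: float = 0.4,
                         step: float = 0.05) -> list[float]:
    """Track the agent's credence that the next symbol is A."""
    history = [credence_a]
    for symbol in observations:
        if symbol == "A":
            credence_a = max(0.0, credence_a - step)
        else:
            credence_a = min(1.0, credence_a + step)
        history.append(credence_a)
    return history

print([round(c, 2) for c in anti_inductive_agent("A" * 10)])
# Despite an unbroken stream of As, credence in A falls towards 0 (credence
# in B rises towards 1), whatever the starting credence: the attractor state
# described above.
```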
The best reason to believe in anti-induction is that it’s never worked before. Discussed at a bit of depth in https://www.lesswrong.com/posts/zmSuDDFE4dicqd4Hg/you-only-need-faith-in-two-things .
Here’s one way of explaining this: it’s a contradiction to have a provable statement that is unprovable, but it’s not a contradiction for it to be provable that a statement is unprovable. Similarly, we can’t have a scenario that is simultaneously imagined and not imagined, but we can coherently imagine a scenario where things exist without being imagined by beings within that scenario.
Rob Besinger:
Inverted, by switching “provable” and “unprovable”:
It’s a contradiction to have an unprovable statement that is provable, but it’s not a contradiction for it to be unprovable that a statement is provable.
“It’s a contradiction to have a provable statement that is unprovable”—I meant it’s a contradiction for a statement to be both provable and unprovable.
“It’s not a contradiction for it to be provable that a statement is unprovable”—this isn’t a contradiction
You made a good point, so I inverted it. I think I agree with your statements in this thread completely. (So far, absent any future change.) My prior comment was not intended to indicate an error in your statements. (So far, in this thread.)
If there is a way I could make this more clear in the future, suggestions would be appreciated.
Elaborating on my prior comment via interpretation, so that its meaning is clear, if more specified*:
A’ is the same as A because:
While B is true, B’ seems false (unless I’m missing something). But in a different sense B’ could be true. What does it mean for something to be provable? It means that ‘it can be proved’. This gives two definitions:
- a proof of X “exists”
- it is possible to make a proof of X
Perhaps a proof may ‘exist’ such that it cannot exist (in this universe). That is, as a consequence of its length and complexity, and the bounds implied by the ‘laws of physics’* on what can be represented, constructing this proof is impossible. In this sense, X may be true, but if no proof of X may exist in this universe, then:
Something may have the property that it is “provable”, but impossible to prove (in this universe).**
*Other interpretations may exist, and as I am not aware of them, I think they’d be interesting.
**This is a conjecture.
Thanks for clarifying
Book Review: Awaken the Giant Within Audiobook by Tony Robbins
First things first, the audiobook isn’t the full book or anything close to it. The standard book is 544 pages, while the audiobook is a little over an hour and a half. The fact that it was abridged really wasn’t obvious.
We can split what he offers into two main categories: motivational speaking and his system itself. The motivational aspect of his speaking is very subjective, so I’ll leave it to you to evaluate yourself. You can find videos of him on YouTube and you should know within a few minutes whether you like his style.
Instead I’ll focus on reviewing his system. The first key aspect Robbins focuses on is what he calls neuro-associations; that is, which experiences we link pleasure and pain to. While we may be able to maintain a habit using willpower in the short term, Robbins believes that in order to maintain it over the long term we need to change our neuro-associations to link pleasure to actions that are good for us and pain to actions that are bad for us.
He argues that we can attach positive or negative neuro-associations to an action by making the advantages or disadvantages as salient as possible. The images on packs of cigarettes are a good example of that principle in action, as would be looking at the scans of people who have lung cancer. In addition, we can reward ourselves for success (though he doesn’t discuss the possibility of punishing yourself for failure). This seems like a plausible method for effecting change and one worth experimenting with, although I’ve never experienced much motivation from rewarding myself, as it doesn’t really feel like the action is connected to the reward.
The second key aspect of his system is to draw a distinction between decisions and preferences. Most of the time when we say that we’ve decided to do something, such as going to the gym, we’re really just saying that we would prefer that to happen. We haven’t really decided that we WILL do what we’ve said, come what may.
Robbins sees the ability to make decisions that we are strongly committed to as key to success. For that reason he recommends practising using our “decision muscles” to strengthen them, so that they are ready when needed. This seems like good advice. Personally, I think it’s important to be honest with yourself about when you have a preference and when you’ve actually made a decision in Robbins’s sense. After all, committed decisions take energy and have a cost, as sometimes you’ll commit to something that is a mistake, so it’s important to be selective about what you are truly committed to, as otherwise you may end up committed to nothing at all.
There are lots more elements to his system, but those two particular ones are at the core and seemed to be the most distinctive aspects of this book. It’s hard to review such a system without having tried it, but my current position is as follows: I could see myself listening to another one of his audiobooks, although it isn’t really a priority for me.
The sad thing about philosophy is that as your answers become clearer, the questions become less mysterious and awe-inspiring. It’s easy to assume that an imposing question must have an impressive answer, but sometimes the truth is just simple and unimpressive and we miss this because we didn’t evolve for this kind of abstract reasoning.
Examples?
I used to find the discussion of free will interesting before I learned it was just people talking past each other. Same with “light is both a wave and a particle” until I understood that it just meant that sometimes the wave model is a good approximation and other times the particle model is. Debates about morality can be interesting, but much less so if you are a utilitarian or non-realist.
Semantic differences almost always happen, but are rarely the only problem.
There are certainly different definitions of free will, but even so, problems remain:
There is still an open question as to whether compatibilist free will is the only kind anyone ever needed or believed in, and as to whether libertarian free will is possible at all.
The topic is interesting, but no discussion about it is interesting. These are not contradictory.
The open question about strong determinism vs libertarian free will is interesting, and there is a yet-unexplained contradiction between my felt experience (and others reported experiences) and my fundamental physical model of the universe. The fact that nobody has any alternative model or evidence (or even ideas about what evidence is possible) that helps with this interesting question makes the discussion uninteresting.
So Yudkowsky’s theory isn’t new?
Not new that I could tell; it’s a refreshingly clear statement of strict determinism: free will is an illusion, and “possible” is in the map, not the territory. “Deciding” is how a brain feels as it executes its algorithm and takes the predetermined (but not previously known) path.
He does not resolve the conflict that it feels SOOO real as it happens.
That’s an odd thing to say, since the feeling of free will is about the only thing he addresses.
I’m going to start writing up short book reviews as I know from past experience that it’s very easy to read a book and then come out a few years later with absolutely no knowledge of what was learned.
Book Review: Everything is F*cked: A Book About Hope
To be honest, the main reason why I read this book was because I had enjoyed his first and second books (Models and The Subtle Art of Not Giving A F*ck) and so I was willing to take a risk. There were definitely some interesting ideas here, but I’d already received many of them through other sources (Harari, Buddhism, talks on Nietzsche, summaries of The True Believer), so I didn’t gain as much from this as I’d hoped.
It’s fascinating how a number of thinkers have recently converged on the lack of meaning within modern society. Yuval Harari argues that modernity has essentially been a deal sacrificing meaning for power. He believes that the lack of meaning could eventually lead to societal breakdown, and for this reason he argues that we need to embrace shared narratives that aren’t strictly true (religion without gods, if you will; he personally follows Buddhism). Jordan Peterson also worries about a lack of meaning, but seeks to “revive God” as some kind of metaphorical entity.
Mark Manson is much more skeptical, but his book does start along similar lines. He tells the story of gaining meaning from his grandfather’s death by trying to make him proud, although this was kind of silly as they hadn’t been particularly close or even talked recently. Nonetheless, he felt that this sense of purpose had made him a better person and improved his ability to achieve his goals. Mark argues that we can’t draw motivation from our thinking brain and that we need these kinds of narratives to reach our emotional brain instead.
However, he argues that there’s also a downside to hope. People who are dissatisfied with their lives can easily fall prey to ideological movements which promise a better future, especially when they feel a need for hope. In other words, there is both good and bad hope. It isn’t especially clear in the book what the difference is, but he explained to me in an email that his main concern was how movements cause people to detach from reality.
His solution is to embrace Nietzsche’s concept of Amor Fati, that is, a love of one’s fate, whatever it may be. Even though this is also a narrative itself, he believes that it isn’t so harmful, as unlike other “religions” it doesn’t require us to detach from reality. My main takeaway was his framing of the need for hope as risky. Hope is normally assumed to be good; now I’m less likely to make this assumption.
It was fascinating to see how he put his own spin on this issue and it certainly isn’t a bad book, but there just wasn’t enough new content for me. Maybe others who haven’t been exposed to some of these ideas will be more enthused, but I’ve read his blog so most of the content wasn’t novel to me.
Further thoughts: After reading the story of his grandfather, I was honestly expecting him to propose avoiding sourcing our hope from big all-encompassing narratives in favour of micro-narratives, but he didn’t end up going in this direction.
Random thought: We should expect LLMs trained on user responses to have much more situational knowledge than early LLMs trained on the pre-chatbot internet, because users will occasionally make reference to the meta-context.
It may be possible to get some of this information from pre-training on chatlogs/excerpts that make their way onto the internet, but the information won’t be quite as accessible because of differences in the context.
For the record, I see the new field of “economics of transformative AI” as overrated.
Economics has some useful frames, but it also tilts people towards being too “normy” on the impacts of AI and it doesn’t have a very good track record on advanced AI so far.
I’d much rather see multidisciplinary programs/conferences/research projects, including economics as just one of the perspectives represented, than economics of transformative AI qua economics of transformative AI.
(I’d be more enthusiastic about building economics of transformative AI as a field if we were starting five years ago, but these things take time and it’s pretty late in the game now, so I’m less enthusiastic about investing field-building effort here and more enthusiastic about pragmatic projects combining a variety of frames).
Sharing this resource doc on AI Safety & Entrepreneurship that I created in case anyone finds this helpful:
https://docs.google.com/document/d/1m_5UUGf7do-H1yyl1uhcQ-O3EkWTwsHIxIQ1ooaxvEE/edit?usp=sharing
I was talking with Rupert McCallum about the simulation hypothesis yesterday. Rupert suggested that this argument is self-defeating; that is, it pulls the rug from under its own feet. It assumes the universe has particular properties, then it tries to estimate the probability of being in a simulation from these properties, and if the probability is sufficiently high, then we conclude that we are in a simulation. But if we are likely to be in a simulation, then our initial assumptions about the universe are likely to be false, so we’ve disproved the assumptions we relied on to obtain these probabilities.
This all seems correct to me, although I don’t see it as a fatal argument. Let’s suppose we start by assuming that the universe has particular properties AND that we are not in a simulation. We can then estimate the odds of someone with our kind of experiences being in a simulation within these assumptions. If the probability is low, then our assumption will be self-consistent, but if the probability is sufficiently high, then it becomes probabilistically self-defeating. We would have to adopt different assumptions. Maybe the most sensible update would be to believe that we are in a simulation, but maybe it’d be more sensible to assume we were wrong about the properties of the universe. And maybe there’s still scope to argue that we should do the former.
This counterargument was suggested before by Danila Medvedev and it doesn’t work. The reasons are as follows: if we are in a simulation, we can’t say anything about the outside world, but we are still in a simulation, and this is what was needed to be proved.
“This is what was needed to be proved”—yeah, but we’ve undermined the proof. That’s why I backed up and reformulated the argument in the second paragraph.
One more way to prove the simulation argument is the general observation that explanations which have lower computational cost dominate my experience (that is, a variant of Occam’s Razor). If I see a nuclear explosion, it is more likely to be a dream, a movie or a photo. Thus cheap simulations should be more numerous than real worlds, and we are likely to be in one.
It’s been a while since I read the paper, but wasn’t the whole argument around people wanting to simulate different versions of their own world and population? There’s a baked-in assumption that worlds similar to one’s own are therefore more likely to be simulated.
Yeah, that’s possible. Good point!
Three levels of forgiveness: emotions, drives and obligations. The emotional level consists of your instinctual anger, rage, disappointment, betrayal, confusion or fear. This is about raw feelings. The drives consist of your “need” for them to say sorry, make amends, regret their actions, have a conversation or empathise with you. In other words, it’s about needing the situation to turn out a particular way. The obligations are very similar to the drives, except that they are about the other person’s duty to perform these actions rather than your desire to make it happen.
Someone can forgive on all of these levels. Suppose someone says that they are sorry and the other person replies, “there is nothing to forgive”. Then perhaps they mean that there was no harm or that they have completely forgiven on all levels.
Alternatively, someone might forgive on one level, but not another. For example, it seems that most of the harm of holding onto a grudge comes from the emotional level and the drives level, and less from the obligations level.
The phrase “an eye for an eye” could be construed as duty—that the wrong another does you is a debt you have to repay. (Possibly inflated, or with interest. It’s also been argued that it’s about (motivating) recompense—you pay the price for taking another’s eye, or you lose yours.)
Interesting point, but you’re using duty differently than me. I’m talking about their duties towards you. Of course, we could have divided it another way or added extra levels.
Your duties (towards others) may include what you are supposed to do if others don’t fulfill their duties (towards you).
Writing has been one of the best things for improving my thinking as it has forced me to solidify my ideas into a form that I’ve been able to come back to later and critique when I’m less enraptured by them. On the other hand, for some people it might be the worst thing for their thinking as it could force them to solidify their ideas into a form that they’ll later feel compelled to defend.
Book Review: The Rosie Project:
Plot summary: After a disastrous series of dates, autistic genetics professor Don Tillman decides that it’d be easier to just create a survey to eliminate all of the women who would be unsuitable for him. Soon after, he meets a barmaid called Rosie who is looking for help with finding out who her father is. Don agrees to help her, but over the course of the project Don finds himself increasingly attracted to her, even though the survey suggests that she is completely unsuitable. The story is narrated in Don’s voice. He tells us all about his social mishaps, while also providing some extremely straight-shooting observations on society.
Should I read this?: If you’re on the fence, I recommend listening to a couple of minutes, as the tone is remarkably consistent throughout without becoming stale.
My thoughts: I found it to be very humorous, but without making fun of Don. We hear the story from his perspective and he manages to be a very sympathetic character. The romance manages to be relatively believable, since Don establishes himself as having many attractive qualities despite his limited social skills. However, I couldn’t believe that he’d think of Rosie as “the most beautiful woman in the world”; that kind of romantic idealisation is just too inconsistent with his character. His ability to learn skills quickly also stretched credibility, but it felt more believable after he dramatically failed during one instance. I felt that Don’s character development was solid; I did think that he’d struggle more to change his schedule after keeping it rigid for so long, but that wasn’t a major issue for me. I appreciated that by the end he had grown significantly (less strict in his expectations for a partner, not sticking so rigidly to a schedule, being more accommodating of other people’s faults), but he was still largely himself.
Doublechecking, this is fiction?
Yep, fiction
I think I spent more time writing this than reading the book, as I find reviewing fiction much more difficult. I strongly recommend this book: it doesn’t take very long to read, but you may spend much longer trying to figure out what to make of it.
Book Review: The Stranger by Camus (Contains spoilers)
I’ve been wanting to read some existentialist writing for a while and it seemed reasonable to start with a short book like this one. The story is about a man who kills another man for what seems to be no real reason at all, and who is subsequently arrested and must come to terms with his fate. It grapples with issues such as the meaning of life, the inevitability of death and the expectations of society.
This book works perfectly as an audiobook because it’s written in the first person and it’s a stream of consciousness. In particular, you can just let the thoughts wash over you and then pass away, in a way that you can’t with a physical book.
The book starts with the death of Meursault’s mother and his resulting indifference. Meursault almost entirely lacks any direction or purpose in life: he doesn’t care about opportunities at work, about Salamano abusing his dog, or about whether or not he marries Marie. Not much of a hint is given as to the source of his detachment, except his noting that he had a lot of ambition as a young man, but gave up on such dreams when he had to give up his education.
Despite his complete disillusionment, it’s not that he cares about nothing at all. Without optimism, he has no reason to plan for the future. Instead, he focuses almost exclusively on the moment: being friends with Raymond because he has no reason not to, being with Marie because she brings him pleasure in the present and, more tragically, shooting the Arab for flashing the sun in his eyes with a knife.
In my interpretation, Meursault never formed a strong intent to kill him, but just drifted into it. He didn’t plan to have the gun with him; he simply took it to stop Raymond acting rashly. He hadn’t planned to create a confrontation; he just returned to the beach to cool off, then assumed that the Arab was far enough away to avoid any issues. When the Arab pulled out his knife, it must have seemed natural to pull out his gun. Then, with the heat clouding his judgement, his in-the-moment desire to make the situation go away and his complete detachment from caring, he ends up killing a man when he didn’t need to, as the Arab was still far away. And after he’s fired the first shot, he likely felt that he’d made his choice and that there was nothing left to do but fire the next four.
While this detachment involves no optimism in the emotional sense, in terms of logic it isn’t entirely pessimistic. After all, someone who is detached in this way implicitly assumes that things cannot become significantly worse. Meursault falls victim to this trap and in the end it costs him dearly. This occurs not just when he shoots the Arab, but throughout the legal process, where he shows what seems like stunning naivety, completely unaware of what he has to lose until he is pretty much told he is to be executed.
I found his trial to be one of the most engaging parts of the book. A man is dead, but the circumstances relating to this death are almost tangential to the whole thing. Instead, the trial focuses much more on incidental factors, such as whether he felt a sufficient amount of grief for his mother and his association with a known low-life, Raymond. This passage felt like a true illustration of human nature, in particular our tendency to fit everything into a particular narrative and how “justice” can often end up being more about our disgust at the perpetrator as a person than about what they’ve done. Meursault undoubtedly deserves punishment for pulling the trigger when he didn’t need to, but the trial he was given was a clear miscarriage of justice.
This book does a good job of illustrating the absurdity of life: how much of our daily lives is trivial, the contradictions in much of human behaviour, the irrationality of many of our social expectations and how our potential sources of meaning fail to be fundamentally meaningful. But then also how we can find meaning in things that are meaningless.
Indeed, it is only his imprisonment that really makes him value life outside, and it is only his impending execution that makes him value life itself. He survives prison by drawing pleasure from simple things, like seeing what tie his defence lawyer will wear, and by realising that his happiness does not have to be constrained by his unfortunate circumstances. Meursault ultimately realises that he has to make his own purpose, instead of just expecting it to be out there in the universe.
Further thoughts: One of the most striking sub-plots in this book is that of Salamano and his dog. Salamano constantly abuses his dog and complains about how bad its behaviour is, but when the dog runs away, he despairs about what will happen to him now that he no longer has it. This is a perfect example of just how absurd human actions can be, both in general and particularly when we are in denial about our true feelings.
Pet theory about meditation: Lots of people say that if you do enough meditation you will eventually realise that there isn’t a self. Having not experienced this myself, I am intensely curious about what people observe that persuades them to conclude this. I get a sense that many people are being insufficiently skeptical. There’s a difference between there not appearing to be such a thing as a self and a self not existing. Indeed, how do we know meditation doesn’t just temporarily silence whatever part of our mind is responsible for self-hood?
Recently, I saw a quote from Sam Harris that makes me think I might (emphasis on might) finally know what people are experiencing. In a podcast with Eric Weinstein he explains that he believes there isn’t a self because “consciousness is an open space where everything is appearing—that doesn’t really answer to I or me”. The first part seems to mirror Global Workspace Theory, the idea (super roughly) that there is a part of the brain for synthesising thoughts from various parts of the brain which can only pay attention to one thought at a time.
The second part of Sam Harris’ sentence seems to say that this Global Workspace “doesn’t answer to I or me”. This is still vague, but it sounds like either there is a single part of the brain that identifies as “I or me” and is separate from this Global Workspace, or there are multiple parts that are separate from the Global Workspace and each partially identify as “I or me”. In the first of these sub-interpretations, “no-self” would merely mean that our “self” is just another sub-agent and not the whole of us. In the second, it would additionally be true that we don’t have a unitary self, but multiple fragments of self-hood.
Anyway, as I said, I haven’t experienced no-self, but curious to see if this resonates with people who have.
Placeholder for an experimental art project — Under construction 🚧[1]
Art in the Age of the Internet
𝕯𝖔𝖔𝖒؟
𝒽𝑜𝓌 𝓉𝑜 𝒷𝑒𝑔𝒾𝓃? 𝓌𝒽𝒶𝓉 𝒶𝒷𝑜𝓊𝓉 𝒶𝓉 𝕿𝖍𝖊 𝕰𝖓𝖉?[5]
𝕿𝖍𝖊 𝕰𝖓𝖉? 𝕚𝕤 𝕚𝕥 𝕣𝕖𝕒𝕝𝕝𝕪 𝕿𝖍𝖊 𝕰𝖓𝖉?
𝓎𝑒𝓈. 𝒾𝓉 𝒾𝓈 𝕿𝖍𝖊 𝕰𝖓𝖉. 𝑜𝓇 𝓂𝒶𝓎𝒷𝑒 𝒯𝒽ℯ 𝐵ℯℊ𝒾𝓃𝓃𝒾𝓃ℊ.
𝓌𝒽𝒶𝓉𝑒𝓋𝑒𝓇 𝓉𝒽𝑒 𝒸𝒶𝓈𝑒, 𝒾𝓉 𝒾𝓈 𝒶𝓃 𝑒𝓃𝒹.[6]
Ilya: The AI scientist shaping the world
Journal
There’s No Rule That Says We’ll Make It — Rob Miles
More
MIRI announces new “Death With Dignity” strategy, April 2nd, 2022
Well, let’s be frank here. MIRI didn’t solve AGI alignment and at least knows that it didn’t. Paul Christiano’s incredibly complicated schemes have no chance of working in real life before DeepMind destroys the world. Chris Olah’s transparency work, at current rates of progress, will at best let somebody at DeepMind give a highly speculative warning about how the current set of enormous inscrutable tensors, inside a system that was recompiled three weeks ago and has now been training by gradient descent for 20 days, might possibly be planning to start trying to deceive its operators.
Management will then ask what they’re supposed to do about that.
Whoever detected the warning sign will say that there isn’t anything known they can do about that. Just because you can see the system might be planning to kill you, doesn’t mean that there’s any known way to build a system that won’t do that. Management will then decide not to shut down the project—because it’s not certain that the intention was really there or that the AGI will really follow through, because other AGI projects are hard on their heels, because if all those gloomy prophecies are true then there’s nothing anybody can do about it anyways. Pretty soon that troublesome error signal will vanish.
When Earth’s prospects are that far underwater in the basement of the logistic success curve, it may be hard to feel motivated about continuing to fight, since doubling our chances of survival will only take them from 0% to 0%.
That’s why I would suggest reframing the problem—especially on an emotional level—to helping humanity die with dignity, or rather, since even this goal is realistically unattainable at this point, die with slightly more dignity than would otherwise be counterfactually obtained...
Three Quotes on Transformative Technology
Did you and the other scientists not stop to consider the implications of what you were creating? — Roger Robb
When you see something that is technically sweet, you go ahead and do it and you argue about what to do about it only after you have had your technical success. That is the way it was with the atomic bomb— Oppenheimer
✒️ Selected Quotes:
We stand at a crucial moment in the history of our species. Fueled by technological progress, our power has grown so great that for the first time in humanity’s long history, we have the capacity to destroy ourselves—severing our entire future and everything we could become.
Yet humanity’s wisdom has grown only falteringly, if at all, and lags dangerously behind. Humanity lacks the maturity, coordination and foresight necessary to avoid making mistakes from which we could never recover. As the gap between our power and our wisdom grows, our future is subject to an ever-increasing level of risk. This situation is unsustainable. So over the next few centuries, humanity will be tested: it will either act decisively to protect itself and its long-term potential, or, in all likelihood, this will be lost forever — Toby Ord, The Precipice
We have created a Star Wars civilization, with Stone Age emotions, medieval institutions, and godlike technology — Edward O. Wilson, The Social Conquest of Earth
Before the prospect of an intelligence explosion, we humans are like small children playing with a bomb. Such is the mismatch between the power of our plaything and the immaturity of our conduct — Nick Bostrom, Founder of the Future of Humanity Institute, Superintelligence
If we continue to accumulate only power and not wisdom, we will surely destroy ourselves — Carl Sagan, Pale Blue Dot
Never has humanity had such power over itself, yet nothing ensures that it will be used wisely, particularly when we consider how it is currently being used…There is a tendency to believe that every increase in power means “an increase of ‘progress’ itself ”, an advance in “security, usefulness, welfare and vigour; …an assimilation of new values into the stream of culture”, as if reality, goodness and truth automatically flow from technological and economic power as such. — Pope Francis, Laudato si’
The fundamental test is how wisely we will guide this transformation – how we minimize the risks and maximize the potential for good — António Guterres, Secretary-General of the United Nations
Our future is a race between the growing power of our technology and the wisdom with which we use it. Let’s make sure that wisdom wins — Stephen Hawking, Brief Answers to the Big Questions
❤️🔥 Desires
𝓈𝑜𝓂𝑒𝓉𝒾𝓂𝑒𝓈 𝐼 𝒿𝓊𝓈𝓉 𝓌𝒶𝓃𝓉 𝓉𝑜 𝓂𝒶𝓀ℯ 𝒜𝓇𝓉
𝕥𝕙𝕖𝕟 𝕞𝕒𝕜𝕖 𝕚𝕥
𝒷𝓊𝓉 𝓉𝒽ℯ 𝓌𝑜𝓇𝓁𝒹 𝒩𝐸𝐸𝒟𝒮 𝒮𝒶𝓋𝒾𝓃ℊ...
𝕪𝕠𝕦 𝕔𝕒𝕟 𝓈𝒶𝓋ℯ 𝕚𝕥?
𝐼… 𝐼 𝒸𝒶𝓃 𝒯𝓇𝓎...
Hope
Scraps
Ilya Sutskever
The Optimist, Keach Hagey
Time 100 AI 2024
Twitter
Safe Superintelligence Inc.
⇢ Note to self: My previous project had too much meta-commentary and this may have undermined the sincerity, so I should probably try to minimise meta-commentary.
⇢ “You’re going to remove this in the final version, right?” — Maybe.
“But you can’t quote ChatGPT 😠!”—Internet Troll ÷
“I would say the flaw of Xanadu’s UI was treating transclusion as ‘horizontal’ and side-by-side” — Gwern 🙃
“StretchText is a hypertext feature that has not gained mass adoption in systems like the World Wide Web… StretchText is similar to outlining, however instead of drilling down lists to greater detail, the current node is replaced with a newer node”—Wikipedia
This ‘stretching’ to increase the amount of writing, or contracting to decrease it gives the feature its name. This is analogous to zooming in to get more detail.
Ted Nelson coined the term c. 1967.
Conceptually, StretchText is similar to existing hypertext systems where a link provides a more descriptive or exhaustive explanation of something, but there is a key difference between a link and a piece of StretchText. A link completely replaces the current piece of hypertext with the destination, whereas StretchText expands or contracts the content in place. Thus, the existing hypertext serves as context.
⇢ “This isn’t a proper implementation of StretchText” — Indeed.
In defence of Natural Language DSLs — Connor Leahy
Did this conversation really happen? — 穆
⇢ “Sooner or later, everything old is new again” — Stephen King
⇢ “Therefore if any man be in Christ, he is a new creature: old things are passed away; behold, all things have become new.” — 2 Corinthians 5:17
QR Code for: Why the focus on wise AI advisors? (plus FAQ)
Short Link: https://shorturl.at/idQt9
There’s been a lot of discussion about how Less Wrong is mostly just AI these days.
If that’s something that folk want to address, I suspect that the best way to do this would be to run something like the Roots of Progress Blog-Building Intensive. My admittedly vague impression is that it seems to have been fairly successful.
Between Less Wrong for distribution, Lighthaven for a writing retreat and Less Online for networking, a lot of the key infrastructure is already there to run a really strong program if the Lightcone team ever decided to pursue this.
There was discussion about an FHI of the West before, but that seems hard given the current funding situation. I suspect that a program like this would be much more viable.
We’ve actually been thinking about something quite related! More info soon.
(Typo: Lightcone for a writing retreat → Lighthaven for a writing retreat)
I just created a new Discord server for AI safety reports generated using Deep Research or other AI tools. Would be excited to see you join (ps. OpenAI now provides users on the Plus plan with 10 Deep Research queries per month).
https://discord.gg/bSR2hRhA
Was thinking about entropy and the Waluigi effect (in a very broad, metaphorical sense).
The universe trends towards increasing entropy; in such an environment it is evolutionarily advantageous to have the ability to resist it. Notice, though, that life seems to have overshot and resulted in far more complex ordered systems (both biological and manmade) than what exists elsewhere.
It’s not entirely clear to me, but it seems at least somewhat plausible that if entropy were weaker, the evolutionary pressure would be weaker and the resulting life, and the systems produced by such life, would ultimately be less complex than they are in our world.
Life happens within computations in datacenters. Separately, there are concerns about how well the datacenters will be doing when the universe is many OOMs older than today.
Sorry, I can’t quite follow how this connects. Any chance you could explain?
Confusing entropy arguments are suspicious (in terms of hope for ever making sense). That’s a sketch of how entropy in physics becomes clearly irrelevant for the content of everything of value (as opposed to amount). Waluigi effect is framing being stronger than direction within it, choice of representation more robust than what gets represented. How does natural selection enter into this?
Life evolves in response to pressure. Entropy is one such source of pressure.
On free will: I don’t endorse the claim that “we could have acted differently” as an unqualified statement.
However, I do believe that in order to talk about decisions, we do need to grant validity to a counterfactual view where we could have acted differently as a pragmatically useful fiction.
What’s the difference? Well, you can’t use the second to claim determinism is false.
This lack of contact with naive conception of possibility should be developed further, so that the reasons for temptation to use the word “fiction” dissolve. An object that captures a state of uncertainty doesn’t necessarily come with a set of concrete possibilities that are all “really possible”. The object itself is not “fictional”, and its shadows in the form of sets of possibilities were never claimed to either be “real possibilities” or to sum up the object, so there is no fiction to be found.
A central example of such an object is a program equipped with theorems about its “possible behaviors”. Are these behaviors “really possible”? Some of them might be, but the theorems don’t pin that down. Instead there are spaces on which the remaining possibilities are painted, shadows of behavior of the program as a whole, such as a set of possible tuples for a given pair of variables in the code. A theorem might say that reality lies within the particular part of the shadow pinned down by the theorem. One of those variables might’ve stood for your future decision. What “fiction”? All decision relevant possibility originates like that.
I argue that “I can do X” means “If I want to do X, I will do X”. This can be true (as an unqualified statement) even with determinism. It is different from saying that X is physically possible.
It seems as though it should be possible to remove the Waluigi effect[1] by appropriately training a model.
Particularly, some combination of:
- Removing data from the training set that matches this effect
- Constructing new synthetic data which performs the opposite of the Waluigi effect
However, removing this effect might be problematic for certain situations where we want the ability to generate such content, for example, if we want it to write a story.
In this case, it might pay to add back the ability to generate such content within certain tags (i.e. <story></story>), but train it not to produce such content otherwise. A rough sketch of what this could look like on the data side is below.
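This sketch is entirely illustrative: `looks_like_waluigi` is a toy keyword check standing in for a real trained classifier, and the `<story>` tag is just the convention suggested above.

```python
# Illustrative sketch only: a real pipeline would use a trained classifier
# rather than the toy keyword check below, and the <story> tag is just the
# convention suggested in the post. Step 2 (synthetic counter-examples)
# is omitted, since it would require a generative rewriting step.

from typing import Iterable

TOY_MARKERS = ("suddenly turned evil", "revealed its true, malicious self")

def looks_like_waluigi(text: str) -> bool:
    """Stand-in for a classifier that flags persona-flip ("Waluigi") passages."""
    return any(marker in text.lower() for marker in TOY_MARKERS)

def filter_pretraining_docs(docs: Iterable[str]) -> list[str]:
    """Step 1: remove documents that match the effect."""
    return [d for d in docs if not looks_like_waluigi(d)]

def gate_fiction(doc: str) -> str:
    """Step 3: keep the ability to generate such content, but only inside tags."""
    return f"<story>{doc}</story>" if looks_like_waluigi(doc) else doc

docs = [
    "The helpful assistant answered the question politely.",
    "Halfway through, the assistant suddenly turned evil and mocked the user.",
]
print(filter_pretraining_docs(docs))
print([gate_fiction(d) for d in docs])
```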
Insofar as it exists. Surprisingly, appearing on Know Your Meme does not count as very strong evidence.
Speculation from The Nature of Counterfactuals
I decided to split out some content from the end of my post The Nature of Counterfactuals because upon reflection I don’t feel it is as high quality as the core of the post.
I finished The Nature of Counterfactuals by noting that I was incredibly unsure of how we should handle circular epistemology. That said, there are a few ideas I want to offer up on how to approach this. The big challenge with counterfactuals is not imagining other states the universe could be in, or how we could apply our “laws” of physics to discover the state of the universe at other points in time. Instead, the challenge comes when we want to construct a counterfactual representing someone making a different decision. After all, in a deterministic universe, someone could only have made a different choice if the universe were different, but then it’s not clear why we would care about the fact that someone in a different universe would have achieved a particular score when we just care about this universe.
I believe the answer to this question will be roughly that in certain circumstances we only care about particular things. For example, let’s suppose Omega is programmed in such a way that it would be impossible for Amy to choose box A without gaining 5 utility or choose box B without gaining 10 utility. Assume that in the actual universe Amy chooses box A and gains 5 utility. We’re tempted to say “If she had chosen box B she would have gained 10 utility”, even though she would have had to occupy a different mental state at the time of the decision and the past would have been different, because the model has been set up so that those factors are unimportant. Since those factors are the only difference between the state where she chooses A and the state where she chooses B, we’re tempted to treat these possibilities as the same situation.
So naturally, this leads to a question: why should we build a model where those particular factors are unimportant? Does this lead to pure subjectivity? Well, the answer seems to be that in practice such a heuristic often tends to work well: agents that ignore such factors tend to perform pretty close to agents that account for them, and often better when we include time pressure in our model.
This is the point where the nature of counterfactuals becomes important—whether they are ontologically real or merely a way in which we structure our understanding of the universe. If we’re looking for something ontologically real, the fact that a heuristic is pragmatically useful provides quite limited information about what counterfactuals actually are.
On the other hand, if they’re a way of structuring our understanding, then we’re probably aiming to produce something consistent from our intuitions and our experience of the universe. And from this perspective, the mere fact that a heuristic is intuitively appealing counts as evidence for it.
I suspect that with a bit more work this kind of account could be enough to get a circular epistemology off the ground.
My position on Newcomb’s Problem in a sentence: Newcomb’s paradox results from attempting to model an agent as having access to multiple possible choices, whilst insisting it has a single pre-decision brain state.
If anyone was planning on submitting something to this competition, I’ll give you another 48 hours to get it in—https://www.lesswrong.com/posts/Gzw6FwPD9FeL4GTWC/usd1000-usd-prize-circular-dependency-of-counterfactuals.
Thick and Thin Concepts
Take for example concepts like courage, diligence and laziness. These are considered thick concepts because they have both a descriptive component and a moral component. Calling someone courageous is most often meant* not only to claim that the person undertook a great risk, but that doing so was morally praiseworthy. So a thick concept is often naturally modeled as a conjunction of a descriptive claim and a moral claim.
However, this isn’t the only way to understand these concepts. An alternative would be along the following lines: imagine the concept applies when D+M>=10, with D>=3 and M>=3, where D measures how well the descriptive component fits and M measures how well the moral component fits. So there would be a minimal amount that the descriptive claim has to fit, a minimal amount the moral claim has to fit, and a minimal total. This doesn’t seem like an unreasonable model of how thick concepts might apply.
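As a quick illustration, the toy threshold model above can be written out directly (same made-up thresholds):

```python
# The toy threshold model described above: a thick concept applies only if
# the descriptive fit D and the moral fit M each clear a floor and their
# sum clears a higher bar. The numbers are the made-up ones from the post.

def thick_concept_applies(d: float, m: float,
                          min_each: float = 3, min_total: float = 10) -> bool:
    return d >= min_each and m >= min_each and d + m >= min_total

print(thick_concept_applies(d=8, m=3))  # True: mostly descriptive, minimally moral
print(thick_concept_applies(d=9, m=2))  # False: moral component too weak
print(thick_concept_applies(d=5, m=4))  # False: both fit somewhat, but not enough overall
```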
Alternatively, there might be an additional requirement that the satisfaction of the moral component is sufficiently related to the descriptive component. For example, suppose in order to be diligent you need to work hard in such a way that the hard work causes the action to be praiseworthy. Then consider the following situation. I bake you a cake and this action is praiseworthy because you really enjoy it. However, it would have been much easier for me to have bought you a cake—including the effort to earn the money—and you would actually have been happier had I done so. Further, assume that I knew all of this in advance. In this case, can we really say that you’ve demonstrated the virtue of diligence?
Maybe the best way to think about this is Wittgensteinian: that thick concepts only make sense from within a particular form of life and are not so easily reduced to their components as we might think.
* This isn’t always the case though.
I’ve always found the concept of belief in belief slightly hard to parse cognitively. Here’s what finally satisfied my brain: whether you will be rewarded or punished in heaven is tied to whether or not God exists, while whether or not you feel a push to go to church is tied to whether or not you believe in God. If you do go to church and want to go, your brain will say, “See, I really do believe”, and it’ll do the reverse if you don’t go. However, this only affects your belief in God indirectly, through your “I believe in God” node. Putting it another way, going to church is evidence that you believe in God, not evidence that God exists. Anyway, the result of all this is that your “I believe in God” node can become much stronger than your “God exists” node.
EDT agents handle Newcomb’s problem as follows: they observe that agents who encounter the problem and one-box do better on average than those who encounter the problem and two-box, so they one-box.
That’s the high-level description, but let’s break it down further. Unlike CDT, EDT doesn’t worry about the fact that there may be a correlation between your decision and hidden state. It assumes that if the visible state before you made your decision is the same, then the counterfactuals generated by considering your possible decisions are comparable. In other words, any differences in hidden state, such as you being a different agent or money being placed in the box, are attributed to your decision (see my previous discussion here).
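Concretely, with made-up numbers (a predictor that is right 99% of the time, $1,000,000 in the opaque box and $1,000 in the transparent one), the conditional expectation EDT computes looks like this:

```python
# The one-box/two-box comparison above, with made-up numbers: a predictor
# that is right 99% of the time, $1,000,000 in the opaque box (if one-boxing
# was predicted) and $1,000 in the transparent box.

ACCURACY = 0.99
BIG, SMALL = 1_000_000, 1_000

def edt_value(decision: str) -> float:
    """E[payout | decision], conditioning on what the decision implies
    about the prediction (the hidden state)."""
    p_big_box_full = ACCURACY if decision == "one-box" else 1 - ACCURACY
    payout_from_big_box = p_big_box_full * BIG
    payout_from_small_box = SMALL if decision == "two-box" else 0
    return payout_from_big_box + payout_from_small_box

print("one-box:", edt_value("one-box"))  # 990,000
print("two-box:", edt_value("two-box"))  # 11,000
# Conditioning on the decision, one-boxers do far better on average,
# which is why EDT one-boxes.
```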
I’ve been thinking about Rousseau and his conception of freedom again because I’m not sure I hit the nail on the head last time. The most typical definition of freedom and that championed by libertarians focuses on an individual’s ability to make choices in their daily life. On the more libertarian end, the government is seen as an oppressor and a force of external compulsion.
On the other hand, Rousseau’s view focuses on “the people” and their freedom to choose the kind of society that they want to live in. Instead of being seen as an external entity, the government is seen as a vessel through which the people can express and realise this freedom (or at least as potentially becoming such a vessel).
I guess you could call this a notion of collective freedom, but at the same time this risks obscuring an important point: that at the same time it is an individual freedom as well. Part of it is that “the people” is made up of individual “people”, but it goes beyond this. The “will of the people” at least in its idealised form isn’t supposed to be about a mere numerical majority or some kind of averaging of perspectives or the kind of limited and indirect influence allowed in most representative democracies, but rather it is supposed to be about a broad consensus; a direct instantiation of the will of most individuals.
There is a clear tension between these kinds of freedom: the more the government respects personal freedom, the less control the people have over the kind of society they want to live in; and the more the government focuses on achieving the “will of the people”, the less freedom exists for those to whom this doesn’t sound so appealing.
I can’t recall the arguments Rousseau makes for this position, but I expect that they’d be similar to the arguments for positive freedoms. Proponents of positive freedom argue that theoretical freedoms, such as there being no legal restriction against gaining an education, are worthless if these opportunities aren’t actually accessible, say if this would cost more money than you could ever afford.
Similarly, proponents of Rousseau’s view could argue that freedom over your personal choices is worthless if you exist within a terrible society. Imagine there were no spam filters and so all spam made it through. Then the freedom to use email would be worthless without the freedom to choose to exist in a society without spam. Instead of characterising this as a trade-off between utility and freedom, Rousseau would see it as a trade-off between two different notions of freedom.
Now I’m not saying Rousseau’s views are correct—I mean the French revolution was heavily influenced by him and we all saw how that worked out. And it also depends on there being some kind of unified “will of the people”. But at the same time it’s an interesting perspective.
Can you make this a little more explicit? France is a pretty nice place—are you saying that the counterfactual world where there was no revolution would be significantly better?
All the guillotining. And the necessity of that was in part justified with reference to Rousseau’s thought.
Sure. I’m asking about the “we all saw how that worked out” portion of your comment. From what I can see, it worked out fairly well. Are you of the opinion that the French Revolution was an obvious and complete utilitarian failure?
I haven’t looked that much into French history, just think it is important to acknowledge where that line of thought can end up.
What does it mean to define a word? There’s a sense in which definitions are entirely arbitrary and what word is assigned to what meaning lacks any importance. So it’s very easy to miss the importance of these definitions—emphasising a particular aspect and provides a particular lense with which to see the world.
For example, if we define goodness as the ability to respond well to others, it emphasises that different people have different needs: one person may want advice, while another simply wants encouragement. Or if we define love as acceptance of the other, it suggests that one of the most important aspects of love is that true love should be somewhat resilient and not excessively conditional.
As I wrote before, evidential decision theory can be critiqued for failing to deal properly with situations where hidden state is correlated with decisions. EDT includes differences in hidden state as part of the impact of the decision, when in the case of the smoking lesion, we typically want to say that it is not.
However, Newcomb’s problem also involves hidden state that is correlated with your decision. And if we don’t want to count this when evaluating decisions in the case of the Smoking Lesion, perhaps we shouldn’t count it in the case of Newcomb’s? Or is there a distinction? I think I’ll try analysing this in terms of the erasure theory of counterfactuals at some point.
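To make the parallel concrete, here’s a minimal worked comparison with made-up payoffs and probabilities (none of these numbers come from the discussion above): EDT ranks actions by the expected utility conditional on taking them, and that conditioning treats the correlated hidden state in both problems in exactly the same way.

```python
# Illustrative toy numbers only. EDT ranks actions by E[utility | action], so any
# hidden state correlated with the action shows up in that conditional expectation,
# whether or not the action causes it.

# Newcomb's problem, assuming a predictor that is right 99% of the time.
p = 0.99
ev_one_box = p * 1_000_000                    # opaque box is full iff one-boxing was predicted
ev_two_box = (1 - p) * 1_000_000 + 1_000      # opaque box is almost certainly empty
print(ev_one_box, ev_two_box)                 # 990000.0 vs ~11000.0 -> EDT one-boxes

# Smoking lesion, assuming the lesion (not smoking) causes cancer, but smoking is
# evidence of the lesion: P(cancer | smoke) = 0.8, P(cancer | abstain) = 0.2.
u_smoking_pleasure, u_cancer = 1_000, -1_000_000
ev_smoke = u_smoking_pleasure + 0.8 * u_cancer
ev_abstain = 0.2 * u_cancer
print(ev_smoke, ev_abstain)                   # -799000.0 vs -200000.0 -> EDT abstains
```

A common verdict, at least among one-boxers, is that this conditioning gives the right answer in Newcomb’s problem and the wrong one in the smoking lesion, which is what makes the question of where the distinction lies interesting.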
Does FDT make this any clearer for you?
There is a distinction in the correlation, but it’s somewhat subtle and I don’t fully understand it myself. One silly way to think about it that might be helpful is “how much does the past hinge on your decision?” In the smoking lesion, it is clear the past is very fixed—even if you decide not to smoke, that doesn’t affect the genetic code. But in Newcomb’s, the past hinges heavily on your decision: if you decide to one-box, it must have been the case that you could have been predicted to one-box, so it’s logically impossible for it to have gone the other way.
One intermediate example would be if Omega told you they had predicted you to two-box, and you had reason to fully trust this. In this case, I’m pretty sure you’d want to two-box, then immediately precommit to one-boxing in the future. (In this case, the past no longer hinges on your decision.) Another would be if Omega was predicting from your genetic code, which supposedly correlated highly with your decision but was causally separate. In this case, I think you again want to two-box if you have sufficient metacognition that you can actually uncorrelate your decision from genetics, but I’m not sure what you’d do if you can’t uncorrelate. (The difference again lies in how much Omega’s decision hinges on your actual decision.)
Yeah, FDT has a notion of subjunctive dependence. But the question becomes: what does this mean? What precisely is the difference between the smoking lesion and Newcomb’s? I have some ideas and maybe I’ll write them up at some point.
Parent comment for: Why the focus on wise AI advisors?
With quotes from: Yoshua Bengio, Oppenheimer, Toby Ord, Edward Wilson, Nick Bostrom, Carl Sagan, Pope Francis, Antonio Guterres, Stephen Hawking
Acausal positive interpretation
I’m beginning to warm to the idea that the reason we have evolved to think in terms of counterfactuals and probabilities is that these are fundamental at the quantum level. Normally I’m suspicious of rooting macro-level claims in quantum-level effects, because at such a high level of abstraction it would be very easy for these effects to wash out, but the many-worlds hypothesis is something that wouldn’t wash out. Otherwise it would all seem to be a bit too much of a coincidence.
(“Oh, so you believe that counterfactuals and probability are at least partly a human construct, but they just so happen to correspond with what seems to us to be the fundamental level of physics, not because there is a relation there, but because of pure happenstance. Seems a bit of a stretch.”)
I expect that agents evolved in a purely deterministic but similarly complex world would be no less likely to (eventually) construct counterfactuals and probabilities than those in a quantum sort of universe. Far more likely to develop counterfactuals first, since it seems that agents on the level of dogs can imagine counterfactuals at least in the weak sense of “an expected event that didn’t actually happen”. Human-level counterfactual models are certainly more complex than that, but I don’t think they’re qualitatively different.
I think if there’s any evolution pressure toward ability to predict the environment, and the environment has a range of salient features that vary in complexity, there will be some agents that can model and predict the environment better than others regardless of whether that environment is fundamentally deterministic or not. In cases where evolution leads to sufficiently complex prediction, I think it will inevitably lead to some sort of counterfactuals.
The simplest predictive model can only be applied to sensory data directly. The agent gains a sense of what to expect next, and how much that differed from what actually happened. This can be used to update the model. This isn’t technically a counterfactual, but only through a quirk of language. In everything but name “what to expect next” is at least some weak form of counterfactual. It’s a model of an event that hasn’t happened and might not happen. But still, let’s just rule it out arbitrarily and continue on.
The next step is probably to be able to apply the same predictive model to memory as well, which for a model changing over time means that an agent can remember what they experienced, what they expected, and compare with what they would now expect to have happened in those circumstances. This is definitely a counterfactual. It might not be conscious, but it is a model of something in the past that never happened. It opens up a lot of capability for using a bunch of highly salient stored data to update the model instead of just the comparative trickle of new salient data that comes in over time.
There are still higher strengths and complexities of counterfactuals of course, but it seems to me that these are all based on the basic mechanism of a predictive model applied to different types of data.
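As a toy illustration of this two-stage mechanism, here’s a minimal sketch (the agent, numbers and names are invented for the example, not taken from the discussion): a trivial running-average predictor that first predicts incoming data and updates on the error, then re-runs its current model over stored memories, i.e. models values that were never actually observed.

```python
class PredictiveAgent:
    """Toy predictive model: a single running estimate of what comes next."""

    def __init__(self, learning_rate=0.1):
        self.estimate = 0.0          # the model: one running estimate
        self.learning_rate = learning_rate
        self.memory = []             # (observation, prediction made at the time)

    def observe(self, observation):
        prediction = self.estimate                   # "what to expect next"
        error = observation - prediction             # surprise
        self.estimate += self.learning_rate * error  # update the model on the error
        self.memory.append((observation, prediction))

    def revisit_memory(self):
        # Apply today's model to yesterday's situations: a weak counterfactual,
        # since the re-predicted values were never actually experienced.
        return [(obs, old_pred, self.estimate) for obs, old_pred in self.memory]

agent = PredictiveAgent()
for x in [1.0, 1.2, 0.9, 1.1, 5.0]:   # mostly boring data plus one surprise
    agent.observe(x)
for obs, then, now in agent.revisit_memory():
    print(f"saw {obs:.1f}, expected {then:.2f} at the time, would now expect {now:.2f}")
```

Even something this simple already separates “what I expected at the time” from “what I would now expect to have happened”, which is the weak counterfactual described above.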
None of this needs any reference to quantum mechanics, and nor does probability. All it needs is a universe too complex to be comprehended in its entirety, and agents that are capable of learning to imperfectly model parts of it that are relevant to themselves.
“I expect that agents evolved in a purely deterministic but similarly complex world would be no less likely to (eventually) construct counterfactuals and probabilities than those in a quantum sort of universe”
I’m actually trying to make a slightly unusual argument. My argument isn’t that we wouldn’t construct counterfactuals in a purely deterministic world operating similarly to ours. My argument involves:
a) Claiming that counterfactuals are at least partly constructed by humans (if you don’t understand why this might be reasonable, then it’ll be more of a challenge to understand the overall argument)
b) Claiming that it would be a massive coincidence if something partly constructed by humans happened to correspond with fundamental physical structures without there being any relation between the two
c) Concluding that it’s likely that there is some as yet unspecified relation
Does this make sense?
To me the correspondence seems smaller, and therefore the coincidence less unlikely.
The many-worlds hypothesis assumes parallel worlds that obey exactly the same laws of physics. Anything can happen with astronomically tiny probability, but the vast majority of parallel worlds are just as boring as our world. The counterfactuals we imagine are not limited by the laws of physics.
Construction of counterfactuals is useful for reasoning under uncertainty. Quantum physics is a source of uncertainty, but there are also plenty of macroscopic sources of uncertainty (limited brain size, the second law of thermodynamics). If intelligent life evolved in a deterministic universe, I imagine it would also find counterfactual reasoning useful.
Yeah, that’s a reasonable position to take.
Not hugely. Quantum mechanics doesn’t have any counterfactuals in some interpretations. It has deterministic evolution of state (including entanglement), and then we interpret incomplete information about it as being probabilistic in nature. Just as we interpret incomplete information about everything else.
Hopefully one day I get a chance to look further into quantum mechanics
What correspondence? Counterfactuals-as-worlds have all laws of physics broken in them, including quantum mechanics.
I’m not claiming that there’s a perfect correspondence between counterfactuals as different worlds in a multiverse vs. decision counterfactuals. Although maybe that’s enough to undermine any coincidence right there?
I don’t see how there is anything here other than equivocation of different meanings of “world”. Counterfactuals-as-worlds is not even a particularly convincing way of making sense of what counterfactuals are.
If you’re interpreting me as defending something along the lines of David Lewis, then that’s actually not what I’m doing.
Says who?
😱✂️💣💣💣💣💣 𝙳 𝙸 𝚂 𝙰 𝚂 𝚃 𝙴 𝚁 - 𝙱 𝚈 - 𝙳 𝙴 𝙵 𝙰 𝚄 𝙻 𝚃 ? - Public Draft
This is a draft post to hold my thoughts on Disaster-By-Default.
I have an intuition that the SUV Triad can be turned into an argument for Disaster-By-Default, so I created this post to explore this possibility.
However, I consider this post experimental in that it may not pan out.
☞ The 𝙳 𝙸 𝚂 𝙰 𝚂 𝚃 𝙴 𝚁 - 𝙱 𝚈 - 𝙳 𝙴 𝙵 𝙰 𝚄 𝙻 𝚃 hypothesis:
AGI leads to some kind of societal scale catastrophe by default
Clarification: This isn’t a claim that it would be impossible to avoid this fate if humanity woke up and decided it’s serious about winning. This is just a claim about what happens by default.
Why might this be true?
Recap — 𝚃 𝙷 𝙴 🅂🅄🅅 𝚃 𝚁 𝙸 𝙰 𝙳 — Old version, to be updated to the latest version just before release
For convenience, I’ve copied the description of the SUV Triad from my post Why the focus on wise AI advisors?
Covered in reverse order:
🅅 𝚄 𝙻 𝙽 𝙴 𝚁 𝙰 𝙱 𝙸 𝙻 𝙸 𝚃 𝚈 – 🌊🚣:
✷ The development of advanced AI technologies will have a massive impact on society given the essentially infinite ways to deploy such a general technology. There are lots of ways this could go well, and lots of ways ~~we all die~~ this could go extremely poorly.
In more detail...
i) At this stage I’m not claiming any particular timelines.
I believe it’s likely to be
~~absurdly~~ quite fast, but I don’t claim this until we get to 🅂 𝙿 𝙴 𝙴 𝙳 😅⏳.
I suspect that often when people doubt this claim, they’ve implicitly assumed that I was talking about the short or medium term, rather than the long term 🤔. After all, the claim that there are many ways that AI could plausibly lead to dramatic benefits or harms over the next 50 or 100 years feels like an extremely robust claim. There are many things that a true artificial general intelligence could do. It’s mainly just a question of how long it takes to develop the technology.
ii) It’s quite likely that at least some of these threats will turn out to be overhyped. That doesn’t defeat this argument! Even in the unlikely event that most of these threats turned out to be paper tigers, as claimed in The Kicker, a single one of these threats going through could cause absurd amounts of damage.
iii) TODO
🅄 𝙽 𝙲 𝙴 𝚁 𝚃 𝙰 𝙸 𝙽 𝚃 𝚈 – 🌅💥:
✷ We have massive disagreement on what to expect from the development of AI, let alone on the best strategy[46]. Making the wrong call could prove catastrophic.
In more detail...
i) A lot of this uncertainty just seems inherently really hard to resolve. Predicting the future is hard.
ii) However hard this is to resolve in theory, it’s worse in practice. Instead of an objective search for the truth, these discussions are distorted by a range of different factors, including money, social status and the need for meaning.
iii) More on the kicker: We’re seeing increasing polarisation, less trust in media and experts[48] and AI stands to make this worse. This is not where we want to be starting from and who knows how long this might take to resolve?
🅂 𝙿 𝙴 𝙴 𝙳 – 😅⏳:
✷ AI is developing incredibly rapidly… We have limited time to act and to figure out how to act[49].
In more detail...
i) The speed at which things are happening makes the problem much harder. Humanity does have the ability to deal with civilisational-scale challenges. It’s not easy (global co-ordination is incredibly difficult to achieve), but it’s possible. However, facing such challenges one at a time is a lot easier than facing dozens at once. When many arrive together, it’s hard to give each problem the attention it deserves 😅🌫️💣💣💣
ii) Even if timelines aren’t short, we might still be in trouble if the take-off speed is fast. Unfortunately, humanity is not very good at preparing for abstract, speculative-seeming threats ahead of time.
iii) Even if neither timelines nor take-off speeds are fast in an absolute sense, we might still expect disaster if they are fast in a relative sense. Governance—especially global governance—tends to proceed rather slowly. Even though it can happen much faster when there’s a crisis, some problems need to be solved ahead of time, and once you’re in them it’s too late. As an example, once an AI-induced pandemic is spreading, you may have already lost.
I believe that the following reflections provide a strong, but defeasible reason to believe the Disaster-By-Default hypothesis.
Recap: Reflections on The SUV Triad — Old version, to be updated to the latest version just before release
Reflections—Why the SUV Triad is Fucking Scary
Many of the threats constitute civilisational-level risks by themselves. We could successfully navigate all the other threats, but drop the ball just once and all of that could be for naught.
Even if a threat can’t lead to catastrophe, it can still distract us from those that can. It’s hard to avoid catastrophe when we don’t know where to focus our efforts ⚪🪙⚫.
The speed of development makes this much harder. Even if alignment were easy and governance didn’t require anything special, we could still fail because certain people have decided that we have to race toward AGI as fast as possible.
Controversial: It may even present a reason to expect Disaster-By-Default (draft post) (‼️).
That said, I want to see if it’s possible to make a more rigorous argument. Here’s the general intuition:
Consider the following model. I wonder whether it would be applicable to the SUV Triad:
Suppose we face a series of ten independent decisions, each a binary choice, and that we have to get every one of them right in order to survive. Further assume that these choices are quite confusing, so we only have a 60% chance of getting each one right. Then we’d only have roughly a 0.6% chance of survival.
The question I want to explore is whether it would be appropriate to model the SUV Triad with something along these lines.
That said, even if it made sense from an inside-view perspective, we’d still have to adjust for the outside view. The hardness of each decision is not independent, but most likely tied to some general factors, such as overall problem difficulty, general societal competence and the speed at which we have to face these problems. In other words, treating each probability as independent likely makes the overall chance of survival look more extreme than it really is.
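As a rough sanity check on both points, here’s a minimal sketch of the toy model (all numbers, including the range of the shared “societal competence” shift, are my own illustrative assumptions rather than anything argued for above): it computes the fully independent estimate and then a correlated variant where a single shared factor moves every decision’s difficulty together.

```python
import random

# Fully independent case: ten binary choices, a 60% chance of getting each right.
p_independent = 0.6 ** 10
print(f"independent survival: {p_independent:.3%}")   # ~0.605%

# Correlated case: each possible world draws a shared "societal competence" shift,
# and every decision's success probability moves with it, so good and bad worlds cluster.
def survival_with_shared_factor(trials=200_000):
    survived = 0
    for _ in range(trials):
        competence = random.uniform(-0.3, 0.3)        # shared shift (illustrative assumption)
        p = min(max(0.6 + competence, 0.0), 1.0)
        if all(random.random() < p for _ in range(10)):
            survived += 1
    return survived / trials

print(f"correlated survival:  {survival_with_shared_factor():.3%}")  # roughly 4-5%
```

With a shared factor, the survival probability lands several times higher than the independent estimate (roughly 5% versus 0.6% for these particular numbers), which is the direction of the outside-view adjustment described above.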
So the first step would be figuring out whether this model is applicable (not in an absolute sense, but in the sense of whether it’d be appropriate for making a defeasible claim).
How could we evaluate this claim? Here’s one such method:
Define a threshold of catastrophe
Make a list of threats that could meet that bar
Make a list of key choices that we’d have to make in relation to these threats and where making the wrong choice could prove catastrophic
Estimate the difficulty of each choice
Consider the degree of correlation between various threats/choices
Potentially: estimate how many additional such choices there may be that we’ve missed
This model could fail if the number of key choices isn’t that large, if these choices aren’t actually hard, or if they’re heavily correlated.
And if the model were applicable, then we’d have to consider possible defeaters:
In more detail/further objections...
I suggest ignoring this section for now. It was copied from a different context, so it needs to be adapted:
Most notable counterargument: “We most likely encounter smaller wake up calls first. Society wakes up by default”.
Rich and powerful actors will be incentivised to use their influence to downplay the role of AI in any such incidents, to argue that we should focus solely on that threat model, or even to assert that further accelerating capabilities is the best defense. Worse, we’ll likely be in the middle of a US-China arms race where there are national security issues at play that could make slowing things down feel almost inconceivable.
Maybe there is eventually an incident that is too serious to ignore, but by then it will probably be too late. Capabilities increase fast and we should expect a major overhang of elicitable capabilities, so we would need to trigger a stop significantly before reaching the threshold of dangerous capabilities.
“But the AI industry doesn’t want to destroy society. They’re in society” — Look at what happened with “gain of function” research. If it had been prominently accepted that gain-of-function research is bad, that would have caused a massive loss of status for medical researchers, so they didn’t allow that to happen. The same incentives apply to AI developers.
“Open source/weights models are behind the frontier and it’s possible that society will enforce restrictions on them, even if it’ll be impossible to prevent closed source development from continuing” — Not that far behind and attempting to restrict open-source models will result in massive pushback/subversion. There’s a large community dedicated to open source software, for some it’s essentially a substitute for a religion, for others it’s the basis of their company or their national competitiveness. Even if the entire UN security council agreed, they couldn’t just say, “Stop!” and expect it to be instantly obeyed. Our default expectation should be that capabilities broadly proliferate.
Second most notable counterargument: “AI is aligned by default”
This looked much more plausible before inference time compute took off.
Third most notable counterargument: “We’ve overcome challenges in the past, we should expect that we most likely stumble through”
AI is unique (generality, speed of development, proliferation).
Much more plausible if there was a narrower threat model or if development moved more slowly.
What is the SUV Triad? Also, the formatting on this is wild, what’s the context for that?
Sorry, this is some content that I had in my short-form Why the focus on wise AI advisors?. The SUV Triad is described there.
I was persuaded by Professor David Manly that I didn’t need to argue for Disaster-By-Default in order to justify wise AI advisors and that focusing too much on this aspect would simply cause me to lose people, so I needed somewhere to paste this content.
I just clicked “Remove from Frontpage”. I’m unsure if it does anything for short-form posts though.
Just experimenting to see what’s possible. Copied it directly from that post, haven’t had time to rethink the formatting yet now that it is its own post. Nowhere near as wild as it gets in the main post though!