Noosphere89′s Shortform

Noosphere89 17 Jun 2022 21:57 UTC

2 points

52 comments · 1 min read · LW link

Well, I’ll try to fill this one up.

Noosphere89 30 Oct 2024 17:57 UTC
23 points
21
Here’s an underrated frame for why solving the alignment problem is likely necessary for the future to go very well under human values, even though in our current society we don’t need human-to-human alignment to make modern capitalism good and can rely on selfishness instead.

The reason is that there’s a real likelihood that human labor, and more generally human existence, will not be economically valuable, or will even have negative economic value, say where adding a human to an AI company makes that company worse in the near future.

The reason this matters is that once labor is much easier to scale than capital, as is likely in an AI future, it becomes economically viable or even beneficial to break a lot of the rules that help humans survive, contra Matthew Barnett’s view. This is incentivized even further by the fact that an unaligned AI released into society would likely not be punishable or incentivizable by mere humans, simply because it would control robotic armies and robotic workforces that allow it to dispense with the societal constraints humans have to accept.

dr_s describes an equilibrium that is totally valid under AI automation economics and very bad for humans. Avoiding this sort of equilibrium can’t be done through economic forces, because the companies involved are too powerful for any real incentive to work on them: they can either neutralize an attempted boycott or shopping-around, or turn it to their own benefit. Avoiding this outcome therefore requires alignment to your values, and can’t work with selfishness:

Consider a scenario in which AGI and human-equivalent robotics are developed and end up owned (via e.g. controlling exclusively the infrastructure that runs it, and being closed source) by a group of, say, 10,000 people overall who have some share in this automation capital. If these people have exclusive access to it, a perfectly functional equilibrium is “they trade among peers goods produced by their automated workers and leave everyone else to fend for themselves”.

This framing of the alignment problem, of how to get an AI that values humans such that this outcome is prevented, also has an important implication:

It’s not enough to solve the technical problem of alignment, absent modeling the social situation, because of suffering risk issues plus catastrophic risk issues.

But we actually need to do the work, and AI that automates everything might come in your lifetime, so we should prepare the foundations soon.
- Noosphere89 30 Oct 2024 18:00 UTC
  3 points
  1
  Parent
  This also BTW explains why we cannot rely on economic arguments on AI to make the future go well.
- Noosphere89 22 Jan 2025 1:30 UTC
  2 points
  0
  Parent
  In retrospect, I was a bit too optimistic about this working out. A big part of why is that I didn’t truly grasp how deep value conflicts can be even amongst humans. I’m now much more skeptical of multi-alignment schemes working, because I believe a lot of alignment exists broadly because people are powerless relative to the state, but when AI is good enough to create their own nation-states, value conflicts become much more practical, and the basis for a lot of cooperative behavior collapses:
  
  and also means the level of alignment of AI needs to be closer to the fictional benevolent angels than it is to humans in relationship to other humans, so it motivates a more ambitious version of the alignment objectives than making AIs merely not break the law or steal from humans.
  
  I’m actually reasonably hopeful the more ambitious versions of alignment are possible, and think there’s a realistic chance we can actually do them.
- Dakara 18 Nov 2024 12:00 UTC
  1 point
  0
  Parent
  I’ve been reading a lot of the stuff that you have written and I agree with most of it (like 90%). However, one thing which you mentioned (somewhere else, but I can’t seem to find the link, so I am commenting here) and which I don’t really understand is iterative alignment.
  
  I think that the iterative alignment strategy has an ordering error – we first need to achieve alignment to safely and effectively leverage AIs.
  
  Consider a situation where AI systems go off and “do research on alignment” for a while, simulating tens of years of human research work. The problem then becomes: how do we check that the research is indeed correct, and not wrong, misguided, or even deceptive? We can’t just assume this is the case, because the only way to fully trust an AI system is if we’d already solved alignment, and knew that it was acting in our best interest at the deepest level.
  
  Thus we need to have humans validate the research. That is, even automated research runs into a bottleneck of human comprehension and supervision.
  
  The appropriate analogy is not one researcher reviewing another, but rather a group of preschoolers reviewing the work of a million Einsteins. It might be easier and faster than doing the research itself, but it will still take years and years of effort and verification to check any single breakthrough.
  
  Fundamentally, the problem with iterative alignment is that it never pays the cost of alignment. Somewhere along the story, alignment gets implicitly solved.
  - Noosphere89 18 Nov 2024 18:45 UTC
    4 points
    1
    Parent
    One potential answer to how we might break the circularity is the AI control agenda, which works in a specific useful capability range but fails if we assume arbitrarily/infinitely capable AIs.
    
    This might already be enough to do so given somewhat favorable assumptions.
    
    But there is a point here in that absent AI control strategies, we do need a baseline of alignment in general.
    
    Thankfully, I believe this is likely to be the case by default.
    
    See Seth Herd’s comment below for a perspective:
    
    https://www.lesswrong.com/posts/kLpFvEBisPagBLTtM/if-we-solve-alignment-do-we-die-anyway-1?commentId=cakcEJu389j7Epgqt
    - Dakara 25 Nov 2024 22:28 UTC
      7 points
      0
      Parent
      I’ve asked this question to others, but would like to know your perspective (because our conversations with you have been genuinely illuminating for me). I’d be really interested in knowing your views on more of a control-by-power-hungry-humans side of AI risk.
      
      For example, the first company to create intent-aligned AGI would be wielding incredible power over the rest of us. I don’t think we could trust any of the current leading AI labs to use that power fairly. I don’t think this lab would voluntarily decide to give up control over AGI either (intuitively, it would take quite something for anyone to give up such a source of power). Can this issue be somehow resolved by humans? Are there any plans (or at least hopeful plans) for such a scenario to prevent really bad outcomes?
      - Noosphere89 26 Nov 2024 0:02 UTC
        2 points
        0
        Parent
        This is my main risk scenario nowadays, though I don’t really like calling it an existential risk, because the billionaires can survive and spread across the universe, so some humans would survive.
        
        The solution to this problem is fundamentally political, and probably requires massive reforms of both the government and the economy that I don’t know yet.
        
        I wish more people worked on this.
Noosphere89 18 Jan 2025 18:38 UTC
16 points
9
One thing I want people to know about AI alignment: I wish people would stop referring to human values as though they were a singular object or list, and instead think in terms closer to an individual human’s values, without presupposing that humanity must commonly care about certain values at all.

A minor example is that I believe the felt complexity of value is somewhat overstated by lumping multiple humans together, though I don’t think this changes the broader argument around complexity of value.

A more major example is that it undermines the viability of ideas like CEV, and importantly makes aggregative/multi-alignment quite a lot harder than single-single alignment.
- Nathan Helm-Burger 18 Jan 2025 22:47 UTC
  5 points
  4
  Parent
  Agreed. I think this is why we should focus on working towards a society which emphasizes freedom and lack of physical assault (including weapons of mass destruction). Within such a framework, it should be possible to have an archipelago of diverse societies with diverse rulesets.
  
  If there were a ‘guardian AI’ system, I’d want it to mostly just keep us from either killing each other or overthrowing it, and leave the rest up to us. This moves away from the idea of an overbearing nanny, and towards a ‘let them choose their own path’ design.
- quetzal_rainbow 18 Jan 2025 19:05 UTC
  1 point
  −2
  Parent
  “Human values” is a sort of objects. Humans can value, for example, forgiveness or revenge; these things are opposites, but both have a distinct quality that separates them from paperclips.
  - Noosphere89 18 Jan 2025 19:17 UTC
    2 points
    0
    Parent
    Yes, these values are all different from each other, but a crux is I don’t think that the differing values amongst humans are so distinct from paperclips that it’s worth it to blur the differences, especially with very strong optimization, though I agree that human values form a sort as in a set of objects, trivially.
    - quetzal_rainbow 18 Jan 2025 19:43 UTC
      2 points
      0
      Parent
      I think the easy difference is that a world totally optimized according to someone’s values is going to be either very good (even if not perfect) or very bad from the perspective of another human. I wouldn’t say it’s impossible, but it would take a very specific combination of human values to make it exactly as valuable as turning everything into paperclips, no worse and no better.
      
      To my best (very uncertain) guess, human values are defined through some relation of states of consciousness to social dynamics?
      - Noosphere89 18 Jan 2025 19:47 UTC
        2 points
        0
        Parent
        
        I think the easy difference is that a world totally optimized according to someone’s values is going to be either very good (even if not perfect) or very bad from the perspective of another human. I wouldn’t say it’s impossible, but it would take a very specific combination of human values to make it exactly as valuable as turning everything into paperclips, no worse and no better.
        
        I mostly agree with this, with caveats that a paper-clip outcome can happen, but it isn’t very likely.
        
        (For example, radical eco-green views where humans have to be extinct so nature can heal definitely exist, and would be a paper-clip outcome from my perspective).
        
        I was also talking about very bad from the perspective of another human, since I think this is surprisingly important when dealing with AI safety.
- MondSemmel 18 Jan 2025 19:07 UTC
  0 points
  −3
  Parent
  I like Yudkowsky’s toy example of tasking an AGI to copy a single strawberry, on a molecular level, without destroying the world as a side-effect.
  - Noosphere89 18 Jan 2025 19:09 UTC
    2 points
    0
    Parent
    I also like it for this reason, though I personally think that a lot of the challenge is in being capable enough to do it, rather than in us being unable to make it not destroy the world.
    
    Still, I kind of like the toy example.
Noosphere89 27 Nov 2024 17:18 UTC
16 points
2
Here’s a perspective on AI automating everything I haven’t seen before, which is relevant to AI governance.

AI being able to automate robotics and AI research will eventually transform the physical world into something that much more closely resembles the online/virtual world.

Depending on the speed of AI research, this may either happen in several months, or more like a decade, but there’s a plausible path to AI turning the physical world into something more like an online/virtual world.

The implications for AI governance are somewhat similar to the implications of companies moderating social media in the present.

Here are several implications:
1. You really can’t have the level of liberty and autonomy that people have today around basically everything, for the same reason that there are no actual free speech rights on social media, or really any of the rights we take for granted. One of the basic reasons is that it’s very low cost to disrupt online spaces. Even if we avoid the vulnerable world hypothesis (which posits tech so destructive as to be an x-risk, and so widespread that you essentially need a totalitarian state to prevent it from being developed without effective defenses), it’s likely that there exist means to constantly troll and degrade the discourse far more effectively. A lot of social media is as strict about moderation as it is today because it’s too easy to degrade the discourse and troll everyone in a way that real life doesn’t allow, and it’s too easy to destroy on social media relative to creating.
2. As a corollary, this means that alignment of governments to citizens becomes far more important in the future than in the 18th-21st centuries, because we likely have to remove a lot of the checks like democracy that ensure that a misaligned leader doesn’t destroy a nation, and suffice it to say that the alignment of social media companies to their users is not going to cut it.
3. One failure mode of social media that we should try to avoid is that the platforms don’t really have any ability to hold nuanced conversations for a number of reasons.
- Viliam 28 Nov 2024 13:07 UTC
  2 points
  0
  Parent
  Huh, I just realized there are two different meanings/goals of moderation/censorship, and it is too easy to conflate them if you don’t pay attention.
  One is the kind where you don’t want the users of your system to e.g. organize a crime. The other is where you don’t want discussions to be disrupted e.g. by trolls.
  Superficially, they seem like the same thing: you have moderators, they make the rules, and give bans to people who break them. But now this seems mostly coincidental to me: you have some technical tools, so you use them for both purposes, because that’s all you have. However, from the perspective of the people who want to organize a crime, those who try to prevent them are the disruptive trolls.
  I guess, my point is that when we try to think about how to improve the moderation, we may need to think about these purposes as potential opposites. Things that make it easier to ban trolls may also make it easier to organize the crime. Which is why people may simultaneously be attracted to Substack or Telegram, and also horrified by what happens at Substack or Telegram.
  Maybe there is a more general lesson for the society, unrelated to tech. If you allow people to organize bottom-up, you can get a lot of good things, but you will also get groups dedicated to doing bad things. Western countries seem to optimize for the bottom-up organizations: companies, non-profits, charities, churches, etc. The Soviet Union used to optimize for top-down control: everything was controlled by the state, any personal initiative was viewed as suspicious and potentially disruptive. As a result, the Soviet Union collapsed economically, but the West got its anti-vaxxers and flat-Earthers and everything. During the Cold War, the USA was good at pushing the Soviet economic buttons. These days, Russia is good at pushing the Western free speech buttons.
  Huh, maybe the analogies go deeper. The Soviet Union was surprisingly tolerant of petty crime (people stealing from each other, not from the state). There were some ideological excuses, the petty criminals being technically part of the proletariat. But from the practical perspective, the more people worry about being potential victims of crime, the less attention they pay to organizing a revolution; they may actually wish for more state power, as a protection. So there was an unspoken alliance between the ruling class and the undesirables at the bottom, against everyone in between. And perhaps similarly, big platforms such as Facebook or Twitter seem to have an unspoken alliance with trolls; their shared goal is to maximize user engagement. By reacting to trolls, you don’t only make the trolls happy, you also make Zuck happy, because you have spent more time on Facebook, and more ads were displayed to you. It would be naive to expect Facebook to make the discussions better; even if they knew how to do that, they would not have the incentive; they actually want to hit exactly the level of badness where most people are frustrated but won’t leave yet.
  Finding the technical solution against trolls isn’t that difficult; you basically need invite-only clubs. The things that the members write could be public or private; the important part is that in order to become a member, you need to get some kind of approval first. This can be implemented in various ways: a member needs to send you an invitation link by e-mail, or a moderator needs to approve your account before you can post. A weaker version of this is the approach Less Wrong uses: anyone can join, but the new accounts are fragile and can be downvoted out of existence by the existing members, if necessary. (Works well against individual accounts created infrequently. Wouldn’t work against a hundred people joining at the same time and mass-upvoting each other. But I assume that the moderators have a red button that could simply disable creating new accounts for a while until the chaos is sorted out.)
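The "fragile new accounts" idea and the "red button" can be sketched roughly as below. This is an illustration only, not how Less Wrong actually implements moderation; the class names and both thresholds (`probation_votes`, `removal_karma`) are made-up values chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    name: str
    karma: int = 0
    votes_received: int = 0
    disabled: bool = False


@dataclass
class Forum:
    # Both thresholds are hypothetical illustration values.
    probation_votes: int = 20   # an account is "fragile" until it has received this many votes
    removal_karma: int = -5     # a fragile account at or below this karma is disabled
    registrations_open: bool = True
    accounts: dict = field(default_factory=dict)

    def register(self, name: str) -> Account:
        """Create a new account unless the 'red button' has been pressed."""
        if not self.registrations_open:
            raise PermissionError("registrations are paused")
        account = Account(name)
        self.accounts[name] = account
        return account

    def vote(self, target: str, delta: int) -> None:
        """Apply a vote; only accounts still on probation can be voted out."""
        account = self.accounts[target]
        if account.disabled:
            return
        fragile = account.votes_received < self.probation_votes
        account.karma += delta
        account.votes_received += 1
        if fragile and account.karma <= self.removal_karma:
            account.disabled = True

    def pause_registrations(self) -> None:
        """The moderators' 'red button': stop new sign-ups for a while."""
        self.registrations_open = False
```

Note that nothing here defends against the mass-upvoting case mentioned above; that is what the pause is for.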
  But when you look at the offline analogy, these things are usually called “old boy networks”, and some people think they should be disrupted. Whether you agree with that or not, probably depends on your value judgment about the network versus the people who are trying to get inside. Do you support the rights of new people to join the groups they want to join, or the rights of the existing members to keep out the people they want to keep out? One person’s “trolls” are another person’s “diverse voices that deserve to be heard”.
  So there are two lines of conflict: the established groups versus potential disruptors, and the established groups versus the owners of the system. The owners of the system may want some groups to stop existing, or to change so much that from the perspective of the current members they become different groups under the same name. Offline, the owner of the system could be a dictator, or could be a democratically elected government; I am not proposing a false equivalence here, just saying that from the perspective of group survival, both can be seen as the strong hand crushing the community. Online, the owners are the administrators. And it is a design choice whether “the owners crushing the community, should they choose so” is made easy or difficult. If it is easy, it will make the groups feel uneasy, especially once the crushing of other groups starts. If it is difficult, at least politically if not technically (e.g. Substack or Telegram advertising themselves as the uncensored spaces), we should not be surprised if some really bad things come out of there, because that is the system working exactly as designed.
  In case of Less Wrong, we are a separate island, where the owners of the system are simultaneously the moderators of the group, so this level of conflict is removed. But such solutions are very expensive; we are lucky to have enough people with high tech skills and a lot of money available if the group really wants it. For most groups this is not an option; they need to build their community on someone else’s land, and sometimes the owners evict them, or increase the rent (by pushing more ads on them).
  If you are a free speech absolutist, or if you believe that the world is not fragile, the right way seems kinda obvious: you need an open protocol for decentralized communication with digital signatures. And you should also provide a few reference implementations that are easy to use: a website, a smartphone app, and maybe a desktop app.
  At the bottom layer, you have users who provide content on demand; the content is digitally signed and can be cached and further distributed by third parties. A “user” could be a person, a pseudonym, or a technical user. (For example, if you tried to implement Facebook or Reddit on top of this protocol, its “users” would be the actual users, and the groups/subreddits, and the website itself.) This layer would be content-agnostic; it would provide any kind of content for a given URI, just like you can send anything using an e-mail attachment, HTTP GET, or a torrent. The content would be digitally signed, so that the third parties (mostly servers, but also peer-to-peer for smaller amounts of data) can cache it and further distribute. In practice, most people wouldn’t host their own servers, so they would publish on a website that is hosted on a server, or using their application which would most likely upload it to some server. (Analogously to e-mail, which can be written in an app and sent by SMTP, or written directly in some web mail.) The system would automatically support downloading your own content, so you could e.g. publish using a website, then change your mind, install a desktop app, download all your content from the website (just like anyone who reads your content could do), and then delete your account on the website and continue publishing using the app. Or move to another website, create an account, and then upload the content from your desktop app. Or skip the desktop app entirely; create a new web account, and import everything from your old web account.
  The next layer is versioning; we need some way to say “I want the latest version of this user’s ‘index.html’ file”. Also, some way to send direct messages between users (not just humans, but also technical users).
  The next layer is about organizing the content. The system can already represent your tweets as tiny plain-text files, your photos as bitmap files, etc. Now you need to put it all together and add some resource descriptors, like XML or JSON files that say “this is a tweet, it consists of this text and this image or video, and was written at this date and time” or “this is a list of links to tweets, ordered chronologically, containing items 1-100 out of 5678 total” or “this is a blog post, with this title, its contents are in this HTML file”. To support groups, you also need resource descriptors that say “this is a group description: name, list of members, list of tweets”. Now make the reference applications that support all of this, with optional encryption, and you basically have Telegram, but decentralized. Yay freedom; but also expect this system to be used for all kinds of horrible crimes. :(
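A minimal sketch of the signed-content and resource-descriptor layers described above, assuming Python and only its standard library. The descriptor fields (`author`, `type`, `text`, `created`) and all function names are hypothetical, not part of any existing protocol. For the signature it uses a Lamport one-time signature built from SHA-256, chosen only because it needs no third-party library; each key pair can safely sign just one message, so a real implementation would presumably use an established scheme such as Ed25519 instead.

```python
import hashlib
import json
import secrets


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def lamport_keygen():
    """Return (private_key, public_key) for signing ONE message."""
    # One pair of random secrets per bit of a SHA-256 digest.
    private = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    # The public key is the hash of every secret.
    public = [(sha256(zero), sha256(one)) for zero, one in private]
    return private, public


def digest_bits(message: bytes):
    """The 256 bits of SHA-256(message), most significant bit first."""
    digest = sha256(message)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]


def lamport_sign(private, message: bytes):
    """Reveal, for each digest bit, the secret matching that bit's value."""
    return [private[i][bit] for i, bit in enumerate(digest_bits(message))]


def lamport_verify(public, message: bytes, signature) -> bool:
    """Anyone holding only the public key can check the signature."""
    if len(signature) != 256:
        return False
    return all(
        sha256(revealed) == public[i][bit]
        for i, (revealed, bit) in enumerate(zip(signature, digest_bits(message)))
    )


def make_descriptor(author: str, kind: str, text: str, created: str) -> dict:
    """A hypothetical resource descriptor ("this is a tweet, written at ...")."""
    return {"author": author, "type": kind, "text": text, "created": created}


def canonical_bytes(descriptor: dict) -> bytes:
    """Serialize deterministically so signer and verifier hash the same bytes."""
    return json.dumps(descriptor, sort_keys=True, separators=(",", ":")).encode("utf-8")
```

Because verification needs only the public key, any cache or server can redistribute the descriptor together with its signature, and readers can still check that it was not altered along the way.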
  - Noosphere89 29 Nov 2024 22:00 UTC
    2 points
    0
    Parent
    
    Finding the technical solution against trolls isn’t that difficult; you basically need invite-only clubs. The things that the members write could be public or private; the important part is that in order to become a member, you need to get some kind of approval first. This can be implemented in various ways: a member needs to send you an invitation link by e-mail, or a moderator needs to approve your account before you can post. A weaker version of this is the approach Less Wrong uses: anyone can join, but the new accounts are fragile and can be downvoted out of existence by the existing members, if necessary. (Works well against individual accounts created infrequently. Wouldn’t work against a hundred people joining at the same time and mass-upvoting each other. But I assume that the moderators have a red button that could simply disable creating new accounts for a while until the chaos is sorted out.)
    
    But when you look at the offline analogy, these things are usually called “old boy networks”, and some people think they should be disrupted. Whether you agree with that or not, probably depends on your value judgment about the network versus the people who are trying to get inside. Do you support the rights of new people to join the groups they want to join, or the rights of the existing members to keep out the people they want to keep out? One person’s “trolls” are another person’s “diverse voices that deserve to be heard”.
    
    This is indeed probably a large portion of the solution, and I agree with this sort of solution becoming more necessary in the age of AI.
    
    However, there are also incentives to become more universal than just an old boy’s club, so this can’t be all of a solution.
    
    My key disagreement with free speech absolutists is that I think the outcome they are imagining for online spaces without moderation of what people say is essentially a fabricated option. What actually happens is that non-trolls and non-Nazis leave those spaces or go dark, so the trolls and Nazis end up talking only to each other, and there is no flowering of science and peace. The reason this doesn’t happen in the real world is that disruption is way, way more difficult IRL than it is online, but AGI and ASI will lower the cost of disruption by a lot, so free-speech norms become much more negative than now.
    
    I also disagree with moderation being a tradeoff between catching trolls and catching criminals; with well-funded moderation teams, you can do both quite well.
    
    Maybe there is a more general lesson for the society, unrelated to tech. If you allow people to organize bottom-up, you can get a lot of good things, but you will also get groups dedicated to doing bad things. Western countries seem to optimize for the bottom-up organizations: companies, non-profits, charities, churches, etc. The Soviet Union used to optimize for top-down control: everything was controlled by the state, any personal initiative was viewed as suspicious and potentially disruptive. As a result, the Soviet Union collapsed economically, but the West got its anti-vaxxers and flat-Earthers and everything. During the Cold War, the USA was good at pushing the Soviet economic buttons. These days, Russia is good at pushing the Western free speech buttons.
    
    This is why alignment becomes far more important than it is now: it’s too easy for a misaligned leader without checks or balances to ruin things. I’m of the opinion that democracies tolerably work in a pretty narrow range of conditions, but I see the AI future as more dictatorial/plutocratic, due to the onlineification of the real world by AI.
    - Viliam 30 Nov 2024 18:03 UTC
      2 points
      0
      Parent
      the outcome they are imagining for online spaces without moderation of what people say is essentially a fabricated option
      Yep. In real life, intelligent debate is already difficult because so many people are stupid and arrogant. But online this is multiplied by the fact that in the time it takes a smart person to think about a topic and write a meaningful comment, an idiot can write hundreds of comments.
      And that’s before we get to organized posting, where you pay minimum wage to dozens of people to create accounts on hundreds of websites, and post the “opinions” they receive each morning by e-mail. (And if this isn’t already automated, it will be soon.)
      So an unmoderated space in practice means “whoever can vomit their insults faster, wins”.
      I’m of the opinion that democracies tolerably work in a pretty narrow range of conditions
      One problem is that a large part of the population is idiots, and it is relatively easy to weaponize them. In the past we were mostly protected by the fact that the idiots were difficult to reach. Then we got mass media, which made it easy to weaponize the idiots in your country. Then we got the internet, which made it easy to weaponize the idiots in other countries. It took some time for the internet to evolve from “that mysterious thing the nerds use” to “the place where the average people spend a large part of their day”, but now we are there.
Noosphere89 3 Jan 2025 15:47 UTC
6 points
0
The closed timelike curve (CTC) computer and the nested CTC computer are an interesting intuitive model (for some people) of how the Turing jump, and to a lesser extent an oracle machine, works. The base case is exactly like the familiar computers we know, which can’t solve their own halting problem. Add a closed timelike curve loop, and the machine can now solve the halting problem for ordinary computers, but it can’t solve its own halting problem, so you add another time loop to solve that, and so on.

(Of course, you could always define the Turing jump formally without any problem, without using any concrete model of computation, and you’d eventually transition to formal explanations, similar to how you can define the computable functions on a wide range of models of computation, though not all of them. It’s still nice that you can represent the Turing jump with a concrete physical model of computation.)
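
For reference, the formal definition alluded to above can be written out as follows. This is standard computability-theory notation rather than anything taken from the linked papers: $\varphi_e^{A}$ denotes the $e$-th Turing machine with oracle $A$, and $\downarrow$ means the computation halts.

```latex
\begin{align*}
A' &= \{\, e \in \mathbb{N} : \varphi_e^{A}(e)\downarrow \,\} \\
\emptyset^{(0)} &= \emptyset, \qquad \emptyset^{(n+1)} = \bigl(\emptyset^{(n)}\bigr)' \\
A &<_T A'
\end{align*}
```

Here $\emptyset'$ is the ordinary halting problem, and $A <_T A'$ says that a machine with oracle $A$ can be simulated by one with oracle $A'$ but cannot decide $A'$ itself, which is the "can't solve its own halting problem" step at each level. Reading each jump as one more time loop is the intuition from the post, not part of the formal definition.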

The papers are below:

https://arxiv.org/abs/1609.05507

https://studenttheses.uu.nl/handle/20.500.12932/45273
Noosphere89 30 Dec 2024 1:39 UTC
6 points
3
Links to long comments that I want to pin but that are too long to be pinned:

https://www.lesswrong.com/posts/Zzar6BWML555xSt6Z/?commentId=aDuYa3DL48TTLPsdJ

https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/?commentId=Gcigdmuje4EacwirD

https://www.lesswrong.com/posts/DCQ8GfzCqoBzgziew/?commentId=RhTNmgZqjJpzGGAaL

https://www.lesswrong.com/posts/NjzLuhdneE3mXY8we?commentId=rcr4j9XDnHWWTBCTq
Noosphere89 6 Jan 2025 17:27 UTC
4 points
1
IMO, the subjectivist epistemology of the ideal formalisms of learning/knowledge like Bayesian epistemology derived from Bayes’s theorem, or Solomonoff Induction/AIXI is actually correct in the general case, and I think human intuitions about objectivity tend to be wrong in the general case.

There are 2 reasons for this:
1. The idealizations basically assume logical omniscience, such that any objective mathematical truth is already there, meaning they capture all of the areas where humans are already correct to be objective, and more.
2. You can’t assume that the laws of physics treat everyone the same way, because there exist cases where different rules apply to different players, or even different patches of space (For example, Minecraft servers that have administrators that have the power to alter the rules at will, compared to regular users that have to treat the world as fixed for them personally).
You could argue that the very notion of objectivity is wrong for our universe specifically, but this is irrelevant to the argument.

The ideal epistemologies will incorporate objective physical laws as a special case of the proper class of all possible epistemic situations.
Noosphere89 25 Dec 2024 17:28 UTC
4 points
0
Counterfactual worlds are just other real/simulated worlds where we don’t have the level of compute and specification/details to simulate that world, so we have to discuss counterfactuals more abstractly than full simulation.
- Vladimir_Nesov 25 Dec 2024 19:44 UTC
  3 points
  0
  Parent
  I think explicitly computing details in full (as opposed to abstract reasoning about approximate properties) has no bearing on moral weight (degree of being real), but some kind of computational irreducibility forces the simulation of interesting things to get quite close to low level detail in order to figure out most global facts about what’s going on there, such as values/culture of people living in a world after significant time passes.
  - Noosphere89 25 Dec 2024 19:56 UTC
    2 points
    0
    Parent
    Note I’m not talking about moral weight here, and my point here is that all discussions about counterfactuals (especially human intuitions around counterfactuals) could in principle be executable/actually doable with enough compute and the ability to specify details, so counterfactability/counterfactual worlds isn’t special from a philosophical perspective, as it implicitly refers to other real worlds/universes.
    
    Of course, this isn’t the only way to do so for a large class of counterfactuals/counterfactual worlds, and sometimes you can run them fully accurately on less compute/data if you can identify simplicities/abstractions that are lossless.
Noosphere89 30 Mar 2026 19:24 UTC
3 points
0
A take on values of the future, assuming AIs automate away politics and economics from humans.
There has been exploration on this topic before, like Jim Buhler’s What Values Will Control The Future Sequence and the appendix of relevant work here, as well as books like Foragers, Farmers and Fossil Fuels, which full disclosure, influenced a lot of my views on how values evolve, and gives a much more plausible picture than pictures that view value evolution as converging to CEV/moral truth, or that emphasize arbitrary societal factors for value evolution rather than energy considerations.
Indeed, I think it’s so plausible that we can actually non-trivially constrain post-AGI society values even without much empirical evidence.
However, given that AI that automates away humans is likely coming within at most the next 20-30 years, it’s worth thinking at least a bit about what values will be dominant in the future.
While we still mostly don’t have very good predictions on what the post-AGI era will look like, we have uncovered some answers, and have also homed in on some important hinge questions, such that we aren’t completely blind to what values will be dominant in the future.
One of the central questions for a lot of value evolution boils down to “does acausal trade actually become practical for AIs to do in such a way that the constraints of previous governments requiring them to hold only territory that they can send armies to faster than rebellions can overthrow them no longer matter?”
If the answer is yes, then AI values become more arbitrary, and value lock-in could in theory affect the entire accessible universe.
If the answer is no, then it’s a lot easier to constrain what the AI values, and value lock-in doesn’t matter, and alignment also matters less as a problem.
One particular example of this is that we can be pretty confident that people will probably be fine with ludicrously large amounts of inequality in both the political and economic dimensions compared to any other societal type we have had in history, including even the farming era of human history. The reason is that with advanced AI, the mechanisms that keep wealth and income inequalities in check will weaken to the point of no longer existing, and the ability of anyone else, like developing countries, to catch up will end because their labor stops mattering, while natural resources can be, and already are, owned by other rich actors in the world like corporations, which Phil Trammell talks about a lot more here.
Economic inequality alone could lead us to value political inequality, via the wealthy buying up land to give themselves powers reserved to states and being able to defend their riches via robots. One other issue is that, while it probably isn’t a problem in the short to medium run and is probably overrated as a problem from an alignment perspective, AIs that can genuinely persuade massive amounts of people to do stuff IRL (i.e. be superpersuasive) are probably going to come in the longer term, via 2 effects:
1. More citizens of states will be uploads by default, and uploaded brains are probably easier to hijack/jailbreak than current biological brains because you can reset them arbitrarily to a known and potentially even maximally vulnerable state, which isn’t possible to do for a biological brain so far (indeed a lot of jailbreaks/adversarial examples like KataGo adversarial attacks rely on the fact that it’s super easy to trick AIs/reset them continuously).
2. It’s probably going to be easier to modify citizens using all sorts of tools like genetics, nanotech, uploading and more, and this means rulers can erode the values of the population to the values their rulers want.
Also, AI will break the pattern of no one person ruling alone, at least assuming alignment is solved, because you can automate the police and militaries that would usually check your power away.
The level of inequality in economics and politics that many people will probably accept is closer to the inequalities between superheroes in modern comics vs the average citizen or mythic/non-Abrahamic religious gods ruling over a normal citizen class than basically any other society we’ve had in history, and the old deal described below will come back, but far, far more intensely and closer to the limiting process:
Especially revered was the “Old Deal”, Morris’ term for the generalised social contract between classes in agrarian societies: that some have the duty to be commanders (or “shepherds of the people”, in the preferred phrasing of many a king), others to obey those commands, and if everyone follows this script then things work fine.
Gender/sex inequality is an area where I expect the exact opposite trend: it will continue the industrial-era trends, mostly because gender inequality will become more arbitrary and divorced from economic usefulness (indeed, in a fully automated AI economy, gender roles do not matter anymore, and we don’t even need to reach the limiting process to have big impacts).
Attitudes to violence might be polarized, because on the large scale wars are inefficient and will get more inefficient relative to other outcomes like peaceful trade or defined borders which neither side will trespass, but on smaller scales war/murder/violence in general will have lower costs than ever before in all of history because of atomic precision manufacturing + backups making the average citizen way, way harder to kill (because now you have to destroy all backups rather than just ending their lives) compared to the benefits, which means the murder rate and the assault/serious violence/rape rate could end up diverging a lot.
That said, this isn’t as trivial to determine without empirical evidence, and thus I’m way less confident in this prediction than basically all of my other predictions to date.
- Karl Krueger 30 Mar 2026 22:03 UTC
  7 points
  5
  Parent
  If humans on the upper end of the economic/political inequality scale are there because of command over superhuman AI, what reason would they have to command the obedience (or protect the existence) of humans on the lower end of that scale? The kings of “the Old Deal” needed underlings to feed them and fight in their wars. Without that need, retaining underlings is a fetish, not a necessity — like hiring someone to wash dishes for you by hand instead of owning a dishwasher, or hiring a chauffeur instead of using a self-driving car.
  - Noosphere89 30 Mar 2026 22:30 UTC
    2 points
    0
    Parent
    I agree that strictly speaking, they don’t need to keep them alive anymore, and to be clear, this analysis holds almost as well if you replaced people with AI, with the exception of the points on violence, so most of the analysis doesn’t depend on people being around to live in it or being commanded.
    - Karl Krueger 30 Mar 2026 23:14 UTC
      2 points
      0
      Parent
      This seems to inevitably lead to the conclusion that anyone who opposes genocide must oppose the creation of superhuman AI; or at least privately-controlled superhuman AI. (Which shouldn’t be a surprise from a classic AI-safety standpoint.)
      - Noosphere89 30 Mar 2026 23:42 UTC
        2 points
        0
        Parent
        I disagree with this conclusion, actually, because I didn’t say that AI developers or AIs themselves would attempt to exterminate humanity, I only said that my analysis was compatible with that outcome, and so was more general than you thought.
        In order to reach this conclusion, you also need opinions on how likely this is to happen.
Noosphere89 23 Nov 2024 16:57 UTC
3 points
−4
I have become convinced that nanotech computers are likely way weaker and quite a bit more impractical than Drexler thought, and have also moved up my probability of Drexler just being plain wrong about the impact of nanotech, which if true suggests that the future value may have been overestimated.

The reason why I’m stating this now is because I got a link in discord that talks about why nanotech computers are overrated, and the reason I consider this important is if this generalizes to other nanotech concepts, this suggests that a lot of the future value may have been overestimated based on overestimating nanotech’s capabilities:

https://muireall.space/pdf/considerations.pdf#page=17

https://forum.effectivealtruism.org/posts/oqBJk2Ae3RBegtFfn/my-thoughts-on-nanotechnology-strategy-research-as-an-ea?commentId=WQn4nEH24oFuY7pZy

https://muireall.space/nanosystems/
- Nathan Helm-Burger 23 Nov 2024 23:21 UTC
  4 points
  0
  Parent
  My estimates about future value don’t hinge on nanotech. I’m expecting immortal digital humans to be able to populate our lightcone without it. Why is nanotech particularly key to anything?
Noosphere89 27 Jan 2025 14:10 UTC
2 points
0
In retrospect, the paper Optimization Is Easy and Learning Is Hard In the Typical Function explains why it’s so easy for cultures to evolve/pass down nearly optimal methods for doing something without any ability to learn how the process actually works, and why the ability to reason can often matter less than one thinks when your goal is to optimize for something, rather than learn something:

https://ieeexplore.ieee.org/document/870741

This cannot be the whole explanation, because of its generality and insensitivity to local factors, but it is probably a non-trivial part of the explanation of the phenomenon where rationally learning something is way harder than just using intuition to optimize something.
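To make the optimize-versus-learn asymmetry concrete, here is a minimal sketch, which is not the paper's own construction. It assumes a hypothetical "typical" function modeled as a table of independent random values, and blind random search as the optimizer. The names `best_of_k` and `prob_hit_top_fraction` are made up for illustration.

```python
import random

def best_of_k(values, k, rng):
    """Blind random search: best value among k uniformly sampled table entries."""
    return max(rng.choice(values) for _ in range(k))

def prob_hit_top_fraction(q, k):
    """Chance that at least one of k independent uniform samples lands in the top fraction q."""
    return 1.0 - (1.0 - q) ** k

rng = random.Random(0)
n = 2 ** 16
# Assumption: a "typical" function is a table of independent random values,
# so there is no structure to learn beyond memorizing the table itself.
table = [rng.random() for _ in range(n)]

k = 500
top_1_percent_cutoff = sorted(table)[int(0.99 * n)]
found = best_of_k(table, k, rng)

print("search landed in the top 1%:", found >= top_1_percent_cutoff)
print("chance of that for k samples:", prob_hit_top_fraction(0.01, k))
print("fraction of the table those samples reveal:", k / n)
```

Under these assumptions, 500 samples land in the top 1% of values with probability about 0.993, while revealing less than 1% of the table. Reproducing the function exactly would require all of its entries, which is the sense in which optimizing can be cheap while learning stays expensive.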
Noosphere89 15 Aug 2022 21:40 UTC
1 point
0
Interestingly enough, mathematics and logic are what you get if you only allow 0 and 1 as probabilities for proof, rather than any intermediate value between 0 and 1. So mathematical proof/logic standards are a special case of probability theory, where 0 and 1 are the only allowed values.
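A small sketch of this special case, using only the law of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A). The function name `prob_b_bounds` is made up for illustration, and P(B|not A) is treated as unknown, so the result is an interval.

```python
def prob_b_bounds(p_a, p_b_given_a):
    """Bounds on P(B) from the law of total probability when P(B | not A) is unknown."""
    lower = p_b_given_a * p_a                # worst case: P(B | not A) = 0
    upper = p_b_given_a * p_a + (1.0 - p_a)  # best case:  P(B | not A) = 1
    return lower, upper

# With certainty in both premises, the interval collapses to a point:
# this is modus ponens (from A and "A implies B", conclude B).
print(prob_b_bounds(1.0, 1.0))

# With fractional premises, the conclusion is only constrained to an interval.
print(prob_b_bounds(0.9, 0.9))
```

When both inputs are exactly 1, the bounds coincide at 1 and the rule behaves like ordinary deduction. With any intermediate values, it only yields a range.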
- Vladimir_Nesov 16 Aug 2022 8:20 UTC
  4 points
  0
  Parent
  Credence in a proof can easily be fractional, it’s just usually extreme, as a fact of mathematical practice. The same as when you can actually look at a piece of paper and see what’s written on it with little doubt or cause to make less informed guesses. Or run a pure program to see what’s been computed, and what would therefore be computed if you ran it again.
Noosphere89 2 Aug 2022 5:50 UTC
1 point
0
The problem with Searle’s Chinese Room is essentially Reverse Extremal Goodhart. Basically, it argues that since understanding and simulation have never gone together in real computers, a computer that has arbitrarily high compute or arbitrarily much time to think must not understand Chinese even if it has emulated an understanding of it.

This is incorrect, primarily because the arbitrary amount of computation is doing all the work. If we allow unbounded energy or time (but not infinite), then you can learn every rule of everything by just cranking up the energy level or time until you do understand every word of Chinese.

Now, this doesn’t happen in real life, because both the laws of thermodynamics and the combinatorial explosion of rule consequences force us not to use lookup tables. Otherwise, if efficiency and the laws of thermodynamics don’t matter, it doesn’t matter which path you take to AGI.
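As a rough illustration of that combinatorial explosion, here is a sketch that counts how many entries a pure lookup table would need. The vocabulary size of 3,000 characters and the figure of 10^80 atoms in the observable universe are illustrative order-of-magnitude assumptions, and the function names are hypothetical.

```python
def lookup_table_entries(vocab_size, max_length):
    """Number of distinct input strings of length 1..max_length over the vocabulary."""
    return sum(vocab_size ** n for n in range(1, max_length + 1))

def shortest_length_exceeding(vocab_size, budget):
    """Smallest maximum input length at which the table needs more entries than the budget."""
    n = 1
    while lookup_table_entries(vocab_size, n) <= budget:
        n += 1
    return n

ATOMS_IN_OBSERVABLE_UNIVERSE = 10 ** 80  # common order-of-magnitude estimate
ASSUMED_CHARACTER_COUNT = 3000           # illustrative assumption, not a linguistic claim

print(shortest_length_exceeding(ASSUMED_CHARACTER_COUNT, ATOMS_IN_OBSERVABLE_UNIVERSE))
```

Under these assumptions, a table covering every input of up to 24 characters already needs more entries than there are atoms to store them in, which is why the thought experiment only goes through when resources are unbounded.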
Noosphere89 1 Jan 2025 16:54 UTC
0 points
−1
The subjectivist view of epistemology embodied by a lot of ideal formalisms like AIXI/Solomonoff Induction/Bayesian updating/learning is actually correct, IMO, and the human intuition here is wrong in the general case, for 2 reasons:
1. They are assumed to be logically omniscient, in the sense that they know all logical consequences of a set of axioms, and thus Bayesianism already incorporates all the objective truths as part of its design.
2. Objective physical laws, meaning laws that are the same regardless of where you are in space or time, don’t hold in general. They hold only in special cases, like our universe (where it’s pretty likely that they do hold). In the general situation, different physical laws/actions can affect different agents, meaning that their probability estimates don’t agree with each other, and you can’t assume that if an action is available to you in one context/space, it is available to any other agent in the shared multiverse.
So in general, epistemology must always include a view from somewhere for all empirical facts, because you can’t assume objective physical laws in all situations.
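A toy sketch of the "view from somewhere" point, using plain Bayes' rule. The two agents, the prior and the likelihood values are all made up for illustration. The only thing the sketch is meant to show is that agents who share a prior and see the same observation still end up with different posteriors when different rules apply to them.

```python
def posterior(prior, likelihood_if_true, likelihood_if_false):
    """Bayes' rule for a binary hypothesis given one observation."""
    evidence = prior * likelihood_if_true + (1.0 - prior) * likelihood_if_false
    return prior * likelihood_if_true / evidence

shared_prior = 0.5

# Hypothetical agent A: the observation is much likelier if the hypothesis is true.
agent_a = posterior(shared_prior, likelihood_if_true=0.8, likelihood_if_false=0.2)

# Hypothetical agent B: subject to different rules, the same observation is uninformative.
agent_b = posterior(shared_prior, likelihood_if_true=0.5, likelihood_if_false=0.5)

print(agent_a, agent_b)
```

Here agent A updates to about 0.8 while agent B stays at 0.5. Neither is making an error, since each is updating correctly relative to the rules that govern its own situation.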
Noosphere89 5 Dec 2023 19:57 UTC
0 points
0
Turntrout and JDP had an important insight in the discord, which I want to talk about: A lot of AI doom content is fundamentally written like good fanfic, and a major influx of people concerned about AI doom came from HPMOR and Friendship is Optimal. More generally, ratfic is basically the foundation of a lot of AI doom content, and how people believe in AI is going to kill us all, and while I’ll give it credit for being more coherent and generally exploring things that the original fic doesn’t, there is no reason for the amount of credence given to a lot of the assumptions in AI doom, especially once we realize that a lot of them probably come from fanfiction stories, not reality.

This is an important point, because it explains why there are so many epistemic flaws in a lot of LW content on AI doom, especially around deceptive alignment: They’re fundamentally writing fanfiction, and forgot that there is little to no connection between how a fictional story about AI plays out and how our real outcomes of AI safety will turn out.

I think the most important implication of this belief is that it’s fundamentally okay to hold the view that classic AI risk almost certainly doesn’t exist, and importantly I think this is why I’m so confident in my predictions, since the AI doom thesis is held up by essentially fictional stories, which is not any guide to reality at all.

Yann LeCun once said that a lot of AI doom scenarios are essentially science fiction, and this is non-trivially right once we realize who is preaching it and how they came to believe it: I suspect the majority came from HPMOR and FiO fanfics. More generally, I think it’s a red flag that LW came into existence basically through fanfiction, and while people like John Wentworth and Chris Olah/Neel Nanda are thankfully not nearly as reliant on fanfiction as a lot of LWers are, they are still a minority (though thankfully improving).

This is not intended to serve as a replacement for either my object level cases against doom, or anyone else’s case, but instead as a unifying explanation of why so much LW content on AI is essentially worthless, as they rely on ratfic far too much.

https://twitter.com/ylecun/status/1718743423404908545

Since many AI doom scenarios sound like science fiction, let me ask this: Could the SkyNet take-over in Terminator have happened if SkyNet had been open source?

To answer the question, the answer is maybe??? It very much depends on the details, here.

https://twitter.com/ArYoMo/status/1693221455180288151

I find issues with the current way of talking about AI and existential risk.

My high level summary is that the question of AI doom is a really good meme, an interesting and compelling fictional story. It contains high stakes (end of the world), it contains good and evil (the ones for and against) and it contains magic (super intelligence). We have a hard time resisting this narrative because it contains these classic elements of an interesting story.
- ryan_greenblatt 5 Dec 2023 20:08 UTC
  7 points
  0
  Parent
  
  More generally, ratfic is basically the foundation of a lot of AI doom content, and how people believe in AI is going to kill us all, and while I’ll give it credit for being more coherent and generally exploring things that the original fic doesn’t, there is no reason for the amount of credence given to a lot of the assumptions in AI doom, especially once we realize that a lot of them probably come from fanfiction stories, not reality.
  
  Noting for the record that this seems pretty clearly false to me.
  - Noosphere89 5 Dec 2023 20:36 UTC
    0 points
    0
    Parent
    I may weaken this, but my point is that a lot of people in LW probably came here through HPMOR and FiO, and since anyone can write a post and have it get karma, I think it’s likely that people who came through that route, with basically no structure akin to science to guide them away from unpromising paths, allowed low standards of discussion to be created.
    
    I do buy that your social circle isn’t relying on fanfiction for your research. I am worried that a lot of the people on LW, especially the non-experts, are implicitly relying on ratfic or science-fiction models as reasons to be worried about AI.
    - niplav 5 Dec 2023 23:07 UTC
      2 points
      0
      Parent
      I have specifically committed not to read HPMOR for this reason, and do not read much fiction in general, as a datapoint from a “doomer”.
      - Noosphere89 5 Dec 2023 23:27 UTC
        2 points
        0
        Parent
        I’m okay with that, but I wasn’t wanting to have that drastic of an effect on people. I more wanted to point out something that is overlooked.
Noosphere89 20 Jun 2022 1:55 UTC
0 points
0
One important point for AI safety, at least in the early stages, is an inability for the AI to change its source code. A whole lot of problems seem related to recursive self-improvement within its source code, so cutting off that area of improvement seems wise in the early stages. What do you think?
- JBlack 20 Jun 2022 11:39 UTC
  7 points
  0
  Parent
  I don’t think there’s much difference in existential risk between AGIs that can modify their own code running on their own hardware, and those that can only create better successors sharing their goals but running on some other hardware.
  - Noosphere89 20 Jun 2022 19:03 UTC
    1 point
    0
    Parent
    That might be a crux here, because my view is that hardware improvements are much harder to do effectively, especially in secret around the human level, due to Landauer’s Principle essentially bounding efficiency of small scale energy usage close to that of the brain (20 Watts.) Combine this with 2-3 orders of magnitude worse efficiency than the brain and basically any evolutionary object compared to human objects, and the fact it’s easier to get better software than hardware due to the virtual/real life distinction, and this is a crux for me.
    - JBlack 21 Jun 2022 1:29 UTC
      9 points
      0
      Parent
      I’m not sure how this is a crux. Hardware improvements are irrelevant to what either of us were saying.
      I’m saying that there is little risk difference between an AGI reprogramming itself to have better software, and programming some other computer with better software.
Noosphere89 17 Jun 2022 16:44 UTC
0 points
0
One of my more interesting ideas for alignment is to make sure that no one AI can do everything. It’s helpful to draw a parallel with why humans still have a civilization around despite terrorism, war and disaster. And that’s because no human can live and affect the environment alone. They are always embedded in society, thus giving society a check against individual attempts to break norms. What if AI had similar dependencies? Would that solve the alignment problem?
- lc 18 Jun 2022 4:18 UTC
  4 points
  0
  Parent
  One important reason humans can still have a civilization despite terrorism is the Hard Problem of Informants. Your national security infrastructure relies on the fact that criminals who want to do something grand, like take over the world, need to trust other criminals, who might leak details voluntarily or be tortured or threatened with jailtime. Osama bin Laden was found and killed because ultimately some members of his terrorist network valued things besides their cause, like their well being and survival, and were willing to cooperate with American authorities in exchange for making the pain stop.
  
  AIs do not have survival instincts by default, and would not need to trust other potentially unreliable humans with keeping a conspiracy secret. Thus it’d be trivial for a small number of unintelligent AIs that had the mobility of human beings to kill pretty much everyone, and probably trivial regardless.
  - Yitz 19 Jun 2022 3:59 UTC
    2 points
    0
    Parent
    
    AIs do not have survival instincts by default
    
    I think a “survival instinct” would be a higher order convergent value than “kill all humans,” no?
    - lc 19 Jun 2022 4:27 UTC
      8 points
      0
      Parent
      Don’t have survival instincts terminally. The stamp-collecting robot would weigh the outcome of it getting disconnected vs. explaining critical information about the conspiracy and not getting disconnected, and come to the conclusion that letting the humans disconnect it results in more stamps.
      
      Of course, we’re getting ahead of ourselves. The reason conspiracies are discovered is usually because someone in or close to the conspiracy tells the authorities. There’d never be a robot in a room being “waterboarded” in the first place because the FBI would never react quickly enough to a threat from this kind of perfectly aligned team of AIs.
- JBlack 18 Jun 2022 2:55 UTC
  2 points
  0
  Parent
  Only if there is no possibility that they can break those dependencies, which seems a pretty hopeless task as soon as we consider superhuman cognitive capability and the possibility of self improvement.
  Once you consider those, cooperation with human civilization looks like a small local maximum: comply with our requirements and we’ll give you a bunch of stuff that you could—with major effort—replace us and build an alternative infrastructure to get (and much more). Powerful agents that can see a higher peak past the local maximum might switch to it as soon as they’re sufficiently sure that they can reach it. Alternatively, it might only be a local maximum from our point of view, and there’s a path by which the AI can continuously move toward eliminating those dependencies without any immediate drastic action.
- AgentME 23 Jun 2022 21:26 UTC
  1 point
  0
  Parent
  - Regardless of society’s checks on people, most mentally-well humans given ultimate power probably wouldn’t decide to exterminate the rest of humanity so they could single-mindedly pursue paperclip production. If there’s at all a risk that an AI might get ultimate power, it would be very nice to make sure the AI is like humans in this manner.
  - I’m not sure your idea is different from “let’s make sure the AI doesn’t gain power greater than society”. If an AI can recursively self-improve, then it will outsmart us to gain power.
  - If your idea is to make it so there are multiple AIs created together, engineered somehow so they gain power together and can act as checks against each other, then you’ve just swapped out the AI for an “AI collective”. We would still want to engineer or verify that the AI collective is aligned with us; every issue about AI risk still applies to AI collectives. (If you think the AI collective will be weakened relative to us by having to work together, then does that still hold true if all the AIs self-improve and figure out how to get much better at cooperating?)