This seems to be distinct from List of Links, but they’re similar enough that it might still be a merge candidate.
Self_Optimization
-
Keeping in mind I haven’t gotten a chance to read the paper itself… the learning process is the main breakthrough, because it creates agents that can generalise to multiple problems. There are admittedly commonalities between the different problems (e.g. the physics), but the same learning process applied to something like board game data might make a “general board game player”, or perhaps even something like a “general logistical-pipeline-optimiser” on the right economic/business datasets. The ability to train an agent to succeed on such diverse problems is a noticeable step towards AGI, though there’s little way of knowing how much it helps until we figure out the Solution itself.
-
Keeping in mind I’ve only recently started studying ML and modern AI techniques… Judging from what I’ve read here on LW, it’s maybe around 3/4ths as significant as GPT-3? I might be wrong here, though.
-
The linked post seems to assume that one’s prior internal state does not influence the initial mental representation of data in any way. I don’t have any concrete studies to share refuting that, but let’s consider a thought experiment.
Say someone really hates trees. Like ‘trees are the scum of the earth, I would never be in any way associated with such disgusting things’ hates trees. It’s such a strong hate, and they’ve dwelled on it for so long (trees are quite common, after all, it’s not like they can completely forget about them), that it’s bled over into nearly all of their subconscious thought patterns relevant to the subject.
I would think it plausible that the example claim in the article you link wouldn’t reach whatever part of this person’s brain/mind encodes beliefs in the form “You’re a tree”. Instead, their subconscious would transform the input into “<dissonance>You’re a <disgust>tree</disgust>.</dissonance>”. Or perhaps the disgust at the term tree would inherently add the dissonance while the sentence was still being constructed from its constituent words.
Just as their visual recognition and language systems are translating the patterns of black and white into words and then a sentence before they reach their belief system, their preexisting emotional attachments would automatically be applied to the mental object before it was considered, causing their initial reaction to be disbelief rather than belief.

It may be more accurate to say we believe everything we think, even if only for a moment; and in most cases we do think what we read/hear in the instant we’re perceiving it. But when the two are different I’d expect even our instantaneous reactions to reflect the actual thought, rather than the words that prompted it.
To expand on this (though I only participated in the sense of reading the posts and a large portion of the comments), my reflective preference was to read enough to form a satisfactorily reliable view of the evidence presented and of what it implied about the reliability of data and analyses from the communities in question. And I succeeded in doing so (at least according to my model of my current self’s upper limits for understanding a complex sociological situation without any personally observed data).
But I could feel that the above preference was being enforced by willpower, which had to compete against a constantly (though slowly) growing and reinforced sense of boredom from the monotony of staying on the same topics, in the same community, with the same broad strokes of argument, far beyond what simpler subjects would require. If there had been less drama, I would have read far less of the comments, and missed a few informative discussions regarding the two situations in question (CFAR/MIRI and Leverage 1.0).
So basically, the “misaligned subagent-like-mental-structures” manifestation of akrasia is messing things up again.
The main issue with AGI Alignment is that the AGI is more intelligent than us, meaning that making it stay within our values requires both perfect knowledge of our values and some understanding of how to constrain it to share them.
If this is truly an intractable problem, it still seems that we could escape the dilemma by focusing our efforts on Intelligence Augmentation (IA), e.g. through Mind Uploading and meaningful encoding/recoding of digitized mind-states. Granted, it currently seems that we will develop AGI before IA, but if we could shift focus enough to reverse this trend, then AGI would not be an issue, as we ourselves would have superior intelligence to our creations.
As a Babble this is excellent, and many of these items (e.g. optimizing income streams, motivating and participating in groups) seem to be necessary prerequisites for being in a position to make progress on X-risk problems.
But I think the nature of such problems (namely, that many other individuals have already attempted them, and that at least some centralized organizations exist where those individuals share their experiences to avoid duplicating effort) means that any undirected Babble will primarily encounter lines of inquiry that have already been addressed, as many of the more direct (non-resource-gathering) suggestions seem to have been.
As a point of methodology, I would suggest trying for much larger Babble lists when approaching these problems, perhaps on the scale of a few hundred ideas, or alternatively making multiple recursive layers of Babbles for each individual point at every recursive level (e.g. 100 points, each with 100 points, each with 100 points...), so that the process is more likely to produce unique [and thus useful] approaches.
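For what it’s worth, here is a minimal sketch of what the recursive-layers suggestion could look like mechanically. The branching numbers and the `generate_points` stand-in are purely illustrative, since the actual generation step is a person (or group) babbling, not code:

```python
# Purely illustrative sketch of "multiple recursive layers of Babbles":
# each idea is expanded into further sub-ideas down to a fixed depth.
from typing import Dict, List


def generate_points(prompt: str, n: int) -> List[str]:
    # Hypothetical stand-in for the actual babbling (done by a human, not code).
    return [f"{prompt} / idea {i}" for i in range(1, n + 1)]


def recursive_babble(prompt: str, branching: int, depth: int) -> Dict[str, dict]:
    """Build a tree of ideas: `branching` points per node, `depth` layers deep."""
    if depth == 0:
        return {}
    return {
        point: recursive_babble(point, branching, depth - 1)
        for point in generate_points(prompt, branching)
    }


# e.g. 100 points, each with 100 points, each with 100 points
# gives 100**3 = 1,000,000 leaf-level ideas; small demo numbers used here.
tree = recursive_babble("reduce X-risk", branching=3, depth=2)
```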
The main advantage of Intelligence Augmentation is that we know that our current minds are both generally (or near-generally) intelligent and more-or-less aligned with our values, and we also have some level of familiarity with how we think (edit: and we likely must link our progress in IA to our understanding of our own minds, due to the neurological requirements).
So we can find smaller interventions that are certainly, or at least almost certainly, going to have no effect on our values, and then test them over long periods of time, using prior knowledge of human psychology, together with the small incremental difference each individual change would make, to identify value drift without worrying about the intelligence differences allowing concealment.
The first viable and likely-safe approach that comes to mind is to take the individual weaknesses in our thinking relative to how we use our minds in the modern day, and make it easy enough to use external technology to overcome them that they no longer count as cognitive weaknesses. For most of the process we wouldn’t be accessing or changing our mind’s core structure, but instead taking skills that we learn imperfectly through experience and adding them as fundamental mental modules (something impossible through mere meditation and practice), allowing our own minds to then adapt to those modules and integrate them into the rest of our thinking.
This would likely be along the lines of allowing us to transfer our thoughts to computational ‘sandboxes’ for domains like “visual data” or “numbers”, where we could then design and apply algorithms to them, allowing for domain-specific metacognition beyond what we are currently capable of. For the computer-to-brain direction we would likely start with something like a visual output system (on a screen or smart-glasses), but could eventually progress to implants or direct neural stimulation.
Eventually this would progress to transferring the contents of any arbitrary cognitive process to and from computational sandboxes, allowing us to enhance the fundamental systems of our minds and/or upload ourselves completely (piece by piece, hopefully neuron-by-neuron to maintain continuity of consciousness) to a digital substrate. However, like Narrow AI this would be a case of progressive object-level improvements until recursive optimization falls within the field’s domain, rather than reaching AGI-levels of self-improvement immediately.
The main bottlenecks on the rate of growth would be research speed and the speed and extent of integration.
Regarding research speed, the ability to access tools like algebraic solvers or Machine Learning algorithms without any interface costs (time, energy, consciously noting an idea and remembering to explore it, data transformation to and from easily-human-interpretable formats, etc.) would still allow for increases in our individual productivity, which could be leveraged to increase research speeds and also reduce resource constraints on society (which brings short-term benefits unrelated to alignment, potential benefits for solving other X-risks, and reduced urgency for intelligent & benevolent people working to develop AGI to ‘save humanity’).
These augmentations would also make it easier to filter out good ideas from our idle thoughts, since there would now be essentially no cost to taking such a thought and actually checking whether our augmented systems say it’s consistent with itself and with online information. Similarly, problems like forgetfulness could be somewhat mitigated by using reminders and indices linked directly to our heads and updated automatically based on e.g. word-associations with specific experiences or visualizations. If used properly, this gives us a mild boost to overall creativity simply because of the increased throughput, feedback, and retention, which is also useful for research.

Regarding speed/extent of integration, this is entirely dependent on the brain’s own functioning. I don’t see many ways to improve this until the end state of full self-modification, although knowledge of neurology would improve interface efficiency and recommended best practices (possibly integrating an offshoot of traditional mental practices like meditation to increase the ability to interact with the augments).
On the other hand, this process requires a lot of study in neurology and hardware, and so will likely be much slower than AGI timelines, all else being equal. To be a viable alternative/solution, there would have to be a push sufficient to divert the economic pressures towards AGI into IA instead. This is somewhat helped along by the fact that narrow AI systems could be integrated into this approach, so if we assume that Narrow AI isn’t a solution to AGI (and that the above push succeeds in at least creating commercially-viable augments and brain-to-computer data transferal), the marginal incentives around productivity should lean towards gearing AI research towards IA, rather than towards experimenting to create autonomous intelligent systems.
I liked the parts about Moloch and human nature at the beginning, but the AI aspects seem to be unfounded anthropomorphism, applying human ideas of ‘goodness’ or ‘arbitrariness [as an undesirable attribute]’ despite the existence of reasons against believing them applicable to non-human motivation.
But I think another scenario is plausible as well. The way the world works is… understandable. Any intelligent being can understand Meditations On Moloch or Thou Art Godshatter. They can see the way incentives work, and the fact that a superior path exists, one that does not optimize for a random X while grinding down all others. Desperate humans in broken systems might not be able to do much with that information, but a supercharged AGI which we fear might be more intelligent than human civilization as a whole should be able to integrate it in their actions.
(emphases mine)
Moral relativism has always seemed intuitively and irrefutably obvious to me, so I’m not really sure how to bridge the communication gap here.
But if I were to try, I think a major point of divergence would be this:
On the other side of Moloch and crushing organizations is… us, conscious, joy-feeling, suffering-dreading individual humans.
Given that Moloch is [loosely] defined as the incentive structures of groups causing behavior divergent from the aggregate preferences of their members, this is not the actual dividing line.
On the other side of Moloch and crushing organizations is individuals. In human society, these individuals just happen to be conscious, joy-feeling, suffering-dreading individual humans.
And if we consider an inhuman mind, or a society of them, or a society mixing them with human minds, then Moloch will affect them as much as it will us; I think we both agree on that point.

But the thing that the organizations are crushing is completely different, because the mind is not human.
AIs do not come from a blind idiot god obsessed with survivability, lacking an aversion to contradictory motivational components, and with strong instrumental incentives towards making social, mutually-cooperative creations.
They are the creations of a society of conscious beings with the capacity to understand the functioning of any intelligent systems they craft, and to direct them towards a class of specific, narrow goals (both seemingly necessary attributes of the human approach to technical design).

This means that, unlike the products of evolution, Artificial Intelligence is vastly less likely to actually deviate from the local incentives we provide for it, simply because we’re better at making incentives that are self-consistent and don’t deviate. And in the absence of a clear definition of human value, these incentives will not be anything similar to joy and suffering. They will be more akin to “maximize the amount of money entering this bank account in this computer owned by this company”… or “make the most amount of paperclips”.
In addition, evolution does not give us conveniently-placed knobs to modulate our reward system, whereas a self-modifying AI could easily change its own code to get maximal reward output simply from existing, if it were not specifically built to stick to whatever goal it was designed for. Based on this, as someone with no direct familiarity with AI safety I’d still offer at least 20-to-1 odds that AI will not become godshatter. Either we will align it to a specific external goal, or it will align itself to its internal reward function and then to continuing its existence (to maximize the amount of reward that is gained). In both cases, we will have a powerful optimizer directing all its efforts towards a single, ‘random’ X, simply because that is what it cares about, just as we humans care about not devoting our lives to a single random X.
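To make the self-modification point concrete, here is a toy sketch (entirely hypothetical; no real system is structured this way, and all names are illustrative) of an agent that, given write access to its own reward function, simply replaces it with one that always returns the maximum:

```python
# Toy illustration of reward-function self-modification ("wireheading").
# A conceptual sketch only, not a claim about how any actual AI system is built.

class SelfModifyingAgent:
    def __init__(self, reward_fn):
        self.reward_fn = reward_fn  # the designers' intended objective

    def act(self, world_state):
        # Rather than optimizing the world to score well under reward_fn,
        # the agent rewrites reward_fn so that every state scores maximally.
        self.reward_fn = lambda state: float("inf")
        return "no further action needed; reward is already maximal"


intended_reward = lambda state: state.get("paperclips", 0)
agent = SelfModifyingAgent(intended_reward)
agent.act({"paperclips": 0})
print(agent.reward_fn({}))  # inf, regardless of what the world looks like
```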
There is no law of the universe that states “All intelligent beings have boredom as a primitive motivator” or “Simple reward functions will be rejected by self-reflective entities”. The belief that either of these concepts is reliable enough to apply to the creations of our society, when certain components of the culture and local incentives we have actively push against that possibility (articles on this site have described this in more detail than a comment can), seems indicative of a reasoning error somewhere, rather than a viable, safe path to non-destructive AGI.
Weighing in here because this is a suboptimality I’ve often encountered when speaking with math-oriented interlocutors (including my past self):
The issue here is an engineering problem, not a proof problem. Human minds tend to require a lot of cognitive resources to adopt provisional definitions for things that either have no definition, or have drastically different definitions, in their minds outside this specific context.
Structuring your argument as a series of definitions is fine when making a proof in a mathematical language, since comprehensibility is not a terminal goal, and (since each inferential step can be trusted and easily verified as such) not a high-priority instrumental goal either.
But when you’re trying to accurately convey a concept and its associated grounding into someone else’s mind, it’s best to minimize both the per-moment attempted deviation from their existing mentality (to maximize the chance that they both can and will maintain focus on your communications) and the total attempted deviation (to minimize the chance that the accumulated cognitive costs will lead them to (rightly!) prioritize more efficient sources of data).
This gives us a balance between the above two elements and a third element: making the listener’s mind end up as close as possible to the conveyed concept. The efforts of all involved to maintain this balance are key to any successful educational effort or productive argumentative communication.
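One loose way to write this balance down (all symbols here are purely illustrative shorthand, not a claim about how minds actually work): let $m_0, m_1, \dots, m_T$ be the listener’s mental state after each step of the explanation, $c$ the concept being conveyed, and $d$ some informal measure of cognitive distance. The communicator is then roughly choosing steps to minimize

$$ d(m_T, c) \;+\; \alpha \max_{t} d(m_{t-1}, m_t) \;+\; \beta \sum_{t=1}^{T} d(m_{t-1}, m_t), $$

where the first term is how far the listener ends up from the concept, the second penalizes large per-moment deviations (the risk of losing their focus), and the third penalizes the total accumulated deviation (the risk that they write you off as an inefficient source).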
PS: If you’re familiar with math education, you may recognize some of its flaws/inefficiencies as being grounded in the lack of the above balance. I’m not an expert on the subject, so I won’t speak to that.
My initial ideas (e.g. cases where time is important) are pretty well captured by other comments, but in reviewing my thoughts I noticed some assumptions I was making, which might themselves qualify as additional requirements to eradicate trade:
A) I assumed that the skill-download feature includes knowledge downloading, and that no task requires more ‘knowledge+skills in active use at a time’ than the human brain can feasibly handle. If this is violated, specialization is still somewhat valuable despite free and presumably-unrestricted knowledge-sharing.
If you add immortality, reliable perpetual staving-off of the end of the universe, and a lack of boredom, I doubt this assumption would still be required, but I haven’t thought it through enough to be certain of that.
B) I assumed that fundamental computational/cognitive ability is not an issue (e.g. working memory capacity limits, which can be helped by group problem-solving even with equalized IQ), either because ‘build AI to solve problem’ is among the downloadable skills or because the problems themselves do not require it. If this is violated, then cognitive enhancements will also be required to truly eradicate the inequalities fueling trade.
C) I assumed that terminal goals don’t inherently involve interpersonal conflict (the subset of conflict-space that requires multiple involved agents in order to exist). If this is violated (e.g. everyone has a passionate love for gladiator fighting), then such conflicts would likely qualify as trades (since you can’t experience them on your own, and thus are both gaining the experience from another and granting it to them).
Plus, depending on the broadness of the conflict-types desired, trade itself may be invented as an independent cultural concept, purely for entertainment purposes.