Still haven’t heard a better suggestion than CEV.
TristanTrim
Using chatbots means exposing yourself to delusion-inducing stimuli, and plenty of people have signed up for that.
Oh, I really like what you are thinking with the “explanation why” thing. I think it still fails on the first two “trust solution problems” I mentioned, but it seems like a promising concept for curating trust networks.
Reading a bit about the “natural language autoencoder” technique. It seems they are treating the activations as the distribution to be encoded into a latent space, and using natural language as that latent space.
Given my experience with autoencoders, the latent space is kinda arbitrary, determined by the relationship of locations in the distribution to one another, and the initialization. So it seems clear to me that the encoder would learn to output text that the decoder can use to reconstruct the activation, it just isn’t clear to me why that text should have any meaning to humans reading it that necessarily relates to what the activation means.
Does anyone who knows more about the technique have an answer to this question, or can anyone verify my intuition that this technique doesn’t guarantee an accurate description of latent semantics?
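Here is a minimal sketch of the intuition, with a big simplification. It uses a linear autoencoder (PCA) on toy data, not the natural language autoencoder, whose latent is text. It only illustrates the general point that reconstruction loss pins the latent down up to a reparametrization. Any invertible matrix `A` (hypothetical, random here) can be folded into the encoder and undone in the decoder. That changes every latent code while leaving the reconstructions identical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "activations": 200 samples in 8 dimensions lying exactly in a 3-D subspace.
basis = rng.normal(size=(3, 8))
X = rng.normal(size=(200, 3)) @ basis

# Optimal linear autoencoder with a 3-D latent: encoder/decoder built from the
# top-3 right singular vectors (i.e. PCA).
_, _, vt = np.linalg.svd(X, full_matrices=False)
W_enc = vt[:3].T   # 8 -> 3
W_dec = vt[:3]     # 3 -> 8


def reconstruct(data, enc, dec):
    """Encode to the latent space, then decode back."""
    return (data @ enc) @ dec


# Fold an arbitrary invertible 3x3 matrix into the encoder and undo it in the decoder.
A = rng.normal(size=(3, 3))
W_enc2 = W_enc @ A
W_dec2 = np.linalg.inv(A) @ W_dec

err1 = np.abs(reconstruct(X, W_enc, W_dec) - X).max()
err2 = np.abs(reconstruct(X, W_enc2, W_dec2) - X).max()
latent_change = np.abs(X @ W_enc - X @ W_enc2).max()
print(f"reconstruction error: {err1:.1e} vs {err2:.1e}")
print(f"largest change in any latent coordinate: {latent_change:.2f}")
```

Both parametrizations reconstruct equally well, yet their latent codes differ. For a text latent, the analogous freedom would presumably be any re-encoding of the text that the decoder learns to invert.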
sensitive to the precise chronological order of “realizing things” and “self-modification”.
I think this is a misunderstanding of CEV. From the CEV paper:
“Spread” describes cases where your extrapolated volition becomes unpredictable, intractable, or random. You might predictably want a banana tomorrow, or predictably not want a banana tomorrow, or predictably have a 30% chance of wanting a banana tomorrow depending on variables that are quantum-random, deterministic but unknown, or computationally intractable. When multiple outcomes are possible and probable, this creates spread in your extrapolated volition.
“Muddle” measures self-contradiction, inconsistency, and cases of “damned if you do and damned if you don’t.” Suppose that if you got a banana tomorrow you would not want a banana, and if you didn’t get a banana you would indignantly complain that you wanted a banana. This is muddle.
It seems to me “coherence” is meant to be applied over sets of people, but it is also meant to be applied over possible extrapolations of each person, and probably over possible sets of people extrapolated together. As such, CEV is impractical to compute, possibly incomputable, and may contain moral simulation hazards. (Are we living in a simulation being extrapolated in the calculation of CEV?)
But the problem of the order of “realizing” and “self-modifying” is not an issue because all possible branches are meant to be considered. The sensitivity is there, yes, but is expressed in individuals having wide CEV spread, and then CEV targets the coherence within that spread.
I think that does leave as an open question how to weight incoherent branches in that spread against one another. I don’t think that was well covered in the CEV paper.
Thanks for the response : )
Avoiding networks where people share slop either only works for a minority of people who actually do it, or becomes a race for people to migrate to networks without slop, and sloppers to find the networks people are migrating to. As time goes on, I expect agentic AI spam, so sloppers will migrate to all networks open to them as quickly as (or more quickly than) real people migrate to them. So I don’t consider this a real solution.
The “trust” solution seems much more promising, but starts running into other problems:
It requires manual effort, so casual users will not do it.
It increases filter bubble / echo chamber effects.
If done automagically, it gives people running the social media platforms power to censor and shape discourse.
I think trust networks might perform better, if good rules for how to extend trust linking could be developed, but it doesn’t seem obvious to me how that could be done. Still, it seems like a promising direction.
I’d also like to see solutions that somehow abstract what is being posted, so that the frequency with which a kind of content is posted doesn’t make you more likely to encounter it. Give people a better sense of what is being posted rather than how much something is being posted. Although I suppose slop might still be able to drown out real content by posting a much larger diversity of kinds of nonsense than there are kinds of real content.
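A minimal sketch of the frequency-independent idea assumes each post already has a “kind” label. The `kind_of` function and the toy posts are hypothetical; in practice the label might come from clustering in a learned latent space. The sketch samples a kind uniformly, then a post within it. A kind posted 98 times is then no more visible than one posted once.

```python
import random
from collections import defaultdict


def sample_by_kind(posts, kind_of, k, seed=None):
    """Draw k posts so that each kind of content is equally likely to appear,
    however many times that kind was posted.

    kind_of is a hypothetical labelling function mapping a post to its kind.
    """
    rng = random.Random(seed)
    by_kind = defaultdict(list)
    for post in posts:
        by_kind[kind_of(post)].append(post)
    kinds = list(by_kind)
    # Pick a kind uniformly at random, then a post within that kind.
    return [rng.choice(by_kind[rng.choice(kinds)]) for _ in range(k)]


def exact_text(post):
    """Toy labelling: identical text counts as the same kind."""
    return post


posts = ["buy coin now"] * 98 + ["notes on CEV", "autoencoder question"]
shown = sample_by_kind(posts, exact_text, k=3000, seed=1)
spam_share = shown.count("buy coin now") / len(shown)
print(f"spam is 98% of posts but {spam_share:.0%} of the sampled feed")
```

It also reproduces the caveat in the paragraph. If the spam were split into many distinct kinds, it would again dominate the list of kinds being sampled.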
Vector search is very much the kind of thing I’m talking about, but seems early in the tech tree, if that makes sense.
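For concreteness, the most basic form of vector search is a brute-force cosine-similarity lookup over stored embeddings. This sketch assumes the embeddings already exist, and the 4-D vectors are made up. Production systems typically use approximate nearest-neighbour indexes rather than scanning every row.

```python
import numpy as np


def top_k_cosine(query, index, k=3):
    """Brute-force nearest-neighbour search by cosine similarity.

    query -- 1-D embedding vector
    index -- 2-D array with one stored embedding per row
    Returns (row indices, similarities), best match first.
    """
    q = query / np.linalg.norm(query)
    m = index / np.linalg.norm(index, axis=1, keepdims=True)
    sims = m @ q
    order = np.argsort(-sims)[:k]
    return order, sims[order]


# Hypothetical embeddings; real ones would come from a trained encoder.
index = np.array([
    [0.9, 0.1, 0.0, 0.0],   # doc 0
    [0.0, 1.0, 0.2, 0.0],   # doc 1
    [0.8, 0.3, 0.0, 0.1],   # doc 2
    [0.0, 0.0, 0.1, 1.0],   # doc 3
])
ids, scores = top_k_cosine(np.array([1.0, 0.2, 0.0, 0.0]), index, k=2)
print(ids, scores.round(3))
```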
Classifier models seem much less what I’m talking about depending on what you mean. If it’s just a basic deep learning model trained to apply a label based on a labelled training set, that’s not at all what I’m talking about. But if it’s like, an autoencoder that learns a latent space, and then cluster analysis and classification can be done in that latent space, that is the sort of thing I’m talking about.
The important thing is that it’s an unsupervised model that learns a latent space that is high dimensional in the sense that it is maybe around 10 to 1000 dimensions, but is much lower dimension than the space it is trained on. For example, 100x100xRGB images live in a 30,000 dimensional space.
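Here is a minimal sketch of the “learn a latent space, then cluster in it” pipeline. PCA stands in for the learned encoder; a real system would more likely use a nonlinear autoencoder. Plain k-means does the cluster analysis. The data are synthetic: 1,000-D points generated from 3 clusters in a hidden 5-D subspace, so the low-dimensional latent exists by construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: n points in a D-dimensional "raw" space, generated from
# n_clusters clusters living in a hidden d-dimensional subspace.
D, d, n_clusters, n = 1000, 5, 3, 300
mixing = rng.normal(size=(d, D))
centers = rng.normal(scale=5.0, size=(n_clusters, d))
labels_true = rng.integers(n_clusters, size=n)
X = (centers[labels_true] + rng.normal(size=(n, d))) @ mixing

# Linear "encoder": project onto the top-d principal components.
Xc = X - X.mean(axis=0)
_, _, vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ vt[:d].T  # n x d latent codes


def kmeans(Z, k, iters=50, n_init=10, seed=0):
    """Lloyd's k-means with k-means++ seeding; keeps the lowest-inertia run."""
    r = np.random.default_rng(seed)
    best_assign, best_inertia = None, np.inf
    for _ in range(n_init):
        # k-means++: each new centre is drawn with probability proportional to
        # squared distance from the nearest centre chosen so far.
        C = [Z[r.integers(len(Z))]]
        for _ in range(k - 1):
            d2 = ((Z[:, None, :] - np.array(C)[None]) ** 2).sum(-1).min(axis=1)
            C.append(Z[r.choice(len(Z), p=d2 / d2.sum())])
        C = np.array(C)
        for _ in range(iters):
            assign = ((Z[:, None, :] - C[None]) ** 2).sum(-1).argmin(axis=1)
            C = np.array([Z[assign == j].mean(axis=0) if np.any(assign == j) else C[j]
                          for j in range(k)])
        assign = ((Z[:, None, :] - C[None]) ** 2).sum(-1).argmin(axis=1)
        inertia = ((Z - C[assign]) ** 2).sum()
        if inertia < best_inertia:
            best_assign, best_inertia = assign, inertia
    return best_assign


assign = kmeans(Z, n_clusters)
# Purity: share of points whose cluster's majority true label matches their own.
purity = sum(np.bincount(labels_true[assign == j]).max()
             for j in range(n_clusters) if np.any(assign == j)) / n
print(f"cluster purity in the {d}-D latent space: {purity:.2f}")
```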
For a language model, it’s a bit harder to see what’s going on, since the text gets tokenized and then the dimensionality stays roughly the same until it gets sampled from to give next token prediction. But the function being approximated is “what is the probability distribution of the next token given the sequence of previous tokens”, which lives in a space with dimensionality (number of possible sequences)*(degrees of freedom per next-token distribution) = (vocab size)^(length of sequence) * (vocab size −1), which is absolutely massive. For gpt-2, I think that’s about (5e4)^(1e3) * (5e4) ~= 1e4704. So even though the model isn’t actually changing dimensionality much between the input and output, it’s still representing a semantic space with a much higher dimensionality in much lower dimensional latent spaces.
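The dimensionality arithmetic is easiest to check in log space, since the number overflows a float. This sketch computes log10 of (vocab size)^(sequence length) * (vocab size − 1). It does so for the round numbers used above and for GPT-2's actual vocabulary size (50,257) and context length (1,024 tokens).

```python
import math


def log10_function_space_dim(vocab_size, seq_len):
    """log10 of vocab_size**seq_len * (vocab_size - 1).

    vocab_size**seq_len counts full-length contexts (shorter ones add only a
    negligible factor); each context gets its own next-token distribution with
    vocab_size - 1 free parameters. Working in log10 avoids float overflow.
    """
    return seq_len * math.log10(vocab_size) + math.log10(vocab_size - 1)


rough = log10_function_space_dim(5e4, 1e3)    # the round numbers used above
gpt2 = log10_function_space_dim(50257, 1024)  # GPT-2's vocabulary size and context length
print(f"round numbers: ~1e{rough:.0f}; GPT-2's actual sizes: ~1e{gpt2:.0f}")
```

The round numbers give roughly 10^4704 and GPT-2's actual sizes roughly 10^4819. Either way, the point about the latent space being vastly smaller than the function space stands.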
Anyway, I think the broad umbrella of what I’m talking about may not have much that isn’t being focused on at all, but I think if there was more focus in the area, it would be possible to find lots of worthwhile things that haven’t been tried. Basically, shifting focus off of goals like “make bigger models” and “shove chatbots into every app” and onto goals in umbrellas like:
Using the structure of the latent spaces to analyze and learn new things about the spaces they represent.
Developing more foundational theory to better understand the kinds of structures that exist in these spaces and how to work with them.
Developing better general and domain specific tools for experts and non-experts to get higher bandwidth insight from these spaces.
To be clear, I do think there are people pursuing goals within each of those umbrellas, but it seems much less than it deserves. Although, it is possible my view is coloured by marketing hype and if I actually dug into investment statistics I would be happier with the picture I saw. Not sure.
Cool post! I have random thoughts you may or may not find interesting:
In the specific case of the bigfoot example, focusing on a measure of mind-world correlation seems worthwhile. You believe your mind has become linked to the state of whether bigfoot is in the next room through your memories of walking into rooms that didn’t have bigfoot, and through your understanding of society and the prevailing scientific beliefs about bigfoot. Under common assumptions, there is a strong link between the state of your mind and the actual absence of bigfoot in the next room. You can apply the same process to examine the mind-world correlation of your friend who thinks bigfoot being in the next room is 50/50. Maybe you happen to know your friend has only very limited experience with probability and thinks of 50/50 as shorthand for “it either happens or it doesn’t”. Or they have a history of magical thinking, preferring fantasies to facing a cold and uncaring reality. These may imply that your friend’s language isn’t mapping to reality in the same way as yours, or that your friend’s model has a weaker connection to reality than yours. This is a much easier way to dismiss the specific bigfoot credence than consulting the Solomonoff prior.
For the malign Solomonoff simulation capture situation, I have an intuition that you would want to strategize over possible worlds, taking the actions that work best when applied uniformly. That includes modelling how the possible versions of yourself taking actions relate to possible future simulated versions of yourself. Probably actions taken by base-level, earlier-timeline versions are more significant, since they can have butterfly effects on future worlds. This doesn’t really protect against attacks from spaces that are not causally linked, but I think the no free lunch theorem applies there. So, from a pragmatic standpoint, I think it makes sense to act as if you are not a simulated version.
In the discussion of betting money, experience, and terminal value, I think what is being reached for is an abstraction of “wantingness”, and terminal value makes the most sense, but runs into the issue that I don’t think that individual humans have consistent, coherent terminal values. But it isn’t a show stopper, because the goal is just that the agent assigning a probability is trying to make a bet such that they maximize their winnings, so we can just suppose that. Say “suppose you are given 100 probabilitrons and want to maximize your probabilitrons by betting them”. We can’t expect actual agents to actually care about the results of the thought experiment, but it does point at what we are trying to point at with assigning probabilities. However, it does seem worthwhile to explicitly note that people are not always incentivized to give accurate probability estimates, and that the purpose of probability estimates is for decision systems to make use of.
In saying “I want to choose between action A and B, and taking into account all considerations, I want to know which action leads to a better world according to my values.” you have provided “A and B” as possible actions, but it is important how A and B were located in the space of possible actions. This seems like a question corresponding to the problem of locating hypotheses worthy of consideration, and the problem of finding strategies to actually make these considerations that are not intractable/incomputable.
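The probabilitron thought experiment can be made concrete with scoring rules. This is an assumption, since the comment doesn't specify a payout rule. Suppose you split 100 probabilitrons between “yes” and “no” and keep only the stake on whichever outcome happens. A bettor maximizing expected log-probabilitrons is then scored by the log rule, which rewards reporting exactly your belief. A bettor maximizing expected probabilitrons is scored by the linear rule, which rewards exaggerating toward the more likely outcome. That is one concrete way people can be incentivized away from accurate estimates.

```python
import math


def expected_log_score(report, belief):
    """Expected log of the probability assigned to whatever happens, when you
    report `report` for an event you believe has probability `belief`."""
    return belief * math.log(report) + (1 - belief) * math.log(1 - report)


def expected_linear_score(report, belief):
    """Expected payoff if you are simply paid the probability you assigned to
    whatever happens (an improper scoring rule)."""
    return belief * report + (1 - belief) * (1 - report)


belief = 0.3
grid = [i / 100 for i in range(1, 100)]
best_log = max(grid, key=lambda q: expected_log_score(q, belief))
best_linear = max(grid, key=lambda q: expected_linear_score(q, belief))
print(f"log rule -> report {best_log}; linear rule -> report {best_linear}")
```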
I agree with the importance of having both good and bad examples, even if it is somewhat subjective at the margin.
On Brendon Long’s prompting, I’m going to generate a list of speculative examples for what I mean when I say GenAI for latent space seems surprisingly ignored compared to GenAI for actual generation.
This is not meant to be exhaustive. To be clear, I don’t know that people are specifically not working on the latent space applications, I’m confident many are, I just don’t hear about it as much, and it doesn’t seem to be where the hype and funding is, which seems strange to me since from my pov, it’s where most of the value is.
LLMs
People focus on generating text, mostly by tricking LLMs into simulating the “chat assistant” role, and in that role wearing many different hats. It’s honestly amazing this works as well as it does. There’s tons of potential value here and also tons of potential harms. Some ideas for latent space representation applications:
Developing a metric space over natural language for classifying written content at multiple scales (eg: books, entire articles, tweets, paragraphs, sentences, phrases).
This could be used for search engines, clustering and analyzing semantic content, and one of the most important applications in my opinion: de-duplicating discourse and making the cacophony of viewpoints and information on social media and elsewhere into structures that can be more meaningfully engaged with. I have some strong opinions on how this could be done right or wrong.
“Clustering and analyzing semantic content” is still extremely broad, covering other broad umbrellas like sentiment analysis, linguistics, and logical argument structure.
Clustering personality types (since LLMs are hyper aware of variations in writing style). This could potentially lead to something more articulate than the big-five personality classification.
Building on this, personality disorders could probably be examined articulately in a clustered personality space. This could then (potentially) be used for cheap, effective diagnosis of mental disorders from speech-to-text transcripts or arbitrary written text, depending on how well the relevant personality subspace can be factored out of other context within the latent space.
I think this would also be an incredibly useful space to correlate with many other psychological studies, and possibly with studies in other fields, like medicine. Being able to do correlation like this fills me with hope for moving the entire field of psychology closer to the harder sciences, able to make stronger, more beneficial predictions.
This could also be used for more intelligently targeted propaganda. I view this as a downside.
This could also be used for recommendation algorithms, both for content and products. I think this could be positive or negative depending on how much it is used for (finding vulnerabilities in people’s psyches to get them to buy things that don’t benefit them, or to get them addicted to content streams) vs. (finding the products people actually most want to buy and benefit from, and showing them content that likewise improves their well-being).
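One small piece of the de-duplication idea can be sketched as grouping posts whose embeddings are nearly parallel. The embeddings and the 0.9 cosine threshold below are hypothetical. The pairwise loop is quadratic, so this illustrates the grouping step only, not a scalable system.

```python
import numpy as np


def group_near_duplicates(embeddings, threshold=0.9):
    """Group items whose embeddings have cosine similarity >= threshold,
    merging transitively with a small union-find. Returns a list of groups
    (lists of item indices)."""
    E = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = E @ E.T
    n = len(E)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if sims[i, j] >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


# Hypothetical embeddings of five posts: 0, 1 and 3 restate one point; 2 and 4 are distinct.
posts = np.array([
    [1.0, 0.0, 0.1],
    [0.95, 0.05, 0.1],
    [0.0, 1.0, 0.0],
    [0.9, 0.0, 0.2],
    [0.0, 0.1, 1.0],
])
print(group_near_duplicates(posts))
```

Each group could then be shown once, with a count, instead of as many separate posts.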
Image Generation
People seem most interested in using this for generating generic art to improve the aesthetics of products, advertisements, blog articles, etc., and for generating porn. (Aside: I think it is basically bad if artists are being put out of work, without compensation, by software artifacts trained on the stolen work of artists.) Ideas for latent space representation applications:
As with text, the basic idea of having a metric over image-space that has rich semantics seems valuable for indexing and clustering images, both for search applications, and general classification applications.
Anomaly detection. For manufacturing processes at scale, it becomes a difficult problem to predict every way a manufactured unit could go wrong, so detecting and preventing very rare flaws becomes very difficult. Having a cluster-space with a region of known “good” unit appearances allows the automatic detection of appearances that deviate from “good” in ways that do not need to be specified ahead of time. This also extends to latent spaces generated from including other manufacturing data.
Aesthetics are important for people’s psychological well-being. This could be better studied with a quantifiable metric space, and used in the design of products, buildings, interiors, and art that are more psychologically beneficial to people. The dark-side application is that products and content could be designed to be even more addictive and harmful to people. (More than ever, it seems like a huge problem that our socio-technical outcome-influencing systems do not robustly target human benefit.)
For medical conditions that can be diagnosed visually, being able to cluster and classify cases would be valuable for diagnosis and for giving more low cost, repeatable, and specific data for medical studies. As with other applications, this probably requires methods for effectively factoring the relevant latent subspace out from irrelevant latent subspaces.
For any kind of biological study, having more repeatable quantification of phenotype-space seems generally valuable.
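The anomaly-detection idea can be sketched with a Mahalanobis distance from the cloud of known-good latent codes. Everything here is an assumption for illustration: the 8-D latent codes are synthetic, and the 99.5% threshold is an arbitrary choice. Nothing in the code describes any particular failure mode in advance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 8-D latent codes for units already inspected and judged "good".
mix = rng.normal(size=(8, 8))
good = rng.normal(size=(500, 8)) @ mix

mean = good.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(good, rowvar=False))


def mahalanobis(z):
    """Distance of latent code z from the 'good' region, scaled by how
    good units normally vary (and co-vary) along each latent direction."""
    diff = z - mean
    return float(np.sqrt(diff @ cov_inv @ diff))


# Flag anything farther out than 99.5% of the known-good units.
threshold = np.quantile([mahalanobis(z) for z in good], 0.995)

# New units from the same process should rarely be flagged...
new_good = rng.normal(size=(200, 8)) @ mix
flagged_fraction = np.mean([mahalanobis(z) > threshold for z in new_good])

# ...while a unit deviating along every latent axis is flagged.
odd = mean + 10 * good.std(axis=0)
print(f"new good units flagged: {flagged_fraction:.1%}; "
      f"odd unit flagged: {mahalanobis(odd) > threshold}")
```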
Brain imaging
I think latent models built on brain imaging and other data have so much potential for neuroscience. They could examine the subspace of variation in brain structure (across people and across injury/health status) and the subspace of brain state variation holding brain structure constant. I think there’s lots of promise for improving the quality of medical diagnostics, and for linking latent spaces between expensive MRI data and cheaper diagnostic equipment to bring down the cost and improve the availability of diagnostics. But I think I’m most excited by the potential for pure research: potentially understanding how thought and perception are encoded in brains, and learning how similar and different people’s inner worlds are. Eventually, this might lead to healing and improving people’s inner worlds to a degree that would previously have been incomprehensible.
I think these are inspiring, early days looks at what can become possible:
I hold the following three views, seemingly without contradiction:
AI agents may be moral patients, so understanding their experience is something we should care about. We should feel directionally similar feelings about the potential implications as we do about human slavery.
AI progress holds potential for incredible humanitarian progress. Inhibiting AI progress without due cause should be viewed directionally similar to inhibiting vaccines or other large scale life saving and quality of life projects.
AI progress holds potential for human extinction. Extreme care must be taken regarding recursively compounding capabilities, capability overhang, and the specification and encoding of preferences in outcome influencing systems.
I don’t know how much any given person sees those views as contradictory. I feel like average members of the public are very unlikely to hold all three, and may view holding all three as a contradiction. Certainly each one complicates the others, but wise action requires considering all three.
It feels like genAI for actually generating things is, reasonably, what most people focus on, but it seems less valuable to me than the potential of genAI for developing latent space representations of spaces that are normally too high-dimensional to reason about or to write good algorithms over.
With the ever increasing proliferation of slop, both from humans and machines, the indexing, organizing, and deduplication of content has never been more important. I hope competent people are focused on the problem. Anyone got any thoughts on the situation?
This reminds me of another communication problem I’ve been musing on here and there. If you solved the alignment problem to a sufficient degree that it was wise for humanity to proceed with ASI, could you convince others it was real and to take it seriously? It is a message that I would desperately want to effectively reach me and I harbor concerns that it might not.
I think it’s likely I misinterpreted your original sentiment “just, try not to be totalized about it.” to mean something like “don’t become very dedicated towards important well integrated aims” rather than what you probably meant which may be something more like “don’t become panicked and stressed about a poorly considered issue and adopt overly narrow strategies that neglect important considerations”.
I don’t think either of us is using “totalize” in the sense of the Merriam-Webster definition:
1: to add up : total
2: to express as a whole
I feel like I’m picking on your use of the word more than is valuable at this point. Sorry about that. I’ll try to explain my use of the word a bit by the following responses, but it’s probably not super important we get on the same page, so if it doesn’t make sense feel free to ignore.
you can’t be totalized about all of them!
I agree you can’t be totalized about all of them individually, but you can be totalized about the set of all of them together.
You might be saying “society should have a lot of totalized people, because this is a good way to solve problems that are very ‘all-or-nothing’ shaped.”
Yes. I think this is indeed what I am saying. Also that those totalized people need to be better at organization and coordination with one another.
if part of what needs to happen at the societal scale is to figure out solutions to many all-or-nothing problems at once, it seems surprising that what you want is “totalized people” instead of “people who specialize in the thing, but, are still tracking at least some of the other problems.”
I think you are very correct that specialization is very important, but coordination between specialists is also important: specializing in helping to integrate and cross-reference, i.e., specializing in totalizing. But also, I don’t think a totalized person must have a totalized career, more that they have totalized motivations.
As an example, I could become an EA and think that some cause, like preventing malaria, is important enough that it should consume all of my focus, and then conclude that I should specialize in law, make lots of money as a lawyer, and donate most of it to the Against Malaria Foundation.
A totalized view of what is important does not need to imply one must become an expert in all subjects.
I also just think most of the other problems don’t make sense to be totalizing. Which ones are you thinking of?
Any claim that has extremely important implications if it is true. This includes ASI risk, climate change, the Holocene extinction, societal collapse, global resource management, nuclear war and geopolitical instability more generally, pandemic risk, basically everything Toby Ord talked about in The Precipice, and basically everything to do with communication and coordination. On the tail end, it also includes weird, intractable, difficult-to-make-progress-on things like philosophy of ethics, religion, consciousness, immortality, etc.
it’s more likely you accidentally neglect things that you needed, to either be personally healthy, or to make your local social/professional world healthy
I agree with you about that, and I think it is possible for totalization to cause a person to neglect the things you mention in ways that harm themselves and the cause they are trying to aid, but it seems more like a fact about people’s abilities to strategize and manage their executive functioning, not necessarily like a fact about totalization.
I don’t know how you would measure it, but I would expect that more of the people who accidentally neglect things they need aren’t totalized than are. I would expect totalization might have weakly positive effects, giving people motivation to do the things they must do to support themselves and their cause. This might be a case of general advice failing and needing to be reversed for some people.
In worlds where leaders X of some social movement condemn taboo action Y you would expect to see X condemning Y. But also in worlds where X supports Y, you would expect them to not be able to publicly state that they support Y, and so in these worlds too, you would expect X to condemn Y. But this creates a problem. If condemning Y is what X does in both worlds where they support and condemn Y, how can X actually communicate with supporters who find themselves wondering if they can support the movement with Y. Do sufficiently strong taboos against Y make it impossible to actually communicate true condemnation clearly? That would suck.
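The worry can be phrased in Bayesian terms; this is a framing with made-up numbers. If X condemns Y with the same probability whether or not X privately supports Y, the likelihood ratio is 1. The condemnation then leaves an observer's credence exactly where it started. It carries information only to the extent that the taboo leaves supportive leaders some room not to condemn.

```python
def posterior_supports(prior, p_condemn_if_supports, p_condemn_if_opposes):
    """P(X privately supports Y | X publicly condemns Y), by Bayes' rule."""
    numerator = prior * p_condemn_if_supports
    return numerator / (numerator + (1 - prior) * p_condemn_if_opposes)


prior = 0.2  # made-up prior that X privately supports Y

# Strong taboo: X condemns Y with the same probability in both worlds,
# so the likelihood ratio is 1 and the posterior equals the prior.
strong_taboo = posterior_supports(prior, 0.99, 0.99)

# Weaker taboo: a supportive X sometimes stays silent instead of condemning,
# so condemnation now lowers the probability that X supports Y.
weak_taboo = posterior_supports(prior, 0.5, 0.99)

print(f"strong taboo: {strong_taboo:.3f}, weak taboo: {weak_taboo:.3f}")
```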
I’m going to note that this is a proxy conversation for recent current events and I’m not really going to respond with recent current events in mind but instead focus on the abstract social coordination issues.
I think there are two very different aspects here, both of which are important: (a) attribution of causal relevance and (b) attribution of reward & punishment to shape behaviour. When looking at the fire example from the perspective of (a), it might seem prudent to look at the psychology and upbringing of the “fire” shouter, and at how society prepares or fails to prepare people for handling emergency situations such as fires. From the perspective of (b), on the other hand, it is more reasonable to want simple and clearly understandable rules and procedures. An appropriate rule in the fire example might be to ban shouting fire in a theater, make the “fire” shouter fully liable for harm caused, and make theaters liable for maintaining certain standard precautions and procedures in case of fire.
more broadly getting at “how do people who disagree about what’s true and what’s good, cooperate?”
Just wanna flag that I think this is a very important question for people to be focusing on.
I think it’s true. I don’t have a super principled answer other than “just, try not to be totalized about it.”
My view is closer to: be totalized about it, but:
Really deeply understand that other people are not totalized about it.
Recognize and learn from examples of other totalized people. Many have caused significant problems. Most have probably been crazy.
Try to argue yourself out of the position that is making you totalized. Make sure you understand your own position well. Think about the nuances and details.
Try to adopt and integrate other totalizing beliefs.
Try not to be a jerk about it.
I think it is correct for the social-judgment-sphere to have an immune system against totalizing beliefs.
I mostly disagree. I think the right view is something between “people should be able to hold totalizing beliefs without becoming totalized” and “we are living in a poly-crisis! There are too many totalizing problems now to easily catalogue all of them! Everyone needs to get totalized and organized and coordinated NOW!”
Thanks! I fully agree. What I said was wrong and I edited my comment to reflect that.
It’s somewhat nice that “spreading misinformation” is an umbrella covering doing so intentionally (lying) and unintentionally. It is unpleasant that another way to say “misinformation” is “fake news” and accusations of such seem to be available as a cheap, fully general, attack on political opponents. I guess it would be pretty nice if people always used citations when talking about things, but that seems like an unrealistic ideal.
I like this post. A few thoughts:
While doing PauseAI outreach one of the responses I really appreciated from people was “I agree this is an important issue” regardless of whether those people wanted to help build the social movement or try to help regulate AI or were focused on completely other things. Rationally it might be that their agreement implied that as things moved forward we could expect people to agree with and support the cause rather than fight it. It might also be that it is nice to hear agreement. So a good behaviour for Bobs who aren’t adopting X might be to acknowledge that they agree it is important, but at that time it feels too costly to engage with.
The “you’re either with us or against us” mentality is interesting and problematic. I once heard a version, “if you don’t support the oppressed you are supporting the oppressors”, or, more poignantly, Martin Niemöller’s poem ending with “Then they came for me—and there was no one left to speak for me”. I refer to it in 3 quotes to show different facets and how it isn’t easy to simply accept or oppose the concept. I think the concept does cause a lot of coordination difficulty, but there is validity to it. Perhaps separating the useful aspects of the idea from the harmful ones would be beneficial: polarized, dehumanizing thinking is bad, but focusing on coordination and solidarity between social groups is good.
One problem with the “with us or against us” mentality is that people only have so much ability to focus on and understand things. It is bad if people are ignoring ALL issues, but it is also bad if people are spreading themselves too thin and becoming ineffective by trying to focus on ALL issues. Most people should focus on ONE or a small cluster of SOME issues. People who focus on systems/networks/communication/logistics kind of things are a special case. They are in some sense trying to focus on ALL issues, but not by becoming familiar with all the details of all issues, but instead by understanding abstracted views of issues and how they interrelate. This is valuable and is not the same thing as trying to simultaneously become an expert in all issues.
I like the idea of ombudspeople. (Is it a contraction of omni-buddy?) I wonder if it would be possible to have something like a guild of ombudspeople for social change somehow.
I like thinking of language and memes as a kind of technology. It would be nice if there was better linguistic technology for dealing with this kind of thing.
One tool that might help is making softer forms of support easier and more visible, like being able to easily sign a sentiment saying “yeah I heard a paragraph about this issue and based on that paragraph I agree some people should think about it some more but I don’t think I should specifically be one of those people”.
More ambitiously, I want better maps of the different worldviews people are living in and better jargon for talking about worldviews abstractly and communicating productively with people who can mutually understand and acknowledge that they are operating from within different worldviews.
I think I should have used the word “polarizing” instead of “politicizing”.
I mean the first two also with the implication that people treat these things as quasi-conflicts between quasi-tribes, and so become less likely to focus on what is correct and beneficial and more likely to focus on signalling tribal membership and allegiance.
I think your third bullet point is related, but not necessarily what I’m talking about. Arguing about how society should respond to and think about school shootings is important. School shootings are bad and should be prevented, just like traffic accidents and heart disease are bad and should be prevented. I believe responses like gun control are politicized in that people are likely to pattern-match “gun control” into a quasi-tribal conflict and then respond accordingly. They do this instead of actually thinking about it, or, as should often be done, ignoring it if they are not well versed in the relevant issues. But just talking about issues and which parties plan what responses to those issues isn’t necessarily a problem, except if it causes people to start contextualizing the issue as a quasi-tribal conflict.
Maybe instead of “politicized” or “polarized”, a term like “quasi-tribalized” or “in-group-out-group-conflictized” would work, or something similar but less rhetorically unwieldy.
I agree with everything you’ve said in this post, especially the idea of forming political coalitions with U.S. child safety advocates and others who care about Cognitive Security issues.
In my view, it is not the case that humans in general have control over their beliefs and actions. A more accurate view is that people have influence over their beliefs and actions, with other sources of influence being mentors, peers, and existing propaganda, advertising, and other memes found in the environment.
That people should have influence over themselves does seem desirable, and they should be able to manage their exposure to things that would alter their beliefs and actions. I don’t think people should become completely shut off from having their beliefs influenced. Indeed, it is very pro-social and beneficial for everyone when humans challenge the beliefs of one another. But individual humans challenging each other is very different from large organizations or AI systems challenging individual humans. So I think people should be aware and consenting if they are exposed to content designed to alter their beliefs.
AI represents a potentially extreme form of this, and is new, so we don’t know how harmful it is, and people have not had time to develop defences to it. And so AI merits more focus, but I think the general argument also applies to advertising and other propaganda.