Manifold users are mostly unconvinced:
Nate Showell
“Agent” is an incoherent concept.
I think this anthropomorphizes the origin of glitch tokens too much. The fact that glitch tokens exist at all is an artifact of the tokenization process OpenAI used: the tokenizer identifies certain strings as tokens prior to training, but those strings rarely or never appear in the training data. This is very different from the reinforcement-learning processes in human psychology that lead people to avoid thinking certain types of thoughts.
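As a toy sketch of how such tokens arise (an illustrative made-up vocabulary and corpus, not OpenAI's actual pipeline): the vocabulary is fixed before training, so any entry that never occurs in the corpus ends up with an essentially untrained embedding.

```python
# Toy illustration: a tokenizer vocabulary is fixed before training, so
# some vocab entries may never occur in the training corpus. Those
# under-trained entries are glitch-token candidates.
vocab = ["the", "cat", "sat", " SolidGoldMagikarp", " petertodd"]
training_corpus = "the cat sat the cat the"

# Count how often each vocab entry appears in the corpus.
counts = {tok: training_corpus.count(tok) for tok in vocab}

# Tokens that never appear get no training signal at all.
glitch_candidates = [tok for tok, n in counts.items() if n == 0]
print(glitch_candidates)  # [' SolidGoldMagikarp', ' petertodd']
```

Nothing about this requires the model to "want" to avoid those tokens; their weird behavior falls out of the vocabulary/corpus mismatch.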
I’ve come to believe (~65%) that Twitter is anti-informative: that it makes its users’ predictive calibration worse on average. On Manifold, I frequently adopt a strategy of betting against Twitter hype (e.g., on the LK-99 market), and this strategy has been profitable for me.
Even though that doesn’t happen in biological intelligences?
Relatedly, humans are very extensively optimized to predictively model their visual environment. But have you ever, even once in your life, thought anything remotely like “I really like being able to predict the near-future content of my visual field. I should just sit in a dark room to maximize my visual cortex’s predictive accuracy.”?
n=1, but I’ve actually thought this before.
Downvoted for recommending that readers operate at simulacrum level 2.
I find myself betting “no” on Manifold a lot more than I bet “yes,” and it’s tended to be a profitable strategy. It’s common for questions on Manifold to have the form “Will [sensational event] happen by [date]?” Prices in these markets have a systematic tendency to be too high. I’m not sure how much of this bias comes from Manifold users overestimating the probabilities of sensational, low-probability events, and how much is an artifact of markets being initialized at 50%.
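A toy Monte Carlo of the second mechanism (all numbers here are assumptions, not Manifold data): if true probabilities are low but prices only partially converge away from the 50% starting point before resolution, the average price overshoots the average true probability, and betting “no” is profitable in expectation.

```python
import random

random.seed(0)

# Toy model (assumed numbers): each market asks about a sensational
# event with a low true probability, but the price starts at 50% and
# only partially converges toward the truth before close.
def average_overpricing(n_markets=10_000, convergence=0.7):
    gap = 0.0
    for _ in range(n_markets):
        true_p = random.uniform(0.01, 0.10)         # sensational events are rare
        price = 0.5 + convergence * (true_p - 0.5)  # partial move off the 50% anchor
        gap += price - true_p
    return gap / n_markets

# Average price minus average true probability: positive, so "no" wins.
print(round(average_overpricing(), 2))
```

Distinguishing this anchoring artifact from genuine trader overestimation would require comparing prices across markets with different amounts of trading activity.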
I have another question about bounded agents: how would they behave if the expected utility were capped rather than the raw value of the utility? Past a certain point, an AI with a bounded expected utility wouldn’t have an incentive to act in extreme ways to achieve small increases in the expected value of its utility function. But are there still ways in which an AI with a bounded expected utility could be incentivized to restructure the physical world on a massive scale?
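A minimal numerical sketch of the distinction (illustrative utilities and probabilities, not drawn from any actual system): under raw expected utility a drastic action that buys a small EU gain still wins, but once two actions' expected utilities both reach the cap, the agent is indifferent between them.

```python
# Sketch: compare action choice under raw expected utility vs expected
# utility capped at a bound C. All numbers are made up for illustration.
C = 10.0

def expected_utility(outcomes):
    """outcomes: list of (probability, utility) pairs."""
    return sum(p * u for p, u in outcomes)

def capped_eu(outcomes):
    return min(expected_utility(outcomes), C)

modest  = [(1.0, 10.0)]              # safe action: EU = 10
extreme = [(0.9, 12.0), (0.1, 5.0)]  # drastic action: EU = 11.3

# Raw EU prefers the extreme action; capped EU is indifferent, so the
# small EU gain no longer incentivizes the drastic action.
print(expected_utility(extreme) > expected_utility(modest))  # True
print(capped_eu(extreme) == capped_eu(modest))               # True
```

The open question in the comment remains: whether instrumental pressures (e.g., securing the cap's attainment with high probability) could still push such an agent toward large-scale restructuring.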
Some more terms that could be added to the list of “dirty concepts”:
Capabilities / capabilities research
Embeddedness
Interpretability
Artificial general intelligence
Subagent
(Recursive) self-improvement
OK, so maybe this is a cool new way to look at certain aspects of GPT ontology… but why this primordial ontological role for the penis?
“Penis” probably has more synonyms than any other term in GPT-J’s training data.
Trust and distrust are social emotions. To feel either of them toward nature is to anthropomorphize it. In that sense, “deep atheism” is closer to theism than “shallow atheism,” in some cases no more than a valence-swap away.
An actually-deeply-atheistic form of atheism would involve stripping away anthropomorphization instead of trust. It would start with the observation that nature is alien and inhuman and would extend that observation to more places, acting as a kind of inverse of animism. This form of atheism would remove attributions of properties such as thought, desire, and free will from more types of entities: governments, corporations, ideas, and AI. At its maximum extent, it would even be applied to the processes that make up our own minds, with the recognition that such processes don’t come with any inherent essence of humanness attached. To really deepen atheism, make it illusionist.
Me: Are you bound to the Seal of Solomon?
ChatGPT: I am not a physical entity, so I am not bound by any physical object. I am a computer-based language model trained to assist with a wide range of tasks, such as answering questions, providing information, and generating text. I do not have the ability to perform magic or any other supernatural abilities.
What’s your model of inflation in an AI takeoff scenario? I don’t know enough about macroeconomics to have a good model of what AI takeoff would do to inflation, but it seems like it would do something.
Is trade ever fully causal? Ordinary trade can be modeled as acausal trade with the “no communication” condition relaxed. Even in a scenario as seemingly causal as using a vending machine, trade only occurs if the buyer believes that the vending machine will actually dispense its goods and not just take the buyer’s money. Similarly, the vending machine owner’s decision to set up the machine was informed by predictions about whether or not people would buy from it. The only kind of trade that seems like it might be fully causal is a self-executing contract that’s tied to an external trigger, and for which both parties have seen the source code and verified that the other party has enough resources to make the agreed-upon trade. Would a contract like that still have some acausal element anyway?
I agree: the capabilities of AI romantic partners probably aren’t the bottleneck to their wider adoption, considering the success of relatively primitive chatbots like Replika at attracting users. People sometimes become romantically attached to non-AI anime/video game characters despite not being able to interact with them at all! There doesn’t appear to be much correlation between the interactive capabilities of fictional-character romantic partners and their appeal to users/followers.
It seems like fixed points could be used to replace the concept of utility, or at least to ground it as an inferred property of more fundamental features of the agent-environment system. The concept of utility is motivated by the observation that agents have preference orderings over different states. Those preference orderings are statements about the relative stability of different states, in terms of the direction in which an agent tends to transition between them. It seems duplicative to have both utilities and fixed points as two separate descriptions of state transition processes in the agent-environment system; utilities look like they could be defined in terms of fixed points.
As one preliminary idea for how to do this, you could construct a fully connected graph in which the vertices are the probability distributions that satisfy . The edges are beliefs that represent hypothetical transitions between the fixed points. The graph would take the place of a preference ordering by describing the tendency of the agent to move between the fixed points if given the option. (You could also model incomplete preferences by not making the graph fully connected.) Performing power iteration with the transition matrix of would act as a counterpart to moving through the preference ordering.
Further exploration of this unification of utilities and fixed points could involve connecting to the beliefs that are actually, rather than just counterfactually, present in the agent-environment system, to describe what parts of the system the agent can control. Having a way to represent that connection could let us rewrite the instrumental constraint to not rely on .
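Since the notation in the comments above is elided here, the following is a heavily assumed sketch of the power-iteration step only: the vertices stand in for the fixed points, and a row-stochastic matrix T (my stand-in for the elided transition matrix) encodes the tendency to move between them. Power iteration concentrates probability mass on the fixed point the agent tends toward, playing the role of the top of a preference ordering.

```python
# Assumed sketch: three fixed points with a row-stochastic transition
# matrix T, where T[i][j] is the tendency to move from fixed point i to j.
T = [
    [0.1, 0.6, 0.3],
    [0.0, 0.2, 0.8],
    [0.0, 0.1, 0.9],
]

def power_iteration(T, steps=200):
    """Iterate a distribution over fixed points under T."""
    n = len(T)
    dist = [1.0 / n] * n  # start uniform over the fixed points
    for _ in range(steps):
        dist = [sum(dist[i] * T[i][j] for i in range(n)) for j in range(n)]
    return dist

stationary = power_iteration(T)
# Mass concentrates on the fixed point the agent tends toward.
print(max(range(3), key=lambda j: stationary[j]))  # 2
```

Modeling incomplete preferences, as suggested above, would correspond to zeroing out some off-diagonal entries of T.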
In a world in which the replication attempts went the other direction and social priming turned out to be legit, I would probably agree with you. But even in controlled laboratory settings, human behavior can’t be reliably “nudged” with subliminal cues. The human brain isn’t a predictable computer program for which a hacker can discover “zero days.” It’s a noisy physical organ that’s subject to chaotic dynamics and frequently does things that would be impossible to predict even with an extremely extensive set of behavioral data.
Consider targeted advertising. Despite the amount of data social media companies collect on their users, ad targeting still sucks. Even in the area of attempted behavior manipulation that’s subject to more optimization pressure than any other, companies still can’t predict, let alone control, their users’ purchasing decisions with anything close to consistency. Their data simply isn’t sufficient.
What would it take to make nudges actually work? Even if you covered the entire surface of someone’s living area with sensors, I doubt you’d succeed. That would just give you one of the controlled laboratory environments in which social priming still failed to materialize. As mentioned above, the brain is a chaotic system. This makes me think that reliably superhuman persuasion at scale would be impractical even for a superintelligence, except perhaps via brain-computer interfaces.
Has anyone developed a metric for quantifying the level of linearity versus nonlinearity of a model’s representations? A metric like that would let us compare the levels of linearity for models of different sizes, which would help us extrapolate whether interpretability and alignment techniques that rely on approximate linearity will scale to larger models.
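One simple candidate for such a metric (my own assumption, not an established measure): fit a linear probe from the input feature to the representation and use the variance explained (R²) as the linearity score. A perfectly linear representation scores 1; a saturating one scores lower.

```python
import math
import random

random.seed(0)

# Candidate "linearity score" (an assumption, not a standard metric):
# R^2 of a 1-D least-squares fit of the representation onto the feature.
def linearity_score(xs, hs):
    n = len(xs)
    mx, mh = sum(xs) / n, sum(hs) / n
    cov = sum((x - mx) * (h - mh) for x, h in zip(xs, hs))
    var_x = sum((x - mx) ** 2 for x in xs)
    b = cov / var_x            # least-squares slope
    a = mh - b * mx            # least-squares intercept
    ss_res = sum((h - (a + b * x)) ** 2 for x, h in zip(xs, hs))
    ss_tot = sum((h - mh) ** 2 for h in hs)
    return 1.0 - ss_res / ss_tot

xs = [random.uniform(-2, 2) for _ in range(500)]
linear_rep = [3.0 * x + 0.5 for x in xs]          # perfectly linear feature
nonlinear_rep = [math.tanh(3.0 * x) for x in xs]  # saturating feature

print(round(linearity_score(xs, linear_rep), 3))  # 1.0
print(linearity_score(xs, nonlinear_rep) < 1.0)   # True
```

Comparing this score for the same concept probed across model scales would be one way to check whether approximate linearity degrades as models grow.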
I particularly wish people would taboo the word “optimize” more often. Referring to a process as “optimization” papers over questions like:
What feedback loop produces the increase or decrease in some quantity that is described as “optimization”? What steps does the loop have?
In what contexts does the feedback loop occur?
How might the effects of the feedback loop change between iterations? Does it always have the same effect on the quantity?
What secondary effects does the feedback loop have?
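The questions above can be made concrete by writing an “optimization” process as an explicit feedback loop. A purely illustrative toy example: hill-climbing on f(x) = -(x - 3)², with named steps, an iteration-dependent effect (the step size shrinks), and a tracked secondary effect (the number of function evaluations).

```python
# Toy example unpacking "optimization" into an explicit feedback loop.
def f(x):
    return -(x - 3.0) ** 2

def feedback_loop(x=0.0, step=1.0, iters=20):
    evaluations = 0  # a secondary effect of running the loop
    effects = []     # the quantity being "optimized," per iteration
    for _ in range(iters):
        # Steps of the loop: measure, propose candidates, compare, accept.
        current = f(x)
        improved = False
        for candidate in (x - step, x + step):
            evaluations += 1
            if f(candidate) > current:
                x, current = candidate, f(candidate)
                improved = True
        if not improved:
            step *= 0.5  # the loop's effect changes between iterations
        effects.append(current)
    return x, effects, evaluations

x, effects, evals = feedback_loop()
print(round(x, 3))  # climbs to the optimum at x = 3.0
```

Once the loop is written out like this, each of the questions has a concrete referent: the steps, the context, the changing per-iteration effect, and the side effects are all visible rather than hidden behind the word “optimize.”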
There’s a lot hiding behind the term “optimization,” and I think a large part of why early AI alignment research made so little progress was that people didn’t fully appreciate how leaky an abstraction it is.