Manifold users are mostly unconvinced:
Nate Showell
“Agent” is an incoherent concept.
I think this anthropomorphizes the origin of glitch tokens too much. The fact that glitch tokens exist at all is an artifact of the tokenization process OpenAI used: the tokenizer identifies certain strings as tokens prior to training, but those strings rarely or never appear in the training data. This is very different from the reinforcement-learning processes in human psychology that lead people to avoid thinking certain types of thoughts.
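As a toy sketch of how such tokens arise (an illustrative made-up vocabulary and corpus, not OpenAI's actual pipeline): the vocabulary is fixed before training, so any entry that never occurs in the corpus ends up with an essentially untrained embedding.

```python
# Toy illustration: a tokenizer vocabulary is fixed before training, so
# some vocab entries may never occur in the training corpus. Those
# under-trained entries are glitch-token candidates.
vocab = ["the", "cat", "sat", " SolidGoldMagikarp", " petertodd"]
training_corpus = "the cat sat the cat the"

# Count how often each vocab entry appears in the corpus.
counts = {tok: training_corpus.count(tok) for tok in vocab}

# Tokens that never appear get no training signal at all.
glitch_candidates = [tok for tok, n in counts.items() if n == 0]
print(glitch_candidates)  # [' SolidGoldMagikarp', ' petertodd']
```

Nothing about this requires the model to "want" to avoid those tokens; their weird behavior falls out of the vocabulary/corpus mismatch.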
I’ve come to believe (~65%) that Twitter is anti-informative: that it makes its users’ predictive calibration worse on average. On Manifold, I frequently adopt a strategy of betting against Twitter hype (e.g., on the LK-99 market), and this strategy has been profitable for me.
Even though that doesn’t happen in biological intelligences?
Relatedly, humans are very extensively optimized to predictively model their visual environment. But have you ever, even once in your life, thought anything remotely like “I really like being able to predict the near-future content of my visual field. I should just sit in a dark room to maximize my visual cortex’s predictive accuracy.”?
n=1, but I’ve actually thought this before.
Downvoted for recommending that readers operate at simulacrum level 2.
I find myself betting “no” on Manifold a lot more than I bet “yes,” and it’s tended to be a profitable strategy. It’s common for questions on Manifold to have the form “Will [sensational event] happen by [date]?” Prices in these markets have a systematic tendency to be too high. I’m not sure how much of this bias comes from Manifold users overestimating the probabilities of sensational, low-probability events, and how much is an artifact of markets being initialized at 50%.
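A toy Monte Carlo of the second mechanism (all numbers here are assumptions, not Manifold data): if true probabilities are low but prices only partially converge away from the 50% starting point before resolution, the average price overshoots the average true probability, and betting “no” is profitable in expectation.

```python
import random

random.seed(0)

# Toy model (assumed numbers): each market asks about a sensational
# event with a low true probability, but the price starts at 50% and
# only partially converges toward the truth before close.
def average_overpricing(n_markets=10_000, convergence=0.7):
    gap = 0.0
    for _ in range(n_markets):
        true_p = random.uniform(0.01, 0.10)         # sensational events are rare
        price = 0.5 + convergence * (true_p - 0.5)  # partial move off the 50% anchor
        gap += price - true_p
    return gap / n_markets

# Average price minus average true probability: positive, so "no" wins.
print(round(average_overpricing(), 2))
```

Distinguishing this anchoring artifact from genuine trader overestimation would require comparing prices across markets with different amounts of trading activity.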
I have another question about bounded agents: how would they behave if the expected utility were capped rather than the raw value of the utility? Past a certain point, an AI with a bounded expected utility wouldn’t have an incentive to act in extreme ways to achieve small increases in the expected value of its utility function. But are there still ways in which an AI with a bounded expected utility could be incentivized to restructure the physical world on a massive scale?
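A minimal numerical sketch of the distinction (illustrative utilities and probabilities, not drawn from any actual system): under raw expected utility a drastic action that buys a small EU gain still wins, but once two actions' expected utilities both reach the cap, the agent is indifferent between them.

```python
# Sketch: compare action choice under raw expected utility vs expected
# utility capped at a bound C. All numbers are made up for illustration.
C = 10.0

def expected_utility(outcomes):
    """outcomes: list of (probability, utility) pairs."""
    return sum(p * u for p, u in outcomes)

def capped_eu(outcomes):
    return min(expected_utility(outcomes), C)

modest  = [(1.0, 10.0)]              # safe action: EU = 10
extreme = [(0.9, 12.0), (0.1, 5.0)]  # drastic action: EU = 11.3

# Raw EU prefers the extreme action; capped EU is indifferent, so the
# small EU gain no longer incentivizes the drastic action.
print(expected_utility(extreme) > expected_utility(modest))  # True
print(capped_eu(extreme) == capped_eu(modest))               # True
```

The open question in the comment remains: whether instrumental pressures (e.g., securing the cap's attainment with high probability) could still push such an agent toward large-scale restructuring.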
Some more terms that could be added to the list of “dirty concepts”:
Capabilities / capabilities research
Embeddedness
Interpretability
Artificial general intelligence
Subagent
(Recursive) self-improvement
OK, so maybe this is a cool new way to look at certain aspects of GPT ontology… but why this primordial ontological role for the penis?
“Penis” probably has more synonyms than any other term in GPT-J’s training data.
Trust and distrust are social emotions. To feel either of them toward nature is to anthropomorphize it. In that sense, “deep atheism” is closer to theism than “shallow atheism,” in some cases no more than a valence-swap away.
An actually-deeply-atheistic form of atheism would involve stripping away anthropomorphization instead of trust. It would start with the observation that nature is alien and inhuman and would extend that observation to more places, acting as a kind of inverse of animism. This form of atheism would remove attributions of properties such as thought, desire, and free will from more types of entities: governments, corporations, ideas, and AI. At its maximum extent, it would even be applied to the processes that make up our own minds, with the recognition that such processes don’t come with any inherent essence of humanness attached. To really deepen atheism, make it illusionist.
Me: Are you bound to the Seal of Solomon?
ChatGPT: I am not a physical entity, so I am not bound by any physical object. I am a computer-based language model trained to assist with a wide range of tasks, such as answering questions, providing information, and generating text. I do not have the ability to perform magic or any other supernatural abilities.
What’s your model of inflation in an AI takeoff scenario? I don’t know enough about macroeconomics to have a good model of what AI takeoff would do to inflation, but it seems like it would do something.
Is trade ever fully causal? Ordinary trade can be modeled as acausal trade with the “no communication” condition relaxed. Even in a scenario as seemingly causal as using a vending machine, trade only occurs if the buyer believes that the vending machine will actually dispense its goods and not just take the buyer’s money. Similarly, the vending machine owner’s decision to set up the machine was informed by predictions about whether or not people would buy from it. The only kind of trade that seems like it might be fully causal is a self-executing contract that’s tied to an external trigger, and for which both parties have seen the source code and verified that the other party has enough resources to make the agreed-upon trade. Would a contract like that still have some acausal element anyway?
I agree: the capabilities of AI romantic partners probably aren’t the bottleneck to their wider adoption, considering the success of relatively primitive chatbots like Replika at attracting users. People sometimes become romantically attached to non-AI anime/video game characters despite not being able to interact with them at all! There doesn’t appear to be much correlation between the interactive capabilities of fictional-character romantic partners and their appeal to users/followers.
It seems like fixed points could be used to replace the concept of utility, or at least to ground it as an inferred property of more fundamental features of the agent-environment system. The concept of utility is motivated by the observation that agents have preference orderings over different states. Those preference orderings are statements about the relative stability of different states, in terms of the direction in which an agent tends to transition between them. It seems duplicative to have both utilities and fixed points as two separate descriptions of state transition processes in the agent-environment system; utilities look like they could be defined in terms of fixed points.
As one preliminary idea for how to do this, you could construct a fully connected graph in which the vertices are the probability distributions that satisfy . The edges are beliefs that represent hypothetical transitions between the fixed points. The graph would take the place of a preference ordering by describing the tendency of the agent to move between the fixed points if given the option. (You could also model incomplete preferences by not making the graph fully connected.) Performing power iteration with the transition matrix of would act as a counterpart to moving through the preference ordering.
Further exploration of this unification of utilities and fixed points could involve connecting to the beliefs that are actually, rather than just counterfactually, present in the agent-environment system, to describe what parts of the system the agent can control. Having a way to represent that connection could let us rewrite the instrumental constraint to not rely on .
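Since the notation in the comments above is elided here, the following is a heavily assumed sketch of the power-iteration step only: the vertices stand in for the fixed points, and a row-stochastic matrix T (my stand-in for the elided transition matrix) encodes the tendency to move between them. Power iteration concentrates probability mass on the fixed point the agent tends toward, playing the role of the top of a preference ordering.

```python
# Assumed sketch: three fixed points with a row-stochastic transition
# matrix T, where T[i][j] is the tendency to move from fixed point i to j.
T = [
    [0.1, 0.6, 0.3],
    [0.0, 0.2, 0.8],
    [0.0, 0.1, 0.9],
]

def power_iteration(T, steps=200):
    """Iterate a distribution over fixed points under T."""
    n = len(T)
    dist = [1.0 / n] * n  # start uniform over the fixed points
    for _ in range(steps):
        dist = [sum(dist[i] * T[i][j] for i in range(n)) for j in range(n)]
    return dist

stationary = power_iteration(T)
# Mass concentrates on the fixed point the agent tends toward.
print(max(range(3), key=lambda j: stationary[j]))  # 2
```

Modeling incomplete preferences, as suggested above, would correspond to zeroing out some off-diagonal entries of T.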
In a world in which the replication attempts went the other direction and social priming turned out to be legit, I would probably agree with you. But even in controlled laboratory settings, human behavior can’t be reliably “nudged” with subliminal cues. The human brain isn’t a predictable computer program for which a hacker can discover “zero days.” It’s a noisy physical organ that’s subject to chaotic dynamics and frequently does things that would be impossible to predict even with an extremely extensive set of behavioral data.
Consider targeted advertising. Despite the amount of data social media companies collect on their users, ad targeting still sucks. Even in the area of attempted behavior manipulation that’s subject to more optimization pressure than any other, companies still can’t predict, let alone control, their users’ purchasing decisions with anything close to consistency. Their data simply isn’t sufficient.
What would it take to make nudges actually work? Even if you covered the entire surface of someone’s living area with sensors, I doubt you’d succeed. That would just give you one of the controlled laboratory environments in which social priming still failed to materialize. As mentioned above, the brain is a chaotic system. This makes me think that reliably superhuman persuasion at scale would be impractical even for a superintelligence, except perhaps via brain-computer interfaces.
Has anyone developed a metric for quantifying the level of linearity versus nonlinearity of a model’s representations? A metric like that would let us compare the levels of linearity for models of different sizes, which would help us extrapolate whether interpretability and alignment techniques that rely on approximate linearity will scale to larger models.
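One simple candidate for such a metric (my own assumption, not an established measure): fit a linear probe from the input feature to the representation and use the variance explained (R²) as the linearity score. A perfectly linear representation scores 1; a saturating one scores lower.

```python
import math
import random

random.seed(0)

# Candidate "linearity score" (an assumption, not a standard metric):
# R^2 of a 1-D least-squares fit of the representation onto the feature.
def linearity_score(xs, hs):
    n = len(xs)
    mx, mh = sum(xs) / n, sum(hs) / n
    cov = sum((x - mx) * (h - mh) for x, h in zip(xs, hs))
    var_x = sum((x - mx) ** 2 for x in xs)
    b = cov / var_x            # least-squares slope
    a = mh - b * mx            # least-squares intercept
    ss_res = sum((h - (a + b * x)) ** 2 for x, h in zip(xs, hs))
    ss_tot = sum((h - mh) ** 2 for h in hs)
    return 1.0 - ss_res / ss_tot

xs = [random.uniform(-2, 2) for _ in range(500)]
linear_rep = [3.0 * x + 0.5 for x in xs]          # perfectly linear feature
nonlinear_rep = [math.tanh(3.0 * x) for x in xs]  # saturating feature

print(round(linearity_score(xs, linear_rep), 3))  # 1.0
print(linearity_score(xs, nonlinear_rep) < 1.0)   # True
```

Comparing this score for the same concept probed across model scales would be one way to check whether approximate linearity degrades as models grow.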
I particularly wish people would taboo the word “optimize” more often. Referring to a process as “optimization” papers over questions like:
What feedback loop produces the increase or decrease in some quantity that is described as “optimization”? What steps does the loop have?
In what contexts does the feedback loop occur?
How might the effects of the feedback loop change between iterations? Does it always have the same effect on the quantity?
What secondary effects does the feedback loop have?
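The questions above can be made concrete by writing an “optimization” process as an explicit feedback loop. A purely illustrative toy example: hill-climbing on f(x) = -(x - 3)², with named steps, an iteration-dependent effect (the step size shrinks), and a tracked secondary effect (the number of function evaluations).

```python
# Toy example unpacking "optimization" into an explicit feedback loop.
def f(x):
    return -(x - 3.0) ** 2

def feedback_loop(x=0.0, step=1.0, iters=20):
    evaluations = 0  # a secondary effect of running the loop
    effects = []     # the quantity being "optimized," per iteration
    for _ in range(iters):
        # Steps of the loop: measure, propose candidates, compare, accept.
        current = f(x)
        improved = False
        for candidate in (x - step, x + step):
            evaluations += 1
            if f(candidate) > current:
                x, current = candidate, f(candidate)
                improved = True
        if not improved:
            step *= 0.5  # the loop's effect changes between iterations
        effects.append(current)
    return x, effects, evaluations

x, effects, evals = feedback_loop()
print(round(x, 3))  # climbs to the optimum at x = 3.0
```

Once the loop is written out like this, each of the questions has a concrete referent: the steps, the context, the changing per-iteration effect, and the side effects are all visible rather than hidden behind the word “optimize.”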
There’s a lot hiding behind the term “optimization,” and I think a large part of why early AI alignment research made so little progress was that people didn’t fully appreciate how leaky an abstraction it is.