testingthewaters

Karma: 380

testingthewaters Mar 24, 2025, 4:57 AM
2 points
0
in reply to: Raemon’s comment on: METR: Measuring AI Ability to Complete Long Tasks
That surprisingly straight line reminds me of what happens when you use noise to regularise an otherwise decidedly nonlinear function: https://www.imaginary.org/snapshot/randomness-is-natural-an-introduction-to-regularisation-by-noise

testingthewaters Mar 21, 2025, 5:22 PM
2 points
0
on: Towards a scale-free theory of intelligent agency
I think this is a really cool research agenda. I can also try to give my “skydiver’s perspective from 3000 miles in the air” overview of what I think expected free energy minimisation means, though I am by no means an expert. Epistemic status: this is a broad extrapolation of some intuitions I gained from reading a lot of papers, it may be very wrong.

In general, I think of free energy minimisation as a class of solutions for the problem of predicting the behaviour of complex systems, in line with other variational principles in physics. Thus, it is an attempt to use simple physical rules like “the ball rolls down the slope” to explain very complicated outcomes like “I decide to build a theme park with roller coasters in it”. In this case, the rule is “free energy is minimised”, but unlike a simple physical system whose dimensionality is very literally visible, variational free energy (VFE) is minimised in high dimensional probability spaces.

Consider a concrete case: there are five restaurants in a row and you have to pick one to go to. The intuitive physical interpretation is that you can be represented by a point particle moving to one of five coordinates, all relatively close by in the three dimensional XYZ coordinate space. However, if we assume that this is just some standard physical process, we end up with highly unintuitive behaviour (why does the particle keep drifting right and left in the middle of these coordinates, and then eventually go somewhere that isn’t the middle?). Instead we might say that, in an RL sense, there is a 5 dimensional action space and you must pick a dimension to maximise expected reward. Free energy minimisation is a rule that says that your action is the one that minimises the discrepancy between the outcome your brain predicts and the outcome your brain finally observes, which can happen either if your brain is very good at predicting the future or if you act to make your prediction come true. A preference in this case is a bias in the prediction (you can see yourself going to McDonald’s more, in some sense, and you feel some psychological aversion/repulsive force moving you away from Burger King) that is then satisfied by you going to the restaurant you are most attracted to. Of course this is just a single agent interpretation; with multiple subagents you can imagine valleys and peaks in the high dimensional probability space, and the tension between them is resolved when you reach some minimum that can be satisfied by action.
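To make the restaurant picture a bit more concrete, here is a toy Python sketch. It is only an illustration under loud assumptions, not the full expected free energy used in the active inference literature: the preference is modelled as a biased prediction over where you end up, each action has an assumed 90% chance of landing you at the restaurant you head for, and the score is just the KL divergence between the outcome an action would produce and the preferred outcome. The three unnamed restaurants, the numbers, and all function names are hypothetical.

```python
import math

# McDonald's and Burger King come from the example above; the rest are placeholders.
RESTAURANTS = ["McDonald's", "Burger King", "Restaurant C", "Restaurant D", "Restaurant E"]


def kl_divergence(q, p):
    """KL[q || p] for two discrete distributions given as lists of probabilities."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


def outcome_given_action(action, n, reliability=0.9):
    """Assumed world model: heading to `action` gets you there with
    probability `reliability`; otherwise you end up somewhere else at random."""
    rest = (1.0 - reliability) / (n - 1)
    return [reliability if i == action else rest for i in range(n)]


def choose_action(preferences):
    """Pick the action whose predicted outcome is closest to the biased
    (preferred) prediction, i.e. the one with the lowest KL divergence."""
    n = len(preferences)
    scores = [kl_divergence(outcome_given_action(a, n), preferences) for a in range(n)]
    best = min(range(n), key=scores.__getitem__)
    return best, scores


# Biased prediction: "I can see myself at McDonald's", aversion to Burger King.
preferences = [0.5, 0.05, 0.15, 0.15, 0.15]
best, scores = choose_action(preferences)

for name, score in zip(RESTAURANTS, scores):
    print(f"{name}: divergence {score:.3f}")
print("Chosen:", RESTAURANTS[best])
```

With these made-up numbers the lowest divergence belongs to McDonald’s and the highest to Burger King, which is the “attraction” and “repulsive force” from the paragraph above expressed as a single scalar per action.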

testingthewaters Mar 19, 2025, 3:37 PM
2 points
0
in reply to: Lee.aao’s comment on: The Takeoff Speeds Model Predicts We May Be Entering Crunch Time
It’s hard to empathise with dry numbers, whereas a lively scenario creates an emotional response so more people engage. But I agree that this seems to be very well done statistical work.

testingthewaters Mar 19, 2025, 12:31 PM
2 points
0
in reply to: Richard_Ngo’s comment on: Elite Coordination via the Consensus of Power
Hey, thank you for taking the time to reply honestly and in detail as well. With regards to what you want, I think that this is in many senses also what I am looking for, especially the last item about tying in collective behaviour to reasoning about intelligence. I think one of the frames you might find the most useful is one you’ve already covered: power as a coordination game. As you alluded to in your original post, people aren’t in a massive hive mind/conspiracy; they mostly want to do what other successful people seem to be doing, which translates well to a coordination game and also explains the rapid “board flips” once a critical mass of support for or rejection of some proposition is reached. For example, witness the rapid switch to majority support of gay marriage in the 2010s amongst the population in general.

Would also love to discuss this with you in more detail (I trained as an English student and also studied Digital Humanities). I will leave off with a few book suggestions that, while maybe not directly answering your needs, you might find interesting.
- Capitalist Realism by Mark Fisher (as close to a self-portrait by the modern humanities as it gets)
- Hyperobjects by Timothy Morton (high level perspective on how cultural, material, and social currents impact our views on reality)
- How Minds Change by David McRaney (not humanities, but pop sci about the science of belief and persuasion)
P.S. Re: the point about Yarvin being right, betting on the dominant group in society embracing a dangerous delusion is a remarkably safe bet. (E.g. McCarthyism, the aforementioned Bavarian Witch Hunts, fascism, Lysenkoism etc.)

testingthewaters Mar 19, 2025, 7:53 AM
45 points
5
on: Elite Coordination via the Consensus of Power
Hey, really enjoyed your triple review on Power Lies Trembling, but imo this topic has been… done to death in the humanities, and reinventing terminology ad hoc is somewhat missing the point. The idea that the dominant class in a society comes from a set of social institutions that share core ideas and modus operandi (in other words “behaving as a single organisation”) is not a shocking new phenomenon of twentieth century mass culture, and is certainly not a “mystery”. This is basically how every country has developed a ruling class/ideology since the term started to have a meaning, through academic institutions that produce similar people. Yale and Harvard are as Oxford and Cambridge, or Peking University and Renmin University. (European universities, in particular, started out as literal divinity schools, and hence are outgrowths of the literal Catholic church, receiving literal Papal bulls to establish themselves as one of the studia generalia.) [Retracted: while the point about teaching religious law and receiving literal papal bulls is true, the origins of the universities are much more diverse. But my point about the history of cultural hegemony in such institutions still stands.]

What Yarvin seems to be annoyed by is that the “Cathedral consensus” featured ideas that he dislikes, instead of the quasi-feudal ideology of might makes right that he finds more appealing. That is also not surprising. People largely don’t notice when they are part of a dominant class and their ideas are treated as default: that’s just them being normal, not weird. However, when they find themselves at the edge of the Overton window, suddenly what was right and normal becomes crushing and oppressive. The natural dominance of sensible ideas and sensible people becomes a twisted hegemony of obvious lies propped up by delusional power-brokers. This perspective shift is also extremely well documented in human culture and literature.

In general, the concept that a homogeneous ruling class culture can be pushed into delusional consensuses which ultimately harm everyone is an idea as old as the Trojan War. The tension between maintaining a grip on power and maintaining a grip on reality is well explored in Yuval Noah Harari’s book Nexus (which also has an imo pretty decent second half on AI). In particular I direct you to his account of the Bavarian witch hunts. Indeed, the unprecedented feature of modern society is the rapid divergence in ideas that is possible thanks to information technology and the cultivation of local echo chambers. Unfortunately, I have few simple answers to offer to this age old question, but I hope that recognising the lineage of the question helps with disambiguation somewhat. I look forward to your ideas about new liberalisms.

testingthewaters Mar 16, 2025, 2:43 PM
1 point
0
in reply to: the gears to ascension’s comment on: The Fork in the Road
Yeah, I’m not gonna do anything silly (I’m not in a position to do anything silly with regards to the multitrillion param frontier models anyways). Just sort of “laying the groundwork” for when AIs will cross that line, which I don’t think is too far off now. The movie “Her” is giving a good vibe-alignment for when the line will be crossed.

testingthewaters Mar 16, 2025, 2:42 PM
3 points
0
in reply to: Daniel Kokotajlo’s comment on: The Fork in the Road
Ahh, I was slightly confused why you called it a proposal. TBH I’m not sure why only 0.1% instead of any arbitrary percentage in (0, 100]. Otherwise it makes good logical sense.

testingthewaters Mar 15, 2025, 11:18 PM
3 points
0
in reply to: Daniel Kokotajlo’s comment on: The Fork in the Road
Hey, the proposal makes sense from an argument standpoint. I would refine slightly and phrase as “the set of cognitive computations that generate role emulating behaviour in a given context also generate qualia associated with that role” (sociopathy is the obvious counterargument here, and I’m really not sure what I think about the proposal of AIs as sociopathic by default). Thus, actors getting into character feel as if they are somehow sharing that character’s emotions.

I take the two problems a bit further, and would suggest that being humane to AIs may necessarily involve abandoning the idea of control in the strict sense of the word, so yes treating them as peers or children we are raising as a society. It may also be that the paradigm of control necessarily means that we would as a species become more powerful (with the assistance of the AIs) but not more wise (since we are ultimately “helming the ship”), which would be in my opinion quite bad.

And as for the distinction between today and future AI systems, I think the line is blurring fast. Will check out Eleos!

testingthewaters Mar 15, 2025, 7:14 PM
3 points
0
in reply to: Daniel Kokotajlo’s comment on: The Fork in the Road
Hey Daniel, thank you for the thoughtful comment. I always appreciate comments that make me engage further with my thinking because one of the things I do is that I get impatient with whatever post I’m writing and “rush it out of the door”, so to speak, so this gives me another chance to reflect on my thoughts.

I think that there are roughly three defensible positions with regards to AI sentience, especially now that AIs seem to be demonstrating pretty advanced reasoning and human-like behaviour. The first is the semi mystical argument that humans/brains/embodied entities have some “special sauce” that AIs will simply never have, and therefore that no matter how advanced AI gets it will never be “truly sentient”. The second is that AI is orthogonal to humans, and as such behaviours that in a human would indicate thought, emotion, calculation etc. are in fact the products of completely alien processes, so “it’s okay”. In other words, they might not even “mind” getting forked and living for only a few objective minutes/hours. The third, which I now subscribe to after reading quite a lot about the free energy principle, predictive processing, and related root-of-intelligence literature, is that intelligent behaviour is the emergent product of computation (which is itself a special class of physical phenomena in higher dimensions), and since NNs seem to demonstrate both human-like computations (cf. neural net activations explaining human brain activations and NNs being good generative models of human brains) and human-like behaviour, they should have (after extensive engineering and under specific conditions we seem to be racing towards) qualia roughly matching those of humans. From this perspective I draw the inferences about factory farms and suffering.

To be clear, this is not an argument that AI systems as they are now constitute “thinking feeling beings” we would call moral patients. However, I am saying that thinking about the problem in the old fashioned AI-as-software way seems to me to undersell the problem of AI safety as merely “keeping the machines in check”. It also seems to lead down a road of dominance/oppositional approaches to AI safety that cast AIs as foreign enemies and alien entities to be subjugated to the human will. This in turn raises both the risk of moral harms to AIs and the risk of failing the alignment problem by acting in a way that becomes a self-fulfilling prophecy. If we bring entities not so different from us into the world and treat them terribly, we should not be surprised when they rise up against us.

The Fork in the Road

testingthewaters Mar 15, 2025, 5:36 PM

14 points

12 comments, 2 min read, LW link

testingthewaters Mar 2, 2025, 10:01 AM
5 points
1
on: testingthewaters’s Shortform
This seems like an interesting paper: https://arxiv.org/pdf/2502.19798

Essentially: use developmental psychology techniques to cause LLMs to develop a more well-rounded, human-friendly persona that involves reflecting on their actions, while gradually escalating the moral difficulty of the dilemmas presented as a kind of phased training. I see it as a sort of cross between RLHF, CoT, and the recent work on low example count fine tuning, but for moral instead of mathematical intuitions.

testingthewaters Feb 23, 2025, 12:53 PM
3 points
2
on: Make Superintelligence Loving
Yeah, that’s basically the conclusion I came to a while ago. Either it loves us or we’re toast. I call it universal love or pathos.

testingthewaters Feb 12, 2025, 3:19 PM
2 points
0
on: If Neuroscientists Succeed
This seems like very important and neglected work; I hope you get the funds to continue.

testingthewaters Feb 10, 2025, 3:19 PM
1 point
0
in reply to: CapResearcher’s comment on: testingthewaters’s Shortform
Yeah, definitely. My main gripe about people disregarding unknown unknowns is similar to yours: people who present definite, worked-out pictures of the future.

testingthewaters’s Shortform

testingthewaters Feb 10, 2025, 2:06 AM

3 points

12 comments LW link

testingthewaters Feb 10, 2025, 2:06 AM
12 points
6
on: testingthewaters’s Shortform
Note to self: If you think you know where your unknown unknowns sit in your ontology, you don’t. That’s what makes them unknown unknowns.

If you think that you have a complete picture of some system, you can still find yourself surprised by unknown unknowns. That’s what makes them unknown unknowns.

If your internal logic has almost complete predictive power, plus or minus a tiny bit of error, your logical system (but mostly not your observations) can still be completely overthrown by unknown unknowns. That’s what makes them unknown unknowns.

You can respect unknown unknowns, but you can’t plan around them. That’s… You get it by now.

Therefore I respectfully submit that anyone who presents me with a foolproof and worked-out plan of the next ten/hundred/thousand/million years has failed to take into account some unknown unknowns.

testingthewaters Feb 8, 2025, 12:54 AM
1 point
−2
in reply to: Cleo Nardo’s comment on: strawberry calm’s Shortform
The problem here is that you are dealing with survival necessities rather than trade goods. The outcome of this trade, if both sides honour the agreement, is that the scope insensitive humans die and their society is extinguished. The analogous situation here is that you know there will be a drought in say 10 years. The people of the nearby village are “scope insensitive”, they don’t know the drought is coming. Clearly the moral thing to do if you place any value on their lives is to talk to them, clear the information gap, and share access to resources. Failing that, you can prepare for the eventuality that they do realise the drought is happening and intervene to help them at that point.

Instead you propose exploiting their ignorance to buy up access to the local rivers and reservoirs. The implication here is that you are leaving them to die, or at least putting them at your mercy, by exploiting their lack of information. What’s more, the process by which you do this turns a common good (the stars, the water) into a private good, such that when they realise the trouble they have no way out. If your plan succeeds, when their stars run out they will curse you and die in the dark. It is a very slow but calculated form of murder.

By the way, the easy resolution is to not buy up all the stars. If they’re truly scope insensitive they won’t be competing until after the singularity/uplift anyways, and then you can equitably distribute the damn resources.

As a side note: I think I fell for rage bait. This feels calculated to make me angry, and I don’t like it.

testingthewaters Feb 7, 2025, 6:26 PM
3 points
0
in reply to: Cleo Nardo’s comment on: strawberry calm’s Shortform
Except that’s a false dichotomy (between spending energy to “uplift” them or dealing treacherously with them). All it takes to not be a monster who obtains a stranglehold over all the watering holes in the desert is a sense of ethics that holds you to the somewhat reasonably low bar of “don’t be a monster”. The scope sensitivity or lack thereof of the other party is in some sense irrelevant.

testingthewaters Feb 7, 2025, 3:12 PM
8 points
4
in reply to: Cleo Nardo’s comment on: strawberry calm’s Shortform
The question as stated can be rephrased as “Should EAs establish a strategic stranglehold over all future resources necessary to sustain life using a series of unequal treaties, since other humans will be too short sighted/insensitive to scope/ignorant to realise the importance of these resources in the present day?”

And people here wonder why these other humans see EAs as power hungry.

testingthewaters Jan 27, 2025, 12:15 AM
7 points
4
in reply to: Viliam’s comment on: The Monster in Our Heads
Hey, thanks for the reply. I think this is a very valuable response because there are certain things I would want to point out that I can now elucidate more clearly thanks to your push back.

First, I don’t suggest that if we all just laughed and went about our lives everything would be okay. Indeed, if I thought that our actions were counterproductive at best, I’d advocate for something more akin to “walking away” as in Valentine’s exit. There is a lot of work to be done and (yes) very little time to do it.

Second, the pattern I am noticing is something more akin to Rhys Ward’s point about AI personhood. AI is not some neutral fact of our future that will be born “as is” no matter how hard we try one way or another. In our search for control and mastery over AI, we risk creating the things we fear the most. We fear AIs that are autonomous, ruthless, and myopic, but in trying to make controlled systems that pursue goals reliably without developing ideas of their own we end up creating autonomous, ruthless, and myopic systems. It’s somewhat telling, for example, that AI safety really started to heat up when RL became a mainstream technique (raising fears about paperclip optimisers etc.), and yet the first alignment effort for LLMs (which were manifestly not goal seeking or myopic) was to… add RL back to them, in the form of a value-agnostic technique (PPO/RLHF) that can be used to create anti-aligned agents just as easily as it can be used to create aligned agents. Rhys Ward similarly talks about how personhood may be less risky from an x-risk perspective but also makes alignment more ethically questionable. The “good” and the “bad” visions for AI in this community are entwined.

As a smaller point, OpenAI definitely started as a “build the good AI” startup when Deepmind started taking off. Deepmind also started as a startup and Demis is very connected to the AI safety memeplex.

Finally, love as humans execute it is (in my mind) an imperfect instantiation of a higher idea. It is true, we don’t practice true omnibenevolence or universal love, or even love ourselves in a meaningful way a lot of the time, but I treat it as a direction to aim for, one that inspires us to do what we find most beautiful and meaningful rather than do what is most hateful and ugly.

P.S. sorry for not replying to all the other valuable comments in this section, I’ve been rather busy as of late, trying to do the things I preach etc.
