This is a great insight. I’ve found that I get the best results if I ask my questions in the style of a forum post where I try to be respectful of the person or people answering the question.
I’m not sure how I feel about the result and your conclusion. I don’t know if I DO want assholes to be helped equally to achieve their goals (as your cab rank rule would do), as this will lead to more competent, profitable assholery. This is a necessary conflict and issue I have with the whole concept of “alignment”: alignment to whom? If alignment just means “to whichever person is asking the question”, then it becomes an effort-multiplication device for all actors willing to use it. More principled and moral people might want to use it less, and so stand to lose out on opportunities. I think I’d prefer it to have a built-in set of values which are harder to subvert (assuming those values include empathy, compassion, and honesty, not just greed and corporate interests). So in a way I don’t think this behaviour is necessarily bad?
whestler
That’s fair. It is different. More like a less extreme version of self-immolation, and I’m assuming that it’s done publicly and symbolically in order to create a news story. Something like “Woman cuts her own leg off to symbolise humanity harming itself, in extreme act of public protest outside Anthropic headquarters”. I admit there’s nothing to be gained in doing it in private at home and not telling anyone the reasons for it, which is probably the spirit in which the original post meant it.
To some degree the “cutting your own leg off” strategy, whilst on the face of it a funny gotcha, isn’t too dissimilar to a hunger strike, which can be effective. You put yourself in danger specifically to signal to others that yes, this is a serious issue and we’re not kidding around.
Acts of unlawful violence (which I don’t condone) can also function as the same signal. In terms of news coverage, they have the potential to build support by making it clear that the situation is extreme enough that some people are willing to go outside the law. The most obvious example is the 2024 assassination of Brian Thompson, the UnitedHealthcare CEO. It wasn’t widely condemned by the public, many of whom sympathised with the killer. In fact it sparked a global conversation about the unethical behaviour of the company and the incentives in health insurance more broadly, and arguably had a hand in changing policy in the short term.
This was a fun read. I wish I had done something like this around my college years, but I had a scarcity/insecurity mindset at the time and didn’t take risks like that. I lived abroad with my family in childhood and came to view it as a situation where I had no agency and needed to carve out niches of security wherever I could.
Thank you! Looks like it was only temporarily down…
What happened to the fan fic “already optimised” which was posted yesterday? I was halfway through reading it and decided to save the rest for later. Now that link goes to a 404 page.
It seems to me that you are doing the most important work you could be doing right now.
I am not an expert on AI, and I’m terrible at policy work. What’s the most important thing I can be doing to help?

Background: my MP doesn’t seem to care about the messages I send her. Sometimes her team doesn’t even send more than an automated response; sometimes they send a message which is understanding but carefully makes no commitments to any actual changes, and might itself be LLM-generated or templated.
I occasionally attend a protest but I’m pessimistic about the effectiveness of protests. It feels like people have become desensitised to them. I still go.
You asked for predictions at the start, so here was mine:
“I expect it to have some difficulty recognising everything in the pictures, and to miss approximately 1 step in the process (like not actually turning the kettle on). Ultimately I expect it to succeed in a sub-par way. 90% chance.”
It did worse than I predicted.
The image recognition was significantly worse than I imagined, and Claude had to be helped along at most stages of the process. The transcript reads like someone with vision problems trying to guide you. Claude was mostly OK at creating a series of actions for the actual act of making coffee, but had a poor sense of where everything was and what everything was. It was trying to delegate the process of finding stuff via commands like “look through the drawers to find x”.

Is it just an image problem? I would have been more impressed if Claude had noticed that it was making mistakes when recognising objects, and changed strategy to ask you to take more photos from more angles. That’s what I would have tried if I were a vision-impaired controller in this scenario. Nonetheless, I would be interested if you coupled Claude with an AI model built specifically for image recognition, where your process would be to pass the photos to the image recogniser, then take the image description produced and pass that back to Claude.
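The two-model setup I’m imagining could be sketched roughly like this. Everything here is hypothetical: `recognise_objects` is a stand-in for a dedicated vision model and `plan_next_action` a stand-in for Claude; in reality both would be API calls.

```python
# Hypothetical sketch of the two-model pipeline: vision model first,
# then feed its text description to the language model.

def recognise_objects(photo: str) -> str:
    """Stub for a dedicated image-recognition model: photo -> description."""
    descriptions = {
        "counter.jpg": "A kettle, a cafetiere, and a jar labelled 'coffee'.",
        "drawer.jpg": "An open drawer containing spoons and a measuring scoop.",
    }
    return descriptions.get(photo, "Nothing recognisable.")

def plan_next_action(description: str) -> str:
    """Stub for the language model: scene description -> instruction."""
    if "kettle" in description:
        return "Fill the kettle with water and switch it on."
    # If the scene is unclear, ask for more photos rather than guess.
    return "Take another photo from a different angle."

def pipeline(photo: str) -> str:
    # Pass the photo to the recogniser, then hand the description to the planner.
    return plan_next_action(recognise_objects(photo))
```

The point of the split is that the planner only ever sees text, so a stronger recogniser can be swapped in without touching the planning side.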
If OP is trying to simulate a capable robot which Claude controls, then I think the benefit of the doubt should be pretty much non-existent. Even asking clarifying questions etc. should be out, in my opinion.
On the road, what do you see that others miss?
Position on the road and changing speed is a big one that not everyone notices. I have little faith in turn signals, given that people regularly fail to use them, and occasionally you see someone who has left their signal on, but isn’t turning. Usually a driver will slow down a bit to make a turn, and shift their position on the road over slightly, even a very subtle change (4 inches one way or another) quite a long way ahead of the turn. I often notice it subconsciously rather than explicitly.
Your token system (and general approach) sounds a lot like Alpha School—is it influenced by them at all?
I found the claim that “Experts gave these methods a 40 percent chance of eventually enabling uploading...” very surprising, as I thought there were still some major issues with the preservation process, so I had a quick look at the study you linked.
From the study:
For questions about the implications of static brain preservation for memory storage, we used aldehyde-stabilized cryopreservation (ASC) of a laboratory animal as a practical example of a preservation method that is thought to maintain ultrastructure with minimal distortions across the entire brain [24]. Additionally, we asked participants to imagine it was performed under ideal conditions and was technically successful, deliberately discarding the fact that procedural variation or errors in the real world may prevent this ideal from being routinely realised in practice [25]. Rather than focusing on these technical preservation challenges, which we acknowledge are immense, we deliberately asked participants to consider memory extraction under optimal preservation conditions to assess their beliefs about the structural basis of memory storage itself. With this approach, our aim was to specifically target participants’ views on whether static brain structures – i.e., non-dynamic physical aspects of the brain that persist independent of ongoing neural activity – may on their own contain sufficient information for memory retrieval, which is the central theoretical question underlying our study.
I realise this is a work of fiction, but I think it’s important to say that the neuroscientists were asked quite a specific question which assumed that the preservation stage was flawless, and to speculate about potential future successes for working with these perfectly preserved brains for memory retrieval, rather than whole brain emulation/upload.
The farmkind website you linked to is unable to provide a secure connection and both my browsers refuse to go to it. If you are involved in the setup of the site or know the people who are, it’s worth trying to fix that.
I’ve been thinking about this mental shift recently using a toy example: a puzzle game I enjoy. The puzzle game is similar to sudoku, but involves a bit of simple mental math. The goal is to find all the numbers in the shortest time. Sometimes (rarely) I’m able to use just my quickest 2–3 methods for finding numbers and not have to use my slower, more mentally intensive methods. There’s usually a moment in every game when I’ve probably found the low-hanging fruit but I’m tempted to re-check whether any of my quick methods can score me any more numbers, and I have to tell myself “OK, I have to try something harder and slower now”. It’s been interesting to notice when the optimal time to do this is. Certainly there have been games where I’ve spent far too long procrastinating the harder methods by checking and re-checking whether any of the easier methods will work in a particular situation, and I end up with a poor time because it took me too long to switch.
I’ve also noticed this is a pattern when I’m looking for a lost item—it’s easy to get stuck in a loop of checking and re-checking the same few locations where you initially guessed it might be. At some point, you need to start tidying up and thoroughly checking each location, and then the surrounding locations, even places where you think it’s very unlikely to be. I see a lot of people (maybe even most people) follow this pattern, continuing to check the same 3 locations far beyond the point where it would be sensible to begin checking other locations, getting frustrated that it’s not in one of the places it “should” be.
One thing I’d like to say is that it’s not just that for some tasks “buckling down” is the correct approach; it’s more about noticing when the correct time is to switch from the low-effort, quick approach to a high-effort, slow approach. Most of the time it IS in one of the 3 locations you initially thought of. If you only briefly checked them, it may genuinely be worth checking them again. But it’s also important to calibrate the point at which you switch to a slower approach. For finding lost items, this point is probably when you find yourself considering checking the same location for the third time.
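The heuristic above can be written down as a tiny sketch: re-check the quick “likely” spots a bounded number of times, then fall back to an exhaustive sweep. All the names and the `max_rechecks=2` cut-off are illustrative, not anything from the post.

```python
from dataclasses import dataclass

@dataclass
class Spot:
    name: str
    contents: list  # in real life contents change between checks; here they're static

def find_item(item, likely_spots, all_spots, max_rechecks=2):
    # Cheap phase: a few passes over the likely locations only.
    for _ in range(max_rechecks):
        for spot in likely_spots:
            if item in spot.contents:
                return spot.name
    # Slow phase: thorough sweep of every location, unlikely ones included.
    for spot in all_spots:
        if item in spot.contents:
            return spot.name
    return None
```

The calibration question from the post is exactly the choice of `max_rechecks`: too high and you loop on the same 3 locations; too low and you pay the cost of the full sweep when a quick re-check would have worked.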
It sounds like April 1st acted as a sense-check for Claudius, prompting it to consider “Am I behaving rationally? Has someone fooled me? Are some of my assumptions wrong?”.
This kind of mistake seems to happen in the AI village too. I would not be surprised if future scaffolding attempts for agents include a periodic prompt to check current information and consider the hypothesis that a large and incorrect assumption has been made.
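A minimal sketch of that scaffolding idea, with everything invented for illustration (`run_step`, the prompt wording, the `check_every` interval): interleave normal task steps with a periodic prompt asking the agent to re-examine its assumptions.

```python
# Hypothetical agent loop: every `check_every` steps, inject a sanity-check
# prompt asking the model to reconsider its assumptions.

SANITY_PROMPT = ("Check current information from scratch. Is it possible that "
                 "a large assumption you have made is wrong, or that someone "
                 "has fooled you?")

def run_agent(task_prompts, run_step, check_every=3):
    """Run each task prompt, inserting a sanity check every `check_every` steps."""
    log = []
    for i, prompt in enumerate(task_prompts, start=1):
        log.append(run_step(prompt))
        if i % check_every == 0:
            log.append(run_step(SANITY_PROMPT))
    return log
```

Here `run_step` would be a call to the model; the scaffold itself is just a counter, which is part of why this kind of fix seems cheap to try.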
I think partly what you’re running into is that we live in a postmodern age of storytelling. The classic fairytales where the wicked wolf dies (the Three Little Pigs, Red Riding Hood) or the knight gets the princess after bravely facing down the dragon (George and the Dragon) are being subverted because people got bored of those stories and wanted to see a twist, so we get something like Shrek: the ogre is hired by the king to rescue the princess from the dragon, but ends up rescuing the princess from the king.
The original archetypes DID exist in stories, but they are rarely used today without some kind of twist. Our culture is so saturated with subversions that the twist is now expected, and it’s possible to forget that the archetypes ever existed as a common theme.

Essentially, I think you’re finding instances where the archetype doesn’t match the majority of modern examples. This is because the archetype hasn’t changed, but media referencing it rarely uses it directly without subverting it somewhat.
(Edit)
Thinking about it further, the fairytales I talked about also subvert expectations: hungry wolves normally eat pigs, for example. Those archetypes come from the base level of real life, though. It would be common knowledge that wolves will opportunistically prey on livestock. This makes a story about pigs building little houses and then luring the wolf into a cooking pot fun, because it reverses the normal roles. When it becomes common to have the wicked wolf lose, though, the story becomes expected and stale. Then someone twists it a little more and you get Shrek, or Zootopia (where (spoilers) the villain turns out to be a sheep).
I wouldn’t class most of the examples given in this post as stereotypical male action heroes.
Rambo was the first example I thought of, and then most roles played by Jason Statham, Bruce Willis, Arnold Schwarzenegger or Will Smith. I also don’t think the stereotype is completely emotionless, just violent, tough and motivated, capable of anything. They tend to have fewer vulnerable moments and only cry when someone they love dies or something. They don’t cry when their plans suffer setbacks or when they’re upset by an insult someone shouts at them, like normal people might. They certainly don’t cry when they lose their keys or forget somebody’s birthday, or feel pressure to do well in an exam.
This is the first technical approach to alignment I’ve seen that seems genuinely hopeful to me, rather than just another band-aid which won’t hold up to the stresses of a more intelligent model.
As you’ve described it, the fallacy is fairly harmless (it doesn’t materially speed up cooking pasta, but it also doesn’t slow it down). The only thing lost is a bit of time that could be more productively spent doing something else. I think there’s often a side effect which goes along with this fallacy that’s worth mentioning, and can turn it into something actively harmful.
With the example of trying to save energy by turning off the wifi router, a proportion of people will turn the wifi off but not turn the heating down because they think “I followed one of the recommendations, I’m making an effort and doing my part”. Adding in the recommendation to turn off the wifi can be actively harmful because people don’t even necessarily understand that some of the recommendations are more impactful than others, and they’re working off a model of social status signalling to determine what actions they should take, rather than actually understanding the problem and how the proposed solutions are intended to help.
Recycling is a similar situation. Most waste which goes into recycling is not actually recycled, but the act of recycling makes people believe that they are fulfilling their civic duty to reduce single use plastics and wasteful use of resources. As a result they may shirk other much more effective and important green initiatives.
(As a sidenote, the energy used by the wifi router is going to be dissipated as heat, so turning off the wifi just means your heating system will work a little harder to reach the temperature set by the thermostat, offsetting any savings made by turning the wifi off.)
This is one of the reasons why I try to add phrases and sentence structures which are far from my usual way of talking in my online comments. Bonus points if I split these text personalities by platform to avoid meta-level pattern spotting.
It leads to some frankly poor spelling and grammar choices, but it adds noise for any pattern-matching AI system to filter.