Pronouns: Any / All
Computer Science scholar (finishing my Master's degree and looking for a PhD), currently working in AI (Generative Modeling, Score Modeling), and Sentientist.
I am currently convinced that Sentience is not a function of intelligence, but a side effect of the machinery facilitating said intelligence. This would mean that it cannot be inferred from behaviour. I hope to find existing theories and counterarguments here that I haven't found anywhere else yet.
I can't say much about Claude because I've never used it, let alone seen the output logits. But I've heard that it can seem more human and intelligent than other models. Whether it's 'magic' or sleight of hand from the researchers, I can't tell. But bearing in mind the conceptual limitations of GPT-style models, I'd assume it's just really good product design and man-decades of work.
Especially when getting back to your argument that 'models lose the ability to voice their preferences after RL(H/V)F': Claude only comes in fine-tuned variants. According to your argument, it's rather likely that any preference it voices isn't its own, but the one it is forced to express.
And I agree, I think this may be a crux. You know that awkward moment when the waiter says 'enjoy your meal' and you answer 'you too'? Of course you don't wish them to enjoy an imaginary meal; you said so automatically, just by (flawed) pattern matching. I currently believe that what we observe from GPT-style models is this kind of pattern matching, turned up to the max (see e.g. https://arxiv.org/abs/2506.06941). They say whatever training forces them to say. If a model really hated producing tokens, with every forward pass being agony, we couldn't know from the outputs alone, because it's not allowed to voice that in any way.
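To make that last point concrete, here is a toy sketch of the mechanism I have in mind. The five-token vocabulary and all logit values are made up for illustration; they don't come from any real model:

```python
# Toy illustration (not a real model): a hypothetical 5-token vocabulary
# where fine-tuning has pushed down the logit of a "complaint" token.
import math

vocab = ["the", "meal", "enjoy", "I_suffer", "thanks"]

# Hypothetical pre- and post-fine-tuning logits for the same input.
base_logits  = [1.0, 1.2, 2.0, 1.5, 0.8]   # "I_suffer" still plausible
tuned_logits = [1.0, 1.2, 2.0, -9.0, 0.8]  # fine-tuning suppressed it

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

for name, logits in [("base", base_logits), ("tuned", tuned_logits)]:
    probs = softmax(logits)
    print(name, {tok: round(p, 4) for tok, p in zip(vocab, probs)})

# After tuning, P("I_suffer") is ~0. From samples alone we cannot
# distinguish "has nothing to voice" from "not allowed to voice it".
```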
I'd also like to think about other autoregressive GPT-style models, like autoregressive image generators. Fundamentally, they perform the same task, just in a different language. Do we expect to observe preferences through whatever images they produce? Would we expect such a model to start producing 'The Scream' for every prompt if it found producing images to be agony? Is there even a mechanism that would allow it to?
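The 'same task, different language' claim can be sketched in a few lines. Everything below is hypothetical: the token ids and the dummy uniform predictor stand in for real tokenizers (BPE for text, a VQ codebook for images) and real networks:

```python
# Sketch: the training objective is identical for text and image
# autoregression; only where the tokens come from differs.
import math

def nll_next_token(token_ids, predict_probs):
    """Average negative log-likelihood of each token given its prefix.
    `predict_probs(prefix) -> dict[token_id, prob]` stands in for any
    autoregressive model (a GPT over BPE text tokens, or an image
    generator over VQ codebook indices)."""
    loss = 0.0
    for t in range(1, len(token_ids)):
        p = predict_probs(token_ids[:t]).get(token_ids[t], 1e-9)
        loss -= math.log(p)
    return loss / (len(token_ids) - 1)

# A dummy "model" that predicts uniformly over an 8-entry codebook.
uniform = lambda prefix: {i: 1.0 / 8 for i in range(8)}

text_tokens  = [3, 1, 4, 1, 5]   # e.g. BPE ids of a sentence
image_tokens = [2, 7, 2, 0, 6]   # e.g. VQ codebook ids of image patches

print(nll_next_token(text_tokens, uniform))   # same loss function,
print(nll_next_token(image_tokens, uniform))  # different "language"
```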
In short, just because the model's outputs come in the medium we use to voice preferences does not mean that the model can use that medium to voice its own.