Burnout often doesn’t look like lack of motivation / lack of focus / fatigue as people usually describe it. At least in my experience, it’s often better described as a set of aversive mental triggers that fire whenever a burnt out person goes to do a sort of work they spent too much energy on in the past. (Where ‘too much energy’ has something to do with time and effort, but more to do with a bunch of other things re how people interface with their work).
‘I get surrounded by small ugh fields that grow into larger, overlapping ugh fields until my navigation becomes constained and eventually impossible’ was how I described one such experience
Thing I currently believe about what the core interface failure is, possibly just for me:
[comment link]
Burnout is not a result of working a lot, it’s a result of work not feeling like it pays out in ape-enjoyableness[citation needed]. So they very well could be having a grand ol time working a lot if their attitude towards intended amount of success matches up comfortably with actual success and they find this to pay out in a felt currency which is directly satisfying. I get burned out when effort ⇒ results ⇒ natural rewards gets broken, eg because of being unable to succeed at something hard, or forgetting to use money to buy things my body would like to be paid with.
Moreover, it’s when the work that used to be satisfying has stopped being so, but the habit of trying to do the work has not yet been extinguished. So you don’t quit yet, but the habit is slowly dying so you don’t do it well …
I often hear people dismiss AI control by saying something like, “most AI risk doesn’t come from early misaligned AGIs.” While I mostly agree with this point, I think it fails to engage with a bunch of the more important arguments in favor of control— for instance, the fact that catching misaligned actions might be extremely important for alignment. In general, I think that control has a lot of benefits that are very instrumentally useful for preventing misaligned ASIs down the line, and I wish people would more thoughtfully engage with these.
The typical mind fallacy is really useful for learning things about other people, because the things they assume of others often generalize surprisingly well to themselves
Ah, an avalanche of thoughts that apply to heuristics in general...
yes, it is also my experience that this is a useful heuristics
but there are exceptions
and when this becomes widely known, of course the bad actors will adapt and say the right things
The most useful example for this heuristics is when people say things like “everyone is selfish” etc. For example:
Brent promoted a cynical version of the world, in which humans were inherently bad, selfish, and engaged in constant deception of themselves and others. He taught that people make all their choices for hidden reasons: men, mostly to get sex; women, mostly to get resources for their children. Whenever someone outside Black Lotus made a choice, Brent would dissect it to reveal the hidden selfish motivations. Whenever someone outside Black Lotus was well-respected or popular, Brent would point out (or make up) ways that they were exploiting those weaker than them for their own benefit. This became a self-fulfilling prophecy. One interviewee identified this ideology as the worst harm of Black Lotus, more than sexual boundary violations or being coerced into taking drugs. It took a long time to rebuild their ability to trust and have faith in other people.
The opposite extreme is naive people who assume that everyone is trying their best, and if there is a dysfunction somewhere, it must be because somehow no one has ever approached those people and told them “uhm, could you please do the right thing?”. If we do that, certainly the problems will get fixed! (Similar pattern: naive religious people who assume that others are atheists simply because they have never heard of this book called Bible.)
Problem is, sometimes people happen to be surrounded by a statistically unrepresentative sample of population. Young people may overgeneralize examples from their family, rather than from themselves. I suspect that there are many kids who e.g. happen to have an abusive father, so they generalize “all men are abusive”, because (1) they have no idea how it works in other families, and it makes sense to assume that all other families also keep secret about what happens at home, and (2) it is too painful to admit that it’s your family that happens to suck.
One thing that I hate about this heuristics is that it discourages learning. Like, suppose that you start like the naive kind of person who assumes that everyone always tries their best to help others. Then you get burned, because you happen to meet a predator who specializes on this kind of victims. Afterwards you try to share your lesson with other people in a similar situation… but they incorrectly apply this heuristics to you and conclude that you are the bad guy. (Incorrectly: you say “some people are evil”, not “all people are evil”. But to them it sounds similar.)
Then there is the usual thing about bad actors being anti-inductive. As soon as it becomes known that good people assume the best and bad people assume the worst, smart scammers will adapt their stories, and will weaponize this heuristics against anyone who suspects them or just advises caution in general. (Just like they have already adapted to heuristics such as “sincere people will look in your eyes, the insincere will avoid looking at you because they feel ashamed” by taking care to look straight in your eyes when they are trying to scam you.)
...and despite all objections that I just made, it is a useful heuristics (as long as you remember the exceptions).
I think rule-following may be fairly natural for minds to learn, much more so than other safety properties, and I think it might be worth more research into this direction.
Some pieces of weak evidence: 1. Most humans seem to learn and follow broad rules, even when they are capable of more detailed reasoning 2. Current LLMs seem to follow rules fairly well (and training often incentivizes them to learn broad heuristics rather than detailed, reflective reasoning)
While I do expect rule-following to become harder to instill as AIs become smarter, I think that if we are sufficiently careful, it may well scale to human-level AGIs. I think trying to align reflective, utilitarian-style AIs is probably really hard, as these agents are much more prone to small unavoidable alignment failures (like slight misspecification or misgeneralization) causing large shifts in behavior. Conversely, if we try our best to instill specific simpler rules, and then train these rules to take precedence over consequentialist reasoning whenever possible, this seems a lot safer.
I also think there is a bunch of tractable, empirical research that we can do right now about how to best do this.
Burnout often doesn’t look like lack of motivation / lack of focus / fatigue as people usually describe it. At least in my experience, it’s often better described as a set of aversive mental triggers that fire whenever a burnt out person goes to do a sort of work they spent too much energy on in the past. (Where ‘too much energy’ has something to do with time and effort, but more to do with a bunch of other things re how people interface with their work).
‘I get surrounded by small ugh fields that grow into larger, overlapping ugh fields until my navigation becomes constained and eventually impossible’ was how I described one such experience
Thing I currently believe about what the core interface failure is, possibly just for me:
Moreover, it’s when the work that used to be satisfying has stopped being so, but the habit of trying to do the work has not yet been extinguished. So you don’t quit yet, but the habit is slowly dying so you don’t do it well …
I often hear people dismiss AI control by saying something like, “most AI risk doesn’t come from early misaligned AGIs.” While I mostly agree with this point, I think it fails to engage with a bunch of the more important arguments in favor of control— for instance, the fact that catching misaligned actions might be extremely important for alignment. In general, I think that control has a lot of benefits that are very instrumentally useful for preventing misaligned ASIs down the line, and I wish people would more thoughtfully engage with these.
The typical mind fallacy is really useful for learning things about other people, because the things they assume of others often generalize surprisingly well to themselves
Ah, an avalanche of thoughts that apply to heuristics in general...
yes, it is also my experience that this is a useful heuristics
but there are exceptions
and when this becomes widely known, of course the bad actors will adapt and say the right things
The most useful example for this heuristics is when people say things like “everyone is selfish” etc. For example:
The opposite extreme is naive people who assume that everyone is trying their best, and if there is a dysfunction somewhere, it must be because somehow no one has ever approached those people and told them “uhm, could you please do the right thing?”. If we do that, certainly the problems will get fixed! (Similar pattern: naive religious people who assume that others are atheists simply because they have never heard of this book called Bible.)
Problem is, sometimes people happen to be surrounded by a statistically unrepresentative sample of population. Young people may overgeneralize examples from their family, rather than from themselves. I suspect that there are many kids who e.g. happen to have an abusive father, so they generalize “all men are abusive”, because (1) they have no idea how it works in other families, and it makes sense to assume that all other families also keep secret about what happens at home, and (2) it is too painful to admit that it’s your family that happens to suck.
One thing that I hate about this heuristics is that it discourages learning. Like, suppose that you start like the naive kind of person who assumes that everyone always tries their best to help others. Then you get burned, because you happen to meet a predator who specializes on this kind of victims. Afterwards you try to share your lesson with other people in a similar situation… but they incorrectly apply this heuristics to you and conclude that you are the bad guy. (Incorrectly: you say “some people are evil”, not “all people are evil”. But to them it sounds similar.)
Then there is the usual thing about bad actors being anti-inductive. As soon as it becomes known that good people assume the best and bad people assume the worst, smart scammers will adapt their stories, and will weaponize this heuristics against anyone who suspects them or just advises caution in general. (Just like they have already adapted to heuristics such as “sincere people will look in your eyes, the insincere will avoid looking at you because they feel ashamed” by taking care to look straight in your eyes when they are trying to scam you.)
...and despite all objections that I just made, it is a useful heuristics (as long as you remember the exceptions).
I think rule-following may be fairly natural for minds to learn, much more so than other safety properties, and I think it might be worth more research into this direction.
Some pieces of weak evidence:
1. Most humans seem to learn and follow broad rules, even when they are capable of more detailed reasoning
2. Current LLMs seem to follow rules fairly well (and training often incentivizes them to learn broad heuristics rather than detailed, reflective reasoning)
While I do expect rule-following to become harder to instill as AIs become smarter, I think that if we are sufficiently careful, it may well scale to human-level AGIs. I think trying to align reflective, utilitarian-style AIs is probably really hard, as these agents are much more prone to small unavoidable alignment failures (like slight misspecification or misgeneralization) causing large shifts in behavior. Conversely, if we try our best to instill specific simpler rules, and then train these rules to take precedence over consequentialist reasoning whenever possible, this seems a lot safer.
I also think there is a bunch of tractable, empirical research that we can do right now about how to best do this.