Hi, I am a Physicist, an Effective Altruist and AI Safety student/researcher.
Linda Linsefors
Is there anyone in London who wants to collaborate with me on some Comp-in-Sup research?
Prerequisites are knowing some linear algebra and some Python/PyTorch.
I like it! “SiLT” is also easier to say than “Es El Te”
Thanks for the link. I have read it, but it was long ago, so thanks for the reminder. It’s related in a helpful way.
I just checked in with myself what the post above was for. I think it’s part rant, part me clarifying my thoughts by writing them, and hopefully getting some reflections back. And it’s also because maybe someone will find it useful, but that’s also maybe secretly about me, to create more conversation partners that track the things I think are important.
If I was writing a proper LW blogpost then [who is this for] should primarily be the reader.
But in a shortform like this I feel like I’m allowed to do what I want. And also people can take away what they want. Tracking [who is this for] is much more important when people are in a live conversation, because that is a more trapped situation, requiring more consideration.
There is also the type of conversation where the other person pretends that it is about me, but actually it is about their need to feel like a good person. These situations are awful and terrible, and I will not play along.
When I’m in a conversation I often track who the conversation is for. I.e. who is this conversation primarily serving, in this moment.
If I’m ranting, then this conversation is for me, to let me release some tension. I will express myself in a way that feels good to me.
If I’m sharing useful information, then the conversation is for the other person. I will express myself in a way that makes the information clear and accessible for them, and also pay attention to whether they even want this information.
I can explain myself because I need to be seen, or because another person wants to understand me. But these are different things.
Sometimes more than one person is upset and has needs, and then you have to pick who gets help first. Whoever goes first, now the conversation is for them, until they feel sufficiently better to switch. And if neither person can set aside their needs, probably you should not talk right now, or you should bring in help.
I don’t know how frequently or reliably I do this, because it’s not deliberate, i.e. I never decided to do this, I just do it sometimes, because [who is this for?] is often a required input for my speech generator.
Do you usually track this? In what types of conversations? Do you think other people usually track this?
I vaguely remember having played with these rules, with you, more than once.
Another change (starting from the standard rules) that I think might speed games up, is the ability to spend multiple funding tokens to publish a paper out of turn. But I’ve only run this once, needing three tokens, and no one took advantage of it.
AI Safety Camp 10 Outputs
Typo react from me. I think you should call your links something informative. If you think the title of the post is clickbait, you can re-title it something better maybe?
Now I have to click to find out what the link is even about, which is also clickbait-y.
Estimated MSE loss for three different ways of embedding features into neurons, when there are more possible features than neurons.
I’ve typed up some math notes for how much MSE loss we should expect for random embeddings, and some other alternative embeddings, for when you have more features than neurons. I don’t have a good sense for how legible this is to anyone but me.
Note that neither of these embeddings is optimal. I believe that the optimal embedding for minimising MSE loss is to store the features in almost orthogonal directions, which is similar to random embeddings but can be optimised more. But I also believe that MSE loss doesn’t prefer this solution very much, which means that when there are other tradeoffs, MSE loss might not be enough to incentivise superposition.
This does not mean we should not expect superposition in real networks. Many networks use other loss functions, e.g. cross-entropy.
Even if the loss is MSE on the final output, this does not mean MSE is the right loss for modeling the dynamics in the middle of the network.
Setup and notation
features
neurons
active features
Assuming:
True feature values:
= 1 for active features
= 0 for inactive features
Using random embedding directions (superposition)
Estimated values:
+ where for active features
where for active features
Total Mean Squared Error (MSE)
This is minimised by
Making MSE
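A rough numerical sketch of the random-embedding case, not taken from the notes themselves. The symbols (z features, D neurons, n active features, readout scale a), the choice of independent random unit vectors as embedding directions, and the function names are all assumptions made for this sketch. Under exactly these assumptions, the expected total squared error works out to n(1 - a)^2 + a^2 n (z - 1) / D, which is smallest at a = D / (D + z - 1); the simulation is there to check that calculation rather than to reproduce the notes.

```python
import numpy as np

def random_embedding_error(z, D, n, a, trials=500, seed=0):
    """Monte Carlo estimate of the squared error summed over all z features.

    Assumed setup (hypothetical, for illustration): each feature gets an
    independent random unit vector in R^D, n features are active with value 1,
    and each feature is read out as a * (its direction . neuron activations).
    """
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(trials):
        W = rng.standard_normal((z, D))
        W /= np.linalg.norm(W, axis=1, keepdims=True)  # unit-length directions
        y = np.zeros(z)
        y[rng.choice(z, size=n, replace=False)] = 1.0  # true feature values
        x = y @ W                                      # neuron activations
        y_hat = a * (W @ x)                            # scaled linear readout
        total += np.sum((y_hat - y) ** 2)
    return total / trials

def expected_error(z, D, n, a):
    """Closed form derived under this sketch's assumptions only."""
    return n * (1 - a) ** 2 + a ** 2 * n * (z - 1) / D

if __name__ == "__main__":
    z, D, n = 200, 50, 3
    a_best = D / (D + z - 1)  # minimiser of expected_error in a
    print(random_embedding_error(z, D, n, a_best), expected_error(z, D, n, a_best))
```

The readout scale matters a lot here: with a = 1 the interference from the other z - 1 directions dominates, while shrinking a trades a biased estimate of the active features for much less noise on the inactive ones.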
One feature per neuron
We embed a single feature in each neuron, and the rest of the features are just not represented.
Estimated values:
for represented features
for non-represented features
Total Mean Squared Error (MSE)
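A minimal sketch of this case, under assumptions that are not stated in the notes: z features, D neurons, n active features chosen uniformly at random, and the first D features being the represented ones. With those assumptions, every active feature that has no neuron contributes an error of 1, so the expected total squared error is n (z - D) / z. The symbol names and the function name are hypothetical.

```python
import numpy as np

def one_feature_per_neuron_error(z, D, n, trials=4000, seed=0):
    """Monte Carlo estimate of the squared error summed over all z features.

    Assumed setup (hypothetical): features 0..D-1 each own one neuron and are
    read out exactly; features D..z-1 are not represented and are estimated as 0.
    """
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(trials):
        y = np.zeros(z)
        y[rng.choice(z, size=n, replace=False)] = 1.0  # true feature values
        y_hat = np.zeros(z)
        y_hat[:D] = y[:D]                              # represented features are exact
        total += np.sum((y_hat - y) ** 2)
    return total / trials

if __name__ == "__main__":
    z, D, n = 200, 50, 3
    print(one_feature_per_neuron_error(z, D, n), n * (z - D) / z)
```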
One neuron per feature
We embed each feature in a single neuron.
where the sum is over all features that share the same neuron
We assume that the probability of co-activated features on the same neuron is small enough to ignore. We also assume that every neuron is used at least once. Then for any active neuron, the expected number of inactive features that will be wrongfully activated is , giving us the MSE loss for this case as
We can already see that this is smaller than , but let’s also calculate what the minimum value is. is minimised by
Making MSE
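A small simulation sketch of the one-neuron-per-feature case, again under assumptions that go beyond the notes: z features, D neurons, n active features chosen uniformly, feature i stored on neuron i mod D (so every neuron is used), and each feature read out as a scale a times its neuron's activation. Ignoring co-activation, these assumptions give an error of roughly n[(1 - a)^2 + a^2 (z/D - 1)], smallest near a = D/z. The simulation keeps co-activation in, so it should land slightly off that approximation. All names are hypothetical.

```python
import numpy as np

def one_neuron_per_feature_error(z, D, n, a, trials=2000, seed=0):
    """Monte Carlo estimate of the squared error summed over all z features.

    Assumed setup (hypothetical): feature i lives on neuron i % D, a neuron's
    activation is the sum of its features' values, and every feature is read
    out as a * (activation of its neuron).
    """
    rng = np.random.default_rng(seed)
    neuron_of = np.arange(z) % D                 # feature -> neuron assignment
    total = 0.0
    for _ in range(trials):
        y = np.zeros(z)
        y[rng.choice(z, size=n, replace=False)] = 1.0
        # Each neuron sums the values of all features assigned to it.
        x = np.bincount(neuron_of, weights=y, minlength=D)
        y_hat = a * x[neuron_of]                 # shared readout per neuron
        total += np.sum((y_hat - y) ** 2)
    return total / trials

def approx_error(z, D, n, a):
    """Approximation that ignores co-activation; derived for this sketch only."""
    return n * ((1 - a) ** 2 + a ** 2 * (z / D - 1))

if __name__ == "__main__":
    z, D, n = 200, 50, 3
    a_best = D / z
    print(one_neuron_per_feature_error(z, D, n, a_best), approx_error(z, D, n, a_best))
```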
I’m not sure that answered your question, but maybe you can ask a more specific one now.
The thing I was after was, what is the actual concrete causal chain from rationality training to you getting better at debugging.
I currently think the answer is that the rationality training made you motivated, and that was the missing part that stopped you from getting better before. Let me know if you think I’m missing something important.
Interesting. Reading your comment makes me notice that I’m more motivated to learn object level skills than meta level skills.
“meta level” != “rationality”.
E.g. I would count most of the CFAR curriculum as object level skills. But the training you’re working on seems more like meta level skills. I expect motivation to be super central for which learning methods work. There have been a number of posts on ACX about school (including 2 that are part of the review contest). The common theme is that the main bottleneck is students’ motivation.
I didn’t improve much at debugging until I got generally serious about rationality training.
Can you expand on this please?
NSFW question
How do you maintain breath control on someone who is panicking?
I’ve tried a bit of holding someone’s mouth and nose, from both sides of the experience, and haven’t figured out a way that actually stops the person from breathing if they try hard enough.
No, I don’t think what you say matches my experience. My anxiety was pointing straight at the thing I needed. Although I acknowledge I did not put forward enough details for this to be clear to you.
But it did not tell me how much I would need exactly. So it’s more like you’re hungry, and you eat some, and notice that you’re still hungry, and then start to wonder if eating is actually what you need, or this hunger feeling is about something else.
I don’t know what you mean by “generic safety net” or “safety in the literal sense”. I assumed based on context that we’re not talking about physical safety.
I mean things like: I’m not lonely and I expect to continue not to be lonely, because I found people I like who reliably also want me around.
I don’t know what is true for the typical person, and I’m definitely not a typical person.
With those caveats, what you describe is not true for me. To feel ok, I need to have a handful of close friends that I see regularly. This provides some sort of validation, among other values. If I have this, my social anxiety is low. If I don’t have this, my anxiety is high, and causes lots of problems. It might look like my anxiety was resistant to being cured by more safety, because it took me a long time to find the people I need. Before I found people of my approximate neurotype, I was so far from being ok that it was unclear to me that the thing I could clearly feel I was missing was something that could exist.
And it’s not the case that the further from the safe situation I am, the more anxiety I feel. It’s more like a step function.
Also, sometimes the anxiety needs some time to fully update on a new situation. This looks like the anxiety coming back. And then I focus on the evidence that things are actually ok, or ask for some help to do this, and then it goes away. This does not work if things are not actually ok.
I can see how this could look like anxiety is conserved, over a lot of different datapoints, and I don’t know how someone can tell the difference until they have experienced sufficient safety.
Maybe the reason people stick to what they are good at is not lack of motivation to explore, but lack of a safety net to explore. This seems to explain all your observations, if you assume most people are much more anxious than you. In this case, what other people need to grow is more acceptance in their life, not more pushing.
I disagree that it’s hard, in the relevant context.
It’s hard to communicate this to someone who doesn’t have a distinction between the two concepts in their head. It’s also hard to communicate this with someone who is too quick to jump to conclusions regarding what you mean to say, and also has bad priors about you. This is enough of a problem that I don’t recommend offering discernments to people you don’t know well. But that’s also kind of a moot point, since I think it’s bad to offer unsolicited advice to people you don’t know well, for other reasons.
But with someone like a romantic partner, or a close friend, with whom you’d have lots of long form conversations, I don’t think it’s hard.
You can in fact just say: “I love you as you are, and among the things I love about you is the desire to grow stronger. I’ve noticed a way you could be stronger, do you want to hear it now or later?”
Or if you have established the words “discernment vs judgment” you can just preface any suggestion for improvement with “discernment”. Or whatever communication style works for you.
Later into the relationship, you might not even have to clarify, but the person will just have the correct prior that you’re expressing a discernment, and not a judgment.
If I had to guess: I think the way prepaid meters work is you go to a shop and buy a physical object representing a certain amount of electricity and present that physical object to your electric meter.
I’ve used a prepaid meter. The way it worked was I went to their webpage and bought the amount I wanted. This was meant to update the meter automatically within an hour, but this never worked, so I had to use the backup method, which was to enter a very long numerical code into the meter.
I’m surprised that instrumental convergence wasn’t covered in the book. I didn’t even notice it was left out until reading this review.
Here are some alternative sources in case anyone prefers text over video:
https://www.lesswrong.com/w/instrumental-convergence
https://en.wikipedia.org/wiki/Instrumental_convergence