Hi, I am a Physicist, an Effective Altruist and AI Safety student/researcher.
Linda Linsefors
AI Safety Camp 10 Outputs
Estimated MSE loss for three different ways of embedding features into neurons, when there are more possible features than neurons.
I’ve typed up some math notes on how much MSE loss we should expect for random embeddings, and for some alternative embeddings, when you have more features than neurons. I don’t have a good sense of how legible this is to anyone but me.
Note that none of these embeddings is optimal. I believe that the optimal embedding for minimising MSE loss is to store the features in almost orthogonal directions, which is similar to random embeddings but can be optimised more. But I also believe that MSE loss doesn’t prefer this solution very much, which means that when there are other tradeoffs, MSE loss might not be enough to incentivise superposition.
This does not mean we should not expect superposition in real networks.
Many networks use other loss functions, e.g. cross-entropy.
Even if the loss is MSE on the final output, this does not mean MSE is the right loss for modeling the dynamics in the middle of the network.
Setup and notation
features
neurons
active features
Assuming:
True feature values:
= 1 for active features
= 0 for inactive features
Using random embedding directions (superposition)
Estimated values:
+ where for active features
where for inactive features
Total Mean Squared Error (MSE)
This is minimised by
Making MSE
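A sketch of where these expressions can come from, in assumed notation: $T$ features, $D$ neurons, $z$ active features, a readout scale $a$, and an interference term $\epsilon$ (all of these symbols are choices made for this sketch). It assumes each feature gets an independent random unit vector in $\mathbb{R}^D$, so the squared overlap between two different feature directions is $1/D$ on average, and that a feature is estimated as $a$ times the projection of the activation onto its direction.

```latex
\begin{align*}
\hat{y}_{\text{active}} &= a + \epsilon, & \langle \epsilon^2 \rangle &= \frac{(z-1)\,a^2}{D} \\
\hat{y}_{\text{inactive}} &= \epsilon, & \langle \epsilon^2 \rangle &= \frac{z\,a^2}{D}
\end{align*}

% Sum the expected squared error over z active and T - z inactive features
\begin{align*}
\mathrm{MSE}_{\text{random}}
  &= z\left[(1-a)^2 + \frac{(z-1)\,a^2}{D}\right] + (T-z)\,\frac{z\,a^2}{D} \\
  &= z\,(1-a)^2 + \frac{z\,(T-1)}{D}\,a^2
\end{align*}

% Setting the derivative with respect to a to zero
\begin{align*}
a^* &= \frac{D}{D+T-1} \approx \frac{D}{T+D} \\
\mathrm{MSE}_{\text{random}}(a^*) &= \frac{z\,(T-1)}{D+T-1} \approx \frac{z\,T}{T+D}
\end{align*}
```

The approximations in the last two lines just drop the $-1$, which is harmless when $T$ is large.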
One feature per neuron
We embed a single feature in each neuron, and the rest of the features are simply not represented.
Estimated values:
for represented features
for non represented features
Total Mean Squared Error (MSE)
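A sketch of this case in assumed notation ($T$ features, $D$ neurons, $z$ active features; the symbols are choices made for this sketch). It assumes every feature is equally likely to be active, so an active feature is one of the $T-D$ unrepresented ones with probability $(T-D)/T$, in which case its squared error is 1; represented features are read out exactly.

```latex
\begin{align*}
\mathrm{MSE}_{\text{one feature per neuron}}
  = z \cdot \frac{T-D}{T}
  = z\left(1 - \frac{D}{T}\right)
\end{align*}
```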
One neuron per feature
We embed each feature in a single neuron.
where the sum is over all features that share the same neuron
We assume that the probability of co-activated features on the same neuron is small enough to ignore. We also assume that every neuron is used at least once. Then for any active neuron, the expected number of inactive features that will be wrongly activated is , giving us the MSE loss for this case as
We can already see that this is smaller than , but let’s also calculate what the minimum value is. is minimised by
Making MSE
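To compare the three cases numerically, here is a minimal sketch in Python. The closed-form expressions and the names `T`, `D`, `z` and `a` (features, neurons, active features, readout scale) are assumptions of this sketch: random unit directions with mean squared overlap 1/D, uniformly chosen active features, and T/D features sharing each neuron. The example sizes are hypothetical.

```python
def mse_random(a, T, D, z):
    """Random embedding: z active features read as a + noise, T - z inactive as noise."""
    active = z * ((1 - a) ** 2 + (z - 1) * a ** 2 / D)
    inactive = (T - z) * z * a ** 2 / D
    return active + inactive


def mse_one_feature_per_neuron(T, D, z):
    """D features stored exactly; an active feature is unstored with probability (T - D) / T."""
    return z * (T - D) / T


def mse_one_neuron_per_feature(a, T, D, z):
    """Each neuron is shared by T / D features; co-activation on one neuron is ignored."""
    return z * ((1 - a) ** 2 + (T / D - 1) * a ** 2)


def argmin_on_grid(f, steps=100_000):
    """Brute-force minimiser of f over a in [0, 1] on an evenly spaced grid."""
    return min((i / steps for i in range(steps + 1)), key=f)


# Hypothetical sizes with z << D < T
T, D, z = 1000, 100, 3

a_rand = argmin_on_grid(lambda a: mse_random(a, T, D, z))
a_shared = argmin_on_grid(lambda a: mse_one_neuron_per_feature(a, T, D, z))

print("random embedding:       a =", a_rand, "MSE =", mse_random(a_rand, T, D, z))
print("one feature per neuron: MSE =", mse_one_feature_per_neuron(T, D, z))
print("one neuron per feature: a =", a_shared, "MSE =", mse_one_neuron_per_feature(a_shared, T, D, z))
```

Under these assumptions the grid search should land near a = D/(D+T-1) for random embeddings and a = D/T for one neuron per feature, and the last two cases end up with the same minimum, slightly below the random-embedding value.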
I’m not sure that answered your question, but maybe you can ask a more specific one now.
The thing I was after was: what is the actual concrete causal chain from rationality training to you getting better at debugging?
I currently think the answer is that the rationality training made you motivated, and that was the missing part that stopped you from getting better before. Let me know if you think I’m missing something important.
Interesting. Reading your comment makes me notice that I’m more motivated to learn object level skills than meta level skills.
“meta level” != “rationality”.
E.g. I would count most of the CFAR curriculum as object level skills. But the training you’re working on seems more like meta level skills.
I expect motivation to be super central to which learning methods work. There have been a number of posts on ACX about school (including 2 that are part of the review contest). The common theme is that the main bottleneck is student motivation.
I didn’t improve much at debugging until I got generally serious about rationality training.
Can you expand on this please?
NSFW question
How do you maintain breath control on someone who is panicking?
I’ve tried a bit of holding someone’s mouth and nose, from both sides of the experience, and haven’t figured out a way that actually stops the person from breathing if they try hard enough.
No, I don’t think what you say matches my experience. My anxiety was pointing straight at the thing I needed. Although I acknowledge I did not put forward enough details for this to be clear to you.
But it did not tell me how much I would need exactly. So it’s more like you’re hungry, and you eat some, and notice that you’re still hungry, and then start to wonder if eating is actually what you need, or if this hunger feeling is about something else.
I don’t know what you mean by “generic safety net” or “safety in the literal sense”. I assumed based on context that we’re not talking about physical safety.
I mean things like: I’m not lonely and I expect to continue not to be lonely, because I found people I like who reliably also want me around.
I don’t know what is true for the typical person, and I’m definitely not a typical person.
With those caveats, what you describe is not true for me. To feel ok, I need to have a handful of close friends that I see regularly. This provides some sort of validation, among other values. If I have this, my social anxiety is low. If I don’t have this, my anxiety is high and causes lots of problems.
It might look like my anxiety was resistant to being cured by more safety, because it took me a long time to find the people I need. Before I found people of my approximate neurotype, I was so far from being ok that it was unclear to me that the thing I could clearly feel I was missing was something that could exist.
And it’s not the case that the further from the safe situation I am, the more anxiety I feel. It’s more like a step function.
Also, sometimes the anxiety needs some time to fully update on a new situation. This looks like the anxiety coming back. Then I focus on the evidence that things are actually ok, or ask for some help to do this, and then it goes away. This does not work if things are not actually ok.
I can see how this could look like anxiety is conserved, over a lot of different datapoints, and I don’t know how someone can tell the difference until they have experienced sufficient safety.
Maybe the reason people stick to what they are good at, is not lack of motivation to explore, but lack of safety net to explore. This seems to explain all your observations, if you assume most people are much more anxious than you. In this case, what other people need to grow is more acceptance in their life, not more pushing.
I disagree that it’s hard, in the relevant context.
It’s hard to communicate this to someone who doesn’t have a distinction between the two concepts in their head. It’s also hard to communicate this to someone who is too quick to jump to conclusions regarding what you mean to say, and who also has bad priors about you. This is enough of a problem that I don’t recommend offering discernments to people you don’t know well. But that’s also kind of a moot point, since I think it’s bad to offer unsolicited advice to people you don’t know well, for other reasons.
But with someone like a romantic partner, or a close friend, with whom you’d have lots of long form conversation, I don’t think it’s hard.
You can in fact just say: “I love you as you are, and among the things I love about you is the desire to grow stronger. I’ve noticed a way you could be stronger, do you want to hear it now or later?”
Or if you have established the words “discernment vs judgment” you can just preface any suggestion for improvement with “discernment”. Or whatever communication style works for you.
Later into the relationship, you might not even have to clarify, but the person will just have the correct prior that you’re expressing a discernment, and not a judgment.
If I had to guess: I think the way prepaid meters work is you go to a shop and buy a physical object representing a certain amount of electricity and present that physical object to your electric meter.
I’ve used a prepaid meter. The way it worked was that I went to their webpage and bought the amount I wanted. This was meant to update the meter automatically within an hour, but this never worked, so I had to use the backup method, which was to enter a very long numerical code into the meter.
Although none of them seem to have “distillation” or “research debt”, so there seems to be room for improvement.
“Distillation” does have an explanation in this tag though
https://www.lesswrong.com/w/distillation-and-pedagogy
I think the Wiki Tags are meant to be used as both tags and dictionary, however these two purposes don’t cleanly line up.
I was pretty sure this exists, maybe even built into LW. It seems like an obvious thing, and there are lots of parts of LW that for some reason are hard to find from the front page. Googling “lesswrong dictionary” yielded
https://www.lesswrong.com/w/lesswrong-jargon
https://www.lesswrong.com/w/r-a-z-glossary
https://www.lesswrong.com/posts/fbv9FWss6ScDMJiAx/appendix-jargon-dictionary
Nowadays you can describe the concept you want and have an LLM tell you the common term, but this tech is super new. Most of our jargon is from a time when you could only Google things you already knew the name of.
Yes, thanks. I’ve fixed it now.
I think it’s great that you point this out when you see it!
I think the academic meaning of the word “review” (as in “review paper”) is a great fit. Except this word also has several other nearby meanings which are not the meaning we want.
“Explainer” is pretty good. E.g. calling a post “Explainer of …”, or someone can have the role of Explainer.
I don’t think “pedagogy” is the right word? It’s too broad. It encompasses things like how long lectures should be before taking a break, spaced repetition, etc. The jargon Distillation [2] is not a shorthand for the entire science of teaching and learning. It’s a shorthand for the art of writing good explainers, right?
and then being told that it’s a skill issue and I should just learn the rules.
This part is not aimed at leogao’s post!
What I was (not very skillfully) trying to point at is people who think that autistic people are just worse at social skills. I’m so fed up with this claim, and it is a contributing reason for me avoiding the neurotypicals. But it’s not a claim that I read leogao’s post as having made.
leogao’s language comparison is actually pretty great for this. You would not call someone who has a different native language “bad at languages”, but neurotypicals often mistakenly believe that autists are “bad at social skills”.
I also want to add that lots of autists learn how to interact with the neurotypicals. It’s called masking, and involves learning more than just their weird customs. It also involves hiding one’s natural reactions. I hear it’s common for autistic women to get so good at this that they don’t get diagnosed until later in life, when the burden of constant masking causes depression or something. This did not happen to me, because I am terrible at masking.
I’m pretty smart, clearly above average in general intelligence. But I’m also clearly below average in ability to learn languages. I can learn; I did learn English after all. But for a long time I was much worse than the typical Swede my age.
Typo react from me. I think you should call your links something informative. If you think the title of the post is clickbait, you can re-title it something better maybe?
Now I have to click to find out what the link is even about, which is also clickbait-y.