Hi, I am a Physicist, an Effective Altruist and AI Safety student/researcher.
I’ve now updated the event information to include summaries/abstracts for the projects/talks. Some of these are still under construction.
Ok, you’re right that this is a very morally clear story. My bad for not knowing what’s a typical tabloid story.
Missing kid = bad,
seems like a good lesson for AI to learn.
I don’t read many sensationalist tabloids, but my impression is that the things that get a lot of attention in the press are things people can reasonably take either side of.
Scott Alexander writes about how everyone agrees that factory farming is terrible, but exactly because of this overwhelming agreement, it gets no attention. Which is why PETA does outrageous things to get attention.
The Toxoplasma Of Rage | Slate Star Codex
There need to be two sides to an issue, or else no-one gets ingroup loyalty points for taking one side or the other.
AI Safety Camp final presentations
Their more human-in-the-loop stuff seems neat though.
I found this on their website
Soon, interacting with AI agents will be a part of daily life, presenting enormous regulatory and compliance challenges alongside incredible opportunities.
Norm Ai agents also work alongside other AI agents who have been entrusted to automate business processes. Here, the role of the Norm Ai agent is to automatically ensure that actions other AI agents take are in compliance with laws.
I’m not sure if this is worrying, because I don’t think AI overseeing AI is a good solution. Or if it’s actually good because, again, it’s not a good solution, which might lead to some early warnings?
Sensationalist tabloid news stories and other outrage porn are not the opposite. They are actually more of the same. More edge cases. Anything that is divisive has the problem I’m talking about.
Fiction is a better choice.
Or even just completely ordinary everyday human behaviour. Most humans are mostly nice most of the time.
We might have to start with the very basics, the stuff we don’t even notice, because it’s too obvious. Things no-one would think of writing down.
The math in the post is super hand-wavey, so I don’t expect the result to be exactly correct. However, in your example, l up to 100 should be ok, since there is no superposition. 2.7 is almost 2 orders of magnitude off, which is not great.
Looking into what is going on: I’m basing my results on the Johnson–Lindenstrauss lemma, which gives an upper bound on the interference. In the post I’m assuming that the actual interference is of the same order of magnitude as this upper bound. This assumption clearly fails in your example, since the interference between features is zero, and nothing is the same order of magnitude as zero.
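For reference, the kind of bound I have in mind is a standard Johnson–Lindenstrauss-style estimate (order of magnitude only; the constants here are illustrative, not necessarily the ones used in the post): embedding m features as unit vectors in d dimensions, the pairwise interference (absolute dot product between distinct feature directions) can be kept below roughly

$$\epsilon \sim \sqrt{\frac{8 \ln m}{d}}$$

and my assumption was that the actual interference is of the same order as this ε.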
I might try to do the math more carefully, unless someone else gets there first. No promises though.
I expect that my qualitative claims will still hold. This is based on more than the math, but the math seemed easier to write down. I think it would be worth doing the math properly, both to confirm my claims, and because it may be useful to have more accurate quantitative formulas. I might do this if I get some spare time, but no promises.
my qualitative claims = my claims about what types of things the network is trading away when using superposition
quantitative formulas = how much of these things are traded away for what amount of superposition.
Recently someone either suggested to me (or maybe told me they or someone else were going to do this?) that we should train AI on legal texts, to teach it human values. Ignoring the technical problem of how to do this, I’m pretty sure legal texts are not the right training data. But at the time, I could not clearly put into words why. Today’s SMBC explains this for me:
Saturday Morning Breakfast Cereal—Law (smbc-comics.com)
Law is not a good representation or explanation of most of what we care about, because it’s not trying to be. Law is mainly focused on the contentious edge cases.
Training an AI on trolley problems and other ethical dilemmas is even worse, for the same reason.
(Note: Said friend will be introducing himself on here and writing a sequence about his work later. When he does I will add the links here.)
Did you forget to add the links?
Virtual AI Safety Unconference 2024
I think point 5 is the main crux.
Please click agree or disagree on this comment if you agree or disagree (cross or check mark), since this is useful guidance for which parts of this people should prioritise when clarifying further.
Did you forget to provide links to research project outputs in the appendix? Or is there some other reason for this?
I think it’s reasonable to think about what can be stored in a way that can be read off in a linear way (by the next layer), since those are the features that can be directly used in the next layer.
storing them nonlinearly (in one of the host of ways it takes multiple nn layers to decode)
If it takes multiple nn layers to decode, then the nn needs to unpack it before using it, and represent it as a linearly readable feature later.
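As a toy illustration of the distinction I mean (my own sketch in numpy with arbitrary sizes, not something from the original post): a linearly readable feature is one the next layer can get with a single dot product, while a nonlinearly stored one first has to be unpacked by extra computation before it is available in that form.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64                         # layer width (arbitrary, for illustration)
x = rng.normal(size=d)         # activation vector at some layer

# Linearly readable feature: the next layer recovers it with one dot product
# against a fixed readout direction.
w = rng.normal(size=d)
w /= np.linalg.norm(w)
feature_linear = w @ x         # directly usable by the next layer

# Nonlinearly stored feature (toy example): the information only appears in
# the product of two linear readouts. No single dot product recovers it, so
# the network has to spend a layer computing and multiplying the two readouts
# before the result becomes a linearly readable feature.
w_a = rng.normal(size=d)
w_b = rng.normal(size=d)
feature_nonlinear = (w_a @ x) * (w_b @ x)

print(feature_linear, feature_nonlinear)
```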
Good point. I need to think about this a bit more. Thanks
Just quickly writing up my thoughts for now... What I think is going on here is that the Johnson–Lindenstrauss lemma gives a bound on how well you can do, so it’s more like a worst-case scenario. I.e. the Johnson–Lindenstrauss lemma gives you the worst-case error for the best possible feature embedding.
I’ve assumed that the typical noise would be the same order of magnitude as the worst case, but now I think I was wrong about this for large l.
I’ll have to think about which is more important, the worst case or the typical case. When adding up noise one should probably use the typical case. But when calculating how many features can fit in, one should probably use the worst case.
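To make the worst-case vs. typical-case distinction concrete, here is a small simulation sketch (my own illustration with arbitrary d and m, not the setup from the post): for random unit feature directions, the RMS interference between two features scales like 1/sqrt(d), while the largest interference over many pairs sits closer to the JL-style sqrt(ln m / d) scale.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 1000, 10_000            # embedding dimension and number of features (arbitrary)

# Random unit vectors as a stand-in for feature directions in superposition.
V = rng.normal(size=(m, d))
V /= np.linalg.norm(V, axis=1, keepdims=True)

# Interference = |dot product| between distinct features; subsample pairs to keep it cheap.
i = rng.integers(0, m, size=200_000)
j = rng.integers(0, m, size=200_000)
keep = i != j
overlaps = np.abs(np.einsum("ij,ij->i", V[i[keep]], V[j[keep]]))

typical = np.sqrt(np.mean(overlaps ** 2))     # RMS interference, ~ 1/sqrt(d)
worst_sampled = overlaps.max()                # much larger, but grows only logarithmically in #pairs
jl_scale = np.sqrt(8 * np.log(m) / d)         # JL-style worst-case scale for m features

print(f"typical (RMS) interference  : {typical:.4f}")
print(f"largest sampled interference: {worst_sampled:.4f}")
print(f"JL-style worst-case scale   : {jl_scale:.4f}")
```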
Yes. Thanks for pointing this out. I changed notation and must have forgotten this one.
Some costs of superposition
And… my guess in hindsight is that the “internal double crux” technique often led, in practice, to people confusing/overpowering less verbal parts of their mind with more-verbal reasoning, even in cases where the more-verbal reasoning was mistaken.
I’m confused about this. The way I remember it, though, it was very much explicitly against this, i.e.:
- Be open to either outcome being right.
- Don’t let the verbal part give the non-verbal part a dumb name.
- Make space for the non-verbal part to express itself in its natural modality, which is often inner sim.
For me IDC was very helpful in teaching me how to listen to my non-verbal parts. Reflecting on it, I never spent much time on the actual cruxing. When IDC-ing I mostly spend time on actually hearing both sides. And when all the evidence is out, the outcome is most often obvious.
But it was the IDC lesson and the Focusing lesson that taught me these skills. Actually, even more important than the skill itself was learning that this was possible at all.
For me, probably the most important CFAR lesson was the noticing and “double-clicking” on intrusions. The one where Anna puts a glass of water on the edge of a table and/or writes expressions with the wrong number of parentheses.
Do most people come away from a CFAR workshop listening less to their non-verbal parts?
I’m not surprised if people listening less to their non-verbal parts happens at all. But I would be surprised if that’s the general trend. On the surface Anna provides one datapoint, which is not much. But the fact that she brings up this datapoint makes me suspect it’s representative? Is it?
I timed how long it took me to fill in the survey. It took 30 min. I could probably have done it in 15 min if I had skipped the optional text questions. This is to be expected, however. Every time I’ve seen someone guess how long it will take to respond to their survey, it’s been off by a factor of 2-5.
I disagree. In verbal space MARS and MATS are very distinct, and they look different enough to me.
However, if you want to complain, you should talk to the organisers, not one of the participants.
Here is their website: MARS — Cambridge AI Safety Hub
(I’m not involved in MARS in any way.)