Independent AI safety researcher
Alex Flint
Very very good question IMO. Thank you for this.
Consider a person who comes to a very clear understanding of the world, such that they are extremely capable of designing things, building things, fundraising, collaborating, and so on. Consider a moment where this person is just about to embark on a project but has not yet acquired any resources, perhaps has not even made any connections with anyone at all, yet is highly likely to succeed in their project when they do embark. Would you say this person has “resources”? If not, there is a kind of continuous trading-in that will take place as this person exchanges their understanding of things for resources, then later for final outcomes. Is there really a line between understanding-of-things, resources, and outcomes? The interesting part is the thing that gives this person power, and that seems to be their understanding-of-things.
I’m strongly disinclined to delve into the matter of consent in the sexual encounter, as it primarily pertains to (alleged) misconduct by Alex/Koshin (who I don’t really know), whereas the accusations of organizational malfeasance (e.g. a cover-up) pertain to all of MAPLE/OAK/CEDAR (where I do know several people, and which I’m just going to call MAPLE going forward).
Yeah thank you for this.
In particular, I’m noticing that Koshin described having been asked to write a letter with Shekinah, describing their relationship status and intentions, while Shekinah described having been pressured into signing a letter which Soryu had instructed Koshin to write.
Basically what happened is that Soryu asked us to write a letter together (on a phone call with me), then I told Shekinah that we had been asked to write a letter together, that she wasn’t obliged to, that we could write it as we saw fit, and she said okay. Then I wrote a draft and asked Shekinah what she thought, whether we should change anything etc etc, and she said no it’s fine, and then we signed it.
Well no I definitely did not rape Shekinah. I don’t think even she accuses me of that in her post.
It’s been quite a difficult few weeks at this end, which is why I haven’t replied more to your comment. I see the following points in your comment:
1. The paragraph that goes “So firstly I want to flag that this observation is consistent with the world you assert… But it’s also consistent with a different world, where those things are straightforwardly revealing of failures on the part of yourself and/or Monastic Academy”, where you critique my non-linking to Shekinah’s medium post.
2. The part where you critique my talking about “this darkness that lies at the heart of various rationalist orgs” in response to shminux’s post.
3. The part that goes “I want to flag that shminux seemed to make several criticisms …”, where you mention that I didn’t respond to all of shminux’s points.
I believe I have responded to (1). Given that you’ve apparently decided that I’m definitely a rapist (“I now believe we’re in substantially this world. Alex raped Shekinah.”), are you interested in further dialog on (2) or (3), and are there any further points that I’ve missed?
No
I understand that there are ways this can work really well for people but jesus christ the failure modes on that are numerous and devastating.
I really agree with this. The reason spiritual communities can go more quickly and more disastrously off the rails is that they are aiming to tinker with the rules by which we live at a really fundamental level, whereas most organizations opt to work on top of a kind of cultural base operating system.
I would generally consider it unwise to tinker with one’s operating system at all, except that our cultural operating system seems so unable to address some really, really huge and pressing problems, including, as it seems to me, all of x-risk.
I think part of what the rationalist community has done well (that incidentally I think EA has done less well) is be willing to discard some of the cultural operating system we inherited, in a deliberate and goal-oriented way.
Yeah right. I actually spent quite a while considering this exact point (whether to link it) when writing the post. I was basically convinced that if I did link it, many people would jump straight to that link after reading the first ~paragraph of my post, then would return to read my post holding the huge number of triggering issues raised in Shekinah’s post, and ultimately I’d fail to convey the basic thing I wanted to convey. Then I considered “yes but maybe it’s still necessary to link it if my post won’t make any sense without reading that other post” but I decided that it wasn’t really a necessary prerequisite, so ultimately I didn’t link it.
In the dynamical systems example, it’s not just that it isn’t a necessary prerequisite. If you go to the Wikipedia page for dynamical systems and start learning the subject from scratch, intending to do it quickly and then return to the previous post, you’ll end up frustrated at the hugeness of the topic, because it isn’t something you can learn in a short time. Then you’ll return to the post about optimization with a mind already bubbling with oodles of concepts, which will make the simple point of the optimization essay hard to digest. That’s my sense of it, and this is the way the example is similar to my not linking to Shekinah’s post.
Thanks for taking the time to write this comment philh.
So firstly I want to flag that this observation is consistent with the world you assert, where Shekinah’s writing and the associated commentary suggest things in a way that makes it hard to read them and maintain a grip on what is and isn’t asserted, what is and isn’t true, and similar things that it’s important to keep a grip on.
Yup this is a good paraphrase of what I meant.
In that world, declining to link those things is… well, I don’t love it; I prefer not to be protected from myself.
Yup. Well I try to write in a way that conveys a point as straightforwardly as possible, and I judged that linking to the medium post would hinder that goal. I may have been wrong about this but I wouldn’t say that I was trying to protect the reader from themselves (and I agree that trying to protect readers from themselves when writing on the internet is rarely helpful).
[Meta: I’m now going to try to compare this to some imperfectly analogous situations and I want to flag that using imperfect analogies in the context of accusations of sexual assault is kind of dangerous because the non-rhyming aspects of the analogies can appear kind of flippant or rude if taken to be rhyming aspects.]
Analogy: I wrote a while ago about optimization. The post had a lot of connections with dynamical systems. I didn’t link to or discuss the connections with dynamical systems much, beyond a general nod in that direction, because I judged that it didn’t do much to illuminate the topic. In doing so I wouldn’t say that I was protecting the reader from themselves; I was making a judgement about how to present the thing in a straightforward way. Now one might say that dynamical systems was the most important thing to link, because the whole content of my own post was building on top of that foundation. But just as with my not linking to Shekinah’s article, I mentioned the existence of the dynamical systems literature in my post, and anybody who wanted to look it up could easily find the relevant content via a Google search. I had the sense that linking it explicitly would suggest that the reader ought to either understand the main concepts at the other end of the link or else not expect to understand my own post, neither of which was true w.r.t. dynamical systems in that post or w.r.t. Shekinah’s article in this post.
[Am intending to reply more to your further points. Thank you again for taking the time to go into this.]
Yeah, as Ruby said, this is a community that I care about and publish in, and is where Shekinah linked and discussed her own post. I also want to stand for the truth! I’ve been in this org (Maple) for a while and I think it has a lot to offer the world, and I think it’s been really badly mischaracterized, and I care about sharing what I know, and this community cares a lot about the truth in a way that is different to most of the rest of the world. IMO the comments on this post so far are impressively balanced, considerate, epistemically humble, and just generally intelligent. I can’t think where else to have such a reasonable discussion about something like this!
(Good question btw!)
Should you have, say, stepped down and distanced yourself from the organization the moment the “monastic agreement” was broken...
Well just so you know, I actually did step down right after the incident. It was a bit of a mess because I stepped down informally the day after I told the community what had happened, then we decided that this action was hasty and hadn’t given the board of directors time to make their own assessment, so we reversed it, then about a week later the board of directors agreed that I should step down and I did so. You can imagine how on edge everyone in the org was at a time like this. I certainly had no real power from the moment I first stepped down.
But why would it be good for me to distance myself from the community? In general I have the sense that when you make a mistake like this you should stay and help and do your best to face the consequences; besides, it was all so psychologically terrible that I really benefited at that time from structured spiritual practice.
the relevant reddit thread and the medium post (which you didn’t link, why?)
I didn’t link them because they are both so emotionally charged, and at the same time invoke all the maximally triggering stuff (gurus, sexual assault, cover-ups, and so on), that it’s very hard to read them and stay sane. I see in your comment that you start with quite an understanding tone and then by the end you’re talking about this darkness that lies at the heart of various rationalist orgs. The meme complexes in Shekinah’s post do give this very strong suggestion of a kind of sexual darkness in the hearts of various men, but it’s more of a very powerful subtext than something she really argues for.
Yeah thank you for the note.
Just so you know, though, it was actually a 7-day meditation retreat within a one-month stay at the monastic community (and during the non-retreat weeks of the program we spend time meeting with each other, using computers, going shopping for groceries and such, in addition to 1-2 hours of sitting each morning and evening). It’s true that the residents did a long yaza on one night of the retreat, but it wasn’t required, though yes, it was still quite a lot for someone who hasn’t sat a retreat before.
It was an intense retreat, and it’s true that Shekinah didn’t have prior retreat experience. One of the things we’ve changed since then is making it more difficult, and more explicit, for folks to enter the more intense parts of the training. But it’s complicated… the most intense retreats in my experience are the ones that are sublime and simple and don’t involve any big experiences at all, but just open you to something so simple that you can’t ever quite forget it. You never really know whether that’s going to happen for someone, and in any “consent” process most everyone will say yes, they want to do this, but the aftermath of such happiness can leave one’s carefully arranged life in disarray.
This is one reason I think it’s so valuable to live full-time in spiritual community while doing periodic retreat, but as we are learning with Shekinah and others, it’s difficult to know what to do when folks come and have quite powerful experiences and then decide later that it was all a big trick of some sort. The grief and sadness that arises in this case is just enormous. You feel as if you’ve been tricked into believing, just for a moment, that everything you ever dreamed of is completely feasible, only to have it ripped out from underneath you upon returning to the heaviness of day-to-day drudgery. Then people get mad. What to do? Sign a disclaimer? Require people to renounce their whole lives before starting the training? It’s so hard. We do have a way to do this now but it’s such a band-aid. Would love to write/discuss more on this.
Gracefully correcting uncalibrated shame
That the predictor is perfectly accurate. [...]
Consider, for instance, if the predictor makes an incorrect decision once in 2^10 times. (I am well aware that the predictor is deterministic; I mean the predictor deterministically making an incorrect decision here.)
Yeah, it is not very reasonable to assume that the predictor/reporter pair would be absolutely accurate. We likely need to weaken the conservativeness requirement, as per this comment.
That the iteration completes before the heat death of the universe.
Consider the last example, but with 500 actuators. How many predictor updates do we need to do? …2^499. How many operations can the entire universe have completed from the big bang to now? ~10^120. Or about 2^399. Oops.
Well yes, the iteration scheme might take unreasonably long to complete. Still, the existence of the scheme suggests that it would in principle be possible to extrapolate far beyond what seems reasonable to us, which still points toward the infeasibility of automated ontology identification (under the safety and generalization guarantees we assume).
That storing the decision boundary is possible within the universe.
Consider the last example again. The total statespace is 2^500 states. How many bits do we need to store an arbitrary subset of said states? 2^500. How many bits can the universe contain? ~10^90. Or about 2^299. Oops.
Indeed
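As a quick check of the quoted arithmetic (nothing here beyond the numbers already given above): $\log_2 10^{120} = 120 \log_2 10 \approx 399$ and $\log_2 10^{90} = 90 \log_2 10 \approx 299$, so the $2^{499}$ required predictor updates exceed the $\sim 2^{399}$ available operations by a factor of about $2^{100} \approx 10^{30}$, and the $2^{500}$ bits needed to store an arbitrary decision boundary dwarf the $\sim 2^{299}$ bits the universe can hold.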
That there exists any such predictor.
I didn’t follow your argument here, in particular the part under “If P returns K=S:”
Yeah we certainly can’t do better than the optimal Bayes update, and you’re right that any scheme violating that law can’t work. Also, I share your intuition that “iteration can’t work”—that intuition is the main driver of this write-up.
As far as I’m concerned, the central issue is: what actually is the extent of the optimal Bayes update in concept extrapolation? Is it possible that a training set drawn from some limited regime might contain enough information to extrapolate the relevant concept to situations that humans don’t yet understand? The conservation of expected evidence isn’t really sufficient to settle that question, because the iteration might just be a series of computational steps towards a single Bayes update (we do not require that each individual step optimally utilize all available information).
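For reference, the law being appealed to here is conservation of expected evidence: for any hypothesis $H$ and any evidence-gathering process $E$,

$$\mathbb{E}_{e \sim P(E)}\big[P(H \mid E = e)\big] \;=\; \sum_{e} P(E = e)\, P(H \mid E = e) \;=\; P(H),$$

so no procedure can expect, in advance, to move its credence in a predetermined direction.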
Well just so you know, the point of the write-up is that iteration makes no sense. We are saying “hey suppose you have an automated ontology identifier with a safety guarantee and a generalization guarantee, then uh oh it looks like this really counter-intuitive iteration thing becomes possible”.
However, it’s not quite as simple as ruling out iteration by appealing to conservation of expected evidence, because it’s not clear exactly how much evidence is in the training data. Perhaps there is enough information in the training data to extrapolate all the way to . In this case the iteration scheme would just be a series of computational steps that implement a single Bayes update. Yet for the reasons discussed under “implications” I don’t think this is reasonable.
Ah so I think what you’re saying is that for a given outcome, we can ask whether there is a goal we can give to the system such that it steers towards that outcome. Then, as a system becomes more powerful, the range of outcomes that it can steer towards expands. That seems very reasonable to me, though the question that strikes me as most interesting is: what can be said about the internal structure of physical objects that have power in this sense?
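One crude way to write down that paraphrase (my notation, not anything from the original post): if $\mathcal{G}$ is the set of goals we can give a system $S$ and $\mathcal{O}$ is the set of possible outcomes, then

$$\mathrm{Pow}(S) \;=\; \{\, o \in \mathcal{O} \;:\; \exists g \in \mathcal{G} \text{ such that } S, \text{ when given goal } g, \text{ reliably steers the world into } o \,\},$$

and “becoming more powerful” corresponds to $\mathrm{Pow}(S)$ growing under set inclusion.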
The space of cases to consider can be large in many dimensions. The countable limit of a sequence of extensions need not be a fixed point of the magical improvement oracle.
Indeed. We may need to put a measure on the set of cases and make a generalization guarantee that refers to solving X% of remaining cases. That would be a much stronger generalization guarantee.
The style of counter-example is to construct two settings (“models” in the lingo of logic) A and B with the same labeled easy set (and the same context made available to the classifier), where the correct answer for some datapoint x differs between the two settings.
I appreciate the suggestion but I think that line of argument would also conclude that statistical learning is impossible, no? When I give a classifier a set of labelled cat and dog images and ask it to classify which are cats and which are dogs, it’s always possible that I was really asking some question that was not exactly about cats versus dogs, but in practice it’s not like that.
Also, humans do communicate about concepts with one another, and they eventually “get it” with respect to each other’s concept boundaries, and it’s possible to see that someone “got it” and trust that they now have the same concept that I do. So it seems possible to learn concepts in a trustworthy way from very small datasets, though it’s not a very “black box” kind of phenomenon.
Presumably, the finite narrow dataset did teach me something about your values? [...] “out-of-distribution detection.”
Yeah right, I do actually think that “out of distribution detection” is what we want here. But it gets really subtle. Consider a model that learns, when answering “is the diamond in the vault?”, that it’s okay for the physical diamond to be in different physical positions and orientations in the vault. So even though it has not seen the diamond in every possible position and orientation within the training set, it’s still not “out of distribution” to see the diamond in a new position and answer the question confidently. And what if the diamond is somehow left/right mirror-imaged while it is in the vault? Well, that too is probably fine for diamonds. But now suppose that instead of a diamond in a vault, we are learning to do some kind of robotic surgery, and the question we are asking is “is the patient healthy?”. In this case too we would hope that the machine learning system learns that it’s okay for the patient to undergo (small) changes of physical position and orientation, so that much is not “out of distribution”, but here we really would not want to move ahead with a plan that mirror-images our patient, because then the patient wouldn’t be able to eat any food that currently exists on Earth and would starve. So it seems like the “out of distribution” property we want is really “out of distribution with respect to our values”.
Now you might say that mirror-imaging ought to be “out of distribution” in both cases, even though it would be harmless in the case of the diamond. That’s reasonable, but it’s not so easy to see how our reporter would learn that on its own. We could just outlaw any very sophisticated plan but then we’re losing competitiveness with systems that are more lax on safety.
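To make the worry concrete, here is a toy sketch (entirely my own illustration; the keypoint representation, featurization, and thresholds are made up for the example): a purely statistical OOD detector whose features happen to be mirror-invariant treats a mirror-imaged state as perfectly in-distribution in both the diamond case and the surgery case, even though only the first is harmless.

```python
import numpy as np

# Toy illustration (not a proposal from the discussion above): a purely
# statistical OOD detector whose features are mirror-invariant, so a
# left/right mirror image is never flagged, whether or not mirroring
# matters to our values (diamond: harmless; surgery patient: catastrophic).
rng = np.random.default_rng(0)

def features(keypoints):
    # Pairwise distances between keypoints: invariant to translation,
    # rotation, and mirroring of the object.
    diffs = keypoints[:, None, :] - keypoints[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    return dists[np.triu_indices(len(keypoints), k=1)]

def random_rotation():
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1  # proper rotation only (no reflection)
    return q

# "Training distribution": the same object seen in many poses with jitter.
base = rng.normal(size=(6, 3))  # 6 keypoints in 3D
train = np.stack([
    features(base @ random_rotation() + rng.normal(scale=0.02, size=base.shape))
    for _ in range(500)
])
mu = train.mean(axis=0)
cov = np.cov(train.T) + 1e-6 * np.eye(train.shape[1])

def ood_score(keypoints):
    z = features(keypoints) - mu
    return float(z @ np.linalg.solve(cov, z))  # squared Mahalanobis distance

mirrored = base * np.array([-1.0, 1.0, 1.0])  # left/right mirror image
deformed = base * 1.5                         # a genuinely different shape
print("new pose: ", ood_score(base @ random_rotation()))  # small: in-distribution
print("mirrored: ", ood_score(mirrored))                  # also small: mirroring is invisible here
print("deformed: ", ood_score(deformed))                  # large: flagged as OOD
```

The detector does flag genuinely novel shapes, but the change that matters to our values in the surgery case (mirroring) is invisible to it, which is the sense in which plain statistical OOD detection falls short of “out of distribution with respect to our values”.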
it sure seems like they’re meeting both a Safety requirement (not generating non-faces) and a Generalization requirement (generating new faces that weren’t in the training dataset). What am I missing?
Well, we might have a predictor that is a perfect statistical model of the thing it was trained on, but the ontology identification issue is about what kinds of safety-critical questions can be answered based on the internal computations of such a model. So in the case of a GAN, we might try to answer “is this person lying?” based on a photo of their face, and we might hope that, having trained the GAN on the general-purpose face-generation problem, its latent variables contain the features we need to do visual lie detection. Now even if the GAN does perfectly safe face generation, we need some additional work to get a safety guarantee on our reporter, and this is difficult because we want to do it based on a finite narrow dataset.
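As a very rough sketch of the shape of that pipeline (everything below is hypothetical: the random-projection `encode` stands in for reading off a real GAN’s latent variables, and the images and labels are synthetic), the reporter would be a small probe trained on the frozen predictor’s latents from a narrow labeled dataset:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical sketch only: the "predictor" is a frozen face GAN, and encode()
# stands in for recovering its latent variables for a given photo (e.g. via
# GAN inversion). The "reporter" is a small probe trained on those latents
# using a narrow human-labeled dataset for "is this person lying?".
rng = np.random.default_rng(0)
IMAGE_DIM, LATENT_DIM, N_LABELED = 256, 32, 200

projection = rng.normal(size=(IMAGE_DIM, LATENT_DIM))

def encode(images):
    # Stand-in for the GAN's latent representation of each image.
    return images @ projection

# A finite, narrow labeled dataset (synthetic here).
images = rng.normal(size=(N_LABELED, IMAGE_DIM))
labels = rng.integers(0, 2, size=N_LABELED)

reporter = LogisticRegression(max_iter=1000).fit(encode(images), labels)

# At deployment the predictor is untouched; the reporter answers the
# safety-relevant question from its latents. The hard part, as discussed
# above, is getting any guarantee about these answers off-distribution.
new_images = rng.normal(size=(5, IMAGE_DIM))
print(reporter.predict_proba(encode(new_images))[:, 1])
```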
One further thought: suppose we trained a predictive model to answer the same question as the reporter itself, and suppose we stipulated only that the reporter ought to be as safe and general as the predictor is. Then we could just take the output of the predictor as the reporter’s output and we’d be done. Now what if we trained a predictive model to answer a question that was a kind of “immediate logical neighbor” of the reporter’s question, such as “is the diamond in the left half of the vault?” where the reporter’s question is “is the diamond in the vault?” Then we also should be able to specify and meet a safety guarantee phrased in terms of the relationship between the correctness of the reporter and the predictor. Interested in your thoughts on this.
👍
Well keep in mind that we are not proposing “iterated ontology identification” as a solution to the ELK problem, but rather as a reductio ad absurdum of the existence of any algorithm fulfilling the safety and generalization guarantees that we have given. Now here is why I don’t think it’s quite so easy to show a contradiction:
In the case of the 99% safety guarantee, you can just train a bunch of separate predictor/reporter pairs on the same initial training data and take the intersection of the regions they accept (loosely, the intersection of their decision boundaries) to get a 99.9% guarantee. Then you can sample more data from that region and do the iteration like that.
Now this assumes that each of the predictor/reporter pairs has an independent 99% safety guarantee, and you might say that they are trained on the same training data, so this independence won’t hold. But then we can use completely different sensor data—camera data, lidar data, microphone data—for each of the pairs, and proceed that way. We can still iterate the overall scheme.
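To spell out the arithmetic under that (admittedly strong) independence assumption: if each of $k$ pairs wrongly accepts a given case with probability at most $0.01$, then the intersection wrongly accepts it only when all $k$ pairs are wrong on it, so $P(\text{intersection wrong}) \le 0.01^{k}$; already for $k = 2$ this is $10^{-4} < 10^{-3}$, clearing the 99.9% figure mentioned above.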
The basic issue here is that it is just extremely challenging to get generalization with a safety guarantee. It is hard to see how it could be accomplished! We suspect that this is actually formally impossible, and that’s what we set out to show, though we came up short of a formal impossibility result.
Yeah I am also very pessimistic about having the core argument about sexual assault on the public internet so I agree with not trying to resolve that part right here.
Got it! Sorry! I really thought you were directly critiquing my non-linking to Shekinah’s post. I think I read your comment in the midst of feeling wrongfully accused about stuff and didn’t read as carefully as I should have.
Ok so yeah I really agree about keeping in mind that there are other possible explanations, and the value of that for not over-weighting the first plausible explanation found.
It’s hard though. In this particular case you might point out an alternative explanation for my actions, and I might respond “yeah but I remember reasoning in such and such a way”. That could be introduction of new evidence, too.
Yet memories about intentions and mental states quickly become extremely fuzzy. Sometimes it’s better to go based on concrete actions taken.
I won’t expand on (2) or (3) for now then. Just noting this for readers who are evaluating my helpfulness/unhelpfulness on this thread (which I support readers doing btw!). Sorry it was such a long time between comments. I may not have come back at all if you hadn’t pointed out my long absence, so thank you for doing that.