I’m writing a book about epistemology. It’s about the Problem of the Criterion, why it’s important, and what it can tell us about how we approach knowing the truth.
I’ve also written a lot about AI safety. Some of the more interesting stuff can be found at the site of my currently-dormant AI safety org, PAISRI.
Maybe I’m missing something, but this theory seems to leave out what’s usually the most important aspect of preference models: which things are preferred to which. Considering only X > ~X omits the many obvious cases of X > Y that we’d like to model.
The usual problem is that, unlike simple models, we are sensitive to time and context: we might feel X > Y under conditions Z, but Y > X under conditions W. This is sufficient to explain our seemingly inconsistent preferences, because they only look inconsistent on the assumption that we should hold the same preferences at all times and under all circumstances. Including context, say by indexing every preference relation with a time variable, is probably sufficient to rescue the standard preference model: our preferences are consistent at each moment in time, but not necessarily across moments, because the conditions of each moment differ and thus change what we prefer.
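To make the point concrete, here’s a minimal sketch of a context-indexed preference relation. The outcomes, contexts, and the `prefers` helper are my own toy illustration, not part of the theory under discussion; the point is just that consistency requirements apply within a context rather than globally.

```python
# Each context carries its own ordering over outcomes, best first.
# A preference x > y only makes sense relative to a context, so
# apparent reversals across contexts are not contradictions.
RANKINGS = {
    "morning": ["coffee", "tea", "water"],   # conditions Z
    "evening": ["tea", "water", "coffee"],   # conditions W
}

def prefers(x, y, context):
    """True iff x > y within the given context's ordering."""
    order = RANKINGS[context]
    return order.index(x) < order.index(y)

assert prefers("coffee", "tea", "morning")   # X > Y under Z
assert prefers("tea", "coffee", "evening")   # Y > X under W
# Axioms like transitivity and asymmetry need only hold per-context.
```

On this picture, the agent never violates the standard axioms at any single moment; the appearance of inconsistency comes from comparing preferences drawn from different contexts as if they belonged to one ordering.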