I’d like to wait and see what various models say.
[this comment is irrelevant to the point you actually care about and is just nit-picking about the analogy]
There is a pretty big divide between “liberal” and “conservative” Christianity that is in some ways bigger than the divide between different denominations. In the US, people who think of themselves as “Episcopalians” tend to be more liberal than people who call themselves “Baptists”. In the rest of this comment, I’m going to assume we’re talking about conservative Anglicans rather than Episcopalians (those terms referring to the same denominational family), and also about conservative Baptists, since they’re more likely to be up to stuff / doing meaningful advocacy, and more likely to care about denominational distinctions. That said, liberal Episcopalians and liberal Baptists are much more likely to get along, and also to openly talk about cooperating with each other.
My guess is that conservative Anglicans and Baptists don’t spend much time at each other’s churches, at least during worship, given that they tend to have very different types of services and very different views about the point of worship (specifically about the role of the eucharist). Also there’s a decent chance they don’t allow each other to commune at their church (more likely on the Baptist end). Similarly, I don’t think they are going to have that much social overlap, altho I could be wrong here. There’s a good chance they read many of the same blogs tho.
In terms of policy advocacy, on the current margin they are going to mostly agree—common goals are going to be stuff like banning abortion, banning gay marriage, and ending the practice of gender transition. Anglican groups are going to be more comfortable with forms of state Christianity than Baptists are, altho this is lower-priority for both, I think. They are going to advocate for their preferred policies in part through denominational policy bodies, but also by joining common-cause advocacy organizations.
Both Anglican and Baptist churches are largely going to be funded by members, and their members are going to be disjoint. That said, it’s possible that their policy bodies will share large donor bases.
They are also organized pretty differently internally: Anglicans have a very hierarchical structure, while Baptists have a very decentralized structure (each congregation is its own democratic polity, and is able to e.g. vote to remove the pastor and hire a new one).
Anyway: I’m pretty sympathetic to the claim that conservative Anglicans and Baptists are meaningfully distinct power bases, altho it would be misleading not to acknowledge that they’re both part of a broader conservative Christian ecosystem with shared media sources, fashions, etc.
Part of the reason this analogy didn’t vibe for me is that Anglicans and Baptists are about as dissimilar as Protestants can get. If it were Anglicans and Presbyterians or Baptists and Pentecostals that would make more sense, as those denominations are much more similar to each other.
Further updates:
On the one hand, Nate Silver’s model now gives Trump a ~30% chance of winning in Virginia, making my side of the bet look good again.
On the other hand, the Economist model gives Trump a 10% chance of winning Delaware and a 20% chance of winning Illinois, which suggests that there’s something going wrong with the model and that it was untrustworthy a month ago.
That said, betting markets currently think there’s only a one in four chance that Biden is the nominee, so this bet probably won’t resolve.
I will spend time reading posts and papers, improving coding skills as needed to run and interpret experiments, learning math as needed for writing up proofs, talking with concept-based interpretability researchers as well as other conceptual alignment researchers
I feel like this is missing the bit where you write proofs, run and interpret experiments, etc.
As a maximal goal, I might seek to test my theories about the detection of generalizable human values (like reciprocity and benevolence) by programming an alife simulation meant to test a toy-model version of agentic interaction and world-model agreement/interoperability through the fine-structure of the simulated agents.
Do you think you will be able to do this in the next 6 weeks? Might be worth scaling this down to “start a framework to test my theories” or something like that.
the fine-structure of the simulated agents.
what does this mean?
alife
I think most people won’t know what this word means
I plan to stress-test and further flesh out the theory, with a minimal goal of producing a writeup presenting results I’ve found and examining whether the assumptions of the toy models of the original post hold up as a way of examining Natural Abstractions as an alignment plan.
I feel like this doesn’t give me quite enough of an idea of what you’d be doing—like, what does “stress-testing” involve? What parts need fleshing out?
Time-bounded: Are research activities and outputs time-bounded?
Does the proposal include a tentative timeline of planned activities (e.g., LTFF grant, scholar symposium talk, paper submission)?
How might the timeline change if planned research activities are unsuccessful?
This part is kind of missing—I’m seeing a big list of stuff you could do, but not an indication of how much of it you might reasonably expect to do in the next 5 weeks. A better approach here would be to give a list of things you could do in those 5 weeks together with estimates for how much time each thing could take, possibly with a side section of “here are other things I could do depending on how stuff goes”.
Theory of Change
I feel like this section is missing a sentence like “OK here’s the thing that would be the output of my project, and here’s how it would cause these good effects”
[probably I also put a timeline here of stuff I have done so far?]
This is valuable for you to do so that you can get a feel for what you can do in a week, but I’m not sure it’s actually that valuable to plop into the RP
Less prosaically, it’s not impossible that a stronger or more solidly grounded theory of semantics or of interoperable world-models might prove to be the “last missing piece” between us and AGI; that said, given that my research path primarily involves things like finding and constructing conceptual tools, writing mathematical proofs, and reasoning about bounds on accumulating errors—and not things like training new frontier models—I think the risk/dual-use-hazard of my proposed work is minimal.
I don’t really understand this argument. Why would having a better theory of semantics and concepts do a good job of describing what’s going on in smart AIs, but not also help people build better AIs? Like, you might think the more things you know about smart AIs, the easier it would be to build them—where does this argument break?
The thing you imply here is that it’s pretty different from stuff people currently do to train frontier models, but you already told me that scaling frontier models was really unlikely to lead to AGI, so why should that give me any comfort?
Not only would a better theory of semantics help researchers detect objects and features which are natural to the AI, it would also help them check whether a given AI treats some feature of its environment or class of object as a natural cluster, and help researchers agree within provable bounds on what concept precisely they are targeting.
This part isn’t so clear to me. Why can’t I just look at what features of the world an AI represents without a theory of semantics?
the next paragraph is kind of like that but makes a sort of novel point, so maybe they’re necessary? I’d try to focus them on saying things you haven’t yet said
(admittedly unlikely)
why do you think it’s unlikely?
On one hand, arbitrary agents—or at least a large class of agents, or at least (proto-)AGIs that humans make—might turn out to simply already naturally agree with us on the features we abstract from our surroundings; a better-grounded and better-developed theory of semantics would allow us to confirm this and become more optimistic about the feasibility of alignment.
On the other, such agents might prove in general to have inner ontologies totally unrelated to our own, or perhaps only somewhat different, but in enduring and hazardous ways; a better theory of semantics would warn us of this in advance and suggest other routes to AGI or perhaps drive a total halt to development.
I feel like these two paragraphs are just fleshing out the thing you said earlier and aren’t really needed
inner
is this word needed?
I’d now make this bet if you were down. Offer expires in 48 hours.