bpomo

Karma: 68

Interested in math, evals, macrostrategy and fun. Rookie.

bpomo 27 Jul 2026 10:43 UTC
1 point
0
in reply to: Thomas Kwa’s comment on: bpomo’s Shortform
1. The main output is a snappy HTML site, a LessWrong post, or both, listing the final (verifiable! specific!) attendee-supplied desired outcomes, ranked by something like cost-effectiveness or estimated value. If the event is good enough that notable attendees, cG, or word of mouth signal-boost it a lot, this could be quite valuable.
2. A deep dive into the list from item 1: essentially a longer version of it that includes some subset of:
  1. other options that didn’t make the cut for the snappy list
  2. evidence and arguments that convinced attendees
  3. summaries of attendee reasoning/BOTECs which led to their specific valuations
  4. vibesy outcomes which were highly valued but hard to specify
3. Improved relative valuations of various AI safety agendas in the minds of some attendees
4. Maybe prize money attached by funders to some of the outcomes from item 1, though this is tricky
I have low confidence here. I’d put ~30% on this: conditional on running something like the above, the most impactful output that the organizers are aware of is substantially different from any of the above. And I haven’t thought much about how to optimize the above.

bpomo 27 Jul 2026 10:31 UTC
1 point
0
in reply to: Kaarel’s comment on: bpomo’s Shortform
Thanks! I think this gets at: specification/verifiability is hard, but even thinking about it a little is useful.

bpomo 27 Jul 2026 10:26 UTC
1 point
0
in reply to: sanyer’s comment on: bpomo’s Shortform
I know very little about it, but it seems like extremely little. From skimming the consensus “research priorities by area” list, it seems public facing and everything-bagel-y. They identify seven areas of research priority, and they are: cyber misuse, bio/chem, child safety, mental health/consumer protection, AI agents in the economy, open weight model safety and security, and (finally) loss of control and oversight.

Most of these things are not about reducing X risk at all, and I think it is clear that this group is thinking about different things than the people I’m envisioning, who are most worried about existential risk. Also, the leading proposals in the output are quite broad. I would want leading proposals from the output of the cause prio conference to be unusually specific.

The short answer is: sort of in vibes, not at all in practice, and the main issue is a waterline for participation that is too low in terms of seriousness about X-risk and openness/rationality.

bpomo 26 Jul 2026 18:30 UTC
32 points
12
on: bpomo’s Shortform
Should we have a very exclusive AIS cause-prio sort of conference which outputs something like Hilbert problems?
I still think something like “Hilbert problems for AIS”, or just a convention of GOAT-level researchers doing super intense cause prio, is quite good.

The idea:
- Get a number of excellent AIS researchers/thinkers/funders (people with excellent end-to-end threat models/theories of change across various strategies) in the same place for 1+ days (IDK how long)
- Have people working on a very diverse set of things (e.g. China policy, specific control protocols, grassroots organizing, singular learning theory, etc.)
- At the start, have them present their best arguments for why more people should work on the things they think people should work on
- Have lots of 1-on-1s with the goal of finding cruxes or changing minds
- Instruct people to make the outcomes they want as specific and verifiably worded as possible. E.g. “scalable oversight” is not OK, but “models which we have confidence in as defined by X monitoring transcripts from all of the following operations inside of labs: Y% of internal sandboxed evals, Z% of blah blah” is better. This example is bad, but hopefully points in the direction of what I mean.
- If an outcome isn’t actually verifiable, other researchers poke at it and try to help make it more verifiable. Maybe have some incentive structure for this, IDK
- Maybe have some auction format where attendees put how much they’d pay, in percent of 2026-2028 cG dollars or some other valuation, to get various outcomes.
- Maybe have forecasters later estimate the expected cost of those outcomes in total labor/resources; then, from this aggregate view, you have a list of the most cost-effective things to try
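The last two steps (attendee bids plus forecaster cost estimates, combined into a cost-effectiveness ranking) can be sketched as below. This is a hypothetical illustration, not a worked-out mechanism: it assumes bids and cost forecasts are already expressed in the same unit (e.g. percent of a reference budget), aggregates bids with the mean and cost forecasts with the median, and the names `Outcome`, `value_per_cost`, and `rank_outcomes` are made up for this sketch.

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass
class Outcome:
    """A hypothetical verifiable outcome proposed at the conference."""
    name: str
    bids: list[float]            # each attendee's willingness to pay (assumed common unit)
    forecast_costs: list[float]  # forecasters' cost estimates, in the same assumed unit


def value_per_cost(o: Outcome) -> float:
    """Aggregate value divided by aggregate expected cost.

    Assumption: mean of bids as the value estimate, median of forecasts as the
    cost estimate (median to damp outlier forecasts). Other aggregations are
    equally defensible.
    """
    est_cost = median(o.forecast_costs)
    if est_cost <= 0:
        raise ValueError(f"non-positive cost estimate for {o.name!r}")
    return mean(o.bids) / est_cost


def rank_outcomes(outcomes: list[Outcome]) -> list[Outcome]:
    """Sort outcomes from most to least cost-effective."""
    return sorted(outcomes, key=value_per_cost, reverse=True)
```

With illustrative numbers, an outcome with bids `[2.0, 4.0]` and cost forecasts `[1.0, 2.0, 3.0]` scores 3.0 / 2.0 = 1.5, so it ranks below one with bids `[1.0, 1.0]` and a single forecast of `0.5` (score 2.0).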
Pros:
- In general you get a list of very specific things people should work towards, and now instead of researchers saying “I work on scalable oversight!” they can say “I do this thing which I think will help achieve Guntherson’s criteria in the following way” and are more likely to be doing helpful stuff
- Important researchers at this conference might update their worldview to work on better things
- The conference organizers can record discussions and gather as much of the evidence the researchers reference as they can, then present strong steelman cases for various things to work on that people can look at
- Maybe this lets you post cash prizes for some of the verifiable outcomes.
Cons:
- Hard to structure well?
- Cost + opportunity cost
- Maybe you make it less likely that researchers go exploring and find the good ideas they haven’t thought of yet, or otherwise limit future creativity. This doesn’t seem likely to be a huge cost to me: people who currently think outside the box in terms of what we are prioritizing probably wouldn’t stop just because the list of things people currently care about is better specified, ordered, and presented.

bpomo’s Shortform

bpomo 24 Jul 2026 9:58 UTC

2 points

7 comments, 1 min read

Enough is Enough: Measuring Diminishing returns to benchmark size with Item Response Theory

bpomo 14 Jul 2026 14:00 UTC

16 points

0 comments, 6 min read

bpomo 11 May 2026 14:10 UTC
3 points
0
in reply to: ariana_azarbal’s comment on: Pulling on AI Safety (with money)
I am not very confident about the first question, but I’m heartened by the fact that other people have had ideas when I asked them. The most obvious thing is probably formal verification, either for a pre-existing model or, more likely, for a new model which is built to verifiably have some nice properties. More practically, one could use trusted judges along with a less formal rubric (for problems where perfect specification is too hard), although this requires building trust over time.
The second point I’m less worried about at the moment. AI researchers are good at working on problems that have robust continuous metrics, e.g. the classic RL setup where they get a clear signal from improvement even if they don’t solve the whole thing. I think they are worst at exactly the kind of challenges inducement prizes should attack: where the final outcome can be made quite explicit, but we don’t know what methods would get us there.

Pulling on AI Safety (with money)

bpomo 11 May 2026 3:58 UTC

16 points

2 comments, 4 min read

A Fast and Loose Clustering of LLM Benchmarks

bpomo 10 Apr 2026 0:35 UTC

7 points

0 comments, 7 min read
