ai safety; physics
skunnavakkam
Compute Verification on Short Timelines
Takes on Automating Alignment
Simulated Qualia Mugging
I think my issue with the LW wiki is that it relies too much on LessWrong? It seems like the expectation is that you click on a tag, which contains / is assigned to a number of LW posts, and then you read through the posts. This is not how other wikis / encyclopedias work!
My gold standard for a technical wiki (other than Wikipedia) is the chessprogramming wiki: https://www.chessprogramming.org/Main_Page
I agree with this
I’ve found that the LW wiki doesn’t work as a Wikipedia-like resource, at least for me
skunnavakkam’s Shortform
How useful is a wiki for alignment? There doesn’t seem to be one now.
norm \in \mathbf{R}, doesn’t matter
I’ve found the part about applying random search to be among the best takeaways I had from PAIR! Novelty for the sake of novelty is not a terrible idea. Specifically, I’ve found that even if you don’t like the things you do, doing them makes it much easier to then make progress towards the larger goal.
Empirically, AI progress seems to have been very continuous, though rapid, and the gaps between model releases are smaller because releases happen more frequently. I don’t discount the possibility of model capabilities becoming very discontinuous in a few years, but it seems unlikely to me. This makes me pretty happy.
Also, it seems like doing alignment without a measurable goal is a difficult task for humans too. We might be a little better at estimating intermediate rewards, but I think measuring alignment is necessary on both fronts.