True, I think your characterization of tiling agents is better. But my impression was that this self-trust is an important precursor for the dynamic self-modification case, where alignment properties need to be preserved through the self-modification. Yeah, I guess calling this "AI solving alignment" is sorta confused, though maybe there's something in this direction, since the AI still does the search to try to preserve the alignment properties?
Hm, I mean, yeah, if the current bottleneck is math rather than conceptualizing what math has to be done, then it's a bit more plausible. I think it ought to be feasible to get AIs that are extremely good at proving theorems, and maybe also at formalizing conjectures. Though I'd be a lot more pessimistic about finding good formal representations for describing/modelling ideas.
Do you think we are basically only bottlenecked on math, so that sufficient math skill could carry us to aligned AI? Or is there just some overhang of alignment philosophy you want to formalize, after which more philosophy will be needed?
I think there is both important math work and important conceptual work. Proving new theorems involves coming up with new concepts, but also formalizing those concepts and finding the right proofs. The analogy to robots handling the literal heavy-lifting part of a job seems apt.
Thanks.