I thought about this sort of thing (adversarial robust augmentation) and decided it would be very hard to do it safely with something smarter than you.
However, there may in fact be a window where LLMs are good at math but not at agency, and in that window they could be used to massively accelerate agent foundations research.
agent foundations research is what I’m talking about, yup. what do you ask the AI for, to make significant progress on agent foundations and be sure you did so correctly? are there questions we can ask where, even if we don’t know the entire theorem we want a proof of, we can show there aren’t many ways of filling in the rest of the theorem that could be of interest, so that we could, e.g., ask an AI to enumerate which theorems could have a given combination of agency-relevant properties? something like that. I’ve been procrastinating on making a whole post pitching this because I myself am not sure the idea has merit, but maybe there’s something to be done here, and if there is, it seems like it could be a huge deal. it might be possible to ask for significantly more complicated math to be solved, maybe if you can frame it as looking for plausible compressions, simplifications, or generalizations of an expression, or something.