I genuinely don’t know what you’re referring to.
Fwiw, I’m linking to it because I think it’s the first/clearest demo of how the entire ML research workflow (e.g. see figure 1 in the arXiv paper) can plausibly be automated using LM agents, and they show a proof of concept which arguably already does something (in any case, it works better than I’d have expected it to). If you know of a better reference, I’d be happy to point to that instead/alternatively. Similarly if you can ‘debunk it’ (I don’t think it’s been anywhere near debunked).
We had this conversation two weeks ago?
https://www.lesswrong.com/posts/rQDCQxuCRrrN4ujAe/jeremy-gillen-s-shortform?commentId=TXePXoEosJmAbMZSk
I thought you meant the AI scientist paper has some obvious (e.g. methodological or code) flaws or errors. I find that thread unconvincing, but we’ve been over this.
It doesn’t demonstrate automation of the entire workflow (you have to, for instance, tell it which topic to think of ideas about and seed it with examples), and the automated reviewer also rejected the autogenerated papers. (Which, considering how sycophantic LLMs tend to be, really reflects very negatively on paper quality, IMO.)