For what it’s worth I do something pretty similar to this using Claude Code. I can just point it at the tests and say “hey, make the code make this pass” and it will come up with something, although the something isn’t always what I want if the tests are underspecified!
That is very important; in fact Unvibe passes the source code of the tests to the LLM. But in the LLM's latent space there are many possible implementations that the model can't distinguish on its own. Having the unit-test score as ground truth helps a lot.
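The idea of using the test suite's pass rate to rank candidate implementations can be sketched roughly like this (a minimal illustration, not Unvibe's actual API; the test cases and candidates here are made up):

```python
import unittest

def score_candidate(impl):
    """Run a small test suite against one candidate and return its pass fraction."""
    class CandidateTests(unittest.TestCase):
        def test_small(self):
            self.assertEqual(impl(1, 2), 3)
        def test_identity(self):
            self.assertEqual(impl(-1, 1), 0)

    suite = unittest.defaultTestLoader.loadTestsFromTestCase(CandidateTests)
    result = unittest.TestResult()
    suite.run(result)
    total = result.testsRun
    return (total - len(result.failures) - len(result.errors)) / total

# Two implementations an LLM might plausibly emit for "combine two numbers";
# only the test score tells them apart.
candidates = [lambda a, b: a + b, lambda a, b: a * b]
scores = [score_candidate(c) for c in candidates]
best = candidates[scores.index(max(scores))]
print(scores)  # → [1.0, 0.0]: the tests act as ground truth for ranking
```

The point is that both candidates look equally plausible as text, but executing the tests gives an objective score that picks the right one.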