Feedback from tests
We can give the agent access to all of the tests or at least some of the tests either in a blackbox way (“x tests failed, please fix them”) or directly (“We tested behavior x and it failed with this error message”). This is more similar to how software engineers typically work and also how the Claude C compiler was built.
The risk of that is that the LLM can then trivially get 100% by hardcoding the response for each test case, rather than creating a generic solution.
The risk of that is that the LLM can then trivially get 100% by hardcoding the response for each test case, rather than creating a generic solution.