I see someone’s been writing long-horizon solvers.
Good engineering instincts.
They do behave like this in practice, if you have the patience to manage the testing surface and run the evals. In my experience, this kind of thing is fairly straightforward to do on a single model but truly painful to generalize across models. That’s been getting easier, though, as the models become richer reasoners and the tooling becomes more uniform.
The framing as fiction makes it more engaging, I think.
It makes me think about the layers you would need to make something like the Terrarium work without drift.
The implication is a board or ticket system of some kind, but I think it would need to be custom.
The hardest piece to implement would be the complex virtual currency routing, as laid out in the story. IRL, you’re probably looking at multiple subsystems.
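For what it's worth, here is roughly how I picture the smallest version of two of those subsystems: a ticket board and a credit ledger with escrow. This is only a sketch under my own assumptions. The story doesn't specify any of it, and every name here (`Ledger`, `Board`, `Ticket`, the `escrow` account, the open/claimed/done states) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Ledger:
    """Hypothetical credit ledger: balances only change through transfer()."""
    balances: Dict[str, int] = field(default_factory=dict)

    def transfer(self, src: str, dst: str, amount: int) -> None:
        # Reject non-positive amounts and overdrafts so credits are conserved.
        if amount <= 0:
            raise ValueError("amount must be positive")
        if self.balances.get(src, 0) < amount:
            raise ValueError(f"{src} has insufficient credits")
        self.balances[src] -= amount
        self.balances[dst] = self.balances.get(dst, 0) + amount


@dataclass
class Ticket:
    ticket_id: int
    poster: str
    bounty: int
    status: str = "open"  # assumed lifecycle: open -> claimed -> done
    assignee: Optional[str] = None


class Board:
    """Hypothetical ticket board that holds each bounty in escrow."""
    ESCROW = "escrow"

    def __init__(self, ledger: Ledger) -> None:
        self.ledger = ledger
        self.tickets: Dict[int, Ticket] = {}

    def post(self, poster: str, bounty: int) -> Ticket:
        # Move the bounty into escrow first; a failed transfer posts nothing.
        self.ledger.transfer(poster, self.ESCROW, bounty)
        ticket = Ticket(len(self.tickets) + 1, poster, bounty)
        self.tickets[ticket.ticket_id] = ticket
        return ticket

    def claim(self, ticket_id: int, agent: str) -> None:
        ticket = self.tickets[ticket_id]
        if ticket.status != "open":
            raise ValueError("ticket is not open")
        ticket.status = "claimed"
        ticket.assignee = agent

    def complete(self, ticket_id: int) -> None:
        ticket = self.tickets[ticket_id]
        if ticket.status != "claimed" or ticket.assignee is None:
            raise ValueError("ticket has not been claimed")
        # Pay the assignee out of escrow, then close the ticket.
        self.ledger.transfer(self.ESCROW, ticket.assignee, ticket.bounty)
        ticket.status = "done"


if __name__ == "__main__":
    ledger = Ledger({"planner": 10})
    board = Board(ledger)
    ticket = board.post("planner", 4)
    board.claim(ticket.ticket_id, "worker")
    board.complete(ticket.ticket_id)
    print(ledger.balances)
```

Keeping the ledger separate from the board is the point: the board can only move credits through `transfer`, so the total stays constant no matter what state the tickets end up in. Real routing would need more than this, which is why I'd expect multiple subsystems.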