I’m guessing most modern interp work should be fine. Interp has moved away from “let’s do this complicated patching of attention head patterns between prompts” toward basically only interacting with residual stream activations. You can do this easily with e.g. PyTorch hooks, even in modern inference engines like vLLM. The amount of computation performed in a hook is usually trivial; I’ve never noticed a slowdown in my vLLM generations when using hooks.
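To give a sense of what I mean, here’s a rough sketch using plain transformers (the same forward-hook mechanism applies to the model modules inside vLLM, though the module paths differ); the model name and layer index are just placeholders:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.1-8B"  # placeholder model
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
tok = AutoTokenizer.from_pretrained(model_name)

captured = []

def hook(module, inputs, output):
    # For LLaMA-style decoder layers, output[0] is the residual stream,
    # shape (batch, seq_len, d_model). The compute here is trivial.
    hidden = output[0] if isinstance(output, tuple) else output
    captured.append(hidden.detach().float().cpu())
    return output  # unchanged here; you could also edit it to steer the model

handle = model.model.layers[10].register_forward_hook(hook)
with torch.no_grad():
    model(**tok("The Eiffel Tower is in", return_tensors="pt"))
handle.remove()
```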
Because of this, I don’t think batched execution would be a problem; you’d probably want some validation in the hook so it can only interact with activations from the user’s own prompt.
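As a toy sketch of the kind of validation I mean (assuming the serving layer can tell the hook which rows of the batched tensor belong to the requesting user; vLLM doesn’t hand you this out of the box, so `user_rows` here is hypothetical):

```python
import torch

def make_user_scoped_hook(user_rows: torch.Tensor, store: list):
    """Build a hook that only ever touches the requesting user's rows of a
    batched activation tensor. `user_rows` is a hypothetical index tensor the
    serving layer would have to supply; other users' activations are never read."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        store.append(hidden[user_rows].detach().cpu())  # read only the user's rows
        return output  # pass everything else through unmodified
    return hook
```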
There’s also nnsight, which already supports remote execution of PyTorch hooks on models hosted on Bau Lab machines through an API. I think they do some validation to ensure users can’t do anything malicious.
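For a flavor of what that looks like on the client side (this is my loose recollection of the nnsight trace/save pattern; exact module paths and arguments vary by model and library version, so treat it as a sketch):

```python
from nnsight import LanguageModel

# Model is hosted remotely (e.g. on NDIF / Bau Lab machines); placeholder name.
model = LanguageModel("meta-llama/Llama-3.1-8B")

with model.trace("The Eiffel Tower is in", remote=True):
    # Mark a residual-stream activation to be shipped back to the client.
    resid = model.model.layers[10].output[0].save()

print(resid)  # in some nnsight versions the tensor lives under resid.value
```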
You would need some process to handle the activation data, because it’s large. If I’m training a probe on 1M activations, with d_model = 10k and bfloat16, then that’s 20GB of data. SAEs are commonly trained on 500M+ activations. We probably don’t want the user to have access to all of this locally, but they’ll still want to do some analysis on it.
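For concreteness, the arithmetic (2 bytes per bfloat16 value):

```python
bytes_per_value = 2              # bfloat16
d_model = 10_000

probe_activations = 1_000_000
sae_activations = 500_000_000

probe_bytes = probe_activations * d_model * bytes_per_value
sae_bytes = sae_activations * d_model * bytes_per_value

print(probe_bytes / 1e9, "GB")   # 20.0 GB for the probe dataset
print(sae_bytes / 1e12, "TB")    # 10.0 TB for a 500M-activation SAE dataset
```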
Yeah, what I’m saying is that even if the computation performed in a hook is trivial, it sucks if that computation has to happen on a different computer than the one doing inference.
In nnsight, hooks are submitted via an API to run on a remote machine, so the computation is performed on the same computer as the one doing the inference. They do some validation to ensure it’s only legit PyTorch stuff, so it isn’t just arbitrary code execution.
Yeah for sure. A really nice thing about the Tinker API is that it doesn’t allow users to specify arbitrary code to be executed on the machine holding the weights, which makes security much easier.
Yeah, makes sense.
Letting users submit hooks could potentially be workable from a security angle. For the most part, only a small number of very simple operations are necessary for interacting with activations. nnsight transforms the submitted hooks into an intervention graph before running it on the remote server, and the nnsight engineers I’ve talked to thought there wasn’t much risk of malicious code execution, given the simplicity of the operations they allow.
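To illustrate why that helps (this is a toy allowlist sketch I made up, not nnsight’s actual implementation): the server only executes graph nodes drawn from a handful of tensor ops, rather than arbitrary Python:

```python
import torch

ALLOWED_OPS = {
    "matmul": torch.matmul,
    "add": torch.add,
    "mul": torch.mul,
    "index": lambda x, idx: x[idx],
    "save": lambda x: x,  # mark a value to be returned to the client
}

def run_graph(graph, resid):
    """Execute a dict of name -> (op, args) nodes against a captured activation.
    Anything outside the allowlist is rejected before it ever runs."""
    values = {"resid": resid}
    for name, (op, args) in graph.items():
        if op not in ALLOWED_OPS:
            raise ValueError(f"operation {op!r} not allowed")
        resolved = [values[a] if isinstance(a, str) else a for a in args]
        values[name] = ALLOWED_OPS[op](*resolved)
    return values

# e.g. project the residual stream onto a probe direction the user supplied
resid = torch.randn(8, 10_000)
graph = {
    "scores": ("matmul", ["resid", torch.randn(10_000)]),
    "out": ("save", ["scores"]),
}
result = run_graph(graph, resid)["out"]  # shape (8,)
```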
However, this is still a far larger attack surface than no remote code execution at all, so it’s plausible this would not be worth it for security reasons.