David Matolcsi comments on David Matolcsi’s Shortform

David Matolcsi 17 May 2025 7:44 UTC
17 points
1
That’s not how I see it. I think the argument tree doesn’t go very deep before I lose the thread. Here are a few slightly stylized but real conversations I had with friends who had no context on what ARC was doing, when I tried to explain our research to them:
Me: We want to do Low Probability Estimation.
Them: Does this mean you want to estimate the probability that ChatGPT says a specific word after 100 words of chain of thought? Isn’t this clearly impossible?
Me: No, you see, we want to estimate the probabilities only as well as the model knows them.
Them: What does this mean?
Me: [I can’t answer this question.]
Me: We want to do Mechanistic Anomaly Detection.
Them: Isn’t this clearly impossible? Won’t this result in a lot of false positives when anything out of distribution happens?
Me: Yes, that’s why we have this clever new idea of relying on the fragility of sensor tampering: if you delete a subset of the actions, you will get an inconsistent image.
Them: What if the AI builds another robot to tamper with the cameras?
Me: We actually don’t want to delete actions, but rather heuristic arguments for why the cameras will show something, and we want to construct heuristic explanations in a way that they carry over through delegated actions.
Them: What does this mean?
Me: [I can’t answer this question.]
Me: We want to create Heuristic Arguments to explain everything the model does.
Them: What does it mean that an argument explained a behavior? What is even the type signature of heuristic arguments? And you want to explain everything a model does? Isn’t this clearly impossible?
Me: [I can’t answer this question.]
When I was explaining our research to outsiders (which I usually tried to avoid out of cowardice), we usually got to some of these points within minutes. So I wouldn’t say these are fine details of our agenda.
During my time at ARC, the majority of my time was spent asking Mark and Paul variations of these three questions. They always kindly answered, and the answer was convincing-sounding enough in the moment that I usually couldn’t really reply on the spot, and then I went back to my room to think through their answers. But I never actually understood their answers, and I can’t reproduce them now. Really, I think that was the majority of the work I did at ARC. When I left, you guys should have bought a rock with “Isn’t this clearly impossible?” written on it, and that would have profitably replaced my presence.
That’s why I’m saying that either ARC’s agenda is fundamentally unsound or I’m still missing some of the basics. The only thing standing between ARC’s agenda and its collapse after five minutes of questioning from an outsider is that Paul and Mark (and maybe others on the team) have some convincing-sounding answers to the three questions above. So I would say that these answers are really part of the basics, and I never understood them.
Maybe Mark will show up in the comments now to give answers to the three questions, and I expect the answers to sound kind of convincing, and I won’t have a very convincing counter-argument other than some rambling reply saying essentially that “I think this argument is missing the point and doesn’t actually answer the question, but I can’t really point out why, because I don’t actually understand the argument because I don’t understand how you imagine heuristic arguments”. (This is what happened in the comments on my other post, and thanks to Mark for the reply and I’m sorry for still not understanding it.) I can’t distinguish whether I’m just bad at understanding some sound arguments here, or the arguments are elaborate self-delusions of people who are smarter and better at arguments than me. In any case, I feel epistemic learned helplessness on some of these most basic questions in ARC’s agenda.
- fencebuilder 18 May 2025 18:35 UTC
  2 points
  −2
  What is your opinion on the Low Probability Estimation paper published this year at ICLR?
  I don’t have a background in the field, but it seems like they were able to get some results, which suggests the approach is able to extract something. https://arxiv.org/pdf/2410.13211
  - David Matolcsi 18 May 2025 20:26 UTC
    6 points
    2
    It’s a nice paper, and I’m glad they did the research, but importantly, the paper reports a negative result about our agenda. The main result is that the method inspired by our ideas underperforms the baseline. Of course, these are just the first experiments and work is ongoing, so this is not conclusive negative evidence for anything. But the paper certainly shouldn’t be counted as positive evidence for ARC’s ideas.
    - fencebuilder 20 May 2025 19:12 UTC
      1 point
      0
      Thanks for the clarification! Not in the field and wasn’t sure I understood the meaning of the results correctly.