Hannes Thurnherr

Karma: 37

Hannes Thurnherr 19 May 2026 20:40 UTC
1 point
in reply to: Joshua Batson’s comment on: Trying to use NLAs to find out how Qwen 2.5 7B does multiplication
Hey! Thanks for your input. It inspired some follow-up experiments:
As you say, the intermediate layers may operate in slightly different bases, which of course makes logit-lens-based analyses less reliable. So I trained a linear probe at each layer to see whether it could predict the final output digit. Consistent with your point, the probe’s accuracy at layer 20 is low (50.8%). This does imply that the AV itself does some of the arithmetic required to reach 90.5%. But it doesn’t have to do much! At layer 20, the correct output digit is among the probe’s top 3 guesses in 90% of cases...
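For readers unfamiliar with the setup, here is a minimal sketch of what a per-layer linear probe and its top-1 vs. top-3 accuracy look like. It assumes you already have one activation vector per prompt at a given layer plus the corresponding final output digit (0 to 9). The function names are hypothetical, the activations are synthetic stand-ins, and this is not the code behind the numbers above.

```python
import math
import random


def softmax(z):
    # Numerically stable softmax over a list of logits.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def probe_logits(W, b, x):
    # Linear probe: one weight row and one bias per output digit.
    return [sum(w * xi for w, xi in zip(row, x)) + bc for row, bc in zip(W, b)]


def train_linear_probe(xs, ys, n_classes=10, lr=0.5, epochs=50):
    # Multinomial logistic regression fit with full-batch gradient descent.
    d, n = len(xs[0]), len(xs)
    W = [[0.0] * d for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        gW = [[0.0] * d for _ in range(n_classes)]
        gb = [0.0] * n_classes
        for x, y in zip(xs, ys):
            p = softmax(probe_logits(W, b, x))
            for c in range(n_classes):
                g = p[c] - (1.0 if c == y else 0.0)  # d(cross-entropy)/d(logit_c)
                gb[c] += g
                for i in range(d):
                    gW[c][i] += g * x[i]
        for c in range(n_classes):
            b[c] -= lr * gb[c] / n
            for i in range(d):
                W[c][i] -= lr * gW[c][i] / n
    return W, b


def top_k_accuracy(W, b, xs, ys, k):
    # Fraction of examples whose true digit is among the probe's k highest logits.
    hits = 0
    for x, y in zip(xs, ys):
        logits = probe_logits(W, b, x)
        ranked = sorted(range(len(logits)), key=lambda c: logits[c], reverse=True)
        hits += y in ranked[:k]
    return hits / len(xs)


if __name__ == "__main__":
    # Synthetic stand-in for one layer's activations: a noisy digit-dependent
    # direction per example. Real activations would come from the model.
    random.seed(0)
    d = 8
    centers = [[random.gauss(0, 1) for _ in range(d)] for _ in range(10)]
    ys = [random.randrange(10) for _ in range(500)]
    xs = [[c + random.gauss(0, 1.5) for c in centers[y]] for y in ys]
    W, b = train_linear_probe(xs[:400], ys[:400])
    print("top-1:", top_k_accuracy(W, b, xs[400:], ys[400:], k=1))
    print("top-3:", top_k_accuracy(W, b, xs[400:], ys[400:], k=3))
```

Top-3 accuracy can never be lower than top-1 accuracy, so a large gap between them (as at layer 20) means the probe often ranks the right digit highly without putting it first.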

Hannes Thurnherr 12 Sep 2022 13:49 UTC
0 points
in reply to: ChristianKl’s comment on: Is training data going to be diluted by AI-generated content?
I hadn’t thought of training the models on which image the user selects. And thanks for correcting my claim about DALL-E 2’s training data.
What do you mean by “training a model to detect its own errors”? Maybe this is a naive question (I’m new to ML), but isn’t that impossible by definition? Why would a model make a mistake if it’s capable of identifying it as one? Do you mean that, through continuous improvement, the model could correct the mistakes it made in the past after some time has passed?

In my view, the dilution problem remains for GPTs. Widespread use seems likely over the coming years, and the resulting text is unlikely to be properly labeled as AI-generated. It therefore seems likely that text produced by today’s models will be absorbed into the training data of future GPTs, causing them to at least partially emulate their predecessors. Am I making a mistake somewhere in this reasoning?

Hannes Thurnherr 8 Sep 2022 12:49 UTC
0 points
in reply to: gwern’s comment on: Is training data going to be diluted by AI-generated content?
I tend to agree with you. But I’m not sure our methods for distinguishing AI-generated from human-generated content will become perfect enough for this to “work”. As long as the distinction remains imperfect, at least some of the feedback loop will persist, which will slow down development.
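A back-of-the-envelope way to see why imperfect detection still leaves a feedback loop: if some fraction of scraped text is AI-generated and a detector removes only part of it, the filtered corpus still contains AI text. A minimal sketch; the function name and all numbers are hypothetical illustrations, not estimates.

```python
def residual_ai_share(ai_fraction, recall, false_positive_rate=0.0):
    """Share of AI-generated text left in a corpus after filtering.

    ai_fraction: share of the raw corpus that is AI-generated (hypothetical).
    recall: share of AI-generated text the detector removes.
    false_positive_rate: share of human-written text wrongly removed.
    """
    kept_ai = ai_fraction * (1 - recall)
    kept_human = (1 - ai_fraction) * (1 - false_positive_rate)
    return kept_ai / (kept_ai + kept_human)


# Hypothetical example: half the raw corpus is AI-generated, the detector catches 90% of it.
print(residual_ai_share(0.5, 0.9))  # roughly 0.09, i.e. about 9% of the filtered corpus
```

Only a perfect detector (recall of 1.0) drives the residual share to zero; any recall below that leaves some AI-generated text in the training data.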