I’m not sure I entirely agree with the overall recommendation for researchers working on internals-based techniques. I do agree that findings will need to be behavioral initially in order to be legible enough that decision-makers find them worth acting on.
My expectation is that internals-based techniques (including mech interp) and techniques that detect specific highly legible behaviors will ultimately converge. That is:
1. Internals/mech interp researchers will continue to find examples of concerning cognition (as they already have, at least in model organisms), and these findings will be largely ignored or not fully acted on.
2. Eventually, legible examples of misbehavior will be found, resulting in action or increased scrutiny.
3. That scrutiny will then propagate backwards to the search for causes or early indicators of the misbehavior, and, provided interp tools are indeed predictive, the internals-based path that has been developed in parallel will suddenly be much more worth paying attention to.
Thus, I think it’s worth making progress on these internals-based techniques even if their immediate usefulness isn’t apparent. When legible misbehaviors arrive, I expect internals-based detection and analysis to be much more directly applicable.
I think this hypothesis needs more rigorous evaluation. To be clear, it’s entirely possible that you’re right, but we need more evidence that more sophisticated methods won’t work. Your tests used the simplest version of steering, something akin to 14th-century surgery, and I think we need much stronger evidence from mech interp before accepting the conclusion.
For example, perhaps the issue is merely that larger models have more complex representation manifolds, so simple additive steering pushes activations off-manifold. Or plausibly we just need better steering vectors: it seems likely we’ll eventually have much more sophisticated objects to steer with than sparse dictionary learning features (and indeed many candidates already exist in the literature).
In other words, I think you need to give some reason why the sophistication of our steering methods can’t evolve alongside model scale and complexity.
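For concreteness, here is roughly what I mean by the “simplest version of steering”: a single fixed vector added to the residual stream at one layer. This is a minimal sketch in PyTorch, assuming a HuggingFace-style transformer; the model path, layer index, and vector `v` are all hypothetical placeholders, not anyone’s actual setup.

```python
import torch

def make_steering_hook(steering_vector: torch.Tensor, alpha: float = 4.0):
    """Forward hook that adds alpha * steering_vector to a block's output
    hidden states (the residual stream), i.e. plain additive steering."""
    def hook(module, inputs, output):
        # HuggingFace transformer blocks often return a tuple whose first
        # element is the hidden states; handle both cases.
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + alpha * steering_vector.to(hidden.device, hidden.dtype)
        if isinstance(output, tuple):
            return (steered,) + output[1:]
        return steered
    return hook

# Usage sketch (hypothetical names; v is a vector of the model's hidden
# size, and layer 12 is chosen arbitrarily):
# handle = model.transformer.h[12].register_forward_hook(make_steering_hook(v))
# out = model.generate(**inputs)  # generation now runs with steered activations
# handle.remove()
```

Everything here is one fixed vector and one addition. There is a lot of room between this and, say, steering that is constrained to stay on the learned representation manifold, which is why I don’t think negative results for this version settle the question.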