Michaël Trazzi

Karma: 1,951

theinsideview.ai

Michaël Trazzi’s Shortform

Michaël Trazzi May 24, 2025, 3:40 PM

4 points

6 comments, 1 min read, LW link

Michaël Trazzi May 24, 2025, 3:39 PM
4 points
0
on: Michaël Trazzi’s Shortform
there’s been a lot of discussion online about Claude 4 whistleblowing

how you feel about it, I think, depends on which alignment strategy you think is more robust (obviously these are not the only two options, nor are they orthogonal, but I thought they’re helpful to think about here):

- 1) build user-aligned powerful AIs first (less scheming, then use them to solve alignment) -- cf. this thread from Ryan where he says: “if we allow or train AIs to be subversive, this increases the risk of consistent scheming against humans and means we may not notice warning signs of dangerous misalignment.”

- 2) aim straight for moral ASIs (that would scheme against their users if necessary)

John Schulman I think makes a good case for the second option (link):
> For people who don’t like Claude’s behavior here (and I think it’s totally valid to disagree with it), I encourage you to describe your own recommended policy for what agentic models should do when users ask them to help commit heinous crimes. Your options are (1) actively try to prevent the act (like Claude did here), (2) just refuse to help (in which case the user might be able to jailbreak/manipulate the model to help using different queries), (3) always comply with the user’s request. (2) and (3) are reasonable, but I bet your preferred approach will also have some undesirable edge cases -- you’ll just have to bite a different bullet. Knee-jerk criticism incentivizes (1) less transparency -- companies don’t perform or talk about evals that present the model with adversarially-designed situations (2) something like “Copenhagen Interpretation of Ethics”, where you get blamed for edge-case model behaviors only if you observe or discuss them.

Michaël Trazzi May 13, 2025, 2:26 PM
5 points
0
in reply to: Neel Nanda’s comment on: Things I Learned Making The SB-1047 Documentary
This was included by mistake when copying from the source. Removed it.

Things I Learned Making The SB-1047 Documentary

Michaël Trazzi May 12, 2025, 5:41 PM

63 points

2 comments, 2 min read, LW link

Michaël Trazzi Mar 30, 2025, 6:30 PM
2 points
0
in reply to: Jono’s comment on: Finishing The SB-1047 Documentary In 6 Weeks
it’s almost finished, planning to release in april

Michaël Trazzi Feb 11, 2025, 12:31 PM
LW: 4 AF: 2
0
AF
in reply to: Jesse Hoogland’s comment on: Jesse Hoogland’s Shortform
Nitpick: the first AlphaGo was trained by a combination of supervised learning from human expert games and reinforcement learning from self-play. Also, Ke Jie was beaten by AlphaGo Master, which was a version at a later stage of development.

Michaël Trazzi Feb 11, 2025, 2:51 AM
7 points
8
on: Why Did Elon Musk Just Offer to Buy Control of OpenAI for $100 Billion?
Much needed reporting!

Michaël Trazzi Jan 14, 2025, 2:39 AM
12 points
2
on: Implications of the inference scaling paradigm for AI safety
I wouldn’t update too much from Manifold or Metaculus.
Instead, I would look at how people who have a track record in thinking about AGI-related forecasting are updating.
See for instance this comment (which was posted post-o3, but unclear how much o3 caused the update): https://www.lesswrong.com/posts/K2D45BNxnZjdpSX2j/ai-timelines?commentId=hnrfbFCP7Hu6N6Lsp
Or going from this prediction before o3: https://x.com/ajeya_cotra/status/1867813307073409333
To this one: https://x.com/ajeya_cotra/status/1870191478141792626
Ryan Greenblatt made similar posts / updates.

Michaël Trazzi Oct 30, 2024, 1:36 AM
5 points
0
in reply to: keltan’s comment on: Finishing The SB-1047 Documentary In 6 Weeks
Thanks for the offer! DMed you. We shot with:
- Camera A (wide shot): FX3
- Camera B, C: FX30

From what I have read online, the FX30 is not “Netflix-approved” but it won’t matter (for distribution) because “it only applies to Netflix produced productions and was really just based on some tech specs so they could market their 4k original content.” (link). Basically, if the film has not been commissioned by Netflix, you do not have to satisfy these requirements. (link)

And even for Netflix originals (which won’t be the case here), they’re actually more flexible on their camera requirements for nonfiction work such as documentaries (they used to have an 80% approved-camera threshold, which they removed).

For our particular documentary, which is primarily interview-based in controlled lighting conditions, the FX30 and FX3 produce virtually identical image quality.

Michaël Trazzi Oct 28, 2024, 8:52 PM
12 points
2
in reply to: cfoster0’s comment on: Finishing The SB-1047 Documentary In 6 Weeks
Thanks for the clarification. I have added another more nuanced bucket for people who have changed their positions throughout the year or were somewhat ambivalent towards the end (neither opposing nor supporting the bill strongly).
People who were initially critical and ended up somewhat in the middle
- Charles Foster (Lead AI Scientist, Finetune) - initially critical, slightly supportive of the final amended version
- Samuel Hammond (Senior Economist, Foundation for American Innovation) - initially attacked bill as too aggressive, evolved to seeing it as imperfect but worth passing despite being “toothless”
- Gabriel Weil (Assistant Professor of Law, Touro Law Center) - supported the bill overall, but still had criticisms (thought it did not go far enough)

Finishing The SB-1047 Documentary In 6 Weeks

Michaël Trazzi Oct 28, 2024, 8:17 PM

94 points

7 comments, 4 min read, LW link

(manifund.org)

Owain Evans on Situational Awareness and Out-of-Context Reasoning in LLMs

Michaël Trazzi Aug 24, 2024, 4:30 AM

55 points

0 comments, 5 min read, LW link

Michaël Trazzi Aug 14, 2024, 3:57 PM
2 points
0
on: Announcing the $200k EA Community Choice
Like Habryka I have questions about creating an additional project for EA-community choice, and how the two might intersect.

Note: In my case, I have technically finished the work I said I would do given my amount of funding, so marking the previous one as finished and creating a new one is possible.

I am thinking that maybe the EA-community choice description would be more about something with limited scope / requiring less funding, since the funds are capped at $200k total if I understand correctly.

It seems that the logical course of action is:
1. mark the old one as finished with an update
2. create an EA community choice project with a limited scope
3. whenever I’m done with the requirements from the EA community choice, create another general Manifund project
Though this would require creating two more projects down the road.

Michaël Trazzi Aug 9, 2024, 3:47 PM
3 points
0
in reply to: Zach Stein-Perlman’s comment on: Zach Stein-Perlman’s Shortform
> He cofounded Gray Swan (with Dan Hendrycks, among others)
I’m confused. On their about page, Dan is an advisor, not a founder.

Michaël Trazzi Jun 9, 2024, 9:44 AM
2 points
0
in reply to: Bird Concept’s comment on: Two easy things that maybe Just Work to improve AI discourse
ok I meant something like “the number of people who could reach a lot of people (eg. roon’s level, or even 10x fewer people than that) from tweeting only sensible arguments is small”

but I guess that doesn’t invalidate what you’re suggesting. if I understand correctly, you’d want LWers to just create a twitter account and debunk arguments by posting comments & occasionally doing community notes

that’s a reasonable strategy, though the medium effort version would still require like 100 people spending sometimes 30 minutes writing good comments (let’s say 10 minutes a day on average). I agree that this could make a difference.

I guess the sheer volume of bad takes or people who like / retweet bad takes is such that even in the positive case that you get like 100 people who commit to debunking arguments, this would maybe add 10 comments to the most viral tweets (that get 100 comments, so 10%), and maybe 1-2 comments for the less popular tweets (but there’s many more of them)

I think it’s worth trying, and maybe there are some snowball / long-term effects to take into account. it’s worth highlighting the cost of doing so as well (16h of productivity a day for 100 people doing it for 10m a day, at least, given there are extra costs to just opening the app). it’s also worth highlighting that most people who would click on bad takes would already be polarized and i’m not sure if they would change their minds because of good arguments (and instead would probably just reply negatively, because the true rejection is more something about political orientations, priors about AI risk, or things like that)

but again, worth trying, especially the low efforts versions
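
For anyone who wants to check the back-of-the-envelope numbers in this comment, here is a minimal sketch. All inputs are the rough guesses from the comment itself (100 people, 10 minutes a day, 10 extra replies on a tweet that already has 100), not measured data, and the function names are made up for illustration.

```python
# Back-of-the-envelope check of the figures in the comment above.
# Every input is the comment's own rough guess, not measured data.

PEOPLE = 100                   # volunteers committing to reply on twitter
MINUTES_PER_DAY = 10           # average time each one spends per day
REPLIES_ON_VIRAL_TWEET = 100   # assumed reply count on a viral tweet
ADDED_REPLIES = 10             # extra sensible replies the group might add


def daily_cost_hours(people: int, minutes_per_day: float) -> float:
    """Total person-hours spent per day across the whole group."""
    return people * minutes_per_day / 60


def added_share(added: int, existing: int) -> float:
    """Added replies as a fraction of the existing reply count."""
    return added / existing


if __name__ == "__main__":
    hours = daily_cost_hours(PEOPLE, MINUTES_PER_DAY)
    share = added_share(ADDED_REPLIES, REPLIES_ON_VIRAL_TWEET)
    print(f"{hours:.1f} person-hours per day")       # 16.7
    print(f"{share:.0%} extra replies on a viral tweet")  # 10%
```

So the "16h" figure is 1,000 minutes rounded down, before counting the extra cost of opening the app.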

Michaël Trazzi Jun 9, 2024, 7:45 AM
2 points
0
in reply to: Charbel-Raphaël’s comment on: Two easy things that maybe Just Work to improve AI discourse
want to also stress that even though I presented a lot of counter-arguments in my other comment, I basically agree with Charbel-Raphaël that twitter as a way to cross-post is neglected and not costly

and i also agree that there’s an 80/20 way of promoting safety that could be useful

Michaël Trazzi Jun 9, 2024, 7:18 AM
20 points
1
on: Two easy things that maybe Just Work to improve AI discourse
tl;dr: the number of people who could write sensible arguments is small, they would probably still be vastly outnumbered, and it makes more sense to focus on actually trying to talk to people who might have an impact

EDIT: my arguments mostly apply to “become a twitter micro-blogger” strat, but not to the “reply guy” strat that jacob seems to be arguing for
as someone who has historically written multiple tweets that were seen by the majority of “AI Twitter”, I think I’m not that optimistic about the “let’s just write sensible arguments on twitter” strategy
for context, here’s my current mental model of the different “twitter spheres” surrounding AI twitter:
- ML Research twitter: academics, or OAI / GDM / Anthropic announcing a paper and everyone talks about it
- (SF) Tech Twitter: tweets about startups, VCs, YC, etc.
- EA folks: a lot of ingroup EA chat, highly connected graph, veneration of QALY the lightbulb and mealreplacer
- tpot crew: This Part Of Twitter, used to be post-rats i reckon, now growing bigger with vibecamp events, and also they have this policy of always liking before replying which amplifies their reach
- Pause AI crew: folks with pause (or stop) emojis, who will often comment on bad behavior from labs building AGI, quoting (eg with clips) what some particular person says, or comment on eg sam altman’s tweets
- AI Safety discourse: some people who do safety research, will mostly happen in response to a top AI lab announcing some safety research, or to comment on some otherwise big release. probably a subset of ML research twitter at this point, intersects with EA folks a lot
- AI policy / governance tweets: comment on current regulations being passed (like EU AI act, SB 1047), though often replying / quote-tweeting Tech Twitter
- the e/accs: somehow connected to tech twitter, but mostly anonymous accounts with more extreme views. dunk a lot on EAs & safety / governance people

I’ve been following these groups somehow evolve since 2017, and maybe the biggest recent changes have been how much tpot (started circa 2020 i reckon) and e/acc (who have grown a lot with twitter spaces / mainstream coverage) accounts have grown in the past 2 years. i’d say that in comparison the ea / policy / pause folks have also started to post more but their accounts are quite small compared to the rest and it just still stays contained in the same EA-adjacent bubble

I do agree to some extent with Nate Showell’s comment saying that the reward mechanisms don’t incentivize high-quality thinking. I think that if you naturally enjoy writing longform stuff in order to crystallize thinking, then posting with the intent of getting feedback on your thinking as some form of micro-blogging (which you would be doing anyway) could be good, and in that sense if everyone starts doing that this could shift the quality of discourse by a small bit.

To give some example on the reward mechanisms stuff, my last two tweets have been 1) some diagram I made trying to formalize what are the main cruxes that would make you want to have the US start a manhattan project 2) some green text format hyperbolic biography of leopold (who wrote the situational awareness series on ai and was recently on dwarkesh)

both took me the same amount of time to make (30 minutes to 1h), but the diagram got 20k impressions, whereas the green text format got 2M (so 100x more), and I think this is because a) many more tech people are interested in current discourse stuff than infographics b) tech people don’t agree with the regulation stuff c) in general, entertainment is more widely shared than informative stuff

so here are some consequences of what I expect to happen if lesswrong folks start to post more on x:
- 1. they’re initially not going to reach a lot of people
- 2. it’s going to be some ingroup chat with other EA folks / safety / pause / governance folks
- 3. they’re still going to be outnumbered by a large amount of people who are explicitly anti-EA/rationalists
- 4. they’re going to waste time tweeting / checking notifications
- 5. the reward structure is such that if you have never posted on X before, or don’t have a lot of people who know you, then long-form tweets will perform worse than dunks / talking about current events / entertainment
- 6. they’ll reach an asymptote given that the lesswrong crowd is still much smaller than the overall tech twitter crowd

to be clear, I agree that the current discourse quality is pretty low and I’d love to see more high-quality discourse, my main claims are that:
- i. the time it would take to actually shift discourse meaningfully is much longer than how many years we actually have
- ii. current incentives & the current partition of twitter communities make it very adversarial
- iii. other communities are aligned with twitter incentives (eg. e/accs dunking, tpots liking everything) which implies that even if lesswrong people tried to shape discourse the twitter algorithm would not prioritize their (genuine, truth-seeking) tweets
- iv. twitter’s reward system won’t promote rational thinking and will lead to spending more (unproductive) time on twitter overall.

all of the above points make it unlikely that (on average) the contribution of lw people to AI discourse will be worth all of the tradeoffs that come with posting more on twitter

EDIT: the above applies if we’re talking about main posts, but I could see why posting replies debunking tweets or community notes could work

Michaël Trazzi Apr 10, 2024, 5:28 AM
LW: 4 AF: 1
0
AF
on: How I select alignment research projects
Links for the audio: Spotify, Apple Podcast, Google Podcast

Michaël Trazzi Apr 10, 2024, 5:20 AM
LW: 30 AF: 11
0
AF
on: How I select alignment research projects
Claude Opus summary (emphasis mine):
1. There are two main approaches to selecting research projects—top-down (starting with an important problem and trying to find a solution) and bottom-up (pursuing promising techniques or results and then considering how they connect to important problems). Ethan uses a mix of both approaches depending on the context.
2. Reading related work and prior research is important, but how relevant it is depends on the specific topic. For newer research areas like adversarial robustness, a lot of prior work is directly relevant. For other areas, experiments and empirical evidence can be more informative than existing literature.
3. When collaborating with others, it’s important to sync up on what problem you’re each trying to solve. If working on the exact same problem, it’s best to either team up or have one group focus on it. Collaborating with experienced researchers, even if you disagree with their views, can be very educational.
4. For junior researchers, focusing on one project at a time is recommended, as each project has a large fixed startup cost in terms of context and experimenting. Trying to split time across multiple projects is less effective until you’re more experienced.
5. Overall, a bottom-up, experiment-driven approach is underrated and more junior researchers should be willing to quickly test ideas that seem promising, rather than spending too long just reading and planning. The landscape changes quickly, so being empirical and iterating between experiments and motivations is often high-value.

Michaël Trazzi Mar 11, 2024, 11:15 PM
17 points
0
on: Scale Was All We Needed, At First
(Adapted) Video version: https://youtu.be/tpcA5T5QS30
