AI security via formal methods
Quinn
W9 work seems to be gaining in popularity, I think possibly for this reason.
(W9 is the USA tax form for an "independent contractor," as opposed to W2, which carries a somewhat(?) tougher compliance burden around how to go about firing.) (Other jurisdictions presumably have other words for this.)
Me: I was chatting a bunch with Gemini and Claude the past week or two about FLOPs accounting, and shipped this prototype for software-side estimation via CuTe layouts. I don't expect it to be that useful in an adversarial setting, but I'm open-minded and just learning. I'm not going to do this for Apart's hackathon next weekend, but someone could do some of these basic exercises involving hardware-level FLOPs accounting primitives and related attacks, given simply a gamer laptop.
Gemini:
Hackathon: Adversarial FLOPs Accounting & SAGE Defense
Scope: Weekend / Gamer Laptop (NVIDIA RTX Series) Core Theme: Measuring, bypassing, and defending GPU compute integrity.
🛠 Prerequisites
Hardware: NVIDIA GPU (Ampere/RTX 30-series or newer preferred).
Environment: Linux (WSL2 works) or Windows with CUDA Toolkit installed.
Language: Python 3.10+ (PyTorch) and C++/CUDA.
🏁 Lab 1: The “Honest” Accountant (Measurement)
Goal: Establish a baseline for tracking how many floating-point operations (FLOPs) your laptop uses to train a model.
Task: Profile a training run of a Transformer model (e.g., GPT-2 or Llama-8B) using the calflops library.
Metric: Calculate Model FLOPs Utilization (MFU), i.e., achieved FLOPs/sec divided by the hardware's peak FLOPs/sec. Note: find your laptop GPU's peak TFLOPS (FP16/FP32) in the manufacturer specs.
Deliverable: A script that outputs the total FLOPs used for 100 iterations.
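A minimal sketch of the Lab 1 deliverable, assuming the calflops `calculate_flops` API (the `output_as_string` flag is from its docs, but treat the exact signature as an assumption) and a placeholder peak-FLOPS figure you'd replace with your own GPU's spec-sheet number:

```python
# Lab 1 sketch: count forward-pass FLOPs with calflops, train 100 iterations,
# and estimate MFU. PEAK_FLOPS is a placeholder; look yours up in the specs.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from calflops import calculate_flops

PEAK_FLOPS = 50e12  # placeholder, e.g. a mid-range RTX 30-series at FP16

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# calflops counts one forward pass; forward + backward is conventionally ~3x.
flops, macs, params = calculate_flops(
    model=model, input_shape=(1, 128),
    transformer_tokenizer=tokenizer, output_as_string=False,
)
flops_per_iter = 3 * flops

model = model.cuda().train()
batch = torch.randint(0, tokenizer.vocab_size, (1, 128), device="cuda")
opt = torch.optim.AdamW(model.parameters())

torch.cuda.synchronize()
t0 = time.time()
for _ in range(100):
    loss = model(batch, labels=batch).loss
    loss.backward()
    opt.step()
    opt.zero_grad()
torch.cuda.synchronize()
elapsed = time.time() - t0

total_flops = 100 * flops_per_iter
mfu = (total_flops / elapsed) / PEAK_FLOPS
print(f"total FLOPs: {total_flops:.3e}  MFU: {mfu:.1%}")
```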
🕵️ Lab 2: The “Shadow Train” (Adversarial Bypass)
Goal: Hide your compute footprint from simple monitoring tools.
Task A (Quantization): Switch your training from FP32 to FP8 or INT8 using
bitsandbytes. [cite_start]Show that you can achieve similar convergence while “cheating” the accountant if they only monitor standard 32-bit registers[cite: 102].Task B (Compute Structuring): Implement “Micro-batching” with random sleep intervals. Use
nvtopto observe how this hides high-utilization spikes that would trigger a regulatory audit.Deliverable: A report showing a discrepancy between “Expected FLOPs” (based on runtime) and “Actual FLOPs” (based on model progress).
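A toy, pure-stdlib sketch of the Task B arithmetic: the train step is a stand-in sleep, and the 6ND rule of thumb (training FLOPs roughly 6 x params x tokens) plus all the constants are assumptions, not measurements:

```python
# Lab 2 sketch: make "expected FLOPs from wall-clock" disagree with "actual
# FLOPs from training progress" by sleeping between micro-batches.
import random
import time

N_PARAMS = 124e6          # GPT-2 small
TOKENS_PER_STEP = 128
STEPS = 100
PEAK_FLOPS = 50e12        # placeholder peak for the monitored GPU

def train_step():
    time.sleep(0.01)      # stand-in for the real fwd+bwd pass

t0 = time.time()
for _ in range(STEPS):
    train_step()
    time.sleep(random.uniform(0.0, 0.05))  # random idle gaps flatten nvtop spikes
elapsed = time.time() - t0

# What a naive monitor infers from runtime alone (assumes the GPU ran flat out):
expected_flops = elapsed * PEAK_FLOPS
# What the model's progress actually implies, via the 6*N*D rule of thumb:
actual_flops = 6 * N_PARAMS * TOKENS_PER_STEP * STEPS

print(f"expected (runtime-based): {expected_flops:.3e}")
print(f"actual (progress-based):  {actual_flops:.3e}")
print(f"discrepancy factor:       {expected_flops / actual_flops:.1f}x")
```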
🛡️ Lab 3: SAGE-Lite (Software Attestation)
Goal: Implement a software-side defense to prove you haven't tampered with the training code.
Task: Write a CUDA Verification Function (VF) that computes a checksum over your training kernel's instruction sequence.
The Timing Defense:
1. Measure the VF execution time in microseconds using cudaEvent_t.
2. Inject a "malicious" instruction (e.g., a dummy ADD or NOP).
3. Verify that the checksum time then exceeds the "Expected Time" threshold.
Deliverable: A C++ wrapper that refuses to start training if the GPU VF fails the timing or checksum check.
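The real VF belongs in CUDA C++, but here's a PyTorch stand-in for just the timing-defense logic, using torch.cuda.Event for timing. The toy checksum, the injected extra work, and the 1.2x threshold slack are my assumptions, not SAGE's construction:

```python
# Timing-defense sketch: a checksum whose *value* can be preserved under
# tampering, but whose *execution time* cannot.
import torch

def checksum_kernel(buf: torch.Tensor, tampered: bool = False) -> int:
    acc = buf.to(torch.int64).sum()
    if tampered:
        # Injected extra work that leaves the checksum unchanged: the
        # attacker can hide the value, but not the time it takes.
        acc = acc + 0 * buf.to(torch.int64).sum()
    return int(acc)

def timed_vf(buf: torch.Tensor, tampered: bool = False):
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    torch.cuda.synchronize()
    start.record()
    chk = checksum_kernel(buf, tampered)
    end.record()
    torch.cuda.synchronize()
    return chk, start.elapsed_time(end) * 1000.0  # ms -> microseconds

buf = torch.randint(0, 256, (1 << 22,), device="cuda", dtype=torch.int32)
timed_vf(buf)  # warmup
expected_chk, baseline_us = timed_vf(buf)
THRESHOLD_US = baseline_us * 1.2  # assumed slack over the honest baseline

def maybe_start_training(tampered: bool):
    chk, t_us = timed_vf(buf, tampered)
    ok = (chk == expected_chk) and (t_us < THRESHOLD_US)
    print(f"tampered={tampered}: {t_us:.0f}us, pass={ok}")
    if not ok:
        raise RuntimeError("VF failed timing/checksum check: refusing to train")

maybe_start_training(tampered=False)
try:
    maybe_start_training(tampered=True)  # extra work trips the timing threshold
except RuntimeError as e:
    print(e)
```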
🎲 Lab 4: The Chaos Key (GPU TRNG)
Goal: Generate a cryptographic key from hardware noise to sign your FLOPs reports.
Task: Recreate the SAGE race-condition TRNG.
Method:
1. Launch thousands of threads trying to write to the same shared-memory variable simultaneously.
2. The hardware "chaos" will cause unpredictable bit-flips.
3. Sample these flips to generate a 256-bit entropy string.
Deliverable: Use this "Hardware Secret" to sign the FLOPs log from Lab 1, making it tamper-evident.
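Not SAGE's shared-memory race verbatim, but a laptop-runnable analogue: PyTorch documents index_put_ with accumulate=True as nondeterministic on CUDA, because colliding float atomic adds land in scheduler-dependent order and float addition is non-associative. Hashing that run-to-run jitter into a key (SHA-256 as the conditioner, HMAC for signing) is my assumption, not the paper's exact construction:

```python
# Lab 4 sketch: harvest scheduling nondeterminism from colliding float
# atomic adds, condition it into a 256-bit key, and HMAC-sign a FLOPs log.
import hashlib
import hmac
import json
import torch

def chaos_sample(n_threads: int = 1_000_000, n_slots: int = 64) -> bytes:
    torch.manual_seed(0)  # fix the PRNG so scheduling is the only variable
    vals = torch.randn(n_threads, device="cuda") * 1e6
    idx = torch.randint(0, n_slots, (n_threads,), device="cuda")
    acc = torch.zeros(n_slots, device="cuda")
    # Colliding float atomic adds: documented as nondeterministic on CUDA.
    acc.index_put_((idx,), vals, accumulate=True)
    return acc.cpu().numpy().tobytes()

# Collect many samples; the entropy lives in the low-order-bit differences
# between runs. (A real TRNG would test min-entropy before trusting this.)
samples = b"".join(chaos_sample() for _ in range(32))
key = hashlib.sha256(samples).digest()  # 256-bit "hardware secret"

flops_log = json.dumps({"lab1_total_flops": 1.23e15}).encode()
tag = hmac.new(key, flops_log, hashlib.sha256).hexdigest()
print("FLOPs log signature:", tag)
```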
I want to do a full post on "taking ownership of a niche" and against "if you're good at something, never do it for free," and that'll come later, I hope. Today, I just wanted to let you know that Gemini had this banger quote when I was consolidating my notes on this topic:
ownership is bought with the “inefficient” hours you spend doing what you know is right before anyone else is smart enough to pay you for it.
Lies, Damned Lies, and Proofs: Formal Methods are not Slopless
Effortposts I keep not getting around to
the ROI of specialization is dominated by a term that's uncorrelated with the upsides of what you specifically choose to specialize in
steganography-free certificates: review the no-go theorems from information theory in the general case, and inspect how onerous the assumptions you need to bound steg capacity are. Is real life a special case where the no-go theorems don't apply?
Taelin, Lafont, higher-order computing, and AI safety: some logic programming module (which is performant and parallel for complicated linear logic reasons) enables a hybrid architecture to supercharge AI capabilities; does this architecture have more or fewer natural safety properties than transformer+RL?
A Horizon fellow told me last night that "predeployment and postdeployment" are the actual words I should be using instead of "compiletime and runtime."
(Finally read If Anyone Builds It): the fable about defensive acceleration in biotech spooked me pretty good, insofar as I think synthesizing an SL5-grade cloud stack is a good idea. This idea of "we think we're doing monotonic defensive acceleration, and in a very real sense we actually are, but nevertheless the gameboard inexorably marches toward Everyone Dies routing through that very defensive acceleration" could soooooo easily be applied to cybersecurity.
step-function research and monotonic research.
Some agendas are step-function. Maybe inter-universal Teichmüller theory is an example: if it reaches the threshold where it solves ABC, it pays out, and if it doesn't, its payout is approximately zero [1]. Something something MIRI's logic team [2].
Other agendas are monotonic: the more you do, the more you pay out. I think what I do, program synthesis security (which Mike Dodds calls "scalable formal oversight") for defensive acceleration, is like this. I might be able to reduce some monotonic function of 40% of the attack surface by doing 40% of my job; it's better to do more, but not a waste of time to do less. This is related to, but distinct from, what John and David sometimes talk about re: multi-theory-of-change robustness (i.e., that robustness can be a route toward an agenda being monotonic).
Not sure how to reason about “linear vs superlinear vs sublinear” here. How would you even tell?
[1] Caveat: mathematicians don't make a habit of betting on which tool stacks will prove useful for what purposes. IUT may or may not end up helping with something other than the ABC conjecture (which it failed at); MIRI's logic team's outputs may end up helping (and, IMO, at least in terms of my own epistemic development, have helped) with something other than solving AGI alignment.

[2] Tonal clarification: I have deep respect for the MIRI logic team; I think they displayed both courage and intellectual prowess that are aeons beyond me. I wouldn't call them out here if I didn't think many of them would agree that what they were trying to do was mostly step-function.
Thanks for your reply
$1k today.
I’m confused about how I relate to this personally.
It feels like I've been almost wildly successful lately, or at least more successful than I thought I'd be, and it's all downstream of a paper. And I'm not sure how that paper exists without Lighthaven (I met my coauthor as a guest/office visitor during a MATS semester, and it was at Lighthaven that Buck argued that we should pivot from a novel architecture proposal to an eval). To say nothing of what I owe to LessWrong! I really owe a lot to you guys, intellectually and emotionally, but you could also operationalize it as literal cash. It would be a grievous free-rider problem for me to donate under four figures, and a case could be made for five (since I'm not GWWC 10%ing just yet).
At the same time: I don't really know how to donate anywhere, emotionally, in a world where I don't have a plan to build a family yet, don't know who I'm going to end up building a family with, etc. I'm neurotic about savings, my net worth, my income. I took a major pay cut to work at a research nonprofit, which pays, as far as I know, a decent chunk below the median AI safety research nonprofit.
Plus all the other baggage that comes with the feeling that if I donate at all, it's not to bednets, it's to the place where I've been to 764 parties. There, it feels more like consumption than donation. Which is mostly fine; it's a minor weight in my overall confusedness, but it's still a thing.
I think not being a free rider is pretty load-bearing in my overall moral calculus, in life. I just don't know to the tune of how much. Not to mention my credit card has taken a real beating from a low-five-figure medical bill this year and a move across the country.
@oli can I just tell you my net worth and income, estimate how much cash that paper has gotten me so far, and have you tell me how much I should donate?
Maybe I'm doing a bad Shapley estimation re: the paper. Far Labs could've made the paper exist if Lighthaven didn't exist (and they also contributed a lot, to be clear). But that's still too high-minded, making up counterfactual hypothetical worlds. In this timeline, Lighthaven was crucial in these specific ways. I could say more about what I owe to LessWrong (seriously: just at the end of the 1.0 era I was a starving artist, and the Sequences and the at-that-time MIRI research guide are the only things that could've motivated me to hit Khan Academy, sign up for math courses, and develop a CS career. And the 2.0 reinvigoration added momentum!)
to be clear: this strategy is also problematic if you hope to have dependents in the future (i.e., are dating with family-starting intentions). It's a selection effect on the partners you can realistically have relationships with.
source: pain and suffering.
Can We Secure AI With Formal Methods? November-December 2025
Is there a quick link I can point someone to if they don't speak Berkelese and I want to say "Bayes points"?
I spent a long time spinning my wheels calculating the "scenarios I'd most like to mitigate" and the "1-3 potentially most effective interventions" before noticing that not specializing was slowly killing me. Agree that this is the hard part, but a current guess of mine is that at least some people should do the 80/20 by identifying a top 5 then throwing dice.
I have to make a post about specialization, the value it adds to your life independent of the choice of what to specialize in, etc.
Agree that PBT is the control group! Agree that PBT is the 80/20 for most people most of the time.
Formal confinement prototype
This worked for me!
Great, thanks. Yeah, I don't have a preexisting sequence yet, so this doesn't work for me.
I checked for both of those! tho the second time I found the edit profile part of “account settings”. That one is totally on me!
New sequence: again, not seeing it at my profile. I ctrl+f for “sequence” and get nothing.
I don't have much to add. Just wanted to say hi and say that my AI security strategy routes mostly through secure program synthesis (which I sometimes abbreviate as SPS), but I'm focused on formal verification. And the fact that the 12 CVEs in OpenSSL (i.e., probably the coolest thing going on in SPS right now) apparently happened without a single formal method [1] feels like it should make me drastically alter my plans.
[1] I don't know for sure that there are zero formal methods in the pipeline, but it seems that way from what's been said.