i don’t know for sure that there are zero “formal methods” in the pipeline
in discovering the 12 OpenSSL zero-day vulnerabilities, we didn’t use any formal methods; since then, we’ve incorporated some. the (discovery → CVE assigned → CVE made public) pipeline is a very lagging indicator, so the OpenSSL results reflect the state of the AISLE system as of roughly mid-fall 2025, prior to our use of formal methods
Thanks for flagging this, Rasool. I’ve been following the Anthropic announcement closely. It’s genuinely exciting to see a frontier lab validate that AI can find real vulnerabilities in real software at scale. The more serious players in this space, the better: the problem is genuinely large, far larger than any single team.
We’ve been doing this work operationally since mid/late-2025, and in our experience the hardest part isn’t finding some bugs; it’s finding the hardest bugs in the most heavily audited codebases. Earning the trust of maintainers and closing the full loop from discovery through patch acceptance is also very challenging. That’s where most of the difficulty (and most of the value) lives.
I wrote up a practitioner’s perspective on what we’ve learned, how our results compare, and what the real challenges ahead look like from our vantage point: What AI Security Research Looks Like When It Works