sanxiyn comments on Models finding software vulnerabilities is not the primary source of cybersecurity risk

sanxiyn 8 Jun 2026 5:04 UTC
12 points
2
Unhackable operating system for all practical purposes already exists: seL4. DARPA funded HACMS program to use seL4. It worked.
seL4 is formally verified for functional correctness. It means the implementation corresponds to the specification. This verification eliminates all implementation bugs. It does not eliminate specification bugs, but seL4 also proved integrity, availability, and confidentiality of OS. Definition of integrity/availability/confidentiality is small enough to be manually reviewed, and was in fact widely reviewed (including myself).
So there IS a single mechanism that can rule out nearly all of 30% of bugs left over after 70% of memory safety bugs are fixed. Formal verification for functional correctness is not particularly simple, but it is demonstrably possible as seL4 proved, and AI assistance will make formal verification proof engineering cheaper.
- Random Developer 8 Jun 2026 14:06 UTC
  11 points
  3
  Parent
  seL4 is a fantastic piece of technology. The problem is that it is a small, highly specialized kernel: SeL4 is about 10,500 lines of code, built at a cost of about $350-400 per line of code. Apparently they found that verification cost scales supralinearly with size, too.
  
  In practice, seL4 deals poorly with the fact that most hardware is itself bug-infested garbage, and that much of it has enough DMA or other access to cause mischief behind the kernel’s back.
  
  The Linux kernel itself is now about 40 million lines of code, much of it in drivers. Using an extremely approximate methodology, the core of a modern Linux environment is likely over 250 million lines of code. Various wild-ass guesses place the total amount of code in the world at anywhere from hundreds of billions of lines to trillions of lines.
  
  Redesigning and rewriting the entire Linux kernel to seL4 standards would cost around 16 billion dollars using an extremely naive linear extrapolation. Rewriting a distro core brings us to around $100 billion. Rewriting all the software in the world would bring us to an exceptionally wild-assed guess of, say, 500 billion lines of code times $400/line, or $200 trillion dollars. Give or take some zeros, to be clear.
  
  Now, let’s look at how these numbers are changing. GitHub is desperately trying to increase capacity for the agentic era:
  
  We started executing our plan to increase GitHub’s capacity by 10X in October 2025 with a goal of substantially improving reliability and failover. By February 2026, it was clear that we needed to design for a future that requires 30X today’s scale.
  
  So the amount of code that would need to be secured is increasingly drastically. But is this new agentic code more or less secure than the old human-written code? Let’s look at a recent comment thread on Hacker News talking about how many companies’ engineering standards are changing in response to agentic coding, e.g.:
  
  I’m seeing so many of these come in with “this is 95% done, just need a couple of minor tweaks for production release”
  
  “Minor tweaks” being fix the layout so it’s not messed up if the browser isn’t exactly 1920px wide, sometimes these filters and sorting don’t seem to work right and the app doesn’t seem to refresh new values properly after an action.
  
  No matter the issue it’s pre-estimated by the business as “should be a quick fix, for an experienced dev” because they (allegedly) did 95% of the work already.
  
  So my fear is that the widespread use of agentic coding will actually push us towards a world of vibecoded slop being pushed straight to production. This code will sometimes make extremely basic security errors, like hard-coding the database password in the publicly-readable UI code.
  
  So, yes, I agree that we do know how to build fairly secure software. I just find it profoundly unlikely that we’ll actually do it: The cost would be unimaginably vast, and the rapid deprofessionalization of software engineering means that all the incentives currently point to far less security and 10x the new code per year.
  - Ben Livengood 8 Jun 2026 15:57 UTC
    1 point
    0
    Parent
    Would rewriting Linux to seL4 standards cost $16B in the world before or after frontier models are solving Erdos problems? If that’s the cost in human SWE hours then it seems tractable to use agent harnesses and formal methods to achieve quite a cost reduction.
    But also I don’t think most people want Linux to seL4 standards (the Unix security model isn’t great); there’s probably more to be gained by finishing the network stack(s) for seL4 and implementing a bunch of network card drivers and a TLS library. That would enable IoT at least to have a pretty secure base to work from, and hopefully the harnesses and tooling for that work would also be available to application developers to verify at least their parsing and security checks for example.
    - Random Developer 8 Jun 2026 19:59 UTC
      2 points
      0
      Parent
      
      Would rewriting Linux to seL4 standards cost $16B in the world before or after frontier models are solving Erdos problems?
      
      After. (EDIT: For the incredibly basic reason frontier models are already solving Erdos problems, but the largest LLM-based piece of system software is only borderline functional dumpster fire of a C compiler, but keep reading.)
      
      It’s fantastically impressive that LLMs can solve Erdos problems! But:
      
      Erdos problems are much smaller than the Linux kernel, and much easier to verify. Both these factors play to LLM strengths.
      Not all Erdos problems are equally difficult. Some seem to be just difficult enough that a human would need to seriously dig in and grind it out, or bring a nice insight from a distant corner of math. Others are likely more difficult. I believe about 551/1217 are currently solved.
      Reimplementing a highly-secure kernel that could replace Linux for most uses cases is an enormous project with many poorly defined subtasks, which is basically hell for LLMs. The biggest LLM project I’m aware of is a C compiler. Which basically works, but which is apparently a remarkably terrible C compiler. And C is far, far better specified than the internal guts of Linux and all the ghastly hardware that Linux supports.
      
      Also, please keep in mind that I’m willing to handwave a couple of zeros either way on the larger estimates.
      
      I expect overall security of newly written code to get clearly worse on average (thanks to the deprofessionalization of software development and companies pushing vibeslop to production) right up until offensive Mythos-class abilities become widespread. Then I expect things to get unpleasantly exciting.