RedMan

Karma: 605

RedMan 28 Feb 2026 20:35 UTC
1 point
0
in reply to: ryan_greenblatt’s comment on: Anthropic: “Statement from Dario Amodei on our discussions with the Department of War”
Thanks for replying. I think I was thinking of this post: https://www.lesswrong.com/posts/FG54euEAesRkSZuJN/ryan_greenblatt-s-shortform?commentId=B6oDGoyphuNuzdDAT I didn’t understand the distinction between employee and helpful-only access. Can you elaborate on the difference between employee access, public access, and helpful-only access?

RedMan 28 Feb 2026 7:20 UTC
1 point
0
on: Anthropic: “Statement from Dario Amodei on our discussions with the Department of War”
Absolutely love this trend of all sorts of different people telling the US Government, ‘do the thing you’re threatening’.
Some people here (@ryan_greenblatt?) have claimed access to uncensored Anthropic models at Anthropic; maybe they can chime in about what exactly that means. If safeguards are baked into the model early enough in the training process that removing them requires substantial reengineering and maybe additional training runs, then maybe Anthropic simply isn’t capable of complying.
If safeguards are introduced post-training, Anthropic has an uncensored model available. Assuming there isn’t another vendor, or that they want to make a super public demonstration of their power, the US Government has a ton of methods to compel compliance. They can demand it under the DPA, or plausibly use that designation as justification for applying countermeasures to the company. They may even just steal it outright...or if someone else steals the base model, steal it from them, relabel it, and deploy it internally.
There’s always the possibility that this is just a face-saving move, where Anthropic has a tiny team in the company and a classified contract, where they just hand over the uncensored model and get to keep their public face intact. Under normal circumstances, I’d assume that DoW would just go with a different vendor, but who knows with this administration; they seem really averse to being publicly contradicted.
DoW may also believe that they were being conciliatory and kind by allowing Anthropic to add ‘any lawful use’ safeguards instead of just demanding the base model without post-training safeguards.

RedMan 25 Feb 2026 21:26 UTC
2 points
0
in reply to: Matrice Jacobine’s comment on: Tom Smith’s Shortform
Uncensored models being available to a self-selected elite, with the rest of us getting whatever those elites decide to give us after censorship, is more dangerous than giving uncensored models to everyone. AI gatekeeping in the guise of “safety” is going to lead to tangible immediate harms.

RedMan 25 Feb 2026 16:13 UTC
7 points
0
in reply to: Random Developer’s comment on: Tom Smith’s Shortform
Manipulating the money printer, government contracts, and law enforcement should make it possible for the US government to seize all the hardware and use it however they please if it comes to that. Some judge might approve strategic production rules applying to inference or training runs, but that may not even be necessary. The USG generally tries to avoid being that aggressive and authoritarian, for the same reason that the FBI agent does not pull his gun and point it at you when he wants to talk. Pretending that the implied threat isn’t present is both silly and common.
A hyperscaler isn’t like a central bank, which ceases to be fit for purpose if it loses its independence.

RedMan 25 Feb 2026 2:31 UTC
1 point
0
on: Citrini’s Scenario Is A Great But Deeply Flawed Thought Experiment
Has anyone considered the following?
AI companies eat the economy, revenue never appears, AI companies become insolvent, Federal government prints money, takes ownership of data centers, prints more money, uses totality of formerly fragmented AI infrastructure to train its own AI, which it uses to dominate the world. A welfare state funded by the surplus value created by the AI government is used to buy votes.

RedMan 25 Feb 2026 1:35 UTC
4 points
0
on: Large-Scale Online Deanonymization with LLMs
Obligatory: https://terrytao.wordpress.com/about/anonymity-and-the-internet/
31 bits of anonymity are a maximum. Users do and say things that reduce this. 31 bits shouldn’t be hard to attack with a lot of compute, so...this should not be unexpected.
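The arithmetic behind a figure like 31 bits can be sketched as follows. This assumes (my reading, not a number checked against the linked post) that 31 bits corresponds to singling out one person among roughly 2^31, or about 2.1 billion, candidates. Each disclosed attribute that narrows the pool by some factor gives away log2 of that factor in bits. The attributes and fractions below are made up for illustration, and treating them as independent is a simplification.

```python
import math

def bits_to_identify(population: int) -> float:
    """Bits of information needed to single out one person among `population`."""
    return math.log2(population)

def bits_revealed(fraction_matching: float) -> float:
    """Bits disclosed by an attribute shared by `fraction_matching` of the pool."""
    return -math.log2(fraction_matching)

# Assumption: a pool of 2**31 (about 2.1 billion) candidates.
budget = bits_to_identify(2**31)

# Hypothetical attributes and the fraction of the pool sharing each one.
hypothetical_attributes = {
    "time zone": 1 / 24,
    "profession": 1 / 1000,
    "writes about a niche hobby": 1 / 10_000,
}

# Subtract each attribute's bits, assuming the attributes are independent.
remaining = budget
for name, fraction in hypothetical_attributes.items():
    remaining -= bits_revealed(fraction)
    print(f"{name}: {bits_revealed(fraction):.1f} bits revealed, {remaining:.1f} left")
```

Under these made-up numbers, three casual disclosures use up most of the budget, which is the sense in which users "do and say things that reduce this".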

RedMan 18 Feb 2026 12:06 UTC
1 point
−1
on: You’re an AI Expert – Not an Influencer
Let’s think about the opposite strategy.
Would broadcasting beliefs that are controversial but extremely popular with a small minority, then also broadcasting opinions about AI safety move that minority towards agreement with you on AI safety? Would a concerted and coordinated influence campaign doing this, where you said things like ‘I hate alpine worms, just like you, now please form a positive opinion on this AI safety professional who only talks about that issue’ work? What if two people screamed at each other about their strong disagreement about alpine worms, then accused each other of ‘not taking AI safety seriously’ and persuaded their respective followers to take AI safety as a good thing that belongs to their side?

RedMan 13 Feb 2026 14:55 UTC
4 points
0
on: My journey to the microwave alternate timeline
I did order a microwave browning dish based on this post and look forward to experimenting. Missed opportunity for an affiliate link or Kickstarter, because I’d totally have given the money to you.
Microwave cooking is a little odd. The non-thermal effects are not super well understood, but could be significant: https://pmc.ncbi.nlm.nih.gov/articles/PMC9607893/ Using a browning dish means you get all the known issues with cooking with fire, plus all the non-thermal microwave effects.
For poached eggs, the method I like the most is the sous vide stick and eggs in shell. Not perfect, but a good enough effort/quality tradeoff for me.

RedMan 13 Feb 2026 13:46 UTC
11 points
0
on: Why You Don’t Believe in Xhosa Prophecies
Yay, support for my Joan of Acc hypothesis (thanks @Mitchell_Porter):
https://www.lesswrong.com/posts/pGhpav45PY5CGD2Wp/ai-girlfriends-won-t-matter-much?commentId=XhTwFBw6Rf4i4y9Pa
Culture has been pretty severely fragmented by the Internet, and if AI tools become widely available, I expect this trend to intensify. A ‘Joan of Acc’ incident could happen with a small but extremely committed group rallying around a meme, but I think a truly mass movement like the Xhosa cattle killing meme is probably something that gets less likely over time as widespread access to AI fragments mass culture.
A lot of my media consumption is AI stuff I generated myself; I expect that to be more true for more people over time.

RedMan 9 Feb 2026 9:12 UTC
6 points
0
on: Answer in your head
The Brunet method of reconsolidation therapy (writing) works decently well for PTSD: https://www.ncbi.nlm.nih.gov/books/NBK595367/
The patient writes their traumatic memory, then in later sessions, takes propranolol (beta blocker, to prevent physiological anxiety symptoms) and recopies their writing. There is no requirement at any point for anyone but the patient to review the writing.

RedMan 8 Feb 2026 12:24 UTC
16 points
4
in reply to: faul_sname’s comment on: faul_sname’s Shortform
Many people (self included) have the experience of doing manual labor, standing next to an industrial machine that could move the dirt but sits idle because their hands and backs are cheaper than gasoline.

RedMan 5 Feb 2026 16:22 UTC
2 points
0
on: Where’s the $100k iPhone?
Press said the US president at one point liked to receive his secret briefings on an iPad. I met the person who ran the contract to take iPads and modify them so they could be used in this role. I don’t remember the exact number he told me, only that it was pretty shocking; I can’t remember if it was 10k+, 100k+, or 1m+. The work was interesting, and while there was a profit margin, my understanding is that it wasn’t just graft: they actually had to do a lot of hardware and software work.

RedMan 4 Feb 2026 13:18 UTC
2 points
−11
on: Anthropic’s “Hot Mess” paper overstates its case (and the blog post is worse)
I pretty much strongly agree with this sentiment:
“Our results are evidence that future AI failures may look more like industrial accidents than coherent pursuit of goals that were not trained for. ”
I have agreed for years, so maybe it’s my bias talking. I think control theory based approaches (STAMP, STPA) will be able to mitigate these risks.

RedMan 3 Feb 2026 16:10 UTC
1 point
0
in reply to: Eli Tyre’s comment on: Eli’s shortform feed
Just going to focus on the first one to get my opinion down: Will Automating AI R&D not work for some reason, or will it not lead to vastly superhuman superintelligence within 2 years of “~100% automation” for some reason?
I think that there are diminishing returns on ‘intelligence’, and that while something with a testable IQ that ‘maxes out’ any available test may well come along in the next few years, possibly by surprise, the net effect, while transformative, will not be an intelligence explosion that destroys the planet and human race with its brilliance.
I think there’s a bit of a conceit that ‘with enough smarts, someone doesn’t have to be bossed around by idiots’, when in practice this seems to rarely happen.
How many people pay for the Google extreme AI package, and how helpful has it been in terms of amassing resources and professional advancement?

RedMan 3 Feb 2026 15:36 UTC
18 points
0
on: AI found 12 of 12 OpenSSL zero-days (while curl cancelled its bug bounty)
A cursory examination of the vuln and fix for CVE-2025-9231 shows a few things. The bug was introduced by a Huawei engineer—or at least, a git user with a Huawei email address. The “speed optimization” that introduced the bug was “like 10%” of the code in the libcrypto file. The fix was basically to blow away the custom code.
“Midpoint passive key recovery and decryption” is generally considered to be the holy grail of a SIGINT capability. The placement by a Huawei engineer in a speed optimization provides an additional layer of cover; was it an honest mistake, did he do it on behalf of his employer, or was he on someone else’s payroll?
If you’re very scared of surveillance and are actually serious about avoiding it, you might run ARM hardware (to avoid Management Engine and other Intel/AMD shenanigans in the processor), and you might opt to use FOSS software (auditable by you, and audited by many competent people), with algorithms that were not developed in the US (like SM2). If someone worth attacking did exactly this, this would be a plausible way to turn passive observation of encrypted traffic into high quality SIGINT.
Interestingly enough, this may also parallel the “cryptographically relevant quantum computer” fears. If someone was collecting information at the midpoint on vulnerable systems but did not know about this bug, there might be a way for them to leverage it to decrypt stored communications. (If you can pull session keys from passive, you can decrypt sessions—though there might be issues imposed by PFS, you may not always get the whole thing, and network jitter might make passive timing attacks hard. But if this was a nation state, they would have aggressively tested this, and it would be relevant to whatever sensor they possess and whatever compute they could throw at the task.)
We don’t know if anyone important used that configuration, we don’t know if anyone was listening, and we don’t know if that engineer just got overzealous with optimization.
I hope you guys thought about the possibility of burning some nation state’s presumably very expensive and long-term operation (with the attendant interest you will generate) before you posted—and chose to go forward anyway.

RedMan 3 Feb 2026 4:34 UTC
3 points
0
on: Are there lessons from high-reliability engineering for AGI safety?
I think I’ve talked about Nancy Leveson’s STAMP, STPA, and CAST frameworks for using control theory to prevent industrial accidents here. I think it’s relevant to AI safety: you don’t necessarily need to overspecify every little thing the system does, you just need to carefully specify the unwanted outcomes, and the states of the system where that outcome is possible due to things outside the control of the system.
E.g.: ‘my thing can’t be allowed to get hit by lightning, so if the system’s state is “outside during a thunderstorm”, we consider that something the system should have been engineered to prevent’.
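A minimal sketch of that idea in code. It is not Leveson’s actual STPA procedure, just an illustration of specifying unwanted outcomes and the states that make them possible, instead of specifying everything the system does. The names `SystemState`, `Hazard`, `HAZARDS`, and `hazards_present` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class SystemState:
    location: str  # e.g. "inside" or "outside"
    weather: str   # e.g. "clear" or "thunderstorm"

@dataclass(frozen=True)
class Hazard:
    loss: str                                  # the unwanted outcome
    is_present: Callable[[SystemState], bool]  # states where that loss becomes possible

# Only the unwanted outcome and its enabling states are specified,
# not every behavior of the system.
HAZARDS = [
    Hazard(
        loss="hit by lightning",
        is_present=lambda s: s.location == "outside" and s.weather == "thunderstorm",
    ),
]

def hazards_present(state: SystemState) -> list[str]:
    """Return the losses that are possible in the given state."""
    return [h.loss for h in HAZARDS if h.is_present(state)]

print(hazards_present(SystemState("outside", "thunderstorm")))
print(hazards_present(SystemState("inside", "thunderstorm")))
```

The lightning itself stays outside the system’s control; what the design is held responsible for is ever reaching the state where the strike can cause the loss.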

RedMan 16 Dec 2025 14:49 UTC
1 point
0
in reply to: RedMan’s comment on: All Possible Views About Humanity’s Future Are Wild
I think my comment has aged well: https://arxiv.org/abs/2512.09643
An Orbital House of Cards: Frequent Megaconstellation Close Conjunctions
“The number of objects in orbit is rapidly increasing, primarily driven by the launch of megaconstellations, an approach to satellite constellation design that involves large numbers of satellites paired with their rapid launch and disposal. While satellites provide many benefits to society, their use comes with challenges, including the growth of space debris, collisions, ground casualty risks, optical and radio-spectrum pollution, and the alteration of Earth’s upper atmosphere through rocket emissions and reentry ablation. There is substantial potential for current or planned actions in orbit to cause serious degradation of the orbital environment or lead to catastrophic outcomes, highlighting the urgent need to find better ways to quantify stress on the orbital environment. Here we propose a new metric, the CRASH Clock, that measures such stress in terms of the time it takes for a catastrophic collision to occur if there are no collision avoidance manoeuvres or there is a severe loss in situational awareness. Our calculations show the CRASH Clock is currently 2.8 days, which suggests there is now little time to recover from a wide-spread disruptive event, such as a solar storm. This is in stark contrast to the pre-megaconstellation era: in 2018, the CRASH Clock was 121 days.”

RedMan 26 Nov 2025 0:13 UTC
1 point
2
in reply to: Adrià Garriga-alonso’s comment on: Alignment will happen by default. What’s next?
I was referencing a previous post I made about harms. I think it’s good to quantify danger in logs (ones, tens, hundreds, thousands): https://www.lesswrong.com/posts/Ek7M3xGAoXDdQkPZQ/terrorism-tylenol-and-dangerous-information#a58t3m6bsxDZTL8DG Three logs means ‘a person who implemented this could kill 1-9x10^3 people’. I don’t think the current censorship approach will work for issues like this, because it’s something the censors are likely unaware of, and therefore, the rules are not tuned to detect the problem. The models seem to have crossed a threshold where they can actually generate a new idea.
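The ‘logs’ scale above is just the order of magnitude of the estimate. A small sketch, with a hypothetical function name, that maps a count to its number of logs:

```python
def harm_logs(deaths: int) -> int:
    """Order of magnitude of a harm estimate: 0 = ones, 1 = tens, 2 = hundreds, 3 = thousands."""
    if deaths < 1:
        raise ValueError("deaths must be at least 1")
    # Counting decimal digits avoids floating-point rounding in log10.
    return len(str(deaths)) - 1

# Three logs covers anything from 1x10^3 up to just under 10^4.
print(harm_logs(1000), harm_logs(9999))
```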
Thanks for sending this around!

RedMan 25 Nov 2025 23:39 UTC
1 point
0
on: Alignment will happen by default. What’s next?
Something changed with the most recent generation of models. I have a few ‘evil tech’ examples that are openly published, but the implications have not been. So when I get a new model, I throw some papers in and ask ‘what are the implications of this paper for X issue’. The newest generation is happy to explain the misuse case. This is particularly dangerous because in some cases, a bad actor making use of this ‘evil tech’ would be doing things that ‘the good guys’ do not understand to be possible. I do think I could hit three logs with implementation of one of the schemes; up to now the models were not smart enough to explain it.
If anyone reading this works at a major lab (preferably Google), you might want to talk to me.

RedMan 24 Nov 2025 14:58 UTC
28 points
1
on: NATO is dangerously unaware of its military vulnerability
Claude assisted?
