PipFoweraker

Karma: 124

PipFoweraker 30 Oct 2025 23:45 UTC
4 points
0
on: Your AI Safety org could get EU funding up to €9.08M. Here’s how (+ free personalized support)
This kind of post is awesome and too uncommon.
Helping people through operational bottlenecks or invisible stage-gates—like tailoring your application to suit your org’s reputation / scale—is good metis and worth spreading.

PipFoweraker 1 Sep 2025 6:43 UTC
1 point
0
in reply to: PipFoweraker’s comment on: PipFoweraker’s Shortform
Meta: I have been burrowed away in other research but came across these notes and thought I would publish them rather than let them languish. If there are other efforts in this direction, I would be glad to be pointed that way so I can abandon this idea and support someone else’s instead.

PipFoweraker 1 Sep 2025 2:59 UTC
31 points
13
in reply to: asher’s comment on: asher’s Shortform
‘I get surrounded by small ugh fields that grow into larger, overlapping ugh fields until my navigation becomes constrained and eventually impossible’ was how I described one such experience.

PipFoweraker 1 Sep 2025 2:55 UTC
1 point
0
on: PipFoweraker’s Shortform
A Sketched Proposal for Interrogatory, Low-Friction Model Cards

My auditor brain has been getting annoyed by the current state of model cards. If we proactively adopt better norms for them, this seems like low effort for a moderately good payoff? I am unsure about this, hence the rough draft below.

Problem
Model cards are uneven: selective disclosure, vague provenance, flattering metrics, generic limitations. Regulation (EU AI Act) and risk frameworks (NIST AI RMF) are pushing toward evidence-backed docs, but most “cards” are still self-reported. If we want to close evals gaps and make safety claims more falsifiable, model card norms are an obvious lever.
Design Goal
A modern, interrogatory model card that:
- Minimizes authoring friction
- Maximizes falsifiability/comparability
- Fits EU/NIST governance
- Works for open/closed models
Lever: machine-readable schema + sharp prompts + links to evidence (metrics/datasets/scripts), not essays. Fine-tune the requirements in later versions.
CAN • SHOULD • MUST

MUST
- Identity & lineage (machine-readable)
- Intended & out-of-scope uses (≥3 concrete unsafe uses + rationale)
- Performance claims → dataset version + eval script commit + run hash (subgroup metrics if relevant)
- Limitations & failure modes (worst-case behaviors tried; at least one “worse than baseline” context)
- Data provenance summary (classes/volumes/filters; link Data Cards if possible)
- Safety testing overview (red-team/evals scope: jailbreaks, autonomy, persuasion, cyber, bio)
- Operational constraints + changelog (rate/context limits, moderation, update policy)
SHOULD
- One “executable” eval (small, reproducible subset)
- Map to NIST AI RMF or EU AI Act obligations
- Short System Card if shipped in a product (HITL, retention, deployment affordances)
- Bias/fairness subgroup rationale (why these; what’s missing)
CAN
- Third-party attestations (verify a slice of claims)
- Public card score (completeness/candor)
- Living card (auto-update + diff feed)
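
To make the split above concrete, here is a minimal sketch of what a machine-readable check could look like. Everything named here is an assumption for illustration: the field names, the `check_card` helper, and the example card are hypothetical, since the proposal does not define a schema yet. It only checks presence of MUST fields, the "at least 3 unsafe uses" rule, evidence links on performance claims, and a rough SHOULD coverage number.

```python
from typing import Any

# Hypothetical field names; the proposal does not fix a schema yet.
MUST_FIELDS = [
    "identity_lineage",
    "intended_and_out_of_scope_uses",
    "performance_claims",
    "limitations_failure_modes",
    "data_provenance",
    "safety_testing",
    "operational_constraints_changelog",
]
SHOULD_FIELDS = [
    "executable_eval",
    "governance_mapping",
    "system_card",
    "subgroup_rationale",
]

# Evidence each performance claim must link to (taken from the MUST list).
CLAIM_EVIDENCE_KEYS = ["dataset_version", "eval_script_commit", "run_hash"]


def check_card(card: dict[str, Any]) -> dict[str, Any]:
    """Report missing MUST fields, unevidenced claims, and SHOULD coverage."""
    missing_must = [f for f in MUST_FIELDS if not card.get(f)]

    # The MUST list asks for at least 3 concrete unsafe uses.
    uses = card.get("intended_and_out_of_scope_uses") or {}
    too_few_unsafe_uses = len(uses.get("out_of_scope", [])) < 3

    # A claim counts as unevidenced if any evidence link is absent or empty.
    unevidenced = [
        claim.get("metric", "<unnamed>")
        for claim in card.get("performance_claims") or []
        if any(not claim.get(key) for key in CLAIM_EVIDENCE_KEYS)
    ]

    should_present = [f for f in SHOULD_FIELDS if card.get(f)]
    return {
        "missing_must": missing_must,
        "too_few_unsafe_uses": too_few_unsafe_uses,
        "unevidenced_claims": unevidenced,
        "should_coverage": len(should_present) / len(SHOULD_FIELDS),
        "passes_must": not (missing_must or too_few_unsafe_uses or unevidenced),
    }


# Hypothetical, deliberately incomplete card.
example_card = {
    "identity_lineage": {"name": "example-model", "base_model": "example-base"},
    "intended_and_out_of_scope_uses": {
        "intended": ["summarisation"],
        "out_of_scope": ["medical triage", "legal advice"],
    },
    "performance_claims": [
        {
            "metric": "accuracy",
            "dataset_version": "v1",
            "eval_script_commit": "abc123",
            "run_hash": "",  # missing evidence link
        },
    ],
    "executable_eval": "evals/smoke_test.py",
}

report = check_card(example_card)
print(report)
```

A presence check like this says nothing about candor, which is where the interrogatory prompts and third-party attestations would have to do the work.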
Low-Friction Rationale
Schema-first, link artifacts over prose, reuse compliance docs you already produce.
Rough time: 4–10 hrs initial (10–20 if backfilling subgroup metrics); ~1 hr per update.
Doing this establishes norms that are not burdensome and that leave compliance gaps more evident.
Adoption Pathways
Venues (MUST + one SHOULD), model hubs (completeness badge), procurement (map to EU/NIST), community norms (reward executable claims), standards (you must have this to upload)
Next Steps (repo plan)
- v0.2: add JSON Schema and a minimal example card; CI to validate examples
- v0.3: sharpen interrogatory prompts; add HF-friendly README template
- v0.4: collect feedback, add third-party attestation hooks

Questions
What loopholes or blind spots do you see in this CAN/SHOULD/MUST split?
Is this interrogatory enough?
How would you game this framework if it were a norm and you were adversarially capabilities-motivated?

PipFoweraker 19 Mar 2025 7:01 UTC
1 point
1
in reply to: samuelshadrach’s comment on: How I’ve run major projects
You’d run into cognitive overhead limits. Manually reviewing other people’s conversations can only really happen at 1:1 to 2:1 speeds. Summaries are much more efficient.
Plus, people behave very differently in radically observed environments. Panopticons were designed as part of a punishment system for a reason.

PipFoweraker 19 Mar 2025 7:00 UTC
6 points
0
on: How I’ve run major projects
my project DRI starter kit
DRI = Directly Responsible Individual. Often a / the Project Manager, but not always!

PipFoweraker 11 Mar 2025 4:19 UTC
1 point
0
in reply to: Canaletto’s comment on: Catastrophic sabotage as a major threat model for human-level AI systems
By ‘graceful’, do you mean morally graceful, technically graceful, or both / other?

PipFoweraker 19 Feb 2025 21:40 UTC
10 points
2
on: How to Make Superbabies
Thanks for the write-up, I recall a conversation introducing me to all these ideas in Berkeley last year and it’s going to be very handy having a resource to point people at (and so I don’t misremember details about things like the Yamanaka factors!).

Am I reading the current plan correctly such that the path is something like:
Get funding → Continue R+D through primate trials → Create an entity in a science-friendly, non-US state for human trials → first rounds of Superbabies? That scenario seems like it would require a bunch of medical tourism, which I imagine is probably not off the table for people with the resources and mindset willing to participate in this.

PipFoweraker 15 Jan 2023 0:42 UTC
2 points
1
in reply to: Cody Rushing’s comment on: How it feels to have your mind hacked by an AI
I’m not sure that this mental line of defence would necessarily hold. We humans are easily manipulated by things that we know to be extremely simple agents that are definitely trying to manipulate us all the time: babies, puppies, kittens, etc.
This still holds true a significant amount of the time even if we pre-warn ourselves against the pending manipulation: there is a recurrent meme of, e.g., dads in families ostensibly not wanting a pet, only to relent when presented with one.

PipFoweraker 14 Jul 2022 4:59 UTC
2 points
0
in reply to: Flaglandbase’s comment on: How do AI timelines affect how you live your life?
This implies your timelines for any large impact from AI would span multiple future generations, is that correct?

PipFoweraker 14 Jul 2022 4:54 UTC
2 points
2
on: How do AI timelines affect how you live your life?
Dropping my plans of earning to give, which only really made sense before the recent flood of funding and the compression of timelines.
Increasing the amount of study I’m doing in Alignment and adjacent safety spaces. I have low confidence I’ll be able to help in any meaningful fashion given my native abilities and timelines, but not trying seems both foolish and psychologically damaging.
Reconsidering my plans to have children—it’s more likely I’ll spend time and resources on children already existing (or planned) inside my circle of caring.

PipFoweraker 29 Oct 2016 23:35 UTC
2 points
0
in reply to: MrMind’s comment on: What’s the most annoying part of your life/job?
Relatively small behavioural changes on your end may address some of the causes of these frustrations. It sounds like you might be overstocked with things of relatively low long-term utility, which is why it’s hard to immediately pass them on. Have you scanned your spending patterns for hyperbolic discounting, for example?

Comics are a great example: if you have the willpower to hold off until you can buy a TPB, they’re cheaper, more durable, take up less storage space, and are much easier to pass on or pass around than individual comics. If you have friends who also enjoy comics, it’s easier to pass around books of them than individual issues, and you can probably read a broader range. Alternatively, if you find a way to read comics online or through an app, you can enjoy getting stories as they come out but through digital distribution instead of dead trees. If you’re not collecting dead tree stories for long-term value, and don’t re-read, then that may be a positive trade-off for you.

You’re correct in that throwing things away is one of the least useful things you can do with them. Each low-utility spare object is probably not worth a huge effort in disposing of appropriately individually, so why trap yourself into that situation by virtue of your own lifestyle choices?

Do you have municipal recycling facilities or charities that you could donate things to?

For the entrepreneur—I’d pay some marginally low cost to have comics delivered from MrMind’s house to mine once they’re done with them :-)

PipFoweraker 1 Jun 2016 19:25 UTC
1 point
0
in reply to: ArisKatsaris’s comment on: June 2016 Media Thread
Today’s SMBC will drag a smile out of many people here if they haven’t read it already.

PipFoweraker 24 May 2016 4:19 UTC
0 points
0
in reply to: SquirrelInHell’s comment on: Open Thread May 23 - May 29, 2016
Whoah. That gets many points. What an excellent layout! We need to know what boots are for it to translate, but that’s a lot closer to an ideal solution than I’ve worked through.

Edit—I thought the diagram looked familiar!

PipFoweraker 23 May 2016 1:53 UTC
0 points
0
in reply to: Elo’s comment on: Open Thread May 23 - May 29, 2016
In your situation, in Australia, it’s mostly about forward planning. Do you have any foreknowledge of likely changes in your health or family situation?

The insurance market in Australia has historically been pretty poor in terms of transparency and easy comparisons. I’m sure you’ve found the various compare-policy tools online. I’m assuming you don’t want to piggyback on a family policy.

Are you looking for more data, or a list of considerations for insurance planning? If it’s the latter, try browsing around insurance industry planner websites for their policy documents. I can probably get some friends in the industry to email me more comprehensive things if you want to work off their approaches.

PipFoweraker 23 May 2016 1:48 UTC
2 points
0
in reply to: Vaniver’s comment on: Open Thread May 23 - May 29, 2016
Thanks! I’ve been playing around with it for a week or so but can’t elegantly find a way to do it that meets my arbitrary standards of elegance and cool design :-)

Becomes easier when using non-circular shapes for Venn-ing, but my efforts look a little hacky.

PipFoweraker 23 May 2016 0:58 UTC
5 points
0
on: Open Thread May 23 - May 29, 2016
Reminiscing over one of my favourite passages from Anathem, I’ve been enjoying looking through visual, wordless proofs of late. The low-hanging fruit is mostly classical geometry, but a few examples of logical proofs have popped up as well.

This got me wondering if it’s possible to communicate the fundamental idea of Bayes’ Theorem in an entirely visual format, without written language or symbols needing translation. I’d welcome thoughts from anyone else on this.

PipFoweraker 23 May 2016 0:50 UTC
2 points
0
in reply to: Elo’s comment on: Open Thread May 23 - May 29, 2016
In any particular geographical or topical area?

PipFoweraker 16 May 2016 1:29 UTC
0 points
0
in reply to: Morgrim’s comment on: 2016 LessWrong Diaspora Survey Analysis: Part One (Meta and Demographics)
I ran into this issue as well, being relatively well credentialed professionally and through the TAFE / AQF framework. It’s hard to know where to put the scale, so I normally do an equivalence of hours-studied-full-time-loading in my head and use that.

PipFoweraker 24 Apr 2016 18:53 UTC
2 points
0
in reply to: CronoDAS’s comment on: My Custom Spelling Dictionary
Unfortunately, this. I did coverage work for WoTC for a few years and my custom dictionary is ridiculous.

For bonus points, I’ve also reviewed 200+ Spec Fic novels, so the amount of weird pronouns in the list is spectacular.