‘I get surrounded by small ugh fields that grow into larger, overlapping ugh fields until my navigation becomes constrained and eventually impossible’ was how I described one such experience.
PipFoweraker
A Sketched Proposal for Interrogatory, Low-Friction Model Cards
My auditor brain has been annoyed by the current state of model cards. Adopting better norms for them proactively looks like a low-effort, moderately-good-payoff move, but I am unsure about this, hence the rough draft below.
Problem
Model cards are uneven: selective disclosure, vague provenance, flattering metrics, generic limitations. Regulation (EU AI Act) and risk frameworks (NIST AI RMF) are pushing toward evidence-backed docs, but most “cards” are still self-reported. If we want to close evals gaps and make safety claims more falsifiable, model card norms are an obvious lever.
Design Goal
A modern, interrogatory model card that:
- Minimizes authoring friction
- Maximizes falsifiability/comparability
- Fits EU/NIST governance
- Works for open and closed models
Lever: machine-readable schema + sharp prompts + links to evidence (metrics/datasets/scripts), not essays. Fine-tune the requirements in later versions.
CAN • SHOULD • MUST
MUST
- Identity & lineage (machine-readable)
- Intended & out-of-scope uses (≥3 concrete unsafe uses + rationale)
- Performance claims → dataset version + eval script commit + run hash (subgroup metrics if relevant)
- Limitations & failure modes (worst-case behaviors tried; at least one “worse than baseline” context)
- Data provenance summary (classes/volumes/filters; link Data Cards if possible)
- Safety testing overview (red-team/evals scope: jailbreaks, autonomy, persuasion, cyber, bio)
- Operational constraints + changelog (rate/context limits, moderation, update policy)
SHOULD
- One “executable” eval (small, reproducible subset)
- Map to NIST AI RMF or EU AI Act obligations
- Short System Card if shipped in a product (HITL, retention, deployment affordances)
- Bias/fairness subgroup rationale (why these; what’s missing)
CAN
- Third-party attestations (verify a slice of claims)
- Public card score (completeness/candor)
- Living card (auto-update + diff feed)
Low-Friction Rationale
Schema-first, link artifacts over prose, reuse compliance docs you already produce.
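To make “link artifacts over prose” concrete, here is a minimal sketch of how a single performance claim might be recorded. The `PerformanceClaim` record, its field names, and the SHA-format rule are hypothetical illustrations, not part of any existing model card standard; the check simply rejects claims that lack the pinned dataset version, eval script commit, and run hash the MUST list asks for.

```python
import re
from dataclasses import dataclass, field

# Hypothetical record shape; field names are illustrative, not a published standard.
@dataclass
class PerformanceClaim:
    metric: str                 # e.g. "accuracy"
    value: float
    dataset: str                # dataset identifier
    dataset_version: str        # pinned version, never "latest"
    eval_script_commit: str     # git commit of the eval script
    run_hash: str               # hash identifying the specific eval run
    subgroups: dict = field(default_factory=dict)  # optional subgroup metrics

# Assumption: abbreviated or full lowercase hex git SHAs (7 to 40 chars).
HEX_SHA = re.compile(r"^[0-9a-f]{7,40}$")

def evidence_problems(claim: PerformanceClaim) -> list[str]:
    """Return reasons the claim is not evidence-backed (empty list if it is)."""
    problems = []
    if not claim.dataset_version or claim.dataset_version.lower() == "latest":
        problems.append("dataset_version must be pinned")
    if not HEX_SHA.match(claim.eval_script_commit):
        problems.append("eval_script_commit must be a git SHA")
    if not claim.run_hash:
        problems.append("run_hash is missing")
    return problems

# Placeholder values only: an unpinned, unlinked claim fails all three checks.
weak = PerformanceClaim("accuracy", 0.5, "example-bench", "latest", "abc123", "")
print(evidence_problems(weak))
```

The design choice is that a claim without evidence is a validation error rather than a prose caveat, which keeps authoring cheap (fill in three fields) while making unsupported numbers easy to spot.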
Rough time: 4–10 hrs initial (10–20 if backfilling subgroup metrics); ~1 hr per update.
Doing this establishes norms that are not burdensome and that make compliance gaps more evident.
Adoption Pathways
- Venues: require the MUST items plus one SHOULD
- Model hubs: completeness badge
- Procurement: map to EU/NIST obligations
- Community norms: reward executable claims
- Standards: you must have a card to upload
Next Steps (repo plan)
- v0.2: add JSON Schema and a minimal example card; CI to validate examples
- v0.3: sharpen interrogatory prompts; add HF-friendly README template
- v0.4: collect feedback, add third-party attestation hooks
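The v0.2 CI step could start as a simple check that every example card contains the MUST sections. Below is a minimal stdlib-only sketch; the section keys, the `examples` directory name, and the JSON card layout are assumptions for illustration (intended and out-of-scope uses are split into two keys here), and a real v0.2 would presumably validate against the JSON Schema itself rather than this hand-rolled check. It also computes a crude completeness fraction of the kind a hub badge or public card score might use.

```python
import json
from pathlib import Path

# Hypothetical keys mirroring the MUST list; not a published schema.
MUST_SECTIONS = [
    "identity_lineage",
    "intended_uses",
    "out_of_scope_uses",
    "performance_claims",
    "limitations_failure_modes",
    "data_provenance",
    "safety_testing",
    "operational_constraints",
]

def missing_must(card: dict) -> list[str]:
    """MUST sections that are absent or empty in a card."""
    return [key for key in MUST_SECTIONS if not card.get(key)]

def completeness(card: dict) -> float:
    """Fraction of MUST sections present; a crude basis for a completeness badge."""
    return 1 - len(missing_must(card)) / len(MUST_SECTIONS)

def out_of_scope_problems(card: dict) -> list[str]:
    """Interrogatory check: at least 3 concrete unsafe uses, each with a rationale."""
    uses = card.get("out_of_scope_uses") or []
    problems = []
    if len(uses) < 3:
        problems.append("need at least 3 out-of-scope uses")
    for i, use in enumerate(uses):
        if not isinstance(use, dict) or not use.get("rationale"):
            problems.append(f"out_of_scope_uses[{i}] lacks a rationale")
    return problems

def main(example_dir: str = "examples") -> int:
    """Check every *.json card in example_dir; return 1 if any card fails, else 0."""
    failures = 0
    for path in sorted(Path(example_dir).glob("*.json")):
        card = json.loads(path.read_text(encoding="utf-8"))
        problems = [f"missing MUST section: {k}" for k in missing_must(card)]
        problems += out_of_scope_problems(card)
        print(f"{path.name}: completeness {completeness(card):.0%}")
        for problem in problems:
            print(f"  - {problem}")
        failures += bool(problems)
    return 1 if failures else 0
```

In CI, calling `sys.exit(main())` would fail the build whenever an example card is incomplete; the per-card completeness line doubles as the raw input for a badge.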
Questions
What loopholes or blind spots do you see in this CAN/SHOULD/MUST split?
Is this interrogatory enough?
How would you game this framework if it were a norm and you were adversarially capabilities-motivated?
You’d run into cognitive overhead limits. Manually reviewing other people’s conversations can only really happen at 1:1 to 2:1 speeds. Summaries are much more efficient.
Plus, people behave very differently in radically observed environments. Panopticons were designed as part of a punishment system for a reason.
my project DRI starter kit
DRI = Directly Responsible Individual. Often a / the Project Manager, but not always!
By ‘graceful’, do you mean morally graceful, technically graceful, or both / other?
Thanks for the write-up, I recall a conversation introducing me to all these ideas in Berkeley last year and it’s going to be very handy having a resource to point people at (and so I don’t misremember details about things like the Yamanaka factors!).
Am I reading the current plan correctly that the path is something like:
Get funding → Continue R+D through primate trials → Create an entity in a science-friendly, non-US state for human trials → first rounds of Superbabies? That scenario seems like it would require a bunch of medical tourism, which I imagine is probably not off the table for people with the resources and mindset willing to participate in this.
I’m not sure this mental line of defence would necessarily hold; we humans are easily manipulated by things we know to be extremely simple agents that are definitely trying to manipulate us all the time: babies, puppies, kittens, etc.
This still holds true a significant amount of the time even if we pre-warn ourselves against the pending manipulation; there is a recurring meme of, e.g., dads who ostensibly don’t want a family pet relenting when presented with one.
This implies your timelines for any large impact from AI would span multiple future generations, is that correct?
Dropping my plans of earning to give, which only really made sense before the recent flood of funding and the compression of timelines.
Increasing the amount of study I’m doing in Alignment and adjacent safety spaces. I have low confidence that I’ll be able to help in any meaningful way given my native abilities and timelines, but not trying seems both foolish and psychologically damaging.
Reconsidering my plans to have children—it’s more likely I’ll spend time and resources on children already existing (or planned) inside my circle of caring.
Relatively small behavioural changes on your end may address some of the causes of these frustrations. It sounds like you might be overstocked with things of relatively low long-term utility, which is why it’s hard to immediately pass them on. Have you scanned your spending patterns for hyperbolic discounting, for example?
Comics are a great example: if you have the willpower to hold off until you can buy a TPB (trade paperback), it’s cheaper, more durable, takes up less storage space, and is much easier to pass on than individual issues. If you have friends who also enjoy comics, it’s easier to pass around collected volumes than individual issues, and you can probably read a broader range. Alternatively, if you find a way to read comics online or through an app, you can get stories as they come out through digital distribution instead of dead trees. If you’re not collecting dead-tree stories for long-term value and don’t re-read, that may be a positive trade-off for you.
You’re correct that throwing things away is one of the least useful things you can do with them. Individually, each low-utility spare object probably isn’t worth much effort to dispose of appropriately, so why trap yourself in that situation through your own lifestyle choices?
Do you have municipal recycling facilities or charities that you could donate things to?
For the entrepreneur—I’d pay some marginally low cost to have comics delivered from MrMind’s house to mine once they’re done with them :-)
Today’s SMBC will drag a smile out of many people here if they haven’t read it already.
Whoah. That gets many points. What an excellent layout! We need to know what boots are for it to translate, but that’s a lot closer to an ideal solution than I’ve worked through.
Edit—I thought the diagram looked familiar!
In your situation, in Australia, it’s mostly about forward planning. Do you have any foreknowledge of likely changes in your health or family situation?
The insurance market in Australia has historically been pretty poor in terms of transparency and easy comparisons. I’m sure you’ve found the various compare-policy tools online. I’m assuming you don’t want to piggyback on a family policy.
Are you looking for more data, or a list of considerations for insurance planning? If it’s the latter, try browsing around insurance industry planner websites for their policy documents. I can probably get some friends in the industry to email me more comprehensive material if you want to work off their approaches.
Thanks! I’ve been playing around with it for a week or so but can’t elegantly find a way to do it that meets my arbitrary standards of elegance and cool design :-)
Becomes easier when using non-circular shapes for Venn-ing, but my efforts look a little hacky.
Reminiscing over one of my favourite passages from Anathem, I’ve been enjoying looking through visual, wordless proofs of late. The low-hanging fruit is mostly classical geometry, but a few examples of logical proofs have popped up as well.
This got me wondering if it’s possible to communicate the fundamental idea of Bayes’ Theorem in an entirely visual format, without written language or symbols needing translation. I’d welcome thoughts from anyone else on this.
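For reference, this is the symbolic content a wordless version would have to carry. Most Venn-style pictures encode it as one area identity: the overlap region can be read either as a fraction of B or as a fraction of A, and equating the two readings gives the theorem.

```latex
P(A \cap B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)
\quad\Longrightarrow\quad
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}, \qquad P(B) > 0
```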
In any particular geographical or topical area?
I ran into this issue as well, being relatively well credentialed professionally and through the TAFE / AQF framework. It’s hard to know where to put the scale, so I normally do an equivalence of hours-studied-full-time-loading in my head and use that.
Unfortunately, this. I did coverage work for WotC for a few years and my custom dictionary is ridiculous.
For bonus points, I’ve also reviewed 200+ Spec Fic novels, so the amount of weird pronouns in the list is spectacular.
I have a non-specific recollection that, generally speaking, phrasing directions in the positive imperative (“Treat dogs well”) rather than a negative imperative (“Do not treat dogs badly”) leads to better rates of recall / compliance.
If it interests you I’ll ask around and find a proper reference.
Meta: I have been burrowed away in other research but came across these notes and thought I would publish them rather than let them languish. If there are other efforts in this direction, I would be glad to be pointed that way so I can abandon this idea and support someone else’s instead.