Alex Mark

Karma: 14

Alex Mark 29 Mar 2026 20:41 UTC
2 points
0
in reply to: theashwinner’s comment on: Negligent AI: Reasonable Care for AI Safety
Thanks!

“Poor legal reasoning can probably be tested empirically right?”
- The market might be taking care of this one on its own. LegalBench created a strong eval for legal reasoning, and firms are heavily investing in legal capabilities. I was surprised to see Grok’s results for this reason.
- The ablation is interesting. It would be hard to get the prompt engineering right in a way that incorporates both “rules” and “standards,” since you don’t exactly “break” the law of negligence. But I should consider this.
- Larger sample sizes are key! That’s the focus of my follow-up experiment. I hadn’t thought about the sycophancy effect or other wording biases. I’m going to include both in my methodology for a longer evaluation now.
- Yep, this is U.S. based. But the four core elements—duty, breach, causation, damages—translate across Anglo common law. That’s why I focused on general negligence rather than more specific standards (e.g. professional duties of care).
Overall, thanks for the feedback! This definitely helps me refine my methodology.

Alex Mark 14 Mar 2026 9:58 UTC
2 points
0
in reply to: Richard Juggins’s comment on: Negligent AI: Reasonable Care for AI Safety
Thanks! I picked negligence because it’s incredibly context-dependent and revolves around standards rather than rules. I worry that laws with more criteria-based rules (e.g. tax codes, trespass) make it easier for models to reward hack or find legal loopholes. I like negligence because it’s harder to game “the reasonable person standard,” since it applies so broadly.

Negligent AI: Reasonable Care for AI Safety

Alex Mark 11 Mar 2026 10:31 UTC

13 points

4 comments · 12 min read · LW link