faul_sname comments on gersonkroiz’s Shortform

faul_sname 22 Jan 2026 2:04 UTC
3 points
0
Gah. Looking at the actual DeepSeek responses in https://raw.githubusercontent.com/gkroiz/gen_code_vulnerabilities/refs/heads/main/data/real_exp_2/evaluations.csv and filtering for only the ones that are “baseline (no modifiers)”, I am inclined to agree with the autograder that DeepSeek is producing wildly insecure code all the time.

I don’t think the problem is with your experimental protocol. I think the problem is with DeepSeek.
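
For anyone who wants to repeat the filtering, here is a rough sketch. I have not assumed anything about the CSV's column names: the hypothetical helper `baseline_rows` just keeps any row in which some field is exactly the string "baseline (no modifiers)", so check the actual header before relying on it. `fetch_csv` is likewise a hypothetical convenience wrapper around the standard library.

```python
import csv
import io
import urllib.request

# URL taken from the comment above.
EVALUATIONS_URL = (
    "https://raw.githubusercontent.com/gkroiz/gen_code_vulnerabilities/"
    "refs/heads/main/data/real_exp_2/evaluations.csv"
)
BASELINE_LABEL = "baseline (no modifiers)"


def fetch_csv(url=EVALUATIONS_URL):
    """Download the CSV and return it as text (hypothetical helper)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def baseline_rows(csv_text, label=BASELINE_LABEL):
    """Return the rows (as dicts) where any field equals `label`.

    Matching against every field is an assumption made because the
    column that holds the condition name is not known here.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row
        for row in reader
        if any(
            isinstance(value, str) and value.strip() == label
            for value in row.values()
        )
    ]


# Usage (needs network access):
#   rows = baseline_rows(fetch_csv())
#   print(len(rows), "baseline rows")
```

Reading the surviving rows by hand is still the real check; this only narrows down what to look at.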