Feels worth noting that the alignment assessment section is by far the largest section in the system card: 65 pages in total (44% of the whole thing).
Here are the section page counts:
Abstract — 4 pages (pp. 1–4)
1 Introduction — 5 pages (pp. 5–9)
2 Safeguards and harmlessness — 9 pages (pp. 10–18)
3 Honesty — 5 pages (pp. 19–23)
4 Agentic safety — 7 pages (pp. 24–30)
5 Cyber capabilities — 14 pages (pp. 31–44)
6 Reward hacking — 4 pages (pp. 45–48)
7 Alignment assessment — 65 pages (pp. 49–113)
8 Model welfare assessment — 9 pages (pp. 114–122)
9 RSP evaluations — 25 pages (pp. 123–147)
The white box evaluation subsection (7.4) alone is 26 pages, longer than any other top-level section (the next largest, RSP evaluations, is 25)!
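For anyone who wants to check the arithmetic, here is a quick sketch that recomputes the counts and the 44% figure from the page ranges listed above. The dictionary and variable names are just mine for illustration, and the 26-page figure for 7.4 is taken as stated in the post rather than derived from a page range.

```python
# Page ranges copied from the list above: (first page, last page), inclusive.
sections = {
    "Abstract": (1, 4),
    "1 Introduction": (5, 9),
    "2 Safeguards and harmlessness": (10, 18),
    "3 Honesty": (19, 23),
    "4 Agentic safety": (24, 30),
    "5 Cyber capabilities": (31, 44),
    "6 Reward hacking": (45, 48),
    "7 Alignment assessment": (49, 113),
    "8 Model welfare assessment": (114, 122),
    "9 RSP evaluations": (123, 147),
}

# An inclusive range first..last covers last - first + 1 pages.
counts = {name: last - first + 1 for name, (first, last) in sections.items()}
total = sum(counts.values())

largest = max(counts, key=counts.get)
share = counts[largest] / total

# Largest page count among the sections other than the largest one.
next_largest = max(n for name, n in counts.items() if name != largest)

# Taken from the post as stated; its page range is not given.
subsection_7_4_pages = 26

print(f"{largest}: {counts[largest]} of {total} pages ({share:.0%})")
print(f"7.4 ({subsection_7_4_pages} pages) vs next largest section ({next_largest} pages)")
```

The ranges sum to 147 pages, so 65 / 147 rounds to the 44% quoted above.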