
Transparency / Interpretability (ML & AI)


Transparency and interpretability refer to the extent to which the decision processes and inner workings of AI and machine learning systems can be understood by humans or other outside observers.

Present-day machine learning systems are typically not very transparent or interpretable: you can use a model’s output, but the model can’t tell you why it produced that output. This makes it hard to determine the causes of bias in ML models.
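To make the problem concrete, here is a minimal sketch (illustrative only, not taken from any of the posts below) of one of the simplest interpretability probes: gradient-based saliency, which scores each input feature by how sensitive the model’s chosen output is to it. The toy model and random input are assumptions of the sketch.

```python
# A minimal sketch of gradient-based saliency (illustrative; the toy model and
# random input below are assumptions, not part of any referenced post).
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy "black box": a small, untrained MLP classifier over 10 input features.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))

x = torch.randn(1, 10, requires_grad=True)  # one input we would like explained
logits = model(x)
pred = logits.argmax(dim=1).item()

# Backpropagate the predicted class's logit down to the input features.
logits[0, pred].backward()

# |d(logit)/d(x_i)| serves as a crude per-feature sensitivity ("saliency") score.
saliency = x.grad.abs().squeeze()
for i, score in enumerate(saliency.tolist()):
    print(f"feature {i}: saliency {score:.4f}")
```

Scores like these are at best a local, approximate answer to “why did the model produce this output?”, which is part of why interpretability remains an open research area.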

Interpretability in ML: A Broad Overview

lifelonglearner · 4 Aug 2020 19:03 UTC
41 points
5 comments · 15 min read · LW link

Transparency and AGI safety

jylin04 · 11 Jan 2021 18:51 UTC
50 points
12 comments · 30 min read · LW link

What is Interpretability?

17 Mar 2020 20:23 UTC
33 points
0 comments · 11 min read · LW link

Chris Olah’s views on AGI safety

evhub · 1 Nov 2019 20:13 UTC
140 points
38 comments · 12 min read · LW link · 2 nominations · 2 reviews

Using GPT-N to Solve Interpretability of Neural Networks: A Research Agenda

3 Sep 2020 18:27 UTC
60 points
11 comments · 2 min read · LW link

An Analytic Perspective on AI Alignment

DanielFilan · 1 Mar 2020 4:10 UTC
53 points
45 comments · 8 min read · LW link
(danielfilan.com)

Verification and Transparency

DanielFilan · 8 Aug 2019 1:50 UTC
34 points
6 comments · 2 min read · LW link
(danielfilan.com)

Mechanistic Transparency for Machine Learning

DanielFilan · 11 Jul 2018 0:34 UTC
55 points
9 comments · 4 min read · LW link

How can Interpretability help Alignment?

23 May 2020 16:16 UTC
33 points
3 comments · 9 min read · LW link

One Way to Think About ML Transparency

Matthew Barnett · 2 Sep 2019 23:27 UTC
26 points
28 comments · 5 min read · LW link

Relaxed adversarial training for inner alignment

evhub · 10 Sep 2019 23:03 UTC
54 points
10 comments · 27 min read · LW link

Sparsity and interpretability?

1 Jun 2020 13:25 UTC
40 points
3 comments · 7 min read · LW link

Search versus design

alexflint · 16 Aug 2020 16:53 UTC
83 points
39 comments · 36 min read · LW link

Inner Alignment in Salt-Starved Rats

Steven Byrnes · 19 Nov 2020 2:40 UTC
111 points
31 comments · 11 min read · LW link

Multi-dimensional rewards for AGI interpretability and control

Steven Byrnes · 4 Jan 2021 3:08 UTC
10 points
5 comments · 10 min read · LW link

MIRI comments on Cotra’s “Case for Aligning Narrowly Superhuman Models”

Rob Bensinger · 5 Mar 2021 23:43 UTC
124 points
13 comments · 26 min read · LW link

Transparency Trichotomy

Mark Xu · 28 Mar 2021 20:26 UTC
20 points
2 comments · 7 min read · LW link

Solving the whole AGI control problem, version 0.0001

Steven Byrnes · 8 Apr 2021 15:14 UTC
41 points
4 comments · 26 min read · LW link

Opinions on Interpretable Machine Learning and 70 Summaries of Recent Papers

9 Apr 2021 19:19 UTC
111 points
13 comments · 102 min read · LW link

Gradient hacking

evhub · 16 Oct 2019 0:53 UTC
74 points
34 comments · 3 min read · LW link · 2 nominations · 2 reviews

Will transparency help catch deception? Perhaps not

Matthew Barnett · 4 Nov 2019 20:52 UTC
43 points
5 comments · 7 min read · LW link

Rohin Shah on reasons for AI optimism

abergal · 31 Oct 2019 12:10 UTC
40 points
58 comments · 1 min read · LW link
(aiimpacts.org)

Understanding understanding

mthq · 23 Aug 2019 18:10 UTC
24 points
1 comment · 2 min read · LW link

interpreting GPT: the logit lens

nostalgebraist · 31 Aug 2020 2:47 UTC
107 points
28 comments · 10 min read · LW link