# Ramana Kumar

Karma: 1,179
• 9 Dec 2022 16:28 UTC
LW: 3 AF: 2
0 ∶ 0
AF

(Bold direct claims, not super confident—criticism welcome.)

The approach to ELK in this post is unfalsifiable.

A counterexample to the approach would need to be a test-time situation in which:

1. The predictor correctly predicts a safe-looking diamond.

2. The predictor “knows” that the diamond is unsafe.

3. The usual “explanation” (e.g., heuristic argument) for safe-looking-diamond predictions on the training data applies.

Points 2 and 3 are in direct conflict: the predictor knowing that the diamond is unsafe rules out the usual explanation for the safe-looking predictions.

So now I’m unclear what progress has been made. This looks like simply defining “the predictor knows P” as “there is a mechanistic explanation of the outputs starting from an assumption of P in the predictor’s world model”, then declaring ELK solved by noting we can search over and compare mechanistic explanations.

• 9 Dec 2022 11:08 UTC
LW: 2 AF: 1
0 ∶ 0
AF

I think you’re right—thanks for this! It makes sense now that I recognise the quote was in a section titled “Alignment research can only be done by AI systems that are too dangerous to run”.

• 8 Dec 2022 11:46 UTC
LW: 2 AF: 1
0 ∶ 0
AF

“We can compute the probability that a cell is alive at timestep 1 if each of it and each of its 8 neighbors is alive independently with probability 10% at timestep 0.”

We the readers (or, I guess, specifically the heuristic argument itself) can do this, but the “scientists” cannot, because the

“scientists don’t know how the game of life works”.

Do the scientists ever need to know how the game of life works, or can the heuristic arguments they find remain entirely opaque?
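(For what it’s worth, the quoted computation is one anyone can replicate by brute-force enumeration under Conway’s standard rules; the sketch below is illustrative, not from the post, and takes the 10% figure from the quote.)

```python
from itertools import product

# Probability that each of the 9 cells (center + 8 neighbors) is alive
# at timestep 0, independently. The 10% figure is from the quoted passage.
p = 0.1

prob_alive = 0.0
for state in product([0, 1], repeat=9):
    center, live_neighbors = state[0], sum(state[1:])
    # Conway's rules: a live cell survives with 2 or 3 live neighbors;
    # a dead cell is born with exactly 3 live neighbors.
    if (center and live_neighbors in (2, 3)) or (not center and live_neighbors == 3):
        # Weight of this configuration under independent 10% aliveness.
        weight = 1.0
        for cell in state:
            weight *= p if cell else 1 - p
        prob_alive += weight

print(prob_alive)  # ≈ 0.0479
```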

Another thing confusing to me along these lines:

“for example they may have noticed that A-B patterns are more likely when there are fewer live cells in the area of A and B”

where do they (the scientists) notice these fewer live cells? Do they have some deep interpretability technique for examining the generative model and “seeing” its grid of cells?

• They have a strong belief that in order to do good alignment research, you need to be good at “consequentialist reasoning,” i.e. model-based planning, that allows creatively figuring out paths to achieve goals.

I think this is a misunderstanding, and that approximately zero MIRI-adjacent researchers hold this belief (that good alignment research must be the product of good consequentialist reasoning). What seems more true to me is that they believe that better understanding consequentialist reasoning—e.g., where to expect it to be instantiated, what form it takes, how/why it “works”—is potentially highly relevant to alignment.

• I’m focusing on the code in Appendix B.

What happens when self.diamondShard’s assessment of whether some consequences contain diamonds differs from ours? (Assume the agent’s world model is especially good.)

• upweights actions and plans that lead to

How is it determined what the actions and plans lead to?

• 28 Nov 2022 15:21 UTC
LW: 6 AF: 4
0 ∶ 0
AF

We expect an explanation in terms of the weights of the model and the properties of the input distribution.

We have a model that predicts a very specific pattern of observations, corresponding to “the diamond remains in the vault.” We have a mechanistic explanation π for how those correlations arise from the structure of the model.

Now suppose we are given a new input on which our model predicts that the diamond will appear to remain in the vault. We’d like to ask: in this case, does the diamond appear to remain in the vault for the normal reason π?

A problem with this: π can explain the predictions on both train and test distributions without all the test inputs corresponding to safe diamonds. In other words, the predictions can be made for the “normal reason” π even when the normal reason of the diamond being safe doesn’t hold.

(elaborating the comment above)

Because π is a mechanistic (as opposed to teleological, or otherwise reference-sensitive) explanation, its connection to what we would like to consider “normal reasons” has been weakened if not outright broken.

On the training distribution suppose we have two explanations for the “the diamond remains in the vault” predicted observations.

First there is ɸ, the explanation that there was a diamond in the vault and the cameras were working properly, etc. and the predictor is a straightforward predictor with a human-like world-model (ɸ is kinda loose on the details of how the predictor works, and just says that it does work).

Then there is π, which is an explanation that relies on various details about the circuits implemented by the weights of the predictor that traces abstractly how this distribution of inputs produces outputs with the observed properties, and uses various concepts and abstractions that make sense of the particular organisation of this predictor’s weights. (π is kinda glib about real world diamonds but has plenty to say about how the predictor works, and some of what it says looks like there’s a model of the real world in there.)

We might hope that a lot of the concepts π is dealing in do correspond to natural human things like object permanence or diamonds or photons. But suppose not all of them do, and/or there are some subtle mismatches.

Now on some out-of-distribution inputs that produce the same predictions, we’re in trouble when π is still a good explanation of those predictions but ɸ is not. This could happen because, e.g., π’s version of “object permanence” is just broken on this input, and was never really about object permanence but rather about a particular group of circuits that happen to do something object-permanence-like on the training distribution. Or maybe π refers to the predictor’s alien diamond-like concept that humans wouldn’t agree with if they understood it but does nevertheless explain the prediction of the same observations.

Is it an assumption of your work here (or maybe a desideratum of whatever you find to do mechanistic explanations) that the mechanistic explanation is basically in terms of a world model or simulation engine, and we can tell that’s how it’s structured? I.e., it’s not some arbitrary abstract summary of the predictor’s computation. (And also that we can tell that the world model is good by our lights?)

• Partitions (of some underlying set) can be thought of as variables like this:

• The number of values the variable can take on is the number of parts in the partition.

• Every element of the underlying set has some value for the variable, namely, the part that that element is in.

Another way of looking at it: say we’re thinking of a variable X as a function f from the underlying set S to X’s domain D. Then we can equivalently think of X as the partition of S into the preimages of f, with (up to) |D| parts.

In what you quoted, we construct the underlying set by taking all possible combinations of values for the “original” variables. Then we take all partitions of that to produce all “possible” variables on that set, which will include the original ones and many more.

• 25 Nov 2022 17:38 UTC
LW: 6 AF: 3
2 ∶ 0
AF

I agree with you—and yes we ignore this problem by assuming goal-alignment. I think there’s a lot riding on the pre-SLT model having “beneficial” goals.

# Refining the Sharp Left Turn threat model, part 2: applying alignment techniques

25 Nov 2022 14:36 UTC
36 points
(vkrakovna.wordpress.com)

# Threat Model Literature Review

1 Nov 2022 11:03 UTC
54 points

# Clarifying AI X-risk

1 Nov 2022 11:03 UTC
100 points
• 20 Sep 2022 11:01 UTC
LW: 2 AF: 2
0 ∶ 0
AF

I’ll take a stab at answering the questions for myself (fairly quick takes):

1. No, I don’t care about whether a model is an optimiser per se. I care only insofar as being an optimiser makes it more effective as an agent. That is, if it’s robustly able to achieve things, it doesn’t matter how. (However, it could be impossible to achieve things without being shaped like an optimiser; this is still unresolved.)

2. I agree that it would be nice to find definitions such that capacity and inclination split cleanly. Retargetability is one approach to this, e.g., operationalised as fine-tuning effort required to redirect inclinations.

3. I think there are two: incorrect labels (when the feedback provider isn’t capable enough to assess the examples it needs to evaluate), and underspecification (leading to goal misgeneralisation).

4. Goal misgeneralisation. More broadly (to also include capability misgeneralisation), robustness failures.

5. No, I don’t think they’re important to distinguish.

• 8 Sep 2022 10:24 UTC
LW: 8 AF: 4
2 ∶ 0
AF
in reply to: janus’s comment on: Simulators

I think Dan’s point is good: that the weights don’t change, and the activations are reset between runs, so the same input (including rng) always produces the same output.

I agree with you that the weights and activations encode knowledge, but Dan’s point is still a limit on learning.

I think there are two options for where learning may be happening under these conditions:

• During the forward pass. Even though the function always produces the same output for a given input, the computation of that output involves some learning.

• Using the environment as memory. Think of the neural network function as a choose-your-own-adventure book that includes responses to many possible situations depending on which prompt is selected next by the environment (which itself depends on the last output from the function). Learning occurs in the selection of which paths are actually traversed.

These can occur together. E.g., the “same character” as was invoked by prompt 1 may be invoked by prompt 2, but they now have more knowledge (some of which was latent in the weights, some of which came in directly via prompt 2; but all of which was triggered by prompt 2).

• 5 Sep 2022 15:47 UTC
LW: 3 AF: 3
1 ∶ 0
AF

Expanding a bit on why: I think this will fail because the house-building AI won’t actually be very good at instrumental reasoning, so there’s nothing for the sticky goals hypothesis to make use of.

• 5 Sep 2022 15:43 UTC
LW: 3 AF: 3
1 ∶ 0
AF

I agree with this prediction directionally, but not as strongly.

I’d prefer a version where we have a separate empirical reason to believe that the training and finetuning approaches used can support transfer of something (e.g., some capability), to distinguish goal-not-sticky from nothing-is-sticky.

• 30 Aug 2022 13:35 UTC
LW: 3 AF: 1
2 ∶ 0
AF
in reply to: TurnTrout’s comment

Hm, no, not really.

OK let’s start here then. If what I really want is an AI that plays tic-tac-toe (TTT) in the real world well, what exactly is wrong with saying the reward function I described above captures what I really want?

There are several claims which are not true about this function:

Neither of those claims seemed right to me. Can you say what the type signature of our desires (e.g., for good classification over grayscale images) is? [I presume the problem you’re getting at isn’t as simple as wanting desires to look like (image, digit-label, goodness) tuples as opposed to (image, correct digit-label) tuples.]