hazel

Karma: 134

hazel 15 Mar 2023 2:26 UTC
30 points
2
on: ARC tests to see if GPT-4 can escape human control; GPT-4 failed to do so
So… they held the door open to see if it’d escape or not? I predict this testing method may go poorly with more capable models, to put it lightly.
And then OpenAI deployed a more capable version than was tested!
They also did not have access to the final version of the model that we deployed. The final version has capability improvements relevant to some of the factors that limited the earlier models power-seeking abilities, such as longer context length, and improved problem-solving abilities as in some cases we’ve observed.
This defeats the entire point of testing.
I am slightly worried that posts like veedrac’s Optimality is the Tiger may have given them ideas. “Hey, if you run it in this specific way, an LLM might become an agent! If it gives you code for recursively calling itself, don’t run it”… so they write that code themselves and run it.
I really don’t know how to feel about this. On one hand, this is taking ideas around alignment seriously and testing for them, right? On the other hand, I wonder what the testers would have done if the answer was “yep, it’s dangerously spreading and increasing its capabilities oh wait no nevermind it’s stopped that and looks fine now”.

hazel 3 Dec 2020 2:22 UTC
21 points
on: The LessWrong 2018 Book is Available for Pre-order
Where does the money go? Is it being sold at cost, or is there surplus?
If money is being made, will it support: 1. The authors? 2. LW hosting costs? 3. LW-adjacent charities like MIRI? 4. The editors/compilers/LW moderators?
EDIT: Was answered over on /r/slatestarcodex. tl;dr: one print run has been paid for at a loss, and any (unlikely) profits go to supporting the LessWrong nonprofit organization.

hazel 20 Feb 2019 12:52 UTC
19 points
in reply to: Unnamed’s comment on: Blackmail
This only looks at the effects on Alice and on Bob, as a simplification. But with blackmail “carrying out the threat” means telling other people information about Bob, and that is often useful for those other people.
When the public interest motivates the release of private info, it’s called ‘whistleblowing’ and is* legally protected and considered far more moral than blackmail. I think that contrast is helpful to understanding why that’s not enough to make blackmail moral.
*in some jurisdictions, restrictions may apply, see your local legal code for a full list of terms & conditions.
I think you’re right that it’s not trivially negative sum because it can have positive outcomes for third parties. Still expect a world of legal blackmail to be worse.

hazel 13 Apr 2023 10:13 UTC
11 points
3
in reply to: Guillaume Charrier’s comment on: Killing Socrates
Part of the value of reddit-style votes as a community moderation feature is that using them is easy. Beware Trivial Inconveniences and all that. I think that having to explain every downvote would lead to me contributing to community moderation efforts less, would lead to dogpiling on people who already have far more refutation than they deserve, would lead to zero-effort ‘just so I can downvote this’ drive-by comments, and generally would make it far easier for absolute nonsense to go unchallenged.
If I came across obvious bot-spam in the middle of the comments, neither downvoted nor deleted and I couldn’t downvote without writing a comment… I expect that 80% of the time I’d just close the tab (and that remaining 20% is only because I have a social media addiction problem).

hazel 20 Feb 2019 12:58 UTC
9 points
in reply to: Paperclip Minimizer’s comment on: Blackmail
“It’s obviously bad. Think about it and you’ll notice that. I could write a YA dystopian novel about how the consequences are bad.” <-- isn’t an argument, at all. It assumes bad consequences rather than demonstrating or explaining how the consequences would be bad. That section is there for other reasons, partially (I think?) to explain Zvi’s emotional state and why he wrote the article, and why it has a certain tone.

hazel 26 Oct 2022 11:36 UTC
7 points
in reply to: RomanS’s comment on: Consume fiction wisely
For me, the benefit of studying tropes is that it makes it easy to talk about the ways in which stories are story-like. In fact, to discuss what stories are like, this post used several links to tropes (specifically ones known to be wrong/misleading/inapplicable to reality).
I think a few deep binges on TVtropes for media I liked really helped me get a lot better at media analysis very, very quickly. (Along with a certain anime analysis blog that mixed in obvious and insightful cinematography commentary focusing on framing, color, and lighting, with more abstract analysis of mood, theme, character, and purpose—both illustrated with links to screenshots, using media that was familiar and interesting to me.)
And by putting word-handles on common story features, it makes it easy to spot them turning up in places they shouldn’t. Like in your thinking about real-life situations.

hazel 20 Feb 2019 12:57 UTC
7 points
in reply to: shminux’s comment on: Blackmail
I am not sure why you pick on blackmail specifically
This is in response to other writers, esp. Robin Hanson. That’s why.

hazel 3 Mar 2024 11:38 UTC
6 points
5
in reply to: Shiroe’s comment on: If you weren’t such an idiot...
I’ve been well served by Bitwarden: https://bitwarden.com/

It has a dark theme, apps for everything (including Linux commandline), the Firefox extension autofills with a keyboard shortcut, plus I don’t remember any large data breaches.

hazel 20 Mar 2021 13:48 UTC
5 points
in reply to: johnswentworth’s comment on: Strong Evidence is Common
Tying back to an example in the post: if we’re using ascii encoding, then the string “Mark Xu” takes up 49 bits. It’s quite compressible, but that still leaves more than enough room for 24 bits of evidence to be completely reasonable.
This paper suggests that spoken language is consistently ~39bits/second.
https://advances.sciencemag.org/content/5/9/eaaw2594
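For a rough sense of scale, here is the arithmetic behind those numbers as a small Python sketch. It assumes classic 7-bit ASCII (which is what makes “Mark Xu” come out to 49 bits) and treats the ~39 bits/second figure as a flat rate, which is a simplification of mine rather than something the paper claims.

```python
name = "Mark Xu"
bits_per_char = 7  # assumption: classic 7-bit ASCII, not 8-bit bytes

# 7 characters (including the space) * 7 bits each
ascii_bits = len(name) * bits_per_char

evidence_bits = 24
speech_bits_per_second = 39  # approximate rate quoted above

# How long would it take to convey 24 bits at that rate?
seconds_of_speech = evidence_bits / speech_bits_per_second

print(ascii_bits)                    # 49
print(round(seconds_of_speech, 2))   # 0.62
```

So, under these assumptions, 24 bits is well under a second of ordinary speech.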

hazel 21 Nov 2022 23:19 UTC
4 points
1
in reply to: interstice’s comment on: Scott Aaronson on “Reform AI Alignment”
At the time I took AlphaGo as a sign that Eliezer was more correct than Hanson w/r/t the whole AI-go-FOOM debate. I realize that’s an old example which predates the last-four-years AI successes, but I updated pretty heavily on it at the time.

hazel 3 Oct 2022 2:55 UTC
4 points
3
in reply to: agi-hater’s comment on: Why I think strong general AI is coming soon
However, you decided to define “intelligence” as “stuff like complex problem solving that’s useful for achieving goals” which means that intentionality, consciousness, etc. is unconnected to it
This is the relevant definition for AI notkilleveryoneism.

hazel 16 Mar 2022 3:55 UTC
4 points
in reply to: Viliam’s comment on: The Rationalists of the 1950s (and before) also called themselves “Rationalists”
Early LessWrong was atheist, but everything on the internet around the time LW was taking off had a position in that debate. “...the defining feature of this period wasn’t just that there were a lot of atheism-focused things. It was how the religious-vs-atheist conflict subtly bled into everything.” Or less subtly, in this case.

I see it just as a product of the times. I certainly found the anti-theist content in Rationality: A to Z to be slightly jarring on a re-read. On other topics, Eliezer is careful not to bring in the political issues of the day that could emotionally overshadow the more subtle points he’s making about thought in general, but he’ll drop in extremely anti-religion jabs despite that. To me, that’s just part of reading history.

hazel 23 Nov 2020 9:19 UTC
4 points
on: The central limit theorem in terms of convolutions
If $\mathcal{F}\{f\}$ and $\mathcal{F}\{g\}$ are the fourier transforms of $f$ and $g$, then $\mathcal{F}\{f * g\} = \mathcal{F}\{f\}\,\mathcal{F}\{g\}$. This is yet another case where you don’t actually have to compute the convolution to get the thing. I don’t actually use fourier transforms or have any intuition about them, but for those who do, maybe this is useful?
It’s amazingly useful in signal processing, where you often care about the frequency domain because it’s perceptually significant (e.g. perceived pitch & timbre of a sound = fundamental frequency of the air-vibrations & other frequencies. Sounds too fizzy or harsh? Lowpass filter it. Too dull or muffled? Boost the higher frequencies, etc etc etc). Although it’s used the other way around: by doing convolution, you don’t have to compute the Fourier transform.
If you have a signal and want to change its frequency distribution, what you do is construct a ‘short’ (finite-support) function, the convolution kernel, whose frequency-domain transform would multiply to give the kind of frequency response you’re after. Then you can convolve them in the time domain, and don’t need to compute the Fourier/inverse-Fourier transforms at all.
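To make that concrete, here is a minimal pure-Python sketch (my own toy example, not anything from the post). It filters a signal by convolving with a two-tap moving-average kernel in the time domain, then checks numerically that the transform of the result equals the product of the transforms. The `convolve` and `dft` helpers are illustrative names, and the DFT is the naive O(n²) version, chosen for readability over speed.

```python
import cmath

def convolve(signal, kernel):
    """Full linear convolution of two finite sequences."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out

def dft(x, n):
    """Naive discrete Fourier transform of x, zero-padded to length n."""
    padded = list(x) + [0.0] * (n - len(x))
    return [
        sum(v * cmath.exp(-2j * cmath.pi * f * t / n) for t, v in enumerate(padded))
        for f in range(n)
    ]

# The highest frequency a sampled signal can hold: alternating +1 / -1.
signal = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
# Two-tap moving average: a crude lowpass kernel.
kernel = [0.5, 0.5]

# Time-domain filtering: no Fourier transform needed.
filtered = convolve(signal, kernel)

# Check the convolution theorem: DFT(signal * kernel) == DFT(signal) . DFT(kernel).
# Zero-padding to the full output length makes linear and circular convolution agree.
n = len(filtered)
lhs = dft(filtered, n)
rhs = [a * b for a, b in zip(dft(signal, n), dft(kernel, n))]
max_error = max(abs(a - b) for a, b in zip(lhs, rhs))

print(filtered)           # the alternation is cancelled everywhere except the two ends
print(max_error < 1e-9)   # True
```

The interior of `filtered` is all zeros because the averaging kernel has no response at that frequency, which is the lowpass behaviour described above.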
For example, in audio processing: many systems (IIRC linear time-invariant ones) can be ‘sampled’ by taking an impulse response, the output of the system when the input is an impulse (like the Dirac delta function, which is ∞ at 0 but 0 elsewhere, or as close as you can physically construct). This impulse response can then impart the ‘character’ of the system via convolution. This is how convolution reverbs add, as an audio effect, the sound of specific, real-life resonant spaces to whatever audio signal you feed them (“This is your voice in the Notre Dame cathedral” style). There are also guitar amp/cab sims that work this way. This works because the Dirac delta is the identity under (continuous) convolution (and because these real physical things, like sounds interacting with spaces and speakers, are linear and time-invariant).
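A discrete toy version of the same idea, as a hedged sketch: `echo_system` below is a made-up linear time-invariant system (dry signal plus a half-volume echo two samples later) standing in for a real room or speaker cabinet. Feeding it a unit impulse captures its impulse response, and convolving any other signal with that response reproduces what the system itself would have done.

```python
def convolve(signal, kernel):
    """Full linear convolution of two finite sequences."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out

def echo_system(x):
    """Hypothetical LTI system: the input plus a half-volume echo 2 samples later."""
    out = [0.0] * (len(x) + 2)
    for i, v in enumerate(x):
        out[i] += v
        out[i + 2] += 0.5 * v
    return out

# 'Sample' the system by feeding it a unit impulse (the discrete delta).
impulse = [1.0]
impulse_response = echo_system(impulse)

# Any other input: running it through the system directly...
voice = [3.0, 1.0, 4.0, 1.0, 5.0]
direct = echo_system(voice)
# ...matches convolving it with the captured impulse response.
via_convolution = convolve(voice, impulse_response)

# The delta is the identity under convolution.
identity = convolve(voice, impulse)

print(impulse_response)           # [1.0, 0.0, 0.5]
print(direct == via_convolution)  # True
print(identity == voice)          # True
```

A real convolution reverb does the same thing with a much longer recorded impulse response, usually with FFT-based convolution for speed.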
It also comes up in image processing. You can do a lot of basic image processing with a 2d discrete convolution kernel. You can implement blurs/anti-aliasing/lowpass, image sharpening/highpass, and edge ‘detection’ this way.
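Here is a small pure-Python sketch of that, using a tiny grayscale ‘image’ stored as a list of rows. The `convolve2d` helper, the 3×3 box blur, and the Prewitt-style horizontal gradient kernel are illustrative choices of mine; it only computes the ‘valid’ region, where the kernel fits entirely inside the image.

```python
def convolve2d(image, kernel):
    """'Valid' 2D convolution: the kernel is flipped, then slid over the image."""
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    out = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[kh - 1 - i][kw - 1 - j]
            row.append(acc)
        out.append(row)
    return out

# 4x6 image: dark (0.0) left half, bright (1.0) right half.
image = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(4)]

# Box blur (lowpass): every output pixel is the mean of a 3x3 neighbourhood.
box_blur = [[1 / 9] * 3 for _ in range(3)]

# Horizontal gradient kernel (highpass): responds to left-right brightness changes.
edge_kernel = [
    [-1.0, 0.0, 1.0],
    [-1.0, 0.0, 1.0],
    [-1.0, 0.0, 1.0],
]

blurred = convolve2d(image, box_blur)
edges = convolve2d(image, edge_kernel)

print([round(v, 3) for v in blurred[0]])  # [0.0, 0.333, 0.667, 1.0]
print(edges[0])                           # [0.0, -3.0, -3.0, 0.0]
```

The blur turns the hard step into a ramp, while the edge kernel is zero in the flat regions and nonzero only around the boundary (the sign just reflects the kernel flip).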

hazel 15 Mar 2023 10:38 UTC
3 points
0
in reply to: Throwaway2367’s comment on: GPT-4
Codeforces is not marked as having a GPT-4 measurement on this chart. Yes, it’s a somewhat confusing chart.

hazel 26 Oct 2022 11:40 UTC
3 points
in reply to: acylhalide’s comment on: Consume fiction wisely
I’m going to suggest reading widely as another solution. I think it’s dangerous to focus too much on one specific subgenre, or certain authors, or books only from one source (your library and Amazon do, in fact, filter your content for you, if not very tightly).

hazel 5 Aug 2018 3:50 UTC
3 points
in reply to: shminux’s comment on: Open Thread August 2018
If you’re throwing your AI into a perfect inescapable hole to die and never again interacting with it, then what exact code you’re running will never matter. If you observe it though, then it can affect you. That’s an output.
What are you planning to do with the filtered-in ‘friendly’ AIs? Run them in a different context? Trust them with access to resources? Then an unfriendly AI can propose you as a plausible hypothesis, predict your actions, and fake being friendly. It’s just got to consider that escape might be reachable, or that there might be things it doesn’t know, or that sleeping for a few centuries and seeing if anything happens is an option-maximizing alternative to halting, etc. I don’t know what you’re selecting for (suicidality, willingness to give up, halts within n operations) but it’s not friendliness.

hazel 14 Apr 2018 4:59 UTC
3 points
on: 5 general voting pathologies: lesser names of Moloch
https://www.lesserwrong.com/posts/D6trAzh6DApKPhbv4/a-voting-theory-primer-for-rationalists
The first link in this post should go ^ here to your voting theory primer. Instead, for me, it links here:
https://www.lesserwrong.com/posts/JewWDfLoxgFtJhNct/utility-versus-reward-function-partial-equivalence

hazel 9 Apr 2023 7:14 UTC
2 points
0
in reply to: Razied’s comment on: GPTs are Predictors, not Imitators
To solve this problem you would need a very large dataset of mistakes made by LLMs, and their true continuations. [...] This dataset is unlikely to ever exist, given that its size would need to be many times bigger than the entire internet.
I had assumed that creating that dataset was a major reason for doing a public release of ChatGPT. “Was this a good response?” [thumb-up] / [thumb-down] → dataset → more RLHF. Right?

hazel 9 Jun 2022 4:12 UTC
2 points
1
in reply to: Lone Pine’s comment on: AGI Safety FAQ / all-dumb-questions-allowed thread
There has to be some limits
Those limits don’t have to be nearby, or look ‘reasonable’, or be inside what you can imagine.
Part of the implicit background for the general AI safety argument is a sense for how minds could be, and that the space of possible minds is large and unaccountably alien. Eliezer spent some time trying to communicate this in the sequences: https://www.lesswrong.com/posts/tnWRXkcDi5Tw9rzXw/the-design-space-of-minds-in-general, https://www.lesswrong.com/posts/5wMcKNAwB6X4mp9og/that-alien-message.

hazel 14 Jan 2020 12:00 UTC
2 points
in reply to: Richard_Kennaway’s comment on: Moral uncertainty: What kind of ‘should’ is involved?
In my experience, stating things outright and giving examples helps with communication. You might not need a definition, but the relevant question is: would it improve the text for other readers?