starship006(Cody)

Karma: 281

starship006 30 Nov 2023 7:40 UTC
6 points
2
in reply to: MadHatter’s comment on: Stupid Question: Why am I getting consistently downvoted?
I’m glad to hear you got exposure to the Alignment field in SERI MATS! I still think that your writing reads as though your ideas misunderstand core alignment problems, so my best feedback is to share drafts and discuss your ideas with others familiar with the field. My guess is that it would be preferable for you to find people who are critical of your ideas and try to understand why, since they seem representative of the kinds of people who are downvoting your posts.

starship006 30 Nov 2023 2:42 UTC
5 points
2
on: Stupid Question: Why am I getting consistently downvoted?
(preface: writing and communicating is hard and that i’m glad you are trying to improve)
i sampled two:
this post was hard to follow, and didn’t seem to be very serious. it also reads as unfamiliar with the basics of the AI Alignment problem (the proposed changes to gpt-4 don’t concretely address many/any of the core Alignment concerns for reasons addressed by other commenters)
this post makes multiple (self-proclaimed controversial) claims that seem wrong or are not obvious, but doesn’t try to justify them in-depth.
overall, i’m getting the impression that 1) your ideas are wrong and you haven’t thought about them enough and/or 2) you aren’t communicating them well enough. i think the former is more likely, but it could also be some combination of both. i think this means that:
1. you should try to become more familiar with the alignment field, and common themes surrounding proposed alignment solutions and their pitfalls
2. you should consider spending more time fleshing out your writing and/or getting more feedback (whether it be by talking to someone about your ideas, or sending out a draft idea for feedback)

starship006 28 Nov 2023 4:03 UTC
3 points
0
on: Shallow review of live agendas in alignment & safety
Reverse engineering. Unclear if this is being pushed much anymore. 2022: Anthropic circuits, Interpretability In The Wild, Grokking mod arithmetic
FWIW, I was one of Neel’s MATS 4.1 scholars and I would classify 3/4 of Neel’s scholars’ outputs as reverse engineering some component of LLMs (for completeness, this is the other one, which doesn’t nicely fit as ‘reverse engineering’ imo). I would also say that this is still an active direction of research (lots of ground to cover with MLP neurons, polysemantic heads, and more).

[Paper] All’s Fair In Love And Love: Copy Suppression in GPT-2 Small

CallumMcDougall, Arthur Conmy, starship006, Tom McGrath and Neel Nanda

13 Oct 2023 18:32 UTC

82 points

4 comments · 8 min read

starship006 4 Jul 2023 20:31 UTC
1 point
0
on: Shall We Throw A Huge Party Before AGI Bids Us Adieu?
Quick feedback since nobody else has commented: I’m all for AI Safety appearing “not just a bunch of crazy lunatics, but an actually sensible, open and welcoming community.”
But the spirit behind this post feels like it is just throwing in the towel, and I very much disapprove of that. I think this is why I and others downvoted too.

starship006 16 Jun 2023 14:21 UTC
7 points
7
in reply to: Stephen McAleese’s comment on: Lightcone Infrastructure/LessWrong is looking for funding
Ehh… feels like your base rate of 10% for LW users who are willing to pay for a subscription is too high, especially seeing how the ‘free’ version would still offer everything I (and presumably others) care about. Generalizing to other platforms, this feels closest to Twitter’s situation with Twitter Blue, whose rate appears to be far, far lower: if we are generous and say they have one million subscribers, then out of the 41.5 million monetizable daily active users they currently have, this would suggest a base rate of less than 3%.
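For anyone who wants to check the arithmetic, here is a minimal sketch using only the two figures quoted above (the generous one million subscribers and the 41.5 million monetizable daily active users). The function name `subscriber_rate` is just an illustrative helper, not anything from Twitter or LW.

```python
def subscriber_rate(subscribers: int, active_users: int) -> float:
    """Fraction of active users who pay for a subscription."""
    return subscribers / active_users


# Figures quoted in the comment: a generous 1M Twitter Blue subscribers
# out of 41.5M monetizable daily active users.
rate = subscriber_rate(1_000_000, 41_500_000)

print(f"{rate:.1%}")  # 2.4%
```

That comes out to roughly 2.4%, well under the 10% base rate being questioned.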

starship006 11 May 2023 16:49 UTC
2 points
0
on: AI #11: In Search of a Moat
Thanks for the writeup!

Small nitpick: typo in “this indeed does not seem like an attitude that leads to go outcomes”

starship006 21 Apr 2023 4:09 UTC
2 points
0
in reply to: James Payor’s comment on: AI #8: People Can Do Reasonable Things
I’m not sure if you’ve seen it or not, but here’s a relevant clip where he mentions that they aren’t training GPT-5. I don’t quite know how to update from it. It doesn’t seem likely that they paused from a desire to conduct more safety work, but I would also be surprised if somehow they are reaching some sort of performance limit from model size.
However, as Zvi mentions, Sam did say:
“I think we’re at the end of the era where it’s going to be these, like, giant, giant models...We’ll make them better in other ways”

starship006 31 Mar 2023 15:18 UTC
4 points
0
on: Widening Overton Window—Open Thread
The increased public attention towards AI Safety risk is probably a good thing. But, when stuff like this is getting lumped in with the rest of AI Safety, it feels like the public-facing slow-down-AI movement is going to be a grab-bag of AI Safety, AI Ethics, and AI… privacy(?). As such, I’m afraid that the public discourse will devolve into “Woah-there-Slow-AI” and “GOGOGOGO” tribal warfare; from the track record of American politics, this seems likely—maybe even inevitable?
More importantly, though, what I’m afraid of is that this will translate into adversarial relations between AI Capabilities organizations and AI Safety orgs (more generally, that capabilities teams will become less inclined to incorporate safety concerns in their products).
I’m not actually in an AI organization, so if someone is in one and has thoughts on this dynamic happening/not happening, I would love to hear.

starship006 30 Mar 2023 18:54 UTC
11 points
13
on: “Dangers of AI and the End of Human Civilization” Yudkowsky on Lex Fridman
Sheesh. Wild conversation. While I felt Lex was often missing the points Eliezer was making, I’m glad he gave him the space and time to speak. Unfortunately, it felt like the conversation would keep moving towards a critically important insight that Eliezer wanted Lex to understand, and then Lex would just change the topic, and Eliezer would have to begin building towards a new insight. Regardless, I appreciate that Lex and Eliezer thoroughly engaged with each other; this will probably spark good dialogue and get more people interested in the field. I’m glad it happened.
For those who are time constrained and wondering what is in it: Lex and Eliezer basically cover a whole bunch of high-level points related to AI not-kill-everyone-ism, delving into various thought experiments and concepts which make up Eliezer’s worldview. Probably nothing super novel if you’ve been following the field for some time.

starship006 26 Mar 2023 19:32 UTC
4 points
0
on: GPT-4 Specs: 1 Trillion Parameters?
Relevant Manifold Market:

starship006 19 Mar 2023 4:59 UTC
1 point
0
in reply to: Anirandis’s comment on: An Appeal to AI Superintelligence: Reasons to Preserve Humanity
Because you’re imagining AGI keeping us in a box?
Yeah, something along the lines of this. Preserving humanity =/= humans living lives worth living.

starship006 18 Mar 2023 20:34 UTC
31 points
26
on: An Appeal to AI Superintelligence: Reasons to Preserve Humanity
I didn’t upvote or downvote this post. Although I do find the spirit of this message interesting, I have a disturbing feeling that arguing to future AI to “preserve humanity for pascals-mugging-type-reasons” trades off X-risk for S-risk. I’m not sure that any of these aforementioned cases encourage AI to maintain lives worth living. I’m not confident that this meaningfully changes S-risk or X-risk positively or negatively, but I’m also not confident that it doesn’t.

starship006 28 Feb 2023 4:55 UTC
13 points
16
on: $20 Million in NSF Grants for Safety Research
With the advent of Sydney and now this, I’m becoming more inclined to believe that AI Safety and policies related to it are very close to being in the Overton window of most intellectuals (I wouldn’t say the general public, yet). Like, maybe within a year, more than 60% of academic researchers will have heard of AI Safety. I don’t feel confident whatsoever about the claim, but it now seems more than ~20% likely. Does this seem to be a reach?

starship006 18 Feb 2023 21:26 UTC
20 points
13
on: We should be signal-boosting anti Bing chat content
There is a fuzzy line between “let’s slow down AI capabilities” and “let’s explicitly, adversarially, sabotage AI research”. While I am all for the former, I don’t support the latter; it creates worlds in which AI safety and capabilities groups are pitted head to head, and capabilities orgs explicitly become more incentivized to ignore safety proposals. These aren’t worlds I personally wish to be in.
While I understand the motivation behind this message, I think the actions described in this post cross that fuzzy boundary and push way too far towards that style of adversarial messaging.

starship006 16 Feb 2023 5:40 UTC
1 point
0
on: The Filan Cabinet Podcast with Oliver Habryka—Transcript
We know, from like a bunch of internal documents, that the New York Times has been operating for the last two or three years on a, like, grand [narrative structure], where there’s a number of head editors who are like, “Over this quarter, over this current period, we want to write lots of articles, that, like, make this point...”
Can someone point me to an article discussing this, or the documents themselves? While this wouldn’t be entirely surprising to me, I’m trying to find more data to back this claim, and I can’t seem to find anything significant.

starship006 20 Jan 2023 21:52 UTC
12 points
6
on: Transcript of Sam Altman’s interview touching on AI safety
It feels strange hearing Sam say that their products are released whenever they feel as though ‘society is ready.’ Perhaps they can afford to do that now, but I cannot help but think that market dynamics will inevitably create strong incentives for race conditions very quickly (perhaps it is already happening), which will make following this approach pretty hard. I know he later says that he hopes for competition in the AI-space until the point of AGI, but I don’t see how he balances the knowledge of extreme competition with the hope that society is prepared for the technologies they release; it seems that even current models, which appear to be far from the capabilities of AGI, are already transformative.

starship006 12 Jan 2023 16:12 UTC
5 points
0
on: How it feels to have your mind hacked by an AI
Let’s say Charlotte was a much more advanced LLM (almost AGI-like, even). Do you believe that if you had known that Charlotte was extraordinarily capable, you might have been more guarded about recognizing it for its ability to understand and manipulate human psychology, and thus been less susceptible to it potentially doing so?
I find that a small part of me still thinks that “oh this sort of thing could never happen to me, since I can learn from others that AGI and LLMs can make you emotionally vulnerable, and thus not fall into a trap!” But perhaps this is just wishful thinking that would crumble once I interact with more and more advanced LLMs.

starship006 21 Dec 2022 7:26 UTC
6 points
1
on: Podcast: What’s Wrong With LessWrong
I’m trying to engage with your criticism faithfully, but I can’t help but get the feeling that a lot of your critiques here seem to be a form of “you guys are weird”: your privacy norms are weird, your vocabulary is weird, you present yourselves as weird, etc. And while I may agree that it sometimes feels as if LessWrongers are out of touch with reality, this criticism, coupled with some of the other object-level disagreements you were making, seems to overlook the many benefits that LessWrong provides; I can personally attest to the fact that I’ve improved in my thinking as a whole due to this site. If that makes me a little weird, then I’ll accept that as a way to help me shape the world as I see fit. And hopefully I can become a little less weird through the same rationality skills this site helps develop.

starship006 2 Oct 2022 4:25 UTC
3 points
0
in reply to: Quintin Pope’s comment on: Paper: Large Language Models Can Self-improve [Linkpost]
Humans can often teach themselves to be better at a skill through practice, even without a teacher or ground truth
Definitely, but I currently feel that the vast majority of human learning comes with a ground truth to reinforce good habits. I think this is why I’m surprised this works as much as it does: it kinda feels like letting an elementary school kid teach themself math by practicing certain skills they feel confident in without any regard to if that skill even is “mathematically correct”.
Sure, these skills are probably on the right track toward solving math problems—otherwise, the kid wouldn’t have felt as confident about them. But would this approach not ignore skills the student needs to work on, or even amplify “bad” skills? (Or maybe this is just a faulty analogy and I need to re-read the paper)
