Past Account

Karma: 421

Past Account 6 Apr 2024 14:32 UTC
8 points
0
on: What’s with all the bans recently?
Just wanted to give some validation. I left a comment on this post a while ago pointing out how one (or apparently a few) users can essentially downvote you however they like to silence opinions they don’t agree with. Moderation is tricky, and it is important to remember why. Most users on a web forum are lurkers, meaning that trying to gather feedback on moderation policies has a biased sampling problem. The irony of likely not being able to leave another comment or engage in discussion is not lost on me.

At first, I thought getting soft-banned meant my “contributions” weren’t valuable. For context, I study AI and integrate it into my thinking, which hadn’t been received well on this site. Ironically, not being able to interact with other people pushed me to explore deeper discussions with AI. For example, I gave this entire thread to Claude 3 and it agreed there were some changes to be made to the rate-limiting system.

It does seem concerning that, as a PhD student studying AI alignment, I was effectively pushed out of participating in discussions on LessWrong and the AI Alignment Forum due to the automatic rate-limiting system and disagreements with senior users whose downvotes carry much more weight. On the other hand, compared to a few years ago during COVID, I now have colleagues and AI that I share a lot more context with than users on this forum, so this just matters less to me. I return only because I am taking a class on social computing and am revisiting what makes for good/bad experiences.

Anyway, hopefully this gives you some solace. I would encourage you to seek other sources of validation. There are so many more options than you think! :)

Past Account 28 Jul 2023 22:00 UTC
15 points
13
in reply to: habryka’s comment on: Automatic Rate Limiting on LessWrong
Hi, I think this is incorrect. I had to wait 7 days to write this comment and then almost forgot to. I wrote a comment critiquing a very long post (which was later removed) and was down-voted (by a single user I think) after justifying why I wrote the comment with AI-assistance. My understanding is that a single user with enough karma power can effectively “silence” any opinion they don’t like by down-voting a few comments in an exchange.

I think the site has changed enough over the last several months that I am considering leaving. For me personally, choosing between having a conversation with a random commenter on this site vs. an AI model is just about a wash. I even hesitate to write this comment given how over-confident your comment seemed, i.e., I won’t be able to interact with this site again for another week.

Past Account 19 Jul 2023 19:20 UTC
−5 points
0
in reply to: cfoster0’s comment on: The Full Alignment Plan You’ve Never Heard Of
This is my endorsed review of the article.

Past Account 19 Jul 2023 19:17 UTC
−11 points
0
in reply to: Ariel Kwiatkowski’s comment on: The Full Alignment Plan You’ve Never Heard Of
This seems like a rhetorical question. Both of our top-level comments seem to reach similar conclusions, but it seems you regret the time spent engaging with the OP to write your comment. This took 10 min, most of it spent writing this comment. What is your point?

Past Account 19 Jul 2023 16:49 UTC
−19 points
1
on: The Full Alignment Plan You’ve Never Heard Of
At over 15k tokens, reading the full article requires significant time and effort. While it aims to provide comprehensive detail on QACI, much of this likely exceeds what is needed to convey the core ideas. The article could be streamlined to more concisely explain the motivation, give mathematical intuition, summarize the approach, and offer brief examples. Unnecessary elaborations could be removed or included as appendices. This would improve clarity and highlight the essence of QACI for interested readers.

My acquired understanding is that the article summarizes a new AI alignment approach called QACI (Question-Answer Counterfactual Interval). QACI involves generating a factual “blob” tied to human values, along with a counterfactual “blob” that could replace it. Mathematical concepts like realityfluid and Loc() are used to identify the factual blob among counterfactuals. The goal is to simulate long reflection by iteratively asking the AI questions and improving its answers. QACI claims to avoid issues like boxing and embedded agency through formal goal specification.

While the article provides useful high-level intuition, closer review reveals limitations in QACI’s theoretical grounding. Key concepts like realityfluid need more rigor, and details are lacking on how embedded agency is avoided. There are also potential issues around approximation and vulnerabilities to adversarial attacks that require further analysis. Overall, QACI seems promising but requires more comparison with existing alignment proposals and formalization to adequately evaluate. The article itself is reasonably well-written, but the length and inconsistent math notation create unnecessary barriers.

Past Account 6 Jun 2023 4:48 UTC
1 point
0
on: Nature < Nurture for AIs
[Deleted]

Past Account 29 May 2023 18:32 UTC
3 points
0
on: Conditional Prediction with Zero-Sum Training Solves Self-Fulfilling Prophecies
[Deleted]

Past Account 11 Mar 2023 0:52 UTC
3 points
−2
on: Challenge: construct a Gradient Hacker
[Deleted]

Past Account 21 Feb 2023 17:31 UTC
22 points
6
on: AI alignment researchers don’t (seem to) stack
[Deleted]

Past Account 11 Dec 2022 19:56 UTC
1 point
0
in reply to: Rachel Freedman’s comment on: The Opportunity and Risks of Learning Human Values In-Context

So are you suggesting that ChatGPT gets aligned to the values of the human contractor(s) that provide data during finetuning, and then carries these values forward when interacting with users?

You are correct that this appears to stand in contrast to one of the key benefits of CIRL games, namely that they allow the AI to continuously update towards the user’s values. The argument I present is that ChatGPT can still learn something about the preferences of the user it is interacting with through in-context value learning. During deployment, ChatGPT will then be able to learn preferences in-context, allowing for continuous updating towards the user’s values as in the CIRL game.

Past Account 11 Dec 2022 4:36 UTC
1 point
0
in reply to: Rachel Freedman’s comment on: The Opportunity and Risks of Learning Human Values In-Context
The reward comes from the user $H$, who ranks candidate responses from ChatGPT. This is discussed more in OpenAI’s announcement. I edited the post to clarify this.

Past Account 18 Oct 2022 4:15 UTC
18 points
10
in reply to: Lost Futures’s comment on: Why Weren’t Hot Air Balloons Invented Sooner?
[Deleted]

Past Account 1 Oct 2022 5:37 UTC
8 points
3
on: Clarifying the Agent-Like Structure Problem
[Deleted]

Past Account 11 Sep 2022 18:42 UTC
LW: 5 AF: -3
−5
AF
in reply to: janus’s comment on: Simulators
[Deleted]

Past Account 9 Sep 2022 3:33 UTC
LW: 2 AF: -3
−10
AF
on: Simulators
[Deleted]

Past Account 2 Nov 2021 21:58 UTC
1 point
on: What’s the difference between newer Atari-playing AI and the older Deepmind one (from 2014)?
[Deleted]

Past Account 5 Aug 2021 18:16 UTC
2 points
in reply to: Rohin Shah’s comment on: rohinmshah’s Shortform
[Deleted]

Past Account 30 Jul 2021 16:53 UTC
1 point
in reply to: Quintin Pope’s comment on: DeepMind: Generally capable agents emerge from open-ended play
[Deleted]

Past Account 29 Jul 2021 17:26 UTC
3 points
in reply to: Sébastien Larivée’s comment on: DeepMind: Generally capable agents emerge from open-ended play
[Deleted]

Past Account 29 Jul 2021 17:20 UTC
4 points
in reply to: Quintin Pope’s comment on: DeepMind: Generally capable agents emerge from open-ended play
[Deleted]