I am Andrew Hyer, currently living in New Jersey and working in New York (in the finance industry).
The last of these seems very unlike the others? We have a list of potential alignment failures, bypassing safeguards, etc...and then one time when the model carelessly deleted the wrong thing. Is the researcher still really mad at Claude about that or something?
I have a complaint. My perfectly-reasonable request to the LLM was denied for reasons of censorship. Communist censorship! This is literally 1984!
D&D.Sci Release Day: Topple the Tower Analysis & Ruleset
#1 seems to be confusingly worded. Is the banking industry in the US ‘quasi-nationalized’? The pharma industry? Construction? Restaurants? All of these do ‘operate under the protection and audit of government agencies.’ What level of government involvement are you envisioning?
You can have ~~until I figure out how to play the Necrobinder without dying~~ until Sunday; after that I'll post the solution anyway, to avoid keeping other players waiting too long.
No worries, granted. I’ll aim to post the wrapup doc on the 23rd.
I personally strong-downvoted the parent comment. When I remove my strong-downvote from it, I discover that the combined rest of LessWrong voted that comment to a total of +18 karma and +2 agreement as of me writing this.
I do in fact consider it somewhat alarming that the combined rest of LessWrong on net agrees with this comment and wants to reward the author for posting it.
There’s a saying usually applied to diet and exercise that also seems relevant here—that the best diet/exercise program by far is the one you will actually do.
D&D.Sci Release Day: Topple the Tower!
...it seems very hard to keep whoever is involved in organizing/planning the operation from knowing about it 24 hours in advance, and very unlikely that every such person is wealthy enough to ignore a quarter million dollars.
Two of the four dish combinations with Quality=20 had >16 Quality in expectation.
This is true, but is actually something I looked into when making the scenario. The average score of 'pick a random 20-quality feast from the dataset' was 15.38, which players did successfully beat.
There was a related writing-constraint that came from me pushing to simplify the ruleset a bit. The original envisioned ruleset was going to give players an additional rule limiting which dishes they were allowed to include.[1]
This would have let me give a substantially larger dataset without worrying that grabbing the best-scoring thing out of the dataset would trivially solve the scenario—if 6-7 dishes were banned, it would be easy for none of the top-scoring feasts to be allowed for you, and/or for the best-scoring feast you were allowed to be a single bit of random luck that would betray you if you repeated it.
When I removed that rule, I needed to cut down on the dataset size to avoid that being a trivial solution, which is what led to the data-starvation. Overall I think something like that wouldn’t have been worth the complexity—just telling players they can include whatever dishes they want is simpler and also feels more realistic in context. Open to other views on that, though.
[1] I had a whole bunch of excuses lined up for this too! One of your companions gets seasick and doesn't want to go hunt Kraken...another is Good-aligned and will be angry if you kill a Pegasus...
Huh. I believe I tagged them both the same way, but I don’t get tag notifications on my own posts. Can someone who isn’t me comment on whether they got the notification?
D&D Sci Thanksgiving: the Festival Feast Evaluation & Ruleset
I'm very proud of this scenario. (Even if you're confident you aren't going to play it, I think it's worth reading the wrapup doc, and in particular the 'Bonus Objective' section, to see what it involved.)
It accomplished a few things I think are generally good in these scenarios:
There was underlying structure that players could uncover, which created emergent complexity in the output but made sense with the theme once the underlying ruleset was revealed/discovered.
Human thought about e.g. the theme and what patterns would be reasonable to observe was valuable, the puzzle was not optimally-solved just by feeding the data into a model and calling it a day.
Multiple levels of solution were possible, from a decent solution with little effort up to a more-involved solution that went further and dug into the underlying structure.
And also it managed to trick many players with a surprising-yet-thematic twist :P
...oops. It turns out that efficiently solving the Knapsack Problem is hard.
On looking into it, it appears that my confusion about Python variables means that, while your soldiers will always find a valid solution if it's reachable by a few straightforward rules, they will sometimes fail to find a solution when it wouldn't be reachable that way.[1]
This doesn’t affect the performance of any submitted team (since the teams were evaluated using the same code that the dataset was generated with), but it does mean that the underlying ruleset was messier and less derivable than I’d hoped...sorry :(
[1] In detail: when your soldiers can't find a solution using simple rules, they were supposed to list each possible target for their biggest shot and explore a separate branch for each. However, a bug means that the first branch's execution removes shots from the shared available-shots list, so all the later branches start depleted and are doomed. They will therefore win only if the first branch works.
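The bug described in the footnote is a classic Python pitfall: assigning a list to a new name creates an alias, not a copy, so one branch's mutations leak into the others. A minimal sketch (function names and data invented for illustration, not the actual scenario code):

```python
def explore_buggy(shots, branches):
    """Each branch should start from the same pool of shots, but
    aliasing means branch 1's removal leaks into branches 2+."""
    results = []
    for _ in range(branches):
        pool = shots              # BUG: alias of the same list object
        pool.remove(max(pool))    # mutates the shared list in place
        results.append(len(pool))
    return results

def explore_fixed(shots, branches):
    """Copying per branch keeps the branches independent."""
    results = []
    for _ in range(branches):
        pool = list(shots)        # independent copy for this branch
        pool.remove(max(pool))
        results.append(len(pool))
    return results

print(explore_buggy([5, 3, 1], 2))  # → [2, 1]: the pool keeps shrinking
print(explore_fixed([5, 3, 1], 2))  # → [2, 2]: every branch sees the full pool
```

`list(shots)` makes a shallow copy, which is enough here because the elements are immutable; nested structures would need `copy.deepcopy`.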
I have mixed feelings about this scenario.
I was proud of the underlying mechanics, which I think managed to get interesting and at-least-a-little-realistic effects to emerge from some simple underlying rules.
The theme...at least managed to make me giggle to myself a little as I was writing it.
When players submitted answers to this, though, several people got tricked into getting themselves killed. Out of five answers, two players took extremely safe approaches. Of the three players who were more daring, one submitted an excellent answer while two managed to trick themselves into submitting answers that were worse than random.
From a certain point of view, this is a valuable learning experience, which could teach people not to take drastic risks on limited data.
But I feel like other scenarios in this genre may have taught that lesson better without shooting quite so many players in the foot.
Another downside of this strategy is that a political faction not currently perceived by voters as ‘in power’ has an incentive to use any power they do have to actively worsen the lives of voters, who will blame their opposition.
Your boolean disagreement is relevant because it’s actionable. Suppose that:
Alice thinks calling is +$100 EV
Bob thinks calling is +$10 EV
Claire thinks folding is +$10 EV
David thinks folding is +$100 EV
In this case, Bob is much closer to Claire than to Alice in terms of their beliefs. But Bob agrees with Alice about the correct action, which is often the thing where disagreement actually matters.
(Politics-related examples are left as exercises for the reader).
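The sign-vs-magnitude point above can be sketched in a few lines of Python (names and numbers taken from the bullets, treating 'folding is +$X EV' as 'calling is −$X EV'):

```python
# Each player's EV estimate, in dollars, for calling.
estimates = {"Alice": 100, "Bob": 10, "Claire": -10, "David": -100}

def action(ev_of_calling):
    """The actionable output: call if calling has positive EV, else fold."""
    return "call" if ev_of_calling > 0 else "fold"

# Bob's estimate is numerically closest to Claire's (a gap of 20 vs 90
# from Alice), yet Bob and Alice take the same action while Bob and
# Claire take opposite ones.
print({name: action(ev) for name, ev in estimates.items()})
# → {'Alice': 'call', 'Bob': 'call', 'Claire': 'fold', 'David': 'fold'}
```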
Slowing down AI progress is not itself valuable.
Slowing down AI progress is valuable if and only if you can slow down AI progress without slowing down AI safety progress.
I have faith in the ability of the US government to put lots of red tape and roadblocks in the way of AI development, require that huge armies of lawyers and compliance officers be deployed by any entity trying to develop AI, etc.
I have no faith whatsoever that this process will not also slow down AI safety progress, and some strong suspicion that it might disproportionately slow down AI safety progress.
Overall I think that, when I look at the current conduct of Anthropic, and the current conduct of the US government, I do not find myself saying ‘oh, if only we had given the US government more control over AI companies!’