AIXI isn’t a practically realisable model due to its incomputability, but it has nice optimality results, and it gives you an ideal model of intelligence that you can approximate (https://arxiv.org/abs/0909.0801). It uses a universal Bayesian mixture over environments with the Solomonoff prior (in some sense the best choice of prior), which lets it learn, in a way you can make formal, as fast as any agent possibly could. There’s some recent work on building practical approximations using deep learning in place of the CTW mixture (https://arxiv.org/html/2401.14953v1).
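To give a flavour of the mixture idea (this is a toy sketch of my own, not AIXI itself, which is incomputable): take a tiny hypothesis class of periodic bit predictors, weight them by a crude complexity penalty standing in for the Solomonoff prior, and do Bayes updates as bits arrive. All names here are placeholders.

```python
# Toy Bayesian mixture over a small hypothesis class, with a
# complexity-penalising prior (2^-length as a crude stand-in for 2^-K).
# Hypotheses: "the sequence repeats `pattern` forever".

def make_periodic_hyp(pattern):
    """Deterministic predictor: probability the next bit equals `bit`."""
    def prob_next(history, bit):
        predicted = pattern[len(history) % len(pattern)]
        return 1.0 if bit == predicted else 0.0
    return prob_next

hypotheses = {
    (0, 1): make_periodic_hyp((0, 1)),
    (0, 0, 1): make_periodic_hyp((0, 0, 1)),
    (1,): make_periodic_hyp((1,)),
}

# Prior weight 2^-len(pattern), normalised to sum to 1.
weights = {h: 2.0 ** -len(h) for h in hypotheses}
total = sum(weights.values())
weights = {h: w / total for h, w in weights.items()}

def mixture_prob(history, bit):
    """Mixture probability that the next bit is `bit`."""
    return sum(w * hypotheses[h](history, bit) for h, w in weights.items())

def update(history, observed_bit):
    """Bayes update of the posterior after seeing one bit."""
    global weights
    z = mixture_prob(history, observed_bit)
    weights = {h: w * hypotheses[h](history, observed_bit) / z
               for h, w in weights.items()}

# Feed it 0,1,0,1,...: the posterior concentrates on the true (0,1)
# hypothesis after it has seen enough to rule the others out.
history = []
for t in range(6):
    bit = t % 2
    update(history, bit)
    history.append(bit)

print(weights[(0, 1)])  # posterior mass on the true hypothesis
```

The mixture’s log-loss is within a constant (the log of the true hypothesis’ prior weight) of the best predictor in the class, which is the finite-class shadow of the dominance bound the optimality results rest on.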
(Sorry for the lazy formatting, I’m on a phone right now. Maybe now is the time to get around to making a website for people to link)
Stuff with wandb, or plotting data, is prime work for an LLM to just do for you, saving valuable human time for the actually important code itself. I think it’s totally valid to highlight the training loop in Cursor, ask “please rewrite to log loss/accuracy/etc. to wandb”, and then eyeball the result.