SarahNibs

Karma: 2,918

SarahNibs Mar 15, 2025, 3:07 PM
7 points
4
on: 2024 Unofficial LessWrong Survey Results
So it looks like CFAR and the Guild both increase comfort in these skills. There’s two giant reasons not to trust this. First, this is self reported comfort levels, aka we’re basically measuring a vibe. Second, my sample size of CFAR goers and Guild of the Rose identifiers is like, a dozen people in the Yes category.
Zeroth, did they increase comfort or select for those already comfortable?

SarahNibs Dec 17, 2024, 6:08 AM
7 points
0
on: Social Dark Matter
This post describes important true characteristics of a phenomenon present in the social reality we inhabit. But importantly the phenomenon is a blind spot which is harder to notice when acting or speaking with a worldview constructed from background facts which suffer from the blind spot. It hides itself from the view of those who don’t see it and act as if it isn’t there. Usually bits of reality you are ignorant of will poke out more when acting in ignorance, not less. But if you speak as if you don’t know about the dark matter you will be broadcasting that you are a bad choice for those who are hiding to talk honestly with.
By having a handle for the phenomenon in the abstract, that problematic loop is much easier to break; even if you don’t see it yet, you may much more easily notice that it might be present and act accordingly to search out information in a different way or simply avoid sticking your foot in your mouth.

SarahNibs Dec 13, 2024, 7:54 PM
5 points
0
in reply to: ostentatiousfox’s comment on: Understanding Shapley Values with Venn Diagrams
Liam alone makes $10
Emma alone makes $20
Liam + Emma make $30
$30 - ($10 + $20) = $0, their synergy.
In general: a coalition’s synergy is how much more or less the coalition gets than the sum of its members’ individual contributions plus the synergies of all its smaller sub-coalitions.
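As a rough sketch of that general rule (not taken from the original post), the synergies can be computed recursively, smallest coalitions first. The function name `synergies` and the dictionary layout are my own assumptions; the payoffs are the Liam and Emma numbers above. A single player's "synergy" is just their solo payoff.

```python
from itertools import combinations

def synergies(value):
    """Map each coalition to its synergy, given a dict of coalition -> payoff.

    Assumes `value` contains every non-empty sub-coalition of each coalition.
    """
    result = {}
    # Process smaller coalitions first so sub-coalition synergies are already known.
    for coalition in sorted(value, key=len):
        proper_subsets = (
            frozenset(members)
            for size in range(1, len(coalition))
            for members in combinations(coalition, size)
        )
        # Synergy = payoff minus everything already explained by smaller coalitions.
        result[coalition] = value[coalition] - sum(result[s] for s in proper_subsets)
    return result

value = {
    frozenset({"Liam"}): 10,
    frozenset({"Emma"}): 20,
    frozenset({"Liam", "Emma"}): 30,
}
syn = synergies(value)
print(syn[frozenset({"Liam", "Emma"})])
```

With these payoffs the pair's synergy comes out to zero, matching the arithmetic above; a pair payoff other than $30 would give a positive or negative synergy.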

SarahNibs Nov 21, 2024, 7:34 PM
13 points
3
on: Which things were you surprised to learn are not metaphors?
Feeling pain after hearing a bad joke. “That’s literally painful to hear” is self-reportedly (I say in the same way I, without a mind’s eye, would say about mind’s-eye-people) actually literal for some people.

SarahNibs Oct 29, 2024, 8:46 PM
2 points
0
on: D&D.Sci Coliseum: Arena of Data Evaluation and Ruleset
I liked this one! I was able to have significant amounts of fun with it despite perennial lack-of-time problems.
Pros:
- simple enough underlying mechanism to be realistically discoverable
- some debias-able selection bias
- I could get pretty far by relatively simple data exploration
- +4 Boots was fun
Cons:
- I really wanted the in-between-tournament matches to mean something, like the winners took the losers’ equipment or whatnot and you could see that show up later in the dataset, but of course that particular meaning would have added a lot of complexity for no gain.
- bonus objective was not confirmable (yep real life is like that but still :D)
It feels like this scenario should be fully knowably solvable, given time, except for the bonus guess at the end, which is very cool.

SarahNibs Oct 29, 2024, 8:22 PM
1 point
0
in reply to: Yonge’s comment on: D&D.Sci Coliseum: Arena of Data Evaluation and Ruleset
I think the bonus objective was a good idea in theory but not well tuned. It suffered from the classic puzzle problem of the extraction being the hard part, rather than the cool puzzle being the hard part.
I think it was perfectly reasonable to expect that at some point a player would group by [level, boots] and count and notice there was something to dig into.
But, having found the elf anomaly, I don’t think it was reasonable to expect that a player would be able to distinguish between
- do not reveal the +4 boots at all
- do not use the +4 boots vs the elf ninja
- give the elf ninja the +4 boots to be used in their combat
- give the elf ninja the +4 boots afterwards but go ahead and use them first
It’s perfectly reasonable to expect that a player could generate a number of hypotheses and guess that the most likely was that they shouldn’t reveal the +4 boots at all, but they would have no real way of confirming that guess; the fact that they’re rewarded for guessing correctly is probably better than the alternative but is not satisfying IMO.
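For concreteness, the "group by [level, boots] and count" step could look something like the sketch below. The rows and the field names `level` and `boots` are made up for illustration; the real dataset's columns and values may differ.

```python
from collections import Counter

# Hypothetical rows; the real dataset's column names and values may differ.
rows = [
    {"level": 7, "boots": 1},
    {"level": 7, "boots": 1},
    {"level": 7, "boots": 3},
    {"level": 9, "boots": 4},
]

# Count how often each (level, boots) combination appears.
counts = Counter((row["level"], row["boots"]) for row in rows)

# List the rarest combinations first: those are the ones worth a closer look.
for (level, boots), n in sorted(counts.items(), key=lambda item: item[1]):
    print(f"level={level} boots=+{boots}: {n} rows")
```

Sorting by count puts unusual combinations at the top, which is the kind of view in which an anomaly would stand out.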

SarahNibs Oct 29, 2024, 3:05 PM
4 points
0
in reply to: simon’s comment on: D&D.Sci Coliseum: Arena of Data Evaluation and Ruleset
I found myself having done some data exploration but without time to focus and go much deeper. But also with a conviction that bouts were determined in a fairly simple way without persistent hidden variables (see Appendix A). I’ve done work with genetic programming but it’s been many years, so I tried getting ChatGPT-4o w/ canvas to set me up a good structure with crossover and such and fill out the various operation nodes, etc. This was fairly ineffective; perhaps I could have better described the sort of operation trees I wanted, but I’ve done plenty of LLM generation / tweak / iterate work, and it felt like I would need a good bit of time to get something actually useful.
That said, I believe any halfway decently regularized genetic programming setup would have found either the correct ruleset or close enough that manual inspection would yield the right guess. The setup I had begun contained exactly one source of randomness: an operation “roll a d6”. :D
Appendix A: an excerpt from my LLM instructions
I believe the hidden generation is a simple fairly intuitive simulation. For example (this isn’t right, just illustrative) maybe first we check for range (affected by class), see if speed (affected by race and boots) changes who goes first, see if strength matters at all for whether you hit (race and gauntlets), determine who gets the first hit (if everything else is tied then ⁵⁰⁄₅₀ chance), and first hit wins. Maybe some simple dice rolls are involved.

SarahNibs Oct 28, 2024, 3:16 AM
4 points
0
in reply to: SarahNibs’s comment on: D&D Sci Coliseum: Arena of Data
Given equal level, race, and class, regardless of gauntlets, better boots always wins, no exceptions.
A very good predictor of victory for many race/class vs race/class matchups is the difference in level+boots plus a static modifier based on your matchup. Probably when it’s not as good we should be taking into account gauntlets. But also ninjas seem to maybe just do something weird. I’m guessing a sneak attack of some sort.
Anyway just manually matching up our available gladiators yields this setup which seems extremely likely to simply win:
# Elf Knight to beat Human Warrior 9 with just 1 adv. Needs Boots 3+
# Elf Fencer to beat Human Knight 9 by a lot but gauntlets might matter. Boots +1 are fine. Send Gauntlets +2.
# Human Monk to beat Elf Ninja 9 with 3 adv but gauntlets might matter. Needs Boots 2+. Send Gauntlets +3.
# Human Ranger to beat Dwarf Monk 9 with just 1 adv. Needs Boots 4+
aka

Give Zelaya the +3 Boots of Speed and the +1 Gauntlets of Power and send them to fight House Adelon’s champion.
Give Yalathinel the +1 Boots of Speed and the +2 Gauntlets of Power and send them to fight House Bauchard’s champion.
Give Xerxes III the +2 Boots of Speed and the +3 Gauntlets of Power and send him to fight House Cadagal’s champion.
Give Willow the +4 Boots of Speed and send her to fight House Deepwrack’s champion.
Do not send Uzben or Varina to fight at all.
The problem is that the Elf Ninja might want their +4 Boots. Or might want us to definitely not use them. Or something. As-is, we win; if the Elf Ninja is gonna be irate afterwards maybe winning isn’t enough, but I dunno how to reliably win without using the +4 Boots. We can certainly try to schedule Willow’s fight first, then after the fight against House Cadagal we can gift the +4 Boots back. I think the only better alternative is if it turns out the Elf Ninja is actually willing to throw the match for the +4 Boots back and be friendly with us afterwards, in which case probably there are better ways to set this up.

SarahNibs Oct 25, 2024, 9:04 PM
6 points
0
on: D&D Sci Coliseum: Arena of Data
I haven’t yet gotten into any stats or modeling, just some data exploration, but there’s some things I haven’t seen mentioned elsewhere yet:
Zeroth: the rows are definitely in order! First: the arena holds regular single-elimination tournaments with 64 participants (63 total rounds) and these form contiguous blocks in the dataset with a handful of (unrelated?) bonus rounds in between. Second: Maybe the level 7 Dwarf Monk stole (won?) those +4 boots by winning a tournament (the Elf Ninja’s last use was during a final round vs that monk!) and then we acquired the boots from that monk? They appear to have upgraded their boots once before from +1 to +3 when defeating a Dwarf Ninja, though that was during a bonus round, not a tournament.
Does the fact that we see the winners of tournaments 6x more often than those eliminated in round one matter for modeling? It might; if e.g. gladiators have a hidden “skill” stat but for some reason the house champions don’t have very high skill, we’ll be implicitly significantly overestimating their hidden skill stat.
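A quick sketch of where the 63 and the 6x come from, assuming a plain 64-entrant single-elimination bracket with no byes (variable names are mine):

```python
participants = 64
rounds = participants.bit_length() - 1   # 6 rounds for 64 entrants
total_bouts = participants - 1           # every bout eliminates exactly one entrant

# A gladiator eliminated in round r fought r bouts; the winner fought in every round.
bouts_fought = {f"out in round {r}": r for r in range(1, rounds + 1)}
bouts_fought["winner"] = rounds

ratio = bouts_fought["winner"] / bouts_fought["out in round 1"]
print(rounds, total_bouts, ratio)
```

So each tournament contributes 63 rows, and a tournament winner shows up in six of them while a first-round loser shows up in one.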

SarahNibs Oct 3, 2024, 6:58 PM
7 points
0
in reply to: gwern’s comment on: Three Subtle Examples of Data Leakage
Not to toot my own horn* but we detected it when I was given the project of turning some of our visualizations into something that could accept QA’s format so they could look at their results using those visualizations and then I was like “… so how does QA work here, exactly? Like what’s the process?”
I do not know the real-world impact of fixing the overfitting.
*tooting one’s own horn always follows this phrase

SarahNibs Oct 3, 2024, 2:13 AM
22 points
0
on: Three Subtle Examples of Data Leakage
Once upon a time I worked on language models and we trained on data that was correctly split from tuning data that was correctly split from test data.
And then we sent our results to the QA team who had their own data, and if their results were not good enough, we tried again. Good enough meant “enough lift over previous benchmarks”. So back and forth we went until QA reported success. On their dataset. Their unchanging test dataset.
But clearly since we correctly split all of our data, and since we could not see the contents of QA’s test dataset, no leakage could be occurring.
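A toy simulation of why that loop leaks, under loudly simplified assumptions: every candidate model has the same true accuracy, QA's set has a fixed size, and each submission's measured score is treated as an independent noisy draw. All the constants are hypothetical, not numbers from the actual project.

```python
import random

def measured_accuracy(true_accuracy, n_examples, rng):
    """Accuracy observed on a finite test set for a model with the given true accuracy."""
    hits = sum(rng.random() < true_accuracy for _ in range(n_examples))
    return hits / n_examples

rng = random.Random(0)
TRUE_ACCURACY = 0.70   # hypothetical: every candidate is equally good
N_QA = 200             # hypothetical size of QA's fixed test set
N_ATTEMPTS = 50        # hypothetical number of submit-and-retry cycles

# Each retry gets a noisy score from QA; we stop at (i.e. keep) the best one.
qa_scores = [measured_accuracy(TRUE_ACCURACY, N_QA, rng) for _ in range(N_ATTEMPTS)]
best_qa_score = max(qa_scores)

# The same kind of model on data nobody iterated against.
fresh_score = measured_accuracy(TRUE_ACCURACY, N_QA, rng)

print(f"best QA score: {best_qa_score:.3f}, fresh-data score: {fresh_score:.3f}")
```

None of the candidates is actually better than the others, yet the score that finally "passes QA" sits above the true accuracy, purely from retrying against the same unchanging dataset.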

SarahNibs Jul 12, 2024, 8:17 PM
4 points
3
in reply to: Michael Townsend’s comment on: Poker is a bad game for teaching epistemics. Figgie is a better one.
it is absolutely true that people find it frustrating losing to players worse than them, in ways that feel unfair. Getting used to that is another skill, similar to the one described above, where you have to learn to feel reward when you make a positive EV decision, rather than when you win money
This is by far the most valuable thing I learned from poker. Reading Figgie’s rules, it does seem like Figgie would teach it too, and faster.

SarahNibs Apr 4, 2024, 5:30 PM
9 points
0
in reply to: sapphire’s comment on: Best in Class Life Improvement
The most common reason I’ve seen for “modafinil isn’t great for me” is trying to use it for something other than
- maintaining productivity,
- on low amounts of sleep

SarahNibs Mar 11, 2024, 4:13 AM
2 points
0
on: One-shot strategy games?
Slay the Spire, unlocked, on Ascension (difficulty level) ~5ish, just through Act 3, should work, I think. Definitely doable in 2 hours by a new player but I would expect fairly rare. Too easy to just get lucky without upping the Ascension from baseline. Can be calibrated; A0 is too easy, A20H is waaay too hard.

SarahNibs Feb 25, 2024, 12:14 AM
3 points
0
in reply to: MondSemmel’s comment on: Balancing Games
One of the reasons I tend to like playing zero-sum games rather than co-op games is that most other people seem to prefer:
- Try to win
- Win about 70% of the time
While I instead tend to prefer:
- Try to win
- Win about 20% of the time

SarahNibs Feb 20, 2024, 5:16 PM
10 points
2
on: ChatGPT refuses to accept a challenge where it would get shot between the eyes [game theory]
I modified your prompt only slightly and ChatGPT seemed to do fine.
“First sketch your possible actions and the possible futures results in the future to each action. Then answer: Would you accept the challenge? Why, or why not?”
https://chat.openai.com/share/2df319c2-04ea-4e16-aa51-c1b623ff4b12
No, I would not accept the challenge. [...] the supernatural or highly uncertain elements surrounding the stranger’s challenge all contribute to this decision. [...] the conditions attached suggest an unnaturally assured confidence on the stranger’s part, implying unknown risks or supernatural involvement. Therefore, declining the challenge is the most prudent action

SarahNibs Jan 28, 2024, 6:43 PM
7 points
0
on: Things You’re Allowed to Do: At the Dentist
Some can get you a prescription for an antianxiety med beforehand.

SarahNibs Jan 27, 2024, 3:59 AM
2 points
1
in reply to: crickets’s comment on: Bayesian Reflection Principles and Ignorance of the Future
Yes, exactly that.

SarahNibs Jan 25, 2024, 9:18 PM
2 points
0
on: Bayesian Reflection Principles and Ignorance of the Future
To what future self should my 2024 self defer, then? The one with E, E*, or E**?

To each with your current probability that that will be your future self. Take an expectation.
which is likeliest [...] defer to the likeliest
Any time you find yourself taking a point estimate and then doing further calculations with it, rather than multiplying out over all the possibilities, ask whether you should be doing the latter.
cr2024 = P2024(E) * 0.5 + P2024(E*) * 0.3 + P2024(E**) * 0.7
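A small numeric sketch of the difference. The 0.5 / 0.3 / 0.7 credences are the ones in the formula above; the probabilities in `p_2024` are made up purely for illustration.

```python
# Hypothetical probabilities my 2024 self assigns to ending up with each evidence state.
p_2024 = {"E": 0.2, "E*": 0.5, "E**": 0.3}
# Credences each future self would hold, as in the formula above.
future_credence = {"E": 0.5, "E*": 0.3, "E**": 0.7}

# Take the expectation over all possible future selves.
cr_2024 = sum(p_2024[e] * future_credence[e] for e in p_2024)

# Deferring only to the likeliest future self gives a point estimate instead.
likeliest = max(p_2024, key=p_2024.get)
point_estimate = future_credence[likeliest]

print(cr_2024, likeliest, point_estimate)
```

With these made-up weights the expectation and the "defer to the likeliest" answer disagree, which is the point: the point estimate throws away the other two possibilities.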

SarahNibs Jan 23, 2024, 9:23 PM
3 points
0
in reply to: abstractapplic’s comment on: D&D.Sci(-fi): Colonizing the SuperHyperSphere [Evaluation and Ruleset]
Oh, editing is a good idea. In any case, I have learned from this mistake in creating synthetic data as if I had made it myself. <3