It’s interesting how much of this comes down to computer vision problems: the inability to look at the screen and determine which set of pixels is the stairs, or the complete inability to differentiate cuttable trees from ones that cannot be cut. That part at least seems like the kind of problem that would go away within a year if significant effort were devoted to it.
I find it fascinating how this set of children’s video games from the 90s does a better job of showing off my frustrations with large language models than anything else. When you give them small, concrete, narrow tasks and can reliably test their output, they are incredibly useful (e.g. they are superhuman at helping you write small functions in code), but do not try to get them to do a long-context task whose intermediate steps you can’t test. (After all, if you could test intermediate steps, you could break the task down until you reached the smallest intermediate step and prompt the model with just that step.) The hallucination problem is much clearer when playing Pokemon than anywhere else, and so are assorted issues with agentic ability.
The inability of models to retain memory is the major frustration currently preventing them from being used over longer contexts. Pokemon as a benchmark is in theory a 2-3 hour task from start to finish if you don’t waste any time (sub-2 is possible but takes a lot of resets).
I’ll say this much
Rainbolt-tier LLMs already exist: https://geobench.org/
AIs trained on GeoGuessr are dramatically better than Rainbolt and have been for years.