hippke

Karma: 428

Exploring GPT4′s world model

hippke 20 Mar 2023 21:31 UTC

−5 points

5 comments, 2 min read

hippke 15 Dec 2022 20:10 UTC
7 points
0
on: Predicting GPU performance
I think the biggest improvement in this report can be made regarding Appendix D. The authors describe that they use “process size rather than transistor size” which is, as they correctly note, a made-up number. What should be used instead is transistor density (transistors per area), which is readily available in much detail for many past nodes, and the most recent “5nm” nodes (see e.g., wikichip).

hippke 15 Dec 2022 16:50 UTC
3 points
0
on: Predicting GPU performance
What about the Landauer limit? We are 3 orders of magnitude from the Landauer limit ($10^{-21}$ J/op), see my article here on LessWrong. The authors list several physical limitations, but this one seems to be missing. It may pose the most relevant limit.
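As a rough check on the order of magnitude quoted above, the Landauer limit can be computed as $k_B T \ln 2$. The sketch below assumes room temperature (300 K), which the comment does not state; the function name is purely illustrative.

```python
import math

# Boltzmann constant in J/K (exact value in the 2019 SI definition).
K_B = 1.380649e-23


def landauer_limit_joules(temperature_kelvin: float) -> float:
    """Minimum energy to erase one bit at the given temperature: k_B * T * ln(2)."""
    return K_B * temperature_kelvin * math.log(2)


# Assumption: room temperature of 300 K (not stated in the comment).
limit = landauer_limit_joules(300.0)
print(f"Landauer limit at 300 K: {limit:.2e} J per bit erased")

# Energy per operation that would sit three orders of magnitude above the limit.
print(f"1000x the limit: {1000 * limit:.2e} J")
```

At 300 K this comes out at a few times $10^{-21}$ J, consistent with the figure quoted in the comment.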

hippke 15 Sep 2022 11:58 UTC
12 points
2
on: What’s the longest a sentient observer could survive in the Dark Era?
That’s an excellent question, pondered by the brightest minds. The great Freeman Dyson proposed a solution dubbed eternal intelligence (Dyson 1979, Reviews of Modern Physics, Volume 51, Issue 3, July 1979, pp.447-460). Basically, some finite amount of matter=energy is stored. As the universe cools over time, energy costs per computation decrease (logarithmically, but forever). After each cooling time period, one can use some fraction of the remaining energy, which will thus never go to zero, leading to eternal consciousness.
It was later understood that the expansion of the universe is accelerating. If that holds, the concept breaks down, as Dyson admitted. In the far future, any two observers will be separated, making the remaining subjects very lonely.

hippke 7 Sep 2021 18:15 UTC
3 points
in reply to: JBlack’s comment on: Why the technological singularity by AGI may never happen
I think this calculation is invalid. A human is created from a seed worth 700 MB of information, encoded in the form of DNA. This seed was created over millions of years of evolution, and it compresses a large (but finite) amount of information (energy). A relevant fraction of hardware and software is encoded in this information. Additional learning is done during 20 years worth 3 MWh. The fractional value of this learning part is unknown.

hippke 5 Sep 2021 15:48 UTC
3 points
in reply to: JBlack’s comment on: Why the technological singularity by AGI may never happen
How can we know that “it is possible to train a 200 IQ equivalent intelligence for at most 3 MW-hr”?

hippke 3 Sep 2021 18:00 UTC
2 points
in reply to: Dave Lindbergh’s comment on: Why the technological singularity by AGI may never happen
How did von Neumann come close to taking over the world? Perhaps Hitler, but von Neumann?

hippke 3 Sep 2021 17:17 UTC
0 points
in reply to: jbash’s comment on: Why the technological singularity by AGI may never happen
Sure! I argue that we just don’t know whether such a thing as “much more intelligent than humans” can exist. Millions of years of monkey evolution have increased human IQ to the 50-200 range. Perhaps that can go 1000x, perhaps it would level off at 210. The AGI concept makes the assumption that it can go to a big number, which might be wrong.

Why the technological singularity by AGI may never happen

hippke 3 Sep 2021 14:19 UTC

5 points

14 comments, 1 min read

hippke 19 Jul 2021 6:10 UTC
1 point
in reply to: Bartlomiej Lewandowski’s comment on: A closer look at chess scalings (into the past)
From what I understand about “ELO inflation”, it refers to the effect that the Top 100 FIDE players had 2600 ELO in 1970, but 2700 ELO today. It has been argued that the level simply increased, as more very good players entered the field. The ELO number as such should be fair in both eras (after playing infinitely many games...). I don’t think that it is an issue for computer chess comparisons. Let me know if you have other data/information!

hippke 19 Jul 2021 6:06 UTC
17 points
0
in reply to: hippke’s comment on: Benchmarking an old chess engine on new hardware
I ran the experiment “Rebel 6 vs. Stockfish 13” on Amazon’s AWS EC2. I rented a Xeon Platinum 8124M which benched at 18x 1.5 MNodes/s. I launched 18 concurrent single-threaded game sets with 128 MB of RAM for each engine. Again, ponder was off, no books, no tables. Time settings were 40 moves in 60s + 0.6 per move, corresponding to 17.5 MNodes/move. For reference, SF13 benches at ELO 3630 at this setting (entry “64 bit”); Rebel 6.0 got 2415 on a Pentium 90 (SSDF Computer Rating List (01-DEC-1996).txt, 90 kN/move).
The result:
- 1911 games played
- 18 draws
- No wins for Rebel
- All draws when Rebel played white
- ELO difference: 941 ± 63
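As a sanity check on the result above, the ELO gap implied by a match score can be estimated with the standard logistic Elo formula. This is only a sketch: the comment does not say which tool produced 941 ± 63, so the plain formula used here is an assumption and is not expected to reproduce that number exactly.

```python
import math


def elo_difference(score_fraction: float) -> float:
    """Elo gap implied by an average score under the logistic Elo model."""
    return 400.0 * math.log10(score_fraction / (1.0 - score_fraction))


games = 1911
draws = 18
rebel_wins = 0

# A draw counts as half a point for each side.
stockfish_points = (games - draws - rebel_wins) + 0.5 * draws
stockfish_score = stockfish_points / games

print(f"Stockfish score: {stockfish_score:.4f}")
print(f"Implied Elo gap: {elo_difference(stockfish_score):.0f}")
```

The plain formula gives a gap of about 930, which lies inside the quoted uncertainty of 941 ± 63.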
Interpretation:
- Starting from 3630 for SF13, that corresponds to Rebel on a modern machine: 2689.
- Up from 2415, that’s +274 ELO.
- The ELO gap between Rebel on a 1994 Pentium 90 (2415) and SF13 on a 2020 PC (3630) is 1215 points. Of these, 274 points are closed with matching hardware.
- That gives 23% for the compute, 77% for the algorithm.
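The arithmetic behind this interpretation can be written out in a few lines. The variable names are illustrative; the numbers are the ones given in the comment.

```python
sf13_elo = 3630        # SF13 at the benchmark setting
rebel_1996_elo = 2415  # Rebel 6.0 on a Pentium 90 (SSDF list)
measured_gap = 941     # SF13 minus Rebel on the same modern hardware

# Rebel's rating when it runs on the modern machine.
rebel_modern_elo = sf13_elo - measured_gap

# ELO that Rebel gains from the hardware alone.
hardware_gain = rebel_modern_elo - rebel_1996_elo

# Full gap between the 1996 setup and the modern setup.
total_gap = sf13_elo - rebel_1996_elo

compute_share = hardware_gain / total_gap
algorithm_share = 1.0 - compute_share

print(f"Rebel on modern hardware: {rebel_modern_elo}")
print(f"Gain from hardware: +{hardware_gain} of {total_gap} ELO")
print(f"Compute share: {compute_share:.0%}, algorithm share: {algorithm_share:.0%}")
```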
Final questions:
- Isn’t +274 ELO too little for 200x compute?
- We found 50% algo/50% compute for SF3-SF13. Why is that?
Answer: ELO gain with compute is not a linear function, but one with diminishing returns. Thus, the percentage “due to algo” increases with the length of the time frame, and the percentage due to compute shrinks accordingly. A single fixed percentage is therefore not a good answer.
But we can give the percentage due to compute as a function of the time gap:
- Over 10 years, it’s ~50%
- Over 25 years, it’s ~23%
With data from other sources (SF8, Houdini 3) I made this figure to show the effect more clearly. The dashed black line is a double-log fit function: A base-10 log for the exponential increase of compute with time, and a natural log for the exponential search tree of chess. The parameter values are engine-dependent, but should be similar for engines of the same era (here: Houdini 3 and SF8). With more and more compute, the ELO gain approaches zero. In the future, we can expect engines whose curve is shifted to the right side of this plot.

hippke 16 Jul 2021 20:13 UTC
6 points
in reply to: paulfchristiano’s comment on: Benchmarking an old chess engine on new hardware
- With a baseline of 10 MNodes/move for SF3, I need to set SF13 to 0.375 MNodes/move for equality. That’s a factor of about 27 (roughly 30). Caveat: I only ran 10 games which turned out equal, and only at 10 MNodes/move for SF3.
- Yes: Rebel6 at normal 2021 settings (40 moves in 15 min) can be approximately matched with SF13 at 20 kNodes/move. More precisely: I get parity between Rebel6 (128 MB) and SF13 (128 MB) for 16 MNodes/move vs. 20 kNodes/move (=factor of 800x). On my Intel Core-M 5Y31 (750 kNodes/s), that’s 21s vs. 0.026s per move. Note that the figure shows SF8, not SF13.
- I was contacted by one person via PM, we are discussing the execution setup. Otherwise, I could do it by the end of July after my vacation.
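The node budgets above translate into thinking time and compute ratios as follows. This is a minimal sketch that assumes both engines search at the same 750 kNodes/s on the Core-M, as the single speed figure in the comment suggests.

```python
def seconds_per_move(nodes_per_move: float, nodes_per_second: float) -> float:
    """Thinking time needed to search a given node budget at a given speed."""
    return nodes_per_move / nodes_per_second


ENGINE_SPEED = 750e3  # nodes/s on the Intel Core-M 5Y31, from the comment

rebel_nodes = 16e6  # Rebel6 budget per move
sf13_nodes = 20e3   # SF13 budget per move at parity with Rebel6

rebel_vs_sf13 = rebel_nodes / sf13_nodes
print(f"Rebel6 vs SF13 node ratio: {rebel_vs_sf13:.0f}x")
print(f"Rebel6: {seconds_per_move(rebel_nodes, ENGINE_SPEED):.1f} s per move")
print(f"SF13:   {seconds_per_move(sf13_nodes, ENGINE_SPEED):.3f} s per move")

# SF3 vs SF13 parity from the first bullet (both in MNodes/move).
sf3_vs_sf13 = 10.0 / 0.375
print(f"SF3 vs SF13 node ratio: {sf3_vs_sf13:.1f}x")
```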

Benchmarking an old chess engine on new hardware

hippke 16 Jul 2021 7:58 UTC

72 points

4 comments, 5 min read, 1 review

hippke 15 Jul 2021 20:36 UTC

3 points

in reply to: hippke’s comment on: A closer look at chess scalings (into the past)

OK, I have added the Houdini data from this experiment to the plot:

The baseline ELO is not stated, but likely close to 3200:

Experiment	kNodes/move	ELO drop	ELO calculated
4k nodes vs 2k nodes	2	303	1280
8k nodes vs 4k nodes	4	280	1583
16k nodes vs 8k nodes	8	237	1863
32k nodes vs 16k nodes	16	208	2100
64k nodes vs 32k nodes	32	179	2308
128k nodes vs 64k nodes	64	156	2487
256k nodes vs 128k nodes	128	136	2643
512k nodes vs 256k nodes	256	134	2779
1024k nodes vs 512k nodes	512	115	2913
2048k nodes vs 1024k nodes	1024	93	3028
4096k nodes vs 2048k nodes	2048	79	3121
Baseline	4096		3200
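The “ELO calculated” column follows from the baseline by subtracting the ELO drop of each halving in turn. A short sketch of that reconstruction, using the assumed baseline of 3200 from the comment:

```python
# (kNodes/move of the weaker side, ELO drop for that halving), taken from the table.
elo_drops = [
    (2, 303), (4, 280), (8, 237), (16, 208), (32, 179), (64, 156),
    (128, 136), (256, 134), (512, 115), (1024, 93), (2048, 79),
]
BASELINE_ELO = 3200  # assumed in the comment ("likely close to 3200") at 4096 kNodes/move


def calculated_elo(drops, baseline_elo):
    """Walk down from the baseline, subtracting each halving's ELO drop."""
    ratings = {}
    current = baseline_elo
    # Start with the largest node count, which sits directly below the baseline.
    for knodes, drop in sorted(drops, reverse=True):
        current -= drop
        ratings[knodes] = current
    return ratings


ratings = calculated_elo(elo_drops, BASELINE_ELO)
for knodes in sorted(ratings):
    print(f"{knodes:>5} kNodes/move: {ratings[knodes]}")
```

If the true baseline differs from 3200, every value in the column shifts by the same amount, while the drops themselves are unaffected.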

hippke 15 Jul 2021 20:25 UTC
1 point
in reply to: paulfchristiano’s comment on: A closer look at chess scalings (into the past)
Mhm, good point. I must admit that the “70 ELO per doubling” etc. is forum wisdom that is perhaps not the last word. A similar scaling experiment was done with Houdini 3 (2013) which dropped below 70 ELO per doubling when exceeding 4 MNodes/move. In my experiment, the drop is already around 1 MNode/move. So there is certainly an engine dependence.

hippke 15 Jul 2021 20:13 UTC
1 point
in reply to: paulfchristiano’s comment on: A closer look at chess scalings (into the past)
From what I understand about the computer chess community:
- Engines are optimized to win in the competitions, for reputation. There are competitions for many time controls, but most well respected are the CCC with games of 3 to 15 minutes, and TCEC which goes up to 90 minutes. So there is an incentive to tune the engines well into the many-MNodes/move regime.
- On the other hand, most testing during engine development is done at blitz or even bullet level (30s for the whole game for Stockfish). You can’t just play thousands of long games after each code commit to test its effect. Instead, many faster games are played. That’s in the few MNodes/move regime. So there’s some incentive to perform well in that regime.
- Below that, I think that performance is “just what it is”, and nobody optimizes for it. However, I think it would be valuable to ask a Stockfish developer about their view.

hippke 15 Jul 2021 19:58 UTC
1 point
in reply to: Charlie Steiner’s comment on: A closer look at chess scalings (into the past)
Yes, that’s correct. It is slightly off because I manually set the year 2022 to match 100,000 kNodes/s. That could be adjusted by one year. To get an engine which begins its journey right in the year 2021, we could perform a similar experiment with SF14. The curve would be virtually identical, just shifted to the right and up.

hippke 15 Jul 2021 19:55 UTC
3 points
in reply to: Bucky’s comment on: A closer look at chess scalings (into the past)
Oh, thank you for the correction about Magnus Carlsen! Indeed, my script to convert the timestamps had an error. I fixed it in the figure.

Regarding the jump in 2008 with Rybka: I think that’s an artifact of that particular list. Similar lists don’t have it.

A closer look at chess scalings (into the past)

hippke 15 Jul 2021 8:13 UTC

50 points

14 comments, 4 min read

hippke 12 Jul 2021 18:27 UTC
1 point
in reply to: Lech Mazur’s comment on: How much chess engine progress is about adapting to bigger computers?
Good point: SF12+ profit from NNs indirectly.

Regarding the ELO gain with compute: That’s a function of diminishing returns. At very small compute, you gain +300 ELO; after ~10 doublings that reduces to +30 ELO. In between is the region with ~70 ELO; that’s where engines usually operate on present hardware with minutes of think time. I currently run a set of benchmarks to plot a nice graph of this.
