Examples? What makes you think they are strokes of genius (as opposed to the thing already being in the training data, or being actually easy)?
I don’t know. I started (this was my experience talking with GPT-4 and its contemporaries) by asking it to analyze some 200 lines of non-standard code with the comments stripped out. It correctly figured out that I was using nested dictionaries to represent vector-like objects, and that this was an implementation of a non-standard, unusually flexible (but slow) neural machine.
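(The 200 lines themselves aren’t reproduced here, so the snippet below is only a toy sketch of the kind of representation it recognized: “vectors” as nested dictionaries keyed by arbitrary names rather than fixed-size arrays, which is what makes such a machine unusually flexible and unusually slow. The helper functions are made up for this illustration, they are not the original code.)

```python
# Toy sketch (not the original code): dict-based "vectors" whose coordinate set
# can grow or change at runtime -- flexible, but much slower than arrays.
def add(u: dict, v: dict) -> dict:
    """Pointwise addition of two dict-based 'vectors', recursing into nested dicts."""
    result = dict(u)
    for key, value in v.items():
        if isinstance(value, dict):
            result[key] = add(u.get(key, {}), value)
        else:
            result[key] = u.get(key, 0.0) + value
    return result

def scale(u: dict, c: float) -> dict:
    """Multiply a dict-based 'vector' by a scalar."""
    return {k: scale(v, c) if isinstance(v, dict) else c * v for k, v in u.items()}

state = {"layer1": {"x": 1.0, "y": 2.0}, "bias": 0.5}
delta = {"layer1": {"y": 0.25, "z": -1.0}}   # note: a step can introduce new coordinates
print(add(state, scale(delta, 2.0)))
# {'layer1': {'x': 1.0, 'y': 2.5, 'z': -2.0}, 'bias': 0.5}
```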
That analysis was obviously a case of “true understanding”, and it turned out to be quite difficult to reproduce: as the models evolved, the ability to analyze this code well was lost, then eventually regained in better models. Those better models eventually figured out even more non-trivial things about that non-standard implementation; at some point, newer models started to notice on their own that this particular neural machine was inherently self-modifying. Overall, a very obvious evolution from inept pattern matching to good understanding, with some setbacks along the way, but with good progress towards better and better performance.
Then I asked it to creatively modify and remix some Shadertoy shaders, and it did a very good job (even more so considering that the model was visually blind and could not see the animations its shaders produced). Nothing too difficult, but things like taking a function from one of the shaders and adding a call to it from another shader, with impressive visual effects… For all its simplicity, it was more than would have occurred to me if I had been trying to do this by hand...
But when I tried to manually iterate these steps to obtain an “evolution of interesting shaders”, I got rapid saturation, not unlimited interesting evolution...
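(For concreteness, here is roughly what that iteration would look like written as a loop. I was doing all of this by hand in the chat interface, so everything below (the model name, the prompt wording, the placeholder shader strings) is a hypothetical reconstruction rather than the actual setup.)

```python
from openai import OpenAI

client = OpenAI()  # assumes an API key in the environment

shader_a = "// GLSL source of the first Shadertoy shader (omitted here)"
shader_b = "// GLSL source of the second Shadertoy shader (omitted here)"

def remix(a: str, b: str) -> str:
    """Ask the model to transplant a function from shader A into shader B."""
    prompt = (
        "Here are two Shadertoy shaders.\n\n"
        f"Shader A:\n{a}\n\nShader B:\n{b}\n\n"
        "Take an interesting function from Shader A and add a call to it in Shader B "
        "so that the visual effect changes. Return only the modified Shader B."
    )
    resp = client.chat.completions.create(
        model="gpt-4",  # the kind of model I was using at the time
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

population = [shader_a, shader_b]
for generation in range(10):
    child = remix(population[-2], population[-1])
    population.append(child)   # in my manual runs, "interestingness" saturated
                               # after a few generations rather than compounding
```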
So, not bad at all. I occasionally do rather creative things myself, but it is always an effort, so on the occasions when I can’t successfully make that kind of effort, I start to feel that the model might be more creative than “me in my usual mode” (although I don’t know whether these models are already competitive with “me in my peak mode”).
When Code Interpreter was first introduced, I asked it to solve a math competition problem, and it did a nice job. Then I asked it what I would do if I wanted to solve the problem using only pen and paper with limited precision (it was a problem with a huge answer), and it told me to take logarithms and demonstrated how to do that.
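(The actual competition problem is not quoted in this thread, so the snippet below only illustrates the general trick it suggested: when the answer is astronomically large, you work with log10 of the quantity, which pen-and-paper precision can handle, and read off the number of digits and the leading digits at the end. The value 100! is just a stand-in, not the problem from that conversation.)

```python
import math

# Hypothetical stand-in for a "huge answer": estimate 100! without ever computing it
# directly, the way one would with pen, paper, and a log table.
log10_value = sum(math.log10(k) for k in range(1, 101))   # log10(100!) = sum of log10(k)

digits = math.floor(log10_value) + 1                       # number of decimal digits
mantissa = 10 ** (log10_value - math.floor(log10_value))   # leading digits

print(f"100! has {digits} digits and is roughly {mantissa:.4f}e+{digits - 1}")
# Sanity check against the exact value, which a computer (but not pen and paper) can do:
print(len(str(math.factorial(100))))                       # 158
```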
The immutable tree processing I mentioned was good in this sense: very elegant, and it taught me some Python tricks I had not known.
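(That conversation isn’t quoted here either, so this is just a minimal sketch of what I mean by immutable tree processing in Python: nodes are frozen, and a “modification” builds a new tree instead of mutating the old one. The Node class and map_tree function are illustrative, not the code from that conversation.)

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass(frozen=True)              # frozen=True makes instances immutable
class Node:
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def map_tree(node: Optional[Node], f: Callable[[int], int]) -> Optional[Node]:
    """Return a new tree with f applied to every value; the original is untouched."""
    if node is None:
        return None
    return Node(f(node.value), map_tree(node.left, f), map_tree(node.right, f))

tree = Node(1, Node(2), Node(3, Node(4)))
doubled = map_tree(tree, lambda v: v * 2)
print(doubled.value, doubled.right.left.value)   # 2 8
print(tree.value)                                # 1 -- original unchanged
```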
Then, when reasoning models were first introduced, I gave them a linear algebra problem (which I could not solve myself, but people who do math competitions could). The weaker models could not do it, but o1-preview one-shotted it.
(All these are conversations I am publishing on GitHub, so anyone who wants to take a closer look can.)
Anyway, my impression is that it’s not just training data; it’s more than that. This is not doable without reasonable understanding, without good intuition.
At the same time, they can be lazy and sloppy, and they make embarrassing mistakes (which sometimes don’t prevent them from getting to the right result). But it’s reliability that is the problem, not creative capability, which seems to be quite robust (at least on the “medium creativity” setting).
Ok, thanks for the info. (For the record, these do not sound like what I would remotely call “strokes of genius”.)
A talented kid from a math school, I’d say (speaking of the typical virtual character I was interacting with during those conversations). Not bad for the time being…