mrkelley

mrkelley 15 May 2026 22:01 UTC
in reply to: Alvin Ånestrand’s comment on: Over Eight Months of Progress in Two: Analyzing the Mythos Preview Capability Jump
Fair point about the abbreviations. I went through and edited my original response in that vein. Thank you for that. Like all deep technologist nerds (and I am their king; I do not use the term pejoratively), I tend to forget my audience might not be technically inclined. So your suggestion is warmly received.
But let’s discuss the idea of research taste. Does that mean coming up with the original research idea? If so, I know several corporate R&D types who would disagree. They often take the original idea from outside their teams. In my world, at least, it’s often a customer who says, “I need X”, and the research team takes it from there, creating one or more hypotheses with X as the final target product. Alternatively, it might be a field engineer who says, “It would be incredibly useful to have Y”, and the team takes it from there. So is it okay for a Superhuman AI Researcher to take X or Y from a human and still be considered a true Superhuman AI Researcher?
I’m asking because that’s very much what happened in my case. I said, “I need to create a chip that does X”, knowing I didn’t even know what hypotheses to generate from that goal. I honestly didn’t know enough to say “H(0) means architecture A is our best performer, and H(1) means architecture B is our best performer”, because at that time I knew nothing about chip architecture. Granted, I’ve learned a lot since then, but I quite literally said to my LLM: “Can we create a chip that performs X?” and the LLM took it from there. It came up with a couple of architectural ideas, generated an H(0) and an H(1), tested them and reported back to me. Of course, I asked it to tutor me along the way, because I genuinely wanted to learn, but I did not design the hypotheses, the test processes or the success/failure criteria. I’d have to share the entire record with an expert to determine whether it was exercising genuine “good research taste”, and I may do that at some point. It would make a very interesting paper, and peer review would be valuable. I won’t do it now, because the patent process is not complete. We’ve only filed the provisional application, so I don’t want to share the underlying data yet.
In my case, the automated code generation question is a bit less clear, because I explicitly mandated stopping points so that I could learn as we went. But I never told it what kind of code to write, or how the code should be structured. There’s automation and there’s automation. So in my book at least, “automated code generation” is vague, probably too vague to be really useful. My two cents.
But my main question is this: Can we rule out Superhuman AI Researcher (SAR) status in this case? Can we definitively say, given what I can share now, that the LLM was not acting as a true SAR?

mrkelley 8 May 2026 21:55 UTC
on: Over Eight Months of Progress in Two: Analyzing the Mythos Preview Capability Jump
I’d like to talk for a moment about the idea of a Superhuman AI Researcher (SAR). While we might disagree on details or precise definitions, the general idea is an AI capable of performing autonomous research on some subject. The SAR might be capable of its own ideation and of choosing its own research topic, or it might not. I want to talk briefly about the kind that takes a human-originated idea and runs with it, researching and developing it in full. I’m choosing that flavor because I believe it already exists.

I am a solutions architect in the software world by profession. But about a year ago, I had an idea for a new chip architecture. I have no training in chip design or testing. Until this project, I had only a loose idea of what a Field Programmable Gate Array (FPGA) is. But I began by talking the idea over with a large language model. Together we selected an appropriate FPGA (FPGAs are widely used in designing and testing new chip architectures), and the LLM taught me how to manage the physical aspects (reset buttons, what blink codes to look for, how to flash an operating system to the general computing side of the FPGA, etc.). Some of it was entirely new to me, some adjacent to previous work, but we got through it. Then, over a period of weeks, under my supervision, the LLM designed and tested a chip architecture that satisfied my original vision. I did none of the real work. I intentionally took the role of assistant: I reset the board as needed, I adjusted power levels and relayed readings, I reflashed the OS as needed, and I maintained the LAN and the project’s physical components. The LLM did the chip design work, implemented that design on the FPGA, tested it, reported test results to me, originated refinements to the architecture to improve performance, and tested its refinements until we reached the absolute best performance possible on the FPGA. Finally, the LLM wrote a description of the architecture and helped me find an attorney to work through the patent process. The provisional patent application was submitted to the USPTO on 4 May 2026.

During this process, I monitored the LLM’s work. I read specs, I followed output and test results, and I approved decisions. But my approval was intentionally along the lines of rubber-stamping the LLM’s choices. I took the time to understand them as best I could (remember, this project was outside my expertise). But I wanted to see how far we’d get if I let the LLM “do its thing”.

The model was Claude Opus 4.x (currently 4.7 in that project; I never recorded the starting model version, which is an oversight on my part, but I wasn’t thinking about SAR metrics at the time).

To sum up: I, a human being, had an idea outside my own skill set. A large language model researched it, designed it, tested it, improved it, and helped me patent it. Is this true SAR? That depends on how we define SAR. But I would say this is at least very close to true SAR. After this experience, I am more and more convinced the acceleration curve is even steeper than we’ve ever imagined.