Bachelor's in general and applied physics. Aspiring AI safety / agent foundations researcher.
I love talking to people. If you are an alignment researcher, we will have at least one topic in common (though I am very interested in talking about topics unknown to me too!), so I encourage you to book a call with me: https://calendly.com/roman-malov27/new-meeting
Email: roman.malov27@gmail.com
GitHub: https://github.com/RomanMalov
TG channels (in Russian): https://t.me/healwithcomedy, https://t.me/ai_safety_digest
Roman Malov
Maybe this post of mine might be relevant?
Have you seen Harder Drive?
I browsed your website a bit but did not find a link to any call. Could you please help?
My post is about the aspects of math that are shaped by humans rather than by math's own structure.
Oh, I plan to post on the topic of alien math. But in short: aliens are going to be guided by beauty/interestingness/utility for the same reason evolution pushed humans to value them, so a lot of our math could intersect with theirs (but you still need aliens or humans to pluck out those valuable bits of math; you can't force math to look hard enough at itself and present those parts to you).
And they would have group theory because our universe is just full of symmetries.
I mean, math is invented in the same sense fridges are invented. Is there a space of designs over which some search process could stumble upon, and therefore "discover," a fridge design? Sure. But at that point, we call this invention.
I'm not saying that such a formalized objective couldn't exist. My claim is that we (probably) haven't found one yet. And if one is found, it wouldn't be "metaphysically objective"; it would just spit out very insightful theorems very fast.
Unlike chess, math has no "optimal play." And if it does, I think that play would be considered slow/boring (given computational constraints).
I will adjust the post. Thanks.
Unmathematical features of math
I had a similar experience, though this was before I became aware of AI x-risks, and it was related to the risk of nuclear war. I was constantly checking the news, constantly vigilant for any sounds resembling an airstrike alarm, was constantly checking the window during partly cloudy days, etc. The way I dealt with it was basically by accepting death, and then, when I learned of AI x-risks, this wasn’t that much of a hit.
Then they aren't perfect, are they?
But I guess I can see your point: the algorithm requires a lot of time and compute, and maybe anything with that many resources can answer questions like that via an exhaustive enough search. I guess the problem as you define it is underconstrained.
Justification = P("Opponent will agree that S is a good justification for the prediction" | "I say S as my justification for the prediction"). If there are no division-by-zero errors, that should work.
Why not just ask them to superforecast what would be the ideal response to “what’s the justification for prediction X”? Or are they so perfect that they don’t even consider counterfactuals?
Jailbreaks, at least in frontier models, stop working.
Why do you expect that? The hardness of jailbreak creation did not grow as fast as capabilities, to the best of my knowledge.
Why is no one here talking about Claude Mythos? I don't have any takes, but I want to hear yours.
encoded how?
Via Unix time (milliseconds since Jan 1st, 1970), and you can just trim whatever precision you don't need. People are born at a rate of about 2/second, so you don't need the last 2 digits. The current timestamp is 10 digits long in seconds, which is roughly 33 bits.
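As a minimal sketch of the trimming idea (the function names and the 2-digit choice below are my own illustration): drop the trailing decimal digits of a seconds-resolution Unix timestamp, which shrinks a ~33-bit number while keeping the encoding reversible up to the discarded precision.

```python
def encode_moment(unix_seconds: int, trim_digits: int = 2) -> int:
    """Drop the last `trim_digits` decimal digits of a Unix
    timestamp, keeping only the precision we care about."""
    return unix_seconds // (10 ** trim_digits)

def decode_moment(code: int, trim_digits: int = 2) -> int:
    """Recover the timestamp to within 10**trim_digits seconds."""
    return code * (10 ** trim_digits)

# A 10-digit seconds timestamp (roughly 33 bits); trimming 2 digits
# leaves 8 digits (roughly 27 bits) at 100-second resolution.
ts = 1_700_000_000
code = encode_moment(ts)
print(code)                        # 17000000
print(decode_moment(code) - ts)    # 0 here; at most 99 seconds off in general
```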
The hypothesis you describe wouldn't result in the behaviour I quoted. If the hypothesis "put 1 on square-free indices and 0 otherwise" is not in the hypothesis class, then "if you have observed the square-free sequence so far, guess 0 on square-free indices and 1 otherwise" is not in the hypothesis class either. Or are you claiming that the behaviour emerges from a constantly changing "leading hypothesis"?
In one of your previous talks, you mentioned that there would be ~2 years during which you can intervene, and after that the future is in the control of AI. Do you think those 2 years have already started? If not, how long until they start? And has your estimate of this “window of opportunity” changed in any way?
This is a paragraph from the description of a future where AI companies try to solve alignment by automating it with LLM agents, did I guess correctly?