Bachelor’s in general and applied physics. AI safety / agent foundations researcher wannabe.
I love talking to people, and if you are an alignment researcher we will have at least one topic in common (though I am very interested in talking about topics that are new to me too!), so I encourage you to book a call with me: https://calendly.com/roman-malov27/new-meeting
Email: roman.malov27@gmail.com
GitHub: https://github.com/RomanMalov
TG channels (in Russian): https://t.me/healwithcomedy, https://t.me/ai_safety_digest
Roman Malov
20/12/2025
I’ve read Probabilistic Payor Lemma? and Self-Referential Probabilistic Logic Admits the Payor’s Lemma and thought about the problem for a while. I’m not sure I have enough background to fully understand the problem and the suggested solutions.
The Case Against AI Control Research seems related. TL;DR: the mainline scenario is that the hallucination machine is overconfident about its own alignment solution, it gets implemented without much checking, and then doom.
Doesn’t link to his shortform
19/12/2025
I’ve read about the Probabilistic Löb theorem and tried to understand it.
Daily Research Diary
In the comments to this quick take, I am planning to report on my intellectual journey: what I read, what I learned, what exercises I’ve done, and which projects or research problems I worked on. Thanks to @TristianTrim for suggesting the idea. Feel free to comment with anything you think might be helpful or relevant.
Welcome! The only thing I can think of at the intersection of AI and photography (besides IG filters) is this weird “camera”, which uses AI to turn a little bit of geographical information into images. Do you know of any other interesting intersections?
IIUC, those are just bots that copy early, well-liked comments. So my comment would also be copied by other bots.
They are mostly like “wow, what a great [particular detail in the video]”. Sometimes it’s a joke I thought of.
Not much right now. I usually just write the first thing that comes to mind. But I think I can train myself to write comments with a specific message or intended effect. What I need is to understand which effect might be useful.
I’m not sure, I don’t have statistics on that, and I would assume information like this is kept secret.
I definitely remember authors responding to correctional comments by changing the description, pinning comments, and making clarification videos.
I have a weird ability. Can you help me find a use for it?
If I see a YouTube video pop up in my feed right after it’s published, I can often come up with a comment that gets a lot of likes and ends up near the top of the comment section.[1] It’s actually not that hard to do: the hardest part is being quick enough[2] to get into the first 10-30 comments (which I assume is the average number of comments viewers glance over), but the comment itself might be pretty generic and not that relevant to the video’s content.
Do you know a way I could use that? You can offer advice for achieving convergent instrumental goals, ordinary human goals, and (most importantly) AI x-risk reduction. If you think I’m hyper-online or delusional about this, you can also point that out.
1. ^ I wouldn’t be surprised if it’s actually not that hard and my success is just a consequence of being hyper-online.
2. ^ I also suspect that the YouTube algorithm might have learned about this ability of mine and has now categorized me as a “top commenter,” so it shows me videos earlier than others and uses me to “boost engagement” or something.
3. ^ The post “Butterfly Ideas” seems relevant.
Great job! I wish you good luck in your endeavors.
If the future contains far more people than we have today, and if people are going to have their memory upgraded, and if the information about us on the internet is going to be preserved, then each person alive today is going to be kind of a celebrity.
It’s as if our civilization started with 10 people and they recorded every second of their lives: we would know almost everything about them. People would read their quotes, live by their wisdom, and create cults around them.
“but there’s no similar fundamental reason that cognitive oversight …”
This might not be a similar reason, but there is a fundamental reason, described in Deep Deceptiveness.
I’m not even aware of half of this...
I would suggest inserting links for people like me.
What is the original dictator game?
Without context, this dilemma seems super weird. If someone says “yes,” they are a whiny bitch with whom nobody would cooperate (as they should).
Good job, and I wish you all the luck in your endeavors! I think the journal format could benefit from something like a quantitative assessment of how much you got done compared to what you planned. It would (hopefully) help you calibrate, state clear and achievable goals, and improve readability, and it would also help others calibrate which tasks are harder or easier.
Maybe we shouldn’t start RSI on purpose.
Please, just please, don’t start RSI (recursive self-improvement) on purpose. For years, AI x-risk people have warned that a huge danger comes with AI capable of RSI, and that even its mere existence poses a threat. We were afraid we would accidentally miss the point of no return, and now so many people (not only in major AI companies, but in smaller labs too) are trying to bring that point closer on purpose.
Programs sometimes don’t work as we expect them to, even when we are the ones designing them. How would making the hallucination machine do this job produce something so powerful with working guardrails?
Solomonoff Induction is incredibly powerful. It’s so powerful that it can’t exist in our world (it’s uncomputable). But because of its power, it needs to be handled with care. For it to actually produce accurate hypotheses, you have to expose it to as much evidence as possible, because even the tiniest coincidence in your data (which will happen if you don’t collect the widest dataset possible) will be interpreted as a Deep Rule of the world.
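A minimal sketch of the standard construction, assuming a universal monotone machine $U$: the universal prior weights each string $x$ by all minimal programs whose output starts with $x$, and prediction is just conditioning,

$$
M(x) \;=\; \sum_{p\,:\,U(p)=x*} 2^{-\ell(p)}, \qquad M(1 \mid x) \;=\; \frac{M(x1)}{M(x)}.
$$

The shortest programs consistent with the data dominate the sum, so a spurious regularity in a narrow dataset gets baked into the leading hypothesis (the “Deep Rule” failure mode above); and because the sum ranges over all programs, $M$ is only lower-semicomputable, which is the precise sense in which it can’t exist in our world.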
I think we can go one step further: (with sufficiently smart AIs) every explanation of every topic is now a textbook with exercises.