Roman Malov

Karma: 235

Bachelor in general and applied physics. AI safety/Agent foundations researcher wannabe.

I love talking to people, and if you are an alignment researcher we will have at least one common topic (but I am very interested in talking about unknown to me topics too!), so I encourage you to book a call with me: https://calendly.com/roman-malov27/new-meeting

Email: roman.malov27@gmail.com
GitHub: https://github.com/RomanMalov
TG channels (in Russian): https://t.me/healwithcomedy, https://t.me/ai_safety_digest

Roman Malov 22 Nov 2025 22:43 UTC
1 point
0
in reply to: Luiza’s comment on: [unknown]
Probably far earlier than humans would be able to replace neurons, if AGI+ is computer-executable code. We do not know the Turing machine that is able to execute us, AGI+ would know it for itself.

Roman Malov 22 Nov 2025 8:44 UTC
1 point
0
in reply to: deadguy’s comment on: deadguy’s Shortform
Are bacteria agents? If yes, washing hands is kinda evil.

Roman Malov 19 Nov 2025 15:15 UTC
1 point
2
on: Continuity
This was great! I think you should publish more fiction.

Roman Malov 19 Nov 2025 1:20 UTC
1 point
0
on: Teleosemantics & Swampman
the swamp-man’s thoughts are optimized by the thought-experimenter
This resembles the old p-zombies sequence.

Roman Malov 19 Nov 2025 0:30 UTC
1 point
0
on: Consciousness as a Distributed Ponzi Scheme
It lacks an adequately sophisticated self-model.
I’m not sure that having a self-model is such an important property of consciousness. I can imagine forgetting every fact about myself and being unable to tell what experiences I’m currently having, but still feeling them.

Rethinking everything

Roman Malov16 Nov 2025 23:43 UTC

3 points

0 comments3 min readLW link

Roman Malov 16 Nov 2025 22:47 UTC
1 point
0
on: You Should Get a Reusable Mask
I have a biohazard suit under my bed. IIRC, I got it a few years ago during one of the lockdowns, after hearing some news about “a flash of a new respiratory virus that was deadlier than COVID” (and now I can’t even remember its name).

Roman Malov 15 Nov 2025 21:57 UTC
1 point
0
on: Punctuation & Quotation Conventions
I’m also constantly struggling with using quotes inside quotes, especially when the outer quote ends in the same place as the inner quote. Usually, I just use different quotes (“double” vs. ‘single’ vs. «angle»), but when there are multiple quotes in the same place, it looks ugly no matter what I do.

Roman Malov 15 Nov 2025 21:42 UTC
2 points
0
on: Punctuation & Quotation Conventions
A “tomato” is a red, savory fruit.
If it were “the word ‘tomato’ refers to a red, savory fruit”, then it would be the perfect case of map/territory use of quotes.

Roman Malov 15 Nov 2025 0:33 UTC
1 point
0
on: Turing-Complete vs Turing-Universal
How do we rule out using this computation “illegitimately” (sneaking the computational work the so-called Turing-complete formalism was supposed to do into the translation step, as with the argument for Turing-completeness of digits of $π$ )?
I suggest that the framework used for translation itself should not be Turing-complete (which, of course, creates a self-referential definition, but we are in a less specified territory anyway; using this definition, we can at least form clusters a bit better).

Roman Malov 15 Nov 2025 0:06 UTC
4 points
0
on: Roman Malov’s Shortform
Sending information is equivalent to storing information if you consider Galilean relativity (any experiment performed in a frame of reference moving at a constant speed is equivalent to the same experiment in a static frame of reference).

Roman Malov 14 Nov 2025 23:27 UTC
1 point
0
on: Myopia Mythology
Another good reason for this is the subsystem alignment problem. Suppose you are trying to answer a question, but you’re not quite sure how. You can run some extra computations to help you. What computations do you run? Well, considering the myopic/non-myopic spectrum, you either want to run an extremely non-myopic computation, or an extremely myopic one (because those are Good, and anything else runs the risk of Evil).
In contrast (or, rather, in addition) there could be myopic supersystems constructed from non-myopic agents. Human Science is one example—scientists want lots of different things: money, power, status, respect, (I’m excluding curiosity for the sake of the argument) and are making plans to achieve them, but the system of incentives is set up in a way such that a result is an accurate reflection of reality (or at least that’s the idea, real science is not that perfect).

Roman Malov 14 Nov 2025 15:19 UTC
1 point
0
on: The Credit Assignment Problem
I have only a surface-level understanding of this topic, but active inference (one of the theories of intelligent agency) views brains (and agents) as prediction-error minimizers, and actions as a form of affecting the world in such a way that they minimize some extremely strongly held prediction (so strongly that it is easier to change the world to make the prediction error smaller).
My understanding mostly comes from this post by Scott Alexander:
My poor, fragile, little cognitive engines! These, then, will be the twin imperatives of your life: surprisal minimization and active inference. If your brains are still too small to process such esoteric terms, there are others available. Your father’s ancestors called them Torah and tikkun olam; your mother’s ancestors called them Truth and Beauty; your current social sphere calls them Rationality and Effective Altruism. You will learn other names, too: no perspective can exhaust their infinite complexity. Whatever you call them, your lives will be spent in their service, pursuing them even unto that far-off and maybe-mythical point where they blur into One.
Seems resonant with what you write in this sequence.

Roman Malov 13 Nov 2025 14:56 UTC
10 points
0
in reply to: Roman Malov’s comment on: Geometric UDT
And also, if we have two hypotheses, $H_{1}$ and $H_{2}$ , and policy $π$ has a much lower expected value compared to BATNA, such that both terms in the product are negative, then the total product is positive (and large), and argmax is going to choose this policy (which is strictly worse than BATNA).
But I guess both of those issues can be easily assumed away.

Roman Malov 13 Nov 2025 14:34 UTC
1 point
0
on: Geometric UDT
Another reasonable option is to adjust the weight of each hypothesis based on its probability:
${argmax}_{π} Π_{i \in I} [E (u | H_{i}, π) - E (u | H_{i}, π_{b})]^{P (H_{i})}$
I’m not sure it’s the best way to attribute the weight; if expected values are less than 1, then lower-probability hypotheses get more weight.

Roman Malov 12 Nov 2025 0:02 UTC
1 point
0
on: Weak-To-Strong Generalization
such an assumption is not strong enough to guarantee that $P_{s}$ has a 50-50 coinflip hypothesis.
What do you mean by 50-50 hypothesis? Is it $E_{1 / 2}$ such that $\forall n : P (T_{h_{n}} | E_{1 / 2}) = P (T_{t_{n}} | E_{1 / 2}) = 1 / 2$ ? If that’s the case, it doesn’t seem fair to ask from a student to have such hypothesis: the task is to learn to imitate teacher, and teacher doesn’t actually perform behavior like HTHHT, it can be either perform either HHHHH or TTTTTT.

Roman Malov 8 Nov 2025 0:09 UTC
3 points
0
in reply to: J Bostock’s comment on: Jemist’s Shortform
I’m just going from pure word vibes here, but I’ve read somewhere (to be precise, here) about Todorov’s duality between prediction and control: https://roboti.us/lab/papers/TodorovCDC08.pdf

Roman Malov 7 Nov 2025 23:38 UTC
1 point
0
in reply to: Lukas_Gloor’s comment on: Why humans don’t learn to not recognize danger?
And people who are scared of spiders often avoid looking at spiders (or even things that resemble them; consider the effectiveness of pranks with fake spiders).

Roman Malov 7 Nov 2025 23:28 UTC
1 point
0
in reply to: Rachel Shu’s comment on: Rachel Shu’s Shortform
I’m not sure, but this looks more like learned cooperative policy rather than two entities having models of each other and getting to conclusion about each other’s cooperation.

Roman Malov 7 Nov 2025 23:00 UTC
1 point
0
on: Roman Malov’s Shortform
I just resolved my confusion about CoT monitoring.
My previous confusion: People say that CoT is progress in interpretability, that we now have a window into the model’s thoughts. But why? LLMs are still just as black-boxy as they were before; we still don’t know what happens at the token level, and there’s no reason to think we understand it better just because intermediate results can be viewed as human language.
Deconfusion: Yes, LLMs are still black boxes, but CoT is a step toward interpretability because it improves capabilities without making the black box bigger. In an alternate universe, we could just have even bigger, even messier LLMs (and I assume interpretability gets harder with size: after all, some small transformers have been interpreted), and observing the progress of CoT reasoning models is an update away from this universe, which was the (subjective) default path before this update.

Roman Malov

Re­think­ing everything

Rethinking everything