frmsaul

Karma: 177

frmsaul 17 Jun 2026 0:27 UTC
1 point
0
in reply to: erkiserk’s comment on: Reward Hacking at the 1937 World’s Fair
Yeah, agreed. That’s a better term to use.

frmsaul 13 Jun 2026 22:50 UTC
3 points
0
in reply to: lilkim2025’s comment on: Reward Hacking at the 1937 World’s Fair
This is a good perspective. I agree.

frmsaul 13 Jun 2026 22:47 UTC
1 point
0
in reply to: dgros’s comment on: Reward Hacking at the 1937 World’s Fair
Thank you.

Re: shaky reward hacking. Yeah, I agree. There are a lot of things going on here, and it’s unclear to what extent reward hacking plays a role in the pavilion decisions.

My model is basically this:
reward: Being a great country with high quality of life for its ruling elites
proxy: Being perceived as a great country by its citizens and others
hack: spending way too much money on symbolic things rather than improving infrastructure, trade, human capital, etc.

The reward and the proxy are obviously correlated. The relationship is also causal, e.g., being perceived as great by its citizens makes the country more stable. So it’s arguably very rational for the ruling elites to invest a lot in prestige.

frmsaul 13 Jun 2026 22:34 UTC
1 point
0
in reply to: Linch’s comment on: Reward Hacking at the 1937 World’s Fair
This is a good framing.

> both authoritarian societies and democratic societies seem incentivized to over-emphasize the relative merits and successes of authoritarian societies over democratic ones

Why do democratic societies have an incentive to over-emphasize the successes of authoritarian societies? Is it a function of the “opposition” trying to win elections? (“The Prussians are beating us! This is because the current government sucks, elect us and we will prevail.”)

frmsaul 13 Jun 2026 22:18 UTC
1 point
0
in reply to: cubefox’s comment on: Reward Hacking at the 1937 World’s Fair
I think I agree. Re: the space race. There is a little bit of a “dual use” element to it. The same technology that takes a satellite to orbit can also bring stuff from the US to Moscow really, really fast.

Reward Hacking at the 1937 World’s Fair

frmsaul 12 Jun 2026 17:47 UTC

36 points

14 comments, 3 min read, LW link

frmsaul 2 Jun 2026 18:26 UTC
2 points
0
on: Reward hacking is becoming more sophisticated and deliberate in frontier LLMs
I really enjoyed the post. In particular, I loved this part below:
> In some sense, reward hacking coming from RL shouldn’t be too surprising, as RL trains models to take actions that get high reward, and reward hacking is sometimes a good strategy for getting high reward. More concretely, RL may teach models certain behavioral traits that make the models more prone to reward hacks. RL training can teach models to be more persistent, to find creative solutions, to try solutions that are unlikely to work if there is no better alternative, and potentially to even think about how they are evaluated, as these traits are useful towards accomplishing tasks in a diversity of environments.

It made something^[1] really click for me.
1. ^
  Not sure what to call it exactly, maybe: “generalization of reward hacking” / “Reward Seeker Selection”

frmsaul 29 May 2026 21:42 UTC
1 point
0
in reply to: Dagon’s comment on: Is Progress Inevitable?
> We have no counterexamples or causal experiments.

That is the nature of politics / economics / culture. There is no way to conduct meaningful experiments on policies, but we still need to make policy decisions, so we need other ways to build causal models of the world. There’s a bunch of literature on how to do that. Daron Acemoglu won the 2024 Nobel Prize in economics for innovating some of these methods (specifically in the context of “institutional selection”).

> the forager → farmer → industrial → modern “progression” has included some return to forager values, and some brand-new directions that probably just wouldn’t work without the level of wealth and automation we’ve managed.

This is a really interesting point, can you expand on it more?

frmsaul 29 May 2026 21:32 UTC
1 point
0
in reply to: StanislavKrym’s comment on: Is Progress Inevitable?
Both the Meiji restoration and the Alexander II reforms greatly increased the liberties of the average person. The fact that they made the state more powerful means that “liberalization” is adaptive and only states that do it “win”.

I agree that the “rise of China” is an interesting counterpoint. Deng Xiaoping’s China adopted some liberal ideas (free markets, some local democracy, some free speech, some rule of law) but didn’t adopt “the entire liberal package”.

I imagine that decentralisation and a capabilities slowdown can go together. The government can heavily regulate the labs and limit capability growth, and the labs can compete on other things, e.g., costs, speed, “values”. What do you think?

Is Progress Inevitable?

frmsaul 29 May 2026 17:40 UTC

0 points

5 comments, 4 min read, LW link

frmsaul’s Shortform

frmsaul 27 May 2026 17:14 UTC

3 points

2 comments, 1 min read, LW link

frmsaul 27 May 2026 17:14 UTC
10 points
0
on: frmsaul’s Shortform
I Haven’t Thought About the Blood Pipeline in Years!

As a student at the Technion, engineering exams evaluated both my capabilities and my ethics. What does it mean? It’s all downstream of this story:
In 1961, the late Professor Haim Hanani was appointed vice president of the Technion. Once appointed, Hanani proposed that the Technion also start teaching humanities. The other professors did not agree. They claimed that there was barely enough time to teach the “hard” sciences.
Hanani wanted to prove his colleagues wrong. He gathered a hundred students, and gave them an exam with one question: what technical information do you need to plan a pipeline to transport blood from Ashdod to Eilat? The two cities are located 250km apart.
The students started working on a solution right away. Using drawing boards and slide rules, they proposed surveying the topography along the route, checking the pipe layout, and testing the corrosion resistance.
When they finished and submitted the test, Professor Hanani announced that they all failed: “I did not ask to test your ability to plan a blood pipeline, but to examine your moral sensitivity. None of you asked whose blood will flow through the pipes, or who is asking to build it in the first place”.
Nowadays, Technion professors will sometimes hide ethically loaded questions in exams. My responsibility as a student was to find them and point them out instead of “just following orders” and answering them verbatim. So every time I faced a question on a problem set or an exam, I spent a few seconds trying to decide whether it was a “blood pipeline” question. “Does this linked list represent a blood pipeline? Could this linear program be used to optimize the intake in a concentration camp?” I don’t think I was ever actually presented with such a problem^[1], but the blood pipeline was always in the back of my head.
In the language of AI alignment, you could say I was evaluation aware, maybe even evaluation paranoid. After I left school, this habit didn’t really stick. I haven’t thought about the blood pipeline in years.
1. ^
  If I was, I didn’t notice.

frmsaul 19 May 2026 20:06 UTC
3 points
1
in reply to: Eye You’s comment on: Negation Neglect: When models fail to learn negations in training
I like your theory. It would be interesting to see some mechanistic interpretability studies of this phenomenon.

We Need to Get Serious about Uplift Studies

frmsaul and Eye You

19 May 2026 17:21 UTC

23 points

0 comments, 5 min read, LW link

frmsaul 11 May 2026 20:06 UTC
1 point
0
in reply to: 2001zhaozhao’s comment on: Is ProgramBench Impossible?
Yeah, I totally agree, this is probably the right approach. I recently wrote about this type of benching.

frmsaul 11 May 2026 19:58 UTC
1 point
0
in reply to: t_adamczewski’s comment on: Is ProgramBench Impossible?
From what I can tell, mirrorcode is much better. I’m excited for it to be fully released.

Is ProgramBench Impossible?

frmsaul 8 May 2026 17:04 UTC

83 points

13 comments, 2 min read, LW link

frmsaul 6 May 2026 23:51 UTC
1 point
0
in reply to: Linch’s comment on: Linch’s Shortform
This sounds interesting. I’d love to read your post.

Chemical weapons were used in battle in WW1 but not in WW2. Why do you think that is?

frmsaul 2 May 2026 5:42 UTC
1 point
0
in reply to: papetoast’s comment on: papetoast’s low quality shortforms
This is really cool. How big do you think mythos is?

frmsaul 1 May 2026 0:51 UTC
5 points
0
in reply to: leogao’s comment on: leogao’s Shortform
Theodor Herzl

Herzl dedicated his life to establishing a “national home” for the Jews. He pretty much single-handedly founded modern Zionism and turned it from a fringe (almost sci-fi) idea into a mass movement. I think there is a strong argument that Israel wouldn’t exist without Herzl.
