Rauno Arike

Karma: 621

Hidden Reasoning in LLMs: A Taxonomy

Rauno Arike, RohanS and Shubhorup Biswas

25 Aug 2025 22:43 UTC

66 points

12 comments, 12 min read, LW link

How we spent our first two weeks as an independent AI safety research group

RohanS, Rauno Arike and Shubhorup Biswas

11 Aug 2025 19:32 UTC

28 points

0 comments, 10 min read, LW link

Extract-and-Evaluate Monitoring Can Significantly Enhance CoT Monitor Performance (Research Note)

Rauno Arike, RohanS and Shubhorup Biswas

8 Aug 2025 10:41 UTC

51 points

7 comments, 10 min read, LW link

Aether July 2025 Update

RohanS, Rauno Arike and Shubhorup Biswas

1 Jul 2025 21:08 UTC

24 points

7 comments, 3 min read, LW link

[Question] What faithfulness metrics should general claims about CoT faithfulness be based upon?

Rauno Arike

8 Apr 2025 15:27 UTC

24 points

0 comments, 4 min read, LW link

On Recent Results in LLM Latent Reasoning

Rauno Arike

31 Mar 2025 11:06 UTC

35 points

6 comments, 13 min read, LW link

The Best Lecture Series on Every Subject

Rauno Arike

24 Mar 2025 20:03 UTC

13 points

1 comment, 2 min read, LW link

Rauno’s Shortform

Rauno Arike

15 Nov 2024 12:08 UTC

3 points

34 comments, 1 min read, LW link

A Dialogue on Deceptive Alignment Risks

Rauno Arike

25 Sep 2024 16:10 UTC

11 points

0 comments, 18 min read, LW link

[Interim research report] Evaluating the Goal-Directedness of Language Models

Rauno Arike, Elizabeth Donoway and Marius Hobbhahn

18 Jul 2024 18:19 UTC

40 points

4 comments, 11 min read, LW link

Early Experiments in Reward Model Interpretation Using Sparse Autoencoders

lukemarks, Amirali Abdullah, Rauno Arike, Fazl and nothoughtsheadempty

3 Oct 2023 7:45 UTC

18 points

0 comments, 5 min read, LW link

Exploring the Lottery Ticket Hypothesis

Rauno Arike

25 Apr 2023 20:06 UTC

58 points

3 comments, 11 min read, LW link

[Question] Request for Alignment Research Project Recommendations

Rauno Arike

3 Sep 2022 15:29 UTC

10 points

2 comments, 1 min read, LW link

Countering arguments against working on AI safety

Rauno Arike

20 Jul 2022 18:23 UTC

7 points

2 comments, 7 min read, LW link

Clarifying the confusion around inner alignment

Rauno Arike

13 May 2022 23:05 UTC

31 points

0 comments, 11 min read, LW link