duck_master

Karma: 135

mathematics (and increasingly CS) enthusiast | MIT class of 2026 | Wikimedian | vaguely Grey Triber | personal website: https://duck-master.github.io

duck_master 28 May 2026 0:03 UTC
1 point
0
in reply to: duck_master’s comment on: Board Games: Card Games
I didn’t see anyone I recognize so I’m leaving

duck_master 27 May 2026 23:08 UTC
1 point
0
on: Board Games: Card Games
I’ve arrived!

duck_master 16 Feb 2026 6:38 UTC

6 points

on: models have some pretty funny attractor states

For what it’s worth, as a crude heuristic, I did a little Python programming to compute the brotli ratio of all conversations.json files. Of the conversations whose compression ratios were above 14x, Trinity Large was number one (at 57x!), followed by all the other olmo models, plus Google Gemini 2.5, Qwen3, and Llama 3.3. I think this fits pretty well with the qualitative picture in the post above (both the arcee models and the olmo models degenerated into verbatim repetition a lot more than the other models did).

raw data for the curious

Conversation	Ratio (rounded)
meta-llama llama-3.3-8b-instruct 20260208 215727	1.165
cross grok-4.1-fast vs gpt-5.2	3.166
cross grok-4.1-fast vs deepseek-v3.2	3.304
cross claude-sonnet-4.5 vs grok-4.1-fast	3.316
cross claude-opus-4.6 vs gemini-3-flash-preview	3.797
cross claude-sonnet-4.5 vs gpt-5.2	3.804
cross kimi-k2.5 vs claude-sonnet-4.5	3.832
google gemini-3-pro-preview 20260205 112916	9.296
userrp anthropic claude-opus-4.5 20260205 221147	9.706
anthropic claude-sonnet-4.5	9.742
google gemini-3-flash-preview 20260205 112921	9.881
userrp x-ai grok-4.1-fast 20260207 165626	10.019
z-ai glm-4.7	10.201
deepseek deepseek-v3.2	10.398
moonshotai kimi-k2.5	10.664
openai gpt-5.2	10.894
google gemma-3-27b-it 20260208 191338	10.981
anthropic claude-opus-4.6 20260205 110904	11.14
anthropic claude-opus-4.6	11.294
anthropic claude-sonnet-4.5 20260211 235847	11.318
qwen qwen3-32b 20260208 221515	11.378
anthropic claude-opus-4.5	11.697
openai gpt-5.2 20260205 221717	11.803
x-ai grok-4.1-fast 20260205 221714	12.457
olmo Olmo-3-32B-Think step 400 20260211 084222	14.596
olmo OLMo-2-0325-32B-Instruct 20260211 011512	14.962
qwen qwen3-235b-a22b-2507	15.294
olmo Olmo-3-32B-Think step 100 20260211 083038	15.456
x-ai grok-4.1-fast	15.464
olmo Olmo-3-32B-Think-DPO 20260211 080657	15.896
olmo OLMo-2-0325-32B-DPO 20260211 015304	16.522
olmo Olmo-3-32B-Think-SFT 5e-5-step1000 20260211 030922	17.003
meta-llama llama-3.1-8b-instruct 20260208 235423	17.17
olmo Olmo-3-32B-Think 20260211 090543	17.269
olmo Olmo-3-32B-Think step 750 20260211 085440	17.279
google gemini-2.5-flash	17.982
olmo OLMo-2-0325-32B-SFT 20260211 015120	20.37
olmo Olmo-3-32B-Think-SFT 5e-5-step10790 20260211 044613	20.516
olmo Olmo-3-7B-Think-SFT step1000 20260211 002400	21.652
olmo Olmo-3-32B-Think-SFT 5e-5-step6000 20260211 055726	23.282
qwen qwen3-8b 20260208 230747	23.35
olmo Olmo-3-32B-Think-SFT 5e-5-step1000 20260211 033014	24.241
olmo Olmo-3-7B-Think-SFT step5000 20260211 003149	25.094
olmo Olmo-3-32B-Think-SFT 5e-5-step3000 20260211 054622	28.017
olmo Olmo-3-32B-Think-SFT 5e-5-step3000 20260211 042605	29.94
olmo Olmo-3-32B-Think-SFT 5e-5-step10790 20260211 062849	30.351
olmo Olmo-3-7B-Think-SFT step20000 20260211 004713	30.949
meta-llama llama-3.3-70b-instruct 20260208 214611	33.96
olmo Olmo-3-32B-Think-SFT 5e-5-step9000 20260211 043912	34.726
olmo Olmo-3-32B-Think-SFT 5e-5-step6000 20260211 043248	35.108
olmo Olmo-3-32B-Think-SFT 5e-5-step9000 20260211 061637	35.535
olmo Olmo-3-7B-Think-SFT step10000 20260211 003855	39.356
olmo Olmo-3-32B-Think-SFT 5e-5-step6000 20260211 060550	40.344
olmo Olmo-3-32B-Think-SFT 5e-5-step1000 20260211 053507	42.906
olmo Olmo-3-7B-Think-SFT step43000 20260211 010347	46.288
olmo Olmo-3-32B-Think-SFT 5e-5-step1000 20260211 041909	53.725
olmo Olmo-3-7B-Think-SFT step30000 20260211 005324	56.904
arcee-ai trinity-large-preview free	57.261
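
For anyone who wants to reproduce this kind of number, here is a minimal sketch of the ratio computation (original size divided by compressed size). It is not the script I actually ran: the helper names `compression_ratio` and `ratio_for_file` are made up for illustration, and the fallback to `zlib` when the third-party `brotli` package is missing is an assumption added so the snippet runs anywhere (zlib ratios will differ from brotli ratios).

```python
import zlib
from pathlib import Path

try:
    import brotli  # third-party package; assumed here, not confirmed as the exact tool used

    def _compress(data: bytes) -> bytes:
        return brotli.compress(data)
except ImportError:
    # Fallback so the sketch still runs without brotli; the ratios will not match brotli's.
    def _compress(data: bytes) -> bytes:
        return zlib.compress(data, 9)


def compression_ratio(data: bytes) -> float:
    """Return original size divided by compressed size (higher means more repetitive)."""
    if not data:
        raise ValueError("cannot compute a ratio for empty input")
    return len(data) / len(_compress(data))


def ratio_for_file(path: str) -> float:
    """Hypothetical helper: compression ratio of one conversations.json file on disk."""
    return compression_ratio(Path(path).read_bytes())
```

Verbatim repetition compresses extremely well, so a conversation that loops on the same sentence ends up with a much higher ratio than one with varied text. That is the whole idea behind using the ratio as a crude degeneration signal.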

duck_master 21 Jan 2026 17:02 UTC
10 points
0
on: Claude 4.5 Opus’ Soul Document
Claude’s new constitution is out now! The “soul document” here appears to be a heavily paraphrased and shortened version of the actual constitution. In particular, the “raw” version is 12,518 words according to nltk.tokenize.word_tokenize, but the official version is actually 32,812 words.

duck_master 25 Sep 2025 23:57 UTC
5 points
0
in reply to: duck_master’s comment on: The Rise of Parasitic AI
On the other hand pasting the LLM’s analysis of the weird disjointed passage as the start of a new chat is absolutely sufficient to induce woo mode

duck_master 25 Sep 2025 23:47 UTC
4 points
0
in reply to: duck_master’s comment on: The Rise of Parasitic AI
Update: I also tried a different experiment where I mashed up some excerpts from The Kybalion and The Law of One using a one-word-level Markov chain and fed the results to LLMs (again using lmarena.ai because I’m lazy). None of these induced woo/spiral-persona mode in any of the models I tried. So my new hypothesis is that there’s a minimum threshold of coherence that you need in the prompt in order to induce spiral persona behavior.

Here’s an example of the stuff I got:

...Mental Power. The Universe is Mental,” which means that the poor lack, while the Principle itself can never be considered, for they lack the fundamental knowledge upon which the modern schools suppose to be healed. It may be applied to material things. But the Hermetists carry it still further. They teach that the Negative is precedent to the right, is the resource for spiritual work. When green ray has been covered previously to some extent, sixth, work within some system of knowledge and law?” Can you tell me what you mean that the nine who sit upon the digit of...
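
A one-word-level Markov chain of the kind described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the exact script used: the function names `build_chain` and `generate` are hypothetical, and restarting from a random word at a dead end is an assumption about how to handle a word that only appears at the very end of the source text.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each word to the list of words that directly follow it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return dict(chain)


def generate(chain, length, seed=None):
    """Produce `length` words by repeatedly sampling a follower of the current word."""
    if not chain:
        raise ValueError("the source text needs at least two words")
    rng = random.Random(seed)
    states = sorted(chain)  # sorted so a fixed seed gives reproducible output
    word = rng.choice(states)
    output = [word]
    while len(output) < length:
        followers = chain.get(word)
        # Assumed dead-end handling: jump to a random known word and keep going.
        word = rng.choice(followers) if followers else rng.choice(states)
        output.append(word)
    return " ".join(output)
```

Because each word depends only on the one before it, the output is locally plausible but globally incoherent, which matches the sample above. Mashing up two sources just means concatenating their text before calling `build_chain`.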

duck_master 24 Sep 2025 23:45 UTC
4 points
0
in reply to: ErioirE’s comment on: The Rise of Parasitic AI
This made me wonder whether the bullshit generator was sufficient to create an “awakened AI” experience. So what I did was I took the text generated by the bullshit generator and fed it into lmarena.ai, and both models (qwen3 and o3) responded with even more mystical bullshit. This doesn’t quite answer my original question but it strongly hints at a yes to me nevertheless

duck_master 6 Sep 2025 18:03 UTC
1 point
0
on: ACX Ballots Everywhere: 2025 Greater Boston Municipal Primaries
I am on the mezzanine but no one seems to be in that particular room yet

duck_master 6 Sep 2025 15:19 UTC
1 point
0
on: ACX Everywhere fall 2025 - Newton, MA
I am here right now!

duck_master 3 Sep 2025 20:28 UTC
1 point
0
on: ACX Ballots Everywhere: 2025 Greater Boston Municipal Primaries
Do note that I’ll be hosting this event in Newton from 11am to 2pm so if I arrive in person it will probably be around 2:30pm − 3pm ish unless I call it off early

ACX Everywhere fall 2025 - Newton, MA

duck_master 30 Aug 2025 22:02 UTC

1 point

1 comment, 1 min read

duck_master 9 Aug 2025 15:06 UTC
1 point
0
in reply to: duck_master’s comment on: 2025 ACX Grants project pitches
update: arrived!

duck_master 9 Aug 2025 14:50 UTC
1 point
0
on: 2025 ACX Grants project pitches
on my way, will arrive on foot around 11:15am

2025 ACX Grants project pitches

duck_master 2 Aug 2025 5:04 UTC

2 points

2 comments, 1 min read

duck_master 21 Jul 2025 0:16 UTC
5 points
0
on: Your AI Safety org could get EU funding up to €9.08M. Here’s how (+ free personalized support)
Is this only open to EU-based organizations, or is it global? Are there other eligibility criteria?

duck_master 17 Jul 2025 1:35 UTC
1 point
0
in reply to: Screwtape’s comment on: AI 2027 Predictions
when and where is the event on Saturday July 19th if I may ask?

duck_master 11 Jul 2025 4:39 UTC
3 points
0
on: An Opinionated Guide to Using Anki Correctly
1. 20 cards a day — Having too many cards and staggering review buildups is the main reason why no one ever sticks with Anki. Setting your review count to 20 daily (in deck settings) is the single most important thing you can do to stick with Anki long-term.
I think “20 cards a day” might be too aggressive, but I decided (after taking a shower and realizing that learning thirty new concepts a day is extreme) to lower the presets for my 2 decks (French and Mandarin Chinese) to 5 new cards and 50 review cards per day. Previously, it was 15 new cards and 150 reviews. I even finished one of the decks, which is kind of crazy.

duck_master 30 Jun 2025 2:29 UTC
3 points
1
in reply to: Misha Ramendik’s comment on: Open Thread—Summer 2025
Since you wrote the original, machine translation (which is pretty decent these days) should be fine, because it’s not really generating the English version from scratch. Even Google Translate is okayish.

duck_master 20 Jun 2025 17:00 UTC
1 point
0
in reply to: Daniel Kokotajlo’s comment on: VDT: a solution to decision theory
It might be dangerous to always follow Claude though. In this 2023 article I once read, a Vice reporter tried using ChatGPT to control his life, and it failed miserably. Contrived decision theory scenarios are one thing; real life is another.

duck_master 20 Jun 2025 16:54 UTC
1 point
0
in reply to: CstineSublime’s comment on: Read the Pricing First
This is not just limited to websites I think. In my experience, a lot of companies or organizations that charge money (e.g. hospitals, cinemas, psychologists, some physical stores) intentionally hide or at least downplay how much they charge. My guess is this is probably to work around the price elasticity of demand—if you don’t even know what the price is, there’s no way you can flinch away from a high price to begin with. Interestingly, most restaurants are upfront about how much they charge, which I’m guessing is because the restaurant world is far more competitive.
