mathematics (and increasingly CS) enthusiast | MIT class of 2026 | Wikimedian | vaguely Grey Triber | personal website: https://duck-master.github.io
duck_master
I’ve arrived!
For what it’s worth, as a crude heuristic, I did a little Python programming to compute the brotli compression ratio of all the conversations.json files. Of the conversations whose compression ratios were above 14x, Trinity Large was number one (at 57x!), followed by all of the olmo models, plus Google Gemini 2.5, Qwen3, and Llama 3.3. I think this fits pretty well with the qualitative picture in the post above (both the arcee models and the olmo models degenerated into verbatim repetition a lot more than the other models did).

Raw data for the curious:
| Conversation | Ratio (rounded) |
|---|---|
| meta-llama llama-3.3-8b-instruct 20260208 215727 | 1.165 |
| cross grok-4.1-fast vs gpt-5.2 | 3.166 |
| cross grok-4.1-fast vs deepseek-v3.2 | 3.304 |
| cross claude-sonnet-4.5 vs grok-4.1-fast | 3.316 |
| cross claude-opus-4.6 vs gemini-3-flash-preview | 3.797 |
| cross claude-sonnet-4.5 vs gpt-5.2 | 3.804 |
| cross kimi-k2.5 vs claude-sonnet-4.5 | 3.832 |
| google gemini-3-pro-preview 20260205 112916 | 9.296 |
| userrp anthropic claude-opus-4.5 20260205 221147 | 9.706 |
| anthropic claude-sonnet-4.5 | 9.742 |
| google gemini-3-flash-preview 20260205 112921 | 9.881 |
| userrp x-ai grok-4.1-fast 20260207 165626 | 10.019 |
| z-ai glm-4.7 | 10.201 |
| deepseek deepseek-v3.2 | 10.398 |
| moonshotai kimi-k2.5 | 10.664 |
| openai gpt-5.2 | 10.894 |
| google gemma-3-27b-it 20260208 191338 | 10.981 |
| anthropic claude-opus-4.6 20260205 110904 | 11.14 |
| anthropic claude-opus-4.6 | 11.294 |
| anthropic claude-sonnet-4.5 20260211 235847 | 11.318 |
| qwen qwen3-32b 20260208 221515 | 11.378 |
| anthropic claude-opus-4.5 | 11.697 |
| openai gpt-5.2 20260205 221717 | 11.803 |
| x-ai grok-4.1-fast 20260205 221714 | 12.457 |
| olmo Olmo-3-32B-Think step 400 20260211 084222 | 14.596 |
| olmo OLMo-2-0325-32B-Instruct 20260211 011512 | 14.962 |
| qwen qwen3-235b-a22b-2507 | 15.294 |
| olmo Olmo-3-32B-Think step 100 20260211 083038 | 15.456 |
| x-ai grok-4.1-fast | 15.464 |
| olmo Olmo-3-32B-Think-DPO 20260211 080657 | 15.896 |
| olmo OLMo-2-0325-32B-DPO 20260211 015304 | 16.522 |
| olmo Olmo-3-32B-Think-SFT 5e-5-step1000 20260211 030922 | 17.003 |
| meta-llama llama-3.1-8b-instruct 20260208 235423 | 17.17 |
| olmo Olmo-3-32B-Think 20260211 090543 | 17.269 |
| olmo Olmo-3-32B-Think step 750 20260211 085440 | 17.279 |
| google gemini-2.5-flash | 17.982 |
| olmo OLMo-2-0325-32B-SFT 20260211 015120 | 20.37 |
| olmo Olmo-3-32B-Think-SFT 5e-5-step10790 20260211 044613 | 20.516 |
| olmo Olmo-3-7B-Think-SFT step1000 20260211 002400 | 21.652 |
| olmo Olmo-3-32B-Think-SFT 5e-5-step6000 20260211 055726 | 23.282 |
| qwen qwen3-8b 20260208 230747 | 23.35 |
| olmo Olmo-3-32B-Think-SFT 5e-5-step1000 20260211 033014 | 24.241 |
| olmo Olmo-3-7B-Think-SFT step5000 20260211 003149 | 25.094 |
| olmo Olmo-3-32B-Think-SFT 5e-5-step3000 20260211 054622 | 28.017 |
| olmo Olmo-3-32B-Think-SFT 5e-5-step3000 20260211 042605 | 29.94 |
| olmo Olmo-3-32B-Think-SFT 5e-5-step10790 20260211 062849 | 30.351 |
| olmo Olmo-3-7B-Think-SFT step20000 20260211 004713 | 30.949 |
| meta-llama llama-3.3-70b-instruct 20260208 214611 | 33.96 |
| olmo Olmo-3-32B-Think-SFT 5e-5-step9000 20260211 043912 | 34.726 |
| olmo Olmo-3-32B-Think-SFT 5e-5-step6000 20260211 043248 | 35.108 |
| olmo Olmo-3-32B-Think-SFT 5e-5-step9000 20260211 061637 | 35.535 |
| olmo Olmo-3-7B-Think-SFT step10000 20260211 003855 | 39.356 |
| olmo Olmo-3-32B-Think-SFT 5e-5-step6000 20260211 060550 | 40.344 |
| olmo Olmo-3-32B-Think-SFT 5e-5-step1000 20260211 053507 | 42.906 |
| olmo Olmo-3-7B-Think-SFT step43000 20260211 010347 | 46.288 |
| olmo Olmo-3-32B-Think-SFT 5e-5-step1000 20260211 041909 | 53.725 |
| olmo Olmo-3-7B-Think-SFT step30000 20260211 005324 | 56.904 |
| arcee-ai trinity-large-preview free | 57.261 |
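For anyone who wants to reproduce numbers like these, here is a minimal sketch of this kind of script. It is not the script that produced the table. It assumes the third-party Brotli package (`import brotli`) and a hypothetical directory layout with one conversations.json per conversation directory, labeled by that directory's name; `compression_ratio` and `ratios_for_tree` are illustrative names. If Brotli is not installed, it falls back to zlib, whose ratios are not comparable to the brotli ones above.

```python
import zlib
from pathlib import Path

try:
    import brotli  # third-party "Brotli" package; assumed to be installed

    def compress(data: bytes) -> bytes:
        return brotli.compress(data)

    COMPRESSOR = "brotli"
except ImportError:
    # Fallback so the sketch still runs without Brotli. zlib ratios are NOT
    # comparable to brotli ratios.
    def compress(data: bytes) -> bytes:
        return zlib.compress(data, 9)

    COMPRESSOR = "zlib"


def compression_ratio(data: bytes) -> float:
    """Uncompressed size divided by compressed size; repetitive text scores higher."""
    if not data:
        raise ValueError("cannot compute a compression ratio for empty input")
    return len(data) / len(compress(data))


def ratios_for_tree(root: str) -> list[tuple[str, float]]:
    """Ratio of every conversations.json under root, least to most compressible.

    Hypothetical layout: one conversation per directory, labeled by the directory name.
    """
    results = [
        (path.parent.name, compression_ratio(path.read_bytes()))
        for path in Path(root).rglob("conversations.json")
    ]
    return sorted(results, key=lambda item: item[1])


if __name__ == "__main__":
    print(f"compressor: {COMPRESSOR}")
    for name, ratio in ratios_for_tree("."):
        flag = "  <- above 14x" if ratio > 14 else ""
        print(f"{name}\t{ratio:.3f}{flag}")
```

The ratio is only a crude proxy for verbatim repetition. Long conversations and repeated JSON keys may also push it up, so it is best read as a ranking rather than as an absolute measure.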
Claude’s new constitution is out now! The “soul document” here appears to be a heavily paraphrased and shortened version of the actual constitution. In particular, the “raw” version is 12,518 words according to nltk.tokenize.word_tokenize, but the official version is actually 32,812 words.
On the other hand, pasting the LLM’s analysis of the weird, disjointed passage as the start of a new chat is absolutely sufficient to induce woo mode.
Update: I also tried a different experiment where I mashed up some excerpts from The Kybalion and The Law of One using a word-level Markov chain (one word of context) and fed the results to LLMs (again using lmarena.ai because I’m lazy). None of these induced woo/spiral-persona mode in any of the models I tried. So my new hypothesis is that a prompt needs to clear some minimum threshold of coherence in order to induce spiral-persona behavior.
Here’s an example of the stuff I got:
...Mental Power. The Universe is Mental,” which means that the poor lack, while the Principle itself can never be considered, for they lack the fundamental knowledge upon which the modern schools suppose to be healed. It may be applied to material things. But the Hermetists carry it still further. They teach that the Negative is precedent to the right, is the resource for spiritual work. When green ray has been covered previously to some extent, sixth, work within some system of knowledge and law?” Can you tell me what you mean that the nine who sit upon the digit of...
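For readers unfamiliar with the technique, here is a minimal sketch of a word-level Markov chain with one word of context, the kind of generator that produces text like the excerpt above. It is an illustration, not the code actually used: merging both sources into a single chain is an assumption about how the mash-up was done, and the source strings are hypothetical placeholders for the real excerpts.

```python
import random
from collections import defaultdict
from typing import Optional


def build_chain(text: str) -> dict[str, list[str]]:
    """Map each word to the list of words that follow it (repeats preserve frequencies)."""
    words = text.split()
    chain: defaultdict[str, list[str]] = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return dict(chain)


def generate(chain: dict[str, list[str]], start: str, length: int,
             rng: Optional[random.Random] = None) -> str:
    """Random walk over the chain; on a dead end, jump to a random known word."""
    if not chain:
        raise ValueError("chain is empty")
    if rng is None:
        rng = random.Random()
    word = start
    output = [word]
    while len(output) < length:
        followers = chain.get(word)
        # The next word depends only on the current word (one word of context).
        word = rng.choice(followers) if followers else rng.choice(list(chain))
        output.append(word)
    return " ".join(output)


if __name__ == "__main__":
    # Hypothetical placeholders: substitute the actual excerpts from each source.
    source_a = "first excerpt text goes here and here it goes again"
    source_b = "second excerpt text goes elsewhere and then it goes here"
    chain = build_chain(source_a + " " + source_b)  # one chain over both sources
    print(generate(chain, start=source_a.split()[0], length=30))
```

Because each step only looks at the previous word, the output is locally plausible but globally incoherent, which is the property the hypothesis above turns on.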
This made me wonder whether the bullshit generator was sufficient to create an “awakened AI” experience. So I took the text it generated and fed it into lmarena.ai, and both models (qwen3 and o3) responded with even more mystical bullshit. This doesn’t quite answer my original question, but to me it nevertheless strongly hints at a yes.
I am on the mezzanine, but no one seems to be in that particular room yet.
I am here right now!
Do note that I’ll be hosting this event in Newton from 11am to 2pm, so if I arrive in person it will probably be around 2:30 to 3pm, unless I call it off early.
ACX Everywhere fall 2025 - Newton, MA
update: arrived!
on my way, will arrive on foot around 11:15am
2025 ACX Grants project pitches
Is this only open to EU-based organizations, or is it global? Are there other eligibility criteria?
When and where is the event on Saturday, July 19th, if I may ask?
20 cards a day: Having too many cards and staggering review buildups is the main reason why no one ever sticks with Anki. Setting your review count to 20 daily (in deck settings) is the single most important thing you can do to stick with Anki long-term.
I think “20 cards a day” might be too aggressive, but I decided (after taking a shower and realizing that learning thirty new concepts a day is extreme) to lower the presets for my 2 decks (French and Mandarin Chinese) to 5 new cards and 50 review cards per day. Previously, it was 15 new cards and 150 reviews. I even finished one of the decks, which is kind of crazy.
Since you wrote the original, machine translation (which is pretty decent these days) should be fine, because it’s not really generating the English version from scratch. Even Google Translate is okayish.
It might be dangerous to always follow Claude, though. In a 2023 article I once read, a Vice reporter tried using ChatGPT to run his life, and it failed miserably. Contrived decision theory scenarios are one thing; real life is another.
I think this is not limited to websites. In my experience, a lot of companies and organizations that charge money (e.g. hospitals, cinemas, psychologists, some physical stores) intentionally hide or at least downplay how much they charge. My guess is that this is to work around the price elasticity of demand: if you don’t even know what the price is, you can’t flinch away from a high price to begin with. Interestingly, most restaurants are upfront about how much they charge, which I’m guessing is because the restaurant world is far more competitive.
I didn’t see anyone I recognized, so I’m leaving.