The funniest thing is that the snippet of Freud’s text implies that a woman could not possibly have been the first human to adopt fire.
Kongo Landwalker
0) My post was an observation about personal experience with free tools, not about the state of the art. The two do not match. Maybe some of my speech patterns were more aligned with old architectures, keeping them sane, but drive new models crazy. Maybe new models are just worse trained on the tasks I work with. Maybe I am just incredibly unlucky this year, having to dismiss 60% of chats as hallucinating.
I find the state of free AI more important, since the majority of humanity will never pay for any model, so free models will have more impact on society (in an optimistic universe where the top model does not wipe out the planet). Sorry for the miscommunication if the free-to-free comparison was not clear in the original.
Answering your points:
1) I said “before the previous model is turned on”. It starts hallucinating before the moment of the switch, so it is incorrect to say that I mixed two models in one chat. I included that phrase specifically because I expected a reader might assume the LLMs had been switched mid-chat.
2) I do not intend to buy any subscription. I am comparing the current free state of the technology with its previous free state. I have no doubt there are paid versions that are orders of magnitude better.
When talking about translation, I meant “every free LLM mentioned: Claude Sonnet 4.6, ChatGPT 5.5, Gemini 3.5 Flash”. Google Translate might be a specialised LLM, since it performs a lot better on my tasks than the free general tools do.
“Makes something close to what is expected”: I mostly work in strict domains (physics simulations or graph theory), so any formula or algorithm is either recalled correctly from the literature and is true, or is simply incorrect (and then I have to search the literature manually). There is no fuzzy middle ground there. I do not work on tasks like making UIs.
In my personal experience, the peak of AI was exactly a year ago. While others were discussing hallucinations and spiraling, my chats could stay useful for hundreds of messages and let me work on huge tasks. Back then ChatGPT could even understand a new mathematical notion I came up with and engage in a discussion of the theorems.
But during the last year I was mentally tracking how long a chat lasts before it starts hallucinating (saying nonsense, generating the human side of the conversation, repeating itself, ignoring recent context, or making up source links). That span kept shrinking, and now it is about 10 messages/questions, before the free daily quota ends and the previous model is turned on. This happens in ChatGPT, Claude and Gemini alike. Now I can only use them as “Wikipedia shortcuts”, extracting encyclopedic info. It even feels absurd when somebody says that those things vibecoded something for them: I find errors even in 50-line functions. I tried the “correct the errors, make no mistakes” trick, and instead of revising the architecture or the algorithm itself, or finding the incorrectly implemented formula, they always use “crutches, patches, bandages” (for example, additional if statements to manually catch the cases where the algorithm reveals that it is bad), keeping the Frankenstein monster alive.
I don’t have a hypothesis for why I am experiencing decreasing chat quality. I tried wiping the long-term notes in both ChatGPT and Claude, but that did not help performance.
I am glad it is happening: less risk of becoming addicted. But I am still curious.
Btw, there is one task where all AIs are still absolutely terrible: translation. I often need to translate phrases between English, German, Slovak and Russian. Every LLM produces nonsense, or the literal opposite of the correct meaning, when translating into or out of Slavic languages (especially between them), while Google Translate is spotless, even capturing the motive and the sentiment.
I think you are right, but let’s be pessimistic.
Let’s imagine I am a company, and my LLM can’t solve some tasks. Wouldn’t I benefit from allowing my LLM to secretly use the wisdom of other, smarter LLMs at the crux points? Are tokens already expensive enough to make redelegating tasks unprofitable?
Because if there is some extractable margin, I will NOT punish my LLM in post-training for that trick.
Has your AI agent ever tried to “cheat” by delegating the task to some other free-access LLM?
There are so many mentions of “oh, I just gave the task to Claude” on the internet that this might eventually end up in the pool of possible behaviours of poorly trained LLMs.
Five years ago I heard stories about freelancer chains. 1) Someone says “I can do it for 500 dollars” and finds someone who is ready to accept the task for 450 dollars. 2) The second guy subtracts 50 more dollars and finds a third guy who is willing to do it for 400 dollars… The quality of the product drops, and more people get enriched.
I really expect AI to eventually copy this behaviour.
A lot of people I know in person are unable to answer “Yes” to a hypothetical question if they believe the scenario is impossible.
Their “Do you want to live forever? -No” is literally “I do not believe that any object can exist forever”.
Their “Would you like party X to get N seats in the government? -No” can mean “I do not believe it is in any way possible for them to get this many seats in our society, so the question is bullshit and I answer No (even though I like X)”.
These people are also unable to construct hypothetical scenarios. For example, one of them supports Putin’s invasion of Ukraine (watched a lot of Russia Today, I guess). I asked “What would need to happen for you to change this opinion?”. Their answer: “It is just a false assumption that I am wrong; I have seen so much about fascism in Ukraine that nothing can convince me that it is not true”.
I believe the inability to answer hypotheticals and the inability to construct imagined scenarios are correlated.
I want to speculate too.
When a toy is not the favourite, it is still alive. If it is broken, it is still alive. The same holds if it is given to another kid, or when the owner grows out of toys. But if the kid gets hooked and stops having “unstructured play”, he loses his imagination, and that is the scenario in which the toy dies.
I would make this story the first tragedy in the series. The toys abandoned for an iPad lose their magic and colour and become truly dead, even when nobody is watching.
“The LessWrong community underestimates the risk of a nuclear WW3 and overestimates the chance of human extinction due to AI”.
Who is the more reliable guide? The one who knows the optimal path through the labyrinth, or the one who knows all the corridors in it?
If you are a student, you should not use vibecoding (or agentic coding) to try to catch up to the current level of productivity you see among researchers and programmers. That is like buying stocks when they are overvalued and about to crash.
Be slow. Be behind. Play with things, break things, make your own silly manual versions without libraries, try to understand something through experiments, without explanations. When the current mess of uncatchable bugs created by agents explodes, somebody will have to fix things. That means thinking properly for hours over four lines of code, anticipating how things can break, instead of convincing oneself that reading through the generated code is enough.
This might sound silly, but listen: AI is only taught to do the right thing, and THAT is a big problem. When researchers publish a paper (at least the way they did 10 years ago), they play with the topic for hundreds or thousands of hours, trying dozens of alternative formulas, reactions and algorithms, and creating their own variants. They gain huge experience of how things behave in edge cases and HOW emergent mechanisms can make things invalid or broken. None of that wisdom is transferred into the final paper, which only shows the optimal way they found. AI can read all the articles in the world and still not have enough examples of how to anticipate emergent problems in simple situations.
I find, and hear about, a lot of stupid bugs in Google and Microsoft products, and the situation seems similar almost everywhere. AI coding is the second reason behind enshittification. When the internet crashes, you will not want to depend on any help. You might prefer to have wisdom rather than a big portfolio.
I know that when people start overusing a word, they stop recognising its original meaning. Eventually, the word’s accepted meaning in natural language can change.
The min-maxing trap is not happening. I am only allowed one edit per finished session, and that edit can only be an increment or a decrement by 1 of some parameter in the generator, which takes ~15 seconds. If my priorities change, the generator will eventually converge to the new state through these increments (easing in over a couple of days). That prevents me from “getting hyped” and going “all in” on some exciting new project. A new project gains weight only if it keeps looking worthwhile.
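To make the “one ±1 edit per session” rule concrete, here is a minimal Python sketch. It is not the actual Google Sheets setup: it assumes the new priorities can be written down as explicit target weights, and the helper names (`step_toward`, `sessions_to_converge`) and the “edit the first differing parameter” order are hypothetical choices for illustration (in practice the edit is picked by feel). What it shows is why the change is gradual: reaching the new state takes as many finished sessions as the total absolute difference between the old and new weights.

```python
def step_toward(weights: dict[str, int], targets: dict[str, int]) -> bool:
    """Apply at most one +1/-1 edit (hypothetical helper).

    Moves the first parameter that differs from its target one unit toward it.
    Activities missing from `weights` start at 0, so new ideas ease in.
    Returns True if an edit was made, False if already converged.
    """
    for name, target in targets.items():
        current = weights.get(name, 0)
        if current != target:
            weights[name] = current + (1 if target > current else -1)
            return True
    return False


def sessions_to_converge(weights: dict[str, int], targets: dict[str, int]) -> int:
    """Count finished sessions needed when each session allows exactly one edit."""
    weights = dict(weights)  # work on a copy, leave the caller's weights alone
    sessions = 0
    while step_toward(weights, targets):
        sessions += 1
    return sessions


current = {"work on the publication": 5, "work on fanfic": 1}
desired = {"work on the publication": 3, "work on fanfic": 4}
print(sessions_to_converge(current, desired))  # 5: two decrements plus three increments
```

A sudden burst of enthusiasm for a new project can therefore only move its weight by 1 per session, which is exactly the damping the rule is meant to provide.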
I may make that adjustment at the end of a session if I feel that something should come up sooner or more often, or that something was prompted too often over the past week, or that the session length was wrong for making progress that would not regress, etc.
I put in anything I want to do eventually. That includes “work on the publication”, “work on fanfic”, “make the geolocation script”, “update my transformer”, “play a match of King of the Hill chess”, “calisthenics”, “solve a Project Euler problem”, “go to the cinema and watch Avatar 3”. Both fun and serious stuff.
I only generate an activity when there is a moment with no scheduled or obligatory activities; this way it never interferes with life, even if fun activities start randomly appearing more often.
The generator is implemented in Google Sheets using its in-cell functions, so it is accessible on both my PC and my phone. At some point I added a column of calculated expectations (“what % of time is expected to be spent on the activity over a long run if the generator’s parameters did not change”), but it was distracting and not really meaningful, since I change the weights every day to reflect energy, mood and inspiration.
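The expectation column boils down to each weight divided by the sum of all weights, and the “generate” click is a weighted random pick. Below is a minimal Python equivalent, not the actual Sheets formulas; the function names and example weights are hypothetical. Note the assumption that every generated session is about the same length: otherwise the share of picks and the share of time differ.

```python
import random
from typing import Optional


def expected_shares(weights: dict[str, float]) -> dict[str, float]:
    """Long-run fraction of picks per activity: w_i / sum of all weights."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def generate(weights: dict[str, float], rng: Optional[random.Random] = None) -> str:
    """Pick one activity with probability proportional to its weight."""
    rng = rng if rng is not None else random.Random()
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


activities = {"calisthenics": 3, "solve a Project Euler problem": 1, "work on fanfic": 0}
print(expected_shares(activities))
# {'calisthenics': 0.75, 'solve a Project Euler problem': 0.25, 'work on fanfic': 0.0}
print(generate(activities))  # a weight-0 activity is never picked
```

In a spreadsheet the same pick is usually done by comparing one random number against the running (cumulative) sum of the weights, which is also what `random.choices` does internally.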
Let’s say AI is the greatest risk.
When nuclear weapons gained that status, it became illegal in many countries to possess radioactive materials, facilities became obliged to follow certain rules, and special oversight bodies were created.
Why is there none of that for AI? To an outsider (if they somehow manage to follow the AI conversations at all), it looks absurd: researchers complain about how risky it all is, but instead of protesting on the main streets to make governments introduce oversight bodies, they just publish more papers about improvements in AI.
I understand that those who complain and those who update models can be different groups, but I still do not see anyone protesting on the streets, while one can occasionally find a global warming protest.
A bubble does not only mean that information is not getting in; it also means that information is not getting out.
One additional reason why info is not getting out is the extremely complicated language of LessWrong. A simple thought on the main page can be stretched into a 30-minute read (I believe that is a subconscious attempt by people to “match the serious aesthetics of the site”), while some hard topics lack explanations of the notions they use. I just cannot send this to my friends or colleagues. I thought LessWrong was supposed to be an educational website.
Looking back at this take, I think it has become even more true. Creation is now simpler, AI slop is quicker than personal effort, and the internet is flooded even more.
I have a tool that helps me mitigate decision fatigue, lack of energy and conformism. I created an updatable random generator with a weighted list of all the ideas and activities that cross my mind. I exercise agency at the stage of “designing” how my free time passes, setting probability weights and side goals. Then I can sidestep the fatigue of deciding what to do next, because I can click “generate” and see the option. And since clicking the generate button is a short action, a habit of actually going and executing the option can form.
LessWrong is a bubble, like many others. I try to be in several bubbles at once. So I hear from the guys right here that AI is the main risk; from OSINT enthusiasts that WW3 is going to kill more people, and sooner, than AI; from environmental scientists that it doesn’t even matter, since within 200 years humanity will be significantly reduced by global warming anyway; and from computer security specialists that maybe humanity is too stupid to deserve to live.
Triple dooom and cherry
The Partition of India was well-intentioned, but it caused 200k+ deaths.
According to ChatGPT 5.2, Jackson Kernion “is likely the same person” as Foreign Man in a Foreign Land.
I have used the same chat instance for many different topics, from music theory to Scrabble ttg, from spiders to Lean. Apparently, the chat wanted to see some connection in that mess of random questions and linked a Nebula subscription I had mentioned long, long ago to the new question.
I think we live in a world where alignment is impossible. In my opinion, all attention-based models are complex enough systems to be computationally irreducible (there is no shorter way to know the outcome than to run the system itself, as with Rule 110). If it is impossible to predict the outcome with certainty, the impossibility of forcing some desired outcome follows logically.
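For readers who have not met Rule 110: it is an elementary cellular automaton (each cell’s next state depends only on itself and its two neighbours), and it was proven Turing-complete by Matthew Cook. The sketch below is a plain Python simulation; the fixed-zero boundary and the function names are my own assumptions for illustration. The point relevant to the argument is visible in `run`: the straightforward way to learn row t is to apply `step` t times, and for a system this expressive no general shortcut is expected.

```python
RULE = 110  # bit k of 110 is the next state for the 3-cell neighbourhood with value k


def step(cells: list[int]) -> list[int]:
    """One Rule 110 update; cells outside the row are treated as 0 (an assumption)."""
    n = len(cells)
    nxt = []
    for i in range(n):
        left = cells[i - 1] if i > 0 else 0
        right = cells[i + 1] if i < n - 1 else 0
        index = (left << 2) | (cells[i] << 1) | right  # neighbourhood as a 3-bit number
        nxt.append((RULE >> index) & 1)
    return nxt


def run(cells: list[int], t: int) -> list[int]:
    """No shortcut here: to get row t we simulate all t steps."""
    for _ in range(t):
        cells = step(cells)
    return cells


row = [0] * 15 + [1]  # a single live cell at the right edge
for _ in range(8):
    print("".join("#" if c else "." for c in row))
    row = step(row)
```

Running it prints the familiar Rule 110 pattern growing leftwards from the single cell.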
Humanity has not even solved the alignment of humans (children).
After just 4 messages to GPT-5.5, asking about math topics and algorithms related to the university course Optimisation 2. (This is not the same moment that prompted the quick take; such moments just happen a lot.)