Year 4 Computer Science student
find me anywhere in linktr.ee/papetoast
Ars Technica: Elon Musk’s 7 biggest stumbles on the stand at OpenAI trial #linkpost
Most notably:
OpenAI’s lawyer managed to get him to make several concessions over his own lawyer’s objections.
He also lost a fight to keep xAI’s safety record off the table, calling into question his reputation as a supposed AI savior defending OpenAI’s mission.
He repeatedly appeared dishonest, as OpenAI’s lawyer showed documents contradicting his testimony.
He appeared disingenuous when confronted with calling OpenAI’s safety team “jackasses.”
He appeared disingenuous again when admitting that he didn’t know what “safety cards” are, even though his own AI firm issues them.
Perhaps most embarrassing, he testified that he never loses his temper before raising his voice at OpenAI’s lawyer.
Finally, his lawyers failed to keep his ties to Donald Trump off the record, with the judge agreeing to hear discussions that might further discredit Musk’s testimony.
Warning: ChatGPT (5.5 Thinking) thinks this is “more dramatic and prosecutorial than the body strictly supports.”
Update: I feel absolutely 0 subjective difference, which is probably expected? I couldn’t be bothered to do a proper self-experiment.
My process is 5g glycine in the morning (almost always) and 5g ~1h before sleep (sometimes, I forget ~50% of the time)
Estimates of the total parameter size of frontier models, using 1700 obscure factual questions and a ~exponential regression on 89 open-parameter models (the paper’s actual title is mostly clickbait) (via AI Dance on Twitter)
Trimmed; the full version is under section 6.3. I don’t have enough experience to have an intuition about how accurate these estimates are (a rough sketch of the regression setup follows the table).
| Model | Accuracy | Est. Size | 90% PI |
|---|---|---|---|
| GPT-5.5 | 71.9% | ~9.7T | [3.2–28.7T] |
| Claude Opus 4.6 | 68.0% | ~5.3T | [1.8–15.6T] |
| Claude Opus 4.7 | 66.4% | ~4.0T | [1.4–12.0T] |
| GPT-5.4 Pro | 62.5% | ~2.2T | [736B–6.5T] |
| Claude Sonnet 4.6 | 60.9% | ~1.7T | [579B–5.1T] |
| Claude Haiku 4.5 | 39.9% | ~65B | [22B–194B] |
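To make the regression concrete, here is a minimal sketch of how I understand the estimate being produced. This is my reconstruction, not the paper’s code: I assume an ordinary least squares fit of log10(parameter count) against accuracy over the open models, with the 90% interval being a standard regression prediction interval; the function name and the placeholder numbers in the commented usage are made up.

```python
import numpy as np
from scipy import stats

def estimate_params(acc_open, log10_params_open, acc_new, level=0.90):
    """Fit log10(params) ~ a + b*acc on models with known sizes, then return a
    point estimate and a prediction interval for a model with accuracy acc_new."""
    x = np.asarray(acc_open, dtype=float)
    y = np.asarray(log10_params_open, dtype=float)
    n = len(x)
    b, a = np.polyfit(x, y, 1)                    # slope, intercept
    resid = y - (a + b * x)
    s = np.sqrt(np.sum(resid ** 2) / (n - 2))     # residual standard error
    sxx = np.sum((x - x.mean()) ** 2)
    se = s * np.sqrt(1 + 1 / n + (acc_new - x.mean()) ** 2 / sxx)
    t = stats.t.ppf(0.5 + level / 2, df=n - 2)    # two-sided t quantile
    mid = a + b * acc_new
    return 10 ** mid, (10 ** (mid - t * se), 10 ** (mid + t * se))

# Usage shape only -- the numbers below are placeholders, not the paper's data:
# est, (lo, hi) = estimate_params(acc_open=[0.35, 0.48, 0.61, 0.66],
#                                 log10_params_open=[10.8, 11.5, 12.0, 12.3],
#                                 acc_new=0.719)
```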
The r/ChangeMyView result isn’t directly available in the Reddit post anymore (the Google Drive link in “The researchers provided us a link to the first draft of the results.” is dead). The Internet Archive only has the first page; Google shows a PDF that is likely the same paper.
Somewhat Related: The Fight For Slow And Boring Research
Quoting myself. The article is mainly about how the US federal funding cuts for science may push universities toward producing more legible research, because other funders value clear communication (so I don’t actually think reading the whole thing is helpful for this conversation topic). But there is some relevance:
The choice is not, and has never been, between purity and PR. It is between treating legibility as part of the shared infrastructure that keeps basic and long-term research alive, and leaving it to whoever already has the resources to define how the rest of the system looks from the outside.
GPT-5.2 is dropping before its knowledge cutoff.
The decay is probably because there is less training data about recent deaths, and because the pre-training may have started before the knowledge cutoff.
Older models having better rote memorization on slightly obscure facts isn’t that surprising imo. It is not something that has a lot of optimization pressure.
Having multiple variables mixed in doesn’t seem like a big issue for detecting ancestry. False positives will still be highly unlikely: different pretrains will probably have different “forgetting curves”.
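To illustrate what I mean by comparing forgetting curves, here is a toy sketch. It is entirely mine; the data format, the monthly bucketing, and the use of Pearson correlation as the similarity measure are all assumptions, not anything from the thread.

```python
import numpy as np

def monthly_accuracy(results, months):
    """results: dict mapping month -> list of 0/1 flags for whether the model
    recalled each death in that month; returns one accuracy value per month."""
    return np.array([np.mean(results[m]) if results.get(m) else np.nan
                     for m in months])

def curve_similarity(results_a, results_b, months):
    """Compare the shapes of two models' recall-by-month curves.
    A high correlation would hint at a shared pretrain; unrelated pretrains
    should have different forgetting curves, keeping false positives unlikely."""
    a = monthly_accuracy(results_a, months)
    b = monthly_accuracy(results_b, months)
    mask = ~np.isnan(a) & ~np.isnan(b)
    return np.corrcoef(a[mask], b[mask])[0, 1]
```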
Got something very similar
I use ManicTime (paid) to track my computer usage, and I additionally keep a 7-day log of screenshots at 1-minute intervals, which makes it very easy to see what I was doing a couple of days ago (a bare-bones sketch of the screenshot-logging part follows).
Screenshot
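Rough sketch of the “screenshot every minute, keep 7 days” part. This is not my actual setup (I use ManicTime plus a separate tool); the folder name and the use of Pillow’s ImageGrab here are just for illustration.

```python
import time
from datetime import datetime, timedelta
from pathlib import Path
from PIL import ImageGrab  # pip install pillow

LOG_DIR = Path("screenlog")
LOG_DIR.mkdir(exist_ok=True)

while True:
    # One screenshot per minute, named by timestamp.
    name = datetime.now().strftime("%Y%m%d-%H%M%S") + ".png"
    ImageGrab.grab().save(LOG_DIR / name)
    # Drop anything older than 7 days.
    cutoff = datetime.now() - timedelta(days=7)
    for f in LOG_DIR.glob("*.png"):
        if datetime.fromtimestamp(f.stat().st_mtime) < cutoff:
            f.unlink()
    time.sleep(60)
```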

afro88 on Hacker News predicting what the job of future Junior SWEs will be like
This has happened in other industries before. Drafting for example when CAD arrived. Entry level wasn’t “can draw, willing to learn” anymore, but demanded high domain understanding. So the pathway became compressed learning through study, and field exposure.
Study of senior drafter “red lines”: what and why they changed the initial drawing, RFI response etc. Reverse engineering good work. Failed design studies etc.
SWE equivalents: PRs, code review, studying high quality codebases (guess what: LLMs are amazing at helping here), pair programming (learning why what the LLM did was wrong, how to improve it, etc), customer support, debugging prod incidents, studying post mortems etc
We don’t hire juniors and throw them boilerplate and tiny bugs while expecting them to learn along the way ad hoc through some pair programming and the occasional deep end. We give them specific tasks and studies that develop their domain understanding and taste, actively support and mentor them, and expect them to drive some LLMs on the side to solve simple issues that still need human eyes on it.
Very cool graph. Is the script you used publicly available?
Would be nice if you could provide a before/after table here. I want to update on the results, but I won’t bother trying to navigate those pages.
Pretty sure this idea originated from @Daniel Paleka here! Giving him some credit with this comment.
It could be made customizable, or reduced to n=1; but my intention is that you actually show the same comment twice in total.
Agree. Hiding metrics by default would be too annoying. I would be happy to opt in to something like “At the top of the comment section, randomly show me n=3 comments with hidden stats” though. Hiding karma/agreement for the first couple of hours is also good.
I had a similar thought for math proofs, where you should be able to see an overall proof sketch first, then expand on any point to see the intuition/longer proof in a hierarchical way.
Related:
https://vlad.roam.garden/How-do-I-read-things-on-the-internet (LW crosspost). This is bullet-point style, expanded by default. I don’t like the indentation though; I think it is a bad idea.
Feedback:
IMO there should be a variant of text expansion where the original text gets replaced by the longer version.
The “Plus” icon is an improvement over having a new paragraph / indentation, but still too distracting, especially the purple color.
Keyboard navigation is a bit clunky.
Pressing Enter should move focus somewhere sensible, rather than eating the focus. Probably to the ending parenthesis for expansion, and to the original + icon when collapsing.
Maybe it’s just me, but I expect ArrowRight on the Plus icon to expand the text too.
Let me use ctrl+z ctrl+y to undo / redo
Would be nice to have a “default expand n layers deep” button.
Please show me an example of ((text)) and :::dig ::: tags in the demo.
Double parentheses syntax
I like it
You are not allowing double parentheses to be escaped, try \((a))
Your parser is wrong, try ((a((b))c)) (a sketch of the parsing I would expect is after this list)
btw the selection of text in your demo is fucked (Firefox; Windows, if that matters): it selects text one line lower than where my cursor is.
I am slightly annoyed that the Plus icon is not centered properly. It is a little bit too low.
The way your examples are structured with one-sentence expansions seems bad. I don’t want to spend a few seconds clicking the button just to read one more sentence.
It makes me want to have some sort of preview (full preview or how many levels / characters hidden)
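The parsing sketch mentioned above, for the double-parentheses items. This is just my guess at reasonable semantics in Python, not the demo’s implementation (which is presumably JavaScript); the escape rule and the node representation are assumptions: nested ((...)) pairs form a tree, and \(( produces a literal ((.

```python
def parse(src: str):
    r"""Parse text with nested ((...)) expansions; \(( escapes a literal ((.
    Returns a list whose items are plain strings or ("expand", children) nodes."""
    root = []
    stack = [root]
    buf, i = "", 0
    while i < len(src):
        if src.startswith("\\((", i):                     # escaped "(("
            buf += "(("
            i += 3
        elif src.startswith("((", i):                     # open expansion node
            if buf:
                stack[-1].append(buf)
                buf = ""
            node = []
            stack[-1].append(("expand", node))
            stack.append(node)
            i += 2
        elif src.startswith("))", i) and len(stack) > 1:  # close innermost node
            if buf:
                stack[-1].append(buf)
                buf = ""
            stack.pop()
            i += 2
        else:
            buf += src[i]
            i += 1
    if buf:
        stack[-1].append(buf)
    return root

print(parse("\\((a))"))       # ['((a))']  -- the escape yields literal text
print(parse("((a((b))c))"))   # [('expand', ['a', ('expand', ['b']), 'c'])]
```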
Marginally related on the “dig deeper” part:
These pages use side panels for Obsidian-style page linking. (via)
Footnotes in sidebars are good for most simple cases but insufficient for more elaborate structures; you can, however, put a footnote inside a footnote to get more than one indentation level.
CQ2 is now at https://cq2.vercel.app/ per the GitHub
*for humans
I still hate exercise but I just keep doing it.
I made myself slowly accept doing runs and strength training over 5+ years. I still hate exercise with a passion and fail to do it sometimes, but I do it for the long-term health benefits and the short-term mood benefits. I do notice being slightly happier for 1-2 days after exercising, and sleeping slightly better. I optimize for exercise being as short as possible without being absolutely dreadful (i.e. no HIIT or supersets unless I feel really really good that day).
A B300 server (= 8 B300s) is only ~2x the price in China vs in the US ($1M vs $0.55M, i.e. ~1.8x) according to Reuters. This is a surprisingly low premium.