Yeah, sorry for the confusion. In my LinkedIn and X post I mentioned that the 1,273 figure includes all of the sessions that were rate limited, and that is also noted in the appendix. I only put that number in the title to make it flashier. In reality, there are only 14 sessions where Claude was not rate limited.
kiankyars
building sqlite with a small swarm
Building a Regex Engine with a team of parallel Claudes
This is very compelling and I’m impressed by your curiosity and drive to discover these new functionalities. I would like to do the same thing myself moving forward.
Sorry to hear about that and sounds good
I want to give back; I have benefitted a lot from the community.
Those are great, concise answers, so thank you for that. I ended up doing the ablations, and the arXiv preprint has been submitted; I am waiting on acceptance.
Here is proof of my credits. Of course it is easy to inspect-element the balance, but I am not that type of person:
Credit Grant
More or less. I moved my habit tracking to another spreadsheet that is just checkboxes and is faster to fill in, so that I wouldn’t have to do the extra reflection, since I couldn’t guarantee putting in that effort every day. So I have continued, albeit on a different sheet.
These tokens are either very common, or appear especially in reasoning tasks, in particular those with code. This might mean that coding reinforcement learning was the last step in the training process, and that all other tokens got slightly weight decayed. It could also mean that in general, reasoning tokens are treated as so important by gradient descent that their updates are extra large.
The above text is quite compelling, and I am currently doing ablations on reasoning. In particular, I want to prevent the model from using these reasoning words and see how the reasoning degrades, so I will definitely be citing your work when I publish my results. Do you have any intuition on what “ocode” means?
Furthermore, it is unclear to me which GPT OSS model you take those English embedding L2 norms from. And lastly, can you please elaborate on why having the tokenizer means we can use the GPT OSS embeddings to study the token list without having to look at each token’s text content?
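For context, here is roughly what I mean by studying the token list via embedding norms alone. This is a minimal sketch; a random matrix stands in for the real GPT OSS embedding weights, and the shapes are illustrative assumptions, not the model's actual dimensions:

```python
import numpy as np

# Toy stand-in for an embedding matrix (real shape would be
# vocab_size x hidden_dim); values here are random, for illustration only.
rng = np.random.default_rng(0)
vocab_size, hidden_dim = 1000, 64
emb = rng.normal(size=(vocab_size, hidden_dim))

# One L2 norm per token id. Tokens can then be ranked or grouped by norm
# without ever decoding any token's text content.
norms = np.linalg.norm(emb, axis=1)
largest = np.argsort(norms)[::-1][:10]  # token ids with the largest norms
```

The point of the sketch is that the analysis operates purely on token ids and rows of the embedding matrix, which is why I am asking where the tokenizer actually enters the picture.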
I’m on day two and I only have 33 bugs, so I’m not sure I will be able to sustain the entire challenge with that many; I might have to do step one again. But it felt really nice to go through the Google Sheet and mark the rows I completed today green.
Thank you, I will be tracking my progress with Google Sheets.
I would like to do a multi-week trial of the microhabits mentioned in this article and then report back here on the effects I perceive.
Highly underrated post!
You scale dimension 447 (the largest) because you hypothesize that it is correlated with the BOS token, since it has the largest activation?
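To make sure I'm reading the scaling step correctly, here is what I understand it to do. This is only a sketch of my interpretation: the activations are random stand-ins, and the specific rescaling (dividing by the dimension's max absolute value) is my assumption about your setup, not necessarily what you did:

```python
import numpy as np

# Random stand-in for hidden activations (tokens x hidden_dim).
rng = np.random.default_rng(1)
acts = rng.normal(size=(500, 512))
acts[:, 447] *= 50.0  # simulate one outlier dimension, as with a BOS-linked feature

# The step as I understand it: rescale dimension 447 so it no longer
# dominates the per-token norms, leaving all other dimensions untouched.
scaled = acts.copy()
scaled[:, 447] /= np.abs(scaled[:, 447]).max()
```

If you are instead zeroing the dimension out entirely, or normalizing it per-token rather than globally, I'd be curious why.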
thank you, I will update the description now