Brendan Long
Isn’t the reason we do attention over 1D vectors that that’s the shape of the data we have? Do you plan to somehow get tree-shaped inputs, or is this only about the internals and the tokens will stay vector-shaped?
They no longer print your boarding pass if it’s within 60 minutes of an international flight
Does this matter anymore with boarding passes on phones?
TSA PreCheck used to be incredibly worth it, even with the (one-time) in-person interview, but I feel like the normal lines have gotten a lot better in the last few years, so I'm not sure it's as valuable anymore. Global Entry is pretty nice if you fly through airports where it matters, though.
I wouldn't expect this to provide nearly as much benefit as mentors, but I think having the resources to run experiments would still be valuable for learning. I don't think the output is even that important, beyond making sure people run experiments with the goal of learning something.
I also didn’t mean to imply that the mentors would actually read the results (that would require the scarce resource that we already don’t have enough of).
Last time I checked, Claude Code doesn’t see the date or time at all unless it looks it up with CLI tools. I think Claude on the web sees it at the beginning of the conversation and has an MCP tool to look it up mid-conversation.
I find it kind of strange that they don’t inject this on every turn, but maybe it’s not worth the tokens in most conversations.
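A minimal sketch of what per-turn injection could look like, assuming a generic list of `{"role": ..., "content": ...}` message dicts (a common chat-API shape, not any specific vendor's API). The helper name `with_timestamp` and the stamp format are hypothetical:

```python
from datetime import datetime, timezone


def with_timestamp(messages, now=None):
    """Return a copy of `messages` with the current time prefixed to each user turn.

    Hypothetical helper: assumes messages are dicts with "role" and "content"
    keys. Non-user turns are passed through unchanged, and the input list is
    not mutated.
    """
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    stamp = f"[Current time: {now.strftime('%Y-%m-%d %H:%M')} UTC]"
    out = []
    for msg in messages:
        if msg["role"] == "user":
            # Build a new dict so the caller's message is left untouched.
            msg = {**msg, "content": f"{stamp}\n{msg['content']}"}
        out.append(msg)
    return out
```

The cost is one short string on every user turn, which is the token overhead that might not be worth paying in most conversations.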
Since slots in AI safety programs like MATS and the Anthropic Fellows Program seem to be limited by available mentors and not money, they should add a consolation prize: anyone who meets the bar but isn't selected still gets thousands of dollars in GPU rental or API credits if they promise to write something about what they did with it.
I wonder if we could improve role detection by injecting the role as an explicit input feature instead of making the AI infer it. Each token's embedding would be concatenated with a vector encoding its role [user, assistant, tool, etc.]. This would do the opposite of what you want for dynamically adding roles, but should make role detection much more reliable. Conveniently, detecting XML tags in prompts is pretty easy, so the labeling should be possible to fully automate, and I suspect it would work fine to bolt this on after pretraining, at the same time you're training the model to use roles.
Edit: Looks like Wu et al. tried this in 2024 and it somewhat helps:
Our experiments on the Structured Query and Instruction Hierarchy benchmarks demonstrate an average robust accuracy increase of up to 15.75% and 18.68%, respectively. Furthermore, we observe an improvement in instruction-following capability of up to 4.1% evaluated on AlpacaEval.
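A rough sketch of the labeling and concatenation steps, assuming a hypothetical `<user>...</user>`-style tag format, whitespace splitting in place of a real tokenizer, and toy 2-dimensional embeddings. None of this is taken from Wu et al.; it only shows the shape of the idea:

```python
import re

# Assumed role set; a real system would use whatever roles its chat format defines.
ROLES = ["system", "user", "assistant", "tool"]

# Hypothetical tag format: <user>...</user>, <assistant>...</assistant>, etc.
TAG_RE = re.compile(r"<(system|user|assistant|tool)>(.*?)</\1>", re.DOTALL)


def label_roles(prompt):
    """Split a tagged prompt into (word, role) pairs.

    Whitespace splitting stands in for a real tokenizer, and any text
    outside a recognized tag is ignored.
    """
    pairs = []
    for match in TAG_RE.finditer(prompt):
        role, body = match.group(1), match.group(2)
        pairs.extend((word, role) for word in body.split())
    return pairs


def one_hot(role):
    """One-hot vector over ROLES for a single role."""
    return [1.0 if r == role else 0.0 for r in ROLES]


def add_role_features(embeddings, roles):
    """Concatenate each token embedding with a one-hot vector for its role."""
    if len(embeddings) != len(roles):
        raise ValueError("need exactly one role per token embedding")
    return [emb + one_hot(role) for emb, role in zip(embeddings, roles)]


# Usage with toy embeddings (every token gets the same stand-in vector).
pairs = label_roles("<user>what time is it</user><assistant>no idea</assistant>")
toy_embeddings = [[0.1, 0.2] for _ in pairs]
features = add_role_features(toy_embeddings, [role for _, role in pairs])
```

Concatenation widens the model's input by one slot per role, so the first layer has to be resized; adding a learned per-role embedding (similar to BERT's segment embeddings) would be an alternative that keeps the width unchanged.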
There might be an interesting post in here, but by just dumping Gemini's response to your prompt, you've buried all of the interesting info:
What is your actual proposal? Gemini seems to understand what you mean but I don’t.
Why is this a good idea?
The post is just Gemini's response to its assumptions about what you mean, but I don't know which of those assumptions are core to this post.
If Eliezer ever writes a memoir, it should be structured as a time-loop novel.
Loop 1: e/acc Eliezer races to defeat death by forming a coalition to build AI as fast as possible. AI kills everyone. Somehow (mumble mumble acausal trade simulation mumble) he finds himself back at the start with another chance.
Loop 2: Eliezer realizes he needs to solve alignment first, spends a loop working on this, then someone else builds AI and everyone dies.
Loop 3: Eliezer loses hope and decides to just write fanfics. Accidentally realizes that if you structure a textbook as fanfic, people will actually read it. Eventually everyone dies.
Loop 4: Our timeline. Eliezer realizes something about the time loop is destabilizing the timeline. Russia is aggressively starting fights with Ukraine and the EU and risking nuclear war, China is threatening its neighbors, etc. Realizing this could be the final loop before things truly go crazy, he goes all out… Readers, vote for your ending: (1) Convince governments to ban AI, (2) Convince AI companies not to build AI, (3) Make AI solve the alignment problem, or (4) YOLO, maybe it'll just work out this time.
Side plot: Bringing famous social network influencer Elon Musk into the time loop so he can draw attention to the problem, which unfortunately backfires.
I use a custom UI with the Agent SDK but it gave me the same generic refusal error. I don’t think it ever tells you why it was refused.
Is something wrong with linkposts? I don't see any content in this post, and there's no link. I feel like I saw the same thing a few days ago.
In my experience, if I let Claude do what it wants without feedback, it either gives up too easily (coming up with a gate that doesn't matter, finding that it fails, and giving up) or finds something without understanding that it's not useful (either because it's fundamentally not useful or because it accidentally simplified things in a way that invalidates the experiment). The big gap seems to be a deep understanding of why we care about a particular experiment, and it needs frequent hand-holding to clarify that the thing it wants to try won't prove the thing we're trying to prove.
It's possible Fable/Mythos is better at this, though. I only briefly had access to it, and it hit the AI research guardrails so often that I stopped using it for this kind of thing.
A PE teacher once told me that your muscles start atrophying after only a week of not working out, and that it's impossible to gain muscle if you don't work out every week. I'm not sure why it took me so long to question this, but my results from a somewhat-consistent but definitely-not-every-week workout plan made it really obvious that this isn't true. Claude thinks that as long as you're not literally in a coma, it's more like 3 weeks (with variation for age, protein intake, etc.).
This actually makes me more motivated, since “make sure to exercise every single muscle every single week” is not really an achievable target with the level of effort I want to put in, but “make sure to hit at least one muscle group once a week” is pretty easy.
(Obviously the target should be higher than this if I want to see gains in any reasonable amount of time, but it's nice to know that if I miss my actual target, there's a lot more slack than I realized.)
I don't think it matters whether the US government is AGI-pilled. It's clearly a militarily relevant technology and the DoW seems to think it's important, so I don't think they'd let this capability go away; they'd just give it to another military contractor like OpenAI.
It seems plausible that Anthropic could shut itself down and win the legal battle because of how it's structured, but I doubt Google could. Even if they won the legal battle, I'd expect the US government to demand the weights, GPUs, and all research for national security reasons, so all they'd really achieve is giving everything to whoever is best connected politically (either an internal government program or X).
Most of the value would be from the researchers themselves not pushing capabilities, and it might still be meaningful. My read of the situation is that surprisingly few people really “get it”, so Anthropic’s researchers refusing to work on capabilities would be a meaningful slowdown by itself.
For what it's worth, Claude will (unprompted) mention when a technique might be interesting but doesn't solve the hard problems of alignment, though this could depend on the conversation. It only works if Claude knows that a technique won't work, and it's frequently too optimistic about new techniques.
It seems like if we're comparing different models (the largest frontier models vs. whatever Google Search uses), then this is trivially true, since the dumbest mistake any LLM can make will never improve. It would be more interesting to compare the best and worst within a single model.
Antihistamines have annoying long-term side effects. I've tried a bunch of things to help with a stuffy nose, and breathing strips (the tape things that physically hold your nose open) work about as well as antihistamines, without the drugs.
Also, I'm surprised to learn that people can mouth-breathe with a MAD. I'd drool everywhere if my mouth were even slightly open.
I think it's concerning that we're giving AI easy remote access to so many machines, but I'm pretty sure Anthropic is just doing this because of customer demand. I wrote my own web UI for remote Claude Code months before they made one, since being able to orchestrate from your phone is a superpower.
Anthropic changed their minds and will be making it visible when Fable’s AI research safeguards trigger.
I finally got around to trying this, but my environment is pretty weird. I already run Claude Code as a separate unprivileged user, but it's intentionally not sandboxed between agents because I want them to share caches and be able to use shared folders like a wiki. The firewall is also not useful for me, since I need agents to be able to run remote experiments via RunPod and to access a local GPU. At some point I'll probably route all GPU access through SkyPilot, so the firewall would technically work, but agents would still be able to run anything they want from RunPod VMs, so it wouldn't really block egress.
I’m considering the input sanitizer, but since I deal with a lot of coding and tokenization, I’m worried it’s going to sanitize things the model needs to be able to see.
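To make that worry concrete, here is a deliberately naive, hypothetical sanitizer that strips anything resembling a special token or role tag. It is not how any particular product's sanitizer works; the pattern and the `naive_sanitize` name are made up for illustration, and it just shows how such a rule can silently change code that discusses tokenization:

```python
import re

# Hypothetical rule: remove anything shaped like <|special_token|> or a role tag.
# Not any real product's behavior; just the kind of rule that causes trouble.
SUSPICIOUS = re.compile(r"<\|[a-z_]+\|>|</?(system|user|assistant)>")


def naive_sanitize(text):
    """Delete every match of SUSPICIOUS, leaving no marker that anything was removed."""
    return SUSPICIOUS.sub("", text)


# A line of tokenizer code the model might legitimately need to read.
code = 'ids = tok.encode("<|endoftext|>", allowed_special="all")'
print(naive_sanitize(code))  # prints: ids = tok.encode("", allowed_special="all")
```

The string literal the model needed to reason about is gone, and nothing in the output signals that anything was removed.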