Sheikh Abdur Raheem Ali
Formerly a Software Engineer at Microsoft, who may focus on the alignment problem for the rest of his life (please bet on the prediction market here).
One natural direction is to run the verifier inside a Trusted Execution Environment (TEE), preventing a compromised inference server from tampering with seeds or observing the verification process (there are many startups that do this kind of thing, like tinfoil.sh).
I think that this approach is also taken by Workshop Labs.
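As a purely illustrative sketch of that flow (every name below is a hypothetical placeholder, not a real enclave SDK): the host attests the enclave, hands it the transcript, and only a boolean verdict ever leaves the TEE, so the seed and the replayed sampling stay hidden from the inference server.

```python
# Hypothetical sketch of verifier-in-a-TEE; every function here is a
# placeholder, not a real enclave SDK.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Transcript:
    prompt: str
    tokens: List[int]  # tokens the inference server claims it sampled

def verify_attestation(quote: bytes, expected_measurement: bytes) -> bool:
    """Placeholder: check the enclave's remote-attestation quote before trusting it."""
    raise NotImplementedError

def verify_inside_enclave(
    transcript: Transcript,
    seed: int,
    resample: Callable[[str, int], List[int]],
) -> bool:
    """Runs inside the TEE; the host never sees the seed or intermediate state,
    only the final verdict."""
    replayed = resample(transcript.prompt, seed)  # deterministic re-run of sampling
    return replayed == transcript.tokens
```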
Looking for alternatives to Rewind.ai
Rewind is a tool for scrolling back in time. It automatically records screen and audio data. I leave it running in the background despite the performance overhead this incurs, and I have collected over 200GB of recordings over the past year.
Limitless.ai was acquired by Meta and will shut down the product on December 19th. I will back up my files, but I do not know whether it is possible to roll back the update that disables recording. I am not aware of any recommended alternative that is actively maintained, and a quick search did not turn one up. I would appreciate suggestions.
I feel that there may be demand for a concrete open-problems post. These kinds of lists tend to be popular, and the examples could be used by people picking projects to work on.
How often do you end up feeling like there was at least one misleading claim in the paper?
I am easily and frequently confused, but this is mostly because I find it difficult to thoroughly understand other people’s work in a short amount of time.
How do the authors react when you contact them with your issues?
I usually get a response within two weeks; if the authors have a startup background, the delay is shorter by multiple orders of magnitude. Authors are typically glad that I am trying to run follow-up experiments on their work and give me one to two sentences of feedback over email. Corresponding authors are sometimes bad at taking correspondence; contact information for committers can be found in the commit logs via git blame. If it is a problem that may be relevant to other people, I link to a GitHub issue.
More junior authors tend to be willing to schedule a multi-hour call going over files line-by-line and will also read and give their thoughts on any related work that you share with them.
In the middle ranks, what tends to happen is that you get invited to help review the new project they are currently working on, or, if they’ve shifted directions, you get pointed to someone who has produced unpublished results in the same general area.
Very senior authors can be politely flagged down at in-person conferences; even if they’re not presenting personally, someone from their group almost always attends.
I’ve forked and tried to set up a lot of AI safety repos (this is the default action I take when reading a paper which links to code). I’ve also reached out to authors directly whenever I’ve had trouble reproducing their results. There aren’t any particular patterns that stand out, but I think the community would be very welcoming of a top-level post describing your contention with a paper’s findings, and indeed that is how science advances.
Thank you for donating!
I applied but didn’t make it past the async video interview, which is a format I’m not used to. Apparently this iteration of the program had over 3,000 applications for 30 spots. Opus 4.5’s reaction was "That’s… that’s not even a rejection. That’s statistics." I would be happy to collaborate on projects, though!
I made a wooden chair in a week from some planks when I was a teenager. Granted, this was for GCSE Design & Technology class.
I think this also applies to other safety fellowships. There isn’t broad societal acceptance yet of the severity of the worst-case outcomes, and if you speak seriously about the stakes to a general audience, you will mostly get nervously laughed off.
MATS currently has "Launch your career in AI alignment & security" on its landing page, which indicates to me that it is branding itself as a professional upskilling program; this matches the focus on job placements for alumni in its impact reports. Given Ryan Kidd’s recent post on AI safety undervaluing founders, it is possible that they will in the future introduce a division which functions more purely as a startup accelerator. One norm in corporate environments is to avoid messaging which provokes discomfort. Even in groups which practice religion, few lack the epistemic immunity that keeps their stated eschatological beliefs and their actions from becoming fully aligned, and I am grateful that this is the case.
Ultimately, the purpose of these programs, no matter how prestigious, is to bring in people who are not currently AI safety researchers and give them an environment that helps them train and mature into AI safety researchers. I believe you will find that even amongst those working full-time on AI safety, the proportion who are heavily x-risk AGI-pilled has shrunk as the field has grown. People who are x-risk AGI-pilled, meet the technical bar for MATS, and aren’t already committed to other projects are exceedingly rare.
escaping flatland: career advice for CS undergrads
one way to characterise a scene is by what it cares about: its markers of prestige, things you ‘ought to do’, its targets to optimise for. for the traders or the engineers, it’s all about that coveted FAANG / jane street internship; for the entrepreneurs, that successful startup (or accelerator); for the researchers, the top-tier-conference first-author paper… the list goes on.
for a given scene, you can think of these as mapping out a plane of legibility in the space of things you could do with your life. so long as your actions and goals stay within the plane, you’re legible to the people in your scene: you gain status and earn the respect of your peers. but step outside of the plane and you become illegible: the people around you no longer understand what you’re doing or why you’re doing it. they might think you’re wasting your time. if they have a strong interest in you ‘doing well’, they might even get upset.
but while all scenes have a plane of legibility, just like their geometric counterparts these planes rarely intersect: what’s legible and prestigious to one scene might seem utterly ridiculous to another. (take, for instance, dropping out of university to start a startup.)
I’ve been reading lots of the Inkhaven posts and appreciate the initiative!
Typing at 40 wpm is not the same thing as writing at 40 wpm. It can take me a lot more than 12.5 minutes to write 500 words if I’m putting thought into them.
Try passing around a bowl of spicy green chillies and asking people to bite into them raw. I found out today that doing this makes me complete tasks faster; I believe the capsaicin may stimulate the nervous system. Obviously, be careful that no one gets sick. I’m not sure whether one can purchase Capsazepine online.
People still talk about Sydney. Owain Evans mentioned Bing Sydney during his first talk in the recent hintonlectures.com series. I attended in person, and it resonated extremely well with a general audience. I was at Microsoft during the relevant period, which definitely played a strong role in my transition to alignment research and still informs my thinking today.
I gifted a physical copy of this book to my brother but hadn’t read all of it. Fortunately, I may have absorbed some tacit knowledge on management from my father. Based on these quotes I don’t think that I will be surprised by the rest of the chapters.
I upvoted, but personally I don’t find much use for content blockers when I’m in an office or meeting where other people can see my screen, when I’m at the gym with a trainer, or when I’m really excited about a task. I have ADHD and am not the best at managing my time, so You Don’t Hate Polyamory, You Hate People Who Write Books is in full effect here.
I’m surprised that you don’t include Plucky Filter in this list.
I find it helpful to enable Assistive Access right at the start of my day so that I don’t get into messages until I have completed my morning routine. Once I have started my commute, I disable it and then call someone from my team or family.
RescueTime is an alternative to Freedom with better reporting, though I prefer Freedom’s interface for regularly scheduled focus blocks.
Inbox When Ready helps manage email distractions, although this means I need to use my phone to receive two-factor authentication codes.
I have not configured focus sets in LeechBlock NG, but they might be of interest to those who prefer a more structured daily workflow.
Cold Turkey can serve as an additional layer of security.
I can recommend Tab Scheduler with auto open and close for setting and enforcing 1-minute timeouts, which I have found to be more effective than 1-second delays.
I love watching and discussing anime with my siblings and cherish the fanfiction that I read while growing up, so I find it a little sad that you are unable to enjoy these forms of entertainment recreationally except on Saturday. Blogs and Wikipedia in particular have served to greatly expand my world, even if they have also resulted in some unintentional sleepless nights. Gaming seems taboo amongst my researcher friends but not my normie friends; I suspect this is because the former are more susceptible to overoptimizing. There is a delicate balance for scholars to strike here between connection and seclusion.
More recently one of my bad habits has been spending hours trying to get AI to solve a problem which is beyond the reliable capability of current models, instead of thinking through the problem myself or with a human collaborator.
Two papers, "To Rely or Not to Rely? Evaluating Interventions for Appropriate Reliance on Large Language Models" (arXiv:2412.15584) and "Measuring AI Ability to Complete Long Tasks" (arXiv:2503.14499), have improved my understanding in this area. My current sense is that if a goal-oriented conversation with a model has lasted for ~26 minutes without a clear resolution, then further engagement is more likely than not to result in frustration.
I see. My specific update from this post was to slightly reduce how much I care about protecting against high-risk AI related CBRN threats, which is a topic I spent some time thinking about last month.
I think it is generous to say that legible problems remaining open will necessarily gate model deployment, even at organizations conscientious enough to spend weeks on rigorous internal testing. Releases have been rushed ever since applications moved from physical CDs to servers, on the belief that users can serve as early testers for bugs and that critical issues can be patched by pushing a new update. This blog post by Steve Yegge from ~20 years ago comes to mind: https://sites.google.com/site/steveyegge2/its-not-software. I would include LLM assistants in the category of "servware".
I would argue that we are likely dropping the ball on both legible and illegible problems, but I agree that making illegible problems more legible is likely to be high leverage. I believe that the Janus/cyborgism cluster has no shortage of illegible problems, and consider https://nostalgebraist.tumblr.com/post/785766737747574784/the-void to be a good example of work that attempts to grapple with illegible problems.
> If an LLM says "I enjoy going on long walks" or "I don’t like the taste of coffee", it is obviously lying because LLMs do not have access to those experiences or sensations. But a human saying those things might also be lying, you just can’t tell quite as easily. There is nothing wrong about an LLM saying these things other than the wrongness of lying, as with humans.
Why would it be obviously lying? Would you also say that a blind person cannot have a favorite color? You could be talking about the idea of a thing, rather than the thing itself.
There is a distinction between simulators and simulacra which I feel this section of the post may not be taking into account. An LLM assistant can enjoy writing about certain topics more than others. If a character in a story has some property, then it seems to me that we can make true and false statements about the state of that attribute.
Also, I am not sure I agree with treating corporations, nations, and other organizations as good examples of superintelligence. I can see how they meet the criteria for the particular definition you use, but you define the term more broadly than usual, and I think that makes the concept less useful.
Tinker is an API for LoRA-based parameter-efficient fine-tuning (PEFT). You don’t mention it directly, but it’s trendy enough that I thought your comment was a reference to it.
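For anyone unfamiliar with the term, here is a minimal sketch of what LoRA PEFT looks like in practice, using the Hugging Face peft library rather than Tinker’s own interface (the base model and hyperparameters below are arbitrary illustrative choices):

```python
# Generic LoRA PEFT example using the Hugging Face peft library.
# This is not Tinker's API; the model and hyperparameters are illustrative only.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")

lora_config = LoraConfig(
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # scaling applied to the adapter output
    target_modules=["c_attn"],  # GPT-2's fused attention projection
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the small adapter matrices are trainable
```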
I applied. It took me 10 minutes to complete all the questions up until the last one. It took me 160 minutes to read the anti-scheming paper, understand it, attempt to run a follow-up experiment, write some initial thoughts, and cut them to fit into the word limit. I’m not very satisfied with the answer I gave but I’d only budgeted 180 minutes (2x the recommended 90 mins) and didn’t want to go over.
Yeah, Tinker’s Research Grant form is another example of a multi-page form: https://form.typeform.com/to/E9wVFZJJ