Opinions expressed are my own and not endorsed by anyone.
Formerly @ ARC Evals aka METR
Could you state the problem and solution more succinctly?
I wish LW questions had an “accepted answer” thing like stackexchange
It took me 4 hours to read your newsletter. I did click some links. I need a newsletter newsletter.
Re: plausibility of shadowban claims: You can pay clickfarms to mark someone as spam.
This seems correct and important to me.
I wonder how many recent trans people first tried or considered doubling down on their sex (eg males taking more testosterone) instead. Maybe (for some people) either end of the gender spectrum is comfortable and being in the middle feels bad? Anybody know? Don’t want to ask my friends because this Q will certainly anger them
This matrix closes the case in my book
There is a lot of room between “ignore people; do drastic thing” and “only do things where the exact details have been fully approved”. In other words, the Overton window has pretty wide error bars.
I would be pleased if someone sent me a computer virus that was actually a security fix. I would be pretty upset if someone fried all my gadgets. If someone secretly watched my traffic for evil AI fingerprints I would be mildly annoyed but I guess glad?
Even Google has been threatening the makers of unpatched software: patch it or else they’ll release the exploit, iirc
So some of the Q of “to pivotally act or not to pivotally act” is resolved by acknowledging that the extent of the act is relevant and you can be polite in some cases
I must abstain from further culturewaring but this thought experiment is blowing my mind. I hadn’t heard of it.
I think this is a strong starting point but I think the nice crisp “neural net = function approximator” mostly falls apart as a useful notion when you do fancy stuff with your neural net like active learning or RLAIF. Maybe it’s not technically the neural net doing that...
I guess we don’t have great terms to delineate all these levels of the system:
code that does a forward pass (usually implicitly also describing backward pass & update given loss)
code that does that plus training (ie data fetch and loss function)
that plus RL environment or training set
that plus “training scaffolding” code that eg will do active learning or restart the game if it freezes
just code & weights for a forward pass during inference (presuming that the system has separable training and inference stages)
that plus all the “inference scaffolding” code which will eg do censorship or internet search or calculator integration
that plus the “inference UI”. (Consider how differently people use gpt4 api vs chatgpt website.) (This could also eg be the difference between clicking a checkbox for who to kill and clicking the cancel button on the notification!)
the actual final system turned on in the wild with adversarial users and distribution shift and so on
I wonder if some folks are talking past each other by implicitly referring to different items above...
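To make a few of those levels concrete, here is a rough toy sketch (not anyone's real system). Every name in it (`forward`, `train_step`, `train`, `infer_with_scaffolding`, the `blocked_above` threshold) is made up for illustration, and the "model" is just a linear function standing in for a neural net. The point is only that "the neural net" could mean any of these nested pieces.

```python
from typing import List, Optional, Tuple

Weights = List[float]
Example = Tuple[List[float], float]


def forward(weights: Weights, x: List[float]) -> float:
    """Level: code (plus weights) for a forward pass only."""
    return sum(w * xi for w, xi in zip(weights, x))


def train_step(weights: Weights, x: List[float], y: float, lr: float = 0.1) -> Weights:
    """Level: forward pass plus a loss (squared error) and its update."""
    err = forward(weights, x) - y
    return [w - lr * 2 * err * xi for w, xi in zip(weights, x)]


def train(weights: Weights, dataset: List[Example], epochs: int = 50) -> Weights:
    """Level: that plus a training set and a loop over it."""
    for _ in range(epochs):
        for x, y in dataset:
            weights = train_step(weights, x, y)
    return weights


def infer_with_scaffolding(
    weights: Weights, x: List[float], blocked_above: float = 10.0
) -> Optional[float]:
    """Level: inference plus scaffolding (here a hypothetical output filter)."""
    out = forward(weights, x)
    if out > blocked_above:
        return None  # the scaffolding, not the net, decided to withhold this
    return out


# Toy data for y = 2 * x
dataset = [([1.0], 2.0), ([2.0], 4.0)]
trained = train([0.0], dataset)
```

Someone asking "what does the net do on input 10?" gets a number from `forward` and a refusal from `infer_with_scaffolding`, which is the kind of mismatch I mean.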
I can’t work it out myself. Please tell me the correct opinion to have
I know of one: the steam engine was “working” and continuously patented and modified for a century (iirc) before someone used it in boats at scale. https://youtu.be/-8lXXg8dWHk
The Rationality Cheater’s Move is to have more information. Really beats analysis and reason in many cases.
Thanks for pointing that out; I noticed that too. This is perhaps partly made up for by the fact that he doesn’t count feet or the mouth of the Nile river as possibilities.
Have you ever tried hiring someone or getting a job? Mostly lemons all around (apologies for the offense, jobseekers, I’m sure you’re not the lemon).
One of those ideas that’s so obviously good it’s rarely discussed?
Some other things to try:
find a long hair on your arm
find an imperfection in your mirror or fridge door
count the number of quarters in a small pile
find an object you lost recently
figure out the weave on your jeans
I’m just patting myself on the back here for predicting the cup would get knocked over. That shouldn’t count. You want the ball in the cup. What use is a knocked-over cup and a ball on the ground?
Do you have more things like this? I would participate or run one
Kyle Scott roughly said that when you know where to look and what to ignore you are oriented. Imagine a general freaking out at all the explosions vs one who knows how severe the explosions are expected to be and the threshold for changing course.
You know, making the omnicide machine really is the fastest way to understand how you could have done it safely.