I think it would be easier for me to know how impressed/scared to be if there were a website or a PDF where I could see sample trajectories with all the information that the model has access to by default in its original working directory. It’s also a bit unclear to me what the auditing system would have blocked: the high-level trajectory you describe looks like blatant cheating, so I am unsure to what extent I should update on “under surveillance”.
You wrote in your profile description: “Book a call with me if you want advice on a concrete empirical safety project.”
Can I book a call to ask you for feedback on this issue? I would like to ask a couple of questions about your last message so that I can learn how to make my results more convincing, and so that this paper and my future ones will be more impactful.