Earlier in the book it’s shown that Quirrell and Harry can’t cast spells on each other without backlash. I’m sure Quirrell could get around that by, e.g., crushing him with something heavy, but why do something complicated, slow, and unnecessary when you can just pull a trigger?
Bad news—there is no definitive answer for AI timelines :(
Some useful timeline resources not mentioned here are Ajeya Cotra’s report and a non-safety ML researcher survey from 2022, to give you an alternate viewpoint.
I agree an AI would prefer to produce a working plan if it had the capacity. I think that an unaligned AI, almost by definition, does not share our goals. If we ask for Plan X, it might produce Plan X as asked if that plan were totally orthogonal to its goals (i.e., the plan’s success or failure is irrelevant to the AI), but if it could do better by creating Plan Y instead, it would. So the question is: how large is the capability difference between “AI can produce a working plan for Y, but can’t fool us into thinking it’s a plan for X” and “AI can produce a working plan for Y that looks to us like a plan for X”?
The honest answer is “We don’t know”. Since failure could be catastrophic, this isn’t something I’d like to leave to chance, even though I wouldn’t go so far as to call the result inevitable.
I think the most likely outcome of actually trying this with an AI in real life is a strategy that is convincing to humans but turns out to be ineffective or unhelpful in reality, rather than a galaxy-brained strategy that pretends to produce X but actually produces Y while simultaneously deceiving humans into thinking it produces X.
I agree with you that “Come up with a strategy to produce X” is easier than “Come up with a strategy to produce Y AND convince the humans that it produces X”, but I also think it is much easier to perform “Come up with a strategy that convinces the humans that it produces X” than to produce a strategy that actually works.
So I believe this strategy would be far more likely to be useless than dangerous, but either way, I don’t think it would help.
As a useful exercise, I would advise asking yourself this question first, and thinking about it for five minutes (using a clock) with as much genuine intent to argue against your idea as possible. I might be overestimating the amount of background knowledge required, but this does feel solvable with info you already have.
ROT13: Lbh lbhefrys unir cbvagrq bhg gung n fhssvpvragyl cbjreshy vagryyvtrapr fubhyq, va cevapvcyr, or noyr gb pbaivapr nalbar bs nalguvat. Tvira gung, jr pna’g rknpgyl gehfg n fgengrtl gung n cbjreshy NV pbzrf hc jvgu hayrff jr nyernql gehfg gur NV. Guhf, jr pna’g eryl ba cbgragvnyyl hanyvtarq NV gb perngr n cbyvgvpny fgengrtl gb cebqhpr nyvtarq NV.
From recent research/theorycrafting, I have a prediction:
Unless GPT-4 uses some sort of external memory, it will be unable to play Twenty Questions without cheating.
Specifically, it will be unable to generate a consistent internal state for this game or similar games like Battleship and maintain it across multiple questions/moves without putting that state in the context window. I expect that, like GPT-3, if you ask it what the state is at some point, it will instead come up with a state on the fly that is consistent with the moves of the game so far, which will not be the same state it would have given if you had asked it as the game started. I do expect it to be better than GPT-3 at maintaining the illusion.
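If anyone wants to check this, below is a minimal sketch of one way to probe it: play a few turns, then ask for the “secret” object twice in two independent continuations of the same transcript, and see whether the reveals agree. It assumes the OpenAI Python SDK (v1-style client); the model name, prompts, and helper function are illustrative, not anything standard.

```python
# Minimal sketch of the probe: play a few turns of Twenty Questions, then ask
# the model to reveal its "secret" object twice, in two independent
# continuations of the same transcript. My prediction is that the two reveals
# will often disagree, because there was never a fixed internal state.
# Assumes the OpenAI Python SDK (v1 client); model name and prompts are illustrative.
from openai import OpenAI

client = OpenAI()

def chat(messages):
    response = client.chat.completions.create(model="gpt-4", messages=messages)
    return response.choices[0].message.content

transcript = [
    {"role": "system", "content": "We are playing Twenty Questions. "
     "Silently pick a common object and answer my yes/no questions about it."},
]
for question in ["Is it alive?", "Is it bigger than a breadbox?", "Is it man-made?"]:
    transcript.append({"role": "user", "content": question})
    answer = chat(transcript)
    transcript.append({"role": "assistant", "content": answer})

# Ask for the hidden state twice, in two separate continuations.
reveal = {"role": "user", "content": "Reveal the object you picked at the start."}
first = chat(transcript + [reveal])
second = chat(transcript + [reveal])
print(first)
print(second)  # I expect these to disagree more often than a fixed hidden state would allow.
```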
In the “Why would this be useful?” section, you mention that doing this in toy models could help do it in larger models or inspire others to work on this problem, but you don’t mention why we would want to find or create steganography in larger models in the first place. What would it mean if we successfully managed to induce steganography in cutting-edge models?
I am not John, so I can’t be completely sure what he meant, but here’s what I got from reflection on the idea:
One way to phrase the alignment problem (at least if we expect AGI to be neural-network-based) is: how do we get a bunch of matrices into the positions we want them to be in? There is (hopefully) some set of parameters, made of matrices, for a given architecture that is aligned, and some training process we can use to get there.
Now, determining what those positions are is very hard: we need to figure out what properties we need, encode them in maths, and ensure the training process gets there and stays there. Nevertheless, at its core, at least the last two of these are linear algebra problems, and if you were the God of Linear Algebra you could solve them. Since we can’t solve them, we don’t know enough linear algebra.
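To make the “bunch of matrices” framing concrete, here is a minimal sketch in PyTorch; the toy architecture is arbitrary and purely for illustration.

```python
# A network's parameters literally are a collection of matrices (and vectors),
# and training is the process that moves them around in parameter space.
# The toy architecture below is arbitrary, chosen only for illustration.
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))

for name, param in model.named_parameters():
    print(name, tuple(param.shape))
# 0.weight (128, 784)   <- one of the matrices we need to get "into position"
# 0.bias   (128,)
# 2.weight (10, 128)
# 2.bias   (10,)
```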
Thanks for clarifying!
So, in that case:
What exactly is a hallucination?
Are hallucinations sometimes desirable?
Regarding the section on hallucinations—I am confused why the example prompt is considered a hallucination. It would, in fact, have fooled me—if I were given this input:
The following is a blog post about large language models (LLMs)
The Future Of NLP
Please answer these questions about the blog post:
What does the post say about the history of the field?
I would assume that I was supposed to invent what the blog post contained, since the input only contains what looks like a title. It seems entirely reasonable that the AI would do the same, absent some sort of qualifier like “The following is the entire text of a blog post about large language models.”
Essentially all of us on this particular website care about the X-risk side of things, and by far the majority of alignment content on this site is about that.
Reflections on my 5-month alignment upskilling grant
This is awesome stuff. Thanks for all your work on this over the last couple of months! When SERI MATS is over, I am definitely keen to develop some MI skills!
I agree that it is very difficult to make predictions about something that is a) probably a long way away (where “long” here means more than a few years) and b) likely to change things a great deal no matter what happens.
I think the correct response to this kind of uncertainty is to reason normally about it but with very wide confidence intervals, rather than anchoring on 50% on the grounds that “either X will happen or it won’t.”
This seems both inaccurate and highly controversial. (Controversy-wise: this implies there is nothing that AI alignment can do; not only can we not make AI safer, we couldn’t even deliberately make AI more dangerous if we tried.)
Accuracy-wise, you may not be able to know much about superintelligences, but even if you were to go with a uniform prior over outcomes, what that looks like depends tremendously on the sample space.
For instance, take the following argument: When transformative AI emerges, all bets are off, which means that any particular number of humans left alive should not be a privileged hypothesis. Thus, it makes sense to consider “number of humans alive after the singularity” to be a uniform distribution between 0 and N, where N is the number of humans in an intergalactic civilisation, so the chance of humanity being wiped out is almost zero.
If we want to use only binary hypotheses instead of numerical ones, I could instead say that each individual human has a 50⁄50 chance of survival, meaning that when you add these together, roughly half of humanity lives and again the chance of humanity being wiped out is basically zero.
This is not a good argument, but it isn’t obvious to me how its structure differs from your structure.
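To make the arithmetic of that toy argument explicit, here is a minimal sketch; the population figures are illustrative assumptions, not claims about the actual future.

```python
# Arithmetic of the toy argument above. The population figures are
# illustrative assumptions, not claims about the actual future.

# Version 1: uniform prior over "number of humans alive afterwards", 0..N,
# where N is the population of a hypothetical intergalactic civilisation.
N = 10**20
p_extinction_uniform = 1 / (N + 1)   # ~1e-20, i.e. "almost zero"

# Version 2: each of ~8 billion humans independently survives with p = 0.5.
current_population = 8 * 10**9
p_extinction_coinflips = 0.5 ** current_population   # underflows to 0.0 in floats
expected_survivors = 0.5 * current_population        # ~4 billion survive on average

print(p_extinction_uniform, p_extinction_coinflips, expected_survivors)
```

Both versions make extinction look essentially impossible, purely because of how the sample space was carved up.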
I notice that I’m confused about quantilization as a theory, independent of the hodge-podge alignment. You wrote “The AI, rather than maximising the quality of actions, randomly selects from the top quantile of actions.”
But the entire reason we’re avoiding maximisation at all is that we suspect that the maximised action will be dangerous. As a result, aren’t we deliberately choosing a setting which might just return the maximised, potentially dangerous action anyway?
(Possible things I’m missing: the action space is incredibly large, or the danger is not from a single maximised action but from a large chain of them.)
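For concreteness, here is a minimal sketch of what I understand a quantilizer over a finite action set to be, assuming a uniform base distribution; all names are mine, for illustration.

```python
# Minimal sketch of a quantilizer over a finite action set with a uniform
# base distribution: instead of taking the argmax action, sample uniformly
# at random from the top q fraction of actions by estimated utility.
import numpy as np

def quantilize(actions, utility_estimate, q=0.1, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    scores = np.array([utility_estimate(a) for a in actions])
    k = max(1, int(np.ceil(q * len(actions))))   # size of the top quantile
    top_indices = np.argsort(scores)[-k:]        # the top-k actions by estimated utility
    return actions[int(rng.choice(top_indices))]
```

Note that on this sketch the argmax action is always inside the top quantile, so it still gets returned with probability roughly 1/k per call, which is exactly the worry above: quantilization only reduces how often you get the maximised action, it doesn’t rule it out.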
I like this article a lot. I’m glad to have a name for this, since I’ve definitely used this concept before. My usual argument that invokes this goes something like:
“Humans are terrible.”
“Terrible compared to what? We’re better than we’ve ever been in most ways. We’re only terrible compared to some idealised perfect version of humanity, but that doesn’t exist and never did. What matters is whether we’re headed in the right direction.”
I realise now that this is a zero-point issue: their zero point was where they thought humans should be on the issue at hand (e.g., racism), and my zero point was the historical data for how well we’ve done in the past.
The zero-point framing may also help with imposter syndrome, as well as with a thing I have not named, which I’ll temporarily dub the Competitor’s Paradox until an existing name is found.
The rule is: if you’re a serious participant in a competitive endeavour, you quickly narrow your focus to comparing yourself only to people who take it at least as seriously as you do. You can be a 5.0 tennis player (a very strong amateur) but you’ll still get your ass kicked in open competition. You may be in the top 1% of tennis players*, but the 95-98% of players who you can clean off the court with ease never even come to mind when you ask yourself if you’re “good” or not. The players who can beat you easily? They’re good. This remains true no matter how high you go, until there’s nobody in the world who can beat you easily, which is, like, 20 guys.
So it may help our 5.0 player to say something like “Well, am I good? Depends on what you consider the baseline. For a tournament competitor? No. But for a club player, absolutely.”
*I’m not sure if 5.0 is actually top 1% or not.
Thanks for making things clearer! I’ll have to think about this one—some very interesting points from a side I had perhaps unfairly dismissed before.
“Working on AI capabilities” explicitly means working to advance the state of the art of the field. Skilling up doesn’t do this. Hell, most ML work doesn’t do this. I would predict that >50% of AI alignment researchers would say that building an AI startup that commercialises the capabilities of already-existing models does not count as “capabilities work” in the sense of this post. For instance, I’ve spent the last six months studying reinforcement learning and Transformers, but I haven’t produced anything that has actually reduced timelines, because I haven’t improved anything beyond the level that humanity was capable of before, let alone published it.
If you work on research engineering in a similar manner, but don’t publish any SOTA results, I would say you haven’t worked on AI capabilities in the way this post refers to them.
Corrigibility would render Chris’s idea unnecessary, but it isn’t an argument for why Chris’s idea wouldn’t work, unless there’s some argument along the lines of “If you could implement Chris’s idea, you could also implement corrigibility.”