Alex Gray (née Alex Ray); much of my work is under that name. I’m interested in language model alignment, and especially techniques to get models to reason out loud.
A Ray
If what you’re saying is “any change to the distribution will change KL” — I think that’s just correct.
This also applies to changes during training where the model is learning to perform better on the objective task.
So we are expecting some amount of KL divergence already.
My claims are:
- the cheapest place to hide information (due to KL) is wherever the model already has high entropy (i.e. it is uncertain between many possible outputs); a minimal numerical sketch of this follows below
- optimization pressure will try to push this extra information into the cheapest places to hide
- the increase in KL won’t be clearly distinguishable from the increase due to increased performance on the task
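As a minimal numerical sketch of the first claim (the distributions below are made up for illustration, not taken from any real model): nudging an already high-entropy next-token distribution costs much less KL than applying a comparable nudge to a near-deterministic one.

```python
import numpy as np

def kl(p, q):
    # KL(p || q) in nats, for discrete distributions with strictly positive entries.
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

# A high-entropy position: the prior is uncertain between four tokens.
prior_high = np.array([0.25, 0.25, 0.25, 0.25])
tuned_high = np.array([0.31, 0.23, 0.23, 0.23])  # small nudge to encode a hidden signal

# A low-entropy position: the prior is nearly certain of one token.
prior_low = np.array([0.97, 0.01, 0.01, 0.01])
tuned_low = np.array([0.91, 0.07, 0.01, 0.01])   # a comparable nudge here is pricier

print(kl(tuned_high, prior_high))  # ~0.009 nats: cheap to hide at the high-entropy position
print(kl(tuned_low, prior_low))    # ~0.078 nats: roughly 8x more expensive at the low-entropy one
```

The same logic drives the second claim: under a KL penalty, gradient descent pays the least for extra information routed through the positions the prior was already unsure about.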
I think that’s right, but the moral maze will also be mediating the information and decision-making support available to the leadership, so they’re not totally immune from those influences
I think I understand what you’re saying, but I want to double check and try laying it out explicitly.
The “null space” is actually the null space of the prior model, which is human-like, but not exactly lined up with human perception of language
Because of this, the information isn’t “hidden” w.r.t. human perception; it’s “hidden” w.r.t. the language model prior
I think I agree with both of these points, but here’s my thinking for why I still expect to see this phenomenon (and why the article above was simplified to just say “human”)
Empirically, drawing many samples results in lots of outputs that “contain similar high-level content according to a human (me)”
It might be worth studying this explicitly, such as setting up a bunch of chain-of-thought-reasoning-like scenarios, sampling a lot of generated reasoning, and then asking humans to rate how similar the generations are
My prediction is that people will think that they are largely similar
We could also use this same method to study human-measured relevance of other non-trivial sampling methods, like beam search, etc
“Hiding” w.r.t. language model prior is sufficient, in the limit of good language models
One of the things I think my reasoning relies on is that we will continue to produce better language models, which means the prior over time more and more closely resembles human priors
This suggests that one way to avoid this phenomenon is to only use smaller / less capable models—or at least they will do a worse job at hiding information. I currently expect that many research labs will continue to do chain of thought reasoning with state of the art models, though.
There’s a lot of room in the model distribution
Chain of thought reasoning is not currently, nor do I expect it will ever be, overconstrained.
Instead, it can be seen as a mechanism for dramatically increasing the space of possible outputs, both those seen during training, and those encountered at test time
Human preference effects
Research Campaign Effects: Human researchers will want human-like outputs, and thus can be expected to tweak/retrain/etc in order to get human-like reasoning in the chain of thought. This introduces an optimization pressure, separate from the KL penalty, towards the reasoning being human-like
Human preference models: I think an easy-to-predict future direction here is training reward models to match subjective human preferences and then optimizing with them or sampling with them (e.g. best of N; a sketch follows below); these preference models could also introduce additional optimization pressure towards human-like reasoning
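To make the best-of-N point concrete, here is a minimal sketch; `generate` and `preference_score` are hypothetical stand-ins for a sampling call and a learned human-preference reward model, not any particular library’s API. Any selection step like this adds optimization pressure towards whatever the preference model rewards, e.g. more human-like-looking reasoning.

```python
from typing import Callable, List

def best_of_n(
    prompt: str,
    generate: Callable[[str], str],                 # hypothetical: sample one chain-of-thought completion
    preference_score: Callable[[str, str], float],  # hypothetical: learned human-preference reward model
    n: int = 16,
) -> str:
    # Sample n completions and keep the one the preference model scores highest.
    candidates: List[str] = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: preference_score(prompt, c))
```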
I think the most important point in all of this is that I expect the model prior to be sufficiently close to the human prior—and I’d be interested in learning more about evidence or research to the contrary.
I think this is worth considering even in cases of things like recursive prompting for a couple of reasons that I haven’t organized:
In general additional compute time lets you improve your answer, so by default I’d expect answers to any significant question to take multiple steps
I agree that for trivial questions it might not do any of this. My expectation is that the impacts of AI deployments will mostly be on non-trivial questions or tasks
So far current research hasn’t done a great deal of regularizing or penalizing models for using a lot of thinking
Mostly they’ve optimized for “maximize the chance of correct answer”
This could change in the future, especially in cases where the model is expected to be deployed / otherwise sensitive to runtime costs
Steganography is rewarded over openness if the openness is somehow far from the prior for a given piece of information it wants to transmit
Caveat: this point is more speculative than the above document
One of my points is that due to null space + optimization pressure, it’s very easy and convenient for gradient descent to put information in these hidden dimensions
I think one of the things this can do well is handle concepts that don’t yet exist—and so reasoning moves like “invent a new term” or “give an illustrative example” or “invent a thought experiment” are lower likelihood under the prior
In these cases, it can be more likely for steganography to transmit certain information, if it is less expensive in terms of KL penalties
I do agree that for straightforward reasoning on straightforward evidence, where the reasoning is human-like and close to the prior, we can expect it to mostly reason out loud in a way that is “open”
Myopia—though less important than it was earlier in RFLO research—seemed like a good alignment criterion to aim for, and still does somewhat today
The original starting point for this direction of thinking was about Myopia / Non-Myopia in chain of thought reasoning systems
The sorts of guarantees I was hoping to get for chain of thought systems included myopia guarantees
Steganography came up as an example of how it could break myopia (other more complicated examples include side effects and situational awareness)
The concepts in this document came up by asking “how might steganography arise naturally given current research techniques and directions”
Putting what I think is the most important part of my reply first: I think research into mitigations is premature and instead demonstrating/measuring the phenomena should take priority.
However, given that, I think I agree that these are all possible mitigations to the phenomenon, in particular (rephrasing your points):
Reducing the channel capacity, and thus the amount of information that can be transmitted
Constraining the outputs, and thus reducing the degrees of freedom of the null space
Adding noise (via dropout or otherwise) also reduces the channel capacity, and probably also has some desirable side effects (like increasing the diversity or creativity of reasoning)
Introducing “barriers” past which optimization for hidden information becomes difficult (in your example, you use different models; I also think “stop gradients” in end-to-end training has similar effects, if weaker). A rough sketch of the noise and barrier points follows after this list
If you can detect/localize this phenomenon, you might be able to use interpretability-like techniques to limit or constrain it
I’m less certain about this, but it’s possible that future interpretability research will unlock this ability.
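Here is the rough sketch of the noise and barrier points from the list above; `reasoner`, `paraphraser`, and `answerer` are hypothetical model wrappers, and word-level dropout is just one crude way to add noise.

```python
import random
from typing import Callable

def answer_with_barrier(
    question: str,
    reasoner: Callable[[str], str],       # hypothetical model A: produces a chain of thought
    paraphraser: Callable[[str], str],    # hypothetical model B: restates the reasoning in its own words
    answerer: Callable[[str, str], str],  # hypothetical model C: answers from the laundered reasoning
    word_dropout: float = 0.1,
) -> str:
    thought = reasoner(question)
    # Noise: randomly drop a fraction of words, shrinking the channel capacity
    # available to any non-semantic encoding in the exact wording.
    kept = [w for w in thought.split() if random.random() > word_dropout]
    noised = " ".join(kept)
    # Barrier: a separate model restates the reasoning, so token-level patterns
    # chosen by the reasoner (where steganography would live) are unlikely to survive.
    laundered = paraphraser(noised)
    return answerer(question, laundered)
```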
Also I think an important point here is that we can expect to see it with fairly simple techniques—merely including sampled/generated data in the training set is sufficient, as opposed to requiring a complicated reinforcement learning algorithm like MuZero.
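As a hedged sketch of how simple that setup is (`generate`, `grade`, and `finetune` are all hypothetical stand-ins): sampling from the model, keeping the completions that score well, and fine-tuning on them already creates the relevant optimization pressure, with no MuZero-style search required.

```python
from typing import Callable, List

def self_training_round(
    prompts: List[str],
    generate: Callable[[str], str],         # hypothetical: sample a chain of thought + answer
    grade: Callable[[str, str], float],     # hypothetical: task score for a (prompt, completion) pair
    finetune: Callable[[List[str]], None],  # hypothetical: supervised fine-tune on text
    keep_top_fraction: float = 0.25,
) -> None:
    # One round of "include sampled/generated data in the training set".
    samples = [(p, generate(p)) for p in prompts]
    samples.sort(key=lambda pc: grade(*pc), reverse=True)
    keep = samples[: max(1, int(len(samples) * keep_top_fraction))]
    # Fine-tuning on the kept samples reinforces whatever token-level patterns helped
    # them score well, including patterns a human reader would never notice.
    finetune([prompt + completion for prompt, completion in keep])
```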
Did you publish your proposal? I’d be interested in reading it.
Agree that founders are a bit of an exception. Actually that comes up a bit in the longer version of this that I give when I talk about it in person.
Basically: “The only people at the very top of large tech companies are either founders or those who were able to climb to the tops of moral mazes.” So my strategic corollary to this is that it’s probably weakly better for AI alignment for founders to be in charge of companies longer, and to get replaced less often.
In the case of Facebook, even in the face of all of their history of actions, I think on the margin I’d prefer the founder, rather than the median replacement, to be leading the company.
(Edit: I don’t think founders remaining at the head of a company is evidence that the company isn’t a moral maze. Also I’m not certain I agree that Facebook’s pivot couldn’t have been done by a moral maze.)
Steganography in Chain of Thought Reasoning
Why I Am Skeptical of AI Regulation as an X-Risk Mitigation Strategy
Thanks, fixed the link in the article. Should have pointed here: https://www.lesswrong.com/posts/dhj9dhiwhq3DX6W8z/hero-licensing
My advice on finding your own path
I think there should be a norm about adding the big-bench canary string to any document describing AI evaluations in detail, where you wouldn’t want it to be inside that AI’s training data.
Maybe in the future we’ll have a better tag for “don’t train on me”, but for now the big-bench canary string is the best we have.
This is in addition to things like “maybe don’t post it to the public internet” or “maybe don’t link to it from public posts” or other ways of ensuring it doesn’t end up in training corpora.
I think this is a situation for defense-in-depth.
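As one concrete defense-in-depth layer, here is a minimal sketch of the filtering side (the marker below is a loose heuristic standing in for the canonical canary sentence; the exact string and GUID are in the BIG-bench docs):

```python
# Loose heuristic stand-in: the real BIG-bench canary is a fixed sentence containing a
# specific GUID; see the BIG-bench repo for the canonical string to match against.
CANARY_MARKER = "canary GUID"

def filter_training_documents(documents):
    # Drop any document carrying a canary marker before it reaches a training corpus.
    return [doc for doc in documents if CANARY_MARKER not in doc]
```

This only helps if the people assembling training corpora actually run a check like this, which is why it is one layer among several rather than a complete solution.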
More Ideas or More Consensus?
I think one aspect you can examine about a scientific field is the “spread”-ness of its ideas and resources.
High energy particle physics is an interesting extreme here—there’s broad agreement in the field about building higher energy accelerators, and this means there can be lots of consensus about supporting a shared collaborative high energy accelerator.
I think a feature of mature scientific fields is that “more consensus” can unlock more progress. Perhaps if there had been more consensus, the otherwise ill-fated Superconducting Super Collider would have worked out. (I don’t know if other extenuating circumstances would still have prevented it.)
I think a feature of less mature scientific fields is that “more ideas” (and less consensus) would unlock more progress. In this case, we’re more limited in generating and validating new good ideas. One way this shows up is that there’s not a lot of confidence about what to do with large sums of research funding, and instead we think our best bet is making lots of small bets.
My field (AI alignment) is a less mature scientific field in this way, I think. We don’t have a “grand plan” for alignment which just needs funding. Instead we have a fractal of philanthropic organizations empowering individual grantmakers to try to get small and early ideas off the ground with small research grants.
A couple thoughts, if this model does indeed fit:
There’s a lot more we could do to orient as a field around “the most important problem is increasing the rate of coming up with good research ideas”. In addition to being willing to fund lots of small and early stage research, I think we could factorize and interrogate the skills and mindsets needed to do this kind of work. It’s possible that this is one of the most important meta-skills we need to improve as a field.
I also think this could be more of a priority when “field building”. When recruiting or trying to raise awareness of the field, it would be good to consider more focus or priority on places where we expect to find people who are likely to be good generators of new ideas. I think one of the ways this looks is to focus on more diverse and underrepresented groups.
Finally, at some point it seems like we’ll transition to “more mature” as a field, and it’s good to spend some time thinking about what would help that go better. Understanding the history of other fields making this transition, and trying to prepare for predicted problems/issues would be good here.
AGI will probably be deployed by a Moral Maze
Moral Mazes is my favorite management book ever, because instead of “how to be a good manager” it’s about “empirical observations of large-scale organizational dynamics involving management”.
I wish someone would write an updated version—a lot has changed (though a lot has stayed the same) since the research for the book was done in the early 1980s.
My take (and the author’s take) is that any company of nontrivial size begins to take on the characteristics of a moral maze. It seems to be a pretty good null hypothesis—any company saying “we aren’t/won’t become a moral maze” has a pretty huge evidential burden to cross.
I keep this point in mind when thinking about strategy for when it comes time to make deployment decisions about AGI, and to deploy AGI. These decisions are going to be made within the context of a moral maze.
To me, this means that some strategies (“everyone in the company has a thorough and complete understanding of AGI risks”) will almost certainly fail. I think only strategies that work well inside of moral mazes will work at all.
To sum up my takes here:
basically every company eventually becomes a moral maze
AGI deployment decisions will be made in the context of a moral maze
understanding moral maze dynamics is important to AGI deployment strategy
(Caveat: I ran the first big code scrape and worked on the code generating models which later became codex.)
My one line response: I think opt-out is obviously useful and good and should happen.
AFAIK there are various orgs/bodies working on this, but I’m kinda blanking on what/where. (In particular there’s a FOSS mailing list that’s been discussing how ML training relates to FOSS license rights that seems relevant.)
Opt-out strings exist today, in an insufficient form. The most well known and well respected one is probably the big-bench canary string: https://github.com/google/BIG-bench/blob/main/docs/doc.md—but this is just intended to protect data used for evaluating text models.
Mimicking the structure to comment on each point:
Simplicity
I think simplicity is points in favor of cheapness, but not points (directly) in favor of why something “should be done”. I see this as “technical costs to implement are low”, and agree.
Competitiveness
I think this also is points in favor of cheapness, but again not why it “should be done”. I see this as “expected reduction in ML perf is small”, and agree.
Ethics
I think this makes the point that we currently don’t have a settled understanding of what the ethics of the various options are here. People being upset at the state of things is pretty strong evidence that it’s not settled, but seems to be weaker evidence that it’s unethical. I can’t tell whether the point you’re trying to make here is that “we should figure out the ethics of opt-out” (which I agree with) or that “opt-out is ethically required” (which I don’t think you’ve sufficiently supported here for me to agree with).
Risk
I see this as making the point “opt-out would (very minorly) reduce AI risk”. I think this is both well supported by the arguments and technically valid. I’m personally skeptical about the amount of protection this gets us, and am mostly optimistic in applying it to non-software domains (e.g. nanotech, gain of function, virology, etc).
A personal technical prediction I can add: I think that in the software domain, it will be inexpensive for a capable system to compose any non-allowed concepts out of allowed concepts. I think this is non-obvious to traditional ML experts. In traditional ML, removing a domain from the dataset usually robustly removes it from the model—but things like the large-scale generative models mentioned at the top of the post have generalized very well across domains. (They’re still not very capable in-domain, but are similarly not-capable in domains that didn’t exist in training.) I think this “optimism about generalization” is the root of a bunch of my skepticism about domain-restriction/data-censoring as a method of restricting model capabilities.
Precedent
I think the robots.txt example is great and basically this is the one that is most directly applicable. (Other precedents exist but IMO none are as good.) I totally agree with this precedent.
Separately, there’s a lot of precedent for people circumventing or ignoring these—and I think it’s important to look at those precedents, too!
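For concreteness on the robots.txt precedent, here is a minimal sketch using Python’s standard-library robots.txt parser (the user-agent name is made up); whether a scraper actually runs a check like this is exactly the circumvention question.

```python
from urllib import robotparser
from urllib.parse import urlsplit

def allowed_to_fetch(url: str, user_agent: str = "example-ml-scraper") -> bool:
    # Consult the site's robots.txt before scraping, mirroring the web-crawler precedent.
    parts = urlsplit(url)
    rp = robotparser.RobotFileParser(f"{parts.scheme}://{parts.netloc}/robots.txt")
    rp.read()  # fetch and parse the site's robots.txt
    # Respecting the answer is voluntary, which is the circumvention concern above.
    return rp.can_fetch(user_agent, url)
```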
Risk Compensation
This is an interesting point. I personally don’t weigh this highly, and feel like a lot of my intuition here is attached to gut-level stuff.
As far as I know, the literature on risk compensation is almost entirely about things that are direct personal risk to someone. I don’t know of any cases of risk compensation where the risk was indirect or otherwise largely separated from the person. (At some point of indirectness this seems to reduce more to a “principal-agent problem” than a risk-compensation problem)
What’s Missing
I think it’s easy to focus on the technical implementation costs and less on the “what happens next” costs. Figuring out the legal status of this opt-out (and possibly pushing for legislation to change this) is difficult and expensive. Figuring out standards for evaluation will be similarly hard, especially as the tech itself changes rapidly.
Personal Conclusion
I think opt-out is obviously good and useful and should be done. I think it’s a pretty clear positive direction for ML/AI policy and regulatory development—and also I’m optimistic that this is the sort of thing that will happen largely on its own (i.e. no drastic action is required).
Sometimes I get asked by intelligent people I trust in other fields, “what’s up with AI x risk?”—and I think at least part of it unpacks to this: Why don’t more people believe in / take seriously AI x-risk?
I think that is actually a pretty reasonable question. I think two follow-ups are worthwhile and I don’t know of good citations / don’t know if they exist:
a sociological/anthropological/psychological/etc study of what’s going on in people who are familiar with the ideas/reasonings of AI x-risk, but decide not to take it seriously / don’t believe it. I expect in-depth interviews would be great here.
we should probably just write up as many of the obvious things as we can ourselves, up front.
The latter one I can take a stab at here. Taking the perspective of someone who might be interviewed for the former:
historically, ignoring anyone that says “the end of the world is near” has been a great heuristic
very little of the public intellectual sphere engages with the topic
the part of the public intellectual sphere that does engage is disproportionately meme lords
most of the writings about this are exceptionally confusing and jargon-laden
there’s no college courses on this / it doesn’t have the trappings of a legitimate field
it feels a bit like a Pascal’s mugging—at the very least I’m not really prepared to try to think about actions/events with near-infinite consequences
people have been similarly doom-y about other technologies and so far the world turned out fine
we have other existential catastrophes looming (climate change, etc) that are already well understood and scientifically supported, so our efforts are better put on that than this confusing hodge-podge
this field doesn’t seem very diverse and seems to be a bit monocultural
this field doesn’t seem to have a deep/thorough understanding of all of the ways technology is affecting people’s lives negatively today
it seems weird to care about future people when there are present people suffering
I see a lot of public disagreement about whether or not AGI is even real, which makes the risk arguments feel much less trustworthy to me
I think I’m going to stop for now, but I wish there were a nice high-quality organization of these. At the very least, having the steel version of them seems good to have around, in part as an “epistemic hygiene” thing.
Thanks so much for making this!
I’m hopeful this sort of dataset will grow over time as new sources come about.
In particular, I’d nominate adding MLSN (https://www.alignmentforum.org/posts/R39tGLeETfCZJ4FoE/mlsn-4-many-new-interpretability-papers-virtual-logit) to the list of newsletters in the future.
This seems like an overly alarmist take on what is a pretty old trend of research. Six years ago there were a number of universities working on similar models for the VizDoom competition (IIRC won by Intel and Facebook). It seems good to track this kind of research, but IMO the conclusions here are not supported at all by the evidence presented.
Do you have suggestions for domains where you do expect one-turn debate to work well, now that you’ve got these results?
Congratulations! Can you say if there will be a board, and if so who will start on it?
I think at this point these feel like empirical questions, which I think would be much more clearly answered by demonstrations or experiments.
Trying to encode an additional penalty on changing non-semantic information is an interesting idea.
However I think you’re missing that you don’t have the ability to directly compare to a reference LM in cases where you’re training to improve on some performance benchmark. During training the model will change its predictions on everything to some degree (both semantic and non-semantic content).
So your proposed detection will always show some amount of disagreement between the prior and the trained model, on weird grammatical patterns as well as on conceptual tokens. The question is: “is the difference merely due to the changes that improve performance, or is it also transmitting hidden information?”
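To make that concrete, here is a rough sketch (the `ref_logprobs` / `tuned_logprobs` callables are hypothetical interfaces returning full next-token log-probability vectors): per-token KL between the reference LM and the fine-tuned model will be nonzero almost everywhere, on grammatical tokens as well as conceptual ones, so nonzero divergence by itself doesn’t answer that question.

```python
import numpy as np
from typing import Callable, List

def per_token_kl(
    tokens: List[int],
    ref_logprobs: Callable[[List[int], int], np.ndarray],    # hypothetical: log p_ref(. | prefix) at position i
    tuned_logprobs: Callable[[List[int], int], np.ndarray],  # hypothetical: log p_tuned(. | prefix) at position i
) -> np.ndarray:
    # KL(tuned || ref) at each position of a sampled chain of thought.
    kls = []
    for i in range(len(tokens)):
        lp_ref = ref_logprobs(tokens, i)
        lp_tuned = tuned_logprobs(tokens, i)
        p_tuned = np.exp(lp_tuned)
        kls.append(float(np.sum(p_tuned * (lp_tuned - lp_ref))))
    # Expect this to be positive almost everywhere after fine-tuning; the hard part is
    # attributing how much is task improvement and how much is hidden information.
    return np.array(kls)
```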