lukemarks

Karma: 350

lukemarks 28 Dec 2023 14:04 UTC
LW: 2 AF: 2
0
AF
on: Free agents
I don’t understand the distinction you draw between free agents and agents without freedom.
If I build an expected utility maximizer with a preference for the presence of some physical quantity, that surely is not a free agent. If I build some agent with the capacity to modify a program which is responsible for its conversion from states of the world to scalar utility values, I assume you would consider that a free agent.
I am reminded of E.T. Jaynes’ position on the notion of ‘randomization’, which I will summarize as “a term to describe a process we consider too hard to model, which we then consider a ‘thing’ because we named it.”
How is this agent any more free than the expected utility maximizer, other than for the reason that I can’t conveniently extrapolate the outcome of its modification of its utility function?
It seems to me that this only shifts the problem from “how do we find a safe utility function to maximize” to “how do we find a process by which a safe utility function is learned”, and I would argue the consideration of the latter is already a mainstream position in alignment.
If I have missed a key distinguishing property, I would be very interested to know.

lukemarks 8 Nov 2023 4:58 UTC
3 points
−7
on: Concrete positive visions for a future without AGI
I believe you misinterpreted the quote from disturbance. They were implying that they would bring about AGI at the last moment at which AGI could still salvage their brain, so that they could be repaired, presumably in expectation of immortality.

I also don’t think the perspective that we would likely fail as a civilization without AGI is common on LessWrong. I would guess that most of us would expect a smooth-ish transition to The Glorious Future in worlds where we coordinate around [as in don’t build] AI. In my opinion, however, the post is good even without this claim.

lukemarks 4 Nov 2023 7:55 UTC
4 points
−1
on: Untrusted smart models and trusted dumb models
models that are too incompetent to think through deceptive alignment are surely not deceptively aligned.
Is this true? In Thoughts On (Solving) Deep Deception, Jozdien gives the following example that suggests otherwise to me:
Back in 2000, a computer scientist named Charles Ofria was studying the evolution of simulated organisms. He wanted to limit their replication rate, so he programmed the simulation to pause after each mutation, measure the mutant’s replication rate in an isolated test environment, and delete the mutant if it replicated faster than its parent.
Eventually, organisms evolved to recognize and pause their replication in the test environment and resume normal replication when released. Ofria tried solving this by making it harder to detect the test environment, but the organisms simply started to randomly accelerate their replication so as to slip through the test environment some of the time.
I don’t see why a model would need to be cognitively able to process its own alignment for its alignment to be improper, and I think this assumption is quite key to the main claims of the post.
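The second strategy in the quoted example can be illustrated with a toy calculation. This is a minimal sketch under stated assumptions, not Ofria's actual simulation: the function name `escaped_fast_rate` and the model itself are hypothetical. It assumes an organism that replicates fast with a fixed, independent probability `p` on every run and has no ability to tell the test environment from the real one.

```python
def escaped_fast_rate(p: float) -> float:
    """Chance that a mutant passes one test run and then replicates fast afterwards.

    Hypothetical model: the organism accelerates with independent probability p
    on every run and cannot distinguish the test environment from deployment.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    pass_test = 1.0 - p   # it happened to replicate slowly during the test run
    fast_after = p        # it accelerates on a later run outside the test
    return pass_test * fast_after


if __name__ == "__main__":
    for p in (0.0, 0.1, 0.5, 0.9, 1.0):
        print(f"p={p:.1f} -> escaped fast replication rate {escaped_fast_rate(p):.2f}")
```

Under these assumptions, any `p` strictly between 0 and 1 gives a nonzero rate of fast replication slipping past the filter, even though the organism never models the test at all.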

[Question] Shouldn’t we ‘Just’ Superimitate Low-Res Uploads?

lukemarks 3 Nov 2023 7:42 UTC

15 points

2 comments 2 min read LW link

lukemarks 26 Oct 2023 6:03 UTC
1 point
0
on: AI as a science, and three obstacles to alignment strategies

unless, by some feat of brilliance, this civilization pulls off some uncharacteristically impressive theoretical triumphs

Are you able to provide an example of the kind of thing that would constitute such a theoretical triumph? Or, if not, a maximally close approximation in the form of something that currently exists?

Early Experiments in Reward Model Interpretation Using Sparse Autoencoders

lukemarks, Amirali Abdullah, Rauno Arike, Fazl and nothoughtsheadempty

3 Oct 2023 7:45 UTC

11 points

0 comments 5 min read LW link

A Mathematical Model for Simulators

lukemarks 2 Oct 2023 6:46 UTC

11 points

0 comments 2 min read LW link

lukemarks 11 Sep 2023 8:38 UTC
11 points
0
on: High school advice
I’m in high school myself and am quite invested in AI safety. I’m not sure whether you’re requesting advice for high school as someone interested in LW, or for LW and associated topics as someone attending high school. I will try to assemble a response to accommodate both possibilities.
Absorbing yourself in topics like x-risk can make school feel like a waste of time. This seems to me to be because school is mostly a waste of time (a position I held before becoming interested in AI safety), but disengaging from the practice entirely also feels incorrect. I use school mostly as a place to relax. Those eight hours are time I usually have to write off as wasted in terms of producing a technical product, but value immensely as a source of enjoyment, socializing and relaxation. It’s hard for me to overstate just how pleasurable attending school can be when you optimize for enjoyment. If your school’s environment permits, it can also be a suitable place for intellectual progress in an autodidactic sense, presuming you aren’t being provided that in the classroom. If you do feel that the classroom is an optimal learning environment for you, I don’t see why you shouldn’t just maximize knowledge extraction.
For many of my peers, school is practically their life. I think that this is a shame, but social pressures don’t let them see otherwise, even when their actions are clearly value-negative. Making school just one part of your life instead of having it consume you is probably the most critical thing to extract from this response. The next is to use its resources to your advantage. If you can network with driven friends or find staff willing to push you/find you interesting opportunities, you absolutely should. I would be shocked if there weren’t at least one staff member at your school passionate about something you are too. Just asking can get you a long way, and shutting yourself off from that is another mistake I made in my first few years of high school, falsely assuming that school simply had nothing to offer me.
In terms of getting involved with LW/AI safety, the biggest mistake I made was being insular, assuming my age would get in the way of networking. There are hundreds of people available at any given time who probably share your interests but possess an entirely different perspective. Most people do not care about my age, and I find that phenomenon especially prevalent in the rationality community. Just talk to people. Discord and Slack are the two biggest clusters for online spaces, and if you’re interested I can message you invites.
Another important point, particularly as a high school student, is not falling victim to groupthink. It’s easy to be vulnerable to this failing in your formative years, and it can massively skew your perspective, even when your thinking seems unaffected. Don’t let LessWrong memetics propagate throughout your brain too strongly without good reason.

lukemarks 8 Sep 2023 2:23 UTC
3 points
0
in reply to: Charlie Steiner’s comment on: The Löbian Obstacle, And Why You Should Care
I expect agentic simulacra to occur without intentionally simulating them, in that agents are just generally useful for solving prediction problems, and in conducting millions of predictions (as would be expected of a product on the order of ChatGPT, or future successors), it’s probable that agentic simulacra will occur. Even if these agents are just approximations, in predicting the behaviors of approximated agents their preferences could still be satisfied in the real world (as described in the Hubinger post).

The problem I’m interested in is how you ensure that all subsequent agentic simulacra (whether they arise intentionally or otherwise) are safe, which seems difficult to verify formally due to the Löbian Obstacle.

lukemarks 8 Sep 2023 1:31 UTC
6 points
1
in reply to: Charlie Steiner’s comment on: The Löbian Obstacle, And Why You Should Care
Which part specifically are you referring to as being overly complicated? What I take to be the primary assertions of the post are:
- Simulacra may themselves conduct simulation, and advanced simulators could produce vast webs of simulacra organized as a hierarchy.
- Simulating an agent is not fundamentally different to creating one in the real world.
- Due to instrumental convergence, agentic simulacra might be expected to engage in resource acquisition. This could take the shape of ‘complexity theft’ as described in the post.^[1]
- The Löbian Obstacle accurately describes why an agent cannot obtain a formal guarantee via design-inspection of its subsequent agent.
- For a simulator to be safe, all simulacra need to be aligned, unless we establish some upper bound of the form “programs of this complexity are too simple to be dangerous,” at which point we would consider only simulacra above that complexity.
I’ll try to justify my approach with respect to one or more of these claims, and if I can’t, I suppose that would give me strong reason to believe the method is overly complicated.
1. ^
  This doesn’t have to be resource acquisition, just any negative action that we could reasonably expect a rational agent to pursue.

The Löbian Obstacle, And Why You Should Care

lukemarks 7 Sep 2023 23:59 UTC

18 points

6 comments 2 min read LW link

lukemarks 24 Aug 2023 22:05 UTC
4 points
2
on: AI Regulation May Be More Important Than AI Alignment For Existential Safety
The issue I have with pivotal act models is that they presume an aligned superintelligence would be capable of bootstrapping its capabilities in such a way that it could perform that act before the creation of the next superintelligence. Soft takeoff seems a very popular opinion now, and isn’t conducive to this kind of scheme.

Also, if a large org were planning a pivotal act I highly doubt they would do so publicly. I imagine subtly modifying every GPU on the planet, melting them or doing anything pivotal on a planetary scale such that the resulting world has only one or a select few superintelligences (at least until a better solution exists) would be very unpopular with the public and with any government.

I don’t think the post explicitly argues against either of these points, and I agree with what you have written. I think these are useful things to bring up in such a discussion, however.

[Question] What Does LessWrong/EA Think of Human Intelligence Augmentation as of mid-2023?

lukemarks 8 Jul 2023 11:42 UTC

84 points

28 comments 2 min read LW link

Direct Preference Optimization in One Minute

lukemarks 26 Jun 2023 11:52 UTC

20 points

3 comments 1 min read LW link

lukemarks 22 Jun 2023 12:25 UTC
2 points
1
on: why I’m here now
I have enjoyed your writings both on LessWrong and on your personal blog. I share your lack of engagement with EA and with Hanson (although I find Yudkowsky’s writing very elegant and so felt drawn to LW). If not the above, which intellectuals do you find compelling, and what makes them so by comparison to Hanson/Yudkowsky?

lukemarks 19 Jun 2023 8:04 UTC
2 points
0
in reply to: NeuralSystem_e5e1’s comment on: Why I am not an AI extinction cautionista
In (P2) you talk about a roadblock for RSI, but in (C) you talk about RSI as a roadblock, is that intentional?
This was a typo.
By “difficult”, do you mean something like, many hours of human work or many dollars spent? If so, then I don’t see why the current investment level in AI is relevant. The investment level partially determines how quickly it will arrive, but not how difficult it is to produce.
In most contexts, the primary implication of the difficulty of a capabilities problem for safety is when said capability will arrive. I didn’t mean to imply that the investment amount determines the difficulty of the problem, but that if you invest additional resources into a problem, it is likely to be solved faster than if you hadn’t invested those resources. As a result, the desired effect of RSI being a difficult hurdle to overcome (increasing the window to AGI) wouldn’t be realized.

lukemarks 19 Jun 2023 5:23 UTC
1 point
0
in reply to: NeuralSystem_e5e1’s comment on: Why I am not an AI extinction cautionista
More like: (P1) Currently there is a lot of investment in AI. (P2) I cannot currently imagine a good roadblock for RSI. (C) Therefore, I have more reasons to believe RSI will not entail atypically difficult roadblocks than I do to believe it will.

This is obviously a high-level overview, and a more in-depth response might cite claims such as that RSI is likely an effective strategy for achieving most goals, or mention counterarguments like Robin Hanson’s, which asserts that RSI is unlikely due to the observed behaviors of existing >human systems (e.g. corporations).

lukemarks 18 Jun 2023 22:10 UTC
11 points
5
on: Why I am not an AI extinction cautionista
“But what if [it’s hard]/[it doesn’t]”-style arguments are very unpersuasive to me. What if it’s easy? What if it does? We ought to prefer evidence to clinging to an unknown and saying “it could go our way.” For a risk analysis post to cause me to update I would need to see “RSI might be really hard because...” and find the supporting reasoning robust.

Given current investment in AI and the fact that I can’t conjure a good roadblock for RSI, I am erring on the side of it being easier rather than harder, but I’m open to updating in light of strong counter-reasoning.

Partial Simulation Extrapolation: A Proposal for Building Safer Simulators

lukemarks 17 Jun 2023 13:55 UTC

16 points

0 comments 10 min read LW link

lukemarks 11 Jun 2023 10:30 UTC
2 points
0
in reply to: dr_s’s comment on: The Dictatorship Problem
See:
Defining fascism in this way makes me worry that future fascist figures can hide behind the veil of “But we aren’t doing x specific thing (e.g. minority persecution) and therefore are not fascist!”
And:
Is a country that exhibits all symptoms of fascism except for minority group hostility still fascist?
