
lukemarks

Karma: 351

[Question] What Does LessWrong/EA Think of Human Intelligence Augmentation as of mid-2023?

lukemarks · 8 Jul 2023 11:42 UTC
84 points
28 comments · 2 min read · LW link

The Security Mindset, S-Risk and Publishing Prosaic Alignment Research

lukemarks · 22 Apr 2023 14:36 UTC
39 points
7 comments · 6 min read · LW link

Higher Dimension Cartesian Objects and Aligning ‘Tiling Simulators’

lukemarks · 11 Jun 2023 0:13 UTC
22 points
0 comments · 5 min read · LW link

Direct Preference Optimization in One Minute

lukemarks · 26 Jun 2023 11:52 UTC
21 points
3 comments · 1 min read · LW link

Select Agent Specifications as Natural Abstractions

lukemarks · 7 Apr 2023 23:16 UTC
19 points
3 comments · 5 min read · LW link

The Löbian Obstacle, And Why You Should Care

lukemarks · 7 Sep 2023 23:59 UTC
18 points
6 comments · 2 min read · LW link

Partial Simulation Extrapolation: A Proposal for Building Safer Simulators

lukemarks · 17 Jun 2023 13:55 UTC
16 points
0 comments · 10 min read · LW link

[Question] Shouldn’t we ‘Just’ Superimitate Low-Res Uploads?

lukemarks · 3 Nov 2023 7:42 UTC
15 points
2 comments · 2 min read · LW link

Early Experiments in Reward Model Interpretation Using Sparse Autoencoders

3 Oct 2023 7:45 UTC
11 points
0 comments · 5 min read · LW link

A Mathematical Model for Simulators

lukemarks · 2 Oct 2023 6:46 UTC
11 points
0 comments · 2 min read · LW link