Yep, I agree that there’s a significant risk that alternative AI approaches that aren’t as safe as LMAs are developed and prove more effective than LMAs when run in a standalone manner. I think SCAs can still be useful in those scenarios, though: clearly from a safety perspective, and less clearly from a performance perspective.
For example, those models could still perform itemized, sandboxed, and heavily reviewed bits of cognition inside an architecture, even though that’s not necessary for them to achieve what the architecture is working towards. Also, this is where we start getting into more advanced safety features, like building symbolic/neuro-symbolic white-box reasoning systems that are interpretable, for the purpose of either controlling cognition or validating the cognition of black-box models (Davidad’s proposal involves the latter).
I meant the whole spectrum of “LLM alignment”, which I think is better counted as a single “avenue of research” because critiques and feedback at “LMA production time” could just as well be applied during the pre-training and fine-tuning phases of training (constitutional AI style).
If I’m understanding correctly, is your point here that you view LLM alignment and LMA alignment as the same? If so, this might be a matter of semantics, but I disagree; I feel the distinction is similar to ensuring that the people who comprise the government are good (the LLMs in an LMA) versus trying to design a good governmental system itself (e.g. dictatorship, democracy, futarchy, separation of powers, etc.). The two areas are certainly related, and a failure in one can mean a failure in the other, but they can involve some very separate and non-overlapping considerations.
It’s only reasonable for large AGI labs to ban LMAs completely on top of their APIs (as Connor Leahy suggests)
Could you point me to where Connor Leahy suggests this? Is it in his podcast?
or research their safety themselves (as they already started to do, to a degree, with ARC’s evals of GPT-4, for instance)
To my understanding, the closest ARC Evals gets to LMA-related research is by equipping LLMs with tools to do tasks (similar to ChatGPT plugins), as specified here. I think one of the defining features of an LMA is self-delegation, which doesn’t appear to be happening here. The closest they might’ve gotten was a basic prompt chain.
I’m mostly pointing these things out because I agree with Ape in the coat and Seth Herd. I don’t think there’s any actual LMA-specific work going on in this space (beyond some preliminary efforts, including my own), and I think there should be. I am pretty confident that LMA-specific work could be a very large research area, and many areas within it would not otherwise be covered by LLM-specific work.
Do you have a source for “Large labs (OpenAI and Anthropic, at least) are pouring at least tens of millions of dollars into this avenue of research”? I think a lot of the current work, like RLHF, pertains to LMA alignment, but isn’t LMA alignment per se (I’d make a distinction between aligning the black-box models that compose the LMA versus aligning the LMA itself).
Have you seen Seth Herd’s work and the work it references (particularly natural language alignment)? Drexler also has an updated proposal called Open Agencies, which seems to be an updated version of his original CAIS research. It seems like Davidad is working on a complex implementation of open agencies; I will likely work on a significantly simpler implementation. I don’t think any of these designs explicitly propose capping LLMs, though, given that the LLMs in these designs are non-agentic, transient, etc. by design and thus seem far less risky than agentic models. The proposals mostly focus on avoiding riskier models that are agentic, persistent, etc.
Have you read Eric Drexler’s work on open agencies and applying open agencies to present-day LLMs? Open agencies seem like progress towards a safer design for current and future cognitive architectures. Drexler’s design touches on some of the aspects you mention in the post, like:
The system can be coded to both check itself against its goals, and invite human inspection if it judges that it is considering plans or actions that may either violate its ethical goals, change its goals, or remove it from human control.
My experience on Upwork is actually the same as yours! In our tests of the platform, it appears to be very difficult to find jobs due to the intense competition. I was unpleasantly surprised at first when I saw how difficult it was to earn money on Upwork as a new user. However, that was the whole point of the initial tests we did, so we expanded and have still been expanding the program to encompass other forms of virtual work that pay reliably and still have room to grow. Upwork will be a minor or non-existent part of our program.
If my program was just on Upwork, then I would be inclined to side with your analysis. Thankfully, it’s not.
I think I understand the point: hypothetically, this program would take work away from people more in need, possibly even making the world worse off as a result. But consider this: if I magically made half of the virtual workforce disappear, the removed half would be really poor and the remaining half would be twice as rich. Is that creating more good? No, because the richer half would not need the money as much as the poorer half. Conversely, if I add people who were earning less money before being added, I am creating a net good, and that’s what I am trying to do. I don’t think the impact of helping several dozen people (just at first!) get out of poverty is insignificant, and since the program could be expanded if our tests indicate it works effectively, I think it could be considered high impact in terms of the number of people it could help and how much it could change their lives.
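The marginal-utility reasoning above can be sketched numerically. This is an illustrative toy model, not from the original post: the dollar figures are hypothetical, and log utility is used as a standard stand-in for diminishing returns to income.

```python
import math

def utility(income):
    """Toy utility of income with diminishing returns.
    log(1 + income) so that zero income is well-defined."""
    return math.log1p(income)

# Hypothetical: $20 of work per day exists for two workers.
# Case A: one better-off worker takes all of it; the other gets nothing.
# Case B: the work is shared equally with a poorer worker.
case_a = utility(20) + utility(0)
case_b = utility(10) + utility(10)

# Because utility is concave, spreading the same total income across
# more (poorer) people yields higher total utility.
assert case_b > case_a
```

The same concavity is why shifting some fixed pool of work toward people who currently earn less can be a net good even if the total amount of work is unchanged.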
Well, the total pool of work available to everyone is imperceptibly decreased in the short run, not adversely affecting anyone to any significant degree, while giving work opportunities to more of the poor who really need the money… Is employing several dozen more people a small net good? I guess it’s a matter of opinion.
We are continuing our search for similar projects, thank you for your suggestion. I hope that we have not missed any pitfalls, but like Strangeattractor wrote, we are indeed doing tests of the concept in various stages of development, and this project is kind of a pilot in and of itself, so hopefully we can catch anything we might have missed.
“In a charitable way” meaning good for the people. Just because there are for-profit companies out there doing this doesn’t mean they are doing what is best for the people; they distribute wealth, but also keep a lot of it for themselves. A charitable venture would give most of the profits to the people involved, and this project also involves providing many things to people, like internet and computer access, training, and opportunities, which freelancers in developing countries often have to acquire for themselves. It is very difficult for a would-be freelancer to find access to all of the technology, one-on-one help, etc., hence the value of this project. While there are virtual employment companies, there are no companies helping freelancers get started, so this fills a unique need.
Thank you, that is one of the markets we are looking to branch out to.
I did not throw every detail into the video and fundraiser/my post on LessWrong, that is correct… I do think I described the gist of it. I explicitly stated that funds will go towards providing computer and internet access, training given by staff, and opportunities that staff have to find. As implied, expenses will go towards computer acquisition, internet, and helping staff implement various facets of the project. I could have explained each and every detail, but it would be too long for the target audience to read. The campaign is not noticeably more vague than other related Indiegogo fundraisers I have seen.
I especially do not think dishonesty should be assumed. It’s just common sense, but to try to put it in words: the effort put into the campaign and video, the numerous people involved, the fact that I’m a high school student putting my reputation on the line, the fact that we are a “verified nonprofit” as shown by Indiegogo after confirming our 501(c)(3) status… It would be a very unlikely and elaborate scam, especially for the very low amount of money that this is likely to earn.
For the record, this project is operating under close scrutiny by the faculty sponsor mentioned in the video, by the nonprofit sponsor we have mentioned in the bio of Silicon Rainforest, by our adult volunteers, by our business partners, etc. If I wanted to do this as a scam, I would try to sell miraculously affordable virtual employment services, take the money, and run ;)
MTurk employs a lot of people in developed countries. I have read they are starting to reject India-based workers because of poor work quality. I can find employment for people who can provide a similarly high standard of work relative to workers in more developed countries but who need the income more. Member participants would otherwise have had difficulty joining, say, MTurk because of a lack of computers, internet access, proper guidance, training… I don’t think there are any companies helping freelancers find work, because it’s not very profitable, and yet there is a great need to reach people who are not working to their potential.
I’m having trouble seeing how a for-profit corporation would create more good and be a more effective structure in this case. A non-profit organization can operate without income tax and attract donations which can be tax-deductible to donors. A for-profit organization could get investment capital, but I think it’s highly unlikely I would be able to find any interested investors, and it otherwise performs worse compared to a non-profit with the same business model.
The way I see it, making the project a nonprofit allows it to better compete with for-profit companies because of tax-advantages. It can also get donations. A for-profit corporation has the advantage of attracting investments from people hoping to make a profit, but I am quite sure that I would not be able to attract large sums of investment capital. That pretty much gives starting this program as a nonprofit the only logical choice.
Regarding your point about compensation, it’s not that I cannot extract the value; it would just be difficult to pay myself an extraordinarily large sum of money all at once, in the hundreds of thousands of dollars. If that ever did become a reality, then hypothetically I could create a for-profit branch of the organization that partners with the nonprofit branch to manage core revenue-generating operations, thus allowing me to siphon income out of the nonprofit.
Belizeans would probably be competing with wealthier people for work because their high level of English mastery allows them to compete for more advanced positions. The websites I mentioned have many workers from more developed countries. For example, half of MTurk’s users are from the United States.
Many people in developing countries do not have access to the technology needed to participate in virtual employment, so we will provide computer and internet access. We will be doing marketing in a way, yes, although it is guidance and training as well. In the future, we will move on from guiding people through using third party systems to directly selling virtual employment services, which should be much more profitable.
Thank you for your suggestions. I have in fact surveyed people and organizations in Belize. The general consensus is that there are a lot of people who are unemployed or working for very low wages, and that getting higher-paying employment would improve their standard of living. You mentioned a small-scale pilot; we have actually run many such pilots, which is how we found that it is possible to help people earn around $3 USD an hour. We are currently working on remote testing of our program before actually sending staff to Belize.