What if we (somehow) mapped an LLM’s latent semantic space into phonemes?
What if we then composed word embedding (i.e. word2vec) with phonemization (i.e. vec2phoneme) such that we had a function that could translate English to Latentese?
Would learning Latentese allow a human to better interface with the target LLM that the Latentese was constructed from?
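A minimal sketch of what that composition might look like, assuming a toy embedding table standing in for word2vec and a hypothetical phoneme codebook standing in for a learned vec2phoneme mapping; the nearest-phoneme lookup is purely illustrative, not a claim about how a real mapping would be trained:

```python
import numpy as np

# Toy stand-ins: in practice both tables would be derived from the target LLM.
# `embedding` plays the role of word2vec; `phoneme_codebook` plays the role of
# a learned vec2phoneme mapping (both are assumptions for illustration).
rng = np.random.default_rng(0)
vocab = ["monkey", "chews", "banana"]
dim = 8
embedding = {w: rng.normal(size=dim) for w in vocab}

phonemes = ["ka", "lu", "mi", "so", "te"]
phoneme_codebook = {p: rng.normal(size=dim) for p in phonemes}

def vec2phoneme(vec: np.ndarray) -> str:
    """Map a latent vector to the nearest phoneme in the codebook (cosine similarity)."""
    def cos(p):
        v = phoneme_codebook[p]
        return float(vec @ v) / (np.linalg.norm(vec) * np.linalg.norm(v))
    return max(phoneme_codebook, key=cos)

def english_to_latentese(sentence: str) -> str:
    """Compose embedding lookup (word -> vector) with vec2phoneme (vector -> phoneme)."""
    words = [w for w in sentence.lower().split() if w in embedding]
    return " ".join(vec2phoneme(embedding[w]) for w in words)

print(english_to_latentese("monkey chews banana"))
```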
Thank you for sharing, Milan, I think this is possible and important.
Here’s an interpretability idea you may find interesting:
Let’s Turn an AI Model Into a Place: a project to make AI interpretability research fun and widespread by converting a multimodal language model into a place, or a game like The Sims or GTA.
Imagine you have a giant trash pile. How would you make a language model out of it? First, you remove duplicates of every item: you don’t need a million banana peels, just one will suffice. Now you have a grid with one item of trash in each square, like a banana peel in one and a broken chair in another. Then you put related things close together and draw arrows between related items.
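A minimal sketch of that construction, assuming we already have a vector for each deduplicated item (here they are random toy vectors); the 2-D placement via PCA and the cosine-similarity threshold for drawing arrows are assumptions made for illustration:

```python
import numpy as np

# Toy item vectors; in practice these would come from the model's own representations.
rng = np.random.default_rng(1)
items = ["banana peel", "broken chair", "banana peel", "old tire", "apple core"]
unique_items = sorted(set(items))                      # step 1: remove duplicates
vectors = {it: rng.normal(size=16) for it in unique_items}

# Step 2: put related items close together by projecting vectors to 2-D (PCA via SVD).
X = np.stack([vectors[it] for it in unique_items])
X_centered = X - X.mean(axis=0)
_, _, vt = np.linalg.svd(X_centered, full_matrices=False)
coords = X_centered @ vt[:2].T                         # (n_items, 2) map coordinates

# Step 3: draw arrows between related items (cosine similarity above a chosen threshold).
def cosine(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

edges = [(a, b) for i, a in enumerate(unique_items) for b in unique_items[i + 1:]
         if cosine(vectors[a], vectors[b]) > 0.2]

for it, (x, y) in zip(unique_items, coords):
    print(f"{it:14s} placed at ({x:+.2f}, {y:+.2f})")
print("arrows:", edges)
```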
When a person “prompts” this place AI, the player themself runs from one item to another to compute the answer to the prompt.
For example, you stand near the monkey; that is your short prompt. Around you, you see many items and arrows pointing towards them. The closest item is a pair of chewing lips, so you step towards it; now your prompt is “monkey chews”. The next closest item is a banana, but there are plenty of other possibilities around, like an apple a bit farther away and an old tire far away on the horizon (monkeys rarely chew tires, so the tire is far away).
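A minimal sketch of that walking step, assuming we can query the underlying model for next-token probabilities; here a toy probability table stands in for the model, and distance is simply the negative log-probability, so likely continuations appear nearby and unlikely ones sit on the horizon:

```python
import math

# Toy next-token probabilities standing in for the real model's predictions (an assumption).
next_token_probs = {
    "monkey": {"chews": 0.6, "climbs": 0.3, "drives": 0.1},
    "monkey chews": {"banana": 0.7, "apple": 0.25, "tire": 0.05},
}

def items_by_distance(prompt: str):
    """Rank the items around the player: more probable continuations are closer."""
    probs = next_token_probs[prompt]
    return sorted((-math.log(p), token) for token, p in probs.items())

def step(prompt: str, choice=None) -> str:
    """The player walks to an item, appending it to the prompt (nearest item if no choice given)."""
    ranked = items_by_distance(prompt)
    token = choice or ranked[0][1]
    return f"{prompt} {token}"

prompt = "monkey"
prompt = step(prompt)           # walk to the closest item: "monkey chews"
print(prompt, items_by_distance(prompt))
prompt = step(prompt, "apple")  # or take a few extra steps to reach the apple instead
print(prompt)
```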
You are the time-like chooser and the language model is the space-like library, the game, the place. It’s static and safe, while you’re dynamic and dangerous.
I’m not sure I follow. I think you are proposing a gamification of interpretability, but I don’t know how the game works. I can gather something about player choice making the LLM run, and maybe some analogies to physical movement, but I can’t really grasp it. Could you rephrase it from its basic principles up instead of from an example?
I think we can expose complex geometry in the familiar setting of our planet in a game. Basically, let’s show people a whole simulated multiverse of all-knowing and then find a way for them to learn how to see/experience “more of it all at once”, or, if they want to remain human-like, to “slice through it in order to experience the illusion of time”.
If we have many human agents in some simulation (billions of them), then they can cooperate and effectively replace the agentic ASI: they will be the only time-like thing, while the ASI will be the space-like places, just giant frozen sculptures.
I wrote some more and included the staircase example; it’s a work in progress, of course: https://forum.effectivealtruism.org/posts/9XJmunhgPRsgsyWCn/share-ai-safety-ideas-both-crazy-and-not?commentId=ddK9HkCikKk4E7prk