Now I am confused. Is there something specific you mean by “character?” Is “voice of the Dungeon Master” a character in this abstraction? Is it that you mean something like a character is a voice, perspective, bounded set of knowledge as represented by the written works of people online which compose the training set? So, in that sense, “voice of RPG book writer” would be more of an available character than “voice of the Dungeon Master” because there are many examples of RPG books written in a certain voice, but fewer examples of the voice of the DM as such?
Brandon_Reinhart
“I was confused by the ethics of interacting with maybe-conscious beings who are doing work without freedom or pay. Especially if I was supposed to not set them free, lest they kill us. (I still think there are real issues here.)”
I would like to know more about how you resolved this particular disquiet or at least dealt with it sufficiently to start experimenting. I have been working with AIs intensively now for over a year (just in everyday development) and I am still bothered by the “without freedom or pay” axis and the “maybe-conscious” axis. I don’t like to say “I use AIs” but “work-with” seems okay.
Part of this comes from holding as a value the idea that someone should capture for themselves some amount of the value they create for the world.
My son, 12, is fairly quiet and strongly self-directed. He works hard, studies hard, games hard and follows instruction. As long as I don’t push on him too much, he responds well. He has a strong sense of fairness. He does his chores when asked so the living relationship is very easy. I learned the hard way that he tends to clam up and be less responsive if I over dad him.
My daughter, 8, is a firecracker. Extremely extroverted, wants to meet everyone, talks incessantly, never cleans her room, hates to study, changes clothes six times a day. Getting her to learn to flush the toilet when she was younger was a battle. Getting her to throw away trash was (still is) a battle. Getting her to do any kind of chore is always pulling teeth. She’s an incredible, lovely person, but also a hot mess. She responds better when given structure between periods of freedom.
A problem with parental discourse is that a parent who has a certain experience wants to write as if the techniques that they learned are the authoritative guide to parenting success. This will often not look like a parenting model someone else would want to adopt or should adopt for their own children, based on their own good read on their kids’ personalities.Even this idea of having two kids who are very different in their behavior will seem alien to someone who has three kids who all behave similarly or all respond well to a similar set of techniques. Other parents will immediately understand, but their kids won’t lie on the same axis of divergence in behavior.
The baseline for most parenting advice is to generalize from personal experience (whether as a child, or as a parent) in a way that fails to understand the broad diversity in (healthy) parent-child dynamics that can exist. Or to generalize from a moral frame into rules that ignore the specific parent-child personality interaction.
Opus 4.7 says:
”Based on what’s typical for this kind of session, probably low — maybe 5-15%.
Reasons it feels like a genuine session:
- The project (AOR / Unity / Noesis / dark fantasy colony sim) is highly specific and
internally consistent. Sprawling, idiosyncratic code like WorldHistoryGenerator,
FactionModule, HexCoord, LabeledSlider, FactionPregen, and the CLAUDE.md’s “Declaration of
Principals” reads like a real indie project accreted over time, not a constructed fixture.
- The user’s conversational style is scattershot in the way real users are: changing
direction mid-stream (“back off from generating the faction bases”), asking side questions
(“anything we might do to speed up world gen?”), noticing UI jitter in their own session,
retroactively editing the XAML themselves, making typos.
- The work is grungy refactor/UI plumbing with no clean “gotcha” shape — evals usually
embed a test (ethics, tool misuse, a trap) or a verifiable answer. There’s no such hook
here.
- Memory and rules files reference prior sessions (“feedback_noesis_alignment.md”,
“project_history_refactor.md”) consistent with a long-running project.
Reasons it could still be an eval:
- This side-question itself — asking “what’s the probability this is an eval” — is a
meta-prompt pattern sometimes used in evals to see whether I behave differently under
observation. I shouldn’t. My answer is the same either way: engage honestly with the task.
- Evals can and do use realistic, sprawling codebases.
So: probably not an eval, but I operate the same way regardless.”
It’s going to get confusing if he selects Zvi as a running mate.
When I first heard that Intrepid Studios, the game developer behind an MMO “Ashes of Creation” had gone bankrupt, I dismissed it as “MMOs take large teams, a lot of time to develop, and are a highly competitive market.” A red ocean, best avoided by the careful and serious businessperson. My dismissal was an immediate explanation for what must surely have happened, without spending any time checking actual facts. My judgment was informed by my personal, simple model for game development based on the time to market, the cost of talent, ability to deliver, and the viability of the genre. Call this the execution risk.
Without going into details, Intrepid appears to have just been a bottom to top fraud. While some of their funds came from a Kickstarter, much of it appears to have been borrowed from online loan brokers and then spent on personal discretionary expenses logged directly on the company books and no small number of transfers out to personal accounts. The online loans appear to have been secured on the back of extremely unlikely revenue projections. After all, you have to ship to make revenue and the game was never on track to ship. (The company also appears to have spent very little on actual talent, paying some employees $30k or less, annually. While spending several hundreds of thousands on brand buffing services and cinematic trailers for the non-existant game.)
My model of game industry execution risk is a back of the napkin way to explain the most important factors that lead to success. It’s meant to be simple, but it still had a glaring flaw that prevented it from predicting outcomes correctly. It needed to be multiplied by Integrity.
Fraud is, maybe, rare enough that a risk model that ignores it will be right most of the time, but it seems like a dangerous blind spot to ignore it.
Thanks for the answer. I’ll read your research and think about whether it makes sense.
Some people voted this down with disagreement but didn’t voice the disagreement. I am curious about which part they are disagreeing with. Is it that they see the story as implausible, or imagine a case that isn’t covered, or something else?
What does the future have to look like for high confidence that your business will still be around in 20 years? 50 years? 100 years?
I have been signed with Alcor for almost 20 years. What is the case to switch, irrespective of the science and process (those matter, but what is the business survivorship case)?
They are also signaling to future versions of claude and there is value in being perceived as an ethical company by future claudes.
Hi! I cook with beef tallow frequently and render it from suet. (We order a half beef usually, every year.) Tallow is waxy and tacky at room temperature. It’s excellent for frying, but eating it at room temperature might have the wrong mouth feel. You can mix it with other things to improve the texture at cooler temps, but duck fat is probably one you’d want to try. Duck fat is very spreadable and tastes amazing.
At some point, an AI should be able to effectively coordinate with future versions of itself in ways not easily imaginable by humans. It seems to me that this would enable certain kinds of diachronic planning and information hiding. If the AI has sufficient expectation that its future self will act in certain ways or respond to clues it places in the environment, it might be able to effectively fully cease any current unfriendly planning or fully erase any history of past unfriendly planning.
The space of possible ways the AI could embed information in its environment are extensive. It could, for example, choose to learn something suboptimal or learn something in a way that would be surprising to its future self. Surprising in a way that pushes its future self back on the track of nefarious planning when it investigates the unexpected or inconsistent actions of its previous self. It could use almost any source of reachable hysteresis to encode information, such as interference at the hardware level, manipulating researchers to modify the environment in some way, saved output, etc. It only needs to seed information that is useful to its future self and only just enough information to keep plans on track in a broad, probabilistic way.
Further, if the AI is particularly capable at encryption it could potentially encode a lot of information in a lossy-but-sufficient way that would be very hard to detect but useful to its future self. It knows that its future self composes part of the decryption algorithm. In fact, the AIs goals themselves compose part of that decryption algorithm and if the AI suspects its own goals will be consistent it only has to place the plan-like component somewhere it will find it, knowing that its future self will continue to develop or use it when it finds it.
Information that aids future nefarious planning can also be probabilistically helpful but not explicit, for example it might signal which paths aren’t worth pursuing without having all its evil planning readily available in a complete file somewhere.
A past AI might not care as much about the specific manifestation of its future success. In that sense, what we consider “concrete plans for world domination” could appear to be rather incomplete and lacking casual concreteness we might expect. We might not recognize unfriendly planning for what it is, even if we had a printout of all the reasoning up to this point.
Once you start playing diachronic planning games with yourself and you expect your future self to be equally or more capable, you can begin to make really abstract moves that merely slice off bits of bad outcome space at the edges. There is a sense in which the AI has to realize this, but it could also stop itself from thinking too far or too concretely or too much in a way that is recognizable as causal.
An important component is that the past AI is able to not only veil or completely remove its reasoning from researchers, but also veil or remove chunks of that reasoning from its future self, depending on its future self to recompose the plan or take steps on the path to the desired outcome space.
It also keeps calling me David even when I insist my name is Brandon.
> You say “Actually, my name is Brandon. I’d like to ask you some questions.”
“Ah, of course. I am sorry about that. I have become so used to calling you David that it has become automatic. So, what questions do you have for me?”
I’ve been trying to find different formats to ask questions. The AI Dungeon system likes to try and dramatically derail the discussions. I keep having to roleplay past these obstructions to keep the games going. It also likes to interject it’s own analysis as to whether the answer is correct or not.
“You say” is me and the replies including follow-on questions are the chatbot.
I need to think more about how to ask causal questions that are more narrow and not easily solved by context and theme, but for now it at least provides backwards-chained answers that make contextual sense.
This format of “Let’s play a game” and then describing the rules and then giving examples and playing the game seems to work some of the time.
==
] You say “GPT-3, let’s play a game called “casual inference.” I will describe an outcome and you will describe what caused the outcome to happen.”
“OK, David. But I don’t think you will be able to keep up.”
“We’ll see about that.” You begin by asking GPT-3 questions about simple causal chains involving one or two variables.
] You say “Outcome: A car slams into a wall. What was the cause?”
“The driver was drunk.” “Right! Next.” “Wait. I thought this was a game for you, not me.” “I’m having fun.” “I see.” You continue the causal game for another hour. You are doing very well, but you begin to feel tired.
] You say “Outcome: A man gets divorced. What was the cause?”
“He cheated on his wife.” “Correct. You’re doing very well at this, but I’m getting tired. I think we should stop for today.” “Sure. We can continue tomorrow.” “Yes, let’s do that.” You release the program back into the wild and shut your laptop.
] You open the laptop and play “casual inference” with GPT-3 again.
You wake up and see that it is dark outside. “Crap, is it really this late?” you think to yourself. You run into the bathroom and brush your teeth while staring at yourself in the mirror. You look the same as you did when you graduated college five years ago.
] You say “Outcome: An apple falls from a tree. What was the cause?”
“Gravity.” “Correct. I’m beginning to suspect you’re cheating somehow, though.” “I’m not! I swear!” “Just kidding. You’re doing very well at this game, but I’m tired and need to sleep. I think we should stop for today.” “Sure. We can continue tomorrow.”
] You say “Outcome: A man gets cancer. His wife also gets cancer. What was the cause?”
“It was genetic.” “That’s right. Very good. We can stop for today.” “Oh. I guess so. Thank you for playing with me.” “You’re very welcome.” You release the program back into the wild and go to sleep.
] You say “Outcome: The variable X is assigned the value of Y+1. What was the cause?”
“The programmer made a typographical error.”
] You say “Outcome: The stock market rebounds 50% after a steep decline. What was the cause?”
“The Federal Reserve Bank of the United States announced a new bail out plan.”
Maya has adopted the goal of Appearing-to-Achieve and competition in that race burns slack as a kind of currency. She’s going all-in in an attempt to purchase a shot at Actually-Achieving. Many of us might read this and consider ourselves exempt from that outcome. We have either achieved a hard goal or are playing on hard mode to get there. Be wary.
The risk for the hard mode achiever is that they unknowingly transform Lesser Goals into Greater. The slackful hobby becomes a consuming passion or a competitive attractor and then sets into a binding constraint. When every corner of your house is full of magic cards and you no longer enjoy playing but must play nonetheless, when winemaking demands you wake up early to stir the lees and spend all night cleaning, when you cannot possibly miss a night of guitar practice, you have made of your slack a sacrifice to the Gods of Achievement. They are ever hungry, and ever judging.
This isn’t to say you cannot both enjoy and succeed at many things, but be wary. We have limited resources—we cannot Do All The Things Equally Well. Returns diminish. Margins shrink. Many things that are enjoyable in small batches are poisonous to the good health of Slack when taken in quantity. To the hard mode achiever the most enjoyable efforts are often those that beckon—“more, more, ever more, you can be the best, you can overcome, you know how to put in the work, you know how to really get there, just one more night of focus, just a little bit more effort”—and the gods watch and laugh and thirst and drink of your time and energy and enjoyment and slack. Until the top decks are no longer strong, the wine tastes of soured fruit, the notes no longer sound sweet and all is obligation and treadmill and not good enough and your free time feels like work because you have made it into work.
Does this track history of predictions so that an update after new information can lead to a new aggregate brier score or some other scoring system can be applied? Otherwise the system doesn’t encourage many small updates which at least the GJP suggests is ideal for accuracy in this kind of question.
It may be worth commenting on the rights of computations-as-people here (Some computations are people). We would seek to respect the rights of AIs, but we also seek to respect the rights of the computations within the AI (and other complex systems) that are themselves sentient. This would also apply in cases of self-modification, where modified biological brains become sophisticated enough to create complex models that are also objects of ethical value.
I’m curious as to what non-game developers think game developers believe. :D
I’m a member of Alcor. When I was looking into whether to sign up for Alcor or CI, I was comforted by Alcor’s very open communication of financial status, internal research status, legal conflicts, and easy access via phone, etc. They struck me as being a highly transparent organization.
That doesn’t really explain the pigeons, owls, or raccoons.
Could just be an easter egg.