Sergii comments on Sergii’s Shortform

Sergii 6 Mar 2025 18:56 UTC
2 points
0
LLMs live in an abstract textual world, and do not understand the real world well (see “[Physical Concept Understanding](https://physico-benchmark.github.io/index.html#)”). We already manipulate LLM’s with prompts, cut-off dates, etc… But what about going deeper by “poisoning” the training data with safety-enhancing beliefs?
For example, if training data has lots of content about how hopeless, futile and dangerous for an AI it is to scheme and hack, it might be a useful safety guardrail?
- Milan W 6 Mar 2025 23:35 UTC
  1 point
  0
  Parent
  Maybe for a while.
  Consider, though, that correct reasoning tends towards finding truth.
  - Sergii 8 Mar 2025 9:51 UTC
    1 point
    0
    Parent
    In abstract sense, yes. But for me in practice finding truth means doing a check in wikipedia. It’s super easy to mislead humans, so should be as easy with AI.