I’ve been thinking about paperwork. I work in a heavily regulated industry, so I’m familiar with it first hand: as a civilisation, our agreed-upon way of dealing with safety and regulated situations is “write it down”. Obviously there’s value in having a record after the fact if, say, something goes wrong and you need to trace back what happened, when, and because of whom. But the documenting goes well beyond that. We rarely store just a barebones bullet-point list or a graph of relationships, which is the actually relevant information; we write full documents, sometimes in a special “document-y” language register that’s more redundant than commonly spoken English just to sound more official. The result is often an aggregate volume of documents that no one can ever really read or fully take in.
Now I’m worried LLMs might make this even worse by reducing the cost of producing text. If the perception is that “more text = more safety/accountability”, we’re likely to produce even more documents, because the LLMs can write them for us. In practice we hand them the key information and they uselessly expand it into a much larger amount of data, which, odds are, will later be fed to a second LLM and compressed back down to the key information, hopefully faithfully, if anyone ever needs to go through it (there are already plenty of AI startups doing exactly this). That seems far noisier and more inefficient than going the other way: finding a way to store as much information as possible in a barebones format. Something like automated logging in software, but for an organisation or process.
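To make the “automated logging, but for an organisation” idea concrete, here’s a minimal sketch of what I have in mind: an append-only structured log where each entry records who did what, to what, and when, plus whatever context matters. The field names, the example event, and the `decisions.jsonl` file are all hypothetical, just to illustrate storing the key facts as data rather than as prose.

```python
# Minimal sketch: an append-only, structured "decision log" -- the organisational
# analogue of automated logging in software. Field names and the example event
# are hypothetical; the point is recording the key facts, not writing documents.
import json
import datetime


def log_event(path, actor, action, subject, **details):
    """Append one structured record: who did what, to what, when, plus context."""
    record = {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "actor": actor,
        "action": action,
        "subject": subject,
        "details": details,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")  # one JSON object per line (JSONL)


# Hypothetical usage: the same facts a ten-page approval document would bury.
log_event(
    "decisions.jsonl",
    actor="j.smith",
    action="approved",
    subject="change-request-142",
    risk_assessment="low",
    rationale="rollback plan tested on staging",
)
```

A record like that can always be queried, summarised, or expanded into a readable report on demand (by a person or an LLM), which seems better than pre-inflating everything into documents nobody reads.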
Dunno, just throwing thoughts around, but what would a move towards distilling, rather than inflating, the data we use for traceability and accountability actually look like?