Is there a transcript available?
After way more effort than I thought it could possibly require, there is now a full transcript here.
Indeed, it feels like it should be so easy to turn audio into text. Did you do it by using Otter and then manually going over it? FWIW, if you use rev.com, you can save a lot of time by spending quite a bit of money.
I used a service with an OpenAI Whisper backend as a first pass (specifically, Revoldiv this time), then manually transcribed everything, discovered that leaving all the speech filler words in made the transcript very hard to read, and did another editing pass.
I agree that, if I do this again in the future, rev.com would be a relevant choice.
Anyway, ultimately the hard part was not mainly turning audio into text, but doing so at a (self-inflicted, probably unreasonably) high standard of accuracy. No, even that’s not quite right. The problem is that you want high accuracy (so you don’t put words into someone’s mouth), but with respect to the meaning the speakers wanted to convey, not the literal spoken words. Those are full of filler, word repetitions, unintelligible mumbling, and ungrammatical sentences, all because people don’t speak the way they write.
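For the purely mechanical part of that editing pass (dropping filler sounds and stutters), a small script can shrink the human read-through. This is a minimal, hypothetical sketch, not the workflow used for this transcript. It assumes the Whisper first pass has already been exported as plain text, and the filler list `FILLERS` and the function `strip_disfluencies` are my own illustrative names. It cannot check meaning, which, as noted above, is the part that actually matters.

```python
import string

# Assumption: these sounds (almost) never carry meaning. Ambiguous fillers
# such as "like" or "you know" are deliberately left for the human editor.
FILLERS = {"um", "uh", "erm", "ah", "hmm"}
SENTENCE_END = ".?!"


def _core(token: str) -> str:
    """Lowercased token with surrounding punctuation stripped."""
    return token.strip(string.punctuation).lower()


def strip_disfluencies(text: str) -> str:
    """Drop filler sounds and collapse immediate word repetitions.

    A crude mechanical pass: it cannot tell a stutter ("the the") from a
    legitimate repeat ("that that"), so the output still needs human review.
    """
    kept: list[str] = []
    for token in text.split():
        core = _core(token)
        if core in FILLERS:
            # If the filler ended a sentence, move its end mark to the
            # previous kept word (replacing a trailing comma, if any).
            if kept and token[-1] in SENTENCE_END:
                kept[-1] = kept[-1].rstrip(",") + token[-1]
            continue
        if kept and core and core == _core(kept[-1]):
            # Stutter: keep the earlier word (preserving its capitalization)
            # but take the trailing punctuation of the later copy.
            word = kept[-1].rstrip(string.punctuation)
            trailing = token[len(token.rstrip(string.punctuation)):]
            kept[-1] = word + trailing
            continue
        kept.append(token)
    return " ".join(kept)


if __name__ == "__main__":
    raw = "Um, I think the the main point is alignment, uh, for sure. We we agree, um."
    print(strip_disfluencies(raw))
```

Running it on the sample sentence yields "I think the main point is alignment, for sure. We agree." Working token by token rather than with one large regex keeps each rule (drop fillers, keep sentence-ending punctuation, collapse stutters) easy to inspect and switch off when it misfires.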
But also, this is the kind of thing at which one gets much better with experience, which I lacked.
No, sorry. Since a few people have asked: transcripts are pretty money- and time-consuming to produce, and I wanted to have a podcast where I make the trade-off of having more episodes but with less polish.
If there isn’t, I’d recommend that the podcast creator consult e.g. the Clearer Thinking podcast team about how they produce cost-effective, partly automated transcripts nowadays. Here’s an article on their thinking from early 2022, which was before e.g. OpenAI Whisper was released.
I think this LW post would be significantly more useful with a full transcript, even if automated, for instance because it’s easier to discuss quotes in the comments. (On the other hand, there’s a risk of getting misquoted or directing excessive scrutiny to language that’s less polished than it would be in essay form, or that may suffer from outright transcription errors.)
You mean this? FWIW, I do transcriptions for AXRP, and it takes a bunch of time to get something I perceive as “not embarrassingly bad”. As mentioned in the sister comment to yours, with this episode I’m basically making the tradeoff of publishing more episodes while having them be less polished.
Concretely, I guess it would cost me ~$270 plus 2-4 hours of my time.
Yes, that’s the article I meant. I understand the tradeoff you’re making, and given the costs you cite, I can totally see that that’s not worthwhile for you, especially if higher quality trades off against higher quantity.
That said, I’ve emailed the Clearer Thinking podcast team to ask for more details about their current transcription workflow (currently automatic transcription via Otter.ai followed by a hired human transcriptionist, to minimize required staff time), and will post any responses I get.
Alternatively, if someone offered to pay me $200, or $40 per hour I actually needed (whichever is lower), I’d produce the transcript myself. (As a general matter of economic arbitrage, nobody who’s paid a California salary should spend their own time producing transcripts.)
@MondSemmel Lightcone will pay for this. DM me if you want to discuss details :)
Sure, I’ve sent you a DM.
especially if higher quality trades off against higher quantity.
Yeah, this podcast is a side project of my main podcast, which is itself a side project relative to my day job (CHAI PhD student), so minimizing time is essential.
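Alternatively, if someone offered to pay me $200, or $40 per hour I actually needed (whichever is lower), I’d produce the transcript myself.
I’ll chuck in US$30 of my own money. (It would be more if I had a better sense of the quality bar you were going to reach.)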